Semi aggregated query for hardware details summary endpoint#1832
Conversation
9c304aa to
11b5c99
Compare
162a78e to
93f201f
Compare
MarceloRobert
left a comment
There was a problem hiding this comment.
Seems like there's some discrepancy between the listing and the details, specially with larger hardware such as kubernetes
|
Good code though, very organized 👍 |
0002e6c to
a19b158
Compare
I happened due to a misunderstanding in the filtering of dummy builds. Should be correct now. |
| "base_hardware, filters", | ||
| [ | ||
| (ASUS_HARDWARE, {"config_name": "defconfig+kcidebug+x86-board"}), | ||
| (ASUS_HARDWARE, {"config_name": "defconfig"}), |
There was a problem hiding this comment.
Check if this is change is correct.
| "base_hardware, filters", | ||
| [ | ||
| (ASUS_HARDWARE, {"architecture": "i386"}), | ||
| (ASUS_HARDWARE, {"architecture": "asus-CM1400CXA-dalboz"}), |
There was a problem hiding this comment.
Check if this is change is correct.
There was a problem hiding this comment.
Pull request overview
This PR refactors the Hardware details summary endpoint to use a semi-aggregated SQL query, aiming to reduce row counts returned from the DB and improve page load performance.
Changes:
- Replace per-record processing with server-side aggregation via new
get_hardware_details_summary()query. - Add filter prefetch/sanitization (
get_hardware_details_filters) and status-filter validation. - Update summary aggregation logic and adjust integration tests to match new behaviors.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
backend/kernelCI_app/views/hardwareDetailsSummaryView.py |
Reworks endpoint logic to consume aggregated rows and rebuild summary/common/filters. |
backend/kernelCI_app/queries/hardware.py |
Adds new aggregated summary query + filter discovery query; refactors query_records. |
backend/kernelCI_app/helpers/filters.py |
Adds status validation + helper is_filtered_out; adjusts filter handler keys. |
backend/kernelCI_app/typeModels/common.py |
Extends StatusCount.increment() to support incrementing by an arbitrary count. |
backend/kernelCI_app/typeModels/commonDetails.py |
Adds __add__/__iadd__ to BuildArchitectures to support aggregation. |
backend/kernelCI_app/tests/integrationTests/hardwareDetailsSummary_test.py |
Updates expectations for invalid ID / invalid filters; adjusts some filter values. |
backend/kernelCI_app/helpers/hardwareDetails.py |
Makes issue fields access more defensive via record.get(...). |
backend/kernelCI_app/constants/localization.py |
Adds client-facing error string for invalid filters. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| clause += ( | ||
| "AND (NOT (tests.path like 'boot.%%' or tests.path = 'boot') " | ||
| f"OR tests.duration >= {duration_min})\n" | ||
| ) |
There was a problem hiding this comment.
doesn't this mean "AND (test is not boot OR test_duration >= min)"? IIUC in this case the right would be to include the test if it is boot and its duration is >= min
What I'm understanding is that you are filtering in the results of the clause, meaning that if the clause is True then the result will return. So if we want boots where the duration is within the interval, we want to include "tests that are boot and have duration greater than min and lower than max", the current code sounds like the opposite. I might be mistaken though, not sure here
There was a problem hiding this comment.
Yes, you are correct. But the idea is to NOT apply this filter on lines that are not boot (regular tests).
The idea is (for this filter):
| not boot | passes |
| boot with duration >= min | passes |
But as you mentioned before, we should include the boot checking for the regular tests as well.
I am not entirelly satisfied with this clauses. If you have any better alternative, I am willing to try.
8b78eaa to
ea1c257
Compare
MarceloRobert
left a comment
There was a problem hiding this comment.
Looks good, the behavior seems correct. Only some small changes left about the comments that were already made
|
|
||
|
|
||
| def is_valid_status(status: str) -> bool: | ||
| return status in StatusChoices or status == "NULL" or status == "null" |
There was a problem hiding this comment.
very nit: status.lower() == "null" or status.upper() == "NULL" wouldn't work?
There was a problem hiding this comment.
got it, the only problem here, is that we might accept combinations such as "Null", "nUll". But I dont think this is a big problem.
| def is_filtered_out(value: str, filter_values: set[set]): | ||
| if filter_values and value not in filter_values: | ||
| return True | ||
| return False |
There was a problem hiding this comment.
I think its more concise:
| def is_filtered_out(value: str, filter_values: set[set]): | |
| if filter_values and value not in filter_values: | |
| return True | |
| return False | |
| def is_filtered_out(value: str, filter_values: set[set]): | |
| return filter_values and value not in filter_values |
|
|
||
|
|
||
| def test_invalid_filters(invalid_filters_input): | ||
|
|
| NULL: Optional[int] = 0 | ||
|
|
||
| def increment(self, status: Optional[str]) -> None: | ||
| def increment(self, status: Optional[str], value=1) -> None: |
There was a problem hiding this comment.
maybe type value here?
| def filter_instance( | ||
| self, | ||
| *, | ||
| hardware_id: str, | ||
| config: str, | ||
| origin: str, | ||
| lab: str, | ||
| compiler: str, | ||
| architecture: str, | ||
| status: str, | ||
| known_issues: set[str], | ||
| is_build: bool, | ||
| is_boot: bool, | ||
| is_test: bool, | ||
| ) -> bool: |
There was a problem hiding this comment.
origin seems unused here
| if is_build and is_filtered_out(status, filters.filterBuildStatus): | ||
| return True | ||
| if is_boot and is_filtered_out(status, filters.filterBootStatus): | ||
| return True | ||
| if ( | ||
| is_test | ||
| and not is_boot | ||
| and is_filtered_out(status, filters.filterTestStatus) | ||
| ): | ||
| return True |
There was a problem hiding this comment.
here I think we could do:
| if is_build and is_filtered_out(status, filters.filterBuildStatus): | |
| return True | |
| if is_boot and is_filtered_out(status, filters.filterBootStatus): | |
| return True | |
| if ( | |
| is_test | |
| and not is_boot | |
| and is_filtered_out(status, filters.filterTestStatus) | |
| ): | |
| return True | |
| filter_type = self.get_filter_type(is_build, is_boot, is_test) | |
| status_filter_map = { | |
| "build": filters.filterBuildStatus, | |
| "boot": filters.filterBootStatus, | |
| "test": filters.filterTestStatus, | |
| } | |
| if is_filtered_out(status, status_filter_map[filter_type]): | |
| return True |
| if is_filtered_out(config, filters.filterConfigs): | ||
| return True | ||
| if is_filtered_out(lab, filters.filter_labs): | ||
| return True | ||
| if is_filtered_out(architecture, filters.filterArchitecture): | ||
| return True | ||
| if is_filtered_out(hardware_id, filters.filterHardware): | ||
| return True |
There was a problem hiding this comment.
these ones couldn't be a list together with compiler? or order is important here?
| if is_filtered_out(config, filters.filterConfigs): | |
| return True | |
| if is_filtered_out(lab, filters.filter_labs): | |
| return True | |
| if is_filtered_out(architecture, filters.filterArchitecture): | |
| return True | |
| if is_filtered_out(hardware_id, filters.filterHardware): | |
| return True | |
| field_checks = [ | |
| (compiler, filters.filterCompiler), | |
| (config, filters.filterConfigs), | |
| (lab, filters.filter_labs), | |
| (architecture, filters.filterArchitecture), | |
| (hardware_id, filters.filterHardware), | |
| ] | |
| if any(is_filtered_out(val, filt) for val, filt in field_checks): | |
| return True |
| if is_build: | ||
| self.increment_build( | ||
| builds_summary=builds_summary, | ||
| status_count=status_count, | ||
| architecture=architecture, | ||
| config=config, | ||
| lab=lab, | ||
| origin=origin, | ||
| known_issues=len(known_issues) - 1, | ||
| compiler=compiler, | ||
| ) | ||
|
|
||
| elif is_boot: | ||
| self.increment_test( | ||
| tests_summary=boots_summary, | ||
| status_count=status_count, | ||
| config=config, | ||
| lab=lab, | ||
| origin=origin, | ||
| known_issues=len(known_issues) - 1, | ||
| architecture=architecture, | ||
| compiler=compiler, | ||
| platform=platform, | ||
| ) | ||
|
|
||
| elif is_test: | ||
| self.increment_test( | ||
| tests_summary=tests_summary, | ||
| status_count=status_count, | ||
| config=config, | ||
| lab=lab, | ||
| origin=origin, | ||
| known_issues=len(known_issues) - 1, | ||
| architecture=architecture, | ||
| compiler=compiler, | ||
| platform=platform, | ||
| ) |
There was a problem hiding this comment.
I think we could use get_filter_type here too to avoid multiple branches, not sure though, because of the params differences
There was a problem hiding this comment.
we could go for something like:
summary_type = self.get_summary_type(is_build=is_build, is_boot=is_boot, is_test=is_test)
increment_strategy = {
'builds': partial(increment_build, self, builds_summary=builds_summary),
'boots': partial(increment_test, self, tests_summary=boots_summary),
'tests': partial(increment_test, self, tests_summary=tests_summary),
}
increment_strategy[summary_type](
status_count=status_count,
architecture=architecture,
config=config,
lab=lab,
origin=origin,
known_issues=len(known_issues) - 1,
compiler=compiler,
platform=platform,
)But I am afraid is a little bit of overengineer for 3 conditionals
There was a problem hiding this comment.
What you think of it?
3a753af to
4270998
Compare
MarceloRobert
left a comment
There was a problem hiding this comment.
From testing, seems like the hardware compatible filter is always returning empty, and the tree filter is not working
dede999
left a comment
There was a problem hiding this comment.
Review Summary
The performance strategy is solid — moving aggregation into SQL (GROUP BY + UNION ALL) to reduce rows returned to Python is the right approach. A few items to address before merge, the most critical being the SQL injection in duration clauses.
| # builds | ||
| duration_min, duration_max = builds_duration | ||
| if duration_min: | ||
| clause += f"AND builds.duration >= {duration_min}\n" |
There was a problem hiding this comment.
🔴 bad — SQL injection via f-string interpolation.
duration_min and duration_max are interpolated directly into SQL via f-strings here and on the lines below (_get_boot_test_duration_clause has the same pattern). The rest of the query correctly uses %s placeholders — these should too.
def _get_build_duration_clause(builds_duration, params: list) -> str:
clause = ""
duration_min, duration_max = builds_duration
if duration_min:
clause += "AND builds.duration >= %s\n"
params.append(duration_min)
if duration_max:
clause += "AND builds.duration <= %s\n"
params.append(duration_max)
return clauseSame fix needed for _get_boot_test_duration_clause.
There was a problem hiding this comment.
Good call, I will include named parametrization to avoid this
| NULL_STRINGS = set(["null", UNKNOWN_STRING, "NULL"]) | ||
|
|
||
|
|
||
| def is_valid_status(status: str) -> bool: |
There was a problem hiding this comment.
🔴 bad — {*StatusChoices, "NULL"} reconstructs the set on every call, and if status is None, status.upper() will raise AttributeError.
Suggestion:
VALID_STATUSES = {choice.value for choice in StatusChoices} | {"NULL"}
def is_valid_status(status: str) -> bool:
return status is not None and status.upper() in VALID_STATUSESThere was a problem hiding this comment.
This shouldn't be a problem, as it is not in any hot path of the application, also the set is quite small.
The status is a str, not Optional[str], it should not be None.
| "test.status": self._handle_test_status, | ||
| "test.duration": self._handle_test_duration, | ||
| "build.status": self._handle_build_status, | ||
| "duration": self._handle_build_duration, # TODO: same as build.duration (should be standardized) |
There was a problem hiding this comment.
🔵 nit — If this "duration" alias is known tech debt, consider creating a ticket instead of a TODO — it may confuse future contributors about which filter key to use.
| status = instance["status"] | ||
| count = instance["count"] | ||
| incidents = instance["incidents_count"] | ||
| known_issues = set(parse_issue(issue) for issue in instance["known_issues"]) |
There was a problem hiding this comment.
🟡 medium — When instance["known_issues"] contains None entries (from array_agg when there are no incidents), parse_issue(None) returns (UNCATEGORIZED_STRING, None). These phantom tuples are then checked against filters.filterIssues, which could cause false-positive filtering.
Suggestion:
known_issues = set(parse_issue(issue) for issue in instance["known_issues"] if issue is not None)| ) | ||
|
|
||
| # ensure uniqueness on architecture and compilers (maybe we could change data structures???) | ||
| for summary in builds_summary.architectures.values(): |
There was a problem hiding this comment.
🔵 nit — The loop variable summary shadows the method parameter summary: list[dict]. Consider renaming to arch_summary or similar to avoid confusion.
| MISS=self.MISS + other.MISS, | ||
| DONE=self.DONE + other.DONE, | ||
| NULL=self.NULL + other.NULL, | ||
| compilers=self.compilers, |
There was a problem hiding this comment.
🟡 medium — __add__ only keeps self.compilers, silently discarding other.compilers. __iadd__ also doesn't merge compilers. This might be intentional (compilers are merged separately in aggregate_summaries), but it's a silent data-loss trap if these operators are used elsewhere.
There was a problem hiding this comment.
StatusCount does not have a compilers, if we do include an option to add two BuildArchitectures (which could be the case some point in future), than we would need to deal with this
There was a problem hiding this comment.
I also dont know if Build Architectures should inherit from StatusCount, composition might make more sense in this case, instead of inheritance
d0a33ef to
98ea304
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cb8b2d9 to
366299f
Compare
MarceloRobert
left a comment
There was a problem hiding this comment.
Looks like the behavior is the same as before now, all good. The opened comments still make sense, but we can work on them later; maybe make a TODO for them or open an issue if it's major. Either way, I'll approve; it's working well
366299f to
da26100
Compare
gustavobtflores
left a comment
There was a problem hiding this comment.
Looks good, just some minor comments. I would leave some TODOs for refactoring filter_instance and aggregate_summaries in hardwareDetails.py to reduce repetition and branching, not a blocker though
| def select_commits_hashes( | ||
| self, | ||
| tree_heads: list[(str, str)], | ||
| selected_commits: Optional[dict[str, str]] = None, | ||
| ): | ||
| selected_commit_hashes = [] | ||
| if selected_commits: | ||
| for idx, head in tree_heads: | ||
| if idx in self.selected_commits: | ||
| selected_commit = self.selected_commits.get(idx, "head") | ||
| selected_commit_hashes.append( | ||
| head if selected_commit == "head" else selected_commit | ||
| ) | ||
| else: | ||
| selected_commit_hashes = [head for (_, head) in tree_heads] | ||
| return selected_commit_hashes |
There was a problem hiding this comment.
We are passing selected_commits as a param and then using self.selected_commits inside the function, I'm not sure if we need the param. The only function call that uses selected_commits also passes self.selected_commits
| def post(self, request, hardware_id) -> Response: | ||
| def select_commits_hashes( | ||
| self, | ||
| tree_heads: list[(str, str)], |
There was a problem hiding this comment.
nit:
| tree_heads: list[(str, str)], | |
| tree_heads: list[tuple[str, str]], |
* Bring data grouped by the filter values, ensuring a mid point between bringing all the data to be aggregated and filtered on the app, and needing to query for every filter change. * Build / Test duration is an exception, as they are continuous columns, and any change will trigger a new query. * Include a validation for invalid status values. * Small frontend bugfix on rerender loop when applying trees filter.
da26100 to
5be6d73
Compare
Description
This implements performance improvement on the Hardware details summary endpoint.
How to test