Skip to content

Semi aggregated query for hardware details summary endpoint#1832

Merged
MarceloRobert merged 1 commit intokernelci:mainfrom
profusion:fix/improve-hardware-details-summary-performance
Apr 16, 2026
Merged

Semi aggregated query for hardware details summary endpoint#1832
MarceloRobert merged 1 commit intokernelci:mainfrom
profusion:fix/improve-hardware-details-summary-performance

Conversation

@alanpeixinho
Copy link
Copy Markdown
Contributor

@alanpeixinho alanpeixinho commented Mar 30, 2026

Description

This implements performance improvement on the Hardware details summary endpoint.

  • Semi aggregating data on query to reduce number of returned rows, while allowing for filtering and detailed aggregation.
  • Ignoring fields not used on the frontend (failed_reasons).

How to test

  • Open the dashboard.
  • Go to Hardware page.
  • Select any available hardware.
  • The presented information should match previous versions (and staging/production).
  • The page should load fast even for cases with many instances of builds/boots and tests.

@MarceloRobert MarceloRobert added Backend Most or all of the changes for this issue will be in the backend code. Queries Issue that involves modifying some DB query labels Mar 31, 2026
Comment thread backend/kernelCI_app/queries/hardware.py Outdated
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py Outdated
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py Outdated
Comment thread backend/kernelCI_app/views/hardwareDetailsView.py Outdated
@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 10 times, most recently from 9c304aa to 11b5c99 Compare April 2, 2026 18:52
@alanpeixinho alanpeixinho marked this pull request as ready for review April 2, 2026 18:53
@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 2 times, most recently from 162a78e to 93f201f Compare April 6, 2026 17:18
Copy link
Copy Markdown
Collaborator

@MarceloRobert MarceloRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there's some discrepancy between the listing and the details, specially with larger hardware such as kubernetes

Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py Outdated
@MarceloRobert
Copy link
Copy Markdown
Collaborator

Good code though, very organized 👍

@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 3 times, most recently from 0002e6c to a19b158 Compare April 7, 2026 20:26
@alanpeixinho
Copy link
Copy Markdown
Contributor Author

Seems like there's some discrepancy between the listing and the details, specially with larger hardware such as kubernetes

I happened due to a misunderstanding in the filtering of dummy builds. Should be correct now.

"base_hardware, filters",
[
(ASUS_HARDWARE, {"config_name": "defconfig+kcidebug+x86-board"}),
(ASUS_HARDWARE, {"config_name": "defconfig"}),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if this is change is correct.

"base_hardware, filters",
[
(ASUS_HARDWARE, {"architecture": "i386"}),
(ASUS_HARDWARE, {"architecture": "asus-CM1400CXA-dalboz"}),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if this is change is correct.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Hardware details summary endpoint to use a semi-aggregated SQL query, aiming to reduce row counts returned from the DB and improve page load performance.

Changes:

  • Replace per-record processing with server-side aggregation via new get_hardware_details_summary() query.
  • Add filter prefetch/sanitization (get_hardware_details_filters) and status-filter validation.
  • Update summary aggregation logic and adjust integration tests to match new behaviors.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
backend/kernelCI_app/views/hardwareDetailsSummaryView.py Reworks endpoint logic to consume aggregated rows and rebuild summary/common/filters.
backend/kernelCI_app/queries/hardware.py Adds new aggregated summary query + filter discovery query; refactors query_records.
backend/kernelCI_app/helpers/filters.py Adds status validation + helper is_filtered_out; adjusts filter handler keys.
backend/kernelCI_app/typeModels/common.py Extends StatusCount.increment() to support incrementing by an arbitrary count.
backend/kernelCI_app/typeModels/commonDetails.py Adds __add__/__iadd__ to BuildArchitectures to support aggregation.
backend/kernelCI_app/tests/integrationTests/hardwareDetailsSummary_test.py Updates expectations for invalid ID / invalid filters; adjusts some filter values.
backend/kernelCI_app/helpers/hardwareDetails.py Makes issue fields access more defensive via record.get(...).
backend/kernelCI_app/constants/localization.py Adds client-facing error string for invalid filters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py Outdated
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py Outdated
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py
Comment thread backend/kernelCI_app/queries/hardware.py
Comment thread backend/kernelCI_app/queries/hardware.py
Comment thread backend/kernelCI_app/helpers/filters.py
Comment thread backend/kernelCI_app/helpers/filters.py Outdated
Comment on lines +369 to +372
clause += (
"AND (NOT (tests.path like 'boot.%%' or tests.path = 'boot') "
f"OR tests.duration >= {duration_min})\n"
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this mean "AND (test is not boot OR test_duration >= min)"? IIUC in this case the right would be to include the test if it is boot and its duration is >= min

What I'm understanding is that you are filtering in the results of the clause, meaning that if the clause is True then the result will return. So if we want boots where the duration is within the interval, we want to include "tests that are boot and have duration greater than min and lower than max", the current code sounds like the opposite. I might be mistaken though, not sure here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct. But the idea is to NOT apply this filter on lines that are not boot (regular tests).
The idea is (for this filter):

not boot passes
boot with duration >= min passes

But as you mentioned before, we should include the boot checking for the regular tests as well.
I am not entirelly satisfied with this clauses. If you have any better alternative, I am willing to try.

Comment thread backend/kernelCI_app/queries/hardware.py Outdated
Comment thread backend/kernelCI_app/queries/hardware.py Outdated
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py Outdated
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py
@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch from 8b78eaa to ea1c257 Compare April 8, 2026 20:37
Copy link
Copy Markdown
Collaborator

@MarceloRobert MarceloRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, the behavior seems correct. Only some small changes left about the comments that were already made

Comment thread backend/kernelCI_app/helpers/filters.py Outdated


def is_valid_status(status: str) -> bool:
return status in StatusChoices or status == "NULL" or status == "null"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nit: status.lower() == "null" or status.upper() == "NULL" wouldn't work?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, the only problem here, is that we might accept combinations such as "Null", "nUll". But I dont think this is a big problem.

Comment thread backend/kernelCI_app/helpers/filters.py Outdated
Comment on lines +137 to +140
def is_filtered_out(value: str, filter_values: set[set]):
if filter_values and value not in filter_values:
return True
return False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its more concise:

Suggested change
def is_filtered_out(value: str, filter_values: set[set]):
if filter_values and value not in filter_values:
return True
return False
def is_filtered_out(value: str, filter_values: set[set]):
return filter_values and value not in filter_values

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point



def test_invalid_filters(invalid_filters_input):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patch noise

NULL: Optional[int] = 0

def increment(self, status: Optional[str]) -> None:
def increment(self, status: Optional[str], value=1) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe type value here?

Comment on lines +93 to +107
def filter_instance(
self,
*,
hardware_id: str,
config: str,
origin: str,
lab: str,
compiler: str,
architecture: str,
status: str,
known_issues: set[str],
is_build: bool,
is_boot: bool,
is_test: bool,
) -> bool:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

origin seems unused here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Comment on lines +112 to +121
if is_build and is_filtered_out(status, filters.filterBuildStatus):
return True
if is_boot and is_filtered_out(status, filters.filterBootStatus):
return True
if (
is_test
and not is_boot
and is_filtered_out(status, filters.filterTestStatus)
):
return True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I think we could do:

Suggested change
if is_build and is_filtered_out(status, filters.filterBuildStatus):
return True
if is_boot and is_filtered_out(status, filters.filterBootStatus):
return True
if (
is_test
and not is_boot
and is_filtered_out(status, filters.filterTestStatus)
):
return True
filter_type = self.get_filter_type(is_build, is_boot, is_test)
status_filter_map = {
"build": filters.filterBuildStatus,
"boot": filters.filterBootStatus,
"test": filters.filterTestStatus,
}
if is_filtered_out(status, status_filter_map[filter_type]):
return True

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

Comment on lines +122 to +129
if is_filtered_out(config, filters.filterConfigs):
return True
if is_filtered_out(lab, filters.filter_labs):
return True
if is_filtered_out(architecture, filters.filterArchitecture):
return True
if is_filtered_out(hardware_id, filters.filterHardware):
return True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these ones couldn't be a list together with compiler? or order is important here?

Suggested change
if is_filtered_out(config, filters.filterConfigs):
return True
if is_filtered_out(lab, filters.filter_labs):
return True
if is_filtered_out(architecture, filters.filterArchitecture):
return True
if is_filtered_out(hardware_id, filters.filterHardware):
return True
field_checks = [
(compiler, filters.filterCompiler),
(config, filters.filterConfigs),
(lab, filters.filter_labs),
(architecture, filters.filterArchitecture),
(hardware_id, filters.filterHardware),
]
if any(is_filtered_out(val, filt) for val, filt in field_checks):
return True

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines +183 to +219
if is_build:
self.increment_build(
builds_summary=builds_summary,
status_count=status_count,
architecture=architecture,
config=config,
lab=lab,
origin=origin,
known_issues=len(known_issues) - 1,
compiler=compiler,
)

elif is_boot:
self.increment_test(
tests_summary=boots_summary,
status_count=status_count,
config=config,
lab=lab,
origin=origin,
known_issues=len(known_issues) - 1,
architecture=architecture,
compiler=compiler,
platform=platform,
)

elif is_test:
self.increment_test(
tests_summary=tests_summary,
status_count=status_count,
config=config,
lab=lab,
origin=origin,
known_issues=len(known_issues) - 1,
architecture=architecture,
compiler=compiler,
platform=platform,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could use get_filter_type here too to avoid multiple branches, not sure though, because of the params differences

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could go for something like:

            summary_type = self.get_summary_type(is_build=is_build, is_boot=is_boot, is_test=is_test)
            increment_strategy = {
                'builds': partial(increment_build, self, builds_summary=builds_summary),
                'boots': partial(increment_test, self, tests_summary=boots_summary),
                'tests': partial(increment_test, self, tests_summary=tests_summary),
            }

            increment_strategy[summary_type](
                status_count=status_count,
                architecture=architecture,
                config=config,
                lab=lab,
                origin=origin,
                known_issues=len(known_issues) - 1,
                compiler=compiler,
                platform=platform,
            )

But I am afraid is a little bit of overengineer for 3 conditionals

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you think of it?

@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 2 times, most recently from 3a753af to 4270998 Compare April 9, 2026 21:58
Comment thread backend/kernelCI_app/queries/hardware.py Outdated
Copy link
Copy Markdown
Collaborator

@MarceloRobert MarceloRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From testing, seems like the hardware compatible filter is always returning empty, and the tree filter is not working

Copy link
Copy Markdown
Contributor

@dede999 dede999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

The performance strategy is solid — moving aggregation into SQL (GROUP BY + UNION ALL) to reduce rows returned to Python is the right approach. A few items to address before merge, the most critical being the SQL injection in duration clauses.

# builds
duration_min, duration_max = builds_duration
if duration_min:
clause += f"AND builds.duration >= {duration_min}\n"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 bad — SQL injection via f-string interpolation.

duration_min and duration_max are interpolated directly into SQL via f-strings here and on the lines below (_get_boot_test_duration_clause has the same pattern). The rest of the query correctly uses %s placeholders — these should too.

def _get_build_duration_clause(builds_duration, params: list) -> str:
    clause = ""
    duration_min, duration_max = builds_duration
    if duration_min:
        clause += "AND builds.duration >= %s\n"
        params.append(duration_min)
    if duration_max:
        clause += "AND builds.duration <= %s\n"
        params.append(duration_max)
    return clause

Same fix needed for _get_boot_test_duration_clause.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I will include named parametrization to avoid this

Comment thread backend/kernelCI_app/queries/hardware.py
Comment thread backend/kernelCI_app/queries/hardware.py Outdated
Comment thread backend/kernelCI_app/queries/hardware.py Outdated
NULL_STRINGS = set(["null", UNKNOWN_STRING, "NULL"])


def is_valid_status(status: str) -> bool:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 bad{*StatusChoices, "NULL"} reconstructs the set on every call, and if status is None, status.upper() will raise AttributeError.

Suggestion:

VALID_STATUSES = {choice.value for choice in StatusChoices} | {"NULL"}

def is_valid_status(status: str) -> bool:
    return status is not None and status.upper() in VALID_STATUSES

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be a problem, as it is not in any hot path of the application, also the set is quite small.
The status is a str, not Optional[str], it should not be None.

Comment thread backend/kernelCI_app/helpers/filters.py Outdated
"test.status": self._handle_test_status,
"test.duration": self._handle_test_duration,
"build.status": self._handle_build_status,
"duration": self._handle_build_duration, # TODO: same as build.duration (should be standardized)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 nit — If this "duration" alias is known tech debt, consider creating a ticket instead of a TODO — it may confuse future contributors about which filter key to use.

status = instance["status"]
count = instance["count"]
incidents = instance["incidents_count"]
known_issues = set(parse_issue(issue) for issue in instance["known_issues"])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 medium — When instance["known_issues"] contains None entries (from array_agg when there are no incidents), parse_issue(None) returns (UNCATEGORIZED_STRING, None). These phantom tuples are then checked against filters.filterIssues, which could cause false-positive filtering.

Suggestion:

known_issues = set(parse_issue(issue) for issue in instance["known_issues"] if issue is not None)

)

# ensure uniqueness on architecture and compilers (maybe we could change data structures???)
for summary in builds_summary.architectures.values():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 nit — The loop variable summary shadows the method parameter summary: list[dict]. Consider renaming to arch_summary or similar to avoid confusion.

MISS=self.MISS + other.MISS,
DONE=self.DONE + other.DONE,
NULL=self.NULL + other.NULL,
compilers=self.compilers,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 medium__add__ only keeps self.compilers, silently discarding other.compilers. __iadd__ also doesn't merge compilers. This might be intentional (compilers are merged separately in aggregate_summaries), but it's a silent data-loss trap if these operators are used elsewhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StatusCount does not have a compilers, if we do include an option to add two BuildArchitectures (which could be the case some point in future), than we would need to deal with this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also dont know if Build Architectures should inherit from StatusCount, composition might make more sense in this case, instead of inheritance

@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 4 times, most recently from d0a33ef to 98ea304 Compare April 14, 2026 19:38
@MarceloRobert MarceloRobert requested a review from Copilot April 14, 2026 19:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/kernelCI_app/queries/hardware.py Outdated
Comment thread backend/kernelCI_app/typeModels/commonDetails.py Outdated
Comment thread dashboard/src/pages/hardwareDetails/HardwareDetails.tsx Outdated
Comment thread dashboard/src/pages/hardwareDetails/HardwareDetailsHeaderTable.tsx
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py Outdated
Comment thread backend/kernelCI_app/queries/hardware.py Outdated
Comment thread backend/kernelCI_app/queries/hardware.py Outdated
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py
Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py
Comment thread backend/kernelCI_app/queries/hardware.py Outdated
@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 3 times, most recently from cb8b2d9 to 366299f Compare April 14, 2026 21:10
Copy link
Copy Markdown
Collaborator

@MarceloRobert MarceloRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the behavior is the same as before now, all good. The opened comments still make sense, but we can work on them later; maybe make a TODO for them or open an issue if it's major. Either way, I'll approve; it's working well

Comment thread dashboard/src/pages/hardwareDetails/HardwareDetails.tsx Outdated
@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch from 366299f to da26100 Compare April 15, 2026 21:21
Copy link
Copy Markdown
Contributor

@gustavobtflores gustavobtflores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just some minor comments. I would leave some TODOs for refactoring filter_instance and aggregate_summaries in hardwareDetails.py to reduce repetition and branching, not a blocker though

Comment on lines +471 to +486
def select_commits_hashes(
self,
tree_heads: list[(str, str)],
selected_commits: Optional[dict[str, str]] = None,
):
selected_commit_hashes = []
if selected_commits:
for idx, head in tree_heads:
if idx in self.selected_commits:
selected_commit = self.selected_commits.get(idx, "head")
selected_commit_hashes.append(
head if selected_commit == "head" else selected_commit
)
else:
selected_commit_hashes = [head for (_, head) in tree_heads]
return selected_commit_hashes
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are passing selected_commits as a param and then using self.selected_commits inside the function, I'm not sure if we need the param. The only function call that uses selected_commits also passes self.selected_commits

def post(self, request, hardware_id) -> Response:
def select_commits_hashes(
self,
tree_heads: list[(str, str)],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
tree_heads: list[(str, str)],
tree_heads: list[tuple[str, str]],

 * Bring data grouped by the filter values, ensuring a mid point between
   bringing all the data to be aggregated and filtered on the app, and
   needing to query for every filter change.
 * Build / Test duration is an exception, as they are continuous
   columns, and any change will trigger a new query.
 * Include a validation for invalid status values.
 * Small frontend bugfix on rerender loop when applying trees filter.
@alanpeixinho alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch from da26100 to 5be6d73 Compare April 16, 2026 13:58
@MarceloRobert MarceloRobert added this pull request to the merge queue Apr 16, 2026
Merged via the queue into kernelci:main with commit 16fecbe Apr 16, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend Most or all of the changes for this issue will be in the backend code. Queries Issue that involves modifying some DB query

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants