Semi aggregated query for hardware details summary endpoint by alanpeixinho · Pull Request #1832 · kernelci/dashboard

alanpeixinho · 2026-03-30T21:04:14Z

Description

This improves the performance of the Hardware details summary endpoint.

The query now semi-aggregates the data to reduce the number of returned rows, while still allowing filtering and detailed aggregation in the application.
Fields not used by the frontend (failed_reasons) are ignored.
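
As a rough illustration of the semi-aggregation idea described above (not the PR's actual query or view code), the sketch below assumes the database returns one row per distinct combination of filterable fields together with a `count`, and the application then filters and sums those rows. The row shape, the sample values, and the helper name `summarize` are all hypothetical.

```python
from collections import Counter

# Hypothetical semi-aggregated rows: one row per (status, config, lab) group,
# with `count` holding how many raw records fell into that group.
ROWS = [
    {"status": "PASS", "config": "defconfig", "lab": "lab-a", "count": 120},
    {"status": "FAIL", "config": "defconfig", "lab": "lab-a", "count": 7},
    {"status": "PASS", "config": "defconfig", "lab": "lab-b", "count": 30},
    {"status": "PASS", "config": "allmodconfig", "lab": "lab-b", "count": 5},
]


def summarize(rows, configs=None, labs=None):
    """Sum counts per status, applying optional filters in the application."""
    totals = Counter()
    for row in rows:
        # An empty or missing filter set means "do not filter on this field".
        if configs and row["config"] not in configs:
            continue
        if labs and row["lab"] not in labs:
            continue
        totals[row["status"]] += row["count"]
    return dict(totals)
```

Because the rows are already grouped by the filterable fields, changing a discrete filter only needs a new pass over a small list rather than a new database query.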

How to test

Open the dashboard.
Go to Hardware page.
Select any available hardware.
The presented information should match previous versions (and staging/production).
The page should load fast even for cases with many instances of builds/boots and tests.

MarceloRobert

Seems like there's some discrepancy between the listing and the details, especially with larger hardware such as kubernetes

MarceloRobert · 2026-04-06T17:30:09Z

Good code though, very organized 👍

alanpeixinho · 2026-04-07T20:36:54Z

Seems like there's some discrepancy between the listing and the details, especially with larger hardware such as kubernetes

It happened due to a misunderstanding in the filtering of dummy builds. It should be correct now.

alanpeixinho · 2026-04-07T20:38:08Z

    "base_hardware, filters",
    [
-        (ASUS_HARDWARE, {"config_name": "defconfig+kcidebug+x86-board"}),
+        (ASUS_HARDWARE, {"config_name": "defconfig"}),


Check if this change is correct.

alanpeixinho · 2026-04-07T20:38:16Z

    "base_hardware, filters",
    [
-        (ASUS_HARDWARE, {"architecture": "i386"}),
+        (ASUS_HARDWARE, {"architecture": "asus-CM1400CXA-dalboz"}),


Check if this change is correct.

Copilot

Pull request overview

This PR refactors the Hardware details summary endpoint to use a semi-aggregated SQL query, aiming to reduce row counts returned from the DB and improve page load performance.

Changes:

Replace per-record processing with server-side aggregation via new get_hardware_details_summary() query.
Add filter prefetch/sanitization (get_hardware_details_filters) and status-filter validation.
Update summary aggregation logic and adjust integration tests to match new behaviors.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`backend/kernelCI_app/views/hardwareDetailsSummaryView.py`	Reworks endpoint logic to consume aggregated rows and rebuild summary/common/filters.
`backend/kernelCI_app/queries/hardware.py`	Adds new aggregated summary query + filter discovery query; refactors `query_records`.
`backend/kernelCI_app/helpers/filters.py`	Adds status validation + helper `is_filtered_out`; adjusts filter handler keys.
`backend/kernelCI_app/typeModels/common.py`	Extends `StatusCount.increment()` to support incrementing by an arbitrary count.
`backend/kernelCI_app/typeModels/commonDetails.py`	Adds `__add__/__iadd__` to `BuildArchitectures` to support aggregation.
`backend/kernelCI_app/tests/integrationTests/hardwareDetailsSummary_test.py`	Updates expectations for invalid ID / invalid filters; adjusts some filter values.
`backend/kernelCI_app/helpers/hardwareDetails.py`	Makes issue fields access more defensive via `record.get(...)`.
`backend/kernelCI_app/constants/localization.py`	Adds client-facing error string for invalid filters.

MarceloRobert · 2026-04-08T14:26:42Z

+        clause += (
+            "AND (NOT (tests.path like 'boot.%%' or tests.path = 'boot') "
+            f"OR tests.duration >= {duration_min})\n"
+        )


Doesn't this mean "AND (test is not boot OR test_duration >= min)"? IIUC, in this case the right thing would be to include the test if it is a boot and its duration is >= min.

What I'm understanding is that you are filtering in the results of the clause, meaning that if the clause is True then the result will return. So if we want boots where the duration is within the interval, we want to include "tests that are boot and have duration greater than min and lower than max", the current code sounds like the opposite. I might be mistaken though, not sure here

Yes, you are correct. But the idea is to NOT apply this filter on lines that are not boot (regular tests).
The idea is (for this filter):

not boot passes

boot with duration >= min passes

But as you mentioned before, we should include the boot checking for the regular tests as well.
I am not entirely satisfied with these clauses. If you have any better alternative, I am willing to try it.
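
The intent described in this thread ("not boot passes", "boot with duration >= min passes") can be restated as a plain Python predicate. This is only a sketch of the logic, not code from the PR; the function names are hypothetical, and treating a missing duration as "does not pass" is an assumption that mirrors how a SQL comparison against NULL would drop the row.

```python
from typing import Optional


def is_boot(path: str) -> bool:
    # Mirrors the SQL condition: tests.path = 'boot' OR tests.path LIKE 'boot.%'
    return path == "boot" or path.startswith("boot.")


def passes_boot_min_duration(
    path: str, duration: Optional[float], duration_min: float
) -> bool:
    """Equivalent to: NOT is_boot OR duration >= duration_min."""
    if not is_boot(path):
        # The boot duration filter does not apply to regular tests.
        return True
    # Assumption: a boot without a recorded duration does not pass the filter.
    return duration is not None and duration >= duration_min
```

Read this way, the clause is an implication ("if the row is a boot, then its duration must be at least the minimum"), which is why regular tests pass through untouched.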

MarceloRobert

Looks good, the behavior seems correct. Only some small changes left about the comments that were already made

gustavobtflores · 2026-04-09T14:06:50Z



+def is_valid_status(status: str) -> bool:
+    return status in StatusChoices or status == "NULL" or status == "null"


very nit: status.lower() == "null" or status.upper() == "NULL" wouldn't work?

Got it. The only problem here is that we might accept combinations such as "Null" or "nUll", but I don't think this is a big problem.
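
A minimal sketch of the case-insensitive variant discussed here, to make the trade-off concrete. `KNOWN_STATUSES` is a hypothetical stand-in for the values of the project's `StatusChoices`; the real set of statuses may differ.

```python
# Hypothetical stand-in for the values of the project's StatusChoices.
KNOWN_STATUSES = {"PASS", "FAIL", "ERROR", "SKIP", "MISS", "DONE"}


def is_valid_status(status: str) -> bool:
    # Regular statuses are matched exactly; only the null marker is
    # compared case-insensitively.
    return status in KNOWN_STATUSES or status.upper() == "NULL"
```

With this form, "null", "NULL", and mixed-case spellings such as "Null" are all accepted, while regular statuses still have to match exactly.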

gustavobtflores · 2026-04-09T14:08:13Z

+def is_filtered_out(value: str, filter_values: set[set]):
+    if filter_values and value not in filter_values:
+        return True
+    return False


I think it's more concise:

Suggested change

-def is_filtered_out(value: str, filter_values: set[set]):
-    if filter_values and value not in filter_values:
-        return True
-    return False
+def is_filtered_out(value: str, filter_values: set[set]):
+    return filter_values and value not in filter_values
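
One detail of the shorter form is worth keeping in mind: in Python, `a and b` returns `a` itself when `a` is falsy, so with an empty filter set the one-line version returns the empty set rather than `False`. The sketch below shows both variants side by side; the `set[str]` annotation is an assumption about the intended type, and `is_filtered_out_short` is a hypothetical name used only for comparison.

```python
def is_filtered_out_short(value, filter_values):
    # The one-line form from the suggestion: returns `filter_values` itself
    # (an empty set) when no filter is active.
    return filter_values and value not in filter_values


def is_filtered_out(value: str, filter_values: set[str]) -> bool:
    # Wrapping the first operand in bool() keeps the return type a strict bool.
    return bool(filter_values) and value not in filter_values
```

Both variants behave the same inside an `if`, so the difference only matters if the result is compared with `is False`, serialized, or type-checked.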

gustavobtflores · 2026-04-09T14:36:22Z



 def test_invalid_filters(invalid_filters_input):
+


patch noise

gustavobtflores · 2026-04-09T14:37:18Z

    NULL: Optional[int] = 0

-    def increment(self, status: Optional[str]) -> None:
+    def increment(self, status: Optional[str], value=1) -> None:


maybe type value here?
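
A small sketch of what the typed signature could look like. The class below is a trimmed-down, hypothetical stand-in for the project's `StatusCount` (the real class has more fields and may handle unknown or lowercase statuses differently); only the `value: int = 1` annotation is the point.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StatusCount:
    """Hypothetical, trimmed-down stand-in for the project's StatusCount."""

    PASS: int = 0
    FAIL: int = 0
    NULL: int = 0

    def increment(self, status: Optional[str], value: int = 1) -> None:
        # Assumption: a missing status is counted under NULL.
        key = "NULL" if status is None else status.upper()
        if key not in {f.name for f in fields(self)}:
            raise ValueError(f"unknown status: {status!r}")
        setattr(self, key, getattr(self, key) + value)
```

Incrementing by an arbitrary `value` is what lets a single semi-aggregated row with `count = 120` be added in one call instead of 120.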

gustavobtflores · 2026-04-09T14:40:51Z

+    def filter_instance(
+        self,
+        *,
+        hardware_id: str,
+        config: str,
+        origin: str,
+        lab: str,
+        compiler: str,
+        architecture: str,
+        status: str,
+        known_issues: set[str],
+        is_build: bool,
+        is_boot: bool,
+        is_test: bool,
+    ) -> bool:


origin seems unused here

gustavobtflores · 2026-04-09T14:55:28Z

+        if is_build and is_filtered_out(status, filters.filterBuildStatus):
+            return True
+        if is_boot and is_filtered_out(status, filters.filterBootStatus):
+            return True
+        if (
+            is_test
+            and not is_boot
+            and is_filtered_out(status, filters.filterTestStatus)
+        ):
+            return True


here I think we could do:

Suggested change

-        if is_build and is_filtered_out(status, filters.filterBuildStatus):
-            return True
-        if is_boot and is_filtered_out(status, filters.filterBootStatus):
-            return True
-        if (
-            is_test
-            and not is_boot
-            and is_filtered_out(status, filters.filterTestStatus)
-        ):
-            return True
+        filter_type = self.get_filter_type(is_build, is_boot, is_test)
+        status_filter_map = {
+            "build": filters.filterBuildStatus,
+            "boot": filters.filterBootStatus,
+            "test": filters.filterTestStatus,
+        }
+        if is_filtered_out(status, status_filter_map[filter_type]):
+            return True
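
The suggestion calls `self.get_filter_type`, which is not shown in the thread. A minimal sketch of what such a helper might look like follows, written as a free function; its name, signature, and return values are assumptions. The precedence (build, then boot, then test) is chosen to match the original branches, where a boot row is excluded from the test check via `not is_boot`.

```python
from typing import Optional


def get_filter_type(is_build: bool, is_boot: bool, is_test: bool) -> Optional[str]:
    """Hypothetical helper: map the row flags to a status-filter key."""
    if is_build:
        return "build"
    if is_boot:
        # Checked before `is_test` so a boot is never treated as a regular test.
        return "boot"
    if is_test:
        return "test"
    return None
```

If the helper can return `None`, a direct lookup such as `status_filter_map[filter_type]` would raise `KeyError`, so the caller would need either a guaranteed flag or a `.get(...)` with a default.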

gustavobtflores · 2026-04-09T14:57:06Z

+        if is_filtered_out(config, filters.filterConfigs):
+            return True
+        if is_filtered_out(lab, filters.filter_labs):
+            return True
+        if is_filtered_out(architecture, filters.filterArchitecture):
+            return True
+        if is_filtered_out(hardware_id, filters.filterHardware):
+            return True


Couldn't these be checked as a list, together with compiler? Or is the order important here?

Suggested change

-        if is_filtered_out(config, filters.filterConfigs):
-            return True
-        if is_filtered_out(lab, filters.filter_labs):
-            return True
-        if is_filtered_out(architecture, filters.filterArchitecture):
-            return True
-        if is_filtered_out(hardware_id, filters.filterHardware):
-            return True
+        field_checks = [
+            (compiler, filters.filterCompiler),
+            (config, filters.filterConfigs),
+            (lab, filters.filter_labs),
+            (architecture, filters.filterArchitecture),
+            (hardware_id, filters.filterHardware),
+        ]
+        if any(is_filtered_out(val, filt) for val, filt in field_checks):
+            return True

gustavobtflores · 2026-04-09T14:58:27Z

+            if is_build:
+                self.increment_build(
+                    builds_summary=builds_summary,
+                    status_count=status_count,
+                    architecture=architecture,
+                    config=config,
+                    lab=lab,
+                    origin=origin,
+                    known_issues=len(known_issues) - 1,
+                    compiler=compiler,
+                )
+
+            elif is_boot:
+                self.increment_test(
+                    tests_summary=boots_summary,
+                    status_count=status_count,
+                    config=config,
+                    lab=lab,
+                    origin=origin,
+                    known_issues=len(known_issues) - 1,
+                    architecture=architecture,
+                    compiler=compiler,
+                    platform=platform,
+                )
+
+            elif is_test:
+                self.increment_test(
+                    tests_summary=tests_summary,
+                    status_count=status_count,
+                    config=config,
+                    lab=lab,
+                    origin=origin,
+                    known_issues=len(known_issues) - 1,
+                    architecture=architecture,
+                    compiler=compiler,
+                    platform=platform,
+                )


I think we could use get_filter_type here too to avoid multiple branches, not sure though, because of the params differences

we could go for something like:

    summary_type = self.get_summary_type(
        is_build=is_build, is_boot=is_boot, is_test=is_test
    )
    increment_strategy = {
        'builds': partial(increment_build, self, builds_summary=builds_summary),
        'boots': partial(increment_test, self, tests_summary=boots_summary),
        'tests': partial(increment_test, self, tests_summary=tests_summary),
    }
    increment_strategy[summary_type](
        status_count=status_count,
        architecture=architecture,
        config=config,
        lab=lab,
        origin=origin,
        known_issues=len(known_issues) - 1,
        compiler=compiler,
        platform=platform,
    )

But I am afraid it is a bit over-engineered for 3 conditionals.

What do you think of it?

MarceloRobert

From testing, seems like the hardware compatible filter is always returning empty, and the tree filter is not working

dede999

Review Summary

The performance strategy is solid — moving aggregation into SQL (GROUP BY + UNION ALL) to reduce rows returned to Python is the right approach. A few items to address before merge, the most critical being the SQL injection in duration clauses.

dede999 · 2026-04-13T19:34:53Z

+    # builds
+    duration_min, duration_max = builds_duration
+    if duration_min:
+        clause += f"AND builds.duration >= {duration_min}\n"


🔴 bad — SQL injection via f-string interpolation.

duration_min and duration_max are interpolated directly into SQL via f-strings here and on the lines below (_get_boot_test_duration_clause has the same pattern). The rest of the query correctly uses %s placeholders — these should too.

    def _get_build_duration_clause(builds_duration, params: list) -> str:
        clause = ""
        duration_min, duration_max = builds_duration
        if duration_min:
            clause += "AND builds.duration >= %s\n"
            params.append(duration_min)
        if duration_max:
            clause += "AND builds.duration <= %s\n"
            params.append(duration_max)
        return clause

Same fix needed for _get_boot_test_duration_clause.

Good call, I will include named parametrization to avoid this
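
A sketch of what the named-parameter variant mentioned in the reply might look like, assuming the query is executed through a DB-API cursor that accepts a dict of parameters with `%(name)s` placeholders (the "pyformat" style). The function name and parameter keys are hypothetical. Unlike the snippet above, this version tests `is not None`, so a bound of 0 is still applied; that is a deliberate difference, not something taken from the PR.

```python
def build_duration_clause(duration_min, duration_max, params: dict) -> str:
    """Return the SQL fragment and record the values in `params`."""
    clause = ""
    if duration_min is not None:
        # The value never enters the SQL text; only the placeholder does.
        clause += "AND builds.duration >= %(build_duration_min)s\n"
        params["build_duration_min"] = duration_min
    if duration_max is not None:
        clause += "AND builds.duration <= %(build_duration_max)s\n"
        params["build_duration_max"] = duration_max
    return clause
```

The returned fragment would then be appended to the query and the dict passed as the second argument to `cursor.execute(sql, params)`, leaving quoting and escaping to the database driver.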

dede999 · 2026-04-13T19:34:54Z

 NULL_STRINGS = set(["null", UNKNOWN_STRING, "NULL"])


+def is_valid_status(status: str) -> bool:


🔴 bad — {*StatusChoices, "NULL"} reconstructs the set on every call, and if status is None, status.upper() will raise AttributeError.

Suggestion:

    VALID_STATUSES = {choice.value for choice in StatusChoices} | {"NULL"}

    def is_valid_status(status: str) -> bool:
        return status is not None and status.upper() in VALID_STATUSES

This shouldn't be a problem, as it is not in any hot path of the application, also the set is quite small.
The status is a str, not Optional[str], it should not be None.

dede999 · 2026-04-13T19:34:54Z

            "test.status": self._handle_test_status,
            "test.duration": self._handle_test_duration,
            "build.status": self._handle_build_status,
+            "duration": self._handle_build_duration,  # TODO: same as build.duration (should be standardized)


🔵 nit — If this "duration" alias is known tech debt, consider creating a ticket instead of a TODO — it may confuse future contributors about which filter key to use.

dede999 · 2026-04-13T19:34:54Z

+            status = instance["status"]
+            count = instance["count"]
+            incidents = instance["incidents_count"]
+            known_issues = set(parse_issue(issue) for issue in instance["known_issues"])


🟡 medium — When instance["known_issues"] contains None entries (from array_agg when there are no incidents), parse_issue(None) returns (UNCATEGORIZED_STRING, None). These phantom tuples are then checked against filters.filterIssues, which could cause false-positive filtering.

Suggestion:

known_issues = set(parse_issue(issue) for issue in instance["known_issues"] if issue is not None)
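
To make the effect of the suggested filter concrete, here is a small sketch. `parse_issue` below is a hypothetical stand-in: only its behavior for `None` (returning `(UNCATEGORIZED_STRING, None)`) comes from the comment above, while the "id,version" string format and the value of `UNCATEGORIZED_STRING` are assumptions made for illustration.

```python
UNCATEGORIZED_STRING = "uncategorized"  # hypothetical value


def parse_issue(issue):
    """Hypothetical stand-in: 'id,version' -> (id, version)."""
    if issue is None:
        # Behavior described in the review comment.
        return (UNCATEGORIZED_STRING, None)
    issue_id, _, version = issue.partition(",")
    return (issue_id, int(version) if version else None)


# An aggregated array may contain None entries next to real issues.
raw_issues = ["issue-1,2", None, "issue-1,2"]

with_phantom = {parse_issue(i) for i in raw_issues}
without_phantom = {parse_issue(i) for i in raw_issues if i is not None}
```

Other code quoted earlier in this review computes `len(known_issues) - 1`; whether that offset depends on the phantom entry is not stated in the thread, so it would need checking before applying the filter.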

dede999 · 2026-04-13T19:34:54Z

+                )
+
+        # ensure uniqueness on architecture and compilers (maybe we could change data structures???)
+        for summary in builds_summary.architectures.values():


🔵 nit — The loop variable summary shadows the method parameter summary: list[dict]. Consider renaming to arch_summary or similar to avoid confusion.

dede999 · 2026-04-13T19:34:54Z

+            MISS=self.MISS + other.MISS,
+            DONE=self.DONE + other.DONE,
+            NULL=self.NULL + other.NULL,
+            compilers=self.compilers,


🟡 medium — __add__ only keeps self.compilers, silently discarding other.compilers. __iadd__ also doesn't merge compilers. This might be intentional (compilers are merged separately in aggregate_summaries), but it's a silent data-loss trap if these operators are used elsewhere.

StatusCount does not have compilers. If we add an option to add two BuildArchitectures (which could happen at some point in the future), then we would need to deal with this.

I also don't know if BuildArchitectures should inherit from StatusCount; composition might make more sense in this case than inheritance.
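
A sketch of the composition idea raised here, under the assumption that compilers can be modeled as a set. Both classes are simplified, hypothetical stand-ins for the project's models (the real ones have more status fields and may store compilers differently).

```python
from dataclasses import dataclass, field


@dataclass
class StatusCount:
    """Hypothetical stand-in with only two status fields."""

    PASS: int = 0
    FAIL: int = 0

    def __add__(self, other: "StatusCount") -> "StatusCount":
        return StatusCount(self.PASS + other.PASS, self.FAIL + other.FAIL)


@dataclass
class BuildArchitectures:
    """Holds a StatusCount instead of inheriting from it."""

    status: StatusCount = field(default_factory=StatusCount)
    compilers: set = field(default_factory=set)

    def __add__(self, other: "BuildArchitectures") -> "BuildArchitectures":
        # Merging both operands' compilers avoids silently dropping one side.
        return BuildArchitectures(
            self.status + other.status, self.compilers | other.compilers
        )
```

With composition, adding a plain `StatusCount` and adding two `BuildArchitectures` are separate operations, so the question of what happens to `compilers` has to be answered explicitly in each.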

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.

MarceloRobert

Looks like the behavior is the same as before now, all good. The opened comments still make sense, but we can work on them later; maybe make a TODO for them or open an issue if it's major. Either way, I'll approve; it's working well

gustavobtflores

Looks good, just some minor comments. I would leave some TODOs for refactoring filter_instance and aggregate_summaries in hardwareDetails.py to reduce repetition and branching, not a blocker though

gustavobtflores · 2026-04-16T13:29:35Z

+    def select_commits_hashes(
+        self,
+        tree_heads: list[(str, str)],
+        selected_commits: Optional[dict[str, str]] = None,
+    ):
+        selected_commit_hashes = []
+        if selected_commits:
+            for idx, head in tree_heads:
+                if idx in self.selected_commits:
+                    selected_commit = self.selected_commits.get(idx, "head")
+                    selected_commit_hashes.append(
+                        head if selected_commit == "head" else selected_commit
+                    )
+        else:
+            selected_commit_hashes = [head for (_, head) in tree_heads]
+        return selected_commit_hashes


We are passing selected_commits as a param and then using self.selected_commits inside the function, I'm not sure if we need the param. The only function call that uses selected_commits also passes self.selected_commits
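
One way to resolve the mismatch described here is to use the parameter consistently and never reach for `self.selected_commits` inside the function. The sketch below does that as a free function; it is not the code that was merged, and the name `select_commit_hashes` and the sample tree indexes are hypothetical.

```python
from typing import Optional


def select_commit_hashes(
    tree_heads: list[tuple[str, str]],
    selected_commits: Optional[dict[str, str]] = None,
) -> list[str]:
    """Pick one commit hash per selected tree; default to every tree head."""
    if not selected_commits:
        return [head for _, head in tree_heads]

    hashes = []
    for idx, head in tree_heads:
        if idx in selected_commits:
            chosen = selected_commits[idx]
            # The literal "head" means "use the tree's head commit".
            hashes.append(head if chosen == "head" else chosen)
    return hashes
```

The alternative mentioned in the comment, dropping the parameter and reading `self.selected_commits` directly, would work equally well; the point is to use only one of the two sources.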

gustavobtflores · 2026-04-16T13:31:13Z

-    def post(self, request, hardware_id) -> Response:
+    def select_commits_hashes(
+        self,
+        tree_heads: list[(str, str)],


nit:

Suggested change

-        tree_heads: list[(str, str)],
+        tree_heads: list[tuple[str, str]],

* Bring data grouped by the filter values, ensuring a mid point between bringing all the data to be aggregated and filtered on the app, and needing to query for every filter change.
* Build / Test duration is an exception, as they are continuous columns, and any change will trigger a new query.
* Include a validation for invalid status values.
* Small frontend bugfix on rerender loop when applying trees filter.

MarceloRobert assigned alanpeixinho Mar 31, 2026

MarceloRobert added the Backend and Queries labels Mar 31, 2026

MarceloRobert reviewed Mar 31, 2026

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 10 times, most recently from 9c304aa to 11b5c99 Compare April 2, 2026 18:52

alanpeixinho marked this pull request as ready for review April 2, 2026 18:53

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 2 times, most recently from 162a78e to 93f201f Compare April 6, 2026 17:18

MarceloRobert requested changes Apr 6, 2026

Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py

Comment thread backend/kernelCI_app/views/hardwareDetailsSummaryView.py Outdated

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 3 times, most recently from 0002e6c to a19b158 Compare April 7, 2026 20:26

alanpeixinho requested a review from MarceloRobert April 7, 2026 20:29

alanpeixinho commented Apr 7, 2026

MarceloRobert requested a review from Copilot April 8, 2026 14:11

Copilot started reviewing on behalf of MarceloRobert April 8, 2026 14:11

Copilot AI reviewed Apr 8, 2026

MarceloRobert reviewed Apr 8, 2026

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch from 8b78eaa to ea1c257 Compare April 8, 2026 20:37

MarceloRobert reviewed Apr 9, 2026

gustavobtflores reviewed Apr 9, 2026

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 2 times, most recently from 3a753af to 4270998 Compare April 9, 2026 21:58

alanpeixinho requested review from MarceloRobert and gustavobtflores April 9, 2026 22:00

MarceloRobert reviewed Apr 10, 2026

Comment thread backend/kernelCI_app/queries/hardware.py Outdated

MarceloRobert reviewed Apr 10, 2026

dede999 reviewed Apr 13, 2026

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 4 times, most recently from d0a33ef to 98ea304 Compare April 14, 2026 19:38

MarceloRobert requested a review from Copilot April 14, 2026 19:59

Copilot started reviewing on behalf of MarceloRobert April 14, 2026 20:00

Copilot AI reviewed Apr 14, 2026

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch 3 times, most recently from cb8b2d9 to 366299f Compare April 14, 2026 21:10

MarceloRobert approved these changes Apr 15, 2026

Comment thread dashboard/src/pages/hardwareDetails/HardwareDetails.tsx Outdated

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch from 366299f to da26100 Compare April 15, 2026 21:21

MarceloRobert approved these changes Apr 16, 2026

gustavobtflores approved these changes Apr 16, 2026

alanpeixinho force-pushed the fix/improve-hardware-details-summary-performance branch from da26100 to 5be6d73 Compare April 16, 2026 13:58

MarceloRobert added this pull request to the merge queue Apr 16, 2026

Merged via the queue into kernelci:main with commit 16fecbe Apr 16, 2026
7 checks passed

gustavobtflores mentioned this pull request Apr 20, 2026

Improve performance on the hardware summary endpoint #1821

Closed


