License database schema enhancement and query optimization#2287
License database schema enhancement and query optimization#2287mrrajan merged 10 commits intoguacsec:mainfrom
Conversation
Reviewer's GuideNormalize expanded licenses into new expanded_license and sbom_license_expanded tables, populate them at ingestion, and refactor license/SPDX/CycloneDX listing and filtering logic across services to query the new schema instead of runtime expansion functions/CTEs. Sequence diagram for ingestion-time expanded license populationsequenceDiagram
actor IngestionProcess
participant SbomContext
participant ComponentCreator
participant ExpandedLicenseCreator
participant DB
IngestionProcess->>SbomContext: ingest_spdx_or_cyclonedx(sbom)
SbomContext->>ComponentCreator: new(sbom_id)
SbomContext->>ComponentCreator: create_related_entities(db)
ComponentCreator->>DB: persist_packages_purls_cpes_licenses
DB-->>ComponentCreator: ok
note over ComponentCreator: After SBOM packages and licenses are stored
ComponentCreator->>ExpandedLicenseCreator: new(sbom_id)
ComponentCreator->>ExpandedLicenseCreator: create(db)
ExpandedLicenseCreator->>DB: INSERT expanded_license
DB-->>ExpandedLicenseCreator: upsert unique expanded_text rows
ExpandedLicenseCreator->>DB: INSERT sbom_license_expanded
DB-->>ExpandedLicenseCreator: upsert (sbom_id, license_id, expanded_license_id)
ExpandedLicenseCreator-->>ComponentCreator: ok
ComponentCreator-->>SbomContext: ok
SbomContext-->>IngestionProcess: ingestion complete with normalized licenses
ER diagram for expanded_license normalization schemaerDiagram
sbom {
uuid sbom_id PK
}
license {
uuid id PK
text text
}
sbom_package_license {
uuid sbom_id PK
uuid license_id PK
uuid node_id
text license_type
}
expanded_license {
int id PK
text expanded_text
}
sbom_license_expanded {
uuid sbom_id PK
uuid license_id PK
int expanded_license_id FK
}
licensing_infos {
uuid sbom_id FK
text license_id
text name
}
sbom ||--o{ sbom_package_license : contains
license ||--o{ sbom_package_license : used_in
sbom ||--o{ sbom_license_expanded : has
license ||--o{ sbom_license_expanded : expanded_for
expanded_license ||--o{ sbom_license_expanded : referenced_by
sbom ||--o{ licensing_infos : has
sbom_package_license ||--o{ sbom_license_expanded : maps_to
Class diagram for expanded_license entities and ingestion creatorclassDiagram
class expanded_license_Model {
+i32 id
+String expanded_text
}
class sbom_license_expanded_Model {
+Uuid sbom_id
+Uuid license_id
+i32 expanded_license_id
}
class ExpandedLicenseCreator {
-Uuid sbom_id
+new(sbom_id: Uuid) ExpandedLicenseCreator
+create(db: ConnectionTrait) Result
}
class sbom_package_license_Model {
+Uuid sbom_id
+Uuid license_id
+Uuid node_id
+String license_type
}
class licensing_infos_Model {
+Uuid sbom_id
+String license_id
+String name
}
expanded_license_Model <|-- sbom_license_expanded_Model : referenced_by
sbom_license_expanded_Model --> sbom_package_license_Model : keyed_by
sbom_license_expanded_Model --> licensing_infos_Model : uses_mappings
ExpandedLicenseCreator --> expanded_license_Model : populates
ExpandedLicenseCreator --> sbom_license_expanded_Model : populates
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
6051d81 to
be04c3c
Compare
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
m0002120_normalize_expanded_license/up.sqltheON CONFLICT (md5(expanded_text))clauses are not valid becausemd5(expanded_text)is an expression, not a column; consider either usingON CONFLICT ON CONSTRAINT idx_expanded_license_text_hashor adding a generated column for the hash and indexing that column instead. - The
sbom_license_expandedtable only enforces a foreign key onexpanded_license_id; if you want referential integrity tosbomandlicenseto match the SeaORM relations, add FKs for(sbom_id) -> sbom.sbom_idand(license_id) -> license.idin the migration.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `m0002120_normalize_expanded_license/up.sql` the `ON CONFLICT (md5(expanded_text))` clauses are not valid because `md5(expanded_text)` is an expression, not a column; consider either using `ON CONFLICT ON CONSTRAINT idx_expanded_license_text_hash` or adding a generated column for the hash and indexing that column instead.
- The `sbom_license_expanded` table only enforces a foreign key on `expanded_license_id`; if you want referential integrity to `sbom` and `license` to match the SeaORM relations, add FKs for `(sbom_id) -> sbom.sbom_id` and `(license_id) -> license.id` in the migration.
## Individual Comments
### Comment 1
<location path="modules/fundamental/src/sbom/service/test.rs" line_range="782-783" />
<code_context>
+ )
+ .await?;
+
+ // REFACTOR: Verify filtering works on COALESCE result
+ assert!(apache_results.total > 0, "Should find Apache licenses");
+
</code_context>
<issue_to_address>
**issue (testing):** This test passes even if filtering returns zero results, which can hide regressions
In `test_sbom_package_license_filtering_with_coalesce`, the assertion on Apache licenses is guarded by `if apache_packages.total > 0`, so a regression that returns zero results won’t fail the test. Instead, assert that `apache_packages.total > 0` and then verify that at least one returned package has an Apache license, e.g.:
```rust
assert!(apache_packages.total > 0, "Expected at least one package to match Apache filter");
let has_apache = apache_packages.items.iter().any(|p| {
p.licenses
.iter()
.any(|l| l.license_name.to_lowercase().contains("apache"))
});
assert!(has_apache, "Filtered packages should contain Apache licenses");
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2287 +/- ##
==========================================
- Coverage 68.10% 67.98% -0.12%
==========================================
Files 425 430 +5
Lines 24886 24731 -155
Branches 24886 24731 -155
==========================================
- Hits 16948 16814 -134
+ Misses 7019 6989 -30
- Partials 919 928 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
1041f18 to
7eb7bd4
Compare
|
@mrrajan This is great work. Had a few comments, but it's amazing to see the work you did here! 🥳 |
|
Thanks for the review @ctron and @jcrossley3. All credit goes to @mrizzi, he developed a comprehensive and detailed plan which helped to implement this enhancement with Claude's assistance :). |
32380ea to
b432ce6
Compare
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Assisted-by: Claude Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Assisted-by: Claude Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Assisted-by: Claude Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: mrrajan <86094767+mrrajan@users.noreply.github.com.> Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
I think we'd like to keep the Query processing logic consistent, i.e. DRY. So we can take a mutable reference to the Select struct and create the union before applying the sort filtering. And we use an expression for the license alias that effectively does the field name translation for us. I removed the test for an invalid sort direction as that behavior is inconsistent with every other TPA endpoint accepting a 'q' parameter. If we think we should accept invalid sort directions, we should make that change in the query module, but since the API is largely used internally, I think it's fine to expect a valid direction from callers.
…ced ExpandedLicenseCreator with a function Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
b432ce6 to
03fbc7b
Compare
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
03fbc7b to
0c5a54b
Compare
|
Thanks @mrizzi :) |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin release/0.4.z
git worktree add -d .worktree/backport-2287-to-release/0.4.z origin/release/0.4.z
cd .worktree/backport-2287-to-release/0.4.z
git switch --create backport-2287-to-release/0.4.z
git cherry-pick -x 49185eb8b05b06fc2baec157f14ce7e4f706a95d 3fa00869c5294b3cd81cbbb2550c1d09ae0ed6c0 e24f31863b4da0d0bfa4ea99a8207ecc9b90e236 bc100e47c7dbf8608e37269ef3b3f47059212a18 23c8277d6196f85da8ae1df4a4b846353049a450 a1cf05386d6d2394f71f4a0be589016f6cf0cb84 42752adfaf9ba89af15eb8da23ad9bbac8588c0e 2d650e8a34ddf8f6c80c8804a9052fa5682aac21 8d4632b56d416850e5189336c8cc0c9b92e3e487 43685c7578e2f053797cbc6ea96e9e3875dd69d4 |
JIRA - TC-3710 and TC-3747
Key Changes:
Summary by Sourcery
Normalize expanded license expressions into dedicated tables and update services to use the new schema for querying and filtering licenses across SBOMs and PURLs.
New Features:
Bug Fixes:
Enhancements:
Documentation:
Tests:
Chores: