- Replace epoch-based captureTime (int) with RFC 3339 strings (Optional[str]) across hardware, software, SBOM, and provenance models
- Update Software creation to use RFC 3339 timestamps instead of time.time()
- Remove unused `time` import
- Add utility (utc_now_rfc3339) to generate schema-compliant UTC timestamps

This aligns with CyTRICS v1.0.1 schema requirements:
- type: ["string", "null"]
- format: "date-time" (RFC 3339)

Ensures consistent, timezone-aware timestamps across all captureTime fields.
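For readers who want a concrete picture of the timestamp helper, here is a minimal sketch of what a function like `utc_now_rfc3339` could look like. Only the function name comes from the commit message; the body, the choice to drop microseconds, and the `Z` suffix are assumptions, not the code in this PR.

```python
from datetime import datetime, timezone


def utc_now_rfc3339() -> str:
    """Return the current UTC time as an RFC 3339 string (assumed implementation)."""
    # Drop microseconds for a compact value; RFC 3339 also permits fractional seconds.
    now = datetime.now(timezone.utc).replace(microsecond=0)
    # isoformat() renders UTC as "+00:00"; "Z" is the equivalent RFC 3339 designator.
    return now.isoformat().replace("+00:00", "Z")


if __name__ == "__main__":
    print(utc_now_rfc3339())
```

Because the value is built from a timezone-aware `datetime`, the result always carries explicit UTC information, which is the property the schema's `date-time` format is meant to guarantee.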
for more information, see https://pre-commit.ci
- Add tests/schema/test_cytrics_schema.py to validate captureTime against CyTRICS schema (docs/cytrics_schema/schema.json)
- Verify RFC 3339 date-time compliance for generated timestamps
- Ensure hardware and software captureTime accept string/null as defined
- Ensure file captureTime accepts string and rejects null
- Add negative tests for invalid formats (missing timezone, epoch integers)
- Include JSON Schema FormatChecker to enforce date-time validation

These tests prevent regression to epoch-based timestamps and ensure alignment with CyTRICS v1.0.1 captureTime requirements.
Use correct relative import (`..utils`) to resolve ModuleNotFoundError
🧪 SBOM Results (16/16)
- Add validate_capture_time utility enforcing RFC 3339 with required timezone using regex + datetime parsing
- Support nullable behavior and consistent error handling across models
- Add unit tests for capture_time validation (valid cases, null, and failures)
- Remove schema tests that incorrectly assumed jsonschema enforces timezone
- Keep schema tests focused on actual schema validation behavior

This separates schema validation from stricter application-level validation and ensures captureTime values are consistently timezone-aware.
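The "regex + datetime parsing" approach described above can be sketched roughly as follows. The function name is taken from the commit message, but the signature, the `nullable` parameter, the regex, and the error messages are assumptions for illustration. This sketch also accepts only uppercase `T` and `Z`, which is stricter than RFC 3339 itself.

```python
import re
from datetime import datetime
from typing import Optional

# Shape check: date, "T", time, optional fraction, then a mandatory "Z" or numeric offset.
_RFC3339_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?"
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)",
    re.ASCII,
)


def validate_capture_time(value: Optional[str], nullable: bool = True) -> Optional[str]:
    """Return the value unchanged if acceptable, otherwise raise ValueError (assumed API)."""
    if value is None:
        if nullable:
            return None
        raise ValueError("captureTime must not be null")
    # Non-strings (for example legacy epoch integers) and strings without a
    # timezone fail here.
    if not isinstance(value, str) or not _RFC3339_RE.fullmatch(value):
        raise ValueError(f"captureTime must be an RFC 3339 date-time with timezone: {value!r}")
    try:
        # The regex only checks the shape; strptime rejects impossible dates
        # such as month 13 or February 30.
        datetime.strptime(value[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError as err:
        raise ValueError(f"captureTime is not a valid calendar date-time: {value!r}") from err
    return value
```

Splitting the check this way keeps the timezone requirement explicit in the pattern, while the calendar check is delegated to the standard library.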
- add jsonschema to test dependency groups and optional test extras
- update pytest fixtures to use `name=` to avoid pylint redefined-outer-name warnings
- ensure schema validation tests run correctly in CI
- validate captureTime in dataclass __post_init__ (File, Hardware, Software, provenance)
- add validation at SBOM.create_software boundary
- enforce validation on captureTime updates in Software

Aligns runtime behavior with CyTRICS schema requirements.
…dling
- validate captureTime for software entries and nested components in CLI add path
- fix installPath update logic to handle None containerPath/installPath safely
- remove deprecated system-related CLI options and merge logic
- drop create_system_object and captureStart/captureEnd handling
- standardize graph node type comparison ("path" vs "Path")
Aligns CLI and merge behavior with CyTRICS v1.0.1 schema and improves input validation robustness.
…idation tests
- replace legacy epoch-based captureTime values with RFC 3339 strings in test data
- update CLI tests to use schema-compliant captureTime values
- add tests for nested SoftwareComponent captureTime validation
- add negative test for invalid component captureTime
- ensure test fixtures align with CyTRICS v1.0.1 date-time requirements

Fixes test failures caused by stricter captureTime validation and completes schema migration.
- remove legacy system config helper from merge tests
- update merge test calls to use the new three-argument signature
- delete tests for deprecated add_system and system_uuid behavior
- add coverage to assert merged output does not emit systems

Aligns merge tests with the CyTRICS v1.0.1 merge behavior.
…chema alignment
- remove System model and all system-related logic from SBOM
- remove provenance models and references across hardware, software, and observation
- delete analysisData model and related handling
- simplify SBOM merge and graph logic to software-only relationships
- update SBOM deserialization to drop legacy systems/analysisData handling
- clean up exports in sbomtypes __init__

Aligns SBOM data model with CyTRICS v1.0.1 schema by removing deprecated system and provenance constructs and simplifying merge behavior.
…engthen validation
- Add new schema-aligned models: Author, CommentEntry, NameEntry, Tool
- Refactor Software, Hardware, File, and Relationship to match CyTRICS v1.0.1 structure
- Replace primitive fields with structured types (e.g., name, comments, metadata)
- Add enum support for relationshipAssertion
- Introduce notHashable and enforce hash presence rules
- Implement comprehensive runtime validation across all models (types, lists, UUIDs, RFC 3339 timestamps)
- Refactor Software.merge() to:
  - Treat scalar vs array fields consistently
  - Route all updates through _update_field() with full-object revalidation
  - Prevent invalid state via rollback on validation failure
- Remove deprecated/unsupported schema elements: Observation, StarRelationship, SoftwareComponent
- Update SBOM:
  - Remove legacy observation/star relationship handling
  - Normalize relationship handling through graph as single source of truth
- Ensure list fields use explicit typing (List[...]) and validate item types
- Standardize metadata handling to List[Dict[str, Any]]
- Update module exports (__init__.py) to reflect new schema types

BREAKING CHANGE:
- Removes legacy SBOM fields (observations, starRelationships, SoftwareComponent)
- Changes structure of Software, Hardware, and related types to match schema
- Updates merge() behavior to enforce strict validation
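The "full-object revalidation with rollback" idea from this commit can be illustrated with a small sketch. The `SoftwareSketch` class, its two fields, and the method bodies below are hypothetical and heavily reduced. They are not the PR's `Software` model, only an example of the pattern under those assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SoftwareSketch:
    """Hypothetical, heavily reduced stand-in for a validated model class."""

    fileName: List[str] = field(default_factory=list)
    description: Optional[str] = None

    def __post_init__(self) -> None:
        # Validate once at construction time.
        self._validate()

    def _validate(self) -> None:
        # List fields must really be lists, and every item must have the right type.
        if not isinstance(self.fileName, list) or not all(
            isinstance(item, str) for item in self.fileName
        ):
            raise ValueError("fileName must be a list of strings")
        if self.description is not None and not isinstance(self.description, str):
            raise ValueError("description must be a string or None")

    def update_field(self, name: str, value: Any) -> None:
        """Set one field, revalidate the whole object, and roll back on failure."""
        previous = getattr(self, name)
        setattr(self, name, value)
        try:
            self._validate()
        except ValueError:
            # Restore the old value so the object never stays in an invalid state.
            setattr(self, name, previous)
            raise
```

Revalidating the whole object rather than just the changed field is slower, but it also catches rules that span several fields.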
Use explicit exception chaining in hardware, relationship, and software UUID validation paths to satisfy Ruff B904 and preserve the original exception context during error handling.
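As a small illustration of the chaining pattern that Ruff B904 asks for, the sketch below validates a UUID string and re-raises with `from err`. The `validate_uuid4` helper and its messages are hypothetical and are not taken from the PR. Note that `uuid.UUID` also accepts some non-canonical spellings (for example without hyphens), so a stricter check may be needed in practice.

```python
import uuid


def validate_uuid4(value: object) -> str:
    """Return the value if it parses as a version-4 UUID, else raise ValueError."""
    if not isinstance(value, str):
        raise ValueError(f"UUID must be a string, got {type(value).__name__}")
    try:
        parsed = uuid.UUID(value)
    except ValueError as err:
        # "from err" stores the parsing error as __cause__, so the traceback
        # shows both the original failure and this higher-level message.
        raise ValueError(f"Invalid UUID: {value!r}") from err
    if parsed.version != 4:
        raise ValueError(f"UUID is not version 4: {value!r}")
    return value
```

Without `from err`, Python would still attach the original exception as implicit context, but the explicit form states that the new error is a direct consequence of the first one.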
…ests Update CycloneDX and SPDX writers to stop importing and emitting the removed System model, and adjust ELF relationship tests to construct SBOMs without the obsolete systems field as part of the CyTRICS v1.0.1 schema migration.
Update the Syft plugin to use RFC 3339 capture times and structured NameEntry and CommentEntry values, and remove obsolete Software fields dropped by the schema migration. Add TODOs to revisit nameType mapping and migrate supplementary file paths to sbomtypes.File objects.
Remove CLI handling and tests for obsolete software components and systems, update malformed test fixtures to use structured name/comment fields, and migrate dotnet relationship tests to valid UUIDs with notHashable software entries under the v1.0.1 schema.
Update ELF and PE relationship tests to use valid UUID strings and mark synthetic software fixtures as notHashable under the CyTRICS v1.0.1 schema. Also refresh expected relationship assertions and clean up stale test docstrings that referenced old placeholder IDs.
- add top-level BOM fields and schema-aware serialization to SBOM
- preserve relationship comments through load, merge, and JSON output
- validate relationship data before mutating the graph
- make File and NameEntry optional-field handling schema compliant
- enforce UUIDv4 validation in Software
- update create_software to match the migrated model and sync graph/fs_tree state
- migrate fs_tree tests to valid UUID fixtures and add regression coverage
- rename hookspec.py reference to hookspecs.py
- update identify_file_type docs to reflect string or list[str] returns
- update extract_file_info docs to note metadata-object return
- fix hookspec line anchors for plugin hook links
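Since `identify_file_type` may return either a string or a `list[str]`, callers need to normalize the result before using it. The helper below is a hypothetical sketch of such a normalization step. The function name, the handling of `None`, and the de-duplication are assumptions rather than the generator's actual code.

```python
from typing import List, Union


def normalize_file_types(result: Union[str, List[str], None]) -> List[str]:
    """Turn a hook result into a de-duplicated list of file-type strings."""
    if result is None:
        # Assumption: a hook that cannot identify the file returns None.
        return []
    candidates = [result] if isinstance(result, str) else list(result)
    seen = set()
    normalized: List[str] = []
    for item in candidates:
        if not isinstance(item, str) or not item:
            raise TypeError(f"file type must be a non-empty string, got {item!r}")
        if item not in seen:
            # Keep the first occurrence so the hook's ordering is preserved.
            seen.add(item)
            normalized.append(item)
    return normalized
```

Downstream code can then always iterate over a list, regardless of which form a plugin returned.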
Add Software.update_field() as a public wrapper around validated field updates, and switch generate.py and _sbom.py to use it instead of accessing _update_field() directly. This preserves the existing validation behavior while resolving pylint protected-access warnings.
Add CyTRICS schema validation to SBOM.from_dict/from_json and SBOM.to_dict/to_json using the repository schema at docs/cytrics_schema/schema.json. Update merge fixtures to be valid CyTRICS 1.0.1 documents by adding required bomFormat/specVersion fields and software hashes. Adjust file-type tests to construct schema-valid Software entries via Software.create_software_from_file() so strict Software validation remains intact.
Replace the implicit raw-dict loading path in SBOM.__post_init__() with an explicit SBOM._from_raw_dict() constructor and route from_dict()/from_json() through it. Update the HELICS sample SBOM to match the CyTRICS 1.0.1 schema by adding bomFormat/specVersion, removing legacy unsupported fields, and normalizing software comments to null.
Make relationship test SBOM fixtures explicitly declare bomFormat="cytrics" and specVersion="1.0.1" so the tests reflect the required CyTRICS document root fields instead of relying on SBOM defaults. Also update Java relationship fixtures to use valid UUIDs and schema-valid Software entries with hashes, preserving strict Software validation while keeping the tests green.
Update helics_binaries_sbom.json and helics_libs_sbom.json to match the CyTRICS 1.0.1 schema by adding bomFormat/specVersion, normalizing comments to null, removing legacy unsupported fields, and dropping obsolete top-level sections.
Add jsonschema to the main project dependencies since schema validation is now used outside the test-only path. Remove jsonschema from the test dependency group and test optional dependencies to avoid duplicating the requirement.
Refactor Software.__post_init__ into focused private validation helpers without changing validation behavior. Also omit notHashable from serialized software entries when it is unset, so hashed artifacts do not emit notHashable: null.
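The hash-presence rule and the "omit `notHashable` when unset" serialization behavior can be sketched together on plain dictionaries. The helper names and the particular hash field names (`sha1`, `sha256`, `md5`) are assumptions for illustration. The PR implements this on the typed `Software` model rather than on dicts.

```python
from typing import Any, Dict

# Assumed hash field names; the real model may define a different set.
HASH_FIELDS = ("sha1", "sha256", "md5")


def check_hash_rule(entry: Dict[str, Any]) -> None:
    """Require at least one hash unless notHashable is explicitly true."""
    has_hash = any(entry.get(name) for name in HASH_FIELDS)
    if not has_hash and entry.get("notHashable") is not True:
        raise ValueError("software needs at least one hash unless notHashable is true")


def serialize_software(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy suitable for output, without a null notHashable key."""
    check_hash_rule(entry)
    out = dict(entry)
    if out.get("notHashable") is None:
        # Hashed artifacts should not emit "notHashable": null.
        out.pop("notHashable", None)
    return out
```

For example, an entry with a `sha256` value and `notHashable` set to `None` serializes without the `notHashable` key, while an entry with no hashes is accepted only when `notHashable` is `True`.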
Rewrite merge_sbom.py to use typed SBOM load/merge/write logic, log roots and cycles, and remove the legacy raw-dict/system-entry merge path. Update README-merge_sbom.md to document the supported merge workflow and remove the obsolete --config_file/system-entry behavior. Harden merge_additional_metadata.py by routing through SBOM validation/serialization, matching strict sidecar filenames, handling duplicate sha256 matches, deduping repeated metadata, deep-copying appended metadata, and erroring on malformed or unmatched sidecar sha256hash values. Remove the obsolete merge_config.json file.
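The stricter sidecar handling described for merge_additional_metadata.py could look roughly like the sketch below. Everything here is an assumption made for illustration: the `merge_sidecar` name, the dict-based data layout, storing the whole sidecar object as one metadata item, and applying a sidecar to every software entry that shares the digest. The commit message only says duplicate matches are handled, not how.

```python
import copy
import re
from typing import Any, Dict, List

_SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def merge_sidecar(software: List[Dict[str, Any]], sidecar: Dict[str, Any]) -> int:
    """Attach sidecar metadata to matching entries and return the match count."""
    digest = sidecar.get("sha256hash")
    if not isinstance(digest, str) or not _SHA256_RE.fullmatch(digest):
        raise ValueError(f"malformed sha256hash in sidecar: {digest!r}")
    matches = [
        entry for entry in software
        if (entry.get("sha256") or "").lower() == digest.lower()
    ]
    if not matches:
        # Fail fast instead of silently dropping the sidecar.
        raise ValueError(f"no software entry matches sha256hash {digest}")
    for entry in matches:
        if entry.get("metadata") is None:
            entry["metadata"] = []
        if sidecar not in entry["metadata"]:
            # Deep copy so later edits to the sidecar do not leak into the SBOM.
            entry["metadata"].append(copy.deepcopy(sidecar))
    return len(matches)
```

The equality check before appending is what makes repeated runs idempotent under these assumptions: merging the same sidecar twice leaves a single metadata item per entry.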
Summary
This PR updates Surfactant’s CyTRICS implementation to align with schema v1.0.1. It introduces schema-validated read/write paths, adds the v1.0.1 BOM root fields and structured types expected by the schema (`bomUUID`, `bomFormat`, `bomDescription`, `specVersion`, optional `authors`, and optional `tools`), adds support for structured `name`/`comments`, `softwareType`, and relationship comments, switches `captureTime` handling to RFC 3339 strings, and enforces the requirement that software has at least one hash unless `notHashable` is true. It also removes legacy CyTRICS structures and command flows that are not part of the v1.0.1 document shape.
What changed
- Added schema-backed CyTRICS validation and normalization in `sbomtypes`. `SBOM.from_dict(...)`/`SBOM.from_json(...)` now validate input against the checked-in CyTRICS 1.0.1 JSON schema, and `SBOM.to_dict(...)`/`SBOM.to_json(...)` validate serialized output before writing. This work also adds `Author`, `Tool`, `NameEntry`, and `CommentEntry` types plus RFC 3339 capture-time helpers, and updates `SBOM`, `Software`, `Hardware`, `File`, and `Relationship` to the 1.0.1 shape, including BOM root fields, `softwareType`, and relationship comments.
- Removed legacy CyTRICS constructs from the typed model and merge/generate flow, including `systems`, `analysisData`, `observations`, `starRelationships`, provenance types, `recordedInstitution`, and legacy software component wrappers. The `merge` command and helper scripts no longer support config/system-wrapping behavior, the generate path drops `--recorded_institution`, and the CLI/script paths no longer create or wire in a synthetic top-level system object. The CycloneDX and SPDX writers were updated accordingly so they no longer emit system-derived components/packages from the old model.
- Updated generation and plugin contracts to match the new schema-valid data flow. `identify_file_type` can now return either a string or `list[str]`, generator code normalizes file-type results before extraction, extractor hooks are documented and enforced to return metadata JSON objects, and field hints are normalized into schema-compatible `name`, `comments`, `vendor`, and string fields before being applied. The Syft plugin was updated to emit `NameEntry`, `CommentEntry`, and RFC 3339 timestamps, and the plugin documentation was refreshed to describe the new hook behavior.
- Hardened merge, serialization, and CLI behavior around the new model. Merge now rejects colliding hash data, preserves and merges relationship comments, rewrites and de-duplicates merged `containerPath` values, routes newly added software through the normal index/FS-tree update path, and rebuilds symlink edges from merged metadata after in-place software merges. `cli add` now validates `captureTime` before deserialization, install-path augmentation de-duplicates derived paths, `cli_base` no longer relies on the old dataclass-field pickle workaround, and `stat` now reads through the configured input plugin rather than assuming raw CyTRICS JSON.
- Updated supporting scripts, docs, and dependencies to match the new workflow. `merge_sbom.py` is now a direct typed CyTRICS merge helper with stdin/stdout support plus root/cycle logging, `merge_additional_metadata.py` now validates sidecar metadata objects and fails on malformed or unmatched `sha256hash` values, `merge_config.json` was removed, and `pyproject.toml` adds the `jsonschema` runtime dependency plus `pytest-asyncio` for tests.
- Refreshed fixtures and tests for 1.0.1 compliance. Sample SBOMs now include the required root fields, remove legacy root and per-software fields, convert `captureTime` values from epoch integers to RFC 3339 strings, and update relationship/unit tests to use valid UUIDs plus hashes or `notHashable` where required. New schema-focused and capture-time-focused tests were added, along with updated CLI, merge, file-type, FS-tree, serialization, and relationship coverage.
Why
The old CyTRICS path still allowed legacy document shapes, loosely typed metadata, legacy CLI/config flows, and timestamp/hash handling that do not match the v1.0.1 schema. This branch moves compliance checks into model construction, serialization, plugin ingestion, CLI entry points, and merge utilities so invalid data fails early instead of being silently carried forward or emitted.
Reviewer notes
- This is intentionally a breaking cleanup for the typed CyTRICS model: top-level `systems` handling, related merge/config flows, provenance classes, `recordedInstitution`, and legacy software component wrappers are removed because they are not part of the v1.0.1 schema shape Surfactant now validates against.
- User-visible command behavior changed with that cleanup. `generate` no longer exposes `--recorded_institution`. `merge_sbom.py` no longer supports `--config_file`, and `merge` no longer supports top-level system creation/wrapping options such as `--system_uuid`, `--system_relationship`, and `--add_system`.
- Plugin authors should review the hook contract changes. `identify_file_type` may now return multiple file-type matches, and `extract_file_info` is expected to return a metadata object or `None`; plugin docs and built-in file-type ID plugins were updated to reflect that contract.
- `captureTime` is now schema-driven everywhere and must be an RFC 3339 `date-time` string with timezone information. Legacy epoch-style timestamps are rejected by the new validation path and by the added tests.
- Sidecar metadata merging is stricter, and merge semantics are safer: malformed additional metadata now errors, unmatched `sha256hash` values now fail fast, software entries with colliding hash data are refused during merge, and relationship comment data is preserved when duplicate relationships are merged.
Testing
Updated CLI, merge, file-type, FS-tree, serialization, Java/.NET/ELF/PE relationship, fixture, schema, and capture-time tests to match the v1.0.1 model and serialization rules. The test dependency set was also expanded to include `pytest-asyncio`.