Import Component without a PURL (or type) from a CDX 1.4 SBOM #1249

Closed
mjherzog opened this issue May 31, 2024 · 6 comments

@mjherzog
Member

We have recently received some CDX 1.4 (JSON) SBOMs generated with Black Duck tools. There is a pattern of Components (CDX terminology) without PURLs that were processed as Error Messages from the SCIO load_sbom pipeline.

The pattern for the Error Messages is:

warning 	DiscoveredPackage 	
No values for the following required fields: type
name: FSharp.Core
version: Unknown
datafile_paths: []
declared_license_expression: mit
extracted_license_statement: MIT

There are several potential issues here:

  1. These are all cases where the SBOM Component has a name and version, but no PURL.
  2. type is a required field in CDX, so I expect that the warning comes from a CDX library. On the other hand, we want to be able to load the data into SCIO even if type is missing so that we can work with the data in SCIO or its XLSX output. In other words, we want more forgiving validation than CDX, especially since type is arguably somewhat arbitrary: in many SBOMs we see every Component with type=library.
  3. I am not sure what SCIO would do with this data if it contained a valid type because we do not seem to have enough data to construct a PURL and it is not a file (Resource). We need some way to record Components without a PURL in SCIO (as we already have in DejaCode) because there is no requirement for every Component to have a PURL.
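
To gauge how many Components follow this pattern before loading an SBOM, the entries without a PURL can be listed with a few lines of Python. This is only a minimal sketch, assuming a CycloneDX JSON document with a top-level `components` array; the helper name `components_without_purl` and the sample data are made up for illustration and are not part of SCIO.

```python
import json


def components_without_purl(components):
    """Return the CycloneDX components (including nested ones) that have no purl."""
    missing = []
    for component in components:
        if not component.get("purl"):
            missing.append(component)
        # CycloneDX allows a component to contain further components.
        missing.extend(components_without_purl(component.get("components", [])))
    return missing


# Hypothetical SBOM fragment for illustration, not taken from the real files.
sbom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "name": "FSharp.Core", "version": "Unknown"},
    {"type": "library", "name": "example-lib", "version": "1.0.0",
     "purl": "pkg:nuget/example-lib@1.0.0"}
  ]
}
""")

missing_names = [c["name"] for c in components_without_purl(sbom.get("components", []))]
print(missing_names)
```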

SCIO version is 34.4.0 on deja08.

mjherzog added the "medium priority" and "outputs" labels May 31, 2024
@tdruez
Member

tdruez commented May 31, 2024

In ScanCode.io the PURL is the identifier of Packages, making the type and name fields required by design to create a Package entry.

> On the other hand, we want to be able to load the data into SCIO even if type is missing so

I can see 2 approaches to this:

  • Set a generic/unknown type when no type value is provided so the Package can be created (assuming at least a name is available). This requires very small code modification.
  • Update the model and re-design the Package identifier. This will have a major impact on everything, the code, the pipelines, and the UI, as everything is based on PURL at the moment, and not having a PURL on all Packages will break most of the features related to Packages.
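
The first approach could look roughly like the following. This is a hedged sketch, not the actual ScanCode.io code: the function name `with_fallback_type`, the plain-dict shape of the package data, and the placeholder value are all assumptions for illustration.

```python
# Assumed placeholder value used when the SBOM provides no package type.
FALLBACK_TYPE = "unknown"


def with_fallback_type(package_data, fallback_type=FALLBACK_TYPE):
    """Return a copy of package_data that always carries a type.

    A name is still required, since type and name together form the
    minimal Package identifier.
    """
    if not package_data.get("name"):
        raise ValueError("A name is required to create a Package.")

    data = dict(package_data)  # Do not mutate the caller's dict.
    if not data.get("type"):
        data["type"] = fallback_type
    return data


print(with_fallback_type({"name": "FSharp.Core", "version": "Unknown"}))
```

An existing type value is left untouched, so only entries that would otherwise be rejected are affected.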

> as we already have in DejaCode

DejaCode also supports a filename+download_url combination as an alternative to providing a purl, but in the case of the SBOMs, it's unlikely those 2 fields are provided, as they are not even part of the CDX schema.

@pombredanne
Member

Some thoughts:

  • "Component" in CycloneDX has a ref which is a PURL in the general case or something entirely different
  • Therefore we need some way to either:
    • map these to a package and PURL in SCIO
    • OR use DejaCode "component" concept and model and bring it to SCIO

"Map these to a package and PURL in SCIO" is likely the better option and we could also promote this with the PURL spec so everyone can benefit.

  • Using generic would be a bit of an overload as the semantics of a "generic" PURL type are already defined @ PURL and they demand a download URL.
  • We could define a new type, like "unknown" or "custom"

@tdruez overall, I am seconding your approach to "Set a generic/unknown type when no type value is provided so the Package can be created (assuming at least a name is available). This requires very small code modification."

@mjherzog
Member Author

So can we use unknown/unknown for the type/namespace? Or do we just skip namespace, since we do not know whether namespace will have any meaning in this context? I think that we can start with just pkg:unknown/name@version

@tdruez
Member

tdruez commented May 31, 2024

We can skip namespace as it's not a required PURL field, only type and name are.
Ok, we can go ahead with pkg:unknown/name@version for unspecified type.
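
For illustration, the agreed identifier shape can be assembled with the standard library alone. This is a sketch under stated assumptions: `build_fallback_purl` is a hypothetical helper, not the code merged for this issue, and a real implementation would more likely rely on a PURL library such as packageurl-python.

```python
from urllib.parse import quote


def build_fallback_purl(name, version=None, purl_type="unknown"):
    """Build a PURL string with a placeholder type and no namespace."""
    if not name:
        raise ValueError("name is a required PURL field")

    # Percent-encode name and version so reserved characters stay unambiguous.
    purl = f"pkg:{purl_type}/{quote(name, safe='')}"
    if version:
        purl += f"@{quote(version, safe='')}"
    return purl


print(build_fallback_purl("FSharp.Core", "Unknown"))  # pkg:unknown/FSharp.Core@Unknown
```

The version segment is omitted when no version is available, since only type and name are required.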

tdruez added a commit that referenced this issue Jun 3, 2024
tdruez added a commit that referenced this issue Jun 3, 2024
@tdruez
Member

tdruez commented Jun 3, 2024

Merged in #1251

tdruez closed this as completed Jun 3, 2024
@mjherzog
Member Author

mjherzog commented Jun 3, 2024

Looks good from my testing
