Group non-package files together for analysis and reporting purposes #914

Closed
DennisClark opened this issue Sep 7, 2023 · 10 comments
Labels: enhancement, high priority, outputs, reporting

@DennisClark
Member

Scan results are currently focused on packages, but in a typical product codebase there are lots of files that are not (or no longer) associated with packages, and these files may be very important from a license compliance perspective. Some cases include:

  • first-party files that could be reported together as belonging to the product copyright holder under the product license expression
  • files that have been extracted from third-party archives and used individually, often containing copyright or license statements or both

These miscellaneous files are currently difficult to analyze in the product codebase scan results, especially when there are lots of them, and they are not always being included in attribution and SBOM outputs.

Consider grouping these miscellaneous files into "local" packages (or "custom", "virtual", "logical", "file_set" packages), perhaps combining them by license_expression. It may also make sense to define these using the PURL convention; consider:

Type: "local" (or some other descriptive label that distinguishes them from standard packages)
Namespace: the name of the product codebase
Name: the common license_expression for the group of files
Version: the version of the product codebase
Qualifiers: optional, most likely null
Subpath: optional, most likely null
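
As a rough illustration of the fields proposed above, the sketch below assembles a PURL string by hand using only the Python standard library. The helper name `build_local_purl` and the sample product name, license expression, and version are hypothetical. This is not ScanCode.io code, and a real implementation would more likely rely on an existing PURL library.

```python
from urllib.parse import quote


def build_local_purl(namespace, name, version=None, purl_type="local"):
    """Assemble a PURL string of the form pkg:type/namespace/name@version.

    Hypothetical helper for illustration only. Qualifiers and subpath are
    omitted because the proposal expects them to be null.
    """
    # Percent-encode each segment so that spaces and other reserved characters
    # in a license expression (e.g. "apache-2.0 AND mit") do not break the PURL.
    purl = f"pkg:{purl_type}/{quote(namespace, safe='')}/{quote(name, safe='')}"
    if version:
        purl += f"@{quote(version, safe='')}"
    return purl


# Sample values are made up for the example.
example = build_local_purl("my-product", "apache-2.0 AND mit", "1.0")
print(example)
```

Note that using a license expression as the Name means the spaces must be percent-encoded, which makes the resulting PURL harder to read. That may be one reason to consider an alternative value for the Name.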

The primary advantages of creating "local" packages in the scan results are:

  • organize the data for review and analysis, where the assigned PURL makes it clear that these "packages" are a special case
  • take advantage of the existing ScanCode.io UI
  • take advantage of existing code that generates attribution and SBOMs

I prefer the term "local" for the PURL Type value, but this is of course open to discussion.

DennisClark added the enhancement, high priority, reporting, and outputs labels Sep 7, 2023
@tdruez
Member

tdruez commented Sep 11, 2023

@DennisClark Could you clarify the following:

  • "the name of the product codebase"
  • "the version of the product codebase"

There's no concept of Product in the ScanCode.io context; the Pipelines/Scans are grouped by "Analysis projects".

@DennisClark
Member Author

@tdruez I guess that we would have to use the ScanCode.io project name for the Name, and we don't have any value for the Version field.

@DennisClark
Member Author

DennisClark commented Sep 11, 2023

but the best solution would be to add the concept of Product (product name + product version) to SCIO to improve integration, ultimately, with other applications.

@mjherzog
Member

In the big picture we need to be sure that we fully support reporting of first-party code, especially first-party code that has a copyright or license notice.

@DennisClark
Member Author

let's go with this:
Type: "local" (or some other descriptive label that distinguishes them from standard packages)
Namespace: the name of the SCIO Project
Name: the common license_expression for the group of files

@DennisClark
Member Author

alternative value for "Name": a UUID

@DennisClark
Member Author

Let's go with this:
Type: "local-files"
Namespace: a generated UUID
Name: the common license_expression for the group of files

the rest of the PURL is probably null

@DennisClark
Member Author

Let's go with this:
Type: "local-files"
Namespace: the SCIO project as a slug
Name: generated UUID

the rest of the PURL is probably null
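
A minimal sketch of how this final scheme could be applied, assuming the files are still grouped by license_expression as suggested at the start of this thread. The function `group_local_files`, the dictionary keys, and the sample data are all hypothetical and do not reflect the actual ScanCode.io data model or the code merged for this issue.

```python
import uuid
from collections import defaultdict


def group_local_files(project_slug, files):
    """Group files that have no package by license expression.

    `files` is a list of dicts with the hypothetical keys "path", "package",
    and "license_expression". One "local-files" package is returned per group.
    """
    groups = defaultdict(list)
    for entry in files:
        # Only files not already assigned to a package are candidates.
        if entry.get("package") is None:
            groups[entry.get("license_expression") or ""].append(entry["path"])

    packages = []
    for license_expression, paths in sorted(groups.items()):
        packages.append(
            {
                # Type "local-files", Namespace = project slug, Name = generated UUID.
                "purl": f"pkg:local-files/{project_slug}/{uuid.uuid4()}",
                "license_expression": license_expression,
                "files": sorted(paths),
            }
        )
    return packages


# Made-up sample data: three loose files and one file that belongs to a package.
sample_files = [
    {"path": "src/main.c", "package": None, "license_expression": "mit"},
    {"path": "src/util.c", "package": None, "license_expression": "mit"},
    {"path": "vendor/zlib.h", "package": None, "license_expression": "zlib"},
    {"path": "lib/foo.jar", "package": "pkg:maven/org.example/foo@1.0",
     "license_expression": "apache-2.0"},
]
local_packages = group_local_files("my-project", sample_files)
for package in local_packages:
    print(package["purl"], package["license_expression"], package["files"])
```

Because the Name is a generated UUID, the PURL stays free of characters that need encoding, while the license expression remains available as a regular field on the package.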

tdruez added 5 commits that referenced this issue Sep 15, 2023
tdruez added 7 commits that referenced this issue Sep 18, 2023
Signed-off-by: Thomas Druez <tdruez@nexb.com>
@tdruez
Member

tdruez commented Sep 18, 2023

PR #927 merged in main.

The "local-files" packages are now created as part of the d2d pipeline.

tdruez closed this as completed Sep 18, 2023
@DennisClark
Member Author

If possible and practical, I think that this new feature should be expanded to include these pipelines (and possibly others):

  • scan_codebase
  • scan_single_package

4 participants