Dependency false positives from deps.dev collector #1357

Closed
mdeicas opened this issue Oct 5, 2023 · 2 comments · Fixed by #1584

@mdeicas
Collaborator

mdeicas commented Oct 5, 2023

Guac, through the deps.dev collector, currently pulls in numerous dependency false positives.

The Problem

The deps.dev collector attempts to pull in the dependencies of each package it learns about. The result is that Guac ingests all of the packages in any dependency tree. In these trees, there can be many different versions of the same package, and as a result all of these packages are ingested as dependencies into Guac.

This is, generally, not how executables are built. Compilers employ dependency resolution algorithms to (in most cases) select a single version of each package from each dependency tree.

As a result, Guac instances can contain (as dependencies) numerous versions of packages that are not actually in the built artifact. Furthermore, different versions of a package can themselves have different dependencies, so entire packages, not just package versions, in Guac can be false positives.

Aside: Wider Perspective

The underlying problem is that dependency resolution is done in specific contexts: of a specific root package, of the dependency resolver, of a specific architecture, etc. The deps.dev data takes into account what probably is the most important of these contexts (the root package and dependency resolver), but there may be cases where it differs from the “ground truth” build (docs).

Ultimately, data from deps.dev will be tied to an actor tree, and as such, it is OK if some dependency false positives are ingested. However, the current behavior of Guac / deps.dev collector, which is to ignore the context of which package is top-level, leads to such a large number of false positives that it is worth handling now, before actor trees are introduced.

Examples

Small and Representative:

  • Create a Go module that imports github.com/google/go-cmp@v0.5.9 and github.com/google/wire@v0.5.0, both the latest versions. Then, create an SBOM for that module by running Syft on the directory, and ingest it into Guac.
  • The deps.dev collector will pull in the dependencies of github.com/google/wire@v0.5.0, which include github.com/google/go-cmp@v0.2.0.
  • The Guac instance now incorrectly maintains that both versions of github.com/google/go-cmp are dependencies of the Go module. In reality, the Go toolchain will only use version v0.5.9.

Larger Example:

Ideally, ingesting a single complete SBOM should not lead to any new dependencies being pulled into Guac through the deps.dev collector. However, this is not the current behavior of Guac. For this example, we can assume SBOMs generated from Go executables are complete (because the compiler inserts into the executable a list of the build dependencies).

  • Run Syft on the github.com/golangci/golangci-lint binary and ingest the SBOM into Guac.
  • After running the deps.dev collector on the Guac instance, roughly 100 new versions of already existing packages appear and roughly 100 completely new packages appear.
  • This is in contrast to the roughly 170 dependencies originally reported by the SBOM and present in Guac before running the deps.dev collector.

This example shows that there can be as many dependency false positives as true positives.

Solutions

The motivation for the following changes is to ingest deps.dev dependencies only when there would be few or no dependency false positives. Note that the deps.dev collector should still run in polling mode by default to pull in source information.

Add a flag to disable pulling in dependencies from deps.dev (#1359)

Label packages as suitable for collecting deps.dev dependencies (#1358)
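
As a rough illustration of the first option (a flag to disable pulling in dependencies), the sketch below uses Go's standard `flag` package. The flag name `retrieve-dependencies`, the `Config` type, and `parseConfig` are assumptions made for this example and may not match what #1359 implements. The default is `true` to mirror the current behavior; source information would still be collected either way.

```go
package main

import (
	"flag"
	"fmt"
)

// Config holds hypothetical collector options.
type Config struct {
	// RetrieveDependencies controls whether dependency edges from deps.dev
	// are ingested. Source information is assumed to be collected regardless.
	RetrieveDependencies bool
}

// parseConfig parses collector arguments into a Config.
func parseConfig(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("collector", flag.ContinueOnError)
	fs.BoolVar(&cfg.RetrieveDependencies, "retrieve-dependencies", true,
		"ingest dependencies reported by deps.dev (hypothetical flag name)")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]string{"-retrieve-dependencies=false"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("retrieve dependencies:", cfg.RetrieveDependencies)
}
```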

@mdeicas
Collaborator Author

mdeicas commented Oct 10, 2023

After discussion with maintainers, the idea is to treat the dependency information from deps.dev as an SBOM through the HasSBOM node. This depends on the changes in #1367.

@lumjjb lumjjb added the long-term Things for the future label Nov 10, 2023
@pxp928 pxp928 self-assigned this Dec 13, 2023
@pxp928
Collaborator

pxp928 commented Dec 13, 2023

I will update the deps.dev collector to use the HasSBOM node with the completion of #1549 and #1550.
