[feature] Querying a package for direct vulnerabilities (experimental query API) #1044

Open
ctron opened this issue Jul 11, 2023 · 14 comments · Fixed by #1073
Labels
enhancement New feature or request

Comments

@ctron
Contributor

ctron commented Jul 11, 2023

Question

I am trying out guac, approaching it with what I would consider a simple and primary use case: getting known vulnerabilities for a set of packages.

I ingested some SBOMs, ran the certifiers, and then tried to replicate the query from the demo: https://github.com/guacsec/guac/blob/main/demo/workflow/queries.gql

Modified example query
query CertifyVuln {
  CertifyVuln(
    certifyVulnSpec: {
      package: {
        type: "npm"
      }
    }
  ) {
    id
    package {
      id
      type
      namespaces {
        id
        namespace
        names {
          id
          name
          versions {
            id
            version
            qualifiers {
              key
              value
            }
            subpath
          }
        }
      }
    }
    vulnerability {
      __typename
      ... on CVE {
        id
        year
        cveId
      }
      ... on OSV {
        id
        osvId
      }
      ... on GHSA {
        id
        ghsaId
      }
      ... on NoVuln {
        id
      }
    }
    metadata {
      dbUri
      dbVersion
      scannerUri
      scannerVersion
      timeScanned
      origin
      collector
    }
  }
}

Now that spits out all kinds of entries, like this:

        "vulnerability": {
          "__typename": "NoVuln",
          "id": "1"
        },
        "metadata": {
          "dbUri": "",
          "dbVersion": "",
          "scannerUri": "osv.dev",
          "scannerVersion": "0.0.14",
          "timeScanned": "2023-07-11T08:47:42.866726647Z",
          "origin": "guac",
          "collector": "guac"
        }

Which to my understanding tells me that there is no vulnerability known for this package.

However, I don't see any way to ask for "vulnerable packages only"? I am pretty sure I am missing something obvious, but what?

EDIT (pxp928):

This issue has been changed from a question to a feature request. Currently, this is not supported, but an experimental query API (implemented in the inmem backend) that returns the direct vulnerabilities, with a filter on the type of vulnerability (cve, ghsa, osv, or novuln), would be a good starting point.

@ctron ctron added the question Further information is requested label Jul 11, 2023
@pxp928
Collaborator

pxp928 commented Jul 11, 2023

Hello @ctron, currently there is no way to filter the vulnerabilities via the graphQL query in the playground. The best option, for now, is to use the vulnerability CLI to determine the vulnerabilities for a package or if there is a path between a certain vulnerability and a package.

Our future plans are to expand the graphQL API to include some of these popular queries so that the information can be obtained directly. We are currently making a list of the common use cases people would like to see and will work to make it easier to quickly obtain such information directly via the API.

@lumjjb
Contributor

lumjjb commented Jul 11, 2023

getting known vulnerabilities for a set of packages.

Quick clarification: is this the vulnerability for each package, or does it also include transitive vulnerabilities? The former should be simple if we add a filter in the query API; the latter would be along the lines of what @pxp928 said.

Adding on to @pxp928's comment, this is something that we're looking towards. The design of these APIs will end up being influenced by the backends that are implemented.

I think having an experimental API implementation on the inmem backend would be great :). Our current engineering time is going towards the persistent ArangoDB backend, which would be a prerequisite for efficient queries with a persistent backend, but if there's a PR for the experimental API, we'd probably be open to accepting it.

@ctron
Contributor Author

ctron commented Jul 12, 2023

Hello @ctron, currently there is no way to filter the vulnerabilities via the graphQL query in the playground. The best option, for now, is to use the vulnerability CLI to determine the vulnerabilities for a package or if there is a path between a certain vulnerability and a package.

That's unfortunate. Using the CLI isn't a real option, as I want to use this from a Rust application. And forking out doesn't seem to be a good approach.

Quick clarification: is this the vulnerability for each package, or does it also include transitive vulnerabilities? The former should be simple if we add a filter in the query API; the latter would be along the lines of what @pxp928 said.

I know there are different opinions on that. I personally am not convinced that transitive dependencies are actually a thing. They are in software development, but not when running stuff. Transitive dependencies can have ranges and be optional. During the build, all of that will be materialized, and maybe even replaced with patched versions. You basically end up with a flat structure. I get the idea that it might be helpful to figure out "why" a dependency ended up in a container image. So transitive dependency information feels more like debugging information for the build.

Back to your question: my primary use case right now is to figure out which (direct) packages are vulnerable. Something like:

fn check(packages: Vec<PackageUrl>) -> HashMap<PackageUrl, Vec<Vulnerability>>;

Indeed, it may (later on, for others) be interesting to consider transitive dependencies for this too.
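
As a rough sketch of how the signature above could behave on the client side, assuming the CertifyVuln results have already been fetched and flattened. The `PackageUrl` alias, the `Vulnerability` enum, the `Certification` struct, and the extra `certs` parameter are all hypothetical stand-ins for illustration, not types from guac:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: a real client would likely use a proper purl type.
type PackageUrl = String;

/// Hypothetical model of the vulnerability kinds returned by the query.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Vulnerability {
    Cve(String),
    Osv(String),
    Ghsa(String),
}

/// One flattened CertifyVuln result (assumed shape).
/// `None` models a `NoVuln` node, i.e. nothing known for this package.
struct Certification {
    package: PackageUrl,
    vulnerability: Option<Vulnerability>,
}

/// Groups direct vulnerabilities by the packages that were asked about.
/// Every requested package gets an entry; clean packages map to an empty Vec.
/// Certifications for packages that were not requested are ignored.
fn check(
    packages: Vec<PackageUrl>,
    certs: &[Certification],
) -> HashMap<PackageUrl, Vec<Vulnerability>> {
    let mut result: HashMap<PackageUrl, Vec<Vulnerability>> =
        packages.into_iter().map(|p| (p, Vec::new())).collect();

    for cert in certs {
        // Skip `NoVuln` results entirely.
        if let Some(vuln) = &cert.vulnerability {
            if let Some(found) = result.get_mut(&cert.package) {
                found.push(vuln.clone());
            }
        }
    }
    result
}
```

Only direct certifications are considered here; transitive lookups would need a different approach.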

@pxp928
Collaborator

pxp928 commented Jul 12, 2023

Back to your question: my primary use case right now is to figure out which (direct) packages are vulnerable. Something like:
fn check(packages: Vec<PackageUrl>) -> HashMap<PackageUrl, Vec<Vulnerability>>;
Indeed, it may (later on, for others) be interesting to consider transitive dependencies for this too.

As @lumjjb said, it would be a good start to have an experimental query API (that is implemented in the inmem backend) that returns the direct vulnerabilities with a filter on the type of vulnerability (cve, ghsa, osv, or novuln) for the time being. That should meet your needs. Would you be interested in opening a PR for it?

@ctron
Contributor Author

ctron commented Jul 12, 2023

Interested, yes; capable, most likely no. With Rust or Java that might be a different story. I peeked at the code, but failed to get a basic understanding of how this all works together.

@dejanb
Contributor

dejanb commented Jul 12, 2023

I could give it a try

@pxp928 pxp928 changed the title [question] Querying a package for vulnerabilities [feature] Querying a package for direct vulnerabilities (experimental query API) Jul 12, 2023
@pxp928
Collaborator

pxp928 commented Jul 12, 2023

Changed the issue from a question to a feature request so we can use it to track.

@pxp928 pxp928 added enhancement New feature or request and removed question Further information is requested labels Jul 12, 2023
@mihaimaruseac
Collaborator

In the absence of an API that returns just the vulnerable nodes, you can also filter the current output back in the Rust app. Sure, it's more work, but that's what we do in the CLI tool too, until we get the custom GQL API.
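
A minimal sketch of that client-side filtering, assuming each entry of the CertifyVuln response has been flattened into a small struct. `VulnEntry` and `filter_entries` are hypothetical names for illustration; only the `__typename` values ("CVE", "OSV", "GHSA", "NoVuln") come from the query above:

```rust
/// Hypothetical, flattened view of one entry in the CertifyVuln response.
#[derive(Debug, Clone, PartialEq, Eq)]
struct VulnEntry {
    /// Package the certification refers to (a purl string, as an assumption).
    package: String,
    /// Value of the `__typename` field: "CVE", "OSV", "GHSA" or "NoVuln".
    typename: String,
    /// Node id as returned by the API.
    id: String,
}

/// Filters entries on the client side.
/// With `wanted == None`, every entry except `NoVuln` is kept.
/// With `wanted == Some(kind)`, only entries of exactly that type are kept.
fn filter_entries(entries: &[VulnEntry], wanted: Option<&str>) -> Vec<VulnEntry> {
    entries
        .iter()
        .filter(|e| match wanted {
            Some(kind) => e.typename == kind,
            None => e.typename != "NoVuln",
        })
        .cloned()
        .collect()
}
```

Calling `filter_entries(&entries, None)` yields the "vulnerable packages only" view, while `filter_entries(&entries, Some("GHSA"))` narrows it to one vulnerability type. The full response still has to be downloaded first, which is the cost of this workaround.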

@ctron
Contributor Author

ctron commented Jul 12, 2023

In the absence of an API that returns just the vulnerable nodes, you can also filter the current output back in the Rust app. Sure, it's more work, but that's what we do in the CLI tool too, until we get the custom GQL API.

However, that feels like a very bad approach, because it basically means that if I want to check for affected packages, I need to download all the package information and then do the work myself. I understand it could be a workaround (and we did that), but I think this issue should really be solved. Thanks @dejanb for helping out.

@mihaimaruseac
Collaborator

I completely agree that it is only a workaround as we currently only have low-level GraphQL APIs. We are currently looking at usage patterns to guide us in developing the higher level API that would return only the data that is needed.

@lumjjb
Contributor

lumjjb commented Jul 14, 2023

@dejanb @ctron I may have a use case to work together on this as well (similar around the vulns, but I also want to provide the VEX statements). We have some work going on in #966 that may make this a lot simpler in the coming weeks. Maybe we can chat and figure out a very rough version that works for 80% of the cases. Will you be able to make it to the Office Hours next week (Friday 2-3pm ET), or the community meeting the day before?

@dejanb
Contributor

dejanb commented Jul 17, 2023

@lumjjb Great ... Let's plan it for the community meeting ... I'll polish requirements from our side and do some investigative work until then

@dejanb
Contributor

dejanb commented Jul 19, 2023

I played with this in #1073 ... Let me know if you think an approach like this could be a way to solve this particular requirement.

@kodiakhq kodiakhq bot closed this as completed in #1073 Jul 19, 2023
@lumjjb
Contributor

lumjjb commented Jul 20, 2023

Hmm, checking in here: I think this may not be fully resolved, so reopening.

@lumjjb lumjjb reopened this Jul 20, 2023
@lumjjb lumjjb added this to the GUAC v0.2 milestone Jul 20, 2023
5 participants