[feature] IsDeployed #1777
Comments
Nice writeup! I've always wondered how this fits into guac / how people are solving this
Thanks for the ping @sozercan. While I think the gist of your comment makes sense, I just need to mention that Trivy KBOM does not include information about workloads, but rather information about the Kubernetes cluster components.
We had a quick discussion about this with @ridhoq during the community meeting. Overall, this is a great feature that would add value. There are a few points that need discussion:
@ridhoq and I have been discussing this offline. Our view is that we would like to have one single place where we can store all information related to the supply chain (of containers), not only the composition. The reason is that if that information is stored in two (or more) separate systems, joining it becomes more problematic.
Hope this helps steer the conversation further.
A new Deployment attestation is being developed: in-toto/attestation#341. We will need to determine whether it fits into this discussion.
Is your feature request related to a problem? Please describe.
The GUAC ontology represents a graph of software artifacts, actors, and actions taken by actors. One of the most important actions that an actor can perform on a software artifact is deploying it to a production environment. Software artifacts running in production are exposed to users, either directly or indirectly, and thus vulnerable to attacks by malicious actors. When a vulnerability is present or detected in a software artifact running in production, software operators need to know the location of the vulnerable artifact, so they can either mitigate the risk or attest that the artifact is not actually vulnerable.
Consider an organization that uses Kubernetes clusters to deploy container artifacts. The organization has many teams that manage their own clusters, increasing the attack surface. Suppose a new critical vulnerability affects many of the running container images. The organization can use GUAC to query collected SBOMs and identify the affected images and registries. However, they cannot query which images are running on which clusters. A monitoring system based on GUAC may report false positives, as not all vulnerable images are in use. If GUAC could show the cluster location of the images (e.g., `Deployment` or `DaemonSet`), the teams could mitigate the vulnerability more effectively.
Describe the solution you'd like
`IsDeployed` is a new node that can have edges to `Package` nodes to denote when a particular package has been deployed. Generally, only top level packages with types such as `oci` or `swid` will have edges to `IsDeployed`.
Here's what the GraphQL schema might look like:
The associated `Query` and `Mutation` are not proposed here but would follow similar patterns to the rest of the schema.
This proposal centers on two key timestamps: `deployedSince` (required) for when a package becomes active, and `deployedUntil` (optional) for when it's no longer active. An unpopulated `deployedUntil` implies ongoing activity. `IsDeployed` will also feature a `DeploymentMetadata` list for detailed deployment descriptions, allowing users to tailor metadata to their specific deployment environments.
One proposal for how to collect this data would be to receive CDEvents from the deployment system that are pushed to a collector via NATS. Another proposal for Kubernetes is to create an admission webhook that can send events about when a deployment occurred. The collection side will have many different options and will ultimately depend on the deployment target for the package.
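To make the node shape concrete, here is a minimal Python sketch of the record described above and of how a collector might fold deployment events into it. Only `deployedSince`, `deployedUntil`, and the `DeploymentMetadata` list come from the proposal. The key/value shape of the metadata, the event dictionary format, the `apply_event` helper, and the example identifiers are all assumptions made for illustration; this is not GUAC's schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DeploymentMetadata:
    # Assumed shape: a free-form key/value pair. The proposal only says "list".
    key: str
    value: str


@dataclass
class IsDeployed:
    package: str                                # identifier of the top-level package (assumed to be a string)
    deployed_since: datetime                    # required: when the package became active
    deployed_until: Optional[datetime] = None   # optional: unset means still deployed
    metadata: list[DeploymentMetadata] = field(default_factory=list)

    def is_active(self, at: datetime) -> bool:
        """True if the package was deployed at the given instant."""
        if at < self.deployed_since:
            return False
        return self.deployed_until is None or at < self.deployed_until


def apply_event(records: list[IsDeployed], event: dict) -> None:
    """Fold one hypothetical deployment event into the list of records."""
    ts = datetime.fromisoformat(event["timestamp"])
    target = DeploymentMetadata("target", event["target"])
    if event["type"] == "deployed":
        # A new deployment opens a record with no end timestamp.
        records.append(IsDeployed(event["package"], ts, metadata=[target]))
    elif event["type"] == "removed":
        # A removal closes the first still-open record for the same package and target.
        for rec in records:
            if (rec.package == event["package"]
                    and target in rec.metadata
                    and rec.deployed_until is None):
                rec.deployed_until = ts
                break


records: list[IsDeployed] = []
apply_event(records, {
    "type": "deployed",
    "package": "example-app@sha256:abc123",      # made-up identifier
    "target": "cluster-a/Deployment/web",        # made-up location
    "timestamp": "2024-01-01T00:00:00+00:00",
})
```

Keeping one record per deployment interval, rather than overwriting a single record per package, preserves history, which is also what makes the data growth concern raised in this proposal relevant.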
A major consideration for this proposal is how to limit data growth. As deployments happen frequently in many organizations, the number of `IsDeployed` nodes will grow indefinitely. There should be a mechanism for pruning stale data or archiving data for future use. While this data growth issue exists for GUAC broadly, it will be particularly problematic in this case. Please feel free to provide feedback on how this can be addressed.
Describe alternatives you've considered
An alternative is to simply not include this in GUAC. Packages that are actively deployed could be tracked by another system altogether. If such a system existed, GUAC could be used to join package and vulnerability data with this system to determine if a user needs to take action on their environments. However, this joining could also be expensive and prone to failure. Since GUAC's charter is to create a graph of the entire software supply chain, including data about where a package is deployed seems fitting and necessary to make actionable decisions.
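The join in question can be sketched in a few lines of Python. This is an illustration under assumptions, not a GUAC query: the `Deployment` record, the `deployed_and_vulnerable` function, and the example identifiers are hypothetical, and the set of vulnerable packages stands in for whatever a vulnerability query would return.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Deployment:
    package: str                               # package identifier (assumed string form)
    location: str                              # where it runs, e.g. a cluster and workload name
    deployed_since: datetime
    deployed_until: Optional[datetime] = None  # unset means still deployed


def deployed_and_vulnerable(
    deployments: list[Deployment],
    vulnerable_packages: set[str],
    now: datetime,
) -> dict[str, list[str]]:
    """Map each vulnerable package to the locations where it is deployed at `now`."""
    hits: dict[str, list[str]] = {}
    for d in deployments:
        active = d.deployed_since <= now and (
            d.deployed_until is None or now < d.deployed_until
        )
        if active and d.package in vulnerable_packages:
            hits.setdefault(d.package, []).append(d.location)
    return hits
```

When deployment data lives in a separate system, the `deployments` list has to be fetched and matched on package identifiers that both systems agree on, which is where the cost and failure modes mentioned above come from.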
Additional context
This is not yet a comprehensive proposal, so any feedback is welcome! If the feedback is positive, a separate document can be written to cover this in more depth.