[feature] Add new field to store document reference for persistent storage #1833
Comments
cc @nchelluri
I say we do it for all verbs. I think it's a bit of extra work that helps not just from an evidence perspective but also from a debuggability and maintainability perspective for GUAC. We can use the blobs, whether they are API responses, documents, or something else, to help debug when something goes wrong, since we have the content saved. It also helps if we parse a document, API request/response, etc. differently in the future and need to re-parse the data.
+1 on doing it for all verbs. We should not combine fields with separators. I think doing so will open us up to issues when we'd need to escape the separator further down.
+1, this looks good on adding this to `SourceInformation`.
As an aside, one of the issues we've run into before with another of our projects that has a similar data pipeline is that reprocessing raises the issue of duplicates (@mdeicas has been looking at this).
Oh yes, this is not the solution to "re-ingest in case of a failure". This is for when your database blows up and you have to start from scratch.
Interesting, if there are lessons learned we can apply here, that would be great.
Sorry for the late response, but I think the lesson learned is that an ingestion pipeline may have been designed with an assumption of only ingesting documents once, or otherwise to be idempotent, and so it won't support re-ingesting documents to pick up new parsing features. It might be prudent to document this somewhere for clients? |
Hmmm, that is an interesting case. I added it to our agenda to discuss.
Is your feature request related to a problem? Please describe.
With the new addition of the blob store, GUAC can now store documents that were collected and ingested into GUAC. This will allow users to find the original document (or re-ingest it in the case of failure) if they need to. In order to accurately determine the location of a document in the blob store, a document reference (blob store key) needs to be stored in the DB.
Currently, replacing the `origin` with this information, as we did in #1811, would result in the loss of the original location from which the document originated. We should add a new field to preserve the existing functionality. This would require an update to the GraphQL schema to add a new field called `documentRef` that would be empty if the persistent blob store is not being used. We would also add a new field to the `SourceInformation` struct in `Document` to contain the new field.

We could either do this for all "verbs" or do it only for non-ephemeral documents like `hasSBOM`, `hasSLSA`, and `CertifyVEX`.

Though, in the future, we may want to move to a more in-toto attestation-specific ingestion for verbs like `certifyBad`, `certifyGood`, `hasSourceAt`, and `pkgEquals`, to name a few, so that we have evidence that something was attested by someone at a given time. We would again store this in the blob store for record keeping.

Describe the solution you'd like
Update the GraphQL schema to add a new field called `documentRef` that would be empty if the persistent blob store is not being used.

A decision needs to be made on whether this should be done for all "verbs" or only a select few.
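As a rough illustration, the proposed field on the document metadata might look like the following Go sketch. The field names and surrounding struct shape here are assumptions based on this issue, not the actual GUAC definitions:

```go
package main

import "fmt"

// SourceInformation is a hypothetical sketch of the metadata struct
// extended with a DocumentRef field, as proposed in this issue.
type SourceInformation struct {
	Collector   string // collector that found the document
	Source      string // original location the document came from
	DocumentRef string // blob store key; empty if the persistent blob store is not used
}

func main() {
	si := SourceInformation{
		Collector:   "fileCollector",
		Source:      "file:///documents/sbom.json",
		DocumentRef: "sha256:8534561615616161894984126517",
	}
	fmt.Println(si.DocumentRef)
}
```

Keeping `DocumentRef` as its own field (rather than overloading `Source`) preserves the ability to query by origin while still locating the blob.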
Describe alternatives you've considered
We could concatenate the strings together via a separator value (maybe something like https://github.com/guacsec/guac/blob/main/pkg/assembler/helpers/package.go#L27, so that it does not interfere in the future).
For example:
mcr.microsoft.com/oss/kubernetes/kubectl@sha256:8035089a59a6f8577255f494c1ced250e1206667d8462869fc0deeca98d79427guac-empty-@@sha256:8534561615616161894984126517
That way we can split out the source in the future and get both the original source and the new blob store key back.
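The split described above could be sketched as follows. The separator constant is an assumption modeled on the example string; the real value would live in the GUAC helpers package linked earlier:

```go
package main

import (
	"fmt"
	"strings"
)

// guacSeparator is a hypothetical separator constant, taken from the
// example value above, not from the actual GUAC codebase.
const guacSeparator = "guac-empty-@@"

// splitOrigin recovers the original source and the blob store key
// from a concatenated origin value. If no separator is present, the
// whole value is treated as a plain origin with no blob key.
func splitOrigin(combined string) (source, docRef string) {
	parts := strings.SplitN(combined, guacSeparator, 2)
	if len(parts) == 2 {
		return parts[0], parts[1]
	}
	return combined, ""
}

func main() {
	source, docRef := splitOrigin("file:///sbom.json" + guacSeparator + "sha256:853456")
	fmt.Println(source)
	fmt.Println(docRef)
}
```

As the comments above note, this approach is fragile: if the separator ever appears inside a legitimate origin, it would need escaping, which is one reason a dedicated field is preferable.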
But following this method, we would lose the capability to query via `origin`, which would be a loss of functionality in GUAC.