Add comments for references and evidence type #50

Closed


joshbuker
Contributor

Opening this up for discussion

@oliverchang
Contributor

Thanks for opening this, @joshbuker!

As discussed in person, our goal with this schema was to keep as many fields machine readable as possible, and keep the core fields as minimal as possible (i.e. focus on the purpose of enabling vulnerability scanners and triage).

The "type" field was intended to provide this kind of context for references, with a consistent mapping to a text description for what they mean.

However, I recognize that different databases may want to track additional information for humans/triage purposes, e.g. @kurtseifried also suggested adding timestamps here as part of #25. Our mechanism for doing this was database_specific, but it wasn't very generalized.

Perhaps an alternative here might be to extend the places where database_specific could apply, so that it can go in any part of the OSV schema rather than only the explicitly listed locations. This might look like:

{
  "references": [
    {
      "type": "WEB",
      "url": "https://blah.com",
      "database_specific": {
        "timestamp": "....",
        "comment": "GSD specific comment"
      }
    }
  ]
}
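
A minimal Python sketch of how a consumer might treat such a record: a scanner reads only the core `type` and `url` fields, while a human-facing view also surfaces the optional comment. The helper names `core_references` and `human_references` and the second reference URL are hypothetical, and the nested `database_specific` placement is the proposal being discussed, not an established part of the schema.

```python
import json

# Sample record following the proposed layout. The second reference
# (without database_specific) is an assumption added for illustration.
RECORD = json.loads("""
{
  "references": [
    {
      "type": "WEB",
      "url": "https://blah.com",
      "database_specific": {
        "comment": "GSD specific comment"
      }
    },
    {"type": "WEB", "url": "https://example.com/advisory"}
  ]
}
""")


def core_references(record):
    """Return only the machine-readable core fields of each reference."""
    return [(ref["type"], ref["url"]) for ref in record.get("references", [])]


def human_references(record):
    """Return each reference with its optional comment (None if absent)."""
    out = []
    for ref in record.get("references", []):
        extra = ref.get("database_specific", {})
        out.append((ref["type"], ref["url"], extra.get("comment")))
    return out
```

A scanner that only calls `core_references` is unaffected by whatever a database puts under `database_specific`.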

We can also build out a more well specified way to define the database_specific specs from all the databases. e.g. linking to a JSON schema/spec page somewhere that describes all the fields that have been extended by that database.

This mechanism doesn't exclude any of these fields from being added as a core field in the future -- if enough databases use similar fields it makes a strong case to include this as a core field.

@rsc @chrisbloom7 thoughts?

@kurtseifried
Contributor

I think if we're going to have a tag that then contains JSON and is scattered all over the place in order to allow new/different data we should look at how to make this less painful. Some thoughts:

Assume we keep the name database_specific for now. I suggest we usually add some standard metadata to it, e.g.

data_format (e.g. OSV, GSD, CVE, CSAF, whatever)
data_version (e.g. 1.2.3, 5.0, etc.)

So people know what schema in turn to look at rather than having to write magic parsers/etc.
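
As a rough sketch of what that metadata could enable, a consumer might pick a parser from the `data_format` and `data_version` values instead of guessing. Everything here is an assumption for illustration: the `PARSERS` registry, the `parse_gsd_1` function, and the `"1.0.0"` version string are hypothetical and not defined by any of the databases mentioned.

```python
def parse_gsd_1(block):
    """Hypothetical parser for a GSD-flavoured database_specific block."""
    return {"comment": block.get("comment")}


# Hypothetical registry keyed by (data_format, data_version).
PARSERS = {
    ("GSD", "1.0.0"): parse_gsd_1,
}


def parse_database_specific(block):
    """Dispatch on the declared format and version.

    Returns None when the block does not declare a known format, so
    unknown extensions are skipped rather than guessed at.
    """
    key = (block.get("data_format"), block.get("data_version"))
    parser = PARSERS.get(key)
    if parser is None:
        return None
    return parser(block)
```

Blocks that omit the metadata fall through to `None`, which keeps older records readable without special cases.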

We can also use it to ascertain if stuff should be included in the schema, e.g. if we see thousands of timestamp tags then maybe we should make that part of the official schema, e.g. how HTTP headers work (if enough people do it, it's a de facto standard).
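
One way to gather that kind of evidence, sketched in Python under the assumption that `database_specific` is allowed inside each reference as proposed earlier in the thread: tally the extension keys across a set of records and see which ones recur. The function name `count_extension_keys` is hypothetical.

```python
from collections import Counter


def count_extension_keys(records):
    """Count how often each key appears in references' database_specific."""
    counts = Counter()
    for record in records:
        for ref in record.get("references", []):
            counts.update(ref.get("database_specific", {}).keys())
    return counts
```

Calling `count_extension_keys(records).most_common()` would then list the most frequently used extension fields first, which is the signal for promoting a field to the core schema.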

@chrisbloom7
Collaborator

chrisbloom7 commented Jun 28, 2022

While it might be nice to augment the machine-parsable data with info relevant to human readers, ultimately that's what the references are for: "go here for more info". I have been viewing the existing database_specific fields as info that is necessary for the publishing system to track and manage the vulnerability. So far we have resisted publishing parsing documentation for those fields because the relevant info should be represented by the core schema (already documented) and the extra info is only useful internally. It is of course possible to dump any structured data in those fields, for humans or machines. But I would worry that if we start nesting schemas inside schemas to map database-specific info, it might become easier for publishers to just fork the OSV schema and extend it with their own core fields, leading to fragmentation of the standard: OSV, GSD-OSV, CVE-OSV, etc.

@joshbuker
Contributor Author

An example of where a human-focused optional description/comment for references would be useful is something like Log4Shell, where there are massive amounts of references, time pressure on the reader, and potentially non-security/dev folks consuming the ID.

The type field would help narrow down overall categories of links, and humans could manually review each reference to understand its relevance, but that approach would run counter to providing both machine-readable and human-readable interfaces (and at scale would also waste a lot of people-hours). In the same way that providing a human-centric description of the ID itself is valuable for humans but unnecessary from a machine point of view, I feel that the same is true for references. It's an optional field that can be added specifically with the intent of aiding humans viewing an ID, and generally will be ignored by automation.

All that being said, it might be that we can expand the type field to have enough nuance to provide both machine readability and a "good enough" experience for humans by attaching descriptions to the various types. It's too early to say one way or the other I think.
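
A small Python sketch of what attaching descriptions to types could look like on the consumer side. The type names are existing OSV reference types, but the description strings are illustrative paraphrases rather than the schema's wording, and `TYPE_DESCRIPTIONS` and `describe` are hypothetical names.

```python
# Illustrative descriptions only; not the official schema text.
TYPE_DESCRIPTIONS = {
    "ADVISORY": "Published security advisory",
    "ARTICLE": "Article or blog post about the vulnerability",
    "REPORT": "Bug or issue tracker report",
    "FIX": "Source code change that fixes the vulnerability",
    "PACKAGE": "Home page of the affected package",
    "WEB": "Other web page",
}


def describe(ref):
    """Render a reference as '<description>: <url>' for human readers."""
    label = TYPE_DESCRIPTIONS.get(ref["type"], "Unrecognized reference type")
    return f"{label}: {ref['url']}"
```

Because the mapping lives with the type rather than with each reference, a reader gets a consistent hint per link without any per-record prose.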

@joshbuker
Contributor Author

Having thought this through more over the last week or two, I think we should look at being very quick to iterate on adding new categories/reference types, and try to avoid adding a prose field for references until actual demand from the folks using it comes up.


4 participants