
Aliases, and how they are supposed to be used #888

Closed
@nscuro


Hey OSV team, thanks for your great work!

We're currently looking at how we can correlate vulnerabilities that describe the same thing.

As per the specification, OSV has the aliases field for this:

The aliases field gives a list of IDs of the same vulnerability in other databases, in the form of the id field.

At least in my interpretation, aliasing is a bidirectional relationship that also applies transitively.
If X aliases Y and Z, then Y should also alias X (symmetry), and Y should also alias Z (transitivity). If they all describe the same thing, that should be a valid assumption.
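If aliasing really is symmetric and transitive, the aliases lists of a set of OSV records define equivalence classes, which can be computed as connected components. Below is a minimal sketch of that interpretation using a small union-find. It assumes the records are already loaded into a plain dict mapping each `id` to its `aliases` list; the function name `alias_groups` is hypothetical and not part of any OSV tooling.

```python
from collections import defaultdict


def alias_groups(records: dict[str, list[str]]) -> list[set[str]]:
    """Group IDs assuming aliasing is symmetric and transitive.

    `records` maps an OSV `id` to its `aliases` list. Every alias edge is
    treated as undirected, and each returned set is one connected
    component (equivalence class).
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for vuln_id, aliases in records.items():
        find(vuln_id)  # register IDs that have no aliases at all
        for alias in aliases:
            union(vuln_id, alias)

    groups: dict[str, set[str]] = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())
```

With `{"X": ["Y", "Z"]}` this yields a single group containing X, Y and Z, even though only X lists any aliases. That is exactly the behaviour that breaks down once advisories are mixed in.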

However, in practice we see that many vulnerability databases (ab)use the OSV schema to publish advisories. In my understanding, a vulnerability describes one defect, and that one defect only, whereas an advisory can refer to multiple vulnerabilities (as in "we patched all these vulnerabilities in version 1.2.3 of our package"). This appears to be common in at least the Go, Rust, and (especially) Debian ecosystems in the OSV database. There are most likely more, but these have been the most obvious candidates for us.

For example, GO-2022-0586 supposedly aliases four CVEs and four GHSAs.

These are four different vulnerabilities, with different CWEs, descriptions, and severities. The CVEs and GHSAs actually alias each other only in pairs (GHSA-28r2-q6m8-9hpx aliases CVE-2022-30323, but not CVE-2022-26945, etc.):

[Figure: Aliases of GO-2022-0586]

For advisories like this, the "aliases" are neither bidirectional (GHSA-28r2-q6m8-9hpx isn't really the same as GO-2022-0586) nor fully transitive (CVE-2022-26945 is not the same as CVE-2022-30323). If one were to look up all aliases of GHSA-28r2-q6m8-9hpx by traversing this graph, the results would be wrong: treating every edge as bidirectional leads from GHSA-28r2-q6m8-9hpx through GO-2022-0586 to all four CVEs, including unrelated ones such as CVE-2022-26945.
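To make the failure concrete, here is a small breadth-first traversal over alias edges treated as undirected. The record data is deliberately abridged: it includes only the IDs named in this issue (the real GO-2022-0586 record lists four CVEs and four GHSAs), and `reachable_aliases` is a hypothetical helper, not an OSV API.

```python
from collections import defaultdict, deque


def reachable_aliases(records: dict[str, list[str]], start: str) -> set[str]:
    """Return every ID reachable from `start` when alias edges are undirected."""
    graph: dict[str, set[str]] = defaultdict(set)
    for vuln_id, aliases in records.items():
        for alias in aliases:
            graph[vuln_id].add(alias)
            graph[alias].add(vuln_id)  # assume aliasing is bidirectional

    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


# Abridged: only the IDs mentioned in this issue are included.
records = {
    "GO-2022-0586": ["CVE-2022-26945", "CVE-2022-30323", "GHSA-28r2-q6m8-9hpx"],
    "GHSA-28r2-q6m8-9hpx": ["CVE-2022-30323"],
}

print(sorted(reachable_aliases(records, "GHSA-28r2-q6m8-9hpx")))
# ['CVE-2022-26945', 'CVE-2022-30323', 'GO-2022-0586']
```

Only CVE-2022-30323 is a genuine alias of GHSA-28r2-q6m8-9hpx; the other two results leak in through the advisory node.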

The Debian ecosystem in particular has many of these cases, where a single DLA (Debian LTS Advisory) can refer to many CVEs.

I have the feeling that OSV entries of type "advisory" (maybe such a distinction would be good to have?) should use the related field instead, although I imagine this would be hard to enforce, and even harder to apply in an automated fashion.
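As a rough idea of what an automated approach might look like, the sketch below demotes aliases to related when a record aliases more than one CVE. This heuristic is an assumption for illustration only: it is not part of the OSV schema, it would misfire on edge cases, and the function name `split_advisory_aliases` is hypothetical.

```python
import re

# CVE IDs have the form CVE-YYYY-NNNN, with four or more digits in the sequence part.
CVE_ID = re.compile(r"^CVE-\d{4}-\d{4,}$")


def split_advisory_aliases(record: dict) -> dict:
    """Return a copy of an OSV record, moving `aliases` into `related`
    when the record looks like a multi-vulnerability advisory.

    Heuristic (assumption): a record that aliases more than one CVE
    cannot describe a single vulnerability.
    """
    aliases = record.get("aliases", [])
    cve_count = sum(1 for alias in aliases if CVE_ID.match(alias))
    if cve_count <= 1:
        return dict(record)  # looks like a single vulnerability; keep as-is

    updated = dict(record)
    # Keep any existing `related` entries, then append the demoted aliases.
    related = list(record.get("related", []))
    for alias in aliases:
        if alias not in related:
            related.append(alias)
    updated["related"] = related
    updated["aliases"] = []
    return updated
```

Applied to an abridged GO-2022-0586 record, all of its aliases end up under related, while a GHSA that aliases only CVE-2022-30323 is left untouched.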

Am I understanding aliasing in OSV correctly? Is this a data quality issue with the databases that use the OSV schema? Is there anything we can do about it?

Metadata

Labels: backlog (Important but currently unprioritized), data quality (Issues with data quality)
