Description
Hey OSV team, thanks for your great work!
We're currently looking at how we can correlate vulnerabilities that describe the same thing.
As per the specification, OSV has the `aliases` field for this:
> The `aliases` field gives a list of IDs of the same vulnerability in other databases, in the form of the `id` field.
At least in my interpretation, aliasing is a bidirectional relationship that also applies transitively.
If `X` aliases `Y` and `Z`, `Y` should also alias `X`, and `Y` should also alias `Z`. If they all describe the same thing, that should be a valid assumption.
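To make that expectation concrete, here is a minimal sketch of a consistency check. It assumes the records have already been loaded into a plain dict mapping each ID to the set of IDs in its `aliases` field; the function name `alias_violations` is hypothetical, not part of any OSV tooling.

```python
def alias_violations(records):
    """Return (a, b) pairs where a is expected to alias b but does not.

    `records` maps an ID to the set of IDs listed in its `aliases` field.
    IDs that have no record of their own are treated as listing no aliases.
    """
    missing = set()
    for vuln_id, aliases in records.items():
        # If aliasing is symmetric and transitive, every member of this
        # group should list every other member.
        group = {vuln_id} | set(aliases)
        for member in group:
            expected = group - {member}
            actual = set(records.get(member, ()))
            for other in expected - actual:
                missing.add((member, other))
    return sorted(missing)


# X lists Y and Z, but Y and Z do not list anything back.
records = {"X": {"Y", "Z"}, "Y": set(), "Z": set()}
print(alias_violations(records))
# [('Y', 'X'), ('Y', 'Z'), ('Z', 'X'), ('Z', 'Y')]
```

An empty result would mean the alias lists are consistent with the bidirectional, transitive reading.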
However, in reality, we see that many vulnerability databases (ab)use the OSV schema to publish advisories. In my understanding, a vulnerability describes one defect, and that one defect only, whereas an advisory can refer to multiple vulnerabilities (as in "we patched all these vulnerabilities in version 1.2.3 of our package"). This appears to be common in at least the Go, Rust, and (especially) Debian ecosystems in the OSV database. There are most likely more, but these have been the most obvious candidates to us.
For example, GO-2022-0586 presumably aliases four CVEs and four GHSAs:
- CVE-2022-26945
- CVE-2022-30321
- CVE-2022-30322
- CVE-2022-30323
- GHSA-28r2-q6m8-9hpx
- GHSA-cjr4-fv6c-f3mv
- GHSA-fcgg-rvwg-jv58
- GHSA-x24g-9w7v-vprh
These are four different vulnerabilities, with different CWEs, descriptions, and severities. The CVEs and GHSAs actually alias each other in pairs (`GHSA-28r2-q6m8-9hpx` aliases `CVE-2022-30323`, but not `CVE-2022-26945`, etc.).
In cases of advisories like this, the "aliases" are neither bidirectional (`GHSA-28r2-q6m8-9hpx` isn't really the same as `GO-2022-0586`), nor are they fully transitive (`CVE-2022-26945` is not the same as `CVE-2022-30323`). If one were to attempt to find all aliases for `GHSA-28r2-q6m8-9hpx` here, traversing this graph would yield wrong results.
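The over-merge is easy to reproduce with a naive traversal. The sketch below assumes the alias lists are held in a plain dict and includes only the one CVE/GHSA pairing named above; the remaining pairings are left out rather than guessed. The helper `reachable_aliases` is hypothetical.

```python
from collections import deque

# Alias lists as described above. Only the pairing named in the text is
# included for the GHSA entry.
ALIASES = {
    "GO-2022-0586": [
        "CVE-2022-26945", "CVE-2022-30321", "CVE-2022-30322", "CVE-2022-30323",
        "GHSA-28r2-q6m8-9hpx", "GHSA-cjr4-fv6c-f3mv",
        "GHSA-fcgg-rvwg-jv58", "GHSA-x24g-9w7v-vprh",
    ],
    "GHSA-28r2-q6m8-9hpx": ["CVE-2022-30323"],
}


def reachable_aliases(aliases, start):
    """Breadth-first walk that treats every alias edge as bidirectional."""
    graph = {}
    for src, targets in aliases.items():
        for dst in targets:
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set()).add(src)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


found = reachable_aliases(ALIASES, "GHSA-28r2-q6m8-9hpx")
print(len(found))                  # 8: the Go advisory plus everything else it lists
print("CVE-2022-26945" in found)   # True, although it is a different vulnerability
```

Starting from a single GHSA, the walk passes through `GO-2022-0586` and pulls in all four CVEs and the other three GHSAs.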
The Debian ecosystem especially has many of these scenarios, where one DLA can refer to loads of CVEs.
I have the feeling that OSV entries of type "advisory" (maybe such a distinction would be good to have?) should instead use the `related` field, although I imagine this will be hard to enforce, and even harder to apply in an automated fashion.
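As a rough illustration of what an automated pass could look like, here is one possible heuristic. It rests on an assumption that is not part of the OSV spec: a single vulnerability has at most one ID per source database, so several aliases sharing a prefix (for example, multiple `CVE-` IDs) hint that the record is really an advisory. The function `split_aliases` is hypothetical and would certainly misclassify some records.

```python
def split_aliases(record):
    """Move alias IDs to `related` when their prefix occurs more than once.

    Assumption (not from the OSV spec): one vulnerability has at most one
    ID per source database, so repeated prefixes suggest an advisory that
    covers multiple vulnerabilities.
    """
    by_prefix = {}
    for alias in record.get("aliases", []):
        # "CVE-2022-26945" -> "CVE", "GHSA-28r2-q6m8-9hpx" -> "GHSA"
        by_prefix.setdefault(alias.split("-", 1)[0], []).append(alias)

    aliases, related = [], list(record.get("related", []))
    for ids in by_prefix.values():
        (aliases if len(ids) == 1 else related).extend(ids)
    return {**record, "aliases": aliases, "related": related}


advisory = {
    "id": "GO-2022-0586",
    "aliases": [
        "CVE-2022-26945", "CVE-2022-30321", "CVE-2022-30322", "CVE-2022-30323",
        "GHSA-28r2-q6m8-9hpx", "GHSA-cjr4-fv6c-f3mv",
        "GHSA-fcgg-rvwg-jv58", "GHSA-x24g-9w7v-vprh",
    ],
}
fixed = split_aliases(advisory)
print(fixed["aliases"])       # []
print(len(fixed["related"]))  # 8
```

A record with exactly one CVE and one GHSA alias would pass through unchanged, which matches the pairwise aliasing seen between the CVE and GHSA entries.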
Am I understanding aliasing in OSV correctly? Is this a data quality issue with the databases that use the OSV schema? Is there anything we can do about it?