Commit
Document the properties of a high quality OSV record
1 parent
2e1117e
commit 12cd78b
Showing 2 changed files with 120 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
# Properties of a High Quality OSV Record | ||
|
||
## Version | ||
|
||
1.0.0 (SEMVER) | ||
|
||
## Purpose | ||
|
||
Describe the “good enough” OSV record that will be imported by OSV.dev. | ||
|
||
### Out of scope | ||
|
||
This does not discuss the problem of record bit rot over time, after initial successful import. The problem of continuous revalidation and treatment of records that have been successfully imported will be dealt with separately in . | ||
|
||
Deferred to a future iteration: validating the existence of vulnerable functions in the `ecosystem_specific` field, if supplied. | ||
|
||
## Audience | ||
|
||
1. Current and aspiring OSV record producers | ||
2. Downstream OSV.dev record consumers | ||
|
||
## Rationale | ||
|
||
OSV.dev seeks to be a comprehensive, accurate and timely database of known vulnerabilities that is highly automation friendly. To meet this accuracy goal, a quality bar needs to be both defined and sustainably enforced. | ||
|
||
## Properties of a High Quality OSV Record | ||
|
||
### Valid | ||
|
||
As a prerequisite, it is assumed that a record passes [JSON Schema validation](#appendix-a-osv-schema-validation) for the version of the OSV Schema it declares in its `schema_version` field, or for version 1.0.0 if it does not declare one. | ||
|
||
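A minimal sketch of the version-selection step described above, assuming the record has already been parsed into a Python dict. The helper name `declared_schema_version` and the record ID `EXAMPLE-2024-0001` are hypothetical; the actual validation against that version's JSON Schema file is not shown.

```python
import json

DEFAULT_SCHEMA_VERSION = "1.0.0"  # fallback stated in the text above


def declared_schema_version(record: dict) -> str:
    """Return the OSV Schema version a record should be validated against."""
    version = record.get("schema_version")
    # A missing or empty field is treated as "not declared" (an assumption).
    return version if version else DEFAULT_SCHEMA_VERSION


# Hypothetical record that does not declare a schema_version.
record = json.loads('{"id": "EXAMPLE-2024-0001", "modified": "2024-04-01T00:00:00Z"}')
print(declared_schema_version(record))
```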
### Precise | ||
|
||
A high quality OSV record allows a consumer to answer the following questions in an **automated** way, at scale: | ||
|
||
* “Does this vulnerability, as described, impact me?” | ||
* “What version do I need to upgrade to for it not to impact me?” | ||
|
||
The definition of “impact” will vary depending on how fine-grained the information available is (i.e. package-level or symbol-level for software library packages). Package-level precision is the minimum standard. | ||
|
||
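The two questions above can be sketched at package level as follows. This is an illustration only: it assumes plain dotted numeric versions with the same number of components (no pre-release tags), events already sorted in ascending order, and the hypothetical helper names `parse_version` and `evaluate`. A real consumer would use the ecosystem's own version ordering.

```python
def parse_version(text: str) -> tuple:
    """Parse a dotted numeric version such as '1.2.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def evaluate(events: list, installed: str) -> tuple:
    """Return (is_affected, upgrade_to) for one range's event list.

    `events` holds single-key dicts such as {"introduced": "0"} or
    {"fixed": "1.4.2"}, assumed to be in ascending version order.
    """
    current = parse_version(installed)
    affected = False
    upgrade_to = None
    for event in events:
        if "introduced" in event:
            # An introduced value of "0" parses to (0,), i.e. "from the start".
            if current >= parse_version(event["introduced"]):
                affected = True
                upgrade_to = None
        elif "fixed" in event:
            if current >= parse_version(event["fixed"]):
                affected = False
            elif affected and upgrade_to is None:
                # First fix above the installed version answers "upgrade to what?"
                upgrade_to = event["fixed"]
        elif "last_affected" in event:
            if current > parse_version(event["last_affected"]):
                affected = False
    return affected, (upgrade_to if affected else None)


events = [{"introduced": "0"}, {"fixed": "1.4.2"}]
print(evaluate(events, "1.3.0"))
print(evaluate(events, "1.4.2"))
```

A range that ends in `last_affected` can say whether a version is affected but cannot name an upgrade target, which is one reason a `fixed` event is the more useful of the two.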
#### Properties | ||
|
||
* for version and commit ranges | ||
  * `affected[].ranges[].events[].introduced` is defined | ||
  * prefer `affected[].ranges[].events[].fixed` over `affected[].ranges[].events[].last_affected` | ||
    * this minimizes false negatives | ||
  * distinct ranges for `introduced..fixed` and/or `introduced..last_affected` *(i.e. introduced and fixed commits can't be the same)* | ||
  * values in `introduced` are before/less than `fixed`/`last_affected` | ||
* for version (`ECOSYSTEM` and `SEMVER`) ranges | ||
  * the versions exist in the specific package ecosystem | ||
* for commit (`GIT`) ranges | ||
  * the commits exist in the specified `repo` *(i.e. they are not from another GitHub fork)* | ||
* the `package.ecosystem`, and a unique `identifier` prefix for it, are defined in the OSV Schema | ||
* the `package.name` exists within the defined `package.ecosystem`, and is canonically encoded to avoid ambiguity *(i.e. normalized)* | ||
* Package URLs in the `package.purl` field conform to the [specification](https://github.com/package-url/purl-spec) | ||
* `references` URLs return a 2xx or 3xx response | ||
|
||
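Some of the range properties above can be checked from the record alone. The sketch below covers only those (an `introduced` event exists, `fixed` is preferred, and the range is not empty), assuming the events sit in each range's `events` array as in the OSV Schema. The function name `check_ranges` and the message strings are hypothetical. Checks that need outside data, such as version ordering or whether a version or commit really exists, are not attempted.

```python
def check_ranges(record: dict) -> list:
    """Return human-readable problems found in a record's affected ranges."""
    problems = []
    for affected in record.get("affected", []):
        for rng in affected.get("ranges", []):
            events = rng.get("events", [])
            introduced = [e["introduced"] for e in events if "introduced" in e]
            fixed = [e["fixed"] for e in events if "fixed" in e]
            last_affected = [e["last_affected"] for e in events if "last_affected" in e]

            if not introduced:
                problems.append("range has no introduced event")
            if last_affected and not fixed:
                # Advisory only: the text says "prefer", not "require".
                problems.append("range uses last_affected; prefer fixed when known")
            if set(introduced) & set(fixed + last_affected):
                problems.append("introduced equals fixed/last_affected")
    return problems


# Hypothetical record whose introduced and fixed commits are identical.
bad = {"affected": [{"ranges": [{"type": "GIT", "repo": "https://example.com/repo.git",
                                 "events": [{"introduced": "abc123"}, {"fixed": "abc123"}]}]}]}
print(check_ranges(bad))
```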
### Identifiable | ||
|
||
#### Properties | ||
|
||
* Where relevant, the ID of the equivalent CVE record is present in `aliases` | ||
* Where an OSV record consolidates multiple vulnerabilities in another ecosystem (or universe), multiple `related` identifiers are present | ||
|
||
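A small sketch of how a consumer might read these identifiers, assuming CVE IDs follow the usual `CVE-<year>-<number>` shape. The helper name `cve_ids` and the sample IDs are hypothetical placeholders.

```python
import re

# Assumed shape of a CVE ID: "CVE-", a four-digit year, then four or more digits.
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}")


def cve_ids(record: dict, field: str) -> list:
    """Return the CVE IDs listed in `field` ("aliases" or "related")."""
    return [value for value in record.get(field, []) if CVE_ID.fullmatch(value)]


# Placeholder record: one equivalent CVE, two related CVEs.
record = {
    "id": "EXAMPLE-2024-0001",
    "aliases": ["CVE-2024-00001", "GHSA-xxxx-xxxx-xxxx"],
    "related": ["CVE-2024-00002", "CVE-2024-00003"],
}
print(cve_ids(record, "aliases"))
print(cve_ids(record, "related"))
```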
## Examples | ||
|
||
* [GO-2024-2687](https://api.osv.dev/v1/vulns/GO-2024-2687) | ||
  * Has `introduced` and `fixed` versions | ||
  * Has an alias to a CVE record ID | ||
  * Has a purl | ||
* [OSV-2024-98](https://api.osv.dev/v1/vulns/OSV-2024-98) | ||
  * Has `introduced` and `fixed` commits | ||
  * Commits exist in the `repo` | ||
* [DSA-5678-1](https://api.osv.dev/v1/vulns/DSA-5678-1) | ||
  * Has `introduced` and `fixed` versions | ||
  * Has multiple `related` CVE record IDs | ||
|
||
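The example records can be inspected with a few lines of standard-library Python. This is a sketch: the URL pattern is taken from the links above, while the helper names `vuln_url`, `fetch_record` and `event_kinds` are hypothetical, and the response is assumed to be the OSV record as JSON.

```python
import json
import urllib.request

API_BASE = "https://api.osv.dev/v1/vulns/"  # taken from the example links above


def vuln_url(vuln_id: str) -> str:
    """Build the API URL for a single record."""
    return API_BASE + vuln_id


def fetch_record(vuln_id: str, timeout: float = 10.0) -> dict:
    """Download one record; requires network access."""
    with urllib.request.urlopen(vuln_url(vuln_id), timeout=timeout) as response:
        return json.load(response)


def event_kinds(record: dict) -> set:
    """Collect the event keys (introduced, fixed, ...) used anywhere in a record."""
    return {
        key
        for affected in record.get("affected", [])
        for rng in affected.get("ranges", [])
        for event in rng.get("events", [])
        for key in event
    }
```

With network access, `event_kinds(fetch_record("GO-2024-2687"))` shows which event types that record uses.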
## Appendix A: OSV Schema validation | ||
|
||
(As of version 1.6.3, generated by Gemini) | ||
|
||
**Top-Level Information:** | ||
|
||
* **id:** A unique string identifier for the vulnerability. | ||
* **modified:** A timestamp (in a specific format) indicating when the vulnerability information was last updated. | ||
|
||
**Optional, but validated when present:** | ||
|
||
* **schema\_version:** A string specifying the version of the schema being used. | ||
* **published/withdrawn:** Timestamps for when the vulnerability was published or withdrawn. | ||
* **aliases/related:** Arrays of strings for alternate identifiers or related vulnerabilities. | ||
* **summary/details:** String descriptions of the vulnerability. | ||
* **severity:** An array of objects detailing the severity using different scoring systems (e.g., CVSS v2, v3, or v4), if available. | ||
* **affected:** An array of objects describing which packages are affected, including details like: | ||
  * **package:** The ecosystem (e.g., npm, PyPI), name, and Package URL (PURL) of the affected package. | ||
  * **severity:** Severity for the specific package (if different from the overall severity). | ||
  * **ranges:** Information on the affected version ranges, commit ranges, or ecosystem-specific identifiers. | ||
  * **versions:** A list of specific affected versions. | ||
  * **ecosystem\_specific/database\_specific:** Additional data specific to the package ecosystem or the vulnerability database. | ||
* **references:** An array of objects providing URLs to external resources about the vulnerability, categorized by type (e.g., advisory, article, discussion). | ||
* **credits:** An array of objects giving credit to individuals or organizations involved in discovering, reporting, or fixing the vulnerability. | ||
* **database\_specific:** A flexible object for any extra information specific to the database using this schema. | ||
|
||
**Additional Validation Rules:** | ||
|
||
* **timestamp:** A custom definition that ensures timestamps adhere to a specific date-time format (e.g., "2023-11-15T12:34:56Z"). | ||
* **additionalProperties: false:** This prevents any extra properties from being added to the JSON object beyond those defined in the schema. | ||
* **Specific Requirements in `affected` Array:** | ||
  * There are conditional validations based on the `type` of range, ensuring the correct properties are present (e.g., `repo` is required when `type` is `GIT`). | ||
  * A logical check ensures that if `last_affected` is specified in `events`, then `fixed` cannot be present in the same `events` array. | ||
|
||
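Two of the rules above are simple enough to sketch by hand. The timestamp pattern below is an assumption modeled on the example format shown (UTC with a trailing `Z`, optional fractional seconds) and may differ from the schema's own pattern; the helper names `is_timestamp` and `range_rule_violations` are hypothetical. In practice a JSON Schema validator would enforce these rules.

```python
import re

# Assumed timestamp shape, e.g. "2023-11-15T12:34:56Z" or "2023-11-15T12:34:56.123Z".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z")


def is_timestamp(value) -> bool:
    """Check that a value is a string in the assumed timestamp format."""
    return isinstance(value, str) and TIMESTAMP.fullmatch(value) is not None


def range_rule_violations(rng: dict) -> list:
    """Check the two conditional rules described above for a single range."""
    violations = []
    if rng.get("type") == "GIT" and "repo" not in rng:
        violations.append("repo is required when type is GIT")
    keys = {key for event in rng.get("events", []) for key in event}
    if "last_affected" in keys and "fixed" in keys:
        violations.append("fixed and last_affected cannot share an events array")
    return violations


print(is_timestamp("2023-11-15T12:34:56Z"))
print(range_rule_violations({"type": "GIT", "events": [{"introduced": "0"}]}))
```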
**Overall:** | ||
|
||
This schema enforces a consistent and detailed structure for representing open source vulnerabilities, including information about affected packages, severity assessments, references, and credits. It helps ensure that the data is accurate and comprehensive while remaining flexible enough to accommodate various package ecosystems and additional data specific to the database using the schema. |