Provide a SARIF v2.2.0 Seed Schema by Enhancing the v2.1.0 Errata 01 Schema #580

Open
sthagen opened this issue May 20, 2023 · 2 comments

sthagen commented May 20, 2023

While helping with the regex pattern issue on the language object (#488), I noticed that we have multiple definitions of the same concept (some with slightly varying descriptions) inside our short and sweet 3333+ line main schema file.

I propose to define types once with a sufficiently good description and reuse them where needed.

Maintaining duplicates (especially ones with complex pattern constraints) is an unneeded complication and a risk. In particular, guid and language (the latter might receive a more complex pattern to better match the targeted language syntax) are best defined only once.
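
To illustrate what "define once, reuse where needed" could look like, here is a minimal sketch in Python. The `$defs` key names, the shared description texts, the abbreviated `language` description, and the helper `resolve_local_ref` are assumptions for illustration only, not the layout the schema has to adopt. It relies on the JSON Schema rule (from draft 2019-09 onward) that keywords such as `description` may sit next to `$ref`, so each property can keep its own wording while the pattern lives in one place.

```python
import json

# Sketch only: the "$defs" key names and shared descriptions are hypothetical.
GUID_PATTERN = (
    "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}"
    "-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
)
LANGUAGE_PATTERN = "^[a-zA-Z]{2}(-[a-zA-Z]{2})?$"

schema_fragment = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$defs": {
        "guid": {
            "description": "A GUID (shared definition).",
            "type": "string",
            "pattern": GUID_PATTERN,
        },
        "language": {
            "description": "A language tag (shared definition).",
            "type": "string",
            "default": "en-US",
            "pattern": LANGUAGE_PATTERN,
        },
    },
    "properties": {
        # Each use keeps its own description and points at the shared type.
        "guid": {
            "description": "A stable, unique identifier for the result in the form of a GUID.",
            "$ref": "#/$defs/guid",
        },
        "language": {
            "description": "The language of the messages emitted into the log file during this run.",
            "$ref": "#/$defs/language",
        },
    },
}


def resolve_local_ref(schema, node):
    """Return node with a local '#/...' reference expanded (local keywords win)."""
    ref = node.get("$ref")
    if ref is None:
        return dict(node)
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled in this sketch: {ref}")
    target = schema
    for part in ref[2:].split("/"):
        target = target[part]
    merged = dict(target)
    merged.update({k: v for k, v in node.items() if k != "$ref"})
    return merged


resolved_guid = resolve_local_ref(schema_fragment, schema_fragment["properties"]["guid"])
print(json.dumps(resolved_guid, indent=2))
```

With this shape, a change to the guid or language pattern touches a single definition instead of nine or two.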

Taxonomy Examples

Just looking at language and guid.

Two language Definitions

Lines [2366 - 2371] in patched 2.1.0-errata-01 schema:

        "language": {
          "description": "The language of the messages emitted into the log file during this run (expressed as an ISO 639-1 two-letter lowercase culture code) and an optional region (expressed as an ISO 3166-1 two-letter uppercase subculture code associated with a country or region). The casing is recommended but not required (in order for this data to conform to RFC5646).",
          "type": "string",
          "default": "en-US",
          "pattern": "^[a-zA-Z]{2}(-[a-zA-Z]{2})?$"
        },

and lines [3076 - 3081] in patched 2.1.0-errata-01 schema:

        "language": {
          "description": "The language of the messages emitted into the log file during this run (expressed as an ISO 639-1 two-letter lowercase language code) and an optional region (expressed as an ISO 3166-1 two-letter uppercase subculture code associated with a country or region). The casing is recommended but not required (in order for this data to conform to RFC5646).",
          "type": "string",
          "default": "en-US",
          "pattern": "^[a-zA-Z]{2}(-[a-zA-Z]{2})?$"
        },
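
Both definitions carry the same pattern. A quick check with Python's `re` module shows what that pattern accepts and rejects. The sample tags are my own picks, and Python's regex dialect is only assumed to behave like the one used by JSON Schema validators for this simple pattern.

```python
import re

# The pattern shared by both language definitions quoted above.
LANGUAGE_PATTERN = re.compile(r"^[a-zA-Z]{2}(-[a-zA-Z]{2})?$")


def matches_language(tag):
    """Return True if the tag satisfies the current schema pattern."""
    return LANGUAGE_PATTERN.search(tag) is not None


# Sample tags chosen for illustration; they are not taken from the schema.
for tag in ["en-US", "en", "de-de", "zh-Hans", "fil", "en_US"]:
    print(f"{tag!r}: {matches_language(tag)}")
```

Tags such as `zh-Hans` (script subtag) or `fil` (three-letter primary subtag) are well-formed under RFC 5646 but fail this pattern, which is why a more complex pattern may be coming, and why it should only need to be changed in one place.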

Nine guid Definitions

Lines [596 - 600] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "A stable, unique identifier for this external properties object, in the form of a GUID.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },

and lines [783 - 787] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "A stable, unique identifier for the external property file in the form of a GUID.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },

and lines [1036 - 1040] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "A unique identifier for the reporting descriptor in the form of a GUID.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },

and lines [1978 - 1982] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "A guid that uniquely identifies the descriptor.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },

and lines [2093 - 2097] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "A stable, unique identifier for the result in the form of a GUID.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },

and lines [2606 - 2610] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "A stable, unique identifier for this object's containing run object in the form of a GUID.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },

and lines [2718 - 2722] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "A stable, unique identifier for the suprression in the form of a GUID.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },

and lines [2951 - 2955] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "A unique identifier for the tool component in the form of a GUID.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },

and lines [3161 - 3165] in patched 2.1.0-errata-01 schema:

        "guid": {
          "description": "The 'guid' property of the referenced toolComponent.",
          "type": "string",
          "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
        },
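
Duplicates like the ones listed above can be found mechanically. The following is a rough sketch, not the tooling used for this issue: the function name `find_repeated_properties` and the toy schema are made up for illustration. It walks a schema, ignores `description`, and groups properties that share a name and identical constraints.

```python
import json
from collections import defaultdict


def find_repeated_properties(schema):
    """Group property definitions that share a name and identical constraints.

    Returns {(name, constraints_as_json): [paths...]} for groups with more
    than one member. Descriptions are ignored when comparing.
    """
    seen = defaultdict(list)

    def walk(node, path):
        if isinstance(node, dict):
            props = node.get("properties")
            if isinstance(props, dict):
                for name, definition in props.items():
                    # Skip non-object values, e.g. when "properties" is itself
                    # the name of a property rather than the schema keyword.
                    if isinstance(definition, dict):
                        constraints = {
                            k: v for k, v in definition.items() if k != "description"
                        }
                        group = (name, json.dumps(constraints, sort_keys=True))
                        seen[group].append(f"{path}/properties/{name}")
            for child_key, child in node.items():
                walk(child, f"{path}/{child_key}")
        elif isinstance(node, list):
            for index, child in enumerate(node):
                walk(child, f"{path}/{index}")

    walk(schema, "#")
    return {group: paths for group, paths in seen.items() if len(paths) > 1}


# Toy schema for illustration only; it is not an excerpt of the real file.
GUID = {"type": "string", "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"}
toy_schema = {
    "definitions": {
        "result": {"properties": {"guid": dict(GUID, description="For the result.")}},
        "suppression": {"properties": {"guid": dict(GUID, description="For the suppression.")}},
        "run": {"properties": {"language": {"type": "string", "default": "en-US"}}},
    }
}

for (name, _constraints), paths in find_repeated_properties(toy_schema).items():
    print(name, paths)
```

To run it against the real schema, load the file with `json.load` and pass the result in place of `toy_schema`.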

@schlaman-ms

Document location for issue:

Generic issue - no specific location.


sthagen commented Nov 8, 2023

This proposal has been implemented in five steps within #599:

  1. Added seed schema for SARIF v2.2 from upstream predecessor
  2. Normalized the SARIF v2.2 JSON Schema file
  3. Transformed schema to v2.2 based on draft/2020-12 of JSON Schema
  4. Refactored the language definition
  5. Refactored the guid definition

Every step produced a valid JSON file. Validating a valid v2.1.0 example instance (sarif-v2.2/prose/edit/src/json-examples/examples_minimal-recommended-sarif-log-file-with-source-information_codeblock-json-1.sarif.json) against the schema succeeded or failed as expected and required.

Result after each step:

  1. success
  2. fail with 2.1.0: '2.1.0' is not one of ['2.2']
  3. fail with 2.1.0: '2.1.0' is not one of ['2.2']
  4. fail with 2.1.0: '2.1.0' is not one of ['2.2']
  5. fail with 2.1.0: '2.1.0' is not one of ['2.2']
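
The failures from step 2 onward come from the `version` constraint alone. As a minimal sketch of that one check (a hand-rolled stand-in, not the validator used for the runs above; `check_version` is a hypothetical helper, and the allowed value `"2.1.0"` for the seed schema is my assumption based on the reported success):

```python
def check_version(log, allowed):
    """Mimic an enum check on the top-level 'version' property.

    Returns None on success, otherwise a message shaped like the ones above.
    """
    value = log.get("version")
    if value in allowed:
        return None
    return f"{value!r} is not one of {allowed!r}"


# Stand-in for the v2.1.0 example instance; only 'version' matters here.
example_log = {"version": "2.1.0", "runs": []}

print(check_version(example_log, ["2.1.0"]))  # step 1 (assumed enum): None
print(check_version(example_log, ["2.2"]))    # steps 2-5: '2.1.0' is not one of ['2.2']
```

So a v2.1.0 instance is expected to be rejected by the v2.2 schema for its version value, independent of the language and guid refactoring in steps 4 and 5.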
