Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve software integrity and verification documentation #602

Closed
wants to merge 3 commits into from

Conversation

jeff-schutt
Copy link
Collaborator

@jeff-schutt jeff-schutt commented Jan 15, 2024

Improves documentation in the SPDX 3 model to direct users towards existing support for package verification.

RE: #345 (comment)

CC: @goneall

Copy link
Member

@goneall goneall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion and one question for @zvr on spec parser support for the subheadings.

model/Software/Properties/contentIdentifier.md Outdated Show resolved Hide resolved
model/Software/Properties/contentIdentifier.md Outdated Show resolved Hide resolved
model/Software/Properties/contentIdentifier.md Outdated Show resolved Hide resolved
@kestewart
Copy link
Contributor

Bob and Dick to take a pass through this, per call today.

@rjb4standards
Copy link

I concur with Gary's proposed language. It may be worth noting that component ordering can effect OmniBor results. SBOM's have no implied ordering requirement and may not be easily "verifiable" using OmniBor due to this mismatch in ordering requirements.

@@ -4,11 +4,14 @@ SPDX-License-Identifier: Community-Spec-1.0

## Summary

Provides a place to record the canonical, unique, immutable identifier for each software artifact using the artifact's gitoid.
Used by SPDX producers to record the artifact’s gitoid: a canonical, unique, immutable identifier that can be used for software integrity verification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously is not only a "gitoid", but any content identifier.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be more appropriate to include an identifier tag in the syntax, like IANA uses to qualify identifier types - similar to what SPDX does today : dns:www.foo.com SHA256:.......

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other content identifiers out there, as Alexios points out. Let's handle the broadening as a separate PR though, and move this forward. The original had gitoid in it, and is just restructured here.

@@ -4,11 +4,14 @@ SPDX-License-Identifier: Community-Spec-1.0

## Summary

Provides a place to record the canonical, unique, immutable identifier for each software artifact using the artifact's gitoid.
Used by SPDX producers to record the artifact’s gitoid: a canonical, unique, immutable identifier that can be used for software integrity verification.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Used by SPDX producers to record the artifact’s gitoid: a canonical, unique, immutable identifier that can be used for software integrity verification.
Used to record a canonical, unique, immutable identifier of the artifact, that can be used for verification its identity and integrity.

Provides a place to record the canonical, unique, immutable identifier for each software artifact using the artifact's gitoid.
Used by SPDX producers to record the artifact’s gitoid: a canonical, unique, immutable identifier that can be used for software integrity verification.

Used by SPDX consumers to verify the integrity of a software artifact they received.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Used by SPDX consumers to verify the integrity of a software artifact they received.

No need to have it separate for producers/consumers.


## Description

The contentIdentifier provides a canonical, unique, immutable artifact identifier for each software artifact. SPDX 3.0 describes software artifacts as Snippet, File, or Package Elements. The ContentIdentifier can be calculated for any software artifact and can be recorded for any of these SPDX 3.0 Elements using Omnibor, an attempt to standardize how software artifacts are identified independent of which programming language, version control system, build tool, package manager, or software distribution mechanism is in use.
### SPDX Producers
The contentIdentifier is a canonical, unique, immutable artifact identifier for each software artifact. The ContentIdentifier for any software artifact can be calculated and recorded in SPDX 3.0 Snippet, File, or Package Elements. For additional information, see [OmniBOR](https://omnibor.io): an attempt to standardize how software artifacts are identified independent of which programming language, version control system, build tool, package manager, or software distribution mechanism is in use.

The contentIdentifier is defined as the [Git Object Identifier](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) (gitoid) of type `blob` of the software artifact. The use of a git-based version control system is not necessary to calculate a contentIdentifier for any software artifact.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree that contentIdentifier has to be a gitoid. We should leave it for any kind of unique identifier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that not all content identifiers support use in verification. We would also need to somehow specify which content identifier is used (e.g. the enumeration of supported content identifiers).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That appears similar to how SPDX handles this today with the SHA* identifying tags accompanying a hash value in an SBOM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zvr - Let's move forward with this definition and open a separate issue to include better support for SWHID after RC2 - just trying to get RC2 released.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove gitoid and leave generic "unique, immutable artifact identifier" or we can mention both gitoid and swhid that we already know of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not objecting just because I don't want SWHIDs to be forgotten:
The most important part is, given that we know there might be more than one format, modeling it as a single value (of type URI) will not work. There has to be a type defined somewhere (maybe with a vocabulary), etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recollection of the discussions on this topic from last year were that for v3.0 we would specify gitoid as the default but that the structures and definitions would be generalized (as @zvr is suggesting above) to enable addition of other possible types post v3.0 without breaking backward compatibility.

In other words, I agree that the definitions should be generalized and not locked to one approach.

jeff-schutt and others added 2 commits January 18, 2024 14:01
Signed-off-by: Jeff Schutt <jeff-schutt@users.noreply.github.com>

Co-authored-by: Gary O'Neall <gary@sourceauditor.com>
@goneall goneall added this to the 3.0-rc2 milestone Jan 23, 2024
Copy link
Member

@goneall goneall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the updated changes - all looks good

@bobmartin3000
Copy link

Agree that the changes addressed the concerns for RC2.

@@ -4,11 +4,14 @@ SPDX-License-Identifier: Community-Spec-1.0

## Summary

Provides a place to record the canonical, unique, immutable identifier for each software artifact using the artifact's gitoid.
Used by SPDX producers to record the artifact’s gitoid: a canonical, unique, immutable identifier that can be used for software integrity verification.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other content identifiers out there, as Alexios points out. Let's handle the broadening as a separate PR though, and move this forward. The original had gitoid in it, and is just restructured here.


## Description

The contentIdentifier provides a canonical, unique, immutable artifact identifier for each software artifact. SPDX 3.0 describes software artifacts as Snippet, File, or Package Elements. The ContentIdentifier can be calculated for any software artifact and can be recorded for any of these SPDX 3.0 Elements using Omnibor, an attempt to standardize how software artifacts are identified independent of which programming language, version control system, build tool, package manager, or software distribution mechanism is in use.
### SPDX Producers
The contentIdentifier is a canonical, unique, immutable artifact identifier for each software artifact. The ContentIdentifier for any software artifact can be calculated and recorded in SPDX 3.0 Snippet, File, or Package Elements. For additional information, see [OmniBOR](https://omnibor.io): an attempt to standardize how software artifacts are identified independent of which programming language, version control system, build tool, package manager, or software distribution mechanism is in use.

The contentIdentifier is defined as the [Git Object Identifier](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) (gitoid) of type `blob` of the software artifact. The use of a git-based version control system is not necessary to calculate a contentIdentifier for any software artifact.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not objecting just because I don't want SWHIDs to be forgotten:
The most important part is, given that we know there might be more than one format, modeling it as a single value (of type URI) will not work. There has to be a type defined somewhere (maybe with a vocabulary), etc.

@goneall
Copy link
Member

goneall commented Jan 24, 2024

I'm not objecting just because I don't want SWHIDs to be forgotten:
The most important part is, given that we know there might be more than one format, modeling it as a single value (of type URI) will not work. There has to be a type defined somewhere (maybe with a vocabulary), etc.

@zvr - Since the contentIdentifier property defined as a GitOID has been there for a while and making this field accept other formats and/or changing the model for verifiedUsing to a yet to be specified alternative will take some time - do you think we should delay RC2 to resolve this or should we merge this in and discuss as a post RC2 issue?

@zvr
Copy link
Member

zvr commented Jan 25, 2024

This should be resolved before RC2, since it most probably will result in changes in the model.
If it was only the description text, we could leave it.

@zvr
Copy link
Member

zvr commented Jan 26, 2024

In an effort to resolve the different discussions here, I've broken down the different parts and submitted a number of PRs:

and then, an option of two approaches:

Note that there is no mention of "preferred", "recommended", "suggested", etc. anywhere. Policy is something different than model definition and can be decided later -- and expressed with various conformance tests, profile definitions, etc.

@goneall , @jeff-schutt , @kestewart and everyone else, please review these new PRs.

@goneall
Copy link
Member

goneall commented Jan 26, 2024

In an effort to resolve the different discussions here, I've broken down the different parts and submitted a number of PRs:

and then, an option of two approaches:

Note that there is no mention of "preferred", "recommended", "suggested", etc. anywhere. Policy is something different than model definition and can be decided later -- and expressed with various conformance tests, profile definitions, etc.

@goneall , @jeff-schutt , @kestewart and everyone else, please review these new PRs.

Thanks @zvr - having the specific pull requests definitely makes it easier to compare / review / decide.

I'm OK with either of your two choices in the model changes. I'm also OK with the current state, but you do raise a good point about contentIdentifier as a name implies a broader range than gitoid, which leans me towards changing the property name as the easiest path forward.

I do like PR #602 as a better starting point for the documentation for reasons noted in the review comments. We should probably decide on the 3 choices of the model -1) keep as is; 2) PR #610 rename the property or 3) PR #611 generalize contentIdentifier.

@jeff-schutt @zvr Since I'm OK with all three approaches, perhaps you can connect and decide which works OK for both of you.

@goneall goneall mentioned this pull request Jan 30, 2024
32 tasks
that maps data of arbitrary size to a bit string (the hash) and is a one-way function, that is,
a function which is practically infeasible to invert. This is commonly used for integrity checking of data.

The recommended method to verify the integrity of `SoftwareArtifacts` Elements (including `Files`, `Snippets`, and `Packages`) is to use the SoftwareArtifact’s `contentIdentifier` property.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that these sort of case specific details should be placed here.
This is the general structure to be used across SPDX.
Placing case specific variations here is likely to lead to an eventual laundry list of such specializations.

I propose that it would be more appropriate to add this note to the markdown files for the case specific structures (SoftwareArtifact, File, Package, Snippet).

@@ -13,6 +13,8 @@ of a specific Element that correlates to the data in this SPDX document. This id
a recipient to determine if anything in the original Element has been changed and eliminates
confusion over which version or modification of a specific Element is referenced.

The recommended method to verify the integrity of `SoftwareArtifacts` Elements (including `Files`, `Snippets`, and `Packages`) is to use the SoftwareArtifact’s `contentIdentifier` property.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that these sort of case specific details should be placed here.
This is the general structure to be used across SPDX.
Placing case specific variations here is likely to lead to an eventual laundry list of such specializations.

I propose that it would be more appropriate to add this note to the markdown files for the case specific structures (SoftwareArtifact, File, Package, Snippet).

@@ -10,6 +10,8 @@ Provides an IntegrityMethod with which the integrity of an Element can be assert

VerifiedUsing provides an IntegrityMethod with which the integrity of an Element can be asserted.

The recommended method to verify the integrity of `SoftwareArtifacts` Elements (including `Files`, `Snippets`, and `Packages`) is to use the SoftwareArtifact’s `contentIdentifier` property.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that these sort of case specific details should be placed here.
This is the general structure to be used across SPDX.
Placing case specific variations here is likely to lead to an eventual laundry list of such specializations.

I propose that it would be more appropriate to add this note to the markdown files for the case specific structures (SoftwareArtifact, File, Package, Snippet).

@goneall
Copy link
Member

goneall commented Feb 15, 2024

This is a bit stale with the renaming to gitoid - retargeting to post RC2 for discussion

@goneall goneall modified the milestones: 3.0-rc2, 3.0 Feb 15, 2024
@goneall
Copy link
Member

goneall commented Apr 3, 2024

This is now quite stale and there are other PR's which overlap.

I'm going to go ahead and close this PR.

@jeff-schutt @zvr - Please feel free to open a fresh PR if you think the documentation needs to be updated.

@goneall goneall closed this Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants