Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conflicting property names prevents compation #651

Closed
JPEWdev opened this issue Feb 22, 2024 · 8 comments
Closed

Conflicting property names prevents compation #651

JPEWdev opened this issue Feb 22, 2024 · 8 comments
Milestone

Comments

@JPEWdev
Copy link
Contributor

JPEWdev commented Feb 22, 2024

There are a few places in the model where multiple namespaces define the same property. While RDF can handle this fine because each property path is fully qualified, it causes problems when attempting to compact a document, since the context can't map both property paths to the same non-namespaced name. The current ones I've found are:

sensitivePersonalInformation defined in both ai and dataset (preferred)
locator defined in both security and core (preferred)
contentType defined in both software and core (preferred)

The namespace listed as "preferred" is the one that currently "wins" in the context, meaning the property in the other namespace ends up requiring a prefix (e.g. "security:locator" is required to be used instead of "locator")

@goneall
Copy link
Member

goneall commented Feb 24, 2024

This is a bit of a challenge - I know in the case of security, it was a conscious decision to use an existing property name since that property name was used in a different standard.

Can scoped contexts help here? It would make processing more complicated, however.

A few other possible solutions:

  1. Prefix all properties with the namespace (e.g. security:locator)
  2. Somehow generate non-conflicting property names (e.g. securityLocator)
  3. Have a policy where we guarantee unique property names across all profiles

@JPEWdev
Copy link
Contributor Author

JPEWdev commented Feb 24, 2024

This is a bit of a challenge - I know in the case of security, it was a conscious decision to use an existing property name since that property name was used in a different standard.

Can scoped contexts help here? It would make processing more complicated, however.

I would rather avoid that as you point out it makes the processing more complicated.

A few other possible solutions:

1. Prefix all properties with the namespace (e.g. `security:locator`)

FWIW, this is the "do nothing" option because with the current context, this is what you get when you compact. If you just use locator in security today, it's expanded to core:locator, which is wrong (although I suspect no one noticed because we don't yet have a SHACL model that can validate).

2. Somehow generate non-conflicting property names (e.g. `securityLocator`)

3. Have a policy where we guarantee unique property names across all profiles

I'm not a SHACL expert, but would option 4 be to use core:locator as the property in the security profile? AFAICT, it means the same thing in both places (same with the other 2 I found), but I'm not sure if classes need unique property paths or not.

@JPEWdev
Copy link
Contributor Author

JPEWdev commented Mar 1, 2024

This is a bit of a challenge - I know in the case of security, it was a conscious decision to use an existing property name since that property name was used in a different standard.

Can scoped contexts help here? It would make processing more complicated, however.

After some thought, I don't think scoped contexts will actually work because of inheritance: You can't know what derived classes might exist, and they would all need the correct context applied or it won't work.

A few other possible solutions:

1. Prefix all properties with the namespace (e.g. `security:locator`)

Given all the examples in this repo, I don't think people actually want to write SPDX this way; I don't at least.

2. Somehow generate non-conflicting property names (e.g. `securityLocator`)

While this is possible, it's actually just a worse version of 3, because either way the serialized property is not going to be named what is in the Markdown files. If the tool generates a different name this is confusing for the end users as the property doesn't match the documentation; much better to have us choose a unique name so it's clear to users what is going on.

3. Have a policy where we guarantee unique property names across all profiles

This is my preference

@goneall
Copy link
Member

goneall commented Mar 2, 2024

  1. Have a policy where we guarantee unique property names across all profiles

This would prevent us from using properties external to the SPDX namespace - solvable by have a policy where we only use SPDX namespaced properties.

@jeff-schutt - I recall we made a concious decision to use a duplicate namespace in one of our security meetings (it was probably my suggestion) to align the property name with an external standard. What do you think about renaming those properties to be unique?

@zvr @sbarnum - thoughts?

@goneall goneall added this to the 3.0-rc3 milestone Mar 4, 2024
@bact
Copy link
Contributor

bact commented Mar 5, 2024

For sensitivePersonalInformation in AI and Dataset Profiles, the difference is

  • for Dataset, it means the Dataset HAS sensitive info
  • for AI, it means the AI (model) USED sensitive info during training (but not necessarily has that info remains in the model)

They are related, but not similar.

Proposed PR #656 to rename:

  • AIPackage: sensitivePersonalInformation -> useSensitivePersonalInformation
  • Dataset: sensitivePersonalInformation -> hasSensitivePersonalInformation

@goneall
Copy link
Member

goneall commented Mar 5, 2024

Discussed on the tech call - agreed to always use namespace prefixes for all profiles using an underscore for the separator with the exception of Core (e.g. software_copyrightText) for all properties and all classes. Profile name should be all lowercased.

Actions:

  • Update generation of the context file and code generator and schema generator
  • Generate the prefixed property names in the generated documentation

@bennetkl
Copy link

bennetkl commented Mar 5, 2024

Changes for AI and dataset to be unique ie. useSensitivePersonalInformation and hasSensitivePersonalInformation makes sense

JPEWdev added a commit to JPEWdev/spec-parser that referenced this issue Mar 5, 2024
Per discussion on spdx/spdx-3-model#651,
generate the JSON-LD context such that non-Core Properties and Classes
are prefixed with a lower case version of the namespace + "_" (e.g.
"https://rdf.spdx.org/v3/Software/File" compacts to "software_File")
@kestewart
Copy link
Contributor

Discussed with @JPEWdev and this has been resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants