Naming convention of additional properties in the descriptor #663

Closed
cpina opened this issue Apr 15, 2020 · 8 comments · Fixed by frictionlessdata/datapackage-v2-draft#50

@cpina
Contributor

cpina commented Apr 15, 2020

In the table-schema.md spec it says:
https://github.com/frictionlessdata/specs/blob/master/specs/table-schema.md#descriptor

The descriptor MAY have the additional properties set out below and MAY contain any number of other properties (not defined in this specification).

If a user of the specification wanted to add a property to be used in their organisation: which naming convention should the user use?

E.g. we would like to add a "cfVariable": "http://cfconventions.org/Data/cf-standard-names/71/build/cf-standard-name-table.html#air_temperature". Naming it just cfVariable has some problems:

  • people finding a cfVariable property in a table schema might assume that it is a Frictionless Data standard and either use it or expect everyone to use it, expect the Frictionless Data tools to support it, etc.

  • if in the future Frictionless Data wanted to introduce cfVariable with different semantics or a different format it could cause incompatibility problems (e.g. we might just use a string to identify the version and the variable but it could also be a dictionary to identify the version and the variable).

  • currently, validation of schema files has to be lenient because unknown properties are accepted (any of them could be one of the "other properties not defined in the specification"), so an unknown property that is really a typo of an optional property goes unnoticed
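
To illustrate the last point, here is a minimal Python sketch of how a lenient validator could at least flag probable typos. The list of known property names and the function name `find_probable_typos` are assumptions made for this example, not part of any Frictionless Data tool.

```python
import difflib

# Hypothetical subset of property names defined by the specification.
KNOWN_PROPERTIES = {"fields", "primaryKey", "foreignKeys", "missingValues"}


def find_probable_typos(descriptor, known=KNOWN_PROPERTIES, cutoff=0.8):
    """Map each unknown property to the known name it closely resembles."""
    suspects = {}
    for name in descriptor:
        if name in known:
            continue
        # Compare the unknown name against every known name by similarity.
        close = difflib.get_close_matches(name, sorted(known), n=1, cutoff=cutoff)
        if close:
            suspects[name] = close[0]
    return suspects


descriptor = {
    "fields": [],
    "primarykey": "id",  # likely a typo of "primaryKey"
    "cfVariable": "air_temperature",  # a genuine custom property
}
print(find_probable_typos(descriptor))
```

A heuristic like this can only guess. A naming convention for custom properties would let a validator reject every unknown, non-custom name outright.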

Prefixing the "additional properties" with "x-" (used by the HTTP protocol, although it seems to have been deprecated in the HTTP case) or just "_" would help, and is simple enough to identify non-official properties.

I've also thought of adding a prefix that would identify the organisation that introduced the property. E.g. if in the Swiss Polar Institute we wanted to use cfVariable it could be "spi_cfVariable". Two organisations could then add the same property name under different prefixes (e.g. "spi_cfVariable" and "bas_cfVariable") if it had a different format/meaning.

Related and closed in favour of this

@rufuspollock rufuspollock transferred this issue from frictionlessdata/forum Apr 16, 2020
@rufuspollock
Contributor

rufuspollock commented Apr 16, 2020

@cpina I've moved this issue from the forum to here in specs.

First noting this related (and unresolved) issue on extensibility #103 (closed in favour of this issue now).

Options:

  • Do nothing: Allow any kind of variables to be introduced in something of a free-for-all.
  • Custom sub-namespace: Move away from current approach that you can extend at top level and introduce a designated sub-namespace e.g. custom - advocated for in Encapsulated extensibility #103
  • Custom prefix e.g. x-
    • May have issues e.g. this approach for http headers was deprecated - see this RFC for extensive detail (worth reading) and https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers

      Custom proprietary headers have historically been used with an X- prefix, but this convention was deprecated in June 2012 because of the inconveniences it caused when nonstandard fields became standard in RFC 6648;

  • A registry: of extensions so it is easy for people to find existing efforts and avoid conflicts - connects with Table Schema catalog idea https://github.com/frictionlessdata/forum/issues/5

@rufuspollock
Contributor

Read the excellent RFC https://tools.ietf.org/html/rfc6648 "Deprecating the 'X-' Prefix and Similar Constructs in Application Protocols".

Why using a prefix is a bad idea

The primary problem with the "X-" convention is that unstandardized
parameters have a tendency to leak into the protected space of
standardized parameters, thus introducing the need for migration from
the "X-" name to a standardized name. Migration, in turn, introduces
interoperability issues (and sometimes security issues) because older
implementations will support only the "X-" name and newer
implementations might support only the standardized name. To
preserve interoperability, newer implementations simply support the
"X-" name forever, which means that the unstandardized name has
become a de facto standard (thus obviating the need for segregation
of the name space into standardized and unstandardized areas in the
first place).
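
The migration problem described in that passage can be sketched in a few lines of Python. The property names `x-cfVariable` and `cfVariable` and the function `read_cf_variable` are hypothetical, chosen only to match the example in this thread.

```python
# Hypothetical names: "x-cfVariable" is the original unstandardized name,
# "cfVariable" is the name it would get if it were standardized later.
LEGACY_NAME = "x-cfVariable"
STANDARD_NAME = "cfVariable"


def read_cf_variable(descriptor):
    """Read the property under either name, preferring the standardized one."""
    if STANDARD_NAME in descriptor:
        return descriptor[STANDARD_NAME]
    # Old files still use the prefixed name, so this fallback has to stay.
    return descriptor.get(LEGACY_NAME)


old_file = {"x-cfVariable": "air_temperature"}
new_file = {"cfVariable": "air_temperature"}
print(read_cf_variable(old_file), read_cf_variable(new_file))
```

Every reader ends up carrying the fallback indefinitely, which is how the "X-" name becomes a de facto standard.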

How you should do it (protocol creators i.e. us Frictionless Data)

Designers of new application protocols that allow extensions using
parameters:

  1. SHOULD establish registries with potentially unlimited value-
    spaces, defining both permanent and provisional registries if
    appropriate.

  2. SHOULD define simple, clear registration procedures.

  3. SHOULD mandate registration of all non-private parameters,
    independent of the form of the parameter names.

  4. SHOULD NOT prohibit parameters with an "X-" prefix or similar
    constructs from being registered.

  5. MUST NOT stipulate that a parameter with an "X-" prefix or
    similar constructs needs to be understood as unstandardized.

  6. MUST NOT stipulate that a parameter without an "X-" prefix or
    similar constructs needs to be understood as standardized.

How you should do it: creators of parameters (e.g. @cpina )

Creators of new parameters to be used in the context of application
protocols:

  1. SHOULD assume that all parameters they create might become
    standardized, public, commonly deployed, or usable across
    multiple implementations.

  2. SHOULD employ meaningful parameter names that they have reason to
    believe are currently unused.

  3. SHOULD NOT prefix their parameter names with "X-" or similar
    constructs.

RFC https://tools.ietf.org/html/rfc4288 - Media Type Specifications and Registration Procedures

This is their recommended approach. Essentially: create namespaces and make it easy for people to register into those namespaces.

Note: If the relevant parameter name space has conventions about
associating parameter names with those who create them, a parameter
name could incorporate the organization's name or primary domain name
(see Appendix B for examples).

3 registration trees:

  • standards: root tree - requires standards-track approval
  • vendor: vnd. tree - each vendor gets a subtree, i.e. a sub-namespace it can use
  • personal: prs. tree

Registrations in the vendor tree will be distinguished by the leading
facet "vnd.". That may be followed, at the discretion of the
registrant, by either a media subtype name from a well-known producer
(e.g., "vnd.mudpie") or by an IANA-approved designation of the
producer's name that is followed by a media type or product
designation (e.g., vnd.bigcompany.funnypictures).

@rufuspollock
Contributor

So ...

  • Prefixing with X- is a bad idea
  • A root namespace for non-standards stuff is a good idea
  • Allowing people to quickly and easily register themselves into this root namespace is a good idea
  • you don't use other people's root namespace

OK, so here's the recommendation following the media types approach

  • We need to establish some base namespace for community owned metadata that is not in the standard name. This could be vnd, this could be x. Recommendations welcome
  • We need a way to allow registration of sub-namespaces within that ... e.g. the "Swiss Polar Institute" might register "spi" etc
  • Within that space you can create to your heart's content ...
  • We need a separator to distinguish these potential namespaces. media types use . but this could be tricky for variables we frequently want to address in software since . is frequently used for attribute addressing. The issue is that any other obvious separator is also a legal part of a variable name which makes it hard to distinguish e.g. vnd_spi_myVar and a standards variable named vnd_spi_myVar. One option would be to prohibit _ in variable names or to prohibit vnd as a prefix of any standards name etc. Thoughts welcome.
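
As a rough Python sketch of what parsing such names could look like: the base namespace `vnd`, the `_` separator, and the registry contents below are all assumptions, since none of them has been decided in this thread.

```python
from typing import NamedTuple, Optional

BASE_NAMESPACE = "vnd"  # assumed; the choice between "vnd", "x", etc. is open
SEPARATOR = "_"         # assumed; "." and ":" are other candidates
REGISTERED = {"spi"}    # hypothetical registry of sub-namespaces


class CustomProperty(NamedTuple):
    organisation: str
    name: str


def parse_property(key: str) -> Optional[CustomProperty]:
    """Return the parsed custom property, or None for a non-namespaced key."""
    parts = key.split(SEPARATOR, 2)
    if len(parts) != 3 or parts[0] != BASE_NAMESPACE or not parts[2]:
        return None
    _, organisation, name = parts
    if organisation not in REGISTERED:
        raise ValueError(f"unregistered sub-namespace: {organisation!r}")
    return CustomProperty(organisation, name)


print(parse_property("vnd_spi_myVar"))
print(parse_property("primaryKey"))
```

This only works if the standard never defines a name that itself starts with `vnd_`, which is exactly the ambiguity the last bullet describes.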

@cpina
Contributor Author

cpina commented Apr 17, 2020

Thanks very much for finding all the RFCs and summarising/copy-pasting the relevant parts!

For now I want to comment on only one quick thing:

* Allowing people to quickly and easily register themselves into this root namespace is a good idea

As a user of Frictionless Data this would work for me.

As a Frictionless Data project: do you have anything in mind for the registration? A PR to a file might be enough to avoid potential duplicates and to sanity-check the names. I also thought of something hands-off for Frictionless Data: piggybacking on some existing system. Domains are one option and very ubiquitous; another could be something along the lines of https://www.grid.ac/, where people can register an organisation, or ORCID (only for researchers), etc. But I haven't found anything that is not limited to organisations and not tied to some existing organisation (besides domains). (ORCID iDs would be a headache to see in the JSON file!)

@cpina
Contributor Author

cpina commented Apr 17, 2020

  • We need to establish some base namespace for community owned metadata that is not in the standard name. This could be vnd, this could be x. Recommendations welcome

I think that I'm splitting hairs here about vnd: I don't think of the Swiss Polar Institute as a vendor in the traditional sense (when I think of a vendor it is something like Amazon, Lenovo, Microsoft, etc., not institutes/organisations or individuals). I'm happy to take vnd as a traditional name for custom headers :-) (if it was in the docs I would not have thought much of this and just used it).

As a user any of the options would work for me, but do you think that all of these would be valid? (I mean using variables like vnd_spi_something, or vnd_spi and then an object):

"vnd_spi_cfVariable": "string",
"vnd_spi_cfVariable": {"version": 71, "name": "something"},
"vnd_spi": {"cfVariable": "name", "version": 71, "DOI": "some_DOI"},

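If both the flat and the nested forms above were allowed, a reader would probably want to normalise them. A minimal Python sketch follows, assuming the `vnd` base namespace and `_` separator discussed in this thread; `collect_custom` is a hypothetical helper, not an existing API.

```python
def collect_custom(descriptor, base="vnd", sep="_"):
    """Gather custom properties from flat and nested keys as {org: {name: value}}."""
    prefix = base + sep
    collected = {}
    for key, value in descriptor.items():
        if not key.startswith(prefix):
            continue
        org, _, name = key[len(prefix):].partition(sep)
        if name:
            # Flat form: "vnd_spi_cfVariable": <string or object>
            collected.setdefault(org, {})[name] = value
        elif isinstance(value, dict):
            # Nested form: "vnd_spi": {"cfVariable": ..., "DOI": ...}
            collected.setdefault(org, {}).update(value)
    return collected


flat = {"vnd_spi_cfVariable": {"version": 71, "name": "air_temperature"}}
nested = {"vnd_spi": {"cfVariable": {"version": 71, "name": "air_temperature"}}}
print(collect_custom(flat) == collect_custom(nested))
```

Both inputs normalise to the same structure, so supporting both forms costs little on the reading side.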
  • We need a separator to distinguish these potential namespaces. media types use . but this could be tricky for variables we frequently want to address in software since . is frequently used for attribute addressing. The issue is that any other obvious separator is also a legal part of a variable name which makes it hard to distinguish e.g. vnd_spi_myVar and a standards variable named vnd_spi_myVar. One option would be to prohibit _ in variable names or to prohibit vnd as a prefix of any standards name etc. Thoughts welcome.

If it counts my thought is to prohibit vnd as a prefix of any standards name.

* you don't use other people's root namespace

I guess that means: don't use it unless agreed. We want to pass data to at least one partner organisation, and they would read it :-)

@rufuspollock
Contributor

I think that I'm splitting hairs here about vnd: I don't think of the Swiss Polar Institute as a vendor in the traditional sense (when I think of a vendor it is something like Amazon, Lenovo, Microsoft, etc., not institutes/organisations or individuals). I'm happy to take vnd as a traditional name for custom headers :-) (if it was in the docs I would not have thought much of this and just used it).

I don't like "vendor" either. SPI is not a vendor 😄 . We can create any name for the non-standard namespace we want. We could even use x for extension 😉 (the issue with X- in HTTP headers, as I understand it, was more that there was no namespacing and stuff would move from x to some other space).

Any thoughts here?

/cc @pwalsh @roll @lauragift21

As a user any of the options would work for me, but do you think that all of these would be valid? (I mean using variables like vnd_spi_something, or vnd_spi and then an object):

Yes, you can have a string or object.

As a Frictionless Data project: do you have anything in mind for the registration? A PR to a file might be enough to avoid potential duplicates and to sanity-check the names. I also thought of something hands-off for Frictionless Data: piggybacking on some existing system. Domains are one option and very ubiquitous; another could be something along the lines of https://www.grid.ac/, where people can register an organisation, or ORCID (only for researchers), etc. But I haven't found anything that is not limited to organisations and not tied to some existing organisation (besides domains). (ORCID iDs would be a headache to see in the JSON file!)

I would like to piggyback, and an obvious candidate would be domain names, as Java does with its namespacing. However, this seems a bit cumbersome and I can't think of anything super authoritative. I think the easiest is simply to allow people to register an id in an easy way, e.g. via a PR to a repo, and to have a recommendation, e.g. reuse your GitHub / org handle if you have one.
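
A registry kept as a file in a repo could be checked automatically on each PR. Here is a minimal Python sketch, assuming a list of `{"id", "owner"}` entries and a lowercase-alphanumeric naming rule; both the entry format and the rule are invented for illustration.

```python
import re
from collections import Counter

# Assumed naming rule: lowercase letters and digits, starting with a letter.
ID_PATTERN = re.compile(r"[a-z][a-z0-9]*")


def check_registry(entries):
    """Return a list of problems found in a list of {"id", "owner"} entries."""
    problems = []
    for entry in entries:
        if not ID_PATTERN.fullmatch(entry["id"]):
            problems.append(f"invalid id: {entry['id']!r}")
    counts = Counter(entry["id"] for entry in entries)
    for identifier, count in counts.items():
        if count > 1:
            problems.append(f"duplicate id: {identifier!r}")
    return problems


registry = [
    {"id": "spi", "owner": "Swiss Polar Institute"},
    {"id": "spi", "owner": "Some Other Project"},
    {"id": "My-Org", "owner": "Example Org"},
]
print(check_registry(registry))
```

Run as a CI step, a check like this would cover both the duplicate check and the naming sanity check mentioned above without manual review.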

@roll
Member

roll commented Apr 20, 2020

Data Package Pipelines uses prefixes like dpp:property
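
A colon separator like this is easy to split on, since it is unlikely to appear in a plain property name. A small Python sketch, where `split_prefixed` is a hypothetical helper and not part of Data Package Pipelines:

```python
def split_prefixed(key, sep=":"):
    """Split "prefix:name" into (prefix, name); prefix is None if absent."""
    prefix, found, name = key.partition(sep)
    if not found:
        return None, key
    return prefix, name


print(split_prefixed("dpp:property"))
print(split_prefixed("title"))
```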

@roll
Member

roll commented Apr 11, 2024

FIXED by #50

@roll roll closed this as completed Apr 11, 2024
@roll roll added this to the v2-final milestone Apr 11, 2024
@roll roll self-assigned this Apr 12, 2024