Add contributor.roles property #18

Merged
merged 10 commits into from
Mar 29, 2024

Conversation

roll
Member

@roll roll commented Jan 26, 2024


Rationale

Please take a look at frictionlessdata/specs#804. Also, this pull request tests the way we can deprecate things:

  • we remove them from the main text (contributor.role is removed)
  • we add a data consumer note that it still needs to be supported (mapped)

Later we can update all other deprecations to this style, e.g. resource.url.
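As a rough illustration of the "data consumer note" idea above, a consumer could map the deprecated property onto the new one when reading a descriptor. This is a minimal sketch, not code from this pull request: the function name `normalize_contributor` is hypothetical, and the rule that an explicit `roles` array wins over `role` is an assumption.

```python
from typing import Any


def normalize_contributor(contributor: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the contributor with the deprecated `role` mapped to `roles`."""
    normalized = dict(contributor)
    # Drop the deprecated key from the copy, remembering its value.
    legacy_role = normalized.pop("role", None)
    # Assumption: an explicit `roles` array takes precedence over `role`.
    if "roles" not in normalized and isinstance(legacy_role, str):
        normalized["roles"] = [legacy_role]
    return normalized
```

A consumer would call this once per entry in `contributors` and afterwards read only `roles`.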

@peterdesmet
Member

peterdesmet commented Jan 26, 2024

I'm not familiar enough with JSON schemas, but is there a way to express the fallback in the JSON schema, so that implementors automatically fall back to role when roles is not provided?

@nichtich

I'm not familiar enough with JSON schemas, but is there a way to express the fallback in the JSON schema, so that implementors automatically fall back to role when roles is not provided?

JSON Schema only specifies which fields are allowed, not how to process them. The schema could, however, ensure that role and roles are not both used in the same contributor.
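One way such a mutual-exclusion rule might be written is with the JSON Schema `not` and `required` keywords. The fragment below is a hypothetical sketch written as a Python dict, not the schema from this pull request; the helper `uses_both_role_and_roles` is a hand-written stand-in for what a JSON Schema validator would conclude from the `not` clause.

```python
# Hypothetical schema fragment for a single contributor (an assumption,
# not the schema shipped with Data Package).
CONTRIBUTOR_SCHEMA = {
    "type": "object",
    "properties": {
        "role": {"type": "string"},
        "roles": {"type": "array", "items": {"type": "string"}},
    },
    # A contributor is valid only if it does NOT contain both keys at once.
    "not": {"required": ["role", "roles"]},
}


def uses_both_role_and_roles(contributor: dict) -> bool:
    """Hand-written equivalent of the `not`/`required` rule above."""
    forbidden_together = CONTRIBUTOR_SCHEMA["not"]["required"]
    return all(key in contributor for key in forbidden_together)
```

Note that this only rejects descriptors using both properties; the fallback from `role` to `roles` still has to live in the consuming software.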

Co-authored-by: Peter Desmet <peter.desmet.work@gmail.com>

cloudflare-pages bot commented Feb 20, 2024

Deploying datapackage with Cloudflare Pages

Latest commit: c3d4338
Status: ✅  Deploy successful!
Preview URL: https://98851fce.datapackage.pages.dev
Branch Preview URL: https://804-contributor-roles.datapackage.pages.dev


@roll
Member Author

roll commented Feb 23, 2024

@nichtich @pschumm
What would you propose to change? I think, obviously, a contributor can have many roles and we need a way to express it. We can't touch the existing contributor.role property for backward-compatibility reasons.

@peterdesmet
Member

Given yesterday's call, and giving ourselves a bit more freedom for v2, I suggest adding roles (plural), which is always an array.

@ezwelty

ezwelty commented Feb 26, 2024

Given that the value of role is only RECOMMENDED to be one of author, publisher, ... and that these values are nowhere defined (except for author) nor how they are interpreted by software, what additional value does an array of such values bring? My first instinct would be to use a string describing their role (e.g. 'Wrote the package metadata and compiled the data for resource country-codes').

@peterdesmet
Member

what additional value does an array of such values bring?

Such roles can be defined by specifications that build on top of Data Package. Currently however, Data Package dictates that role should be a single string value, meaning that specifications can't expand that to allow multiple values. In my opinion that is too strict on Data Package's part.

@PietrH
Member

PietrH commented Mar 14, 2024

I agree, intuitively I would think contributors to a data package could have multiple roles.

I'm aware of the Relators in MARC (MAchine Readable Cataloging) : https://www.loc.gov/marc/relators/relaterm.html, which form the basis for the system used in R Packages: https://journal.r-project.org/archive/2012-1/RJournal_2012-1_Hornik~et~al.pdf

In R packages, it's extremely common for contributors to have multiple roles, see: https://r-pkgs.org/description.html#sec-description-authors-at-r

For example:

cre: the creator or maintainer, the person you should bother if you have problems. Despite being short for “creator”, this is the correct role to use for the current maintainer, even if they are not the initial creator of the package.

aut: authors, those who have made significant contributions to the package.

ctb: contributors, those who have made smaller contributions, like patches.

cph: copyright holder. This is used to list additional copyright holders who are not authors, typically companies, like an employer of one or more of the authors.

fnd: funder, the people or organizations that have provided financial support for the development of the package.

With this in mind, an array makes sense to me.

@roll
Member Author

roll commented Mar 14, 2024

Interestingly, DataCite doesn't support multiple roles: https://github.com/inveniosoftware/datacite/blob/master/datacite/schemas/datacite-v4.3.json#L51-L76

BTW, we can also keep role (primary role) and add additionalRoles (secondary roles)

@peterdesmet
Member

I prefer roles over role and additionalRoles. It is sometimes hard to assess what is a primary role (e.g. is author higher than data curator?). A mapping to DataCite is still possible by denormalizing the contributors based on their role.
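The denormalization mentioned above could look roughly like the following: each contributor with several roles is expanded into one record per role, so that a single-role model can represent it. This is only a sketch under assumptions. The function name `denormalize_by_role` and the output key `role` are hypothetical, and no actual DataCite field names are mapped here.

```python
from typing import Any


def denormalize_by_role(contributors: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Expand each contributor into one record per role."""
    records = []
    for contributor in contributors:
        # Everything except `roles` is repeated on every output record.
        base = {key: value for key, value in contributor.items() if key != "roles"}
        roles = contributor.get("roles") or [None]
        for role in roles:
            record = dict(base)
            # Contributors without roles yield one record with no `role` key.
            if role is not None:
                record["role"] = role
            records.append(record)
    return records
```

For example, a contributor with `roles` set to `["creator", "dataCurator"]` becomes two records that differ only in `role`.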

@roll
Member Author

roll commented Mar 14, 2024

I like roles but somehow we can't yet get WG really interested in this one 😃

@pschumm

pschumm commented Mar 14, 2024

I like roles but somehow we can't yet get WG really interested in this one 😃

Yeah, my apologies—this isn't something I use much myself nor have strong feelings about. Glad to defer to the wisdom of the group (i.e., I'll support whatever group consensus emerges).

@PietrH
Member

PietrH commented Mar 14, 2024

I support roles, who doesn't like some good roles? Especially cinnamon...

image
Photo by Fallon Michael on Unsplash

@ezwelty

ezwelty commented Mar 15, 2024

Interestingly, DataCite doesn't support multiple roles: https://github.com/inveniosoftware/datacite/blob/master/datacite/schemas/datacite-v4.3.json#L51-L76

@roll Nor does Zenodo, because they use the same DataCite model. https://developers.zenodo.org/#representation (see contributors). On the other hand, CRediT (Contributor Roles Taxonomy) allows multiple roles per author.

@peterdesmet As you said, the main value of roles (array) is that it allows data packages to directly employ established role taxonomies. With that in mind, I'm fine with roles. But if so, I would suggest that the spec needs to provide definitions for its own RECOMMENDED taxonomy. Recently I needed to publish a Data Package on GitHub to Zenodo and found that while the DataCite roles are well defined, the Data Package roles not so much. Anyway, here is what I came up with in our particular case:

  • creator (person listed in author list: designed study, built package, performed lion's share of wrangler role, curated data from contributor, etc) → author
  • contributor: DataCollector (person who contributed some of their own data) → contributor
  • contributor: DataCurator (person who wrangled some data from publications) → wrangler

@roll
Member Author

roll commented Mar 15, 2024

Can we use roles with some existing taxonomy while role remains just free text?

@roll roll added the candidate label Mar 15, 2024
@ezwelty

ezwelty commented Mar 15, 2024

Can we use roles with some existing taxonomy while role remains just free text?

That feels unnecessarily complicated and too restrictive. There is no universally-accepted role taxonomy, but rather different taxonomies in different domains.

@peterdesmet
Member

peterdesmet commented Mar 15, 2024

I agree that role and elements in roles should not have an enum in Data Package, as it restricts other schemas from building on top of that. I do think it makes sense to RECOMMEND using an existing vocabulary.

Here's an attempt:

roles: an array of strings describing the roles of the contributor. A role is RECOMMENDED to follow an established vocabulary, such as DataCite Metadata Schema's contributorRole or CRediT. Useful roles to indicate are: creator, contact, rightsHolder, and dataCurator.

Note: I dropped author, publisher, maintainer, wrangler. Note also there is no actual default imposed by Data Package.
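To make the proposed definition concrete, here is a small sketch of how a consumer might check `roles` under it. Because the vocabulary is only RECOMMENDED, unlisted roles are reported rather than rejected. The names `SUGGESTED_ROLES` and `check_roles` are hypothetical and not part of any Data Package library.

```python
from typing import Any

# The "useful roles" named in the proposed definition; not an enum.
SUGGESTED_ROLES = {"creator", "contact", "rightsHolder", "dataCurator"}


def check_roles(contributor: dict[str, Any]) -> tuple[bool, list[str]]:
    """Return (is_valid, roles_outside_the_suggested_list)."""
    roles = contributor.get("roles")
    # No default is imposed: a contributor without `roles` is fine.
    if roles is None:
        return True, []
    # `roles` must be an array of strings.
    if not isinstance(roles, list) or not all(isinstance(r, str) for r in roles):
        return False, []
    # Other vocabularies are allowed, so unlisted roles are only reported.
    return True, [r for r in roles if r not in SUGGESTED_ROLES]
```

Specifications that build on top of Data Package could swap in their own vocabulary or skip the second part of the check entirely.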

@ezwelty I notice in the DataCite documentation that a contributor has a givenName and familyName. That aligns nicely with what we eventually adopted in #20.

@ezwelty

ezwelty commented Mar 15, 2024

@peterdesmet DataCite's contributor (with required contributorType) is optional and separate from creator, which is required (and has no type). See https://support.datacite.org/docs/datacite-metadata-schema-v44-mandatory-properties#2-creator. So to your list of roles I would add creator (closest equivalent to author in Data Package recommendation). What DataCite contributorType would your contributor map to? (other?). The default seems like it should be null (roles not specified).

I'd support recommending the use of an existing taxonomy, and I'm happy to either preserve the Data Package v1 role taxonomy (if better defined) or drop it.

@peterdesmet
Member

... and separate from creator, which is required (and has no type).

Right! I agree with adding creator (in favour of Data Package v1's author). I have updated my comment above.

What DataCite contributorType would your contributor map to? (other?).

Yeah, it would likely be other (for Data Package v1's contributor). But I agree, role should not have a default and we should not advocate for contributor as default value (rather, don't have roles). I have updated my comment above.

I have also removed projectLeader, dataCollector as those are already a bit niche imo. So we suggest: "Useful roles to indicate are: creator, contact, rightsHolder, and dataCurator."

@roll
Member Author

roll commented Mar 18, 2024

Thanks @peterdesmet,

I've updated the text based on your definition.

@roll
Member Author

roll commented Mar 28, 2024

@khughitt
@pwalsh
@nichtich
@khusmann
@nichtich
Hi, this proposal has been updated based on the results of the discussion. Would you consider re-checking it? As Phill expressed general support for the group decision, we need one more vote to accept the change. If accepted, it would be great if we could ship it with the draft release on April 1.

@roll
Member Author

roll commented Mar 29, 2024

ACCEPTED by WG (6/9)

@roll roll merged commit fd74e17 into main Mar 29, 2024
2 checks passed
@roll roll deleted the 804/contributor-roles branch March 29, 2024 08:18

Successfully merging this pull request may close these issues.

Make contributor role an array of strings
6 participants