Added contributor.given/familyName #20

Merged
merged 12 commits into from
Mar 14, 2024

Conversation

Member

@roll roll commented Jan 26, 2024


Rationale

Please take a look at frictionlessdata/specs#852. Personally, I'm not sure about this change, but if we think about Data Package as just a language (or definition table) we create for humans/machines to communicate with each other, then standardising two additional properties that obviously make sense is a logical move.

@peterdesmet
Member

Thanks! To exemplify with a use case: we have introduced these properties for biodiversity datasets, to allow generating a citation as Lastname F, Lastname F ... (see gbif/ipt#2087 (comment)). We currently have to work around title being required, but that is resolved in #7 and is no longer required in frictionless v2.

Adopting this PR will, as @roll writes, "standardis[e] two additional properties", rather than leaving communities to invent them.
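
To make the "Lastname F, Lastname F ..." use case concrete, here is a minimal sketch of how a consumer might build such a string. It assumes the property names `givenName` and `familyName` from this PR's title; the function name `format_citation_authors` and the initial-building rule (first letter of each given name) are illustrative assumptions, not the GBIF IPT implementation.

```python
def format_citation_authors(contributors: list[dict]) -> str:
    """Join contributors as 'Family G, Family G', falling back to title."""
    parts = []
    for contributor in contributors:
        family = contributor.get("familyName")
        given = contributor.get("givenName")
        if family and given:
            # Assumed rule: first letter of each whitespace-separated given name.
            initials = "".join(word[0] for word in given.split())
            parts.append(f"{family} {initials}")
        elif family:
            parts.append(family)
        else:
            # Organizations and single-name persons only have a title.
            parts.append(contributor.get("title", ""))
    return ", ".join(parts)
```

Contributors without the two name parts fall back to `title`, so organizations can still appear in the generated string.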

@nichtich

If the use case is to generate citations, then the data model should be aligned with citation data (or applications will need to reinvent citation generators). The most common citation generator and data model is CSL. Its data model for contributor names is described here and its internals as JSON Schema here. So in the best case there is support for these name fields:

  • family - surname minus any particles and suffixes
  • given - given names, either full (“John Edward”) or initialized (“J. E.”)
  • suffix - name suffix, e.g. “Jr.” in “John Smith Jr.” and “III” in “Bill Gates III”
  • non-dropping-particle - name particles that are not dropped when only the surname is shown (“van” in the Dutch surname “van Gogh”) but which may be treated separately from the family name, e.g. for sorting
  • dropping-particle - name particles that are dropped when only the surname is shown (“van” in “Ludwig van Beethoven”, which becomes “Beethoven”, or “von” in “Alexander von Humboldt”, which becomes “Humboldt”)

As simplicity is preferred, and suffix, non-dropping-particle and dropping-particle can be detected automatically to some degree, I'd say it is enough to support given and family under these names. The title field should be kept; it maps to the CSL name field literal, to be used as exemplified here.
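
The mapping described above can be sketched as follows. This is an illustration only: it assumes the contributor properties are called `givenName` and `familyName` (as in the diff under review in this PR), and the helper name `contributor_to_csl_name` is hypothetical. The CSL field names `given`, `family` and `literal` are the ones listed in the comment.

```python
def contributor_to_csl_name(contributor: dict) -> dict:
    """Map a Data Package contributor to a CSL-style name object."""
    family = contributor.get("familyName")
    given = contributor.get("givenName")
    if family:
        name = {"family": family}
        if given:
            name["given"] = given
        return name
    # No family name available: pass the full name through as a literal.
    return {"literal": contributor.get("title", "")}
```

A contributor with only `title`, such as an organization, ends up as a `literal` name, which matches the fallback described in the comment.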

@peterdesmet
Member

Thanks @nichtich, good that there is a standard. I support using family and given over lastName and firstName.

@khusmann does this address your comment at frictionlessdata/specs#852 (comment)?

@khusmann
Contributor

If the use case is to generate citations, then the data model should be aligned with citation data (or applications will need to reinvent citation generators).

Well said, @nichtich, I strongly agree.

@khusmann does this address your comment at frictionlessdata/specs#852 (comment)?

@peterdesmet yes, it does! I fully support this now.


cloudflare-pages bot commented Feb 20, 2024

Deploying with Cloudflare Pages

Latest commit: 5e8b201
Status: ✅  Deploy successful!
Preview URL: https://7293d14d.datapackage.pages.dev
Branch Preview URL: https://852-contributor-first-last-n.datapackage.pages.dev

@roll
Member Author

roll commented Feb 20, 2024

Thanks @nichtich! It's been updated -- does it look OK now?

@roll roll changed the title from "Added contributor.first/lastName" to "Added contributor.given/familyName" Feb 20, 2024
@PietrH
Member

PietrH commented Feb 21, 2024

I prefer last name over family name; after all, not everyone has a family name. Other cultures have other practices, and where I can, I have personally switched to asking for a single "username" in forms. In this case, this isn't very practical, but I still believe we should avoid using Family over Last.

https://softwareforgood.com/why-you-should-stop-asking-for-first-and-last-names-on-forms/

https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Names are really complicated!

@@ -269,12 +269,16 @@ The people or organizations who contributed to this Data Package. It `MUST` be a
```

- `title`: name/title of the contributor (name for person, name/title of organization)
- `givenName`: a given names, either full (“John Edward”) or initialized (“J. E.”), if the contributor is a person
Member

Suggested change
- `givenName`: a given names, either full (“John Edward”) or initialized (“J. E.”), if the contributor is a person
- `givenName`: given name(s), either full (“John Edward”) or initialized (“J. E.”), if the contributor is a person

@@ -269,12 +269,16 @@ The people or organizations who contributed to this Data Package. It `MUST` be a
```

- `title`: name/title of the contributor (name for person, name/title of organization)
- `givenName`: a given names, either full (“John Edward”) or initialized (“J. E.”), if the contributor is a person
- `familyName`: a surname minus any particles and suffixes if the contributor is a person
Member

Suggested change
- `familyName`: a surname minus any particles and suffixes if the contributor is a person
- `familyName`: familial name, if the contributor is a person

@peterdesmet
Member

peterdesmet commented Feb 21, 2024

I personally don't know what the most inclusive way is to represent names, other than the single name which we already have (but that is problematic for citations).

  • The EML standard uses givenName, surName
  • The currently referenced citeproc-js uses given, family
  • Wikipedia uses surname (canonical for Last name or Family name) and given name
  • BibTeX uses a single author field; parsing is done on the comma
  • ...

I'm fine with an alternative where we don't add 2 new fields, but rather provide formatting rules for the existing title (cf. bibtex)
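
For comparison, the comma-based alternative could be consumed roughly like this. It is a sketch under the assumption that `title` holds either a "Lastname, Firstname(s)" string or a plain name with no comma; the function name `split_title_on_comma` is hypothetical. It deliberately ignores BibTeX's three-part form with a suffix between two commas.

```python
def split_title_on_comma(title: str) -> tuple[str, str]:
    """Split 'Last, First' into (family, given); given is '' without a comma."""
    # partition() splits on the first comma only and never raises.
    family, _, given = title.partition(",")
    return family.strip(), given.strip()
```

The trade-off is visible here: a title without a comma cannot be told apart from a family name on its own, which is part of why dedicated fields were discussed.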

@roll
Member Author

roll commented Feb 21, 2024

BTW, I would take into account that datapackage.json is still a machine-focused communication format, so I would say it's not that important exactly how we name the properties, as long as they provide the desired functionality. If it's contributor.lastName in JSON, that doesn't mean it must be "Last Name" in, e.g., a UI.

Personally, I have zero experience with citations, but first/lastName feels more universal to me: it's just about parts of a string, and seems less tied to naming complexity.

@roll
Member Author

roll commented Feb 21, 2024

For example, if we take a look at the BibTeX definition, nothing prevents us from having:

  • firstName: First names or given names
  • lastName: Last name or family name

@nichtich

I'm fine with an alternative where we don't add 2 new fields, but rather provide formatting rules for the existing title

Parsing and formatting names is a form of art, but it is not relevant to this specification. All we need to do is provide a way to store a common subset of name information. Existing standards suggest that three fields (given and family, i.e. surname, if available and applicable; title otherwise) are a good solution.

@roll
Member Author

roll commented Feb 21, 2024

So, what is the conclusion for property names? 😄

@peterdesmet
Member

Let's put it to a vote, as an emoji on this comment. You can vote for multiple options. The options are:

  1. 🎉 Add the fields firstName and lastName (initial proposal)
  2. ❤️ Add the fields givenName and familyName (second proposal)
  3. 🚀 Don't add new fields for contributor, but suggest structure by updating the title definition to

    Name/title of the contributor (name for person, name/title of organization). For person names, it is RECOMMENDED to format the name as Lastname, Firstname(s).

  4. 😕 Don't add new fields for contributor
  5. 👀 I have no stake in this discussion.

@roll
Member Author

roll commented Feb 23, 2024

I've switched back to first/lastName, but using the broader BibTeX definition. It now mentions both the first/given and last/family terms; the exact names of the properties are not so important, IMHO, as long as they have proper definitions. I've also added a recommendation to follow BibTeX guidance for title.

@roll roll changed the title from "Added contributor.given/familyName" to "Added contributor.first/lastName" Feb 23, 2024
Member

@peterdesmet peterdesmet left a comment

Review of text.

content/docs/specifications/data-package.md (outdated, resolved)
content/docs/specifications/data-package.md (outdated, resolved)
content/docs/specifications/data-package.md (resolved)
@peterdesmet
Member

@roll Great! I've made some suggestions to the wording, to align with what's already there.

roll and others added 2 commits February 23, 2024 09:49
Co-authored-by: Peter Desmet <peter.desmet.work@gmail.com>
@roll roll added the candidate label Feb 23, 2024
@roll
Member Author

roll commented Feb 23, 2024

@pschumm
@ezwelty
@pwalsh
@nichtich
Can you please take a look?

@ezwelty

ezwelty commented Feb 26, 2024

I have a few thoughts on this. As others have noted, title is the only universal way of representing a person's name. If a more granular model is needed for generating citations:

  • I would NOT name the properties firstName, lastName.
  • I would NOT suggest people muck up title to match some formatting convention.
  • I would approve either one of:
    • Add two new properties givenName, familyName
    • Add a single citationName in BibTeX format
    • ... where in either case, use of title is encouraged to make explicit name order and secondary name parts.

My reasoning below.

Citation conventions require the family name (aka surname), which is not the last name in notable cases (examples below).

  • Most modern Chinese names begin with a single-symbol family name followed by the given name. As a result of the last name assumption, Chinese authors are sometimes cited by their given name.
  • In most Spanish-speaking countries, most people have two family names, with the first often taking precedence. So for example Guillermo Cobos Campos should in Bibtex format be cited as "Cobos Campos, Guillermo" or "Cobos, Guillermo", but certainly not "Campos, Guillermo Cobos" (as is frequently found).

At the WGMS, where we work with scientists from all over the world, we store their full name (equivalent to title) and the part of their name (it must be a subset of their full name) that should be treated as the family name when citation formats require a distinction. This allows us to support either order of given and family name, names without family names, etc., with just two fields. A similar approach that would deal more robustly with suffixes, non-dropping-particles and dropping-particles would be to have a full name (title: 'John Smith Jr.') and a full citation name in BibTeX format (citation: 'Smith, John Jr.').

Note that even CSL doesn't have an attribute to specify given-family name order (some software tries to infer this using hacks, e.g. based on the script of the name); the only way to achieve this is with their literal, aka our title.
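
The two-field approach described above (a full name plus the part of it to treat as the family name) can be sketched as follows. This is an illustration, not WGMS code: the function name `split_by_family_name` is hypothetical, "Li Wei" in the usage note is a made-up name, and the plain substring check is a simplification that a real system would likely tighten to whole-word matching.

```python
def split_by_family_name(full_name: str, family_name: str = "") -> tuple[str, str]:
    """Return (remaining name parts, family name) for either name order."""
    if not family_name:
        # Names without a family name are kept whole.
        return full_name.strip(), ""
    if family_name not in full_name:
        raise ValueError("family name must be a subset of the full name")
    # Remove the first occurrence and collapse any leftover whitespace.
    rest = " ".join(full_name.replace(family_name, "", 1).split())
    return rest, family_name
```

Because the family name is located by content rather than by position, "Guillermo Cobos Campos" with family name "Cobos Campos" and a family-first name such as "Li Wei" with family name "Li" are both handled without any assumption about name order.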

@pschumm

pschumm commented Feb 26, 2024

FWIW, I find @ezwelty's argument(s) above pretty convincing, and now agree that either givenName and familyName or citationName is the way to go. I don't have a strong preference between these two alternatives. However, I do feel strongly that the final solution should accommodate all languages.

@peterdesmet
Member

@ezwelty thanks for clarifying. I think the citation field is the best approach then: it is flexible and clearly communicates that the purpose is for citations. Some questions/suggestions:

  1. Definition: citation: Name of the contributor in BibTeX author format for use in citations.
  2. Name: Should this property be called citation (quite broad), author (maybe confusing with a role), bibtex, or bibtexAuthor?
  3. Use: I'm guessing that not providing this property doesn't directly mean that the contributor should not be included in a citation?

Examples

[
  {
    "title": "John Smith Jr.",
    "citation": "Smith, Jr, John" <- Suffix in 2nd place
  },
  {
    "title": "Guillermo Cobos Campos",
    "citation": "Cobos Campos, G"
  },
  {
    "title": "Monitoring, understanding and forecasting global biomass flows of aerial migrants (GloBAM)",
    "citation": "{GloBAM project}" <- Wrapped in {}
  }
]

@nichtich

nichtich commented Mar 1, 2024

I oppose building on top of BibTeX instead of CSL, and I oppose adding another field with the same semantics as the title we already have. As summarized above, CSL can be stripped down to one field for the full name (we already have title, so this should be fixed) and two fields for parts of the name.

@peterdesmet
Member

Looking for a consensus:

I would therefore suggest the givenName/familyName properties, with definitions derived from CSL:

The “family” property represents the familial name that a person inherits. The “given” property represents the name a person has been given.

@PietrH could you agree with that approach?

@PietrH
Member

PietrH commented Mar 14, 2024

If I'm the blocker I'll capitulate.

I propose we reopen this discussion when we come across a user story that doesn't fit in the CSL derived definitions.

@roll
Member Author

roll commented Mar 14, 2024

@peterdesmet @PietrH
Thanks a lot for making it work! I've updated the PR

@ezwelty @pschumm
Please take a look if it's ok now

@roll
Member Author

roll commented Mar 14, 2024

ACCEPTED by WG (6/9)

@roll roll merged commit bcbb2f3 into main Mar 14, 2024
1 check passed
@roll roll deleted the 852/contributor-first-last-name branch March 14, 2024 14:38
@ezwelty

ezwelty commented Mar 15, 2024

@roll Looks good. My only additional suggestion would be to mention, when linking to the CSL docs, to follow their guidance on where to put suffixes and particles, since we don't include dedicated fields for those.

@roll roll changed the title from "Added contributor.first/lastName" to "Added contributor.given/familyName" Mar 29, 2024
@peterdesmet peterdesmet mentioned this pull request Mar 29, 2024
Development

Successfully merging this pull request may close these issues.

Add contributor firstName and lastName (in favour of title)
7 participants