Align citation with GBIF.org #1360

Closed
timrobertson100 opened this issue Sep 13, 2017 · 28 comments

@timrobertson100
Member

The IPT citation does not reflect what the GBIF.org system does. This is highly confusing; GBIF's publishing tool needs to be fully consistent and intuitive.

I am not sure which is incorrect.

@timrobertson100
Member Author

The corresponding GBIF registry issue is gbif/registry#4

@dschigel

Should users still have the opportunity to give a free-text citation in the IPT, with a warning that it won't be displayed at GBIF.org, or should this opportunity be closed completely?

@peterdesmet
Member

I agree that both should be aligned! Regarding allowing the user to add a free-text citation: that's something to avoid if it's never going to end up on GBIF (with or without a warning)!

I think it's better to limit the user's freedom, but still allow them to provide some elements to be included in an automatically generated citation.

Here's what we do

  1. Start with an automatic citation
  2. Include DOI as citation identifier
  3. If we have a data paper: turn automatic citation off and add reference to the data paper
  4. Remove the version number (because with the automatic citation turned off, it would just become outdated)... there's not much we can do about the year

Now, if we could provide the DOI of the dataset and the DOI of the data paper, and have the citation built from there, we would be happy.
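A minimal sketch of what such DOI-driven generation could look like, assuming the citation is assembled from the dataset metadata plus the dataset DOI, an optional data-paper DOI, and an optional version number (so a version-independent citation stays possible). The function `doi_citation` and its parameters are hypothetical and do not correspond to any IPT code; the example call uses shortened Vlinderdatabank metadata.

```python
from typing import Optional


def doi_citation(
    authors: list[str],
    year: int,
    title: str,
    publisher: str,
    dataset_doi: str,
    version: Optional[str] = None,
    data_paper_doi: Optional[str] = None,
) -> str:
    """Assemble a citation from metadata and DOIs (hypothetical helper, not IPT code)."""
    parts = [f"{', '.join(authors)} ({year}). {title}."]
    if version:
        # Leave the version out for a general citation of the dataset as a whole.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(f"https://doi.org/{dataset_doi}")
    if data_paper_doi:
        # Point to the data paper instead of hand-editing the whole citation.
        parts.append(f"Data paper: https://doi.org/{data_paper_doi}")
    return " ".join(parts)


print(doi_citation(
    ["Maes D", "Desmet P"], 2016, "Vlinderdatabank", "INBO",
    "10.15468/njgbmh", data_paper_doi="10.3897/zookeys.585.8019",
))
```

The point of the design is that the publisher only supplies identifiers and flags; the text itself is always regenerated, so nothing goes stale when the automatic citation would otherwise be switched off.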

Other remarks

  • Sometimes it's useful to have a citation without the version number (as a general citation for the dataset as a whole... see how Zenodo does this)
  • Sometimes it's annoying that the year is the last publication year. When referencing a dataset, it sometimes makes more sense to use the first publication year. The Florabank, for example, was published in 2012, but the citation implies 2017.
  • The "accessed via GBIF.org on 2017-09-15" that is added on GBIF does not make sense in the citation on the IPT.
  • Some resource creators seem to be dropped from the citation on GBIF (maybe because they are groups, not people?). We would rather have those included.

Example IPT vs GBIF citation

Differences in bold.

Citation on IPT: http://data.inbo.be/ipt/resource?r=dagvlinders-inbo-occurrences

Maes D, Brosens D, Beck O, Van Dyck H, Desmet P, Vlinderwerkgroep Natuurpunt, all butterfly recorders (2016): Vlinderdatabank - Butterflies in Flanders and the Brussels Capital Region, Belgium. Research Institute for Nature and Forest (INBO). Dataset/Occurrence. http://doi.org/10.15468/njgbmh Data paper: http://doi.org/10.3897/zookeys.585.8019

Citation on GBIF: https://www.gbif.org/dataset/7888f666-f59e-4534-8478-3a10a3bfee45

Maes D, Brosens D, Beck O, Van Dyck H, Desmet P (2017). Vlinderdatabank - Butterflies in Flanders and the Brussels Capital Region, Belgium. Version 1.4. Research Institute for Nature and Forest (INBO). Occurrence Dataset https://doi.org/10.15468/njgbmh accessed via GBIF.org on 2017-09-15.
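For comparison, the GBIF.org citation above looks like a fixed template filled from dataset metadata. The following is a rough sketch that reproduces that string. It assumes author names are already abbreviated as "Surname Initials" and that only the year of the publication date is used. `DatasetMetadata` and `gbif_style_citation` are hypothetical names, and this is not GBIF's actual registry implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetMetadata:
    authors: list[str]       # assumed pre-formatted as "Surname Initials"
    pub_date: date
    title: str
    version: str
    organization_title: str
    dataset_type: str        # e.g. "Occurrence"
    doi: str                 # bare DOI, without the resolver prefix


def gbif_style_citation(meta: DatasetMetadata, accessed: date) -> str:
    """Fill a fixed template modeled on the GBIF.org citation shown above."""
    return (
        f"{', '.join(meta.authors)} ({meta.pub_date.year}). {meta.title}. "
        f"Version {meta.version}. {meta.organization_title}. "
        f"{meta.dataset_type} Dataset https://doi.org/{meta.doi} "
        f"accessed via GBIF.org on {accessed.isoformat()}."
    )


vlinders = DatasetMetadata(
    authors=["Maes D", "Brosens D", "Beck O", "Van Dyck H", "Desmet P"],
    pub_date=date(2017, 1, 1),  # only the year is used; the day is a placeholder
    title="Vlinderdatabank - Butterflies in Flanders and the Brussels Capital Region, Belgium",
    version="1.4",
    organization_title="Research Institute for Nature and Forest (INBO)",
    dataset_type="Occurrence",
    doi="10.15468/njgbmh",
)
print(gbif_style_citation(vlinders, date(2017, 9, 15)))
```

Because the IPT citation above is free text rather than output of this kind of template, the two can drift apart in year, version, author list and access note.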

@dschigel

dschigel commented Oct 30, 2017

As discussed with, but not yet checked by @ahahn-gbif:

In the IPT, next to the custom citation field, please add the following warning in a clearly visible way:
"
Note that the citation for GBIF.org will be automatically generated based on your metadata fields, such as dataset.authors, dataset.pubDate, dataset.title, dataset.version, organization.title, dataset.type, dataset.doi, while your free text citation will be ignored by GBIF.org. By editing these metadata fields you may modify the appearance of the citation at GBIF.org. This custom (free text) citation can, however, be used for citing the data for direct access through IPT. Wondering why? Welcome to FAQ https://www.gbif.org/faq?q=citation
"

@camiplata

I see this is a long standing issue, I would like to know if there are any plans to address it for the coming release (if there is one coming soon).

@SiBColombia

@dschigel

dschigel commented Mar 4, 2021

@ahahn-gbif as we just discussed this again lately, I think we should add some closing remarks and close this, do you agree?

@camiplata

@dschigel @ahahn-gbif that would be great, as this disparity between the IPT and GBIF creates a lot of noise among data publishers.

@ahahn-gbif
Contributor

@dschigel We will still need to pick this issue up in future IPT development work, but may want to move it to a different repository. There is no scheduled development work or release plan for the IPT at this point, as future IPT work is still in early scoping stages. At present, the IPT citation editing page contains the following warning against free-text citations:

[Screenshot: warning on the IPT citation editing page]

I assume we are only talking about the auto-generated citations from here on. I agree that their generation logic should be aligned between GBIF.org and the IPT, so that an IPT data administrator has a reasonably clear idea of how the citation will later appear on GBIF.org. However, for several of the reasons @peterdesmet states above, the two citations will never be fully identical, especially in the details pointing back to the source: citing an IPT endpoint does not reference a dataset version fully identical to the one accessible through GBIF.org (it may not even be published in a GBIF context).

Maybe it is important to communicate this latter point: accessing the dataset through the IPT endpoint will give access to the data as configured at source, potentially including unindexed extensions, different preferred taxon names, geodetic organization, etc. It will, on the other hand, not contain components added or annotated during the ingestion processes of GBIF et al.: alignment with a core taxonomic structure for unified search access, standardization of certain data fields, annotation of potential issues, data interpretation to mediate detected issues.

@dschigel

We came back to this in the Humboldt Core discussion, and I realize we can possibly solve it by focusing on the DOI and by stressing that it is the target context that dictates the citation format.

I start from two assumptions:

  • GBIF and GBIF publishers really only care about the presence of the DOI in the citation (CiteTheDOI), as this is what enables the data citation mechanism.
  • The exact formulation of the textual citation is dictated by the rules of the target context (e.g. the author guidelines of the journal where a paper is published, or a report that refers to a GBIF download or a dataset).

If you agree with these statements, we actually don't need a recommended citation at all. For DOI-based data citation to work, we don't need a publisher-preferred citation either (but we can keep it in the IPT resource, as it seems to be important, and we can stress that it is the IPT resource that is then cited, not the GBIF view). Instead of the current Citation footer, every GBIF page could have a section along the CiteTheDOI lines with approximately the following text (see below). In the IPT view we need much softer wording, e.g. "should" -> "example". The sentence on the publisher-recommended citation would only be shown if the publisher-recommended citation is not null. I think the suggestion below captures the key wishes expressed above. Styling and English will need to be fixed.

How to cite
GBIF-mediated data are free for all, but not free from obligations. Under the data user agreement, every GBIF user is requested to cite the DOI of a download or the DOI of a dataset when referring to such data. To format your references, please follow the author guidelines and style advice for your use case (e.g. a journal), but do not omit the DOI. In the absence of clear formatting guidelines, you may also take into account citation recommendations provided by the data publisher, or follow GBIF's recommendation:

Karlsholt O, Pedersen J, Hansen (deceased) M, Schigel D, Braak K (2016). Insects from light trap (1992–2009), rooftop Zoological Museum, Copenhagen. Version 1.4. Natural History Museum of Denmark. Sampling event dataset https://doi.org/10.15468/xabmiz accessed via GBIF.org on 2021-03-17.

@abubelinha
Contributor

abubelinha commented Oct 19, 2022

We have sometimes tried to use test IPTs plus the test registry to check how different sections of metadata would finally look once published (links and other HTML elements, which authors appear in the citation and in which order, and so on). Authors always prefer to see a test version of the final product.

The problem, at least the several times we tried, is that the test portal is slow to reflect those changes after test IPT publications (even for metadata-only datasets).
That's a bottleneck. Do you know if there is an existing issue where I can comment about this?
If not, which is the most appropriate repository for opening one? (IPT, registry, portal-feedback, ...)

Thanks a lot in advance
@abubelinha
(@dgasl also interested in this)

@mike-podolskiy90
Contributor

@abubelinha Thank you for the questions. This is a portal thing I think.

@ptyk

ptyk commented Nov 30, 2022

Hello everyone. I found this discussion after noticing the issue of author names and order. My case is a set of checklist datasets that I am going to publish on behalf of a large group of authors as a series of chapters. I originated the idea and have now prepared a standard metadata description, which will be applied to all of the checklists (and modified in some cases).
So the metadata is my main contribution to the content; I cannot be treated as an author of the dataset. But:

  • the IPT automated citation (as well as the one I saw in the ChecklistBank) excludes me from the author list, which is correct,
  • the Portal citation, publicly visible, adds my name after the authors, which will be treated as a misuse or usurpation of authorship.

Removing myself as metadata author would also be wrong. I think the IPT should provide a way to control this, using a simple checkbox next to the metadata author section (i.e. "tick for inclusion in the author string" or similar).

What do you think?

@peterdesmet
Member

I agree. I think only the resource creators should be included as authors (in the order provided), as is done by the IPT.

@ahahn-gbif
Contributor

I am a bit reluctant about that direction. The original proposal was to have the IPT follow the logic of GBIF.org. The inclusion of metadata authors in the citation string had been discussed and decided on, because metadata can contribute substantially to the quality and usability of a dataset and are not necessarily provided by the curators of the dataset themselves. Also, on a more procedural level, this would change citations for a large number of datasets on GBIF.org without prior consultation or even notice, which does not sound quite right.

I would rather propose to

  • update the IPT to follow the same citation generation logic as the main GBIF system does (include metadata authors in the auto-generated citation as displayed in the IPT for feedback), and
  • give guidelines on roles that may be more suitable than the metadata author role for purely editorial support (editor? distributor?)

@mike-podolskiy90 mike-podolskiy90 self-assigned this Dec 1, 2022
@mike-podolskiy90 mike-podolskiy90 added this to the 2.7.0 milestone Dec 1, 2022
@ahahn-gbif
Contributor

I am sorry, I overlooked that the IPT does not allow a resource without a metadata author, which makes this trickier. So if I understand correctly, the situation we would like to reach is one where

  • the citation string as shown on GBIF.org and in the IPT should be consistent as far as the contained data elements are concerned (the DOI will still be GBIF.org-side only, the URL IPT-side only)
  • in some cases a publisher may want to not include metadata authors in the citation, especially when the role is purely editorial without intellectual contribution/ownership

At present

  • the IPT enforces mapping at least one metadata provider - is that required for other functions of either the IPT or gbif.org, or negotiable?
  • other contributor roles are available, but only in addition under "associated parties", and not as an alternative
  • the citation generation of the IPT (automated version) and GBIF.org are technically independent. The citation string in GBIF.org is based on the registry API, in the IPT on its own code base. This means that at least at present, an "include me or not" checkbox in the IPT would not have any impact on the citation generation within GBIF.org, and inclusion into the citation generation logic would require changes both in the IPT, communication with the registry, and API citation generation
  • we do not want to automatically change already vetted citations for existing datasets without explicit interaction

Questions to look into:

  • what decision is the mandatory metadata provider requirement based on, and could it be optional?
  • are there alternatives for handling optional metadata author inclusion based on a publisher decision?

@MattBlissett
Member

Remember we are generating EML here, so we don't have complete flexibility.

The metadata authorship becomes dataset/metadataProvider which is optional in EML, only dataset/creator is required. I don't think it's possible to include a metadataProvider in EML but somehow mark it to be excluded from a citation.

See https://eml.ecoinformatics.org/schema/, specifically https://eml.ecoinformatics.org/schema/eml_xsd.html#eml_dataset and https://eml.ecoinformatics.org/schema/eml-resource_xsd.html#ResourceGroup_metadataProvider
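To make that cardinality concrete, here is a sketch using Python's standard `xml.etree.ElementTree` that reads `creator` and `metadataProvider` agents from a minimal, made-up EML 2.2 document. It treats at least one `creator` as mandatory and `metadataProvider` as optional, mirroring the schema as described above. The sample names, the "Surname Initials" formatting and the helper names `agent_name`/`read_agents` are illustrative assumptions, not IPT or registry code.

```python
import xml.etree.ElementTree as ET

# Minimal, made-up EML 2.2 document; child elements of eml:eml are unqualified.
SAMPLE_EML = """<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0">
  <dataset>
    <title>Example checklist</title>
    <creator>
      <individualName><givenName>Ada</givenName><surName>Example</surName></individualName>
    </creator>
    <creator><organizationName>Example Working Group</organizationName></creator>
    <metadataProvider>
      <individualName><givenName>Bo Ri</givenName><surName>Editor</surName></individualName>
    </metadataProvider>
  </dataset>
</eml:eml>"""


def agent_name(agent: ET.Element) -> str:
    """Format a person as 'Surname Initials'; fall back to the organization name."""
    person = agent.find("individualName")
    if person is not None:
        surname = person.findtext("surName", default="").strip()
        given = person.findtext("givenName", default="").strip()
        initials = "".join(part[0] for part in given.split())
        # An agent with a surname only yields just the surname.
        return f"{surname} {initials}".strip()
    return agent.findtext("organizationName", default="").strip()


def read_agents(eml_xml: str) -> tuple[list[str], list[str]]:
    """Return (creators, metadata_providers) from an EML string."""
    dataset = ET.fromstring(eml_xml).find("dataset")
    if dataset is None:
        raise ValueError("EML has no <dataset> element")
    creators = [agent_name(a) for a in dataset.findall("creator")]
    if not creators:
        raise ValueError("EML dataset must have at least one <creator>")
    # metadataProvider is optional in EML, so an empty list is valid here.
    providers = [agent_name(a) for a in dataset.findall("metadataProvider")]
    return creators, providers


print(read_agents(SAMPLE_EML))
```

Any include/exclude decision would have to live outside these EML elements (or in a separate structure), since `metadataProvider` itself carries no "cite me" flag.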

@mdoering
Member

mdoering commented Dec 1, 2022

I never understood the reasons for including the metadata author in the generated citation. I would strongly consider removing it. If that person wants or needs to be cited, I think they should also become a proper author/creator of the resource. Offering an option to include/exclude the metadata author would only add to the complexity, I think.

It might also be worth mentioning that in ChecklistBank we have decided to follow yet another approach.
The citation string really mixes citation information with citation styles. There are thousands of styles out there, and journals require different citation styles that authors have to follow. Isn't it much better to use a structured citation like BibTeX or CSL-JSON that users can format according to the style they need? It would free us from discussing some of the citation details, and GBIF.org could pick its preferred style for formatting. An IPT installation could instead select the style of its choice, while always being able to preview the citation in the GBIF style when publishing. Note also that EML 2.2 has added support for structured citations. More information on the CLB implementation can be found in CatalogueOfLife/backend#989
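A sketch of the structured alternative: the dataset's citation data held as a CSL-JSON item (here a Python dict using CSL variables such as `type`, `title`, `author`, `issued`, `version`, `publisher` and `DOI`), with one hypothetical formatter rendering a GBIF-like string from it. In practice the same item would be handed to a CSL processor together with the chosen style; `gbif_like_style` only stands in for that step, and all values, including the DOI, are placeholders.

```python
import json

# A CSL-JSON item; variable names follow the CSL data schema, values are made up.
item = {
    "id": "example-dataset",
    "type": "dataset",
    "title": "Example checklist of beetles",
    "author": [
        {"family": "Example", "given": "Ada"},
        {"literal": "Example Working Group"},  # organizations as literal names
    ],
    "issued": {"date-parts": [[2022]]},
    "version": "1.0",
    "publisher": "Example Natural History Museum",
    "DOI": "10.5555/example",
}


def csl_name(name: dict) -> str:
    """Render one CSL name as 'Family Initials', or the literal name as-is."""
    if "literal" in name:
        return name["literal"]
    initials = "".join(part[0] for part in name.get("given", "").split())
    return f"{name['family']} {initials}".strip()


def gbif_like_style(item: dict) -> str:
    """One possible rendering; a CSL processor would apply a real style instead."""
    authors = ", ".join(csl_name(n) for n in item["author"])
    year = item["issued"]["date-parts"][0][0]
    return (f"{authors} ({year}). {item['title']}. Version {item['version']}. "
            f"{item['publisher']}. https://doi.org/{item['DOI']}")


print(json.dumps(item, indent=2))  # the structured data a publisher would expose
print(gbif_like_style(item))       # one of many possible formatted outputs
```

The structured item stays the single source of truth; only the last step (style selection) would differ between GBIF.org, an IPT installation or a journal.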

Removing the metadata author from the citation would align better with CLB/COL.

@mike-podolskiy90
Contributor

mike-podolskiy90 commented Dec 1, 2022

@mdoering Removing metadata authors would affect thousands of citations at GBIF.org

I think we should make metadata providers an optional section in the IPT basic metadata since it's optional in EML

@mdoering
Member

mdoering commented Dec 1, 2022

Yes. On the other hand it would mean that I am forced to not say who authored the metadata just to remove me from the citation.

@ahahn-gbif
Contributor

Yes. We do have to consider the situation today, however. Silently changing thousands of citations on GBIF.org can have major fallout, as nice as the change may be. This is not a quick fix to push through.

@albenson-usgs
Contributor

I reviewed a few metadata records in the Environmental Data Initiative (EDI) repository and it does seem to me that the requirement for metadata provider that the IPT has enforced is unusual / not standard practice. In the few EML records I examined from EDI, none of them had metadata provider.

@dschigel

dschigel commented Dec 1, 2022

@albenson-usgs I think this is actually quite revealing about attitudes towards metadata. When the citation-generating formula was rolled out (and it now indeed affects thousands of datasets), the premise was that the GBIF.org view of a published dataset, which is the citable object in this case, is a product of data creation and rework (hence front authorship) and of metadata creation and rework (hence metadata authorship). We have enough trouble with poor metadata across so many infrastructures; removing metadata authorship from the GBIF dataset equation would send us back to a metadata-careless stone age through metadata anonymity. I would be very protective of the second bullet here, and as I understood it, so would @ahahn-gbif? Authorship is not only credit, it is also responsibility, and this fully applies to metadata authorship. Please note that a dataset can be cited (at its endpoint location) differently from the instance displayed on GBIF.org.

@ptyk

ptyk commented Dec 1, 2022

@dschigel I fully understand the need to maintain and increase the quality of metadata in GBIF datasets, but the statement you mention ("Name(s) of the dataset’s metadata author(s) [to be included in the author string], if one is registered, but only if also an originating author is named") does not necessarily close the topic. We may imagine the following scenario:

  • the Resource Creators part is obligatory
  • the Metadata Providers part is obligatory
  • the behavior of the algorithm constructing the author string in the citation depends on the decision of the provider (by ticking a checkbox, e.g.):
    -- by default the metadata provider person is added to the author string
    -- if the provider decides to stay hidden (from the author string), they are excluded from it, although still listed, and visible (and responsible) as the Metadata Creator
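The proposed checkbox could be modeled as a flag on each metadata provider that defaults to inclusion, so existing citations would stay unchanged unless someone opts out. This is a minimal sketch. The `Agent` type and `include_in_citation` field are hypothetical, not an existing IPT setting. The ordering (creators first, then metadata providers not already listed, and only when at least one creator is named) follows the behavior described in this thread.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str                         # already formatted, e.g. "Surname G"
    include_in_citation: bool = True  # hypothetical checkbox; default keeps today's behavior


def author_string(creators: list[Agent], metadata_providers: list[Agent]) -> str:
    """Creators first, then opted-in metadata providers not already listed."""
    names = [c.name for c in creators]  # the flag only matters for providers here
    if names:  # providers are only added if an originating author is named
        for provider in metadata_providers:
            if provider.include_in_citation and provider.name not in names:
                names.append(provider.name)
    return ", ".join(names)


creators = [Agent("Author A"), Agent("Author B")]
editor = Agent("Editor E", include_in_citation=False)  # stays listed, but is not cited
print(author_string(creators, [editor]))
```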

So if you decide to add this option to the IPT, it will not affect existing datasets, and no one should complain. But those who care will have the option. It may also be appreciated by many curators of older datasets.

@peterdesmet
Member

Metadata editing is indeed important and should be acknowledged. So is data collection, managing, etc. This is why we (INBO) include all these people as creators of a dataset, so they are included in the (IPT) citation. The metadata editor field is superfluous for us, because we already include those people as creators.

I’d rather have one list of people (contributors), that are all included as EML creators and who are all included as authors (cf. GBIF citation). That way names don’t have to be repeated too.

The IPT could still offer to indicate roles for those contributors (e.g. contact). That can be expressed in EML by listing those people under a specific property for that role (cf. current implementation), but in addition to them being listed as creator. It also provides a way to migrate info in the IPT: make metadata editors, creators and contacts all contributors and remove duplicates.

To acknowledge people (cf. acknowledgement in paper) that should not be included in the citation, use additional parties.

@albenson-usgs
Contributor

@dschigel I believe there is an assumption here that requiring someone to identify themselves as the metadata provider makes them 1) create better metadata and 2) feel more responsibility for the dataset. I am dubious that either of those things are true. The issue is not whether or not it should be an option to include metadata provider, it's whether it should be required.

> I’d rather have one list of people (contributors), that are all included as EML creators and who are all included as authors (cf. GBIF citation). That way names don’t have to be repeated too.

While I agree that having to repeat author information up to three times (contact, creator, metadata provider) is quite tedious (you can copy from the resource contact, but only for the first contact), I don't agree with having only one list of contributors. For some of the projects I help share data for, it is useful to know who processed the data into Darwin Core, but those people don't want to be listed as authors (and the data originators would be frustrated to see that person's name in the citation).

It does seem to me that we need a more flexible way for IPT data managers to select and decide the authorship and order of authors in the citation for the IPT and on GBIF.org.

@mdoering
Member

mdoering commented Dec 1, 2022

Interestingly, in ChecklistBank we have followed DataCite and CSL in listing contributors with an optional note that can express how they contributed, while explicitly excluding them from being cited as authors. I really like the traditional way of separating authors, editors and the publisher (included in the citation string) from a flexible list of other contributors that are not part of the citation string. This way you can control who is part of the citation string, but still attribute others. I know some people prefer to cite each and everyone equally, but I don't think we should require such practices; instead we should leave this to the dataset publisher.

@peterdesmet
Member

Maybe it would be good then to have one list where all people are only listed once, but can be assigned multiple roles. Someone with author role is included in the citation.
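A sketch of that single-list model: each person appears once with a set of roles. The author string is derived from the people holding an `author` role, in list order, and per-role lists (e.g. for EML `contact` or `metadataProvider`) are derived from the same records. The role names and the `Person` type are illustrative assumptions. The "processor" role stands for someone like the Darwin Core data processor mentioned above, who is acknowledged but not cited.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    roles: set[str] = field(default_factory=set)  # e.g. {"author", "contact"}


def citation_authors(people: list[Person]) -> list[str]:
    """Only people with the 'author' role are cited, in the order listed."""
    return [p.name for p in people if "author" in p.roles]


def people_with_role(people: list[Person], role: str) -> list[str]:
    """Derive a per-role list, e.g. for EML contact or metadataProvider."""
    return [p.name for p in people if role in p.roles]


people = [
    Person("Author A", {"author", "contact"}),
    Person("Curator C", {"processor"}),  # acknowledged, but not in the citation
    Person("Author B", {"author", "metadataProvider"}),
]
print(citation_authors(people))
print(people_with_role(people, "metadataProvider"))
```

Names are entered once, so repetition across contact, creator and metadata provider disappears, while the publisher still decides who is cited.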

mike-podolskiy90 added a commit that referenced this issue Dec 15, 2022
- include metadata providers
- allow agents with lastname only
- punctuation
@mike-podolskiy90
Contributor

Further discussion continues in #1917
