duplicate nodes in cff file #37

Closed
dpprdan opened this issue Aug 10, 2022 · 8 comments

@dpprdan
Member

dpprdan commented Aug 10, 2022

When creating a CFF file from a DESCRIPTION file with cff_create() (either locally from an in-development package or from a path, see below), some nodes are duplicated under the preferred-citation node, e.g. keywords, license, contact or abstract.

cffr::cff_create(system.file("DESCRIPTION", package="jsonlite"), dependencies = FALSE)
#> cff-version: 1.2.0
#> message: 'To cite package "jsonlite" in publications use:'
#> type: software
#> license: MIT
#> title: 'jsonlite: A Simple and Robust JSON Parser and Generator for R'
#> version: 1.8.0
#> abstract: A reasonably fast JSON parser and generator, optimized for statistical data
#>   and the web. Offers simple, flexible tools for working with JSON in R, and is particularly
#>   powerful for building pipelines and interacting with a web API. The implementation
#>   is based on the mapping described in the vignette (Ooms, 2014). In addition to converting
#>   JSON data from/to R objects, 'jsonlite' contains functions to stream, validate,
#>   and prettify JSON data. The unit tests included with the package verify that all
#>   edge cases are encoded and decoded consistently for use with dynamic data in systems
#>   and applications.
#> authors:
#> - family-names: Ooms
#>   given-names: Jeroen
#>   email: jeroen@berkeley.edu
#>   orcid: https://orcid.org/0000-0002-4035-0289
#> preferred-citation:
#>   type: manual
#>   title: 'jsonlite: A Simple and Robust JSON Parser and Generator for R'
#>   authors:
#>   - family-names: Ooms
#>     given-names: Jeroen
#>     email: jeroen@berkeley.edu
#>     orcid: https://orcid.org/0000-0002-4035-0289
#>   version: 1.8.0
#>   abstract: A reasonably fast JSON parser and generator, optimized for statistical
#>     data and the web. Offers simple, flexible tools for working with JSON in R, and
#>     is particularly powerful for building pipelines and interacting with a web API.
#>     The implementation is based on the mapping described in the vignette (Ooms, 2014).
#>     In addition to converting JSON data from/to R objects, 'jsonlite' contains functions
#>     to stream, validate, and prettify JSON data. The unit tests included with the
#>     package verify that all edge cases are encoded and decoded consistently for use
#>     with dynamic data in systems and applications.
#>   repository: https://CRAN.R-project.org/package=jsonlite
#>   repository-code: https://github.com/jeroen/jsonlite
#>   url: https://arxiv.org/abs/1403.2805
#>   date-released: '2022-02-22'
#>   contact:
#>   - family-names: Ooms
#>     given-names: Jeroen
#>     email: jeroen@berkeley.edu
#>     orcid: https://orcid.org/0000-0002-4035-0289
#>   keywords:
#>   - json
#>   - parser
#>   - r
#>   - rstats
#>   license: MIT
#>   year: '2022'
#> repository: https://CRAN.R-project.org/package=jsonlite
#> repository-code: https://github.com/jeroen/jsonlite
#> url: https://arxiv.org/abs/1403.2805
#> date-released: '2022-02-22'
#> contact:
#> - family-names: Ooms
#>   given-names: Jeroen
#>   email: jeroen@berkeley.edu
#>   orcid: https://orcid.org/0000-0002-4035-0289
#> keywords:
#> - json
#> - parser
#> - r
#> - rstats
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23 ucrt)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language en
#>  collate  German_Germany.utf8
#>  ctype    German_Germany.utf8
#>  tz       Europe/Berlin
#>  date     2022-08-10
#>  pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cffr          0.2.2   2022-08-10 [1] Github (ropensci/cffr@b419e44)
#>  cli           3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
#>  desc          1.4.1   2022-03-06 [1] CRAN (R 4.2.0)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
#>  evaluate      0.15    2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi         1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.3   2022-07-18 [1] CRAN (R 4.2.1)
#>  jsonlite      1.8.0   2022-02-22 [1] CRAN (R 4.2.0)
#>  knitr         1.39    2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  pillar        1.8.0   2022-07-18 [1] CRAN (R 4.2.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.1)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.0)
#>  R.utils       2.12.0  2022-06-28 [1] CRAN (R 4.2.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.4   2022-07-12 [1] CRAN (R 4.2.1)
#>  rmarkdown     2.14    2022-04-25 [1] CRAN (R 4.2.0)
#>  rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.8   2022-07-11 [1] CRAN (R 4.2.1)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
#>  styler        1.7.0   2022-03-13 [1] CRAN (R 4.2.0)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.1)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.31    2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5   2022-02-21 [1] CRAN (R 4.2.0)
#> 
#>  [1] C:/Users/Daniel.AK-HAMBURG/AppData/Local/R/win-library/4.2
#>  [2] C:/Program Files/R/R-4.2.1/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

additional examples:

This does not happen when there is a CITATION file in the repo (i.e. it has a proper preferred citation) or when cff_create() is run on a locally installed package.

`cff_create()` run locally, {tidygeocoder} has a CITATION file
> cffr::cff_create(dependencies = FALSE)
cff-version: 1.2.0
message: 'To cite package "tidygeocoder" in publications use:'
type: software
license: MIT
title: 'tidygeocoder: Geocoding Made Easy'
version: 1.0.4.9000
doi: 10.21105/joss.03544
abstract: An intuitive interface for getting data from geocoding services.
authors:
- family-names: Cambon
  given-names: Jesse
  email: jesse.cambon@gmail.com
  orcid: https://orcid.org/0000-0001-6854-1514
- family-names: Hernangómez
  given-names: Diego
  email: diego.hernangomezherrero@gmail.com
  orcid: https://orcid.org/0000-0001-8457-4658
- family-names: Belanger
  given-names: Christopher
  email: christopher.a.belanger@gmail.com
  orcid: https://orcid.org/0000-0003-2070-5721
- family-names: Possenriede
  given-names: Daniel
  email: possenriede+r@gmail.com
  orcid: https://orcid.org/0000-0002-6738-9845
preferred-citation:
  type: article
  title: 'tidygeocoder: An R package for geocoding'
  authors:
  - family-names: Cambon
    given-names: Jesse
    email: jesse.cambon@gmail.com
    orcid: https://orcid.org/0000-0001-6854-1514
  - family-names: Hernangómez
    given-names: Diego
    email: diego.hernangomezherrero@gmail.com
    orcid: https://orcid.org/0000-0001-8457-4658
  - family-names: Belanger
    given-names: Christopher
    email: christopher.a.belanger@gmail.com
    orcid: https://orcid.org/0000-0003-2070-5721
  - family-names: Possenriede
    given-names: Daniel
    email: possenriede+r@gmail.com
    orcid: https://orcid.org/0000-0002-6738-9845
  doi: 10.21105/joss.03544
  url: https://doi.org/10.21105/joss.03544
  journal: Journal of Open Source Software
  publisher:
    name: The Open Journal
  year: '2021'
  volume: '6'
  issue: '65'
  notes: R package version 1.0.4
  start: '3544'
repository: https://CRAN.R-project.org/package=tidygeocoder
repository-code: https://github.com/jessecambon/tidygeocoder
url: https://jessecambon.github.io/tidygeocoder/
contact:
- family-names: Cambon
  given-names: Jesse
  email: jesse.cambon@gmail.com
  orcid: https://orcid.org/0000-0001-6854-1514
keywords:
- geocoding
- r
- rspatial
- rstats
- tidyverse
`cff_create()` an installed package
cffr::cff_create("jsonlite", dependencies = FALSE)
#> cff-version: 1.2.0
#> message: 'To cite package "jsonlite" in publications use:'
#> type: software
#> license: MIT
#> title: 'jsonlite: A Simple and Robust JSON Parser and Generator for R'
#> version: 1.8.0
#> abstract: A reasonably fast JSON parser and generator, optimized for statistical data
#>   and the web. Offers simple, flexible tools for working with JSON in R, and is particularly
#>   powerful for building pipelines and interacting with a web API. The implementation
#>   is based on the mapping described in the vignette (Ooms, 2014). In addition to converting
#>   JSON data from/to R objects, 'jsonlite' contains functions to stream, validate,
#>   and prettify JSON data. The unit tests included with the package verify that all
#>   edge cases are encoded and decoded consistently for use with dynamic data in systems
#>   and applications.
#> authors:
#> - family-names: Ooms
#>   given-names: Jeroen
#>   email: jeroen@berkeley.edu
#>   orcid: https://orcid.org/0000-0002-4035-0289
#> preferred-citation:
#>   type: article
#>   title: 'The jsonlite Package: A Practical and Consistent Mapping Between JSON Data
#>     and R Objects'
#>   authors:
#>   - family-names: Ooms
#>     given-names: Jeroen
#>     email: jeroen@berkeley.edu
#>     orcid: https://orcid.org/0000-0002-4035-0289
#>   journal: arXiv:1403.2805 [stat.CO]
#>   year: '2014'
#>   url: https://arxiv.org/abs/1403.2805
#> repository: https://CRAN.R-project.org/package=jsonlite
#> repository-code: https://github.com/jeroen/jsonlite
#> url: https://arxiv.org/abs/1403.2805
#> date-released: '2022-02-22'
#> contact:
#> - family-names: Ooms
#>   given-names: Jeroen
#>   email: jeroen@berkeley.edu
#>   orcid: https://orcid.org/0000-0002-4035-0289
#> keywords:
#> - json
#> - parser
#> - r
#> - rstats

Have you thought about omitting the preferred citation altogether if there is no CITATION file? I think a preferred citation that effectively just duplicates the info from the main entry doesn't add much value.

@dieghernan
Member

Hi @dpprdan

Thanks for the feedback. Indeed, the duplication is intended, for a single reason: preferred-citation includes a field, type, that roughly corresponds to the BibTeX entry types.

In short: adding preferred-citation, even with duplicated fields, ensures that GitHub renders a valid BibTeX entry for the repo, similar to what happens when using R's citation() on a package without an inst/CITATION file.

If no preferred-citation is present, GitHub renders the BibTeX entry of the repo as something like:

@software{Lisa_My_Research_Software_2017,
  author = {Lisa, Mona and Bot, Hew},
  doi = {10.5281/zenodo.1234},
  month = {12},
  title = {{My Research Software}},
  url = {https://github.com/github/linguist},
  version = {2.0.4},
  year = {2017}
}

@software is in fact not a recognised entry type in the standard BibTeX format (maybe it is in BibLaTeX?). So, for GitHub to produce a valid BibTeX entry, a preferred-citation with the corresponding type has to be added, which leads to a duplication of fields in the CITATION.cff. From https://docs.github.com/es/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files#citing-something-other-than-software:

If you would prefer the GitHub citation information to link to another resource such as a research article, then you can use the preferred-citation override in CFF with the following types.

So this cff:


cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Lisa"
  given-names: "Mona"
  orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Bot"
  given-names: "Hew"
  orcid: "https://orcid.org/0000-0000-0000-0000"
title: "My Research Software"
version: 2.0.4
doi: 10.5281/zenodo.1234
date-released: 2017-12-18
url: "https://github.com/github/linguist"
preferred-citation:
  type: article
  authors:
  - family-names: "Lisa"
    given-names: "Mona"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Bot"
    given-names: "Hew"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  doi: "10.0000/00000"
  journal: "Journal Title"
  month: 9
  start: 1 # First page number
  end: 10 # Last page number
  title: "My awesome research software"
  issue: 1
  volume: 1
  year: 2021

renders on GitHub as

@article{Lisa_My_awesome_research_2021,
  author = {Lisa, Mona and Bot, Hew},
  doi = {10.0000/00000},
  journal = {Journal Title},
  month = {9},
  number = {1},
  pages = {1--10},
  title = {{My awesome research software}},
  volume = {1},
  year = {2021}
}

Hope this provides an understanding of why preferred-citation is always created by cffr.

@dpprdan
Member Author

dpprdan commented Aug 22, 2022

Thanks @dieghernan for the explanation!

First off, my main point (which, after reading my post again, may not have been entirely clear) was that there are some duplicate nodes under preferred-citation that are superfluous, IMHO. Put differently, you get different preferred-citation entries when you run cff_create() on a local DESCRIPTION file than when you run it on a locally installed package.

cffr::cff_create(system.file("DESCRIPTION", package="cli"), dependencies = FALSE)
#> cff-version: 1.2.0
#> message: 'To cite package "cli" in publications use:'
#> type: software
#> license: MIT
#> title: 'cli: Helpers for Developing Command Line Interfaces'
#> version: 3.3.0
#> abstract: 'A suite of tools to build attractive command line interfaces (''CLIs''),
#>   from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom
#>   themes via a ''CSS''-like language. It also contains a number of lower level ''CLI''
#>   elements: rules, boxes, trees, and ''Unicode'' symbols with ''ASCII'' alternatives.
#>   It support ANSI colors and text styles as well.'
#> authors:
#> - family-names: Csárdi
#>   given-names: Gábor
#>   email: csardi.gabor@gmail.com
#> preferred-citation:
#>   type: manual
#>   title: 'cli: Helpers for Developing Command Line Interfaces'
#>   authors:
#>   - family-names: Csárdi
#>     given-names: Gábor
#>     email: csardi.gabor@gmail.com
#>   version: 3.3.0
#>   abstract: 'A suite of tools to build attractive command line interfaces (''CLIs''),
#>     from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom
#>     themes via a ''CSS''-like language. It also contains a number of lower level ''CLI''
#>     elements: rules, boxes, trees, and ''Unicode'' symbols with ''ASCII'' alternatives.
#>     It support ANSI colors and text styles as well.'
#>   repository: https://CRAN.R-project.org/package=cli
#>   repository-code: https://github.com/r-lib/cli
#>   url: https://cli.r-lib.org
#>   identifiers:
#>   - type: url
#>     value: https://github.com/r-lib/cli#readme
#>   date-released: '2022-04-25'
#>   contact:
#>   - family-names: Csárdi
#>     given-names: Gábor
#>     email: csardi.gabor@gmail.com
#>   keywords:
#>   - cli
#>   - r
#>   license: MIT
#>   year: '2022'
#> repository: https://CRAN.R-project.org/package=cli
#> repository-code: https://github.com/r-lib/cli
#> url: https://cli.r-lib.org
#> date-released: '2022-04-25'
#> contact:
#> - family-names: Csárdi
#>   given-names: Gábor
#>   email: csardi.gabor@gmail.com
#> keywords:
#> - cli
#> - r
#> identifiers:
#> - type: url
#>   value: https://github.com/r-lib/cli#readme
cffr::cff_create("cli", dependencies = FALSE)
#> cff-version: 1.2.0
#> message: 'To cite package "cli" in publications use:'
#> type: software
#> license: MIT
#> title: 'cli: Helpers for Developing Command Line Interfaces'
#> version: 3.3.0
#> abstract: 'A suite of tools to build attractive command line interfaces (''CLIs''),
#>   from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom
#>   themes via a ''CSS''-like language. It also contains a number of lower level ''CLI''
#>   elements: rules, boxes, trees, and ''Unicode'' symbols with ''ASCII'' alternatives.
#>   It support ANSI colors and text styles as well.'
#> authors:
#> - family-names: Csárdi
#>   given-names: Gábor
#>   email: csardi.gabor@gmail.com
#> preferred-citation:
#>   type: manual
#>   title: 'cli: Helpers for Developing Command Line Interfaces'
#>   authors:
#>   - family-names: Csárdi
#>     given-names: Gábor
#>     email: csardi.gabor@gmail.com
#>   year: '2022'
#>   notes: R package version 3.3.0
#>   url: https://CRAN.R-project.org/package=cli
#> repository: https://CRAN.R-project.org/package=cli
#> repository-code: https://github.com/r-lib/cli
#> url: https://cli.r-lib.org
#> date-released: '2022-04-25'
#> contact:
#> - family-names: Csárdi
#>   given-names: Gábor
#>   email: csardi.gabor@gmail.com
#> keywords:
#> - cli
#> - r
#> identifiers:
#> - type: url
#>   value: https://github.com/r-lib/cli#readme

I think both should yield the same preferred-citation node, namely the shorter second one.

The above only applies when there is no `CITATION` file found.
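
To make the difference between the two outputs explicit, here is a minimal sketch in base R that compares only the key names. The two vectors are copied by hand from the preferred-citation nodes shown above; the variable names are made up for illustration. With real results one could presumably compare names(x$`preferred-citation`) instead, assuming the cff object behaves like a named list (as the `$` access used elsewhere in this thread suggests).

```r
# Key names copied from the two `preferred-citation` outputs for cli above.
# (Variable names are hypothetical, chosen only for this illustration.)
keys_from_description <- c(
  "type", "title", "authors", "version", "abstract", "repository",
  "repository-code", "url", "identifiers", "date-released", "contact",
  "keywords", "license", "year"
)
keys_from_installed <- c("type", "title", "authors", "year", "notes", "url")

# Keys that only appear when cff_create() is given the DESCRIPTION path
extra_keys <- setdiff(keys_from_description, keys_from_installed)

# Keys that only appear when cff_create() is given the package name
missing_keys <- setdiff(keys_from_installed, keys_from_description)

print(extra_keys)
print(missing_keys)
```

The first vector lists the keys considered superfluous here; the second shows what the shorter variant carries that the longer one lacks.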

BTW, the jsonlite example was a particularly bad one, because it has a CITATION file. It just isn't found when running cffr::cff_create(system.file("DESCRIPTION", package="jsonlite")), because the inst folder does not exist for installed packages. In other words: the CITATION file is in system.file("CITATION", package="jsonlite"), not system.file("inst/CITATION", package="jsonlite") and therefore, I presume, not found by cff_create().

End-of-main-issue

Only then I thought "why is there a preferred-citation when there is no CITATION file?".

I gather now that GitHub renders a @software entry type when there is no preferred-citation and that this is not a valid Bibtex entry type. And the rationale for adding a @manual preferred-citation is to create a valid Bibtex entry type.

So I wondered "why would GitHub create an invalid Bibtex entry type in the first place? Shouldn't this be fixed on their end, then?" Turns out that the ruby-cff gem that GitHub uses to parse the CITATION.cff and create the other citation formats, switched from @misc to @software. They also worried about Bibtex compatibility, but the gist is this: 1. biblatex supports @software (see also section "2.1.1 Regular Types" in the biblatex manual) and 2. Bibtex falls back on the @misc entry type if it encounters a @software entry. So it seems to me that overriding this with a @manual entry is not necessary.

After all, it is primarily the software that is usually used and should be cited, not the documentation. Uh, strike that, that's just silly. Given what I learned from the ruby-cff issue I'd omit the preferred-citation when there is no CITATION file. But I might as well be missing something else and I certainly don't feel strongly about it. I'd prefer if the unnecessary keys (IMO) in preferred-citation (keywords, contact, abstract and so forth) were omitted, though, so that both ways of creating the CFF file would render the same output.

@dieghernan
Member

So I see here two issues:

1. Should cffr create a preferred-citation if the user didn't provide a CITATION (R) file? And how?

This is related with your comment:

I'd omit the preferred-citation when there is no CITATION file. But I might as well be missing something else and I certainly don't feel strongly about it

I would prefer to create it even in the case that no CITATION is provided. My main points are:

  1. A citation is created with R even if no CITATION is present in the package. So I think this would be consistent with the experience of an R developer:
file.exists(system.file("CITATION", package = "cli"))
#> [1] FALSE
toBibtex(citation("cli"))
#> @Manual{,
#>   title = {cli: Helpers for Developing Command Line Interfaces},
#>   author = {Gábor Csárdi},
#>   year = {2022},
#>   note = {R package version 3.3.0},
#>   url = {https://CRAN.R-project.org/package=cli},
#> }
  2. Regardless of how GitHub would render this, CITATION.cff is meant to be both human- and machine-readable. I think it is useful to have an explicit preferred-citation in the file so a human can easily spot the basic information about how to cite a package.

And how?

I see your point here and I agree with this other comment:

I'd prefer if the unnecessary keys (IMO) in preferred-citation (keywords, contact, abstract and so forth) were omitted, though, so that both ways of creating the CFF file would render the same output.

So this call:

> cffr::cff_create("cli")$`preferred-citation`
type: manual
title: 'cli: Helpers for Developing Command Line Interfaces'
authors:
- family-names: Csárdi
  given-names: Gábor
  email: csardi.gabor@gmail.com
year: '2022'
notes: R package version 3.3.0
url: https://CRAN.R-project.org/package=cli

And this one should lead to the same results (i.e. the shorter version above):

> cffr::cff_create(system.file("DESCRIPTION", package = "cli"))$`preferred-citation`
type: manual
title: 'cli: Helpers for Developing Command Line Interfaces'
authors:
- family-names: Csárdi
  given-names: Gábor
  email: csardi.gabor@gmail.com
version: 3.3.0
abstract: 'A suite of tools to build attractive command line interfaces (''CLIs''),
  from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom
  themes via a ''CSS''-like language. It also contains a number of lower level ''CLI''
  elements: rules, boxes, trees, and ''Unicode'' symbols with ''ASCII'' alternatives.
  It support ANSI colors and text styles as well.'
repository: https://CRAN.R-project.org/package=cli
repository-code: https://github.com/r-lib/cli
url: https://cli.r-lib.org
identifiers:
- type: url
  value: https://github.com/r-lib/cli#readme
date-released: '2022-04-25'
contact:
- family-names: Csárdi
  given-names: Gábor
  email: csardi.gabor@gmail.com
keywords:
- cli
- r
license: MIT
year: '2022'

2. CITATION files detection

You were also right on this:

the jsonlite example was a particularly bad one, because it has a CITATION file. It just isn't found when running cffr::cff_create(system.file("DESCRIPTION", package="jsonlite")), because the inst folder does not exist for installed packages. In other words: the CITATION file is in system.file("CITATION", package="jsonlite"), not system.file("inst/CITATION", package="jsonlite") and therefore, I presume, not found by cff_create()

This needs to be changed; not a problem.
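
As a rough illustration of the lookup discussed here (not cffr's actual code), a base-R sketch could check both locations relative to the folder that holds DESCRIPTION: CITATION next to it (installed package) and inst/CITATION (in-development package). The helper names find_citation_file() and make_pkg() are hypothetical, and the throwaway folders only mimic the two layouts.

```r
# Hypothetical helper, not part of cffr: given the path to a DESCRIPTION file,
# return the first CITATION file found next to it or under inst/, else NULL.
find_citation_file <- function(desc_path) {
  pkg_dir <- dirname(desc_path)
  candidates <- file.path(
    pkg_dir,
    c("CITATION", file.path("inst", "CITATION"))
  )
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) NULL else found[[1]]
}

# Hypothetical helper for the demo: build a throwaway package folder with an
# empty DESCRIPTION and, optionally, an empty CITATION at the given location.
make_pkg <- function(citation_rel = NULL) {
  pkg_dir <- tempfile("pkg_")
  dir.create(pkg_dir)
  file.create(file.path(pkg_dir, "DESCRIPTION"))
  if (!is.null(citation_rel)) {
    target <- file.path(pkg_dir, citation_rel)
    dir.create(dirname(target), recursive = TRUE, showWarnings = FALSE)
    file.create(target)
  }
  file.path(pkg_dir, "DESCRIPTION")
}

installed_desc <- make_pkg("CITATION")                     # installed layout
source_desc <- make_pkg(file.path("inst", "CITATION"))     # source layout
bare_desc <- make_pkg()                                    # no CITATION at all

print(find_citation_file(installed_desc))
print(find_citation_file(source_desc))
print(find_citation_file(bare_desc))
```

Checking the installed layout first is an arbitrary choice in this sketch; either order would work as long as both locations are covered.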

@dpprdan
Member Author

dpprdan commented Aug 25, 2022

Should cffr create a preferred-citation if the user didn't provide a CITATION (R) file?

I am not on board yet (but I don't have to 😄, just giving another perspective)

These are my reasons:

  1. It violates the CFF schema.
    According to the Guide to Citation File Format schema version 1.2.0 a preferred-citation is supposed to be:

    A reference to another work that should be cited instead of the software or dataset itself.
    (Emphasis mine)

    So this is the equivalent of the CITATION file in R, really.
    In any case it should be another work, not the package itself.
    Does that matter? Well, as with any schema violation it can cause unexpected behaviour downstream, e.g. in other parsers of the created CFF files.

  2. A preferred-citation as a summary may not be unambiguously positive.

    I think it is useful to have an explicit preferred-citation in the file so a human can easily spot the basic information about how to cite a package.

    I see some merit in that, but this is essentially just a summary of the other info in the CFF, which, arguably, is just as easily readable. A valid counter-argument would be that this is just clutter and a more succinct CFF file would be better. (Not necessarily my stance, but just as valid, IMO). E.g. using preferred-citation like this may confuse users, especially those who expect the preferred-citation to be another work. ("Wait, isn't this the same?") In addition it may cause bugs like the main one here.

Finally, regarding

A citation is created with R even if no CITATION is present in the package.

The point of {cffr} is to create a CITATION.cff file. So there really is no equivalent to "no CITATION present", because we now have a CITATION.cff, which has all citation info with or without preferred-citation (if there is no other work we'd prefer to get cited, of course).

@dieghernan
Member

dieghernan commented Aug 26, 2022

Good points:

From the CFF perspective I agree with all of that. And your last point also seems very valid to me:

The point of {cffr} is to create a CITATION.cff file.

So I would switch cffr to create preferred-citation only if the package has a CITATION file.

There is another point, then, on which I would like to gather your opinion: some installed packages (namely cli) may not have a CITATION file, but citation("cli") still produces an auto-citation that is transferred to preferred-citation. So the question is, should cff_create("cli") produce a preferred-citation? I mention this because cff_create(system.file("DESCRIPTION", package = "cli")) won’t produce it.

Thanks for your feedback

@dpprdan
Member Author

dpprdan commented Aug 26, 2022

some installed packages (namely cli) may not have a CITATION file but citation("cli") still produces an auto-citation that is transferred to preferred-citation.

Isn't this the default behaviour, i.e. that auto-citations are always produced if no CITATION file is present?

From the citation() help file:

If the name of a non-base package is given, the function either returns the information contained in the ‘CITATION’ file of the package [...] or auto-generates citation information from the ‘DESCRIPTION’ file.

Do you have an example where there is no CITATION file and no auto-citation info is produced by citation()?

So the question is, should cff_create("cli") produce a preferred-citation?

I'd say it shouldn't. Because this is the one derived from the info in DESCRIPTION, which cff_create() already uses for the main CFF entries (main = not preferred-citation).

I mentioned this because cff_create(system.file("DESCRIPTION", package = "cli")) won’t produce it.

I am seeing (with cffr 0.2.2)

cli_cff <- cffr::cff_create(system.file("DESCRIPTION", package="cli"))

cli_cff$`preferred-citation`
#> type: manual
#> title: 'cli: Helpers for Developing Command Line Interfaces'
#> authors:
#> - family-names: Csárdi
#>   given-names: Gábor
#>   email: csardi.gabor@gmail.com
#> version: 3.3.0
#> abstract: 'A suite of tools to build attractive command line interfaces (''CLIs''),
#>   from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom
#>   themes via a ''CSS''-like language. It also contains a number of lower level ''CLI''
#>   elements: rules, boxes, trees, and ''Unicode'' symbols with ''ASCII'' alternatives.
#>   It support ANSI colors and text styles as well.'
#> repository: https://CRAN.R-project.org/package=cli
#> repository-code: https://github.com/r-lib/cli
#> url: https://cli.r-lib.org
#> identifiers:
#> - type: url
#>   value: https://github.com/r-lib/cli#readme
#> date-released: '2022-04-25'
#> contact:
#> - family-names: Csárdi
#>   given-names: Gábor
#>   email: csardi.gabor@gmail.com
#> keywords:
#> - cli
#> - r
#> license: MIT
#> year: '2022'

But like I said, I think there shouldn't be a preferred-citation in this case.

@dieghernan
Member

I think #38 is ready.

Basically, preferred-citation is created only if a CITATION (for installed packages) or inst/CITATION (for in-development packages) is detected:

# devtools::install()
library(cffr)

packageVersion("cffr")
#> [1] '0.2.3.9000'

# Package with CITATION file

e1 <- cff_create("jsonlite")
cff_validate(e1)
#> 
#> cff_validate results-----
#> Congratulations! This cff object is valid
e1$`preferred-citation`
#> type: article
#> title: 'The jsonlite Package: A Practical and Consistent Mapping Between JSON Data
#>   and R Objects'
#> authors:
#> - family-names: Ooms
#>   given-names: Jeroen
#>   email: jeroen@berkeley.edu
#>   orcid: https://orcid.org/0000-0002-4035-0289
#> journal: arXiv:1403.2805 [stat.CO]
#> year: '2014'
#> url: https://arxiv.org/abs/1403.2805


# Now using system.file
e2 <- cff_create(system.file("DESCRIPTION", package = "jsonlite"))
identical(e1, e2)
#> [1] TRUE

# Package without CITATION file

e3 <- cff_create("cli")
cff_validate(e3)
#> 
#> cff_validate results-----
#> Congratulations! This cff object is valid
e3$`preferred-citation`
#> NULL


# Now using system.file
e4 <- cff_create(system.file("DESCRIPTION", package = "cli"))
identical(e3, e4)
#> [1] TRUE


# Dev package with CITATION file
e5 <- cff_create(dependencies = FALSE)
e5$`preferred-citation`
#> type: article
#> title: 'cffr: Generate Citation File Format Metadata for R Packages'
#> authors:
#> - family-names: Hernangómez
#>   given-names: Diego
#>   email: diego.hernangomezherrero@gmail.com
#>   orcid: https://orcid.org/0000-0001-8457-4658
#> doi: 10.21105/joss.03900
#> url: https://doi.org/10.21105/joss.03900
#> year: '2021'
#> publisher:
#>   name: The Open Journal
#> volume: '6'
#> issue: '67'
#> journal: Journal of Open Source Software
#> start: '3900'

Created on 2022-08-29 with reprex v2.0.2

@dpprdan
Member Author

dpprdan commented Aug 29, 2022

Awesome, thanks a lot!

dieghernan added a commit that referenced this issue Aug 29, 2022
* Improve preferred-citation

* Create preferred-citation only if a CITATION file is detected

#37

* Update docs with pkgdev

* Fix Roxygen warning

Co-authored-by: dieghernan <dieghernan@users.noreply.github.com>
@dpprdan dpprdan closed this as completed Aug 29, 2022