Reproducible bibliographic references #128

Open
antaldaniel opened this issue Nov 10, 2018 · 18 comments
@antaldaniel
Contributor

antaldaniel commented Nov 10, 2018

I am thinking about a way to create .bib files for the data downloaded by the eurostat package. I have code that downloads my most important data and updates the .bib files that cite it (i.e. the date the data was accessed), but it is not a fully general solution.

I use the following template and add the result to a collected .bib file:

@misc{eurostat_sbs_na_dt_r2_year,
  title = {Annual detailed enterprise statistics for trade {(NACE Rev. 2 G)} [sbs_na_dt_r2]},
  url = {https://ec.europa.eu/eurostat/web/products-datasets/-/sbs_na_dt_r2},
  language = {en},
  year = {year},
  urldate = {not_dated},
  publisher = {{Eurostat}},
  author = {{Eurostat}},
  keywords = {structural business indicators, dataset, statistics, Eurostat}
}

I change the statistics product code sbs_na_dt_r2 in the unique identifier, use the current date for urldate, and replace the year with the year component of the download date.
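The substitution steps described above could be sketched in base R roughly as follows. The helper name `make_eurostat_bib()` and its arguments are hypothetical and not part of the eurostat package. The title is passed in by hand here rather than looked up, and the keywords field is left out for brevity.

```r
# Hypothetical helper (not part of the eurostat package): fills the template
# above for one product code. The title must be supplied by the caller.
make_eurostat_bib <- function(code, title, access_date = Sys.Date()) {
  access_date <- as.Date(access_date)
  year <- format(access_date, "%Y")  # year component of the download date
  c(
    sprintf("@misc{eurostat_%s_%s,", code, year),
    sprintf("  title = {%s [%s]},", title, code),
    sprintf("  url = {https://ec.europa.eu/eurostat/web/products-datasets/-/%s},", code),
    "  language = {en},",
    sprintf("  year = {%s},", year),
    sprintf("  urldate = {%s},", format(access_date, "%Y-%m-%d")),
    "  publisher = {{Eurostat}},",
    "  author = {{Eurostat}}",
    "}"
  )
}

bib <- make_eurostat_bib(
  code = "sbs_na_dt_r2",
  title = "Annual detailed enterprise statistics for trade {(NACE Rev. 2 G)}",
  access_date = "2018-11-10"  # fixed date so the example is reproducible
)
writeLines(bib)
```

The lines could then be appended to a collected .bib file, for example with `cat(bib, file = "eurostat.bib", sep = "\n", append = TRUE)`, where the file name is again only an example.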

I think that the title could be created by get_eurostat_dic, but I have no idea how to create a URL to the data. Is there any metadata directory that could be used to create a permanent reference, either to a reproducible download address or to a metadata description?

I think that in the spirit of truly reproducible research, it would be reasonable to update not only the Eurostat statistics in an RMarkdown document, but also the details of the .bib file. I had the misfortune that Eurostat completely removed an earlier data product, so I think that full documentation would be good.

Of course, I just used a simple bib template from Zotero, but maybe following some DataCite metadata best practices could help. I'd gladly create a new function if somebody can point me in the right direction on the URL issue.

@antagomir
Member

This is a really neat idea. @jhuovari and @pbiecek are more familiar with this part of the pkg, let's first see if they have a comment.

@jhuovari

I think you get the title best with:

label_eurostat_tables("sbs_na_dt_r2")

You can build the URL to the data from the identifier.
Bulk data is in:
https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/sbs_na_dt_r2.tsv.gz

But I think a more user friendly link could be:
https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=sbs_na_dt_r2&lang=en

Hope this helps.
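As a small sketch, both links could be derived from the product code like this. The helper name `eurostat_urls()` is hypothetical, and the URL patterns are simply the two quoted in this comment, so they may stop working if Eurostat changes its site.

```r
# Hypothetical helper: builds the two links mentioned above from a product code.
# The URL patterns are taken from this thread and are not guaranteed to be stable.
eurostat_urls <- function(code) {
  c(
    bulk = paste0(
      "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/",
      "BulkDownloadListing?file=data/", code, ".tsv.gz"
    ),
    viewer = paste0(
      "https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=",
      code, "&lang=en"
    )
  )
}

urls <- eurostat_urls("sbs_na_dt_r2")
print(urls)
```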

@antaldaniel
Contributor Author

Yes, I came to the same conclusion. I am just wondering if there are exceptions to the

https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=sbs_na_dt_r2&lang=en

link. So far I have not seen any dataset that would not open this way, and if that holds the task is very easy. I'll create a pull request later this week.

@pbiecek
Member

pbiecek commented Nov 12, 2018

Cool idea.
What about a function like this:

get_bibentry <- function(code = "t2020_rk310", toBibtex = FALSE) {
  # Look up the dataset in the Eurostat table of contents
  toc <- get_eurostat_toc()
  toc <- toc[toc$code == code, ]

  if (nrow(toc) == 0) {
    warning(paste0("Code ", code, " not found"))
    return()
  }

  entry <- bibentry(
    bibtype = "misc",
    title = paste0(toc$title[1], " [", code, "]"),
    url = paste0("https://ec.europa.eu/eurostat/web/products-datasets/-/", code),
    language = "en",
    year = paste0(toc$`last update of data`[1]),
    publisher = "Eurostat",
    author = "Eurostat"
  )
  # Return BibTeX markup on request, otherwise the bibentry object.
  # utils:: makes clear that the function is meant, not the logical argument.
  if (toBibtex) {
    utils::toBibtex(entry)
  } else {
    entry
  }
}

Then you can do things like this:

> get_bibentry("sbs_na_dt_r2")
Eurostat (12.11.2018). “Annual detailed enterprise statistics for trade
(NACE Rev. 2 G) [sbs_na_dt_r2].” <URL:
https://ec.europa.eu/eurostat/web/products-datasets/-/sbs_na_dt_r2>.

> get_bibentry("sbs_na_dt_r2", toBibtex = TRUE)
@Misc{,
  title = {Annual detailed enterprise statistics for trade (NACE Rev. 2 G) [sbs_na_dt_r2]},
  url = {https://ec.europa.eu/eurostat/web/products-datasets/-/sbs_na_dt_r2},
  language = {en},
  year = {12.11.2018},
  publisher = {Eurostat},
  author = {{Eurostat}},
}

@antagomir
Member

Beautiful. How about replacing the "toBibtex" argument with "format" (or similar)? The calls would then become get_bibentry("sbs_na_dt_r2", format = "bibtex") or get_bibentry("sbs_na_dt_r2", format = "plaintext"). Later it would be possible to add other formats (RIS etc.) if the need arises.
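A minimal sketch of such a `format` argument, using only base R and leaving out the table-of-contents lookup. The helper name `format_reference()` and the exact set of format names are assumptions for illustration, not the package's API.

```r
# Hypothetical helper: converts an existing bibentry into the requested format.
format_reference <- function(entry,
                             format = c("bibentry", "bibtex", "plaintext")) {
  format <- match.arg(format)  # rejects unsupported formats with an error
  switch(format,
    bibentry  = entry,
    bibtex    = utils::toBibtex(entry),
    plaintext = base::format(entry, style = "text")
  )
}

entry <- utils::bibentry(
  bibtype = "misc",
  title = "Annual detailed enterprise statistics for trade (NACE Rev. 2 G) [sbs_na_dt_r2]",
  url = "https://ec.europa.eu/eurostat/web/products-datasets/-/sbs_na_dt_r2",
  year = "2018",
  author = "Eurostat"
)

bibtex_out <- format_reference(entry, format = "bibtex")
plain_out <- format_reference(entry, format = "plaintext")
```

With `match.arg()`, adding another format later only means extending the default vector and the `switch()`.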

@antaldaniel
Contributor Author

antaldaniel commented Nov 13, 2018

Very nice, and much simpler than I thought. I was trying to figure out how the URL changes in the interactive data viewer, but your solution is far better and more elegant.

I'd probably add optional keywords and the URL date. The keywords could be a parameter of the function, given as a vector, with some default like c("Eurostat", "statistics", "dataset")

# Optional keywords, collapsed into a single comma-separated field
if (length(keywords) > 1) {
  keywords <- paste0('{', paste(keywords, collapse = ', '), '}')
}

# Date on which the URL was accessed
urldate <- paste0('{', as.character(Sys.Date()), '}')

# Proposed unique key (not yet passed on to bibentry)
paste0("@misc_eurostat_", code, "_", substr(as.character(Sys.Date()), 1, 4))

entry <- bibentry(
  bibtype = "misc",
  title = paste0(toc$title[1], " [", code, "]"),
  url = paste0("https://ec.europa.eu/eurostat/web/products-datasets/-/", code),
  language = "en",
  year = paste0(toc$`last update of data`[1]),
  publisher = "Eurostat",
  author = "Eurostat",
  urldate = urldate,
  keywords = keywords
)

I know that the urldate is logically superfluous, but it may be required by many formatting guides.
Furthermore, I wonder how unique identifiers could be added to the bib entries so that they can be used in knitr immediately, which means adding

paste0("@misc_eurostat_", code, "_", substr(as.character(Sys.Date()), 1, 4))

to the bibentry.
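One possible way to attach such an identifier is the `key` argument of `utils::bibentry()`. The sketch below is self-contained and hedged: the title is hard-coded instead of coming from `get_eurostat_toc()`, the key scheme is only an example, and the date is fixed so that the result is reproducible.

```r
code <- "sbs_na_dt_r2"
keywords <- c("Eurostat", "statistics", "dataset")
access_date <- as.Date("2018-11-13")  # stand-in for Sys.Date()

entry <- utils::bibentry(
  bibtype = "misc",
  # Citation key: usable in a document as @eurostat_sbs_na_dt_r2_2018
  key = paste0("eurostat_", code, "_", format(access_date, "%Y")),
  title = paste0(
    "Annual detailed enterprise statistics for trade (NACE Rev. 2 G) [",
    code, "]"
  ),
  url = paste0("https://ec.europa.eu/eurostat/web/products-datasets/-/", code),
  year = format(access_date, "%Y"),
  author = "Eurostat",
  urldate = as.character(access_date),
  keywords = paste(keywords, collapse = ", ")
)

bib <- utils::toBibtex(entry)
print(bib)
```

Note that `toBibtex()` adds the surrounding braces itself, so the field values are passed here without extra `{` and `}`.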

@antagomir antagomir reopened this Nov 14, 2018
@antaldaniel
Contributor Author

antaldaniel commented Jan 17, 2019

My take on the issue. This would depend on the rOpenSci package RefManageR, but creates a Biblatex output that can be attached to a journal article or bookdown book immediately, or imported to Zotero.

My only concern is the comma after the last metadata field; I don't know whether it will cause any issues. Any further comments?

Compared to @pbiecek's function, this adds three extras:

  • use of keywords,
  • creating a unique ID key for Biblatex,
  • three format choices (bibentry, bibtex, biblatex)
get_bibentry <- function(code = c("tran_hv_frtra", "t2020_rk310", "tec00001"),
                         keywords = list(c("railways", "freight", "transport"),
                                         c("railways", "passengers", "modal split")),
                         format = "Biblatex") {

  toc <- get_eurostat_toc()
  toc <- toc[toc$code %in% code, ]
  toc <- toc[!duplicated(toc), ]

  urldate <- as.character(Sys.Date())

  if (nrow(toc) == 0) {
    warning(paste0("Code ", paste(code, collapse = ", "), " not found"))
    return()
  }

  # Unique key per entry: product code plus last update date (dots -> dashes)
  eurostat_id <- paste0(toc$code, "_",
                        gsub("\\.", "-", toc$`last update of data`))

  for (i in seq_len(nrow(toc))) {

    # toc keeps the table-of-contents order, which may differ from the order
    # of `code`, so find the position of this row's code in the argument
    idx <- match(toc$code[i], code)

    # Keywords are optional and may be given for only some of the codes
    keyword_entry <- NULL
    if (!is.null(keywords) && length(keywords) >= idx &&
        length(keywords[[idx]]) > 0) {
      keyword_entry <- paste(keywords[[idx]], collapse = ", ")
    }

    entry <- RefManageR::BibEntry(
      bibtype = "misc",
      key = eurostat_id[i],
      title = paste0(toc$title[i], " [", toc$code[i], "]"),
      url = paste0("https://ec.europa.eu/eurostat/web/products-datasets/-/", toc$code[i]),
      language = "en",
      year = paste0(toc$`last update of data`[i]),
      publisher = "Eurostat",
      author = "Eurostat",
      keywords = keyword_entry,
      urldate = urldate
    )

    if (i > 1) {
      entries <- c(entries, entry)
    } else {
      entries <- entry
    }
  }

  # Convert to the requested output format; any other value returns the
  # BibEntry object unchanged
  if (format == "Bibtex") {
    entries <- toBibtex(entries)
  } else if (format == "Biblatex") {
    entries <- RefManageR::toBiblatex(entries)
  }

  entries
}

@antaldaniel
Contributor Author

I created a pull request with the new function, documentation and unit tests. However, if you can, please take a look at my last comment about the superfluous comma.

@antagomir
Member

Thanks, excellent. Let us try to get this merged asap.

antagomir added a commit that referenced this issue Jan 17, 2019
@pompm

pompm commented May 10, 2019

It seems to me that there is an error in the function get_bibentry in package eurostat (version 3.3.5).
Line 16 of the function code is
code = c("tran_hv_frtra", "t2020_rk310", "tec00001")
which overwrites the code requested by the user.

@antaldaniel
Contributor Author

I just tried it with the default format 'Biblatex' and with the alternatives 'bibentry' and 'Bibtex', and it worked well for me on a Windows computer. Can you somehow reproduce the error?

@pompm

pompm commented May 10, 2019

Hi,
here is an example of my problem.
(If I copy the definition of the function get_bibentry and remove line 16, everything works.)
Marek

> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          3.3
year           2017
month          03
day            06
svn rev        72310
language       R
version.string R version 3.3.3 (2017-03-06)
nickname       Another Canoe
> packageVersion("eurostat")
[1] ‘3.3.5’
> get_bibentry(code="sbs_na_dt_r2")
@misc{tran_hv_frtra_30-04-2019,
title = {Volume of freight transport relative to GDP [tran_hv_frtra]},
url = {https://ec.europa.eu/eurostat/web/products-datasets/-/tran_hv_frtra},
year = {30.04.2019},
publisher = {Eurostat},
author = {{Eurostat}},
month = {kvě},
note = {Last visited on 05/10/2019},
}
@misc{tec00001_08-05-2019,
title = {Gross domestic product at market prices [t2020_rk310]},
url = {https://ec.europa.eu/eurostat/web/products-datasets/-/t2020_rk310},
year = {30.04.2019},
publisher = {Eurostat},
author = {{Eurostat}},
month = {kvě},
note = {Last visited on 05/10/2019},
}

@misc{t2020_rk310_21-03-2019,
title = {Modal split of passenger transport [tec00001]},
url = {https://ec.europa.eu/eurostat/web/products-datasets/-/tec00001},
year = {30.04.2019},
publisher = {Eurostat},
author = {{Eurostat}},
month = {kvě},
note = {Last visited on 05/10/2019},
}

@antaldaniel
Contributor Author

Indeed, a leftover line is hardcoding the codes. Sorry. I will correct it as soon as possible and create a pull request.
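A reduced, hypothetical illustration of this kind of bug (the function names below are made up and this is not the package's actual source): a test assignment left in the body silently replaces whatever the caller passed in.

```r
# Buggy variant: the leftover assignment overrides the user's argument
get_codes_buggy <- function(code = "tec00001") {
  code <- c("tran_hv_frtra", "t2020_rk310", "tec00001")  # leftover test line
  code
}

# Fixed variant: the argument is used as given
get_codes_fixed <- function(code = "tec00001") {
  code
}

buggy <- get_codes_buggy(code = "sbs_na_dt_r2")
fixed <- get_codes_fixed(code = "sbs_na_dt_r2")
```

A unit test that calls the function with a non-default code and checks that the code appears in the result would catch this.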

@antaldaniel
Contributor Author

@pompm thanks for the report! Bibtex and Biblatex entries can be tricky anyway, so let me know if you have other issues using them.

@antagomir
Member

Can we close this one?

@antaldaniel
Contributor Author

Yes, we can close this.

@antagomir
Member

I just got info from CRAN that RefManageR will be deprecated and removed from CRAN on 2020-10-21 due to lack of maintenance. If this happens, this part of the eurostat R package will become defunct.

We can either remove this functionality or implement the necessary parts directly in our package. RefManageR is under a GPL2/3 license, so we could not borrow its code directly without changing the license of the eurostat R package.

@antagomir antagomir reopened this Oct 8, 2020
@antagomir
Member

antagomir commented Oct 17, 2020

@antaldaniel if you have an opinion about this, it would be good to hear it. The deadline is Wednesday (Oct 21).

However, I just noticed that RefManageR also allows the BSD3 license (we have BSD2). I think BSD2 allows us to switch to BSD3 (or even GPL2/3). I think I will just switch to BSD3 and copy the missing functions into our (eurostat) package before RefManageR is deprecated, and then inform all authors about the change. If anyone objects, we can switch back to the BSD2 license and remove the bib functionality.
