
Importing BibTeX from Scopus generates thousands of variables #52

Open
ccamara opened this issue Nov 28, 2021 · 0 comments


ccamara commented Nov 28, 2021

Whenever I import a Scopus export, the resulting data frame is completely messed up and has thousands of columns. Apparently this should have been fixed by #33 or #34, but I'm afraid it is not.

Steps:

  1. Make a query in Scopus
  2. Export a BibTeX file (see here: 20210604_scopus_urban_commons.zip)
  3. Install the development version of bib2df (devtools::install_github("ropensci/bib2df"), 28 November 2021)
  4. Run testbib <- bib2df::bib2df("<attached file>")

Result:

 testbib
# A tibble: 307 × 55
   CATEGORY   BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF EDITION EDITOR HOWPUBLISHED INSTITUTION JOURNAL KEY   MONTH
   <chr>      <chr>     <chr>   <chr>  <list> <chr>     <chr>   <chr>    <chr>   <list> <chr>        <chr>       <chr>   <chr> <chr>
 1 ARTICLE    Köpper20… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Resear… NA    NA   
 2 CONFERENCE Manfredi… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          IOP Co… NA    NA   
 3 ARTICLE    Avdikos2… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Geofor… NA    NA   
 4 ARTICLE    Petrescu… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Enviro… NA    NA   
 5 ARTICLE    Parikh20… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Enviro… NA    NA   
 6 ARTICLE    Dekeyser… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Enviro… NA    NA   
 7 BOOK       Stuber20… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Balanc… NA    NA   
 8 ARTICLE    Wang2021… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Americ… NA    NA   
 9 ARTICLE    Marino20… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Territ… NA    NA   
10 ARTICLE    Sardeshp… NA      NA     <chr … NA        NA      NA       NA      <chr … NA           NA          Cities  NA    NA   
# … with 297 more rows, and 40 more variables: NOTE <chr>, NUMBER <chr>, ORGANIZATION <chr>, PAGES <chr>, PUBLISHER <chr>,
#   SCHOOL <chr>, SERIES <chr>, TITLE <chr>, TYPE <chr>, VOLUME <chr>, YEAR <dbl>, DOI <chr>, URL <chr>, AFFILIATION <chr>,
#   ABSTRACT <chr>, AUTHOR_KEYWORDS <chr>, REFERENCES <chr>, ISSN <chr>, LANGUAGE <chr>, ABBREV_SOURCE_TITLE <chr>,
#   DOCUMENT_TYPE <chr>, SOURCE <chr>, ART_NUMBER <chr>, KEYWORDS <chr>, FUNDING_DETAILS <chr>, FUNDING_TEXT <chr>,
#   CORRESPONDENCE_ADDRESS1 <chr>, SPONSORS <chr>, FUNDING_TEXT.1 <chr>, FUNDING_DETAILS.1 <chr>, FUNDING_DETAILS.2 <chr>,
#   ISBN <chr>, FUNDING_DETAILS.3 <chr>, FUNDING_DETAILS.4 <chr>, FUNDING_TEXT.2 <chr>, CODEN <chr>, FUNDING_DETAILS.5 <chr>,
#   PUBMED_ID <chr>, PAGE_COUNT <chr>, CHEMICALS_CAS <chr>
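The suffixed columns above (FUNDING_TEXT.1, FUNDING_DETAILS.1 to FUNDING_DETAILS.5) suggest, though I have not confirmed it, that some Scopus entries repeat the same field (e.g. funding_details) several times and that each repeat ends up as its own column. A minimal base-R sketch to check this against the raw file, assuming every field starts on its own line as name={value}; find_repeated_fields() and the sample entries are hypothetical, not part of bib2df or of the attached export:

```r
# Hypothetical helper (not part of bib2df): list BibTeX fields that occur
# more than once inside the same entry. Assumes each field starts on its
# own line as `name={value},` (an assumption about the export layout).
find_repeated_fields <- function(lines) {
  # Running entry counter: increments on every line that starts with "@"
  entry <- cumsum(grepl("^\\s*@", lines))
  key_pattern <- "^\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*=.*$"
  # Field lines inside an entry (lines before the first "@" are ignored)
  is_field <- grepl(key_pattern, lines) & entry > 0
  fields <- data.frame(
    entry = entry[is_field],
    field = tolower(sub(key_pattern, "\\1", lines[is_field])),
    stringsAsFactors = FALSE
  )
  # One row per (entry, field) pair that appears more than once
  out <- unique(fields[duplicated(fields), , drop = FALSE])
  rownames(out) <- NULL
  out
}

# Made-up sample: the first entry repeats funding_details
bib <- c(
  "@ARTICLE{Doe2021,",
  "author={Doe, J.},",
  "title={An example},",
  "funding_details={Agency A},",
  "funding_details={Agency B},",
  "}",
  "@ARTICLE{Roe2020,",
  "author={Roe, R.},",
  "funding_text={Thanks},",
  "}"
)
print(find_repeated_fields(bib))

# The same symptom can be listed from the imported tibble:
# grep("\\.[0-9]+$", names(testbib), value = TRUE)
```

On the real export this would be called as find_repeated_fields(readLines("<attached file>", encoding = "UTF-8")). Multi-line values such as abstracts can produce false positives if a continuation line happens to look like name=..., so the output is only a hint about which fields Scopus repeats.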

Session info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: KDE neon User - Plasma 25th Anniversary Edition

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7         rstudioapi_0.13    magrittr_2.0.1     tidyselect_1.1.1   R6_2.5.1           rlang_0.4.12      
 [7] fansi_0.5.0        stringr_1.4.0      httr_1.4.2         dplyr_1.0.7        tools_4.1.2        humaniformat_0.6.0
[13] utf8_1.2.2         cli_3.1.0          DBI_1.1.1          ellipsis_0.3.2     assertthat_0.2.1   tibble_3.1.5      
[19] lifecycle_1.0.1    crayon_1.4.2       purrr_0.3.4        vctrs_0.3.8        glue_1.4.2         stringi_1.7.5     
[25] compiler_4.1.2     pillar_1.6.4       generics_0.1.1     renv_0.13.2        bib2df_1.1.2       pkgconfig_2.0.3   