Support for parsing .bib from scopus #32

Closed
benjaminschwetz opened this issue Aug 6, 2019 · 2 comments
Comments

@benjaminschwetz

I tried to parse .bib files exported from Scopus today, but ended up with a total mess of column names (see below).

bib_string <- "@ARTICLE{Brulc20091948,
author={Brulc, J.M. and Antonopoulos, D.A. and Berg Miller, M.E. and Wilson, M.K. and Yannarell, A.C. and Dinsdale, E.A. and Edwards, R.E. and Frank, E.D. and Emerson, J.B. and Wacklin, P. and Coutinho, P.M. and Henrissat, B. and Nelson, K.E. and White, B.A.},
title={Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases},
journal={Proceedings of the National Academy of Sciences of the United States of America},
year={2009},
doi={10.1073/pnas.0806191105},
url={https://www.scopus.com/inward/record.uri?eid=2-s2.0-60549114321&doi=10.1073%2fpnas.0806191105&partnerID=40&md5=8d70a27545328d4cbb538bdb4757335b},
affiliation={Department of Animal Sciences, University of Illinois, Urbana, IL 61801, United States; Institute for Genomics and Systems Biology, Argonne National Laboratory, Argonne, IL 60439, United States; Department of Biology, San Diego State University, San Diego, CA 92813, United States; School of Biological Sciences, Flinders University, Adelaide, SA 5001, Australia; Center for Microbial Sciences, San Diego State University, San Diego, CA 92813, United States; Department of Computer Sciences, San Diego State University, San Diego, CA 92813, United States; Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, United States; J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, MD 20850, United States; Architecture et Fonction des Macromolecules Biologiques, Unité Mixte de Recherche 6098, Universites Aix-Marseille I and II, Case 932, 163 Avenue de Luminy, 13288 Marseille, France; Institute for Genomic Biology, University of Illinois, Urbana, IL 61801, United States},
abstract={The complex microbiome of the rumen functions as an effective system for the conversion of plant cell wall biomass to microbial protein, short chain fatty acids, and gases. As such, it provides a unique genetic resource for plant cell wall degrading microbial enzymes that could be used in the production of biofuels. The rumen and gastrointestinal tract harbor a dense and complex microbiome. To gain a greater understanding of the ecology and metabolic potential of this microbiome, we used comparative metagenomics (phylotype analysis and SEED subsystems-based annotations) to examine randomly sampled pyrosequence data from 3 fiber-adherent microbiomes and 1 pooled liquid sample (a mixture of the liquid microbiome fractions from the same bovine rumens). Even though the 3 animals were fed the same diet, the community structure, predicted phylotype, and metabolic potentials in the rumen were markedly different with respect to nutrient utilization. A comparison of the glycoside hydrolase and cellulosome functional genes revealed that in the rumen microbiome, initial colonization of fiber appears to be by organisms possessing enzymes that attack the easily available side chains of complex plant polysaccharides and not the more recalcitrant main chains, especially cellulose. Furthermore, when compared with the termite hindgut microbiome, there are fundamental differences in the glycoside hydrolase content that appear to be diet driven for either the bovine rumen (forages and legumes) or the termite hindgut (wood). © 2009 by The National Academy of Sciences of the USA.},
author_keywords={CAZymes;  Cellulases;  Plant cell wall;  Pyrosequencing},
Isoptera},
document_type={Article},
source={Scopus},
}"
fil <- tempfile("data")
write(bib_string, fil)
bib2df::bib2df(fil)
#> Column `YEAR` contains character strings.
#>               No coercion to numeric applied.
#> # A tibble: 1 x 37
#>   CATEGORY BIBTEXKEY ADDRESS ANNOTE AUTHOR BOOKTITLE CHAPTER CROSSREF
#>   <chr>    <chr>     <chr>   <chr>  <list> <chr>     <chr>   <chr>   
#> 1 ARTICLE  Brulc200~ <NA>    <NA>   <chr ~ <NA>      <NA>    <NA>    
#> # ... with 29 more variables: EDITION <chr>, EDITOR <list>,
#> #   HOWPUBLISHED <chr>, INSTITUTION <chr>, JOURNAL <chr>, KEY <chr>,
#> #   MONTH <chr>, NOTE <chr>, NUMBER <chr>, ORGANIZATION <chr>,
#> #   PAGES <chr>, PUBLISHER <chr>, SCHOOL <chr>, SERIES <chr>, TITLE <chr>,
#> #   TYPE <chr>, VOLUME <chr>, YEAR <chr>, AUTHOR..BRULC. <chr>,
#> #   TITLE..GENE.CENTRIC <chr>, JOURNAL..PROCEEDINGS <chr>,
#> #   YEAR..2009.. <chr>, DOI..10.1073.PNAS.0806191105.. <chr>,
#> #   URL..HTTPS...WWW.SCOPUS.COM.INWARD.RECORD.URI.EID.2.S2.0.60549114321.DOI.10.1073.2FPNAS.0806191105.PARTNERID.40.MD5.8D70A27545328D4CBB538BDB4757335B.. <chr>,
#> #   AFFILIATION..DEPARTMENT <chr>, ABSTRACT..THE <chr>,
#> #   AUTHOR_KEYWORDS..CAZYMES. <chr>, DOCUMENT_TYPE..ARTICLE.. <chr>,
#> #   SOURCE..SCOPUS.. <chr>

Created on 2019-08-06 by the reprex package (v0.3.0)
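
The stray column names above (for example `YEAR..2009..` and `TITLE..GENE.CENTRIC`) look like whole `field={value}` chunks, cut at the first space, being used as field names. That suggests the parser did not recognise `=` as a separator when it has no surrounding whitespace, which is how Scopus writes it. As a possible stopgap, here is a minimal sketch in base R that pads the first `=` of each field line with spaces before parsing. `normalize_scopus_bib()` is a hypothetical helper, not part of bib2df, and the cause described here is an assumption inferred from the output rather than something confirmed in this thread.

```r
# Hypothetical helper (not part of bib2df): rewrite "field={value}" as
# "field = {value}". Only the first "=" that directly follows a leading
# field name is touched, so "=" characters inside URLs or other values
# are left alone. Lines that do not start with "name=" pass through unchanged.
normalize_scopus_bib <- function(lines) {
  sub("^(\\s*[A-Za-z_][A-Za-z0-9_-]*)\\s*=\\s*", "\\1 = ", lines, perl = TRUE)
}

example_lines <- c(
  "year={2009},",
  "url={https://www.scopus.com/inward/record.uri?eid=2-s2.0-60549114321},",
  "Isoptera},"
)
normalize_scopus_bib(example_lines)
# "year = {2009},"
# "url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-60549114321},"
# "Isoptera},"
```

To try it on the reprex, one could write `normalize_scopus_bib(readLines(fil))` to a new temporary file with `writeLines()` and pass that file to `bib2df::bib2df()`. Whether this is enough depends on the bib2df version in use; the fragment line `Isoptera},` is not repaired by this sketch.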

@ottlngr
Contributor

ottlngr commented Aug 15, 2019

On hold. Probably fixed by #34.

@ottlngr
Contributor

ottlngr commented Jul 2, 2020

Hi @benjaminschwetz,

Sorry for the delay. Yesterday I merged your changes into master. The test cases you added run successfully, so I'm going to close this issue.

@ottlngr ottlngr closed this as completed Jul 2, 2020