fatal flex scanner internal error--end of buffer missed #16
Comments
In some of the .bib-files I have encountered the error was caused by a single long field containing > 10000 characters. Also see #14. |
Anything happening here? I have the error as well and would really like to read the references into R. Or are there any alternatives? I can use |
Can you prepare a reprex? |
I am using Python for the task now. I had to adapt the workflow a bit, but now it works; and I am learning some python in parallel. |
@narayanibarve do you still have this problem ? If so can you prepare a reproducible example using the |
Here's a reprex for a case of a long field causing the error:
bibtex::read.bib("long_field.txt")
#> Error: lex fatal error:
#> input buffer overflow, can't enlarge buffer because scanner uses REJECT
I used the current development version of |
Similarly, some reference managers (in this case Zotero) add a jabref comment to the bottom of the file, which causes the same error.
bibtex::read.bib("jabref_comment.txt")
#> Error: lex fatal error:
#> input buffer overflow, can't enlarge buffer because scanner uses REJECT |
Thanks. I'll have a look for the next version |
Just wanted to add to this that I'm having a similar problem reading in the attached .bib file from WoS. |
This cleans the BibTex comments, for anybody else dealing with this:
However, for some reason it still fails to import, despite no field having even close to 10K characters in it. So there seem to be other errors, as well. Perhaps simply allowing one to specify a string to parse, and thereby letting people import the files on their own, can be a simple, relatively quick fix? Plus, would add functionality that can more generically be useful, so it wouldn't even be lost functionality once this bug (if it is once :-)) has been resolved :-) |
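A rough sketch of stripping `@comment{...}` blocks (such as the `jabref-meta` comment) from a .bib file before parsing. `strip_bib_comments` is a hypothetical helper, not part of the bibtex package, and it assumes each comment block starts at the beginning of a line and has balanced braces. It only addresses the comment case; as noted above, a file can still fail for other reasons.

```r
# Hypothetical helper (not part of bibtex): drop @comment{...} blocks
# from a character vector of .bib lines.
# Assumes each block starts on its own line and its braces are balanced.
strip_bib_comments <- function(lines) {
  keep <- rep(TRUE, length(lines))
  in_comment <- FALSE
  depth <- 0L
  for (i in seq_along(lines)) {
    line <- lines[i]
    # A new comment block begins with "@comment{" (case-insensitive)
    if (!in_comment &&
        grepl("^[[:space:]]*@comment[[:space:]]*\\{", line, ignore.case = TRUE)) {
      in_comment <- TRUE
      depth <- 0L
    }
    if (in_comment) {
      # Track brace depth to find where the block ends
      opens  <- lengths(regmatches(line, gregexpr("{", line, fixed = TRUE)))
      closes <- lengths(regmatches(line, gregexpr("}", line, fixed = TRUE)))
      depth <- depth + opens - closes
      keep[i] <- FALSE
      if (depth <= 0L) in_comment <- FALSE
    }
  }
  lines[keep]
}

# Possible usage (file name taken from the reprex earlier in this thread):
# clean <- strip_bib_comments(readLines("jabref_comment.txt"))
# tmp <- tempfile(fileext = ".bib")
# writeLines(clean, tmp)
# bibtex::read.bib(tmp)
```

Writing the cleaned lines to a temporary file keeps the original export untouched, so the same file can be retried once the parser itself is fixed.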
I'm no closer to solving this, but I remembered I'd actually written 'my own' function to import BibTex files, for a package I'm working on ('metabefor'). It's at https://github.com/Matherion/metabefor/blob/master/R/importBibtex.r, in case anybody's struggling with the same. |
Any news on this? |
Something new on this? I had the same error using both
My try:
download.file(url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
destfile = "library.bib")
bibtex::read.bib(file = "library.bib")
RefManageR::ReadBib(file = "library.bib")
My session info:
|
The funny thing is that the code works using the
download.file(url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
destfile = "library.bib")
bibtex::read.bib(file = "library.bib")
#> Vellend M (2001). "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?" _Journal of Vegetation Science_, *12*,
#> pp. 545-552.
#>
#> López-Mart\'inez JO, Sanaphre-Villanueva L, Dupuy JM,
#> Hernández-Stefanoni JL, Meave JA and Gallardo-Cruz JA (2013).
#> "$\beta$-Diversity of functional groups of woody plants in a
#> tropical dry forest in Yucatan." _PloS one_, *8*(9), pp. e73660.
#> ISSN 1932-6203, doi: 10.1371/journal.pone.0073660 (URL:
#> http://doi.org/10.1371/journal.pone.0073660), <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> Swenson NG, Stegen JC, Davies SJ, Erickson DL, Forero-Montaña J,
#> Hurlbert AH, Kress WJ, Thompson J, Uriarte M, Wright SJ and
#> Zimmerman JK (2012). "Temporal turnover in the composition of
#> tropical tree communities: functional determinism and phylogenetic
#> stochasticity." _Ecology_, *93*(3), pp. 490-499. ISSN 0012-9658,
#> doi: 10.1890/11-1180.1 (URL: http://doi.org/10.1890/11-1180.1),
#> <URL: http://doi.wiley.com/10.1890/11-1180.1>.
RefManageR::ReadBib(file = "library.bib")
#> Warning in parse_Rd(Rd, encoding = encoding, fragment = fragment, ...):
#> <connection>:3: unknown macro '\beta'
#> Warning in parse_Rd(Rd, encoding = encoding, fragment = fragment, ...):
#> <connection>:3: unknown macro '\beta'
#> [1] J. O. López-Mart\'inez, L. Sanaphre-Villanueva, J. M. Dupuy,
#> et al. "$\beta$-Diversity of functional groups of woody plants in
#> a tropical dry forest in Yucatan.". In: _PloS one_ 8.9 (Jan.
#> 2013), p. e73660. ISSN: 1932-6203. DOI:
#> 10.1371/journal.pone.0073660. <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> [2] N. G. Swenson, J. C. Stegen, S. J. Davies, et al. "Temporal
#> turnover in the composition of tropical tree communities:
#> functional determinism and phylogenetic stochasticity". In:
#> _Ecology_ 93.3 (Mar. 2012), pp. 490-499. ISSN: 0012-9658. DOI:
#> 10.1890/11-1180.1. <URL: http://doi.wiley.com/10.1890/11-1180.1>.
#>
#> [3] M. Vellend. "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?". In: _Journal of Vegetation Science_
#> 12 (2001), pp. 545-552. |
I've been reading bib files with |
Hi, I am using citr and R Markdown with Zotero. I partially worked around this problem with crsh's suggestion of omitting the abstract, but some BibTeX entries have 500 to 1000+ author names, which reproduces the problem. Any suggestions? Has anyone come up with a solution? |
I have the same problem with R Markdown and citr. Any suggested solution for this, please? |
I am having this issue for parsing a long list of authors too. Any progress? |
Hi, I think this issue may be closed after #47. I parsed all your example files with the upcoming version of
# PR 47 https://github.com/ropensci/bibtex/pull/47
library(bibtex)
# File 1 ----
f1 <- tempfile("file1", fileext = ".txt")
download.file(
"https://github.com/romainfrancois/bibtex/files/1120203/long_field.txt",
f1
)
ex1 <- read.bib(f1)
ex1
#> Batzill M (2012). "The Surface Science of Graphene: Metal Interfaces,
#> CVD Synthesis, Nanoribbons, Chemical Modifications, and Defects."
#> _SURFACE SCIENCE REPORTS_, *67*(3-4), 83-115. ISSN 0167-5729, doi:
#> 10.1016/j.surfrep.2011.12.001 (URL:
#> https://doi.org/10.1016/j.surfrep.2011.12.001).
# File 2 ----
f2 <- tempfile("file2", fileext = ".txt")
download.file(
"https://github.com/romainfrancois/bibtex/files/1120229/jabref_comment.txt",
f2
)
ex2 <- read.bib(f2)
ex2
#> Gómez RL (2002). "Variability and Detection of Invariant Structure."
#> _Psychological Science_, *13*(5), 431-436. ISSN 0956-7976, 1467-9280,
#> doi: 10.1111/1467-9280.00476 (URL:
#> https://doi.org/10.1111/1467-9280.00476), <URL: 2015-01-20>.
# File 3 -----
f3 <- tempfile("file3", fileext = ".zip")
download.file(
"https://github.com/romainfrancois/bibtex/files/1229495/soil.health_healthy.soil_1to500.bib.zip",
f3
)
unzip(f3, junkpaths = TRUE, exdir = tempdir())
ex3 <- read.bib(
file.path(
tempdir(),
"soil.health_healthy.soil_1to500.bib"
)
)
#> ignoring entry 'ISI:000268383100002' (line 34779) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100003' (line 34853) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100004' (line 34928) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100005' (line 34999) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100006' (line 35080) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100008' (line 35134) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
#> ignoring entry 'ISI:000268383100010' (line 35192) because :
#> A bibentry of bibtype 'InCollection' has to specify the field: author
length(ex3)
#> [1] 493
# Small sample of entries, since the file has 500 (493 read)
ex3[1:5]
#> FORMAN J (1951). "SOIL, HEALTH, AND THE DENTAL PROFESSION." _JOURNAL OF
#> PROSTHETIC DENTISTRY_, *1*(5), 508-522. ISSN 0022-3913, doi:
#> 10.1016/0022-3913(51)90037-6 (URL:
#> https://doi.org/10.1016/0022-3913(51)90037-6).
#>
#> SHARMA N, MADAN M (1983). "EARTHWORMS FOR SOIL HEALTH AND
#> POLLUTION-CONTROL." _JOURNAL OF SCIENTIFIC \& INDUSTRIAL RESEARCH_,
#> *42*(10), 575-583. ISSN 0022-4456.
#>
#> HABERERN J (1992). "A SOIL HEALTH INDEX." _JOURNAL OF SOIL AND WATER
#> CONSERVATION_, *47*(1), 6. ISSN 0022-4561.
#>
#> [Anonymous] (1993). "THE BREAD CORNER - NO BREAD WITHOUT HEALTHY SOIL."
#> _ALIMENTA_, *32*(3), 45. ISSN 0002-5402.
#>
#> Watts M (1994). "Pesticide residues in food: The views of the Soil \&
#> Health Association of New Zealand." In Savage, GP (ed.), _PROCEEDINGS
#> OF THE NUTRITION SOCIETY OF NEW ZEALAND, VOL 19_, volume 19 number 0
#> series PROCEEDINGS OF THE NUTRITION SOCIETY OF NEW ZEALAND, 58-63. Nutr
#> Soc New Zealand, ANIMAL \& VETERINARY SCI GROUP, LINCOLN UNIVERSITY, PO
#> BOX 84, CANTERBURY, NEW ZEALAND. 29th Annual Conference of the
#> Nutrition-Society-of-New-Zealand, CHRISTCHURCH, NEW ZEALAND, AUG, 1994.
# From gist ----
gist <- tempfile(fileext = ".bib")
download.file(
url = "https://gist.githubusercontent.com/kguidonimartins/6ca03106109cef5a891c67748b895e6a/raw/32c0e203de7875a1d13db6705aa9b507914a9fd9/library.bib",
destfile = gist
)
bibtex::read.bib(file = gist)
#> Vellend M (2001). "Do commonly used indices of $\beta$ -diversity
#> measure species turnover ?" _Journal of Vegetation Science_, *12*,
#> 545-552.
#>
#> López-Mart\'inez JO, Sanaphre-Villanueva L, Dupuy JM,
#> Hernández-Stefanoni JL, Meave JA, Gallardo-Cruz JA (2013).
#> "$\beta$-Diversity of functional groups of woody plants in a tropical
#> dry forest in Yucatan." _PloS one_, *8*(9), e73660. ISSN 1932-6203,
#> doi: 10.1371/journal.pone.0073660 (URL:
#> https://doi.org/10.1371/journal.pone.0073660), <URL:
#> http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3769343{\&}tool=pmcentrez{\&}rendertype=abstract>.
#>
#> Swenson NG, Stegen JC, Davies SJ, Erickson DL, Forero-Montaña J,
#> Hurlbert AH, Kress WJ, Thompson J, Uriarte M, Wright SJ, Zimmerman JK
#> (2012). "Temporal turnover in the composition of tropical tree
#> communities: functional determinism and phylogenetic stochasticity."
#> _Ecology_, *93*(3), 490-499. ISSN 0012-9658, doi: 10.1890/11-1180.1
#> (URL: https://doi.org/10.1890/11-1180.1), <URL:
#> http://doi.wiley.com/10.1890/11-1180.1>.
Created on 2022-01-17 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Windows 10 x64 (build 22000)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Spanish_Spain.1252
#> ctype Spanish_Spain.1252
#> tz Europe/Paris
#> date 2022-01-17
#> pandoc 2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
#> bibtex * 0.5.0 2022-01-17 [1] local
#> cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.1)
#> fansi 1.0.0 2022-01-10 [1] CRAN (R 4.1.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.1)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.2)
#> glue 1.6.0 2021-12-17 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.1)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.1)
#> pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.1)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.1)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.1)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.1)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.1)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.1)
#> rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.1)
#> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.1)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.1)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.1)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
#> xfun 0.29 2021-12-14 [1] CRAN (R 4.1.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.1)
#>
#> [1] C:/Users/diego/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------ |
I get this error when I read a .bib file. At first I thought it happened because the file is huge, with about 5000 citations, so I exported only 4 citations from this set into a .bib file. But even this 4-citation file does not work; I get the same error.