Possible bug in reading BibTeX #7049

Closed
jgm opened this issue Jan 26, 2021 · 3 comments

@jgm
Owner

jgm commented Jan 26, 2021

 % pandoc -f bibtex -t native -s
@article{aaron_ri-etal:1961,
author = {Aaron, Richard Ithamar and Rotenstreich, Nathan and Passmore, 
John A. and Mercier, Andr{\'e} and Russell, Leonard and Moreau, Joseph },
year = { 1961 },
title = { Discussion sur \citet{hersch_j:1961} et \citet{marias:1961} },
journal = { dialectica },
volume = { 15 },
number = { 57--58 },
pages = { 253--257 },
}
^D
Pandoc (Meta {unMeta = fromList [("nocite",MetaInlines [Cite [Citation {citationId = "*", citationPrefix = [], citationSuffix = [], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[@*]"]]),("references",MetaList [MetaMap (fromList [("author",MetaList [MetaMap (fromList [("family",MetaString "Aaron"),("given",MetaString "Richard Ithamar")]),MetaMap (fromList [("family",MetaString "Rotenstreich"),("given",MetaString "Nathan")]),MetaMap (fromList [("family",MetaString "Passmore"),("given",MetaString "John A.")]),MetaMap (fromList [("family",MetaString "Mercier"),("given",MetaString "Andr\233")]),MetaMap (fromList [("family",MetaString "Russell"),("given",MetaString "Leonard")]),MetaMap (fromList [("family",MetaString "Moreau"),("given",MetaString "Joseph")])]),("container-title",MetaInlines [Str "dialectica"]),("id",MetaString "aaron_ri-etal:1961"),("issue",MetaInlines [Str "57\8211\&58"]),("issued",MetaString "1961"),("page",MetaInlines [Str "253-257"]),("title",MetaInlines [Str "Discussion",Space,Str "sur",Space,Cite [Citation {citationId = "hersch_j:1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{hersch_j:1961}"],Space,Str "et",Space,Cite [Citation {citationId = "marias:1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [Span ("",["nocase"],[]) [Str "marias:1961"]]]),("type",MetaString "article-journal"),("volume",MetaInlines [Str "15"])])])]})
[]

The thing to note here is that we get the raw latex preserved as a fallback for \citet{hersch_j:1961} but not for \citet{marias:1961}. Outside the bibtex context this doesn't happen:

 % pandoc -f latex -t native
Discussion sur \citet{hersch_j:1961} et \citet{marias:1961}
^D
[Para [Str "Discussion",Space,Str "sur",Space,Cite [Citation {citationId = "hersch_j:1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{hersch_j:1961}"],Space,Str "et",Space,Cite [Citation {citationId = "marias:1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{marias:1961}"]]]
@jdutant
Contributor

jdutant commented Jan 26, 2021

On a few quick tests, it does look like we get the LaTeX fallback iff there's an underscore in the key. Whether I change the keys, change their order, place them in the "note" field rather than "title", or remove the ":", I get the LaTeX fallback only when the key contains an underscore. There are only 4 raw LaTeX citations in the json, and they are exactly the ones with an underscore.

pandoc -f bibtex -t native -s
@article{aaron_ri-etal:1961,
	author = {	Aaron, Richard Ithamar	},
	year = {	1961	},
	title = {	Discussion sur \citet{marj1961} et \citet{marj:1961} et \citet{hersch} 
                            \citet{mari_as} et \citet{hersch_j:1961} },
	journal = {	dialectica	},
	volume = {	15	},
	number = {	57--58	},
	pages = {	253--257	},
	note = { Avec commentaires de \citet{marj} et \citet{marj:1961} 
               et \citet{mar_1961} et \citet{hersh_test} }
}
Pandoc (Meta {unMeta = fromList [("nocite",MetaInlines [Cite [Citation
{citationId = "*", citationPrefix = [], citationSuffix = [],
citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}]
[Str "[@*]"]]),("references",MetaList [MetaMap (fromList
[("author",MetaList [MetaMap (fromList [("family",MetaString
"Aaron"),("given",MetaString "Richard
Ithamar")])]),("container-title",MetaInlines [Str
"dialectica"]),("id",MetaString
"aaron_ri-etal:1961"),("issue",MetaInlines [Str
"57\8211\&58"]),("issued",MetaString "1961"),("note",MetaInlines [Str
"Avec",Space,Str "commentaires",Space,Str "de",Space,Cite [Citation
{citationId = "marj", citationPrefix = [], citationSuffix = [],
citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}]
[Span ("",[],[]) [Str "marj"]],Space,Str "et",Space,Cite [Citation
{citationId = "marj:1961", citationPrefix = [], citationSuffix = [],
citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}]
[Span ("",[],[]) [Str "marj:1961"]],Space,Str "et",Space,Cite [Citation
{citationId = "mar_1961", citationPrefix = [], citationSuffix = [],
citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}]
[RawInline (Format "latex") "\\citet{mar_1961}"],Space,Str
"et",Space,Cite [Citation {citationId = "hersh_test", citationPrefix =
[], citationSuffix = [], citationMode = AuthorInText, citationNoteNum =
0, citationHash = 0}] [RawInline (Format "latex")
"\\citet{hersh_test}"]]),("page",MetaInlines [Str
"253-257"]),("title",MetaInlines [Str "Discussion",Space,Str
"sur",Space,Cite [Citation {citationId = "marj1961", citationPrefix =
[], citationSuffix = [], citationMode = AuthorInText, citationNoteNum =
0, citationHash = 0}] [Span ("",["nocase"],[]) [Str
"marj1961"]],Space,Str "et",Space,Cite [Citation {citationId =
"marj:1961", citationPrefix = [], citationSuffix = [], citationMode =
AuthorInText, citationNoteNum = 0, citationHash = 0}] [Span
("",["nocase"],[]) [Str "marj:1961"]],Space,Str "et",Space,Cite
[Citation {citationId = "hersch", citationPrefix = [], citationSuffix =
[], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}]
[Span ("",["nocase"],[]) [Str "hersch"]],Space,Cite [Citation
{citationId = "mari_as", citationPrefix = [], citationSuffix = [],
citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}]
[RawInline (Format "latex") "\\citet{mari_as}"],Space,Str
"et",Space,Cite [Citation {citationId = "hersch_j:1961", citationPrefix
= [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum
= 0, citationHash = 0}] [RawInline (Format "latex")
"\\citet{hersch_j:1961}"]]),("type",MetaString
"article-journal"),("volume",MetaInlines [Str "15"])])])]}) []

It doesn't seem to be non-alphanumeric characters in general, cf. the colon above and the - and ! below:

pandoc -f bibtex -t native -s
@article{aaron_ri-etal:1961,
	author = {	Aaron, Richard Ithamar	},
	year = {	1961	},
	title = {	Discussion sur \citet{mari-as}, \citet{hecto!r}, },
	journal = {	dialectica	},
	volume = {	15	},
	pages = {	253--257	},
}
Pandoc (Meta {unMeta = fromList [("nocite",MetaInlines [Cite [Citation {citationId = "*", citationPrefix = [], citationSuffix = [], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[@*]"]]),("references",MetaList [MetaMap (fromList [("author",MetaList [MetaMap (fromList [("family",MetaString "Aaron"),("given",MetaString "Richard Ithamar")])]),("container-title",MetaInlines [Str "dialectica"]),("id",MetaString "aaron_ri-etal:1961"),("issued",MetaString "1961"),("page",MetaInlines [Str "253-257"]),("title",MetaInlines [Str "Discussion",Space,Str "sur",Space,Cite [Citation {citationId = "mari-as", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [Span ("",["nocase"],[]) [Str "mari-as"]],Str ",",Space,Cite [Citation {citationId = "hecto!r", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [Span ("",["nocase"],[]) [Str "hecto!r"]],Str ","]),("type",MetaString "article-journal"),("volume",MetaInlines [Str "15"])])])]})
[]
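
One way to check this pattern mechanically, rather than by reading the output by eye, would be to emit JSON (`-t json`) and list the fallback inline attached to each `Cite` node. The sketch below is a hypothetical helper, not part of pandoc. It assumes pandoc's JSON AST encoding, in which each node is an object with a `t` tag and `c` contents, and it runs on a small hand-written fragment that mirrors the results above (with the citation objects abbreviated to `citationId` only), not on real pandoc output.

```python
import json

def cite_fallbacks(node):
    """Yield (citation id, tag of the first fallback inline) for each Cite node."""
    if isinstance(node, dict):
        if node.get("t") == "Cite":
            # A Cite node's contents are [list of citations, list of fallback inlines].
            citations, inlines = node["c"]
            tag = inlines[0]["t"] if inlines else None
            for citation in citations:
                yield citation["citationId"], tag
        else:
            for value in node.values():
                yield from cite_fallbacks(value)
    elif isinstance(node, list):
        for item in node:
            yield from cite_fallbacks(item)

# Hand-written fragment (an assumption for illustration, not captured pandoc output).
sample = json.loads(r'''
[{"t": "Cite", "c": [[{"citationId": "mari-as"}],
                     [{"t": "Span", "c": [["", ["nocase"], []],
                                          [{"t": "Str", "c": "mari-as"}]]}]]},
 {"t": "Cite", "c": [[{"citationId": "mari_as"}],
                     [{"t": "RawInline", "c": ["latex", "\\citet{mari_as}"]}]]}]
''')

results = list(cite_fallbacks(sample))
for key, tag in results:
    print(key, tag)

# The hypothesis under test: raw LaTeX fallback exactly when the key has an underscore.
print(all(("_" in key) == (tag == "RawInline") for key, tag in results))
```

Because the walker recurses through every dict and list, it should also reach `Cite` nodes nested inside the metadata `references` entries of a full document, assuming the real output follows the same encoding.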

@jgm
Owner Author

jgm commented Jan 26, 2021

Slightly more compact test case: plain latex

 % pandoc -f latex -t native
Discussion sur \citet{marj1961} et \citet{marj:1961} et \citet{hersch} 
                            \citet{mari_as} et \citet{hersch_j:1961}   
^D
[Para [Str "Discussion",Space,Str "sur",Space,Cite [Citation {citationId = "marj1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{marj1961}"],Space,Str "et",Space,Cite [Citation {citationId = "marj:1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{marj:1961}"],Space,Str "et",Space,Cite [Citation {citationId = "hersch", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{hersch}"],SoftBreak,Cite [Citation {citationId = "mari_as", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{mari_as}"],Space,Str "et",Space,Cite [Citation {citationId = "hersch_j:1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{hersch_j:1961}"]]]

Now bibtex

% pandoc -f bibtex -t native -s
@book{test,
title = {Discussion sur \citet{marj1961} et \citet{marj:1961} et \citet{hersch} 
                            \citet{mari_as} et \citet{hersch_j:1961}}
}
^D
[]
Pandoc (Meta {unMeta = fromList [("nocite",MetaInlines [Cite [Citation {citationId = "*", citationPrefix = [], citationSuffix = [], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [Str "[@*]"]]),("references",MetaList [MetaMap (fromList [("id",MetaString "test"),("title",MetaInlines [Str "Discussion",Space,Str "sur",Space,Cite [Citation {citationId = "marj1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [Span ("",["nocase"],[]) [Str "marj1961"]],Space,Str "et",Space,Cite [Citation {citationId = "marj:1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [Span ("",["nocase"],[]) [Str "marj:1961"]],Space,Str "et",Space,Cite [Citation {citationId = "hersch", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [Span ("",["nocase"],[]) [Str "hersch"]],Space,Cite [Citation {citationId = "mari_as", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{mari_as}"],Space,Str "et",Space,Cite [Citation {citationId = "hersch_j:1961", citationPrefix = [], citationSuffix = [], citationMode = AuthorInText, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\citet{hersch_j:1961}"]]),("type",MetaString "book")])])]})
[]

As you note, the ones with an underscore have raw latex fallbacks, and the others just have the citekey wrapped in a Span.

Note that the bibtex reader uses the latex reader, so this divergence is surprising.

@jgm
Owner Author

jgm commented Jan 26, 2021

OK, I found out what is happening.
When we parse the contents of titles, we send them through a latex function (in T.P.Citeproc.BibTeX readBibTeXString). This calls adjustSpans, which runs parseRawLaTeX on raw tex inlines.
parseRawLaTeX is doing this transformation (before and after):

"\\citet{marj1961}" -> Span ("",[],[]) [Str "marj1961"]
"\\citet{marj:1961}" -> Span ("",[],[]) [Str "marj:1961"]
"\\citet{hersch}" -> Span ("",[],[]) [Str "hersch"]
"\\citet{mari_as}" -> RawInline (Format "latex") "\\citet{mari_as}"
"\\citet{hersch_j:1961}" -> RawInline (Format "latex") "\\citet{hersch_j:1961}"

This strips off the {hersch_j:1961} part and tries to parse it as LaTeX, failing because of the _.
This code is incredibly tortuous and ugly, and needs to be reworked. Now that citeproc is integrated, we should be able to integrate some of the bibtex-specific parsing into the LaTeX reader itself, which will be cleaner.
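
The behaviour described above can be mimicked with a toy model. This is only a sketch of the reported failure mode, not pandoc's actual code (which is Haskell): `toy_parse_argument` and `toy_parse_raw_latex` are hypothetical names, and the stand-in parser is simply assumed to reject any argument containing `_`, as the comment above reports for the real one.

```python
import re

def toy_parse_argument(text):
    """Stand-in for the LaTeX parser; assumed to fail (return None) on '_'."""
    if "_" in text:
        return None
    return [("Str", text)]

def toy_parse_raw_latex(raw):
    """Parse the braced argument of a command; keep the raw inline if that fails."""
    match = re.fullmatch(r"\\[A-Za-z]+\{(.*)\}", raw)
    parsed = toy_parse_argument(match.group(1)) if match else None
    if parsed is None:
        # Parsing failed, so the original raw LaTeX survives as the fallback.
        return ("RawInline", "latex", raw)
    return ("Span", parsed)

for raw in [r"\citet{marj1961}", r"\citet{marj:1961}", r"\citet{mari_as}"]:
    print(raw, "->", toy_parse_raw_latex(raw))
```

In this model the divergence comes entirely from re-parsing the argument in isolation: keys that happen to be parseable become a Span, and the rest keep their raw form.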

@jgm jgm closed this as completed in 98c2a52 Jan 27, 2021