UTF-8 quotation marks (lower-99/„) aren’t recognized, leading also to wrong citation conversion #6869

Closed
pripple opened this issue Nov 20, 2020 · 5 comments


pripple commented Nov 20, 2020

Hi, I am using pandoc 2.11.2 and have also tried the nightly build, with the same output, on a Mac running Big Sur 11.0.1 (20B29).

I think pandoc (including citeproc) doesn’t recognize German UTF-8 opening quotes (lower-99/„). Or is there a more explicit way of setting the language? Even so, I might be using several languages in one document …

I am offering the example as a LaTeX → Markdown conversion only for simplicity; I usually export to DOCX. I would just like to get rid of the extra \[\] in the output. In DOCX, the quotation marks aren’t converted to straight quote characters (") anyway, so the problem isn’t noticeable there.

Consider this MWE in LaTeX:

% !TEX encoding = UTF-8 Unicode
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage[german=quotes]{csquotes}
\usepackage[style=footnote-dw]{biblatex}

\begin{filecontents}{\jobname.bib}
@article{my_article,
	author = {Doe, John},
	journal = {The Pandoc Journal},
	number = {3},
	pages = {393–396},
	title = {A Bibliographer’s \TeX nic Inquiry},
	volume = {25},
	year = {1989}}
\end{filecontents}
\addbibresource{\jobname.bib}
\setlength\parindent{0pt}
\begin{document}
	
Obviously, Pandoc recognizes “english-style” UTF-8 quotes, but not „deutsche“. 

This is a problem when citing.\cite[Vgl.][394]{my_article} 

This works: Und nochmal mit Anführungszeichen.\cite[Vgl.][394. In English: “And again, with quotation marks.”]{my_article} 

However, it doesn’t work with German quotation marks.\cite[Vgl.][394. In German: „Aber mit deutschen Anführungszeichen geht es nicht.“]{my_article} 

(Expected output: Like before, without extra \texttt{\textbackslash[\textbackslash]} in Markdown.)

Now, process it with the following command: \texttt{pandoc \jobname.tex -C --bibliography=\jobname.bib --verbose -o \jobname.md}

\end{document}

This is the output I get from the command mentioned in the generated PDF, pandoc filename.tex -C --bibliography=filename.bib --verbose -o filename.md:

Obviously, Pandoc recognizes "english-style" UTF-8 quotes, but not
„deutsche".

This is a problem when citing.[Vgl. @my_article 394]

This works: Und nochmal mit Anführungszeichen.[Vgl. @my_article 394. In
English: "And again, with quotation marks."]

However, it doesn't work with German quotation marks.[Vgl. @my_article
\[394. In German: „Aber mit deutschen Anführungszeichen geht es nicht."\]]

(Expected output: Like before, without extra `\[\]` in Markdown.)

Now, process it with the following command:
`pandoc .tex -C –bibliography=.bib –verbose -o .md`

This is what I would expect:

Obviously, Pandoc recognizes "english-style" UTF-8 quotes, but not
"deutsche".

This is a problem when citing.[Vgl. @my_article 394]

This works: Und nochmal mit Anführungszeichen.[Vgl. @my_article 394. In
English: "And again, with quotation marks."]

However, it doesn't work with German quotation marks.[Vgl. @my_article 394. In German: "Aber mit deutschen Anführungszeichen geht es nicht."]

(Expected output: Like before, without extra `\[\]` in Markdown.)

Now, process it with the following command:
`pandoc .tex -C –bibliography=.bib –verbose -o .md`

mb21 (Collaborator) commented Nov 20, 2020

By doing pandoc -f latex -t native on:

“english-style” quotes but not „deutsche“

we can see that this is indeed a potential problem with the LaTeX reader:

[Quoted DoubleQuote [Str "english-style"],Space,Str "quotes",Space,Str "but",Space,Str "not",Space,Str "\8222deutsche\8220"]

You can get slightly different output with pandoc -f latex -t markdown-smart, but I don't think it really solves your issue.
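
A note for reading the native output above: the \8222 and \8220 inside the Str are decimal Unicode code points, which is how Haskell-style string escapes show non-ASCII characters. A short Python sketch (not part of pandoc; the helper name `describe` is only illustrative) decodes the escapes that appear in this thread:

```python
import unicodedata


def describe(code: int) -> str:
    """Return the U+XXXX form and official Unicode name for a decimal code point."""
    return f"U+{code:04X} {unicodedata.name(chr(code))}"


if __name__ == "__main__":
    # Decimal escapes seen in the native output quoted in this issue.
    for code in (8222, 8220, 8221, 252):
        print(f"\\{code} -> {chr(code)}  {describe(code)}")
```

So \8222 is the German opening quote „ (U+201E) and \8220 is “ (U+201C), which serves as the German closing quote but as the English opening quote. In the German example both characters stay inside a plain Str instead of becoming a Quoted element.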

pripple (Author) commented Nov 20, 2020

Thank you for this first direction! However, I’m not familiar enough with the pandoc code to be able to solve the issue myself now … 🙈

By the way, using csquotes is a pain compared to just typing UTF-8 quotes. Still, I tried it. It leads to bloated output, because the output now includes language-specific markup. And although pandoc does recognize those marks, when exporting to DOCX it doesn’t produce lower-99/upper-66 quotation marks („…“), as I would expect when the language is set so explicitly. Instead, I get upper-66/upper-99 quotation marks (“…”), even inside the explicitly German parts. Even with that resolved, using \enquote and \foreignlanguage everywhere makes the source barely readable, and I have lots of citations in different languages in my document.

% !TEX encoding = UTF-8 Unicode
\documentclass{article}
\usepackage[ngerman,british]{babel}
\usepackage[autostyle,german=quotes,english=british]{csquotes}
\usepackage[style=footnote-dw]{biblatex}

\begin{filecontents}{\jobname.bib}
@article{my_article,
	author = {Doe, John},
	journal = {The Pandoc Journal},
	number = {3},
	pages = {393–396},
	title = {A Bibliographer’s \TeX nic Inquiry},
	volume = {25},
	year = {1989}}
\end{filecontents}
\addbibresource{\jobname.bib}
\setlength\parindent{0pt}
\begin{document}
	
Using csquotes, it works: \enquote{english-style} csquotes, also \foreignlanguage{ngerman}{\enquote{deutsche} Anführungszeichen}.

This works: \foreignlanguage{ngerman}{Und nochmal mit Anführungszeichen.}\cite[Vgl.][394. In English: \enquote{And again, with quotation marks.}]{my_article} 

Also now with German quotation marks.\cite[Vgl.][394. In German: \foreignlanguage{ngerman}{\enquote{Jetzt auch mit deutschen Anführungszeichen.}}]{my_article} 

\end{document}

Example line from the output in Markdown:

Also now with German quotation marks.[Vgl. @my_article 394. In German:
["Jetzt auch mit deutschen Anführungszeichen."]{lang="de-DE"}]

Example line from the DOCX output:

Also now with German quotation marks.(Vgl. Doe 1989, 394. In German: “Jetzt auch mit deutschen Anführungszeichen.”)
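
One possible way to keep the source readable while still handing pandoc the \enquote form that works above is a small preprocessing step. The Python sketch below is only an illustration under stated assumptions: the helper `german_quotes_to_enquote` is hypothetical (not part of pandoc or csquotes), it assumes every „ is closed by “ with no nesting, and it has not been verified against pandoc itself.

```python
import re

# Matches a German-style quotation: „ (U+201E) ... “ (U+201C), without nesting.
GERMAN_QUOTE = re.compile("\u201e([^\u201e\u201c]*)\u201c")


def german_quotes_to_enquote(latex_source: str) -> str:
    """Rewrite „...“ as \\enquote{...}; all other text is left unchanged."""
    # A function is used as the replacement so the backslash in \enquote
    # does not have to be escaped for the regex replacement syntax.
    return GERMAN_QUOTE.sub(
        lambda m: "\\enquote{" + m.group(1) + "}", latex_source
    )


if __name__ == "__main__":
    print(german_quotes_to_enquote("\\cite[„Aber“]{key}"))
```

The rewritten text could then be passed to pandoc in place of the original file. English-style “…” pairs are not touched, since the pattern only starts at a „ character.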

jgm (Owner) commented Nov 20, 2020

More minimal:

% pandoc -f latex -t native  
\cite[Vgl.][394. In German: „Aber mit deutschen Anführungszeichen geht es nicht.“]{my_article}
[Para [Cite [Citation {citationId = "my_article", citationPrefix = [Str "Vgl."], citationSuffix = [Str "[394. In German: \8222Aber mit deutschen Anf\252hrungszeichen geht es nicht.\8220]"], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\cite[Vgl.][394. In German: \8222Aber mit deutschen Anf\252hrungszeichen geht es nicht.\8220]{my_article}"]]]

% pandoc -f latex -t native
\cite[Vgl.][394. In English: “And again, with quotation marks.”]{my_article} 
[Para [Cite [Citation {citationId = "my_article", citationPrefix = [Str "Vgl."], citationSuffix = [Str "394.",Space,Str "In",Space,Str "English:",Space,Quoted DoubleQuote [Str "And",Space,Str "again,",Space,Str "with",Space,Str "quotation",Space,Str "marks."]], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\cite[Vgl.][394. In English: \8220And again, with quotation marks.\8221]{my_article}"]]]

In the German case the citationSuffix includes the brackets.
We need to figure out why.

jgm (Owner) commented Nov 20, 2020

Even more minimal:

% pandoc -f latex -t native
\cite[„Aber“]{key}
[Para [Cite [Citation {citationId = "key", citationPrefix = [], citationSuffix = [Str "[\8222Aber\8220]"], citationMode = NormalCitation, citationNoteNum = 0, citationHash = 0}] [RawInline (Format "latex") "\\cite[\8222Aber\8220]{key}"]]]

jgm closed this as completed in 9a40976 on Nov 20, 2020

pripple (Author) commented Nov 20, 2020

Thank you! 😃 Looking forward to the nightly build … 😎
