Latex to epub – German Quotes not transformed correctly #5470

Open
aschatt opened this issue Apr 29, 2019 · 13 comments

@aschatt

aschatt commented Apr 29, 2019

Pandoc transforms German quotes incorrectly.

(1) Context

(a) in the LaTeX file the Babel / German package is used

\usepackage[ngerman]{babel}

(b) The quotes are written correctly such as: "`Das ist ein Zitat"'

(c) In the pandoc command also language is defined as German with:

-V lang=de-AT

(2) Correct result would be:

The correct conversion to EPUB would be: „Das ist ein Zitat“ or, alternatively: »Das ist ein Zitat«

(3) Bug:

Pandoc actually renders English quotation marks: "Das ist ein Zitat"

@mb21
Collaborator

mb21 commented Apr 29, 2019

parsing latex

You're saying the exact input is the following?

`Das ist ein Zitat"

Shouldn't it be in LaTeX:

``Das ist ein Zitat''

Or is this a syntax I don't know?

outputting html

The second part of this issue is that we currently don't output quotes in different locales, see #84

But for ePUB/HTML output you can use --html-q-tags and use some CSS to your liking.

@jgm
Owner

jgm commented Apr 29, 2019

Note that -V lang=de-AT is not generally what you want. Set a metadata field instead of a variable. Variables only affect template rendering, while metadata fields can affect parsing as well. I don't expect this will make a difference in this case, though.

@aschatt
Author

aschatt commented Apr 30, 2019

@mb21 No, that is actually correct. Your syntax creates English quotation marks. The German ones are like this: "`here is the quote"'

@jgm
Owner

jgm commented May 1, 2019

So am I correctly understanding that in babel, "` and "' are ligatures for „ and “ respectively? Is this just with babel or more widely? Is it just when the german option is used with babel?

@aschatt
Author

aschatt commented May 1, 2019

First part: exactly right. Whether this is just with babel, I don't know. This is the usually recommended way to typeset German text. I found it in several books and tutorials, and I have always used that method myself.

@jgm
Owner

jgm commented May 1, 2019

Experimented -- looks like it's babel-specific, and only when language is german.

@aschatt
Author

aschatt commented May 3, 2019

This is possible.

Anyway: unfortunately I currently see no easy way to create German quotation marks in EPUB. This is really bad for me because my whole publishing workflow otherwise works very well.

@agusmba
Contributor

agusmba commented May 3, 2019

Since they are not "smart" quotes (start and end are different in the source material), wouldn't a simple filter or a preprocessor take care of that conversion for you?
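
A minimal sketch of the preprocessor idea, assuming the only shorthands that need handling are "` and "' and that none of them occur in verbatim or code material (the function name replace_babel_quotes is made up for this illustration). It rewrites the LaTeX source so that the quotes are already Unicode characters before pandoc reads it:

```python
import re

# Babel (n)german shorthands mapped to the characters they typeset.
SHORTHANDS = {
    '"`': "„",   # opening German double quote (U+201E)
    '"\'': "“",  # closing German double quote (U+201C)
}

# One pattern that matches either shorthand.
_PATTERN = re.compile("|".join(re.escape(key) for key in SHORTHANDS))


def replace_babel_quotes(latex_source: str) -> str:
    """Replace babel quote shorthands with literal Unicode quotes.

    Assumption: the shorthands never appear inside verbatim
    environments or comments; this sketch does not skip those.
    """
    return _PATTERN.sub(lambda match: SHORTHANDS[match.group(0)], latex_source)


print(replace_babel_quotes('"`Das ist ein Zitat"\''))  # prints „Das ist ein Zitat“
```

The function would have to be applied to every source file, including any pulled in via \input or \include, before the result is passed to pandoc. Ordinary English-style ``...'' quotes are left untouched.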

@jgm
Owner

jgm commented May 3, 2019

I just saw that in the LaTeX reader we have this code in smart quote parsing:

   -- the following is used by babel for localized quotes:
   <|> quoted' doubleQuoted (try $ sequence [symbol '"', symbol '`'])
                            (void $ try $ sequence [symbol '"', symbol '\''])

This causes the ligatures to be rendered as regular English-style quotes in most output formats, which isn't desirable. Instead of parsing "`hi"' as Quoted DoubleQuoted, we should simply parse these ligatures as unicode characters (the German quotes). We could also make this behavior sensitive to whether babel / german is being used, although it might be safe to assume that these ligatures won't occur otherwise.

Note that you could write a simple lua filter that renders Quoted DoubleQuoted elements with the German quotes. See lua filter documentation on the website.

@jgm
Owner

jgm commented May 3, 2019

-- quote.lua
-- Render double-quoted text with German quotation marks.
function Quoted(el)
  if el.quotetype == 'DoubleQuote' then
    return {pandoc.Str("„"), pandoc.Span(el.content), pandoc.Str("“")}
  end
end

Run with pandoc --lua-filter quote.lua.

@Delanii

Delanii commented Sep 24, 2020

I hit the same issue with the docx and odt formats (and also with the latex format, for which I am using a custom template with the csquotes package always enabled). The last comment by @jgm solved it for me, with help from the docs on lua filters to accommodate different formats, and I am a lua programming newbie.

From that perspective, I guess that such a simple filter is OK to use and easier to maintain than any change in pandoc itself. Also, there is already a filter in the pandoc-lua-filters repository. On that basis, would it be appropriate to close this issue and also #84? (It seems pretty old to me and is also labelled "high complexity", but I would guess that it is solved now with this filter.)

@jgm
Owner

jgm commented Sep 24, 2020

I want to keep this issue open to track the issue noted above about the special babel ligatures.
We could handle those better.

@wanddynosios

Extending the answer above, you can also add single quote handling like this:

  if el.quotetype == 'SingleQuote' then
    return {pandoc.Str("‚"), pandoc.Span(el.content), pandoc.Str("‘")}
  end
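
For anyone who would rather not use Lua, the same idea can be sketched as a transformation over pandoc's JSON AST. This is only an illustration: the names QUOTE_MARKS and germanize are invented here, and the sketch assumes the usual JSON encoding in which a quoted span is {"t": "Quoted", "c": [{"t": "DoubleQuote"}, [inlines]]}. Unlike the Lua snippets, it splices the quoted inlines directly between the two quote characters instead of wrapping them in a Span.

```python
import json

# Opening and closing marks for each pandoc quote type.
QUOTE_MARKS = {
    "DoubleQuote": ("„", "“"),
    "SingleQuote": ("‚", "‘"),
}


def germanize(node):
    """Return a copy of a pandoc JSON AST with Quoted elements replaced
    by literal German quotation marks around their contents."""
    if isinstance(node, list):
        result = []
        for item in node:
            if isinstance(item, dict) and item.get("t") == "Quoted":
                quote_type, inlines = item["c"]
                opening, closing = QUOTE_MARKS[quote_type["t"]]
                result.append({"t": "Str", "c": opening})
                result.extend(germanize(inlines))  # handles nested quotes too
                result.append({"t": "Str", "c": closing})
            else:
                result.append(germanize(item))
        return result
    if isinstance(node, dict):
        return {key: germanize(value) for key, value in node.items()}
    return node  # strings, numbers, etc. are left unchanged


# A paragraph containing one double-quoted word.
paragraph = {"t": "Para", "c": [
    {"t": "Quoted", "c": [{"t": "DoubleQuote"}, [{"t": "Str", "c": "Zitat"}]]},
]}
print(json.dumps(germanize(paragraph), ensure_ascii=False))
```

To use this as an actual filter, the script would additionally need to read the document with json.load(sys.stdin) and write the result with json.dump(..., sys.stdout), and would then be run via pandoc --filter.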
