LaTeX reader: improve parsing of otherlanguage environment #9202

Closed
jgm opened this issue Nov 20, 2023 · 7 comments

@jgm
Owner

jgm commented Nov 20, 2023

\begin{otherlanguage}{english}
Here's a div in English. Code is ignored: \texttt{baoeuthasoe}. So are
\href{http://example.com/notaword}{URLs}.
\end{otherlanguage}

is being parsed as

[ Div
    ( "" , [ "otherlanguage" ] , [] )
 [ Para
        [ Span ( "" , [] , [] ) [ Str "english" ]
        , SoftBreak
...

Instead, pandoc should recognize {english} as an argument to the environment and populate the lang attribute (not with english but with en).
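A minimal sketch of the intended behaviour, in Python rather than pandoc's Haskell: pull the language argument out of the environment header and look it up in a conversion table. The names `BABEL_TO_BCP47`, `OTHERLANGUAGE` and `otherlanguage_tag` are hypothetical, and the table is a small illustrative subset, not pandoc's actual table.

```python
import re

# Illustrative subset only (an assumption for this sketch);
# this is NOT the table pandoc itself uses.
BABEL_TO_BCP47 = {
    "english": "en",
    "british": "en-GB",
    "american": "en-US",
    "french": "fr",
    "spanish": "es",
    "russian": "ru",
}

# Matches \begin{otherlanguage}{<lang>} or the starred form, allowing an
# optional [options] group between the environment name and the language.
OTHERLANGUAGE = re.compile(
    r"\\begin\{otherlanguage\*?\}\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)


def otherlanguage_tag(latex):
    """Return the BCP-47 tag of the first otherlanguage environment, or None."""
    match = OTHERLANGUAGE.search(latex)
    if match is None:
        return None
    # Unknown language names yield None rather than a guessed tag.
    return BABEL_TO_BCP47.get(match.group(1).strip())
```

With this sketch, the example above would yield `en` for the `lang` attribute instead of a stray `Str "english"` in the paragraph.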

@jgm jgm added the bug label Nov 20, 2023
@pauloney

John, it is not just the command

\begin{otherlanguage}{english}

there are quite a few more ways to choose the language in Babel and Polyglossia. I can list them all in here for you.

It is also not just english --> en: a number of languages have specific names in Babel/Polyglossia, while Aspell uses BCP-47 language tags. I can work out that conversion table for you as well. It is not readily available in Polyglossia, as I mentioned, but it can be deduced from the package files. I just want to find an automated way to do it, so that it keeps working with future Polyglossia distributions.

@pauloney

pauloney commented Nov 20, 2023

Here are (what I believe to be) all possible ways to set and use a language in LaTeX:

Babel:

Setting:

\documentclass ‣ \documentclass[⟨lang⟩]{article}
hyperref ‣ \usepackage[pdflang=es-MX]{hyperref}
\DocumentMetadata ‣ \DocumentMetadata{lang=es-MX}
\PassOptionsToPackage ‣ \PassOptionsToPackage{main=english}{babel}
\usepackage[⟨lang⟩]{babel}
\usepackage[english,russian,french]{babel} % default lang is the last one.
\usepackage[main=english,russian,french]{babel} % main selection key use.
\usepackage[georgian, provide=*]{babel}
\babelprovide[import]{thai}
\babelprovide[import,main]{arabic} % main is arabic
\babeltags ‣ \babeltags{de = german}

Using:

\selectlanguage ‣ \selectlanguage[⟨options⟩]{⟨lang⟩}
\foreignlanguage ‣ \foreignlanguage[⟨options⟩]{⟨lang⟩}{⟨…⟩}
otherlanguage (env.) ‣ \begin{otherlanguage}[⟨options⟩]{⟨lang⟩} … \end{otherlanguage}
otherlanguage* (env.) ‣ \begin{otherlanguage*}[⟨options⟩]{⟨lang⟩} … \end{otherlanguage*}
\text⟨lang⟩ ‣ \text⟨lang⟩{...} % If \babeltags is set.
⟨lang⟩ (env.) ‣ \begin{⟨lang⟩} ... \end{⟨lang⟩} % If \babeltags is set.

Polyglossia:

Setting:

\setdefaultlanguage ‣ \setdefaultlanguage[⟨options⟩]{⟨lang⟩}
\setmainlanguage ‣ \setmainlanguage[⟨options⟩]{⟨lang⟩}
\resetdefaultlanguage ‣ \resetdefaultlanguage[⟨options⟩]{⟨lang⟩}
\setlanguagealias ‣ \setlanguagealias[⟨options⟩]{⟨language⟩}{⟨alias⟩}
\setlanguagealias* ‣ \setlanguagealias*[⟨options⟩]{⟨language⟩}{⟨alias⟩}

Using:

\text⟨lang⟩ ‣ \text⟨lang⟩[⟨options⟩]{...}
\textlang ‣ \textlang[⟨options⟩]{⟨lang⟩}{...} 
⟨lang⟩ (env.) ‣ \begin{⟨lang⟩}[⟨options⟩] ... \end{⟨lang⟩}
⟨alias⟩ (env.) ‣ \begin{lang}{⟨alias⟩} ... \end{lang} % If \setlanguagealias is set.

@jpcirrus
Contributor

Adding to @pauloney's above comment. Babel, together with other packages, also recognizes languages set in the options to \documentclass, with the last listed language being the main language. Since babel 3.49 (2020-10-03) these can then be used with \usepackage[package,options,provide*=*]{babel}, which works with \babelprovide{} and automatically sets the options import and main (section 1.13 of babel manual).
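The "last listed language is the main one, unless `main=` is given" rule can be sketched as follows. This is a simplification under stated assumptions: the function name `babel_main_language` is hypothetical, every bare option is treated as a language name (real option lists can contain bare non-language options), and all `key=value` options other than `main` are ignored.

```python
def babel_main_language(options):
    """Guess the main language from a babel-style option string.

    Assumes every bare (non key=value) option is a language name,
    which is a simplification of what babel actually accepts.
    """
    main = None
    last_bare = None
    for raw in options.split(","):
        opt = raw.strip()
        if not opt:
            continue
        key, sep, value = opt.partition("=")
        if sep:
            # Only the main= key selects a language; provide=* etc. are skipped.
            if key.strip() == "main":
                main = value.strip()
        else:
            last_bare = opt
    # An explicit main= wins; otherwise fall back to the last bare option.
    return main or last_bare
```

Applied to the option lists quoted earlier in the thread, `english,russian,french` gives `french`, `main=english,russian,french` gives `english`, and `georgian, provide=*` gives `georgian`.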

@pauloney

Thanks @jpcirrus! I reviewed my list after your comments.

@pauloney

Here is the spreadsheet containing:

  1. Languages supported by Babel
  2. Languages supported by Polyglossia
  3. The BCP-47 code of each one.
  4. If it is supported by Aspell
  5. If it is supported by Hunspell

I added Hunspell because it is a better spell checker, it is under much more active development now, and its set of supported languages is slightly different. Having an option to use either (or both) would be really nice.

The Babel list has just the names of the languages; the Polyglossia one is more detailed because of the variants. Most of these are not important for the choice of language (one can spell-check an es-MX file with an es-ES dictionary for the most part), but some are really important: for example, both Aspell and Hunspell have separate pt-PT and pt-BR dictionaries.

BCP-47 is certainly the best way to pass a parameter from LaTeX to Pandoc to Aspell, so that is included as well.
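The dictionary fallback described above (es-MX checked with an es-ES dictionary, but pt-BR and pt-PT kept apart when both exist) might look like the sketch below. The function name `pick_dictionary` and the idea of passing the installed dictionaries as a list of tags are assumptions for illustration; neither Aspell nor Hunspell is queried here.

```python
def pick_dictionary(tag, available):
    """Pick a spelling dictionary for a BCP-47 tag.

    Prefers an exact (case-insensitive) match, then falls back to any
    dictionary sharing the primary language subtag. Returns None if
    nothing suitable is available.
    """
    wanted = tag.lower()
    by_lower = {d.lower(): d for d in available}
    if wanted in by_lower:
        return by_lower[wanted]
    primary = wanted.split("-")[0]
    # Sorted so the fallback choice is deterministic.
    for candidate in sorted(available):
        if candidate.lower().split("-")[0] == primary:
            return candidate
    return None
```

Note that the fallback is deliberately crude: a tag with no exact match simply gets the first dictionary for the same primary language, which is acceptable for Spanish but could be the wrong choice for Portuguese variants.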

Supported_Languages.txt

Supported_Languages.ods

@jgm
Owner Author

jgm commented Nov 26, 2023

Here is the code we use to do these conversions:
https://github.com/jgm/pandoc/blob/main/src/Text/Pandoc/Readers/LaTeX/Lang.hs#L14-L237
If you notice omissions, perhaps do a PR so we can update?

@pauloney

John, this is great! I am not able to follow up on all the details of the code because of my limited Haskell skills, but the logic down in the languages looks all right.

Is there a way I can do some quick tests, command line or small files? I want to check if things are indeed correct and complete -- in making the list I found at least two wrong BCP-47 tags in Aspell.
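One possible way to run quick checks is to feed a small LaTeX snippet to pandoc and inspect the native AST, which is the same kind of output shown at the top of this issue. The sketch below wraps the command line `pandoc --from latex --to native` in Python; the helper name `latex_to_native` is hypothetical, and it returns None when no `pandoc` executable is found on the PATH.

```python
import shutil
import subprocess

# The equivalent shell command is: pandoc --from latex --to native
PANDOC_CMD = ["pandoc", "--from", "latex", "--to", "native"]


def latex_to_native(latex):
    """Return pandoc's native AST for a LaTeX snippet, or None without pandoc."""
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        PANDOC_CMD,
        input=latex,          # snippet is passed on standard input
        capture_output=True,
        text=True,
        check=True,           # raise if pandoc exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    snippet = "\\begin{otherlanguage}{english}\nHello.\n\\end{otherlanguage}\n"
    print(latex_to_native(snippet))
```

Searching the printed AST for a `lang` attribute on the resulting Div, one language name at a time, would be a simple way to compare pandoc's conversions against the spreadsheet.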

@jgm jgm closed this as completed in 80ea048 Nov 30, 2023
3 participants