Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal support for SIRange in LaTeX reader #6418

Merged
merged 6 commits into from Jul 23, 2020
Merged

Conversation

efharkin
Copy link
Contributor

@efharkin efharkin commented Jun 2, 2020

This patch adds minimal support for \SIRange{firstnumber}{secondnumber}{unit} (provided by siunitx) consistent with existing support for \SI{value}{unit}. Implementation is an extension of the existing dosiunitx function in the LaTeX reader. Added unit tests for SIRange in test/command/3587.md.

Code passes all new and existing unit tests and compiles without warnings*.

This PR implements a fix for issue #6417.

Example

LaTeX input

Something important happens \SIRange{100}{200}{\ms} after the event.

Markdown output

Something important happens 100 ms to 200 ms after the event.

*As far as I can tell... I don't see any warning messages when I compile normally. I'm completely new to haskell and cabal build -Wall and cabal test -Wall both raise errors about unrecognized options, so it's possible there's something I've missed. If there's something different I should be doing please let me know.

Add support for \SIRange{firstnumber}{secondnumber}{unit} provided by
siunitx. Support is implemented by new doSIRange function in
src/Text/Pandoc/Readers/LaTeX.hs, which simply extends the existing
dosiunitx function. Unit tests for doSIRange are in
test/command/3587.md

Example
-------

Input (LaTeX):
Something important happens \SIRange{100}{200}{\ms} after the event.

Result (Markdown):
Something important happens 100 ms to 200 ms after the event.
@jgm
Copy link
Owner

jgm commented Jun 3, 2020

Thanks, this generally looks good!
One comment: people use pandoc in many different languages, so we avoid English words like "to".
Two options here:

  • use an en-dash instead
  • use our relatively new "Translations" to fetch a localized version of "to" (see translateTerm in T.P.Class.PandocMonad; translateTerm To should do it). Perhaps fall back to en-dash if not found?

@efharkin
Copy link
Contributor Author

efharkin commented Jun 5, 2020

Thanks for the input @jgm! I tried to implement the Translations fix but got a bit stuck. I tried inserting translateTerm Translations.To into my code, but it looks like I have to convert the type from Text to Inlines. I tried using text $ translateTerm Translations.To for the conversion, but my compiler is complaining that the argument to text is m0 Text rather than the expected Text. Any chance you could point me in the right direction to fixing this?

If it's too much trouble, I might just fall back to the en-dash fix myself.

@mb21
Copy link
Collaborator

mb21 commented Jun 5, 2020

it looks like I have to convert the type from Text to Inlines

Hoogle is great: https://www.stackage.org/lts-15.15/hoogle?q=Text+-%3E+Inlines

ah, I see you already found text.... so the m0 refers to some monad, in this case the PandocMonad. To get the value "temporarily out of the monad" when in a do block you can use <-, like:

do
  myTxt <- translateTerm Translations.To
  let myInline = text myTxt

P.S. back when I was learning this, I really liked learnyouahaskell.com

Use an en-dash instead of "to" in doSIRange to support languages other
than English.
Try to get a locale-based separator SI units in doSIRange ("to" in
English) and fall back to an en-dash if one is not found.
Currently, unit tests are only available for English using "to" as
a separator.
@efharkin
Copy link
Contributor Author

efharkin commented Jun 6, 2020

Thanks everyone for your feedback! I ended up implementing both versions in separate commits.

  • 116ec3f uses an en-dash and passes all unit tests.
  • bdc370c uses a lower-case locale-based "to" and falls back to an en-dash if one can't be found.

The locale-based version might be a bit more brittle for a couple of reasons

  1. The result of translateTerm Translations.To is "To", which I had to coerce to lower case. This is correct in English, but might not be in other languages.
  2. Right now I only have unit tests for English because I couldn't figure out how to set the locale in the test file. (A quick search for tests mentioning translateTerm came up empty.)
  3. Fallback to an en-dash is triggered by the result of translateTerm being an empty string when translation fails. I'm not sure how to mock a failed translation, so right now the fallback behaviour of doSIRange is untested. Unfortunately that means that if the return value of translateTerm for a failed translation changes, doSIRange fallback will break and there won't be any unit tests to detect the regression.

Let me know if you have any ideas for how to address these issues, or whether you think it's ok as-is. Otherwise, I can always revert to the en-dash version.

@efharkin
Copy link
Contributor Author

@jgm just wanted to flag that I implemented your suggestion in bdc370c:

  localTo <- translateTerm Translations.To
  let separator = (if localTo /= ""
                     then text $ T.toLower localTo 
                     else "\8211")

This means that in an English locale we get the following translation, which is covered by unit tests.

Something important happens \SIRange{100}{200}{\ms} after the event.
Something important happens 100 ms to 200 ms after the event.

If the translation can't be fetched, we get \8211, which is an en-dash. This branch isn't covered by unit tests.

@jgm
Copy link
Owner

jgm commented Jun 24, 2020

So I guess you're asking which branch should be preferred?
I'm not too confident that the "To" translations will always be appropriate in this context. I think I'd feel more solid just using en-dash all the time, and that would give acceptable results.

@efharkin
Copy link
Contributor Author

efharkin commented Jul 1, 2020

Thanks for the comments @jgm! I think you're right that the en-dash is more robust, so I reverted to that version. Good catch about the spaces around the separator. I've removed them to be consistent with the native behaviour of SIRange. Let me know if there's anything else I can do to improve this PR!

@jgm jgm merged commit 1b8f161 into jgm:master Jul 23, 2020
@jgm
Copy link
Owner

jgm commented Jul 23, 2020

Excellent, thanks for contributing to pandoc!

@pauliacomi
Copy link

Hi @efharkin. Thanks a lot for going through the trouble of implementing this addition.

I have had some trouble making it work for my conversions since the siunitx syntax is \SIrange, rather than \SIRange. As such, the latest pandoc version is still ignoring the command. Am I missing something?

@efharkin
Copy link
Contributor Author

Oops, thanks for catching that @pauliacomi! That's what I get for implementing that feature on a different machine than my usual latex setup. Will submit a fix later today, just triple-checking that everything works first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants