Enable Unicode Support (Alternative II) #89

Closed
wants to merge 2 commits into from


certik
Member

@certik certik commented Apr 22, 2016

This is an alternative to #88, and perhaps a better patch, since it shows how to make Unicode printing work with pdflatex.

@certik
Member Author

certik commented Apr 22, 2016

The advantage of this approach is that it allows us to define exactly what latex symbol to use for unicode characters.

@certik certik changed the title from "Enable Unicode Support" to "Enable Unicode Support (Alternative II)" Apr 22, 2016
@asmeurer
Member

Is there a good way to catch it when new Unicode characters are added?

@asmeurer
Member

Actually, I'm -1 to this. This prints Unicode characters with the wrong characters. ─ (BOX DRAWINGS LIGHT HORIZONTAL) really is different from - (HYPHEN-MINUS). LaTeX converts -- and --- to en-dashes and em-dashes, but these are also different. Plus, all examples are rendered verbatim, meaning LaTeX does no formatting or kerning on the characters.

And how would this even work for

⌠
⎮
⌡

I think if we are going to show off Unicode pretty printing, we need to use the actual characters emitted by the printer.

For accents in names, it's less important. Those can be represented using pure (ASCII) LaTeX quite easily. If that's all we do (i.e., we decide we don't want to show off Unicode pretty printing in the paper), let's just add another field, latex_name to the authors.json, and not worry about non-ASCII characters. But if we do, we should figure out how to show them. I've made it work in the past with xetex (side-note, I've also made use of DejaVu Sans Mono in the past, since that's one of the few fonts where the Unicode integral renders correctly).

I say we hold off on this for now until a) we figure out if we want to show off Unicode pretty printing and b) we determine if the journal we submit to can handle that.
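
The latex_name idea above could be sketched roughly as follows. The layout of authors.json is an assumption here (a list of objects with a `name` key and an optional `latex_name` key), and `latex_author` is a hypothetical helper, not existing code from the repository.

```python
import json


def latex_author(author):
    """Pick the string to typeset for one author entry."""
    if "latex_name" in author:
        return author["latex_name"]
    name = author["name"]
    if not name.isascii():
        # Fail loudly instead of letting LaTeX drop characters silently.
        raise ValueError(f"{name!r} has non-ASCII characters but no latex_name")
    return name


# Hypothetical entries; the real authors.json layout may differ.
authors = json.loads(r'''
[
  {"name": "Aaron Meurer"},
  {"name": "Ondřej Čertík", "latex_name": "Ond\\v{r}ej \\v{C}ert\\'{\\i}k"}
]
''')
print([latex_author(a) for a in authors])
```

With this shape, ASCII-only names need no extra field, and a non-ASCII name without a latex_name is reported as an error rather than typeset incorrectly.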

@certik
Member Author

certik commented Apr 22, 2016

Then please merge alternative I.

Holding off leaves serious issues. Yes, there is a way to tell: LaTeX now errors out (the right behavior). Previously it silently ignored Unicode characters (the wrong behavior), so the final formulas were missing minus signs and such!
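
As a complement to letting LaTeX fail, a pre-check outside the build could list every non-ASCII character before pdflatex sees it. The following is only a minimal sketch: the function name `find_non_ascii` and the sample text are made up for illustration and are not part of the paper's build.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


# Hypothetical manuscript fragment, not taken from the actual paper.
sample = "x - y\n───\nOndřej"
for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, col {col}: U+{ord(ch):04X} {ch!r}")
```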


@certik
Member Author

certik commented Apr 22, 2016

Alternative I (#88) was merged. I will keep this one open, as I think this PR shows how to get unicode working for sympy unicode output in the verbatim mode. I just need to play with it some more.

@asmeurer
Member

My preferred way, if we can get it to work, is to use xetex. I've used it before, and it definitely works.

@certik
Member Author

certik commented Apr 22, 2016

OK, I am not against it. Just a warning that, e.g., we wouldn't be able to upload to arxiv.org, since it only seems to support pdflatex.

@asmeurer
Member

Maybe let's just plan to not use any Unicode pretty printing examples for now. We can use ASCII pretty printing, or probably better (for space), the default str printing (all the examples in the paper will need to be rewritten to be in a consistent style, by the way). For non-ASCII characters like the ones in "Ondřej Čertík" we can use the workarounds we've already got, or use \v{r} and so on.

Unless you really want to spend some time to figure out how to make it work (I personally don't).

@certik
Member Author

certik commented Apr 22, 2016

I'll have a look at this if I have time. The documentation for the inputenc package that we use is at: https://www.tug.org/texmf-dist/doc/latex/base/inputenc.pdf

@certik
Member Author

certik commented Apr 22, 2016

One can use this to look up the Unicode symbols:

http://www.fileformat.info/info/unicode/char/search.htm

and then this to find the corresponding latex symbol:

http://www.johndcook.com/unicode_latex.html

E.g. ⌠ is U+2320, which is \inttop in LaTeX (it seems to be defined in the unicode-math package, but I haven't tried it).

This pdf has nice tables with all these symbols: ftp://ftp.dante.de/pub/tex/macros/latex/contrib/unicode-math/unimath-symbols.pdf
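
The lookup can also be done locally with Python's standard `unicodedata` module, which reports the code point and official name of a character. A small sketch (the `describe` helper is a hypothetical name):

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The integral pieces and the box-drawing dash discussed in this thread.
for ch in "⌠⎮⌡─":
    print(ch, describe(ch))
```

This only identifies the character; which LaTeX macro to map it to still has to be taken from a table such as the ones linked above.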

@asmeurer
Member

asmeurer commented May 2, 2016

I tried this. If I \usepackage{unicode-math}, it tells me

! Package unicode-math Error: Cannot be run with pdfLaTeX!
(unicode-math)                Use XeLaTeX or LuaLaTeX instead..

Without it, \inttop doesn't seem to be defined.


[10])
! Undefined control sequence.
\u8:⌠ ->\inttop

So it seems we need xetex. When I try compiling the paper with xelatex, I get

! Undefined control sequence.
<argument> \headerps@out
                         {/burl@stx null def /BU.S { /burl@stx null def } de...
l.311 }

?

which I think is an error from the style file.

The solution I've used in the past is to compile the LaTeX examples as separate documents with xetex, and include them using \includegraphics. I tested this and it works, although we would need to tweak it so that it lines up correctly.
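
A rough sketch of how such a separate document might be generated is below. It assumes `fontspec` and the DejaVu Sans Mono font are available to xelatex. The helper name `standalone_example`, the sample body, and the choice of a plain `article` page that still needs cropping are made up for this illustration and are not the setup actually used.

```python
def standalone_example(body, font="DejaVu Sans Mono"):
    """Wrap verbatim text in a minimal document meant to be built with xelatex."""
    return "\n".join([
        r"\documentclass{article}",
        r"\usepackage{fontspec}  % requires XeLaTeX or LuaLaTeX",
        r"\setmonofont{" + font + "}",
        r"\pagestyle{empty}  % no page number, so the page can be cropped",
        r"\begin{document}",
        r"\begin{verbatim}",
        body,
        r"\end{verbatim}",
        r"\end{document}",
        "",
    ])


# Hypothetical pretty-printer output used as the example body.
print(standalone_example("⌠\n⎮ x dx\n⌡"))
```

The resulting text would be saved to a .tex file, compiled with xelatex, cropped, and then pulled into the paper with \includegraphics.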

Alternately, we could just take a screenshot, and show that. That would also let us show the MathJax rendering in Jupyter.

@certik
Member Author

certik commented May 2, 2016

I see, so we would need to define \inttop ourselves. It sucks to compile the examples separately, but perhaps we can just include them as figures.

@asmeurer
Member

asmeurer commented May 3, 2016

So our options are:

  1. Don't include anything about Unicode pretty printing (or don't include anything about any pretty printing). I've already written some stuff on printing and I'd like to include it, so this is not my preferred option.

  2. Render the Unicode pretty printing in a separate TeX file using XeTeX, and import it as an image using \includegraphics. This actually works well if you render it as a pdf, because in the end document it basically looks exactly like it is just part of the text (it isn't an image---you can select the text and stuff). Another advantage is that we can use a font that renders the Unicode characters well (although this may look bad, if it's a different font from the rest of the paper).

    The downside here is that we have to figure out how to do the formatting on the text and image so that things line up. Here's what a first pass doing this naively looks like:

    [Image: screenshot taken 2016-05-03 at 3:07:02 PM]

    You can see it doesn't line up. We'd have to play with it to make that work. Also you can definitely tell that the font is different.

    A potential issue here is that the alignment may be sensitive to the formatting of the paper, meaning that if the journal changes the formatting, it may not look correct anymore.

  3. Take a screenshot of the Unicode printer. I think a way that we could get around this would be to create a Jupyter notebook with all the printing examples, and screenshot that. We could also include the notebook in the supplement. This way we could also showcase the MathJax printing and maybe plotting. I like this option, but I think if we do it we should figure out how to combine the printing section with a section talking about how you can use SymPy with Jupyter, so that we have just one screenshot. A disadvantage here is that a screenshot would take up a lot of space.

  4. Figure out how to make xetex work with the current document (including style file). But I'm not even sure if the journal would allow it.

What do you think? Do you see any other options? I'm leaning toward 3, but I'm still thinking of how to make it work with the organization of the paper.

@certik
Member Author

certik commented May 3, 2016

I would use 2. to get one or two PDF screenshots, but I wouldn't inline them; I would rather put them in as figures, just to show what Unicode printing looks like.

The disadvantage of 3. is that the PNG screenshot would be ugly, wouldn't it?

Also, how are we going to print the rest of the SymPy outputs in the paper? Can't we use the LaTeX printer and render the result with LaTeX? That would be best.

@asmeurer
Member

asmeurer commented May 3, 2016

> I would use 2. to get one or two PDF screenshots, but I wouldn't inline them; I would rather put them in as figures, just to show what Unicode printing looks like.

I want to show str, Unicode pprint, ASCII pprint, and LaTeX (and rendered LaTeX if we show a screenshot of the notebook). Would you make all of them figures? This would be different from all the other examples in the paper (or do you think every example in the paper should be a figure?). Or just the Unicode one should be a figure? Wouldn't that look very inconsistent?

> The disadvantage of 3. is that the PNG screenshot would be ugly, wouldn't it?

It wouldn't be ugly if we use a high enough resolution :)

It may also be possible to take a PDF rendering of the notebook. https://stackoverflow.com/questions/176476/how-can-i-automate-html-to-pdf-conversions suggests that http://wkhtmltopdf.org/ might be a way to do it. I know there's nbconvert, but that renders different from the notebook itself, and I'd like for the screenshot to actually look like the notebook in the browser.

> Also, how are we going to print the rest of the SymPy outputs in the paper? Can't we use the LaTeX printer and render the result with LaTeX? That would be best.

I think it would be confusing to show the output as inline math. Can you show an example of what this would look like?

@certik
Member Author

certik commented May 3, 2016

We have two kinds of outputs:

a) a regular output of most examples all over the paper
b) examples of str, unicode pprint, ascii pprint and Latex

For b), I would just put those into a PDF figure and check the PDF into git. We only submit a PDF to the journal. We would provide scripts to regenerate the PDF (for example using xetex, or an IPython notebook, or whatever else works).
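
A script for regenerating the content of such a figure could start from something like this sketch, which collects the four text forms of one expression. The expression is arbitrary and chosen only for illustration; `pretty` and `latex` are SymPy's existing printer functions.

```python
from sympy import Integral, latex, pretty, sqrt, symbols

x = symbols("x")
expr = Integral(sqrt(1/x), x)  # arbitrary example expression

outputs = {
    "str": str(expr),
    "ascii pprint": pretty(expr, use_unicode=False),
    "unicode pprint": pretty(expr, use_unicode=True),
    "latex": latex(expr),
}
for label, text in outputs.items():
    print(f"--- {label} ---")
    print(text)
```

Only the "unicode pprint" entry contains characters that pdflatex cannot handle directly, so it is the only one that needs the separate rendering path.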

For a), we still need to investigate. I would like a) to be doctested in either case. I was thinking of a special environment, that uses monospace for the commands after >>> and output in latex, something along these lines:

>>> Limit(sin(x)/x, x, 0)
$$\lim_{x \to 0^+}\left(\frac{1}{x} \sin{\left (x \right )}\right)$$

Where the LaTeX output is obtained using latex(). The >>> Limit(sin(x)/x, x, 0) part probably needs to be put into some kind of environment, either verbatim or something else, perhaps even with syntax highlighting. The $$\lim_{x \to 0^+}\left(\frac{1}{x} \sin{\left (x \right )}\right)$$ part should also be in a special environment that aligns the equation on the left (by default it is centered).

If this is too much work to set up, then we can use str printing or ASCII pretty printing with verbatim. Finally, the last option is to try Unicode printing, but then we run into the issues you described. Of all these options, LaTeX will look the best in my opinion.

So let's see if I can come up with something.

@asmeurer
Member

asmeurer commented May 3, 2016

The problem is that it's confusing, because you have a Python prompt, but the output is rendered math. There's no SymPy shell that actually works that way. I suppose we could use Jupyter prompts, with the rendered math, since that would match the notebook (assuming we put init_printing() at the top). I would either show the Python prompt with default (str) output (what we are doing now), or an IPython prompt. In other words, this is what a Jupyter notebook, converted to pdf with nbconvert, looks like

[Image: screenshot taken 2016-05-03 at 4:30:43 PM]

We could show something similar, like

Here is the square root of $\pi$:
\begin{verbatim}
In [3]: sqrt(pi)
Out[3]:
\end{verbatim}
$$\sqrt{\pi}$$

That would look something like this

[Image: screenshot taken 2016-05-03 at 4:33:37 PM]

Upsides to this:

  • Rendered math looks nicer, especially if we have some complicated examples (we don't presently, but I'd like to add at least one for integrals).

Downsides:

  • Takes up more vertical space. EDIT: It's even more vertical space, because I forgot to put the empty line above the "Out".
  • The >>> and str output is just the default Python behavior. We wouldn't need to explain anything. For the Jupyter output, we would need to explain that the output is as it would be in the Jupyter notebook with init_printing() (and it would have to be at the very beginning of the paper, before any examples).

And obviously neither case really solves the issue of how to display Unicode pretty printing. I like my option 2, but it will take some LaTeX fiddling to make it render as if it were just another example. And yes, you're right, we would just submit the built PDF for the Unicode example to the journal.

Another thing I haven't checked is how good or bad the Unicode printing looks in the monospace font the SIAM style file uses. I know it looks good using DejaVu Sans Mono, and generally less good with any other font, but it might be bad (i.e., the font might not even have those characters).

@certik
Member Author

certik commented May 3, 2016

Ok, so regarding a), you seem to be leaning towards using str printing or str pretty printing?

We can take inspiration from what SageTeX does; see the example in sagetex.pdf, page 11, outputs 5 and 6. It renders the output in LaTeX.

Regarding b), we just have to figure out a robust solution to generate the pdf out of it, we seem to be in agreement there.

@certik
Member Author

certik commented May 3, 2016

Btw, in [1] we used str printing (not pretty printing) and we used the IPython prompts In/Out with numbers reset to 1 for each example (i.e. if the example had 3 prompts, they would always be numbered 1, 2, 3). For Mathematica we used the default text style. An example from the paper:

Limits
SymPy:
In [1]: limit(sin(x)/x, x, 0)
Out[1]: 1
In [2]: limit((2-sqrt(x))/(4-x),x,4)
Out[2]: 1/4
Mathematica:
In[1]:= Limit[Sin[x]/x,x->0]
Out[1]= 1
In[2]:= Limit[(2-Sqrt[x])/(4-x),x->4]
Out[2]= 1/4

[1] Čertík, O., Paprocki, M., Meurer, A., Granger, B., & Rathnayake, T. (2015). Symbolic Computing. In Encyclopedia of Applied and Computational Mathematics (pp. 1431–1439). Springer Berlin Heidelberg. http://doi.org/10.1007/978-3-540-70529-1_429

So that's one option for a).
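
For illustration, the prompt style described above (In/Out prompts renumbered from 1 in each example) could be produced by a small helper like the following. `format_session` is a hypothetical name, and the inputs and outputs are passed in as plain strings rather than evaluated.

```python
def format_session(pairs):
    """Format (input, output) string pairs as IPython-style prompts, numbered from 1."""
    lines = []
    for n, (source, result) in enumerate(pairs, start=1):
        lines.append(f"In [{n}]: {source}")
        lines.append(f"Out[{n}]: {result}")
    return "\n".join(lines)


# The two SymPy limits quoted from the earlier paper.
print(format_session([
    ("limit(sin(x)/x, x, 0)", "1"),
    ("limit((2-sqrt(x))/(4-x),x,4)", "1/4"),
]))
```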

@asmeurer
Member

asmeurer commented May 3, 2016

I am leaning towards str printing, but not strongly, i.e., if you or someone else wants to go in and change the examples, I wouldn't be opposed to it, but I'm not going to do it myself. Actually we already need to go through the examples and make them consistent, so maybe it's something that we really should decide about.

@certik
Member Author

certik commented May 3, 2016

Let's use str printing then. We will provide the nice pdf figures with all kinds of pretty printing, showing that it really looks nice. But for examples, the str printing is great, since that's what you get if you just print an expression in Python. That way we stress that SymPy is just a Python library.

In addition, consider the following example:

In [2]: limit((2*E**((1-cos(x))/sin(x))-1)**(sinh(x)/atan(x)**2), x, 0)
Out[2]: E

the complicated expression is actually in the input field, not the output field. So even latex printing can't fix that. For this reason, I think we should just use str printing.

@asmeurer
Member

asmeurer commented May 3, 2016

> That way we stress that SymPy is just a Python library.

My thoughts exactly.

@asmeurer
Member

asmeurer commented May 3, 2016

For the examples that have large inputs, we should have some text preceding them with rendered mathematics, so it's easier to see the expression being computed.

@certik
Member Author

certik commented May 12, 2016

I am closing this one; further discussion is at #122 (comment).

@certik certik closed this May 12, 2016
@certik certik deleted the unicode_fix4 branch May 12, 2016 18:35