Add new translate function to include accents and other latex macros in symbol names #2488

moble · 2013-09-29T18:23:39Z

This commit adds the ability to create more complex symbol names with accents, bolds, etc., and still have them translated into reasonable latex on printing. This is inspired by the old latex_ex function from the galgebra submodule, which is in the current release version but not the dev version. For example, sometimes I want the vector L and its corresponding unit vector to both be defined, and possibly even distinguished from some scalar L. I can do something like the following:

L, Lvec, Lhat = symbols('L, Lvec, Lhat')

When these are printed to latex, they will just be strings of italic letters. The alternative would be something like

L, Lvec, Lhat = symbols(r'L, \vec{L}, \hat{L}')

But when these are printed to non-latex things, this comes out looking terrible. For example, with the ccode printer, this would give invalid C code.

With this patch, a symbol name like Lhat gets converted to \hat{L} when printed to latex, without interfering with other sympy functions. So this patch gives the best of both worlds for latex and ccode printing, for example.

The list of recognized macros is stored in the new accent_keys list. Multiple accents can be combined, as in the symbol name Lhatdot, which becomes \dot{\hat{L}} in latex, while Ldothat becomes \hat{\dot{L}}. Also, because some people like CamelCase variables, names like LDotHat are processed appropriately. (In fact, any form of capitalization, such as all caps, is accepted.)
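The suffix handling described above can be illustrated with a minimal sketch. This is not the code from the patch: `MODIFIER_TEMPLATES` and `translate_sketch` are hypothetical names, and only three modifiers are included. It assumes modifiers are peeled off the end of the name, case-insensitively, so that the last suffix becomes the outermost macro.

```python
# Hypothetical, much-reduced stand-in for the patch's accent_keys list.
MODIFIER_TEMPLATES = {
    "hat": r"\hat{%s}",
    "dot": r"\dot{%s}",
    "vec": r"\vec{%s}",
}


def translate_sketch(name):
    """Wrap `name` in one LaTeX macro per recognized trailing modifier."""
    lowered = name.lower()  # accept Lhat, LHat, LHAT, ...
    for key, template in MODIFIER_TEMPLATES.items():
        if lowered.endswith(key):
            # Strip the suffix, translate what is left, then wrap it.
            return template % translate_sketch(name[:-len(key)])
    return name


print(translate_sketch("Lhatdot"))  # \dot{\hat{L}}
print(translate_sketch("Ldothat"))  # \hat{\dot{L}}
print(translate_sketch("LDotHat"))  # \hat{\dot{L}}
```

Because the symbol name itself stays a plain identifier, printers that do not know about the modifiers would still see something like Lhat rather than a latex macro.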

jrioux · 2013-10-02T11:43:43Z

Looks good in principle, but please add tests.

moble · 2013-10-02T15:38:51Z

Sorry, I'm very new to all this. By "add tests" do you mean doctest? If so, does my last commit look sufficient?

asmeurer · 2013-10-02T16:55:21Z

No, he also means tests in a test file.

asmeurer · 2013-10-02T16:56:36Z

In other words, test_latex.py.

By the way, it would be awesome to see this done for Unicode pretty printing as well.

moble · 2013-10-02T19:27:14Z

Okay, I think I've added a fairly thorough test, which even seems to pass!

As for the pretty printer, I'd be happy to help out if it's as easy as this one, and if someone could point me to the right place. But having just looked at it for the first time, I'm totally lost in the pretty printer code...

jrioux · 2013-10-03T08:22:08Z

Tests pass, but not the doctest. I think you just need to turn the docstring into a raw string.

________________________________________________________________________________
________________________ sympy.printing.latex.translate ________________________
File "/home/travis/virtualenv/python2.7/local/lib/python2.7/site-packages/sympy/printing/latex.py", line 1641, in sympy.printing.latex.translate
Failed example:
    translate('alphahatdotprime')
Expected:
    "{\dot{\hat{\alpha}}}'"
Got:
    "{\\dot{\\hat{\\alpha}}}'"

moble · 2013-10-03T13:08:15Z

Sorry about that. I didn't know how to run the doctest myself. But Travis CI now reports that it passes.
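The doctest mismatch reported above comes from backslash handling in a non-raw docstring: the doubled backslashes collapse when the docstring is compiled, while the repr of the returned string still shows them doubled. A small standard-library sketch reproduces this. The names `tex_hat`, `raw_doc`, `plain_doc`, and `count_failures` are hypothetical and are unrelated to SymPy's own test runner.

```python
import doctest


def tex_hat(name):
    return r"\hat{%s}" % name


def raw_doc():
    r"""
    >>> tex_hat('x')
    '\\hat{x}'
    """


def plain_doc():
    """
    >>> tex_hat('x')
    '\\hat{x}'
    """


def count_failures(func):
    """Run the doctest examples in func's docstring; return how many fail."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {"tex_hat": tex_hat}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test, out=lambda text: None)  # silence the report
    return results.failed


print(count_failures(raw_doc))    # 0
print(count_failures(plain_doc))  # 1
```

The two docstrings look identical in the source; only the `r` prefix differs.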

asmeurer · 2013-10-04T00:22:46Z

./bin/doctest

asmeurer · 2013-10-04T00:26:08Z

I guess you should check if the unicodedata module has anything that makes it easy to add accents to characters. Otherwise, you'll need to add some mapping dictionaries to pretty_symbology.py.

asmeurer · 2013-10-04T00:26:32Z

Or possibly some of the accents use the combining features of Unicode.
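As a rough illustration of the combining-character idea, the standard unicodedata module can look up combining marks by their official names. The mapping `COMBINING_NAMES` and the helper `add_accent` below are hypothetical, not anything that exists in pretty_symbology.py.

```python
import unicodedata

# Hypothetical mapping from modifier names to Unicode combining marks.
COMBINING_NAMES = {
    "hat": "COMBINING CIRCUMFLEX ACCENT",
    "dot": "COMBINING DOT ABOVE",
    "check": "COMBINING CARON",
    "breve": "COMBINING BREVE",
}


def add_accent(base, modifier):
    """Append the combining mark for `modifier` to `base`."""
    return base + unicodedata.lookup(COMBINING_NAMES[modifier])


accented = add_accent("x", "hat")
print(ascii(accented))  # 'x\u0302'
print(len(accented))    # 2

# Some base/mark pairs have a precomposed form that NFC normalization finds.
print(unicodedata.normalize("NFC", add_accent("a", "hat")) == "\u00e2")  # True
```

Since the mark is a separate code point placed after the base character, how it is drawn depends on the terminal and font.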

jrioux · 2013-10-04T13:04:09Z

We don't need prm and bm when prime and bold are provided. Also, abs isn't an accent; I find it confusing since we have an actual SymPy construct that will print like that: Abs(Symbol('x')). So I think these three (prm, bm and abs) should probably be removed. Other than that, I don't know if Aaron will agree, but I wouldn't block this on the pretty printer stuff, though if you feel like tackling it as well, please do!

jrioux · 2013-10-04T13:11:46Z

latex(Symbol('hbar')) now returns \bar{h} instead of \hbar. So the processing for special symbols needs to happen before processing the accents.

jrioux · 2013-10-04T13:18:25Z

The accent code should also check that it isn't matching the whole symbol name, e.g. latex(Symbol('ddot')) currently returns \ddot{} but it should return \dot{d}, while latex(Symbol('hat')) currently returns \hat{} but it probably should just return hat as is.
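One way to picture the two fixes requested in the comments above (special symbols first, and no modifier that swallows the whole name) is the following sketch. The tables and the function name `translate_sketch` are hypothetical and heavily reduced; this is not the code that ended up in latex.py.

```python
# Hypothetical, reduced tables; the real printer knows many more names.
SPECIAL_SYMBOLS = {"hbar": r"\hbar", "alpha": r"\alpha"}
MODIFIER_TEMPLATES = {
    "ddot": r"\ddot{%s}",
    "dot": r"\dot{%s}",
    "hat": r"\hat{%s}",
    "bar": r"\bar{%s}",
}


def translate_sketch(name):
    # 1. Whole-name special symbols win over any modifier suffix.
    if name in SPECIAL_SYMBOLS:
        return SPECIAL_SYMBOLS[name]
    lowered = name.lower()
    # 2. Try longer modifiers first so "ddot" is preferred over "dot".
    for key in sorted(MODIFIER_TEMPLATES, key=len, reverse=True):
        # 3. Require something to be left over after stripping the suffix.
        if lowered.endswith(key) and len(name) > len(key):
            return MODIFIER_TEMPLATES[key] % translate_sketch(name[:-len(key)])
    return name


print(translate_sketch("hbar"))      # \hbar
print(translate_sketch("ddot"))      # \dot{d}
print(translate_sketch("hat"))       # hat
print(translate_sketch("alphahat"))  # \hat{\alpha}
```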

moble · 2013-10-04T14:03:32Z

I agree with your last two comments, and I'll work on them ASAP (next week).

As for prm versus prime, etc., to me the question isn't "what can users name their variables?"; it's "what do users want to name their variables?" In the case of prm and bm, I know at least one user who wants to use the shortened versions: the author of the old latex_ex code. So I kept them for some sort of backwards compatibility. More generally I don't think there's any point limiting the range of symbols users can define. In fact, on my next push, I'll add mag as a synonym for abs. Which brings me to...

Regarding abs, sometimes you need to name a variable that encapsulates Abs(Symbol('x')) -- or a similar concept. For example, in the code I was writing that prompted me to change the translate function in the first place, I have an evolution equation for the magnitude of the vector L, so I actually need to define the quantity LVecMag. I can use the fact that that symbol represents something expressed otherwise in sympy, but I do want the symbol itself to print nicely.

"Accents" probably wasn't the best choice of word; I just stole it from the latex_ex code. Again, I don't think we should limit ourselves, and thereby limit users; I think we should provide whatever constructions (accents or otherwise) people will find useful in naming their variables. In the same way, the sub- and superscript conversions are not strictly necessary, but really nice to have. So what else might be nice to have?

asmeurer · 2013-10-06T23:09:53Z

No, it doesn't need to block on the Unicode stuff. If you don't want to do that, just open an issue for it.

asmeurer · 2013-10-06T23:14:21Z

I'm not convinced about abs. Couldn't you just substitute Abs(x) for your absx symbol before printing? Or even better, just use Abs(x). What is the benefit of having a separate Symbol? There are definitely disadvantages, namely, anything that would recognize the relationship between x and Abs(x) would no longer work. In some parts of SymPy you can even get wrong results by doing this, because they implicitly assume that different symbols are independent of one another.

In general, if you want to represent something in SymPy, you're encouraged to represent it as it actually is, i.e., use Abs(x) instead of Symbol('absx').

moble · 2013-10-07T15:40:17Z

I would agree with you if variables were only ever rvalues, but my variable is almost exclusively an lvalue; I never actually need x itself. Instead, I have some big, complicated expression that I happen to know is equal to Abs(x). I need sympy to manipulate, simplify, take the Horner form, etc., of that expression, and I need a variable to hold the expression. I could name that variable y, but I (or others reading my code) might forget that y=Abs(x). And x is a quantity that appears in the literature, so I can't just redefine it. If I can use a variable named xabs, it's much easier to understand my code. And since Abs(x) is obviously not an lvalue, I use xabs.

moble · 2013-10-07T16:11:02Z

I've made the two changes jrioux suggested about not clobbering other symbols or returning empty accents, and added associated tests to ensure those work as expected.

But, as per my comment above, I'm doubling down on my notion that more general things like abs should be allowed :). I've added new keys norm, avg, and mag that all fall into that category. I also changed the word used in this new stuff from "accents" to "modifiers", to better describe the more general idea I'm going for.

I'll just open an issue for the Unicode version, because I can't find an easy way into that code. But I will keep looking at it.

asmeurer · 2013-10-07T22:46:52Z

The unicode pretty printer is admittedly much more complicated than the other printers, but I think all you need to do is define a function in pretty_symbology that converts a symbol to the accented version of it (or if you end up just having to write these out manually as dictionaries because there's nothing in unicodedata that can help you, just define the dictionary). Then, modify the _print_Symbol in pretty.py to use it.

asmeurer · 2013-10-07T22:47:34Z

Actually, I guess the function you need to modify is pretty_symbol in pretty_symbology.py.

asmeurer · 2013-10-11T01:02:12Z

Here's what it looks like for me. Pretty awesome.

asmeurer · 2013-10-11T01:05:39Z

I'm not sure if it makes sense to strip the modifiers if we have no faces. I guess it's OK. I don't have a strong opinion either way.

Could you add a test or two for multiple modifiers?

We should figure out a way to document this. Can we just import the dictionary into Sphinx? If it looks ugly, maybe create a special variable that is just the keys of the dictionary and import that.

asmeurer · 2013-10-11T01:08:35Z

This looks like a bug

In [9]: Symbol('x_dot')
Out[9]: x_dot__

In [10]: Symbol('x^dot')
Out[10]: x___dot

Probably those should give the same thing as xdot, at least the first one (maybe the second should give x ̇). At any rate, it shouldn't add extra _.

moble · 2013-10-11T03:01:43Z

I agree about the faces issue. I've been going back and forth about it, but I think I've settled on not translating them, just because it removes information. People might legitimately have two variables, like x and xbold, that would need to appear different.

moble · 2013-10-11T03:02:48Z

I've fixed the extra-underscores bug. But I think x_dot should probably stay as it is. Or at least, the parsing would need some upstream changes that I'd be hesitant to adjust, for fear of introducing new bugs.

moble · 2013-10-11T03:05:30Z

I've added a couple little tests for multiple modifiers (including a couple doozies from the latex tests).

Here's a screenshot from my terminal showing that certain combinations don't come out right. The first and last have their second modifiers shifted way left. But the second and third look just fine. I don't think that's something I can fix, but it's something to be aware of.

asmeurer · 2013-10-11T03:49:44Z

I get the same behavior. What OS are you on?

By the way, xvechat works. Maybe there is a good ordering here. xbrevecheck and xcheckbreve both give the same thing.

Another question, what is the expected behavior for multiname symbols, like xyzhat?

asmeurer · 2013-10-11T03:52:04Z

xhatvec works in Terminal.app but not in iTerm 2. So I guess it's a bug in iTerm 2. The brevecheck one doesn't work in either.

asmeurer · 2013-10-11T03:53:51Z

If you're not on Linux, someone should check there. From what I've seen, the Unicode support in the Linux terminal emulators can be pretty bad. I wouldn't be surprised if this does strange things there.

By the way, x̌̆ renders correctly in my browser. So these are all bugs in the terminal emulators I guess.

moble · 2013-10-11T04:32:21Z

I use OS X Terminal.app. I agree that it's just a bug in the unicode implementations of the terminals. I got similar badness from just echo -e ' x\u0306\u030C' directly in bash. So I don't think it's something we should necessarily worry about. Of course, I'd guess that the most common combinations would be xVecDot and xHatDot, both of which work correctly for me (as do more dots).

For xyzhat, I'd say the current output of xyẑ is probably the right behavior. Latex should just put a hat over the whole thing (which ends up in the middle). I don't think we can make something nice out of every possible input, but it's still nice to give users the ability to at least have nice versions of the most basic inputs, which still cover most cases I've ever needed.

asmeurer · 2013-10-11T06:15:49Z

I opened https://code.google.com/p/iterm2/issues/detail?id=2639 in the iTerm tracker, so hopefully it will get fixed there (unless it ends up being a Mac OS X rendering issue).

asmeurer · 2013-10-11T06:19:18Z

This looks good to me. I would still like to see some testing on Linux. If things end up being really bad there we may need to add some kind of option to disable this.

By the way, another idea would be to make this work for x^ or x~ and so on. Don't know if it's a good idea (x^ for one is ambiguous with x^1), but it's an idea.

asmeurer · 2013-10-11T06:22:07Z

Oh, and I couldn't think of many, but people will likely come up with unfortunate coincidences of words that happen to end in one of these modifiers (for example, that becomes t̂). I'm not sure what to do about it, though.

lidavidm · 2013-10-11T15:34:34Z

I think it might have to do with the fonts more than the terminal emulator:

Here is Linux, Gnome Terminal, font is Consolas

Using font Cousine:

Lucida Console Semi-Condensed:

DejaVu Sans Mono:

moble · 2013-10-11T16:58:38Z

Oh yeah. Look at that. I get the same results as David on OS X Terminal. I usually use Source Code Pro. I guess there's really nothing we can do about that, but it would definitely be worth mentioning in the docs.

I didn't do an exhaustive search, but I find that -- in addition to the Lucida Console Semi-Condensed that seems to work, judging from David's screenshot -- Andale Mono, Courier, and Monaco are other fixed-width fonts that also work pretty well. I tested it with:

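from sympy import Symbol
from sympy.printing.pretty.pretty_symbology import modifier_dict
# Build a symbol for every ordered pair of modifiers and inspect how each renders
[Symbol('x'+key1+key2) for key1 in modifier_dict for key2 in modifier_dict]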

Given this potential for ugly results if someone can't or won't use a "good" unicode font, as well as the issue of that -> t̂, etc., maybe the best solution would be to add an option to init_printing (or whatever), and let the user decide whether or not these substitutions fit their workflow. What do you think?
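To make the proposal concrete, here is a rough sketch of what such a switch could look like. The function `pretty_name_sketch`, its `use_modifiers` parameter, and the `COMBINING_MARKS` table are all hypothetical; none of them is an existing SymPy option or API, and the ordering of stacked marks is an assumption.

```python
import unicodedata

# Hypothetical subset of modifiers, mapped to Unicode combining marks.
COMBINING_MARKS = {
    "hat": unicodedata.lookup("COMBINING CIRCUMFLEX ACCENT"),
    "dot": unicodedata.lookup("COMBINING DOT ABOVE"),
}


def pretty_name_sketch(name, use_modifiers=True):
    """Return `name` with trailing modifiers turned into combining marks.

    `use_modifiers` is a hypothetical switch, not an existing SymPy option.
    """
    if not use_modifiers:
        return name
    marks = []
    stripped = True
    while stripped:
        stripped = False
        for key, mark in COMBINING_MARKS.items():
            if name.lower().endswith(key) and len(name) > len(key):
                marks.append(mark)
                name = name[:-len(key)]
                stripped = True
                break
    # The innermost modifier (stripped last) is written first.
    return name + "".join(reversed(marks))


print(ascii(pretty_name_sketch("that")))                       # 't\u0302'
print(ascii(pretty_name_sketch("that", use_modifiers=False)))  # 'that'
print(ascii(pretty_name_sketch("xyzhat")))                     # 'xyz\u0302'
```

With the switch off, a word like that is left alone, and users with fonts that render combining marks poorly get plain names back.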

asmeurer · 2013-10-11T17:23:34Z

I guess you're right. I use DejaVu Sans Mono and my output is a little different, but I think that might be because my ASCII font is Menlo, which may be interfering.

asmeurer · 2013-10-11T17:24:52Z

I would add the option to the printer, and for init_printing just implement https://code.google.com/p/sympy/issues/detail?id=3612.

asmeurer · 2013-10-15T21:30:21Z

Any more comments? I'm fine with merging this as-is.

By the way, the @sympy/mechanics team may be interested in the ability to print symbols with a dot over them in the terminal.

moble · 2013-10-16T15:01:40Z

I'm happy with it. I won't have any time to work on the init_printing stuff for a while, and that's kind of separate anyway, so I think this is ready to merge now.

asmeurer · 2013-10-16T16:47:20Z

Thanks for the contribution!

asmeurer · 2013-10-16T16:51:49Z

I opened https://code.google.com/p/sympy/issues/detail?id=4054 about the option.

moble · 2013-10-17T20:06:46Z

My pleasure. Thanks for your help improving it!

moble added 4 commits September 29, 2013 14:10

Add new translate function to include accents

09cfba7

Remove whitespace Travis CI chokes on

8509dda

Remove Lambda, because of course that will be taken care of

55d081d

Allow different (upper) cases, and remove accent more simply

c45a00a

Add doctest for translate function

a8e60bd

moble added 2 commits October 2, 2013 15:20

Add tests for accents

3a070ac

TeX abs explicitly because MathJax does not know it

8c617fd

raw docstring for doctest

98857ad

Process Greek letters and other_symbols first; pass on variables exactly matching modifiers; add related tests

0a3bc54

moble mentioned this pull request Oct 7, 2013

Add symbol modifiers to Unicode pretty printing #2518

Closed

moble added 2 commits October 10, 2013 11:11

Use 3.2-compatible unicode syntax

57642b2

Increase pretty_print compatibility with latex printing

cc1d38b

Roll back faces; fix underscores; add harder tests

5a0c7ea

Fix incorrect entry of a unicode combination

d055600

asmeurer added a commit that referenced this pull request Oct 16, 2013

Merge pull request #2488 from MOBle/master

ce0c772

Add new translate function to include accents and other latex macros in symbol names

asmeurer merged commit ce0c772 into sympy:master Oct 16, 2013

moble mentioned this pull request Dec 7, 2013

Passing printer parameters #2656

Open

asmeurer mentioned this pull request Feb 7, 2019

Add option to Unicode printer to disable combining accents #7153

Open
