Add new translate function to include accents and other latex macros in symbol names #2488

Merged
merged 14 commits into from
Oct 16, 2013

moble
Contributor

@moble moble commented Sep 29, 2013

This commit adds the ability to create more complex symbol names with accents, bold faces, etc., and still have them translated into reasonable latex on printing. It is inspired by the old latex_ex function from the galgebra submodule, which is in the current release version but not in the dev version. For example, sometimes I want the vector L and its corresponding unit vector to both be defined, and possibly even distinguished from some scalar L. I can do something like the following:

L, Lvec, Lhat = symbols('L, Lvec, Lhat')

When these are printed to latex, they will just be strings of italic letters. The alternative would be something like

L, Lvec, Lhat = symbols(r'L, \vec{L}, \hat{L}')

But when these are printed to non-latex things, this comes out looking terrible. For example, with the ccode printer, this would give invalid C code.

With this patch, a symbol name like Lhat gets converted to \hat{L} when printed to latex, without interfering with other sympy functions. So this patch gives the best of both worlds for latex and ccode printing, for example.

The list of recognized macros is stored in the new accent_keys list. Multiple accents can be combined, as in the symbol name Lhatdot, which becomes \dot{\hat{L}} in latex, while Ldothat becomes \hat{\dot{L}}. Also, because some people like CamelCase variables, names like LDotHat are processed appropriately. (In fact, any form of capitalization, such as all caps, is accepted.)
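
The suffix handling described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `MODIFIER_MACROS` and `translate_sketch` are hypothetical names, the key list is a small subset, and none of this is the code from the pull request.

```python
# Hypothetical sketch; not the code from this pull request.
MODIFIER_MACROS = {          # illustrative subset of modifier keys
    "hat": r"\hat{%s}",
    "dot": r"\dot{%s}",
    "vec": r"\vec{%s}",
}


def translate_sketch(name):
    """Turn trailing modifier keys in ``name`` into nested LaTeX macros."""
    lowered = name.lower()   # 'LDotHat' and 'LDOTHAT' then behave like 'Ldothat'
    for key, macro in MODIFIER_MACROS.items():
        # Require a non-empty base so that a bare 'hat' is left alone.
        if lowered.endswith(key) and len(name) > len(key):
            # The last key in the name becomes the outermost macro.
            return macro % translate_sketch(name[:-len(key)])
    return name


print(translate_sketch("Lhatdot"))   # \dot{\hat{L}}
print(translate_sketch("Ldothat"))   # \hat{\dot{L}}
print(translate_sketch("LDotHat"))   # \hat{\dot{L}}
```

Because the name itself stays a plain identifier such as Lhatdot, printers that do not know about the modifiers can still emit it unchanged.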

@jrioux
Member

jrioux commented Oct 2, 2013

Looks good in principle, but please add tests.

@moble
Contributor Author

moble commented Oct 2, 2013

Sorry, I'm very new to all this. By "add tests" do you mean doctest? If so, does my last commit look sufficient?

@asmeurer
Member

asmeurer commented Oct 2, 2013

No, he also means tests in a test file.

@asmeurer
Member

asmeurer commented Oct 2, 2013

In other words, test_latex.py.

By the way, it would be awesome to see this done for Unicode pretty printing as well.

@moble
Contributor Author

moble commented Oct 2, 2013

Okay, I think I've added a fairly thorough test, which even seems to pass!

As for the pretty printer, I'd be happy to help out if it's as easy as this one, and if someone could point me to the right place. But having just looked at it for the first time, I'm totally lost in the pretty printer code...

@jrioux
Member

jrioux commented Oct 3, 2013

Tests pass, but not the doctest. I think you just need to turn the docstring into a raw string.

________________________________________________________________________________
________________________ sympy.printing.latex.translate ________________________
File "/home/travis/virtualenv/python2.7/local/lib/python2.7/site-packages/sympy/printing/latex.py", line 1641, in sympy.printing.latex.translate
Failed example:
    translate('alphahatdotprime')
Expected:
    "{\dot{\hat{\alpha}}}'"
Got:
    "{\\dot{\\hat{\\alpha}}}'"

@moble
Contributor Author

moble commented Oct 3, 2013

Sorry about that. I didn't know how to run the doctest myself. But Travis CI now reports that it passes.
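
The raw-string fix suggested above can be reproduced with the standard doctest module alone. The sketch below is an assumption-laden illustration: `hat_raw`, `hat_plain`, and `failed_examples` are hypothetical helpers, not functions from sympy. In the plain docstring Python collapses `\\` to a single backslash before doctest sees it, so the expected text no longer matches the repr of the returned string.

```python
import doctest


def hat_raw(name):
    r"""Raw docstring: doctest sees the two backslashes exactly as typed.

    >>> hat_raw('L')
    '\\hat{L}'
    """
    return "\\hat{" + name + "}"


def hat_plain(name):
    """Plain docstring: the doubled backslash collapses to one.

    >>> hat_plain('L')
    '\\hat{L}'
    """
    return "\\hat{" + name + "}"


def failed_examples(func):
    """Run the doctest examples in ``func.__doc__`` and count the failures."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the count is of interest here.
    return runner.run(test, out=lambda text: None).failed


print(failed_examples(hat_raw))     # 0
print(failed_examples(hat_plain))   # 1
```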

@asmeurer
Member

asmeurer commented Oct 4, 2013

./bin/doctest

@asmeurer
Member

asmeurer commented Oct 4, 2013

I guess you should check if the unicodedata module has anything that makes it easy to add accents to characters. Otherwise, you'll need to add some mapping dictionaries to pretty_symbology.py.

@asmeurer
Member

asmeurer commented Oct 4, 2013

Or possibly some of the accents use the combining features of Unicode.
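
As a rough illustration of the combining-character idea, the standard unicodedata module can look up combining marks by name and append them to a base character. The mapping `COMBINING_MARKS` and the helper `add_accent` are hypothetical names chosen for this sketch; they are not taken from pretty_symbology.py.

```python
import unicodedata

# Hypothetical mapping from modifier keys to Unicode combining marks.
COMBINING_MARKS = {
    "hat": unicodedata.lookup("COMBINING CIRCUMFLEX ACCENT"),   # U+0302
    "dot": unicodedata.lookup("COMBINING DOT ABOVE"),           # U+0307
    "vec": unicodedata.lookup("COMBINING RIGHT ARROW ABOVE"),   # U+20D7
}


def add_accent(base, key):
    """Append the combining mark for ``key`` to ``base``."""
    return base + COMBINING_MARKS[key]


x_hat = add_accent("x", "hat")
print(x_hat, [unicodedata.name(ch) for ch in x_hat])

# NFC composes a base and a mark only where Unicode defines a precomposed
# character: a-circumflex exists (U+00E2), x-circumflex does not.
print(len(unicodedata.normalize("NFC", add_accent("a", "hat"))))   # 1
print(len(unicodedata.normalize("NFC", x_hat)))                    # 2
```

Since most letter and accent pairs have no precomposed form, a printer would generally have to emit the base character followed by the combining mark and rely on the terminal and font to stack them.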

@jrioux
Member

jrioux commented Oct 4, 2013

We don't need prm and bm when prime and bold are provided. Also, abs isn't an accent; I find it confusing since we have an actual SymPy construct that will print like that: Abs(Symbol('x')). So I think these three (prm, bm and abs) should probably be removed. Other than that, I don't know if Aaron will agree, but I wouldn't block this on the pretty printer stuff, though if you feel like tackling it as well, please do!

@jrioux
Member

jrioux commented Oct 4, 2013

latex(Symbol('hbar')) now returns \bar{h} instead of \hbar. So the processing for special symbols needs to happen before processing the accents.

@jrioux
Member

jrioux commented Oct 4, 2013

The accent code should also check that it isn't matching the whole symbol name, e.g. latex(Symbol('ddot')) currently returns \ddot{} but it should return \dot{d}, while latex(Symbol('hat')) currently returns \hat{} but it probably should just return hat as is.
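
The two review points above (special symbols take precedence, and a modifier must leave a non-empty base) can be sketched as follows. This is a hypothetical illustration, not the fix that was eventually committed; `SPECIAL_SYMBOLS`, `MODIFIER_MACROS`, and `translate_guarded` are made-up names with deliberately tiny tables.

```python
# Hypothetical sketch of the ordering and guard discussed in the review.
SPECIAL_SYMBOLS = {"hbar": r"\hbar", "alpha": r"\alpha"}   # illustrative subset
MODIFIER_MACROS = {
    "ddot": r"\ddot{%s}",
    "dot": r"\dot{%s}",
    "hat": r"\hat{%s}",
    "bar": r"\bar{%s}",
}


def translate_guarded(name):
    """Translate ``name``, checking special symbols before modifiers."""
    # 1. Whole-name special symbols win, so 'hbar' is not read as h + bar.
    if name in SPECIAL_SYMBOLS:
        return SPECIAL_SYMBOLS[name]
    lowered = name.lower()
    # 2. Try longer keys first so 'xddot' uses ddot rather than dot.
    for key in sorted(MODIFIER_MACROS, key=len, reverse=True):
        # 3. A modifier must leave a non-empty base to apply.
        if lowered.endswith(key) and len(name) > len(key):
            return MODIFIER_MACROS[key] % translate_guarded(name[:-len(key)])
    return name


print(translate_guarded("hbar"))   # \hbar
print(translate_guarded("ddot"))   # \dot{d}
print(translate_guarded("hat"))    # hat
```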

@moble
Contributor Author

moble commented Oct 4, 2013

I agree with your last two comments, and I'll work on them ASAP (next week).

As for prm versus prime, etc., to me the question isn't "what can users name their variables?"; it's "what do users want to name their variables?" In the case of prm and bm, I know at least one user who wants to use the shortened versions: the author of the old latex_ex code. So I kept them for some sort of backwards compatibility. More generally I don't think there's any point limiting the range of symbols users can define. In fact, on my next push, I'll add mag as a synonym for abs. Which brings me to...

Regarding abs, sometimes you need to name a variable that encapsulates Abs(Symbol('x')) -- or a similar concept. For example, in the code I was writing that prompted me to change the translate function in the first place, I have an evolution equation for the magnitude of the vector L, so I actually need to define the quantity LVecMag. I can use the fact that that symbol represents something expressed otherwise in sympy, but I do want the symbol itself to print nicely.

"Accents" probably wasn't the best choice of word; I just stole it from the latex_ex code. Again, I don't think we should limit ourselves, and thereby limit users; I think we should provide whatever constructions (accents or otherwise) people will find useful in naming their variables. In the same way, the sub- and superscript conversions are not strictly necessary, but really nice to have. So what else might be nice to have?

@asmeurer
Member

asmeurer commented Oct 6, 2013

No, it doesn't need to block on the Unicode stuff. If you don't want to do that, just open an issue for it.

@asmeurer
Member

asmeurer commented Oct 6, 2013

I'm not convinced about abs. Couldn't you just do a substitution of your absx symbol with Abs(x) before printing? Or even better, just use Abs(x). What is the benefit of having a separate Symbol? There are definitely disadvantages, namely, anything that would recognize the relationship between x and Abs(x) would no longer work. In some parts of SymPy you can even get wrong results by doing this, because they implicitly assume that different symbols are independent of one another.

In general, if you want to represent something in SymPy, you're encouraged to represent it as it actually is, i.e., use Abs(x) instead of Symbol('absx').

@moble
Contributor Author

moble commented Oct 7, 2013

I would agree with you if variables were only ever rvalues, but my variable is almost exclusively an lvalue; I never actually need x itself. Instead, I have some big, complicated expression that I happen to know is equal to Abs(x). I need sympy to manipulate, simplify, take the Horner form, etc., of that expression, and I need a variable to hold the expression. I could name that variable y, but I (or others reading my code) might forget that y=Abs(x). And x is a quantity that appears in the literature, so I can't just redefine it. If I can use a variable named xabs, it's much easier to understand my code. And since Abs(x) is obviously not an lvalue, I use xabs.

@moble
Contributor Author

moble commented Oct 7, 2013

I've made the two changes jrioux suggested about not clobbering other symbols or returning empty accents, and added associated tests to ensure those work as expected.

But, as per my comment above, I'm doubling down on my notion that more general things like abs should be allowed :). I've added new keys norm, avg, and mag that all fall into that category. I also changed the word used in this new stuff from "accents" to "modifiers", to better describe the more general idea I'm going for.

I'll just open an issue for the Unicode version, because I can't find an easy way into that code. But I will keep looking at it.

@asmeurer
Member

asmeurer commented Oct 7, 2013

The unicode pretty printer is admittedly much more complicated than the other printers, but I think all you need to do is define a function in pretty_symbology that converts a symbol to the accented version of it (or if you end up just having to write these out manually as dictionaries because there's nothing in unicodedata that can help you, just define the dictionary). Then, modify the _print_Symbol in pretty.py to use it.

@asmeurer
Member

asmeurer commented Oct 7, 2013

Actually, I guess the function you need to modify is pretty_symbol in pretty_symbology.py.
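
A conversion function of the kind suggested here might look roughly like the sketch below. It is not the implementation in pretty_symbology.py: `ACCENT_MARKS` and `pretty_name_sketch` are hypothetical names, the table is a small subset, and the placement of the marks after the last base character is just one possible choice.

```python
# Hypothetical sketch; the names below are not from pretty_symbology.py.
ACCENT_MARKS = {
    "hat": "\u0302",   # COMBINING CIRCUMFLEX ACCENT
    "dot": "\u0307",   # COMBINING DOT ABOVE
    "vec": "\u20d7",   # COMBINING RIGHT ARROW ABOVE
}


def pretty_name_sketch(name):
    """Replace trailing modifier keys in ``name`` with combining marks."""
    marks = []
    base = name
    stripped = True
    while stripped:
        stripped = False
        for key, mark in ACCENT_MARKS.items():
            # Keep at least one character of base, so a bare 'hat' is untouched.
            if base.lower().endswith(key) and len(base) > len(key):
                marks.append(mark)
                base = base[:-len(key)]
                stripped = True
                break
    # The key stripped last is the innermost modifier, so its mark is
    # written first, directly after the base.
    return base + "".join(reversed(marks))


print(pretty_name_sketch("xdot"))
print(pretty_name_sketch("xvechat"))
```

How the stacked marks actually look depends on the terminal and font, so the sketch only fixes the sequence of code points.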

@asmeurer
Member

Here's what it looks like for me. Pretty awesome.

screenshot 2013-10-10 19 00 46

@asmeurer
Member

I'm not sure if it makes sense to strip the modifiers if we have no faces. I guess it's OK. I don't have a strong opinion either way.

Could you add a test or two for multiple modifiers?

We should figure out a way to document this. Can we just import the dictionary into Sphinx? If it looks ugly, maybe create a special variable that is just the keys of the dictionary and import that.

@asmeurer
Member

This looks like a bug

In [9]: Symbol('x_dot')
Out[9]: x_dot__

In [10]: Symbol('x^dot')
Out[10]: x___dot

Probably those should give the same thing as xdot, at least the first one (maybe the second should give x ̇). At any rate, it shouldn't add extra _.

@moble
Contributor Author

moble commented Oct 11, 2013

I agree about the faces issue. I've been going back and forth about it, but I think I've settled on not translating them, just because it removes information. People might legitimately have two variables, like x and xbold, that would need to appear different.

@moble
Contributor Author

moble commented Oct 11, 2013

I've fixed the extra-underscores bug. But I think x_dot should probably stay as it is. Or at least, the parsing would need some upstream changes that I'd be hesitant to adjust, for fear of introducing new bugs.

@moble
Contributor Author

moble commented Oct 11, 2013

I've added a couple little tests for multiple modifiers (including a couple doozies from the latex tests).

Here's a screenshot from my terminal showing that certain combinations don't come out right. The first and last have their second modifiers shifted way left. But the second and third look just fine. I don't think that's something I can fix, but it's something to be aware of.

screen shot 2013-10-10 at 10 48 37 pm

@asmeurer
Member

I get the same behavior. What OS are you on?

By the way, xvechat works. Maybe there is a good ordering here. xbrevecheck and xcheckbreve both give the same thing.

Another question, what is the expected behavior for multiname symbols, like xyzhat?

@asmeurer
Member

xhatvec works in Terminal.app but not in iTerm 2. So I guess it's a bug in iTerm 2. The brevecheck one doesn't work in either.

@asmeurer
Member

If you're not on Linux, someone should check there. From what I've seen, the Unicode support in the Linux terminal emulators can be pretty bad. I wouldn't be surprised if this does strange things there.

By the way, x̌̆ renders correctly in my browser. So these are all bugs in the terminal emulators I guess.

@moble
Contributor Author

moble commented Oct 11, 2013

I use OS X Terminal.app. I agree that it's just a bug in the unicode implementations of the terminals. I got similar badness from just echo -e ' x\u0306\u030C' directly in bash. So I don't think it's something we should necessarily worry about. Of course, I'd guess that the most common combinations would be xVecDot and xHatDot, both of which work correctly for me (as do more dots).
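
The same code points as in the echo command can be inspected from Python with the standard unicodedata module, which helps separate what the string contains from how a terminal draws it. The variable names are illustrative only.

```python
import unicodedata

breve_check = "x\u0306\u030c"   # same code points as in the echo command
check_breve = "x\u030c\u0306"   # the two marks in the opposite order

for ch in breve_check:
    # Code point, official name, and canonical combining class.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# Both marks have the same combining class, so normalization does not
# reorder them and the two strings remain distinct.
same_class = unicodedata.combining("\u0306") == unicodedata.combining("\u030c")
nfc_a = unicodedata.normalize("NFC", breve_check)
nfc_b = unicodedata.normalize("NFC", check_breve)
print(same_class, nfc_a == nfc_b)   # True False
```

So any difference or similarity in appearance between the two orderings comes from the renderer, not from the strings themselves.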

For xyzhat, I'd say the current output of xyẑ is probably the right behavior. Latex should just put a hat over the whole thing (which ends up in the middle). I don't think we can make something nice out of every possible input, but it's still nice to give users the ability to at least have nice versions of the most basic inputs, which still cover most cases I've ever needed.

@asmeurer
Member

I opened https://code.google.com/p/iterm2/issues/detail?id=2639 in the iTerm tracker, so hopefully it will get fixed there (unless it ends up being a Mac OS X rendering issue).

@asmeurer
Member

This looks good to me. I would still like to see some testing on Linux. If things end up being really bad there we may need to add some kind of option to disable this.

By the way, another idea would be to make this work for x^ or x~ and so on. Don't know if it's a good idea (x^ for one is ambiguous with x^1), but it's an idea.

@asmeurer
Member

Oh, and I couldn't think of any, but people will likely come up with unfortunate coincidences of words that are combinations of these characters (like that is t̂). I'm not sure what to do about it, though.

@lidavidm
Member

I think it might have to do with the fonts more than the terminal emulator:

Here is Linux, Gnome Terminal, font is Consolas

image

Using font Cousine:

image

Lucida Console Semi-Condensed:

image

DejaVu Sans Mono:

image

@moble
Contributor Author

moble commented Oct 11, 2013

Oh yeah. Look at that. I get the same results as David on OS X Terminal. I usually use Source Code Pro. I guess there's really nothing we can do about that, but it would definitely be worth mentioning in the docs.

I didn't do an exhaustive search, but I find that -- in addition to the Lucida Console Semi-Condensed that seems to work, judging from David's screenshot -- Andale Mono, Courier, and Monaco are other fixed-width fonts that also work pretty well. I tested it with:

from sympy import Symbol
from sympy.printing.pretty.pretty_symbology import modifier_dict
[Symbol('x'+key1+key2) for key1 in modifier_dict for key2 in modifier_dict]

Given this potential for ugly results if someone can't or won't use a "good" unicode font, as well as the issue of that -> t̂, etc., maybe the best solution would be to add an option to init_printing (or whatever), and let the user decide whether or not these substitutions fit their workflow. What do you think?

@asmeurer
Member

I guess you're right. I use DejaVu Sans Mono, and my output is a little different, but I think that might be because my ASCII font is Menlo, which may be interfering.

@asmeurer
Member

I would add the option to the printer, and for init_printing just implement https://code.google.com/p/sympy/issues/detail?id=3612.

@asmeurer
Member

Any more comments? I'm fine with merging this as-is.

By the way, the @sympy/mechanics team may be interested in the ability to print symbols with a dot over them in the terminal.

@moble
Contributor Author

moble commented Oct 16, 2013

I'm happy with it. I won't have any time to work on the init_printing stuff for a while, and that's kind of separate anyway, so I think this is ready to merge now.

asmeurer added a commit that referenced this pull request Oct 16, 2013
Add new translate function to include accents and other latex macros in symbol names
@asmeurer asmeurer merged commit ce0c772 into sympy:master Oct 16, 2013
@asmeurer
Member

Thanks for the contribution!

@asmeurer
Member

I opened https://code.google.com/p/sympy/issues/detail?id=4054 about the option.

@moble
Contributor Author

moble commented Oct 17, 2013

My pleasure. Thanks for your help improving it!

4 participants