Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve LaTeXing of strings #11498

Closed
novoselt opened this issue Jun 16, 2011 · 16 comments
Closed

Improve LaTeXing of strings #11498

novoselt opened this issue Jun 16, 2011 · 16 comments

Comments

@novoselt
Copy link
Member

See the attached PDF for an example of a class that has no _latex_. All commands are repeated twice, one with Sage-4.7 and the second one with the patch applied. In both cases typeset checkbox on the top of the worksheet was tuned on.

Also, I invite you to try

RubiksCube()

in the notebook with typeset mode on before and after the patch - such situations are precisely the ones that I wanted to solve (they are so bad, that printout does not reflect them accurately).

It is possible that user classes will not have customized latex method and there are some standard classes that use their string representation. If this representation includes multiple lines, the result in most cases is not very good and sometimes quite horrible, making it unpleasant to use the notebook in typeset mode. The patch aims to fix this situation and clean up latex module along the way.

The major change is in sage.latex.str_function. Also sage.latex.JSMath.eval is simplified a bit and does not alter the code almost at all anymore. The rest are mostly doctest adjustments and some code clean-up and simplifications which happened while I was going through it to understand how it works.

Apply:

  1. attachment: trac_11498_LaTeX.patch

CC: @kcrisman @jhpalmieri

Component: user interface

Keywords: sd31

Author: Andrey Novoseltsev

Reviewer: John Palmieri

Merged: sage-4.7.2.alpha0

Issue created by migration from https://trac.sagemath.org/ticket/11498

@novoselt
Copy link
Member Author

comment:1

Attachment: String_LaTeXing_--_Sage.pdf.gz

I still need to work on these failures:

	sage -t -long devel/sage-main/sage/graphs/generic_graph.py # 1 doctests failed
	sage -t -long devel/sage-main/sage/combinat/free_module.py # 13 doctests failed
	sage -t -long devel/sage-main/sage/modular/hecke/algebra.py # 1 doctests failed
	sage -t -long devel/sage-main/doc/de/tutorial/latex.rst # 1 doctests failed
	sage -t -long devel/sage-main/sage/sets/set.py # 1 doctests failed
	sage -t -long devel/sage-main/sage/misc/html.py # 2 doctests failed
	sage -t -long devel/sage-main/doc/en/tutorial/latex.rst # 1 doctests failed

@novoselt

This comment has been minimized.

@novoselt
Copy link
Member Author

comment:3

By the way, the issue of this ticket was discussed here:

http://groups.google.com/group/sage-devel/browse_thread/thread/9aee79aaa40c8390/e5d55c0b800775b1

So John is cc-ed due to his explicit request, hint-hint ;-)

@jhpalmieri
Copy link
Member

comment:4

Overall, this looks pretty good, but it could use some work. Here are some comments.

  • in html.py, the comments "We work around a limitation of jsmath (it can't typeset '\texttt')::" should be deleted, since those doctests don't involve texttt anymore. (Do we even need to do the replacement s.replace('\\texttt','\\hbox') now, or can that code be deleted?)

  • in latex.py, for bool_function and None_function, "\mathrm{%s}" is preferable to "{\rm %s}". Actually, I think we can just return "\mathrm{%s}" always, regardless of notebook vs. command line distinctions. Do you know why we had two versions for bool_function before? It looks okay to me without the mbox...

  • a few typos and rewordings:

@@ -239,7 +239,7 @@ def str_function(x):
     If ``x`` contains only digits with, possibly, a single decimal point and/or
     a sign in front, it is considered to be its own representation. Otherwise
     each line of ``x`` is wrapped in a ``\verb`` command and these lines are
-    assembled in a left-justified array. This gives to complicated string the
+    assembled in a left-justified array. This gives to complicated strings the
     closest look to their "terminal representation".
    
     .. warning:: Such wrappers **cannot** be used as arguments of LaTeX
@@ -289,7 +289,7 @@ def str_function(x):
     # There is a bug in verb-space treatment in jsMath...
     spacer = "\\phantom{%s}"
     # \phantom{\verb!%s!} is more accurate and it works, but it is not a valid
-    # LaTeX and may cause problems, so let's live with the above variant untill
+    # LaTeX and may cause problems, so let's live with the above variant until
     # spaces are properly treated in jsMath/MathJax and we don't need to worry.
     lines = []
     for line in x.split("\n"):
  • another comment about the old code: I think that the output from dict_function looks better with no extra space after the colon, so I would suggest the following, but take a look yourself and see what you think:
@@ -334,12 +334,10 @@ def dict_function(x):
                \left[\sin\left(z^{2}\right), \frac{1}{2} \, y\right]\right\}
     """
     return "".join([r"\left\{",
-                    ", ".join(r"%s :\: %s" % (latex(key), latex(value))
+                    ", ".join(r"%s : %s" % (latex(key), latex(value))
                               for key, value in x.iteritems()),
                     r"\right\}"])
  • Also, in JSMath().eval, if x is already a LaTeX expression, then x=str(x) doesn't do anything, and neither does x=latex(x), so I think we can simplify it more:
@@ -1680,11 +1681,7 @@ class JSMath:
             sage: JSMath().eval(type(3), mode='inline')
             <html>...\verb|&lt;type|\phantom{x}\verb|'sage.rings.integer.Integer'&gt;|</span></html>
         """
-        # If x is already a LaTeX expression, i.e. the output of latex(blah),
-        # we will treat it as a string, so that we can see the code itself.
-        if isinstance(x, LatexExpr):
-            x = str(x)
-        # Now get a regular LaTeX representation of x...
+        # Get a regular LaTeX representation of x...
  • Finally, the output in one of the doctests,
            <html>...\verb|&lt;type|\phantom{x}\verb|'sage.rings.integer.Integer'&gt;|</span></html>

looks like bad latex: the "<" inside of the \verb environment shouldn't be typeset correctly, but it actually seems to work. I wonder why...

When we switch to MathJax (#9774), we'll have to redo some of this.

@novoselt
Copy link
Member Author

comment:5

Thanks for the comments!

I don't know why there are differences for different modes, this is the first time I am working with these modules. I agree that the fewer are differences the better and will also remove the extra space.

Here is the reason for extra string conversion:

sage: s = "asdf fs"
sage: l = latex(s)
sage: print l
sage: print latex(l)
sage: print latex(str(l))
\verb|asdf|\phantom{x}\verb|fs|
\verb|asdf|\phantom{x}\verb|fs|
\verb"\verb|asdf|\phantom{x}\verb|fs|"

As I understand, the idea is that when user types latex(x) in the notebook, then (s)he wants to see the actual LaTeX code. So it will be wrapped the second time and displayed.

Regarding &lt; - it is not produced in normal LaTeX but can happen only in jsMath. For some reason it parses < even inside verb and the old solution was to insert extra spaces (which somehow prevents it from this mistake), but this alters the output and looks especially bad on typesetting things like >=. Replacing it with HTML code seems to work, I don't know why and I don't really care - once it is fixed upstream, we will see a doctest break and can remove this workaround. (Hopefully we can also get rid of phantoms eventually, but in MathJax the problem was still present two weeks ago. I don't think it is possible to write a doctest that will detect fixing the spacing issue upstream.)

@novoselt
Copy link
Member Author

novoselt commented Jul 5, 2011

Updated version.

@jhpalmieri
Copy link
Member

comment:6

Attachment: trac_11498_improve_string_LaTeXing.patch.gz

I'm mostly happy with this. Here's a slightly different version (a full patch plus a patch showing the difference between the old one and the new one); I think this does a better job of preserving the original behavior with respect to combinatorial free modules. If you're happy with my changes, feel free to mark this as "positive review".

@jhpalmieri
Copy link
Member

Reviewer: John Palmieri

@jhpalmieri
Copy link
Member

Attachment: trac_11498_LaTeX.patch.gz

new version; apply only this patch

@jhpalmieri
Copy link
Member

Attachment: trac_11498_delta.patch.gz

for reference only; difference between old patch and new one

@novoselt
Copy link
Member Author

novoselt commented Jul 6, 2011

comment:7

Hi John, would you mind explaining to me how does your code work? I am not very familiar with regular expressions.

Also, how did you notice the new patch? It kind of annoys me that trac does not send messages when there are new patches attached to the tickets, so I usually leave some comment to trigger it. This time I was waiting since I didn't yet run tests on the whole library to see if anything got broke ;-)

@jhpalmieri
Copy link
Member

comment:8

I happened to take a look at the ticket, so I saw the new patch; mine is based on that one.

The documentation for regular expressions in Python is here. In the code

if s.find('\\verb') != -1: 
    import re 
    s = re.sub("\\\\verb(.)(.*?)\\1", "\\2", s) 
    s = s.replace("\\phantom{x}", " ") 

the second and third lines are the ones involving regular expressions. Line 3 is a search-and-replace command: it searches the argument s for a pattern \verb followed by a single character (that's the .), and putting that character in parentheses allows you to refer to it later as \1. Then there are any number of characters (that's the .*?, also enclosed in parentheses, so you can refer to it later as \2), followed by whatever the first character was (that's the \\1). It replaces all of this with the string in the second set of parentheses, \2.

There is one subtlety in this: the standard way to refer to any string of characters is .*, but the problem here is that in a string like

\verb|a|, \verb|b|

if we used .*, then the string \2 would match "a|, \verb|b" -- everything between the first "|" and the last "|": by default, regular expression matches use the longest match possible. Using .*? instead of .* tells it to match the shortest string possible instead, so it does two replacements: first \2 matches "a", and then it matches "b", turning this into

a, b

@novoselt
Copy link
Member Author

novoselt commented Jul 6, 2011

comment:9

Thank you!

All tests pass, positive review!

@novoselt

This comment has been minimized.

@jdemeyer jdemeyer modified the milestones: sage-4.7.1, sage-4.7.2 Jul 6, 2011
@jdemeyer
Copy link

jdemeyer commented Aug 1, 2011

Merged: sage-4.7.2.alpha0

@jdemeyer
Copy link

comment:14

See #12156 for a related ticket.

This was referenced Sep 23, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants