Should ASCII/Unicode or LaTeX formulas be used in docstrings? #14964

Open
sidhantnagpal opened this Issue Jul 24, 2018 · 24 comments

6 participants
@sidhantnagpal
Member

sidhantnagpal commented Jul 24, 2018

This is how the docs for Bernoulli numbers look currently:

image

The LaTeX version for the same is:

image

@sidhantnagpal

Member Author

sidhantnagpal commented Jul 24, 2018

As discussed with @jksuom, it would be ideal to have the former for the terminal and the latter for the docs.

@sidhantnagpal

Member Author

sidhantnagpal commented Jul 24, 2018

Reference to #13519 for similar discussion.

@asmeurer

Member

asmeurer commented Jul 24, 2018

I like the LaTeX better. The LaTeX is more professional, and easier to read for complex formulas. You can see for example the Ramanujan formula is easier to read in LaTeX (by the way, I would render A(n) as a piecewise). It's also easier to write. In many cases you can just create the expression in SymPy and use latex() to generate it. Strictly speaking the raw docstring is already in a markup format (RST). To be sure, RST is (usually) pretty easy to read in plain text, whereas LaTeX can take some practice to read in raw code form.
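
As a rough illustration of generating the markup from SymPy itself, the sketch below builds the series for Catalan's constant (the sum used as an example further down this thread) and produces both forms. The variable names are illustrative only, and the exact strings depend on the SymPy version, so no output is shown.

```python
from sympy import Sum, latex, oo, pretty, symbols

# Build the expression once, then derive both representations from it.
k = symbols("k", integer=True, nonnegative=True)
catalan_series = Sum((-1) ** k / (2 * k + 1) ** 2, (k, 0, oo))

latex_source = latex(catalan_series)                    # for a ".. math::" block
unicode_art = pretty(catalan_series, use_unicode=True)  # for terminal readers

print(".. math:: " + latex_source)
print(unicode_art)
```

The point is only that neither form has to be typed by hand; which one ends up in the docstring is the question being discussed here.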

The downside as you noted is that it's harder to read in the terminal. But I think more and more people are using the main docs site, or the Jupyter notebook (which doesn't yet render LaTeX in docstrings but there has been some work on it).

The special functions and integral transforms docs are two examples of docs that use LaTeX. I readily admit that both are difficult to read in the terminal. I often look at the SymPy docstrings to find the definition of some special function, and I have to look at it on the web.

But I know that not everyone agrees with this.

A possible compromise is to make documentation like the ODE documentation. In the ODE docs, the math in the text uses LaTeX, but each docstring also shows a formula computed from SymPy itself, pretty printed (in the case of ODE, a formula for the general solution for a given solver). If bernoulli had rewrite methods to show the above formulas, we could write them out in LaTeX and then show a doctest of pprint(bernoulli(n).rewrite('ramanujan')).

@sidhantnagpal

Member Author

sidhantnagpal commented Jul 24, 2018

> I like the LaTeX better. The LaTeX is more professional, and easier to read for complex formulas.

+1

> I think more and more people are using the main docs site, or the Jupyter notebook (which doesn't yet render LaTeX in docstrings but there has been some work on it).

Probably once Jupyter notebooks start rendering LaTeX, a call on this might be taken because they serve the majority.

For now, the changes can probably be restricted to functions.combinatorial, which currently mixes plain text, ASCII/Unicode, and LaTeX across its docstrings.

@cbm755

Contributor

cbm755 commented Jul 24, 2018

+1 for the LaTeX.

With apologies for hijacking the thread, may I also request feedback on #12162? I like the idea of doctesting the LaTeX equations but maybe it's too messy.

@moorepants

Member

moorepants commented Jul 24, 2018

I prefer unicode for docstrings. A compromise could be that latex is only allowed in the "notes" section of the numpydoc format. The latex is virtually impossible to read quickly (or at all) for those of us that use the terminal docstring views as our primary reference.
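
A minimal sketch of what that compromise might look like, assuming the usual numpydoc section layout. The function `catalan_partial_sum` and its docstring are made up for illustration and are not part of SymPy; the summary stays readable in a terminal and the LaTeX is confined to "Notes".

```python
def catalan_partial_sum(n):
    r"""Return the sum of the first ``n`` terms of the series for Catalan's constant.

    The partial sum is  Σ (-1)**k / (2*k + 1)**2  for k = 0 .. n-1.

    Parameters
    ----------
    n : int
        Number of terms to add.

    Returns
    -------
    float
        The partial sum.

    Notes
    -----
    The full series is

    .. math:: K = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2}
    """
    # Alternating series: each term is smaller in magnitude than the last.
    return sum((-1) ** k / (2 * k + 1) ** 2 for k in range(n))
```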

@asmeurer

Member

asmeurer commented Jul 24, 2018

Unicode pretty printing has its own issues. If you don't have the right fonts installed the math renders as tofu, or if different characters come from different fonts it doesn't align properly, especially in browsers where monospace text isn't guaranteed to align if the characters come from different fonts.

I agree that ideally Unicode pretty printing looks much better than ASCII, and in most cases, nearly as good as LaTeX. But it tends to look quite bad outside of the terminal, sometimes even more unreadable than raw LaTeX if the characters are messed up.

@cbm755

Contributor

cbm755 commented Jul 24, 2018

I don't know enough about the markup being used (RST?) but in Octave (where the markup is Texinfo) we can do things in our docs like:

blah blah blah
@iftex
... latex stuff ...
@end iftex
@ifnottex
... alternative text stuff ...
@end ifnottex

Then the PDF/web docs can have nice math and terminal-based "help meh" can have ascii/unicode. (And crucially the @end ifnottex junk is visible only in the raw source code)

Can that be done here? Or does meh? always give the raw source?

@asmeurer

Member

asmeurer commented Jul 24, 2018

IPython/Jupyter's ? just shows the raw docstring text. So does Python's builtin help().
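
This is easy to check with the standard library alone. In the sketch below, `catalan_series` is a made-up function whose docstring contains a `.. math::` directive; `pydoc.render_doc` builds the text that `help()` would page, and the LaTeX comes through unrendered.

```python
import pydoc


def catalan_series():
    r"""Series for Catalan's constant.

    .. math:: K = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2}
    """


# renderer=pydoc.plaintext avoids the overstrike "bold" used for terminals.
help_text = pydoc.render_doc(catalan_series, renderer=pydoc.plaintext)
print(help_text)
```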

I think we should perhaps reach out to the Jupyter folks and see if they have any renewed interest in allowing ? to render the docstring using RST, and the math along with it. Most of the work these days in Jupyter is going into jupyterlab, so maybe this could be a jupyterlab widget.

We can't really change what help() does, but we already recommend users use IPython/Jupyter for SymPy, so it wouldn't bother me if users could see nice documentation in Jupyter and harder to read raw RST with raw LaTeX when they don't use it.

@cbm755

Contributor

cbm755 commented Jul 24, 2018

Can we modify our own __doc__ on import?

Then we could have both latex and unicode in the source. On SymPy import, we strip one or the other depending on the environment we find ourselves.

@asmeurer

Member

asmeurer commented Jul 24, 2018

I think it might be technically possible to do what you're suggesting. Wouldn't that require writing each docstring twice though?

@cbm755

Contributor

cbm755 commented Jul 24, 2018

Custom directives embedded in the docstring?

    `K = 0.91596559\ldots` is given by the infinite series

    .. custom_have_latex::

    .. math:: K = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2}

    .. custom_have_latex_else::

             ∞             
       _____           
       ╲               
        ╲          k   
         ╲     (-1)    
          ╲  ──────────
          ╱           2
         ╱   (2⋅k + 1) 
        ╱              
       ╱               
       ‾‾‾‾‾           
       k = 0  

    .. custom_have_latex_end::
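
A rough sketch of how such markers could be resolved, using only plain string handling. The `custom_have_latex*` directive names are the hypothetical ones from the example above, and `select_math_variant` and `catalan_demo` are made-up names; nothing like this exists in SymPy or Sphinx today.

```python
def select_math_variant(doc, have_latex):
    """Keep one branch of each custom_have_latex block and drop the markers."""
    have_latex = bool(have_latex)
    kept = []
    branch = None  # None outside a block, otherwise "latex" or "plain"
    for line in doc.splitlines():
        marker = line.strip()
        if marker == ".. custom_have_latex::":
            branch = "latex"
        elif marker == ".. custom_have_latex_else::":
            branch = "plain"
        elif marker == ".. custom_have_latex_end::":
            branch = None
        elif branch is None or (branch == "latex") == have_latex:
            kept.append(line)
    return "\n".join(kept)


def catalan_demo():
    r"""`K = 0.91596559\ldots` is given by the infinite series

    .. custom_have_latex::

    .. math:: K = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2}

    .. custom_have_latex_else::

        sum of (-1)**k / (2*k + 1)**2 for k = 0, 1, 2, ...

    .. custom_have_latex_end::
    """


# Function docstrings are writable, so the terminal-friendly branch can be swapped in.
catalan_demo.__doc__ = select_math_variant(catalan_demo.__doc__, have_latex=False)
```

Whether this runs at import time or lazily is a separate question; the filtering itself is cheap, but it would have to touch every docstring.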
@asmeurer

Member

asmeurer commented Jul 24, 2018

Also it probably shouldn't happen at import time because it could slow down the import time too much.

I think if we had a Sphinx extension with a sympy directive as I suggested, you could easily have options in it to render latex or pprint(use_unicode=True/False) or whatever.

Also I wonder if the Jupyter notebook ? is already customizable, so that we could do something like this on our own.

@asmeurer

Member

asmeurer commented Jul 24, 2018

@Carreau I think you've looked at this before. Has there been any progress on rendering docstrings as HTML in the notebook or jupyterlab? And if not, is it possible for us to customize the output of ? to give a custom HTML output? I also vaguely remember there were some issues with MathJax because of sandboxing.

@cbm755

Contributor

cbm755 commented Jul 24, 2018

Might the sympy directive be even worse for terminal users who would see something like:

`K = 0.91596559\ldots` is given by the infinite series

.. sympy:: pprint(Catalan.rewrite(Sum), use_unicode=True)
@asmeurer

Member

asmeurer commented Jul 24, 2018

You could manually insert the ASCII or Unicode pretty print, like

.. sympy:: Catalan.rewrite(Sum)

             ∞             
       _____           
       ╲               
        ╲          k   
         ╲     (-1)    
          ╲  ──────────
          ╱           2
         ╱   (2⋅k + 1) 
        ╱              
       ╱               
       ‾‾‾‾‾           
       k = 0  

and it would remove that in favor of the LaTeX for Sphinx. It could even doctest it to make sure the pretty printed output format doesn't change.

@asmeurer

Member

asmeurer commented Jul 24, 2018

Or maybe we could have an extension that hooks into the doctests themselves, so that

>>> pprint(Catalan.rewrite(Sum), use_unicode=True)

             ∞             
       _____           
       ╲               
        ╲          k   
         ╲     (-1)    
          ╲  ──────────
          ╱           2
         ╱   (2⋅k + 1) 
        ╱              
       ╱               
       ‾‾‾‾‾           
       k = 0  

gets replaced with

>>> pprint(Catalan.rewrite(Sum), use_unicode=True)

.. math:: <latex of Catalan.rewrite(Sum)>

In other words, in Sphinx, pprint will "output" latex. It could still be confusing to users, but it's the most readable in the plain text.
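
The text-rewriting half of that idea can be sketched with the standard `doctest` parser. Everything here is an assumption for illustration: the function name `pprint_examples_to_math` is made up, only examples whose source starts with `pprint(` are touched, and the LaTeX comes from a caller-supplied `to_latex` callable (a real extension would evaluate the expression with SymPy instead).

```python
import doctest


def pprint_examples_to_math(docstring, to_latex):
    """Swap the expected output of ``pprint(...)`` examples for a math directive."""
    pieces = []
    for piece in doctest.DocTestParser().parse(docstring):
        if isinstance(piece, str):
            pieces.append(piece)  # ordinary prose between examples
            continue
        pad = " " * piece.indent
        # Re-emit the example source with its >>> / ... prompts.
        src_lines = piece.source.splitlines()
        prompts = [">>> "] + ["... "] * (len(src_lines) - 1)
        pieces.extend(f"{pad}{p}{s}\n" for p, s in zip(prompts, src_lines))
        if piece.source.startswith("pprint("):
            # Drop the pretty-printed text and "output" LaTeX instead.
            pieces.append(f"\n{pad}.. math:: {to_latex(piece.source.strip())}\n")
        else:
            pieces.extend(pad + line for line in piece.want.splitlines(True))
    return "".join(pieces)


sample = "Square:\n\n>>> pprint(x**2)\n 2\nx \n\n>>> 1 + 1\n2\n\nDone.\n"
stub_latex = {"pprint(x**2)": "x^{2}"}.__getitem__  # stand-in for real LaTeX generation
rewritten = pprint_examples_to_math(sample, stub_latex)
print(rewritten)
```

Other doctests pass through unchanged, so the plain-text docstring keeps its real pretty-printed output and only the Sphinx view differs.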

@cbm755

Contributor

cbm755 commented Jul 25, 2018

Either of those sounds really good to me.

.. sympy:: could happen in a from sympy import * environment.

How do we start? http://www.sphinx-doc.org/en/stable/ext/thirdparty.html

@Carreau

Contributor

Carreau commented Jul 25, 2018

> Has there been any progress on rendering docstrings as HTML in the notebook or jupyterlab? And if not, is it possible for us to customize the output of ? to give a custom HTML output? I also vaguely remember there were some issues with MathJax because of sandboxing.

Some. Yes, MathJax can be an issue. The last notebook release was because of a MathJax CVE allowing JS execution on load...

There are some undocumented options in IPython:

--TerminalInteractiveShell.sphinxify_docstring=<Bool>
    Default: False
    Enables rich html representation of docstrings. (This requires the docrepr
    module).

and

--TerminalInteractiveShell.enable_html_pager=<Bool>
    Default: False
    (Provisional API) enables html representation in mime bundles sent to
    pagers.

It's not perfect and would need feedback.

On my back burner is the question of whether we could have a hook into ? that allows custom rendering with a mimetype. One question is: do we do that on the fly, or do we allow building the docstring ahead of time (that would allow plots, etc.)?

I have some prototype code:

[screenshot of the prototype output, 2018-07-25]

One of my thoughts was to figure out a way for a project to provide a mapping from docstring/docstring hash to HTML and ANSI.

At some point I should write a grant and try to get some funding for this to get some love.

@asmeurer

Member

asmeurer commented Jul 25, 2018

> There are some undocumented options in IPython:

So is it possible to do this now with these options? Or is the MathJax still prevented from loading in the pager?

> I have some prototype code:

Is the prototype something that I can play around with somewhere?

How is the math in that doctest rendered? It doesn't look like MathJax.

> One question is: do we do that on the fly, or do we allow building the docstring ahead of time (that would allow plots, etc.)?

On the fly could be pre-cached if it's expensive. But the advantage is that it would be simpler for stuff that is fast like what we have. Otherwise we'd have to have a build step that generates the "IPython docs" when we make the tarball, and it wouldn't work the same when using the development version.

@Carreau

Contributor

Carreau commented Jul 25, 2018

> So is it possible to do this now with these options? Or is the MathJax still prevented from loading in the pager?

Mathjax should be in the pager.

> Is the prototype something that I can play around with somewhere?
>
> How is the math in that doctest rendered? It doesn't look like MathJax.

Nah, that's rendered in a terminal:

https://gist.github.com/Carreau/f939617bcb300d1d9dd05928b829d8df

(That pushed me to start learning about parsing and made me consider rewriting a sane RST parser that gives a proper AST).

> On the fly could be pre-cached if it's expensive. But the advantage is that it would be simpler for stuff that is fast like what we have. Otherwise we'd have to have a build step that generates the "IPython docs" when we make the tarball, and it wouldn't work the same when using the development version.

Yeah, I had the same thoughts, but some docstrings are quite complex and require custom directives. So on-the-fly rendering could be quite hard.

I'm leaning toward an intermediate solution: look for a built docstring and, if none is present, try an on-the-fly rendering. That makes it easy for small projects, and flexible for bigger ones.
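
A very rough sketch of that lookup order, keyed by a hash of the docstring as floated earlier. The names `docstring_key` and `render_docstring`, the choice of SHA-256, and the shape of the `prebuilt` mapping are all assumptions for illustration; none of this is an existing IPython API.

```python
import hashlib


def docstring_key(doc):
    """Stable key for a docstring (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()


def render_docstring(doc, prebuilt, render_on_the_fly, kind="html"):
    """Prefer a prebuilt rendering; otherwise fall back to rendering now.

    ``prebuilt`` maps docstring_key(doc) -> {"html": ..., "ansi": ...} and
    would be shipped by the project; ``render_on_the_fly`` is any callable
    taking (doc, kind).
    """
    entry = prebuilt.get(docstring_key(doc), {})
    if kind in entry:
        return entry[kind]
    return render_on_the_fly(doc, kind)
```

Small projects would ship no mapping and always hit the fallback; larger ones could generate the mapping in their docs build.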

@asmeurer

Member

asmeurer commented Jul 25, 2018

> (That pushed me to start learning about parsing and made me consider rewriting a sane RST parser that gives a proper AST).

Yeesh, good luck with that. You might take a look at parso (the library behind jedi). @davidhalter has told me that if you have a grammar and a tokenizer, it is easy to use it to generate a round tripping parser (but I haven't verified the claim yet myself). I don't even know if it's possible to write a grammar for RST, though. Just imagine what that would look like for the tables...

@Carreau

Contributor

Carreau commented Jul 25, 2018

@asmeurer

Member

asmeurer commented Jul 26, 2018

By the way this issue is somewhat related to #13519. If things like IPython get the ability to render math in docstrings, it would be better if we used $ to delimit LaTeX math (and there has also been some discussion of this issue itself over at #13519).
