Should ASCII/Unicode or LaTeX formulas be used in docstrings? #14964

Open
sidhantnagpal opened this issue Jul 24, 2018 · 45 comments

Comments

@sidhantnagpal
Member

This is how the docs for Bernoulli numbers look currently:

image

The LaTeX version for the same is:

image

@sidhantnagpal
Member Author

As discussed with @jksuom, it would be ideal to have the former for the terminal and the latter for the docs.

@sidhantnagpal
Member Author

Reference to #13519 for similar discussion.

@asmeurer
Member

I like the LaTeX better. The LaTeX is more professional, and easier to read for complex formulas. You can see for example the Ramanujan formula is easier to read in LaTeX (by the way, I would render A(n) as a piecewise). It's also easier to write. In many cases you can just create the expression in SymPy and use latex() to generate it. Strictly speaking the raw docstring is already in a markup format (RST). To be sure, RST is (usually) pretty easy to read in plain text, whereas LaTeX can take some practice to read in raw code form.

The downside as you noted is that it's harder to read in the terminal. But I think more and more people are using the main docs site, or the Jupyter notebook (which doesn't yet render LaTeX in docstrings but there has been some work on it).

The special functions and integral transforms docs are two examples of docs that use LaTeX documentation. I readily admit that both are difficult to read in the terminal. I often look at the SymPy docstrings to find the definition of some special function, and I have to look at it in the web.

But I know that not everyone agrees with this.

A possible compromise is to make documentation like the ODE documentation. In the ODE docs, the math in the text uses LaTeX, but each docstring also shows a formula computed from SymPy itself, pretty printed (in the case of ODE, a formula for the general solution for a given solver). If bernoulli had rewrite methods to show the above formulas, we could write them out in LaTeX and then show a doctest of pprint(bernoulli(n).rewrite('ramanujan')).
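As a rough illustration of generating both forms from one SymPy object, the sketch below builds a sample series and prints its LaTeX source next to an ASCII pretty print. The variable names are illustrative, and the exact printer output may differ between SymPy versions.

```python
from sympy import Sum, latex, oo, pretty, symbols

k = symbols("k", integer=True, nonnegative=True)

# Build the series once as a SymPy expression.
series = Sum((-1) ** k / (2 * k + 1) ** 2, (k, 0, oo))

# LaTeX source that could be pasted into a ".. math::" block for the HTML docs.
latex_source = latex(series)

# Plain-text rendering of the same object for terminal readers.
ascii_art = pretty(series, use_unicode=False)

print(latex_source)
print(ascii_art)
```

Because both strings come from the same expression, the formula in the prose and the pretty-printed doctest cannot drift apart.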

@sidhantnagpal
Member Author

I like the LaTeX better. The LaTeX is more professional, and easier to read for complex formulas.

+1

I think more and more people are using the main docs site, or the Jupyter notebook (which doesn't yet render LaTeX in docstrings but there has been some work on it).

Probably once Jupyter notebooks start rendering LaTeX, a call on this might be taken because they serve the majority.

For now, the changes can probably be restricted to functions.combinatorial, which currently mixes plain text, ASCII/Unicode, and LaTeX across its docstrings.

@cbm755
Contributor

cbm755 commented Jul 24, 2018

+1 for the LaTeX.

With apologies for hijacking the thread, may I also request feedback on #12162? I like the idea of doctesting the LaTeX equations but maybe it's too messy.

@moorepants
Member

I prefer unicode for docstrings. A compromise could be that latex is only allowed in the "notes" section of the numpydoc format. The latex is virtually impossible to read quickly (or at all) for those of us that use the terminal docstring views as our primary reference.

@asmeurer
Member

Unicode pretty printing has its own issues. If you don't have the right fonts installed, the math renders as tofu, and if different characters come from different fonts it doesn't align properly, especially in browsers, where monospace text isn't guaranteed to align when the characters come from different fonts.

I agree that ideally Unicode pretty printing looks much better than ASCII, and in most cases, nearly as good as LaTeX. But it tends to look quite bad outside of the terminal, sometimes even more unreadable than raw LaTeX if the characters are messed up.

@cbm755
Contributor

cbm755 commented Jul 24, 2018

I don't know enough about the markup being used (RST?) but in Octave (where the markup is Texinfo) we can do things in our docs like:

blah blah blah
@iftex
... latex stuff ...
@end iftex
@ifnottex
... alternative text stuff ...
@end ifnottex

Then the PDF/web docs can have nice math and terminal-based "help meh" can have ascii/unicode. (And crucially the @end ifnottex junk is visible only in the raw source code)

Can that be done here? Or does meh? always give the raw source?

@asmeurer
Member

IPython/Jupyter's ? just shows the raw docstring text. So does Python's builtin help().

I think we should perhaps reach out to the Jupyter folks and see if they have any renewed interest in allowing ? to render the docstring using RST, and the math along with it. Most of the work these days in Jupyter is going into jupyterlab, so maybe this could be a jupyterlab widget.

We can't really change what help() does, but we already recommend users use IPython/Jupyter for SymPy, so it wouldn't bother me if users could see nice documentation in Jupyter and harder to read raw RST with raw LaTeX when they don't use it.
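A small standard-library sketch of what terminal readers get: the text that help() pages is the raw docstring, markup included. The function `catalan_series` is a made-up stand-in, not a SymPy object.

```python
import inspect
import pydoc


def catalan_series():
    r"""Return a description of Catalan's constant.

    .. math:: K = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2}
    """
    return "K = 0.91596559..."


# inspect.getdoc() cleans up the indentation but leaves the markup untouched.
raw_doc = inspect.getdoc(catalan_series)

# pydoc.render_doc() builds the text that help() would page;
# pydoc.plain() strips the overstrike characters used for bold headings.
help_text = pydoc.plain(pydoc.render_doc(catalan_series))

print(help_text)
```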

@cbm755
Contributor

cbm755 commented Jul 24, 2018

Can we modify our own __doc__ on import?

Then we could have both latex and unicode in the source. On SymPy import, we strip one or the other depending on the environment we find ourselves.

@asmeurer
Member

I think it might be technically possible to do what you're suggesting. Wouldn't that require writing each docstring twice though?

@cbm755
Contributor

cbm755 commented Jul 24, 2018

Custom directives embedded in the docstring?

    `K = 0.91596559\ldots` is given by the infinite series

    .. custom_have_latex::

    .. math:: K = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2}

    .. custom_have_latex_else::

             ∞             
       _____           
       ╲               
        ╲          k   
         ╲     (-1)    
          ╲  ──────────
          ╱           2
         ╱   (2⋅k + 1) 
        ╱              
       ╱               
       ‾‾‾‾‾           
       k = 0  

    .. custom_have_latex_end::
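A minimal sketch of how such markers could be resolved, assuming the three directive names from the example above (they are hypothetical and not existing Sphinx directives). The helper `select_variant` and the simplified plain-text branch are illustrative only.

```python
def select_variant(doc: str, want_latex: bool) -> str:
    """Keep one branch of each custom_have_latex block and drop the other."""
    kept = []
    branch = None  # None: outside a block; otherwise "latex" or "plain"
    for line in doc.splitlines():
        marker = line.strip()
        if marker == ".. custom_have_latex::":
            branch = "latex"
        elif marker == ".. custom_have_latex_else::":
            branch = "plain"
        elif marker == ".. custom_have_latex_end::":
            branch = None
        elif branch is None or (branch == "latex") == want_latex:
            kept.append(line)  # shared prose, or the branch that was asked for
    return "\n".join(kept)


SOURCE = r"""K is given by the infinite series

.. custom_have_latex::

.. math:: K = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2}

.. custom_have_latex_else::

    K = sum((-1)**k / (2*k + 1)**2 for k = 0 .. oo)

.. custom_have_latex_end::
"""

print(select_variant(SOURCE, want_latex=False))
```

The marker lines themselves never reach the reader; only the raw source file shows both branches.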

@asmeurer
Member

Also it probably shouldn't happen at import time because it could slow down the import time too much.

I think if we had a Sphinx extension with a sympy directive as I suggested, you could easily have options in it to render latex or pprint(use_unicode=True/False) or whatever.

Also I wonder if the Jupyter notebook ? is already customizable, so that we could do something like this on our own.

@asmeurer
Member

@Carreau I think you've looked at this before. Has there been any progress on rendering docstrings as HTML in the notebook or jupyterlab? And if not, is it possible for us to customize the output of ? to give a custom HTML output? I also vaguely remember there were some issues with MathJAX because of sandboxing.

@cbm755
Contributor

cbm755 commented Jul 24, 2018

Might the sympy directive be even worse for terminal users who would see something like:

`K = 0.91596559\ldots` is given by the infinite series

.. sympy:: pprint(Catalan.rewrite(Sum), use_unicode=True)

@asmeurer
Member

You could manually insert the ASCII or Unicode pretty print, like

.. sympy:: Catalan.rewrite(Sum)

             ∞             
       _____           
       ╲               
        ╲          k   
         ╲     (-1)    
          ╲  ──────────
          ╱           2
         ╱   (2⋅k + 1) 
        ╱              
       ╱               
       ‾‾‾‾‾           
       k = 0  

and it would remove that in favor of the LaTeX for Sphinx. It could even doctest it to make sure the pretty printed output format doesn't change.

@asmeurer
Member

Or maybe we could have an extension that hooks into the doctests themselves, so that

>>> pprint(Catalan.rewrite(Sum), use_unicode=True)
             ∞             
       _____           
       ╲               
        ╲          k   
         ╲     (-1)    
          ╲  ──────────
          ╱           2
         ╱   (2⋅k + 1) 
        ╱              
       ╱               
       ‾‾‾‾‾           
       k = 0  

gets replaced with

>>> pprint(Catalan.rewrite(Sum), use_unicode=True)

.. math:: <latex of Catalan.rewrite(Sum)>

In other words, in Sphinx, pprint will "output" latex. It could still be confusing to users, but it's the most readable in the plain text.
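The text rewrite described here could be sketched with the standard library's doctest parser. This is only a string transformation, not a Sphinx extension. The `to_latex` callback is a placeholder for a real LaTeX printer, and the argument handling is deliberately naive (everything between the parentheses of `pprint(...)` is passed through as is).

```python
import doctest
import inspect


def pprint_to_math(doc, to_latex):
    """Swap the expected output of pprint(...) examples for a math directive.

    to_latex is a caller-supplied stand-in for a real LaTeX printer: it
    receives the text between the parentheses of pprint(...).
    """
    pieces = []
    for part in doctest.DocTestParser().parse(inspect.cleandoc(doc)):
        if isinstance(part, str):
            pieces.append(part)  # prose between examples is copied unchanged
            continue
        pad = " " * part.indent
        lines = part.source.splitlines()
        pieces.append(pad + ">>> " + lines[0] + "\n")
        pieces.extend(pad + "... " + line + "\n" for line in lines[1:])
        call = part.source.strip()
        if call.startswith("pprint(") and call.endswith(")"):
            latex_text = to_latex(call[len("pprint("):-1])
            pieces.append("\n" + pad + ".. math:: " + latex_text + "\n")
        else:
            # Any other example keeps its expected output.
            pieces.extend(pad + line + "\n" for line in part.want.splitlines())
    return "".join(pieces)


DOC = """Square of x.

>>> pprint(x**2)
 2
x

>>> 1 + 1
2
"""

print(pprint_to_math(DOC, lambda text: "<latex of %s>" % text))
```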

@cbm755
Contributor

cbm755 commented Jul 25, 2018

Either of those sounds really good to me.

.. sympy:: could happen in a from sympy import * environment.

How do we start? http://www.sphinx-doc.org/en/stable/ext/thirdparty.html

@Carreau
Contributor

Carreau commented Jul 25, 2018

Has there been any progress on rendering docstrings as HTML in the notebook or jupyterlab? And if not, is it possible for us to customize the output of ? to give a custom HTML output? I also vaguely remember there were some issues with MathJAX because of sandboxing.

Some. Yes, MathJax can be an issue. The last notebook release was because of a MathJax CVE allowing JS execution on load...

There are some undocumented options in IPython:

--TerminalInteractiveShell.sphinxify_docstring=<Bool>
    Default: False
    Enables rich html representation of docstrings. (This requires the docrepr
    module).

and

--TerminalInteractiveShell.enable_html_pager=<Bool>
    Default: False
    (Provisional API) enables html representation in mime bundles sent to
    pagers.

It's not perfect and would need feedback.

On my back-burner is a question as to whether we could have a hook into ? that allows custom rendering with mimetype. One question is do we do that on the fly, or do we allow to build the docstring ahead of time (that would allow plots, etc...)

I have some prototype code:

screen shot 2018-07-25 at 09 46 00

One of my thoughts was to figure out a way for projects to provide a mapping from docstring/docstring hash to HTML and ANSI.

At some point I should write a grant and try to get some funding for this to get some love.

@asmeurer
Member

There are some undocumented options in IPython:

So is it possible to do this now with these options? Or is the MathJax still prevented from loading in the pager?

I have some prototype code:

Is the prototype something that I can play around with somewhere?

How is the math in that doctest rendered? It doesn't look like MathJax.

One question is do we do that on the fly, or do we allow to build the docstring ahead of time (that would allow plots, etc...)

On the fly could be pre-cached if it's expensive. But the advantage is that it would be simpler for stuff that is fast like what we have. Otherwise we'd have to have a build step that generates the "IPython docs" when we make the tarball, and it wouldn't work the same when using the development version.

@Carreau
Contributor

Carreau commented Jul 25, 2018

So is it possible to do this now with these options? Or is the MathJax still prevented from loading in the pager?

Mathjax should be in the pager.

Is the prototype something that I can play around with somewhere?

How is the math in that doctest rendered? It doesn't look like MathJax.

Nah, that's rendered in a terminal:

https://gist.github.com/Carreau/f939617bcb300d1d9dd05928b829d8df

(That pushed me to start learning about parsing and made me consider rewriting a sane RST parser that gives a proper AST).

On the fly could be pre-cached if it's expensive. But the advantage is that it would be simpler for stuff that is fast like what we have. Otherwise we'd have to have a build step that generates the "IPython docs" when we make the tarball, and it wouldn't work the same when using the development version.

Yeah, I had the same thoughts, but some docstrings are quite complex and require custom directives. So on-the-fly rendering could be quite hard.

I'm leaning toward an intermediate solution. Look for a built docstring, and if none is present, try to do an on-the-fly rendering. That makes it easy for small projects, and flexible for bigger ones.
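One way to picture that intermediate approach, as a sketch only: a store of prebuilt renderings keyed by a hash of the raw docstring, with a cheap fallback renderer. `PREBUILT`, `doc_key`, `render_on_the_fly`, and `rendered_doc` are hypothetical names, and the fallback here is a trivial stand-in for a real RST renderer.

```python
import hashlib

# Hypothetical store of pre-rendered docstrings, keyed by a hash of the raw text.
PREBUILT = {}


def doc_key(raw_doc: str) -> str:
    """Stable key for a docstring, independent of where it is defined."""
    return hashlib.sha256(raw_doc.encode("utf-8")).hexdigest()


def render_on_the_fly(raw_doc: str) -> str:
    """Stand-in for a cheap renderer; a real one would parse the RST."""
    return raw_doc.replace(".. math::", "[math]")


def rendered_doc(raw_doc: str) -> str:
    """Prefer a prebuilt rendering, fall back to rendering on demand."""
    key = doc_key(raw_doc)
    if key not in PREBUILT:
        # Nothing was built ahead of time: render now and remember the result.
        PREBUILT[key] = render_on_the_fly(raw_doc)
    return PREBUILT[key]


raw = "Gamma function\n\n.. math:: \\Gamma(x)\n"
PREBUILT[doc_key(raw)] = "<p>prebuilt rendering</p>"
print(rendered_doc(raw))
```

A small project would ship an empty store and rely on the fallback; a larger one could fill the store in a build step.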

@asmeurer
Member

(That pushed me to start learning about parsing and made me consider rewriting a sane RST parser that gives a proper AST).

Yeesh, good luck with that. You might take a look at parso (the library behind jedi). @davidhalter has told me that if you have a grammar and a tokenizer, it is easy to use it to generate a round tripping parser (but I haven't verified the claim yet myself). I don't even know if it's possible to write a grammar for RST, though. Just imagine what that would look like for the tables...

@Carreau
Contributor

Carreau commented Jul 25, 2018 via email

@asmeurer
Member

By the way this issue is somewhat related to #13519. If things like IPython get the ability to render math in docstrings, it would be better if we used $ to delimit LaTeX math (and there has also been some discussion of this issue itself over at #13519).

@certik
Member

certik commented Aug 12, 2019

Another option is to use Fungrim. As an example, for the Gamma function:

\Gamma(x) := \int^{\infty}_{0} t^{x-1} e^{-t} \mathrm{d}t.

Here is how it looks in Fungrim:

http://fungrim.org/entry/4e4e0f/

The whole entry is generated by just:

Entry(ID("4e4e0f"),
    Formula(Equal(GammaFunction(z), Integral(Mul(Pow(t, Sub(z, 1)), Exp(Neg(t))), Tuple(t, 0, Infinity)))),
    Variables(z),
    Assumptions(And(Element(z, CC), Greater(Re(z), 0))))

Applied to the Gamma integral I linked above, which currently is:

\Gamma(x) := \int^{\infty}_{0} t^{x-1} e^{-t} \mathrm{d}t.

This would be written as:

Formula(Equal(GammaFunction(z), Integral(Mul(Pow(t, Sub(z, 1)), Exp(Neg(t))), Tuple(t, 0, Infinity))))

And by overloading Python operators (which I think is already implemented in Fungrim, just not used in this particular formula), I think this can become as simple as:

GammaFunction(z) == Integral(t**(z-1) * Exp(-t), (t, 0, Infinity))

This is just 8 characters longer than the LaTeX version (58 vs 66). Unlike LaTeX, it gets parsed and understood semantically, so the equivalent LaTeX can be generated automatically (and consistently!) for Sphinx, with no spurious * (as in a*b vs ab).
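The character counts quoted above can be checked directly. The two strings below are copied from this comment; the variable names are illustrative.

```python
latex_form = r"\Gamma(x) := \int^{\infty}_{0} t^{x-1} e^{-t} \mathrm{d}t."
grim_form = "GammaFunction(z) == Integral(t**(z-1) * Exp(-t), (t, 0, Infinity))"

# Lengths of the two spellings and the difference between them.
print(len(latex_form), len(grim_form), len(grim_form) - len(latex_form))
```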

We could also just use SymPy for this purpose, but Fungrim is better because it has notation for mathematical constructs that SymPy currently cannot represent, and in some sense it's simpler: it does not have any automatic simplifications, just a few simple rules that one follows to construct almost every math formula that we need in our docs. A Fungrim formula can still be converted to SymPy if needed.

Once all such formulas can be parsed, then we can automatically generate a nice index of formulas, so that one can quickly find, say, all occurrences for the Gamma function in our docs formulas. And other similar things.

@fredrik-johansson what do you think?

@asmeurer
Member

I think it would be confusing to have a SymPy-like syntax that isn't SymPy.

@certik
Member

certik commented Aug 12, 2019

I think it would be confusing to have a SymPy-like syntax that isn't SymPy.

Latex is also confusing, as you already noted with regards to a*b vs ab. My proposal would get parsed, so if somebody makes a mistake, our tooling can find it.

Currently we do:

    .. math::
        \Gamma(x) := \int^{\infty}_{0} t^{x-1} e^{-t} \mathrm{d}t.

While for SymPy code, we do:

    >>> gamma(S(3)/2)
    sqrt(pi)/2

So it looks very different visually. My proposal would be something like:

    .. smath::
        GammaFunction(z) == Integral(t**(z-1) * Exp(-t), (t, 0, Infinity))

We can discuss a better keyword than smath (shortcut for semantic-math).

@asmeurer
Member

My proposal would get parsed

We could parse the LaTeX formulas with parse_latex and evaluate them with SymPy. SymPy can't evaluate everything, and not all formulas can be represented by SymPy or parsed by parse_latex, so you would need some way to decorate formulas to opt-in to this.

@fredrik-johansson
Contributor

I think it would be confusing to have a SymPy-like syntax that isn't SymPy.

I agree with this, and in any case the Fungrim formula language isn't stable enough for other projects to depend on yet.

@certik
Member

certik commented Dec 16, 2019

Here is a recent development on Fungrim:

https://twitter.com/hypergeometer/status/1206552744460509184

and a related discussion in the SymPy mailing list:

https://groups.google.com/d/topic/sympy/O6TeAa3IHnA/discussion

My view has not changed: the formulas in docstrings should be in semantic form. Currently they are not and so we need to fix it. The options are:

  1. ASCII / Unicode and parse it somehow into SymPy
  2. LaTeX and parse it
  3. SymPy formula itself
  4. Grim (Fungrim formula)

The issue with both 1) and 2) is that we have to have a really good parser and we won't be able to parse everything, and consequently it will be confusing for users to know how to write their expression so that it is parsed correctly.

The option 3) works, but we need to add quite a few classes into SymPy, roughly to match Grim. And even then it would be hard to do things like Equal(Tuple(1/(i+1)**2, For(i, 0, 9)), Tuple(Evaluated(1/(i+1)**2), For(i, 0, 9))) (see http://fungrim.org/grim/#Non-semantic_markup), because SymPy evaluates by default and it's hard to force it not to.

That leaves 4) as the most viable and natural option, as I suggested above a couple of months ago. Some advantages of 4):

  • Syntax is similar to SymPy, and thus familiar (easy to write and read)
  • It is clear what is allowed and how to use it (unlike some subset of LaTeX or ASCII)
  • It immediately maps to semantics and is easy to parse
  • Unlike SymPy, it can represent non-semantic markup naturally (unevaluated expressions, typesetting hints, the difference between a*b and b*a)

To answer @fredrik-johansson's and @asmeurer's objection:

I think it would be confusing to have a SymPy-like syntax that isn't SymPy.

I don't think it would be confusing. The language to use to write the formula is Grim, not SymPy. Yes, Grim is SymPy-like syntax, just like any other solution based on Python's syntax. So the fact that Grim is similar to SymPy is an advantage, not a drawback. It will be very natural for people to use. The example from above would be written as:

    .. grim::
        GammaFunction(z) == Integral(t**(z-1) * Exp(-t), (t, 0, Infinity))

@oscarbenjamin
Collaborator

@certik it isn't clear to me what exactly you are proposing. Is it that the Grim language could be used as a way of writing formulas in the docstring or that it could be used for specifying evaluation in SymPy (or both)?

@certik
Member

certik commented Dec 16, 2019

Is it that the Grim language could be used as a way of writing formulas in the docstring

Yes, that is what I am proposing to consider.

that it could be used for specifying evaluation in SymPy

I am not proposing that.

@oscarbenjamin
Collaborator

Is it that the Grim language could be used as a way of writing formulas in the docstring

Yes, that is what I am proposing to consider.

Why is it advantageous to have a computer-readable semantic language in the docstrings rather than a human-readable free-form text?

@asmeurer
Member

I don't think auto-evaluation would be that big of a deal for this particular use-case. Most things that SymPy auto-evaluates are things that you would want to have evaluated in the documented expression anyway. The biggest issue is that SymPy re-orders terms, so things might not be in the cleanest order. For example, you might want exp(-t)*t**(x - 1) but SymPy will print it in the other order. But

  • We can (and should) improve how SymPy orders expressions in the printers.
  • We can probably use some tricks to maintain the order from the string input.

Those are both things we should improve anyway.
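The reordering can be seen with a short check. The symbol names are illustrative, and the printed order may depend on the SymPy version, so no particular output is assumed here.

```python
from sympy import exp, latex, symbols

t, x = symbols("t x")

written = exp(-t) * t ** (x - 1)   # order as typed by the docstring author
swapped = t ** (x - 1) * exp(-t)   # the same product typed the other way round

# Commutative products are stored in a canonical form, so both inputs
# compare equal and print identically, whatever order was typed.
print(written == swapped)
print(str(written))
print(latex(written))
```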

The advantage of using pure SymPy is that you can copy-paste it directly from the docstring. You can also perform computations on it. There's also a consistency plus in that if you pass everything through the SymPy LaTeX printer, the formulas in docstrings will look the same as they do in the notebook.

Syntax is similar to SymPy, and thus familiar (easy to write and read)

Again, I see this as a disadvantage, not an advantage, because it's confusing that they are so similar.

I don't think it would be confusing. The language to use to write the formula is Grim, not SymPy. Yes, Grim is SymPy-like syntax, just like any other solution based on Python's syntax. So the fact that Grim is similar to SymPy is an advantage, not a drawback. It will be very natural for people to use. The example from above would be written as:

I don't see how this argues that it isn't confusing. Yes, being based on Python is a good thing. But that doesn't change the fact that being similar to SymPy, but not exactly SymPy, will be very confusing for people reading and writing these docstrings.

@asmeurer
Member

Why is it advantageous to have a computer-readable semantic language in the docstrings rather than a human-readable free-form text?

What do you mean by "human-readable free-form text"?

@oscarbenjamin
Collaborator

What do you mean by "human-readable free-form text"?

I mean normal text, where you write something in whatever way is best to convey understanding to the (human) reader.

@asmeurer
Member

asmeurer commented Dec 16, 2019

The idea here is to write mathematical formulas. How do you write "the gamma function is defined by <integral definition of gamma>" using free form text? You have to write the formula in some way. Surely you don't mean literally writing "the integral of exponential of negative t times t to the z minus 1 with respect to t from 0 to infinity".

@oscarbenjamin
Collaborator

I guess my question is: what problem are we trying to solve here?

The Grim language seems suboptimal for communication in text but has the advantage of being computer-readable so what benefits can we expect to obtain from that?

@asmeurer
Member

asmeurer commented Dec 17, 2019

I guess it's supposed to be easier to read in the raw docstring than LaTeX. If you look at something like help(gamma) in the interpreter it can be hard to parse the formula because you have to read the LaTeX by hand:

Help on class gamma in module sympy.functions.special.gamma_functions:

class gamma(sympy.core.function.Function)
 |  gamma(arg)
 |
 |  The gamma function
 |
 |  .. math::
 |      \Gamma(x) := \int^{\infty}_{0} t^{x-1} e^{-t} \mathrm{d}t.
 |

On the other hand, the LaTeX is by far the nicest in the HTML documentation. So the goal is to have something that is easier to read in the raw docstring, but still just as nice looking in the HTML.

Some obvious alternatives would be to write the formula as a SymPy formula or as ASCII art, both of which are easier to read in raw docstring, but look worse than rendered MathJax in the HTML. A Grim or SymPy Sphinx extension that lets you write the formula directly in the docstring and auto-convert it to LaTeX would be the best of both worlds (you could even potentially have a tool that does a pprint in the terminal, at least for a SymPy extension).

@certik
Member

certik commented Dec 17, 2019

Yes, the goal is to write the formula in some way, but then process it to produce LaTeX and thus nice looking math in HTML and pdf.

The question is what language to write the formula in, and so far nobody has suggested anything other than what I listed above (ASCII/Unicode, LaTeX, Grim or SymPy). @asmeurer, @oscarbenjamin can you think of any other alternative? If not, then let's just choose from these. Here they are:

    .. ascii::
        Γ(z) := ∫_0^∞ t^(z−1) exp(-t) dt

    .. latex::
        \Gamma(z) := \int^{\infty}_{0} t^{z-1} e^{-t} \mathrm{d}t.

    .. grim::
        Gamma(z) == Integral(t**(z-1) * Exp(-t), For(t, 0, Infinity))

    .. sympy::
        Eq(gamma(z), Integral(t**(z-1) * exp(-t), (t, 0, oo)))

In this case the SymPy version seems the best. But for some more complicated formulas the Grim version might be easier to write. But we can definitely introduce the missing classes into SymPy, so that we can just use SymPy to write any formula (except those that SymPy cannot represent easily, like 1+0).
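For illustration, the SymPy spelling from the list above can be fed to SymPy's existing printers to get both an HTML-oriented and a terminal-oriented form. This is a sketch under the assumption that a docs tool would call the printers roughly like this; the variable names are made up, and the exact output depends on the SymPy version.

```python
from sympy import Eq, Integral, exp, gamma, latex, oo, pretty, symbols

t, z = symbols("t z")

# The ".. sympy::" candidate from the list above, built as a SymPy object.
definition = Eq(gamma(z), Integral(t ** (z - 1) * exp(-t), (t, 0, oo)))

html_source = latex(definition)                        # for Sphinx/MathJax
terminal_text = pretty(definition, use_unicode=True)   # for help() readers

print(html_source)
print(terminal_text)
```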

@moorepants
Member

My vote would be for SymPy formula notation so we don't have to learn the new grim syntax. People who are using sympy and writing sympy docs will know how to read sympy code more than any other.

@Carreau
Contributor

Carreau commented Aug 23, 2020

What if help(), or ? in the case of IPython, could pull what it shows from another place than __doc__? For example, you could have a "build" step that builds the docs, so that a user can say "I want SymPy docs with formulas shown as grim/latex/ascii ...".

In the source of the file they would still be in their original forms, but typically docs inspector would be able to convert.

Would that make people happier, or ease some of the concerns about readability?

@asmeurer
Member

I think that would be ideal. But it's a nontrivial thing to do.

@Carreau
Contributor

Carreau commented Aug 23, 2020

I think that would be ideal. But it's a nontrivial thing to do.

Not implying it's easy, just wondering: if there were a hook meaning "don't show __doc__ but XXX", what would or would not make SymPy use it, and what kind of functionality would SymPy want or need?

@Carreau
Contributor

Carreau commented Aug 23, 2020

Also I'm thinking such a thing would likely have uses outside of SymPy, so wondering what effort should be put in pursuing having such hooks as a "standard".


8 participants