-
-
Notifications
You must be signed in to change notification settings - Fork 7.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DOC/ENH] How to best handle repeating docstrings #9414
Comments
https://github.com/anntzer/structured-docstrings/blob/master/example.py is basically the idea I had in mind: write each section as separate strings, have some machinery to assemble the parts, and keep the parts available separately. Note that I explicitly don't allow mixing this style with a preexisting docstring -- it should be much more robust to have the entire assembly process under control. Obviously support for all numpydoc sections should be added. Right now the implementation (in structdoc.py) does some weird tricks because it delays the rendering until someone actually uses the docstring (to shave off some time at import). Depending on how fast things are, we may be able to just render everything eagerly in the decorator and not worry about the magic. In fact the original approach I thought of was to add the possibility to write some docstrings "normally" and have a minimal parser that would allow one to retrieve the sections individually, but that seems far too complicated/brittle to be worth it (and parsing would make it lazy-rendering more necessary). Just to clarify, in the case of legend/figlegend/etc., this you would document the parameters on one of the functions (e.g. Axes.legend), and on the others you would just pass the same parameters ( |
or we could simply stop hacking our docstring, and make them simpler to contribute too ,and simpler to maintain. |
@NelleV if you pass through arguments and want to document them in multiple places, I’d argue that it’s not easy to maintain if you have to remember to edit the other X docstrings. |
@anntzer I don't get much of a vote, but I think a system like what you are proposing is good. I think making sure the decorators are sort of self-documenting is great. i.e. so long as the arguments point towards the file the original doc came from it should allay the concern that its hard to follow. I'd also hope the result show up in IDEs (which I assume they do). |
@jklymak Somehow scikit-learn manages this just fine, despite literally having identical API for all of the estimators, and between each of the functions and their associated class. (Matplotlib's case is harder, as the functions often have slightly different parameters, that take different values, while scikit-learn has extremely consistent API). This problem of having identical parameters over many functions is far from being unique to Matplotlib, yet many projects manages just fine without docstring hacking. Sure, there is a risk of the docstrings evolving separately, but is it worse than having hard to detect bugs, cryptic error messages or worse, no contributors to the documentation? Docstring hacking gives the illusion to ease maintenance. In practice, so far, it has caused more headaches than any of the advantages you bring forward. It also breaks many of the points brought up by the Zen of Python:
|
Ideally, sorting out this doc strategy problem might warrant a full MEP, with clear statements of goals and viable alternatives. But to begin here, the goals might be:
I think we should approach this first by looking carefully at criterion 1: what should the user see with each of the methods of documentation access? One of the big problems here is clutter: Too Much Information (or pseudo-information, like my favorite example, the infamous "agg_filter: unknown" in http://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D). Does everything have to go directly into a single docstring? I don't think so. Instead of interpolating the property kwargs, can we have something like "for additional keyword arguments, see Line2D_properties"? In other words, use some dedicated functions that serve only for documentation, some of which can be used for more than one function or method. In the actual function or method docstring, put the most generally useful subset of all possible information, with references to the remainder. @anntzer, I understand the logic of your suggested example, but I think it makes the actual code unacceptably ugly and adds unnecessary complexity. |
@efiring My concern with referring to other functions is that I can't access those directly i.e.
which now I must google, unless I am away from the internet (on a ship), in which case I am out of luck... |
I don't understand the problem. If the docstring for ax.text includes "For a table of all available keyword arguments see |
The documentation-only functions I am suggesting could also print their own docstrings if invoked as functions. |
@efiring I understand now. That seems like a reasonable route as well! |
I am fairly strongly against doc-only functions -- I consider them to be a subversion of the docstring system... (but agree with @efiring that as things are right now, I believe that there's definitely some value in having all the relevant parameters directly available (without having to follow a link, either hypertext or via IPython). Note that numpydoc defines an "other parameters" section that can be used advantageously. The syntax of the proof of concept may be a bit ugly (because I hacked it yesterday evening...) but note that it can trivially be converted into e.g.
(getting the parameter name from, well, the first word) While I agree that the previous system was brittle (a definite point against it), I am also curious about @NelleV's experience about the difficulty in contributing to the docstrings. Would you not consider an example as the one above essentially self-documenting? What kind of issues were raised to you? Obviously you need to know what the numpydoc sections are, but this is just as much a problem with plain docstrings. |
@anntzer I should be able to contribute docstring patches to any project in less than 10minutes. That means following conventions and using standard tools. You are suggesting editing code to contribute a docstring, using decorators, a concept that a lot of (non-scientific) developers struggle with. Most of the headaches come from all the docstring hacking we have: finding where to edit the docstring, indentation problems because of interpolation, setting up a dev environment and building the documentation (which took a couple of weeks to fix on @choldgraf's computer last year). These are the "usual" ones. The documentation often breaks in unusual ways as well. FYI, I've contributed code patches, that included bug fixes, tests and documentation to most major scientific python packages in less time than it takes me to edit a docstring on Matplotlib. And I know all the major headaches of Matplotlib way better than I know any other scientific python package (except maybe sklearn). |
@tacaswell wrote a MEP a very long time ago that tackles most of the issue we have with docstrings, including the interpolation and duplication one (and the makefile, and other problem we have). |
@jklymak Yep, that one. A lot of the items have been accomplished without even realizing it by @choldgraf and myself during the last year. |
FWIW MEP10 specifically outlines the soln suggested by @efiring. So as an aside I’m not sure about how we are meant to discuss MEPs. |
Massive cut/paste with subsequent synchronization via hand-editing each instance would be a major violation of DRY. I don't think it would end well. We should look for ways of factoring out common elements that are relatively simple, and no more intrusive than necessary. Hence my suggestion of using references to documentation functions. I don't understand the "subversion" objection. It might not have been exactly what Guido had in mind when he designed the language, but I suspect he would have no objection to it. I think this approach is easily understood by users and by coders. I'm not ruling out documentation-generation machinery, and in fact I am using such machinery in another project, but I am not convinced that it is needed, or the most appropriate solution, for matplotlib. |
Looking at this again, it should always be possible to tuck the kwargs on the relevant object itself (e.g., Basically, there should be a "natural" place (a module docstring, a class docstring, a constructor docstring, a function docstring, ...) to document the signature of a callable, and I feel (sorry, nothing objective here) that something's "wrong" if we have to create a mock object just to host that signature. On the other hand, I definitely agree with @efiring that massive cut/paste as in the case of legend/figlegend should be avoided as much as possible. |
WRT a concise docstring that links, 98% of the time I am looking at the docstring (usually in Jupyter notebook), I’m doing so to figure out the available kwargs. And usually I’m doing that by tab completion on an artist I’ve created. HAving to leave the notebook and go to the interpreter console and query another object is not very convenient. It’s really nice when the kwargs are available from the method I’m trying to use. |
Thanks for having this discussion! |
My $0.02 is to nuke completely the docstring generation code we have now, and not replace it with a different kind of generation code. In addition to the complexity for user contributions, generated docstrings are unsupported by pretty much every other tool out there that doesn't actually execute the code. This means our docstrings are useless in e.g. PyCharm. As far as subversion of the doc mechanism by creating doc-only functions is concerned, I'm pretty sure computing docstrings at runtime is equally subversive, so that particular argument doesn't hold much water for me. I'm a fan of having one canonical place for a set of documentation, be it on a class or a standalone function. Other places should reference those liberally. The location should be readily accessible; if |
Clickable links would be cool. Except when you don’t have the internet... Just to be clear, in the notebook there is no way to type matplotlib.text.Text. Well that’s a bit of a lie - there is now a little widget that gives command line access to the kernel. I suppose that’s ok but still nowhere near as nice as tab completion. I’m still leaning towards repeating the doc string unless they are huge and having a test to make sure devs remember to duplicate them. But I’m not claiming my vote matters very much. It should also be “easy” to have a small script that can automate docstring copies before commit/push. The only logistical difficulty is knowing which changed docstring the dev means to be the canonical one. |
@jklymak Two things:
import matplotlib.text
matplotlib.text.Text? Doesn't work? |
That's how numpy does it: In [1]: X = np.arange(10)
In [2]: X.compress?
Docstring:
a.compress(condition, axis=None, out=None)
Return selected slices of this array along given axis.
Refer to `numpy.compress` for full documentation.
See Also
--------
numpy.compress : equivalent function And the rendered html links to np.compress: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.compress.html#numpy.ndarray.compress |
Sure that works. But you need to create a new cell execute it get what you want go back to the previous cell and then delete the spare cell. Conversely the tab completion pops up a little widget that you can expand to get the scrollable docstring. The whole thing dismisses then you are done. Not sure what following a docstring should do. Maybe it could repopulate that widget with the new docstring. I don’t know the tech of that docstring widget so maybe it’s easy. Maybe they already did it ;-) |
@NelleV. I didn’t say it was optimal for numpy either ;-). But numpy doesn’t use pass through kwargs like we do. At least with the above I know what the kwargs are and can make a decent guess as to their purpose |
@dopplershift To be honest I don't understand the point of tools that try so hard not to import dependent modules, and then complain that Python is too dynamic. For example, with something as simple as
PyCharm (2017.2 pro) appears to be unable to report the docstring of The main objections I've heard against importing are 1) it's too slow (well, it's not as if pylint or pycharm's indexing were fast as a rocket either, and just like pycharm's index, the result of the docstrings can be cached -- and yes cache invalidation is a difficult problem for everyone), and 2) it can have side effects (that's... exceptionally rare for well written libraries; if you're worried that you're developing against some "nasty" library then it being executed by your code editor should probably be the last of your worries...) Now this does not address whether dynamic docstring generation is more or less perverse than docstring-only functions, for sure... |
I'm looking at this issue with Jedi right now (spyder-ide/spyder#5522) and it (like pycharm) also can't return doc strings that need to be executed. Jedi is used in Jupyter and this needs to be considered for Jupterlab doc lookup as well. FYI as of matplotlib 2.1(thanks to numpydoc doc strings) and jedi 11 you can now get completions for: import matplotlib.plyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.pl Jedi will give completions of |
Issue
In #9324 I refactored the docstrings for
legend.Legend.__init__
,figure.Figure.legend
andaxes.Axes.legend
. In particular, all three have the same kwargs because the two artist versions passthrough kwargs tolegend
. All three had different and contradictory/incomplete docstrings.To solve this, I used string interpolation. The developer docs say:
and the code has numerous places where this is used. I used a simple form of this that hard-coded the kwarg doc string and then invoked it in the three docstrings:
Reviewers didn't like this (@NelleV, @anntzer, @tacaswell ) because apparently this sort of interpolation is fragile and has caused issues with the docs in the past:
For #9324 it was decided that we should just cut-and-paste the kwarg docs into the three docstrings. I also implemented a test in
test_legend.py
that insured that the docstrings were the same to stop future drift of the docstrings.Proper solution?
So, what is the proper solution to the documentation problem of repeating passthrough kwargs? I didn't see that the substitution decorator was so bad, but I've not seen all the problems.
Cut-and-paste:
perhaps with unit tests to make sure the docstrings don't drift.
Structure to hold docstrings:
@anntzer suggested:
i.e., he suggested a "structured" docstring whereby each section is an
OrderedDict
entry, and each kwarg is anOrderedDict
entry. This allows referencing of this information in other docstrings. See below.Functions to hold docstrings:
@efiring suggested what is also essentially suggested in MEP10 which is that specialized documentation functions be used that have the repeated kwargs, and then the main method's documentation references this.
Documentation issue
I think the developer docs should make the deprecation of the interpolation machinery explicit if it truly is deprecated.
Affected functions:
figure.legend
=axes.legend
=legend.Legend
The text was updated successfully, but these errors were encountered: