Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate PEP-8 in docstring examples #23154

Closed
datapythonista opened this issue Oct 14, 2018 · 15 comments
Closed

Validate PEP-8 in docstring examples #23154

datapythonista opened this issue Oct 14, 2018 · 15 comments
Labels
CI Continuous Integration Code Style Code style, linting, code_checks Docs good first issue

Comments

@datapythonista
Copy link
Member

datapythonista commented Oct 14, 2018

At the moment we are automatically validating PEP-8 in our code (using flake8 . in the CI), and we also have the script scripts/validate_docstrings.py that reports errors for different formatting errors in our docstrings, and it also runs the tests to see if they work and generate the expected input. But we are not validating whether the examples follow PEP-8.

In the next example:

def head():
    """
    Return the first n rows of the Series.

    Examples
    --------
    >>> s=pd.Series([1, 2, 3, 4])
    >>> s.head(2)
    0    1
    1    2
    dtype: int64

Everything will be reported as correct, because the formatting of the docstring is the expected, and when running the examples, they work and they generated the presented output.

According to PEP-8, there should be spaces around the = in an assignment. So, we should have s = pd.Series([1, 2, 3, 4]) instead of s=pd.Series([1, 2, 3, 4]). So, the validation should report it as an error.

What we need to do is:

  • Add a method in scripts/validate_docstrings.py that obtains the code in the examples (and if possible their line in the code to be presented to the user if there are errors). For what I know, the module doctest in the Python standard library provides a way to extract the code.
  • Add another method that given some code, returns possible PEP-8 errors. We currently have pyflakes as a dependency, so using it is probably the best option. But if it's better to use pycodestyle or something else, that could be an option.
  • Use these methods in the validation, so errors are reported as the rest of the errors from the validation
  • Add unit tests for the implemented features in scripts/tests/test_validate_docstrings.py
@dubielt1
Copy link

I'll make an attempt at this

@FHaase
Copy link
Contributor

FHaase commented Oct 18, 2018

As flake8 is already being used, wouldn't it make sense to use flake8 plugins to check for those things?

@datapythonista
Copy link
Member Author

Using a plugin makes sense, better not reinvent the wheel. But it'd be good if the plugin is called from the validator script, so devs working on docstrings don't need to execute more than a command to validate that the docstring is all right.

@FHaase
Copy link
Contributor

FHaase commented Oct 25, 2018

I'm currently including it in ./ci/code_checks.sh under lint section.
However as it finds many PEP-8 violations in the *.rst files it might take a while until it can be merged.

@datapythonista
Copy link
Member Author

Feel free to open issues for the PEP-8 problems if in your local branch you can already get the list of problems. If there are many I'd have more than one issue, so different people can work on it at the same time.

@FHaase
Copy link
Contributor

FHaase commented Oct 25, 2018

commit
is all it takes to include it.

so it would get run with ./ci/code-checks.sh lint

Many mistakes are:

  • F821 undefined name 'Substitution' -> I would ignore them in documentation context
  • E999 SyntaxError: unexpected EOF while parsing
    - mostly because of missing bodys of functions. [missing pass]
    - however E999 also warns about false indentations which actually lead to usefull warnings.
    e.g. pandas/doc/source/gotchas.rst:322:8: E999 IndentationError: unexpected indent
    where the code-block should not be python but console

Also it doesn't catch all occurences of python only

  • no ::
  • no code-blocks without python or pycon
  • currently no >>> outside of code-blocks [Added a PR]

@jorisvandenbossche
Copy link
Member

@FHaase thanks for looking into it!

F821 undefined name 'Substitution' -> I would ignore them in documentation context

Where is this one coming from?

@FHaase
Copy link
Contributor

FHaase commented Oct 26, 2018

./doc/source/contributing_docstring.rst:956:9: F821 undefined name 'Substitution'
./doc/source/contributing_docstring.rst:957:9: F821 undefined name 'Appender'
./doc/source/contributing_docstring.rst:965:14: F821 undefined name 'Parent'
./doc/source/contributing_docstring.rst:966:14: F821 undefined name 'ChildA'
./doc/source/contributing_docstring.rst:967:14: F821 undefined name 'ChildB'
.. code-block:: python

   class Parent:
       def my_function(self):
           """Apply my function to %(klass)s."""
           ...

   class ChildA(Parent):
       @Substitution(klass="ChildA")                                
       @Appender(Parent.my_function.__doc__)
       def my_function(self):
           ...

   class ChildB(Parent):
       @Substitution(klass="ChildB")                   
       @Appender(Parent.my_function.__doc__)
       def my_function(self):
           ...

The resulting docstrings are

The problem is that in a context of a documentation it would make sense to use names that are not defined, however the plugin checks the code-block as if it was a normal python file.

Forcing to include all names in the documentation would in my opinion lead to cluttered documentation and therefore I would ignore F821 completely.

@datapythonista
Copy link
Member Author

I think we should import Substitution and Appender in that example. Wouldn't that fix the problem?

@FHaase
Copy link
Contributor

FHaase commented Oct 27, 2018

Short answer: Yes
Long answer: F821 validations are a quite common issue.

flake8-rst doc --filename=*.rst --select=F821 --bootstrap="import pandas as pd; import numpy as np"

doc/source/dsintro.rst:162:9: F821 undefined name 's'
doc/source/dsintro.rst:661:7: F821 undefined name 'df'
doc/source/dsintro.rst:661:12: F821 undefined name 'df'
doc/source/dsintro.rst:668:7: F821 undefined name 'df'
doc/source/dsintro.rst:668:14: F821 undefined name 'df'
doc/source/basics.rst:318:12: F821 undefined name 'df'
doc/source/basics.rst:318:19: F821 undefined name 'df2'
doc/source/basics.rst:729:8: F821 undefined name 'f'
doc/source/basics.rst:729:10: F821 undefined name 'g'
doc/source/basics.rst:729:12: F821 undefined name 'h'
doc/source/basics.rst:729:14: F821 undefined name 'df'
doc/source/basics.rst:830:4: F821 undefined name 'df'
doc/source/basics.rst:830:13: F821 undefined name 'subtract_and_divide'
doc/source/advanced.rst:1038:13: F821 undefined name 'df'
doc/source/advanced.rst:1042:14: F821 undefined name 'df'
doc/source/text.rst:444:9: F821 undefined name 's'
doc/source/comparison_with_stata.rst:173:4: F821 undefined name 'tips'
doc/source/comparison_with_stata.rst:179:4: F821 undefined name 'tips'
doc/source/extending.rst:159:28: F821 undefined name 'ExtensionArray'
doc/source/extending.rst:159:44: F821 undefined name 'ExtensionScalarOpsMixin'
doc/source/extending.rst:256:27: F821 undefined name 'Series'
doc/source/extending.rst:266:30: F821 undefined name 'DataFrame'
doc/source/extending.rst:326:31: F821 undefined name 'DataFrame'
doc/source/extending.rst:341:13: F821 undefined name 'SubclassedDataFrame2'
doc/source/indexing.rst:1355:7: F821 undefined name 'df'
doc/source/indexing.rst:1744:4: F821 undefined name 'data'
doc/source/indexing.rst:1744:17: F821 undefined name 'index'
doc/source/indexing.rst:1799:4: F821 undefined name 'dfmi'
doc/source/indexing.rst:1799:35: F821 undefined name 'value'
doc/source/indexing.rst:1801:4: F821 undefined name 'dfmi'
doc/source/indexing.rst:1801:59: F821 undefined name 'value'
doc/source/indexing.rst:1807:4: F821 undefined name 'dfmi'
doc/source/indexing.rst:1807:28: F821 undefined name 'value'
doc/source/indexing.rst:1809:4: F821 undefined name 'dfmi'
doc/source/indexing.rst:1809:50: F821 undefined name 'value'
doc/source/indexing.rst:1834:21: F821 undefined name 'value'
doc/source/io.rst:1094:4: F821 undefined name 'read_csv'
doc/source/io.rst:1094:13: F821 undefined name 'path'
doc/source/io.rst:1102:4: F821 undefined name 'read_csv'
doc/source/io.rst:1102:13: F821 undefined name 'path'
doc/source/io.rst:1108:4: F821 undefined name 'read_csv'
doc/source/io.rst:1108:13: F821 undefined name 'path'
doc/source/io.rst:1114:4: F821 undefined name 'read_csv'
doc/source/io.rst:1114:13: F821 undefined name 'path'
doc/source/io.rst:2354:27: F821 undefined name 'url'
doc/source/io.rst:2363:23: F821 undefined name 'url'
doc/source/io.rst:2369:23: F821 undefined name 'url'
doc/source/io.rst:2375:23: F821 undefined name 'url'
doc/source/io.rst:2382:23: F821 undefined name 'url'
doc/source/io.rst:2388:24: F821 undefined name 'url'
doc/source/io.rst:2389:24: F821 undefined name 'url'
doc/source/io.rst:2396:23: F821 undefined name 'url'
doc/source/io.rst:2404:23: F821 undefined name 'url'
doc/source/io.rst:2425:23: F821 undefined name 'url'
doc/source/io.rst:2431:22: F821 undefined name 'randn'
doc/source/io.rst:2442:23: F821 undefined name 'url'
doc/source/io.rst:2448:23: F821 undefined name 'url'
doc/source/io.rst:2456:23: F821 undefined name 'url'
doc/source/io.rst:2692:4: F821 undefined name 'read_excel'
doc/source/io.rst:2740:26: F821 undefined name 'read_excel'
doc/source/io.rst:2741:26: F821 undefined name 'read_excel'
doc/source/io.rst:2744:12: F821 undefined name 'read_excel'
doc/source/io.rst:2766:4: F821 undefined name 'read_excel'
doc/source/io.rst:2773:4: F821 undefined name 'read_excel'
doc/source/io.rst:2780:4: F821 undefined name 'read_excel'
doc/source/io.rst:2787:4: F821 undefined name 'read_excel'
doc/source/io.rst:2794:4: F821 undefined name 'read_excel'
doc/source/io.rst:2861:4: F821 undefined name 'read_excel'
doc/source/io.rst:2868:4: F821 undefined name 'read_excel'
doc/source/io.rst:2882:4: F821 undefined name 'read_excel'
doc/source/io.rst:2893:4: F821 undefined name 'read_excel'
doc/source/io.rst:2905:4: F821 undefined name 'read_excel'
doc/source/io.rst:2919:4: F821 undefined name 'read_excel'
doc/source/io.rst:2937:4: F821 undefined name 'df'
doc/source/io.rst:2950:4: F821 undefined name 'df'
doc/source/io.rst:2957:9: F821 undefined name 'ExcelWriter'
doc/source/io.rst:2958:8: F821 undefined name 'df1'
doc/source/io.rst:2959:8: F821 undefined name 'df2'
doc/source/io.rst:2989:13: F821 undefined name 'ExcelWriter'
doc/source/io.rst:2990:4: F821 undefined name 'df'
doc/source/io.rst:3039:4: F821 undefined name 'df'
doc/source/io.rst:3042:13: F821 undefined name 'ExcelWriter'
doc/source/io.rst:3048:4: F821 undefined name 'df'
doc/source/io.rst:3733:7: F821 undefined name 'store'
doc/source/io.rst:3740:7: F821 undefined name 'store'
doc/source/io.rst:3750:7: F821 undefined name 'store'
doc/source/io.rst:3750:42: F821 undefined name 'string'
doc/source/io.rst:4203:4: F821 undefined name 'store'
doc/source/io.rst:4203:23: F821 undefined name 'df'
doc/source/io.rst:4734:9: F821 undefined name 'engine'
doc/source/io.rst:4845:30: F821 undefined name 'engine'
doc/source/io.rst:4846:30: F821 undefined name 'engine'
doc/source/io.rst:4861:4: F821 undefined name 'df'
doc/source/io.rst:4861:23: F821 undefined name 'engine'
doc/source/io.rst:4862:31: F821 undefined name 'engine'
doc/source/io.rst:4904:44: F821 undefined name 'engine'
doc/source/io.rst:4905:58: F821 undefined name 'engine'
doc/source/io.rst:4993:4: F821 undefined name 'data'
doc/source/io.rst:4993:24: F821 undefined name 'cnx'
doc/source/io.rst:4994:44: F821 undefined name 'con'
doc/source/io.rst:5213:9: F821 undefined name 'do_something'
doc/source/enhancingperf.rst:233:9: F821 undefined name 'apply_integrate_f'
doc/source/enhancingperf.rst:233:27: F821 undefined name 'df'
doc/source/enhancingperf.rst:233:36: F821 undefined name 'df'
doc/source/enhancingperf.rst:233:45: F821 undefined name 'df'
doc/source/enhancingperf.rst:239:9: F821 undefined name 'apply_integrate_f'
doc/source/enhancingperf.rst:239:27: F821 undefined name 'df'
doc/source/enhancingperf.rst:239:43: F821 undefined name 'df'
doc/source/enhancingperf.rst:239:59: F821 undefined name 'df'
doc/source/contributing.rst:645:9: F821 undefined name 'warnings'
doc/source/contributing.rst:646:9: F821 undefined name 'new_func'
doc/source/contributing.rst:896:9: F821 undefined name 'tm'
doc/source/contributing.rst:897:8: F821 undefined name 'df'
doc/source/contributing.rst:912:5: F821 undefined name 'pytest'
doc/source/contributing.rst:926:9: F821 undefined name 'warch'
doc/source/contributing.rst:927:8: F821 undefined name 'warnings'
doc/source/contributing_docstring.rst:28:9: F821 undefined name 'add'
doc/source/contributing_docstring.rst:29:9: F821 undefined name 'add'
doc/source/contributing_docstring.rst:30:9: F821 undefined name 'add'
doc/source/contributing_docstring.rst:461:16: F821 undefined name 'random'
doc/source/contributing_docstring.rst:481:18: F821 undefined name 'random'
doc/source/contributing_docstring.rst:482:27: F821 undefined name 'random'
doc/source/contributing_docstring.rst:482:41: F821 undefined name 'string'
doc/source/contributing_docstring.rst:503:19: F821 undefined name 'random'
doc/source/contributing_docstring.rst:957:9: F821 undefined name 'Substitution'
doc/source/contributing_docstring.rst:958:9: F821 undefined name 'Appender'
doc/source/contributing_docstring.rst:964:9: F821 undefined name 'Substitution'
doc/source/contributing_docstring.rst:965:9: F821 undefined name 'Appender'
doc/source/contributing_docstring.rst:973:14: F821 undefined name 'Parent'
doc/source/contributing_docstring.rst:974:14: F821 undefined name 'ChildA'
doc/source/contributing_docstring.rst:975:14: F821 undefined name 'ChildB'
doc/source/contributing_docstring.rst:994:5: F821 undefined name 'Appender'
doc/source/contributing_docstring.rst:994:14: F821 undefined name 'template'
doc/source/contributing_docstring.rst:994:25: F821 undefined name '_shared_doc_kwargs'
doc/source/groupby.rst:86:19: F821 undefined name 'obj'
doc/source/groupby.rst:86:31: F821 undefined name 'key'
doc/source/groupby.rst:87:19: F821 undefined name 'obj'
doc/source/groupby.rst:87:31: F821 undefined name 'key'
doc/source/groupby.rst:88:19: F821 undefined name 'obj'
doc/source/groupby.rst:88:32: F821 undefined name 'key1'
doc/source/groupby.rst:88:38: F821 undefined name 'key2'
doc/source/groupby.rst:1277:4: F821 undefined name 'df'
doc/source/groupby.rst:1277:42: F821 undefined name 'report_func'
doc/source/comparison_with_sas.rst:171:4: F821 undefined name 'tips'
doc/source/timeseries.rst:902:16: F821 undefined name 'DateOffset'

bootstrapping pandas as pd, numpy as np reduces the output already.

The question that rises is: Does a reader of a documentation require all these names to be defined in order to understand the example?

In my opinion: No. And the boilerplate code around every example in the doc makes it even more unreadable and confusing. Documented code-blocks are not meant to be understood by a computer, but by a human. Most of the variables explain themeselves by what they are called: url is a url, key is a key ...

I think from all the output above Substitution and Appender are the most debatable examples.
If I would read the documentation my first guess would be they were imported anyway.

So as a conclusion: Forcing the documentation to have all names defined would force a kind of documentation that is less readable without being more clear.
Deciding weather to specify a name or not should be a choice of the writer for the documentation, when the semantic is clear it's all fine. F821 issues come purely from syntax.

@datapythonista
Copy link
Member Author

I disagree. I think all examples should be explicit in defining or importing all objects, so they run, and users can reproduce them excatly in the way they are presented on the documentation.

If I'm new to pandas and I see in the documentation something like:

s.head()

I don't see how this can be useful to me. I'd rather have:

import pandas
s = pandas.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
s.head()

Which provides all the required information to understand, run, and edit and play with what is being shown.

We need to ignore F821 now, because it will take some time until all examples are fixed, and it'll be good to validate the rest of pep8 in the meantime, but I don't think we should accept or encourage having examples with missing context.

@FHaase
Copy link
Contributor

FHaase commented Oct 27, 2018

Okay I see what you mean. My point is forcing every use of .. code-block:: python anywhere in the project to be a copy and pasteable example might lead to a more or less confusing situation for the reader. Let me quote some examples:

from pandas/core/accessor.py

Notes
-----
When accessed, your accessor will be initialized with the pandas object
the user is interacting with. So the signature must be

.. code-block:: python

    def __init__(self, pandas_object):  # noqa: E999
        ...
        
For consistency with pandas methods, you should raise an ``AttributeError``
if the data passed to your accessor has an incorrect dtype. 

If I read that, it's absolutely clear what is meant, although it's invalid Syntax [E999]. If I imagine a class, imports, and something written within the function itself, it no longer correlates to So the signature must be.

Having to use invalid Syntax is less common.

from doc/source/groupby.rst

pandas objects can be split on any of their axes. The abstract definition of
grouping is to provide a mapping of labels to group names. To create a GroupBy
object (more on what the GroupBy object is later), you may do the following:

.. code-block:: python

    # default is axis=0
    >>> grouped = obj.groupby(key)
    >>> grouped = obj.groupby(key, axis=1)
    >>> grouped = obj.groupby([key1, key2])

what exactly obj, key, key1, key2 is is not important at this point of the documentation. And I think it should be the choice of the writer how abstract the code should be to be most clear.

After writing a section with mostly comments like this explaining how it works, someone could include a complete working example in a regular .py file included with .. literalinclude::

That way it's possible to run tests on the code so it can get updated when the api changes. And its flake8 and not flake8-rst that checks the code_style.

@jorisvandenbossche
Copy link
Member

Note that a bunch of the warnings you listed above in #23154 (comment) are due to not properly importing the used packages / not properly using the imported name (eg all the read_excel warnings should be solved by using pd.read_excel).
See #9886 for this (already a lot has been cleaned up, but io.rst is clearly still to be done)

But, that said, I agree with @FHaase that not all code blocks should be exactly running code. Often, it can be more educational to show a certain snippet.

But other things:

  • The flake-rst plugin does not check code blocks in .. ipython:: python directives, is that correct? This is the majority of the code blocks
  • It seems to me (from the output you show above), that code-block:: directives are also not checked if they have like IPython input/output prompts in them (like In [8]: )

@FHaase
Copy link
Contributor

FHaase commented Oct 27, 2018

Yes, currently not everything gets checked. The regex to find the parts is
\.\. (code-block|sourcecode):: (python|pycon)\n
I opened a PR to also include ipython within the checks as well as >>> ... outside of code-block directives.
So I hope shortly .. ipython:: python directives will be covered.

  • In order to support In [8]: the regex needs to be modified.
  • .. code-block:: or the very short :: without a language specification might contain ruby, java etc. code so linting would be problematic. -> solution could be to disallow these directives.

I've marked the PR to close this issue, because once flake8-rst finds some python-code it gets checked by flake8 just like any other .py file.
Further improvements could be done by extending the amount of code that flake8-rst finds, but that is in the scope of that plugin, not this project.

@datapythonista
Copy link
Member Author

Fixed in #23399

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Code Style Code style, linting, code_checks Docs good first issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants