
DOC: make some doctests in user,reference pass pytest #20288

Merged
merged 11 commits into numpy:main on Dec 22, 2021

Conversation

pdebuyl
Contributor

@pdebuyl pdebuyl commented Nov 3, 2021

  1. Add import numpy as np in rst files
  2. Update NumPy repr for array (whitespace)
  3. Update bytearray representation
  4. Fix tiny output formatting (<class ...>, etc)
  5. Format tracebacks
  6. Skip random number tests or some platform-dependent outputs
  7. Add <matplotlib. ... at 0x...> or similar output lines where
    missing
  8. Set seed

Status:

  • pytest --doctest-glob="*.rst" doc/source/user still fails 4 tests related to example compiled extensions.
  • doc/source/user/basics.rec.rst fails because of the output of structured_to_unstructured on line 490. I don't know what is expected there.
  • pytest --doctest-glob="*.rst" doc/source/reference only fails the cython file.

See #15846 for previous discussion.

This PR is also related to #15845 about using astropy/pytest-doctestplus. It makes many more .rst files pass pytest, which is (in my opinion) good.

I had to add a few +SKIP directives for random results. Those could be removed if we have a "norm" for random numbers in the docs, i.e. setting seeds and expecting the random output to be reproducible.
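
To give an idea, a skipped snippet looks roughly like this (hypothetical example with illustrative output values):

    >>> rng = np.random.default_rng()
    >>> rng.random(2)  # doctest: +SKIP
    array([0.27559113, 0.39632094])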

A further improvement would be to automatically build the compiled extensions.

@charris charris changed the title [DOC] make some doctests in user,reference pass pytest DOC: make some doctests in user,reference pass pytest Nov 5, 2021
@rgommers
Member

Thanks for working on this @pdebuyl!

Doing:

.. for doctest:
    >>> import numpy as np

is quite noisy, as well as fragile (easy to forget in the future). Why can't this import be automatically added by the refguide checker? I'd hope pytest-doctestplus has such a configuration option as well.

Those could be removed if we have a "norm" for random numbers in the docs, i.e. setting seeds and expecting the random output to be reproducible.

This is also something that should be done by the doctest runner script/framework. It's just visual noise otherwise, and easy to forget in new tests. And adding explicit seeds in order to make doctests pass in the example code itself is not a good alternative.

A further improvement would be to automatically build the compiled extensions.

Can you explain what this means? It sounds like that's way out of scope for doctests(/examples).

@rgommers
Member

The whitespace and repr changes here all look good to me.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

Thanks for working on this @pdebuyl!

Hi @rgommers thanks for giving it a look :-)

Doing:

.. for doctest:
    >>> import numpy as np

is quite noisy, as well as fragile (easy to forget in the future). Why can't this import be automatically added by the refguide checker? I'd hope pytest-doctestplus has such a configuration option as well.

The tools/refguide_check.py program adds it automatically. Part of the goal here is to run the doctests outside of that program.

Those could be removed if we have a "norm" for random numbers in the docs, i.e. setting seeds and expecting the random output to be reproducible.

This is also something that should be done by the doctest runner script/framework. It's just visual noise otherwise, and easy to forget in new tests. And adding explicit seeds in order to make doctests pass in the example code itself is not a good alternative.

Some examples do not currently set the seed, so making random output reproducible would require many edits. But ideally only "once more" :-)

A further improvement would be to automatically build the compiled extensions.

Can you explain what this means? It sounds like that's way out of scope for doctests(/examples).

The doctests that rely on C or Cython extension modules fail because the modules are not built.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

PS: forgot to mention.

pytest can add the numpy import automatically via a conftest.py file, so that these explicit imports could be removed. I would add a file doc/conftest.py; it would only apply when running the doctests under the doc directory.
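
A minimal sketch of what that doc/conftest.py could look like, using pytest's standard doctest_namespace fixture (exact contents to be decided):

    # doc/conftest.py -- sketch: make `np` available in every doctest
    import numpy as np
    import pytest

    @pytest.fixture(autouse=True)
    def add_np(doctest_namespace):
        doctest_namespace["np"] = np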

@rgommers
Member

I think it'd be good to repeat what has been our stance on this for a very long time: doctests are not tests. Running them as tests with plain pytest is not a great goal if the tradeoff is teaching bad practices like hardcoded seeds in examples, plus visual noise like # doctest: +SKIP.

In general, the exact string representation of outputs is also going to vary over platforms, compiler and Python versions, etc. So getting repr's to work outside of one fixed config is not possible. What you then end up doing is skipping such tests, which is worse than ensuring they work correctly on a single well-defined CI config. So having pytest --doctest always pass doesn't seem like the right goal to me.

@rgommers
Member

The doctests that rely on C or Cython extension modules fail because the modules are not built.

This would be nice to achieve. I'm not sure it's possible in general, but given the right "harness" it may be possible to do this in CI (with a list of files to skip probably because they're too complex).

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

Thanks for providing more information. I was apparently somewhat misguided. Does it still make sense to work on pytest for the doctests?

The goal is not that far off. Using global directives (i.e. the option doctest_optionflags = NORMALIZE_WHITESPACE ELLIPSIS ALLOW_UNICODE ALLOW_BYTES in pytest.ini for instance) makes it possible to unclutter the rst files.
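
For reference, a sketch of that configuration (assuming a standalone pytest.ini with a [pytest] section):

    [pytest]
    doctest_optionflags = NORMALIZE_WHITESPACE ELLIPSIS ALLOW_UNICODE ALLOW_BYTES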

There are tradeoffs to make as well. For instance, some platform-specific doctests cannot pass CI runs on all platforms. The matplotlib reprs look like <matplotlib.some_object_name 0x...>.

As an occasional contributor I am fine with any direction that NumPy takes. Is the plan to keep using the refguide_check.py tool? What it does is run the doctests, use some fancier names for #doctest: +SKIP (e.g. # may vary), and skip a number of files. This could all be done with pytest as far as I understand.
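
For example, refguide_check.py accepts output annotated like this (hypothetical snippet), whereas plain pytest doctests would compare the value literally:

    >>> a = np.arange(3)
    >>> id(a)
    139652342455856  # may vary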

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

The doctests that rely on C or Cython extension modules fail because the modules are not built.

This would be nice to achieve. I'm not sure it's possible in general, but given the right "harness" it may be possible to do this in CI (with a list of files to skip probably because they're too complex).

It is not possible by "just" running refguide_check.py or pytest. As a target in a Makefile or within a test script it could be done, of course. I can help if a direction is chosen :-)

@rgommers
Member

I was apparently somewhat misguided. Does it still make sense to work on pytest for the doctests?

I'm sorry about that. Don't get me wrong, most of this PR is still quite useful - thank you for working on it!

Is the plan to keep on using the refguide_check.py tool?

The reason some folks wanted to get rid of refguide_check.py is just because it's a long script with functionality that is duplicated in multiple places, like SciPy's refguide_check.py and pytest-doctestplus. It makes perfect sense to try to consolidate that if it eases the maintenance burden. So that's fine, and I think everyone would be perfectly happy to switch to pytest-doctestplus.

The one thing to avoid is to treat the examples like unit tests - they're examples, so need to teach best practices and not be cluttered. The one reason to run them through pytest-doctestplus or similar is that we can then verify that they all run. Whether repr's are identical is not all that relevant.

The matplotlib reprs look like <matplotlib.some_object_name 0x...>.

These Matplotlib repr's are probably best left out - they don't really serve a purpose except distract the reader from the example code.

It is not possible by "just" running refguide_check.py or pytest. As a target in a Makefile or within a test script it could be done, of course. I can help if a direction is chosen :-)

Yes, I suspected that. It may not be worth the trouble. Such tests are going to be inherently fragile and slow to run.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

No worries about the confusion!

I'll add a conftest.py and clean up the imports.

Regarding

The matplotlib reprs look like <matplotlib.some_object_name 0x...>.

These Matplotlib repr's are probably best left out - they don't really serve a purpose except distract the reader from the example code.

apart from writing a pytest plugin to allow variation in the object's address, I don't see a way forward. refguide_check.py does a lot of "progressive" skipping of the output.

I'll check if an existing plugin would work.

@rgommers
Member

apart from writing a pytest plugin to allow variation in the object's address

I'm not sure what that means. If it makes sense to show <matplotlib.xxx then using ... is fine. What I mean is that the whole output should not be there. An example like:

>>> ax = plt.figure()
>>> ax.plot(...)
>>> ax.plot(....)
>>> ax.legend

should not be polluted by <matplotlib.xxx output below each line.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

In your example only the last line would show the output.

<matplotlib.legend.Legend at 0x7f83b4d8da00>

or

<matplotlib.legend.Legend at 0x...>

for ellipsis matching.

@rgommers
Member

In your example only the last line would show the output.

Okay, bad example then - if we replace it with any other MPL calls, I'd prefer not to show unused return values, and not the legend output either. It'd be great if pytest-doctestplus lets us do that.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

I removed the explicit imports.

To get an idea of the remaining annoyances:

cd doc/source
ack SKIP
grep -h 0x... $(find . -name '*.rst')

doctestplus was not actually necessary at this stage.

Member

@rgommers rgommers left a comment

Thanks @pdebuyl, this is starting to get close. A few inline comments, and one concern for maintenance: doc/pytest.ini is a copy of pytest.ini, which seems to break the DRY principle. Is there a way to get around that? It seems like that should be possible via a test script. I also don't quite understand why it's needed in the first place; the pytest.ini in the root of the repo should be picked up, I'd think.

@@ -1685,6 +1687,7 @@ With Matplotlib, you have access to an enormous number of visualization options.
>>> Z = np.sin(R)

>>> ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='viridis')
<mpl_toolkits.mplot3d.art3d.Poly3DCollection object at 0x...>
Member

This is what I was talking about. This is 4 lines of random output that just detracts from the example. We seem to have a couple more, but not many luckily:

$ rg "matplotlib.lines"
doc/source/reference/routines.polynomials.classes.rst
338:    [<matplotlib.lines.Line2D object at 0x2136c10>]
341:    [<matplotlib.lines.Line2D object at 0x1cf2890>]

numpy/core/function_base.py
112:    [<matplotlib.lines.Line2D object at 0x...>]
114:    [<matplotlib.lines.Line2D object at 0x...>]
267:    [<matplotlib.lines.Line2D object at 0x...>]
269:    [<matplotlib.lines.Line2D object at 0x...>]
380:    [<matplotlib.lines.Line2D object at 0x...>]
382:    [<matplotlib.lines.Line2D object at 0x...>]

numpy/lib/function_base.py
1514:    [<matplotlib.lines.Line2D object at 0x...>]
1516:    [<matplotlib.lines.Line2D object at 0x...>]
2904:    [<matplotlib.lines.Line2D object at 0x...>]
2923:    [<matplotlib.lines.Line2D object at 0x...>]
3013:    [<matplotlib.lines.Line2D object at 0x...>]
3032:    [<matplotlib.lines.Line2D object at 0x...>]
3116:    [<matplotlib.lines.Line2D object at 0x...>]
3135:    [<matplotlib.lines.Line2D object at 0x...>]
3218:    [<matplotlib.lines.Line2D object at 0x...>]
3235:    [<matplotlib.lines.Line2D object at 0x...>]
3497:    [<matplotlib.lines.Line2D object at 0x...>]
3514:    [<matplotlib.lines.Line2D object at 0x...>]
3603:    [<matplotlib.lines.Line2D object at 0x...>]

numpy/fft/_pocketfft.py
207:    [<matplotlib.lines.Line2D object at 0x...>, <matplotlib.lines.Line2D object at 0x...>]
304:    [<matplotlib.lines.Line2D object at ...>]
306:    [<matplotlib.lines.Line2D object at ...>]

I'd much prefer not to see these lines. It's best just not to see them (the example runner should ignore them), second best to add _ = to the plot calls to swallow the output.
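
For illustration (hypothetical variable names), the second-best option would look like:

    >>> _ = plt.plot(x, np.sin(x))  # assigning to _ swallows the Line2D repr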

Member

For NumPy we don't have many plots in the docs, so it's not a big deal. In SciPy this would be a lot of noise.

Member

This output line can still be removed.

148293216 # may vary
>>> f(a)
>>> f(a) #doctest: +SKIP
Member

Why do we need #doctest: +SKIP if we already have # may vary for the output? This should work without the skip.

Contributor Author

This is linked to the fact that I was working toward a split refguide_check.py / pytest testing scenario. The # may vary is internal refguide_check.py cooking; pytest does not take it into consideration.

@pdebuyl
Contributor Author

pdebuyl commented Nov 15, 2021

Hi @rgommers

I removed the duplication; one pytest.ini is indeed sufficient.

Regarding matplotlib output, solutions:

  1. Use _ = plt.plot, etc. Not very intuitive for readers.
  2. Append #doctest: +SKIP to every visible plot command. The directive does not show in the html output, so this is likely OK for readers.
  3. Use a pytest plugin to transform <matplotlib.image.AxesImage at 0x7f956908c280> into <matplotlib.image.AxesImage at 0x...> on the fly. The ELLIPSIS option makes it pass OK. I tested such a plugin and it boils down to got = re.sub("0x[0-9a-fA-F]+", "0x...", got) intercepting the doctest output checker (see the sketch after this list). Bonus: would be useful for SciPy "as is" :-)
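
The sketch mentioned in point 3, assuming it lives in doc/conftest.py and hooks in by monkeypatching doctest's output checker (pytest's own checker may need a different hook point):

    # doc/conftest.py (sketch) -- normalize memory addresses before comparing doctest output
    import doctest
    import re

    _BaseChecker = doctest.OutputChecker

    class _AddressChecker(_BaseChecker):
        def check_output(self, want, got, optionflags):
            # turn e.g. "at 0x7f956908c280" into "at 0x..." so ELLIPSIS can match
            got = re.sub(r"0x[0-9a-fA-F]+", "0x...", got)
            return _BaseChecker.check_output(self, want, got, optionflags)

    doctest.OutputChecker = _AddressChecker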

@rgommers
Member

3. Use a pytest plugin to transform <matplotlib.image.AxesImage at 0x7f956908c280> into <matplotlib.image.AxesImage at 0x...> on the fly. The ELLIPSIS option makes it pass ok. I tested such a plugin and it boils down to got = re.sub("0x[0-9a-fA-F]+", "0x...", got) intercepting the doctest output checker. Bonus: would be useful for SciPy "as is" :-)

I like the bonus:) Just to make sure I understand: you want to keep the output line with ellipsis in this case? Why can't the doctest plugin just swallow the output of any plot call? I.e. the same as dynamically adding #doctest: +SKIP.

@pdebuyl
Contributor Author

pdebuyl commented Nov 15, 2021

  3. Use a pytest plugin to transform <matplotlib.image.AxesImage at 0x7f956908c280> into <matplotlib.image.AxesImage at 0x...> on the fly. The ELLIPSIS option makes it pass ok. I tested such a plugin and it boils down to got = re.sub("0x[0-9a-fA-F]+", "0x...", got) intercepting the doctest output checker. Bonus: would be useful for SciPy "as is" :-)

I like the bonus:) Just to make sure I understand: you want to keep the output line with ellipsis in this case? Why can't the doctest plugin just swallow the output of any plot call? I.e. the same as dynamically adding #doctest: +SKIP.

"Solution 3" allows to keep "full matplotlib output lines" such as <matplotlib.image.AxesImage at 0x7f956908c280> and would accept the output when running the test whatever the address is. One variation that I hadn't though of would be to transform such lines into empty lines so that we could remove them from the docs.

@rgommers
Member

"Solution 3" allows to keep "full matplotlib output lines" such as <matplotlib.image.AxesImage at 0x7f956908c280> and would accept the output when running the test whatever the address is.

That's even worse than the ellipsis variant I think.

One variation that I hadn't thought of would be to transform such lines into empty lines so that we could remove them from the docs.

This would be good.

@pdebuyl
Contributor Author

pdebuyl commented Nov 16, 2021

This way?

@rgommers
Member

Yes, that looks great, thanks.

@pdebuyl
Contributor Author

pdebuyl commented Nov 16, 2021

Cool, let me know if I should do anything else to get the PR in.

Member

@rgommers rgommers left a comment

Looks good, only a couple of small things left.

I can live with the duplicated conftest.py. Everything else seems like an improvement to me.

doc/source/reference/random/generator.rst
doc/source/reference/random/generator.rst
doc/source/reference/random/generator.rst
doc/source/user/absolute_beginners.rst
@@ -1685,6 +1687,7 @@ With Matplotlib, you have access to an enormous number of visualization options.
>>> Z = np.sin(R)

>>> ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='viridis')
<mpl_toolkits.mplot3d.art3d.Poly3DCollection object at 0x...>
Member

This output line can still be removed.

@@ -57,7 +57,7 @@ array([[2., 0., 0., 0., 0.],
Notice that the return type is a standard ``numpy.ndarray``.

>>> type(np.multiply(arr, 2))
numpy.ndarray
<class 'numpy.ndarray'>
Member

note (the change is fine): this is the plain Python interpreter output; the old value was the IPython one. Most users will see the latter, so this is not necessarily helpful, but I don't mind much either way.

Contributor Author

I didn't realize this. Any version is fine by me.

@pdebuyl
Contributor Author

pdebuyl commented Nov 17, 2021

The duplicate conftest makes sense in a way, as we don't test the NumPy library and its docs in the same way.

Regarding the output of random functions, +SKIP is the only way apart from setting the seed. This could be done in conftest. I don't know if the random state would persist between rst files; if it does, adding a random call in one file could modify the output of other files. For now the remaining +SKIP directives are for random code, the output of np.empty, the output of id, I/O routines, and compiled extensions.
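
A hypothetical sketch of the seeding approach (in doc/conftest.py; whether state leaks between rst files is exactly the open question above):

    # doc/conftest.py (sketch) -- reseed the legacy global RNG before each doctest
    import numpy as np
    import pytest

    @pytest.fixture(autouse=True)
    def reseed():
        np.random.seed(1)  # examples using default_rng() would still need their own seed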

Making other parts of the docs pass may be more challenging and perhaps less relevant (the release directory contains deprecated or removed functions, for instance, and the dev directory some machine-specific output, etc.).

@pdebuyl
Contributor Author

pdebuyl commented Dec 7, 2021

Hi @rgommers I think that I addressed all the comments :-)

@rgommers
Member

rgommers commented Dec 8, 2021

This looks good to me now, thanks @pdebuyl. There are a couple of CI errors that need addressing, most importantly the doc build should pass. Could you have a look at that?

@pdebuyl
Contributor Author

pdebuyl commented Dec 8, 2021

Hi @rgommers, the CI doc build breaks at an unrelated location:

https://app.circleci.com/pipelines/github/numpy/numpy/11714/workflows/2baffc2a-41e5-428e-a069-f1c20144108b/jobs/23825?invite=true#step-107-187

The error is TypeError: _argmax_dispatcher() got an unexpected keyword argument 'keepdims' for the argmin function. This should not error as far as I understand :-/

I fixed a linter error in the meantime.

@charris
Member

charris commented Dec 8, 2021

A rebase on current main will probably fix the circleci error.

pdebuyl and others added 10 commits December 8, 2021 15:29
1. Add `import numpy as np` in rst files
2. Update NumPy repr for array (whitespace)
3. Update bytearray representation
4. Fix tiny output formatting (`<class ...>`, etc)
5. Format tracebacks
6. Skip random number tests or some platform-dependent outputs
7. Add `<matplotlib. ... at 0x...>` or similar output lines where
   missing
8. Set seed
Add conftest.py and pytest.ini files in doc directory
@pdebuyl
Contributor Author

pdebuyl commented Dec 8, 2021

Thanks @charris, rebase done (+ force push).

@pdebuyl
Contributor Author

pdebuyl commented Dec 8, 2021

"All checks have passed" :-)

@pdebuyl
Contributor Author

pdebuyl commented Dec 21, 2021

I fixed the merge conflict that came up.

@mattip mattip merged commit 6d5bc23 into numpy:main Dec 22, 2021
@mattip
Member

mattip commented Dec 22, 2021

Thanks. Any follow-on niggles can be handled in future PRs.

@rgommers
Member

Great:) Thanks for all the work on this @pdebuyl!

@pdebuyl
Contributor Author

pdebuyl commented Dec 22, 2021

🥳
Thanks for following up.

@pdebuyl pdebuyl deleted the doctest_for_pytest branch December 22, 2021 14:05