
DOC: make some doctests in user,reference pass pytest #20288

Merged
merged 11 commits into numpy:main on Dec 22, 2021

Conversation

pdebuyl
Contributor

@pdebuyl pdebuyl commented Nov 3, 2021

  1. Add import numpy as np in rst files
  2. Update NumPy repr for array (whitespace)
  3. Update bytearray representation
  4. Fix tiny output formatting (<class ...>, etc)
  5. Format tracebacks
  6. Skip random number tests or some platform-dependent outputs
  7. Add <matplotlib. ... at 0x...> or similar output lines where
    missing
  8. Set seed

Status:

  • pytest --doctest-glob="*.rst" doc/source/user still fails 4 tests related to example compiled extensions.
  • doc/source/user/basics.rec.rst fails because of the output of structured_to_unstructured on line 490. I don't know what is expected there.
  • pytest --doctest-glob="*.rst" doc/source/reference only fails the cython file.

See #15846 for previous discussion.

This PR is also related to #15845 about using astropy/pytest-doctestplus. It makes many more .rst files pass pytest, which is (in my opinion) good.

I had to add a few +SKIP directives for random results. Those could be removed if we have a "norm" for random numbers in the docs, i.e. setting seeds and expecting the random output to be reproducible.
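
To give an idea, a skipped snippet looks roughly like this (hypothetical example with illustrative output values):

    >>> rng = np.random.default_rng()
    >>> rng.random(2)  # doctest: +SKIP
    array([0.27559113, 0.39632094])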

A further improvement would be to automatically build the compiled extensions.

@charris charris changed the title [DOC] make some doctests in user,reference pass pytest DOC: make some doctests in user,reference pass pytest Nov 5, 2021
@rgommers
Member

Thanks for working on this @pdebuyl!

Doing:

.. for doctest:
    >>> import numpy as np

is quite noisy, as well as fragile (easy to forget in the future). Why can't this import be automatically added by the refguide checker? I'd hope pytest-doctestplus has such a configuration option as well.

Those could be removed if we have a "norm" for random numbers in the docs, i.e. setting seeds and expecting the random output to be reproducible.

This is also something that should be done by the doctest runner script/framework. It's just visual noise otherwise, and easy to forget in new tests. And adding explicit seeds in order to make doctests pass in the example code itself is not a good alternative.

A further improvement would be to automatically build the compiled extensions.

Can you explain what this means? It sounds like that's way out of scope for doctests(/examples).

@rgommers
Member

The whitespace and repr changes here all look good to me.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

Thanks for working on this @pdebuyl!

Hi @rgommers thanks for giving it a look :-)

Doing:

.. for doctest:
    >>> import numpy as np

is quite noisy, as well as fragile (easy to forget in the future). Why can't this import be automatically added by the refguide checker? I'd hope pytest-doctestplus has such a configuration option as well.

The tools/refguide_check.py program adds it automatically. Part of the goal here is to run the doctests outside of that program.

Those could be removed if we have a "norm" for random numbers in the docs, i.e. setting seeds and expecting the random output to be reproducible.

This is also something that should be done by the doctest runner script/framework. It's just visual noise otherwise, and easy to forget in new tests. And adding explicit seeds in order to make doctests pass in the example code itself is not a good alternative.

Some examples do not currently set the seed, so making random output reproducible would require many edits. But ideally only "once more" :-)

A further improvement would be to automatically build the compiled extensions.

Can you explain what this means? It sounds like that's way out of scope for doctests(/examples).

The doctests that rely on C or Cython extension modules fail because the modules are not built.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

PS: forgot to mention.

pytest can add the numpy import automatically via a conftest.py file, so that these explicit imports could be removed. I would add a file doc/conftest.py; it would only apply when running the doctests under the doc directory.
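
A minimal sketch of what that doc/conftest.py could look like, using pytest's standard doctest_namespace fixture (exact contents to be decided):

    # doc/conftest.py -- sketch: make `np` available in every doctest
    import numpy as np
    import pytest

    @pytest.fixture(autouse=True)
    def add_np(doctest_namespace):
        doctest_namespace["np"] = np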

@rgommers
Member

I think it'd be good to repeat what has been our stance on this for a very long time: doctests are not tests. Running them as tests with plain pytest is not a great goal if the tradeoff is teaching bad practices like hardcoded seeds in examples, plus visual noise like # doctest: +SKIP.

In general, the exact string representation of outputs is also going to vary over platforms, compiler and Python versions, etc. So getting repr's to work outside of one fixed config is not possible. What you then end up doing is skipping such tests, which is worse than ensuring they work correctly on a single well-defined CI config. So having pytest --doctest always pass doesn't seem like the right goal to me.

@rgommers
Member

The doctests that rely on C or Cython extension modules fail because the modules are not built.

This would be nice to achieve. I'm not sure it's possible in general, but given the right "harness" it may be possible to do this in CI (with a list of files to skip probably because they're too complex).

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

Thanks for providing more information. I was apparently somewhat misguided. Does it still make sense to work on pytest for the doctests?

The goal is not that far off. Using global directives (i.e. the option doctest_optionflags = NORMALIZE_WHITESPACE ELLIPSIS ALLOW_UNICODE ALLOW_BYTES in pytest.ini for instance) makes it possible to unclutter the rst files.
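
For reference, a sketch of that configuration (assuming a standalone pytest.ini with a [pytest] section):

    [pytest]
    doctest_optionflags = NORMALIZE_WHITESPACE ELLIPSIS ALLOW_UNICODE ALLOW_BYTES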

There are tradeoffs to make as well. For instance, some platform-specific doctests cannot pass CI runs on all platforms. The matplotlib reprs look like <matplotlib.some_object_name 0x...>.

As an occasional contributor I am fine with any direction that NumPy takes. Is the plan to keep using the refguide_check.py tool? What it does is run the doctests, use some fancier names for #doctest: +SKIP (e.g. # may vary), and skip a number of files. This could all be done with pytest as far as I understand.
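
For example, refguide_check.py accepts output annotated like this (hypothetical snippet), whereas plain pytest doctests would compare the value literally:

    >>> a = np.arange(3)
    >>> id(a)
    139652342455856  # may vary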

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

The doctests that rely on C or Cython extension modules fail because the modules are not built.

This would be nice to achieve. I'm not sure it's possible in general, but given the right "harness" it may be possible to do this in CI (with a list of files to skip probably because they're too complex).

It is not possible by "just" running refguide_check.py or pytest. As a target in a Makefile or within a test script it could be done, of course. I can help if a direction is chosen :-)

@rgommers
Member

I was apparently somewhat misguided. Does it still make sense to work on pytest for the doctests?

I'm sorry about that. Don't get me wrong, most of this PR is still quite useful - thank you for working on it!

Is the plan to keep on using the refguide_check.py tool?

The reason some folks wanted to get rid of refguide_check.py is just because it's a long script with functionality that is duplicated in multiple places, like SciPy's refguide_check.py and pytest-doctestplus. It makes perfect sense to try to consolidate that if it eases the maintenance burden. So that's fine, and I think everyone would be perfectly happy to switch to pytest-doctestplus.

The one thing to avoid is to treat the examples like unit tests - they're examples, so need to teach best practices and not be cluttered. The one reason to run them through pytest-doctestplus or similar is that we can then verify that they all run. Whether repr's are identical is not all that relevant.

The matplotlib reprs look like <matplotlib.some_object_name 0x...>.

These Matplotlib repr's are probably best left out - they don't really serve a purpose except distract the reader from the example code.

It is not possible by "just" running refguide_check.py or pytest. As a target in a Makefile or within a test script it could be done, of course. I can help if a direction is chosen :-)

Yes, I suspected that. It may not be worth the trouble. Such tests are going to be inherently fragile and slow to run.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

No worries about the confusion!

I'll add a conftest.py and clean up the imports.

Regarding

The matplotlib reprs look like <matplotlib.some_object_name 0x...>.

These Matplotlib repr's are probably best left out - they don't really serve a purpose except distract the reader from the example code.

apart from writing a pytest plugin to allow variation in the object's address, I don't see a way forward. refguide_check.py does a lot of "progressive" skipping of the output.

I'll check if an existing plugin would work.

@rgommers
Member

apart from writing a pytest plugin to allow variation in the object's address

I'm not sure what that means. If it makes sense to show <matplotlib.xxx then using ... is fine. What I mean is that the whole output should not be there. An example like:

>>> ax = plt.figure()
>>> ax.plot(...)
>>> ax.plot(....)
>>> ax.legend

should not be polluted by <matplotlib.xxx output below each line.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

In your example only the last line would show the output.

<matplotlib.legend.Legend at 0x7f83b4d8da00>

or

<matplotlib.legend.Legend at 0x...>

for ellipsis matching.

@rgommers
Member

In your example only the last line would show the output.

Okay, bad example then - if we replace it with any other MPL calls, I'd prefer not to show unused return values, and not the legend output either. It'd be great if pytest-doctestplus lets us do that.

@pdebuyl
Contributor Author

pdebuyl commented Nov 10, 2021

I removed the explicit imports.

To get an idea of the remaining annoyances:

cd doc/source
ack SKIP
grep -h 0x... $(find . -name '*.rst')

doctestplus was not actually necessary at this stage.

Member

@rgommers rgommers left a comment

Thanks @pdebuyl, this is starting to get close. A few inline comments, and one concern for maintenance: doc/pytest.ini is a copy of pytest.ini, which seems to break the DRY principle. Is there a way to get around that? It seems like that should be possible via a test script. I also don't quite understand why it's needed in the first place; the pytest.ini in the root of the repo should be picked up, I'd think.

@@ -1685,6 +1687,7 @@ With Matplotlib, you have access to an enormous number of visualization options.
>>> Z = np.sin(R)

>>> ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='viridis')
<mpl_toolkits.mplot3d.art3d.Poly3DCollection object at 0x...>
Member

This is what I was talking about. This is 4 lines of random output that just detracts from the example. We seem to have a couple more, but not many luckily:

$ rg "matplotlib.lines"
doc/source/reference/routines.polynomials.classes.rst
338:    [<matplotlib.lines.Line2D object at 0x2136c10>]
341:    [<matplotlib.lines.Line2D object at 0x1cf2890>]

numpy/core/function_base.py
112:    [<matplotlib.lines.Line2D object at 0x...>]
114:    [<matplotlib.lines.Line2D object at 0x...>]
267:    [<matplotlib.lines.Line2D object at 0x...>]
269:    [<matplotlib.lines.Line2D object at 0x...>]
380:    [<matplotlib.lines.Line2D object at 0x...>]
382:    [<matplotlib.lines.Line2D object at 0x...>]

numpy/lib/function_base.py
1514:    [<matplotlib.lines.Line2D object at 0x...>]
1516:    [<matplotlib.lines.Line2D object at 0x...>]
2904:    [<matplotlib.lines.Line2D object at 0x...>]
2923:    [<matplotlib.lines.Line2D object at 0x...>]
3013:    [<matplotlib.lines.Line2D object at 0x...>]
3032:    [<matplotlib.lines.Line2D object at 0x...>]
3116:    [<matplotlib.lines.Line2D object at 0x...>]
3135:    [<matplotlib.lines.Line2D object at 0x...>]
3218:    [<matplotlib.lines.Line2D object at 0x...>]
3235:    [<matplotlib.lines.Line2D object at 0x...>]
3497:    [<matplotlib.lines.Line2D object at 0x...>]
3514:    [<matplotlib.lines.Line2D object at 0x...>]
3603:    [<matplotlib.lines.Line2D object at 0x...>]

numpy/fft/_pocketfft.py
207:    [<matplotlib.lines.Line2D object at 0x...>, <matplotlib.lines.Line2D object at 0x...>]
304:    [<matplotlib.lines.Line2D object at ...>]
306:    [<matplotlib.lines.Line2D object at ...>]

I'd much prefer not to see these lines. It's best just not to see them (the example runner should ignore them), second best to add _ = to the plot calls to swallow the output.
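
For illustration (hypothetical variable names), the second-best option would look like:

    >>> _ = plt.plot(x, np.sin(x))  # assigning to _ swallows the Line2D repr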

Member

For NumPy we don't have many plots in the docs, so it's not a big deal. In SciPy this would be a lot of noise.

Member

This output line can still be removed.

148293216 # may vary
>>> f(a)
>>> f(a) #doctest: +SKIP
Member

Why do we need #doctest: +SKIP if we already have # may vary for the output? This should work without the skip.

Contributor Author

This is linked to the fact that I was working toward a split refguide_check.py / pytest testing scenario. The # may vary is internal refguide_check.py cooking; pytest does not take it into consideration.

@pdebuyl
Contributor Author

pdebuyl commented Nov 15, 2021

Hi @rgommers

I removed the duplication; one pytest.ini is indeed sufficient.

Regarding matplotlib output, solutions:

  1. Use _ = plt.plot, etc. Not very intuitive for readers.
  2. Append #doctest: +SKIP to every visible plot command. The directive does not show in the html output, so this is likely OK for readers.
  3. Use a pytest plugin to transform <matplotlib.image.AxesImage at 0x7f956908c280> into <matplotlib.image.AxesImage at 0x...> on the fly. The ELLIPSIS option makes it pass OK. I tested such a plugin and it boils down to got = re.sub("0x[0-9a-fA-F]+", "0x...", got) intercepting the doctest output checker (see the sketch after this list). Bonus: would be useful for SciPy "as is" :-)
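
The sketch mentioned in point 3, assuming it lives in doc/conftest.py and hooks in by monkeypatching doctest's output checker (pytest's own checker may need a different hook point):

    # doc/conftest.py (sketch) -- normalize memory addresses before comparing doctest output
    import doctest
    import re

    _BaseChecker = doctest.OutputChecker

    class _AddressChecker(_BaseChecker):
        def check_output(self, want, got, optionflags):
            # turn e.g. "at 0x7f956908c280" into "at 0x..." so ELLIPSIS can match
            got = re.sub(r"0x[0-9a-fA-F]+", "0x...", got)
            return _BaseChecker.check_output(self, want, got, optionflags)

    doctest.OutputChecker = _AddressChecker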

@rgommers
Member

3. Use a pytest plugin to transform <matplotlib.image.AxesImage at 0x7f956908c280> into <matplotlib.image.AxesImage at 0x...> on the fly. The ELLIPSIS option makes it pass ok. I tested such a plugin and it boils down to got = re.sub("0x[0-9a-fA-F]+", "0x...", got) intercepting the doctest output checker. Bonus: would be useful for SciPy "as is" :-)

I like the bonus:) Just to make sure I understand: you want to keep the output line with ellipsis in this case? Why can't the doctest plugin just swallow the output of any plot call? I.e. the same as dynamically adding #doctest: +SKIP.

@pdebuyl
Contributor Author

pdebuyl commented Nov 15, 2021

  3. Use a pytest plugin to transform <matplotlib.image.AxesImage at 0x7f956908c280> into <matplotlib.image.AxesImage at 0x...> on the fly. The ELLIPSIS option makes it pass ok. I tested such a plugin and it boils down to got = re.sub("0x[0-9a-fA-F]+", "0x...", got) intercepting the doctest output checker. Bonus: would be useful for SciPy "as is" :-)

I like the bonus:) Just to make sure I understand: you want to keep the output line with ellipsis in this case? Why can't the doctest plugin just swallow the output of any plot call? I.e. the same as dynamically adding #doctest: +SKIP.

"Solution 3" allows to keep "full matplotlib output lines" such as <matplotlib.image.AxesImage at 0x7f956908c280> and would accept the output when running the test whatever the address is. One variation that I hadn't though of would be to transform such lines into empty lines so that we could remove them from the docs.

@rgommers
Member

"Solution 3" allows to keep "full matplotlib output lines" such as <matplotlib.image.AxesImage at 0x7f956908c280> and would accept the output when running the test whatever the address is.

That's even worse than the ellipsis variant I think.

One variation that I hadn't thought of would be to transform such lines into empty lines so that we could remove them from the docs.

This would be good.

@pdebuyl
Contributor Author

pdebuyl commented Nov 16, 2021

This way?

@rgommers
Member

Yes, that looks great, thanks.

@pdebuyl
Contributor Author

pdebuyl commented Nov 16, 2021

Cool, let me know if I should do anything else to get the PR in.

Member

@rgommers rgommers left a comment

Looks good, only a couple of small things left.

I can live with the duplicated conftest.py. Everything else seems like an improvement to me.

doc/source/reference/random/generator.rst
doc/source/reference/random/generator.rst
doc/source/reference/random/generator.rst
doc/source/user/absolute_beginners.rst
@@ -1685,6 +1687,7 @@ With Matplotlib, you have access to an enormous number of visualization options.
>>> Z = np.sin(R)

>>> ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='viridis')
<mpl_toolkits.mplot3d.art3d.Poly3DCollection object at 0x...>
Member

This output line can still be removed.

@@ -57,7 +57,7 @@ array([[2., 0., 0., 0., 0.],
Notice that the return type is a standard ``numpy.ndarray``.

>>> type(np.multiply(arr, 2))
numpy.ndarray
<class 'numpy.ndarray'>
Member

note (the change is fine): this is the plain Python interpreter output; the old value was the IPython one. Most users will see the latter, so this is not necessarily helpful, but I don't mind much either way.

Contributor Author

I didn't realize this. Any version is fine by me.

@pdebuyl
Contributor Author

pdebuyl commented Nov 17, 2021

The duplicate conftest makes sense in a way, as we don't test the NumPy library and its docs in the same way.

Regarding the output of random functions, +SKIP is the only way apart from setting the seed. This could be done in conftest. I don't know if the random state would persist between rst files; if it does, adding a random call in one file could modify the output of other files. For now the remaining +SKIP directives are for random code, the output of np.empty, the output of id, I/O routines, and compiled extensions.
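
A hypothetical sketch of the seeding approach (in doc/conftest.py; whether state leaks between rst files is exactly the open question above):

    # doc/conftest.py (sketch) -- reseed the legacy global RNG before each doctest
    import numpy as np
    import pytest

    @pytest.fixture(autouse=True)
    def reseed():
        np.random.seed(1)  # examples using default_rng() would still need their own seed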

Making other parts of the docs pass may be more challenging and perhaps less relevant (the release directory contains deprecated or removed functions, for instance, and the dev directory some machine-specific output, etc.).

@pdebuyl
Contributor Author

pdebuyl commented Dec 7, 2021

Hi @rgommers I think that I addressed all the comments :-)

@rgommers
Member

rgommers commented Dec 8, 2021

This looks good to me now, thanks @pdebuyl. There are a couple of CI errors that need addressing, most importantly the doc build should pass. Could you have a look at that?

@pdebuyl
Contributor Author

pdebuyl commented Dec 8, 2021

Hi @rgommers, the CI doc build breaks at an unrelated location:

https://app.circleci.com/pipelines/github/numpy/numpy/11714/workflows/2baffc2a-41e5-428e-a069-f1c20144108b/jobs/23825?invite=true#step-107-187

The error is TypeError: _argmax_dispatcher() got an unexpected keyword argument 'keepdims' for the argmin function. This should not error as far as I understand :-/

I fixed a linter error in the meantime.

@charris
Member

charris commented Dec 8, 2021

A rebase on current main will probably fix the circleci error.

pdebuyl and others added 10 commits December 8, 2021 15:29
1. Add `import numpy as np` in rst files
2. Update NumPy repr for array (whitespace)
3. Update bytearray representation
4. Fix tiny output formatting (`<class ...>`, etc)
5. Format tracebacks
6. Skip random number tests or some platform-dependent outputs
7. Add `<matplotlib. ... at 0x...>` or similar output lines where
   missing
8. Set seed
Add conftest.py and pytest.ini files in doc directory
@pdebuyl
Contributor Author

pdebuyl commented Dec 8, 2021

Thanks @charris, rebase done (+ force push).

@pdebuyl
Contributor Author

pdebuyl commented Dec 8, 2021

"All checks have passed" :-)

@pdebuyl
Contributor Author

pdebuyl commented Dec 21, 2021

I fixed the merge conflict that came up.

@mattip mattip merged commit 6d5bc23 into numpy:main Dec 22, 2021
@mattip
Member

mattip commented Dec 22, 2021

Thanks. Any follow-on niggles can be handled in future PRs.

@rgommers
Member

Great:) Thanks for all the work on this @pdebuyl!

@pdebuyl
Contributor Author

pdebuyl commented Dec 22, 2021

🥳
Thanks for following up.

@pdebuyl pdebuyl deleted the doctest_for_pytest branch December 22, 2021 14:05