Enforce docstring type and default consistency #4155

akaszynski · 2023-03-18T16:01:30Z

This PR is more of a discussion + code suggestion rather than a genuine PR as I'd like to gain traction before implementing this widely.

@akeak, this is particularly for you.

Looking across our docs, I've found that in, for example, DataSet.find_closest_cell, the Sequence type does not get rendered properly:

This isn't a big deal, we have many types that have no links, but still it's annoying. Plus, we're reminded of the "is numpy.ndarray a sequence?" question since we have to accept both a Sequence and np.ndarray here.

After some brief research, I found this SO answer stating that a numpy.ndarray is a sequence and not a Sequence.

This is actually incredibly helpful because this lets us simplify out documentation because we're fundamentally worried about if it is ordered collection of objects that implementes __len__ and __get_item__. We're really concerned about duck typing here, not if it's inherited from collections.abc.Sequence (capital "s").

Therefore, I propose we change our docstrings to use the ducktyping sequence where applicable. This is because, according to PEP 3119, collections.abc check if the object has the expected interface with without requiring true subclassing.

In fact, this is appears the same as VTK's rules:

>>> import vtk
>>> vpts = vtk.vtkPoints()
>>> vpts.SetNumberOfPoints(1)
>>> vpts.SetPoint(0, frozenset({1, 2, 3}))
TypeError: SetPoint argument 2: expected a sequence of 3 values, got frozenset

>>> vpts.SetPoint(0, range(1, 4))
>>> vpts.GetPoint(0)
(1.0, 2.0, 3.0)

Therefore, to finally clear up the confusion, wherever possible, replace iterable, Sequence or np.ndarray, or whatever mess of values we have with just sequence:

Agreement with type hints

We already have a mismatch between our type hints and the docstring, and this suggestion will not improve it. However, I'd argue that our existing typehints, while helpful for mypy, are not helpful for humans. Furthermore, in DataSet.find_closest_cell, we exclude range and other possible types that work. For example:

import numpy as np
import pyvista as pv
from pyvista.utilities.arrays import _coerce_pointslike_arg

# Prepare test sequences
test_sequences = [
    [1, 2, 3],  # list
    (1, 2, 3),  # tuple
    {1, 2, 3},  # set
    frozenset({1, 2, 3}),  # frozenset
    bytearray([1, 2, 3]),  # bytearray
    bytes([1, 2, 3]),  # bytes
    range(1, 4),  # range
]

# Test each sequence type
for seq in test_sequences:
    try:
        result = _coerce_pointslike_arg(seq)
        print(f"Input Type: {type(seq)}")
        print(f"Result Type: {type(result)}")
        print(f"Result: {result}\n")
    except Exception as e:
        print(f"Input Type: {type(seq)}")
        print(f"Error: {e}\n")

Output:

Input Type: <class 'list'>
Result Type: <class 'tuple'>
Result: (array([[1, 2, 3]]), True)

Input Type: <class 'tuple'>
Result Type: <class 'tuple'>
Result: (array([[1, 2, 3]]), True)

Input Type: <class 'set'>
Error: Given points must be a sequence or an array.

Input Type: <class 'frozenset'>
Error: Given points must be a sequence or an array.

Input Type: <class 'bytearray'>
Result Type: <class 'tuple'>
Result: (array([[1, 2, 3]], dtype=uint8), True)

Input Type: <class 'bytes'>
Error: Given point must have three values

Input Type: <class 'range'>
Result Type: <class 'tuple'>
Result: (array([[1, 2, 3]]), True)

A few questions then:

Can we rewrite our typehints such that they're more helpful?
Will it become more helpful or more confusing by moving to sequence[float] wherever possible even though it disagrees with our type hints?
Does any of this matter?

codecov · 2023-03-18T16:12:58Z

Codecov Report

Merging #4155 (82209fc) into main (1e84e8e) will increase coverage by 0.17%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #4155      +/-   ##
==========================================
+ Coverage   95.57%   95.74%   +0.17%     
==========================================
  Files          95       96       +1     
  Lines       20282    20422     +140     
==========================================
+ Hits        19384    19553     +169     
+ Misses        898      869      -29

adeak · 2023-03-19T00:55:16Z

Looking across our docs, I've found that in, for example, DataSet.find_closest_cell, the Sequence type does not get rendered properly:

I noticed some of these, they are annoying.

After some brief research, I found this SO answer stating that a numpy.ndarray is a sequence and not a Sequence.

Yes, this makes sense to me.

This is actually incredibly helpful because this lets us simplify out documentation because we're fundamentally worried about if it is ordered collection of objects that implementes __len__ and __get_item__. We're really concerned about duck typing here, not if it's inherited from collections.abc.Sequence (capital "s").

Agreed.

Therefore, I propose we change our docstrings to use the ducktyping sequence where applicable. This is because, according to PEP 3119, collections.abc check if the object has the expected interface with without requiring true subclassing.

This definitely makes sense for the docs at least. And I'm pleasantly surprised that sequence seems to have an intersphinx mapping; I didn't realize that the glossary was included in that. Assuming that's indeed where the rendered link points.

And I also want to note that the first screenshot you showed is wrong on multiple points. Firstly, Sequence(float) is never really correct. It should be Sequence[float]. And secondly, we'd probably want to explicitly use collections.abc.Sequence to get the intersphinx link working. Also note the existence of typing.Sequence that we'd need for Python <=3.8 (but type checkers often run on newer Python...).

Agreement with type hints

We already have a mismatch between our type hints and the docstring, and this suggestion will not improve it. However, I'd argue that our existing typehints, while helpful for mypy, are not helpful for humans. Furthermore, in DataSet.find_closest_cell, we exclude range and other possible types that work. For example:

If what you're saying is "keep type hints correct, let doc types be less correct but more meaningful" then I agree. They serve different purposes so I have no issues letting the two diverge a bit. Of course there should be no blatant contradictions.

A few questions then:

Can we rewrite our typehints such that they're more helpful?

If by "typehints" you mean the actual type annotations, then I think not. If you mean the ones for the docs, sure. See also my previous point.

Will it become more helpful or more confusing by moving to sequence[float] wherever possible even though it disagrees with our type hints?

I think it would make sense to use sequence[float] where it makes sense. Also using sequence[float] when we really and obviously allow integers and NumPy numbers.

Does any of this matter?

Absolutely, I'd expect most end users to be looking at the docs more than anything else. Having broken/ugly/confusing types listed there is not very helpful.

And an additional point of my own: you specifically raised _coerce_pointslike_arg() as an example. And then you saw that things like bytes don't work, even though issubclass(bytes, collections.abc.Sequence). This is because in this helper function (and also a few other functions, I'm sure) we want array-likes, i.e. "whatever works with np.array()". These are mostly nested "rectangular" (i.e. non-ragged) sequences. But there are some exceptions, such as bytes:

>>> np.array(bytes([1, 3, 5])).size
1

>>> np.array(bytes([1, 3, 5]))
array(b'\x01\x03\x05', dtype='|S3')

(Although by some definitions this is still an array-like... so I'm using a definition where the resulting array has a numeric type. This might or might not be headcanon.)

Semantically we're not looking for a sequence here, we want an array-like. But in many many cases we really do want a sequence, or actually a fixed-length iterable of any kind (think bounds, min/max limits, light attenuation values etc.). So we should keep an eye out for places where we need array-likes, and perhaps use that instead of sequence.

In fact it seems that NumPy has an intersphinx link to array_like in its glossary! There's also ArrayLike as of NumPy 1.20 but we probably don't want to use that.

pyvista/core/filters/data_set.py

pyvista/utilities/arrays.py

akaszynski · 2023-03-19T03:57:19Z

If what you're saying is "keep type hints correct, let doc types be less correct but more meaningful" then I agree.

That's what I'm hoping for.

Semantically we're not looking for a sequence here, we want an array-like... So we should keep an eye out for places where we need array-likes.

Fully agree, and if we can use array_like[float] and glossary links work, even better.

Does any of this matter?
Absolutely, I'd expect most end users to be looking at the docs more than anything else.

It's always good to ask, especially before going through a refactor. I want the docs to look like one mind wrote them. If we can converge (or at least start to converge) on an idea then it's worth discussing it.

Thanks for the comp-sci discussion.

akaszynski · 2023-03-19T07:14:34Z

In an effort to keep this PR sub 2k LOC, I'm calling it sufficient for the moment.

Moves from list or tuple --> sequence[float] whenever possible. Uses array_like for sequence[sequence] (as in the case of 2D arrays.
Adds the default whenever available
Miscellaneous fixes (missing optional, bad default)
Use | instead of "or". MNE uses it, see: mne.io.read_raw_gdf

Ready for review.

adeak · 2023-03-19T16:13:46Z

Review in progress. I'm still annoyed by the fact that what we end up with is largely hit or miss:

(this is from DataSetAttributes.set_array()). For some reason float and str get this border around the link, so do pyvista_ndarray and numpy.ndarray, but the bool and the array_like only get a blue link. Due to how wonky this array_like[float] (and similarly elsewhere sequence[float]) looks, I'd love to be able to disable the whole border thing for intersphinx links. Just make them blue text like normal web pages do.

(In the same sentence list and tuple aren't resolved at all, but that's because we don't have an explicit :class: directive there.)

akaszynski · 2023-03-19T20:40:33Z

Due to how wonky this array_like[float] (and similarly elsewhere sequence[float]) looks, I'd love to be able to disable the whole border thing for intersphinx links. Just make them blue text like normal web pages do.

I've started by disabling rendering the parameter types as italic. Left is main, right is this branch:

I feel this improves the readability and helps with nested typing array_like[float] as the brackets get rendered correctly.

I'm going to try disabling the border to see if that cleans things up more (which it probably will).

akaszynski · 2023-03-19T20:53:36Z

Now, here's without the border and the reduction of the text size to fit said border. Again, main left, this branch (doc/sequence-docstring-types) right:

adeak · 2023-03-19T22:06:58Z

Now, here's without the border and the reduction of the text size to fit said border.

I think that looks a lot better ❤️ Here's a screenshot where we have a generic type:

It feels a bit unfamiliar that the type links are so big (i.e. as big as the rest of the text). I think I'll get used to the new readable one fast. Assuming you agree this is better and we should keep it.

adeak

Partial review, 4 more big files to go later (filters/poly_data.py, examples/downloads.py, plotting/plotting.py, plotting/renderer.py).

Sorry if a handful of my comments are obsolete upon posting; I haven't checked them against the newest commits you've made.

pyvista/plotting/_property.py

pyvista/utilities/arrays.py

pyvista/plotting/lookup_table.py

pyvista/core/grid.py

pyvista/core/filters/data_set.py

Co-authored-by: Andras Deak <adeak@users.noreply.github.com>

…43/pyvista/pyvista into doc/sequence-docstring-types

akaszynski · 2023-03-25T02:05:09Z

This is everything but pyvista/plotting/widgets.py since I know @banesullivan is doing a refactor there.

pyvista/utilities/geometric_objects.py

adeak

Second half of first pass. I don't expect to have too many further remarks though.

pyvista/utilities/geometric_objects.py

pyvista/utilities/algorithms.py

pyvista/themes.py

pyvista/core/filters/poly_data.py

pyvista/plotting/renderer.py

pyvista/plotting/plotting.py

Co-authored-by: Andras Deak <adeak@users.noreply.github.com>

adeak

One future-proofing remark, but this is already fine as is if you don't want to bother with it. Thanks for the quick fixes @akaszynski!

pyvista/plotting/renderer.py

akaszynski · 2023-03-26T15:53:34Z

As always, thanks @adeak!

akaszynski added 2 commits March 18, 2023 08:37

fix types

00f1a26

fix _coerse_pointslike_arg

d2c3540

github-actions bot added the documentation Anything related to the documentation/website label Mar 18, 2023

adeak reviewed Mar 19, 2023

View reviewed changes

pyvista/core/filters/data_set.py Outdated Show resolved Hide resolved

pyvista/utilities/arrays.py Outdated Show resolved Hide resolved

Apply suggestions from code review

18c19c2

akaszynski added 3 commits March 18, 2023 23:17

dataset filters and dataset classes

ba69aed

most of plotting

5814c47

more work on plotting

6131899

akaszynski changed the title ~~Docstring type suggestions~~ Clean up docstrings Mar 19, 2023

akaszynski changed the title ~~Clean up docstrings~~ Enforce docstring type and default consistency Mar 19, 2023

akaszynski added 2 commits March 19, 2023 14:14

Merge branch 'main' into doc/sequence-docstring-types

17f3937

disable italic for span.classifier elements

aea6237

remove code border

348ef27

adeak reviewed Mar 19, 2023

View reviewed changes

akaszynski mentioned this pull request Mar 24, 2023

Deprecate ratio special case in threshold_percent #4182

Open

akaszynski and others added 5 commits March 24, 2023 08:40

Apply suggestions from code review

0fad456

Co-authored-by: Andras Deak <adeak@users.noreply.github.com>

docs update

367d26a

Merge branch 'doc/sequence-docstring-types' of ssh://ssh.github.com:4…

fb05f78

…43/pyvista/pyvista into doc/sequence-docstring-types

sequence --> array_like

ecc51f4

remainder of modules

45dd5fa

fix type

c7eb171

tkoyama010 reviewed Mar 25, 2023

View reviewed changes

pyvista/utilities/geometric_objects.py Show resolved Hide resolved

adeak reviewed Mar 25, 2023

View reviewed changes

akaszynski commented Mar 25, 2023

View reviewed changes

pyvista/plotting/plotting.py Outdated Show resolved Hide resolved

akaszynski and others added 3 commits March 25, 2023 17:11

Apply suggestions from code review

8029008

Co-authored-by: Andras Deak <adeak@users.noreply.github.com>

Apply suggestions from code review

5fc06b4

Co-authored-by: Andras Deak <adeak@users.noreply.github.com>

final (I hope) edits

8c1b0c6

adeak previously approved these changes Mar 25, 2023

View reviewed changes

pyvista/plotting/renderer.py Outdated Show resolved Hide resolved

reference intersphinx

82209fc

akaszynski dismissed adeak’s stale review via 82209fc March 26, 2023 13:45

adeak approved these changes Mar 26, 2023

View reviewed changes

akaszynski mentioned this pull request Mar 26, 2023

Unify the default notation of parameters in docstrings #2284

Closed

akaszynski merged commit 984cddb into main Mar 26, 2023

akaszynski deleted the doc/sequence-docstring-types branch March 26, 2023 22:18

MatthewFlamm mentioned this pull request Apr 14, 2023

Add Sequence to DataSet[Attributes] __setitem__ typing #4278

Merged

akaszynski mentioned this pull request Apr 30, 2023

Release 0.39.0 #4359

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enforce docstring type and default consistency #4155

Enforce docstring type and default consistency #4155

akaszynski commented Mar 18, 2023

codecov bot commented Mar 18, 2023 •

edited

adeak commented Mar 19, 2023

Agreement with type hints

akaszynski commented Mar 19, 2023

akaszynski commented Mar 19, 2023 •

edited

adeak commented Mar 19, 2023

akaszynski commented Mar 19, 2023 •

edited

akaszynski commented Mar 19, 2023

adeak commented Mar 19, 2023

adeak left a comment

akaszynski commented Mar 25, 2023

adeak left a comment

adeak left a comment

akaszynski commented Mar 26, 2023

Enforce docstring type and default consistency #4155

Enforce docstring type and default consistency #4155

Conversation

akaszynski commented Mar 18, 2023

Agreement with type hints

codecov bot commented Mar 18, 2023 • edited

Codecov Report

adeak commented Mar 19, 2023

Agreement with type hints

akaszynski commented Mar 19, 2023

akaszynski commented Mar 19, 2023 • edited

adeak commented Mar 19, 2023

akaszynski commented Mar 19, 2023 • edited

akaszynski commented Mar 19, 2023

adeak commented Mar 19, 2023

adeak left a comment

Choose a reason for hiding this comment

akaszynski commented Mar 25, 2023

adeak left a comment

Choose a reason for hiding this comment

adeak left a comment

Choose a reason for hiding this comment

akaszynski commented Mar 26, 2023

codecov bot commented Mar 18, 2023 •

edited

akaszynski commented Mar 19, 2023 •

edited

akaszynski commented Mar 19, 2023 •

edited