Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce docstring type and default consistency #4155

Merged
merged 19 commits into from
Mar 26, 2023

Conversation

akaszynski
Copy link
Member

This PR is more of a discussion + code suggestion rather than a genuine PR as I'd like to gain traction before implementing this widely.

@akeak, this is particularly for you.

Looking across our docs, I've found that in, for example, DataSet.find_closest_cell, the Sequence type does not get rendered properly:

fcc

This isn't a big deal, we have many types that have no links, but still it's annoying. Plus, we're reminded of the "is numpy.ndarray a sequence?" question since we have to accept both a Sequence and np.ndarray here.

After some brief research, I found this SO answer stating that a numpy.ndarray is a sequence and not a Sequence.

This is actually incredibly helpful because this lets us simplify out documentation because we're fundamentally worried about if it is ordered collection of objects that implementes __len__ and __get_item__. We're really concerned about duck typing here, not if it's inherited from collections.abc.Sequence (capital "s").

Therefore, I propose we change our docstrings to use the ducktyping sequence where applicable. This is because, according to PEP 3119, collections.abc check if the object has the expected interface with without requiring true subclassing.

In fact, this is appears the same as VTK's rules:

>>> import vtk
>>> vpts = vtk.vtkPoints()
>>> vpts.SetNumberOfPoints(1)
>>> vpts.SetPoint(0, frozenset({1, 2, 3}))
TypeError: SetPoint argument 2: expected a sequence of 3 values, got frozenset

>>> vpts.SetPoint(0, range(1, 4))
>>> vpts.GetPoint(0)
(1.0, 2.0, 3.0)

Therefore, to finally clear up the confusion, wherever possible, replace iterable, Sequence or np.ndarray, or whatever mess of values we have with just sequence:

fcc_fix


Agreement with type hints

We already have a mismatch between our type hints and the docstring, and this suggestion will not improve it. However, I'd argue that our existing typehints, while helpful for mypy, are not helpful for humans. Furthermore, in DataSet.find_closest_cell, we exclude range and other possible types that work. For example:

import numpy as np
import pyvista as pv
from pyvista.utilities.arrays import _coerce_pointslike_arg

# Prepare test sequences
test_sequences = [
    [1, 2, 3],  # list
    (1, 2, 3),  # tuple
    {1, 2, 3},  # set
    frozenset({1, 2, 3}),  # frozenset
    bytearray([1, 2, 3]),  # bytearray
    bytes([1, 2, 3]),  # bytes
    range(1, 4),  # range
]

# Test each sequence type
for seq in test_sequences:
    try:
        result = _coerce_pointslike_arg(seq)
        print(f"Input Type: {type(seq)}")
        print(f"Result Type: {type(result)}")
        print(f"Result: {result}\n")
    except Exception as e:
        print(f"Input Type: {type(seq)}")
        print(f"Error: {e}\n")

Output:

Input Type: <class 'list'>
Result Type: <class 'tuple'>
Result: (array([[1, 2, 3]]), True)

Input Type: <class 'tuple'>
Result Type: <class 'tuple'>
Result: (array([[1, 2, 3]]), True)

Input Type: <class 'set'>
Error: Given points must be a sequence or an array.

Input Type: <class 'frozenset'>
Error: Given points must be a sequence or an array.

Input Type: <class 'bytearray'>
Result Type: <class 'tuple'>
Result: (array([[1, 2, 3]], dtype=uint8), True)

Input Type: <class 'bytes'>
Error: Given point must have three values

Input Type: <class 'range'>
Result Type: <class 'tuple'>
Result: (array([[1, 2, 3]]), True)

A few questions then:

  • Can we rewrite our typehints such that they're more helpful?
  • Will it become more helpful or more confusing by moving to sequence[float] wherever possible even though it disagrees with our type hints?
  • Does any of this matter?

@github-actions github-actions bot added the documentation Anything related to the documentation/website label Mar 18, 2023
@codecov
Copy link

codecov bot commented Mar 18, 2023

Codecov Report

Merging #4155 (82209fc) into main (1e84e8e) will increase coverage by 0.17%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #4155      +/-   ##
==========================================
+ Coverage   95.57%   95.74%   +0.17%     
==========================================
  Files          95       96       +1     
  Lines       20282    20422     +140     
==========================================
+ Hits        19384    19553     +169     
+ Misses        898      869      -29     

@adeak
Copy link
Member

adeak commented Mar 19, 2023

Looking across our docs, I've found that in, for example, DataSet.find_closest_cell, the Sequence type does not get rendered properly:

I noticed some of these, they are annoying.

After some brief research, I found this SO answer stating that a numpy.ndarray is a sequence and not a Sequence.

Yes, this makes sense to me.

This is actually incredibly helpful because this lets us simplify out documentation because we're fundamentally worried about if it is ordered collection of objects that implementes __len__ and __get_item__. We're really concerned about duck typing here, not if it's inherited from collections.abc.Sequence (capital "s").

Agreed.

Therefore, I propose we change our docstrings to use the ducktyping sequence where applicable. This is because, according to PEP 3119, collections.abc check if the object has the expected interface with without requiring true subclassing.

This definitely makes sense for the docs at least. And I'm pleasantly surprised that sequence seems to have an intersphinx mapping; I didn't realize that the glossary was included in that. Assuming that's indeed where the rendered link points.

And I also want to note that the first screenshot you showed is wrong on multiple points. Firstly, Sequence(float) is never really correct. It should be Sequence[float]. And secondly, we'd probably want to explicitly use collections.abc.Sequence to get the intersphinx link working. Also note the existence of typing.Sequence that we'd need for Python <=3.8 (but type checkers often run on newer Python...).

Agreement with type hints

We already have a mismatch between our type hints and the docstring, and this suggestion will not improve it. However, I'd argue that our existing typehints, while helpful for mypy, are not helpful for humans. Furthermore, in DataSet.find_closest_cell, we exclude range and other possible types that work. For example:

If what you're saying is "keep type hints correct, let doc types be less correct but more meaningful" then I agree. They serve different purposes so I have no issues letting the two diverge a bit. Of course there should be no blatant contradictions.

A few questions then:

  • Can we rewrite our typehints such that they're more helpful?

If by "typehints" you mean the actual type annotations, then I think not. If you mean the ones for the docs, sure. See also my previous point.

  • Will it become more helpful or more confusing by moving to sequence[float] wherever possible even though it disagrees with our type hints?

I think it would make sense to use sequence[float] where it makes sense. Also using sequence[float] when we really and obviously allow integers and NumPy numbers.

  • Does any of this matter?

Absolutely, I'd expect most end users to be looking at the docs more than anything else. Having broken/ugly/confusing types listed there is not very helpful.

And an additional point of my own: you specifically raised _coerce_pointslike_arg() as an example. And then you saw that things like bytes don't work, even though issubclass(bytes, collections.abc.Sequence). This is because in this helper function (and also a few other functions, I'm sure) we want array-likes, i.e. "whatever works with np.array()". These are mostly nested "rectangular" (i.e. non-ragged) sequences. But there are some exceptions, such as bytes:

>>> np.array(bytes([1, 3, 5])).size
1

>>> np.array(bytes([1, 3, 5]))
array(b'\x01\x03\x05', dtype='|S3')

(Although by some definitions this is still an array-like... so I'm using a definition where the resulting array has a numeric type. This might or might not be headcanon.)

Semantically we're not looking for a sequence here, we want an array-like. But in many many cases we really do want a sequence, or actually a fixed-length iterable of any kind (think bounds, min/max limits, light attenuation values etc.). So we should keep an eye out for places where we need array-likes, and perhaps use that instead of sequence.

In fact it seems that NumPy has an intersphinx link to array_like in its glossary! There's also ArrayLike as of NumPy 1.20 but we probably don't want to use that.

pyvista/core/filters/data_set.py Outdated Show resolved Hide resolved
pyvista/utilities/arrays.py Outdated Show resolved Hide resolved
@akaszynski
Copy link
Member Author

If what you're saying is "keep type hints correct, let doc types be less correct but more meaningful" then I agree.

That's what I'm hoping for.

Semantically we're not looking for a sequence here, we want an array-like... So we should keep an eye out for places where we need array-likes.

Fully agree, and if we can use array_like[float] and glossary links work, even better.

Does any of this matter?
Absolutely, I'd expect most end users to be looking at the docs more than anything else.

It's always good to ask, especially before going through a refactor. I want the docs to look like one mind wrote them. If we can converge (or at least start to converge) on an idea then it's worth discussing it.

Thanks for the comp-sci discussion.

@akaszynski akaszynski changed the title Docstring type suggestions Clean up docstrings Mar 19, 2023
@akaszynski
Copy link
Member Author

akaszynski commented Mar 19, 2023

In an effort to keep this PR sub 2k LOC, I'm calling it sufficient for the moment.

  • Moves from list or tuple --> sequence[float] whenever possible. Uses array_like for sequence[sequence] (as in the case of 2D arrays.
  • Adds the default whenever available
  • Miscellaneous fixes (missing optional, bad default)
  • Use | instead of "or". MNE uses it, see: mne.io.read_raw_gdf

Ready for review.

@akaszynski akaszynski changed the title Clean up docstrings Enforce docstring type and default consistency Mar 19, 2023
@adeak
Copy link
Member

adeak commented Mar 19, 2023

Review in progress. I'm still annoyed by the fact that what we end up with is largely hit or miss:
Screenshot from 2023-03-19 17-09-57
(this is from DataSetAttributes.set_array()). For some reason float and str get this border around the link, so do pyvista_ndarray and numpy.ndarray, but the bool and the array_like only get a blue link. Due to how wonky this array_like[float] (and similarly elsewhere sequence[float]) looks, I'd love to be able to disable the whole border thing for intersphinx links. Just make them blue text like normal web pages do.

(In the same sentence list and tuple aren't resolved at all, but that's because we don't have an explicit :class: directive there.)

@akaszynski
Copy link
Member Author

akaszynski commented Mar 19, 2023

Due to how wonky this array_like[float] (and similarly elsewhere sequence[float]) looks, I'd love to be able to disable the whole border thing for intersphinx links. Just make them blue text like normal web pages do.

I've started by disabling rendering the parameter types as italic. Left is main, right is this branch:
tmp

I feel this improves the readability and helps with nested typing array_like[float] as the brackets get rendered correctly.

I'm going to try disabling the border to see if that cleans things up more (which it probably will).

@akaszynski
Copy link
Member Author

Now, here's without the border and the reduction of the text size to fit said border. Again, main left, this branch (doc/sequence-docstring-types) right:

no-border

@adeak
Copy link
Member

adeak commented Mar 19, 2023

Now, here's without the border and the reduction of the text size to fit said border.

I think that looks a lot better ❤️ Here's a screenshot where we have a generic type:

Screenshot from 2023-03-19 23-04-56

It feels a bit unfamiliar that the type links are so big (i.e. as big as the rest of the text). I think I'll get used to the new readable one fast. Assuming you agree this is better and we should keep it.

Copy link
Member

@adeak adeak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, 4 more big files to go later (filters/poly_data.py, examples/downloads.py, plotting/plotting.py, plotting/renderer.py).

Sorry if a handful of my comments are obsolete upon posting; I haven't checked them against the newest commits you've made.

pyvista/plotting/_property.py Outdated Show resolved Hide resolved
pyvista/utilities/arrays.py Outdated Show resolved Hide resolved
pyvista/plotting/lookup_table.py Outdated Show resolved Hide resolved
pyvista/plotting/lookup_table.py Outdated Show resolved Hide resolved
pyvista/core/grid.py Outdated Show resolved Hide resolved
pyvista/core/filters/data_set.py Show resolved Hide resolved
pyvista/core/filters/data_set.py Outdated Show resolved Hide resolved
pyvista/core/filters/data_set.py Outdated Show resolved Hide resolved
pyvista/core/filters/data_set.py Outdated Show resolved Hide resolved
pyvista/core/filters/data_set.py Outdated Show resolved Hide resolved
akaszynski and others added 5 commits March 24, 2023 08:40
@akaszynski
Copy link
Member Author

This is everything but pyvista/plotting/widgets.py since I know @banesullivan is doing a refactor there.

Copy link
Member

@adeak adeak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second half of first pass. I don't expect to have too many further remarks though.

pyvista/utilities/geometric_objects.py Outdated Show resolved Hide resolved
pyvista/utilities/algorithms.py Outdated Show resolved Hide resolved
pyvista/themes.py Outdated Show resolved Hide resolved
pyvista/core/filters/poly_data.py Outdated Show resolved Hide resolved
pyvista/core/filters/poly_data.py Outdated Show resolved Hide resolved
pyvista/plotting/renderer.py Outdated Show resolved Hide resolved
pyvista/plotting/renderer.py Show resolved Hide resolved
pyvista/plotting/renderer.py Outdated Show resolved Hide resolved
pyvista/plotting/renderer.py Outdated Show resolved Hide resolved
pyvista/plotting/renderer.py Show resolved Hide resolved
akaszynski and others added 3 commits March 25, 2023 17:11
Co-authored-by: Andras Deak <adeak@users.noreply.github.com>
Co-authored-by: Andras Deak <adeak@users.noreply.github.com>
adeak
adeak previously approved these changes Mar 25, 2023
Copy link
Member

@adeak adeak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One future-proofing remark, but this is already fine as is if you don't want to bother with it. Thanks for the quick fixes @akaszynski!

pyvista/plotting/renderer.py Outdated Show resolved Hide resolved
@akaszynski
Copy link
Member Author

As always, thanks @adeak!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Anything related to the documentation/website
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants