Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: mention string, bytes, and void dtypes in dtype intro #25507

Merged
merged 3 commits into from
Jan 8, 2024

Conversation

ngoldbaum
Copy link
Member

@ngoldbaum ngoldbaum commented Dec 28, 2023

This is part of my work on docs for stringdtype. I'm issuing this separately from the stringdtype PR to ease review.

I moved the section comparing python and C data type names to later in the document, after material that made more sense to me at the beginning of the document.

I also added some discussion about working with string or bytes data. I mentioned some common footguns working with the bytes dtype and also explicitly mentioned the unstructured void dtype as an alternative option. As far as I can tell, unstructured void dtypes are not mentioned anywhere in the docs yet but they seem like a natural thing to mention here.

EDIT: also made sure the dtype classes all show up in the API docs so they can be cross-referenced

[skip actions] [skip azp] [skip cirrus]
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, when I started reviewing this, I thought what a strange additions you made (see in-line comments), but then I realized that was all in the piece of text you moved... Your actual additions I think are good!

Of course, that moved piece also led me to the same problems as in #25508 (comment) -- recommending against character codes when most users will quite quickly encounter them anyway (and have to understand them) seems rather backward to me.

So, bottom line is that I'm not sure what is best - ideally, we have an actual complete (and ideally concise) system of defining dtypes in place before we change the documentation to something that is incomplete and less helpful, as was done to this file already before your PR...

@@ -33,4 +33,10 @@ Utility
show_runtime
broadcast_shapes

.. automodule:: numpy.dtypes
DType classes and utility (:mod:`numpy.dtypes`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utility -> utilities?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a utility yet? If not, maybe just DType classes?

@@ -12,6 +12,146 @@ Array types and conversions between types
NumPy supports a much greater variety of numerical types than Python does.
This section shows which are available, and how to modify an array's data-type.

NumPy numerical types are instances of `numpy.dtype` (data-type) objects,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instance has specific meaning in python: I would expect type(np.int8) now to be np.dtype, but this is not true - they all derive from np.generic I don't think it is helpful to suggest it is.

EDIT: darn, I see that this is simply all from the previous piece of text...

doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
doc/source/user/basics.types.rst Show resolved Hide resolved
@ngoldbaum
Copy link
Member Author

@mhvk would you mind if we leave the text that was already there as-is and just review the new text I added? And maybe file an issue to come back to the introductory text?

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, a review just of what you added: I like it, but would not mention the long names if you are not going to actually use them.

But I think when you move a piece so much up front, it should actually be good... I suggest improving it by shortening: just show a few basic array instantiation calls, ignoring the scalars part altogether - it is irrelevant here. You could even mostly just let the array constructor itself decide on float or int, to avoid passing in anything, and then perhaps give a single example with an overflow (for which I would use a character code, but that's me...)

doc/source/user/basics.types.rst Show resolved Hide resolved
doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
@ngoldbaum
Copy link
Member Author

since the np.dtypes...DType classes are as yet not directly useful

They are directly useful right now, FWIW:

In [4]: np.array(["hello", "world"], dtype=np.dtypes.StrDType(10))
Out[4]: array(['hello', 'world'], dtype='<U10')

But point taken, let's not move to using those names in this one place. I also edited down the introductory text a bit to remove the most objectionable stuff.

@ngoldbaum ngoldbaum force-pushed the dtype-docs-reorg branch 2 times, most recently from 3c1c498 to d382885 Compare December 28, 2023 21:39
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much, this now reads very well. Only a few nits left.

@@ -33,4 +33,10 @@ Utility
show_runtime
broadcast_shapes

.. automodule:: numpy.dtypes
DType classes and utility (:mod:`numpy.dtypes`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a utility yet? If not, maybe just DType classes?

doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
doc/source/user/basics.types.rst Show resolved Hide resolved
doc/source/user/basics.types.rst Outdated Show resolved Hide resolved
doc/source/user/basics.types.rst Show resolved Hide resolved
* - Strings
- ``BytesDType``, ``BytesDType``
* - Strings and Bytestrings
- ``StrDType``, ``BytesDType``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

[skip actions] [skip azp] [skip cirrus]
@ngoldbaum
Copy link
Member Author

Responded to latest comments, should be good to go now. Maybe @mdhaber or @rossbar want to look this over?

One wrinkle I noticed looking at the docs build, this introduces some new reference warnings that I can't easily fix:

/Users/goldbaum/Documents/numpy/build-install/usr/local/lib/python3.11/site-packages/numpy/dtypes.py:docstring of numpy.dtypes.BytesDType:87: WARNING: autosummary: stub file not found 'numpy.dtypes.BytesDType.type'. Check your autosummary_generate setting.

And similar warnings for all of the other DType classes. I think that's happening is that both the dtype metaclass and dtype class share a python type attribute, and the way we've implemented the python inheritance for this in C (perhaps also somehow combined with the use of T_OBJECT to define the attribute) is confusing sphinx. If I change dtypemeta to use a scalar_type python-level name, that fixes it, but that seems like a more invasive risky change that I want to take on in this PR. Ping @seberg in case you weren't aware of this issue with the layering of the dtype API.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all good as far as I'm concerned!

@ngoldbaum
Copy link
Member Author

Pulling this in, if anyone has any edits they didn't catch before merging I'm happy to followup.

@ngoldbaum ngoldbaum merged commit f480021 into numpy:main Jan 8, 2024
4 checks passed
@@ -24,6 +24,7 @@ indentation.
routines.ctypeslib
routines.datetime
routines.dtype
routines.dtypes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngoldbaum this file doesn't actually exist. Did you forget to commit it, or is this some leftover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's odd, must have forgotten to git add it back when I made this PR. Will push a fix tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants