From dd484ff692804be9c0abdb33f02f95516f6594ae Mon Sep 17 00:00:00 2001 From: Nathan Goldbaum Date: Thu, 28 Dec 2023 10:38:38 -0700 Subject: [PATCH 1/3] DOC: mention string, bytes, and void dtypes in dtype intro [skip actions] [skip azp] [skip cirrus] --- doc/source/user/basics.types.rst | 214 ++++++++++++++++++++----------- 1 file changed, 136 insertions(+), 78 deletions(-) diff --git a/doc/source/user/basics.types.rst b/doc/source/user/basics.types.rst index 1b889d33fa0c..64335889ad98 100644 --- a/doc/source/user/basics.types.rst +++ b/doc/source/user/basics.types.rst @@ -12,6 +12,141 @@ Array types and conversions between types NumPy supports a much greater variety of numerical types than Python does. This section shows which are available, and how to modify an array's data-type. +NumPy numerical types are instances of ``dtype`` (data-type) objects, each +having unique characteristics. Once you have imported NumPy using +``>>> import numpy as np`` +the dtypes are available as ``np.bool``, ``np.float32``, etc. + +Data-types can be used as functions to convert python numbers to array scalars +(see the array scalar section for an explanation), python sequences of numbers +to arrays of that type, or as arguments to the dtype keyword that many numpy +functions or methods accept. Some examples:: + + >>> x = np.float32(1.0) + >>> x + 1.0 + >>> y = np.int_([1,2,4]) + >>> y + array([1, 2, 4]) + >>> z = np.arange(3, dtype=np.uint8) + >>> z + array([0, 1, 2], dtype=uint8) + +Array types can also be referred to by character codes, mostly to retain +backward compatibility with older packages such as Numeric. Some +documentation may still refer to these, for example:: + + >>> np.array([1, 2, 3], dtype='f') + array([1., 2., 3.], dtype=float32) + +We recommend using dtype objects instead. + +To convert the type of an array, use the .astype() method (preferred) or +the type itself as a function. 
For example: :: + + >>> z.astype(float) #doctest: +NORMALIZE_WHITESPACE + array([0., 1., 2.]) + >>> np.int8(z) + array([0, 1, 2], dtype=int8) + +Note that, above, we use the *Python* float object as a dtype. NumPy knows +that ``int`` refers to ``np.int_``, ``bool`` means ``np.bool``, +that ``float`` is ``np.float64`` and ``complex`` is ``np.complex128``. +The other data-types do not have Python equivalents. + +To determine the type of an array, look at the dtype attribute:: + + >>> z.dtype + dtype('uint8') + +dtype objects also contain information about the type, such as its bit-width +and its byte-order. The data type can also be used indirectly to query +properties of the type, such as whether it is an integer:: + + >>> d = np.dtype(int) + >>> d #doctest: +SKIP + dtype('int32') + + >>> np.issubdtype(d, np.integer) + True + + >>> np.issubdtype(d, np.floating) + False + +Numerical Data Types +-------------------- + +There are 5 basic numerical types representing booleans (bool), integers (int), +unsigned integers (uint) floating point (float) and complex. Those with numbers +in their name indicate the bitsize of the type (i.e. how many bits are needed +to represent a single value in memory). Some types, such as ``int`` and +``intp``, have differing bitsizes, dependent on the platforms (e.g. 32-bit +vs. 64-bit machines). This should be taken into account when interfacing +with low-level code (such as C or Fortran) where the raw memory is addressed. + +Data Types for Strings and Bytes +-------------------------------- + +In addition to numerical types, NumPy also supports storing unicode strings, via +`np.dtypes.StrDtype` (``U`` character code), null-terminated byte sequences via +`np.dtypes.BytesDType` (``S`` character code), and arbitrary byte sequences, via +`np.dtypes.VoidDType` (``V`` character code). + +The ``StrDType``, ``BytesDType``, and ``VoidDType`` are *fixed-width* data +types. 
They are parameterized by a width, in either bytes or unicode points, +that a single data element in the array must fit inside. This means that storing +an array of byte sequences or strings using this dtype requires knowing or +calculating the sizes of the longest text or byte sequence in advance. + +As an example, we can create an array storing the words ``"hello"`` and +``"world!"``:: + + >>> np.array(["hello", "world!"]) + array(['hello', 'world!'], dtype='<U6') + >>> np.array(["hello", "world!"], dtype="U5") + array(['hello', 'world'], dtype='<U5') + >>> np.array(["hello", "world!"], dtype="U7") + array(['hello', 'world!'], dtype='<U7') + >>> np.array(["hello", "world"], dtype="S7").tobytes() + b'hello\x00\x00world\x00\x00' + +Each entry is padded with two extra null bytes. Note however that NumPy cannot +tell the difference between intentionally stored trailing nulls and padding +nulls:: + + >>> x = [b"hello\0\0", b"world"] + >>> a = np.array(x, dtype="S7") + >>> print(a[0]) + b'hello' + >>> a[0] == x[0] + False + +If you need to store and round-trip any trailing null bytes, you will need to +use an unstructured void data type:: + + >>> a = np.array(x, dtype="V7") + >>> a + array([b'\x68\x65\x6C\x6C\x6F\x00\x00', b'\x77\x6F\x72\x6C\x64\x00\x00'], + dtype='|V7') + >>> a[0] == np.void(x[0]) + True + +Advanced types, not listed above, are explored in section +:ref:`structured_arrays`. + +Relationship Between NumPy Data Types and C Data Types +====================================================== + NumPy provides both bit sized type names and names based on the names of C types. Since the definition of C types are platform dependent, this means the explicitly bit sized should be preferred to avoid platform-dependent behavior in programs @@ -144,84 +279,9 @@ confusion with builtin python type names, such as `numpy.bool_`. - ``long double complex`` - Complex number, represented by two extended-precision floats (real and imaginary components). 
- Since many of these have platform-dependent definitions, a set of fixed-size aliases are provided (See :ref:`sized-aliases`). - -NumPy numerical types are instances of ``dtype`` (data-type) objects, each -having unique characteristics. Once you have imported NumPy using -``>>> import numpy as np`` -the dtypes are available as ``np.bool``, ``np.float32``, etc. - -Advanced types, not listed above, are explored in -section :ref:`structured_arrays`. - -There are 5 basic numerical types representing booleans (bool), integers (int), -unsigned integers (uint) floating point (float) and complex. Those with numbers -in their name indicate the bitsize of the type (i.e. how many bits are needed -to represent a single value in memory). Some types, such as ``int`` and -``intp``, have differing bitsizes, dependent on the platforms (e.g. 32-bit -vs. 64-bit machines). This should be taken into account when interfacing -with low-level code (such as C or Fortran) where the raw memory is addressed. - -Data-types can be used as functions to convert python numbers to array scalars -(see the array scalar section for an explanation), python sequences of numbers -to arrays of that type, or as arguments to the dtype keyword that many numpy -functions or methods accept. Some examples:: - - >>> x = np.float32(1.0) - >>> x - 1.0 - >>> y = np.int_([1,2,4]) - >>> y - array([1, 2, 4]) - >>> z = np.arange(3, dtype=np.uint8) - >>> z - array([0, 1, 2], dtype=uint8) - -Array types can also be referred to by character codes, mostly to retain -backward compatibility with older packages such as Numeric. Some -documentation may still refer to these, for example:: - - >>> np.array([1, 2, 3], dtype='f') - array([1., 2., 3.], dtype=float32) - -We recommend using dtype objects instead. - -To convert the type of an array, use the .astype() method (preferred) or -the type itself as a function. 
For example: :: - - >>> z.astype(float) #doctest: +NORMALIZE_WHITESPACE - array([0., 1., 2.]) - >>> np.int8(z) - array([0, 1, 2], dtype=int8) - -Note that, above, we use the *Python* float object as a dtype. NumPy knows -that ``int`` refers to ``np.int_``, ``bool`` means ``np.bool``, -that ``float`` is ``np.float64`` and ``complex`` is ``np.complex128``. -The other data-types do not have Python equivalents. - -To determine the type of an array, look at the dtype attribute:: - - >>> z.dtype - dtype('uint8') - -dtype objects also contain information about the type, such as its bit-width -and its byte-order. The data type can also be used indirectly to query -properties of the type, such as whether it is an integer:: - - >>> d = np.dtype(int) - >>> d #doctest: +SKIP - dtype('int32') - - >>> np.issubdtype(d, np.integer) - True - - >>> np.issubdtype(d, np.floating) - False - - Array scalars ============= @@ -249,7 +309,7 @@ Overflow errors =============== The fixed size of NumPy numeric types may cause overflow errors when a value -requires more memory than available in the data type. For example, +requires more memory than available in the data type. For example, `numpy.power` evaluates ``100 ** 9`` correctly for 64-bit integers, but gives -1486618624 (incorrect) for a 32-bit integer. @@ -322,5 +382,3 @@ to standard python types, and it is therefore impossible to preserve extended precision even if many decimal places are requested. It can be useful to test your code with the value ``1 + np.finfo(np.longdouble).eps``. 
- - From 1c0b2c4e5969f494e79aab1188c66a7a9a156bf4 Mon Sep 17 00:00:00 2001 From: Nathan Goldbaum Date: Thu, 28 Dec 2023 12:57:33 -0700 Subject: [PATCH 2/3] DOC: formatting adjustments, ensure DType classes show up in API docs [skip actions] [skip azp] [skip cirrus] --- doc/source/reference/routines.other.rst | 8 +++- doc/source/user/basics.types.rst | 64 ++++++++++++++----------- numpy/dtypes.py | 7 +-- 3 files changed, 44 insertions(+), 35 deletions(-) diff --git a/doc/source/reference/routines.other.rst b/doc/source/reference/routines.other.rst index 0ba60b20a070..34148038d8d5 100644 --- a/doc/source/reference/routines.other.rst +++ b/doc/source/reference/routines.other.rst @@ -33,4 +33,10 @@ Utility show_runtime broadcast_shapes -.. automodule:: numpy.dtypes +DType classes and utility (:mod:`numpy.dtypes`) +=============================================== + +.. autosummary:: + :toctree: generated/ + + dtypes diff --git a/doc/source/user/basics.types.rst b/doc/source/user/basics.types.rst index 64335889ad98..0bed8a1b48fa 100644 --- a/doc/source/user/basics.types.rst +++ b/doc/source/user/basics.types.rst @@ -12,10 +12,10 @@ Array types and conversions between types NumPy supports a much greater variety of numerical types than Python does. This section shows which are available, and how to modify an array's data-type. -NumPy numerical types are instances of ``dtype`` (data-type) objects, each -having unique characteristics. Once you have imported NumPy using -``>>> import numpy as np`` -the dtypes are available as ``np.bool``, ``np.float32``, etc. +NumPy numerical types are instances of `numpy.dtype` (data-type) objects, +each having unique characteristics. Once you have imported NumPy using ``import +numpy as np`` the dtypes are available as e.g. `numpy.bool`, +`numpy.float32`, etc. 
Data-types can be used as functions to convert python numbers to array scalars (see the array scalar section for an explanation), python sequences of numbers @@ -33,8 +33,9 @@ functions or methods accept. Some examples:: array([0, 1, 2], dtype=uint8) Array types can also be referred to by character codes, mostly to retain -backward compatibility with older packages such as Numeric. Some -documentation may still refer to these, for example:: +backward compatibility with older packages such as Numeric, but also for +defining :ref:`structured data types <structured_arrays>`. Some +documentation may refer to these character codes, for example:: >>> np.array([1, 2, 3], dtype='f') array([1., 2., 3.], dtype=float32) @@ -49,10 +50,11 @@ the type itself as a function. For example: :: >>> np.int8(z) array([0, 1, 2], dtype=int8) -Note that, above, we use the *Python* float object as a dtype. NumPy knows -that ``int`` refers to ``np.int_``, ``bool`` means ``np.bool``, -that ``float`` is ``np.float64`` and ``complex`` is ``np.complex128``. -The other data-types do not have Python equivalents. +Note that, above, we use the *Python* float object as a dtype. NumPy knows that +:class:`int` refers to `numpy.int_`, :class:`bool` means +`numpy.bool`, that :class:`float` is `numpy.float64` and +:class:`complex` is `numpy.complex128`. The other data-types do not have +Python equivalents. To determine the type of an array, look at the dtype attribute:: @@ -76,21 +78,24 @@ properties of the type, such as whether it is an integer:: Numerical Data Types -------------------- -There are 5 basic numerical types representing booleans (bool), integers (int), -unsigned integers (uint) floating point (float) and complex. Those with numbers -in their name indicate the bitsize of the type (i.e. how many bits are needed -to represent a single value in memory). Some types, such as ``int`` and -``intp``, have differing bitsizes, dependent on the platforms (e.g. 32-bit -vs. 64-bit machines). 
This should be taken into account when interfacing -with low-level code (such as C or Fortran) where the raw memory is addressed. +There are 5 basic numerical types representing booleans (``bool``), integers +(``int``), unsigned integers (``uint``), floating point (``float``) and +``complex``. A basic numerical type name combined with a numeric bitsize defines +a concrete type. The bitsize is the number of bits that are needed to represent +a single value in memory. For example, `numpy.float64` is a 64-bit +floating point data type. Some types, such as `numpy.int_` and +`numpy.intp`, have differing bitsizes, dependent on the platforms +(e.g. 32-bit vs. 64-bit CPU architectures). This should be taken into account +when interfacing with low-level code (such as C or Fortran) where the raw memory +is addressed. Data Types for Strings and Bytes -------------------------------- In addition to numerical types, NumPy also supports storing unicode strings, via -`np.dtypes.StrDtype` (``U`` character code), null-terminated byte sequences via -`np.dtypes.BytesDType` (``S`` character code), and arbitrary byte sequences, via -`np.dtypes.VoidDType` (``V`` character code). +`numpy.dtypes.StrDType` (``U`` character code), null-terminated byte sequences via +`numpy.dtypes.BytesDType` (``S`` character code), and arbitrary byte sequences, via +`numpy.dtypes.VoidDType` (``V`` character code). The ``StrDType``, ``BytesDType``, and ``VoidDType`` are *fixed-width* data types. They are parameterized by a width, in either bytes or unicode points, @@ -294,7 +299,8 @@ exceptions, such as when code requires very specific attributes of a scalar or when it checks specifically whether a value is a Python scalar. Generally, problems are easily fixed by explicitly converting array scalars to Python scalars, using the corresponding Python type function -(e.g., ``int``, ``float``, ``complex``, ``str``, ``unicode``). 
+(e.g., :class:`int`, :class:`float`, :class:`complex`, :class:`str`, +:class:`unicode`). The primary advantage of using array scalars is that they preserve the array type (Python may not have a matching scalar type @@ -320,9 +326,9 @@ but gives -1486618624 (incorrect) for a 32-bit integer. The behaviour of NumPy and Python integer types differs significantly for integer overflows and may confuse users expecting NumPy integers to behave -similar to Python's ``int``. Unlike NumPy, the size of Python's ``int`` is -flexible. This means Python integers may expand to accommodate any integer and -will not overflow. +similar to Python's :class:`int`. Unlike NumPy, the size of Python's +:class:`int` is flexible. This means Python integers may expand to accommodate +any integer and will not overflow. NumPy provides `numpy.iinfo` and `numpy.finfo` to verify the minimum or maximum values of NumPy integer and floating point values @@ -348,14 +354,14 @@ Extended precision ================== Python's floating-point numbers are usually 64-bit floating-point numbers, -nearly equivalent to ``np.float64``. In some unusual situations it may be +nearly equivalent to `numpy.float64`. In some unusual situations it may be useful to use floating-point numbers with more precision. Whether this is possible in numpy depends on the hardware and on the development environment: specifically, x86 machines provide hardware floating-point with 80-bit precision, and while most C compilers provide this as their ``long double`` type, MSVC (standard for Windows builds) makes ``long double`` identical to ``double`` (64 bits). NumPy makes the -compiler's ``long double`` available as ``np.longdouble`` (and +compiler's ``long double`` available as `numpy.longdouble` (and ``np.clongdouble`` for the complex numbers). You can find out what your numpy provides with ``np.finfo(np.longdouble)``. 
@@ -363,7 +369,7 @@ NumPy does not provide a dtype with more precision than C's ``long double``; in particular, the 128-bit IEEE quad precision data type (FORTRAN's ``REAL*16``) is not available. -For efficient memory alignment, ``np.longdouble`` is usually stored +For efficient memory alignment, `numpy.longdouble` is usually stored padded with zero bits, either to 96 or 128 bits. Which is more efficient depends on hardware and development environment; typically on 32-bit systems they are padded to 96 bits, while on 64-bit systems they are @@ -374,8 +380,8 @@ want specific padding. In spite of the names, ``np.float96`` and that is, 80 bits on most x86 machines and 64 bits in standard Windows builds. -Be warned that even if ``np.longdouble`` offers more precision than -python ``float``, it is easy to lose that extra precision, since +Be warned that even if `numpy.longdouble` offers more precision than +python :class:`float`, it is easy to lose that extra precision, since python often forces values to pass through ``float``. For example, the ``%`` formatting operator requires its arguments to be converted to standard python types, and it is therefore impossible to preserve diff --git a/numpy/dtypes.py b/numpy/dtypes.py index 2c85539a820b..943dde4ad000 100644 --- a/numpy/dtypes.py +++ b/numpy/dtypes.py @@ -1,7 +1,4 @@ """ -DType classes and utility (:mod:`numpy.dtypes`) -=============================================== - This module is home to specific dtypes related functionality and their classes. For more general information about dtypes, also see `numpy.dtype` and :ref:`arrays.dtypes`. 
@@ -49,8 +46,8 @@ * - Complex - ``Complex64DType``, ``Complex128DType``, ``CLongDoubleDType`` - * - Strings - - ``BytesDType``, ``BytesDType`` + * - Strings and Bytestrings + - ``StrDType``, ``BytesDType`` * - Times - ``DateTime64DType``, ``TimeDelta64DType`` From a57325a104b3ca5e7cce3fca46d70ec2d6b46d49 Mon Sep 17 00:00:00 2001 From: Nathan Goldbaum Date: Thu, 28 Dec 2023 14:05:19 -0700 Subject: [PATCH 3/3] DOC: respond to review comments [skip actions] [skip azp] [skip cirrus] --- doc/source/reference/routines.other.rst | 8 --- doc/source/reference/routines.rst | 1 + doc/source/user/basics.types.rst | 69 +++++++++++-------------- 3 files changed, 31 insertions(+), 47 deletions(-) diff --git a/doc/source/reference/routines.other.rst b/doc/source/reference/routines.other.rst index 34148038d8d5..98bcc7001f71 100644 --- a/doc/source/reference/routines.other.rst +++ b/doc/source/reference/routines.other.rst @@ -32,11 +32,3 @@ Utility show_config show_runtime broadcast_shapes - -DType classes and utility (:mod:`numpy.dtypes`) -=============================================== - -.. autosummary:: - :toctree: generated/ - - dtypes diff --git a/doc/source/reference/routines.rst b/doc/source/reference/routines.rst index 9133ed7cb54d..f52eb774acc0 100644 --- a/doc/source/reference/routines.rst +++ b/doc/source/reference/routines.rst @@ -24,6 +24,7 @@ indentation. routines.ctypeslib routines.datetime routines.dtype + routines.dtypes routines.emath routines.err routines.exceptions diff --git a/doc/source/user/basics.types.rst b/doc/source/user/basics.types.rst index 0bed8a1b48fa..2eb4475f70fb 100644 --- a/doc/source/user/basics.types.rst +++ b/doc/source/user/basics.types.rst @@ -12,45 +12,36 @@ Array types and conversions between types NumPy supports a much greater variety of numerical types than Python does. This section shows which are available, and how to modify an array's data-type. 
-NumPy numerical types are instances of `numpy.dtype` (data-type) objects, -each having unique characteristics. Once you have imported NumPy using ``import -numpy as np`` the dtypes are available as e.g. `numpy.bool`, -`numpy.float32`, etc. - -Data-types can be used as functions to convert python numbers to array scalars -(see the array scalar section for an explanation), python sequences of numbers -to arrays of that type, or as arguments to the dtype keyword that many numpy -functions or methods accept. Some examples:: - - >>> x = np.float32(1.0) - >>> x - 1.0 - >>> y = np.int_([1,2,4]) - >>> y - array([1, 2, 4]) +NumPy numerical types are instances of `numpy.dtype` (data-type) objects, each +having unique characteristics. Once you have imported NumPy using ``import +numpy as np`` you can create arrays with a specified dtype using the scalar +types in the numpy top-level API, e.g. `numpy.bool` or `numpy.float32`. + +These scalar types can be used as arguments to the dtype keyword that many numpy +functions or methods accept. For example:: + >>> z = np.arange(3, dtype=np.uint8) >>> z array([0, 1, 2], dtype=uint8) -Array types can also be referred to by character codes, mostly to retain -backward compatibility with older packages such as Numeric, but also for -defining :ref:`structured data types <structured_arrays>`. Some -documentation may refer to these character codes, for example:: +Array types can also be referred to by character codes, for example:: >>> np.array([1, 2, 3], dtype='f') array([1., 2., 3.], dtype=float32) + >>> np.array([1, 2, 3], dtype='d') + array([1., 2., 3.]) -We recommend using dtype objects instead. +See :ref:`arrays.dtypes.constructing` for more information about specifying and +constructing data type objects, including how to specify parameters like the +byte order. -To convert the type of an array, use the .astype() method (preferred) or -the type itself as a function. For example: :: +To convert the type of an array, use the .astype() method. 
For example: :: - - >>> z.astype(float) #doctest: +NORMALIZE_WHITESPACE + >>> z.astype(np.float64) #doctest: +NORMALIZE_WHITESPACE array([0., 1., 2.]) - >>> np.int8(z) - array([0, 1, 2], dtype=int8) -Note that, above, we use the *Python* float object as a dtype. NumPy knows that +Note that, above, we could have used the *Python* float object as a dtype +instead of `numpy.float64`. NumPy knows that :class:`int` refers to `numpy.int_`, :class:`bool` means `numpy.bool`, that :class:`float` is `numpy.float64` and :class:`complex` is `numpy.complex128`. The other data-types do not have @@ -65,9 +56,9 @@ dtype objects also contain information about the type, such as its bit-width and its byte-order. The data type can also be used indirectly to query properties of the type, such as whether it is an integer:: - >>> d = np.dtype(int) - >>> d #doctest: +SKIP - dtype('int32') + >>> d = np.dtype(np.int64) + >>> d + dtype('int64') >>> np.issubdtype(d, np.integer) True @@ -93,15 +84,15 @@ Data Types for Strings and Bytes -------------------------------- In addition to numerical types, NumPy also supports storing unicode strings, via -`numpy.dtypes.StrDType` (``U`` character code), null-terminated byte sequences via -`numpy.dtypes.BytesDType` (``S`` character code), and arbitrary byte sequences, via -`numpy.dtypes.VoidDType` (``V`` character code). - -The ``StrDType``, ``BytesDType``, and ``VoidDType`` are *fixed-width* data -types. They are parameterized by a width, in either bytes or unicode points, -that a single data element in the array must fit inside. This means that storing -an array of byte sequences or strings using this dtype requires knowing or -calculating the sizes of the longest text or byte sequence in advance. +the `numpy.str_` dtype (``U`` character code), null-terminated byte sequences via +`numpy.bytes_` (``S`` character code), and arbitrary byte sequences, via +`numpy.void` (``V`` character code). + +All of the above are *fixed-width* data types. 
They are parameterized by a +width, in either bytes or unicode points, that a single data element in the +array must fit inside. This means that storing an array of byte sequences or +strings using this dtype requires knowing or calculating the sizes of the +longest text or byte sequence in advance. As an example, we can create an array storing the words ``"hello"`` and ``"world!"``::