Numpy types support #332

Merged on Aug 5, 2023
26 commits
15fbcd6 feat: add numpy dumpers for int, float, bool types (vertefra, Jul 5, 2022)
0a48098 test(numpy): fix skipping tests if numpy module is not installed (dvarrazzo, Dec 16, 2022)
9387e91 refactor(numpy): use existing numeric base classes (dvarrazzo, Dec 16, 2022)
5a10126 test(numpy) fix tests with approximative comparisons (dvarrazzo, Dec 16, 2022)
17fa4ce refactor(numpy): use builtin bool dumpers for numpy booleans too (dvarrazzo, Dec 16, 2022)
08be8c5 refactor(numpy): drop aliases for dumpers (dvarrazzo, Dec 16, 2022)
91137e5 refactor(numpy): reuse base or final classes from builtin numeric types (dvarrazzo, Dec 16, 2022)
d04b713 feat(numpy): add longlong dumpers (dvarrazzo, Dec 16, 2022)
7c297ae test(faker): don't crash if a lazy-import dumper class is not available (dvarrazzo, Dec 16, 2022)
fd5e175 test(numpy): consolidate all numpy int tests in a single parametrized… (dvarrazzo, Dec 16, 2022)
ab6ec9d test(numpy): add random tests with numpy objects (dvarrazzo, Dec 16, 2022)
a74c66e test(numpy): consolidate numpy float tests (dvarrazzo, Dec 16, 2022)
7bd6330 fix(numpy): fix dumpers registration order (dvarrazzo, Dec 17, 2022)
b842daa test(numpy): allow more approximation comparing float16 values (dvarrazzo, Dec 17, 2022)
232ae28 test(numpy): add dump tests with list of numpy values (dvarrazzo, Dec 19, 2022)
11d7666 test(numpy): drop test with deprecated alias (dvarrazzo, Dec 19, 2022)
c576c84 test(numpy) avoid overflow testing with int16 (dvarrazzo, Dec 19, 2022)
016a9bb refactor(c): add _IntOrSubclass dumper to the C dumpers too (dvarrazzo, Jan 6, 2023)
1aa47d5 docs: improve comment about why we need to register builtin numerics … (dvarrazzo, Jan 6, 2023)
e1f9420 feat: add C numpy dumpers (dvarrazzo, Jan 6, 2023)
157a2e2 docs: add news entry and docs about numpy scalars support (dvarrazzo, Jan 9, 2023)
73ab14d refactor(numpy): get OIDs from the _oid module (dvarrazzo, Jan 9, 2023)
15a75f9 fix(numpy): fix dumping numpy values by oid (dvarrazzo, Jan 13, 2023)
40bb620 fix(copy): fix dumping by oid in text mode (dvarrazzo, Jan 13, 2023)
29bb630 fix(c): don't try to mutate Cython type (dvarrazzo, Aug 2, 2023)
92b30b0 feat(crdb): add numpy support (dvarrazzo, Aug 3, 2023)
9 changes: 8 additions & 1 deletion .github/workflows/tests.yml
@@ -42,6 +42,8 @@ jobs:

           - {impl: python, python: "3.9", ext: dns, postgres: "postgres:14"}
           - {impl: python, python: "3.9", ext: postgis, postgres: "postgis/postgis"}
+          - {impl: python, python: "3.10", ext: numpy, postgres: "postgres:14"}
+          - {impl: c, python: "3.11", ext: numpy, postgres: "postgres:15"}

           # Test with minimum dependencies versions
           - {impl: c, python: "3.7", ext: min, postgres: "postgres:15"}
@@ -88,10 +90,15 @@ jobs:
           echo "DEPS=$DEPS shapely" >> $GITHUB_ENV
           echo "MARKERS=$MARKERS postgis" >> $GITHUB_ENV

+      - if: ${{ matrix.ext == 'numpy' }}
+        run: |
+          echo "DEPS=$DEPS numpy" >> $GITHUB_ENV
+          echo "MARKERS=$MARKERS numpy" >> $GITHUB_ENV
+
       - name: Configure to use the oldest dependencies
         if: ${{ matrix.ext == 'min' }}
         run: |
-          echo "DEPS=$DEPS dnspython shapely" >> $GITHUB_ENV
+          echo "DEPS=$DEPS dnspython shapely numpy" >> $GITHUB_ENV
           echo "PIP_CONSTRAINT=${{ github.workspace }}/tests/constraints.txt" \
             >> $GITHUB_ENV

10 changes: 10 additions & 0 deletions docs/basic/adapt.rst
@@ -34,6 +34,9 @@ Python `bool` values `!True` and `!False` are converted to the equivalent

 .. __: https://www.postgresql.org/docs/current/datatype-boolean.html

+.. versionchanged:: 3.2
+    `numpy.bool_` values can be dumped too.
+

 .. index::
     single: Adaptation; numbers
@@ -73,6 +76,13 @@ promoted to the larger Python counterpart.
     an adapter to :ref:`cast PostgreSQL numeric to Python float
     <adapt-example-float>`. This of course may imply a loss of precision.

+.. versionchanged:: 3.2
+
+    NumPy integer__ and `floating point`__ values can be dumped too.
+
+    .. __: https://numpy.org/doc/stable/reference/arrays.scalars.html#integer-types
+    .. __: https://numpy.org/doc/stable/reference/arrays.scalars.html#floating-point-types
+

 .. index::
     pair: Strings; Adaptation
1 change: 1 addition & 0 deletions docs/conf.py
@@ -100,6 +100,7 @@
 intersphinx_mapping = {
     "py": ("https://docs.python.org/3", None),
     "pg2": ("https://www.psycopg.org/docs/", None),
+    "numpy": ("https://numpy.org/doc/stable/", None),
 }

 autodoc_member_order = "bysource"
4 changes: 4 additions & 0 deletions docs/news.rst
@@ -13,6 +13,8 @@ Future releases
 Psycopg 3.2 (unreleased)
 ^^^^^^^^^^^^^^^^^^^^^^^^

+- Add support for integer, floating point, boolean `NumPy scalar types`__
+  (:ticket:`#332`).
 - Add support for libpq functions to close prepared statements and portals
   introduced in libpq v17 (:ticket:`#603`).
 - Disable receiving more than one result on the same cursor in pipeline mode,
@@ -21,6 +23,8 @@ Psycopg 3.2 (unreleased)
   The `Cursor` now only preserves the results set of the last
   `~Cursor.execute()`, consistently with non-pipeline mode.

+.. __: https://numpy.org/doc/stable/reference/arrays.scalars.html#built-in-scalar-types
+

 Current release
 ---------------
10 changes: 3 additions & 7 deletions psycopg/psycopg/copy.py
@@ -811,13 +811,9 @@ def _format_row_text(
         out += b"\n"
         return out

-    for item in row:
-        if item is not None:
-            dumper = tx.get_dumper(item, PY_TEXT)
-            b = dumper.dump(item)
-            out += _dump_re.sub(_dump_sub, b)
-        else:
-            out += rb"\N"
+    adapted = tx.dump_sequence(row, [PY_TEXT] * len(row))
+    for b in adapted:
+        out += _dump_re.sub(_dump_sub, b) if b is not None else rb"\N"
         out += b"\t"

     out[-1:] = b"\n"
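The hunk above switches from dumping each item individually to dumping the whole row first and then escaping each field. A minimal sketch of the escaping and joining step for PostgreSQL's COPY text format follows. It assumes the fields have already been dumped to `bytes` (or `None` for NULL); the names `format_row_text`, `_ESCAPES` and `_ESCAPE_RE` are hypothetical and are not psycopg's own helpers.

```python
import re
from typing import Optional, Sequence

# Hypothetical escape table: bytes that COPY text format expects to be
# written as backslash sequences.
_ESCAPES = {
    b"\b": b"\\b",
    b"\t": b"\\t",
    b"\n": b"\\n",
    b"\v": b"\\v",
    b"\f": b"\\f",
    b"\r": b"\\r",
    b"\\": b"\\\\",
}
_ESCAPE_RE = re.compile(b"[\b\t\n\v\f\r\\\\]")


def format_row_text(dumped: Sequence[Optional[bytes]]) -> bytes:
    """Join already-dumped fields into one COPY text-format line."""
    out = bytearray()
    if not dumped:
        out += b"\n"
        return bytes(out)

    for b in dumped:
        if b is None:
            out += rb"\N"  # NULL marker
        else:
            out += _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(0)], b)
        out += b"\t"

    out[-1:] = b"\n"  # replace the trailing tab with the row terminator
    return bytes(out)
```

Separating "dump" from "escape and join" is what lets the dumping step work on the row as a whole, as the new `tx.dump_sequence(...)` call does.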
17 changes: 12 additions & 5 deletions psycopg/psycopg/crdb/_types.py
@@ -37,12 +37,12 @@ def register_crdb_adapters(context: AdaptContext) -> None:

     _register_postgres_adapters(context)

-    # String must come after enum to map text oid -> string dumper
+    # String must come after enum and none to map text oid -> string dumper
+    _register_crdb_none_adapters(context)
     _register_crdb_enum_adapters(context)
     _register_crdb_string_adapters(context)
     _register_crdb_json_adapters(context)
     _register_crdb_net_adapters(context)
-    _register_crdb_none_adapters(context)

     dbapi20.register_dbapi20_adapters(adapters)

@@ -53,16 +53,23 @@ def _register_postgres_adapters(context: AdaptContext) -> None:
     # Same adapters used by PostgreSQL, or a good starting point for customization

     from ..types import array, bool, composite, datetime
-    from ..types import numeric, string, uuid
+    from ..types import numeric, numpy, string, uuid

     array.register_default_adapters(context)
-    bool.register_default_adapters(context)
     composite.register_default_adapters(context)
     datetime.register_default_adapters(context)
-    numeric.register_default_adapters(context)
     string.register_default_adapters(context)
     uuid.register_default_adapters(context)
+
+    # Both numpy Decimal and uint64 dumpers use the numeric oid, but the former
+    # covers the entire numeric domain, whereas the latter only deals with
+    # integers. For this reason, if we specify dumpers by oid, we want to make
+    # sure to get the Decimal dumper. We enforce that by registering the
+    # numeric dumpers last.
+    numpy.register_default_adapters(context)
+    bool.register_default_adapters(context)
+    numeric.register_default_adapters(context)


 def _register_crdb_string_adapters(context: AdaptContext) -> None:
     from ..types import string
13 changes: 10 additions & 3 deletions psycopg/psycopg/postgres.py
@@ -106,18 +106,25 @@ def register_default_types(types: TypesRegistry) -> None:

 def register_default_adapters(context: AdaptContext) -> None:
     from .types import array, bool, composite, datetime, enum, json, multirange
-    from .types import net, none, numeric, range, string, uuid
+    from .types import net, none, numeric, numpy, range, string, uuid

     array.register_default_adapters(context)
-    bool.register_default_adapters(context)
     composite.register_default_adapters(context)
     datetime.register_default_adapters(context)
     enum.register_default_adapters(context)
     json.register_default_adapters(context)
     multirange.register_default_adapters(context)
     net.register_default_adapters(context)
     none.register_default_adapters(context)
-    numeric.register_default_adapters(context)
     range.register_default_adapters(context)
     string.register_default_adapters(context)
     uuid.register_default_adapters(context)
+
+    # Both numpy Decimal and uint64 dumpers use the numeric oid, but the former
+    # covers the entire numeric domain, whereas the latter only deals with
+    # integers. For this reason, if we specify dumpers by oid, we want to make
+    # sure to get the Decimal dumper. We enforce that by registering the
+    # numeric dumpers last.
+    numpy.register_default_adapters(context)
+    bool.register_default_adapters(context)
+    numeric.register_default_adapters(context)
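The comment in the two registration functions above relies on ordering: when two dumpers claim the same oid, the one registered later is the one found by an oid lookup. The toy registry below sketches that behaviour under exactly that assumption. `Registry`, `UInt64Dumper` and this `DecimalDumper` are hypothetical stand-ins, not psycopg classes; only the value 1700 (PostgreSQL's oid for `numeric`) is taken from PostgreSQL itself.

```python
from typing import Dict


class Registry:
    """Toy registry: for each oid, the last registered dumper wins."""

    def __init__(self) -> None:
        self._by_oid: Dict[int, type] = {}

    def register(self, dumper: type) -> None:
        # A later registration for the same oid overwrites the earlier one.
        self._by_oid[dumper.oid] = dumper

    def get_by_oid(self, oid: int) -> type:
        return self._by_oid[oid]


NUMERIC_OID = 1700  # PostgreSQL's oid for the numeric type


class UInt64Dumper:  # hypothetical: handles integers only
    oid = NUMERIC_OID


class DecimalDumper:  # hypothetical: covers the whole numeric domain
    oid = NUMERIC_OID


registry = Registry()
registry.register(UInt64Dumper)   # numpy-style dumpers first
registry.register(DecimalDumper)  # builtin numeric dumpers last
```

With this order, `registry.get_by_oid(NUMERIC_OID)` returns the more general `DecimalDumper`, which is the outcome the comment asks for.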
106 changes: 78 additions & 28 deletions psycopg/psycopg/types/numeric.py
@@ -4,9 +4,12 @@

 # Copyright (C) 2020 The Psycopg Team

+import sys
 import struct
+from abc import ABC, abstractmethod
 from math import log
-from typing import Any, Callable, DefaultDict, Dict, Tuple, Union, cast
+from typing import Any, Callable, DefaultDict, Dict, Optional, Tuple, Union
+from typing import cast, TYPE_CHECKING
 from decimal import Decimal, DefaultContext, Context

 from .. import _oids
@@ -30,24 +33,29 @@
     Float8 as Float8,
 )

+if TYPE_CHECKING:
+    import numpy
+

 class _IntDumper(Dumper):
     def dump(self, obj: Any) -> Buffer:
-        t = type(obj)
-        if t is not int:
-            # Convert to int in order to dump IntEnum correctly
-            if issubclass(t, int):
-                obj = int(obj)
-            else:
-                raise e.DataError(f"integer expected, got {type(obj).__name__!r}")
-
         return str(obj).encode()

     def quote(self, obj: Any) -> Buffer:
         value = self.dump(obj)
         return value if obj >= 0 else b" " + value


+class _IntOrSubclassDumper(_IntDumper):
+    def dump(self, obj: Any) -> Buffer:
+        t = type(obj)
+        # Convert to int in order to dump IntEnum or numpy.integer correctly
+        if t is not int:
+            obj = int(obj)
+
+        return str(obj).encode()

Review comment (Contributor):

Suggested change:
-        return str(obj).encode()
+        return super().dump(obj)

Review comment (Member):

TBH I really don't like the need of an isinstance per int dump.

Now that we have Enum dumpers, I would like to try if it's possible to associate a pure int dumper to ints (one that just assumes that str(obj) returns the number representation) and associate a dumper converting to int only to enums.
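
One possible reading of the idea in the comment above, as a minimal sketch: plain `int` values go through a path that trusts `str()`, and only subclasses such as enums pay for the conversion to `int`. The functions `dump_int` and `dump_int_subclass`, the `Color` enum and the `DUMPERS` table are hypothetical illustrations, not psycopg API; the table merely stands in for registering one dumper per class.

```python
from enum import IntEnum


def dump_int(obj: int) -> bytes:
    """Plain-int path: trust str() to give the digits, no type check."""
    return str(obj).encode()


def dump_int_subclass(obj: int) -> bytes:
    """Subclass path (e.g. IntEnum): normalize to int before formatting."""
    return str(int(obj)).encode()


class Color(IntEnum):  # hypothetical enum used only for the example
    RED = 1


# Hypothetical dispatch table keyed by exact type.
DUMPERS = {int: dump_int, Color: dump_int_subclass}


def dump(obj: int) -> bytes:
    return DUMPERS[type(obj)](obj)
```

The conversion matters because `str()` of an `IntEnum` member does not return the digits on every Python version, whereas `str(int(member))` always does.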



 class _SpecialValuesDumper(Dumper):
     _special: Dict[bytes, bytes] = {}

@@ -96,11 +104,7 @@ class DecimalDumper(_SpecialValuesDumper):
     oid = _oids.NUMERIC_OID

     def dump(self, obj: Decimal) -> bytes:
-        if obj.is_nan():
-            # cover NaN and sNaN
-            return b"NaN"
-        else:
-            return str(obj).encode()
+        return dump_decimal_to_text(obj)

     _special = {
         b"Infinity": b"'Infinity'::numeric",
@@ -109,23 +113,23 @@ def dump(self, obj: Decimal) -> bytes:
     }


-class Int2Dumper(_IntDumper):
+class Int2Dumper(_IntOrSubclassDumper):
     oid = _oids.INT2_OID


-class Int4Dumper(_IntDumper):
+class Int4Dumper(_IntOrSubclassDumper):
     oid = _oids.INT4_OID


-class Int8Dumper(_IntDumper):
+class Int8Dumper(_IntOrSubclassDumper):
     oid = _oids.INT8_OID


-class IntNumericDumper(_IntDumper):
+class IntNumericDumper(_IntOrSubclassDumper):
     oid = _oids.NUMERIC_OID


-class OidDumper(_IntDumper):
+class OidDumper(_IntOrSubclassDumper):
     oid = _oids.OID_OID

@@ -350,23 +354,69 @@ def dump(self, obj: Decimal) -> Buffer:
         return dump_decimal_to_numeric_binary(obj)


-class NumericDumper(DecimalDumper):
-    def dump(self, obj: Union[Decimal, int]) -> bytes:
-        if isinstance(obj, int):
+class _MixedNumericDumper(Dumper, ABC):
+    """Base for dumper to dump int, decimal, numpy.integer to Postgres numeric
+
+    Only used when looking up by oid.
+    """
+
+    oid = _oids.NUMERIC_OID
+
+    # If numpy is available, the dumped object might be a numpy integer too
+    int_classes: Union[type, Tuple[type, ...]] = ()
+
+    def __init__(self, cls: type, context: Optional[AdaptContext] = None):
+        super().__init__(cls, context)
+
+        # Verify if numpy is available. If it is, we might have to dump
+        # its integers too.
+        if not _MixedNumericDumper.int_classes:
+            if "numpy" in sys.modules:
+                import numpy
+
+                _MixedNumericDumper.int_classes = (int, numpy.integer)
+            else:
+                _MixedNumericDumper.int_classes = int
+
+    @abstractmethod
+    def dump(self, obj: Union[Decimal, int, "numpy.integer[Any]"]) -> Buffer:
+        ...
+
+
+class NumericDumper(_MixedNumericDumper):
+    def dump(self, obj: Union[Decimal, int, "numpy.integer[Any]"]) -> Buffer:
+        if isinstance(obj, self.int_classes):
             return str(obj).encode()
+        elif isinstance(obj, Decimal):
+            return dump_decimal_to_text(obj)
         else:
-            return super().dump(obj)
+            raise TypeError(
+                f"class {type(self).__name__} cannot dump {type(obj).__name__}"
+            )


-class NumericBinaryDumper(Dumper):
+class NumericBinaryDumper(_MixedNumericDumper):
     format = Format.BINARY
-    oid = _oids.NUMERIC_OID

-    def dump(self, obj: Union[Decimal, int]) -> Buffer:
-        if isinstance(obj, int):
+    def dump(self, obj: Union[Decimal, int, "numpy.integer[Any]"]) -> Buffer:
+        if type(obj) is int:
             return dump_int_to_numeric_binary(obj)
-        else:
+        elif isinstance(obj, Decimal):
             return dump_decimal_to_numeric_binary(obj)
+        elif isinstance(obj, self.int_classes):
+            return dump_int_to_numeric_binary(int(obj))
+        else:
+            raise TypeError(
+                f"class {type(self).__name__} cannot dump {type(obj).__name__}"
+            )
+
+
+def dump_decimal_to_text(obj: Decimal) -> bytes:
+    if obj.is_nan():
+        # cover NaN and sNaN
+        return b"NaN"
+    else:
+        return str(obj).encode()


 def dump_decimal_to_numeric_binary(obj: Decimal) -> Union[bytearray, bytes]:
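
The last hunk combines two techniques: detecting numpy through `sys.modules` so that it is never imported just for this check, and dispatching on the value's type with an explicit `TypeError` for anything unexpected. A standalone sketch of both follows, written as plain functions rather than dumper classes. The names `int_classes` and `dump_numeric_text` are hypothetical; the logic mirrors the diff but is not the library's code.

```python
import sys
from decimal import Decimal
from typing import Any, Tuple


def int_classes() -> Tuple[type, ...]:
    """Classes to treat as integers; numpy's only if numpy is already loaded."""
    if "numpy" in sys.modules:
        import numpy  # already imported elsewhere, so this is cheap

        return (int, numpy.integer)
    return (int,)


def dump_numeric_text(obj: Any) -> bytes:
    """Dump an int-like or Decimal value to PostgreSQL numeric text."""
    if isinstance(obj, int_classes()):
        return str(obj).encode()
    if isinstance(obj, Decimal):
        # Quiet and signaling NaN are both written as plain NaN.
        return b"NaN" if obj.is_nan() else str(obj).encode()
    raise TypeError(f"cannot dump {type(obj).__name__}")
```

Checking `sys.modules` works on the premise that a numpy integer can only reach the dumper if the caller has already imported numpy, so a program that never uses numpy never pays for importing it.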