BUG: hashing datetime64 objects #50960

Closed · wants to merge 64 commits

Changes from 54 commits (of 64)

Commits
7761ecd
BUG: hashing datetime64 objects
jbrockmendel Jan 24, 2023
610b0c6
handle cases out of pydatetime bounds
jbrockmendel Jan 24, 2023
0e140ae
Merge branch 'main' into bug-hash-dt64
jbrockmendel Jan 24, 2023
919383c
Merge branch 'main' into bug-hash-dt64
jbrockmendel Jan 24, 2023
ae8c0bb
Merge branch 'main' into bug-hash-dt64
jbrockmendel Jan 25, 2023
92a39eb
troubleshoot CI builds
jbrockmendel Jan 25, 2023
2f67805
troubleshoot CI builds
jbrockmendel Jan 25, 2023
0635f86
troubleshoot CI builds
jbrockmendel Jan 25, 2023
229ab72
troubleshoot CI builds
jbrockmendel Jan 25, 2023
6e96805
troubleshoot CI builds
jbrockmendel Jan 25, 2023
24fda82
Merge branch 'main' into bug-hash-dt64
jbrockmendel Jan 26, 2023
058b666
suggested edits
jbrockmendel Jan 27, 2023
7398991
Merge branch 'main' into bug-hash-dt64
jbrockmendel Jan 31, 2023
3fdf564
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 8, 2023
6e4836e
use sebergs suggestion
jbrockmendel Feb 8, 2023
f55337a
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 9, 2023
a97dfc9
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 9, 2023
1338ca2
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 9, 2023
9fb1987
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 10, 2023
818682c
suggested edits
jbrockmendel Feb 10, 2023
74ab540
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 10, 2023
037ba05
remove unnecessary casts
jbrockmendel Feb 11, 2023
7a4b1ab
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 11, 2023
47e5247
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 11, 2023
dcd09dd
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 11, 2023
32d479b
Merge branch 'bug-hash-dt64' of github.com:jbrockmendel/pandas into b…
jbrockmendel Feb 11, 2023
d47cfd8
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 12, 2023
c091317
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 12, 2023
1ce791e
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 13, 2023
704fb69
suggested edit for PyDateTime_IMPORT
jbrockmendel Feb 13, 2023
6d962b0
Merge branch 'bug-hash-dt64' of github.com:jbrockmendel/pandas into b…
jbrockmendel Feb 13, 2023
f838953
revert delay
jbrockmendel Feb 13, 2023
0d8500a
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 13, 2023
58a29b6
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 14, 2023
95069e0
restore check
jbrockmendel Feb 14, 2023
7362f3e
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 14, 2023
dd08670
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 15, 2023
998a4cc
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 15, 2023
b75730b
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 17, 2023
5c57a5e
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 20, 2023
6b4460f
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 21, 2023
c94609b
add test
jbrockmendel Feb 22, 2023
afe9493
Merge branch 'main' into bug-hash-dt64
jbrockmendel Feb 26, 2023
4fecc97
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 3, 2023
b4cc41e
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 8, 2023
c620339
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 9, 2023
3633653
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 11, 2023
c55f182
shot in the dark
jbrockmendel Mar 11, 2023
6e2bbf0
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 12, 2023
23c2826
capsule stuff
jbrockmendel Mar 12, 2023
143b3a3
guessing
jbrockmendel Mar 13, 2023
ffb8365
still tryin
jbrockmendel Mar 13, 2023
1fdfd64
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 13, 2023
5513721
macro
jbrockmendel Mar 13, 2023
875d6af
revert sources
jbrockmendel Mar 13, 2023
a29a56a
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 13, 2023
40e6e17
Move np_datetime64_object_hash to np_datetime.c
jbrockmendel Mar 13, 2023
15a701c
import_pandas_datetime more
jbrockmendel Mar 13, 2023
af25f40
troubleshoot
jbrockmendel Mar 13, 2023
9d5cb46
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 14, 2023
d5a031d
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 14, 2023
bd7d432
post-merge merges
jbrockmendel Mar 14, 2023
394d86e
frickin guess
jbrockmendel Mar 14, 2023
1766bc3
Merge branch 'main' into bug-hash-dt64
jbrockmendel Mar 20, 2023
58 changes: 55 additions & 3 deletions pandas/_libs/src/klib/khash_python.h
@@ -7,7 +7,10 @@
typedef npy_complex64 khcomplex64_t;
typedef npy_complex128 khcomplex128_t;

// get pandas_datetime_to_datetimestruct
#include <../../tslibs/src/datetime/np_datetime.h>

#include "datetime.h"

// khash should report usage to tracemalloc
#if PY_VERSION_HEX >= 0x03060000
@@ -305,6 +308,7 @@ khuint32_t PANDAS_INLINE kh_python_hash_func(PyObject* key);
#define _PandasHASH_XXROTATE(x) ((x << 13) | (x >> 19)) /* Rotate left 13 bits */
#endif


Py_hash_t PANDAS_INLINE tupleobject_hash(PyTupleObject* key) {
Py_ssize_t i, len = Py_SIZE(key);
PyObject **item = key->ob_item;
@@ -315,9 +319,7 @@ Py_hash_t PANDAS_INLINE tupleobject_hash(PyTupleObject* key) {
      if (lane == (Py_uhash_t)-1) {
          return -1;
      }
-     acc += lane * _PandasHASH_XXPRIME_2;
-     acc = _PandasHASH_XXROTATE(acc);
-     acc *= _PandasHASH_XXPRIME_1;
+     acc = tuple_update_uhash(acc, lane);
Contributor: Ah, missed that you removed this here. If we duplicate the code, maybe just don't change it here?
}

/* Add input length, mangled to keep the historical value of hash(()). */
@@ -330,6 +332,52 @@ Py_hash_t PANDAS_INLINE tupleobject_hash(PyTupleObject* key) {
}


// TODO: same thing for timedelta64 objects
Py_hash_t np_datetime64_object_hash(PyDatetimeScalarObject* key) {
Member: I think you can move this to the capsule. It would fit much more naturally there than the khash header.

Member: The capsule also already does the PyDateTime_IMPORT, which I think will help your remaining issue.
// GH#50690 numpy's hash implementation does not preserve comparability
// either across resolutions or with standard library objects.
// See also Timestamp.__hash__

NPY_DATETIMEUNIT unit = (NPY_DATETIMEUNIT)key->obmeta.base;
npy_datetime value = key->obval;
npy_datetimestruct dts;

if (value == NPY_DATETIME_NAT) {
// np.datetime64("NaT") in any reso
return NPY_DATETIME_NAT;
}

pandas_datetime_to_datetimestruct(value, unit, &dts);

if ((dts.year > 0) && (dts.year <= 9999) && (dts.ps == 0) && (dts.as == 0)) {
// we CAN cast to pydatetime, so use that hash to ensure we compare
// as matching standard library datetimes (and pd.Timestamps)
if (PyDateTimeAPI == NULL) {
/* delayed import, may be nice to move to import time */
PyDateTime_IMPORT;
if (PyDateTimeAPI == NULL) {
return -1;
}
}

PyObject* dt;
Py_hash_t hash;

dt = PyDateTime_FromDateAndTime(
dts.year, dts.month, dts.day, dts.hour, dts.min, dts.sec, dts.us
);
if (dt == NULL) {
return -1;
}
hash = PyObject_Hash(dt);
Py_DECREF(dt);
return hash;
}

return hash_datetime_from_struct(&dts);
}


khuint32_t PANDAS_INLINE kh_python_hash_func(PyObject* key) {
Py_hash_t hash;
// For PyObject_Hash holds:
@@ -351,6 +399,10 @@ khuint32_t PANDAS_INLINE kh_python_hash_func(PyObject* key) {
else if (PyTuple_CheckExact(key)) {
hash = tupleobject_hash((PyTupleObject*)key);
}
else if (PyObject_TypeCheck(key, &PyDatetimeArrType_Type)) {
// GH#50690
hash = np_datetime64_object_hash((PyDatetimeScalarObject *)key);
}
else {
hash = PyObject_Hash(key);
}
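A minimal Python sketch of the dispatch in np_datetime64_object_hash above, assuming the same three branches (NaT sentinel, stdlib-datetime hash for values that fit datetime.datetime, struct-mixing fallback). The helper name and the sentinel constant are illustrative, not pandas API.

from datetime import datetime

import numpy as np

NAT_SENTINEL = -(2**63)  # mirrors NPY_DATETIME_NAT in the C code


def sketch_datetime64_hash(value: np.datetime64) -> int:
    if np.isnat(value):
        # np.datetime64("NaT") in any resolution
        return NAT_SENTINEL
    try:
        dt = value.astype("datetime64[us]").item()
    except (OverflowError, ValueError):
        dt = None
    if isinstance(dt, datetime) and np.datetime64(dt) == value:
        # Castable to pydatetime with no sub-microsecond part: reuse the
        # stdlib hash so equal datetime / Timestamp objects land in the
        # same bucket.
        return hash(dt)
    raise NotImplementedError("struct-mixing fallback; see np_datetime.c below")


ms, us = np.datetime64(1, "ms"), np.datetime64(1000, "us")
assert ms == us
assert sketch_datetime64_hash(ms) == sketch_datetime64_hash(us)
assert sketch_datetime64_hash(ms) == hash(datetime(1970, 1, 1, 0, 0, 0, 1000))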
2 changes: 2 additions & 0 deletions pandas/_libs/tslibs/np_datetime.pxd
@@ -72,6 +72,8 @@ cdef extern from "src/datetime/pd_datetime.h":
pandas_timedeltastruct *result
) nogil

Py_hash_t hash_datetime_from_struct(npy_datetimestruct* dts) except? -1

void PandasDateTime_IMPORT()

ctypedef enum FormatRequirement:
53 changes: 53 additions & 0 deletions pandas/_libs/tslibs/src/datetime/np_datetime.c
@@ -1033,3 +1033,56 @@ PyArray_DatetimeMetaData
get_datetime_metadata_from_dtype(PyArray_Descr *dtype) {
return (((PyArray_DatetimeDTypeMetaData *)dtype->c_metadata)->meta);
}


// we could use any hashing algorithm, this is the original CPython's for tuples
Member: We have these same defines in khash_python.h already. Not sure about duplicating here versus refactoring.

Member (author): Yeah, I moved some of it rather than duplicating; likely could go further down that path. I'd like to get this in soonish, so I'd prefer to do bigger refactors separately.

Member (WillAyd, Feb 10, 2023): I think with C it's generally really hard to track down the effects of copy/pasting defines across headers and implementations, though. Maybe we just create a cpython_hash.h file that khash and np can include?

Contributor: Moving it into its own file seems nice, probably. I wouldn't care too much, since this is only a problem if you include the other header here (and even then it might not be, since it matches). On a general note, it might be good to add include guards to your headers:

#ifndef PD_LIBS_..._CPYTHON_HASH_H
#define PD_LIBS_..._CPYTHON_HASH_H

<header content>

#endif  /* PD_LIBS_..._CPYTHON_HASH_H */

(Some pattern like that; the Google style guide suggests full paths, and at least a partial path would make sense.)

#if SIZEOF_PY_UHASH_T > 4
#define _PandasHASH_XXPRIME_1 ((Py_uhash_t)11400714785074694791ULL)
#define _PandasHASH_XXPRIME_2 ((Py_uhash_t)14029467366897019727ULL)
#define _PandasHASH_XXPRIME_5 ((Py_uhash_t)2870177450012600261ULL)
#define _PandasHASH_XXROTATE(x) ((x << 31) | (x >> 33)) /* Rotate left 31 bits */
#else
#define _PandasHASH_XXPRIME_1 ((Py_uhash_t)2654435761UL)
#define _PandasHASH_XXPRIME_2 ((Py_uhash_t)2246822519UL)
#define _PandasHASH_XXPRIME_5 ((Py_uhash_t)374761393UL)
#define _PandasHASH_XXROTATE(x) ((x << 13) | (x >> 19)) /* Rotate left 13 bits */
#endif


Py_uhash_t tuple_update_uhash(Py_uhash_t acc, Py_uhash_t lane) {
Contributor (suggested change):

- Py_uhash_t tuple_update_uhash(Py_uhash_t acc, Py_uhash_t lane) {
+ static inline Py_uhash_t tuple_update_uhash(Py_uhash_t acc, Py_uhash_t lane) {

static inline should be the same as PANDAS_INLINE: inline used to not exist, but you can now rely on it, so PANDAS_INLINE can basically be blanket-replaced with static inline. However, you can of course also just use static or PANDAS_INLINE here. You should add one of these (i.e. static should be there) to ensure that this is private to this file.

Member: In other PRs I've asked us to move away from specifying inline explicitly, instead allowing the compiler to choose for us. Is that the wrong approach?

Contributor: IIRC it's mainly a compiler hint, but I am not sure where it matters in many places.

Member (WillAyd, Feb 10, 2023): Yea, on second read this might be a place where it actually could inline. I thought it was using the Python runtime at first, but I see it doesn't now, so it might have a chance. gcc has -Winline to tell you when the hint is ignored; it would be interesting to see that on this.

acc += lane * _PandasHASH_XXPRIME_2;
acc = _PandasHASH_XXROTATE(acc);
acc *= _PandasHASH_XXPRIME_1;
return acc;
}

// https://github.com/pandas-dev/pandas/pull/50960
Py_hash_t
hash_datetime_from_struct(npy_datetimestruct* dts) {
/*
* If we cannot cast to datetime, use the datetime struct values directly
* and mix them similar to a tuple.
*/

Py_uhash_t acc = _PandasHASH_XXPRIME_5;
#if 64 <= SIZEOF_PY_UHASH_T
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->year);
#else
/* Mix lower and upper bits of the year if int64 is larger */
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->year);
acc = tuple_update_uhash(acc, (Py_uhash_t)(dts->year >> SIZEOF_PY_UHASH_T));
#endif
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->month);
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->day);
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->min);
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->sec);
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->us);
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->ps);
acc = tuple_update_uhash(acc, (Py_uhash_t)dts->as);
Contributor (on lines +1080 to +1083): We could bunch everything below second (or even minute or more?) into a single 64-bit number (then, unfortunately, the same trick to split it up if necessary on the platform). Not sure that is worthwhile, probably a tiny bit faster, but I am mainly wondering if it might generalize a bit nicer if we use a simpler scheme that doesn't require the full datetime struct in principle.

/* should be no need to mix length, as it is fixed anyway? */
if (acc == (Py_uhash_t)-1) {
acc = (Py_uhash_t)-2;
}
return acc;
}
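A pure-Python mirror of the xxhash-style mixing used by tuple_update_uhash and hash_datetime_from_struct above, assuming a 64-bit Py_uhash_t. The names are reused for readability, but this is an illustration, not the C implementation.

XXPRIME_1 = 11400714785074694791
XXPRIME_2 = 14029467366897019727
XXPRIME_5 = 2870177450012600261
UHASH_MASK = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


def tuple_update_uhash(acc: int, lane: int) -> int:
    acc = (acc + lane * XXPRIME_2) & UHASH_MASK
    acc = ((acc << 31) | (acc >> 33)) & UHASH_MASK  # rotate left 31 bits
    return (acc * XXPRIME_1) & UHASH_MASK


def hash_struct_fields(*fields: int) -> int:
    acc = XXPRIME_5
    for lane in fields:
        acc = tuple_update_uhash(acc, lane & UHASH_MASK)
    # (Py_uhash_t)-1 is reserved for errors, so remap it as the C code does.
    return acc if acc != UHASH_MASK else UHASH_MASK - 1


# Feed the npy_datetimestruct fields in the same order as the C code above.
print(hex(hash_struct_fields(2016, 1, 1, 0, 0, 0, 0, 0)))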
3 changes: 3 additions & 0 deletions pandas/_libs/tslibs/src/datetime/np_datetime.h
@@ -116,4 +116,7 @@ PyArray_DatetimeMetaData get_datetime_metadata_from_dtype(
PyArray_Descr *dtype);


Py_hash_t hash_datetime_from_struct(npy_datetimestruct* dts);
Py_uhash_t tuple_update_uhash(Py_uhash_t acc, Py_uhash_t lane);
Contributor (suggested change: delete this line):
- Py_uhash_t tuple_update_uhash(Py_uhash_t acc, Py_uhash_t lane);
No need to make this public.


#endif // PANDAS__LIBS_TSLIBS_SRC_DATETIME_NP_DATETIME_H_
1 change: 1 addition & 0 deletions pandas/_libs/tslibs/src/datetime/pd_datetime.c
@@ -53,6 +53,7 @@ static int pandas_datetime_exec(PyObject *module) {
capi->get_datetime_iso_8601_strlen = get_datetime_iso_8601_strlen;
capi->make_iso_8601_datetime = make_iso_8601_datetime;
capi->make_iso_8601_timedelta = make_iso_8601_timedelta;
capi->hash_datetime_from_struct = hash_datetime_from_struct;
Member: You also need to define a macro for this later in the file.

Member (author):

#define hash_datetime_from_struct(dts)                                         \
  PandasDateTimeAPI->hash_datetime_from_struct((dts))

didn't seem to do it.

PyObject *capsule = PyCapsule_New(capi, PandasDateTime_CAPSULE_NAME,
pandas_datetime_destructor);
3 changes: 3 additions & 0 deletions pandas/_libs/tslibs/src/datetime/pd_datetime.h
@@ -55,6 +55,7 @@ typedef struct {
int (*make_iso_8601_datetime)(npy_datetimestruct *, char *, int, int,
NPY_DATETIMEUNIT);
int (*make_iso_8601_timedelta)(pandas_timedeltastruct *, char *, size_t *);
Py_hash_t (*hash_datetime_from_struct)(npy_datetimestruct* dts);
} PandasDateTime_CAPI;

// The capsule name appears limited to module.attributename; see bpo-32414
@@ -107,6 +108,8 @@ static PandasDateTime_CAPI *PandasDateTimeAPI = NULL;
(base))
#define make_iso_8601_timedelta(tds, outstr, outlen) \
PandasDateTimeAPI->make_iso_8601_timedelta((tds), (outstr), (outlen))
#define hash_datetime_from_struct(dts) \
PandasDateTimeAPI->hash_datetime_from_struct((dts))
#endif /* !defined(_PANDAS_DATETIME_IMPL) */

#ifdef __cplusplus
10 changes: 6 additions & 4 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -88,6 +88,7 @@ from pandas._libs.tslibs.np_datetime cimport (
get_datetime64_unit,
get_datetime64_value,
get_unit_from_dtype,
hash_datetime_from_struct,
import_pandas_datetime,
npy_datetimestruct,
npy_datetimestruct_to_datetime,
@@ -311,11 +312,12 @@ cdef class _Timestamp(ABCTimestamp):
# -----------------------------------------------------------------

  def __hash__(_Timestamp self):
-     if self.nanosecond:
-         return hash(self._value)
-     if not (1 <= self.year <= 9999):
+     cdef:
+         npy_datetimestruct dts
+     if not (1 <= self.year <= 9999) or self.nanosecond:
          # out of bounds for pydatetime
-         return hash(self._value)
+         pydatetime_to_dtstruct(self, &dts)
+         return hash_datetime_from_struct(&dts)
      if self.fold:
          return datetime.__hash__(self.replace(fold=0))
      return datetime.__hash__(self)
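A usage sketch of the invariant this __hash__ keeps: Timestamp subclasses datetime, and equal objects must hash equal, so in-bounds, nanosecond-free values defer to datetime.__hash__, while nanosecond or out-of-bounds values now go through hash_datetime_from_struct instead of hash(self._value). The example values are illustrative.

from datetime import datetime

import pandas as pd

ts = pd.Timestamp("2016-01-01 12:30:45")
dt = datetime(2016, 1, 1, 12, 30, 45)
assert ts == dt
assert hash(ts) == hash(dt)  # needed for ts and dt to be interchangeable dict keys

ts_ns = pd.Timestamp("2016-01-01 12:30:45.123456789")
assert ts_ns.nanosecond == 789  # cannot round-trip through datetime.datetime,
hash(ts_ns)                     # so this value takes the struct-based branch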
26 changes: 26 additions & 0 deletions pandas/tests/indexes/object/test_indexing.py
@@ -187,3 +187,29 @@ def test_slice_locs_dup(self):
assert index2.slice_locs(end="a") == (0, 6)
assert index2.slice_locs("d", "b") == (0, 4)
assert index2.slice_locs("c", "a") == (2, 6)


def test_np_datetime64_objects():
# GH#50690
ms = np.datetime64(1, "ms")
us = np.datetime64(1000, "us")

left = Index([ms], dtype=object)
right = Index([us], dtype=object)

assert left[0] in right
assert right[0] in left

assert left.get_loc(right[0]) == 0
assert right.get_loc(left[0]) == 0

# non-monotonic cases go through different paths in cython code
sec = np.datetime64("9999-01-01", "s")
day = np.datetime64("2016-01-01", "D")
left2 = Index([ms, sec, day], dtype=object)

expected = np.array([0], dtype=np.intp)
res = left2[:1].get_indexer(right)
tm.assert_numpy_array_equal(res, expected)
res = left2.get_indexer(right)
tm.assert_numpy_array_equal(res, expected)
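The membership and get_loc checks above only work if equal keys also hash equally. Below is a self-contained illustration of the failure mode from GH#50690, using a hypothetical Bad wrapper rather than pandas internals:

class Bad:
    """Equal values in different resolutions, hashed inconsistently."""

    def __init__(self, value: int, resolution: str) -> None:
        self.value = value
        self.resolution = resolution

    def canonical(self) -> int:
        # Normalize to nanoseconds, e.g. Bad(1, "ms") == Bad(1000, "us").
        return self.value * {"ms": 1_000_000, "us": 1_000}[self.resolution]

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Bad) and self.canonical() == other.canonical()

    def __hash__(self) -> int:
        # BUG: ignores the eq/hash contract, like mismatched-unit datetime64
        return hash((self.value, self.resolution))


ms, us = Bad(1, "ms"), Bad(1000, "us")
assert ms == us                 # the keys compare equal ...
assert hash(ms) != hash(us)     # ... but hash differently,
assert us not in {ms}           # so hash-table membership silently misses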
17 changes: 17 additions & 0 deletions pandas/tests/test_algos.py
@@ -1151,6 +1151,15 @@ def test_isin_unsigned_dtype(self):


class TestValueCounts:
def test_value_counts_datetime64_mismatched_units(self):
# GH#50960 np.datetime64 objects with different units that are still equal
arr = np.array(
[np.datetime64(1, "ms"), np.datetime64(1000, "us")], dtype=object
)
res = algos.value_counts(arr)
expected = Series([2], index=arr[:1], name="count")
tm.assert_series_equal(res, expected)

def test_value_counts(self):
np.random.seed(1234)
from pandas.core.reshape.tile import cut
@@ -1607,6 +1616,14 @@ def test_unique_complex_numbers(self, array, expected):
result = pd.unique(array)
tm.assert_numpy_array_equal(result, expected)

def test_unique_datetime64_mismatched_units(self):
# GH#50960 np.datetime64 objects with different units that are still equal
arr = np.array(
[np.datetime64(1, "ms"), np.datetime64(1000, "us")], dtype=object
)
res = pd.unique(arr)
tm.assert_numpy_array_equal(res, arr[:1])


class TestHashTable:
@pytest.mark.parametrize(
8 changes: 5 additions & 3 deletions setup.py
@@ -445,7 +445,7 @@ def srcpath(name=None, suffix=".pyx", subdir="src"):
"_libs.algos": {
"pyxfile": "_libs/algos",
"include": klib_include,
"depends": _pxi_dep["algos"],
"depends": _pxi_dep["algos"] + tseries_depends,
},
"_libs.arrays": {"pyxfile": "_libs/arrays"},
"_libs.groupby": {"pyxfile": "_libs/groupby"},
@@ -456,19 +456,21 @@ def srcpath(name=None, suffix=".pyx", subdir="src"):
"depends": (
["pandas/_libs/src/klib/khash_python.h", "pandas/_libs/src/klib/khash.h"]
+ _pxi_dep["hashtable"]
+ tseries_depends
),
},
"_libs.index": {
"pyxfile": "_libs/index",
"include": klib_include,
"depends": _pxi_dep["index"],
"depends": _pxi_dep["index"] + tseries_depends,
},
"_libs.indexing": {"pyxfile": "_libs/indexing"},
"_libs.internals": {"pyxfile": "_libs/internals"},
"_libs.interval": {
"pyxfile": "_libs/interval",
"include": klib_include,
"depends": _pxi_dep["interval"],
"depends": _pxi_dep["interval"] + tseries_depends,
"sources": ["pandas/_libs/tslibs/src/datetime/np_datetime.c"],
Member: You shouldn't add any new sources arguments to setup.py; this is what causes undefined symbols during parallel compilation with setuptools.

Member (author): Reverted that, and now I'm getting a different failure.

Member: Yea, I think you also need to move the implementation out of the header file into the capsule to resolve that.
},
"_libs.join": {"pyxfile": "_libs/join", "include": klib_include},
"_libs.lib": {