Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: implement non-nano Timedelta scalar #46688

Merged
merged 17 commits into from
Apr 18, 2022
Merged

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Not user-exposed. This is just enough to allow us to have non-nano TimedeltaArray.

asvs for tslibs.timedelta unchanged

@jbrockmendel
Copy link
Member Author

@WillAyd thoughts on troubleshooting non-Mac build failures at import-time ImportError: [...]/pandas/_libs/tslibs/timedeltas.cpython-38-x86_64-linux-gnu.so: undefined symbol: pandas_datetime_to_datetimestruct

So far i've determined that yelling "but it isn't undefined" at the screen doesn't help.

@WillAyd
Copy link
Member

WillAyd commented Apr 9, 2022

I think you need to modify setup.py so that timedeltas includes the header file np_datetime.h during compilation.

@WillAyd
Copy link
Member

WillAyd commented Apr 9, 2022

Err sorry undefined symbol is a runtime error, so not a matter of include but rather a linkage error. Try adding "sources": ["pandas/_libs/tslibs/src/datetime/np_datetime.c"], in setup.py to the appropriate timedeltas.pyx entry

@jbrockmendel
Copy link
Member Author

OK, adding np_datetime.c in setup.py for timedeltas's "sources" tentatively helps

@jreback jreback added Timedelta Timedelta data type Non-Nano datetime64/timedelta64 with non-nanosecond resolution labels Apr 10, 2022
cdef:
str abbrev = npy_unit_to_abbrev(self._reso)
# TODO: way to create a np.timedelta64 obj with the reso directly
# instead of having to get the abbrev?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg is there a C-API way to create a timedelta64 object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess PyArray_Scalar assuming you got the correct dtype available. (That function should only be used for NumPy dtypes IMO, but that isn't a problem)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming you got the correct dtype available

what we have on hand is the correct NPY_DATETIMEUNIT. I guess we need to create the dtype from the unit (we have a function to go the other direction, so i guess this shouldn't be too hard to figure out). If I figure this out, I'll probably try to upstream it into numpy's __init__.pxd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be OK to add a function that works with the unit directly for the C-API, also. But it doesn't exist yet. It seems PyArray_Scalar is commented out from __init__.pxd, I am not sure if there is a reason for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could the reason by that PyArray_Scalar's first arg is void * which might not play so well with cython? i've figured out how to create the dtype object from the unit (copied create_datetime_dtype_with_unit over from multiarray/datetime.c) but so far having no luck in calling PyArray_Scalar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? I don't really see a good reason, void * seems perfectly fine, if cython doesn't like it, just use char * (which is likely better anyway?)
But, I also don't think it is API that should be used a lot, so that may just be the reason also, that it is pretty unused.

@jbrockmendel
Copy link
Member Author

@jreback gentle ping here and #46397, lots more to do along both lines

@mroeschke mroeschke added this to the 1.5 milestone Apr 18, 2022
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine just a question in testing

if len(state) == 1:
# older pickle, only supported nanosecond
value = state[0]
reso = NPY_FR_ns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sufficient testing on this?

@jreback jreback merged commit ec75436 into pandas-dev:main Apr 18, 2022
@jbrockmendel jbrockmendel deleted the nano-td branch April 18, 2022 20:58
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Non-Nano datetime64/timedelta64 with non-nanosecond resolution Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants