Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add timezone handling for datetime in pydantic #578

Merged
merged 3 commits into from
Dec 28, 2023

Conversation

changhiskhan
Copy link
Contributor

@changhiskhan changhiskhan commented Oct 17, 2023

If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.

vector=list(range(16)),
li=[1, 2, 3],
dt=datetime.now(),
dt_with_tz=datetime.now(tz=pytz.timezone("Asia/Kolkata")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to check whether tz is appropriately serialized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put in an invalid timezone pytz will complain. If you put in a tz that doesn't match the schema, then the current behavior is that it'll just get coerced into the timezone that matches the schema.
I was thinking we should ask the user to explicitly add validation to the pydantic model, rather than spreading the validation logic across a bunch of different places. That isn't ideal because if you're not careful you can certainly end up with bad data. But putting this adhoc validator in LanceDB also doesn't feel right 🤷‍♂️

@rok
Copy link
Contributor

rok commented Oct 17, 2023

I'm not sure how pyarrow enforces timezones,

pyarrow only touches timezones when needed e.g. even cast and subtract won't need it, but floor_temporal will:

import pyarrow as pa
from pyarrow import compute as pc
arr = pa.array([1], pa.timestamp("us", "UTC"))
pc.floor_temporal(arr, unit="week")
arr = arr.cast(pa.timestamp("ns", "XYZ"))
pc.floor_temporal(arr, unit="week")
---
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rok/venv1/lib/python3.10/site-packages/pyarrow/compute.py", line 262, in wrapper
    return func.call(args, options, memory_pool)
  File "pyarrow/_compute.pyx", line 367, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot locate timezone 'XYZ': XYZ not found in timezone database

If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type
@changhiskhan changhiskhan merged commit 4b8af26 into main Dec 28, 2023
13 checks passed
@changhiskhan changhiskhan deleted the changhiskhan/datetime branch December 28, 2023 19:02
koolamusic pushed a commit to tecmie/lancedb that referenced this pull request Jan 12, 2024
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones, right now, it silently
coerces to the timezone given in the column regardless of whether the
input had the matching timezone or not. This is probably not the right
behavior. Though we could just make it so the user has to make the
pydantic model do the validation instead of doing that at the pyarrow
conversion layer.
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this pull request Apr 5, 2024
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones, right now, it silently
coerces to the timezone given in the column regardless of whether the
input had the matching timezone or not. This is probably not the right
behavior. Though we could just make it so the user has to make the
pydantic model do the validation instead of doing that at the pyarrow
conversion layer.
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this pull request Apr 5, 2024
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones, right now, it silently
coerces to the timezone given in the column regardless of whether the
input had the matching timezone or not. This is probably not the right
behavior. Though we could just make it so the user has to make the
pydantic model do the validation instead of doing that at the pyarrow
conversion layer.
westonpace pushed a commit that referenced this pull request Apr 5, 2024
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones, right now, it silently
coerces to the timezone given in the column regardless of whether the
input had the matching timezone or not. This is probably not the right
behavior. Though we could just make it so the user has to make the
pydantic model do the validation instead of doing that at the pyarrow
conversion layer.
westonpace pushed a commit that referenced this pull request Apr 5, 2024
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones, right now, it silently
coerces to the timezone given in the column regardless of whether the
input had the matching timezone or not. This is probably not the right
behavior. Though we could just make it so the user has to make the
pydantic model do the validation instead of doing that at the pyarrow
conversion layer.
alexkohler pushed a commit to alexkohler/lancedb that referenced this pull request Apr 20, 2024
* enforce simd alignment

* fmt

* fmt

* address comments

* address comments

* add SIMD check on vector creation

* fix lq

* cargo fmt

* fix simd alignment

* fix tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants