-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add timezone handling for datetime in pydantic #578
Conversation
python/tests/test_table.py
Outdated
vector=list(range(16)), | ||
li=[1, 2, 3], | ||
dt=datetime.now(), | ||
dt_with_tz=datetime.now(tz=pytz.timezone("Asia/Kolkata")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to check whether tz is appropriately serialized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you put in an invalid timezone pytz will complain. If you put in a tz that doesn't match the schema, then the current behavior is that it'll just get coerced into the timezone that matches the schema.
I was thinking we should ask the user to explicitly add validation to the pydantic model, rather than spreading the validation logic across a bunch of different places. That isn't ideal because if you're not careful you can certainly end up with bad data. But putting this adhoc validator in LanceDB also doesn't feel right 🤷♂️
pyarrow only touches timezones when needed e.g. even import pyarrow as pa
from pyarrow import compute as pc
arr = pa.array([1], pa.timestamp("us", "UTC"))
pc.floor_temporal(arr, unit="week")
arr = arr.cast(pa.timestamp("ns", "XYZ"))
pc.floor_temporal(arr, unit="week")
---
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/rok/venv1/lib/python3.10/site-packages/pyarrow/compute.py", line 262, in wrapper
return func.call(args, options, memory_pool)
File "pyarrow/_compute.pyx", line 367, in pyarrow._compute.Function.call
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Cannot locate timezone 'XYZ': XYZ not found in timezone database |
If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type
e1589c8
to
0981ce4
Compare
If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type. I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.
If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type. I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.
If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type. I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.
If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type. I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.
If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type. I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.
* enforce simd alignment * fmt * fmt * address comments * address comments * add SIMD check on vector creation * fix lq * cargo fmt * fix simd alignment * fix tests
If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type.
I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.