-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pl.Datetime
time_zone
parameter has no type or value check
#16038
Comments
pl.Datetime
accepts invalid time zone stringspl.Datetime
time_zone
parameter has no type or value check
thanks for reporting - the validation currently happens later: In [9]: pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime(time_zone='cabbage'))
---------------------------------------------------------------------------
ComputeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 pl.Series([datetime(2020, 1, 1)], dtype=pl.Datetime(time_zone='cabbage'))
File ~/scratch/.venv/lib/python3.11/site-packages/polars/series/series.py:312, in Series.__init__(self, name, values, dtype, strict, nan_to_null, dtype_if_empty)
309 raise TypeError(msg)
311 if isinstance(values, Sequence):
--> 312 self._s = sequence_to_pyseries(
313 name,
314 values,
315 dtype=dtype,
316 strict=strict,
317 nan_to_null=nan_to_null,
318 )
320 elif values is None:
321 self._s = sequence_to_pyseries(name, [], dtype=dtype)
File ~/scratch/.venv/lib/python3.11/site-packages/polars/_utils/construction/series.py:235, in sequence_to_pyseries(name, values, dtype, strict, nan_to_null)
225 if values_tz != "UTC" and dtype_tz is None:
226 warnings.warn(
227 "Constructing a Series with time-zone-aware "
228 "datetimes results in a Series with UTC time zone. "
(...)
233 stacklevel=find_stacklevel(),
234 )
--> 235 return s.dt.replace_time_zone(dtype_tz or "UTC")._s
236 return s._s
238 elif (
239 _check_for_numpy(value)
240 and isinstance(value, np.ndarray)
241 and len(value.shape) == 1
242 ):
File ~/scratch/.venv/lib/python3.11/site-packages/polars/series/utils.py:107, in call_expr.<locals>.wrapper(self, *args, **kwargs)
105 expr = getattr(expr, namespace)
106 f = getattr(expr, func.__name__)
--> 107 return s.to_frame().select_seq(f(*args, **kwargs)).to_series()
File ~/scratch/.venv/lib/python3.11/site-packages/polars/dataframe/frame.py:7906, in DataFrame.select_seq(self, *exprs, **named_exprs)
7883 def select_seq(
7884 self, *exprs: IntoExpr | Iterable[IntoExpr], **named_exprs: IntoExpr
7885 ) -> DataFrame:
7886 """
7887 Select columns from this DataFrame.
7888
(...)
7904 select
7905 """
-> 7906 return self.lazy().select_seq(*exprs, **named_exprs).collect(_eager=True)
File ~/scratch/.venv/lib/python3.11/site-packages/polars/lazyframe/frame.py:1810, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, no_optimization, streaming, background, _eager)
1807 if background:
1808 return InProcessQuery(ldf.collect_concurrently())
-> 1810 return wrap_df(ldf.collect())
ComputeError: unable to parse time zone: 'cabbage'. Please check the Time Zone Database for a list of available time zones |
Hi @MarcoGorelli, should we add some checks here polars/py-polars/polars/datatypes/classes.py Lines 461 to 463 in 2970c57
Similar to what's done here polars/py-polars/polars/_utils/convert.py Lines 163 to 194 in 2970c57
Or should we leave the error as it is? Do you have any thoughts on this? If possible, I'd like to improve it. |
Using Pyright/MyPy, you'll get a static type-checking warning if you try and pass in |
Related to this: it's a bit concerning that the invalid time zone can end up in Rust without it being validated: import polars as pl
dtype = pl.Datetime(time_zone="invalid_time_zone")
s = pl.Series(dtype=dtype)
print(s)
|
true, validation only kicks in if there's an actual date in the column. maybe it can happen earlier |
hey for testing time zones in polars, can we add
could be more performant than def tz_from_df(df: pl.DataFrame, date_col_name: str = "date") -> str:
tz = df.schema[date_col_name].time_zone
return tz |
Checks
Issue Description
The time zone parameter in
pl.Datetime
can be an invalid string, and can even be any type of Python object. We should probably ensure that the time zone is a valid time zone when constructing apl.Datetime
datatype.Installed versions
The text was updated successfully, but these errors were encountered: