Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(polars): add more accurate type mapping for timestamps #8954

Merged
merged 1 commit into from
Jul 16, 2024

Conversation

ncclementi
Copy link
Contributor

@ncclementi ncclementi commented Apr 12, 2024

polars datetime types have a fixed time unit (us, ns, or ms). Ibis timestamp types do too (but we also have seocnds as an option). In the case of polars we're currently mapping all timestamps to ns, this PR makes things a bit more specific and uses proper units.

I'm still not sure how to handle the case Ibis seconds to maybe ms in polars ? Need a bit help on how the conversion of units happens, more specifically where is the value part handled? we would need to do ibis_seconds*1000 = polars_ms at the moment I'm raising, similarly to what we do in durations, but I need to tackle this differently, see all the current failures in CI.

@@ -518,6 +518,11 @@ def test_timestamp_unit():
assert dt.Timestamp(scale=scale).unit == TimestampUnit.NANOSECOND


def test_timestamp_unit_raise():
Copy link
Contributor Author

@ncclementi ncclementi Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because it seemed like this

else:
raise ValueError(f"Invalid scale {self.scale}")
wasn't covered.

But turns out this test is causing a raise in a different place, which is confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Timestamp op has a restriction that only the literal ints from 0 through 10 can be passed as scale -- so while it isn't technically unit-tested, it's not possible to set a value outside of that range without raising a SignatureValidationError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, but then why having a code path to "raise" that would never been triggered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would wager that the raise predates the enforcement of values at the op level

],
)
def test_from_ibis_type_seconds(ibis_dtype, polars_type):
# we accept seconds as an ibis type, default to ns in polars
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcrist in the original issue, you suggested maybe defaulting to "ms" for when we go from "s" in Ibis to polars. I though it made sense keeping the behavior we had before for that particular case which is "ns". Any objections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still seems a bit odd to me that asking for seconds I would get back milliseconds. I say we raise in the seconds case as an UnsupportedOperationError the same any other missing functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break the current behavior where we ask for seconds and return ns. Just making sure we are ok with breaking that.

@ncclementi
Copy link
Contributor Author

After looking more into this PR, I'm not sure if we are correctly casting the values when mapping the timestamps, base on this code:

if dtype.is_integer():
typ = pl.Datetime(time_unit="us", time_zone=time_zone)
arg = (arg * 1_000_000).cast(typ)
if time_unit != "us":
arg = arg.dt.truncate(f"1{time_unit}")
return arg.alias(op.name)
elif dtype.is_string():
typ = pl.Datetime(time_unit=time_unit, time_zone=time_zone)
arg = arg.str.strptime(typ)
if time_unit == "s":
return arg.dt.truncate("1s")
return arg
typ = PolarsType.from_ibis(to)
return arg.cast(typ, strict=strict)

That being said the casting "potential issue" is separate from the mapping of the types, so I proposed opening a separate issue to investigate that, and for now get this PR in.

@ncclementi
Copy link
Contributor Author

I did a little more digging and on main (as well as this branch). The casting is broken, see #9091

I'm not sure what are the next steps here, regarding this PR.

@cpcloud cpcloud added this to the 9.2 milestone Jun 13, 2024
@cpcloud
Copy link
Member

cpcloud commented Jul 16, 2024

@ncclementi Anything blocking this PR? I know it's been awhile, but does it make sense to merge this in for 9.2?

@cpcloud cpcloud added feature Features or general enhancements timestamps Issues related to the timestamp API polars The polars backend labels Jul 16, 2024
@ncclementi
Copy link
Contributor Author

Technically no, as the typing gets fixed in this PR, although the actual casting is kind of broken see #9091.

Let me rebase, and kick CI again. I'd be ok merging this and we can look into the casting of the values in a followup.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@cpcloud cpcloud merged commit 3eafac4 into ibis-project:main Jul 16, 2024
82 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements polars The polars backend timestamps Issues related to the timestamp API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor(polars): more accurate type mapping for timestamps
3 participants