BUG: ExtensionArray formatting of datetime-like forces nanosecond precision #33319

Open
xhochy opened this issue Apr 6, 2020 · 2 comments
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays. Output-Formatting __repr__ of pandas objects, to_string

Comments

@xhochy
Contributor

xhochy commented Apr 6, 2020

When an ExtensionArray column is passed into the formatting code, we always cast it with numpy.asarray(…) and then re-enter the if-clauses in format_array. This lets us reuse the formatting code that already exists for the built-in column types, but it is problematic for columns that differ slightly from the built-in ones yet can be coerced to them.

In my current example I have a FletcherContinuousDtype(timestamp[us]) with values that are not representable in the nanosecond range (e.g. year 3000 or year 9999, dates typically used in traditional DB setups to mean the very far future). The column is passed into ExtensionArrayFormatter, converted to an np.ndarray[datetime64[us]], and passed back into format_array, where it ends up in the Datetime64Formatter. There, a forced cast to nanoseconds happens through

if not isinstance(values, DatetimeIndex):
    values = DatetimeIndex(values)
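
To illustrate why dates such as year 3000 or 9999 cannot survive a cast to nanoseconds, here is a minimal standard-library sketch. It assumes the nanosecond representation is a signed 64-bit count of nanoseconds since 1970-01-01, with the most negative value reserved for NaT; the helper name `fits_in_int64_ns` is hypothetical and not part of pandas.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest value an int64 nanosecond counter can hold


def fits_in_int64_ns(ts: datetime) -> bool:
    """Return True if ts, counted in nanoseconds since the epoch, fits in int64.

    Hypothetical helper for illustration only; not a pandas function.
    """
    delta = ts - EPOCH
    # timedelta stores days, seconds and microseconds as exact integers,
    # so this arithmetic cannot lose precision.
    nanos = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    # The most negative int64 is excluded because datetime64 reserves it for NaT.
    return -INT64_MAX <= nanos <= INT64_MAX


for ts in (datetime(2020, 4, 6), datetime(3000, 1, 1), datetime(9999, 12, 31)):
    print(ts.date(), fits_in_int64_ns(ts))
```

Of the three dates, only the 2020 one fits; the other two lie past the upper bound in April 2262, which is why a microsecond-precision column holding them cannot be converted to nanoseconds without error or data loss.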

We could either fix this by:

  • The specific approach: Somehow allow non-nanosecond timestamps in the Datetime64Formatter
  • The general approach: For an ExtensionArray the formatting should be fmt_values = [values._formatter(x) for x in self.values]
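
The general approach could look roughly like the sketch below. It uses a toy stand-in class rather than a real pandas ExtensionArray, so `MicrosecondTimestampArray` and `format_extension_values` are hypothetical names. One assumption worth flagging: in pandas, `ExtensionArray._formatter` takes a `boxed` flag and returns a callable, so the sketch calls it once and applies the result to each element instead of calling `values._formatter(x)` directly.

```python
from datetime import datetime


class MicrosecondTimestampArray:
    """Hypothetical stand-in for an ExtensionArray of microsecond timestamps."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        # Iteration yields the array's own scalars, with no numpy coercion.
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def _formatter(self, boxed=False):
        # Mirrors the shape of the pandas hook: return a callable that
        # turns one scalar into its string representation.
        return lambda x: "NaT" if x is None else x.isoformat(sep=" ")


def format_extension_values(values):
    """Format each element with the array's own formatter (illustrative only)."""
    formatter = values._formatter(boxed=True)
    return [formatter(x) for x in values]


arr = MicrosecondTimestampArray(
    [datetime(3000, 1, 1), datetime(9999, 12, 31, 23, 59, 59, 999999), None]
)
print(format_extension_values(arr))
```

Because the values never pass through a numpy array or a DatetimeIndex, far-future dates are formatted at their original precision.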
@xhochy xhochy added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 6, 2020
@jorisvandenbossche jorisvandenbossche added ExtensionArray Extending pandas with custom dtypes or arrays. and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 6, 2020
@simonjayhawkins simonjayhawkins added the Output-Formatting __repr__ of pandas objects, to_string label Apr 6, 2020
@jorisvandenbossche
Member

I think the second approach is the best way (or a variant of it).
First, we should not need to do np.asarray(EA) for formatting, because we have no guarantee that it does not alter the scalar values you get from that numpy array compared to the scalar values you get from iterating through the EA directly. And not creating a numpy array also avoids re-using one of the other built-in formatters that might not be suitable.

@jbrockmendel
Member

@xhochy we've added non-nano support since the OP. Any chance that fixed the problem?
