Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Use more useful fallback types for 'categorical' and 'decimal' Series' #2792

Merged
merged 7 commits into from
Jun 2, 2021

Conversation

timothydijamco
Copy link
Contributor

@timothydijamco timothydijamco commented May 24, 2021

Background

#2753 made some changes to type inference in the Pandas backend (i.e. the way that we infer column types in input data that is in the form of Pandas DataFrames). These changes made use of dt.binary as a general fallback type.

Specifically, the final type inference approach that we attempt is to call pd.api.types.infer_dtype on the Series, then match the result of that function to some Ibis dtype (ibis/backends/pandas/client.py#L159). For non-obvious mappings, we use dt.binary (ibis/backends/pandas/client.py#L82-L103).

This PR

This PR changes some of those mappings to not map to dt.binary, but instead to another Ibis dtype that would be more useful:

  • 'decimal' now maps to dt.float64. Ibis float methods tend to work with Decimals
  • 'categorical' now maps to dt.category
  • 'empty' now maps to dt.string. Assumes that empty object-dtype Series' are meant to be string columns

These inferred types are all still just guesses, but the goal is to hopefully be making guesses that are more useful.

@@ -157,7 +157,7 @@ def _infer_pandas_series_contents(s: pd.Series) -> dt.DataType:
"""
if s.dtype == np.object_:
inferred_dtype = infer_pandas_dtype(s, skipna=True)
if inferred_dtype in {'mixed', 'decimal'}:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'decimal' being a special case here was a carry-over from old code. Should be OK to remove the special case, which would make it just use the type mapping above (L82, _inferable_pandas_dtypes)

Copy link
Contributor

@icexelloss icexelloss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

'complex': dt.binary,
'categorical': dt.binary,
'categorical': dt.category,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there paths for this at all in the tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback jreback added pandas The pandas backend expressions Issues or PRs related to the expression API labels May 24, 2021
@jreback
Copy link
Contributor

jreback commented May 24, 2021

can you also add a release note

@jreback jreback added this to the Next release milestone May 24, 2021
@@ -98,7 +98,7 @@
'time': dt.time,
'period': dt.binary,
'mixed': dt.binary,
'empty': dt.binary,
'empty': dt.string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i don't think this should change; this ultimately resolves to binary type.

@timothydijamco timothydijamco changed the title ENH: Use more useful fallback types for 'empty', 'categorical', and 'decimal' Series' ENH: Use more useful fallback types for 'categorical' and 'decimal' Series' May 25, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timothydijamco can you rebase and fix the release note, ping on green.

@@ -12,6 +12,8 @@ Release Notes
These release notes are for versions of ibis **1.0 and later**. Release
notes for pre-1.0 versions of ibis can be found at :doc:`release-pre-1.0`

* :feature:`2792` Infer categorical and decimal Series to more specific Ibis types in Pandas backend
* :feature:`2776` Allow more flexible return type for UDFs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i think this note got duped

@jreback jreback merged commit 3de8013 into ibis-project:master Jun 2, 2021
@jreback
Copy link
Contributor

jreback commented Jun 2, 2021

thanks @timothydijamco

@cpcloud cpcloud removed this from the Next release milestone Jan 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
expressions Issues or PRs related to the expression API pandas The pandas backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants