-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Use more useful fallback types for 'categorical' and 'decimal' Series' #2792
Conversation
| @@ -157,7 +157,7 @@ def _infer_pandas_series_contents(s: pd.Series) -> dt.DataType: | |||
| """ | |||
| if s.dtype == np.object_: | |||
| inferred_dtype = infer_pandas_dtype(s, skipna=True) | |||
| if inferred_dtype in {'mixed', 'decimal'}: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'decimal' being a special case here was a carry-over from old code. Should be OK to remove the special case, which would make it just use the type mapping above (L82, _inferable_pandas_dtypes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| 'complex': dt.binary, | ||
| 'categorical': dt.binary, | ||
| 'categorical': dt.category, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there paths for this at all in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
can you also add a release note |
ibis/backends/pandas/client.py
Outdated
| @@ -98,7 +98,7 @@ | |||
| 'time': dt.time, | |||
| 'period': dt.binary, | |||
| 'mixed': dt.binary, | |||
| 'empty': dt.binary, | |||
| 'empty': dt.string, | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm i don't think this should change; this ultimately resolves to binary type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@timothydijamco can you rebase and fix the release note, ping on green.
docs/source/release/index.rst
Outdated
| @@ -12,6 +12,8 @@ Release Notes | |||
| These release notes are for versions of ibis **1.0 and later**. Release | |||
| notes for pre-1.0 versions of ibis can be found at :doc:`release-pre-1.0` | |||
|
|
|||
| * :feature:`2792` Infer categorical and decimal Series to more specific Ibis types in Pandas backend | |||
| * :feature:`2776` Allow more flexible return type for UDFs | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm i think this note got duped
|
thanks @timothydijamco |
Background
#2753 made some changes to type inference in the Pandas backend (i.e. the way that we infer column types in input data that is in the form of Pandas DataFrames). These changes made use of
dt.binaryas a general fallback type.Specifically, the final type inference approach that we attempt is to call
pd.api.types.infer_dtypeon the Series, then match the result of that function to some Ibis dtype (ibis/backends/pandas/client.py#L159). For non-obvious mappings, we usedt.binary(ibis/backends/pandas/client.py#L82-L103).This PR
This PR changes some of those mappings to not map to
dt.binary, but instead to another Ibis dtype that would be more useful:'decimal'now maps todt.float64. Ibis float methods tend to work withDecimals'categorical'now maps todt.category'empty'now maps todt.string. Assumes that emptyobject-dtype Series' are meant to be string columnsThese inferred types are all still just guesses, but the goal is to hopefully be making guesses that are more useful.