DEPR: Series and Index shouldn't do inference on pandas objects #56012

Closed
phofl opened this issue Nov 16, 2023 · 2 comments · Fixed by #56244
Labels
Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action

Comments

@phofl
Member

phofl commented Nov 16, 2023

The following changes the dtype, which is unexpected:

idx = pd.Index(pd.date_range("2015-01-01 10:00", freq="D", periods=3), dtype=object)
pd.Index(idx).dtype

This returns datetime64[ns].

Doing inference on a pandas object is pretty weird, and it makes things very complicated with the new string option. set_index can change the dtype, e.g.:

df = pd.DataFrame({"a": pd.Series(pd.date_range("2015-01-01 10:00", freq="D", periods=3), dtype=object), "b": 1})
df.dtypes

a    object
b     int64
dtype: object


df.set_index("a").index.dtype
datetime64[ns]

Users can just call infer_objects if they want to do inference. We should keep the inference logic for non-pandas objects.
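A minimal sketch of that opt-in route, assuming a pandas version that provides `Series.infer_objects` and `DataFrame.infer_objects`. The variable names are illustrative only and do not come from the pandas code base.

```python
import pandas as pd

dates = pd.date_range("2015-01-01 10:00", freq="D", periods=3)

# Explicitly request object dtype, so the values are stored as Timestamp objects.
ser = pd.Series(dates, dtype=object)
df = pd.DataFrame({"a": ser, "b": 1})

# Inference is something the user asks for, rather than a constructor side effect.
ser_inferred = ser.infer_objects()
df_inferred = df.infer_objects()

print(ser.dtype, "->", ser_inferred.dtype)
print(df.dtypes["a"], "->", df_inferred.dtypes["a"])
```

The original objects keep their object dtype; only the results of `infer_objects` carry the inferred datetime dtype.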

cc @jbrockmendel

@phofl phofl added Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action labels Nov 16, 2023
@jbrockmendel
Member

I'm ambivalent here. I agree with phofl that idempotency is an attractive property for the constructors. The flip side is the code/api complexity involved with treating Index[object] differently from ndarray[object].
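To make the idempotency property concrete, here is a small sketch. `constructor_is_idempotent` is a hypothetical helper written for illustration, not a pandas function, and the result for the object-dtype case depends on the installed pandas version, so no expected value is given for it.

```python
import pandas as pd


def constructor_is_idempotent(idx: pd.Index) -> bool:
    """Hypothetical check: does re-wrapping an Index keep its dtype?"""
    return bool(pd.Index(idx).dtype == idx.dtype)


int_idx = pd.Index([1, 2, 3])
obj_idx = pd.Index(
    pd.date_range("2015-01-01 10:00", freq="D", periods=3), dtype=object
)

# An integer Index round-trips unchanged.
print(constructor_is_idempotent(int_idx))

# For object-dtype datetimes the answer is version-dependent:
# this is the case the issue is about.
print(constructor_is_idempotent(obj_idx))
```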

The only strong opinion I have is that pd.Index, pd.Series, pd.array, pd.DataFrame should all have analogous behavior (notwithstanding intentional differences in pd.array returning nullable that The Ice Cream Agreement will eventually smooth out).

To the extent that set_index is a pain point, can we just pass dtype=arr.dtype to the Index constructor somewhere along the way?
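A rough sketch of that idea from the user side, assuming the values are a pandas object with a `.dtype` attribute. `to_index_keep_dtype` is a hypothetical helper for illustration; it is not how `set_index` is implemented internally.

```python
import pandas as pd


def to_index_keep_dtype(values) -> pd.Index:
    """Hypothetical helper: build an Index without re-inferring the dtype.

    Assumes `values` is a pandas Series or Index exposing `.dtype`.
    """
    return pd.Index(values, dtype=values.dtype)


ser = pd.Series(
    pd.date_range("2015-01-01 10:00", freq="D", periods=3), dtype=object
)

# Passing the dtype explicitly keeps object dtype instead of inferring datetimes.
idx = to_index_keep_dtype(ser)
print(idx.dtype)
```

The cost of this approach is that every call site has to remember to forward the dtype.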

@phofl
Member Author

phofl commented Nov 16, 2023

My concern is that there are probably numerous places where we would want to keep the dtypes instead of casting, and we would have to take care of passing the correct dtype everywhere (annoying).

I agree that all of them should be consistent. Note that DataFrame keeps the dtype if you do:

df = pd.DataFrame({"a": pd.Series(pd.date_range("2015-01-01 10:00", freq="D", periods=3), dtype=object), "b": 1})

while we infer on the numpy array. That's the behavior I would prefer.

Edit: setitem has a similar inconsistency, but between Series and Index.

This casts:

df = pd.DataFrame({"a": [1, 2, 3]})
idx = pd.Index(pd.date_range("2015-01-01 10:00", freq="D", periods=3), dtype=object)
df["b"] = idx
df.dtypes

This doesn't:

df = pd.DataFrame({"a": [1, 2, 3]})
idx = pd.Series(pd.date_range("2015-01-01 10:00", freq="D", periods=3), dtype=object)
df["b"] = idx
df.dtypes
