
Sets are allowed for the index argument of Series and for index and columns in DataFrame.from_records() #55425

Open
Dr-Irv opened this issue Oct 6, 2023 · 5 comments
Labels: API - Consistency, DataFrame, Series

Comments

@Dr-Irv
Contributor

Dr-Irv commented Oct 6, 2023

In #47215 I raised the issue of allowing sets as arguments to the DataFrame constructor. This was addressed in #47231, but a few more cases should be fixed:

>>> import pandas as pd
>>> pd.Series([3,4,5], index=set(["a", "b", "c"]))
c    3
a    4
b    5
dtype: int64
>>> pd.DataFrame.from_records([[1,2,3]], columns=set(["a", "b", "c"]), index=set(["x", "y", "z"]))
   c  a  b
x  1  2  3
y  1  2  3
z  1  2  3

In both cases, we shouldn't allow a set as the index or columns argument: a set has no defined order, so the order of the resulting labels is arbitrary.
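A minimal sketch of the kind of check being asked for, assuming a standalone validation helper. `check_not_unordered` is a hypothetical name, not a pandas function, and its error message is illustrative only. It rejects `set` and `frozenset`, whose iteration order depends on element hashes (for strings this can even change between interpreter runs), and otherwise returns the labels as a list.

```python
from collections.abc import Iterable


def check_not_unordered(labels: Iterable, arg_name: str) -> list:
    """Hypothetical helper: reject set-like labels, otherwise return them as a list."""
    # set and frozenset iterate in hash-based order, so the pairing of
    # values with labels would not be reproducible.
    if isinstance(labels, (set, frozenset)):
        raise TypeError(
            f"{arg_name!r} must be an ordered collection, got {type(labels).__name__}"
        )
    return list(labels)


if __name__ == "__main__":
    print(check_not_unordered(["a", "b", "c"], "index"))  # ['a', 'b', 'c']
    try:
        check_not_unordered({"a", "b", "c"}, "columns")
    except TypeError as err:
        print(err)  # 'columns' must be an ordered collection, got set
```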

lithomas1 added the Series, API - Consistency, and DataFrame labels on Oct 8, 2023
@lithomas1
Member

Do we need a deprecation for this, or is it OK to do as a breaking change like the previous PR?

@Dr-Irv
Contributor Author

Dr-Irv commented Oct 9, 2023

> Do we need a deprecation for this, or is it OK to do as a breaking change like the previous PR?

I think a breaking change is fine.

@Dr-Irv
Contributor Author

Dr-Irv commented Oct 9, 2023

Also, do we want to disallow dict views?

>>> data = [1, 2, 3]
>>> data
[1, 2, 3]
>>> d = {'a'+str(value): value for value in data}
>>> d
{'a1': 1, 'a2': 2, 'a3': 3}
>>> pd.Series(d.values(), index=d.keys())
a1    1
a2    2
a3    3
dtype: int64

The above works, but strictly speaking d.keys() and d.values() are not array-like. Should we also check whether instances of MappingView are passed and reject them?

If we agree those shouldn't be allowed, that might require a deprecation cycle.
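If dict views were deprecated rather than rejected outright, one possible shape is a shim that warns and then materializes the view. This is a sketch under that assumption: `coerce_mapping_view` is a hypothetical helper and the `FutureWarning` wording is illustrative, not an agreed pandas message. It relies on the standard-library fact that `dict.keys()`, `dict.values()` and `dict.items()` all return `collections.abc.MappingView` subclasses.

```python
import warnings
from collections.abc import MappingView


def coerce_mapping_view(obj, arg_name: str):
    """Hypothetical deprecation shim: warn on dict views, then convert them to lists."""
    if isinstance(obj, MappingView):
        warnings.warn(
            f"Passing a dict view as {arg_name!r} is deprecated; "
            "convert it with list() first.",
            FutureWarning,
            stacklevel=2,
        )
        # list() follows the dict's insertion order, so keys and values stay aligned.
        return list(obj)
    return obj


d = {"a1": 1, "a2": 2, "a3": 3}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    values = coerce_mapping_view(d.values(), "data")
    index = coerce_mapping_view(d.keys(), "index")

print(values, index)  # [1, 2, 3] ['a1', 'a2', 'a3']
print(len(caught))    # 2
```

During such a cycle, callers would silence the warning by passing `list(d.values())` and `list(d.keys())` explicitly.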

@Dr-Irv
Contributor Author

Dr-Irv commented Nov 6, 2023

Also, for Series(), the docs say that we accept an Iterable as data, but we don't accept every Iterable (for example, sets). So we should adjust the docs as well.
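The gap can be seen with the standard library's abstract base classes alone. A set satisfies `collections.abc.Iterable` but not `collections.abc.Sequence`, and so does a dict view, which the `pd.Series(d.values(), index=d.keys())` example in this thread shows is currently accepted. So neither ABC on its own matches what the constructor accepts today; which wording the docs should use is left open here, and the container list below is just an illustration.

```python
from collections.abc import Iterable, Sequence

# Illustrative containers; only the ABC membership checks matter here.
candidates = {
    "list": [3, 4, 5],
    "tuple": (3, 4, 5),
    "range": range(3),
    "set": {3, 4, 5},
    "dict_values": {"a": 3}.values(),
}

for name, obj in candidates.items():
    # Every entry is Iterable; only list, tuple and range are Sequences.
    print(
        f"{name:12} Iterable={isinstance(obj, Iterable)!s:5} "
        f"Sequence={isinstance(obj, Sequence)}"
    )
```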

@Dr-Irv
Contributor Author

Dr-Irv commented Feb 27, 2024

Here's another one. I think that sets shouldn't be allowed as the argument when constructing an Index:

>>> pd.Index(set([1,2]))
Index([1, 2], dtype='int64')
>>> pd.Index(set([2,1]))
Index([1, 2], dtype='int64')

Because a set is unordered, the order of the resulting Index is ambiguous when you pass a set argument.
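The ambiguity follows from set equality: sets built from differently ordered input compare equal, so the constructor cannot know which order the caller intended. The two ordered alternatives below are not proposed in the thread; they are only shown as examples of what a caller could pass instead of a set. `dict.fromkeys` keeps first-seen order because dicts preserve insertion order (guaranteed since Python 3.7).

```python
# Sets from differently ordered input are indistinguishable.
a = set([1, 2])
b = set([2, 1])
print(a == b)  # True

seq = [2, 1, 2]
# Drop duplicates while keeping first-seen order.
print(list(dict.fromkeys(seq)))  # [2, 1]
# Or choose an explicit sorted order.
print(sorted(set(seq)))  # [1, 2]
```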


2 participants