REGR: Index.astype(<numpy string dtype>) started failing #50127

Open
jorisvandenbossche opened this issue Dec 8, 2022 · 10 comments
Labels
Astype; Index (Related to the Index class or subclasses); Regression (Functionality that used to work in a prior pandas version)

@jorisvandenbossche
Member

On pandas 1.5:

In [2]: pd.Index(['a', 'b']).astype("S3")
Out[2]: Index([b'a', b'b'], dtype='object')

On the main branch:

In [2]: pd.Index(['a', 'b']).astype("S3")
...
File ~/scipy/pandas/pandas/core/indexes/base.py:589, in Index._dtype_to_subclass(cls, dtype)
    584 elif issubclass(
    585     dtype.type, (str, bool, np.bool_, complex, np.complex64, np.complex128)
    586 ):
    587     return Index
--> 589 raise NotImplementedError(dtype)

NotImplementedError: |S3

This started to fail a while ago on pyarrow's CI (https://issues.apache.org/jira/browse/ARROW-18394). This comes up if you roundtrip a pandas DataFrame with bytes column names to arrow and back to pandas.

I haven't yet investigated which change caused this or whether it was intentional.
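
The traceback above ends in an `issubclass` check on `dtype.type`. As a minimal sketch, using NumPy only and not pandas' actual code path, the snippet below re-creates that check to show why a bytes dtype such as `S3` falls through to the `raise`. The tuple is copied from the traceback; the helper name `passes_branch` is hypothetical.

```python
import numpy as np

# Tuple copied from the branch shown in the traceback.
ACCEPTED_SCALAR_TYPES = (str, bool, np.bool_, complex, np.complex64, np.complex128)


def passes_branch(dtype_like):
    """Hypothetical helper: would this dtype's scalar type satisfy the check?"""
    return issubclass(np.dtype(dtype_like).type, ACCEPTED_SCALAR_TYPES)


bytes_dtype = np.dtype("S3")
print(str(bytes_dtype))     # the text that appears in the NotImplementedError
print(bytes_dtype.type)     # NumPy's bytes scalar type
print(passes_branch("S3"))  # bytes scalars do not subclass str
print(passes_branch("U3"))  # unicode scalars do subclass str
```

The scalar type of `S3` subclasses `bytes`, not `str`, so none of the listed types match. Whether an earlier branch in pandas was meant to handle this case is not something this sketch can show.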

@jorisvandenbossche jorisvandenbossche added the Regression and Astype labels Dec 8, 2022
@phofl phofl added this to the 2.0 milestone Dec 8, 2022
@phofl
Member

phofl commented Dec 8, 2022

#49393

@MarcoGorelli
Member

@phofl
Member

phofl commented Dec 8, 2022

Weird, sorry for the noise. You are correct

@MarcoGorelli
Member

No worries - @jbrockmendel

@jbrockmendel
Member

yah i think the np-str-dtype check needs to be added in Index.__new__ after sanitize_array

jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Dec 20, 2022
@mroeschke mroeschke modified the milestones: 2.0, 3.0 Feb 8, 2023
@jorisvandenbossche
Member Author

@mroeschke this is a regression, and if we think it's a valid one, not something to bump to 3.0?

@jorisvandenbossche jorisvandenbossche modified the milestones: 3.0, 2.0 Feb 17, 2023
@mroeschke
Member

Ah okay fine to still mark at the 2.0 milestone

@Daquisu
Contributor

Daquisu commented Feb 22, 2023

take

@Daquisu
Contributor

Daquisu commented Feb 22, 2023

For what it's worth, pd.Index(["abcd", "1234"], dtype="S3") is also failing.
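
Until this is fixed, one possible user-side workaround is to do the cast in NumPy and box the result as Python `bytes` objects. This is a sketch under assumptions, not something proposed in this thread, and the helper name `cast_labels_to_bytes` is hypothetical. It only covers the NumPy part; passing the resulting object array to `pd.Index` is expected to give an object-dtype Index like the 1.5 output, but that step is not exercised here.

```python
import numpy as np


def cast_labels_to_bytes(labels, dtype="S3"):
    """Hypothetical helper: cast text labels to a fixed-width bytes dtype,
    then box them as Python bytes objects in an object array."""
    # NumPy's default (unsafe) casting truncates values longer than the width.
    # Non-ASCII text would raise UnicodeEncodeError at this step.
    fixed_width = np.asarray(labels).astype(dtype)
    return fixed_width.astype(object)


print(cast_labels_to_bytes(["a", "b"]).tolist())
print(cast_labels_to_bytes(["abcd", "1234"]).tolist())  # cut to 3 bytes each
```

Note that the second call silently truncates, which is NumPy's behavior for `astype` with a fixed width and may or may not be what the caller wants.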

@simonjayhawkins simonjayhawkins added the Index label Feb 22, 2023
@MarcoGorelli MarcoGorelli modified the milestones: 2.0, 2.1 Mar 27, 2023
@MarcoGorelli
Member

MarcoGorelli commented Mar 27, 2023

removing from the 2.0 milestone as this is a regression from 1.5 and shouldn't block 2.0

EDIT: sorry, this one worked in 1.5 - is it a blocker?

jorisvandenbossche added a commit to apache/arrow that referenced this issue Apr 4, 2023
### What changes are included in this PR?

- The issue with numpy 1.25 in the assert equal helper was fixed in pandas 1.5.3 -> removing the skip (in theory can still run into this error when using an older pandas version with the latest numpy, but that's not something you should do)
- Casting tz-aware strings to datetime64[ns] was not fixed in pandas (pandas-dev/pandas#50140) -> updating our implementation to work around it
- Casting to numpy string dtype (pandas-dev/pandas#50127) is not yet fixed -> updating the skip

### Are there any user-facing changes?

No
* Closes: #15070

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@datapythonista datapythonista modified the milestones: 2.0.1, 2.0.2 Apr 23, 2023
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
@datapythonista datapythonista modified the milestones: 2.0.2, 2.0.3 May 26, 2023
@lithomas1 lithomas1 modified the milestones: 2.0.3, 2.0.4 Jun 27, 2023
@lithomas1 lithomas1 modified the milestones: 2.0.4, 2.1.1 Aug 30, 2023
@lithomas1 lithomas1 modified the milestones: 2.1.1, 2.1.2 Sep 21, 2023
@lithomas1 lithomas1 modified the milestones: 2.1.2, 2.1.3 Oct 26, 2023
@jorisvandenbossche jorisvandenbossche modified the milestones: 2.1.3, 2.1.4 Nov 13, 2023
@lithomas1 lithomas1 modified the milestones: 2.1.4, 2.2 Dec 8, 2023
@lithomas1 lithomas1 modified the milestones: 2.2, 2.2.1 Jan 20, 2024
@lithomas1 lithomas1 modified the milestones: 2.2.1, 2.2.2 Feb 23, 2024
@lithomas1 lithomas1 modified the milestones: 2.2.2, 2.2.3 Apr 11, 2024
9 participants