Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API/BUG: Handling Dtype Coercions in Series/Index #15832

Closed
gfyoung opened this issue Mar 29, 2017 · 5 comments
Closed

API/BUG: Handling Dtype Coercions in Series/Index #15832

gfyoung opened this issue Mar 29, 2017 · 5 comments
Labels
Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas
Milestone

Comments

@gfyoung
Copy link
Member

gfyoung commented Mar 29, 2017

Off master (bd169d):

# Case I: Overflow on int64
>>> Index([np.iinfo(np.uint64).max-1],dtype='int64')
...
OverflowError: Python int too large to convert to C long

# Case II: Coercion to uint64
>>> Index([-1], dtype='uint64')
UInt64Index([18446744073709551615], dtype='uint64')

# Case III: Ignoring coercion to int
>>> Index([1, 2, 3.5], dtype=int)
Float64Index([1.0, 2.0, 3.5], dtype='float64')

So we got some coercions that fail but others that work. Although all of these issues involve Index, the first two are also applicable to Series (in the last issue, it does coerce all elements to integer).

How should we handle failed coercions? Is the second case even a failure? Should iron out this.

xref #12758
xref #15187

@jreback jreback added Difficulty Advanced Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas labels Mar 29, 2017
@jreback jreback added this to the Next Major Release milestone Mar 29, 2017
@jreback
Copy link
Contributor

jreback commented Mar 29, 2017

I think our rules are pretty well established (FYI, we should actually document these).

If dtype=None (on Series/Index). then we find a best-fit dtype.
If dtype=..... Then we coerce to that dtype if its safe, otherwise should raise (a useful message)

Case I: Overflow on int64
Index([np.iinfo(np.uint64).max-1],dtype='int64')
...
OverflowError: Python int too large to convert to C long

I think this is ok (let's test it). Maybe a more user friendly message though.

Case II: Coercion to uint64
Index([-1], dtype='uint64')
UInt64Index([18446744073709551615], dtype='uint64')

This should raise

Case III: Ignoring coercion to int
Index([1, 2, 3.5], dtype=int)
Float64Index([1.0, 2.0, 3.5], dtype='float64')

This should raise

@ucals
Copy link

ucals commented Mar 30, 2017

Now that we were able to close 1 novice and 2 intermediate bugs, I'd like to try an advanced bug - this one :)

@jreback
Copy link
Contributor

jreback commented Mar 30, 2017

sure!

gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 8, 2018
Display a more friendly error message when there
is an OverflowError during Index construction.

Partially addresses pandas-devgh-15832.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 8, 2018
Display a more friendly error message when there
is an OverflowError during Index construction.

Partially addresses pandas-devgh-15832.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 11, 2018
Display a more friendly error message when there
is an OverflowError during Index construction.

Partially addresses pandas-devgh-15832.
jreback pushed a commit that referenced this issue Jun 12, 2018
* MAINT: More useful error msg on Index overflow

Display a more friendly error message when there
is an OverflowError during Index construction.

Partially addresses gh-15832.

* DOC: Clarify how Index.__new__ handles dtype

Partially addresses gh-15823.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 13, 2018
Related to the Index and Series constructors.

Closes pandas-devgh-15832.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 13, 2018
Related to the Index and Series constructors.

Closes pandas-devgh-15832.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 13, 2018
Related to the Index and Series constructors.

Closes pandas-devgh-15832.
@jreback jreback modified the milestones: Next Major Release, 0.24.0 Jun 13, 2018
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 13, 2018
Related to the Index and Series constructors.

Closes pandas-devgh-15832.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 15, 2018
* Related to the Index and Series constructors.

Closes pandas-devgh-15832.

* Add integer dtype fixtures to conftest.py

Can used for subsequent refactoring.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 15, 2018
* Related to the Index and Series constructors.

Closes pandas-devgh-15832.

* Add integer dtype fixtures to conftest.py

Can used for subsequent refactoring.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 15, 2018
* Related to the Index and Series constructors.

Closes pandas-devgh-15832.

* Add integer dtype fixtures to conftest.py

Can used for subsequent refactoring.
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this issue Jun 18, 2018
* MAINT: More useful error msg on Index overflow

Display a more friendly error message when there
is an OverflowError during Index construction.

Partially addresses pandas-devgh-15832.

* DOC: Clarify how Index.__new__ handles dtype

Partially addresses pandas-devgh-15823.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 18, 2018
* Related to the Index and Series constructors.

Closes pandas-devgh-15832.

* Add integer dtype fixtures to conftest.py

Can used for subsequent refactoring.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 19, 2018
* Related to the Index and Series constructors.

Closes pandas-devgh-15832.

* Add integer dtype fixtures to conftest.py

Can used for subsequent refactoring.
gfyoung added a commit to forking-repos/pandas that referenced this issue Jun 19, 2018
* Related to the Index and Series constructors.

Closes pandas-devgh-15832.

* Add integer dtype fixtures to conftest.py

Can used for subsequent refactoring.
jorisvandenbossche pushed a commit that referenced this issue Jun 29, 2018
* MAINT: More useful error msg on Index overflow

Display a more friendly error message when there
is an OverflowError during Index construction.

Partially addresses gh-15832.

* DOC: Clarify how Index.__new__ handles dtype

Partially addresses gh-15823.

(cherry picked from commit defdb34)
jorisvandenbossche pushed a commit that referenced this issue Jul 2, 2018
* MAINT: More useful error msg on Index overflow

Display a more friendly error message when there
is an OverflowError during Index construction.

Partially addresses gh-15832.

* DOC: Clarify how Index.__new__ handles dtype

Partially addresses gh-15823.

(cherry picked from commit defdb34)
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018
* MAINT: More useful error msg on Index overflow

Display a more friendly error message when there
is an OverflowError during Index construction.

Partially addresses pandas-devgh-15832.

* DOC: Clarify how Index.__new__ handles dtype

Partially addresses pandas-devgh-15823.
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018
@moink
Copy link
Member

moink commented Dec 25, 2020

I know this is a super old issue but I ran into it when trying to address #30999 on pandas/tests/indexes/interval/test_as_type.py

There's a test, test_subtype_integer_errors, marked as failing, with three pytest.raises, listing this issue as the reason for the xfail. Since this issue is resolved, I think it's not a good reason for an xfail, right?

If we want to remove the xfail: One of the three pytest.raises can be modified, to a TypeError from a ValueError, and that part the test passes, But the other two don't raise errors, and if I understand this discussion correctly, they should not raise errors, even if the results of the type coercion don't make a huge amount of sense.

The code I am thinking to remove is:

        index = interval_range(0.0, 10.0, freq=0.25)
        dtype = IntervalDtype("int64")
        with pytest.raises(ValueError):
            index.astype(dtype)

        dtype = IntervalDtype("uint64")
        with pytest.raises(ValueError):
            index.astype(dtype)

The result of the first coercion is: IntervalIndex([(0, 0], (0, 0], (0, 0], (0, 1], (1, 1] ... (8, 9], (9, 9], (9, 9], (9, 9], (9, 10]], closed='right', dtype='interval[int64]') which doesn't look particularly useful but isn't exactly wrong either.

In summary: My current thought is to keep the first pytest.raises and remove the other two because they aren't raising anything, and I think from this discussion that not raising anything is the desired behaviour. But please correct me if I am wrong.

@jreback
Copy link
Contributor

jreback commented Dec 25, 2020

sure removing xfail is good
@jbrockmendel has done a lot of work on these conversions and they r likely much more sane

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants