Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Depr] raise_on_error kwarg with errors kwarg in astype#14878 #14967

Conversation

m-charlton
Copy link
Contributor

@m-charlton m-charlton commented Dec 22, 2016

Please check that the entry in whatsnew/v0.20.0.txt. Unsure that the update was in the
_whatsnew_0200.deprecations or _whatsnew_0200.prior_deprecations so put it in the
former.

@@ -3073,7 +3075,9 @@ def astype(self, dtype, copy=True, raise_on_error=True, **kwargs):
the same type. Alternatively, use {col: dtype, ...}, where col is a
column label and dtype is a numpy.dtype or Python type to cast one
or more of the DataFrame's columns to column-specific types.
raise_on_error : raise on invalid input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this in here and just say DEPRECATED.

@jorisvandenbossche IIRC that is our convention?

add a versionadded tag 0.20.0 for errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, can leave it (but put it at the end)

raise_on_error : raise on invalid input
errors : {'raise', 'ignore'}, default 'raise'
- ``raise`` : allow exceptions to be raised on invalid input
- ``ignore`` : suppress raising exceptions on invalid input
kwargs : keyword arguments to pass on to the constructor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a blank line before kwargs (to make the sub-list work)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just checked and the sublist is rendered fine with, or without a line between ignore
and kwargs. I can add an extra line if that is the convention,


def _astype(self, dtype, copy=False, raise_on_error=True, values=None,
def _astype(self, dtype, copy=False, errors='raise', values=None,
klass=None, mgr=None, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check the errors in ['raise', 'ignore'] at the beginning of the function and raise a ValueError otherwise (and add a test for this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, there are two 'public' astype(...) methods:

  • NDFrame.astype(...) in pandas/core/generic.py
  • Block.astype(...) in pandas/core/internals.py

In addition there is a 'protected' Block._astype(...) method in
pandas/core/internals.py. Should I only put the checks in the 'public'
methods?

Bearing in mind that the raise_on_error kwarg is going to be deprecated for
DataFrame.where() and replaced with the errors kwarg it would make sense to
put the code that checks the validity of the arguments in one place. Do we have
any existing code where we put such validity checking functions/methods?

I notice that both NDFrame & Block inherit from PandasObject but, I'm not
sure that this is the correct thing to do. Should I put a function in
pandas/core/base.py?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block is completely internal
u can put the check there

we have centralized checks but no need in this case

out in _astype as that's where it's actually used
parameter validation is best down where it's actually

@jreback jreback added the Deprecate Functionality to remove in pandas label Dec 23, 2016
@jreback jreback added this to the 0.20.0 milestone Dec 23, 2016
@codecov-io
Copy link

codecov-io commented Dec 23, 2016

Current coverage is 84.75% (diff: 100%)

Merging #14967 into master will increase coverage by 0.10%

@@             master     #14967   diff @@
==========================================
  Files           144        145     +1   
  Lines         51030      51144   +114   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43198      43348   +150   
+ Misses         7832       7796    -36   
  Partials          0          0          

Powered by Codecov. Last update 8b497e4...0ae8550

@m-charlton
Copy link
Contributor Author

Update after review

instead
errors : {'raise', 'ignore'}, default 'raise'
- ``raise`` : allow exceptions to be raised on invalid input
- ``ignore`` : suppress raising exceptions on invalid input
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe specify what happens of there is an error? (original is returned)


if errors not in errors_legal_values:
invalid_arg = "Expected value of kwarg 'errors' to be one of %s. "\
"Supplied value is '%s'" % (', '.join("'%s'" % arg for arg in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style comment: can you do the string continuation with putting ( ) around it instead of the \. And can you use ´.format(..)´ instead of % (can be on a separate line)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I'll change the output message to:

Expected value of kwarg 'errors' to be one of ['raise', 'ignore']. Supplied value 'True'

This will do away with the need for the messy formatting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@m-charlton can you change this to the {} formatting syntax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@@ -566,6 +566,13 @@ def test_astype(self):
else:
self.assertEqual(tmgr.get('d').dtype.type, t)

def test_illegal_arg_for_errors_in_astype(self):
""" ValueError exception raised when illegal value used for errors """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you turn this into a normal comment (with #)? Otherwise the test name does not show up in the nosetests output.
You can also add the issue number

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also move the test to the high-level astype tests for frame/series

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at the existing test cases in pandas/tests/series &
pandas/tests/frame and can't find a natural place to put this test.

I was thinking of adding a new file called say test_astype.py in
pandas/tests/frame, as there is an existing test_apply.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no just grep for astype and you will see lots of tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this test is fine here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@m-charlton astype tests are located in pandas/tests/frame/test_dtypes.py (you actually updated some tests there)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small error message formatting change. ping when pushed and green.

@jorisvandenbossche
Copy link
Member

@m-charlton Can you also add a test confirming that the old keyword still works but gives a deprecation warning?

You can do this with:

        with tm.assert_produces_warning(FutureWarning):
            res = df.astype(.., raise_on_error=False)
        exp = df.astype(.., errors='ignore')

@jorisvandenbossche
Copy link
Member

xref #14877 we should try to clarify that the errors keyword only is meant for errors during the type conversion. Maybe we can change the "invalid input" to something else? Eg "invalid data for the provided dtype"?

@m-charlton
Copy link
Contributor Author

m-charlton commented Jan 3, 2017

Back from Christmas/New Year break. Will makes changes today and push for review

Tests added for deprected 'raise_on_error' kwarg & new 'errors' kwarg.
Clarified docstring for DataFrame.astype method
@@ -523,6 +523,24 @@ def test_timedeltas(self):
result = df.get_dtype_counts().sort_values()
assert_series_equal(result, expected)

def test_illegal_arg_for_errors_in_astype(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make this a single test.

as the same test in tests/series/test_dtypes.py as well

@jreback
Copy link
Contributor

jreback commented Jan 3, 2017

minor test change. ping on green.

Unit tests for DataFrame.astype merged. Dupliacted those tests for
Series.astype. Both testing deprecation of 'raise_on_error' kwarg.
@jorisvandenbossche jorisvandenbossche merged commit b957f6f into pandas-dev:master Jan 3, 2017
@jorisvandenbossche
Copy link
Member

@m-charlton Thanks a lot!

@m-charlton m-charlton deleted the depr_arg_raise_on_error_in_astype#14878 branch January 4, 2017 10:13
gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 27, 2018
gfyoung added a commit to forking-repos/pandas that referenced this pull request Oct 27, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR: change .astype(.., raise_on_error) -> errors
4 participants