Skip to content

Conversation

jiangyue12392
Copy link
Contributor

@jiangyue12392 jiangyue12392 commented Jul 15, 2019

As mentioned in #27335, this PR serves as an infrastructure for future changes in the default int type

@gfyoung gfyoung added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Jul 15, 2019
Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test or two would be good to have here I imagine.

@jiangyue12392
Copy link
Contributor Author

@gfyoung Tests added

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. you can add a note on api-breaking changes that infer_dtype will now return integer-na for mixed-floats

cdef inline bint is_value_typed(self, object value) except -1:
return util.is_integer_object(value) or util.is_nan(value)

cdef inline bint is_array_typed(self) except -1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure you need this one, is it actually hit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed


cdef class IntegerNaValidator(Validator):
cdef inline bint is_value_typed(self, object value) except -1:
return util.is_integer_object(value) or util.is_nan(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a is_valid_null as well (e.g. see how PeriodValidator is done)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to util.is_nan(value) and util.is_float_object(value) instead

is_positional = False
except KeyError:
if self.inferred_type == "mixed-integer-float":
if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.inferred_type in [....]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

result = lib.infer_dtype(arr, skipna=True)
assert result == "integer"

# GH 27392
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to inside the definition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

assert result == "integer"

# GH 27392
def test_integer_na(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize over skipna?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameterization done

@jiangyue12392 jiangyue12392 force-pushed the integer-na branch 2 times, most recently from 966ac20 to 55cb505 Compare July 23, 2019 10:06
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whatsnew change, otherwise lgtm.

- :meth:`ExtensionArray.argsort` places NA values at the end of the sorted array. (:issue:`21801`)
- :meth:`DataFrame.to_hdf` and :meth:`Series.to_hdf` will now raise a ``NotImplementedError`` when saving a :class:`MultiIndex` with extention data types for a ``fixed`` format. (:issue:`7775`)
- Passing duplicate ``names`` in :meth:`read_csv` will now raise a ``ValueError`` (:issue:`17346`)
- :meth:`infer_type` will now return integer-na for integer and np.nan mix (:issue:`27283`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put integer-na in quotes here; move to 1.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and moved

@jreback jreback added this to the 1.0 milestone Jul 25, 2019
Other API changes
^^^^^^^^^^^^^^^^^

- :meth:`infer_type` will now return "integer-na" for integer and np.nan mix (:issue:`27283`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be :meth:`pandas.api.types.infer_dtype`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. small change in the whatsnew, ping on green.

@jiangyue12392
Copy link
Contributor Author

@jreback Green now

Other API changes
^^^^^^^^^^^^^^^^^

- :meth:`pandas.api.types.infer_type` will now return "integer-na" for integer and np.nan mix (:issue:`27283`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

infer_type -> infer_dtype

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated

@jreback
Copy link
Contributor

jreback commented Jul 27, 2019

@gfyoung ok here?

Other API changes
^^^^^^^^^^^^^^^^^

- :meth:`pandas.api.types.infer_dtype` will now return "integer-na" for integer and np.nan mix (:issue:`27283`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double back-ticks around np.nan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

assert result == "integer"

@pytest.mark.parametrize(
"arr, skipna, expected",
Copy link
Member

@gfyoung gfyoung Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to parameterize expected here.

Notice how it's integer-na when skipa is False and integer when skipna is True.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sure, updated

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! @jreback back over to you

@jreback jreback merged commit eb9a8e3 into pandas-dev:master Jul 31, 2019
@jreback
Copy link
Contributor

jreback commented Jul 31, 2019

thanks @jiangyue12392 keep em coming! you are tackling non-trivial issues and making very nice PRs!

quintusdias pushed a commit to quintusdias/pandas_dev that referenced this pull request Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ENH: infer_dtype should infer integer-na

3 participants