BUG: Fix (22477) dtype=str converts NaN to 'n' #22564
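
For context, here is a minimal NumPy-only sketch of how the 'n' in the title can arise, assuming the scalar is written into a fixed-width string array. With no explicit width, dtype=str resolves to a one-character unicode dtype, so the text "nan" is cut down to its first character. The variable names are illustrative and are not taken from pandas.

```python
import numpy as np

# dtype=str with no explicit width yields a one-character unicode dtype.
fixed_width = np.empty(1, dtype=str)
# NaN is converted to text and truncated to fit the one-character slot.
fixed_width.fill(np.nan)

# An object array stores the float NaN itself, with no string conversion.
as_object = np.empty(1, dtype=object)
as_object.fill(np.nan)

print(fixed_width.dtype, repr(fixed_width[0]))
print(as_object.dtype, repr(as_object[0]))
```

The object array keeps the missing value intact, which is the direction the cast.py change in this pull request takes.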

Merged
merged 19 commits into from Nov 20, 2018
Changes from 17 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1442,6 +1442,7 @@ Reshaping
- Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
- Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)
- Bug in ``Series`` construction when passing no data and ``dtype=str`` (:issue:`22477`)

.. _whatsnew_0240.bug_fixes.sparse:

13 changes: 8 additions & 5 deletions pandas/core/dtypes/cast.py
@@ -6,7 +6,7 @@

from pandas._libs import lib, tslib, tslibs
from pandas._libs.tslibs import OutOfBoundsDatetime, Period, iNaT
from pandas.compat import PY3, string_types, text_type
from pandas.compat import PY3, string_types, text_type, to_str

from .common import (
_INT64_DTYPE, _NS_DTYPE, _POSSIBLY_CAST_DTYPES, _TD_DTYPE, _string_dtypes,
@@ -1217,11 +1217,14 @@ def construct_1d_arraylike_from_scalar(value, length, dtype):
dtype = dtype.dtype

# coerce if we have nan for an integer dtype
# GH 22858: only cast to float if an index
# (passed here as length) is specified
if length and is_integer_dtype(dtype) and isna(value):
dtype = np.float64
subarr = np.empty(length, dtype=dtype)
dtype = np.dtype('float64')
elif isinstance(dtype, np.dtype) and dtype.kind in ("U", "S"):
subarr = np.empty(length, dtype=object)
if not isna(value):
value = to_str(value)
Contributor

Prefer to keep all of the dtype checking in the if/elif/else and construct subarr afterwards.

Member

In that case, dtype needs to be overwritten with object because we don't want actual string dtypes (which is easy to do, just noting :-))
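
A minimal sketch of the layout suggested in the two comments above, assuming a simplified stand-in for the real helper: every branch only adjusts dtype (string dtypes are overwritten with object), and the array is constructed once afterwards. The names construct_from_scalar and _is_missing are hypothetical, and str() stands in for the to_str helper imported in the diff.

```python
import numpy as np


def _is_missing(value):
    # Simplified stand-in for pandas' isna: only None and float NaN count.
    return value is None or (isinstance(value, float) and np.isnan(value))


def construct_from_scalar(value, length, dtype):
    """Hypothetical, simplified version of the helper under review."""
    dtype = np.dtype(dtype)
    if length and dtype.kind in ("i", "u") and _is_missing(value):
        # NaN cannot be stored in an integer array, so upcast to float.
        dtype = np.dtype("float64")
    elif dtype.kind in ("U", "S"):
        # Use object dtype instead of a fixed-width string dtype.
        dtype = np.dtype(object)
        if not _is_missing(value):
            value = str(value)
    # Single construction point, reached from every branch.
    subarr = np.empty(length, dtype=dtype)
    subarr.fill(value)
    return subarr


print(construct_from_scalar(np.nan, 1, str))
print(construct_from_scalar(13, 1, str))
print(construct_from_scalar(np.nan, 2, "int64"))
```

Because subarr is built in one place, no branch can forget to create it.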

else:
subarr = np.empty(length, dtype=dtype)
Member

@jreback With subarr created here in the else branch, nothing creates subarr in the first branch (if length and is_integer_dtype(dtype) and isna(value)), which suggests that this branch is not covered by the tests.
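
A minimal sketch of the concern raised above, assuming a reduced version of the branch layout in this revision of the diff. The function name construct_as_in_diff is hypothetical, the NaN check is a simplified stand-in for isna, and str() stands in for to_str.

```python
import numpy as np


def construct_as_in_diff(value, length, dtype):
    """Hypothetical reduction of the branch layout in this revision."""
    dtype = np.dtype(dtype)
    value_is_nan = isinstance(value, float) and np.isnan(value)
    if length and dtype.kind in ("i", "u") and value_is_nan:
        # Only dtype is changed here; subarr is never created.
        dtype = np.dtype("float64")
    elif dtype.kind in ("U", "S"):
        subarr = np.empty(length, dtype=object)
        if not value_is_nan:
            value = str(value)
    else:
        subarr = np.empty(length, dtype=dtype)
    subarr.fill(value)
    return subarr


try:
    construct_as_in_diff(np.nan, 2, "int64")
    outcome = "ok"
except UnboundLocalError:
    # The integer/NaN branch reaches subarr.fill with no subarr bound.
    outcome = "UnboundLocalError"

print(outcome)
```

Under these assumptions, an integer dtype with a NaN value and a non-zero length is the input that would exercise the unguarded path.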

Contributor

And do you have a case that doesn't work?

subarr.fill(value)

return subarr
2 changes: 1 addition & 1 deletion pandas/core/dtypes/common.py
@@ -419,7 +419,7 @@ def is_datetime64_dtype(arr_or_dtype):
return False
try:
tipo = _get_dtype_type(arr_or_dtype)
except TypeError:
except (TypeError, UnicodeEncodeError):
return False
return issubclass(tipo, np.datetime64)

11 changes: 11 additions & 0 deletions pandas/tests/series/test_constructors.py
@@ -134,6 +134,17 @@ def test_constructor_no_data_index_order(self):
result = pd.Series(index=['b', 'a', 'c'])
assert result.index.tolist() == ['b', 'a', 'c']

def test_constructor_no_data_string_type(self):
Contributor

Please parametrize these tests.

# GH 22477
result = pd.Series(index=[1], dtype=str)
assert np.isnan(result.iloc[0])

Contributor

Check the value using iloc here instead, which returns a scalar.

@pytest.mark.parametrize('item', ['entry', 'ѐ', 13])
def test_constructor_string_element_string_type(self, item):
# GH 22477
result = pd.Series(item, index=[1], dtype=str)
assert result.iloc[0] == str(item)

def test_constructor_dtype_str_na_values(self, string_dtype):
# https://github.com/pandas-dev/pandas/issues/21083
ser = Series(['x', None], dtype=string_dtype)