Series constructor skips dtype=str conversion for list data #16605

Closed
kbg opened this Issue Jun 5, 2017 · 2 comments

kbg commented Jun 5, 2017

Code Example

from pandas import Series, DataFrame

int_list = [1, 2, 3]

s1 = Series(int_list)
s2 = Series(int_list, dtype=float)
s3 = Series(int_list, dtype=str)
s4 = Series(int_list, dtype='U')
s5 = Series(Series(int_list), dtype=str)

print('Series element type:')
print('  s1:', type(s1[0]))
print('  s2:', type(s2[0]))
print('  s3:', type(s3[0]))
print('  s4:', type(s4[0]))
print('  s5:', type(s5[0]))

f1 = DataFrame(int_list)
f2 = DataFrame(int_list, dtype=float)
f3 = DataFrame(int_list, dtype=str)
f4 = DataFrame(int_list, dtype='U')
f5 = DataFrame(DataFrame(int_list), dtype=str)

print('\nDataFrame element type:')
print('  f1:', type(f1.iloc[0, 0]))
print('  f2:', type(f2.iloc[0, 0]))
print('  f3:', type(f3.iloc[0, 0]))
print('  f4:', type(f4.iloc[0, 0]))
print('  f5:', type(f5.iloc[0, 0]))

Output:

Series element type:
  s1: <class 'numpy.int64'>
  s2: <class 'numpy.float64'>
  s3: <class 'int'>
  s4: <class 'int'>
  s5: <class 'str'>

DataFrame element type:
  f1: <class 'numpy.int64'>
  f2: <class 'numpy.float64'>
  f3: <class 'str'>
  f4: <class 'str'>
  f5: <class 'str'>

Problem description

When creating a Series from a list using dtype=str, the data elements are not converted to strings. The Series instance apparently just keeps the original (Python) data type in this case.

This problem does not occur when, instead of a list, another Series is used as input data (s5 in the example above). It also does not happen when creating DataFrame instances from list data.

Expected Output

Series element type:
  s1: <class 'numpy.int64'>
  s2: <class 'numpy.float64'>
  s3: <class 'str'>
  s4: <class 'str'>
  s5: <class 'str'>

DataFrame element type:
  f1: <class 'numpy.int64'>
  f2: <class 'numpy.float64'>
  f3: <class 'str'>
  f4: <class 'str'>
  f5: <class 'str'>

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.3-1-ARCH
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.6.1
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.4.0

Contributor

jreback commented Jun 5, 2017

hmm, the array-like should be converted exactly like .astype(str); in other words,

Series(arr, dtype=str) should be equal to Series(arr).astype(str).

I think str is treated as object here. Pull requests welcome.
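
The equivalence described above can be checked with a small sketch. The helper name `element_types` is hypothetical and introduced only for illustration. On the affected version (pandas 0.20.1) the two paths are expected to differ for list input, as reported in this issue, while the explicit `.astype(str)` path should yield `str` elements.

```python
import pandas as pd


def element_types(series):
    """Return the set of Python type names found among the elements.

    Hypothetical helper, not part of pandas.
    """
    return {type(value).__name__ for value in series}


arr = [1, 2, 3]

# Path reported as broken in this issue: dtype=str passed to the constructor.
via_constructor = pd.Series(arr, dtype=str)

# Reference behaviour: build the Series first, then convert explicitly.
via_astype = pd.Series(arr).astype(str)

print('constructor:', element_types(via_constructor))
print('astype:     ', element_types(via_astype))

# True once the constructor honours dtype=str; False on affected versions.
print('consistent: ', element_types(via_constructor) == element_types(via_astype))
```

Until the constructor is fixed, converting with `.astype(str)` after construction is a possible workaround.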

@jreback jreback added this to the Next Major Release milestone Jun 5, 2017

kbg commented Jun 7, 2017

I don't have the time to fix it right now. Maybe I'll find some time at the weekend.

In case somebody else wants to fix this: the problem is located at the very end of pandas.core.series._sanitize_array(), which is called by the Series constructor.

There is also an issue with scalar input values:

>>> import pandas
>>> type(pandas.Series(1.0, dtype=str)[0])
float

which needs some additional changes near the end of pandas.core.series._sanitize_array().
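
For the scalar case, a minimal workaround sketch, assuming the explicit `.astype(str)` conversion behaves the same way for scalar input as it does for list input. The explicit `index=[0]` is an illustrative choice, not something the issue prescribes.

```python
import pandas as pd

# Build the Series from the scalar first, then convert explicitly,
# instead of passing dtype=str to the constructor.
# The index is given so the scalar is broadcast to a known length.
s = pd.Series(1.0, index=[0]).astype(str)

# The element should now be a Python str rather than a float.
print(type(s[0]))
```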

@jreback jreback modified the milestones: Next Major Release, 0.22.0 Dec 15, 2017
