Series constructor skips dtype=str conversion for list data #16605

kbg opened this Issue Jun 5, 2017 · 2 comments



kbg commented Jun 5, 2017

Code Example

from pandas import Series, DataFrame

int_list = [1, 2, 3]

s1 = Series(int_list)
s2 = Series(int_list, dtype=float)
s3 = Series(int_list, dtype=str)
s4 = Series(int_list, dtype='U')
s5 = Series(Series(int_list), dtype=str)

print('Series element type:')
print('  s1:', type(s1[0]))
print('  s2:', type(s2[0]))
print('  s3:', type(s3[0]))
print('  s4:', type(s4[0]))
print('  s5:', type(s5[0]))

f1 = DataFrame(int_list)
f2 = DataFrame(int_list, dtype=float)
f3 = DataFrame(int_list, dtype=str)
f4 = DataFrame(int_list, dtype='U')
f5 = DataFrame(DataFrame(int_list), dtype=str)

print('\nDataFrame element type:')
print('  f1:', type(f1.iloc[0, 0]))
print('  f2:', type(f2.iloc[0, 0]))
print('  f3:', type(f3.iloc[0, 0]))
print('  f4:', type(f4.iloc[0, 0]))
print('  f5:', type(f5.iloc[0, 0]))


Series element type:
  s1: <class 'numpy.int64'>
  s2: <class 'numpy.float64'>
  s3: <class 'int'>
  s4: <class 'int'>
  s5: <class 'str'>

DataFrame element type:
  f1: <class 'numpy.int64'>
  f2: <class 'numpy.float64'>
  f3: <class 'str'>
  f4: <class 'str'>
  f5: <class 'str'>

Problem description

When creating a Series from a list using dtype=str, the data elements are not converted to strings. The Series instance apparently just keeps the original (Python) data type in this case.

This problem does not occur when, instead of a list, another Series is used as input data (s5 in the example above). It also does not happen when creating DataFrame instances from list data.
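Until the constructor handles this case, one possible workaround is to build the Series first and convert it afterwards. The snippet below is only a sketch of that idea; the variable names are illustrative and not taken from the report.

```python
import pandas as pd

int_list = [1, 2, 3]

# Build the Series with its inferred dtype first, then convert explicitly.
# This avoids passing dtype=str to the constructor, which is the code path
# this issue is about.
converted = pd.Series(int_list).astype(str)

# Every element should now be a Python str.
print([type(x) for x in converted])
```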

Expected Output

Series element type:
  s1: <class 'numpy.int64'>
  s2: <class 'numpy.float64'>
  s3: <class 'str'>
  s4: <class 'str'>
  s5: <class 'str'>

DataFrame element type:
  f1: <class 'numpy.int64'>
  f2: <class 'numpy.float64'>
  f3: <class 'str'>
  f4: <class 'str'>
  f5: <class 'str'>

Output of pd.show_versions()

commit: None
python-bits: 64
OS: Linux
OS-release: 4.11.3-1-ARCH
machine: x86_64
byteorder: little
LC_ALL: None

pandas: 0.20.1
pytest: 3.1.1
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.6.1
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.4.0



jreback commented Jun 5, 2017

hmm, the array-like should be converted exactly like .astype(str), IOW

Series(arr, dtype=str) should be equal to Series(arr).astype(str).

I think str is treated as object here. pull-requests welcome.
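The equivalence described here could be checked with a small helper along these lines. This is a rough sketch, not a test from the pandas suite: the function name `constructor_matches_astype` is hypothetical, and it compares element values and element types rather than relying on any particular dtype representation.

```python
import pandas as pd


def constructor_matches_astype(values):
    """Return True if Series(values, dtype=str) yields the same elements,
    with the same Python types, as Series(values).astype(str)."""
    via_constructor = pd.Series(values, dtype=str)
    via_astype = pd.Series(values).astype(str)

    same_values = list(via_constructor) == list(via_astype)
    same_types = all(
        type(a) is type(b) for a, b in zip(via_constructor, via_astype)
    )
    return same_values and same_types


# Going by the report above, this would presumably be False on pandas 0.20.1
# (elements stay int) and True once the constructor converts the data.
result = constructor_matches_astype([1, 2, 3])
print(result)
```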

jreback added this to the Next Major Release milestone Jun 5, 2017



kbg commented Jun 7, 2017

I don't have the time to fix it right now. Maybe I'll find some time at the weekend.

In case somebody else wants to fix this: the problem is located at the very end of pandas.core.series._sanitize_array(), which is called by the Series constructor.

There is also an issue with scalar input values:

>>> type(pandas.Series(1.0, dtype=str)[0])

which needs some additional changes near the end of pandas.core.series._sanitize_array().
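For the scalar case, the two construction paths can be put side by side to see whether they agree. This is a minimal sketch with illustrative variable names; it makes no claim about what the constructor path returns on any given pandas version.

```python
import pandas as pd

# Scalar passed straight to the constructor (the case reported above).
via_constructor = pd.Series(1.0, dtype=str)

# Same scalar, converted after construction.
via_astype = pd.Series(1.0).astype(str)

print('constructor:', type(via_constructor.iloc[0]))
print('astype:     ', type(via_astype.iloc[0]))
```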

jreback modified the milestones: Next Major Release, 0.22.0 Dec 15, 2017
