string truncation when using Series.astype(str) #4405

Closed
mariusvniekerk opened this issue Jul 30, 2013 · 6 comments · Fixed by #4437
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Comments

@mariusvniekerk

When converting a pandas Series object to type string using astype(str), long strings are truncated to 64 characters silently.

Pandas version: 0.12
Numpy version: 1.7.1

In [1]: s = '0123456789' * 10

In [2]: tmp = np.array([s]).astype(str)
   ...: tmp
Out[2]: array([ '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'], 
      dtype='|S100')

In [3]: tmp = pd.Series([s]).astype(str)
   ...: tmp[0]
Out[3]: '0123456789012345678901234567890123456789012345678901234567890123'

In [4]: len(tmp[0])
Out[4]: 64
@cpcloud
Member

cpcloud commented Jul 30, 2013

actually this looks like a numpy bug

In [1]: s = '0123456789' * 10

In [2]: tmp = np.array([s]).astype(str)

In [3]: s = Series([s])

In [4]: s.values
Out[4]: array([ '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'], dtype=object)

In [5]: s.values.astype(str)
Out[5]:
array(['0123456789012345678901234567890123456789012345678901234567890123'],
      dtype='|S64')
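
For anyone who wants to check whether their own numpy build is affected, here is a minimal sketch that compares string lengths before and after a conversion. `truncated_indices` is a hypothetical helper name (not part of numpy or pandas), and the explicit "U64" width is only there to simulate the truncation shown above on a Python 3 install.

```python
import numpy as np


def truncated_indices(original, converted):
    """Return positions where the converted string is shorter than the source."""
    return [
        i
        for i, (before, after) in enumerate(zip(original, converted))
        if len(str(after)) < len(str(before))
    ]


s = "0123456789" * 10
original = np.array([s], dtype=object)

# Simulate the reported behaviour by forcing a 64-character width.
simulated = original.astype("U64")
print(truncated_indices(original, simulated))

# Check what the installed numpy does for a plain astype(str).
print(truncated_indices(original, original.astype(str)))
```

An empty list from the second call would suggest the installed numpy does not truncate in this case; it is not a substitute for testing against the versions named in the report.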

@jreback
Contributor

jreback commented Jul 30, 2013

and astype(str) should just be converted to astype(object) internally (str is not a valid dtype in pandas; strings are stored as object)
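
One way to read the suggestion above is sketched below: call str() on each element and keep the results in an object array, so no fixed string width is ever chosen. `astype_str_as_object` is a hypothetical name used for illustration only, and this is not claimed to be the change that went into pandas.

```python
import numpy as np


def astype_str_as_object(values):
    """Convert each element with str() and store the results as Python objects."""
    out = np.empty(len(values), dtype=object)  # object dtype has no width limit
    for i, value in enumerate(values):
        out[i] = str(value)
    return out


s = "0123456789" * 10
converted = astype_str_as_object([s, 42])
print(converted.dtype, len(converted[0]))
```

The trade-off is that the result is a generic object array rather than a fixed-width string array, which matches how pandas holds strings anyway.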

@cpcloud
Member

cpcloud commented Jul 30, 2013

if you need to do this then you can do

In [29]: s.values.astype('|S%d' % len(s[0]))
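
Note that the one-liner above sizes the dtype from the first element only, so longer strings later in the Series would still be cut. A minimal sketch that sizes the width from the longest element follows. `to_fixed_width` is a hypothetical helper, and it assumes Python 3, where the unicode "U" dtype plays the role that "|S" plays in the thread.

```python
import numpy as np


def to_fixed_width(values):
    """Cast to a fixed-width string dtype wide enough for the longest element."""
    obj = np.asarray(values, dtype=object)
    # Width of the longest string representation; at least 1 for empty input.
    width = max((len(str(v)) for v in obj.ravel()), default=1)
    return obj.astype("U%d" % max(width, 1))


s = "0123456789" * 10
out = to_fixed_width(["ab", s])
print(out.dtype, len(out[1]))
```

With a pandas Series, the same helper could be applied to `s.values`.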

@jreback
Contributor

jreback commented Jul 30, 2013

str must implicitly map to S64 somewhere (in numpy)

@cpcloud
Member

cpcloud commented Jul 30, 2013

maybe fixed in np master

numpy/numpy#3270

@cpcloud
Member

cpcloud commented Jul 30, 2013

i don't have the patience right now to recompile everything...anyone want to try this out on numpy master? i guess we leave it open until that numpy release
