Explicit column dtype specification in read_* functions #1858

wesm · 2012-09-07T21:53:23Z

e.g. columns with values like 01001 are getting converted to int, which drops the leading zeros

example from mailing list:

df = read_csv('test_data.csv')
df.head()
     oid   did mode             ox             oy      dx      dy
0  1001  1001   01  272311.659358  176751.822655  272675  176375
1  1001  1001   01  272311.659358  176751.822655  272375  176375
2  1001  1001   01  272311.659358  176751.822655  272125  176675
3  1001  1001   06  272311.659358  176751.822655  272675  177125
4  1001  1001   06  272311.659358  176751.822655  272675  176375

df.oid = df.oid.apply(lambda x: str(x).zfill(5))
df.head()
     oid   did mode             ox             oy      dx      dy
0  01001  1001   01  272311.659358  176751.822655  272675  176375
1  01001  1001   01  272311.659358  176751.822655  272375  176375
2  01001  1001   01  272311.659358  176751.822655  272125  176675
3  01001  1001   06  272311.659358  176751.822655  272675  177125
4  01001  1001   06  272311.659358  176751.822655  272675  176375

wesm · 2012-11-02T14:18:23Z

This is done in c-parser (dtype={'oid': object}) but needs a unit test
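A unit test for this could look roughly like the sketch below. The inline CSV text and the helper name `read_with_object_oid` are made up for illustration and are not taken from the pandas test suite; the only behavior assumed is that `read_csv` accepts a per-column `dtype` mapping, as described above.

```python
import io

import pandas as pd

# Hypothetical sample data; the real test_data.csv from the mailing list is not available.
CSV_TEXT = "oid,did,mode\n01001,01001,01\n01001,01001,06\n"


def read_with_object_oid(text):
    """Parse CSV text, keeping the 'oid' column as raw strings."""
    return pd.read_csv(io.StringIO(text), dtype={"oid": object})


df = read_with_object_oid(CSV_TEXT)

# 'oid' was requested as object, so the leading zeros survive.
assert df["oid"].tolist() == ["01001", "01001"]

# 'did' had no explicit dtype, so it is still inferred as an integer.
assert df["did"].tolist() == [1001, 1001]
```

Only the columns named in the mapping are affected; everything else goes through the usual type inference.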

wesm · 2012-11-28T00:17:37Z

This works now:

In [11]: df = read_clipboard(delim_whitespace=True, dtype={'oid': 'O', 'did': 'O', 'mode': 'O'}); df
Out[11]: 
     oid    did mode             ox             oy      dx      dy
0  01001  01001   01  272311.659358  176751.822655  272675  176375
1  01001  01001   01  272311.659358  176751.822655  272375  176375
2  01001  01001   01  272311.659358  176751.822655  272125  176675
3  01001  01001   06  272311.659358  176751.822655  272675  177125
4  01001  01001   06  272311.659358  176751.822655  272675  176375

This needs to accept more than just dtype format strings (e.g. 'f8') though, such as NumPy dtype objects and Python types. I'll do that, then close this issue.
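For illustration, here is a sketch of the different ways a column dtype can be spelled. It reflects how recent pandas releases behave, not necessarily the build discussed in this thread, and the sample text and the `load` helper are made up for the example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical whitespace-delimited sample, loosely modeled on the data above.
SAMPLE = (
    "oid did mode ox\n"
    "01001 01001 01 272311.659358\n"
    "01001 01001 06 272311.659358\n"
)


def load(dtype):
    """Read the sample with the given per-column dtype mapping."""
    return pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", dtype=dtype)


# Three spellings of the same request: format strings, NumPy types, Python types.
by_string = load({"oid": "O", "ox": "f8"})
by_numpy = load({"oid": object, "ox": np.float64})
by_python = load({"oid": str, "ox": float})

for frame in (by_string, by_numpy, by_python):
    # Leading zeros are preserved in every case.
    assert frame["oid"].tolist() == ["01001", "01001"]
    # 'ox' is parsed as a 64-bit float in every case.
    assert frame["ox"].dtype == np.float64
```

Columns left out of the mapping (here `did` and `mode`) are still inferred, so they come back as integers.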

wesm closed this as completed in 6a7c11c Nov 28, 2012
