Explicit column dtype specification in read_* functions #1858

Closed
wesm opened this issue Sep 7, 2012 · 2 comments
Labels: Enhancement, IO Data, Testing

Comments


wesm commented Sep 7, 2012

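The read_* functions need a way to specify column dtypes explicitly, e.g. columns with values like 01001 are getting converted to int and lose their leading zeros.

Example from the mailing list, including the current workaround: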

df = read_csv('test_data.csv')
df.head()
     oid   did mode             ox             oy      dx      dy
0  1001  1001   01  272311.659358  176751.822655  272675  176375
1  1001  1001   01  272311.659358  176751.822655  272375  176375
2  1001  1001   01  272311.659358  176751.822655  272125  176675
3  1001  1001   06  272311.659358  176751.822655  272675  177125
4  1001  1001   06  272311.659358  176751.822655  272675  176375

df.oid = df.oid.apply(lambda x: str(x).zfill(5))
df.head()
     oid   did mode             ox             oy      dx      dy
0  01001  1001   01  272311.659358  176751.822655  272675  176375
1  01001  1001   01  272311.659358  176751.822655  272375  176375
2  01001  1001   01  272311.659358  176751.822655  272125  176675
3  01001  1001   06  272311.659358  176751.822655  272675  177125
4  01001  1001   06  272311.659358  176751.822655  272675  176375

wesm commented Nov 2, 2012

This is done in the C parser (dtype={'oid': object}) but still needs a unit test.
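
A unit test for this could look roughly like the sketch below. It assumes the `dtype` keyword on `read_csv` as it exists in current pandas releases; the sample data and the test name are made up for illustration and are not taken from the pandas test suite.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the mailing-list example.
DATA = """oid,did,mode,ox,dx
01001,01001,01,272311.659358,272675
01001,01001,06,272311.659358,272375
"""


def test_explicit_column_dtype():
    # Without dtype, the parser infers integers and drops leading zeros.
    inferred = pd.read_csv(StringIO(DATA))
    assert inferred["oid"].tolist() == [1001, 1001]

    # With dtype=object the raw strings are kept as-is.
    df = pd.read_csv(StringIO(DATA), dtype={"oid": object, "mode": object})
    assert df["oid"].tolist() == ["01001", "01001"]
    assert df["mode"].tolist() == ["01", "06"]

    # Columns not listed in dtype are still inferred.
    assert df["did"].tolist() == [1001, 1001]
    assert df["ox"].dtype == np.float64


test_explicit_column_dtype()
```

Only the columns named in the mapping are pinned; everything else goes through normal type inference, which is why `did` still loses its leading zero here.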


wesm commented Nov 28, 2012

This works now:

In [11]: df = read_clipboard(delim_whitespace=True, dtype={'oid': 'O', 'did': 'O', 'mode': 'O'}); df
Out[11]: 
     oid    did mode             ox             oy      dx      dy
0  01001  01001   01  272311.659358  176751.822655  272675  176375
1  01001  01001   01  272311.659358  176751.822655  272375  176375
2  01001  01001   01  272311.659358  176751.822655  272125  176675
3  01001  01001   06  272311.659358  176751.822655  272675  177125
4  01001  01001   06  272311.659358  176751.822655  272675  176375

This needs to be able to accept more than just format strings (e.g. 'f8') though. I'll do that, then close this issue.
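
For reference, a minimal sketch of mixing the different kinds of dtype specification in one call. This reflects how `read_csv` behaves in current pandas releases, not necessarily the state of the parser at the time of this comment; the inline data is invented for the example.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Invented one-row sample using column names from the example above.
DATA = "oid,ox,dx\n01001,272311.659358,272675\n"

df = pd.read_csv(
    StringIO(DATA),
    dtype={
        "oid": object,     # plain Python type: keep the raw string
        "ox": "f8",        # NumPy format string
        "dx": np.float64,  # NumPy scalar type
    },
)
print(df.dtypes)
```

All three forms are resolved to a dtype per column, so they can be combined freely in a single mapping.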

wesm closed this as completed in 6a7c11c on Nov 28, 2012