BUG: read_csv: dtype={'id' : np.str}: Datatype not understood #3209

Closed
amelio-vazquez-reina opened this Issue Mar 29, 2013 · 4 comments

3 participants

@amelio-vazquez-reina

I have a CSV with several columns, the first of which is a field called id with entries such as 0001, 0002, etc.

When loading this file, the following works:

pd.read_csv(my_path, dtype={'id' : np.int})

but the following doesn't:

pd.read_csv(my_path, dtype={'id' : np.str})

nor does this:

pd.read_csv(my_path, dtype={'id' : str})

I get: Datatype not understood

This is with pandas-0.10.1

@jreback

Use the np.object_ dtype.
np.str is a very specific dtype that needs size information, so it is hard to deal with.

In [13]: data = """1,0001
2,0002
3,0003"""

In [20]: pd.read_csv(StringIO.StringIO(data),header=None,
                                 names=['int','object'],dtype={1 : np.object_ })
Out[20]: 
   int object
0    1   0001
1    2   0002
2    3   0003

In [21]: pd.read_csv(StringIO.StringIO(data),header=0,
                                 names=['int','object'],dtype={1 : np.object_ }).dtypes
Out[21]: 
int        int64
object    object
dtype: object
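
For readers on Python 3, the same idea can be sketched as follows. This is a minimal illustration, not the session above: the sample data and the column names `id` and `value` are made up, and `io.StringIO` stands in for the Python 2 `StringIO.StringIO`. It contrasts an integer dtype, which drops the leading zeros, with an object dtype, which keeps each entry as the original string.

```python
import io

import pandas as pd

# Hypothetical sample data mirroring the ids described in this thread.
data = "id,value\n0001,10\n0002,20\n0003,30\n"

# Parsing the column as an integer drops the leading zeros.
as_int = pd.read_csv(io.StringIO(data), dtype={"id": int})

# Parsing it as object keeps each entry as the original string.
as_obj = pd.read_csv(io.StringIO(data), dtype={"id": object})

print(as_int["id"].tolist())
print(as_obj["id"].tolist())
```

Recent pandas releases also accept `dtype={"id": str}` here, so the error reported above should be specific to older versions such as 0.10.1.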

@jreback

@ribonoous did this solve your issue?

@amelio-vazquez-reina

Yes @jreback, sorry I didn't acknowledge this earlier. I am all set!

@zkk995

it works now

D = pd.read_csv(filep, sep=sep, dtype=mm, header=None, names=feature_name,
                keep_default_na=False,
                na_values={m: '' for m, v in mm.items() if v == np.object_})
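
A self-contained sketch of what this call pattern appears to do, using made-up data and made-up column names (`id`, `name`): with `keep_default_na=False`, strings such as "NA" are no longer treated as missing, and the per-column `na_values` mapping then marks only empty fields as missing in the object columns.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: the second column holds a literal "NA" and an empty field.
data = "0001,NA\n0002,\n"

# Hypothetical stand-ins for the names used in the snippet above.
feature_name = ["id", "name"]
mm = {"id": np.object_, "name": np.object_}

df = pd.read_csv(
    io.StringIO(data),
    dtype=mm,
    header=None,
    names=feature_name,
    # Disable the built-in list of NA strings ("NA", "NULL", ...).
    keep_default_na=False,
    # Treat only the empty string as missing, and only in object columns.
    na_values={m: [""] for m, v in mm.items() if v == np.object_},
)

print(df)
```

Here the literal "NA" survives as a string, while the empty field is read as missing.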