
Dataframe creation: Specifying dtypes with a dictionary #9287

Closed
amelio-vazquez-reina opened this issue Jan 17, 2015 · 3 comments
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@amelio-vazquez-reina
Contributor

Apologies if this feature has been suggested before. Many of the IO functions (e.g. read_csv) allow us to easily specify the dtype for each column using a dictionary. As far as I understand, this is not possible with the regular DataFrame constructor, e.g.:

df = pd.DataFrame(data=data, columns=columns, dtypes={'colname1': str, 'colname2': np.int})

Even better, it would be great if one could change the dtypes of the DataFrame columns using a similar construction, e.g.:

df.change_types({'colname1': str, 'colname2': np.int})

Is anything like this already planned?

@jreback
Contributor

jreback commented Jan 17, 2015

See #9133 and #4464; it's not that difficult. Want to give it a try?

@jreback jreback closed this as completed Jan 17, 2015
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode Duplicate Report Duplicate issue or pull request labels Jan 17, 2015
@rpalloni

rpalloni commented Apr 17, 2018

This way actually works:
data_df = data_df.astype(dtype= {"wheel_number":"int64", "car_name":"object","minutes_spent":"float64"})
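A runnable sketch of the astype call above; the column names come from the comment, but the data values here are made up for illustration:

```python
import pandas as pd

# All columns start out as strings, as if read from raw text input.
data_df = pd.DataFrame({
    "wheel_number": ["4", "6"],
    "car_name": ["sedan", "truck"],
    "minutes_spent": ["12.5", "30.0"],
})

# Convert each column to its target dtype with a single dict.
data_df = data_df.astype(
    dtype={"wheel_number": "int64", "car_name": "object", "minutes_spent": "float64"}
)

print(data_df.dtypes)
```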

@answerquest

One difference between read_csv(csvFile, dtype={..}) and df.astype(dtype={..}):
In read_csv's case, it's OK if the supplied dict contains columns that aren't in the CSV; they are gracefully ignored. In astype()'s case, it errors out if any column named in the dict isn't present in the data.

It should behave more like read_csv, because incoming dicts may cover some columns and not others. Right now this is the workaround I have to use:

df = pd.DataFrame( incoming_data )
gtfs_dtypes = { ... } # my master dtypes dict, having all possible column names
gtfs_dtypes_specific = { x:gtfs_dtypes.get(x,'str') for x in df.columns.tolist() }
df = df.astype(dtype=gtfs_dtypes_specific)
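A self-contained version of the workaround above; the gtfs_dtypes contents and the sample data are hypothetical stand-ins, since the real master dict isn't shown in the comment:

```python
import pandas as pd

# Hypothetical master dtype dict covering all possible column names.
gtfs_dtypes = {"stop_id": "str", "stop_lat": "float64", "stop_lon": "float64"}

# Incoming data that has only a subset of those columns.
df = pd.DataFrame({"stop_id": [1, 2], "stop_lat": ["12.97", "13.01"]})

# Keep only the dtypes for columns actually present, defaulting to str,
# so astype() never sees a column name that's missing from the data.
gtfs_dtypes_specific = {c: gtfs_dtypes.get(c, "str") for c in df.columns}
df = df.astype(dtype=gtfs_dtypes_specific)

print(df.dtypes)
```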
