DataFrame.to_dict returning numpy scalars in certain cases #23753

Closed
jorisvandenbossche opened this issue Nov 17, 2018 · 4 comments

@jorisvandenbossche (Member) commented Nov 17, 2018

I think in general we try to return Python scalars instead of numpy scalars in to_dict (similar to tolist or iteration).

E.g.:

In [27]: df = pd.DataFrame({'a': [1, 2], 'b': [.1, .2]})

In [28]: df.to_dict()
Out[28]: {'a': {0: 1, 1: 2}, 'b': {0: 0.1, 1: 0.2}}

In [29]: type(df.to_dict()['a'][0])
Out[29]: int

However, this is not consistent, e.g. when using orient='records':

In [31]: df.to_dict(orient='records')
Out[31]: [{'a': 1.0, 'b': 0.10000000000000001}, {'a': 2.0, 'b': 0.20000000000000001}]

In [32]: type(df.to_dict(orient='records')[0]['a'])
Out[32]: numpy.float64

In this case, that is because the 'records' implementation iterates over self.values. This also means that if you have a string column, self.values will be of object dtype, and then you actually get Python scalars.
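The mechanism can be reproduced with numpy alone. The sketch below builds stand-ins for what self.values would look like (an assumption for illustration, not a call into the to_dict code): with an int and a float column, numpy picks one common dtype (float64), which also explains the 'a': 1.0 in the output above, and iterating yields numpy.float64; once a string column is present, the common dtype is object and the elements are plain Python objects.

```python
import numpy as np

# Stand-in for df.values with an int column and a float column:
# numpy picks one common dtype, so the ints are upcast to float64.
values = np.array([[1, 0.1], [2, 0.2]])
print(values.dtype)                  # float64

# Iterating over the rows (as the 'records' path does) yields numpy scalars.
row = next(iter(values))
print(type(row[0]))                  # <class 'numpy.float64'>

# tolist() converts every element to the nearest Python scalar type instead.
print(type(values.tolist()[0][0]))   # <class 'float'>

# Stand-in for df.values once a string column is present: the common
# dtype becomes object and the elements are ordinary Python objects.
mixed = np.array([[1, 0.1, "x"], [2, 0.2, "y"]], dtype=object)
print(type(mixed[0][0]))             # <class 'int'>
```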

There are a bunch of other issues related to iteration (e.g. #20791, #13468), but I didn't see one specifically related to to_dict.

@jreback (Contributor) commented Nov 17, 2018

Pretty sure this is a duplicate issue.

@jorisvandenbossche (Member, Author) commented Nov 17, 2018

As I said, I searched for it but didn't see one directly. But if you find one, happy to close this as a duplicate.

For iteration there are other issues, but for to_dict the problem is not only due to iterating over pandas objects: depending on the orient, it also comes from iterating over numpy arrays. So I think it deserves its own issue.

@jorisvandenbossche (Member, Author) commented Nov 17, 2018

Not directly related to this issue, but an option to convert missing values to None would also be nice for my use case. Although that might add quite some complexity to the implementation (and you can do it yourself relatively easily).
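Doing it yourself can be as small as a post-processing pass over the records. This is a minimal sketch, assuming missing values show up as float NaN; the helper name nan_to_none is hypothetical and not a pandas API. Because numpy.float64 subclasses Python float, NaN values stored as numpy.float64 are caught as well, but other markers such as NaT are left untouched.

```python
import math


def nan_to_none(records):
    """Hypothetical helper: copy a list of record dicts, replacing float NaN with None.

    numpy.float64 is a subclass of float, so numpy NaNs are replaced too.
    Other missing-value markers (e.g. NaT) are not handled.
    """
    return [
        {
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()
        }
        for record in records
    ]


print(nan_to_none([{"a": 1, "b": float("nan")}]))  # [{'a': 1, 'b': None}]
```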

@bourbaki (Contributor) commented Nov 25, 2018

@jreback I am working on the issue. Its source is the use of the DataFrame.values property in most of the to_dict orientations. DataFrame.values gathers the data from all columns and converts it to a single ndarray with one common dtype.
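One way around the single common dtype is to convert column by column. The sketch below is an illustration under stated assumptions, not the pandas implementation: it models a DataFrame as a plain mapping of column name to 1-D numpy array (each with its own dtype), and the function name records_columnwise is hypothetical. Calling tolist() per column keeps the int column as int and returns Python scalars throughout.

```python
import numpy as np


def records_columnwise(columns):
    """Hypothetical helper: build 'records'-style dicts from {name: 1-D array}.

    Each column is converted with its own tolist(), so no common-dtype 2-D
    array is built: ints stay ints, and all values are Python scalars.
    """
    names = list(columns)
    as_lists = [columns[name].tolist() for name in names]
    # zip(*...) turns the per-column lists into per-row tuples.
    return [dict(zip(names, row)) for row in zip(*as_lists)]


cols = {"a": np.array([1, 2]), "b": np.array([0.1, 0.2])}
records = records_columnwise(cols)
print(records)  # [{'a': 1, 'b': 0.1}, {'a': 2, 'b': 0.2}]
```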
