using to_dict with the 'records' orient produces different results from the default one #29824

Open
finete opened this issue Nov 24, 2019 · 10 comments
Labels
API - Consistency Internal Consistency of API/Behavior Bug Needs Discussion Requires discussion from core team before further action

@finete

finete commented Nov 24, 2019

Consider the following DataFrame:

df = pd.DataFrame({'d': pd.date_range('2018-01-01', freq='12h', periods=2)})
for col in df.select_dtypes(['datetime']):
    df[col] = pd.Series(df[col].dt.to_pydatetime(), dtype = 'object')

The default to_dict returns the expected result, with native Python types:

print(df.to_dict())
{'d': {0: datetime.datetime(2018, 1, 1, 0, 0), 1: datetime.datetime(2018, 1, 1, 12, 0)}}

However, using a different orient produces a different result:

print(df.to_dict('records'))  # also true for the 'split' orient 
[{'d': Timestamp('2018-01-01 00:00:00')}, {'d': Timestamp('2018-01-01 12:00:00')}]

As a side note, it would be nice to be able to convert a DataFrame or a Series into a structure of native types (not unlike to_json).


versions:
pandas : 0.25.2
numpy : 1.17.3

@gokhangerdan

You can use this function if you really need that:

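def to_dict(df, orient="dict"):
    # Only the "records" orient is handled here; other orients defer to pandas.
    if orient == "records":
        columns = df.columns.tolist()
        # itertuples(name=None) yields one plain tuple of cell values per row.
        return [
            dict(zip(columns, row))
            for row in df.itertuples(index=False, name=None)
        ]
    return df.to_dict(orient)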

You can use it like this:

import pandas as pd

df = pd.DataFrame({"d": pd.date_range("2018-01-01", freq="12h", periods=2)})
for col in df.select_dtypes(["datetime"]):
	df[col] = pd.Series(df[col].dt.to_pydatetime(), dtype = "object")
print(to_dict(df, "records"))

This produces the result you want:

[{'d': datetime.datetime(2018, 1, 1, 0, 0)}, {'d': datetime.datetime(2018, 1, 1, 12, 0)}]
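The same idea can also be applied after calling to_dict, which avoids the object-dtype conversion step. What follows is only a minimal sketch, assuming every cell holds a scalar; `to_native_records` is a hypothetical helper name, not a pandas API:

```python
import datetime

import numpy as np
import pandas as pd


def to_native_records(df):
    """Hypothetical helper: row dicts with standard-library values where possible."""

    def convert(value):
        # pd.Timestamp subclasses datetime.datetime, so test for it explicitly.
        if isinstance(value, pd.Timestamp):
            return value.to_pydatetime()
        if isinstance(value, pd.Timedelta):
            return value.to_pytimedelta()
        # NumPy scalars (np.int64, np.float64, ...) expose .item().
        if isinstance(value, np.generic):
            return value.item()
        return value

    return [
        {key: convert(value) for key, value in record.items()}
        for record in df.to_dict("records")
    ]


df = pd.DataFrame(
    {"d": pd.date_range("2018-01-01", freq="12h", periods=2), "n": [1, 2]}
)
records = to_native_records(df)
print(records)
```

Missing values such as NaT are passed through unchanged by this sketch.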

@Dr-Irv Dr-Irv added the Needs Discussion Requires discussion from core team before further action label Sep 5, 2020
@Dr-Irv
Contributor

Dr-Irv commented Sep 5, 2020

@jbrockmendel can you take a look at this since I think you have thought about datetime issues? I think this is a bug, or API consistency - not sure which label to use

@jbrockmendel jbrockmendel added the API - Consistency Internal Consistency of API/Behavior label Sep 5, 2020
@jbrockmendel
Member

Labelled as API-Consistency.

cc @WillAyd would it make sense to have to_dict re-use the json code?

@finete
Author

finete commented Sep 6, 2020

maybe add an argument for native types?
something like .to_dict(native_types=True)


@jbrockmendel the pd.Timestamp and datetime handling would have to be different, since to_json produces an epoch time string.

@WillAyd
Member

WillAyd commented Sep 8, 2020

> Labelled as API-Consistency.
>
> cc @WillAyd would it make sense to have to_dict re-use the json code?

There might be a creative way to do this, but JSON is going to have a different set of requirements to serialize to than native Python types
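To illustrate the difference in requirements, here is a small sketch that round-trips the frame from the original report through to_json and the standard json module. It assumes default to_json settings; the exact encoding of dates depends on the date_format and date_unit options and on the pandas version:

```python
import json

import pandas as pd

df = pd.DataFrame({"d": pd.date_range("2018-01-01", freq="12h", periods=2)})

# Serialize with the JSON writer, then parse the text back into Python objects.
via_json = json.loads(df.to_json(orient="records"))

# Values are limited to JSON types, so the datetime type does not survive.
value = via_json[0]["d"]
print(type(value).__name__, value)
```

The result contains only JSON-compatible values (numbers or strings for dates), which is not what someone asking for datetime.datetime objects wants.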

@arw2019
Member

arw2019 commented Jan 31, 2021

As of #37648, DataFrame.to_dict(orient="records") returns native Python types for int, float and bool, and pd.Timestamp/pd.Timedelta for datetime-like values.

xref #39389 (comment), where people express a preference for returning the standard library datetime.datetime over Timestamp because of overflow issues with Timestamp.
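A short sketch of the range difference behind that preference. Whether constructing a Timestamp from an out-of-range datetime raises depends on the pandas version (newer releases may store it at a coarser resolution instead), so the example catches the error rather than assuming it:

```python
import datetime

import pandas as pd

# Nanosecond-resolution Timestamp bounds (roughly the years 1677 to 2262).
print(pd.Timestamp.min, pd.Timestamp.max)

# The standard library type covers years 1 through 9999.
print(datetime.datetime.min, datetime.datetime.max)

far_future = datetime.datetime(3000, 1, 1)
try:
    converted = pd.Timestamp(far_future)
except (OverflowError, ValueError) as exc:
    # OutOfBoundsDatetime is a ValueError subclass.
    converted = None
    print("out of range for Timestamp:", exc)
```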

@finete
Author

finete commented Feb 11, 2021

@arw2019 Do you happen to know about None? There is nothing more annoying than an unexpected pd.NA / pd.NaT / np.nan somewhere down the line...

@arw2019
Member

arw2019 commented Feb 11, 2021

> @arw2019 Do you happen to know about None? There is nothing more annoying than an unexpected pd.NA / pd.NaT / np.nan somewhere down the line...

would you mind opening a separate issue for that? It's related but will be a separate fix
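Until that is handled in pandas itself, one possible workaround is to post-process the records in plain Python. This is a sketch only, assuming each cell is a scalar (pd.isna returns an array for list-like cells); `records_with_none` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def records_with_none(df):
    """Hypothetical helper: row dicts with missing scalars replaced by None."""
    return [
        {key: (None if pd.isna(value) else value) for key, value in record.items()}
        for record in df.to_dict("records")
    ]


df = pd.DataFrame(
    {
        "x": [1.5, np.nan],
        "t": pd.to_datetime(["2018-01-01", None]),
    }
)
cleaned = records_with_none(df)
print(cleaned)
```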

@finete
Author

finete commented Feb 14, 2021

@arw2019 not sure what the issue should be about? a feature request ?

@mroeschke mroeschke added the Bug label Jul 23, 2021
@p-frolov

For information, I have a similar issue with Timestamp when trying to save data to MySQL:
AttributeError("'Timestamp' object has no attribute 'translate'")

I have checked types and found this conversion:

import pandas as pd
import datetime as dt
df__ = pd.DataFrame([[dt.datetime.utcnow()]], columns=['date'])
next(df__.iterrows())[1]
# OK
# date   2022-02-15 05:05:20.881479
# Name: 0, dtype: datetime64[ns]
next(df__.iterrows())[1]['date']
# NOK
# Timestamp('2022-02-15 09:04:14.469187')

I tried saving the datetime64[ns] value and hit the same AttributeError. Converting the value with to_pydatetime() works for me.
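A minimal sketch of that conversion, using a fixed date in place of utcnow() so the output is reproducible. The database driver is not shown, and whether a given driver needs this step is an assumption:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2022-02-15 05:05:20"])})

# Scalar access on a datetime64 column yields pd.Timestamp, not a plain datetime.
value = df["date"].iloc[0]

# Convert explicitly before passing the value to code that expects the stdlib type.
plain = value.to_pydatetime()
print(type(value).__name__, type(plain).__name__)
```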
