Problem description
When using DataFrame.update with datetime data, the dates are automatically converted to integer timestamps (nanoseconds since the Unix epoch), which seems like abnormal behaviour to me. The code above outputs:
_id date
0 a 2019-11-07 15:50:06.072158
1 b 2019-11-07 15:50:06.072158
_id date
0 a 1573141806072158000
1 b 1573141806072158000
Expected Output
_id date
0 a 2019-11-07 15:50:06.072158
1 b 2019-11-07 15:50:06.072158
_id date
0 a 2019-11-07 15:50:06.072158
1 b 2019-11-07 15:50:06.072158
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
Hi, I was able to reproduce this result. It happens because pandas.DataFrame.update calls expressions.where.
From there it eventually calls numpy.where, which, as far as I can tell, then uses the NumPy MaskedArray type.
Using numpy.where seems to be a deliberate choice, perhaps for speed, but it causes the datetime values to be converted to Unix time (integer nanoseconds). Feel free to correct me on that, though.
I'd suggest trying pandas.to_datetime to convert the values back afterwards (sometimes you have to reduce the precision to get it to work, either by removing digits from the end or by passing a coarser unit such as unit="us"). I haven't tested this on your example data yet, so feel free to try it.
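A minimal sketch of that to_datetime workaround, assuming update() has left the date column holding integer nanoseconds since the Unix epoch, as in the output above. The frame is rebuilt by hand from the printed values (it is not the original, missing code sample). pd.to_datetime treats integers as nanoseconds by default; the unit argument is a cleaner alternative to trimming digits when the integers have a coarser precision.

```python
import pandas as pd

# Hand-built stand-in for the frame after update(): the integers are the values
# printed in the report (nanoseconds since the Unix epoch)
df = pd.DataFrame({"_id": ["a", "b"],
                   "date": [1573141806072158000, 1573141806072158000]})

# Convert back to datetimes; unit="ns" is the default for integer input
df["date"] = pd.to_datetime(df["date"], unit="ns")
print(df)

# If the integers were microseconds (three digits fewer), declare that via
# unit instead of padding or trimming digits by hand
micros = pd.to_datetime(1573141806072158, unit="us")
print(micros)
```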
Indeed, this appears to be an odd interaction with DatetimeArray and np.where.
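import numpy as np
import pandas as pd
from datetime import datetime

a = np.asarray([None], dtype=object)  # np.object was removed in NumPy 1.24
b = np.asarray(pd.array([datetime.now()], dtype="datetime64[ns]"))  # DatetimeArray -> datetime64[ns] ndarray
cond = [False]
print(np.where(cond, a, b))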
prints [1595782471507905000], whereas
a = np.asarray([None], dtype=object)
b = np.asarray([datetime.now()], dtype=object)
cond = [False]
print(np.where(cond, a, b))
prints an array containing the original datetime.datetime object.
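The same effect can be reproduced with NumPy alone. In this sketch (an illustration, not pandas' own code), np.where has to combine an object array with a datetime64[ns] array, so the result is an object array. Casting datetime64[ns] values to object yields plain integers, because datetime.datetime only has microsecond precision; at microsecond resolution the same cast produces datetime.datetime objects. The timestamp string is chosen to match the report's output.

```python
import datetime

import numpy as np

# Nanosecond-resolution datetimes, the only resolution pandas 0.25 uses internally
ns = np.array(["2019-11-07T15:50:06.072158"], dtype="datetime64[ns]")
a = np.array([None], dtype=object)

# Mixing object and datetime64[ns] promotes the result to object dtype,
# and the datetime64[ns] values are cast to object as plain integers
res = np.where([False], a, ns)
print(res, res.dtype)

# The same cast happens with an explicit astype(object)
print(ns.astype(object))

# At microsecond resolution the cast produces datetime.datetime objects instead
us = ns.astype("datetime64[us]")
print(us.astype(object))
```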
Output of pd.show_versions() (continued)
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : None
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None