BUG: HDFStore doesn't save datetime64[s] right #59004

Closed
caballerofelipe opened this issue Jun 13, 2024 · 1 comment · Fixed by #59018
Labels
Bug · IO HDF5 (read_hdf, HDFStore) · Non-Nano (datetime64/timedelta64 with non-nanosecond resolution)

Comments


caballerofelipe commented Jun 13, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# **** This doesn't work

import pandas as pd
df = pd.DataFrame(['2001-01-01', '2002-02-02'], dtype='datetime64[s]')
print(df)
# Prints
#            0
# 0 2001-01-01
# 1 2002-02-02

print(df.dtypes)
# Prints
# 0    datetime64[s]
# dtype: object

with pd.HDFStore('deleteme.h5', mode='w') as store:
    store.put(
        'df',
        df,
    )

with pd.HDFStore('deleteme.h5', mode='r') as store:
    df_fromstore = store.get('df')
print(df_fromstore)
# Prints
#                               0
# 0 1970-01-01 00:00:00.978307200
# 1 1970-01-01 00:00:01.012608000

# Delete created file
import pathlib
pathlib.Path('deleteme.h5').unlink()

# **** However this does work

import pandas as pd
df = pd.DataFrame(['2001-01-01', '2002-02-02'], dtype='datetime64[ns]')
print(df)
# Prints
#            0
# 0 2001-01-01
# 1 2002-02-02

print(df.dtypes)
# Prints
# 0    datetime64[ns]
# dtype: object

with pd.HDFStore('deleteme.h5', mode='w') as store:
    store.put(
        'df',
        df,
    )

with pd.HDFStore('deleteme.h5', mode='r') as store:
    df_fromstore = store.get('df')
print(df_fromstore)
# Prints
#            0
# 0 2001-01-01
# 1 2002-02-02

# Delete created file
import pathlib
pathlib.Path('deleteme.h5').unlink()
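
A possible workaround, based on the working [ns] example above (and the approach the referenced commit below ended up using): cast to datetime64[ns] before writing.

# **** Possible workaround: cast to datetime64[ns] before writing

import pandas as pd
df = pd.DataFrame(['2001-01-01', '2002-02-02'], dtype='datetime64[s]')

with pd.HDFStore('deleteme.h5', mode='w') as store:
    store.put('df', df.astype('datetime64[ns]'))  # cast before storing

with pd.HDFStore('deleteme.h5', mode='r') as store:
    print(store.get('df'))
# Prints the expected dates (as datetime64[ns])
#            0
# 0 2001-01-01
# 1 2002-02-02

# Delete created file
import pathlib
pathlib.Path('deleteme.h5').unlink()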

Issue Description

When saving to a .h5 file (HDF5 format), if a column is of type datetime64[s] (s, not ns), the dates are not loaded back correctly when the file is read. My intuition tells me the data is saved as datetime64[s], but on read Pandas parses the values as datetime64[ns]; the stored integers would need more digits to be valid nanosecond timestamps. I haven't done any tests with other datetime precisions.
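
For example, taking the first stored value: 978307200 interpreted as seconds is the expected date, while interpreted as nanoseconds it is exactly the value that comes back.

import pandas as pd

print(pd.Timestamp(978307200, unit='s'))   # 2001-01-01 00:00:00
print(pd.Timestamp(978307200, unit='ns'))  # 1970-01-01 00:00:00.978307200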

Expected Behavior

When a datetime64[s] column is saved to the HDF5 file format and the file is loaded back, the dates should round-trip unchanged.
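
A minimal sketch of the expected round-trip, reusing the failing example above (this assertion currently fails):

import pandas as pd

df = pd.DataFrame(['2001-01-01', '2002-02-02'], dtype='datetime64[s]')
with pd.HDFStore('deleteme.h5', mode='w') as store:
    store.put('df', df)
with pd.HDFStore('deleteme.h5', mode='r') as store:
    df_fromstore = store.get('df')

# The values (and ideally the [s] resolution) should be preserved
pd.testing.assert_frame_equal(df, df_fromstore)

# Delete created file
import pathlib
pathlib.Path('deleteme.h5').unlink()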

Installed Versions


INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.11.9.final.0
python-bits           : 64
OS                    : Darwin
OS-release            : 23.5.0
Version               : Darwin Kernel Version 23.5.0: Wed May  1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000
machine               : arm64
processor             : arm
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.2
numpy                 : 1.26.4
pytz                  : 2024.1
dateutil              : 2.9.0
setuptools            : 70.0.0
pip                   : 24.0
Cython                : 3.0.8
pytest                : 8.0.0
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : 3.1.9
lxml.etree            : 5.1.0
html5lib              : 1.1
pymysql               : None
psycopg2              : None
jinja2                : 3.1.3
IPython               : 8.21.0
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : 3.8.2
numba                 : None
numexpr               : 2.10.0
odfpy                 : None
openpyxl              : 3.1.4
pandas_gbq            : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.12.0
sqlalchemy            : None
tables                : 3.9.2
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None
@caballerofelipe added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Jun 13, 2024
@caballerofelipe changed the title from "BUG:" to "BUG: HDFStore doesn't save datetime[s] right" Jun 13, 2024
@caballerofelipe changed the title from "BUG: HDFStore doesn't save datetime[s] right" to "BUG: HDFStore doesn't save datetime64[s] right" Jun 13, 2024
caballerofelipe added a commit to caballerofelipe/simple_portfolio_ledger that referenced this issue Jun 13, 2024
…s]`.

Some testing was done and a bug was filed: HDFStore doesn't save or load `datetime64[s]` right, so it was changed to [ns], which is the format Pandas chooses when converting to datetime. See the link for the bug: pandas-dev/pandas#59004.
@chaoyihu (Contributor) commented:

I think this is a bug in pytables::_set_tz(), where the unit is hard-coded as ns. I submitted a PR which will hopefully resolve this issue.
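
As a minimal illustration of the effect (not the actual pandas code path): reinterpreting the integers read from the node as nanoseconds instead of seconds reproduces the wrong output.

import numpy as np

raw = np.array([978307200, 1012608000], dtype='int64')  # integers stored in the HDF5 node

print(raw.view('datetime64[s]'))   # correct: 2001-01-01, 2002-02-02
print(raw.view('datetime64[ns]'))  # what HDFStore returns: 1970-01-01 00:00:00.978307200, ...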

Just for reference, here is the debug trace I collected while stepping through the reproducible code:

**** This doesn't work
Calling _read_group()
type of Storer:  <class 'pandas.io.pytables.FrameFixed'>
Calling BlockManagerFixed.read()
Calling GenericFixed.read_array()
Checking dtype in read_array(): datetime64[s]
Checking value read from node: [[ 978307200]
 [1012608000]]    # It is correct so far.
Checking value after reconstructing time zone: <DatetimeArray>
[
['1970-01-01 00:00:00.978307200'],
['1970-01-01 00:00:01.012608']
]
Shape: (2, 1), dtype: datetime64[ns]   # So the time zone attribution messed it up somehow.
block0_values <DatetimeArray>
[
['1970-01-01 00:00:00.978307200', '1970-01-01 00:00:01.012608']
]
Shape: (1, 2), dtype: datetime64[ns]
columns Index([0], dtype='int64')
###### result:                                0
0 1970-01-01 00:00:00.978307200
1 1970-01-01 00:00:01.012608000


**** However this does work
Calling _read_group()
type of Storer:  <class 'pandas.io.pytables.FrameFixed'>
Calling BlockManagerFixed.read()
Calling GenericFixed.read_array()
Checking dtype in read_array(): datetime64[ns]
Checking value read from node: [[ 978307200000000000]
 [1012608000000000000]]
Checking value after reconstructing time zone: <DatetimeArray>
[
['2001-01-01 00:00:00'],
['2002-02-02 00:00:00']
]
Shape: (2, 1), dtype: datetime64[ns]
block0_values <DatetimeArray>
[
['2001-01-01 00:00:00', '2002-02-02 00:00:00']
]
Shape: (1, 2), dtype: datetime64[ns]
columns Index([0], dtype='int64')
###### result:             0
0 2001-01-01
1 2002-02-02
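
To double-check that the on-disk integers really are stored in seconds, one could read the node directly with PyTables before the example deletes deleteme.h5 (this assumes the fixed-format layout shown in the trace above, with the values under /df/block0_values):

import tables

with tables.open_file('deleteme.h5', mode='r') as h5:
    print(h5.get_node('/df/block0_values')[:])
    # Expected for the datetime64[s] file: [[978307200], [1012608000]]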

@lithomas1 added the IO HDF5 (read_hdf, HDFStore) and Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Jun 17, 2024