
ENH: Option to store dataset creation/modification times with to_hdf. #44246

Open
rickhg12hs opened this issue Oct 31, 2021 · 2 comments
Labels
Enhancement, IO HDF5 (read_hdf, HDFStore), Needs Discussion (Requires discussion from core team before further action)

Comments

@rickhg12hs

rickhg12hs commented Oct 31, 2021

Is your feature request related to a problem?

"I wish I could use pandas to" store a DataFrame with to_hdf and optionally record when the dataset was created or last modified.

Describe the solution you'd like

DataFrame.to_hdf should have an option that will store the datetime of dataset creation/modification.

[docstring addition]

with_datetime : bool, default False
    If True, store the datetime at which the dataset is created or modified.

API breaking implications

This shouldn't break anything, since the option would default to False.

Describe alternatives you've considered

A separate "metafile", written by the user, that records each dataset's creation/modification datetimes.
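A minimal sketch of such a metafile, assuming a JSON sidecar file kept next to the HDF5 file. The helper name `record_write` and the sidecar layout are hypothetical, not part of pandas; the user would call it after each `to_hdf` write:

```python
import json
import os
from datetime import datetime, timezone


def record_write(meta_path: str, key: str) -> dict:
    """Hypothetical helper: update a JSON sidecar with ctime/mtime for one dataset key."""
    meta = {}
    if os.path.exists(meta_path):
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)
    # Fixed-width ISO 8601 timestamps in UTC.
    now = datetime.now(timezone.utc).isoformat(timespec="microseconds")
    # setdefault keeps an existing entry, so a rewrite preserves the original ctime.
    entry = meta.setdefault(key, {"ctime": now})
    entry["mtime"] = now
    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump(meta, f, indent=2)
    return entry
```

For example, after `my_df.to_hdf("data.h5", key="df")` one would call `record_write("data.h5.meta.json", "df")` (the sidecar file name is an assumption). Rewriting the same key updates only `mtime`.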

Additional context

Since datasets can be rewritten, storing their creation and modification times in the HDF5 file would let readers of the file tell which datasets have been updated.

my_df.to_hdf(..., with_datetime=True, ...)  # proposed option
...
my_df_read = pd.read_hdf(store_file, key=key_value, mode="r")
print(my_df_read.ctime) # None or datetime
print(my_df_read.mtime) # None or datetime
...
my_df_read.info() # Would also show datetimes if present
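Something close to this can be approximated without a new keyword. A minimal sketch, assuming PyTables is installed: pandas' `HDFStore.get_storer(key).attrs` exposes the PyTables attribute set of a stored node, so timestamps can be attached there. The helpers `to_hdf_with_times` and `read_hdf_times` and the attribute names `ctime`/`mtime` are hypothetical, and the times are stored as POSIX timestamps:

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

import pandas as pd


def to_hdf_with_times(df: pd.DataFrame, path: str, key: str) -> None:
    """Hypothetical helper: write df under key and stamp ctime/mtime attributes."""
    now = datetime.now(timezone.utc).timestamp()  # POSIX seconds
    with pd.HDFStore(path, mode="a") as store:
        ctime = now
        if key in store:
            # put() without append replaces the stored node and drops its
            # attributes, so keep the original creation time before rewriting.
            ctime = float(getattr(store.get_storer(key).attrs, "ctime", now))
        store.put(key, df)
        attrs = store.get_storer(key).attrs  # PyTables AttributeSet of the node
        attrs.ctime = ctime
        attrs.mtime = now


def _to_datetime(value) -> Optional[datetime]:
    # Stored values are POSIX timestamps; None means "not recorded".
    return None if value is None else datetime.fromtimestamp(float(value), tz=timezone.utc)


def read_hdf_times(
    path: str, key: str
) -> Tuple[pd.DataFrame, Optional[datetime], Optional[datetime]]:
    """Hypothetical helper: return (df, ctime, mtime), with None for missing times."""
    with pd.HDFStore(path, mode="r") as store:
        df = store.get(key)
        attrs = store.get_storer(key).attrs
        ctime = _to_datetime(getattr(attrs, "ctime", None))
        mtime = _to_datetime(getattr(attrs, "mtime", None))
    return df, ctime, mtime
```

The timestamps live on the stored HDF5 node rather than on the DataFrame, so unlike the proposed API they are returned alongside the frame instead of as attributes of it.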
rickhg12hs added the Enhancement and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Oct 31, 2021
rickhg12hs changed the title from "ENH: Provide option to store dataset creation time with to_hdf." to "ENH: Option to store dataset creation/modification times with to_hdf." Nov 1, 2021
@mroeschke
Member

Thanks for the suggestion.

My immediate reaction is that this seems out of scope for pandas: the problem doesn't seem specific to HDF5 files (the same could be said of CSV files, for example), and alternatives such as the one you described, naming the file with a timestamp, or relying on the filesystem to track times are valid solutions.
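The filesystem-based alternatives mentioned here can be sketched with the standard library alone; the helper names below are hypothetical. These timestamps apply to the whole file rather than to individual datasets inside it, and on Unix `os.stat().st_ctime` is the metadata-change time, not a creation time, so the sketch relies only on the modification time:

```python
import os
from datetime import datetime, timezone


def timestamped_path(directory: str, stem: str, ext: str = ".h5") -> str:
    """Hypothetical helper: embed the UTC write time in the file name, e.g. data_20211031T120000Z.h5."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return os.path.join(directory, f"{stem}_{stamp}{ext}")


def file_mtime(path: str) -> datetime:
    """Hypothetical helper: the filesystem modification time of path as an aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
```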

mroeschke added the IO HDF5 (read_hdf, HDFStore) and Needs Discussion (Requires discussion from core team before further action) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Nov 5, 2021
@rickhg12hs
Author

Understood.

Thanks for the consideration.
