duplicated time series index entries after hdf5 store (daylight saving time related) #1081

Closed
bmu opened this issue Apr 19, 2012 · 3 comments
Labels
Bug Datetime Datetime data dtype

bmu commented Apr 19, 2012

I have a data frame like this:

>> df
<class 'pandas.core.frame.DataFrame'>
Index: 596520 entries, 2006-04-14 00:00:00 to 2011-12-31 23:55:00
Data columns:
g_m_pyr__0       596520  non-null values
e_wr__0          596520  non-null values
p_nenn_sg__0     596520  non-null values
flaeche_sg__0    596520  non-null values
dtypes: float64(4)
>> df.index[0].timetuple()
time.struct_time(tm_year=2006, tm_mon=4, tm_mday=14, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=4, tm_yday=104, tm_isdst=-1)
>> df.index.get_duplicates()
[]

Saving this DataFrame to HDF5 and reloading it results in duplicated index entries:

>> store = pd.io.pytables.HDFStore('store.h5')
>> store['df'] = df 
>> df2 = store['df']
>> df2.index.get_duplicates()
[datetime.datetime(2007, 3, 25, 3, 0),
 datetime.datetime(2007, 3, 25, 3, 5),
 datetime.datetime(2007, 3, 25, 3, 10),
 datetime.datetime(2007, 3, 25, 3, 15),
 datetime.datetime(2007, 3, 25, 3, 20),
 datetime.datetime(2007, 3, 25, 3, 25),
 datetime.datetime(2007, 3, 25, 3, 30),
 datetime.datetime(2007, 3, 25, 3, 35),
 datetime.datetime(2007, 3, 25, 3, 40),
 datetime.datetime(2007, 3, 25, 3, 45),
 datetime.datetime(2007, 3, 25, 3, 50),
 datetime.datetime(2007, 3, 25, 3, 55),
 datetime.datetime(2008, 3, 30, 3, 0),
 datetime.datetime(2008, 3, 30, 3, 5),
 datetime.datetime(2008, 3, 30, 3, 10),
 datetime.datetime(2008, 3, 30, 3, 15),
 datetime.datetime(2008, 3, 30, 3, 20),
 datetime.datetime(2008, 3, 30, 3, 25),
 datetime.datetime(2008, 3, 30, 3, 30),
 datetime.datetime(2008, 3, 30, 3, 35),
 datetime.datetime(2008, 3, 30, 3, 40),
 datetime.datetime(2008, 3, 30, 3, 45),
 datetime.datetime(2008, 3, 30, 3, 50),
 datetime.datetime(2008, 3, 30, 3, 55),
 datetime.datetime(2009, 3, 29, 3, 0),
 datetime.datetime(2009, 3, 29, 3, 5),
 datetime.datetime(2009, 3, 29, 3, 10),
 datetime.datetime(2009, 3, 29, 3, 15),
 datetime.datetime(2009, 3, 29, 3, 20),
 datetime.datetime(2009, 3, 29, 3, 25),
 datetime.datetime(2009, 3, 29, 3, 30),
 datetime.datetime(2009, 3, 29, 3, 35),
 datetime.datetime(2009, 3, 29, 3, 40),
 datetime.datetime(2009, 3, 29, 3, 45),
 datetime.datetime(2009, 3, 29, 3, 50),
 datetime.datetime(2009, 3, 29, 3, 55),
 datetime.datetime(2010, 3, 28, 3, 0),
 datetime.datetime(2010, 3, 28, 3, 5),
 datetime.datetime(2010, 3, 28, 3, 10),
 datetime.datetime(2010, 3, 28, 3, 15),
 datetime.datetime(2010, 3, 28, 3, 20),
 datetime.datetime(2010, 3, 28, 3, 25),
 datetime.datetime(2010, 3, 28, 3, 30),
 datetime.datetime(2010, 3, 28, 3, 35),
 datetime.datetime(2010, 3, 28, 3, 40),
 datetime.datetime(2010, 3, 28, 3, 45),
 datetime.datetime(2010, 3, 28, 3, 50),
 datetime.datetime(2010, 3, 28, 3, 55),
 datetime.datetime(2011, 3, 27, 3, 0),
 datetime.datetime(2011, 3, 27, 3, 5),
 datetime.datetime(2011, 3, 27, 3, 10),
 datetime.datetime(2011, 3, 27, 3, 15),
 datetime.datetime(2011, 3, 27, 3, 20),
 datetime.datetime(2011, 3, 27, 3, 25),
 datetime.datetime(2011, 3, 27, 3, 30),
 datetime.datetime(2011, 3, 27, 3, 35),
 datetime.datetime(2011, 3, 27, 3, 40),
 datetime.datetime(2011, 3, 27, 3, 45),
 datetime.datetime(2011, 3, 27, 3, 50),
 datetime.datetime(2011, 3, 27, 3, 55)]

I get the same duplicates with other time series of the same kind, and the duplicates are in March for every DataFrame.

Is this a pandas problem?

Author

bmu commented Apr 20, 2012

This issue seems to be daylight saving time related: all duplicated timestamps fall in the hour in which summer time takes effect.

All my datetimes are naive, so I don't know where the problem is.
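
A minimal sketch of one way naive timestamps can collapse like this. It assumes (this is not confirmed anywhere in this thread) that the index is converted to epoch seconds through the platform's local time on write and converted back on read. The `CET-1CEST,...` zone string and the helper name `roundtrip_via_local_time` are illustrative choices, and `time.tzset()` is only available on Unix:

```python
import os
import time
from datetime import datetime, timedelta

# Assumption: a Central European zone, given as a POSIX TZ string so that no
# tz database is needed. DST starts on the last Sunday of March at 02:00.
os.environ["TZ"] = "CET-1CEST,M3.5.0,M10.5.0/3"
time.tzset()  # Unix only


def roundtrip_via_local_time(dt):
    """Naive datetime -> epoch seconds (read as local time) -> naive datetime."""
    return datetime.fromtimestamp(dt.timestamp())


# 5-minute steps from 01:00 to 03:55 on 2007-03-25, the day the clocks
# jumped from 02:00 straight to 03:00 in this zone.
start = datetime(2007, 3, 25, 1, 0)
index = [start + timedelta(minutes=5 * i) for i in range(36)]

restored = [roundtrip_via_local_time(dt) for dt in index]
duplicates = sorted({dt for dt in restored if restored.count(dt) > 1})

print(len(set(index)), len(set(restored)))
print(duplicates[0], duplicates[-1])
```

Under these assumptions, the local times between 02:00 and 02:55 do not exist on that day, so they come back shifted into the 03:00 to 03:55 hour and collide with the real entries there. That is the same hour that shows up in the duplicates listed above. How a nonexistent local time is resolved can differ between platforms and Python versions, so the exact result may vary.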

Member

wesm commented Apr 20, 2012

This is related to this issue: #809

I haven't had more time to dig into this, but I think it's a platform-related issue. pandas is switching to the NumPy datetime64 dtype soon, which will solve this and other issues; I'm not sure whether there's a workaround for now.
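
One possible interim workaround, sketched here as an illustration rather than anything proposed in this thread, is to keep the conversion away from local time altogether: treat each naive datetime as if it were UTC when turning it into an integer, and reverse that with plain arithmetic. This is similar in spirit to datetime64, which stores a plain integer offset from the epoch. The helper names `to_epoch_seconds` and `from_epoch_seconds` are hypothetical, and the sketch drops sub-second precision:

```python
import calendar
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_epoch_seconds(dt):
    """Treat a naive datetime as UTC; no local-time rules are consulted."""
    return calendar.timegm(dt.timetuple())


def from_epoch_seconds(seconds):
    """Inverse of to_epoch_seconds, again without touching local time."""
    return EPOCH + timedelta(seconds=seconds)


# 5-minute steps across the hour in which summer time starts in Central Europe.
start = datetime(2007, 3, 25, 1, 0)
index = [start + timedelta(minutes=5 * i) for i in range(36)]

# Store these integers instead of the datetimes, then rebuild the index on load.
stored = [to_epoch_seconds(dt) for dt in index]
restored = [from_epoch_seconds(s) for s in stored]

print(restored == index)
```

Because no time zone is looked up in either direction, every distinct timestamp maps to a distinct integer and back, regardless of the machine's local zone.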

wesm added a commit that referenced this issue May 14, 2012
Member

wesm commented May 14, 2012

This will be remedied in pandas 0.8.0 with the move to datetime64

wesm closed this as completed May 14, 2012