
HDFStore does not preserve index correctly #973

Closed
saroele opened this issue Mar 27, 2012 · 3 comments


@saroele

commented Mar 27, 2012

I have a DataFrame with a DateRange index, and I store it to an .h5 file with HDFStore.

But when I retrieve my DataFrame, there is a problem with the index, because I get this error:
Exception: Index values are not unique

Here are the commands that produce the exception. I didn't test every possible case, so I have included the intermediate commands I ran as well:

dffc=pandas.DataFrame(vpp.forecast.data, index=dr_year_900[:-1])
dffc[start_dt:stop_dt]
dffc.ix[start_dt:stop_dt]
dffc.drop(['Counter'], axis=1)
dffc=dffc.drop(['Counter'], axis=1)
dffc.ix[start_dt:stop_dt]
dffc['EFor'] = dffc['Power'] - dffc['Forecast']
dffc.ix[start_dt:stop_dt]
dffc = dffc.rename(columns={'Power':'PRef', 'Forecast':'PFor'})
dffc
dffc.ix[start_dt:stop_dt]
dfs = pandas.HDFStore('test.h5', 'w')
dfs.put('dffc', dffc)
dfs.close()
del(dffc)
dfs = pandas.HDFStore('test.h5', 'r')
dffc
dffc = dfs['dffc']
dffc
dffc.ix[start_dt:stop_dt]
Here is the output:
In [49]: dffc.ix[start_dt:stop_dt]
Out[49]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 961 entries, 2010-01-01 00:00:00 to 2010-01-11 00:00:00
offset: <900 Seconds>
Data columns:
Counter 961 non-null values
Forecast 961 non-null values
Imbalance 961 non-null values
NegImbPrice 961 non-null values
PosImbPrice 961 non-null values
Power 961 non-null values
Price 961 non-null values
dtypes: float64(6), object(1)

In [50]: dffc.drop(['Counter'], axis=1)
Out[50]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 35040 entries, 2010-01-01 00:00:00 to 2010-12-31 23:45:00
offset: <900 Seconds>
Data columns:
Forecast 35040 non-null values
Imbalance 35040 non-null values
NegImbPrice 35040 non-null values
PosImbPrice 35040 non-null values
Power 35040 non-null values
Price 35040 non-null values
dtypes: float64(6)

In [51]: dffc=dffc.drop(['Counter'], axis=1)

In [52]: dffc.ix[start_dt:stop_dt]
Out[52]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 961 entries, 2010-01-01 00:00:00 to 2010-01-11 00:00:00
offset: <900 Seconds>
Data columns:
Forecast 961 non-null values
Imbalance 961 non-null values
NegImbPrice 961 non-null values
PosImbPrice 961 non-null values
Power 961 non-null values
Price 961 non-null values
dtypes: float64(6)

In [53]: dffc['EFor'] = dffc['Power'] - dffc['Forecast']

In [54]: dffc.ix[start_dt:stop_dt]
Out[54]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 961 entries, 2010-01-01 00:00:00 to 2010-01-11 00:00:00
offset: <900 Seconds>
Data columns:
Forecast 961 non-null values
Imbalance 961 non-null values
NegImbPrice 961 non-null values
PosImbPrice 961 non-null values
Power 961 non-null values
Price 961 non-null values
EFor 961 non-null values
dtypes: float64(7)

In [55]: dffc = dffc.rename(columns={'Power':'PRef', 'Forecast':'PFor'})

In [56]: dffc
Out[56]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 35040 entries, 2010-01-01 00:00:00 to 2010-12-31 23:45:00
offset: <900 Seconds>
Data columns:
PFor 35040 non-null values
Imbalance 35040 non-null values
NegImbPrice 35040 non-null values
PosImbPrice 35040 non-null values
PRef 35040 non-null values
Price 35040 non-null values
EFor 35040 non-null values
dtypes: float64(7)

In [57]: dffc.ix[start_dt:stop_dt]
Out[57]:
<class 'pandas.core.frame.DataFrame'>
DateRange: 961 entries, 2010-01-01 00:00:00 to 2010-01-11 00:00:00
offset: <900 Seconds>
Data columns:
PFor 961 non-null values
Imbalance 961 non-null values
NegImbPrice 961 non-null values
PosImbPrice 961 non-null values
PRef 961 non-null values
Price 961 non-null values
EFor 961 non-null values
dtypes: float64(7)

In [58]: dfs = pandas.HDFStore('test.h5', 'w')

In [59]: dfs.put('dffc', dffc)

In [60]: dfs.close()

In [61]: del(dffc)

In [62]: dfs = pandas.HDFStore('test.h5', 'r')

In [63]: dffc

Traceback (most recent call last):
File "", line 1, in
NameError: name 'dffc' is not defined

In [64]: dffc = dfs['dffc']

In [65]: dffc
Out[65]:
<class 'pandas.core.frame.DataFrame'>
Index: 35040 entries, 2010-01-01 00:00:00 to 2010-12-31 23:45:00
Data columns:
PFor 35040 non-null values
Imbalance 35040 non-null values
NegImbPrice 35040 non-null values
PosImbPrice 35040 non-null values
PRef 35040 non-null values
Price 35040 non-null values
EFor 35040 non-null values
dtypes: float64(7)

In [66]: dffc.ix[start_dt:stop_dt]

Traceback (most recent call last):
File "", line 1, in
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\indexing.py", line 35, in __getitem__
return self._getitem_axis(key, axis=0)
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\indexing.py", line 167, in _getitem_axis
return self._get_slice_axis(key, axis=axis)
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\indexing.py", line 345, in _get_slice_axis
i, j = labels.slice_locs(start, stop)
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\index.py", line 819, in slice_locs
beg_slice = self.get_loc(start)
File "C:\Python27\lib\site-packages\pandas-0.7.1-py2.7-win32.egg\pandas\core\index.py", line 499, in get_loc
return self._engine.get_loc(key)
File "engines.pyx", line 101, in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2498)
File "engines.pyx", line 107, in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2447)
Exception: Index values are not unique

How can I avoid this problem?
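Two things stand out in the log: Out[65] reports a plain `Index` where the stored frame had a `DateRange`, and the traceback ends in `get_loc` raising because the reloaded labels are not unique. A minimal diagnostic sketch, assuming current pandas (the reporter was on 0.7.1, where `.ix` and `DateRange` still existed), that checks the index right after loading and, if the timestamps turn out to be unique after all, rebuilds a sorted `DatetimeIndex`. `inspect_index` and `restore_datetime_index` are hypothetical helpers, not pandas functions, and the three-row frame is synthetic:

```python
from datetime import datetime

import pandas as pd


def inspect_index(df):
    """Hypothetical helper: summarize the state of a DataFrame's row index."""
    idx = df.index
    return {
        "type": type(idx).__name__,
        "is_datetime": isinstance(idx, pd.DatetimeIndex),
        "is_unique": idx.is_unique,
        "is_monotonic": idx.is_monotonic_increasing,
        # Every occurrence of a label after its first one.
        "duplicates": list(idx[idx.duplicated()]),
    }


def restore_datetime_index(df):
    """Hypothetical repair: turn an object Index of timestamps into a sorted DatetimeIndex."""
    idx = pd.to_datetime(df.index)
    if not idx.is_unique:
        # Repairing silently would hide the corruption; stop and inspect instead.
        raise ValueError("index has duplicate timestamps")
    # set_axis keeps each row attached to its label; sort_index then orders rows by time.
    return df.set_axis(idx, axis=0).sort_index()


# Synthetic stand-in for a reloaded frame whose index lost its datetime type.
stamps = [datetime(2010, 1, 1, 0, 0), datetime(2010, 1, 1, 0, 15), datetime(2010, 1, 1, 0, 15)]
broken = pd.DataFrame({"Price": [1.0, 2.0, 3.0]}, index=pd.Index(stamps, dtype=object))
print(inspect_index(broken))
```

If `duplicates` is non-empty, the stored timestamps themselves were damaged, and converting the index type alone cannot recover the original rows.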

@saroele

Author

commented Mar 29, 2012

Could anyone have a look at this? My saved DataFrames become useless after reloading, which is very annoying.
Thanks

@wesm

Member

commented Mar 29, 2012

Can you send me a pickled version of one of these DataFrames to my personal email so I can have a look? (df.save(filepath))
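`df.save(filepath)` was the pickling method in the pandas version used here; current pandas no longer has `DataFrame.save`, and the equivalent pair is `DataFrame.to_pickle` and `pandas.read_pickle`. A minimal sketch of the same round trip in current pandas, using a temporary directory; the file name and the synthetic 15-minute frame are illustrative assumptions:

```python
import os
import tempfile

import pandas as pd

# Synthetic frame with a 15-minute DatetimeIndex (stand-in for the reporter's data).
idx = pd.date_range("2010-01-01", periods=96, freq="15min")
df = pd.DataFrame({"Price": range(96)}, index=idx, dtype="float64")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dffc.pkl")
    df.to_pickle(path)               # current replacement for df.save(path)
    restored = pd.read_pickle(path)  # load it back from the same file

# Compare the reloaded frame with the original.
print(type(restored.index).__name__, restored.index.is_unique, restored.equals(df))
```

A pickle written this way is tied to the pandas version that wrote it, so it is mainly useful for sharing a reproduction, not for long-term storage.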

@wesm

Member

commented May 28, 2012

Hi @saroele, I believe this will be remedied by upgrading to pandas 0.8.0 as soon as it's out; a number of users have had datetime.datetime locale issues with HDF5 that look just like yours. Please open a new issue after the release if you still get this error.
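The thread does not say whether any workaround was used before 0.8.0. One general technique for keeping datetime objects out of a storage layer is to store the index as int64 nanoseconds since the Unix epoch and rebuild the `DatetimeIndex` after loading; whether this sidesteps the specific locale bug mentioned above is not confirmed here. A minimal sketch, assuming current pandas and a tz-naive index; `to_storable` and `from_storable` are hypothetical helper names, and the actual `HDFStore.put`/`get` calls are left out:

```python
import pandas as pd


def to_storable(df):
    """Hypothetical helper: replace a tz-naive DatetimeIndex with int64 nanoseconds since the epoch."""
    out = df.copy()
    ns = df.index.to_numpy(dtype="datetime64[ns]").astype("int64")
    out.index = pd.Index(ns, name="ns_since_epoch")
    return out


def from_storable(df, freq=None):
    """Hypothetical helper: rebuild a DatetimeIndex from int64 nanoseconds since the epoch."""
    out = df.copy()
    stamps = pd.to_datetime(df.index.to_numpy(), unit="ns")
    # Passing freq makes pandas check that the timestamps really follow that spacing.
    out.index = pd.DatetimeIndex(stamps, freq=freq)
    return out


# Synthetic 15-minute frame; the column name is borrowed from the issue, the values are made up.
idx = pd.date_range("2010-01-01", periods=4, freq="15min")
df = pd.DataFrame({"PRef": [1.0, 2.0, 3.0, 4.0]}, index=idx)

stored = to_storable(df)  # this is what would be passed to HDFStore.put
back = from_storable(stored, freq="15min")
print(back.index.equals(df.index), back.index.freqstr)
```

Integers survive a storage round trip without any date parsing, and the frequency is reattached explicitly on load instead of being inferred from the stored data.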

@wesm wesm closed this May 28, 2012
