PyTables dates don't work when you switch to a different time zone #2852
Comments
this is not well tested but will take a look, thanks

On Feb 12, 2013, at 1:29 PM, tavistmorph notifications@github.com wrote:

this solution looks reasonable. can you do a PR with this (and some tests)? not sure how to 'fake' the timezone... prob have to do it explicitly. Also, if toordinal returns an int, then the column should be Int32, and then you don't actually need the try/except on the unconvert; instead you can check
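A minimal sketch (not part of any patch in this thread) of the Int32 point above: toordinal() already returns a plain Python int, and the round-trip through date.fromordinal() never consults the timezone, so an Int32 column is sufficient.

import datetime
import numpy as np

d = datetime.date(2013, 2, 12)
ordinal = d.toordinal()                              # plain Python int, e.g. 734911
converted = np.array([ordinal], dtype=np.int32)      # fits easily in an Int32 column
restored = datetime.date.fromordinal(int(converted[0]))
assert restored == d                                 # same date regardless of timezone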
darindillon
commented
Feb 12, 2013
Not sure what a PR is (I assume that's a Git thing, and I'm new to Git). But yes, we have tested the fix I described above extensively in our environment with our real-world dataframes and it's working fine for us.

a pull-request

Action here?

pushing till 0.12, have to store extra data for the column, so need to think about how general this needs to be (see #2391, @scottkidder wanted to work on this)

this is closed by #3531
jreback
closed this
May 8, 2013
numpand
commented
Sep 13, 2013
This issue is not fixed in pandas 0.12.0. I just reproduced it by following the test case outlined in the description.

this particular issue is untestable so it was closed. The store correctly stores/retrieves timezone-aware datetimes in 0.12. What exactly are you reproducing?
numpand
commented
Sep 13, 2013
I am using dates (not datetimes). I stored dates in a dataframe index and then saved to an hdf store in the Eastern timezone, then retrieved it in the Central timezone and got (date - 1) in the index. What exactly do you mean by this issue being untestable?

what's untestable is resetting the default time zone on the computer (well, it IS testable, but the datetimes work fine), so it's not an issue. you are having a different issue, in that
numpand
commented
Sep 13, 2013
Well, at the end of the day, datetime.date is "supported" because there is an "if" statement checking for it and doing something based on that (something that's causing a problem when using the hdf store in two different timezones). Is there any harm in making the change suggested by the original submitter?

can you create a self-contained example for testing? (if so, then no problem)
numpand
commented
Sep 14, 2013
https://www.wakari.io/sharing/bundle/johnv/pandas_issue_2852 - shared an ipython notebook on wakari. Let me know if you have problems accessing it (code won't work on Windows because of missing tzset). I'll repeat the code below just in case:

import datetime
import os
import time
import pandas as pd

def setTZ(tz):
    os.environ['TZ'] = tz
    time.tzset()

setTZ('EST5EDT')
today = datetime.date(2013, 9, 10)
df = pd.DataFrame([1, 2, 3], index=[today, today, today])
print df

filename = 'test.hdf5'
store = pd.HDFStore(filename)
store['obj1'] = df
store.close()

setTZ('CST6CDT')
store = pd.HDFStore(filename)
read_df = store['obj1']
store.close()
print read_df

df.index[0] == read_df.index[0]
jreback
referenced
this issue
Sep 14, 2013
Merged
BUG: store datetime.date objects in HDFStore as ordinals rather than timetuples to avoid timezone issues (GH2852) #4841
@numpand thanks...see PR #4841. just note that storing a datetime.date like this is very inefficient; even manipulating it in pandas is very inefficient, as these are treated as object dtype. You should ALWAYS use datetime64/Timestamps. Is there a reason you and @tavistmorph store things this way?
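A rough sketch (not from PR #4841 itself) of the conversion being recommended here: keep the index as datetime64 inside pandas, and only drop down to plain datetime.date objects at the boundary with other libraries.

import datetime
import pandas as pd

dates = [datetime.date(2013, 9, 10)] * 3
df = pd.DataFrame({'number': [1, 2, 3]}, index=pd.to_datetime(dates))
print(df.index.dtype)            # datetime64[ns]: efficient inside pandas

# convert back to plain datetime.date only when another library needs it
plain_dates = [ts.date() for ts in df.index]
print(plain_dates[0])            # 2013-09-10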
darindillon
commented
Sep 14, 2013
We want to be able to use datetime.date because it's a standard, standard, standard python type that's built in to the language and is needed to interact with hundreds of other non-pandas libraries. No question that datetime64 is more efficient, but for data less than a million rows, the efficiency gains are outweighed by the convenience of not having to convert datetime64 to datetime.date every time we want to interact with some other library.

On Sep 14, 2013, at 5:12 PM, jreback notifications@github.com wrote:

PR was just merged. you are welcome.
numpand
commented
Sep 14, 2013
@jreback thank you! I'll keep in mind the performance penalty and revisit the usage of datetime.date.
darindillon commented Feb 12, 2013
Create a dataframe with a date (not datetime) column as the index. Save that dataframe to the HDFStore. Now open that same file on a computer in a time zone that's behind you (i.e., write the file in New York and then read the file from Dallas). Result: all the dates are off by one.
Here is a code snippet to demonstrate the problem, and also a proposed fix to the pandas code. (But someone more qualified than me can review this fix).
To repro the problem:
write a file in New York time zone
import pandas
import datetime

def writeToFile(fileName):
    dates = [datetime.date.today(), datetime.date.today(), datetime.date.today()]
    numbers = [1, 2, 3]
    p = pandas.DataFrame({"date": dates, "number": numbers}, columns=['date', 'number'])
    p.set_index('date', inplace=True)
    store = pandas.HDFStore(fileName)
    store['obj1'] = p
    store.close()
Then go to a Dallas time zone and read that file:
def readFromFile(fileName):
    store = pandas.HDFStore(fileName)
    return store['obj1']
Notice that the dates on the dataframe now show YESTERDAY when they ought to show today.
Proposed fix in pandas/io/pytables.py:
def _convert_index(index):
    ...
    elif inferred_type == 'date':
        # OLD LINE: CHANGE THIS LINE
        # converted = np.array([time.mktime(v.timetuple()) for v in values],
        #                      dtype=np.int32)
        # NEW LINE:
        converted = np.array([v.toordinal() for v in values], dtype=np.int32)
and then also change:
def _unconvert_index(data, kind):
    ...
    elif kind == 'date':
        # here we'll try from ordinal first. If the date was saved with the old
        # mktime mechanism it'll throw an exception as it'll be out of bounds.
        # in those cases we'll convert using the old method (with the bug!)
        try:
            index = np.array([date.fromordinal(v) for v in data], dtype=object)
        except ValueError:
            index = np.array([date.fromtimestamp(v) for v in data], dtype=object)
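For completeness, a small standalone sketch (not part of the proposed patch) of why the old path breaks and the ordinal path does not: time.mktime(d.timetuple()) encodes local midnight as a POSIX timestamp, and date.fromtimestamp() decodes it in the reader's local zone, so a reader whose zone is behind the writer's lands on the previous day. Ordinals never touch the clock. This assumes a Unix-like OS, since time.tzset() is unavailable on Windows.

import datetime
import os
import time

def set_tz(tz):
    # helper for this demo only; time.tzset() does not exist on Windows
    os.environ['TZ'] = tz
    time.tzset()

d = datetime.date(2013, 9, 10)

set_tz('EST5EDT')                                   # "write" in New York
as_timestamp = time.mktime(d.timetuple())           # old storage scheme
as_ordinal = d.toordinal()                          # proposed storage scheme

set_tz('CST6CDT')                                   # "read" in Dallas
print(datetime.date.fromtimestamp(as_timestamp))    # 2013-09-09, off by one
print(datetime.date.fromordinal(as_ordinal))        # 2013-09-10, correct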