Description
Arctic Version
1.80.5
Arctic Store
ChunkStore
Platform and version
Python 3.8.5
Description of problem and/or code sample that reproduces the issue
I noticed that if I save a dataframe whose timestamps roll over to the next day when converted to UTC, most functions (reverse_iterator, get_chunk_ranges, get_info, ...) don't return the chunk for the new date. The following example demonstrates this (a Jupyter notebook is attached in the zip file):
Set Up
import pandas as pd
from arctic import Arctic, CHUNK_STORE
store = Arctic("localhost")
store.initialize_library("scratch_lib", lib_type=CHUNK_STORE)
lib = store["scratch_lib"]
Create an Index with some times that will change dates when converted to UTC
ind = pd.Index([pd.Timestamp("20121208T16:00", tz="US/Eastern"), pd.Timestamp("20121208T18:00", tz="US/Eastern"),
pd.Timestamp("20121208T20:00", tz="US/Eastern"), pd.Timestamp("20121208T22:00", tz="US/Eastern")], name="date")
print(ind)
Output:
DatetimeIndex(['2012-12-08 16:00:00-05:00', '2012-12-08 18:00:00-05:00', '2012-12-08 20:00:00-05:00', '2012-12-08 22:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', name='date', freq=None)
print(ind.tz_convert("UTC"))
Output
DatetimeIndex(['2012-12-08 21:00:00+00:00', '2012-12-08 23:00:00+00:00', '2012-12-09 01:00:00+00:00', '2012-12-09 03:00:00+00:00'], dtype='datetime64[ns, UTC]', name='date', freq=None)
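The date rollover itself can be checked without Arctic or MongoDB. The following is a minimal standard-library sketch, not Arctic's chunking code: it buckets the same four timestamps by calendar day, once in local time and once in UTC. The fixed UTC-5 offset is an assumption standing in for US/Eastern (which is on standard time in December), and `group_by_day` is a hypothetical helper introduced only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: US/Eastern is UTC-5 on 2012-12-08 (standard time), so a fixed
# offset stands in for the pandas timezone used in the report.
EASTERN = timezone(timedelta(hours=-5))

# The same four timestamps as the index above.
local_times = [datetime(2012, 12, 8, hour, tzinfo=EASTERN) for hour in (16, 18, 20, 22)]


def group_by_day(times, tz):
    """Hypothetical helper: bucket timestamps by calendar date as seen in `tz`."""
    buckets = defaultdict(list)
    for t in times:
        buckets[t.astimezone(tz).date().isoformat()].append(t)
    return dict(buckets)


by_local_day = group_by_day(local_times, EASTERN)
by_utc_day = group_by_day(local_times, timezone.utc)

print(sorted(by_local_day))  # days seen when dates are taken in local time
print(sorted(by_utc_day))    # days seen when dates are taken in UTC
```

Grouping by local date yields a single day, while grouping by UTC date yields two, which is the split I would expect daily chunking of the stored (UTC) data to follow.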
Create dataframe, write it to the library, and read it back out
df = pd.DataFrame([1, 2, 3, 4], index=ind, columns=["col"])
lib.write("example_df", df, chunk_size="D")
df_read = lib.read("example_df")
print(df_read)
Output
date col
2012-12-08 21:00:00 1
2012-12-08 23:00:00 2
2012-12-09 01:00:00 3
2012-12-09 03:00:00 4
This is different from what I expected: the index comes back as naive UTC timestamps rather than the US/Eastern times I wrote. Is this behavior expected?
lib.get_info("example_df")
Output
{'chunk_count': 1,
'len': 4,
'appended_rows': 0,
'metadata': {'columns': ['date', 'col']},
'chunker': 'date',
'chunk_size': 'D',
'serializer': 'FrameToArray'}
>> expected chunk_count = 2, not 1
list(lib.get_chunk_ranges("example_df"))
Output
[(b'2012-12-08 00:00:00', b'2012-12-08 23:59:59.999000')]
>> expected [(b'2012-12-08 00:00:00', b'2012-12-08 23:59:59.999000'), (b'2012-12-09 00:00:00', b'2012-12-09 23:59:59.999000')]
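To make the expected ranges concrete, here is a small sketch that derives them from the UTC timestamps shown in the read-back dataframe. It is plain Python and only mirrors the range format in the output above (start of day to 23:59:59.999); `expected_daily_ranges` is a hypothetical helper, not part of Arctic's API.

```python
from datetime import datetime, timedelta

# The UTC timestamps as they appear in the dataframe read back from the library.
utc_times = [
    datetime(2012, 12, 8, 21),
    datetime(2012, 12, 8, 23),
    datetime(2012, 12, 9, 1),
    datetime(2012, 12, 9, 3),
]


def expected_daily_ranges(times):
    """Hypothetical helper: one (start, end) pair per calendar day present in `times`."""
    ranges = []
    for day in sorted({t.date() for t in times}):
        start = datetime(day.year, day.month, day.day)
        # End of day at millisecond precision, matching the format in the output above.
        end = start + timedelta(days=1) - timedelta(milliseconds=1)
        ranges.append((start, end))
    return ranges


ranges = expected_daily_ranges(utc_times)
for start, end in ranges:
    print(start, end)
```

With daily chunks computed on the UTC dates, this produces two ranges, one for 2012-12-08 and one for 2012-12-09, rather than the single range returned by get_chunk_ranges.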
iterator = lib.reverse_iterator("example_df")
while True:
    data = next(iterator, None)
    if data is None:
        break
    print(data)
Output
date col
2012-12-08 21:00:00 1
2012-12-08 23:00:00 2
>> expected the following:
date col
2012-12-09 01:00:00 3
2012-12-09 03:00:00 4
date col
2012-12-08 21:00:00 1
2012-12-08 23:00:00 2