MAINT: make the data loading apis more consistent.
Changes BcolzDailyBarWriter to no longer be an ABC; data is now passed as an
iterator of (sid, dataframe) pairs to the write method.
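
As a rough illustration of this calling convention, the sketch below feeds a
writer an iterator of (sid, frame) pairs. ToyDailyBarWriter and its write
signature are hypothetical stand-ins, not zipline's BcolzDailyBarWriter, and
plain lists of row dicts stand in for dataframes so the example needs only the
standard library.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, float]
Frame = List[Row]  # stand-in for a pandas DataFrame of OHLCV rows


class ToyDailyBarWriter:
    """Hypothetical writer: a concrete class, not an ABC to subclass."""

    def __init__(self) -> None:
        self.rows_by_sid: Dict[int, Frame] = {}

    def write(self, data: Iterable[Tuple[int, Frame]]) -> Dict[int, int]:
        # Consume the pairs one at a time; each frame is stored as it arrives.
        for sid, frame in data:
            self.rows_by_sid[sid] = list(frame)
        return {sid: len(rows) for sid, rows in self.rows_by_sid.items()}


def make_pairs() -> Iterator[Tuple[int, Frame]]:
    # A generator works as well as a list: the writer only needs an iterator.
    yield 1, [{"close": 10.0, "volume": 100.0}, {"close": 11.0, "volume": 90.0}]
    yield 2, [{"close": 20.0, "volume": 50.0}]


counts = ToyDailyBarWriter().write(make_pairs())
```

Because the data arrives as an argument rather than through an overridden
method, callers can produce it however they like without subclassing.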

Changes the AssetsDBWriter to be a single class which accepts an engine
at construction time and has a `write` method for writing dataframes for
the various tables. We no longer support writing the various other data
types; callers should coerce their data into a dataframe themselves. See
zipline.assets.synthetic for some helpers to do this.
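
The "resource at construction time, data at write time" shape can be sketched
with the standard library's sqlite3 module. ToyAssetsWriter, the equities
table, and its columns are assumptions made up for this example; the real
AssetsDBWriter takes an engine and dataframes, and its internals are not shown
here.

```python
import sqlite3
from typing import Iterable, Tuple


class ToyAssetsWriter:
    """Hypothetical writer bound to one database for its whole lifetime."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn  # the resource is fixed at construction time

    def write(self, equities: Iterable[Tuple[int, str]]) -> None:
        # Callers coerce their data into one tabular shape before calling.
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS equities "
                "(sid INTEGER PRIMARY KEY, symbol TEXT)"
            )
            self.conn.executemany(
                "INSERT INTO equities (sid, symbol) VALUES (?, ?)",
                equities,
            )


conn = sqlite3.connect(":memory:")
writer = ToyAssetsWriter(conn)  # can now be handed to other code as a resource
writer.write([(1, "A"), (2, "B")])
symbols = [
    row[0] for row in conn.execute("SELECT symbol FROM equities ORDER BY sid")
]
```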

Adds many new fixtures and updates some existing fixtures to use the new
ones:

WithDefaultDateBounds
  A fixture that provides the suite a START_DATE and END_DATE. This is
  meant to make it easy for other fixtures to synchronize their date
  ranges without depending on each other in strange ways. For example,
  WithBcolzMinuteBarReader and WithBcolzDailyBarReader by default should
  both have data for the same dates, so they may both depend on
  WithDefaultDateBounds without forcing a dependency between them.

WithTmpDir, WithInstanceTmpDir
  Provides the suite or individual test case a temporary directory.

WithBcolzDailyBarReader
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzDailyBarWriter.write.

WithBcolzDailyBarReaderFromCSVs
  Provides the suite a BcolzDailyBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from a
  collection of CSV files and then converted into the bcolz data through
  BcolzDailyBarWriter.write_csvs.

WithBcolzMinuteBarReader
  Provides the suite a BcolzMinuteBarReader which reads from bcolz data
  written to a temporary directory. The data will be read from
  dataframes and then converted to bcolz files with
  BcolzMinuteBarWriter.write.

WithAdjustmentReader
  Provides the suite a SQLiteAdjustmentReader which reads from an
  in-memory sqlite database. The data will be read from dataframes and then
  converted into sqlite with SQLiteAdjustmentWriter.write.

WithDataPortal
  Provides each test case a DataPortal object with data from temporary
  resources.
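
The way these fixtures compose can be pictured with plain mixin classes. All
names below (WithDateBoundsSketch, WithTmpDirSketch, and so on) and the chosen
dates are hypothetical; this is a sketch on top of unittest and tempfile that
mirrors the idea, not the implementation in zipline.testing.fixtures.

```python
import os
import tempfile
import unittest
from datetime import date


class WithDateBoundsSketch:
    """Hypothetical shared fixture: the single source of the date range."""
    START_DATE = date(2015, 6, 1)
    END_DATE = date(2015, 6, 30)


class WithTmpDirSketch:
    """Hypothetical fixture: one temporary directory per test class."""

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls._tmp = tempfile.TemporaryDirectory()
        cls.tmpdir = cls._tmp.name

    @classmethod
    def tearDownClass(cls):
        cls._tmp.cleanup()
        super().tearDownClass()


class WithDailyDataSketch(WithDateBoundsSketch, WithTmpDirSketch):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.daily_path = os.path.join(cls.tmpdir, "daily")
        cls.daily_range = (cls.START_DATE, cls.END_DATE)


class WithMinuteDataSketch(WithDateBoundsSketch, WithTmpDirSketch):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.minute_path = os.path.join(cls.tmpdir, "minute")
        cls.minute_range = (cls.START_DATE, cls.END_DATE)


class ExampleTestCase(
    WithDailyDataSketch, WithMinuteDataSketch, unittest.TestCase
):
    END_DATE = date(2015, 6, 5)  # one override keeps both fixtures in sync

    def test_ranges_match(self):
        self.assertEqual(self.daily_range, self.minute_range)
        self.assertEqual(
            os.path.dirname(self.daily_path),
            os.path.dirname(self.minute_path),
        )


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTestCase)
result = unittest.TestResult()
suite.run(result)
```

Neither data fixture refers to the other; both read the shared bounds and the
shared temporary directory through cooperative super() calls.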
Joe Jevnik committed Apr 16, 2016
1 parent 8c64cc8 commit bc0b117
Showing 68 changed files with 5,108 additions and 4,885 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -66,3 +66,5 @@ zipline.iml
 
 # PyCharm custom settings
 .idea
+
+TAGS
64 changes: 64 additions & 0 deletions docs/source/whatsnew/1.0.0.txt
@@ -0,0 +1,64 @@
+Development
+-----------
+
+:Release: 1.0.0
+:Date: TBD
+
+.. warning::
+   This release is still under active development. All changes listed are
+   subject to change at any time.
+
+
+Highlights
+~~~~~~~~~~
+
+None
+
+Enhancements
+~~~~~~~~~~~~
+
+* Made the data loading classes have more consistent interfaces. This includes
+  the equity bar writers, adjustment writer, and asset db writer. The new
+  interface is that the resource to be written to is passed at construction time
+  and the data to write is provided later to the `write` method as a
+  dataframe. This model allows us to pass these writer objects around as a
+  resource for other classes and functions to consume (:issue:`1109`).
+
+Experimental Features
+~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+
+   Experimental features are subject to change.
+
+None
+
+Bug Fixes
+~~~~~~~~~
+
+None
+
+Performance
+~~~~~~~~~~~
+
+None
+
+Maintenance and Refactorings
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+None
+
+Build
+~~~~~
+
+None
+
+Documentation
+~~~~~~~~~~~~~
+
+None
+
+Miscellaneous
+~~~~~~~~~~~~~
+
+None
1 change: 1 addition & 0 deletions etc/requirements.txt
@@ -51,6 +51,7 @@ click==4.0.0
 
 # FUNctional programming utilities
 toolz==0.7.4
+multipledispatch==0.4.8
 
 # Asset writer and finder
 sqlalchemy==1.0.8
121 changes: 65 additions & 56 deletions tests/data/test_us_equity_pricing.py
@@ -12,8 +12,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from unittest import TestCase
-
 from nose_parameterized import parameterized
 from numpy import (
     arange,
@@ -28,18 +26,25 @@
     Timestamp,
 )
 from pandas.util.testing import assert_index_equal
-from testfixtures import TempDirectory
 
-from zipline.pipeline.loaders.synthetic import (
-    SyntheticDailyBarWriter,
-)
 from zipline.data.us_equity_pricing import (
     BcolzDailyBarReader,
-    NoDataOnDate
+    NoDataOnDate,
 )
-from zipline.finance.trading import TradingEnvironment
 from zipline.pipeline.data import USEquityPricing
+from zipline.pipeline.loaders.synthetic import (
+    OHLCV,
+    asset_start,
+    asset_end,
+    expected_daily_bar_value,
+    expected_daily_bar_values_2d,
+    make_daily_bar_data,
+)
 from zipline.testing import seconds_to_timestamp
+from zipline.testing.fixtures import (
+    WithBcolzDailyBarReader,
+    ZiplineTestCase,
+)
 
 TEST_CALENDAR_START = Timestamp('2015-06-01', tz='UTC')
 TEST_CALENDAR_STOP = Timestamp('2015-06-30', tz='UTC')
@@ -72,58 +77,57 @@
 TEST_QUERY_ASSETS = EQUITY_INFO.index
 
 
-class BcolzDailyBarTestCase(TestCase):
+class BcolzDailyBarTestCase(WithBcolzDailyBarReader, ZiplineTestCase):
+    BCOLZ_DAILY_BAR_START_DATE = TEST_CALENDAR_START
+    BCOLZ_DAILY_BAR_END_DATE = TEST_CALENDAR_STOP
 
     @classmethod
-    def setUpClass(cls):
-        all_trading_days = TradingEnvironment().trading_days
+    def make_equity_info(cls):
+        return EQUITY_INFO
+
+    @classmethod
+    def make_daily_bar_data(cls):
+        return make_daily_bar_data(
+            EQUITY_INFO,
+            cls.bcolz_daily_bar_days,
+        )
+
+    @classmethod
+    def init_class_fixtures(cls):
+        super(BcolzDailyBarTestCase, cls).init_class_fixtures()
+        all_trading_days = cls.env.trading_days
         cls.trading_days = all_trading_days[
             all_trading_days.get_loc(TEST_CALENDAR_START):
             all_trading_days.get_loc(TEST_CALENDAR_STOP) + 1
         ]
 
-    def setUp(self):
-
-        self.asset_info = EQUITY_INFO
-        self.writer = SyntheticDailyBarWriter(
-            self.asset_info,
-            self.trading_days,
-        )
-
-        self.dir_ = TempDirectory()
-        self.dir_.create()
-        self.dest = self.dir_.getpath('daily_equity_pricing.bcolz')
-
-    def tearDown(self):
-        self.dir_.cleanup()
-
     @property
     def assets(self):
-        return self.asset_info.index
+        return EQUITY_INFO.index
 
     def trading_days_between(self, start, end):
         return self.trading_days[self.trading_days.slice_indexer(start, end)]
 
     def asset_start(self, asset_id):
-        return self.writer.asset_start(asset_id)
+        return asset_start(EQUITY_INFO, asset_id)
 
     def asset_end(self, asset_id):
-        return self.writer.asset_end(asset_id)
+        return asset_end(EQUITY_INFO, asset_id)
 
     def dates_for_asset(self, asset_id):
         start, end = self.asset_start(asset_id), self.asset_end(asset_id)
         return self.trading_days_between(start, end)
 
     def test_write_ohlcv_content(self):
-        result = self.writer.write(self.dest, self.trading_days, self.assets)
-        for column in SyntheticDailyBarWriter.OHLCV:
+        result = self.bcolz_daily_bar_ctable
+        for column in OHLCV:
             idx = 0
             data = result[column][:]
             multiplier = 1 if column == 'volume' else 1000
             for asset_id in self.assets:
                 for date in self.dates_for_asset(asset_id):
                     self.assertEqual(
-                        SyntheticDailyBarWriter.expected_value(
+                        expected_daily_bar_value(
                             asset_id,
                             date,
                             column
@@ -134,7 +138,7 @@ def test_write_ohlcv_content(self):
             self.assertEqual(idx, len(data))
 
     def test_write_day_and_id(self):
-        result = self.writer.write(self.dest, self.trading_days, self.assets)
+        result = self.bcolz_daily_bar_ctable
         idx = 0
         ids = result['id']
         days = result['day']
@@ -145,7 +149,7 @@ def test_write_day_and_id(self):
                 idx += 1
 
     def test_write_attrs(self):
-        result = self.writer.write(self.dest, self.trading_days, self.assets)
+        result = self.bcolz_daily_bar_ctable
         expected_first_row = {
             '1': 0,
             '2': 5,  # Asset 1 has 5 trading days.
@@ -182,16 +186,19 @@ def test_write_attrs(self):
         )
 
     def _check_read_results(self, columns, assets, start_date, end_date):
-        table = self.writer.write(self.dest, self.trading_days, self.assets)
-        reader = BcolzDailyBarReader(table)
-        results = reader.load_raw_arrays(columns, start_date, end_date, assets)
+        results = self.bcolz_daily_bar_reader.load_raw_arrays(
+            columns,
+            start_date,
+            end_date,
+            assets,
+        )
         dates = self.trading_days_between(start_date, end_date)
         for column, result in zip(columns, results):
             assert_array_equal(
                 result,
-                self.writer.expected_values_2d(
+                expected_daily_bar_values_2d(
                     dates,
-                    assets,
+                    EQUITY_INFO,
                     column.name,
                 )
             )
@@ -267,35 +274,34 @@ def test_end_on_asset_end(self):
         )
 
     def test_unadjusted_spot_price(self):
-        table = self.writer.write(self.dest, self.trading_days, self.assets)
-        reader = BcolzDailyBarReader(table)
+        reader = self.bcolz_daily_bar_reader
         # At beginning
         price = reader.spot_price(1, Timestamp('2015-06-01', tz='UTC'),
                                   'close')
         # Synthetic writes price for date.
-        self.assertEqual(135630.0, price)
+        self.assertEqual(108630.0, price)
 
         # Middle
         price = reader.spot_price(1, Timestamp('2015-06-02', tz='UTC'),
                                   'close')
-        self.assertEqual(135631.0, price)
+        self.assertEqual(108631.0, price)
         # End
         price = reader.spot_price(1, Timestamp('2015-06-05', tz='UTC'),
                                   'close')
-        self.assertEqual(135634.0, price)
+        self.assertEqual(108634.0, price)
 
         # Another sid at beginning.
         price = reader.spot_price(2, Timestamp('2015-06-22', tz='UTC'),
                                   'close')
-        self.assertEqual(235651.0, price)
+        self.assertEqual(208651.0, price)
 
         # Ensure that volume does not have float adjustment applied.
         volume = reader.spot_price(1, Timestamp('2015-06-02', tz='UTC'),
                                    'volume')
-        self.assertEqual(145631, volume)
+        self.assertEqual(109631, volume)
 
     def test_unadjusted_spot_price_no_data(self):
-        table = self.writer.write(self.dest, self.trading_days, self.assets)
+        table = self.bcolz_daily_bar_ctable
         reader = BcolzDailyBarReader(table)
         # before
         with self.assertRaises(NoDataOnDate):
@@ -306,18 +312,21 @@ def test_unadjusted_spot_price_no_data(self):
             reader.spot_price(4, Timestamp('2015-06-16', tz='UTC'), 'close')
 
     def test_unadjusted_spot_price_empty_value(self):
-        table = self.writer.write(self.dest, self.trading_days, self.assets)
-        reader = BcolzDailyBarReader(table)
+        reader = self.bcolz_daily_bar_reader
 
         # A sid, day and corresponding index into which to overwrite a zero.
         zero_sid = 1
         zero_day = Timestamp('2015-06-02', tz='UTC')
         zero_ix = reader.sid_day_index(zero_sid, zero_day)
 
-        # Write a zero into the synthetic pricing data at the day and sid,
-        # so that a read should now return -1.
-        # This a little hacky, in lieu of changing the synthetic data set.
-        reader._spot_col('close')[zero_ix] = 0
-
-        close = reader.spot_price(zero_sid, zero_day, 'close')
-        self.assertEqual(-1, close)
+        old = reader._spot_col('close')[zero_ix]
+        try:
+            # Write a zero into the synthetic pricing data at the day and sid,
+            # so that a read should now return -1.
+            # This a little hacky, in lieu of changing the synthetic data set.
+            reader._spot_col('close')[zero_ix] = 0
+
+            close = reader.spot_price(zero_sid, zero_day, 'close')
+            self.assertEqual(-1, close)
+        finally:
+            reader._spot_col('close')[zero_ix] = old
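
The updated constants in test_unadjusted_spot_price are consistent with a
simple additive scheme. The sketch below assumes the synthetic value is
asset_id * 100000 + column_index * 1000 + days since 2000-01-01, with columns
ordered open, high, low, close, volume. That formula is inferred from the
numbers in this diff rather than quoted from zipline.pipeline.loaders.synthetic,
and sketch_expected_value is a hypothetical name.

```python
from datetime import date

OHLCV_SKETCH = ("open", "high", "low", "close", "volume")  # assumed order
PSEUDO_EPOCH_SKETCH = date(2000, 1, 1)  # assumed reference date


def sketch_expected_value(asset_id: int, day: date, column: str) -> int:
    """Assumed scheme: each input contributes to its own block of digits."""
    from_asset = asset_id * 100000
    from_column = OHLCV_SKETCH.index(column) * 1000
    from_date = (day - PSEUDO_EPOCH_SKETCH).days
    return from_asset + from_column + from_date


first_close = sketch_expected_value(1, date(2015, 6, 1), "close")
other_close = sketch_expected_value(2, date(2015, 6, 22), "close")
volume = sketch_expected_value(1, date(2015, 6, 2), "volume")
```

Under this assumption the three values come out as 108630, 208651, and 109631,
matching the assertions on the new side of the diff.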