Accessing bundle data outside backtest run #2293
Comments
I built some convenience functions for daily data (e.g., https://github.com/marketneutral/alphatools/blob/master/notebooks/research-minimal.ipynb). Would that suffice? Separately, do you ingest futures data properly? If so, can you share the code that builds the security master, etc. and ingests futures price data? Thanks, |
Hey Jonathan, Been a while. Hope all's good. That looks brilliant. I'll take a closer look in the office tomorrow. On the futures, I have a futures bundle which is buggy and that I'm not at all happy with. It's next on my to-do list to get a proper futures ingest going. I'd be happy to collaborate on that, as I intend to make all my current Zipline projects public soon. I'm writing a book on Python-based backtesting, using Zipline as the primary library. Don't tell anyone. :) Anyhow, for that reason, I'm looking for the simplest and most straightforward ways of doing things, avoiding the usual type of workarounds. It's not just about getting it done, but rather getting it done in an easily explainable manner. |
The reason I ask about the futures ingest, is that my package currently only works for equities bundles. I just added a feature to allow you to set the bundle in research. For futures data, I have pretended they are equities and used the |
Apologies in advance for the long post, and also for the topic drift of the thread...
I dusted off my half-baked futures bundle today. I started building this a few months back, got stuck on some weird stuff, and decided to get other things finished first. Would be great if we can figure out a solution to this. I have gotten as far as having the bundle properly ingest data, seemingly storing it all as it should. I can inspect the generated sqlite files, and they appear to contain the expected data. However, I have not yet managed to actually use this in an algo.
Calendar Issue
What I'd love to do is just disable this calendar check completely. My data is good. If there's data for a given day, that market was traded that day. If not, it wasn't. But that doesn't seem to work, so I used a really dumb workaround for now, just to get rid of this very annoying issue. I use the NYSE calendar. A function checks all trading days for a stock which I know covers all possible trading days. When I then read the futures data, I run it by the 'verified valid days', using simple Pandas logic to sync it to the required dates. Yes, that's a total mess, but I didn't figure out a better way yet. There also seems to be a known issue with dates earlier than 2000, and to avoid that issue for now, I simply slice those off. To be fixed later...
Data Source
Remaining Issue
Implementation
And the extension entry:
|
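The calendar workaround described in the comment above (syncing futures data to a list of 'verified valid days') could look roughly like the sketch below. This is not the poster's actual code, which was not preserved in the thread; the helper name `sync_to_sessions`, the column names and the choice to fill missing volume with zero are all assumptions made for illustration.

```python
import pandas as pd


def sync_to_sessions(frame, sessions):
    """Align a date-indexed price frame to a list of valid sessions.

    Hypothetical helper: rows on dates outside `sessions` are dropped,
    and sessions with no data show up as NaN rows.
    """
    synced = frame.reindex(sessions)
    # Assumption: a session without data means nothing traded that day.
    synced['volume'] = synced['volume'].fillna(0)
    return synced


# Made-up sample data: one bar falls on a non-session day (Jan 1),
# and one session (Jan 3) has no bar at all.
sessions = pd.DatetimeIndex(['2018-01-02', '2018-01-03', '2018-01-04'])
raw = pd.DataFrame(
    {'close': [100.0, 101.5, 102.0], 'volume': [10.0, 12.0, 8.0]},
    index=pd.DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-04']),
)
synced = sync_to_sessions(raw, sessions)
```

How to treat the missing prices (leave as NaN, forward-fill, or fill with zero) is a separate decision that depends on what the bar writer expects.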
Great. Thanks for sharing. I hope to be able to work with this over the next few weeks. |
Looking forward to hearing your thoughts. For those watching at home, I should also share a 'trick' for checking whether your bundle seems to store things correctly. The ingest process creates a bunch of files and folders under your home/user folder, and those can be inspected with standard software. Check under ~/.zipline/data/bundle_name/timestamp and you'll find some SQLite files. Those can be opened in DB Browser or similar freeware, and you can check what's in there. You may even find some interesting features of their database design, which can (probably) be used in various fun ways. There are a few fields in there that I hadn't seen in the docs, which I'm sure I'll have use for. |
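The same inspection can also be done from Python instead of a GUI tool. The following is a minimal sketch using only the standard library; `list_tables` is a hypothetical helper name, and the path you pass in is whatever SQLite file you find under your own bundle folder.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from sqlite_master itself, so quoting them
        # as identifiers is sufficient here.
        return {
            name: conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(name)
            ).fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling it on a file from ~/.zipline/data/bundle_name/timestamp gives a quick overview of which tables were written and how many rows each holds.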
@AndreasClenow Do you expect that your ingest process creates I am able to ingest daily CME data (where each file is a day, not a symbol), as below. This generates a valid bundle. I can see the sqlite3 data, e.g.,
And I can see the object in an algo:
However, there is no pricing data to be found. When I do a
So the
import datetime
import os

import numpy as np
import pandas as pd
from six import iteritems
from tqdm import tqdm
from trading_calendars import get_calendar
from zipline.assets.futures import CME_CODE_TO_MONTH
from zipline.data.bundles import core as bundles


def csvdir_futures(tframes, csvdir):
    return CSVDIRFutures(tframes, csvdir).ingest


class CSVDIRFutures:
    """
    Wrapper class to call csvdir_bundle with provided
    list of time frames and a path to the csvdir directory
    """

    def __init__(self, tframes, csvdir):
        self.tframes = tframes
        self.csvdir = csvdir

    def ingest(self,
               environ,
               asset_db_writer,
               minute_bar_writer,
               daily_bar_writer,
               adjustment_writer,
               calendar,
               start_session,
               end_session,
               cache,
               show_progress,
               output_dir):
        futures_bundle(environ,
                       asset_db_writer,
                       minute_bar_writer,
                       daily_bar_writer,
                       adjustment_writer,
                       calendar,
                       start_session,
                       end_session,
                       cache,
                       show_progress,
                       output_dir,
                       self.tframes,
                       self.csvdir)


def third_friday(year, month):
    """Return datetime.date for monthly option expiration given year and
    month
    """
    # The 15th is the lowest third day in the month
    third = datetime.date(year, month, 15)
    # What day of the week is the 15th?
    w = third.weekday()
    # Friday is weekday 4
    if w != 4:
        # Replace just the day (of month)
        third = third.replace(day=(15 + (4 - w) % 7))
    return third


def load_data(parent_dir):
    """Given a parent_dir of cross-sectional daily files,
    read in all the days and return a big dataframe.
    """
    # list the files
    filelist = os.listdir(parent_dir)
    # read them into pandas
    df_list = [
        pd.read_csv(os.path.join(parent_dir, file), parse_dates=[1])
        for file
        in tqdm(filelist)
    ]
    # concatenate them together
    big_df = pd.concat(df_list)
    big_df.columns = map(str.lower, big_df.columns)
    big_df.symbol = big_df.symbol.astype('str')
    mask = big_df.symbol.str.len() == 5  # e.g., ESU18; doesn't work prior to year 2000
    return big_df.loc[mask]


def gen_asset_metadata(data, show_progress, exchange='EXCH'):
    if show_progress:
        log.info('Generating asset metadata.')
    data = data.groupby(
        by='symbol'
    ).agg(
        {'date': [np.min, np.max]}
    )
    data.reset_index(inplace=True)
    data['start_date'] = data.date.amin
    data['end_date'] = data.date.amax
    del data['date']
    data.columns = data.columns.get_level_values(0)
    data['exchange'] = exchange
    data['root_symbol'] = data.symbol.str.slice(0, 2)
    data['exp_month_letter'] = data.symbol.str.slice(2, 3)
    data['exp_month'] = data['exp_month_letter'].map(CME_CODE_TO_MONTH)
    data['exp_year'] = 2000 + data.symbol.str.slice(3, 5).astype('int')
    data['expiration_date'] = data.apply(lambda x: third_friday(x.exp_year, x.exp_month), axis=1)
    del data['exp_month_letter']
    del data['exp_month']
    del data['exp_year']
    data['auto_close_date'] = data['end_date'].values + pd.Timedelta(days=1)
    data['notice_date'] = data['auto_close_date']
    data['tick_size'] = 0.0001  # Placeholder for now
    data['multiplier'] = 1  # Placeholder for now
    return data


def parse_pricing_and_vol(data,
                          sessions,
                          symbol_map):
    for asset_id, symbol in iteritems(symbol_map):
        asset_data = data.xs(
            symbol,
            level=1
        ).reindex(
            sessions.tz_localize(None)
        ).fillna(0.0)
        yield asset_id, asset_data


@bundles.register('futures')
def futures_bundle(environ,
                   asset_db_writer,
                   minute_bar_writer,
                   daily_bar_writer,
                   adjustment_writer,
                   calendar,
                   start_session,
                   end_session,
                   cache,
                   show_progress,
                   output_dir,
                   tframes=None,
                   csvdir=None):
    import pdb; pdb.set_trace()
    raw_data = load_data('/Users/jonathan/devwork/pricing_data/CME_2018')
    asset_metadata = gen_asset_metadata(raw_data, False)
    root_symbols = asset_metadata.root_symbol.unique()
    root_symbols = pd.DataFrame(root_symbols, columns=['root_symbol'])
    root_symbols['root_symbol_id'] = root_symbols.index.values
    asset_db_writer.write(futures=asset_metadata, root_symbols=root_symbols)
    symbol_map = asset_metadata.symbol
    sessions = calendar.sessions_in_range(start_session, end_session)
    raw_data.set_index(['date', 'symbol'], inplace=True)
    daily_bar_writer.write(
        parse_pricing_and_vol(
            raw_data,
            sessions,
            symbol_map
        ),
        show_progress=show_progress
    )
Btw- I believe your calendar issues are solved in |
I thought it would be funny to try just renaming the directory
So perhaps it is right that the futures data is there? |
Yes, I would expect futures object, so that I can use the continuous futures logic and similar functionalities. I got a bit further here this morning, reworking the code a bit. Seems like my weird bug above was related to a missing meta field.
I hadn't bothered providing exchange code in the meta data, and it seems like such omissions are frowned upon. Adding the field got me a bit further. I can write the bundle, get all the asset fields in the sqlite file seemingly correct. I can, as you demonstrated, fetch a futures object with future_symbol(). Where my script now taps out is when trying to read historical data. Looking at the folder structure, the data seems to be there, albeit in the /daily_equities.bcolz/ path. I'm not sure if this is a legacy naming scheme or if it's supposed to change with asset class. My current error
Updated Bundle
|
Yes, on
When I step through this,
So, indeed, there is no reader capable of reading futures data. Can any Q folk shed some light on this? In the meantime, will continue to try. |
I have taken the liberty to ping guys in Boston on this, and they are looking into it. I'll report back when I hear anything. |
Some more info, the environment is set up here when zipline starts up. Note this:
But the signature for
Note future_daily_reader=None, so zipline is not being initialized with a reader capable of finding futures data. |
Success...
import os

import pandas as pd
from zipline.data import bundles
from zipline.data.data_portal import DataPortal
from zipline.utils.calendars import get_calendar
from zipline.assets._assets import Future
from zipline.utils.run_algo import load_extensions

# Load extensions.py; this allows you access to custom bundles
load_extensions(
    default=True,
    extensions=[],
    strict=True,
    environ=os.environ,
)

# Set-Up Pricing Data Access
trading_calendar = get_calendar('CME')
bundle = 'futures'
bundle_data = bundles.load(bundle)
data = DataPortal(
    bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
    equity_minute_reader=None,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    future_daily_reader=bundle_data.equity_daily_bar_reader,
    adjustment_reader=bundle_data.adjustment_reader,
)

fut = bundle_data.asset_finder.retrieve_futures_contracts([0])[0]
end_dt = pd.Timestamp('2018-01-05', tz='UTC', offset='C')
start_dt = pd.Timestamp('2018-01-02', tz='UTC', offset='C')
end_loc = trading_calendar.closes.index.get_loc(end_dt)
start_loc = trading_calendar.closes.index.get_loc(start_dt)
dat = data.get_history_window(
    assets=[fut],
    end_dt=end_dt,
    bar_count=end_loc - start_loc,
    frequency='1d',
    field='close',
    data_frequency='daily'
)
and
|
So @AndreasClenow if you change
to
Your algo will run. |
Well done, Jonathan! I owe you a beer next time. I've still got some issues, but likely with my own code. I get only NaN values now. Well, that's probably my own bug, so I'll find it and kill it. But it's Friday evening on my side of the Pond, so that will have to wait for next week... |
Hey @AndreasClenow, wondering if you have had any success with Continuous Futures. I am not having success, but I wonder if my contract meta data (e.g., expiry, auto_close_date, first_notice_date) are wrong. As a case study, repro https://www.quantopian.com/posts/futures-data-now-available-in-research:
That looks perfect! However...
So it gets stuck on the first contract and never rolls. I can see in a backtest run over the same period with
gives log output
So it does roll, but only on the F contract and not G,H,J,K,M... Do your continuous futures work? I am using Quandl CME Wiki data; it's free and has good coverage. |
Ha, I was in the middle of typing about my own issues, when yours appeared on the screen. I started by trying to get some trades done. As that didn't work, I haven't gotten much farther yet... Did you get an algo to make trades? I have to leave the office in a moment. Will try your example when I return tomorrow. ---- my progress: Clearly, the data is there. I can define a continuation, pull historical data from it, get current contract and pull historical data from that too. Cool stuff. However, when I try to place an order, it seems like we're back to equity treatment. Orders fail with exception raised on lack of table named SPLITS.
|
I hadn't tried to order in an algo; but I just did and got the same error. I figured it out though... it's also in the The default is
data = DataPortal(
    bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=first_trading_day,
    equity_minute_reader=bundle_data.equity_minute_bar_reader,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    adjustment_reader=bundle_data.adjustment_reader,
)
and you want to change it to
data = DataPortal(
    bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=first_trading_day,
    equity_minute_reader=bundle_data.equity_minute_bar_reader,
    future_daily_reader=bundle_data.equity_daily_bar_reader,
)
Note the two changes: adding the Your algo will run and be able to make trades. I suppose if you had ingested both equities and futures data, then the adjustment reader would look for the future in the SPLITS table, find nothing, and move on. But here we have a situation of just futures data. |
Anyhow, to memorialize the issue I am seeing:
clf16_sid = bundle_data.asset_finder.lookup_future_symbol('CLF2016').sid
start_dt = pd.Timestamp('2015-10-21', tz='UTC', offset='C')
cl = continuous_future('CL', offset=0, roll_style='calendar', adjustment='mul')
oc = bundle_data.asset_finder.get_ordered_contracts('CL')
chain = oc.active_chain(clf16_sid, start_dt.value)
all_chain = bundle_data.asset_finder.retrieve_all(chain)
all_chain
and I believe that should return the futures chain ordered by auto_close_date. However, I see
And meta data looks ok...
And the |
Hm.. Removing the adjustment reader doesn't seem like a great solution. I'd then have to either go back and forth and edit the file when running an equity algo, or make two environments. I hope the Q team can find a proper fix for this soon. The roll behavior I got is quite odd. I'm investigating the reasons for it at the moment, and I have yet to rule out that it's somehow my own fault. What happens is that the continuation is always based on March contracts. Rolling one year at a time, to the following year H contract. Puzzled over why H was picked, I loaded up a symbol without a March issue, Soybean Oil. In this case, it picked only F, again rolling from year to year. Seems like it picks the first expiring contract of the year. Despite asking for a volume based roll, it keeps the contract up until the very last day, implying that it does not have any valid volume data. As a side note, rolling on OI generally makes more sense, but that seems like a later issue... I'm trying to replicate your graphs to check the data, but I have not yet been able to figure out how to access the bundle time series data. My attempt so far:
This results in
|
@AndreasClenow, a few things
data_portal = DataPortal(asset_finder=bundle_data.asset_finder,
                         trading_calendar=trading_calendar,
                         first_trading_day=start_dt)
and are missing the daily bar reader. Try
data_portal = DataPortal(
    asset_finder=bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
    future_daily_reader=bundle_data.equity_daily_bar_reader,
)
|
So... if you sort the input data by To get this working, in your ingestion, in return data.sort_values(by='auto_close_date').reset_index(drop=True) It's unclear why this is necessary, but at least everything with futures works now. We've learned a lot through this exercise. I'll put up a full working walk through (get Quandl data, ingest function, algo, nb showing actual and continuous futures) in the next week or so. |
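To illustrate the sort mentioned in the comment above: the sketch below applies the same `sort_values(by='auto_close_date').reset_index(drop=True)` call to a tiny metadata frame. The symbols and dates are made up for illustration. As far as I understand it, the frame's index is what ends up as the sid when the metadata is written, so after the reset the sids follow the auto-close order.

```python
import pandas as pd

# Made-up metadata rows, deliberately out of chronological order.
metadata = pd.DataFrame({
    'symbol': ['CLH16', 'CLF16', 'CLG16'],
    'auto_close_date': pd.to_datetime(
        ['2016-02-18', '2015-12-17', '2016-01-15']
    ),
})

# Sort by auto_close_date, then rebuild a clean 0..n-1 index so the
# index order matches the expiry order.
metadata = metadata.sort_values(by='auto_close_date').reset_index(drop=True)
```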
Thanks, Jonathan. I'm playing catch-up, with pesky things like my day job getting in the way. :) I'll adapt my solution tomorrow. |
After some distractions, I've caught up to where you are, I believe. My solution has the time series data (OHLCV), and I can place simulated orders with order_target and order_target_percent. The continuations, however, seem a little odd. I can resolve continuation contracts for most markets, but not all. The data provided in the bundle/ingest seems fine and has the same structure. But for some root symbols, the 'current contract' is always None. I'll keep investigating to see what's going on there. Could very well be my own fault. I also see some suspect behavior in the roll timing, but I need to solve my first issue before digging deeper into that. At first glance, it looks like it's rolling on expiry, not when volume shifts. I'd really prefer to roll on OI, but that doesn't seem supported yet. |
I've still got a bit of an odd issue left. Perhaps you've seen something similar. After a bit of wrangling, I've got a futures bundle that seems to work well for almost all markets. I added some 20'000 individual contracts for 73 markets (root symbols). Most of this works just fine. But a handful of the root symbols are not working. Let me define not working. I can get a continuous_future object:
I can see what looks like all the correct fields in the sqlite file. If I try to resolve for current contract in the continuation, it returns None:
Out of 73 markets, which had the same structure, were added with the same logic, same bundle, 5 fail in this manner, while the rest work just fine. Interestingly, I can pull the time series data directly from the individual future contract. In this case:
Everything seems fine, except that the continuation for these 5 root symbols fails. Is this an issue you, or anyone else reading this, have seen before? Anything I could check to investigate why this fails? |
Hey @AndreasClenow, In my case I am only looking at about 10 root symbols, via Quandl CME daily data and it seems to all be working for me. My experience with this project tells me that your issue is likely somewhere in the data ingestion or with the meta-data (e.g., how you set |
I just tried calendar method, with same result. My logic for auto_close_date is taken from a sample code. Looking at it, I question if it makes sense though...
For real life use, these settings would clearly not make sense, but they seem to work for tinkering. Autoclose one day after a contract expired makes no intuitive sense to me. I'll muck about with the config and see what I can come up with. Regarding futures, on my Christmas wish list from Q would be 1. multi currency support and 2. roll on open interest support. |
Try setting a more realistic expiry and autoclose date. I parse the month and year from the symbol and set it to the third Friday in that month.
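A minimal sketch of that idea, parsing the month and year out of the symbol and taking the third Friday, using only the standard library. It assumes symbols end in a month code plus a two-digit year (e.g. ESU18) and that all years are 2000 or later; `expiry_from_symbol` is a hypothetical helper name. The third Friday is only an approximation, since real expiry rules differ by contract.

```python
import calendar
import datetime

# Standard futures month codes.
MONTH_CODES = {
    'F': 1, 'G': 2, 'H': 3, 'J': 4, 'K': 5, 'M': 6,
    'N': 7, 'Q': 8, 'U': 9, 'V': 10, 'X': 11, 'Z': 12,
}


def third_friday(year, month):
    """Return the date of the third Friday of the given month."""
    # monthcalendar() yields Monday-first weeks, with 0 for days that
    # fall outside the month.
    fridays = [
        week[calendar.FRIDAY]
        for week in calendar.monthcalendar(year, month)
        if week[calendar.FRIDAY]
    ]
    return datetime.date(year, month, fridays[2])


def expiry_from_symbol(symbol):
    """Split e.g. 'ESU18' into ('ES', date of third Friday of Sep 2018)."""
    root, code, yy = symbol[:-3], symbol[-3], symbol[-2:]
    return root, third_friday(2000 + int(yy), MONTH_CODES[code])
```

The auto-close date can then be set a few days before this expiry rather than one day after the last data point.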
… On Oct 3, 2018, at 11:06 AM, Andreas Clenow ***@***.***> wrote:
I just tried calendar method, with same result.
My logic for auto_close_date is taken from a sample code. Looking at it, I question if it makes sense though...
data['start_date'] = data.date.amin
data['end_date'] = data.date.amax
data['first_traded'] = data['start_date']
data['auto_close_date'] = pd.to_datetime(data['end_date']) + pd.Timedelta(days=1)
data['notice_date'] = pd.to_datetime(data['end_date'])
data['expiration_date'] = pd.to_datetime(data['end_date'])
For real life use, these settings would clearly not make sense, but they seem to work for tinkering. Autoclose one day after a contract expired makes no intuitive sense to me. I'll muck about with the config and see what I can come up with.
Regarding futures, on my Christmas wish list from Q would be 1. multi currency support and 2. roll on open interest support.
|
@marketneutral and @AndreasClenow : thank you so much for sharing this. Heavily relying on your work I was able to put together a bundle downloading all CME data from Quandl. It's there, if anybody wants it: https://github.com/t1user/zipline @AndreasClenow : I had the same problem with some symbols not working in continuous_future even though they seemed to be ingested exactly in the same manner as everything else. I spent many hours trying to figure it out. I also saw your other post about how just changing symbols helped. However unbelievable, indeed the issue is somehow related to some particular symbols. But it's not a bug, it's a feature -:) The reason for this behaviour is here: https://github.com/quantopian/zipline/blob/master/zipline/assets/continuous_futures.pyx#L41 Some symbols were hand-picked to behave differently in continuous_future, i.e. to use only active contracts. There's sound logic behind it. However, for this additional feature to work, years in symbols have to be encoded as two digits. So if you're using a data provider that encodes years as 4 digits, such as for instance Quandl (I'm pretty sure your provider does the same), it doesn't work. To make everything work, just ensure that symbols you ingest are this style: GCM18 rather than this: GCM2018. I guess the actual feature of including only active contracts in continuous_future doesn't really matter if you're using roll_style='volume' . But then, if you did want to rely on it, you would want to make sure that all relevant contracts are affected. In zipline contract codes are hard coded in a cython, compiled file, quite a hassle to change. I find this design choice a bit peculiar. |
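For anyone hitting the same problem, a small sketch of the symbol fix described above: rewriting four-digit-year symbols such as GCM2018 into the two-digit style GCM18 before ingesting. The helper name `to_two_digit_year` and the regular expression are my own assumptions about what vendor symbols look like; adjust them to your data.

```python
import re

# Root symbol, one month code letter, then a four-digit year.
_FOUR_DIGIT_YEAR = re.compile(
    r'^(?P<root>[A-Z0-9]+?)(?P<month>[FGHJKMNQUVXZ])(?P<year>\d{4})$'
)


def to_two_digit_year(symbol):
    """Turn 'GCM2018' into 'GCM18'; leave other symbols untouched."""
    match = _FOUR_DIGIT_YEAR.match(symbol)
    if match is None:
        return symbol
    return match.group('root') + match.group('month') + match.group('year')[2:]
```

Applied to the symbol column before the metadata is written, this keeps the two-digit convention consistent across the bundle.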
Thanks @t1user ! That code explains a lot. This looks like an unfortunate workaround, and I hope they will address the core issue here instead. The assumption of two letter futures root code won't hold up if you start expanding the universe a bit. What really should be done, in my view, is to incorporate open interest. In almost all cases, using open interest to construct continuations will result in a far more realistic series than volume, dates or hard coded solutions. I'm also hoping for international support for futures soon, now that we're getting it for equities. In that case, a two letter code assumption will go out the window fast though. I just updated my solution with this info, and it works great. Your solution for the CME data is very helpful, and with your permission I'd like to publish part or whole of it, with full credit of course. I'm working on a publication on this topic, and your sample would be great to use as a teaching tool. Contact me offline at first name dot last name at gmail if you'd like some more info on what I'm up to. |
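To make the open interest idea concrete, here is a rough sketch of how a roll on open interest could be decided. This is not something Zipline provides, according to this thread; the function name, the input layout (a dict of dates to per-contract open interest) and the no-rollback rule are all assumptions for illustration.

```python
def front_by_open_interest(open_interest, chain):
    """Pick the active contract per day by open interest.

    open_interest: {date: {symbol: open_interest}}, dates sortable.
    chain: contract symbols ordered by expiry.
    Returns {date: symbol}. Once rolled, it never rolls back.
    """
    current = 0
    result = {}
    for day in sorted(open_interest):
        oi = open_interest[day]
        # Only the current contract and later ones are candidates.
        best = max(
            range(current, len(chain)),
            key=lambda i: oi.get(chain[i], 0),
        )
        if oi.get(chain[best], 0) > oi.get(chain[current], 0):
            current = best
        result[day] = chain[current]
    return result


# Made-up numbers: open interest shifts from F to G on the second day.
chain = ['CLF16', 'CLG16', 'CLH16']
sample = {
    '2015-12-10': {'CLF16': 500, 'CLG16': 300},
    '2015-12-11': {'CLF16': 350, 'CLG16': 400},
    '2015-12-14': {'CLF16': 420, 'CLG16': 410},
}
active = front_by_open_interest(sample, chain)
```

On the third day the old contract briefly has more open interest again, but the sketch stays in the new contract to avoid flip-flopping.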
Feel free to use my code any way you like, with or without credit. I'll be happy to try and customize it too if needed. I'll pm you to discuss.
I fully sympathize with your points about open interest, hard coding, single currency only, etc. I wouldn't hold my breath for a proper multicurrency platform given it's not really working on Quantopian yet (international markets included but only in research and still single currency).
This raises a bigger question: can zipline really be a general, independent backtesting and research platform? Quantopian seems to be focused predominantly on long/short equity strategies and there isn't any independent developer community involved (or is there?). Not ideal if you want to trade futures, currencies, options or preferably all of them.
|
Sorry if this has been discussed already. I didn't find anything in the forums.
Does Zipline have a method to access individual symbol time-series data from a bundle, outside of the algo run? If so, how?
I want to make custom trade charts, after a simulation run. Pulling time-series data, and overlaying markers/graphics to show visually how the algo traded a given symbol.
The logical solution seems to be to use zipline.protocol.BarData but I failed to get that working so far, and the documentation isn't helping much.
My current solution is clunky, and involves pulling data directly from the same source that the bundle pulls from. It would be cleaner to use Zipline and pull from the bundle.
Would be great if anyone has input on this.