
Accessing bundle data outside backtest run #2293

Open

AndreasClenow opened this issue Sep 14, 2018 · 31 comments
@AndreasClenow commented Sep 14, 2018

Sorry if this has been discussed already. I didn't find anything in the forums.

Does Zipline have a method to access individual symbol time-series data from a bundle, outside of the algo run? If so, how?

I want to make custom trade charts, after a simulation run. Pulling time-series data, and overlaying markers/graphics to show visually how the algo traded a given symbol.

The logical solution seems to be to use zipline.protocol.BarData but I failed to get that working so far, and the documentation isn't helping much.

My current solution is clunky, and involves pulling data directly from the same source that the bundle pulls from. It would be cleaner to use Zipline and pull from the bundle.

Would be great if anyone has input on this.

@marketneutral commented Sep 17, 2018

Hi @AndreasClenow

I built some convenience functions for daily data (e.g., https://github.com/marketneutral/alphatools/blob/master/notebooks/research-minimal.ipynb). Would that suffice?

Separately, do you ingest futures data properly? If so, can you share the code that builds the security master, etc. and ingests futures price data?

Thanks,
Jonathan

@AndreasClenow commented Sep 17, 2018

Hey Jonathan,

Been a while. Hope all's good.

That looks brilliant. I'll take a closer look in the office tomorrow.

On the futures, I have a futures bundle which is buggy and that I'm not at all happy with. It's next on my to-do list to get a proper futures ingest going. I'd be happy to collaborate on that, as I intend to make all my current Zipline projects public soon.

I'm writing a book on Python based backtesting, and using Zipline as the primary library. Don't tell anyone. :)

Anyhow, for that reason, I'm looking for as simple and straight forward ways of doing things as possible, avoiding the usual type of workarounds. It's not just about getting it done, but rather getting it done in an easily explainable manner.

@marketneutral commented Sep 17, 2018

The reason I ask about the futures ingest is that my package currently only works for equities bundles.

I just added a feature to allow you to set the bundle in research. For futures data, I have pretended they are equities and used the csvdir ingest feature to load the (daily) data, but would prefer to handle futures data properly (perhaps ingesting the Quandl CME data which looks high quality and is free). I've seen a few issues asking about this without a full resolution. So if you'd like to share your current implementation, I can try to help.

@AndreasClenow commented Sep 18, 2018

Apologies in advance for the long post, and also for the topic drift of the thread...

I dusted off my half baked futures bundle today. I started building this a few months back, got stuck on some weird stuff, and decided to get other things finished first. Would be great if we can figure out a solution to this.

I have gotten as far as having the bundle properly ingest data, seemingly storing it all as it should. I can inspect the generated sqlite files, and they appear to contain the expected data. However, I have not yet managed to actually use this in an algo.

Calendar Issue
The enforced calendar logic is driving me up the wall. It caused a nightmare of issues for futures data. The included futures calendars are not helpful. Modifying them seems to have a rather random result, sometimes helping, sometimes seemingly making no difference. If there is any mismatch whatsoever, the ingest blows up.

What I'd love to do is just disable this calendar check completely. My data is good. If there's data for a given day, that market was traded that day. If not, it wasn't. But that doesn't seem to work, so I used a really dumb workaround for now, just to get rid of this very annoying issue.

I use the NYSE calendar. A function fetches all trading days for a stock which I know covers all possible trading days. When I then read the futures data, I run it by those 'verified valid days', using simple Pandas logic to sync it to the required dates.

Yes, that's a total mess, but I didn't figure out a better way yet.

There also seems to be a known issue with dates earlier than 2000; to avoid it for now, I simply slice those off. To be fixed later...

Data Source
This implementation reads individual CSV files from disk. These files come from CSI Data, and contain daily time-series history. I pull the metadata directly from a local MySQL sandbox securities db.

Remaining Issue
As mentioned, the bundle does seem to work. It reads from the files, and stores into Zipline's SQLite files without complaints. But when trying to run an algo with this bundle, I get the following:


TypeError                                 Traceback (most recent call last)
<ipython-input-3-a48d05b9c6fb> in <module>()
     52     capital_base=100000,
     53     data_frequency = 'daily',
---> 54     bundle='ac_futures' ) 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\utils\run_algo.py in run_algorithm(start, end, initialize, capital_base, handle_data, before_trading_start, analyze, data_frequency, data, bundle, bundle_timestamp, trading_calendar, metrics_set, default_extension, extensions, strict_extensions, environ, blotter)
    428         local_namespace=False,
    429         environ=environ,
--> 430         blotter=blotter,
    431     )

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\utils\run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)
    227     ).run(
    228         data,
--> 229         overwrite_sim_params=False,
    230     )
    231 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\algorithm.py in run(self, data, overwrite_sim_params)
    666             self._assets_from_source = \
    667                 self.trading_environment.asset_finder.retrieve_all(
--> 668                     self.trading_environment.asset_finder.sids
    669                 )
    670 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\assets\assets.py in retrieve_all(self, sids, default_none)
    490         update_hits(self.retrieve_equities(type_to_assets.pop('equity', ())))
    491         update_hits(
--> 492             self.retrieve_futures_contracts(type_to_assets.pop('future', ()))
    493         )
    494 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\assets\assets.py in retrieve_futures_contracts(self, sids)
    548             When any requested asset isn't found.
    549         """
--> 550         return self._retrieve_assets(sids, self.futures_contracts, Future)
    551 
    552     @staticmethod

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\assets\assets.py in _retrieve_assets(self, sids, asset_tbl, asset_type)
    675         for row in rows:
    676             sid = row['sid']
--> 677             asset = asset_type(**filter_kwargs(row))
    678             hits[sid] = cache[sid] = asset
    679 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\assets\_assets.pyx in zipline.assets._assets.Future.__init__ (zipline/assets\_assets.c:5110)()

TypeError: __init__() takes at least 2 positional arguments (1 given)

Implementation
This 'implementation' is in development, and the code is not pretty.


from os import listdir
from os.path import isfile, join

import numpy  as np
import pandas as pd
from . import core as bundles

from sqlalchemy import create_engine

from zipline.utils.cli import maybe_show_progress

import sys
from logbook import Logger, StreamHandler
handler = StreamHandler(sys.stdout, format_string=" | {record.message}")
logger = Logger(__name__)
logger.handlers.append(handler)

engine = create_engine('redacted connection string')

def get_meta_df():
    """
    Fetches metadata direct from securities db
    """
    
    meta_query = """
    select CsiSymbol as root_symbol, BigPointValue, csiMultiplier
    from futures_meta """
    
    df_meta = pd.read_sql_query(meta_query, engine, index_col='root_symbol')
    
    return df_meta

def get_valid_dates():
    """
    Stupid workaround for calendar issue. 
    Gets valid days for NYSE calendar.
    """
    query = "select trade_date from equity_history where ticker='CAR' order by trade_date"
    df = pd.read_sql_query(query, engine, index_col='trade_date', parse_dates=['trade_date'])
    return df

def get_valid_futures_data(path, symbol):
    """
    Stupid workaround for calendar issue.
    Gets valid days, according to NYSE calendar.
    """
    file = symbol + '.CSV'
    df = pd.read_csv(join(path, file), header=None, parse_dates=[0], index_col=0)
    df = df[df.index > '2000'] ## potential pandas issue, hardcode in the holiday file. 
    
    if len(df) == 0:
        return df
    
    start = df.index[0]
    end = df.index[-1]
    
    val = valid_dates[start:end]
    df = df.reindex(val[start:end].index)
    df = df.fillna(method='ffill')
    
    return df
    
    
valid_dates = get_valid_dates()    


@bundles.register('ac_futures')
def ac_futures_bundle(environ,
                  asset_db_writer,
                  minute_bar_writer,
                  daily_bar_writer,
                  adjustment_writer,
                  calendar,
                  start_session,
                  end_session,
                  cache,
                  show_progress,
                  output_dir):

    
    path = "E:\\UA\Data\\individual_contracts"
    
    symbols = listdir(path)
    symbols = [s[:-4] for s in symbols if not '$' in s] # filter out cash markets for now, containing character $
    
    metadata = pd.DataFrame(np.empty(len(symbols),

                                 dtype = [('start_date', 'datetime64[ns]'),
                                          ('end_date', 'datetime64[ns]'),
                                          ('auto_close_date', 'datetime64[ns]'),
                                          ('symbol', 'object'),
                                          ('root_symbol','object'),
                                          ('notice_date','datetime64[ns]'),
                                          ('expiration_date','datetime64[ns]'),
                                          ('tick_size','float'),
                                          ('multiplier','float')
                                          ]))
    
    df_meta = get_meta_df()

    daily_bar_writer.write(_pricing_iter(symbols, metadata, show_progress, path, df_meta))

    root_symbols = metadata.root_symbol.unique()
    
    root_symbols = pd.DataFrame(root_symbols, columns = ['root_symbol'])
    root_symbols['root_symbol_id'] = root_symbols.index.values
    
    metadata=metadata.dropna()
    root_symbols = root_symbols.dropna()
    asset_db_writer.write(futures=metadata, root_symbols = root_symbols)

def _pricing_iter(symbols, metadata, show_progress, path, df_meta):

    with maybe_show_progress(
            symbols,
            show_progress,
            label='Reading futures pricing data: ') as it:
        
        for sid, symbol in enumerate(it):
            #logger.debug('%s: sid %s' % (symbol, sid))
            
            df = get_valid_futures_data(path, symbol)
            
            df = df[df.index > '2000'] ## potential pandas issue, hardcode in the holiday file. 
            
            if len(df) == 0:
                # Skip if no valid dates left.
                continue

            open = df[1]
            high = df[2]
            low = df[3]
            close = df[4]
            volume = df[5] 
            
            prices = pd.DataFrame([open, high, low, close, volume],
                                  index = ['open','high','low','close','volume']).T
                                                     
            if len(close.dropna()) > 0:                               
                          
                start_date = prices['close'].first_valid_index()
                end_date = prices['close'].last_valid_index()
                ac_date = end_date + pd.Timedelta(days=1)
                
                root_symbol = df[9][0] 
                if root_symbol is not None:
                    metadata.iloc[sid] = start_date, end_date, ac_date, symbol, root_symbol, end_date, end_date, 0.001, df_meta.loc[root_symbol]['BigPointValue']
                    yield sid, prices

And the extension entry:


from zipline.data.bundles import register, ac_futures
register('ac_futures', ac_futures.ac_futures_bundle, 
         calendar_name='NYSE')

@marketneutral commented Sep 18, 2018

Great. Thanks for sharing. I hope to be able to work with this over the next few weeks.

@AndreasClenow commented Sep 18, 2018

Looking forward to hearing your thoughts.

For those watching at home, I should also share a 'trick' for checking whether your bundle stores things correctly. The ingest process creates a bunch of files and folders under your home/user folder, and those can be inspected with standard software.

Check under ~/.zipline/data/bundle_name/timestamp and you'll find some SQLite files. Those can be opened in DB Browser or similar freeware, and you can check what's in there.

You may even find some interesting features of their database design, which can (probably) be used in various fun ways. There are a few fields in there that I hadn't seen in the docs, which I'm sure I'll have use for.
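
If you prefer to poke at the same files programmatically, here is a minimal sketch using Python's standard sqlite3 module (the bundle name and timestamp directory below are placeholders for whatever your ingest produced):

import sqlite3
from os.path import expanduser

# Illustrative path; substitute your own bundle name and ingest timestamp.
db_path = expanduser('~/.zipline/data/bundle_name/timestamp/assets-6.sqlite')

conn = sqlite3.connect(db_path)
# List the tables the asset writer created (futures_contracts, etc.).
print(conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())
conn.close()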

@marketneutral commented Sep 20, 2018

@AndreasClenow Do you expect that your ingest process creates Future objects?

I am able to ingest daily CME data (where each file is a day, not a symbol), as below. This generates a valid bundle. I can see the sqlite3 data, e.g.,

sqlite> select * from futures_contracts;
0|A6F18|A6||1514764800000000000|1515715200000000000|-9223372036854775808|EXCH|1515801600000000000|1516320000000000000|1515801600000000000|1.0|0.0001
1|A6G18|A6||1514764800000000000|1518739200000000000|-9223372036854775808|EXCH|1518825600000000000|1518739200000000000|1518825600000000000|1.0|0.0001

And I can see the object in an algo:

future_symbol('A6F18') returns Future(0 [A6F18]). So that's pretty cool.

However, there is no pricing data to be found. When I do a ls in ~/.zipline/data/futures/<bundle-time>/ I see

adjustments.sqlite	assets-6.sqlite		daily_equities.bcolz	minute_equities.bcolz

So the daily_bar_writer.write(...) is putting the data in the daily_equities.bcolz dir. I don't see anywhere in the Zipline codebase a BcolzDailyBarWriter that writes futures pricing. Maybe @ehebert could advise?

import datetime
import os
import numpy as np
import pandas as pd
from logbook import Logger
from six import iteritems
from tqdm import tqdm
from trading_calendars import get_calendar

from zipline.assets.futures import CME_CODE_TO_MONTH
from zipline.data.bundles import core as bundles

log = Logger(__name__)  # gen_asset_metadata below logs through this

def csvdir_futures(tframes, csvdir):
    return CSVDIRFutures(tframes, csvdir).ingest


class CSVDIRFutures:
    """
    Wrapper class to call csvdir_bundle with provided
    list of time frames and a path to the csvdir directory
    """

    def __init__(self, tframes, csvdir):
        self.tframes = tframes
        self.csvdir = csvdir

    def ingest(self,
               environ,
               asset_db_writer,
               minute_bar_writer,
               daily_bar_writer,
               adjustment_writer,
               calendar,
               start_session,
               end_session,
               cache,
               show_progress,
               output_dir):

        futures_bundle(environ,
                       asset_db_writer,
                       minute_bar_writer,
                       daily_bar_writer,
                       adjustment_writer,
                       calendar,
                       start_session,
                       end_session,
                       cache,
                       show_progress,
                       output_dir,
                       self.tframes,
                       self.csvdir)



def third_friday(year, month):
    """Return datetime.date for monthly option expiration given year and
    month
    """
    # The 15th is the lowest third day in the month
    third = datetime.date(year, month, 15)
    # What day of the week is the 15th?
    w = third.weekday()
    # Friday is weekday 4
    if w != 4:
        # Replace just the day (of month)
        third = third.replace(day=(15 + (4 - w) % 7))
    return third


def load_data(parent_dir):
    """Given a parent_dir of cross-sectional daily files,
       read in all the days and return a big dataframe.
    """
    
    #list the files
    filelist = os.listdir(parent_dir) 
    #read them into pandas
    df_list = [
        pd.read_csv(os.path.join(parent_dir, file), parse_dates=[1])
        for file
        in tqdm(filelist)
    ]
    #concatenate them together
    big_df = pd.concat(df_list)
    big_df.columns = map(str.lower, big_df.columns)
    big_df.symbol = big_df.symbol.astype('str')
    mask = big_df.symbol.str.len() == 5  # e.g., ESU18; doesn't work prior to year 2000
    return big_df.loc[mask]


def gen_asset_metadata(data, show_progress, exchange='EXCH'):
    if show_progress:
        log.info('Generating asset metadata.')

    data = data.groupby(
        by='symbol'
    ).agg(
        {'date': [np.min, np.max]}
    )
    data.reset_index(inplace=True)
    data['start_date'] = data.date.amin
    data['end_date'] = data.date.amax
    del data['date']
    data.columns = data.columns.get_level_values(0)

    data['exchange'] = exchange
    data['root_symbol'] = data.symbol.str.slice(0,2)

    data['exp_month_letter'] = data.symbol.str.slice(2,3)
    data['exp_month'] = data['exp_month_letter'].map(CME_CODE_TO_MONTH)
    data['exp_year'] = 2000 + data.symbol.str.slice(3,5).astype('int')
    data['expiration_date'] = data.apply(lambda x: third_friday(x.exp_year, x.exp_month), axis=1)
    del data['exp_month_letter']
    del data['exp_month']
    del data['exp_year']
    
    data['auto_close_date'] = data['end_date'].values + pd.Timedelta(days=1)
    data['notice_date'] = data['auto_close_date']

    data['tick_size'] = 0.0001   # Placeholder for now
    data['multiplier'] = 1       # Placeholder for now
    
    return data

def parse_pricing_and_vol(data,
                          sessions,
                          symbol_map):
    for asset_id, symbol in iteritems(symbol_map):
        asset_data = data.xs(
            symbol,
            level=1
        ).reindex(
            sessions.tz_localize(None)
        ).fillna(0.0)
        yield asset_id, asset_data


@bundles.register('futures')
def futures_bundle(environ,
                   asset_db_writer,
                   minute_bar_writer,
                   daily_bar_writer,
                   adjustment_writer,
                   calendar,
                   start_session,
                   end_session,
                   cache,
                   show_progress,
                   output_dir,
                   tframes=None,
                   csvdir=None):

    # import pdb; pdb.set_trace()
    raw_data = load_data('/Users/jonathan/devwork/pricing_data/CME_2018')
    asset_metadata = gen_asset_metadata(raw_data, False)
    root_symbols = asset_metadata.root_symbol.unique()
    root_symbols = pd.DataFrame(root_symbols, columns = ['root_symbol'])
    root_symbols['root_symbol_id'] = root_symbols.index.values
    
    asset_db_writer.write(futures=asset_metadata, root_symbols=root_symbols)

    symbol_map = asset_metadata.symbol
    sessions = calendar.sessions_in_range(start_session, end_session)
    raw_data.set_index(['date', 'symbol'], inplace=True)
    daily_bar_writer.write(
        parse_pricing_and_vol(
            raw_data,
            sessions,
            symbol_map
        ),
        show_progress=show_progress
    )

Btw, I believe your calendar issues are solved in parse_pricing_and_vol above: if there is a calendar session that is not in the data, it yields zero. Solved meaning it won't throw errors on ingest.

@marketneutral commented Sep 21, 2018

I thought it would be funny to try just renaming the directory daily_equities.bcolz to daily_futures.bcolz, but data.current(future_symbol('A6F18'), 'close') throws:

KeyError: "Disk-based ctable opened with `r`ead mode yet `rootdir='/Users/jonathan/.zipline/data/futures/2018-09-20T23;09;04.461188/daily_equities.bcolz'` does not exist"

So perhaps it is right that the futures data is there?
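
One way to check whether the pricing data actually made it in is to open the ctable directly with the bcolz package that Zipline's daily bar writer uses; a sketch (substitute the actual timestamp directory for <bundle-time>):

import bcolz

# Open the daily bar ctable read-only and inspect its length and columns.
ct = bcolz.open('/Users/jonathan/.zipline/data/futures/<bundle-time>/daily_equities.bcolz', mode='r')
print(len(ct), ct.names)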

@AndreasClenow commented Sep 21, 2018

Yes, I would expect Future objects, so that I can use the continuous futures logic and similar functionality. I got a bit further here this morning, reworking the code a bit.

Seems like my weird bug above was related to a missing meta field.

TypeError: __init__() takes at least 2 positional arguments (1 given)

I hadn't bothered providing an exchange code in the metadata, and it seems such omissions are frowned upon. Adding the field got me a bit further.
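
In other words, the futures metadata frame needs an exchange column before it goes to the asset writer, along these lines (the exchange code below is just a placeholder; in my real ingest it comes from the securities db):

    # Placeholder exchange code, for illustration only.
    metadata['exchange'] = 'CME'
    asset_db_writer.write(futures=metadata, root_symbols=root_symbols)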

I can write the bundle, get all the asset fields in the sqlite file seemingly correct. I can, as you demonstrated, fetch a futures object with future_symbol().

Where my script now taps out is in reading historical data. Looking at the folder structure, the data seems to be there, albeit under the /daily_equities.bcolz/ path. I'm not sure if this is a legacy naming scheme or if it's supposed to change with asset class.

My current error:

    # This works as expected
    print(future_symbol('AD2017H')) 
    
    # This gives "KeyError: <class 'zipline.assets._assets.Future'>"
    hist = data.history(future_symbol('AD2017H'), 'close', 10, '1d') 

Updated Bundle

from os import listdir
from os.path import join
import numpy  as np
import pandas as pd
from . import core as bundles
from sqlalchemy import create_engine
import sys
from logbook import Logger, StreamHandler
from tqdm import tqdm
from six import iteritems


handler = StreamHandler(sys.stdout, format_string=" | {record.message}")
logger = Logger(__name__)
logger.handlers.append(handler)

engine = create_engine('mysql+mysqlconnector://---')

def get_meta_df():
    """
    Fetches metadata direct from securities db
    """
    
    meta_query = """
    select CsiSymbol as root_symbol, BigPointValue * csiMultiplier as multiplier,
    name as asset_name, exchange, sector
    from futures_meta """
    
    df_meta = pd.read_sql_query(meta_query, engine)
    
    return df_meta

def csvdir_futures(tframes=None, csvdir=None):
    return CSVDIRFutures(tframes, csvdir).ingest


class CSVDIRFutures:
    """
    Wrapper class to call csvdir_bundle with provided
    list of time frames and a path to the csvdir directory
    """

    def __init__(self, tframes, csvdir):
        self.tframes = tframes
        self.csvdir = csvdir

    def ingest(self,
               environ,
               asset_db_writer,
               minute_bar_writer,
               daily_bar_writer,
               adjustment_writer,
               calendar,
               start_session,
               end_session,
               cache,
               show_progress,
               output_dir):

        futures_bundle(environ,
                       asset_db_writer,
                       minute_bar_writer,
                       daily_bar_writer,
                       adjustment_writer,
                       calendar,
                       start_session,
                       end_session,
                       cache,
                       show_progress,
                       output_dir,
                       self.tframes,
                       self.csvdir)

def load_data(path):
    filelist = [s for s in listdir(path) if not '$' in s]  
    layout = ['date', 'open','high','low','close','volume','openinterest','expiration_date','root_symbol','exchange', 'delivery']
    df_list = [
        pd.read_csv(join(path, file), parse_dates=['date'], header=None, names=layout)
        for file
        in tqdm(filelist)
    ]    
    big_df = pd.concat(df_list)
    big_df.columns = map(str.lower, big_df.columns)
    big_df['symbol'] = big_df['root_symbol'] + big_df['delivery']
    
    return big_df
    

def gen_asset_metadata(raw_data, show_progress):
    if show_progress:
        logger.info('Generating asset metadata.')

    data = raw_data.groupby(
        by='symbol'
    ).agg(
        {'date': [np.min, np.max]}
    )
    data.reset_index(inplace=True)
    data['start_date'] = data.date.amin
    data['end_date'] = data.date.amax
    
    data['first_traded'] = data['start_date']
    del data['date']
    data.columns = data.columns.get_level_values(0)
    meta = get_meta_df()
    
    data['root_symbol'] = [s[:-5] for s in data.symbol.unique() ] 
    data = data.merge(meta, on='root_symbol')
    
    data['auto_close_date'] = data['end_date'] #+ pd.Timedelta(days=1)
    data['notice_date'] = data['auto_close_date']

    data['tick_size'] = 0.0001   # Placeholder for now

    return data
    
def parse_pricing_and_vol(data,
                          sessions,
                          symbol_map):
    for asset_id, symbol in iteritems(symbol_map):
        asset_data = data.xs(
            symbol,
            level=1
        ).reindex(
            sessions.tz_localize(None)
        ).fillna(0.0)
        yield asset_id, asset_data    


@bundles.register('futures')
def futures_bundle(environ,
                   asset_db_writer,
                   minute_bar_writer,
                   daily_bar_writer,
                   adjustment_writer,
                   calendar,
                   start_session,
                   end_session,
                   cache,
                   show_progress,
                   output_dir,
                   tframes=None,
                   csvdir=None):

    #import pdb; pdb.set_trace()
    raw_data = load_data('-----')
    asset_metadata = gen_asset_metadata(raw_data, False)
    root_symbols = asset_metadata.root_symbol.unique()
    root_symbols = pd.DataFrame(root_symbols, columns = ['root_symbol'])
    root_symbols['root_symbol_id'] = root_symbols.index.values
    
    root_symbols['sector'] = [asset_metadata.loc[asset_metadata['root_symbol']==rs]['sector'].iloc[0] for rs in root_symbols.root_symbol.unique() ]
    root_symbols['exchange'] = [asset_metadata.loc[asset_metadata['root_symbol']==rs]['exchange'].iloc[0] for rs in root_symbols.root_symbol.unique() ]
    root_symbols['description'] = [asset_metadata.loc[asset_metadata['root_symbol']==rs]['asset_name'].iloc[0] for rs in root_symbols.root_symbol.unique() ]
    
    
    asset_db_writer.write(futures=asset_metadata, root_symbols=root_symbols)

    symbol_map = asset_metadata.symbol
    sessions = calendar.sessions_in_range(start_session, end_session)
    raw_data.set_index(['date', 'symbol'], inplace=True)
    daily_bar_writer.write(
        parse_pricing_and_vol(
            raw_data,
            sessions,
            symbol_map
        ),
        show_progress=show_progress
    )

@marketneutral commented Sep 21, 2018

Yes, on data.current(future_symbol('A6F18'), 'close') I am getting the same error.

File "/anaconda3/envs/env_zipline/lib/python3.5/site-packages/zipline/data/dispatch_bar_reader.py", line 97, in get_value
    r = self._readers[type(asset)]
KeyError: <class 'zipline.assets._assets.Future'>

When I step through this, type(asset) is Future; so it is not failing to find the specific future, it is failing to find a reader capable of reading futures data. I guess this is obvious from the KeyError, but I was hung up on the idea that the contract was missing in the data. Inspecting self._readers gives a dict with

{<class 'zipline.assets._assets.Equity'>: <zipline.data.resample.ReindexSessionBarReader object at 0x1a1e437a58>}

So, indeed, there is no reader capable of reading futures data.

Can any Q folk shed some light on this? In the meantime, will continue to try.

@AndreasClenow commented Sep 21, 2018

I have taken the liberty of pinging the guys in Boston on this, and they are looking into it. I'll report back when I hear anything.

@marketneutral commented Sep 21, 2018

Some more info: the environment is set up here when Zipline starts up.

Note this:

    data = DataPortal(
        bundle_data.asset_finder,
        trading_calendar=trading_calendar,
        first_trading_day=first_trading_day,
        equity_minute_reader=bundle_data.equity_minute_bar_reader,
        equity_daily_reader=bundle_data.equity_daily_bar_reader,
        adjustment_reader=bundle_data.adjustment_reader,
    )

But the signature for DataPortal is

Init signature: DataPortal(asset_finder, trading_calendar, first_trading_day, equity_daily_reader=None, equity_minute_reader=None, future_daily_reader=None, future_minute_reader=None, adjustment_reader=None, last_available_session=None, last_available_minute=None, minute_history_prefetch_length=1560, daily_history_prefetch_length=40)

Note future_daily_reader=None, so zipline is not being initialized with a reader capable of finding futures data.

@marketneutral commented Sep 21, 2018

Success...

import os
import pandas as pd
from zipline.data import bundles
from zipline.data.data_portal import DataPortal
from zipline.utils.calendars import get_calendar
from zipline.assets._assets import Future
from zipline.utils.run_algo import load_extensions

# Load extensions.py; this allows you access to custom bundles
load_extensions(
    default=True,
    extensions=[],
    strict=True,
    environ=os.environ,
)

# Set-Up Pricing Data Access
trading_calendar = get_calendar('CME')
bundle = 'futures'
bundle_data = bundles.load(bundle)

data = DataPortal(
    bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
    equity_minute_reader=None,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    future_daily_reader=bundle_data.equity_daily_bar_reader,
    adjustment_reader=bundle_data.adjustment_reader,
)

fut = bundle_data.asset_finder.retrieve_futures_contracts([0])[0]

end_dt = pd.Timestamp('2018-01-05', tz='UTC', offset='C')
start_dt = pd.Timestamp('2018-01-02', tz='UTC', offset='C')
    
end_loc = trading_calendar.closes.index.get_loc(end_dt)
start_loc = trading_calendar.closes.index.get_loc(start_dt)    
    
dat = data.get_history_window(
    assets=[fut],
    end_dt=end_dt,
    bar_count=end_loc - start_loc,
    frequency='1d',
    field='close',
    data_frequency='daily'
)

and dat is...

                                Future(0 [A6F18])
2018-01-03 00:00:00+00:00	0.784
2018-01-04 00:00:00+00:00	0.786
2018-01-05 00:00:00+00:00	0.787

@marketneutral commented Sep 21, 2018

So @AndreasClenow if you change .../zipline/utils/run_algo.py here from

 data = DataPortal(
        bundle_data.asset_finder,
        trading_calendar=trading_calendar,
        first_trading_day=first_trading_day,
        equity_minute_reader=bundle_data.equity_minute_bar_reader,
        equity_daily_reader=bundle_data.equity_daily_bar_reader,
        adjustment_reader=bundle_data.adjustment_reader,
    )

to

 data = DataPortal(
        bundle_data.asset_finder,
        trading_calendar=trading_calendar,
        first_trading_day=first_trading_day,
        equity_minute_reader=bundle_data.equity_minute_bar_reader,
        future_daily_reader=bundle_data.equity_daily_bar_reader,
        adjustment_reader=bundle_data.adjustment_reader,
    )

Your algo will run.

@AndreasClenow commented Sep 21, 2018

Well done, Jonathan! I owe you a beer next time.

I've still got some issues, but likely in my own code. I get only NaN values now. Well, that's probably my own bug, so I'll find it and kill it.

But it's Friday evening on my side of the Pond, so that will have to wait until next week...

@marketneutral commented Sep 24, 2018

Hey @AndreasClenow, wondering if you have had any success with Continuous Futures. I am not having success, but I wonder if my contract meta data (e.g., expiry, auto_close_date, first_notice_date) are wrong.

As a case study, repro https://www.quantopian.com/posts/futures-data-now-available-in-research:

continuous_future = bundle_data.asset_finder.create_continuous_future
history = data_portal.get_history_window

cl_contracts = [
    bundle_data.asset_finder.lookup_future_symbol(x) 
    for x
    in ['CLF2016', 'CLG2016', 'CLH2016', 'CLJ2016', 'CLK2016', 'CLM2016']
]

start_dt = pd.Timestamp('2015-10-21', tz='UTC', offset='C')
end_dt = pd.Timestamp('2016-06-01', tz='UTC', offset='C')

end_loc = trading_calendar.closes.index.get_loc(end_dt)
start_loc = trading_calendar.closes.index.get_loc(start_dt)    
    
cl_consecutive_contract_volume = history(
    assets=cl_contracts,
    end_dt=end_dt,
    bar_count=end_loc - start_loc,
    frequency='1d',
    field='volume',
    data_frequency='daily'
)

cl_consecutive_contract_volume.plot();

[image: volume by contract for the six consecutive CL contracts]

That looks perfect!

However...

cl = continuous_future('CL', offset=0, roll_style='volume', adjustment='mul')

cl_continuous_volume = history(
    assets=[cl],
    end_dt=end_dt,
    bar_count=end_loc - start_loc,
    frequency='1d',
    field='volume',
    data_frequency='daily'
)
cl_volume_history = pd.concat([cl_consecutive_contract_volume, cl_continuous_volume], axis=1)
cl_volume_history.plot(style={cl: 'k--'}, figsize=(10,6));

[image: contract volumes with the continuous future overlaid; the continuation tracks CLF2016 and never rolls]

So it gets stuck on the first contract and never rolls. I can see in a backtest run over the same period with

def handle_data(context, data):
    contract = data.current(continuous_future('CL'), 'contract')
    log.info(contract)

gives log output

[2018-09-24 15:52:46.184085]: INFO: handle_data: Future(6489 [CLF2016])
[2018-09-24 15:52:46.188417]: INFO: handle_data: Future(6489 [CLF2016])
[2018-09-24 15:52:46.192604]: INFO: handle_data: Future(6489 [CLF2016])
[2018-09-24 15:52:46.196846]: INFO: handle_data: Future(6489 [CLF2016])
[2018-09-24 15:52:46.201141]: INFO: handle_data: Future(6489 [CLF2016])
    :
    :
    :
    :
[2018-09-24 15:52:46.567311]: INFO: handle_data: Future(6489 [CLF2016])
[2018-09-24 15:52:46.577303]: INFO: handle_data: Future(6490 [CLF2017])
[2018-09-24 15:52:46.585674]: INFO: handle_data: Future(6490 [CLF2017])
[2018-09-24 15:52:46.594387]: INFO: handle_data: Future(6490 [CLF2017])

So it does roll, but only on the F contract and not G,H,J,K,M...

Do your continuous futures work?

I am using Quandl CME Wiki data; it's free and has good coverage.

@AndreasClenow commented Sep 24, 2018

Ha, I was in the middle of typing about my own issues, when yours appeared on the screen.

I started by trying to get some trades done. As that didn't work, I haven't gotten much farther yet... Did you get an algo to make trades?

I have to leave the office in a moment. Will try your example when I return tomorrow.

---- my progress:

Clearly, the data is there. I can define a continuation, pull historical data from it, get the current contract, and pull historical data from that too. Cool stuff.

However, when I try to place an order, it seems like we're back to equity treatment. Orders fail with an exception raised on the lack of a table named SPLITS.

OperationalError                          Traceback (most recent call last)
<ipython-input-2-ddc27af3bbe2> in <module>()
     96     capital_base=1000000,
     97     data_frequency = 'daily',
---> 98     bundle='ac_futures' ) 
     99 
    100 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\utils\run_algo.py in run_algorithm(start, end, initialize, capital_base, handle_data, before_trading_start, analyze, data_frequency, data, bundle, bundle_timestamp, trading_calendar, metrics_set, default_extension, extensions, strict_extensions, environ, blotter)
    429         local_namespace=False,
    430         environ=environ,
--> 431         blotter=blotter,
    432     )

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\utils\run_algo.py in _run(handle_data, initialize, before_trading_start, analyze, algofile, algotext, defines, data_frequency, capital_base, data, bundle, bundle_timestamp, start, end, output, trading_calendar, print_algo, metrics_set, local_namespace, environ, blotter)
    228     ).run(
    229         data,
--> 230         overwrite_sim_params=False,
    231     )
    232 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\algorithm.py in run(self, data, overwrite_sim_params)
    754         try:
    755             perfs = []
--> 756             for perf in self.get_generator():
    757                 perfs.append(perf)
    758 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\gens\tradesimulation.py in transform(self)
    207                         yield capital_change_packet
    208                 elif action == SESSION_START:
--> 209                     for capital_change_packet in once_a_day(dt):
    210                         yield capital_change_packet
    211                 elif action == SESSION_END:

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\gens\tradesimulation.py in once_a_day(midnight_dt, current_data, data_portal)
    169             if assets_we_care_about:
    170                 splits = data_portal.get_splits(assets_we_care_about,
--> 171                                                 midnight_dt)
    172                 if splits:
    173                     algo.blotter.process_splits(splits)

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\data_portal.py in get_splits(self, assets, dt)
   1182         splits = self._adjustment_reader.conn.execute(
   1183             "SELECT sid, ratio FROM SPLITS WHERE effective_date = ?",
-> 1184             (seconds,)).fetchall()
   1185 
   1186         splits = [split for split in splits if split[0] in assets]

OperationalError: no such table: SPLITS

@marketneutral commented Sep 24, 2018

I hadn't tried to order in an algo; but I just did and got the same error.

I figured it out though... it's also in the DataPortal(...) setup in .../zipline/utils/run_algo.py here.

The default is

 data = DataPortal(
        bundle_data.asset_finder,
        trading_calendar=trading_calendar,
        first_trading_day=first_trading_day,
        equity_minute_reader=bundle_data.equity_minute_bar_reader,
        equity_daily_reader=bundle_data.equity_daily_bar_reader,
        adjustment_reader=bundle_data.adjustment_reader,
    )

and you want to change it to

 data = DataPortal(
        bundle_data.asset_finder,
        trading_calendar=trading_calendar,
        first_trading_day=first_trading_day,
        equity_minute_reader=bundle_data.equity_minute_bar_reader,
        future_daily_reader=bundle_data.equity_daily_bar_reader,
    )

Note the two changes: adding the future_daily_reader and killing the adjustment_reader.

Your algo will run and be able to make trades.

I suppose if you had ingested both equities and futures data, then the adjustment reader would look for the future in the SPLITS table, find nothing, and move on. But here we have a situation of just futures data.

@marketneutral commented Sep 24, 2018

Anyhow, to memorialize the issue I am seeing:

clf16_sid = bundle_data.asset_finder.lookup_future_symbol('CLF2016').sid
start_dt = pd.Timestamp('2015-10-21', tz='UTC', offset='C')
cl = continuous_future('CL', offset=0, roll_style='calendar', adjustment='mul')
oc = bundle_data.asset_finder.get_ordered_contracts('CL')
chain = oc.active_chain(clf16_sid, start_dt.value)
all_chain = bundle_data.asset_finder.retrieve_all(chain)
all_chain

and I believe that should return the futures chain ordered by auto_close_date. However, I see

[Future(5044 [CLF2016]),
 Future(5045 [CLF2017]),
 Future(5046 [CLF2018]),
 Future(5047 [CLF2019]),
 Future(5048 [CLF2020]),
 Future(5056 [CLG2007]),
 Future(5057 [CLG2008]),
 Future(5058 [CLG2009]),
 Future(5059 [CLG2010]),
 Future(5060 [CLG2011]),
 Future(5061 [CLG2012]),
 Future(5062 [CLG2013]),
 Future(5063 [CLG2014]),
 Future(5064 [CLG2015]),
 Future(5065 [CLG2016]),
 Future(5066 [CLG2017]),
    :
    :

And the metadata looks OK...

all_chain[0].to_dict()
{'asset_name': '',
 'auto_close_date': Timestamp('2015-12-22 00:00:00+0000', tz='UTC'),
 'end_date': Timestamp('2015-12-21 00:00:00+0000', tz='UTC'),
 'exchange': 'EXCH',
 'exchange_full': 'EXCH',
 'expiration_date': Timestamp('2016-01-15 00:00:00+0000', tz='UTC'),
 'first_traded': None,
 'multiplier': 1.0,
 'notice_date': Timestamp('2015-12-22 00:00:00+0000', tz='UTC'),
 'root_symbol': 'CL',
 'sid': 5044,
 'start_date': Timestamp('2010-11-22 00:00:00+0000', tz='UTC'),
 'symbol': 'CLF2016',
 'tick_size': 0.0001}
all_chain[1].to_dict()
{'asset_name': '',
 'auto_close_date': Timestamp('2016-12-21 00:00:00+0000', tz='UTC'),
 'end_date': Timestamp('2016-12-20 00:00:00+0000', tz='UTC'),
 'exchange': 'EXCH',
 'exchange_full': 'EXCH',
 'expiration_date': Timestamp('2017-01-20 00:00:00+0000', tz='UTC'),
 'first_traded': None,
 'multiplier': 1.0,
 'notice_date': Timestamp('2016-12-21 00:00:00+0000', tz='UTC'),
 'root_symbol': 'CL',
 'sid': 5045,
 'start_date': Timestamp('2011-11-18 00:00:00+0000', tz='UTC'),
 'symbol': 'CLF2017',
 'tick_size': 0.0001}
all_chain[5].to_dict()
{'asset_name': '',
 'auto_close_date': Timestamp('2007-01-23 00:00:00+0000', tz='UTC'),
 'end_date': Timestamp('2007-01-22 00:00:00+0000', tz='UTC'),
 'exchange': 'EXCH',
 'exchange_full': 'EXCH',
 'expiration_date': Timestamp('2007-02-16 00:00:00+0000', tz='UTC'),
 'first_traded': None,
 'multiplier': 1.0,
 'notice_date': Timestamp('2007-01-23 00:00:00+0000', tz='UTC'),
 'root_symbol': 'CL',
 'sid': 5056,
 'start_date': Timestamp('2007-01-03 00:00:00+0000', tz='UTC'),
 'symbol': 'CLG2007',
 'tick_size': 0.0001}

And the auto_close_date of the 5th item is earlier than that of the 2nd item. The ordered list seems to be ordered simply by sid, and this is what the continuous future is keying off.
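
A quick sanity check on that theory, reusing the chain retrieved above:

# If the chain were sorted by auto_close_date, this would print True;
# here it prints False, because the list is in sid order.
acds = [a.auto_close_date for a in all_chain]
print(acds == sorted(acds))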

@AndreasClenow commented Sep 25, 2018

Hmm. Removing the adjustment reader doesn't seem like a great solution. I'd then have to either go back and forth editing the file whenever I run an equity algo, or keep two environments. I hope the Q team can find a proper fix for this soon.

The roll behavior I got is quite odd. I'm investigating the reasons for it at the moment, and I have yet to rule out that it's somehow my own fault. What happens is that the continuation is always based on March contracts, rolling one year at a time to the following year's H contract.

Puzzled over why H was picked, I loaded up a symbol without a March issue, Soybean Oil. In this case, it picked only F, again rolling from year to year. It seems to pick the first expiring contract of the year. Despite asking for a volume-based roll, it keeps the contract up until the very last day, implying that it does not have any valid volume data. As a side note, rolling on OI generally makes more sense, but that seems like a later issue...

I'm trying to replicate your graphs to check the data, but I have not yet been able to figure out how to access the bundle time series data.

My attempt so far:

import pandas as pd
from zipline.data import bundles
from zipline.data.data_portal import DataPortal
from zipline.utils.calendars import get_calendar
from zipline.assets._assets import Future
from zipline.utils.run_algo import load_extensions

bundle_data = bundles.load('ac_futures')

continuous_future = bundle_data.asset_finder.create_continuous_future
trading_calendar = get_calendar('NYSE')

start_dt = pd.Timestamp('2010-01-04', tz='UTC')
end_dt = pd.Timestamp('2011-06-01', tz='UTC')

end_loc = trading_calendar.closes.index.get_loc(end_dt)
start_loc = trading_calendar.closes.index.get_loc(start_dt)

data_portal = DataPortal(asset_finder=bundle_data.asset_finder, 
                         trading_calendar=trading_calendar, 
                         first_trading_day=start_dt)

history = data_portal.get_history_window

contracts = [bundle_data.asset_finder.lookup_future_symbol(x) for x in ['SF_H12','SF_M12','SF_U12']]
    
consecutive_contract_volume = history(
    assets=contracts,
    end_dt=end_dt,
    bar_count=end_loc - start_loc,
    frequency='1d',
    field='volume',
    data_frequency='daily'
)

consecutive_contract_volume.plot();

This results in

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\trading_calendars\utils\memoize.py in __get__(self, instance, owner)
     46         try:
---> 47             return self._cache[instance]
     48         except KeyError:

C:\ProgramData\Anaconda3_new\envs\zip35\lib\weakref.py in __getitem__(self, key)
    393     def __getitem__(self, key):
--> 394         return self.data[ref(key)]
    395 

KeyError: <weakref at 0x0000010BD0978278; to 'AssetDispatchSessionBarReader' at 0x0000010BD28BE0B8>

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\trading_calendars\utils\memoize.py in __get__(self, instance, owner)
     46         try:
---> 47             return self._cache[instance]
     48         except KeyError:

C:\ProgramData\Anaconda3_new\envs\zip35\lib\weakref.py in __getitem__(self, key)
    393     def __getitem__(self, key):
--> 394         return self.data[ref(key)]
    395 

KeyError: <weakref at 0x0000010BD0978278; to 'AssetDispatchSessionBarReader' at 0x0000010BD28BE0B8>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-62-ff628316147f> in <module>()
     32     frequency='1d',
     33     field='volume',
---> 34     data_frequency='daily'
     35 )
     36 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\data_portal.py in get_history_window(self, assets, end_dt, bar_count, frequency, field, data_frequency, ffill)
    965             else:
    966                 df = self._get_history_daily_window(assets, end_dt, bar_count,
--> 967                                                     field, data_frequency)
    968         elif frequency == "1m":
    969             if field == "price":

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\data_portal.py in _get_history_daily_window(self, assets, end_dt, bar_count, field_to_use, data_frequency)
    804 
    805         data = self._get_history_daily_window_data(
--> 806             assets, days_for_window, end_dt, field_to_use, data_frequency
    807         )
    808         return pd.DataFrame(

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\data_portal.py in _get_history_daily_window_data(self, assets, days_for_window, end_dt, field_to_use, data_frequency)
    827                 field_to_use,
    828                 days_for_window,
--> 829                 extra_slot=False
    830             )
    831         else:

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\data_portal.py in _get_daily_window_data(self, assets, field, days_in_window, extra_slot)
   1115                                                 days_in_window,
   1116                                                 field,
-> 1117                                                 extra_slot)
   1118             if extra_slot:
   1119                 return_array[:len(return_array) - 1, :] = data

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\history_loader.py in history(self, assets, dts, field, is_perspective_after)
    547                                              dts,
    548                                              field,
--> 549                                              is_perspective_after)
    550         end_ix = self._calendar.searchsorted(dts[-1])
    551 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\history_loader.py in _ensure_sliding_windows(self, assets, dts, field, is_perspective_after)
    396         asset_windows = {}
    397         needed_assets = []
--> 398         cal = self._calendar
    399 
    400         assets = self._asset_finder.retrieve_all(assets)

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\history_loader.py in _calendar(self)
    564     @property
    565     def _calendar(self):
--> 566         return self._reader.sessions
    567 
    568     def _array(self, dts, assets, field):

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\trading_calendars\utils\memoize.py in __get__(self, instance, owner)
     47             return self._cache[instance]
     48         except KeyError:
---> 49             self._cache[instance] = val = self._get(instance)
     50             return val
     51 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\dispatch_bar_reader.py in sessions(self)
    146     def sessions(self):
    147         return self.trading_calendar.sessions_in_range(
--> 148             self.first_trading_day,
    149             self.last_available_dt)

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\trading_calendars\utils\memoize.py in __get__(self, instance, owner)
     47             return self._cache[instance]
     48         except KeyError:
---> 49             self._cache[instance] = val = self._get(instance)
     50             return val
     51 

C:\ProgramData\Anaconda3_new\envs\zip35\lib\site-packages\zipline\data\dispatch_bar_reader.py in first_trading_day(self)
     90     @lazyval
     91     def first_trading_day(self):
---> 92         return max(r.first_trading_day for r in self._readers.values())
     93 
     94     def get_value(self, sid, dt, field):

ValueError: max() arg is an empty sequence

@marketneutral commented Sep 25, 2018

@AndreasClenow, a few things:

1. Your code is failing in the previous post, I believe, because of the DataPortal setup. You have

data_portal = DataPortal(asset_finder=bundle_data.asset_finder, 
                         trading_calendar=trading_calendar, 
                         first_trading_day=start_dt)

and are missing the daily bar reader. Try

data_portal = DataPortal(
    asset_finder=bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
    future_daily_reader=bundle_data.equity_daily_bar_reader,
)

2. With regard to removing the adjustment_reader, I agree that's not ideal; I was offering that suggestion as a quick fix. I don't think there is anything wrong with Zipline per se that Q would need to fix. To properly handle this situation without having to keep two environments, I would suggest any of the following:

    1. Add empty sqlite3 adjustment tables by adding the following to your ingest (I haven't tested this):

    divs_splits = {'divs': pd.DataFrame(columns=['sid', 'amount',
                                                 'ex_date', 'record_date',
                                                 'declared_date', 'pay_date']),
                   'splits': pd.DataFrame(columns=['sid', 'ratio',
                                                   'effective_date'])}
    adjustment_writer.write(splits=divs_splits['splits'],
                            dividends=divs_splits['divs'])

    You might also need to specify the dtypes. This should just allow the adjustment reader to search for futures adjustments, find nothing, and move on. I think this is your best & easiest option.

    2. Create a new entry point in zipline/zipline/__main__.py, like run_futures, which points to a new _run_futures in zipline/utils/run_algo.py; this new _run_futures would have the DataPortal setup without the adjustment reader. Then you could use run for equities algos and run_futures for futures algos.

    3. Create an ingest that ingests both equities and futures; reserve, say, sids 0-25000 for equities and 25001+ for futures. This is the best scenario because you could run a single algo trading both equities and futures. However, let's not try this until we get the continuous futures working. 😉

3. Lastly, the error you see with continuous futures is exactly the same as mine. My CL futures roll on the F contract, year to year. The F contract is the first contract, so even though there are G, H, J, K... it skips these. This is strange behaviour indeed. Key (to me) is that the rolls are done in sid order, not auto_close_date order. Is that the same for you? Look at the sids and see if your H contracts are in sid order. There is very carefully crafted Cython sorting code here. However, when I look at the test cases, I see something which may be important: the spoofed market data has the sids in the same order as the auto_close_dates. For example, look at this test. Meaning the symbols are ingested in order of their auto_close_date. This means one of four things: 1) I am missing something; 2) I am reading the tests incorrectly; 3) we are required to ingest the futures in order of auto_close_date; and/or 4) there is a bug in OrderedContracts that the tests don't catch because the spoofed market data is already in order. I'll say a 95% chance of the first right now and 0.01% for number 4. I'll keep digging. Please provide any color you can.

@marketneutral commented Sep 25, 2018

So... if you sort the input data by auto_close_date, then the continuous futures work. 😠 It's not 100% mirrored to the Q example because I have some days with NaN or 0 volume, likely due to calendar mismatch. But it's basically there!

[image: continuous CL volume now rolling across contracts, closely matching the Q example]

To get this working, in your ingestion, in gen_asset_metadata(raw_data, show_progress), instead of returning data, put

    return data.sort_values(by='auto_close_date').reset_index(drop=True)

It's unclear why this is necessary, but at least everything with futures works now. We've learned a lot through this exercise. I'll put up a full working walkthrough (getting Quandl data, the ingest function, an algo, and a notebook showing actual and continuous futures) in the next week or so.

@AndreasClenow commented Sep 25, 2018

Thanks, Jonathan. I'm playing catch-up, with pesky things like my day job getting in the way. :)

I'll adapt my solution tomorrow.

@AndreasClenow commented Sep 28, 2018

After some distractions, I've caught up to where you are, I believe.

My solution has the time-series data (OHLCV), and I can place simulated orders with order_target and order_target_percent.
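
For anyone following along, a minimal sketch of that pattern: resolve the continuation to its current underlying contract and order that (the CL root and the target weight are placeholders):

    from zipline.api import continuous_future, order_target_percent

    def initialize(context):
        context.cf = continuous_future('CL', offset=0, roll='volume', adjustment='mul')

    def handle_data(context, data):
        contract = data.current(context.cf, 'contract')  # the actual tradable asset
        if contract is not None:  # the chain may fail to resolve (see below)
            order_target_percent(contract, 0.10)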

The continuations, however, seem a little odd. I can resolve continuation contracts for most markets, but not all. The data provided in the bundle/ingest seems fine and has the same structure, but for some root symbols the 'current contract' is always None.

I'll keep investigating to see what's going on there. Could very well be my own fault.

I also see some suspect behavior in the roll timing, but I need to solve my first issue before digging deeper into that. At first glance, it looks like it's rolling on expiry, not when volume shifts. I'd really prefer to roll on OI, but that doesn't seem supported yet.
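
For reference, the roll styles continuous_future accepts are 'volume' and 'calendar'; open interest is not among them:

    from zipline.api import continuous_future

    # 'volume' rolls when the back contract's volume overtakes the front;
    # 'calendar' rolls on the auto_close_date.
    cf_vol = continuous_future('CL', offset=0, roll='volume', adjustment='mul')
    cf_cal = continuous_future('CL', offset=0, roll='calendar', adjustment='mul')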

@AndreasClenow (Author) commented Oct 2, 2018

I've still got a bit of an odd issue left. Perhaps you've seen something similar.

After a bit of wrangling, I've got a futures bundle that seems to work well. For almost all markets.

I added some 20'000 individual contracts for 73 markets (root symbols). Most of this works just fine, but a handful of the root symbols are not working. Let me define 'not working'.

I can get a continuous_future object:

    c = continuous_future('AD')
    print(c)

    ContinuousFuture(90428234331652096 [AD, 0, volume, mul])

I can see what looks like all the correct fields in the sqlite file.

If I try to resolve the current contract for the continuation, it returns None:

    c = continuous_future('AD')
    curr_contracts = data.current(c, 'contract')
    print(curr_contracts)

    None

Out of 73 markets, all with the same structure, added with the same logic and in the same bundle, 5 fail in this manner while the rest work just fine.

Interestingly, I can pull the time series data directly from the individual future contract. In this case:

    c = future_symbol('AD2010H')
    c_price = data.history(
        c, 
        fields=['open','high','low','close','volume'], 
        frequency='1d', 
        bar_count=5,
    )    
    print(c_price)

                                   close   high    low   open  volume
    2009-11-18 00:00:00+00:00      0.916  0.922  0.915  0.920   358.0
    2009-11-19 00:00:00+00:00      0.908  0.918  0.902  0.917   251.0
    2009-11-20 00:00:00+00:00      0.903  0.910  0.895  0.907   701.0
    2009-11-23 00:00:00+00:00      0.914  0.917  0.903  0.903   726.0
    2009-11-24 00:00:00+00:00      0.909  0.915  0.902  0.914   734.0

Everything seems fine, except that the continuation for these 5 markets fails. Is this an issue you, or anyone else reading this, have seen before? Anything I could check to investigate why this fails?

@marketneutral commented Oct 3, 2018

Hey @AndreasClenow, in my case I am only looking at about 10 root symbols, via Quandl CME daily data, and it all seems to be working for me. My experience with this project tells me that your issue is likely somewhere in the data ingestion or in the metadata (e.g., how you set auto_close_date). That's just a hunch, though. If you change the roll style to calendar, does that help? The other thing I recall is that zipline does funny things (I can't remember a specific example) when you have NaNs anywhere in the OHLCV data.
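
On the NaN point, a cheap sanity check on the raw frame before writing bars can save hours. A sketch, assuming a raw_data DataFrame with symbol and OHLCV columns:

    # Count rows with missing values in any price/volume field, per symbol.
    ohlcv = ['open', 'high', 'low', 'close', 'volume']
    bad = raw_data[raw_data[ohlcv].isna().any(axis=1)]
    print(bad.groupby('symbol').size().sort_values(ascending=False))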

@AndreasClenow (Author) commented Oct 3, 2018

I just tried the calendar method, with the same result.

My logic for auto_close_date is taken from sample code. Looking at it, I question whether it makes sense though...

    # 'amin'/'amax' are the column names produced by an upstream
    # groupby('symbol').agg({'date': [np.min, np.max]}) step
    data['start_date'] = data.date.amin
    data['end_date'] = data.date.amax
    data['first_traded'] = data['start_date']
    data['auto_close_date'] = pd.to_datetime(data['end_date']) + pd.Timedelta(days=1)
    data['notice_date'] = pd.to_datetime(data['end_date'])
    data['expiration_date'] = pd.to_datetime(data['end_date'])

For real-life use, these settings would clearly not make sense, but they seem to work for tinkering. Auto-closing one day after a contract expires makes no intuitive sense to me. I'll muck about with the config and see what I can come up with.
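
For something closer to real life, one hypothetical alternative (the two-day offset is a placeholder, not exchange-accurate) would be to auto-close shortly before expiry instead of after it:

    import pandas as pd

    # Flatten positions a couple of sessions before the contract expires.
    data['expiration_date'] = pd.to_datetime(data['end_date'])
    data['notice_date'] = data['expiration_date']
    data['auto_close_date'] = data['expiration_date'] - pd.Timedelta(days=2)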

Regarding futures, on my Christmas wish list from Q would be: 1. multi-currency support, and 2. roll-on-open-interest support.

@t1user commented Nov 8, 2018

@marketneutral and @AndreasClenow: thank you so much for sharing this. Heavily relying on your work, I was able to put together a bundle that downloads all CME data from Quandl.

It's there, if anybody wants it:

https://github.com/t1user/zipline

@AndreasClenow: I had the same problem with some symbols not working in continuous_future, even though they seemed to be ingested in exactly the same manner as everything else. I spent many hours trying to figure it out. I also saw your other post about how just changing symbols helped. However unbelievable, the issue is indeed somehow related to particular symbols. But it's not a bug, it's a feature :-)

The reason for this behaviour is here:

https://github.com/quantopian/zipline/blob/master/zipline/assets/continuous_futures.pyx#L41

Some symbols were hand-picked to behave differently in continuous_future, i.e. to use only active contracts. There's sound logic behind it. However, for this additional feature to work, years in symbols have to be encoded as two digits. So if you're using a data provider that encodes years as 4 digits, such as Quandl (I'm pretty sure your provider does the same), it doesn't work. To make everything work, just ensure that the symbols you ingest are in this style: GCM18, rather than this: GCM2018 (a normalizer is sketched below).
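
A hypothetical normalizer for that, assuming Quandl-style <root letters><month code><4-digit year> symbols:

    import re

    FOUR_DIGIT = re.compile(r'^(?P<root>[A-Z]+)(?P<month>[FGHJKMNQUVXZ])(?P<year>\d{4})$')

    def normalize_symbol(symbol):
        # Rewrite e.g. 'GCM2018' as 'GCM18'; leave non-matching symbols alone.
        m = FOUR_DIGIT.match(symbol)
        if m is None:
            return symbol
        return m.group('root') + m.group('month') + m.group('year')[-2:]

    # Applied to metadata and bar data before writing, e.g.:
    # data['symbol'] = data['symbol'].map(normalize_symbol)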

I guess the actual feature of including only active contracts in continuous_future doesn't really matter if you're using roll_style='volume'. But if you did want to rely on it, you would want to make sure that all relevant contracts are affected. In zipline, the contract codes are hard-coded in a compiled Cython file, which is quite a hassle to change. I find this design choice a bit peculiar.

@AndreasClenow (Author) commented Nov 9, 2018

Thanks @t1user! That code explains a lot. This looks like an unfortunate workaround, and I hope they will address the core issue here instead. The assumption of a two-letter futures root code won't hold up if you start expanding the universe a bit.

What really should be done, in my view, is to incorporate open interest. In almost all cases, using open interest to construct continuations results in a far more realistic series than volume, dates, or hard-coded rules.
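
To make that concrete, a small pandas sketch of an open-interest roll built outside zipline (the frame layout, one column per contract in expiry order, is an assumption):

    import pandas as pd

    def oi_front_contract(oi: pd.DataFrame) -> pd.Series:
        # Pick the contract with the highest open interest each day,
        # never allowing the chain to roll backwards.
        ranks = {c: i for i, c in enumerate(oi.columns)}
        front = oi.idxmax(axis=1).map(ranks).cummax()
        return front.map(dict(enumerate(oi.columns)))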

I'm also hoping for international support for futures soon, now that we're getting it for equities. In that case, a two-letter code assumption will go out the window fast though.

I just updated my solution with this info, and it works great.

Your solution for the CME data is very helpful, and with your permission I'd like to publish part or all of it, with full credit of course. I'm working on a publication on this topic, and your sample would be great to use as a teaching tool. Contact me offline at first name dot last name at gmail if you'd like some more info on what I'm up to.
