# How to set up the Zipline backtester with custom data from FXCM  

## 1. Install standard packages
You should be able to install zipline with pip. If that doesn't work, there are other ways to install zipline that you can look up.  

```
pip install zipline
```
I also had to upgrade some packages:
```
pip install -U pip
pip install -U numpy
```
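
To confirm what actually got installed, a quick check along these lines may help. It is only a sketch built on the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print one line per package that this guide relies on.
for name in ("zipline", "numpy", "pandas"):
    print(name, installed_version(name) or "not installed")
```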

## 2. Download custom forex calendar  
Zipline has moved its calendars into a standalone package, available at https://github.com/quantopian/trading_calendars, which makes it easier to add custom calendars. You need a custom one because Zipline doesn't support the 24/7 forex calendar by default.  

Clone my fork of `trading_calendars` into any location you like:  
```
git clone https://github.com/grananqvist/trading_calendars.git  
```
This fork has a new calendar called `forex`.  

## 3. Download my zipline utilities  
The necessary utilities are located in `zipline_extensions` in this repo. Download them by cloning the repo into a location of your choice:  
```
git clone https://github.com/grananqvist/Machine-Learning-Trading-Strategies.git
```

## 4. Add packages to PYTHONPATH
For Python to find the repos above, add them to your `$PYTHONPATH`:  
```
export PYTHONPATH=$PYTHONPATH:<path-to-trading_calendars>:<path-to-trading-repo>
```
In my case, my `$PYTHONPATH` looks like this  
`/Users/system/Github/Machine-Learning-Trading-Strategies/:/Users/system/Github/trading_calendars/`  
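
If you would rather not touch your shell profile, much the same effect can be had per session from inside the notebook. A minimal sketch, assuming the two repos were cloned to locations like the example above; `add_repo_to_path` is a hypothetical helper, not part of zipline.

```python
import os
import sys


def add_repo_to_path(repo_dir):
    """Append repo_dir to sys.path if it exists and is not already listed."""
    repo_dir = os.path.abspath(os.path.expanduser(repo_dir))
    if not os.path.isdir(repo_dir):
        return False
    if repo_dir not in sys.path:
        sys.path.append(repo_dir)
    return True


# Example locations only; substitute wherever you cloned the repos.
for repo in ("~/Github/Machine-Learning-Trading-Strategies",
             "~/Github/trading_calendars"):
    print(repo, "added" if add_repo_to_path(repo) else "not found")
```

Unlike `$PYTHONPATH`, this only affects the running kernel, so a `zipline` command started from a shell will not see it.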

Test if packages are installed correctly (you will need to restart the jupyter notebook kernel):  

In [13]:
from zipline.utils.calendars import get_calendar
get_calendar('forex')
from zipline_extensions.bundles.fxcm import via_fxcm_csv_daily

## 5. Download FXCM data  
1. Clone my FXCM data downloader
```
git clone https://github.com/grananqvist/FXCM-Forex-Data-Downloader.git
```
2. Get an [API key](https://www.fxcm.com/uk/algorithmic-trading/api-trading/) from FXCM, works with a demo account  
3. Download data using the download script  
```
python main.py -pe D1 -t <YOUR FXCM API KEY> -p <ANY LOCATION TO STORE DATA>
```
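
Before wiring the files into zipline, it can be worth checking that a downloaded CSV has the columns the ingest code in this notebook reads. The sketch below uses only the standard library; the column list is taken from that ingest code and is assumed to match the downloader's output, and `missing_columns` is a name invented here.

```python
import csv

# Columns read by the ingest code in this notebook (assumed to match
# what the downloader writes).
REQUIRED = {"date",
            "bidopen", "askopen", "bidhigh", "askhigh",
            "bidlow", "asklow", "bidclose", "askclose"}


def missing_columns(csv_path):
    """Return the required columns that are absent from the CSV header."""
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return sorted(REQUIRED - set(header))


# Usage (path is a placeholder):
# print(missing_columns("<ANY LOCATION TO STORE DATA>/EURUSD_D1.csv"))
```

An empty list means every expected column is present.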

## 6. Edit your extension.py
Zipline loads a file called `extension.py` (singular) from `~/.zipline/`; create it if it does not exist.  
Insert the code below into `extension.py` to register `fxcm_daily` as a bundle for zipline.  

**Note:** replace `DATA_PATH` with the path to your downloaded dataset  

In [None]:
import os
from zipline.data.bundles import register
from zipline_extensions.bundles.fxcm import via_fxcm_csv_daily
DATA_PATH = os.path.join(
    os.environ['HOME'], 'Github/Machine-Learning-Trading-Strategies/data/D1/')

register('fxcm_daily', via_fxcm_csv_daily(DATA_PATH), calendar_name='forex')

In [17]:
%load_ext zipline

In [22]:
import os
import numpy  as np
import pandas as pd
import datetime
from zipline.utils.cli import maybe_show_progress

def via_fxcm_csv(path,start=None,end=None):
    boDebug = True

    _, _, file_names = list(os.walk(path))[0]
    symbols = [ file_name.split('_')[0] for file_name in file_names ]
    tuSymbols = tuple(symbols)

    if boDebug:
        print( "entering via_fxcm_csv.  tuSymbols=",tuSymbols)

    # Define our custom ingest function
    def ingest(environ,
               asset_db_writer,
               minute_bar_writer,  # unused
               daily_bar_writer,
               adjustment_writer,
               calendar,
               cache,
               show_progress,
               output_dir,
               # pass these as defaults to make them 'nonlocal' in py2
               start=start,
               end=end):

        if boDebug:
            print( "entering ingest and creating blank dfMetadata")

        dfMetadata = pd.DataFrame(np.empty(len(tuSymbols), dtype=[
            ('start_date', 'datetime64[ns]'),
            ('end_date', 'datetime64[ns]'),
            ('auto_close_date', 'datetime64[ns]'),
            ('symbol', 'object'),
        ]))

        if boDebug:
            print( "dfMetadata",type(dfMetadata))
            print( dfMetadata.describe)

        # We need to feed something that is iterable - like a list or a generator -
        # that is a tuple with an integer for sid and a DataFrame for the data to
        # daily_bar_writer

        liData=[]
        iSid=0
        for file_name in file_names:
            symbol = file_name.split('_')[0]
            if boDebug:
               print( "symbol=",symbol,"file=",file_name)
            # file_name is relative to `path`, so join them before reading
            dfData=pd.read_csv(os.path.join(path, file_name),index_col='date',parse_dates=True).sort_index()
            if boDebug:
               print( "read_csv dfData",type(dfData),"length",len(dfData))
            # Convert bid/ask quotes to mid prices
            dfData['open'] = (dfData['bidopen'] + dfData['askopen']) / 2
            dfData['high'] = (dfData['bidhigh'] + dfData['askhigh']) / 2
            dfData['low'] = (dfData['bidlow'] + dfData['asklow']) / 2
            dfData['close'] = (dfData['bidclose'] + dfData['askclose']) / 2
            # drop() returns a copy and targets rows by default, so name the
            # columns explicitly and keep the result
            dfData = dfData.drop(columns=[
                'bidopen', 'askopen', 'bidhigh', 'askhigh',
                'bidlow', 'asklow', 'bidclose', 'askclose'])
            """
            dfData.rename(
                columns={
                    'Open': 'open',
                    'High': 'high',
                    'Low': 'low',
                    'Close': 'close',
                    'Volume': 'volume',
                    'Adj Close': 'price',
                },
                inplace=True,
            )
            """
            #dfData['volume']=dfData['volume']/1000
            liData.append((iSid,dfData))

            # the start date is the date of the first trade and
            start_date = dfData.index[0]
            if boDebug:
                print( "start_date",type(start_date),start_date)

            # the end date is the date of the last trade
            end_date = dfData.index[-1]
            if boDebug:
                print( "end_date",type(end_date),end_date)

            # The auto_close date is the day after the last trade.
            ac_date = end_date + pd.Timedelta(days=1)
            if boDebug:
                print( "ac_date",type(ac_date),ac_date)

            # Update our meta data
            dfMetadata.iloc[iSid] = start_date, end_date, ac_date, symbol

            iSid += 1

        if boDebug:
            print( "liData",type(liData),"length",len(liData))
            print( liData)
            print( "Now calling daily_bar_writer")

        daily_bar_writer.write(liData, show_progress=False)

        dfMetadata['exchange'] = "FXCM"

        if boDebug:
            print( "returned from daily_bar_writer")
            print( "calling asset_db_writer")
            print( "dfMetadata",type(dfMetadata))
            print( dfMetadata)

        # Not sure why symbol_map is needed
        symbol_map = pd.Series(dfMetadata.symbol.index, dfMetadata.symbol)
        if boDebug:
            print( "symbol_map",type(symbol_map))
            print( symbol_map)

        asset_db_writer.write(equities=dfMetadata)

        if boDebug:
            print( "returned from asset_db_writer")
            print( "calling adjustment_writer")

        adjustment_writer.write()

        if boDebug:
            print( "returned from adjustment_writer")
            print( "now leaving ingest function")

    if boDebug:
       print( "about to return ingest function")
    return ingest
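
The function above takes the ticker from each file name (everything before the first underscore) and picks up every file in the directory, so a stray non-CSV file would be treated as a symbol. The following is a small variation, not the original code, that keeps only `.csv` files; `symbols_from_dir` is a hypothetical helper name.

```python
import os


def symbols_from_dir(path):
    """Map e.g. 'EURUSD_D1.csv' to 'EURUSD' for every CSV directly inside `path`."""
    symbols = {}
    for file_name in sorted(os.listdir(path)):
        if not file_name.endswith(".csv"):
            continue  # skip stray files such as .DS_Store
        # Ticker is the part of the file name before the first underscore.
        symbols[file_name.split("_")[0]] = os.path.join(path, file_name)
    return symbols
```

The returned dict maps each symbol to the full path of its file, which also sidesteps reading a bare file name relative to the wrong directory.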


In [23]:
from zipline.data.bundles import register
DATA_PATH = './data/D1/'

register(
    'fxcm_test',    # name this whatever you like
    via_fxcm_csv(DATA_PATH),
)
!zipline ingest -b fxcm_test

entering via_fxcm_csv.  tuSymbols= ('EURUSD', 'GER30', 'NAS100', 'SPX500', 'US30')
about to return ingest function


  


Error: No bundle registered with the name 'fxcm_test'
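
The error is most likely because `!zipline ingest` starts a separate process: the `register(...)` call above only exists in the notebook kernel's memory, while the command-line tool only knows bundles registered in the extension file from step 6. One way around this is to ingest from the same process. A sketch, assuming `zipline.data.bundles.ingest` behaves as in the zipline versions I have seen; `ingest_in_process` is a made-up wrapper, and whether ingestion then succeeds still depends on the bundle and its calendar.

```python
def ingest_in_process(bundle_name, show_progress=True):
    """Ingest a bundle that was registered in the current Python process."""
    # Imported lazily so this function can be defined without zipline installed.
    from zipline.data import bundles
    bundles.ingest(bundle_name, show_progress=show_progress)


# Usage, after the register(...) call above has run in this kernel:
# ingest_in_process('fxcm_test')
```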


# Installation
1. install zipline
2. 

# Load m1 data 

In [31]:
import pandas as pd
df = pd.read_csv(
    './data/m1/EURUSD_m1.csv',
    index_col='date',
    parse_dates=True,
).sort_index()

In [4]:
def bidask_to_ohlc(df):
    df['open'] = (df['bidopen'] + df['askopen']) / 2
    df['high'] = (df['bidhigh'] + df['askhigh']) / 2
    df['low'] = (df['bidlow'] + df['asklow']) / 2
    df['close'] = (df['bidclose'] + df['askclose']) / 2
    df = df[['open', 'high', 'low', 'close']]
    return df

df = bidask_to_ohlc(df)

NameError: name 'df' is not defined
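
The `NameError` above most likely just means this cell was run before the cell that loads `df`. Separately, note that `bidask_to_ohlc` adds columns to the frame it is given before slicing. If that side effect is unwanted, a non-mutating variant could look like the sketch below; `mid_ohlc` and the sample numbers are invented for illustration.

```python
import pandas as pd


def mid_ohlc(quotes):
    """Return a new open/high/low/close frame of bid/ask midpoints."""
    return pd.DataFrame(
        {field: (quotes["bid" + field] + quotes["ask" + field]) / 2
         for field in ("open", "high", "low", "close")},
        index=quotes.index,
    )


# One made-up bar of bid/ask quotes.
sample = pd.DataFrame({
    "bidopen": [1.0], "askopen": [1.5],
    "bidhigh": [2.0], "askhigh": [2.5],
    "bidlow": [0.5], "asklow": [1.0],
    "bidclose": [1.5], "askclose": [2.0],
})
print(mid_ohlc(sample))
```

The input frame is left untouched, so the raw bid/ask columns remain available afterwards.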

In [33]:
df = df.resample("1min").mean()
df.ffill(inplace=True)  # fillna(method="ffill") is deprecated in recent pandas
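
What this step does: resampling to one-minute bins creates a row for every minute, including minutes with no quotes (which come out as NaN), and the forward fill then carries the last known values into those gaps. A toy illustration with made-up numbers:

```python
import pandas as pd

# Two quotes three minutes apart; 07:41 and 07:42 are missing.
ticks = pd.DataFrame(
    {"close": [1.0, 1.5]},
    index=pd.to_datetime(["2018-08-29 07:40", "2018-08-29 07:43"]),
)

# Resample to a regular 1-minute grid, then carry the last value forward.
filled = ticks.resample("1min").mean().ffill()
print(filled)
```

On the real data this also fills weekends and other closed periods with flat bars, which is worth keeping in mind when the result is aggregated to daily bars.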

In [36]:
print(df.index)
print(df.head())
df.to_csv('./data/m1/EURUSD_m1_cleaned.csv')

DatetimeIndex(['2001-11-28 04:14:00', '2001-11-28 04:15:00',
               '2001-11-28 04:16:00', '2001-11-28 04:17:00',
               '2001-11-28 04:18:00', '2001-11-28 04:19:00',
               '2001-11-28 04:20:00', '2001-11-28 04:21:00',
               '2001-11-28 04:22:00', '2001-11-28 04:23:00',
               ...
               '2018-08-29 07:40:00', '2018-08-29 07:41:00',
               '2018-08-29 07:42:00', '2018-08-29 07:43:00',
               '2018-08-29 07:44:00', '2018-08-29 07:45:00',
               '2018-08-29 07:46:00', '2018-08-29 07:47:00',
               '2018-08-29 07:48:00', '2018-08-29 07:49:00'],
              dtype='datetime64[ns]', name='date', length=8810136, freq='T')
                         open      high       low     close
date                                                       
2001-11-28 04:14:00  0.884895  0.885195  0.884695  0.884695
2001-11-28 04:15:00  0.884695  0.884895  0.884495  0.884495
2001-11-28 04:16:00  0.884695  0.884895  0.884495  0.

In [27]:
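ohlc_dict = {'open':'first', 'high':'max', 'low':'min', 'close': 'last'}
# `how=` and `base=` are gone from resample() in current pandas;
# use .agg() for the aggregation and offset= for the 21:00 bin start
df = df.resample("24h", offset="21h").agg(ohlc_dict).dropna(how='any')
# Shift the 21:00 bin labels forward 3 hours so each bar is stamped at midnight
df.index = df.index + pd.Timedelta(hours=3)
df.head()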
  


                 low      high      open     close
date                                              
2001-11-28  0.882795  0.888595  0.884895  0.888095
2001-11-29  0.885795  0.891895  0.887595  0.887895
2001-11-30  0.884895  0.898295  0.887295  0.896695
2001-12-02  0.896595  0.896595  0.896595  0.896595
2001-12-03  0.889195  0.896895  0.896495  0.891795
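
To see how the daily bars are cut, here is a toy version of the same idea: 24-hour bins that start at 21:00, labelled by their start time and then shifted three hours so the label lands on midnight of the following date. It assumes a pandas version where `resample` accepts `offset=` (the role the older `base=` argument played); the hourly data is synthetic.

```python
import pandas as pd

# 48 synthetic hourly bars starting exactly at a 21:00 boundary.
index = pd.date_range("2001-11-27 21:00", periods=48, freq="h")
hourly = pd.DataFrame({"open": range(48), "high": range(48),
                       "low": range(48), "close": range(48)}, index=index)

ohlc = {"open": "first", "high": "max", "low": "min", "close": "last"}

# Bins run 21:00 -> 21:00; each is labelled with its start time.
daily = hourly.resample("24h", offset="21h").agg(ohlc)

# Move the 21:00 labels forward to midnight of the next calendar date.
daily.index = daily.index + pd.Timedelta(hours=3)
print(daily)
```

Each daily bar therefore covers the 24 hours ending at 21:00 on the date just before its label.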


In [28]:
print(df.dtypes)
print(df.head())
print(df.tail())
print(df.index)

low      float64
high     float64
open     float64
close    float64
dtype: object
                 low      high      open     close
date                                              
2001-11-28  0.882795  0.888595  0.884895  0.888095
2001-11-29  0.885795  0.891895  0.887595  0.887895
2001-11-30  0.884895  0.898295  0.887295  0.896695
2001-12-02  0.896595  0.896595  0.896595  0.896595
2001-12-03  0.889195  0.896895  0.896495  0.891795
                 low      high      open     close
date                                              
2018-08-23  1.152980  1.159975  1.159680  1.153920
2018-08-24  1.153485  1.163990  1.153920  1.162155
2018-08-27  1.159455  1.169370  1.162435  1.167755
2018-08-28  1.166250  1.173355  1.167755  1.169430
2018-08-29  1.166685  1.169760  1.169440  1.167125
DatetimeIndex(['2001-11-28', '2001-11-29', '2001-11-30', '2001-12-02',
               '2001-12-03', '2001-12-04', '2001-12-05', '2001-12-06',
               '2001-12-07', '2001-12-09',
               ...


In [29]:
df.to_csv('./data/D1/EURUSD_D1.csv')

In [2]:
%load_ext zipline
import pandas as pd
from trading_calendars.calendar_forex import ForexCalendar

In [11]:
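df_daily = pd.read_csv(
    '../FXCM-Data-Downloader/AUDCAD_D1.csv',
    index_col='date',
    parse_dates=True,
).sort_index()
df_daily = df_daily.resample("1D").mean()
df_daily = bidask_to_ohlc(df_daily)
# Assign the result instead of filling in place: bidask_to_ohlc returns a
# slice, and fillna(method="ffill") is deprecated in recent pandas
df_daily = df_daily.ffill()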
calendar = ForexCalendar()
df_daily = calendar.filter_dates(df_daily)
#print(df_daily.loc[df.index])
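
`filter_dates` comes from the forked calendar and is not reproduced here. Judging only from the trading dates printed further down in this notebook, Saturdays are absent while Sundays are kept, so a rough stand-in for experimenting without the fork might be the sketch below. It is an approximation under that assumption (the real calendar may exclude other days too), and `drop_saturdays` is a hypothetical name.

```python
import pandas as pd


def drop_saturdays(frame):
    """Return only the rows whose index date is not a Saturday."""
    return frame[frame.index.dayofweek != 5]  # Monday=0 ... Saturday=5


# One synthetic week, Monday 2018-05-07 through Sunday 2018-05-13.
days = pd.DataFrame({"close": range(7)},
                    index=pd.date_range("2018-05-07", periods=7, freq="D"))
print(drop_saturdays(days).index.strftime("%a %Y-%m-%d").tolist())
```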

In [12]:
print(df_daily.tail(30))

                open      high       low     close
2018-07-29  0.966130  0.966200  0.965395  0.965985
2018-07-30  0.965995  0.967465  0.962630  0.965615
2018-07-31  0.965615  0.970865  0.964805  0.965865
2018-08-01  0.965865  0.966840  0.961035  0.962860
2018-08-02  0.962860  0.963715  0.958330  0.958485
2018-08-03  0.958485  0.962580  0.957870  0.961330
2018-08-05  0.961690  0.961705  0.961210  0.961360
2018-08-06  0.961360  0.963120  0.960405  0.960555
2018-08-07  0.960545  0.970160  0.959945  0.968640
2018-08-08  0.968645  0.970690  0.966425  0.967765
2018-08-09  0.967765  0.969380  0.961775  0.962215
2018-08-10  0.962215  0.962750  0.952810  0.959480
2018-08-12  0.957270  0.959715  0.954420  0.957810
2018-08-13  0.957800  0.959165  0.953770  0.954965
2018-08-14  0.954965  0.955400  0.945030  0.945670
2018-08-15  0.945665  0.952490  0.941795  0.951365
2018-08-16  0.951375  0.957050  0.948955  0.955205
2018-08-17  0.955200  0.957395  0.950810  0.955440
2018-08-19  0.954790  0.955915 

In [46]:
print(trading_dates[-100:])

DatetimeIndex(['2018-05-08', '2018-05-09', '2018-05-10', '2018-05-11',
               '2018-05-13', '2018-05-14', '2018-05-15', '2018-05-16',
               '2018-05-17', '2018-05-18', '2018-05-20', '2018-05-21',
               '2018-05-22', '2018-05-23', '2018-05-24', '2018-05-25',
               '2018-05-27', '2018-05-28', '2018-05-29', '2018-05-30',
               '2018-05-31', '2018-06-01', '2018-06-03', '2018-06-04',
               '2018-06-05', '2018-06-06', '2018-06-07', '2018-06-08',
               '2018-06-10', '2018-06-11', '2018-06-12', '2018-06-13',
               '2018-06-14', '2018-06-15', '2018-06-17', '2018-06-18',
               '2018-06-19', '2018-06-20', '2018-06-21', '2018-06-22',
               '2018-06-24', '2018-06-25', '2018-06-26', '2018-06-27',
               '2018-06-28', '2018-06-29', '2018-07-01', '2018-07-02',
               '2018-07-03', '2018-07-04', '2018-07-05', '2018-07-06',
               '2018-07-08', '2018-07-09', '2018-07-10', '2018-07-11',
      