# Complete Backtest Example

This notebook runs through a complete backtest example, from raw data (external to nautilus) to a parameterised run.

## Getting some raw data

Before we start the notebook, as a one-off we need to download some sample data for backtesting.

For this notebook we will use Forex data from `histdata.com`. Go to https://www.histdata.com/download-free-forex-historical-data/?/ascii/tick-data-quotes/ and select a Forex pair and one or more months of data to download.

Once you have downloaded the data, set the variable `DATA_DIR` below to the directory containing the data. By default it will use the user's `Downloads` directory.

In [1]:
DATA_DIR = "~/Downloads/"

Run the cell below; you should see the files that you downloaded.

In [2]:
import fsspec
fs = fsspec.filesystem('file')
raw_files = fs.glob(f"{DATA_DIR}/HISTDATA*")
assert raw_files, f"Unable to find any histdata files in directory {DATA_DIR}"
raw_files

['/Users/bradleymcelroy/Downloads/HISTDATA_COM_ASCII_AUDUSD_T202001.zip',
 '/Users/bradleymcelroy/Downloads/HISTDATA_COM_ASCII_AUDUSD_T202106',
 '/Users/bradleymcelroy/Downloads/HISTDATA_COM_ASCII_AUDUSD_T202106.zip',
 '/Users/bradleymcelroy/Downloads/HISTDATA_COM_ASCII_EURUSD_T202101',
 '/Users/bradleymcelroy/Downloads/HISTDATA_COM_ASCII_EURUSD_T202101.txt',
 '/Users/bradleymcelroy/Downloads/HISTDATA_COM_ASCII_EURUSD_T202101.zip']

## The Data Catalog

Next we will load this raw data into the data catalog. The data catalog is a central store for nautilus data, persisted in the [Parquet](https://parquet.apache.org) file format.

We have chosen parquet as the storage format for the following reasons:
- It performs much better than CSV/JSON/HDF5/etc in terms of compression (storage size) and read performance
- It does not require any separate running components (for example a database)
- It is quick and simple for someone to get up and running with

## Loading data into the catalog

We can load data from various sources into the data catalog using helper methods in the `nautilus_trader.persistence.external.readers` module. The module contains methods for reading various data formats (csv, json, txt), minimising the amount of code required to get data loaded correctly into the data catalog.

The Forex data from `histdata` is stored in csv/text format, with fields `timestamp, bid_price, ask_price`. To load the data into the catalog, we simply write a function that converts each row into a nautilus object (in this case, a `QuoteTick`). For this example, we will use the `TextReader` helper, which allows reading and applying a parsing function line by line.

Then, we simply instantiate a data catalog (passing in the directory in which to store the data; by default we will just use the current directory) and pass our parsing function, wrapped in the Reader class, to `process_files`. We also need to know which instrument this data is for; in this example, we will simply use one of the nautilus test helpers to create a Forex instrument.

It should only take a couple of minutes to load the data (depending on how many months you downloaded).

In [3]:
import datetime
import pandas as pd

from nautilus_trader.persistence.catalog import DataCatalog
from nautilus_trader.persistence.external.core import process_files, write_objects
from nautilus_trader.persistence.external.readers import TextReader

from nautilus_trader.model.data.tick import QuoteTick
from nautilus_trader.model.objects import Price, Quantity
from nautilus_trader.core.datetime import dt_to_unix_nanos


from tests.test_kit.providers import TestInstrumentProvider

In [4]:
def parser(line):
    ts, bid, ask, idx = line.split(b",")
    dt = pd.Timestamp(datetime.datetime.strptime(ts.decode(), "%Y%m%d %H%M%S%f"), tz='UTC')
    yield QuoteTick(
        instrument_id=AUDUSD.id,
        bid=Price.from_str(bid.decode()),
        ask=Price.from_str(ask.decode()),
        bid_size=Quantity.from_int(100_000),
        ask_size=Quantity.from_int(100_000),
        ts_event=dt_to_unix_nanos(dt),
        ts_init=dt_to_unix_nanos(dt),
    )
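
To see what the timestamp handling in `parser` amounts to for a single row, here is a minimal sketch using only the standard library. The sample line is an assumption, written to match the `%Y%m%d %H%M%S%f` format used above, and `parse_line` is a hypothetical helper; the integer arithmetic stands in for `dt_to_unix_nanos` and is not the nautilus implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_line(line: bytes):
    """Split one raw row into (unix_nanos, bid, ask) using only the stdlib."""
    ts, bid, ask, _ = line.strip().split(b",")
    dt = datetime.strptime(ts.decode(), "%Y%m%d %H%M%S%f").replace(tzinfo=timezone.utc)
    # Integer arithmetic avoids float rounding on large nanosecond values
    unix_nanos = ((dt - EPOCH) // timedelta(microseconds=1)) * 1_000
    return unix_nanos, bid.decode(), ask.decode()


# Assumed sample row: timestamp, bid, ask and a trailing fourth field
sample = b"20200101 170010013,0.701370,0.702130,0\n"
ts_ns, bid, ask = parse_line(sample)
print(ts_ns, bid, ask)
```

Prices are kept as strings here because the real parser passes them to `Price.from_str`, which avoids going through a float.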

We'll set up a catalog in the current working directory.

In [5]:
import os, shutil
CATALOG_PATH = os.getcwd() + "/catalog"

# Clear if it already exists, then create fresh
if os.path.exists(CATALOG_PATH):
    shutil.rmtree(CATALOG_PATH)
os.mkdir(CATALOG_PATH)

In [6]:
AUDUSD = TestInstrumentProvider.default_fx_ccy("AUD/USD")

catalog = DataCatalog(CATALOG_PATH)

process_files(
    glob_path=f"{DATA_DIR}/HISTDATA*202001*",
    reader=TextReader(line_parser=parser),
    catalog=catalog,
)

# Also manually write the AUDUSD instrument to the catalog
write_objects(catalog, [AUDUSD]) 

[########################################] | 100% Completed | 45.9s


## Using the Data Catalog 

Once data has been loaded into the catalog, the `catalog` instance can be used for loading data into the backtest engine, or simply for research purposes. It contains various methods to pull data from the catalog, like `quote_ticks` (shown below).

In [7]:
catalog.instruments()

| | id | base_currency | quote_currency | price_precision | size_precision | price_increment | size_increment | lot_size | max_quantity | min_quantity | ... | min_notional | max_price | min_price | margin_init | margin_maint | maker_fee | taker_fee | info | ts_init | ts_event |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AUD/USD.SIM | AUD | USD | 5 | 0 | 1e-05 | 1 | 1000 | 10000000 | 1000 | ... | 1_000.00 USD | | | 0.03 | 0.03 | 2e-05 | 2e-05 | | 0 | 0 |


In [8]:
start = dt_to_unix_nanos(pd.Timestamp('2020-01-01', tz='UTC'))
end =  dt_to_unix_nanos(pd.Timestamp('2020-01-02', tz='UTC'))

catalog.quote_ticks(start=start, end=end)

| | bid | bid_size | ask | ask_size | ts_event | ts_init | instrument_id |
|---|---|---|---|---|---|---|---|
| 0 | 0.701370 | 100000 | 0.702130 | 100000 | 1577898010013000000 | 1577898010013000000 | AUD/USD.SIM |
| 1 | 0.701370 | 100000 | 0.701880 | 100000 | 1577898015267000000 | 1577898015267000000 | AUD/USD.SIM |
| 2 | 0.701370 | 100000 | 0.701790 | 100000 | 1577898054385000000 | 1577898054385000000 | AUD/USD.SIM |
| 3 | 0.701370 | 100000 | 0.701800 | 100000 | 1577898054665000000 | 1577898054665000000 | AUD/USD.SIM |
| 4 | 0.701370 | 100000 | 0.701780 | 100000 | 1577898060114000000 | 1577898060114000000 | AUD/USD.SIM |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 11101 | 0.700180 | 100000 | 0.700270 | 100000 | 1577923150990000000 | 1577923150990000000 | AUD/USD.SIM |
| 11102 | 0.700180 | 100000 | 0.700270 | 100000 | 1577923170732000000 | 1577923170732000000 | AUD/USD.SIM |
| 11103 | 0.700180 | 100000 | 0.700270 | 100000 | 1577923170884000000 | 1577923170884000000 | AUD/USD.SIM |
| 11104 | 0.700190 | 100000 | 0.700290 | 100000 | 1577923186425000000 | 1577923186425000000 | AUD/USD.SIM |
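
As a small example of the research use mentioned above, the bid/ask spread can be computed from the first row of this output. This sketch copies the two prices by hand rather than reading them from `catalog`, and the pip size of `0.0001` for AUD/USD is an assumption.

```python
from decimal import Decimal

# Values copied from the first row of the quote tick output above
bid = Decimal("0.701370")
ask = Decimal("0.702130")

PIP = Decimal("0.0001")  # assumed pip size for AUD/USD

spread = ask - bid
spread_pips = spread / PIP
print(f"spread={spread} ({spread_pips.normalize()} pips)")
```

`Decimal` is used instead of `float` so that the subtraction is exact at the quoted precision.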


## Configuring backtests

Nautilus has a top-level object `BacktestRunConfig` that allows configuring a backtest in one place. It is a `Partialable` object, which means it can be configured in stages; this reduces boilerplate code when creating multiple backtest runs (for example, when running a grid search over parameters).
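
The idea of configuring in stages can be illustrated with a plain dataclass. This is only a sketch of the pattern, not the nautilus `Partialable` implementation; `StagedConfig`, `with_fields` and `missing` are hypothetical names introduced here.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class StagedConfig:
    """Hypothetical stand-in for a config that is filled in over several steps."""

    venues: Optional[Tuple[str, ...]] = None
    data: Optional[Tuple[str, ...]] = None
    strategies: Optional[Tuple[str, ...]] = None

    def with_fields(self, **kwargs) -> "StagedConfig":
        # Return a new object so the shared base is never mutated
        return replace(self, **kwargs)

    def missing(self):
        # Names of the fields that still have to be supplied
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


base = StagedConfig(venues=("SIM",))
with_data = base.with_fields(data=("quotes",))
runs = [with_data.with_fields(strategies=(name,)) for name in ("fast", "slow")]
print(base.missing(), [r.missing() for r in runs])
```

The shared parts (venue, data) are written once, and each run only supplies what differs.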

### Starting with a Venue

We can start partially configuring the config with just a Venue:

In [9]:
from nautilus_trader.backtest.config import BacktestRunConfig, BacktestVenueConfig, BacktestDataConfig, BacktestEngineConfig
from nautilus_trader.model.currencies import USD

# Create a `base` config object to be shared with all backtests
base = BacktestRunConfig(
    venues=[
        BacktestVenueConfig(
            name="SIM",
            venue_type="ECN",
            oms_type="HEDGING",
            account_type="MARGIN",
            base_currency="USD",
            starting_balances=["1000000 USD"],
        )
    ]
)
base

BacktestRunConfig(name=None, engine=None, venues=[BacktestVenueConfig(name='SIM', venue_type='ECN', oms_type='HEDGING', account_type='MARGIN', base_currency='USD', starting_balances=['1000000 USD'])], data=None, strategies=None)

### Adding Data

Notice many of the fields are `None` - we can continue to configure the backtest via `update`.

The `data_config` arg allows adding multiple data types (`quotes`, `trades`, `generic_data`), but for this example, we will simply load the quote ticks we added earlier.

In [10]:
import os

instrument = catalog.instruments(as_nautilus=True)[0]

data_config=[
    BacktestDataConfig(
        catalog_path=CATALOG_PATH,
        data_type=QuoteTick,
        instrument_id=instrument.id.value,
        start_time=1580398089820000000,
        end_time=1580504394501000000,
    )
]

config = base.update(
    data=data_config,
    engine=BacktestEngineConfig()
)

config

BacktestRunConfig(name=None, engine=BacktestEngineConfig(trader_id='BACKTESTER-000', log_level='INFO', cache=None, cache_database=None, data_engine=None, risk_engine=None, exec_engine=None, use_data_cache=False, bypass_logging=False, run_analysis=True), venues=[BacktestVenueConfig(name='SIM', venue_type='ECN', oms_type='HEDGING', account_type='MARGIN', base_currency='USD', starting_balances=['1000000 USD'])], data=[BacktestDataConfig(catalog_path='/Users/bradleymcelroy/projects/nautilus_trader_gh/examples/backtest/notebooks/catalog', data_type=<class 'nautilus_trader.model.data.tick.QuoteTick'>, catalog_fs_protocol=None, instrument_id='AUD/USD.SIM', start_time=1580398089820000000, end_time=1580504394501000000, filters=None, client_id=None)], strategies=None)
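
The `start_time` and `end_time` values are UNIX timestamps in nanoseconds, which are hard to read at a glance. A minimal stdlib sketch for turning them back into UTC datetimes follows; `nanos_to_datetime` is a hypothetical helper, not part of nautilus.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_datetime(ts_ns: int) -> datetime:
    # datetime has microsecond resolution, so sub-microsecond digits are dropped
    return EPOCH + timedelta(microseconds=ts_ns // 1_000)


start = nanos_to_datetime(1580398089820000000)
end = nanos_to_datetime(1580504394501000000)
print(start.isoformat())
print(end.isoformat())
```

For these two values the window runs from 2020-01-30 15:28:09.820 UTC to 2020-01-31 20:59:54.501 UTC.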

### Finally, add Strategy instances

We can perform a grid search over some parameters by using the `replace` method, which returns a new copy of the config. We use the `ImportableStrategyConfig` object to tell nautilus where the `TradingStrategy` class exists, and add some config.

In [11]:
from nautilus_trader.model.enums import BarAggregation, PriceType, AggregationSource
from nautilus_trader.model.data.bar import BarSpecification, BarType

In [12]:
str(BarType(
    instrument_id=instrument.id,
    bar_spec=BarSpecification(15, BarAggregation.MINUTE, PriceType.BID),
    aggregation_source=AggregationSource.INTERNAL,
))

'AUD/USD.SIM-15-MINUTE-BID-INTERNAL'
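
The string form packs five pieces of information into one value. The sketch below pulls them apart with plain string handling to show the layout; `split_bar_type` is a hypothetical helper and does not use any nautilus parsing API.

```python
def split_bar_type(value: str) -> dict:
    # The instrument id contains "/" and "." but the remaining four parts are
    # joined by "-", so split from the right on the last four separators
    instrument_id, step, aggregation, price_type, source = value.rsplit("-", 4)
    return {
        "instrument_id": instrument_id,
        "step": int(step),
        "aggregation": aggregation,
        "price_type": price_type,
        "aggregation_source": source,
    }


parts = split_bar_type("AUD/USD.SIM-15-MINUTE-BID-INTERNAL")
print(parts)
```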

In [13]:
from decimal import Decimal
from nautilus_trader.trading.config import ImportableStrategyConfig
from nautilus_trader.model.data.bar import BarSpecification
from nautilus_trader.model.enums import BarAggregation, PriceType

from examples.strategies.ema_cross_simple import EMACrossConfig


PARAM_SET = [
    {"fast_ema": 10, "slow_ema": 20},
    {"fast_ema": 20, "slow_ema": 30},
    {"fast_ema": 30, "slow_ema": 40},
]

configs = []
for params in PARAM_SET:
    strategies = [
        ImportableStrategyConfig(
            path="examples.strategies.ema_cross_simple:EMACross",
            config=EMACrossConfig(
                instrument_id=instrument.id.value,
                bar_type='AUD/USD.SIM-15-MINUTE-BID-INTERNAL',
                trade_size=Decimal(1_000_000),
                **params
            ),
        ),
    ]
    # Create the final config
    new = config.replace(strategies=strategies)
    
    configs.append(new)

### This gives us 3 parameter sets to backtest

In [14]:
print("\n\n".join(map(str, configs)))

BacktestRunConfig(name=None, engine=BacktestEngineConfig(trader_id='BACKTESTER-000', log_level='INFO', cache=None, cache_database=None, data_engine=None, risk_engine=None, exec_engine=None, use_data_cache=False, bypass_logging=False, run_analysis=True), venues=[BacktestVenueConfig(name='SIM', venue_type='ECN', oms_type='HEDGING', account_type='MARGIN', base_currency='USD', starting_balances=['1000000 USD'])], data=[BacktestDataConfig(catalog_path='/Users/bradleymcelroy/projects/nautilus_trader_gh/examples/backtest/notebooks/catalog', data_type=<class 'nautilus_trader.model.data.tick.QuoteTick'>, catalog_fs_protocol=None, instrument_id='AUD/USD.SIM', start_time=1580398089820000000, end_time=1580504394501000000, filters=None, client_id=None)], strategies=[ImportableStrategyConfig(path='examples.strategies.ema_cross_simple:EMACross', source=None, config=EMACrossConfig(order_id_tag='000', oms_type='HEDGING', instrument_id='AUD/USD.SIM', bar_type='AUD/USD.SIM-15-MINUTE-BID-INTERNAL', fast_e
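
For a larger search, the parameter list could be generated rather than written by hand. The sketch below is one possible way to do that with `itertools.product`; the candidate values and the `fast < slow` filter are assumptions chosen for illustration.

```python
from itertools import product

# Assumed candidate periods for the two moving averages
fast_values = [10, 20, 30]
slow_values = [20, 30, 40]

param_grid = [
    {"fast_ema": fast, "slow_ema": slow}
    for fast, slow in product(fast_values, slow_values)
    if fast < slow  # keep only pairs where the fast period is shorter
]

print(len(param_grid))
for params in param_grid:
    print(params)
```

Each dictionary has the same shape as the entries of `PARAM_SET`, so it could be unpacked into `EMACrossConfig` in the same way.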

## Run the backtest

Finally, we can create a `BacktestNode` and run the backtest.

In [15]:
from nautilus_trader.backtest.node import BacktestNode
node = BacktestNode()

In [16]:
task = node.build_graph(run_configs=configs)

In [17]:
# Visualising the graph requires graphviz - `%pip install graphviz` in a notebook cell to install it

# task.visualize(rankdir='LR') 

Notice that because our configs share the same data, only one instance of `load` is required.
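
The effect of sharing one `load` step can be pictured with a plain-Python analogy. This is not how dask builds or executes its graph; it only shows that identical load requests can be served by a single call. The `load` and `run` functions and the `"catalog"` path are hypothetical placeholders.

```python
from functools import lru_cache

load_calls = []


@lru_cache(maxsize=None)
def load(catalog_path: str, start: int, end: int):
    # Record every real load so we can count them afterwards
    load_calls.append((catalog_path, start, end))
    return ("ticks", start, end)  # placeholder for the loaded data


def run(params: dict, data) -> dict:
    # Placeholder for a single backtest run
    return {"params": params, "data": data}


param_sets = [
    {"fast_ema": 10, "slow_ema": 20},
    {"fast_ema": 20, "slow_ema": 30},
    {"fast_ema": 30, "slow_ema": 40},
]
data_key = ("catalog", 1580398089820000000, 1580504394501000000)

outputs = [run(params, load(*data_key)) for params in param_sets]
print(len(outputs), "runs,", len(load_calls), "load")
```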

### Start up a local dask cluster to execute the graph

In [18]:
# Create a local dask client - not a requirement, but allows parallelising the runs
from distributed import Client
client = Client(n_workers=2)
client

Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Dashboard: http://127.0.0.1:8787/status
Workers: 2
Total threads: 8
Total memory: 16.00 GiB
Status: running
Using processes: True

Comm: tcp://127.0.0.1:65182
Workers: 2
Dashboard: http://127.0.0.1:8787/status
Total threads: 8
Started: Just now
Total memory: 16.00 GiB

Comm: tcp://127.0.0.1:65188
Total threads: 4
Dashboard: http://127.0.0.1:65190/status
Memory: 8.00 GiB
Nanny: tcp://127.0.0.1:65185
Local directory: /Users/bradleymcelroy/projects/nautilus_trader_gh/examples/backtest/notebooks/dask-worker-space/worker-kw6sd341

Comm: tcp://127.0.0.1:65189
Total threads: 4
Dashboard: http://127.0.0.1:65191/status
Memory: 8.00 GiB
Nanny: tcp://127.0.0.1:65184
Local directory: /Users/bradleymcelroy/projects/nautilus_trader_gh/examples/backtest/notebooks/dask-worker-space/worker-npy1s8ot


### Run the backtests!

In [19]:
results = task.compute()


2021-09-13T21:47:41.045493Z [INF] BACKTESTER-000.BacktestEngine:  NAUTILUS TRADER - Algorithmic Trading Platform
2021-09-13T21:47:41.045499Z [INF] BACKTESTER-000.BacktestEngine:  by Nautech Systems Pty Ltd.
2021-09-13T21:47:41.045510Z [INF] BACKTESTER-000.BacktestEngine:  Copyright (C) 2015-2021. All rights reserved.
2021-09-13T21:47:41.045517Z [INF] BACKTESTER-000.BacktestEngine:
2021-09-13T21:47:41.045520Z [INF] BACKTESTER-000.BacktestEngine:                             .......
2021-09-13T21:47:41.045523Z [INF] BACKTESTER-000.BacktestEngine:                          .............
2021-09-13T21:47:41.045526Z [INF] BACKTESTER-000.BacktestEngine:     .                  ......... .......
2021-09-13T21:47:41.045530Z [INF] BACKTESTER-000.BacktestEngine:

### Compare the results

In [24]:
r = results.results[0][0]

In [31]:
import orjson

In [34]:
orjson.loads(r.account_balances.iloc[-1]['balances'])

[{'type': 'AccountBalance',
  'currency': 'USD',
  'total': '996396.92',
  'locked': '0.00',
  'free': '996396.92'}]
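
To put this final balance in context, it can be compared with the starting balance of `1000000 USD` set in the venue config. The sketch below copies the balance entry from the output above instead of reading it from `r`, so it runs on its own.

```python
from decimal import Decimal

# Starting balance from `starting_balances` in the venue config
starting_balance = Decimal("1000000")

# Final balance entry copied from the output above
final_balances = [
    {
        "type": "AccountBalance",
        "currency": "USD",
        "total": "996396.92",
        "locked": "0.00",
        "free": "996396.92",
    }
]

usd = next(b for b in final_balances if b["currency"] == "USD")
pnl = Decimal(usd["total"]) - starting_balance
return_pct = pnl / starting_balance * 100
print(f"PnL: {pnl} USD ({return_pct.normalize()}%)")
```

For this run the account ends 3603.08 USD below where it started, a return of about -0.36%.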

In [20]:
results

BacktestRunResults(results=[[BacktestResult(id='backtest-50a88f0ab208068b168d6856921367e7', account_balances=                                 account_id account_type base_currency  \
2020-01-30 15:28:14.275000+00:00    SIM-001       MARGIN           USD   
2020-01-30 20:15:00.720000+00:00    SIM-001       MARGIN           USD   
2020-01-30 20:15:00.720000+00:00    SIM-001       MARGIN           USD   
2020-01-30 20:15:00.720000+00:00    SIM-001       MARGIN           USD   
2020-01-30 20:15:00.720000+00:00    SIM-001       MARGIN           USD   
...                                     ...          ...           ...   
2020-01-31 10:30:00.869000+00:00    SIM-001       MARGIN           USD   
2020-01-31 16:59:46.918000+00:00    SIM-001       MARGIN           USD   
2020-01-31 16:59:46.918000+00:00    SIM-001       MARGIN           USD   
2020-01-31 16:59:46.918000+00:00    SIM-001       MARGIN           USD   
2020-01-31 16:59:46.918000+00:00    SIM-001       MARGIN           USD   

  