# Backtesting with zipline - Pipeline API with Custom Data

The key steps you need to take to define and backtest a trading strategy and evaluate the results are as follows:
1. Call the Zipline function `load_extensions()` to identify the bundle location.
2. Optional but helpful: set up logging.
3. Define the number of long and short positions to take each day.
4. Load the bundle data.
5. Load your model predictions (both in-sample and out-of-sample) generated for the last milestone and replace the tickers with the bundle's `sid` values so that Zipline can align predictions and price data.
6. Subclass `zipline.pipeline.data.DataSet` to create a custom `zipline.pipeline.data.Column` of type `float` for the domain `US_EQUITIES`.
7. Define the custom `zipline.pipeline.loaders.frame.DataFrameLoader` that will populate the `DataSet` we just created using the model predictions.
8. Incorporate the predictions into the `zipline.pipeline.Pipeline` by creating a `zipline.pipeline.CustomFactor` that simply passes the model predictions along.
9. Create the actual `zipline.pipeline.Pipeline` that receives the model predictions via the `CustomFactor` and selects the target long and short positions based on them.
10. Define several core components of the algorithm:
    - an `initialize()` function that sets algorithm parameters such as the universe and the number of positions, defines commission and slippage, and uses `schedule_function()` to determine when to rebalance the portfolio (i.e., make trades) or to record variables.
    - a `before_trading_start()` function that retrieves the current Pipeline values.
    - a `rebalance()` function that manages the transition from the current portfolio positions to the target holdings implied by the model predictions.
    - an (optional) `record_vars()` function that stores certain values, such as the actual number of short or long positions.
11. Call the `run_algorithm()` function to execute the backtest for the target time period, using the date of the first in-sample prediction as the start date and the date of the last out-of-sample prediction as the end date. Pass the `custom_loader` defined above to the parameter of the same name and otherwise use default values.
12. Use the `pyfolio.utils` function `extract_rets_pos_txn_from_zipline` to generate pyfolio inputs from the return values of the `run_algorithm()` function.
13. Create a pyfolio tearsheet using these inputs with the date of the first out-of-sample prediction as `live_start_date`.
14. Next steps: consider variations in trading frequency, number of positions, or transaction costs. How do they affect the outcome?

## Imports & Settings

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
%matplotlib inline

from pathlib import Path
from collections import defaultdict
from time import time

import numpy as np
import pandas as pd
import pandas_datareader.data as web

from logbook import Logger, StderrHandler, INFO, WARNING

from zipline import run_algorithm
from zipline.api import (attach_pipeline, pipeline_output,
                         date_rules, time_rules, record,
                         schedule_function, commission, slippage,
                         set_slippage, set_commission, set_max_leverage,
                         order_target, order_target_percent,
                         get_open_orders, cancel_order)
from zipline.data import bundles
from zipline.utils.run_algo import load_extensions
from zipline.pipeline import Pipeline, CustomFactor
from zipline.pipeline.data import Column, DataSet
from zipline.pipeline.domain import US_EQUITIES
from zipline.pipeline.filters import StaticAssets
from zipline.pipeline.loaders import USEquityPricingLoader
from zipline.pipeline.loaders.frame import DataFrameLoader
from trading_calendars import get_calendar

import pyfolio as pf
from pyfolio.plotting import plot_rolling_returns, plot_rolling_sharpe
from pyfolio.timeseries import forecast_cone_bootstrap

import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
DATA_PATH = Path('..', 'data')

### Load zipline extensions

This call is only needed in a notebook so that Zipline can find the bundle.

In [5]:
load_extensions(default=True,
                extensions=[],
                strict=True,
                environ=None)

In [6]:
log_handler = StderrHandler(format_string='[{record.time:%Y-%m-%d %H:%M:%S.%f}]: ' +
                            '{record.level_name}: {record.func_name}: {record.message}',
                            level=WARNING)
log_handler.push_application()
log = Logger('Algorithm')

## Algo Params

## Load Data

### Quandl Wiki Bundle

### ML Predictions
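
Step 5 of the overview replaces tickers with the bundle's `sid` values so that predictions line up with price data. The following is a minimal plain-Python sketch of that idea, not the notebook's own code. The lookup table, the sample predictions, and the helper `to_sid_table` are hypothetical, and dropping tickers that are missing from the bundle is an assumption.

```python
# Hypothetical ticker -> sid lookup; in practice the sids would come
# from the loaded bundle's asset metadata.
ticker_to_sid = {'AAPL': 8, 'MSFT': 1900}

# Hypothetical model predictions keyed by (date, ticker).
predictions = {
    ('2017-01-03', 'AAPL'): 0.012,
    ('2017-01-03', 'MSFT'): -0.004,
    ('2017-01-03', 'ZZZZ'): 0.020,   # ticker not present in the bundle
    ('2017-01-04', 'AAPL'): 0.003,
}


def to_sid_table(predictions, ticker_to_sid):
    """Regroup predictions as {date: {sid: prediction}}.

    Assumption: tickers without a sid in the bundle are dropped.
    """
    table = {}
    for (date, ticker), value in predictions.items():
        sid = ticker_to_sid.get(ticker)
        if sid is None:
            continue  # no matching asset in the bundle
        table.setdefault(date, {})[sid] = value
    return table


sid_predictions = to_sid_table(predictions, ticker_to_sid)
```

The result has one row per date and one entry per `sid`, which is the date-by-asset layout a `DataFrameLoader` expects once it is converted to a DataFrame.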

## Pipeline Setup
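
Step 9 of the overview selects the target long and short positions from the model predictions. The selection logic can be sketched in plain Python, independent of Zipline. The helper `select_positions` and the sample values are hypothetical, and the rule that only positive predictions qualify as longs and only negative ones as shorts is an assumption.

```python
def select_positions(predictions, n_longs, n_shorts):
    """Pick long and short candidates from one day's predictions.

    predictions: dict mapping sid -> predicted return.
    Assumption: longs need a positive prediction, shorts a negative one.
    """
    # Rank sids from highest to lowest prediction.
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    longs = [sid for sid in ranked if predictions[sid] > 0][:n_longs]
    shorts = [sid for sid in reversed(ranked) if predictions[sid] < 0][:n_shorts]
    return longs, shorts


# Hypothetical predictions for a single trading day.
day_predictions = {1: 0.03, 2: 0.01, 3: -0.02, 4: -0.05, 5: 0.0}
longs, shorts = select_positions(day_predictions, n_longs=1, n_shorts=2)
```

In a Pipeline, the same selection would be expressed with factor methods and filters rather than Python lists.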

## Initialize Algorithm

## Define Rebalancing Logic
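
The `rebalance()` step described in the overview moves the portfolio from its current positions to the target holdings. One way to sketch the transition is to compute a target weight per asset. The helper `target_weights` is hypothetical, and equal weighting across all long and short targets is an assumption.

```python
def target_weights(held, longs, shorts):
    """Return {sid: target portfolio weight} for one rebalance.

    held: sids currently in the portfolio.
    Assumption: equal absolute weight for every long and short target;
    held assets that are no longer targets get weight 0 (exit).
    """
    weights = {sid: 0.0 for sid in held}   # default: close the position
    n_targets = len(longs) + len(shorts)
    if n_targets == 0:
        return weights
    for sid in longs:
        weights[sid] = 1.0 / n_targets
    for sid in shorts:
        weights[sid] = -1.0 / n_targets
    return weights


# Hypothetical state: sids 1 and 7 are held; new targets are given below.
weights = target_weights(held={1, 7}, longs=[1, 2], shorts=[3, 4])
```

Inside the algorithm, each `(sid, weight)` pair could then be passed to `order_target_percent()`, which is imported from `zipline.api` above.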

## Run Algorithm

## PyFolio Analysis