In [1]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume

## Custom Factors
When we first looked at factors, we explored the set of built-in factors. Frequently, a desired computation isn't included as a built-in factor. One of the most powerful features of the Pipeline API is that it allows us to define our own custom factors. When a desired computation doesn't exist as a built-in, we define a custom factor.

Conceptually, a custom factor is identical to a built-in factor. It accepts `inputs`, `window_length`, and `mask` as constructor arguments, and returns a `Factor` object each day.

Let's take an example of a computation that doesn't exist as a built-in: standard deviation. To create a factor that computes the [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) over a trailing window, we can subclass `quantopian.pipeline.CustomFactor` and implement a compute method whose signature is:


```
def compute(self, today, asset_ids, out, *inputs):
    ...
```

- `*inputs` are M x N [numpy arrays](http://docs.scipy.org/doc/numpy-1.10.1/reference/arrays.ndarray.html), where M is the `window_length` and N is the number of securities (usually around ~8000 unless a `mask` is provided). `*inputs` are trailing data windows. Note that there will be one M x N array for each `BoundColumn` provided in the factor's `inputs` list. The data type of each array will be the `dtype` of the corresponding `BoundColumn`.
- `out` is an empty array of length N. `out` will be the output of our custom factor each day. The job of compute is to write output values into out.
- `asset_ids` will be an integer [array](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.array.html) of length N containing security ids corresponding to the columns in our `*inputs` arrays.
- `today` will be a [pandas Timestamp](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#converting-to-timestamps) representing the day for which `compute` is being called.

Of these, `*inputs` and `out` are most commonly used.

An instance of `CustomFactor` that’s been added to a pipeline will have its compute method called every day. For example, let's define a custom factor that computes the standard deviation of the close price over the last 30 days. To start, let's add `CustomFactor` and `numpy` to our import statements.

In [2]:
from quantopian.pipeline import CustomFactor
import numpy

Next, let's define our custom factor to calculate the standard deviation over a trailing window using [numpy.nanstd](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.nanstd.html):

In [6]:
class StdDev(CustomFactor):
    def compute(self, today, asset_ids, out, values):
        # Calculates the column-wise standard deviation, ignoring NaNs
        out[:] = numpy.nanstd(values, axis=0)

Finally, let's instantiate our factor in `make_pipeline()`:

In [4]:
def make_pipeline():
    std_dev = StdDev(inputs=[USEquityPricing.close], window_length=30)

    return Pipeline(
        columns={'std dev': std_dev})

When this pipeline is run, `StdDev.compute()` will be called every day with data as follows:

- `values`: An M x N [numpy array](http://docs.scipy.org/doc/numpy-1.10.1/reference/arrays.ndarray.html), where M is 5 (`window_length`), and N is ~8000 (the number of securities in our database on the day in question).
- `out`: An empty array of length N (~8000). In this example, the job of `compute` is to populate `out` with an array storing of 5-day close price standard deviations.

In [7]:
result = run_pipeline(make_pipeline(), '2017-12-27', '2017-12-27')
result.head(20)





Unnamed: 0,Unnamed: 1,std dev
2017-12-27 00:00:00+00:00,Equity(2 [HWM]),1.014845
2017-12-27 00:00:00+00:00,Equity(21 [AAME]),0.18574
2017-12-27 00:00:00+00:00,Equity(24 [AAPL]),2.170619
2017-12-27 00:00:00+00:00,Equity(25 [HWM_PR]),0.959581
2017-12-27 00:00:00+00:00,Equity(31 [ABAX]),1.279765
2017-12-27 00:00:00+00:00,Equity(41 [ARCB]),1.939977
2017-12-27 00:00:00+00:00,Equity(52 [ABM]),2.075785
2017-12-27 00:00:00+00:00,Equity(53 [ABMD]),3.484061
2017-12-27 00:00:00+00:00,Equity(62 [ABT]),0.776153
2017-12-27 00:00:00+00:00,Equity(64 [GOLD]),0.260351


### Default Inputs
When writing a custom factor, we can set default `inputs` and `window_length` in our `CustomFactor` subclass. For example, let's define the `ThirtyDayMeanDifference` custom factor to compute the mean difference between two data columns over a trailing window using [numpy.nanmean](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.nanmean.html). Let's set the default `inputs` to `[USEquityPricing.close, USEquityPricing.open]` and the default `window_length` to 30:

In [10]:
class ThirtyDayMeanDifference(CustomFactor):
    # Default inputs.
    inputs = [USEquityPricing.close, USEquityPricing.open]
    window_length = 30
    def compute(self, today, asset_ids, out, close, open):
        # Calculates the column-wise mean difference, ignoring NaNs
        out[:] = numpy.nanmean(close - open, axis=0)

<i>Remember in this case that `close` and `open` are each 30 x ~8000 2D [numpy arrays.](http://docs.scipy.org/doc/numpy-1.10.1/reference/arrays.ndarray.html)</i>

If we call `ThirtyDayMeanDifference` without providing any arguments, it will use the defaults.

In [13]:
def make_pipeline():
    # Computes the 30-day mean difference between the daily open and close prices.
    close_open_diff = ThirtyDayMeanDifference()
    
    return Pipeline(columns = {'close-open diff': close_open_diff})

run_pipeline(make_pipeline(), '2017-11-27', '2017-12-27').head(10)



Unnamed: 0,Unnamed: 1,close-open diff
2017-11-27 00:00:00+00:00,Equity(2 [HWM]),-0.112771
2017-11-27 00:00:00+00:00,Equity(21 [AAME]),-0.022538
2017-11-27 00:00:00+00:00,Equity(24 [AAPL]),0.319677
2017-11-27 00:00:00+00:00,Equity(25 [HWM_PR]),0.335714
2017-11-27 00:00:00+00:00,Equity(31 [ABAX]),0.050667
2017-11-27 00:00:00+00:00,Equity(41 [ARCB]),0.088408
2017-11-27 00:00:00+00:00,Equity(52 [ABM]),-0.036667
2017-11-27 00:00:00+00:00,Equity(53 [ABMD]),0.4772
2017-11-27 00:00:00+00:00,Equity(62 [ABT]),0.0885
2017-11-27 00:00:00+00:00,Equity(64 [GOLD]),-0.084867


The defaults can be manually overridden by specifying arguments in the constructor call.

In [15]:
def make_pipeline():
    # Computes the 30-day mean difference between the daily high and low prices.
    high_low_diff = ThirtyDayMeanDifference(inputs=[USEquityPricing.high, USEquityPricing.low])
    
    return Pipeline(columns = {'high-low diff': high_low_diff})

run_pipeline(make_pipeline(), '2017-11-27', '2017-12-27').head(10)



Unnamed: 0,Unnamed: 1,high-low diff
2017-11-27 00:00:00+00:00,Equity(2 [HWM]),0.6408
2017-11-27 00:00:00+00:00,Equity(21 [AAME]),0.144654
2017-11-27 00:00:00+00:00,Equity(24 [AAPL]),2.177083
2017-11-27 00:00:00+00:00,Equity(25 [HWM_PR]),0.524286
2017-11-27 00:00:00+00:00,Equity(31 [ABAX]),1.313933
2017-11-27 00:00:00+00:00,Equity(41 [ARCB]),1.228563
2017-11-27 00:00:00+00:00,Equity(52 [ABM]),0.537233
2017-11-27 00:00:00+00:00,Equity(53 [ABMD]),4.121467
2017-11-27 00:00:00+00:00,Equity(62 [ABT]),0.702433
2017-11-27 00:00:00+00:00,Equity(64 [GOLD]),0.277267


### Further Example
Let's take another example where we build a [momentum](http://www.investopedia.com/terms/m/momentum.asp) custom factor and use it to create a filter. We will then use that filter as a `screen` for our pipeline.

Let's start by defining a `Momentum` factor to be the division of the most recent close price by the close price from `n` days ago where `n` is the `window_length`.

In [16]:
class Momentum(CustomFactor):
    # Default inputs
    inputs = [USEquityPricing.close]

    # Compute momentum
    def compute(self, today, assets, out, close):
        out[:] = close[-1] / close[0]

Now, let's instantiate our `Momentum` factor (twice) to create a 30-day momentum factor and a 60-day momentum factor. Let's also create a `positive_momentum` filter returning `True` for securities with both a positive 30-day momentum and a positive 60-day momentum.

In [17]:
thirty_day_momentum = Momentum(window_length=30)
sixty_day_momentum = Momentum(window_length=60)

positive_momentum = ((thirty_day_momentum >= 0) & (sixty_day_momentum >= 0))

Next, let's add our momentum factors and our `positive_momentum` filter to `make_pipeline`. Let's also pass `positive_momentum` as a `screen` to our pipeline.

In [20]:
def make_pipeline():

    thirty_day_momentum = Momentum(window_length=30)
    sixty_day_momentum = Momentum(window_length=60)

    positive_momentum = ((thirty_day_momentum >= 0) & (sixty_day_momentum >= 0))

    std_dev = StdDev(inputs=[USEquityPricing.close], window_length=5)

    return Pipeline(columns={
                    'std dev': std_dev,
                    'thirty day momentum': thirty_day_momentum,
                    'sixty day momentum': sixty_day_momentum},
                    screen=positive_momentum)

Running this pipeline outputs the standard deviation and each of our momentum computations for securities with positive 10-day and 20-day momentum.

In [21]:
result = run_pipeline(make_pipeline(), '2017-12-27', '2017-12-27')
result.head(20)



Unnamed: 0,Unnamed: 1,sixty day momentum,std dev,thirty day momentum
2017-12-27 00:00:00+00:00,Equity(2 [HWM]),1.046868,0.527811,1.137873
2017-12-27 00:00:00+00:00,Equity(24 [AAPL]),1.112952,1.686416,0.9804
2017-12-27 00:00:00+00:00,Equity(31 [ABAX]),1.086465,0.319837,1.086465
2017-12-27 00:00:00+00:00,Equity(41 [ARCB]),1.082004,0.27313,1.15748
2017-12-27 00:00:00+00:00,Equity(52 [ABM]),0.907375,0.133776,0.953946
2017-12-27 00:00:00+00:00,Equity(53 [ABMD]),1.116843,1.6407,0.971965
2017-12-27 00:00:00+00:00,Equity(62 [ABT]),1.066469,0.07494,1.030193
2017-12-27 00:00:00+00:00,Equity(64 [GOLD]),0.905764,0.09975,1.045911
2017-12-27 00:00:00+00:00,Equity(66 [AB]),1.077104,0.04,1.030364
2017-12-27 00:00:00+00:00,Equity(67 [ADSK]),0.923077,0.619206,0.832798


Custom factors allow us to define custom computations in a pipeline. They are frequently the best way to perform computations on [partner datasets](https://www.quantopian.com/data) or on multiple data columns. The full documentation for CustomFactors is available [here](https://www.quantopian.com/help#custom-factors).

In the next lesson, we'll use everything we've learned so far to create a pipeline for an algorithm.