<a href="https://www.quantrocket.com/disclaimer/">Disclaimer</a>

# Filters
A Filter is a function from an asset and a moment in time to a boolean:

```
F(asset, timestamp) -> boolean
```

In Pipeline, Filters are used for narrowing down the set of securities included in a computation or in the final output of a pipeline. There are two common ways to create a `Filter`: comparison operators and `Factor`/`Classifier` methods.
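To make the `F(asset, timestamp) -> boolean` idea concrete, here is a minimal plain-Python sketch that does not use Pipeline at all. The prices, the tickers `AAA` and `BBB`, and the helper `close_above_20` are hypothetical and exist only for illustration.

```python
from datetime import date

# Hypothetical toy data: closing prices keyed by (asset, day).
TOY_CLOSES = {
    ("AAA", date(2015, 5, 4)): 19.50,
    ("AAA", date(2015, 5, 5)): 20.75,
    ("BBB", date(2015, 5, 4)): 42.10,
    ("BBB", date(2015, 5, 5)): 18.30,
}


def close_above_20(asset, day):
    """Toy filter: True when the asset's close on `day` is above $20."""
    return TOY_CLOSES[(asset, day)] > 20


# Evaluate the filter once for every (asset, day) pair.
toy_filter_values = {key: close_above_20(*key) for key in TOY_CLOSES}
```

Each (asset, day) pair maps to exactly one boolean, which is the shape of result a Pipeline `Filter` produces.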

In [1]:
from zipline.pipeline import Pipeline
from zipline.research import run_pipeline
from zipline.pipeline.data import EquityPricing
from zipline.pipeline.factors import SimpleMovingAverage

## Comparison Operators

Comparison operators on `Factors` and `Classifiers` produce Filters. Since we haven't looked at `Classifiers` yet, let's stick to examples using `Factors`. The following example produces a filter that returns `True` whenever the latest close price is above $20.

In [2]:
last_close_price = EquityPricing.close.latest
close_price_filter = last_close_price > 20

And this example produces a filter that returns `True` whenever the 10-day mean close price is below the 30-day mean close price.

In [3]:
mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)
mean_close_30 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=30)
mean_crossover_filter = mean_close_10 < mean_close_30

Remember, each security will get its own `True` or `False` value each day.
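The per-security, per-day nature of the result can be sketched in plain Python. This is an illustration under stated assumptions, not how Pipeline computes it: the closes are made up, the windows are shortened to 2 and 3 days so the toy series stay small, and `trailing_mean` is a hypothetical helper.

```python
# Hypothetical toy closes, oldest first; not real market data.
toy_closes = {
    "AAA": [10.0, 11.0, 12.0, 13.0, 14.0],  # steadily rising
    "BBB": [14.0, 13.0, 12.0, 11.0, 10.0],  # steadily falling
}


def trailing_mean(values, window, end):
    """Mean of the `window` values ending at index `end` (inclusive)."""
    chunk = values[end - window + 1 : end + 1]
    return sum(chunk) / window


SHORT, LONG = 2, 3  # stand-ins for the 10- and 30-day windows

crossover = {}
for asset, closes in toy_closes.items():
    # Start at the first day that has enough history for the long window.
    for day in range(LONG - 1, len(closes)):
        short_mean = trailing_mean(closes, SHORT, day)
        long_mean = trailing_mean(closes, LONG, day)
        crossover[(asset, day)] = short_mean < long_mean
```

In this toy data the falling series `BBB` is `True` on every day (its short mean sits below its long mean), while the rising series `AAA` is `False` on every day.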

## Factor/Classifier Methods

Various methods of the `Factor` and `Classifier` classes return `Filters`. Again, since we haven't yet looked at `Classifiers`, let's stick to `Factor` methods for now (we'll look at `Classifier` methods later). The `Factor.top(n)` method produces a `Filter` that returns `True` for the top `n` securities of a given `Factor`. The following example produces a filter that returns `True` for exactly 200 securities every day, indicating that those securities were in the top 200 by last close price across all known securities.

In [4]:
last_close_price = EquityPricing.close.latest
top_close_price_filter = last_close_price.top(200)
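The idea behind a top-`n` filter can be sketched for a single day in plain Python. The values and the helper `top_n_filter` are hypothetical, and the sketch ignores details such as ties and missing values, which the real `Factor.top` method may treat differently.

```python
# Hypothetical latest closes for a single day.
toy_latest_close = {"AAA": 12.5, "BBB": 310.0, "CCC": 48.2, "DDD": 97.6, "EEE": 5.1}


def top_n_filter(values, n):
    """Return {asset: bool}, True for the n assets with the largest values."""
    winners = set(sorted(values, key=values.get, reverse=True)[:n])
    return {asset: asset in winners for asset in values}


top_2 = top_n_filter(toy_latest_close, 2)
```

Here `top_2` is `True` for `BBB` and `DDD` only. Exactly `n` assets pass, which mirrors the "exactly 200 securities every day" behavior described above.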

For a full list of `Factor` methods that return `Filters`, see the [Factor API Reference](https://www.quantrocket.com/docs/api/#zipline.pipeline.Factor).

For a full list of `Classifier` methods that return `Filters`, see the [Classifier API Reference](https://www.quantrocket.com/docs/api/#zipline.pipeline.Classifier).

## Dollar Volume Filter
As a starting example, let's create a filter that returns `True` if a security's 30-day average dollar volume is above $10,000,000. To do this, we'll first need to create an `AverageDollarVolume` factor to compute the 30-day average dollar volume. Let's include the built-in `AverageDollarVolume` factor in our imports:

In [5]:
from zipline.pipeline.factors import AverageDollarVolume

And then, let's instantiate our average dollar volume factor.

In [6]:
dollar_volume = AverageDollarVolume(window_length=30)

By default, `AverageDollarVolume` uses `EquityPricing.close` and `EquityPricing.volume` as its `inputs`, so we don't specify them.

Now that we have a dollar volume factor, we can create a filter with a boolean expression. The following line creates a filter returning `True` for securities with a `dollar_volume` greater than 10,000,000:

In [7]:
high_dollar_volume = (dollar_volume > 10000000)
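As a rough intuition for what this filter measures, the sketch below computes an average of daily close times volume in plain Python and compares it to the threshold. It is a simplified stand-in, not the built-in factor's implementation: the bars are invented, the window is 3 days instead of 30, and missing data is not modeled.

```python
# Hypothetical toy bars, oldest first; a 3-day window stands in for 30 days.
toy_bars = {
    "AAA": {"close": [50.0, 52.0, 51.0], "volume": [300_000, 250_000, 280_000]},
    "BBB": {"close": [8.0, 8.5, 8.2], "volume": [40_000, 55_000, 35_000]},
}

THRESHOLD = 10_000_000


def average_dollar_volume(close, volume):
    """Mean of daily dollar volume (close * volume) over the window."""
    dollar_volumes = [c * v for c, v in zip(close, volume)]
    return sum(dollar_volumes) / len(dollar_volumes)


toy_adv = {
    asset: average_dollar_volume(bars["close"], bars["volume"])
    for asset, bars in toy_bars.items()
}

# The boolean expression: one True/False per asset.
toy_high = {asset: adv > THRESHOLD for asset, adv in toy_adv.items()}
```

With these numbers, `AAA` averages roughly $14.1 million per day and passes, while `BBB` averages well under $1 million and does not.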

To see what this filter looks like, let's add it as a column to the pipeline we defined in the previous lesson.

In [8]:
def make_pipeline():

    mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)
    mean_close_30 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=30)

    percent_difference = (mean_close_10 - mean_close_30) / mean_close_30
    
    dollar_volume = AverageDollarVolume(window_length=30)
    high_dollar_volume = (dollar_volume > 10000000)

    return Pipeline(
        columns={
            'percent_difference': percent_difference,
            'high_dollar_volume': high_dollar_volume
        }
    )

If we make and run our pipeline, we now have a column `high_dollar_volume` with a boolean value corresponding to the result of the expression for each security.

In [9]:
result = run_pipeline(make_pipeline(), start_date='2015-05-05', end_date='2015-05-05')
result

| | | high_dollar_volume | percent_difference |
|---|---|---|---|
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000C2V3D6 [A]) | True | -0.000111 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG00B3T3HD3 [AA]) | False | |
| 2015-05-05 00:00:00+00:00 | Equity(QI000000004076 [AABA]) | True | -0.017429 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG006T1NZ18 [AAC]) | False | 0.048123 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG001B9VR83 [AAC]) | False | |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000V2S3P6 [AACG]) | False | 0.052646 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000BDYRW6 [AADR]) | False | 0.021066 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG002MYG6B3 [AAIT]) | False | 0.007100 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG005P7Q881 [AAL]) | True | 0.007729 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG003PNL136 [AAMC]) | False | 0.008597 |


## Applying a Screen
By default, a pipeline produces computed values each day for every asset in the data bundle. Very often, however, we only care about a subset of securities that meet specific criteria (for example, we might only care about securities that have enough daily trading volume to fill our orders quickly). We can tell our Pipeline to ignore securities for which a filter produces `False` by passing that filter to our Pipeline via the `screen` keyword.

To screen our pipeline output for securities with a 30-day average dollar volume greater than $10,000,000, we can simply pass our `high_dollar_volume` filter as the `screen` argument. This is what our `make_pipeline` function now looks like:

In [10]:
def make_pipeline():

    mean_close_10 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=10)
    mean_close_30 = SimpleMovingAverage(inputs=[EquityPricing.close], window_length=30)

    percent_difference = (mean_close_10 - mean_close_30) / mean_close_30

    dollar_volume = AverageDollarVolume(window_length=30)
    high_dollar_volume = dollar_volume > 10000000

    return Pipeline(
        columns={
            'percent_difference': percent_difference
        },
        screen=high_dollar_volume
    )

When we run this, the pipeline output only includes securities that pass the `high_dollar_volume` filter on a given day. For example, running this pipeline on May 5th, 2015 results in output for ~2,200 securities:

In [11]:
result = run_pipeline(make_pipeline(), start_date='2015-05-05', end_date='2015-05-05')
print(f'Number of securities that passed the filter: {len(result)}')
result

Number of securities that passed the filter: 2253


| | | percent_difference |
|---|---|---|
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000C2V3D6 [A]) | -0.000111 |
| 2015-05-05 00:00:00+00:00 | Equity(QI000000004076 [AABA]) | -0.017429 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG005P7Q881 [AAL]) | 0.007729 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000D9V7T4 [AAN]) | 0.095343 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000F7RCJ1 [AAP]) | -0.010067 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000B9XRY4 [AAPL]) | 0.016827 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000Q57YP0 [AAWW]) | 0.023100 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000G6GXC5 [AAXJ]) | 0.030521 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000BHJWG1 [AAXN]) | 0.102930 |
| 2015-05-05 00:00:00+00:00 | Equity(FIBBG000DK5Q25 [ABB]) | 0.009474 |
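Conceptually, a screen drops every row where the filter is `False` and leaves the remaining columns untouched. The plain-Python sketch below mimics that effect on invented rows; `apply_screen` is a hypothetical helper, not part of the Pipeline API.

```python
# Hypothetical one-day results before screening: asset -> column values.
toy_columns = {
    "AAA": {"percent_difference": -0.0001, "high_dollar_volume": True},
    "BBB": {"percent_difference": 0.0481, "high_dollar_volume": False},
    "CCC": {"percent_difference": 0.0077, "high_dollar_volume": True},
}


def apply_screen(rows, screen_column):
    """Keep rows whose screen value is True and drop the screen column itself."""
    return {
        asset: {k: v for k, v in row.items() if k != screen_column}
        for asset, row in rows.items()
        if row[screen_column]
    }


screened = apply_screen(toy_columns, "high_dollar_volume")
```

Only `AAA` and `CCC` remain in `screened`, each with its `percent_difference` value, which parallels how the screened output above is shorter than the unscreened output while keeping the same values for the securities that pass.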


## Inverting a Filter
The `~` operator inverts a filter, turning every `True` value into `False` and vice versa. For example, we can write the following to filter for low dollar volume securities:

In [12]:
low_dollar_volume = ~high_dollar_volume

This will return `True` for all securities with an average dollar volume below or equal to $10,000,000 over the last 30 days.
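The "below or equal to" wording follows from inverting a strict `>` comparison. A small plain-Python sketch with invented values shows the boundary case; plain `not` stands in for the `~` operator here, and the asset names are hypothetical.

```python
THRESHOLD = 10_000_000

# Hypothetical average dollar volumes; BBB sits exactly on the threshold.
toy_adv = {"AAA": 14_000_000.0, "BBB": 10_000_000.0, "CCC": 350_000.0}

# Original filter: strictly greater than the threshold.
toy_high = {asset: adv > THRESHOLD for asset, adv in toy_adv.items()}

# Inverted filter: the logical complement of the original.
toy_low = {asset: not flag for asset, flag in toy_high.items()}
```

`BBB`, at exactly $10,000,000, fails the strict `>` test and therefore passes the inverted filter, alongside `CCC`.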

---

**Next Lesson:** [Combining Filters](Lesson07-Combining-Filters.ipynb) 