#### Companion notebook for Alphalens tutorial lesson 4

# Advanced Alphalens concepts

You've learned the basics of using Alphalens. This lesson explores the following advanced Alphalens concepts:

1. Determining an alpha factor's decay rate.
2. Dealing with a common Alphalens error named MaxLossExceededError.
3. Grouping assets by market cap, then analyzing each cap type individually.
4. Writing group neutral strategies.

**All sections of this lesson will use the data produced by the Pipeline created in the following cell. Please run it.**

**Important note**: Until this lesson, we passed the output of `run_pipeline()` to `get_clean_factor_and_forward_returns()` without any changes. This was possible because the previous lessons' Pipelines only returned one column. This lesson's Pipeline output has more than one column, which means we need to *specify the column* we're passing as factor data. The calls to `get_clean_factor_and_forward_returns()` later in this lesson show how to do this by passing `pipeline_output['factor_to_analyze']`.

In [None]:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.factors import AverageDollarVolume
from quantopian.pipeline.data import factset, USEquityPricing
from alphalens.utils import get_clean_factor_and_forward_returns


def make_pipeline():
    # Filter out equities with low market capitalization
    market_cap_filter = factset.Fundamentals.mkt_val.latest > 500000000

    # Filter out equities with low volume
    volume_filter = AverageDollarVolume(window_length=200) > 2500000

    # Filter out equities with a close price below $5
    price_filter = USEquityPricing.close.latest > 5

    # Our final base universe
    base_universe = market_cap_filter & volume_filter & price_filter
    
    change_in_working_capital = factset.Fundamentals.wkcap_chg_qf.latest
    ciwc_processed = change_in_working_capital.winsorize(.2, .98).zscore()
    
    sales_per_working_capital = factset.Fundamentals.sales_wkcap_qf.latest
    spwc_processed = sales_per_working_capital.winsorize(.2, .98).zscore()

    factor_to_analyze = (ciwc_processed + spwc_processed).zscore()

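    # The following columns will help us group assets by market cap. This will allow us to analyze
    # whether our alpha factor's predictiveness varies among assets with different market caps.
    # The filters are deliberately nested (every asset passes the first one) so they can be summed later.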
    market_cap = factset.Fundamentals.mkt_val.latest
    is_small_cap = market_cap.percentile_between(0, 100)
    is_mid_cap = market_cap.percentile_between(50, 100)
    is_large_cap = market_cap.percentile_between(90, 100)

    return Pipeline(
        columns = {
          'factor_to_analyze': factor_to_analyze, 
          'small_cap_filter': is_small_cap,
          'mid_cap_filter': is_mid_cap,
          'large_cap_filter': is_large_cap,
        },
        screen = (
            base_universe
            & factor_to_analyze.notnull()
            & market_cap.notnull()
        )
    )


pipeline_output = run_pipeline(make_pipeline(), '2013-1-1', '2014-1-1')
pricing_data = get_pricing(pipeline_output.index.levels[1], '2013-1-1', '2014-3-1', fields='open_price')

# To group by market cap, we take the following steps.

# Convert the "True" values to ones, so they can be added together
pipeline_output[['small_cap_filter', 'mid_cap_filter', 'large_cap_filter']] *= 1

# If a stock passed the large_cap filter, it also passed the mid_cap and small_cap filters.
# This means we can add the three columns, and stocks that are large_cap will get a value of 3,
# stocks that are mid cap will get a value of 2, and stocks that are small cap will get 1.
pipeline_output['cap_type'] = (
    pipeline_output['small_cap_filter'] + pipeline_output['mid_cap_filter'] + pipeline_output['large_cap_filter']
)

# drop the old columns, we don't need them anymore
pipeline_output.drop(['small_cap_filter', 'mid_cap_filter', 'large_cap_filter'], axis=1, inplace=True)

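# Rename the 1's, 2's and 3's for clarity. Assigning the result back avoids relying on
# an in-place replace on a column selection, which newer pandas versions may not apply.
pipeline_output['cap_type'] = pipeline_output['cap_type'].replace(
    [1, 2, 3], ['small_cap', 'mid_cap', 'large_cap']
)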

# the final product
pipeline_output.head(5)
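
The nested-filter trick used above can be tried without Quantopian data. The following is a minimal sketch on made-up tickers and market caps; the percentile logic uses plain pandas ranks as a stand-in for Pipeline's `percentile_between()`, so boundary handling may differ from the real Pipeline.

```python
import pandas as pd

# Hypothetical market caps (in dollars) for six made-up tickers.
market_cap = pd.Series(
    [1e9, 2e9, 5e9, 2e10, 8e10, 5e11],
    index=["A", "B", "C", "D", "E", "F"],
)

# Percentile rank of each asset, scaled to run up to 100.
pct = market_cap.rank(pct=True) * 100

# Nested flags: every asset passes the first, fewer pass each later one.
flags = pd.DataFrame({
    "small_cap_filter": pct >= 0,
    "mid_cap_filter": pct >= 50,
    "large_cap_filter": pct >= 90,
})

# Summing booleans counts how many nested filters each asset passed (1, 2 or 3).
cap_score = flags.sum(axis=1)
cap_type = cap_score.map({1: "small_cap", 2: "mid_cap", 3: "large_cap"})
print(cap_type)
```

Because the filters are nested, the sum can only be 1, 2 or 3, which is what makes the final renaming step safe.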

## Visualizing An Alpha Factor's Decay Rate

A lot of fundamental data only comes out four times a year, in quarterly reports. Because of this low frequency, it can be useful to increase the amount of time `get_clean_factor_and_forward_returns()` looks into the future to calculate returns.

**Tip:** A month usually has 21 trading days, a quarter usually has 63 trading days, and a year usually has 252 trading days.
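
As a rough illustration of the tip, here is a small helper that turns a named horizon into a list of look-forward periods. The helper name `candidate_periods` and the `TRADING_DAYS` table are hypothetical and only for illustration; they are not part of Alphalens. Note that Python's `range()` excludes its stop value, so the sketch appends the horizon itself when the step would otherwise skip it.

```python
# Approximate trading-day counts from the tip above.
TRADING_DAYS = {"month": 21, "quarter": 63, "year": 252}


def candidate_periods(horizon, step):
    """Return look-forward periods from 1 day up to and including the horizon."""
    longest = TRADING_DAYS[horizon]
    periods = list(range(1, longest + 1, step))
    # range() may stop short of the horizon, so add it explicitly if needed.
    if periods[-1] != longest:
        periods.append(longest)
    return periods


print(candidate_periods("quarter", 21))
```

A list like this could be passed as the `periods` argument of `get_clean_factor_and_forward_returns()`.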

Let's say you're creating a strategy that buys stock in companies with rising profits (data that is released every 63 trading days). Would you only look 10 days into the future to analyze that factor? Probably not! But how do you decide how far to look forward?

**Run the following cell to chart our alpha factor's IC mean over time. The point where the line dips below 0 represents when our alpha factor's predictions stop being useful.**

In [None]:
longest_look_forward_period = 63 # week = 5, month = 21, quarter = 63, year = 252
range_step = 5

merged_data = get_clean_factor_and_forward_returns(
    factor = pipeline_output['factor_to_analyze'],
    prices = pricing_data,
    periods = range(1, longest_look_forward_period, range_step)
)

from alphalens.performance import mean_information_coefficient
mean_information_coefficient(merged_data).plot(title="IC Decay");
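
For intuition about what the chart measures: the information coefficient (IC) is the Spearman rank correlation between factor values and the forward returns that follow them, computed per day and then averaged. The sketch below reproduces that idea on synthetic data with plain pandas; the helper `daily_ic` is hypothetical and is not the Alphalens implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-02", periods=3, freq="B")
assets = ["A", "B", "C", "D", "E"]
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])

# Synthetic data: forward returns are the factor plus random noise.
factor = pd.Series(rng.normal(size=len(index)), index=index)
noise = pd.Series(rng.normal(scale=0.005, size=len(index)), index=index)
forward_returns = 0.01 * factor + noise


def daily_ic(factor, forward_returns):
    """Spearman rank correlation between factor and forward returns, per date."""
    frame = pd.DataFrame({"factor": factor, "ret": forward_returns})
    # Pearson correlation of within-date ranks equals the Spearman correlation.
    ranked = frame.groupby(level="date").rank()
    return ranked.groupby(level="date").apply(lambda g: g["factor"].corr(g["ret"]))


ic = daily_ic(factor, forward_returns)
print(ic.mean())
```

Repeating this calculation for each look-forward period and plotting the means gives a decay curve like the one above.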

What do you think the chart will look like if we calculate the IC a full year into the future?

*Hint*: This is a setup for section two of this lesson.

In [None]:
factor_data = get_clean_factor_and_forward_returns(
    pipeline_output['factor_to_analyze'], 
    pricing_data,
    periods=range(1,252,20) # The third argument to the range statement changes the "step" of the range
)

mean_information_coefficient(factor_data).plot()

## Dealing With MaxLossExceededError

Oh no! What does `MaxLossExceededError` mean?

`get_clean_factor_and_forward_returns()` looks at how alpha factor data affects pricing data *in the future*. This means we need our pricing data to go further into the future than our alpha factor data **by at least as long as our forward looking period.** 
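
A quick way to estimate how far the pricing data needs to extend is to add the longest look-forward period, in business days, to the last factor date. This is only a sketch: the helper name `min_pricing_end_date` is hypothetical, and pandas business days ignore exchange holidays, so it is worth adding a few weeks of margin on top of the result.

```python
import pandas as pd


def min_pricing_end_date(factor_end_date, longest_period):
    """Earliest pricing end date that covers `longest_period` business days
    after the last factor date (exchange holidays are not accounted for)."""
    return pd.Timestamp(factor_end_date) + pd.offsets.BDay(longest_period)


# range(1, 252, 20) ends at 241, so 241 is the longest look-forward period.
longest = max(range(1, 252, 20))
print(min_pricing_end_date("2014-01-01", longest).date())
```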

In this case, we'll change `get_pricing()`'s `end_date` to be at least a year after `run_pipeline()`'s `end_date`.

**Run the following cell to make those changes. As you can see, this alpha factor's IC decays quickly after a quarter, but comes back even stronger six months into the future. Interesting!**

In [None]:
new_pipeline_output = run_pipeline(
    make_pipeline(),
    start_date='2013-1-1', 
    end_date='2014-1-1' #  *** NOTE *** Our factor data ends in 2014
)

new_pricing_data = get_pricing(
    new_pipeline_output.index.levels[1], 
    start_date='2013-1-1',
    end_date='2015-2-1', # *** NOTE *** Our pricing data ends in 2015
    fields='open_price'
)

new_factor_data = get_clean_factor_and_forward_returns(
    new_pipeline_output['factor_to_analyze'], 
    new_pricing_data,
    periods=range(1,252,20) # Change the step to 10 or more for long look forward periods to save time
)

mean_information_coefficient(new_factor_data).plot()

*Note: MaxLossExceededError has two possible causes: forward returns computation and binning. We showed you how to fix forward returns computation here because it is much more common. If you get MaxLossExceededError because of binning, try passing `quantiles=None` and `bins=5`.*
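
To see why binning can fail, consider a day on which most factor values are identical. Quantile binning needs distinct bin edges, while equal-width bins do not. The sketch below shows the difference with plain pandas on made-up values; it illustrates the underlying problem and is not a reproduction of Alphalens internals.

```python
import pandas as pd

# Hypothetical factor values for one day, dominated by ties.
factor = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0])

# Quantile binning: several quantile edges coincide at 0.0, so pandas refuses.
try:
    pd.qcut(factor, 5, labels=False)
    quantile_binning_ok = True
except ValueError:
    quantile_binning_ok = False

# Equal-width binning: edges are spaced evenly between min and max, so it works.
equal_width_bins = pd.cut(factor, 5, labels=False)

print(quantile_binning_ok)
print(equal_width_bins.tolist())
```

Equal-width bins can be very unevenly populated, so it is worth checking how many assets land in each bin before trusting the results.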

## Analyzing Alpha Factors By Group

Alphalens allows you to group assets using a classifier. A common use case for this is classifying equities by market cap, then comparing your alpha factor's returns among cap types.

You can group assets by any classifier, but sector and market cap are most common. The first cell of this lesson adds a column named `cap_type` to the Pipeline output, whose values represent each asset's market cap group. All we have to do now is pass that column to the `groupby` argument of `get_clean_factor_and_forward_returns()`.

**Run the following cell, and notice the charts at the bottom of the tear sheet showing how our factor performs among different cap types.**

In [None]:
from alphalens.tears import create_returns_tear_sheet

factor_data = get_clean_factor_and_forward_returns(
    factor=pipeline_output['factor_to_analyze'],
    prices=pricing_data,
    groupby=pipeline_output['cap_type'],
)

create_returns_tear_sheet(factor_data=factor_data, by_group=True)

## Writing Group Neutral Strategies

Not only does Alphalens allow us to simulate how our alpha factor would perform in a long/short trading strategy, it also allows us to simulate how it would do if we went long/short on every group! 

Grouping by cap type and going long/short within each cap type allows you to limit exposure to the overall movement of those market cap groups. For example, you may have noticed in section three of this lesson that certain cap types had all positive returns or all negative returns. That information isn't useful to us, because it only means the market cap group outperformed (or underperformed) the market; it doesn't give us any insight into how our factor performs within that cap type.
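
The idea can be sketched with plain pandas: subtracting each group's mean return removes the part of the return that comes from the group as a whole and leaves only the spread within the group. The data below is made up, and this is a simplified illustration rather than the exact calculation Alphalens performs.

```python
import pandas as pd

# Made-up returns: small caps all went up, large caps all went down.
data = pd.DataFrame({
    "cap_type": ["small_cap"] * 3 + ["large_cap"] * 3,
    "factor_quantile": [1, 2, 3, 1, 2, 3],
    "forward_return": [0.04, 0.05, 0.06, -0.03, -0.02, -0.01],
})

# Subtract each group's mean return so only the within-group spread remains.
group_mean = data.groupby("cap_type")["forward_return"].transform("mean")
data["group_neutral_return"] = data["forward_return"] - group_mean

print(data)
```

After demeaning, both cap types show the same pattern (the top quantile beats the bottom quantile), even though their raw returns had opposite signs.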

Since we grouped our assets by cap type in the previous cell, going group neutral is easy; just make the following two changes:
- Pass `binning_by_group=True` as an argument to `get_clean_factor_and_forward_returns()`.
- Pass `group_neutral=True` as an argument to `create_returns_tear_sheet()`.

**The following cell has made the appropriate changes. Try running it and notice how the results differ from the previous cell.**

In [None]:
factor_data = get_clean_factor_and_forward_returns(
    pipeline_output['factor_to_analyze'],
    prices=pricing_data,
    groupby=pipeline_output['cap_type'],
    binning_by_group=True,
)

create_returns_tear_sheet(factor_data, by_group=True, group_neutral=True)
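
To illustrate what binning by group changes, the sketch below compares universe-wide quantiles with quantiles computed inside each cap type, using plain pandas on made-up factor values. It is an approximation of the idea behind `binning_by_group=True`, not the Alphalens code itself.

```python
import pandas as pd

# Made-up factor values: large caps score higher than every small cap.
data = pd.DataFrame({
    "cap_type": ["small_cap"] * 4 + ["large_cap"] * 4,
    "factor": [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0],
})

# Universe-wide halves: every large cap lands in the top half.
data["quantile_all"] = pd.qcut(data["factor"], 2, labels=False) + 1

# Per-group halves: each cap type contributes assets to both halves.
data["quantile_by_group"] = (
    data.groupby("cap_type")["factor"]
    .transform(lambda s: pd.qcut(s, 2, labels=False) + 1)
)

print(data)
```

With universe-wide bins, a long/short portfolio here would simply be long large caps and short small caps. Binning inside each group keeps both sides of the portfolio balanced across cap types.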

That's it! This tutorial got you started with Alphalens, but there's so much more to it. Check out our [API docs](http://quantopian.github.io/alphalens/) to see the rest!