# Regression Against Time

## The Formation Process of Winners and Losers in Momentum Investing
(https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2610571)

> **p. 3**: Intermediate-term (3–12 months) momentum has been documented by Jegadeesh
and Titman (1993, 2001, hereafter JT), while short-term (weekly) and long-term (3–5
years) reversals have been documented by Lehmann (1990) and Jegadeesh (1990) and
by DeBondt and Thaler (1985), respectively. Various models and theories have been
proposed to explain the coexistence of intermediate-term momentum and long-term
reversal. However, most studies have focused primarily on which stocks are winners
or losers; they have paid little attention to how those stocks become winners or losers.
This paper develops a model to analyze whether the movement of historical prices is
related to future expected returns.

> **p. 4**: This paper captures the idea that past returns and the formation process of past
returns have a joint effect on future expected returns. We argue that how one stock
becomes a winner or loser, that is, the movement of historical prices, plays an
important role in momentum investing. Using a polynomial quadratic model to
approximate the nonlinear pattern of historical prices, the model shows that as long as
two stocks share the same return over the past n-month, the future expected return of
the stock whose historical prices are convex shaped is not lower than one whose
historical prices are concave shaped. In other words, when there are two winner (or
loser) stocks, the one with convex-shaped historical prices will possess higher future
expected returns than the one with concave-shaped historical prices.

> **p. 4**: To test the model empirically, we regress previous daily prices in the ranking
period on an ordinal time variable and the square of the ordinal time variable for each
stock. The coefficient of the square of the ordinal time variable is denoted as $\gamma$.
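
To make the role of $\gamma$ concrete, here is a minimal sketch with made-up prices (not data from the paper). Two hypothetical stocks have the same return over a 60-day ranking period, but one path is convex and the other concave. The example uses `numpy.polyfit` instead of the regression used later in this notebook, and it fits an intercept as well.

```python
import numpy as np

# Hypothetical 60-day ranking period. Both paths start at 100 and end at 120,
# so the two stocks share the same past return.
t = np.arange(60)
frac = t / t[-1]
convex_prices = 100 + 20 * frac**2                 # slow start, accelerating gains
concave_prices = 100 + 20 * (2 * frac - frac**2)   # fast start, decelerating gains

def quadratic_gamma(prices, time):
    """Coefficient on time^2 from the fit price = a + beta*t + gamma*t^2."""
    gamma, beta, intercept = np.polyfit(time, prices, deg=2)
    return gamma

gamma_convex = quadratic_gamma(convex_prices, t)
gamma_concave = quadratic_gamma(concave_prices, t)
print(gamma_convex, gamma_concave)
```

The convex path gives a positive $\gamma$ and the concave path a negative one, even though the total return is identical.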

## Install packages

In [None]:
import sys

In [None]:
!{sys.executable} -m pip install -r requirements.txt

In [None]:
import cvxpy as cvx
import numpy as np
import pandas as pd
import time
import os
import quiz_helper
import matplotlib.pyplot as plt

In [None]:
%matplotlib inline
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (14, 8)

### Data bundle

In [None]:
import os
import quiz_helper
from zipline.data import bundles

In [None]:
os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')
ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)
bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)
print('Data Registered')

### Build pipeline engine

In [None]:
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume
from zipline.utils.calendars import get_calendar

universe = AverageDollarVolume(window_length=120).top(500) 
trading_calendar = get_calendar('NYSE') 
bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)
engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)

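### View data

With the pipeline engine built, we get the stocks that are in the universe at the end of the period. We'll use these tickers to generate the returns data for the risk model.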

In [None]:
universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')

universe_tickers = engine\
    .run_pipeline(
        Pipeline(screen=universe),
        universe_end_date,
        universe_end_date)\
    .index.get_level_values(1)\
    .values.tolist()
    
universe_tickers

# Get returns data

In [None]:
from zipline.data.data_portal import DataPortal

data_portal = DataPortal(
    bundle_data.asset_finder,
    trading_calendar=trading_calendar,
    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,
    equity_minute_reader=None,
    equity_daily_reader=bundle_data.equity_daily_bar_reader,
    adjustment_reader=bundle_data.adjustment_reader)

## Helper function for getting pricing data

In [None]:
from quiz_helper import get_pricing

## Put the pricing data into a dataframe

In [None]:
returns_df = \
    get_pricing(
        data_portal,
        trading_calendar,
        universe_tickers,
        universe_end_date - pd.DateOffset(years=5),
        universe_end_date)\
    .pct_change()[1:].fillna(0) #convert prices into returns

returns_df
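
The `.pct_change()[1:].fillna(0)` chain above can be checked on a small example. The tickers and prices below are made up. The missing first price for `BBB` stands in for a stock with no data at the start of the window.

```python
import pandas as pd

# Hypothetical closing prices; BBB has no price on the first day.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [float("nan"), 50.0, 55.0]},
    index=pd.date_range("2016-01-04", periods=3),
)

# pct_change leaves the first row undefined, so [1:] drops it;
# fillna(0) then treats any remaining undefined return as zero.
returns = prices.pct_change()[1:].fillna(0)
print(returns)
```

Filling with zero keeps the dataframe free of `NaN`, but it records "no return" for days on which the return is actually unknown.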

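## Sector data helper function

We'll create an object for you that defines a sector for each stock. Sectors are represented by integers. The class inherits from the Classifier class. See the [documentation for Classifier](https://www.quantopian.com/posts/pipeline-classifiers-are-here) and the [source code for Classifier](https://github.com/quantopian/zipline/blob/master/zipline/pipeline/classifiers/classifier.py).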

In [None]:
from zipline.pipeline.classifiers import Classifier
from zipline.utils.numpy_utils import int64_dtype
class Sector(Classifier):
    dtype = int64_dtype
    window_length = 0
    inputs = ()
    missing_value = -1

    def __init__(self):
        self.data = np.load('../../data/project_4_sector/data.npy')

    def _compute(self, arrays, dates, assets, mask):
        return np.where(
            mask,
            self.data[assets],
            self.missing_value,
        )

In [None]:
sector = Sector()
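
The masking step inside `_compute` can be reproduced with plain numpy. This is only a sketch: the sector codes, the asset ids, and the shape of the mask are made up for illustration.

```python
import numpy as np

# Hypothetical lookup table: position = asset id (sid), value = sector code.
sector_by_sid = np.array([3, 7, 3, 10, 5])
missing_value = -1

# Hypothetical inputs: three assets and a (dates x assets) boolean mask.
assets = np.array([0, 2, 4])
mask = np.array([[True, False, True],
                 [True, True, True]])

# Where the mask is True, look up the asset's sector; elsewhere use -1.
sectors = np.where(mask, sector_by_sid[assets], missing_value)
print(sectors)
```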

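## We'll use 2 years of data to calculate the factor

**Note:** The date exactly 2 years back falls on a day when the market is closed. The pipeline package doesn't handle start or end dates on which the market is closed. To work around this, we go back 2 more days, to a day when the market is open.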

In [None]:
factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)
factor_start_date
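
A quick check of the dates involved, using only pandas (weekday names come from `Timestamp.day_name()`):

```python
import pandas as pd

universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')

# Exactly two years back lands on a weekend.
two_years_back = universe_end_date - pd.DateOffset(years=2)
# Two extra days move the start date to the preceding Friday.
factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)

print(two_years_back.date(), two_years_back.day_name())
print(factor_start_date.date(), factor_start_date.day_name())
```

This shows only the weekday. Whether a given weekday is a trading session still depends on the exchange calendar.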

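## Describe price over time with a curve

To describe price over time, we'll use integers that increment each day as the independent variable, and the price as the dependent variable. Let's practice regressing price against time and time squared, so that a polynomial describes the trajectory of the price over time.

$ClosePrice_i = \beta \times time_i + \gamma \times time_i^2$

First, we'll use `numpy.arange(days)`, where days might be 5 for one week or 252 for one year of data. The integers represent the days in the time window.

To create a 2D numpy array, we combine the two arrays in a list. Each `numpy.arange` array then forms one row, so we transpose the result to get one column per independent variable.

Instantiate a LinearRegression object, then call `.fit(X,y)`, passing in the independent and dependent variables.

We'll access the coefficients estimated from the data via `.coef_`. There is one coefficient per independent variable. (`LinearRegression` also fits an intercept by default, which is stored separately in `.intercept_`.)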

In [None]:
# we're choosing a window of 5 days as an example
X = np.array([np.arange(5), np.arange(5)**2])
X = X.T
X

In [None]:
#we're making up some numbers to represent the stock price
y = np.array(np.random.random(5)*2)
y

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
reg = LinearRegression()
reg.fit(X,y);

## Quiz 1
Output the estimates for $\beta$ and $\gamma$

In [None]:
# TODO Output the estimates for Beta and gamma


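## outputs
`outputs` is a class variable defined in the CustomFactor class. We set outputs to a list of strings, which name the member variables of the `out` object.

* outputs (iterable[str], optional) – An iterable of strings which represent the names of each output this factor should compute and return. If this argument is not passed to the CustomFactor constructor, we look for a class-level attribute named outputs.

>For example, if we create a subclass that inherits from CustomFactor, we can define a class-level variable `outputs = ['var1','var2']`, passing in the variable names as strings.

Here is how this variable is used inside the `compute` function:
>out : np.array[self.dtype, ndim=1]
>   Output array of the same shape as `assets`. `compute` should write its desired return values into `out`. If multiple outputs are specified, `compute` should write its desired return values into `out.<output_name>` for each output name in `self.outputs`.

If we define `outputs = ['var1', 'var2']`, then inside the `compute` function we'll have the numpy arrays `out.var1` and `out.var2`. Each of these arrays has one element for each stock that we're processing (the code we inherit from CustomFactor handles this for us).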
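
As a rough stand-in for what `compute` receives, a numpy record array gives the same attribute-style access. This sketch assumes `out` behaves like a record array with one field per output name. It does not use zipline, and the values are made up.

```python
import numpy as np

n_assets = 3  # hypothetical number of stocks being processed

# Stand-in for `out` when outputs = ['var1', 'var2']:
# one float field per output name, one element per asset.
out = np.recarray((n_assets,), dtype=[('var1', float), ('var2', float)])

# Each field is a 1D array that can be written in place.
out.var1[:] = [0.1, 0.2, 0.3]
out.var2[:] = np.nan
out.var2[0] = 1.5

print(out.var1, out.var2)
```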

## numpy.isfinite

Numpy has `numpy.isnan()` to check for `NaN` (not a number). We can also use `numpy.isfinite()` to check that a number is neither `NaN` nor infinite.
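
A small sketch of the check on a made-up price window, with days along the rows and one stock per column:

```python
import numpy as np

# Hypothetical 3-day window for 3 stocks (one column per stock).
prices = np.array([[10.0, 5.0, 1.0],
                   [11.0, np.nan, 2.0],
                   [12.0, 6.0, np.inf]])

# Element-wise: False for NaN and for +/- infinity.
finite_mask = np.isfinite(prices)

# A stock is usable only if every price in its column is finite.
usable = finite_mask.all(axis=0)
print(usable)
```

Only the first stock passes: the second column contains a `NaN` and the third an infinite value.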

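# Quiz 2: Regression against time

We'll build a class that inherits from CustomFactor, called `RegressionAgainstTime`. It runs one regression at a time, over one year of data. If any of a stock's prices are NaN or infinite (bad data, or an insanely amazing company!), skip the regression for that stock.

**Hint:** See how we handle the β variable; you can do something similar for the γ variable.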

In [None]:
from zipline.pipeline.data import USEquityPricing
from zipline.pipeline.factors import CustomFactor
class RegressionAgainstTime(CustomFactor):

    #TODO: choose a window length that spans one year's worth of trading days
    window_length = # ...
    
    #TODO: use USEquityPricing's close price
    inputs = # ...
    
    #TODO: set outputs to a list of strings, which are names of the outputs
    #We're calculating regression coefficients for two independent variables, 
    # called beta and gamma
    outputs = # [..., ...]
    
    def compute(self, today, assets, out, dependent):
        
        #TODO: define an independent variable that represents time from the start to end
        # of the window length. E.g. [1,2,3...252]
        t1 = # ...
        
        #TODO: define a second independent variable that represents time ^2
        t2 = # ...
        
        # combine t1 and t2 into a 2D numpy array
        X = # ...

    
        #TODO: the number of stocks is equal to the length of the "out" variable,
        # because the "out" variable has one element for each stock
        n_stocks = # ...
        # loop over each asset

        for i in range(n_stocks):
            # TODO: "dependent" is a 2D numpy array that
            # has one stock series in each column,
            # and days are along the rows.
            # set y equal to all rows for column i of "dependent"
            y = # ...
            
            # TODO: run a regression only if all values of y
            # are finite.
            if # ... :
                # create a LinearRegression object
                regressor = LinearRegression()
                
                # TODO: fit the regressor on X and y
                
                
                # store the beta coefficient
                out.beta[i] = regressor.coef_[0]
                
                #TODO: store the gamma coefficient
                # ...
            else:
                # store beta as not-a-number
                out.beta[i] = np.nan
                
                # TODO: store gamma as not-a-number
                # ...



# Quiz 3: Create conditional factor

We can create the conditional factor as the product of the β and γ factors.

$ joint_{Factor} = \beta_{Factor} \times \gamma_{Factor} $
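
The same rank-then-multiply idea can be tried on a small pandas example. The tickers and coefficient values are made up, and this sketch does not use the zipline Factor API.

```python
import pandas as pd

# Hypothetical regression coefficients for three stocks on one day.
beta = pd.Series({'AAA': 0.5, 'BBB': -0.2, 'CCC': 0.1})
gamma = pd.Series({'AAA': 0.01, 'BBB': 0.03, 'CCC': -0.02})

# Rank each factor across stocks (1 = smallest value).
beta_rank = beta.rank()
gamma_rank = gamma.rank()

# Multiply the ranks, then rank the product to get the joint factor.
joint = (beta_rank * gamma_rank).rank()
print(joint)
```

Ranking before multiplying puts both factors on the same scale, so neither dominates the product because of its units.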


See the [documentation for the Factor class](https://www.zipline.io/appendix.html?highlight=customfactor#zipline.pipeline.factors.Factor):

> Factors can be combined, both with other Factors and with scalar values, via any of the builtin mathematical operators (+, -, *, etc). This makes it easy to write complex expressions that combine multiple Factors. For example, constructing a Factor that computes the average of two other Factors is simply:

In [None]:
f1 = SomeFactor(...)  
f2 = SomeOtherFactor(...)  
average = (f1 + f2) / 2.0  

In [None]:
#Example: we'll call the RegressionAgainstTime constructor,
# pass in the "universe" variable as our mask, 
# and get the "beta" variable from that object.
# Then we'll get the rank based on the beta value.
beta_factor = (
    RegressionAgainstTime(mask=universe).beta.
    rank()
)

# TODO: similar to the beta factor,
# We'll create the gamma factor
gamma_factor = # ...

# TODO: if we multiply the beta factor and gamma factor,
# we can then rank that product to create the conditional factor
conditional_factor = # ...

p = Pipeline(screen=universe)
# Add the beta, gamma and conditional factor to the pipeline
p.add(beta_factor, 'time_beta')
p.add(gamma_factor, 'time_gamma')
p.add(conditional_factor, 'conditional_factor')

## Visualize the pipeline

Note: if the image is too small, right-click it and view it in a separate window.

In [None]:
p.show_graph(format='png')

## Run the pipeline and view the factor data

In [None]:
df = engine.run_pipeline(p, factor_start_date, universe_end_date)

In [None]:
df.head()

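## Visualize factor returns

These are the returns that a theoretical portfolio would earn if its stock weights were set according to the values of a single alpha factor.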
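
As a rough illustration of a factor-weighted return, the sketch below uses one common convention: demean the factor values and scale them so that the absolute weights sum to 1. This is an assumption for illustration, not necessarily what `make_factor_plot` computes, and all numbers are made up.

```python
import numpy as np

# Hypothetical factor values and next-day returns for four stocks.
factor = np.array([3.0, 1.0, 2.0, 4.0])
next_day_returns = np.array([0.02, -0.01, 0.00, 0.01])

# Demean so longs and shorts offset, then scale to unit gross exposure.
demeaned = factor - factor.mean()
weights = demeaned / np.abs(demeaned).sum()

# The theoretical portfolio return is the weighted sum of stock returns.
portfolio_return = weights @ next_day_returns
print(weights, portfolio_return)
```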

In [None]:
from quiz_helper import make_factor_plot

In [None]:
make_factor_plot(df, data_portal, trading_calendar, factor_start_date, universe_end_date);

## Solution
[Solution notebook](regression_against_time_solution.ipynb)