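# Barra USE4/CNE5: Main Form
$$ r_n = f_c + \sum_{i=1}^{I}{X_n^{I_i} f_{I_i}} + \sum_{p=1}^{P}{X_n^{S_p} f_{S_p}} + u_n \\ \text{s.t.} \ \sum_{i=1}^{I}{s_{I_i} f_{I_i}} = 0$$  
Here $r_n$ is the return of stock $n$, $f_c$ is the country factor return, $f_{I_i}$ is the return of industry factor $I_i$ ($i=1,\dots,I$), $f_{S_p}$ is the return of style factor $S_p$ ($p=1,\dots,P$), $X_n^{I_i}$ and $X_n^{S_p}$ are the factor exposures of stock $n$, $u_n$ is the specific return of stock $n$, and $s_{I_i}$ is the sum of the float-market-cap weights of all stocks in industry $I_i$.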

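For a given cross-section (period $T$), Barra regresses the stock returns over period $T$ on the factor exposures at the start of the period (equivalently, the exposures at the end of period $T-1$). In the USE4 model factor returns are daily, so the cross-sectional regression is also run daily: exposures are updated after the close of day $T-1$, and the day-$T$ stock returns are regressed on those exposures.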
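
A minimal sketch of one such cross-sectional regression, with the industry constraint imposed by substituting out the last industry factor. The function name `constrained_wls`, the use of 0/1 industry dummies, and the square-root-of-cap regression weights are assumptions made for illustration; they are not specified in the text above.

```python
import numpy as np

def constrained_wls(returns, industry, styles, cap):
    """One-period cross-sectional regression with sum_i s_i f_i = 0.

    returns : (N,) stock returns over period T
    industry: (N,) integer industry labels 0..I-1, known at the start of T
    styles  : (N, P) standardized style exposures, known at the start of T
    cap     : (N,) float market capitalizations
    """
    n = len(returns)
    n_ind = int(industry.max()) + 1
    dummies = np.eye(n_ind)[industry]                    # (N, I) industry exposures
    X = np.column_stack([np.ones(n), dummies, styles])   # country, industries, styles
    k = X.shape[1]

    s = dummies.T @ cap / cap.sum()                      # cap weight of each industry

    # R maps the K-1 free factor returns to all K factor returns so that the
    # last industry return equals -sum_{i<I} (s_i / s_I) f_i.
    R = np.zeros((k, k - 1))
    R[0, 0] = 1.0
    R[1:n_ind, 1:n_ind] = np.eye(n_ind - 1)
    R[n_ind, 1:n_ind] = -s[:-1] / s[-1]
    R[n_ind + 1:, n_ind:] = np.eye(k - n_ind - 1)

    w = np.sqrt(cap) / np.sqrt(cap).sum()                # assumed regression weights
    XR = X @ R
    g = np.linalg.solve(XR.T @ (w[:, None] * XR), XR.T @ (w * returns))
    return R @ g, s                                      # factor returns, industry weights
```

The returned vector is ordered as country, industries, styles. Substituting the constraint into the design matrix keeps the problem an ordinary WLS on `K-1` free parameters, which avoids the singular normal equations that the collinearity would otherwise produce.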

## Country Factor
- The pure country factor portfolio is essentially the market portfolio weighted by __float market capitalization__.

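## Industry Factors
- The pure country factor portfolio has non-zero exposure to the industries.
- The country factor exposure is collinear with the exposures to the $I$ industry factors (every stock has unit country exposure, and its industry exposures sum to one), so the regression above has no unique solution. The industry factor returns are therefore constrained by  
    $$ \sum_{i=1}^{I}{s_{I_i} f_{I_i}} = 0 $$  
    where $s_{I_i}$ is the sum of the float-market-cap weights of all stocks in industry $I_i$.
- A pure industry factor portfolio is dollar-neutral: in essence it is 100% long the industry and 100% short the pure country factor portfolio.
- A pure industry factor portfolio has zero exposure to every style factor.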

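## Style Factors
Style factors and each descriptor are standardized using market-cap weights, i.e.  
$$ D_{ik} = \frac{D_{ik}^{Raw} - \mu_k}{\sigma_k}$$  
where $\mu_k$ and $\sigma_k$ are the cap-weighted mean and standard deviation. After standardization, the portfolio that invests in a factor has exactly one unit of exposure to that factor and no exposure to any other factor, i.e. for every style factor $S_p$  
$$ \left\{ \begin{array}{l} \sum_{n=1}^N{w_{S_p n}X_n^k} = 1, \ k = S_p \\ \sum_{n=1}^N{w_{S_p n}X_n^k} = 0, \ k \neq S_p \end{array} \right.$$  
where $w_{S_p n}$ is the weight of stock $n$ in the pure factor portfolio of $S_p$. The pure country factor portfolio should be neutral to all style factors, so its exposure to every style factor must be 0. This means that the style exposures $X_n^{S_p}$, **standardized with float-market-cap weights**, must satisfy  
$$\sum_{n=1}^N{s_n X_n^{S_p}} = 0, \ p = 1,\dots,P $$  
where $s_n$ is the float-market-cap weight of stock $n$.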
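
The standardization and the zero cap-weighted-exposure condition can be checked numerically. The sketch below follows the text in using a cap-weighted mean and a cap-weighted standard deviation; the function name `cap_weighted_standardize` is hypothetical.

```python
import numpy as np

def cap_weighted_standardize(raw, cap):
    """Standardize raw exposures with cap-weighted mean and standard deviation."""
    s = cap / cap.sum()                      # cap weights s_n, summing to 1
    mu = s @ raw                             # cap-weighted mean
    sigma = np.sqrt(s @ (raw - mu) ** 2)     # cap-weighted standard deviation
    return (raw - mu) / sigma

rng = np.random.default_rng(1)
cap = rng.lognormal(mean=10.0, sigma=1.0, size=500)
x = cap_weighted_standardize(rng.standard_normal(500), cap)
print(np.dot(cap / cap.sum(), x))            # numerically zero
```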

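# Details

## Multicollinearity
- Consequences:
    1. The parameter estimates are not robust.
    2. The model $R^2$ can be high while the individual factors have low significance.
- Measure: the variance inflation factor (VIF, _Variance Inflation Factor_), obtained from the $R^2$ of regressing the factor on all the other factors.  
    $$ X_{ik} = \sum_{k' \neq k}{X_{ik'}b_{k'}} + \varepsilon_{ik} \\ VIF_k = \frac{1}{1-R_k^2}$$  
    Rule of thumb:  
    - $ 1 \le VIF < 10 $: no serious multicollinearity
    - $ 10 \le VIF < 100$: strong multicollinearity
    - $ VIF\ge 100$: severe multicollinearity
- Remedy: orthogonalization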
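
As an illustration of the VIF definition, the sketch below regresses each exposure column on the others and converts the resulting $R^2$ into a VIF. Including an intercept in the auxiliary regression is an assumption (the formula above omits it), and the function name `vif` is hypothetical.

```python
import numpy as np

def vif(X):
    """VIF of each column of X (N x K), regressed on the other columns plus an intercept."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

A column that is nearly a copy of another column produces a large VIF for both, while an unrelated column stays close to 1.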

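## Factor Validity Tests

### Significance of Factor Returns
$$ H_0: f_i = 0, \quad H_1: f_i \neq 0 \\ |t| = \left\vert \frac{f_i-0}{se(f_i)} \right\vert, \quad t \sim t_{N-K-1} $$  
where the number of stocks $N$ is much larger than the number of factors $K$. $H_0$ is rejected when $|t|>2$. To check whether a factor stays effective over time, one usually computes the fraction of historical periods in which $|t|>2$.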
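### IC Test
To simplify estimation, define $IC=\text{corr}(X,R)$, where $X$ is the current-period factor value and $R$ is the corresponding return in the next period. The IC is usually computed daily, from the factor exposures of the three-thousand-plus stocks in the market and their returns on the following day.  
Because factor values and returns do not follow a known distribution (such as the normal or $t$ distribution), the usual correlation formula is not appropriate and a non-parametric method is used instead: the IC is computed as the Spearman rank correlation. As a rule of thumb, a factor with an **IC above 0.03** is considered quite effective.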
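
A rank IC can be sketched with NumPy alone by ranking both series and taking the ordinary correlation of the ranks. The helper names `rank` and `rank_ic` are hypothetical, and ties are broken by position rather than averaged, which is a simplification of the Spearman definition.

```python
import numpy as np

def rank(a):
    """Ordinal ranks 0..N-1 (ties broken by position; a simplification)."""
    r = np.empty(len(a))
    r[np.argsort(a, kind="stable")] = np.arange(len(a))
    return r

def rank_ic(exposure, next_return):
    """Correlation between the ranks of today's exposures and next-day returns."""
    return np.corrcoef(rank(exposure), rank(next_return))[0, 1]
```

Because only ranks enter the calculation, any monotone transformation of the exposures or of the returns leaves the IC unchanged.
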
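### Factor Stability Coefficient
$$ \rho_{kt} = \frac{\sum_n{v_n^t (X_{nk}^t - \bar{X}_k^t)(X_{nk}^{t+1} - \bar{X}_k^{t+1})}}{\sqrt{\sum_n{v_n^t (X_{nk}^t - \bar{X}_k^t)^2}}\sqrt{\sum_n{v_n^t (X_{nk}^{t+1} - \bar{X}_k^{t+1})^2}}} $$  
where $v_n^t$ is the weight of stock $n$ in the WLS regression on day $t$, i.e. the market-cap-based or residual-based weight mentioned in the first part. A **factor stability coefficient above 0.9** is usually considered good, while a value below 0.8 suggests that the factor is not stable enough.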
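
The coefficient is a weighted cross-sectional correlation between consecutive exposure vectors. In the sketch below, $\bar{X}$ is taken to be the $v$-weighted mean, which is an assumption since the formula does not say how the mean is weighted; the function name `factor_stability` is hypothetical.

```python
import numpy as np

def factor_stability(x_t, x_next, v):
    """Weighted correlation between exposures on day t and day t+1."""
    w = v / v.sum()
    dx = x_t - w @ x_t            # deviations from the weighted mean (assumption)
    dy = x_next - w @ x_next
    return (w @ (dx * dy)) / np.sqrt((w @ dx ** 2) * (w @ dy ** 2))
```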

## Factor Standardization
$$ X_{ik} = \frac{X_{ik}^{Raw} - \mu_k}{\sigma_k}$$  
where $\mu_k$ and $\sigma_k$ are the cap-weighted mean and standard deviation.


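## Covariance Matrix of Factor Returns
### Estimation
Factor returns are daily. The daily factor return covariance matrix $F$ is computed with an EWMA (exponentially weighted moving average):  
$$ F_{kk'} = \text{cov}(f_k, f_{k'})_t = \frac{\sum_{s=t-h}^t{0.5^{\frac{t-s}{l}}(f_{ks}-\bar{f}_k)(f_{k's}-\bar{f}_{k'})}}{\sum_{s=t-h}^t{0.5^{\frac{t-s}{l}}}}$$  
where $h$ is the length of the estimation window and $l$ is the half-life.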
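
A compact sketch of the EWMA covariance estimate. The mean $\bar{f}_k$ is taken as the exponentially weighted mean over the same window, which is an assumption; the function name `ewma_cov` is hypothetical.

```python
import numpy as np

def ewma_cov(f, half_life):
    """f: (T, K) daily factor returns, oldest row first. Returns a K x K covariance."""
    T = f.shape[0]
    age = np.arange(T - 1, -1, -1)        # t - s: 0 for the most recent row
    w = 0.5 ** (age / half_life)
    w /= w.sum()                          # the denominator of the formula
    d = f - w @ f                         # deviations from the weighted mean (assumption)
    return (d * w[:, None]).T @ d
```

With a very long half-life the weights become equal and the estimate reduces to the ordinary population covariance.
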
### Eigenfactor (eigenvalue) Risk Adjustment
> Menchero, J., J. Wang, and D. J. Orr (2011). Eigen-adjusted Covariance Matrices. MSCI Research Insight.

- Shepard (2009) derived an analytic result for the magnitude of the bias. Under assumptions of normality and stationarity (and a large number of assets), he found that, with $K$ factors and $T$ effective observations, the true volatility of the optimized portfolio $\sigma_{true}$ is approximately  
$$\sigma_{true} \approx \frac{\sigma_{pred}}{1-(K/T)} $$  

### Volatility Regime Adjustment
If cross-sectional observations show that the model is consistently overforecasting or underforecasting risk over a recent period, then the volatilities of all factors can be collectively adjusted to remove this bias.

Let $f_{kt}$ be the return to factor $k$ on day $t$, and let $\sigma_{kt}$ be the one-day volatility forecast for the factor at the start of the day. The standardized return of the factor is given by the ratio $f_{kt}/\sigma_{kt}$. If the risk forecasts are accurate, the standard deviation of this ratio should be close to 1.

1. Define an instantaneous measure of factor risk bias called the _factor cross-sectional bias statistic_ $B_t^F$ on day $t$ as  
$$ B_t^F = \sqrt{\frac{1}{K} \sum_k{(\frac{f_{kt}}{\sigma_{kt}})^2}} $$  
where
    - $f_{kt}$: the return to factor $k$ on day $t$
    - $\sigma_{kt}$: the one-day volatility forecast for the factor at the start of the day
    - $K$: total number of factors  

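  With normally distributed returns and accurate forecasts, a bias statistic estimated from $T$ observations lies roughly in $[1-\sqrt{2/T}, 1+\sqrt{2/T}]$ (an approximate 95% confidence interval).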
2.  Define the _factor volatility multiplier_ $\lambda_F$ as an exponentially weighted average  
$$ \lambda_F = \sqrt{\sum_t{(B_t^F)^2 w_t}} $$  
where $w_t$ is an exponential weight with Volatility Regime Adjustment half-life $\tau_{VRA}^F$
3. The _Volatility Regime Adjusted_ forecasts are then given by $\tilde{\sigma}_k = \lambda_F \sigma_k$. The adjustment rescales all factor volatilities by the same multiplier and has no effect on factor correlations.
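
The three steps above can be sketched as follows. The function name `vra_multiplier` and the array layout (one row per day, oldest first) are assumptions, and the exponential weights are normalized to sum to one.

```python
import numpy as np

def vra_multiplier(f, sigma, half_life):
    """f, sigma: (T, K) realized factor returns and start-of-day volatility forecasts."""
    b2 = np.mean((f / sigma) ** 2, axis=1)     # (B_t^F)^2 for each day
    age = np.arange(len(b2) - 1, -1, -1)       # 0 for the most recent day
    w = 0.5 ** (age / half_life)
    w /= w.sum()
    return np.sqrt(w @ b2)                     # lambda_F

# Adjusted forecast: sigma_adjusted = vra_multiplier(f, sigma, half_life) * sigma_latest
```

If realized returns are consistently twice as large as forecast, the multiplier comes out as 2, so every factor volatility is doubled while correlations stay as they were.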

### Newey-West Adjustment
> https://zhuanlan.zhihu.com/p/38506157

Assuming the single-period factor return $F_t$ follows an $\operatorname{MA}(q)$ process, a consistent estimator of the covariance matrix $V_f$ (the Newey-West adjustment) is  
\begin{eqnarray}
&&V_f = \Gamma_0 + \sum_{i=1}^q{w_i(\Gamma_i + \Gamma_i')} \\
\text{where} \ &&\Gamma_0 = \frac{1}{T} \sum_{t=1}^T{F_t F_t'} \\
&& \Gamma_i = \frac{1}{T} \sum_{t=1}^{T-i}{F_t F_{t+i}'} \\
&& w_i = 1-\frac{i}{1+q}
\end{eqnarray}  
Newey and West (1987) proved that this estimator is consistent and positive semi-definite.
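
A direct transcription of the estimator into NumPy, as a sketch. It assumes the rows of `F` have already been demeaned; the function name `newey_west_cov` is hypothetical.

```python
import numpy as np

def newey_west_cov(F, q):
    """F: (T, K) demeaned factor returns, one row per period. Returns V_f (K x K)."""
    T = F.shape[0]
    V = F.T @ F / T                            # Gamma_0
    for i in range(1, q + 1):
        gamma = F[:-i].T @ F[i:] / T           # Gamma_i = (1/T) sum_t F_t F_{t+i}'
        V += (1.0 - i / (1.0 + q)) * (gamma + gamma.T)
    return V
```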

## Orthogonality
> https://zhuanlan.zhihu.com/p/41993542

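The core of the algorithm is to convert a set of vectors $\mathbf{X_i}$ that are not pairwise orthogonal into a set of pairwise orthogonal vectors $\mathbf{z_i}$ through successive orthogonalization, which makes it easy to obtain the multiple regression coefficient of the last explanatory variable to be orthogonalized.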
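
A minimal sketch of successive (Gram-Schmidt) orthogonalization and of how the last orthogonalized vector yields the multiple regression coefficient of the last variable. The function names are hypothetical, and `X` is assumed to contain a column of ones if an intercept is wanted.

```python
import numpy as np

def successive_orthogonalization(X):
    """Return Z whose columns z_1..z_p are pairwise orthogonal versions of X's columns."""
    Z = np.array(X, dtype=float)
    for j in range(1, X.shape[1]):
        for l in range(j):
            gamma = (Z[:, l] @ X[:, j]) / (Z[:, l] @ Z[:, l])
            Z[:, j] -= gamma * Z[:, l]         # remove the component along z_l
    return Z

def last_coefficient(X, y):
    """Multiple regression coefficient of the last column of X, obtained from z_p alone."""
    z = successive_orthogonalization(X)[:, -1]
    return (z @ y) / (z @ z)
```

Reordering the columns so that a different variable comes last gives that variable's coefficient in the same way.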

## Outlier - MAD
For a univariate data set $X_1, X_2, \dots, X_n$, the [median absolute deviation (MAD)](https://en.wikipedia.org/wiki/Median_absolute_deviation) is defined as the median of the absolute deviations from the data's median.  
$$ \operatorname{MAD} = \operatorname{median}(|X_i - \operatorname{median}(X)|) $$

The MAD may be used similarly to how one would use the deviation from the average. To use the MAD as a consistent estimator of the standard deviation $\sigma$, one takes  
 $${\hat {\sigma }}= k \cdot \operatorname{MAD}$$  
where $k$ is a constant scale factor that depends on the distribution. For normally distributed data $k$ is taken to be  
$$ k=1 / \left(\Phi^{-1}(0.75)\right)\approx 1.4826 $$
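
A short sketch of MAD-based outlier handling. The three-robust-sigma clipping threshold and the function names are assumptions chosen for illustration; the text does not fix a threshold.

```python
import numpy as np

def mad_sigma(x, k=1.4826):
    """Robust estimate of the standard deviation: k * MAD."""
    med = np.median(x)
    return k * np.median(np.abs(x - med))

def winsorize_mad(x, n_sigma=3.0):
    """Clip values lying more than n_sigma robust sigmas from the median."""
    med = np.median(x)
    s = mad_sigma(x)
    return np.clip(x, med - n_sigma * s, med + n_sigma * s)

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(mad_sigma(x))        # median is 3, MAD is 1, so this is 1.4826
print(winsorize_mad(x))    # the value 100 is pulled in to 3 + 3 * 1.4826
```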

## Exponentially Weighted with half-life
> [barra风险模型因子计算中的半衰期？](https://www.zhihu.com/question/49694102/answer/138080097)

With an EWMA half-life of 63 trading days, the decay factor is $\delta_{Beta}=0.5^{\frac{1}{63}}$, and the weight matrix $W$ over a 252-day window is  
$$ W = \left( \begin{array}{cccc} \delta_{Beta}^1 & 0 & \dots & 0 \\ 0 & \delta_{Beta}^2 & \dots & 0 \\ \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & \delta_{Beta}^{252} \end{array} \right)$$  
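
The diagonal of $W$ can be generated directly. In this sketch index 0 is assumed to be the most recent observation, which the text does not state; the function name `half_life_weights` is hypothetical.

```python
import numpy as np

def half_life_weights(window, half_life):
    """Weights delta^1 .. delta^window, with delta = 0.5 ** (1 / half_life)."""
    delta = 0.5 ** (1.0 / half_life)
    return delta ** np.arange(1, window + 1)

w = half_life_weights(252, 63)
W = np.diag(w)                 # the diagonal weight matrix
print(w[63] / w[0])            # an observation 63 days older carries half the weight
```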


## Weighted Least Square Regression
For a regression problem  
$$ \mathbf{y = Xb + e},  $$  
weighted least squares solves  
$$ \arg\min_{\mathbf{b}} \mathbf{(y - Xb)^T W (y - Xb)}  $$  
so  
$$ \mathbf{b} = \mathbf{(X^T W X)^{-1} X^T W y} \\ \mathbf{e = y - Xb}$$  
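
The closed form translates almost line for line into NumPy. The sketch assumes a diagonal $W$ passed as a weight vector `w`, and solves the normal equations instead of forming the inverse explicitly; the function name `wls` is hypothetical.

```python
import numpy as np

def wls(X, y, w):
    """Solve b = (X' W X)^{-1} X' W y for W = diag(w); returns (b, e)."""
    Xw = X * w[:, None]                        # W X
    b = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # X'WX b = X'Wy
    return b, y - X @ b
```

At the solution the weighted residuals are orthogonal to every column of `X`, which is a convenient sanity check.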

# CNE5 Descriptors

## Beta
> $1.00 \cdot \operatorname{BETA}$

### BETA 
Computed as the slope coefficient in a time-series regression of excess stock return $r_t-r_{ft}$ against the cap-weighted excess return of the estimation universe $R_t$  
$$ r_t - r_{ft} = \alpha + \beta R_t + e_t $$  
The regression coefficients are estimated over the trailing 252 trading days of returns with a half-life of 63 trading days.
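
A sketch of the exponentially weighted time-series regression behind BETA. The function name `barra_beta` and the oldest-first input layout are assumptions; the residuals are returned because HSIGMA is defined from them.

```python
import numpy as np

def barra_beta(excess_stock, excess_market, half_life=63):
    """Inputs: trailing daily excess returns, oldest first (e.g. 252 values)."""
    T = len(excess_stock)
    w = 0.5 ** (np.arange(T - 1, -1, -1) / half_life)   # most recent day has weight 1
    X = np.column_stack([np.ones(T), excess_market])
    Xw = X * w[:, None]
    alpha, beta = np.linalg.solve(X.T @ Xw, Xw.T @ excess_stock)
    resid = excess_stock - X @ np.array([alpha, beta])
    return alpha, beta, resid
```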

## Momentum
> $1.00 \cdot \operatorname{RSTR}$

### RSTR
_Relative strength_, computed as the sum of excess log returns over $T=504$ trading days with a lag of $L=21$ trading days,  
$$ RSTR = \sum_{t=L}^{T+L}{w_t [\ln(1+r_t) - \ln(1+r_{ft})]} $$  
where $r_t$ is the stock return on day $t$, $r_{ft}$ is the risk-free return, and $w_t$ is an exponential weight with a half-life of 126 trading days.
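
A sketch of the RSTR calculation. It assumes daily simple returns stored oldest first, uses exactly `window` terms, and leaves the weights unnormalized as in the formula; the function name `rstr` is hypothetical.

```python
import numpy as np

def rstr(r, rf, window=504, lag=21, half_life=126):
    """r, rf: daily simple returns, oldest first, at least window + lag values."""
    end = len(r) - lag                                   # skip the most recent `lag` days
    x = np.log1p(r[end - window:end]) - np.log1p(rf[end - window:end])
    w = 0.5 ** (np.arange(window - 1, -1, -1) / half_life)
    return w @ x
```

Because of the lag, returns in the most recent 21 days do not affect the value.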

## Size
> $1.00 \cdot \operatorname{LNCAP}$

### LNCAP
_Natural Log of market cap_, computed by the logarithm of the __total__ market capitalization of the firm.

## Non-linear Size
> $1.00 \cdot \operatorname{NLSIZE}$

### NLSIZE
_Cube of Size_, orthogonalized to the Size factor on a regression-weighted basis, then winsorized and standardized.
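
The orthogonalization step can be sketched as a weighted regression of the cubed size exposure on the size exposure, keeping the residual. Including an intercept and the name `nlsize_raw` are assumptions; the subsequent winsorization and standardization are not shown.

```python
import numpy as np

def nlsize_raw(size, w):
    """Residual of size**3 after a weighted regression on [1, size]."""
    cube = size ** 3
    X = np.column_stack([np.ones(len(size)), size])
    Xw = X * w[:, None]
    b = np.linalg.solve(X.T @ Xw, Xw.T @ cube)
    return cube - X @ b
```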

## Earnings Yield
> $0.68 \cdot \operatorname{EPFWD} + 0.21 \cdot \operatorname{CETOP} + 0.11 \cdot \operatorname{ETOP}$

### EPFWD
_Analyst Predicted Earnings-to-Price_

### ETOP
_Trailing earnings-to-price ratio_ (the reciprocal of pe_ttm), computed by dividing the trailing 12-month earnings by the current market capitalization.  
ep_ttm = net profit attributable to the parent company (TTM) / total market capitalization

### CETOP
_Cash earnings-to-price ratio_, computed by dividing the trailing 12-month cash earnings by the current price.

## Residual Volatility
> $0.74 \cdot \operatorname{DASTD} + 0.16 \cdot \operatorname{CMRA} + 0.10 \cdot \operatorname{HSIGMA}$
  
**Note**: The Residual Volatility factor is orthogonalized to Beta (and Size?) to reduce collinearity.

### DASTD
_Daily standard deviation_, computed as the volatility of daily excess returns over the past 252 trading days with a half-life of 42 trading days.  
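$$\operatorname{DASTD}=\sqrt{\sum_{t=1}^{n}w_{t}(r_{et}-\bar{r}_{e})^2}$$  
where the daily excess return of the stock is $r_{et}=r_{t}-r_{ft}$ and the exponential weights $w_t$ are normalized to sum to 1.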

### CMRA
_Cumulative range_, differentiates stocks that have experienced wide swings over the last 12 months from those that have traded within a narrow range.

Let $Z(T)$ be the cumulative excess log return over the past $T$ months, with each month defined as 21 trading days,  
$$ Z(T) = \sum_{\tau=1}^T{[\ln(1+r_{\tau}) - \ln(1+r_{f\tau})]} $$  
where $r_{\tau}$ is the stock return for month $\tau$ and $r_{f\tau}$ is the risk-free return for the same month. The cumulative range is given by  
$$ \operatorname{CMRA} = \max_T\{Z(T)\} - \min_T\{Z(T)\}, \quad T=1,\dots,12 $$
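
A sketch of CMRA from twelve monthly returns. The inputs are assumed to be ordered with the most recent month first, so that the running sum gives $Z(1),\dots,Z(12)$; the function name `cmra` is hypothetical.

```python
import numpy as np

def cmra(monthly_r, monthly_rf):
    """monthly_r, monthly_rf: 12 monthly simple returns, most recent month first."""
    z = np.cumsum(np.log1p(monthly_r) - np.log1p(monthly_rf))   # Z(1)..Z(12)
    return z.max() - z.min()
```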

### HSIGMA
_Historical sigma( $\sigma$ )_, computed as the volatility of residual returns in $\operatorname{BETA}$  
$$ \sigma = \operatorname{std}(e_t) $$
> TODO: need to calculate volatility with capt weight?

## Growth
> $0.47 \cdot \operatorname{SGRO} + 0.24 \cdot \operatorname{EGRO} + 0.18 \cdot \operatorname{EGIBS} + 0.11 \cdot \operatorname{EGIBS_s}$

### SGRO
_Sales Growth (trailing 5 years)_, the slope coefficient of annual reported sales per share regressed against time over the past five fiscal years.

### EGRO
_Earnings growth (trailing 5 years)_, the slope coefficient of annual reported earnings per share regressed against time over the past five fiscal years.

### EGIBS / EGIBS_s
_Long-term / short-term predicted earnings growth_: the long-term (EGIBS) and short-term (EGIBS_s) earnings growth forecast by analysts.

## Book-to-Price
> $1.00 \cdot \operatorname{BTOP}$

### BTOP
Last reported book value of common equity divided by current market capitalization.

## Leverage
> $0.38 \cdot \operatorname{MLEV} + 0.35 \cdot \operatorname{DTOA} + 0.27 \cdot \operatorname{BLEV}$

**Note**: Orthogonalized with respect to log market capitalization to remove collinearity.

### MLEV
_Market leverage_, computed as  
$$ \operatorname{MLEV} = \frac{\operatorname{ME}+\operatorname{PE}+\operatorname{LD}}{\operatorname{ME}} $$  
- $\operatorname{ME}$ is the market value of common equity on the last trading day
- $\operatorname{PE}$ is the most recent book value of preferred equity
- $\operatorname{LD}$ is the most recent book value of long-term debt

### DTOA
_Debt to assets_, computed as  
$$ \operatorname{DTOA} = \frac{\operatorname{TD}}{\operatorname{TA}} $$  
- $\operatorname{TD}$ is the book value of total debt (long-term debt and current liabilities)
- $\operatorname{TA}$ is the most recent book value of total assets

### BLEV
_Book leverage_, computed as  
$$ \operatorname{BLEV} = \frac{\operatorname{BE}+\operatorname{PE}+\operatorname{LD}}{\operatorname{BE}} $$  
- $\operatorname{BE}$ is the most recent book value of common equity

## Liquidity
> $0.35 \cdot \operatorname{STOM} + 0.35 \cdot \operatorname{STOQ} + 0.30 \cdot \operatorname{STOA}$

### STOM
_One Month Share turnover_, computed as the log of the sum of daily turnover during the previous 21 trading days,   
$$ \operatorname{STOM} = \ln(\sum_{t=1}^{21}{\frac{V_t}{S_t}}) $$  
where $V_t$ is the trading volume on day $t$ and $S_t$ is the number of shares outstanding.

### STOQ
_Average share turnover, trailing 3 months_, computed as  
$$ \operatorname{STOQ} = \ln \left( \frac{1}{T} \sum_{\tau=1}^{T}{\exp(\operatorname{STOM}_{\tau})} \right) $$  
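where $\operatorname{STOM}_{\tau}$ is the share turnover for month $\tau$ (each month being 21 trading days) and $T=3$.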

### STOA
_Average share turnover, trailing 12 months_, as $T=12$ in equation above.
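
The three liquidity descriptors share one calculation, sketched below. The function names and the oldest-first daily arrays are assumptions; months are taken as consecutive blocks of 21 trading days ending on the latest day.

```python
import numpy as np

def stom(volume, shares):
    """Log of the summed daily turnover over the given days (21 for one month)."""
    return np.log(np.sum(volume / shares))

def sto_avg(volume, shares, months):
    """STOQ (months=3) or STOA (months=12) from the trailing 21 * months days."""
    v = volume[-21 * months:].reshape(months, 21)
    s = shares[-21 * months:].reshape(months, 21)
    monthly = np.array([stom(v[m], s[m]) for m in range(months)])
    return np.log(np.mean(np.exp(monthly)))
```

With constant daily turnover all three descriptors coincide, since each month contributes the same $\exp(\operatorname{STOM}_{\tau})$.
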