# Adjusted Factor-Based Performance Attribution

[Link to article here](http://bfjlaward.com/pdf/26059/67-78_Stubbs_colour_JPM_0517.pdf)

Suppose an algorithm is trading and generating a daily profit or loss (PnL). *How much of the PnL came from which sources, and what can we do to mitigate the associated risks?*

This question is the motivation of performance attribution.

Just as an algorithm can use a factor model to make its trading decisions, so too can a factor model be used to analyze an algorithm's trading decisions. In a sense, performance attribution can be thought of as solving the inverse problem of designing an algorithm.

First, the authors explain why the factors in a risk model matter. Consider the strategy:

*maximize:* exposure to a growth factor

*subject to:*
- long only
- fully invested
- active risk constraint of ±3% (overall strategy)
- sector bounds of ±4%
- asset bounds of ±3%

We analyze the returns using two risk models:

- RM1 has 10 sector factors and 4 style factors (market sensitivity, momentum, size and value)
- RM2 is the same as RM1, but **with** the growth factor

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-baqh{text-align:center;vertical-align:top}
.tg .tg-lqy6{text-align:right;vertical-align:top}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-baqh">Risk Model</th>
    <th class="tg-lqy6">RM1</th>
    <th class="tg-lqy6">RM2</th>
  </tr>
  <tr>
    <td class="tg-yw4l">Returns</td>
    <td class="tg-baqh" colspan="2">1.47%</td>
  </tr>
  <tr>
    <td class="tg-yw4l">Factor Contribution (FC)</td>
    <td class="tg-lqy6">-0.18%</td>
    <td class="tg-lqy6">2.35%</td>
  </tr>
  <tr>
    <td class="tg-yw4l">Specific Contribution (SC)</td>
    <td class="tg-lqy6">1.65%</td>
    <td class="tg-lqy6">-0.88%</td>
  </tr>
  <tr>
    <td class="tg-yw4l">FC-SC Correlation</td>
    <td class="tg-lqy6">-0.09</td>
    <td class="tg-lqy6">-0.32</td>
  </tr>
</table>

As expected, RM2 attributes much more of the returns to the factors. However, RM2 also shows a sizeable negative correlation (-0.32) between the daily factor contribution and the daily specific contribution, which is a problem in its own right.

## Why is correlation between factor and specific contribution bad?

Consider the following portfolio returns:

$$ r = f + (-0.5f) $$

where:

- $r$ are the portfolio's returns
- $f$ is the factor contribution
- $-0.5f$ is the specific contribution

In this case, we have an FC-SC correlation of -1: the specific contribution is entirely explained by the factor $f$ (in general, a nonzero correlation means that some of it is). This is undesirable: we want the specific contribution to be uncorrelated with $f$, i.e. truly idiosyncratic.
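
A quick numerical check of this toy example. The simulated factor contributions (normal, 250 days) are an assumption made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily factor contribution (assumed normal, purely illustrative).
f = rng.normal(loc=0.0, scale=0.01, size=250)

# Specific contribution that is a fixed negative multiple of the factor contribution.
s = -0.5 * f

# Total portfolio return is the sum of the two contributions.
r = f + s

# Pearson correlation between factor and specific contributions.
fc_sc_corr = np.corrcoef(f, s)[0, 1]
print(fc_sc_corr)  # approximately -1, up to floating-point error
```

The specific contribution here carries no information of its own: it just undoes half of the factor contribution.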

One assumption of linear regression is that $E(u_i | H) = 0$, i.e. the expected value of each position in the unexplained portfolio $u$, given the factor-mimicking portfolios $H$, is 0. Violating this assumption leads to biased estimates of the factor exposures $\lambda$. If the unexplained portfolio covaries with the factor-mimicking portfolios, the assumption fails: $cov(u, H) \neq 0 \implies E(u_i | H) \neq 0$.

<div class="alert alert-success">
**TLDR:** reducing the correlation between factor contributions and specific contributions drives the specific contribution down, thus leading to more accurate inferences from the performance attribution.
</div>

## Mathematics of Factor Attribution

There are 2 ways to think of factor attribution.

### Way 1 (less important):

$$ r = Xf + \epsilon $$

where:

- $n$ is the number of assets
- $k$ is the number of factors
- $X$ is an $n \times k$ factor exposure matrix
- $f$ is a $k \times 1$ vector of factor returns
- $\epsilon$ is an $n \times 1$ vector of stock-specific residual returns

In a cross-sectional returns model, $X$ is given, and $f$ is estimated using WLS regression.

If this is the case, then it can be shown that

$$ f = H^t r$$

where:

- $H = WX(X^tWX)^{-1}$ is an $n \times k$ matrix whose columns are pure factor-mimicking portfolios

Knowing $f$ and our portfolio $h$ (an $n \times 1$ vector of our holdings), we thus have our PnL attribution:

$$ h^t r = h^t X f + h^t \epsilon $$
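
A minimal sketch of the formulas above on simulated data. The exposures, weights, returns and holdings are all randomly generated assumptions, and $W$ is assumed diagonal; nothing here comes from the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3

X = rng.normal(size=(n, k))              # factor exposures (n x k)
W = np.diag(rng.uniform(0.5, 2.0, n))    # WLS weights (assumed diagonal)
r = rng.normal(scale=0.02, size=n)       # one period of asset returns
h = rng.dirichlet(np.ones(n))            # long-only, fully invested holdings

# Columns of H are the pure factor-mimicking portfolios.
H = W @ X @ np.linalg.inv(X.T @ W @ X)

f = H.T @ r         # WLS factor returns (k,)
eps = r - X @ f     # stock-specific residual returns (n,)

factor_contribution = h @ X @ f
specific_contribution = h @ eps
total = h @ r

print(factor_contribution, specific_contribution, total)
```

Because $H^t X = I$, each column of $H$ has unit exposure to its own factor and zero exposure to the others, which is what "pure" means here.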

### Way 2 (more important):

$$ h = \tilde{H}\lambda + u$$

where:

- $\tilde{H}$ is now **constructed by us** (I used a tilde to reflect that)
- $\lambda$ is a $k \times 1$ vector of the portfolio's factor exposures
- $u$ is an $n \times 1$ vector of residual holdings (the unexplained portfolio)

> "The advantage of this second way of thinking about attribution is that we can see that exposures are not exact:
> They are least-squares estimates of a linear regression. And as with all regressions, the estimates contain
> errors and may be biased if all underlying model assumptions are not satisfied."

Clearly, if $\lambda = X^th$ and $f = \tilde{H}^t r$, this way is no different from way 1:

$$ h = \tilde{H}\lambda + u $$

$$ \implies h = \tilde{H} X^t h + u $$

$$ \implies h^t = h^t X \tilde{H}^t  + u^t $$

$$ \implies h^t r = h^t X \tilde{H}^t r  + u^t r $$

$$ \implies h^t r = h^t X f  + u^t r $$
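
The chain of equalities above can be checked numerically. This sketch assumes $\tilde{H}$ is set to the WLS matrix $H$ from way 1 and uses randomly generated inputs:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 3

X = rng.normal(size=(n, k))              # factor exposures
W = np.diag(rng.uniform(0.5, 2.0, n))    # WLS weights (assumed diagonal)
r = rng.normal(scale=0.02, size=n)       # asset returns
h = rng.dirichlet(np.ones(n))            # holdings

# Assumption: the factor-mimicking portfolios are the ones from way 1.
H_tilde = W @ X @ np.linalg.inv(X.T @ W @ X)

lam = X.T @ h             # portfolio factor exposures
u = h - H_tilde @ lam     # unexplained (residual) portfolio
f = H_tilde.T @ r         # factor returns

way1_specific = h @ (r - X @ f)   # h^t epsilon
way2_specific = u @ r             # u^t r

print(way1_specific, way2_specific)
```

Under these assumptions both ways give the same specific contribution, and the residual portfolio $u$ has zero exposure to every factor ($X^t u = 0$).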

#### So, when does $\lambda = X^th$?

Basically, never in real life.

The authors outline some instances in which it is: if you cleverly construct $\tilde{H}$'s factor-mimicking portfolios using weights that cancel out some bad stuff that we did before (I don't really get this bit).
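
One case that can be checked directly: if $\tilde{H} = WX(X^tWX)^{-1}$ and $h$ is regressed on $\tilde{H}$ by WLS with weights $W^{-1}$, the algebra collapses to $\lambda = X^t h$. Whether this is exactly the construction the authors have in mind is my assumption; the sketch below only verifies the algebra on random inputs:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 40, 3

X = rng.normal(size=(n, k))
w = rng.uniform(0.5, 2.0, n)     # diagonal of W (assumed diagonal weights)
W = np.diag(w)
W_inv = np.diag(1.0 / w)
h = rng.dirichlet(np.ones(n))

H = W @ X @ np.linalg.inv(X.T @ W @ X)

# WLS regression of the holdings on H, weighted by W^{-1}.
lam_wls = np.linalg.solve(H.T @ W_inv @ H, H.T @ W_inv @ h)

# Plain OLS regression of the holdings on H, for comparison.
lam_ols, *_ = np.linalg.lstsq(H, h, rcond=None)

lam_direct = X.T @ h

print(lam_wls, lam_direct, lam_ols)
```

The $W^{-1}$-weighted estimate matches $X^t h$, while the unweighted estimate will generally differ unless $W$ is a multiple of the identity.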

## So, how do we make $cov(u, H) = 0$?

Again, there are two ways.

We consider the residual portfolio $u$ as a linear combination of the factor-mimicking portfolios in $H$. Let $H = [H_1 \: H_2 \: ... \: H_k]$.

### Way 1: Absolute adjustment

First, estimate the $\beta$s using a time-series regression:

- Instead of running a cross-sectional, multivariate regression on the first equation below, run a time-series regression on the third equation.
    
- A cross-sectional regression won't work because it introduces some of the aforementioned biases. Further, a time-series regression has the benefit of being a single regression through time, as opposed to modifying the factor exposures differently in each period.


$$ u = \sum_{j}{\beta_j \tilde{H_j}} + \tilde{u} $$

$$\implies r^t u = \sum_{j}{\beta_j \: r^t \tilde{H_j}} + r^t \tilde{u} $$

$$\implies r^t u = \sum_{j}{\beta_j \: \tilde{f_j}} + r^t \tilde{u} $$

Then,

$$ h = \tilde{H}\lambda + u$$

$$ \implies r^t h = r^t \tilde{H} \lambda + r^t u $$

$$ \implies r^t h = \sum_{j}{r^t \tilde{H_j} X_j^t h}  + \sum_{j}{\beta_j \: r^t \tilde{H_j}} + r^t \tilde{u} $$

$$ \implies r^t h = \sum_{j}{\tilde{f_j} (X_j^t h + \beta_j)} + r^t \tilde{u}$$
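
A sketch of the absolute adjustment on simulated data. To make the specific contribution correlate with the factors, the simulation assumes the "true" exposures `X_true` differ from the model exposures `X`; that device, the constant holdings, and the choice $W = I$ are my simplifications, not the authors' setup:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, T = 60, 3, 500

X = rng.normal(size=(n, k))                        # model exposures
X_true = X + rng.normal(scale=0.5, size=(n, k))    # assumed true exposures (simulation only)
h = rng.dirichlet(np.ones(n))                      # holdings, held constant through time

H = X @ np.linalg.inv(X.T @ X)    # factor-mimicking portfolios with W = I
lam = X.T @ h                     # unadjusted exposures X_j^t h
u = h - H @ lam                   # unexplained portfolio

g = rng.normal(scale=0.01, size=(T, k))                    # latent factor returns
R = g @ X_true.T + rng.normal(scale=0.02, size=(T, n))     # T x n asset returns

F = R @ H    # row t holds the estimated factor returns for period t
s = R @ u    # specific contribution r_t^t u for each period

# Time-series regression (no intercept) of specific contributions on factor returns.
beta, *_ = np.linalg.lstsq(F, s, rcond=None)

adj_factor = F @ (lam + beta)    # sum_j f_j (X_j^t h + beta_j)
adj_specific = s - F @ beta      # r^t u-tilde

print(beta)
```

The total PnL is unchanged by the adjustment; it only moves the part of the specific contribution that lines up with the factor returns into the factor contribution.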

### Way 2: Relative adjustment

The authors find that in practice, exposures are typically off by a relative amount, instead of an absolute amount. Therefore, they propose an alternative to the above equation:

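$$ r^t h = \sum_{j}{\tilde{f_j} X_j^t h (1 + \beta_j)} + r^t \tilde{u}$$

where the $\beta$s are estimated using the following time-series regression (the subscript $t$ indexes the period; the superscript $t$ is still a transpose):

$$ r_t^t u_t = \sum_{j}{\tilde{f}_{tj} X_{tj}^t h_t \beta_j} + r_t^t \tilde{u_t} $$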

> A relative adjustment can also be more appropriate if factor exposures are changing through time. For these reasons, we prefer the relative adjustment to the absolute adjustment and use it in all computational results.
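
A sketch of the relative adjustment, again on simulated data. Here the regressors are the per-period factor contributions rather than the factor returns. The drifting exposures, the `true_beta` values and the noise level are all assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k = 500, 3

F = rng.normal(scale=0.01, size=(T, k))              # factor returns per period
base = np.array([0.8, -0.4, 0.3])
lam = base + rng.normal(scale=0.05, size=(T, k))     # exposures X_tj^t h_t, varying through time
true_beta = np.array([0.25, -0.10, 0.0])             # assumed relative exposure errors

C = F * lam                                          # factor contributions, one column per factor
s = C @ true_beta + rng.normal(scale=0.0005, size=T) # simulated specific contributions

# Time-series regression (no intercept) of specific on factor contributions.
beta, *_ = np.linalg.lstsq(C, s, rcond=None)

adj_factor = C @ (1.0 + beta)    # sum_j f_tj X_tj^t h_t (1 + beta_j)
adj_specific = s - C @ beta      # adjusted specific contribution

print(beta)
```

Each factor's contribution is scaled by $(1 + \beta_j)$, so a factor whose unadjusted contribution is zero in some period still contributes zero after the adjustment.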

<div class="alert alert-warning">
Beware of overfitting! The problem as stated is that "some of the specific contribution is explained by the factor". Be careful that we do not explain the **noise** in the specific contribution with the factor!
</div>

Relative adjustments help to overcome this problem...

> Because we are making relative adjustments to the exposures, the adjustment procedure will not suddenly allow a factor to explain a large portion of returns when the unadjusted factor exposure is near zero. If the exposure was near zero prior to adjustment, it will remain near zero after the adjustment. In this sense, the proposed adjusted attribution methodology behaves like a Bayesian method with the standard exposures as the prior.

But more importantly, a robust method is needed to estimate the $\beta$s. The authors propose the following scheme:

> We use a heuristic variable selection scheme to select the independent variables (factor contributions) of Equation 10 based on their statistical significance, as measured by their $p$-values. We use an iterative regression scheme that starts with all variables present. After each iteration, we remove the variable with the greatest p-value if it is greater than the specified tolerance 0.02. If none of the $p$-values exceed the tolerance, we stop the iterative procedure of removing factors. Thereafter, we employ a reentry procedure in which we consider reentering rejected variables into the regression one at a time. A variable can reenter the regression only if its entry does not increase the $p$-value of any variable (including itself) above the tolerance. After the reentry trials, we run a final regression with the selected variables to compute the final estimate of $\beta$.
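
A rough sketch of how I read this selection scheme. The function names are hypothetical, the regression has no intercept, and the $p$-values use a normal approximation to the $t$ distribution (reasonable for long time series, but an assumption on my part); the paper may differ in these details:

```python
import math
import numpy as np

def ols_pvalues(C, s):
    """OLS without intercept; two-sided p-values via a normal approximation."""
    beta, *_ = np.linalg.lstsq(C, s, rcond=None)
    resid = s - C @ beta
    dof = C.shape[0] - C.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(C.T @ C)))
    z = np.abs(beta / se)
    pvals = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    return beta, pvals

def select_factors(C, s, tol=0.02):
    """Backward elimination by p-value, then one-at-a-time reentry."""
    active = list(range(C.shape[1]))
    rejected = []
    # Start with all variables; drop the worst one while it exceeds the tolerance.
    while active:
        _, p = ols_pvalues(C[:, active], s)
        worst = int(np.argmax(p))
        if p[worst] <= tol:
            break
        rejected.append(active.pop(worst))
    # Reentry: accept a rejected variable only if every p-value stays within tolerance.
    for j in rejected:
        trial = active + [j]
        _, p = ols_pvalues(C[:, trial], s)
        if np.all(p <= tol):
            active = trial
    # Final regression on the selected variables; unselected betas stay at zero.
    beta = np.zeros(C.shape[1])
    if active:
        b, _ = ols_pvalues(C[:, active], s)
        beta[active] = b
    return sorted(active), beta

# Simulated factor contributions: only factors 0 and 2 truly matter (assumption).
rng = np.random.default_rng(6)
T, k = 500, 5
C = rng.normal(scale=0.01, size=(T, k))
true_beta = np.array([0.5, 0.0, -0.4, 0.0, 0.0])
s = C @ true_beta + rng.normal(scale=0.001, size=T)

selected, beta = select_factors(C, s)
print(selected, beta)
```

Factors that fail the test get $\beta_j = 0$, so their exposures are left unadjusted. With a 0.02 tolerance and many candidate factors, some irrelevant factors can still slip through by chance.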

This sounds to me like it is very susceptible to overfitting.

Last remark:

> In our experience, the classical bias/variance trade-off seems to exist in standard attribution results in which variance is the volatility of the unexplained portfolio, and bias is the over- or underestimation of factor contributions. 