# The curse of dimensionality

The common theme of the problems grouped under the curse of dimensionality is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. To obtain a reliable result, the amount of data needed often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.
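As a rough illustration of the exponential growth described above, the sketch below counts how many sample points are needed to cover the unit hypercube at a fixed resolution along each axis. The function name `grid_points_needed` and the choice of 10 points per axis are assumptions made for this example only.

```python
def grid_points_needed(n_dims, points_per_axis=10):
    """Number of grid points covering [0, 1]^n_dims at a fixed per-axis resolution."""
    return points_per_axis ** n_dims


# The same per-axis resolution requires exponentially more points as dimensions grow.
for d in (1, 2, 3, 10):
    print(f"{d:>2} dimensions: {grid_points_needed(d):,} points")
```

Holding the number of data points fixed instead means the coverage of the space thins out at the same exponential rate, which is the sparsity the paragraph refers to.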

Aim: to obtain the efficient frontier of N securities.

To do so, one must estimate:

1. N expected returns

2. N volatility parameters

3. N(N-1)/2 correlations

![image.png](attachment:image.png)

Two ways to improve the ratio of sample size to number of parameters:

1. Increase sample size (increase sample period, increase frequency)

2. Decrease number of parameters (decrease the number of assets N, decrease the number of parameters for a fixed N)
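To see why both levers matter, it can help to compare the number of return observations (N assets times T periods) with the number of parameters to estimate. The sketch below does this under illustrative assumptions: the helper names are hypothetical, and the sample figures (10 years of monthly data versus 10 years of daily data at 252 trading days per year) are chosen only for the example.

```python
def n_parameters(n_assets):
    """Expected returns + volatilities + pairwise correlations."""
    return 2 * n_assets + n_assets * (n_assets - 1) // 2


def observations_per_parameter(n_assets, n_periods):
    """Return observations available for each parameter to estimate."""
    return n_assets * n_periods / n_parameters(n_assets)


# Assumed sample sizes: 10 years of monthly data vs. 10 years of daily data.
monthly, daily = 10 * 12, 10 * 252

print(f"500 assets, monthly: {observations_per_parameter(500, monthly):.2f}")
print(f"500 assets, daily:   {observations_per_parameter(500, daily):.2f}")
print(f" 50 assets, monthly: {observations_per_parameter(50, monthly):.2f}")
```

Raising the data frequency and shrinking the universe both increase the number of observations per parameter, which is the effect the two options above aim for.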

Q: What is the number of parameters required for mean-variance optimization based on the S&P 500 universe, which contains 500 stocks?

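In [15]:
n_assets = 500
no_expr_parameters = n_assets  # one expected return per stock
no_vol_parameters = n_assets  # one volatility per stock
no_corr_parameters = n_assets * (n_assets - 1) // 2  # distinct pairwise correlations
print(f'The total number of parameter estimates is {no_expr_parameters + no_vol_parameters + no_corr_parameters}')

The total number of parameter estimates is 125750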


![image.png](attachment:image.png)

Constant Correlation Model

![image.png](attachment:image.png)

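$\hat{\sigma}_{ij} = \hat{\sigma}_i \, \hat{\sigma}_j \, \hat{\rho}$, where $\hat{\sigma}_{ij}$ is the estimator for the covariance between stocks $i$ and $j$, $\hat{\sigma}_i$ and $\hat{\sigma}_j$ are the estimators for the volatilities of stocks $i$ and $j$, and $\hat{\rho}$ is the best estimate of the single common correlation parameter.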


To find the best estimate of the common correlation, $\hat{\rho}$ (Rho hat):

![image.png](attachment:image.png)
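A minimal sketch of the constant correlation estimate, assuming that Rho hat is taken as the average of the pairwise sample correlations (the convention used in the shrinkage exercise later in these notes). The function names and the small return series are hypothetical, and the code is pure Python so it runs without extra packages.

```python
from itertools import combinations
from statistics import mean, stdev


def sample_corr(x, y):
    """Sample correlation between two equally long return series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


def constant_correlation_cov(returns_by_asset):
    """Constant correlation covariance matrix from a list of N return series."""
    n = len(returns_by_asset)
    vols = [stdev(r) for r in returns_by_asset]  # sample volatilities
    # Rho hat: average of the N(N-1)/2 pairwise sample correlations (assumption).
    rho_hat = mean(
        sample_corr(returns_by_asset[i], returns_by_asset[j])
        for i, j in combinations(range(n), 2)
    )
    # Off-diagonal entries use rho_hat; diagonal entries are the sample variances.
    cov = [
        [vols[i] * vols[j] * (1.0 if i == j else rho_hat) for j in range(n)]
        for i in range(n)
    ]
    return cov, rho_hat


# Hypothetical returns for three assets over five periods.
returns = [
    [0.01, -0.02, 0.03, 0.00, 0.02],
    [0.02, -0.01, 0.02, 0.01, 0.01],
    [-0.01, 0.00, 0.01, 0.02, -0.02],
]
cov, rho_hat = constant_correlation_cov(returns)
print(f"rho_hat = {rho_hat:.4f}")
```

Every off-diagonal covariance is built from the same `rho_hat`, so only the N volatilities and one correlation are estimated from the data.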

Q: What is the number of parameter estimates required for mean-variance optimization based on the S&P 500 universe, when using the constant correlation covariance matrix estimate?

In [28]:
no_expr_parameters = 500  # one expected return per stock
no_vol_parameters = no_expr_parameters  # one volatility per stock
no_corr_parameters = 1  # a single common correlation
print(f'The total number of parameter estimates is {no_corr_parameters + no_vol_parameters + no_expr_parameters} when using the constant correlation covariance matrix estimate')

The total number of parameter estimates is 1001 when using the constant correlation covariance matrix estimate


# Estimating the Covariance Matrix with a Factor Model


![image.png](attachment:image.png)

![image.png](attachment:image.png)

Q: How many parameters do you need to estimate when using a 2-factor model for estimating the covariance matrix of a universe of 500 stocks?

1502: <mark>500 estimates of betas of stocks with respect to factor 1</mark> and <mark>500 estimates of betas of stocks with respect to factor 2</mark>, plus <mark>500 volatility estimates for individual stock returns</mark> and <mark>2 volatility estimates for factors 1 and 2</mark>.
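The same breakdown can be written for any number of factors. The sketch below is a hypothetical helper (the name `factor_model_cov_parameters` is not from the course material) and assumes, as the count above implicitly does, that no factor-to-factor correlations are estimated.

```python
def factor_model_cov_parameters(n_assets, n_factors):
    """Parameter count for a factor-model covariance estimate (uncorrelated factors assumed)."""
    betas = n_assets * n_factors  # one beta per stock per factor
    stock_vols = n_assets         # one volatility estimate per stock
    factor_vols = n_factors       # one volatility estimate per factor
    return betas + stock_vols + factor_vols


print(factor_model_cov_parameters(500, 2))  # 1502
```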

![image.png](attachment:image.png)

Summary:

![image-2.png](attachment:image-2.png)

# Honey I Shrunk the Covariance Matrix!

![image.png](attachment:image.png)

<mark>Model risk</mark> is considered a subset of operational risk, as model risk mostly affects the firm that creates and uses the model. Traders or other investors who use a given model may not completely understand its assumptions and limitations, which limits the usefulness and application of the model itself.

In financial companies, model risk can affect the outcome of financial securities valuations, but it's also a factor in other industries. A model can incorrectly predict the probability of an airline passenger being a terrorist or the probability of a fraudulent credit card transaction. This can be due to incorrect assumptions, programming or technical errors, and other factors that increase the risk of a poor outcome.


<mark>Sample risk</mark> is the possibility that the items selected in a sample are not truly representative of the population being tested.

The sample-based estimates of the covariance parameters carry a lot of sample risk because there are too many parameters to estimate, but they carry no model risk.


![image.png](attachment:image.png)

Imposing constraints on weights is equivalent to performing statistical shrinkage.

![image.png](attachment:image.png)

Q: Consider two stocks with sample volatility estimates of 20% and 30%, respectively, and a sample correlation of 0.75. Further assume that the average of the sample correlation estimates of all stocks in the universe is 0.5. For these two stocks, what are the sample-based covariance estimate, the constant correlation covariance estimate, and the covariance estimate based on statistical shrinkage with a shrinkage factor of 50%?


The sample-based estimate is <mark>20% × 30% × 0.75 = 0.045</mark>. The constant correlation estimate is <mark>20% × 30% × 0.5 = 0.03</mark>. The shrinkage estimate is <mark>(0.045 + 0.03) / 2 = 0.0375</mark>.
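The arithmetic above can be checked with a few lines of Python. The variable names are illustrative, and the convention that the shrinkage factor weights the constant correlation estimate (rather than the sample estimate) is an assumption; with a 50% factor both conventions give the same result.

```python
vol_1, vol_2 = 0.20, 0.30  # sample volatilities of the two stocks
sample_corr = 0.75         # sample correlation between them
avg_corr = 0.50            # average sample correlation across the universe
shrinkage = 0.50           # weight on the constant correlation estimate (assumed convention)

sample_cov = vol_1 * vol_2 * sample_corr
const_corr_cov = vol_1 * vol_2 * avg_corr
shrunk_cov = shrinkage * const_corr_cov + (1 - shrinkage) * sample_cov

print(f"sample-based:         {sample_cov:.4f}")
print(f"constant correlation: {const_corr_cov:.4f}")
print(f"shrinkage:            {shrunk_cov:.4f}")
```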

![image.png](attachment:image.png)