## Portfolio Optimization

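### 1. What is a portfolio?
A portfolio is a collection of different investment products (stocks, bonds, and so on), held together in order to diversify risk.  
Each asset $i$ in the portfolio is assigned a weight $w_i$, $i=1, \ldots, n$.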

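### 2. Portfolio return and risk

Treat the price change (return) of an asset as a random variable: its expected value (mean) measures the return, and its variance measures the risk.  

For a single stock:  
- The return $r$ is a random variable
- Expected value (mean): ${\bf E}(r) = \mu$ 
- Variance: ${\bf Var}(r) = {\bf E}[(r-\mu)^2] = \sigma^2$, estimated from $T$ observations $r^{(1)}, \ldots, r^{(T)}$ as $\frac{1}{T-1}\sum_{t=1}^{T}(r^{(t)}-\bar r)^2$, where $\bar r$ is the sample mean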

Covariance of two stocks:  
$ Cov(r_1, r_2) = {\bf E}[(r_1 - \mu_1)(r_2 - \mu_2)] $
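
As a minimal sketch of how these quantities are estimated from data, the snippet below computes the sample mean, variance, and covariance by hand and compares them with `np.cov`. The return values are illustrative numbers, and the helper names `sample_mean` and `sample_cov` are hypothetical.

```python
import numpy as np

# Illustrative daily returns (in percent) for two stocks; not real market data.
r1 = np.array([0.66, 0.00, 0.11, -0.44, 0.22])
r2 = np.array([0.88, 0.58, 0.38, -1.39, 0.10])

def sample_mean(r):
    """Estimate E(r) by the arithmetic mean of the observations."""
    return r.sum() / len(r)

def sample_cov(a, b):
    """Unbiased estimate of Cov(a, b): divide by T - 1."""
    return ((a - sample_mean(a)) * (b - sample_mean(b))).sum() / (len(a) - 1)

mu1 = sample_mean(r1)
var1 = sample_cov(r1, r1)      # the variance is the covariance of r1 with itself
cov12 = sample_cov(r1, r2)

# np.cov treats each row as one variable and also divides by T - 1 by default.
Sigma = np.cov(np.vstack([r1, r2]))
```

The diagonal of `Sigma` holds the variances and the off-diagonal entries hold the covariance, so `Sigma[0, 0]` matches `var1` and `Sigma[0, 1]` matches `cov12`.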

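For a portfolio:  
- The return $R = w^T r$ is a random variable
- Expected value (mean): ${\bf E}(R) = w_1\mu_1 + w_2\mu_2 + \ldots + w_n\mu_n = \mu^T w$  
- Variance: ${\bf Var}(R) = w^T\Sigma w$, where $\Sigma = {\bf E}[(r-\mu)(r-\mu)^T]$ is the covariance matrix of the asset returns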

Taking a portfolio of two stocks as an example:  
$$ {\bf E}(R) = w_1\mu_1 + w_2\mu_2 $$ 

$$ \begin{equation}
{\bf E}(R)=\left[
\begin{matrix}
\mu_1&\mu_2&
\end{matrix}\right]
\left[
\begin{matrix}
w_1&\\
w_2&
\end{matrix}\right]
\end{equation} $$

$${\bf Var}(R) = w_1^2 Cov(r_1, r_1) + w_2^2 Cov(r_2, r_2) + 2w_1 w_2 Cov(r_1, r_2)$$

$$\begin{equation}
{\bf Var}(R)=\left[
\begin{matrix}
w_1&w_2&
\end{matrix}
\right]
\left[
\begin{matrix}
Cov(r_1, r_1)&Cov(r_2, r_1)&\\
Cov(r_1, r_2)&Cov(r_2, r_2)&
\end{matrix}\right]
\left[
\begin{matrix}
w_1&\\
w_2&
\end{matrix}\right]
\end{equation}$$
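
The two-stock formulas can be checked numerically. The sketch below uses assumed values for $\mu$, $\Sigma$, and $w$ (they are not taken from the data used later) and confirms that the matrix form $w^T\Sigma w$ agrees with the expanded sum.

```python
import numpy as np

# Assumed inputs for illustration only.
mu = np.array([0.10, 0.05])            # expected returns of the two stocks
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.02]])       # covariance matrix of the two stocks
w = np.array([0.6, 0.4])               # portfolio weights, summing to 1

# Expected portfolio return: mu^T w
exp_ret = mu @ w

# Portfolio variance, matrix form: w^T Sigma w
var_matrix = w @ Sigma @ w

# Portfolio variance, expanded form with the cross term counted twice
var_expanded = (w[0] ** 2 * Sigma[0, 0]
                + w[1] ** 2 * Sigma[1, 1]
                + 2 * w[0] * w[1] * Sigma[0, 1])
```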

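### 3. Markowitz portfolio theory
**The central question**: how should $w$ be chosen so that the portfolio has minimum risk for a given level of return? Equivalently, for a risk-aversion parameter $\gamma > 0$, choose $w$ to maximize return minus $\gamma$ times risk:  
$$\begin{array}{ll} \mbox{maximize} & \mu^T w - \gamma w^T\Sigma w\\
\mbox{subject to} & {\bf 1}^T w = 1, \quad w \in {\cal W},
\end{array}$$

Here ${\cal W}$ is the set of allowed weight vectors, for example $w \geq 0$ for a long-only portfolio.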
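
When the only constraint is ${\bf 1}^T w = 1$ (that is, ignoring ${\cal W}$, so short positions are allowed), the problem has a closed-form solution from the Lagrangian conditions: $w = \frac{1}{2\gamma}\Sigma^{-1}(\mu - \lambda{\bf 1})$ with $\lambda = ({\bf 1}^T\Sigma^{-1}\mu - 2\gamma)/({\bf 1}^T\Sigma^{-1}{\bf 1})$. The following is a sketch under that simplifying assumption, with made-up inputs and a hypothetical helper name `mean_variance_weights`; it is not the solver-based approach used in the cells of this notebook.

```python
import numpy as np

def mean_variance_weights(mu, Sigma, gamma):
    """Maximize mu^T w - gamma * w^T Sigma w subject to sum(w) == 1 only."""
    ones = np.ones(len(mu))
    sinv_mu = np.linalg.solve(Sigma, mu)      # Sigma^{-1} mu
    sinv_1 = np.linalg.solve(Sigma, ones)     # Sigma^{-1} 1
    # Multiplier chosen so that the weights sum to one.
    lam = (ones @ sinv_mu - 2.0 * gamma) / (ones @ sinv_1)
    return (sinv_mu - lam * sinv_1) / (2.0 * gamma)

# Assumed inputs for illustration only.
mu = np.array([0.10, 0.05, 0.07])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.02, 0.005],
                  [0.00, 0.005, 0.03]])
gamma = 0.5

w_opt = mean_variance_weights(mu, Sigma, gamma)

def objective(w):
    """Risk-adjusted return for the assumed mu, Sigma, gamma."""
    return mu @ w - gamma * (w @ Sigma @ w)
```

Because nothing here enforces $w \geq 0$, some entries of `w_opt` can be negative; adding such inequality constraints is what makes a numerical solver necessary.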

In [1]:
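import tushare as ts
import pandas as pd
import numpy as np

# Constituents of the CSI 300 (HS300) index, including their index weights
stock_list = ts.get_hs300s()
return_dict = {}
for symbol in stock_list['code']:
    # Daily history per stock; 'p_change' is the daily percentage price change
    data = ts.get_hist_data(symbol, start='2017-01-01')
    if data is not None:
        return_dict[symbol] = data['p_change']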

In [2]:
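# Rows are dates, columns are stock codes
return_df = pd.DataFrame(return_dict)
# Drop stocks with no data at all, then treat infinite and missing returns as 0
return_df.dropna(how='all', axis=1, inplace=True)
return_df = return_df.replace([np.inf, -np.inf], np.nan)
return_df.fillna(0, inplace=True)
return_df.head()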

Unnamed: 0,000001,000002,000008,000060,000063,000069,000100,000157,000166,000333,...,601878,601881,601888,601898,601899,601901,601919,601933,601939,601958
2017-01-03,0.66,0.88,0.21,1.26,1.57,0.29,3.94,0.0,0.0,3.37,...,0.0,0.0,0.02,0.69,0.6,1.58,0.76,2.24,0.92,1.05
2017-01-04,0.0,0.58,0.54,1.15,-0.31,0.29,-0.29,0.88,0.0,2.44,...,0.0,0.0,0.14,0.17,0.3,0.39,0.95,0.0,0.0,0.78
2017-01-05,0.11,0.38,0.21,0.7,-1.67,0.0,0.0,1.53,0.0,-0.47,...,0.0,0.0,0.71,0.34,1.19,-0.52,0.19,-0.4,0.0,0.26
2017-01-06,-0.44,-1.39,0.74,-1.22,-1.57,-0.86,-0.88,-0.65,0.0,-1.01,...,0.0,0.0,-0.39,-0.68,-0.88,-1.3,0.94,1.0,-0.55,-0.13
2017-01-09,0.22,0.1,0.53,0.7,-3.71,0.58,0.89,0.87,0.0,-0.48,...,0.0,0.0,-0.02,0.34,-0.3,0.39,0.93,0.4,0.18,0.64


In [3]:
symbols = return_df.columns.values
# Sample mean of each stock's return, as a column vector
mu = return_df.mean().values.reshape(-1, 1)
# Sample covariance matrix of the returns
Sigma = return_df.cov().values
mu.shape, Sigma.shape

((286, 1), (286, 286))

In [4]:
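import cvxpy as cvx
w = cvx.Variable(len(symbols))
gamma = 0.5                        # risk-aversion parameter
ret = mu.T @ w                     # expected return mu^T w (use @ for matrix products)
risk = cvx.quad_form(w, Sigma)     # variance w^T Sigma w
# Long-only, fully invested portfolio
prob = cvx.Problem(cvx.Maximize(ret - gamma*risk), 
               [cvx.sum(w) == 1, 
                w >= 0])
prob.solve()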

0.030808687756599062

In [5]:
pd.Series(w.value, index=symbols).head()

000001    3.424403e-11
000002    2.912290e-10
000008    3.864077e-11
000060    1.825174e-11
000063    1.008711e-10
dtype: float64

In [6]:
sum(w.value)

0.9999999999997032

In [7]:
ret.value

array([0.14432115])

In [8]:
risk.value

0.22702491925342172

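### 4. Market-neutral portfolios
A market-neutral portfolio is one that does not move with the market: the correlation between the portfolio return $R$ and the market return $M$ is 0, which is equivalent to their covariance being 0.  
- ${\bf Cov}(M, R) = m^T\Sigma w = 0$ (where $m$ is the vector of market weights of the stocks)

Taking a portfolio of two stocks as an example: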

$$\begin{aligned}
{\bf Cov}(M, R) &= Cov(m_1 r_1+m_2 r_2, w_1 r_1+w_2 r_2)\\
            &= Cov(m_1 r_1, w_1 r_1+w_2 r_2) + Cov(m_2 r_2, w_1 r_1+w_2 r_2)\\
            &= Cov(m_1 r_1, w_1 r_1) + Cov(m_1 r_1, w_2 r_2) + Cov(m_2 r_2, w_1 r_1) + Cov(m_2 r_2, w_2 r_2)\\
            &= m_1 w_1 Cov(r_1, r_1) + m_1 w_2 Cov(r_1, r_2) + m_2 w_1 Cov(r_2, r_1) + m_2 w_2 Cov(r_2, r_2)
\end{aligned}$$

$$\begin{equation}
{\bf Cov}(M, R)=\left[
\begin{matrix}
m_1&m_2&
\end{matrix}
\right]
\left[
\begin{matrix}
Cov(r_1, r_1)&Cov(r_2, r_1)&\\
Cov(r_1, r_2)&Cov(r_2, r_2)&
\end{matrix}\right]
\left[
\begin{matrix}
w_1&\\
w_2&
\end{matrix}\right]
\end{equation}$$
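
The scalar $m^T\Sigma w$ can be computed as the dot product of $w$ with the vector $\Sigma m$, whose $i$-th entry is the covariance of stock $i$ with the market. The sketch below, with assumed inputs, evaluates it and then removes the market component from a starting portfolio by an orthogonal projection. The projection is only one illustrative way to reach ${\bf Cov}(M, R) = 0$; it does not preserve the budget constraint.

```python
import numpy as np

# Assumed inputs for illustration only.
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.02, 0.005],
                  [0.00, 0.005, 0.03]])
m = np.array([0.5, 0.3, 0.2])      # market weights
w0 = np.array([0.2, 0.5, 0.3])     # starting long-only portfolio

# Covariance of each stock with the market: (Sigma m)_i = Cov(r_i, M)
cov_with_market = Sigma @ m

# Pitfall: with NumPy arrays, * broadcasts elementwise and is not a matrix product.
elementwise = m * Sigma            # shape (3, 3)
matrix_product = m @ Sigma         # shape (3,)

# Cov(M, R) for the starting portfolio
cov_before = cov_with_market @ w0

# Remove the component of w0 along Sigma m so that Cov(M, R) becomes 0.
w_neutral = w0 - (cov_before / (cov_with_market @ cov_with_market)) * cov_with_market
cov_after = m @ Sigma @ w_neutral
```

In this example every entry of `cov_with_market` is positive, so any nonzero long-only portfolio has a positive covariance with the market; the neutral portfolio therefore contains at least one negative (short) weight.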

In [9]:
stock_list.set_index('code', inplace=True)
# Index weights of the stocks, aligned with the columns of return_df
market_weight = pd.Series(stock_list['weight'], index=symbols)
m = market_weight.values
m.shape

(286,)

In [10]:
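import cvxpy as cvx
w = cvx.Variable(len(symbols))
gamma = 0.5
ret = mu.T @ w
risk = cvx.quad_form(w, Sigma)
# Cov(M, R) = m^T Sigma w is a scalar: Sigma @ m holds each stock's covariance
# with the market. With NumPy arrays, m.T*Sigma would instead broadcast
# elementwise and impose one equality per stock, which drives every weight
# to about 0 (consistent with the outputs recorded below).
cov_MR = (Sigma @ m) @ w
prob = cvx.Problem(cvx.Maximize(ret - gamma*risk), 
               [cvx.sum(w) <= 1, cov_MR == 0, w >= 0])
prob.solve()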

4.3683343916091525e-11

In [11]:
pd.Series(w.value, index=symbols).head()

000001   -5.188573e-14
000002    3.747884e-14
000008    2.164068e-13
000060    6.144296e-14
000063   -1.532397e-14
dtype: float64

In [12]:
ret.value

array([2.89615857e-13])

In [13]:
risk.value

1.0391530516688176e-21

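### 5. Multi-factor models
#### 5.1 Multi-factor model for a single stock
Return of a single stock:  
- $ r_j = x_{j1} f_1 + x_{j2} f_2 + \ldots + x_{jk} f_k + u_j = X_jf+u_j$ (where $X_j$ is the row vector of the stock's factor values, or exposures; $f$ is the vector of factor returns; and $u_j$ is the stock-specific return)    

A stock's return has two parts: the return explained by the common factors, $X_jf$, and the return specific to the stock, $u_j$. The model is meant to separate these two parts cleanly, so $X_jf$ and $u_j$ are assumed to be uncorrelated.    
    
Variance of a single stock:  
- $ {\bf Var}(r_j) = Var(X_jf+u_j) = Var(X_jf) + Var(u_j) + 2Cov(X_jf, u_j) $ (where $f$ and $u_j$ are random variables)     

Since $ Cov(X_jf, u_j) = 0$, we have $ {\bf Var}(r_j) = Var(X_jf) + Var(u_j) $. In other words, the risk of a stock also has two parts: common-factor risk and specific risk.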
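
A quick simulation can illustrate this decomposition. The sketch below draws factor returns and an independent specific return from normal distributions (the distributions, the factor covariance `F`, the exposures `x_j`, and `var_u` are all assumptions made for the example) and compares the sample variance of $r_j$ with $X_j F X_j^T + Var(u_j)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000                              # number of simulated periods

# Assumed model parameters for illustration only.
F = np.array([[0.04, 0.01],
              [0.01, 0.09]])             # covariance matrix of the two factors
x_j = np.array([1.2, -0.5])              # factor exposures of stock j
var_u = 0.02                             # variance of the specific return

# Simulate factor returns and a specific return that is independent of them.
f = rng.multivariate_normal(np.zeros(2), F, size=n)
u = rng.normal(0.0, np.sqrt(var_u), size=n)
r = f @ x_j + u                          # r_j = X_j f + u_j for every period

var_factor = x_j @ F @ x_j               # Var(X_j f) = X_j F X_j^T
var_model = var_factor + var_u           # variance implied by the model
var_sample = r.var(ddof=1)               # variance observed in the simulation
```

With this many draws, `var_sample` should land close to `var_model`; the cross term vanishes only because `u` was generated independently of `f`.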

#### 5.2 Multi-factor model for a portfolio
- Return: $ R_p = \sum_{j=1}^N w_j \left(\sum_{k=1}^K X_{jk} f_k + u_j\right) = w^T (Xf + u)$     

$$\begin{equation}
Xf + u =
\left[\begin{matrix}
r_1\\
r_2\\
\vdots\\
r_N
\end{matrix}\right]_{N\times 1} = 
\left[\begin{matrix}
X_{11}&X_{12}&\ldots&X_{1K}\\
X_{21}&X_{22}&\ldots&X_{2K}\\
\vdots&\vdots&\ddots&\vdots\\
X_{N1}&X_{N2}&\ldots&X_{NK}
\end{matrix}\right]_{N\times K}
\left[\begin{matrix}
f_1\\
f_2\\
\vdots\\
f_K
\end{matrix}\right]_{K\times 1} + 
\left[\begin{matrix}
u_1\\
u_2\\
\vdots\\
u_N
\end{matrix}\right]_{N\times 1}
\end{equation}$$ 
    
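- Variance: ${\bf Var}(R_p) = w^T\Sigma w$, where $\Sigma = XFX^T + \Delta$, $F$ is the covariance matrix of the factor returns, and $\Delta$ is the diagonal matrix of specific variances   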

$$\begin{equation}
\Sigma =
\left[\begin{matrix}
Cov(r_1,r_1)&Cov(r_2,r_1)&\ldots&Cov(r_N,r_1)\\
Cov(r_1,r_2)&Cov(r_2,r_2)&\ldots&Cov(r_N,r_2)\\
\vdots&\vdots&\ddots&\vdots\\
Cov(r_1,r_N)&Cov(r_2,r_N)&\ldots&Cov(r_N,r_N)
\end{matrix}\right]_{N\times N}\\ = 
\left[\begin{matrix}
X_{11}&X_{12}&\ldots&X_{1K}\\
X_{21}&X_{22}&\ldots&X_{2K}\\
\vdots&\vdots&\ddots&\vdots\\
X_{N1}&X_{N2}&\ldots&X_{NK}
\end{matrix}\right]_{N\times K}
\left[\begin{matrix}
Cov(f_1,f_1)&Cov(f_2,f_1)&\ldots&Cov(f_K,f_1)\\
Cov(f_1,f_2)&Cov(f_2,f_2)&\ldots&Cov(f_K,f_2)\\
\vdots&\vdots&\ddots&\vdots\\
Cov(f_1,f_K)&Cov(f_2,f_K)&\ldots&Cov(f_K,f_K)
\end{matrix}\right]_{K\times K}
\left[\begin{matrix}
X_{11}&X_{21}&\ldots&X_{N1}\\
X_{12}&X_{22}&\ldots&X_{N2}\\
\vdots&\vdots&\ddots&\vdots\\
X_{1K}&X_{2K}&\ldots&X_{NK}
\end{matrix}\right]_{K\times N} + 
\left[\begin{matrix}
Var(u_1)&0&\ldots&0\\
0&Var(u_2)&\ldots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\ldots&Var(u_N)
\end{matrix}\right]_{N\times N}
\end{equation}$$ 
   
Taking a portfolio of two stocks and a two-factor model as an example:    
    
$$\begin{equation}
XFX^T =
\left[\begin{matrix}
X_{11}&X_{12}&\\
X_{21}&X_{22}&
\end{matrix}\right] *  
\left[\begin{matrix}
Cov(f_1,f_1)&Cov(f_2,f_1)&\\
Cov(f_1,f_2)&Cov(f_2,f_2)&
\end{matrix}\right] * 
\left[\begin{matrix}
X_{11}&X_{21}&\\
X_{12}&X_{22}&
\end{matrix}\right]\\
=\left[\begin{matrix}
X_{11}Cov(f_1,f_1)+X_{12}Cov(f_1,f_2),&X_{11}Cov(f_2,f_1)+X_{12}Cov(f_2,f_2)&\\
X_{21}Cov(f_1,f_1)+X_{22}Cov(f_1,f_2),&X_{21}Cov(f_2,f_1)+X_{22}Cov(f_2,f_2)&
\end{matrix}\right] * 
\left[\begin{matrix}
X_{11}&X_{21}&\\
X_{12}&X_{22}&
\end{matrix}\right]
\end{equation}$$ 

$$ \begin{equation}
XFX^T = \left[\begin{matrix}
X_{11}^2Cov(f_1,f_1)+2X_{12}X_{11}Cov(f_1,f_2)+X_{12}^2Cov(f_2,f_2),&X_{11}X_{21}Cov(f_1,f_1)+X_{12}X_{21}Cov(f_1,f_2)+X_{11}X_{22}Cov(f_2,f_1)+X_{12}X_{22}Cov(f_2,f_2)\\
X_{21}X_{11}Cov(f_1,f_1)+X_{22}X_{11}Cov(f_1,f_2)+X_{21}X_{12}Cov(f_2,f_1)+X_{22}X_{12}Cov(f_2,f_2),&X_{21}^2Cov(f_1,f_1)+2X_{22}X_{21}Cov(f_1,f_2)+X_{22}^2Cov(f_2,f_2)
\end{matrix}\right]
\end{equation}$$  

Entry by entry:  
Top left: $ X_{11}^2Cov(f_1,f_1)+2X_{12}X_{11}Cov(f_1,f_2)+X_{12}^2Cov(f_2,f_2) = Var(X_{11}f_1+X_{12}f_2)$    
Bottom right: $ X_{21}^2Cov(f_1,f_1)+2X_{22}X_{21}Cov(f_1,f_2)+X_{22}^2Cov(f_2,f_2) = Var(X_{21}f_1+X_{22}f_2)$    
Top right: $ X_{11}X_{21}Cov(f_1,f_1)+X_{12}X_{21}Cov(f_1,f_2)+X_{11}X_{22}Cov(f_2,f_1)+X_{12}X_{22}Cov(f_2,f_2)\\ 
    = Cov(X_{11}f_1,X_{21}f_1)+Cov(X_{21}f_1,X_{12}f_2)+Cov(X_{22}f_2,X_{11}f_1)+Cov(X_{12}f_2,X_{22}f_2)\\
    = Cov(X_{11}f_1 + X_{12}f_2, X_{21}f_1 + X_{22}f_2)\\
    = Cov(r_1, r_2)$ (the last step assumes that the specific returns $u_1$ and $u_2$ are uncorrelated with the factors and with each other)    
  
Therefore:    
$$\begin{equation}
XFX^T =
\left[\begin{matrix}
Var(X_{11}f_1+X_{12}f_2)&Cov(r_1, r_2)&\\
Cov(r_1, r_2)&Var(X_{21}f_1+X_{22}f_2)&
\end{matrix}\right]
\end{equation}$$    
   
$$\begin{equation}
XFX^T + \Delta =
\left[\begin{matrix}
Var(X_{11}f_1+X_{12}f_2)&Cov(r_1, r_2)&\\
Cov(r_1, r_2)&Var(X_{21}f_1+X_{22}f_2)&
\end{matrix}\right] + 
\left[\begin{matrix}
Var(u_1)&0&\\
0&Var(u_2)&
\end{matrix}\right]\\
=\left[\begin{matrix}
Var(X_{11}f_1+X_{12}f_2)+Var(u_1)&Cov(r_1, r_2)&\\
Cov(r_1, r_2)&Var(X_{21}f_1+X_{22}f_2)+Var(u_2)&
\end{matrix}\right]\\
=\left[\begin{matrix}
Var(r_1)&Cov(r_2, r_1)&\\
Cov(r_1, r_2)&Var(r_2)&
\end{matrix}\right]
\end{equation}$$ 
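
The two-stock, two-factor identity can also be checked numerically. The sketch below uses assumed values for $X$, $F$, and $\Delta$ and compares entries of `X @ F @ X.T + Delta` with the expanded expressions derived above.

```python
import numpy as np

# Assumed inputs for illustration only.
X = np.array([[1.0, 0.5],
              [0.3, 2.0]])               # exposures: rows are stocks, columns are factors
F = np.array([[0.04, 0.01],
              [0.01, 0.09]])             # factor covariance matrix
Delta = np.diag([0.02, 0.05])            # specific variances Var(u_1), Var(u_2)

# Covariance matrix of stock returns implied by the factor model
Sigma = X @ F @ X.T + Delta

# Top-left entry: Var(X_11 f_1 + X_12 f_2) + Var(u_1)
top_left = (X[0, 0] ** 2 * F[0, 0]
            + 2 * X[0, 0] * X[0, 1] * F[0, 1]
            + X[0, 1] ** 2 * F[1, 1]
            + Delta[0, 0])

# Off-diagonal entry: Cov(r_1, r_2), which has no contribution from Delta
off_diag = (X[0, 0] * X[1, 0] * F[0, 0]
            + X[0, 1] * X[1, 0] * F[0, 1]
            + X[0, 0] * X[1, 1] * F[1, 0]
            + X[0, 1] * X[1, 1] * F[1, 1])
```

Since $F$ is positive semidefinite and $\Delta$ has positive diagonal entries, the resulting `Sigma` is symmetric and positive definite, which is what `quad_form` needs for a convex risk term.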

In [14]:
data_dict = {}
basic_data = ts.get_stock_basics()
for symbol in stock_list.index.values:
    data = ts.get_hist_data(symbol, start='2018-05-01')
    if data is not None:
        # Fundamentals ('esp' is the earnings-per-share column as named in basic_data)
        data['eps'] = basic_data.loc[symbol,'esp']
        data['pe'] = data['close']/data['eps']
        data['shares']=basic_data.loc[symbol,'outstanding']
        data['market_cap'] = data['shares']*data['close']
        # Sort by date ascending so the rolling windows look backward in time
        data.sort_index(inplace=True)
        data['high_low'] = (data['high'] - data['low'])/data['low']
        data['turnover_mean_5'] = data['turnover'].rolling(window=5).mean()
        data['p_std_5'] = data['p_change'].rolling(window=5).std()
        data['ma5_10'] = (data['ma5'] - data['ma10'])/data['ma10']
        # Take an explicit copy so the assignment below does not write to a slice of `data`
        hist_data = data[['p_change','high_low','turnover_mean_5','p_std_5','ma5_10','eps','market_cap']].copy()
        # Pair each day's factor values with the next day's return
        hist_data['p_change'] = hist_data['p_change'].shift(-1)
        # Drop the first 4 rows (incomplete 5-day windows) and the last row (no next-day return)
        data_dict[symbol] = hist_data[4:-1]



In [15]:
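# Note: pd.Panel was removed in pandas 0.25, so this cell needs an older pandas
panel = pd.Panel(data_dict)
# Reorder the axes to (date, stock, factor)
panel = panel.transpose(1,0,2)
panel.dropna(how='all',inplace=True)
panel.fillna(0,inplace=True)
panel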


<class 'pandas.core.panel.Panel'>
Dimensions: 12 (items) x 287 (major_axis) x 7 (minor_axis)
Items axis: 2018-05-08 to 2018-05-23
Major_axis axis: 000001 to 601958
Minor_axis axis: p_change to market_cap

In [39]:
from sklearn.linear_model import LinearRegression
factor_return = {}
special_factor = {}
model = LinearRegression()
# One cross-sectional regression per date; the last date is kept for building X later
for dt in panel.items.values[:-1]:
    daily_data = panel[dt]
    # Market-cap dummies: 1 for the top 30% / bottom 30% of the cross-section
    daily_data['large_marketcap'] = daily_data['market_cap'].apply(lambda x:int(x > daily_data['market_cap'].quantile(.7)))
    daily_data['small_marketcap'] = daily_data['market_cap'].apply(lambda x:int(x < daily_data['market_cap'].quantile(.3)))
    y = np.array(daily_data[['p_change']])
    x_data = daily_data.drop(columns=['p_change','market_cap'])
    X = np.array(x_data)
    model.fit(X, y)

    # The regression coefficients are that day's factor returns f
    coef = model.coef_
    factor_return[dt] = pd.Series(coef[0], index=x_data.columns)
    
    # Specific return u = y - Xf (the fitted intercept is left inside u)
    Xf = np.dot(X, coef.T)
    sf = y - Xf
    special_factor[dt] = pd.Series(sf.T[0], index=x_data.index)
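
The cross-sectional regression in the cell above can also be sketched with plain NumPy least squares. The example below uses synthetic exposures and returns (all values are made up, and `f_true` and `u_true` are hypothetical names) to show how the factor returns and specific returns for one date are recovered. Unlike the cell above, this sketch removes the intercept from the residual as well.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stocks, n_factors = 50, 3

# Synthetic data for one date; illustration only.
X = rng.normal(size=(n_stocks, n_factors))       # factor exposures
f_true = np.array([0.5, -0.2, 0.1])              # factor returns to recover
u_true = rng.normal(scale=0.01, size=n_stocks)   # small specific returns
y = X @ f_true + u_true                          # stock returns

# Prepend a column of ones so the regression includes an intercept.
A = np.column_stack([np.ones(n_stocks), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

intercept = coef[0]
f_hat = coef[1:]                                 # estimated factor returns
u_hat = y - A @ coef                             # estimated specific returns
```

Least-squares residuals are orthogonal to every regressor, so `u_hat` is uncorrelated with the fitted factor part within the sample, which matches the modelling assumption of section 5.1.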

In [40]:
factor_df = pd.DataFrame(factor_return)
factor_df = factor_df.unstack().unstack()
factor_df

Unnamed: 0,high_low,turnover_mean_5,p_std_5,ma5_10,eps,large_marketcap,small_marketcap
2018-05-08,6.629526,-0.181045,-0.027842,8.443576,0.109345,-0.139176,0.055658
2018-05-09,5.304294,0.001881,-0.210496,17.564225,0.254441,-0.173622,-0.292922
2018-05-10,-26.164908,0.055275,0.090967,-8.90178,0.134671,0.049632,0.045309
2018-05-11,16.289434,-0.303637,-0.208229,24.597351,0.360742,0.734446,-0.676845
2018-05-14,21.218283,-0.034909,0.332691,-13.302759,0.001016,-0.60624,0.001485
2018-05-15,-17.147483,-0.185252,0.659326,1.068731,0.060719,-0.147527,0.585008
2018-05-16,8.511407,-0.234912,-0.319123,4.967584,-0.08289,-0.613034,0.243451
2018-05-17,2.362727,0.028116,-0.217224,7.986176,0.051115,0.16062,-0.208139
2018-05-18,-27.71576,0.089581,0.544773,-12.207668,-0.381505,-0.263753,0.028025
2018-05-21,-6.886727,0.271225,0.181292,16.004082,0.183822,-0.267998,0.19759


In [41]:
sp_factor_df = pd.DataFrame(special_factor)
sp_factor_df = sp_factor_df.unstack().unstack()
sp_factor_df

Unnamed: 0,000001,000002,000008,000060,000063,000069,000100,000157,000166,000333,...,601878,601881,601888,601898,601899,601901,601919,601933,601939,601958
2018-05-08,-0.072036,-1.484713,-0.235061,1.101673,-0.055658,0.271474,-1.22479,0.20652,-1.293697,0.042885,...,-0.760005,-1.882689,1.389157,-0.142824,0.013741,-0.818529,-0.547148,-1.464901,0.567608,-0.311556
2018-05-09,1.040126,0.633453,1.232421,0.73308,0.292922,2.390472,-0.244705,0.398284,0.768736,1.041402,...,0.787154,2.352083,2.674051,0.596617,1.490754,0.896051,0.091467,0.975607,0.163857,0.046565
2018-05-10,0.125256,-0.099808,-1.382545,0.154272,-0.045309,-0.763896,-0.576452,-0.585703,-0.043443,-1.065672,...,-1.659382,-1.309359,0.322823,0.231583,0.538879,0.083947,-1.747306,-0.489649,2.482316,0.699032
2018-05-11,0.725688,-0.489592,0.165032,0.201731,0.676845,-2.038016,0.165445,-0.076647,-0.686461,1.368189,...,1.105247,1.053764,0.459093,-1.671341,-1.234602,-0.533858,1.10409,-0.479147,-0.725804,-0.72747
2018-05-14,-0.475686,-1.065735,2.255674,-0.169392,-0.001485,0.085775,-0.768604,1.622801,0.085008,-0.353812,...,0.043173,-1.692649,0.492602,-0.341226,-1.086088,1.107389,0.761573,-0.065435,-0.085187,1.163824
2018-05-15,-2.062849,-1.573116,1.239115,-1.517447,-0.585008,-1.180884,-0.995033,-1.128306,-1.402017,-1.926268,...,-2.259305,-2.73087,-1.728817,0.20103,-2.014222,-1.702804,-0.429831,1.113015,-1.578831,1.309313
2018-05-16,0.23966,-0.86181,0.55361,0.050016,-0.243451,0.252314,-1.021356,-0.109398,-0.032154,-1.285585,...,0.406657,0.645362,-0.625887,-0.902623,0.037356,-0.038589,-0.75015,-1.728328,0.64974,-1.398199
2018-05-17,1.302319,1.342706,6.937477,2.167359,0.208139,1.134158,0.764916,0.893686,0.592248,1.522066,...,1.648567,1.136444,0.870938,3.60901,1.016526,1.327771,4.105065,-1.379945,1.786855,2.231569
2018-05-18,0.015977,-0.243951,0.561759,-0.926982,-0.028025,0.089462,1.00568,0.216773,1.109495,0.406341,...,0.58173,0.224883,1.742976,-0.148184,0.723986,0.108084,8.130407,1.847611,-0.695522,0.260667
2018-05-21,-0.756485,-0.975006,-2.936934,1.784638,-0.19759,-0.485782,-0.359494,-0.333434,-0.176303,-2.544747,...,0.088702,-0.747286,0.180805,-0.903253,-0.865999,-1.186443,-0.712399,2.61109,-2.15054,-1.671896


In [42]:
factor_cov = factor_df.cov()
factor_cov

Unnamed: 0,high_low,turnover_mean_5,p_std_5,ma5_10,eps,large_marketcap,small_marketcap
high_low,291.411367,-1.859578,-3.693289,98.286085,0.734466,-0.072539,-2.055236
turnover_mean_5,-1.859578,0.041578,0.031214,-0.747577,0.007057,-0.012848,0.000506
p_std_5,-3.693289,0.031214,0.124926,-2.788925,-0.013772,-0.029966,0.052145
ma5_10,98.286085,-0.747577,-2.788925,158.150671,1.477952,2.373483,-1.708704
eps,0.734466,0.007057,-0.013772,1.477952,0.049602,0.042803,-0.034707
large_marketcap,-0.072539,-0.012848,-0.029966,2.373483,0.042803,0.136918,-0.080037
small_marketcap,-2.055236,0.000506,0.052145,-1.708704,-0.034707,-0.080037,0.107513


In [43]:
sp_factor_var = sp_factor_df.var()
sp_factor_diag_cov = np.diag(sp_factor_var)
sp_factor_diag_cov

array([[0.97348762, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.0176813 , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 6.36433766, ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 2.00192256, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 1.8417316 ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.95652904]])

In [46]:
last_dt = panel.items.values[-1]
last_data = panel[last_dt]
last_data['large_marketcap'] = last_data['market_cap'].apply(lambda x:int(x > last_data['market_cap'].quantile(.7)))
last_data['small_marketcap'] = last_data['market_cap'].apply(lambda x:int(x < last_data['market_cap'].quantile(.3)))
x_data = last_data.drop(columns=['p_change','market_cap'])
X = np.array(x_data)
X.shape

(287, 7)

In [47]:
Sigma = np.dot(np.dot(X, np.array(factor_cov)), X.T) + sp_factor_diag_cov
Sigma

array([[1.17460817, 0.19254739, 0.21268872, ..., 0.26878575, 0.23896422,
        0.07391803],
       [0.19254739, 1.21594537, 0.22806268, ..., 0.27225924, 0.23152133,
        0.09842828],
       [0.21268872, 0.22806268, 7.35179882, ..., 0.49265818, 0.36529409,
        0.53373418],
       ...,
       [0.26878575, 0.27225924, 0.49265818, ..., 2.43799188, 0.36787341,
        0.21577044],
       [0.23896422, 0.23152133, 0.36529409, ..., 0.36787341, 2.18946417,
        0.13383846],
       [0.07391803, 0.09842828, 0.53373418, ..., 0.21577044, 0.13383846,
        2.28786376]])

In [48]:
factor_mu = np.array(factor_df.mean())
factor_mu

array([-0.14977089, -0.0849185 ,  0.04032359,  5.35645548,  0.03525533,
       -0.13401327,  0.01727299])

In [49]:
sp_factor_mu = np.array(sp_factor_df.mean())
sp_factor_mu.shape

(287,)

In [50]:
mu = np.dot(X,factor_mu)+sp_factor_mu
mu.shape

(287,)

In [51]:
import cvxpy as cvx
w = cvx.Variable(len(X))
gamma = 0.5
ret = mu @ w                       # expected return from the factor model
risk = cvx.quad_form(w, Sigma)     # risk from Sigma = X F X^T + Delta
prob = cvx.Problem(cvx.Maximize(ret - gamma*risk), 
               [cvx.sum(w) <= 1, w >= 0])
prob.solve()

0.9054326408883626

In [52]:
pd.Series(w.value, index=panel.major_axis).head()

000001    5.447284e-13
000002    3.779548e-13
000008    6.110023e-03
000060    6.071829e-13
000063    1.282621e-12
dtype: float64

### Factor neutrality, e.g. market-cap neutrality

In [55]:
import cvxpy as cvx
w = cvx.Variable(len(X))
gamma = 0.5
ret = mu @ w
risk = cvx.quad_form(w, Sigma)

# Portfolio exposure to each market-cap dummy: total weight in that group
large_weight = np.array(x_data['large_marketcap']) @ w
small_weight = np.array(x_data['small_marketcap']) @ w
# Neutrality: both exposures are zero; w >= 0 is dropped, so shorting is allowed
prob = cvx.Problem(cvx.Maximize(ret - gamma*risk), 
               [cvx.sum(w) <= 1,
               large_weight == 0, small_weight == 0])
prob.solve()

0.6940106168090259

In [56]:
pd.Series(w.value, index=panel.major_axis).head()

000001    4.163983e-14
000002    2.108290e-14
000008    2.981469e-15
000060    8.212897e-13
000063    1.509737e-13
dtype: float64