# Machine Learning Homework 2
Calculating VaR, then hedging with linear regression and Lasso regression to measure each hedge's impact on VaR  
2018/10/02 Kyle Lee  
2018/10/04 Found errors in the return calculation; switched to pct_change for quicker results  
2018/10/05 Added a for loop to find the Lasso alpha, and added rounded results (thanks to Harnish Patel for the reminder)  

## Import needed packages and dataset

In [80]:
#import all packages
import numpy as np
import pandas as pd
import os

In [81]:
#import pnl price data
pnl = pd.read_csv('pnl_prices_hw3.csv')

## Clean dataset

In [82]:
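#Calculate returns for dataset
pnl['returns']=pnl['xyz'].pct_change()
#Index by date and drop the first row (its return is NaN)
pnl.index=pnl.Date
pnl=pnl.dropna()
#Keep only the returns column
pnl=pnl.drop(['Date','xyz'],axis=1)
#Scale returns by 1,000,000 (presumably a 1M position, so values read as P&L)
pnl=pnl*1000000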

In [83]:
#1% quantile of the scaled returns (99% VaR)
print("VaR = ", pnl.returns.quantile(0.01))

VaR =  -39496.54854882854


In [84]:
#1% quantile (99% VaR) with interpolation='lower', i.e. rounded down to an observed value
print("VaR rounded = ", pnl.returns.quantile(0.01,'lower'))

VaR rounded =  -42204.56802383321
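The gap between the two VaR figures above comes from how `quantile` handles a quantile position that falls between two observations. The sketch below illustrates this on a made-up ten-point series (`toy_pnl` is hypothetical, not the homework data): the default interpolates linearly, while `'lower'` snaps to the observed value just below the position.

```python
import pandas as pd

# Hypothetical toy P&L series (NOT the homework data), already sorted for readability.
toy_pnl = pd.Series([-50.0, -30.0, -10.0, 0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0])

# Default: linear interpolation between the two nearest order statistics.
var_linear = toy_pnl.quantile(0.10)

# 'lower': take the lower of the two neighbouring observations instead of interpolating.
var_lower = toy_pnl.quantile(0.10, interpolation="lower")

print("linear:", var_linear)
print("lower: ", var_lower)
```

Because `'lower'` always picks the smaller neighbouring observation, it can never sit above the interpolated value, which is consistent with the "rounded" VaR being the more negative of the two.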


## Import hedge dataset and clean it

In [85]:
#import dataset
hedge=pd.read_csv('hedge_prices_hw3.csv')

In [86]:
#Data cleaning
hedge.index=hedge.Date
hedge=hedge.drop('Date',axis=1)
hedge=hedge.pct_change()
hedge=hedge.dropna()

In [87]:
#Check if shape is correct
hedge.shape

(780, 100)

## Multilinear Regression

In [88]:
#import package for linearRegression and Lasso
from sklearn import linear_model

In [89]:
#Setting the linear regression model
model=linear_model.LinearRegression()

In [90]:
#Setting X,y for regressions
X=hedge
y=pnl

In [91]:
#fit data to model
model.fit(X,y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [92]:
#Sum up the absolute values of the coefficients
print("sum of the coefficients in regression = ",np.sum(abs(model.coef_)))


sum of the coefficients in regression =  4520430.84688


In [93]:
#Subtract hedge from portfolio (one matrix product instead of a row-by-row loop,
#which also avoids chained assignment into pnl_regress.returns)
pnl_regress=pd.DataFrame()
pnl_regress['returns']=pnl.returns-hedge.values.dot(model.coef_.ravel())

In [94]:
#Calculating VaR
print("VaR for regression = ",pnl_regress.returns.quantile(0.01))

VaR for regression =  -28365.51569344735


In [95]:
#Calculating VaR rounded
print("VaR for regression rounded = ",pnl_regress.returns.quantile(0.01, 'lower'))

VaR for regression rounded =  -28795.866717291967
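As a rough check of the hedging step, the sketch below rebuilds it on synthetic data and compares a vectorized version with a row-by-row loop. All names here (`X_toy`, `y_toy`, `true_beta`, `toy_model`) are hypothetical; only the idea of subtracting the fitted coefficients times the hedge returns comes from the notebook. Note that, as in the notebook, the intercept is left inside the hedged P&L.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Synthetic stand-ins for the hedge returns and the portfolio P&L (assumed data).
rng = np.random.default_rng(0)
n_days, n_assets = 250, 5
X_toy = pd.DataFrame(rng.normal(0, 0.01, size=(n_days, n_assets)),
                     columns=[f"h{j}" for j in range(n_assets)])
true_beta = np.array([1.0, -0.5, 0.0, 2.0, 0.0])
y_toy = pd.Series(X_toy.values @ true_beta + rng.normal(0, 0.005, size=n_days))

toy_model = linear_model.LinearRegression().fit(X_toy, y_toy)

# Vectorized: hedged P&L = portfolio P&L minus the P&L of the hedge positions.
hedged_vec = y_toy - X_toy.values @ toy_model.coef_

# Same computation one row at a time, for comparison.
hedged_loop = np.array([y_toy.iloc[i] - np.dot(toy_model.coef_, X_toy.iloc[i])
                        for i in range(n_days)])

print("loop and vectorized agree:", np.allclose(hedged_vec, hedged_loop))
print("1% quantile before:", y_toy.quantile(0.01))
print("1% quantile after: ", hedged_vec.quantile(0.01))
```

The vectorized form relies on the rows of both tables being in the same order, which is the same assumption the positional loop makes.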


## Lasso Regression

In [96]:
#Do an alpha=1 Lasso and see how many coefficients are not 0
clf = linear_model.Lasso(alpha=1)
clf.fit(X,y)
num=clf.coef_.shape[0]-sum(np.isclose(clf.coef_,0))
print("number of coefficients that are not 0 = ",num)

number of coefficients that are not 0 =  81


In [111]:
#Doing LASSO looking for only 5 variables
clf = linear_model.Lasso(alpha=89)
clf.fit(X,y)

Lasso(alpha=89, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [112]:
#count the coefficients that are still nonzero
sum(clf.coef_!=0)

5

In [113]:
#Another way to find alpha for 5 coefficients without trial and error by hand
#(start at 1: alpha=0 is plain least squares and makes Lasso emit warnings)
for i in range(1,120):
    clf=linear_model.Lasso(alpha=i)
    clf.fit(X,y)
    if sum(clf.coef_!=0)==5:
        break
print(i)

89
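The search above can be wrapped in a small helper that also reports when no alpha in the grid gives the requested number of coefficients (the bare loop would silently end on its last value). This is only a sketch on synthetic data: `find_alpha_for_k`, the alpha grid, and the toy arrays are all hypothetical.

```python
import numpy as np
from sklearn import linear_model

def find_alpha_for_k(X, y, k, alphas):
    """Return (alpha, fitted model) for the first alpha whose Lasso fit
    keeps exactly k nonzero coefficients, or (None, None) if none does."""
    for alpha in alphas:
        fit = linear_model.Lasso(alpha=alpha).fit(X, y)
        if np.count_nonzero(fit.coef_) == k:
            return alpha, fit
    return None, None

# Synthetic data (assumed): 20 features, only the first 3 actually matter.
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(200, 20))
beta = np.zeros(20)
beta[:3] = [5.0, -4.0, 3.0]
y_toy = X_toy @ beta + rng.normal(scale=0.5, size=200)

alpha, fit = find_alpha_for_k(X_toy, y_toy, k=3, alphas=np.linspace(0.05, 2.0, 40))
print("alpha:", alpha)
print("nonzero coefficients:", np.flatnonzero(fit.coef_))
```

A grid of integers, as in the notebook, works the same way; a finer grid only matters when the number of surviving coefficients jumps past the target between two neighbouring alphas.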


In [114]:
#take a peek to make sure
clf.coef_

array([     0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
         1190.28865972,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,  32846.42147806,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,      0.        ,      0.        ,      0.        ,
            0.        ,  

In [115]:
#subtract lasso variables from pnl to calculate VaR (one matrix product instead of a row-by-row loop)
pnl_lasso=pd.DataFrame()
pnl_lasso['returns']=pnl.returns-hedge.values.dot(clf.coef_.ravel())

In [116]:
print("VaR for Lasso = ",pnl_lasso.returns.quantile(0.01))

VaR for Lasso =  -36497.02734012559


In [117]:
print("VaR for Lasso rounded = ",pnl_lasso.returns.quantile(0.01,'lower'))

VaR for Lasso rounded =  -37839.32476640732


In [124]:

print("Original portfolio VaR = ",pnl.returns.quantile(0.01), "\t(rounded = ", pnl.returns.quantile(0.01,'lower'), ")",
    "\nPortfolio VaR for regression = ",pnl_regress.returns.quantile(0.01), "\t(rounded = ", pnl_regress.returns.quantile(0.01,'lower'), ")"
    "\nPortfolio VaR for Lasso = ",pnl_lasso.returns.quantile(0.01), "\t(rounded = ", pnl_lasso.returns.quantile(0.01,'lower'), ")")

Original portfolio VaR =  -39496.54854882854 	(rounded =  -42204.56802383321 ) 
Portfolio VaR for regression =  -28365.51569344735 	(rounded =  -28795.866717291967 )
Portfolio VaR for Lasso =  -36497.02734012559 	(rounded =  -37839.32476640732 )


## Thoughts
Of the three portfolios, the full regression hedge lowers VaR the most: the 1% VaR moves from about -39,497 unhedged to about -28,366. But the regression takes a lot of variables, and normally you don't have the time to pick and trade all 100 stocks. Lasso is attractive because it needs only a few variables, 5 in our case, although with so few it lowers VaR only a little, to about -36,497.
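To put the comparison on one scale, the reductions can be expressed as a percentage of the unhedged VaR. The figures below are copied from the notebook output; the helper name `var_reduction_pct` is just an illustrative choice.

```python
# 1% VaR figures copied from the notebook output above.
var_original = -39496.54854882854
var_regression = -28365.51569344735
var_lasso = -36497.02734012559

def var_reduction_pct(before, after):
    """Percentage by which the magnitude of VaR shrinks after hedging."""
    return 100.0 * (abs(before) - abs(after)) / abs(before)

reg_cut = var_reduction_pct(var_original, var_regression)
lasso_cut = var_reduction_pct(var_original, var_lasso)

print(f"Regression hedge cuts VaR by {reg_cut:.1f}%")
print(f"Lasso hedge (5 variables) cuts VaR by {lasso_cut:.1f}%")
```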