# Self Practice: Evaluation Metrics & Application to Trading Rules

## Background

Suppose you are working as an analyst at an investment bank. Your manager has asked you to implement a decision tree to test one of the desk traders' strategies.

The trader uses six indicators to implement a buy-or-sell strategy for Microsoft. Your manager gave no further requirements beyond using a decision tree algorithm.

Here are the indicators you will use:
* Average True Range
* Average Directional Index
* Relative Strength Index
* Binary Indicator when Price is greater than EMA(10)
* Binary Indicator when EMA(10) is greater than EMA(30)
* Binary Indicator when MACD-Signal is greater than MACD
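
As a rough illustration of the three binary indicators, the sketch below computes EMAs with plain pandas on a short made-up price series. The `close` values are invented, and `ewm(span=..., adjust=False)` is only one common EMA convention (pandas_ta may seed its averages differently). The 1 / -1 encoding is an assumption based on the sample output shown later in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices; real data would come from Yahoo Finance.
close = pd.Series(np.linspace(100.0, 130.0, 60))

def ema(series, span):
    """Exponential moving average (recursive form, seeded with the first value)."""
    return series.ewm(span=span, adjust=False).mean()

ema10, ema30 = ema(close, 10), ema(close, 30)
macd = ema(close, 12) - ema(close, 26)   # MACD line
macd_signal = ema(macd, 9)               # signal line

# Encode each comparison as 1 when true and -1 when false.
indicators = pd.DataFrame({
    "Cgtema10": np.where(close > ema10, 1, -1),
    "ema10gtema30": np.where(ema10 > ema30, 1, -1),
    "macdsgtmacd": np.where(macd_signal > macd, 1, -1),
})
print(indicators.tail())
```

On a steadily rising series like this one, the price sits above both EMAs and the signal line trails the MACD line, so the first two flags end at 1 and the third at -1.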

You will train your model on data from 1986-04-30 to 2020-08-26 and test it from 2021-06-15 onward. The period between 2020-08-26 and 2021-06-14 is held out for validation. The forecasting horizon is 200 days: we will produce one-step-ahead predictions for 200 days starting on 2021-06-15.

* Train 1986/04/30 - 2020/08/26
* Validation 2020/08/26 - 2021/06/14
* Test 2021/06/15 - 2022/03/30



## Outline:
1. Demonstrate the various evaluation metrics in sklearn.
2. Use Microsoft stock data from Yahoo Finance.
3. Build a classification model.
4. Visualize the ROC curve.


## Importing Libraries

In [None]:
! pip install pandas_ta
! pip install yfinance

In [None]:
import pandas as pd
import pandas_ta as ta
import numpy as np
import yfinance
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
%matplotlib inline
import graphviz
from IPython.display import Image  
import pydotplus
from sklearn.tree import export_graphviz
from sklearn import metrics

## Loading Data

We use the yfinance package, accessed here through the pandas_ta `ticker` helper, to download data directly from Yahoo Finance.

In [None]:
df = pd.DataFrame()
df = df.ta.ticker('msft')


## Data Transformation and Technical Analysis

In [None]:
## In this section you are required to generate the necessary technical indicators

# Exponential Moving Averages
df['ema10'] = None
df['ema30'] = None

# Average True Range- Measures Volatility Caused by Price Gaps or Limit Moves
df['atr'] = None

# Average Directional Movement Index - quantifies trend strength by measuring
# the amount of movement in a single direction

adx = None
df['adx'] = None

# Moving Average Convergence/ Divergence
#   Used to identify aspects of a security's overall trend
#   MACD Line: (12-day EMA - 26-day EMA) 
#   Signal Line: 9-day EMA of MACD Line
#   MACD Histogram: MACD Line - Signal Line

macd = None
df['macd'] = None
df['macds'] = None


# Relative Strength Index
#   momentum oscillator used to measure the
#   velocity as well as the magnitude of directional price movements

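df['rsi'] = None

# Binary indicators, encoded as 1 / -1 in the sample output below
df['Cgtema10'] = None
df['ema10gtema30'] = None
df['macdsgtmacd'] = None

# Return and the binary target (see the sample output below)
df['Return_1'] = None
df['target'] = None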

df.dropna(inplace=True)

# Features
predictors_list = None
X = df[predictors_list]

# Target Variable
y = df.target
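
The sample output in this notebook suggests that `Return_1` holds the next trading day's return and that `target` is 1 when that return is positive. That reading is an inference from the displayed rows, not something the exercise states, so treat the sketch below as one plausible construction. It uses the first five closing prices from the `df.head()` output.

```python
import pandas as pd

# First five closing prices, copied from the df.head() output in this notebook.
close = pd.Series(
    [0.070759, 0.069662, 0.069662, 0.069113, 0.069662],
    index=pd.to_datetime(["1986-04-30", "1986-05-01", "1986-05-02",
                          "1986-05-05", "1986-05-06"]),
)

frame = pd.DataFrame({"Close": close})
# Return over the next trading day, aligned with today's row.
frame["Return_1"] = frame["Close"].pct_change().shift(-1)
# Label 1 when the next-day return is positive, otherwise 0 (assumed rule).
frame["target"] = (frame["Return_1"] > 0).astype(int)
# The last row has no next-day return, so it is dropped.
frame = frame.dropna()
print(frame)
```

Shifting by -1 keeps each label on the row whose features are known at prediction time, which avoids using tomorrow's indicators to predict tomorrow's move.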



In [None]:
df.head()

| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | ema10 | ema30 | atr | adx | macd | macds | rsi | Cgtema10 | ema10gtema30 | macdsgtmacd | Return_1 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1986-04-30 | 0.072404 | 0.072953 | 0.069113 | 0.070759 | 30902400 | 0.0 | 0.0 | 0.069055 | 0.064604 | 0.002736 | 21.975007 | 0.003106 | 0.002142 | 64.263976 | 1 | 1 | -1 | -0.015503 | 0 |
| 1986-05-01 | 0.070759 | 0.070759 | 0.068565 | 0.069662 | 54345600 | 0.0 | 0.0 | 0.069165 | 0.064931 | 0.002694 | 21.654902 | 0.002976 | 0.002309 | 60.661382 | 1 | 1 | -1 | 0.0 | 0 |
| 1986-05-02 | 0.069662 | 0.070759 | 0.069113 | 0.069662 | 20246400 | 0.0 | 0.0 | 0.069255 | 0.065236 | 0.002613 | 21.369228 | 0.002841 | 0.002415 | 60.661382 | 1 | 1 | -1 | -0.007873 | 0 |
| 1986-05-05 | 0.069662 | 0.069662 | 0.069113 | 0.069113 | 3254400 | 0.0 | 0.0 | 0.069229 | 0.065486 | 0.002455 | 21.113379 | 0.002658 | 0.002464 | 58.751527 | -1 | 1 | -1 | 0.007936 | 1 |
| 1986-05-06 | 0.069662 | 0.070759 | 0.069662 | 0.069662 | 9734400 | 0.0 | 0.0 | 0.069308 | 0.065756 | 0.002393 | 21.302145 | 0.002529 | 0.002477 | 60.104222 | 1 | 1 | -1 | 0.0 | 0 |


## Splitting the Data

In [None]:
X_train = X.loc[None:None]
X_valid = X.loc[None:None]
X_test = X.loc[None:]
y_train = y.loc[None:None]
y_valid = y.loc[None:None]
y_test = y.loc[None:]
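
A date-based split can be sketched with label slicing on a `DatetimeIndex`. The example below uses a placeholder table on business days rather than the real features, and the `_demo` names and the single `rsi` column are hypothetical. Because `.loc` slices include both endpoints, starting validation on 2020-08-26 would put that row in both the training and validation sets, so this sketch starts validation one day later, which is an assumption.

```python
import numpy as np
import pandas as pd

# Placeholder feature table on business days; the column name is illustrative.
dates = pd.bdate_range("2020-01-01", "2022-03-30")
X_demo = pd.DataFrame({"rsi": np.linspace(30.0, 70.0, len(dates))}, index=dates)

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
X_train_demo = X_demo.loc[:"2020-08-26"]
X_valid_demo = X_demo.loc["2020-08-27":"2021-06-14"]
X_test_demo = X_demo.loc["2021-06-15":]

print(len(X_train_demo), len(X_valid_demo), len(X_test_demo))
```

The same slice bounds would be applied to the target series so that features and labels stay aligned.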

## Building The Decision Tree for Trading

We will search over the following hyperparameter values:
* `min_samples_leaf`: [8, 10]
* `min_samples_split`: [8, 10]
* `max_depth`: [8, 10]

In [None]:
for leaf_size in [None,None]:
  for min_samples_split in [None,None]:
    for max_depth in [None,None]:
      
      clf = tree.DecisionTreeClassifier(criterion='entropy',
                                   min_samples_leaf=leaf_size,
                                   min_samples_split=min_samples_split,
                                   max_depth=max_depth, random_state=34)
      clf.fit(X_train,y_train)
      y_pred = clf.predict(X_valid)
      print('Validation accuracy where [min_samples_leaf;min_samples_split;max_depth] = ['
            + str(leaf_size) + ';' + str(min_samples_split) + ';' + str(max_depth) + '] is '
            + str(metrics.accuracy_score(y_valid, y_pred)) + '\n')
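
One way to turn the printed accuracies into a choice is to store them and pick the best combination. The sketch below does this on synthetic data, so the arrays, the `results` dictionary, and the `best_params` name are all illustrative rather than part of the exercise. It assumes validation accuracy is the selection criterion.

```python
from itertools import product

import numpy as np
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real notebook would use X_train / X_valid.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(400, 6))
y_all = (X_all[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
X_tr, y_tr = X_all[:300], y_all[:300]
X_va, y_va = X_all[300:], y_all[300:]

grid = [8, 10]
results = {}
# Try every (min_samples_leaf, min_samples_split, max_depth) combination.
for leaf, split, depth in product(grid, grid, grid):
    model = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=leaf,
                                   min_samples_split=split, max_depth=depth,
                                   random_state=34)
    model.fit(X_tr, y_tr)
    results[(leaf, split, depth)] = metrics.accuracy_score(y_va, model.predict(X_va))

# Keep the combination with the highest validation accuracy.
best_params = max(results, key=results.get)
print(best_params, round(results[best_params], 4))
```

With only eight combinations, an explicit loop is easy to read. Ties are resolved in favour of the first combination encountered.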

Add validation

## Make Predictions

In [None]:
predictions = None

## Evaluation Metrics

## Accuracy is the number of instances where our prediction equals the ground truth, divided by the total number of instances

In [None]:
print ('accuracy = ' + str(round(None,4)))

accuracy = 0.5323


## A confusion matrix tabulates predicted classes against actual classes to describe the performance of a classification model

In [None]:
print('The confusion matrix is: \n')
print(None)

The confusion matrix is: 

[[47 46]
 [48 60]]
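
The reported accuracy can be checked directly from this matrix. The sketch assumes the scikit-learn layout, in which rows are true classes and columns are predicted classes, so the entries read as true negatives, false positives, false negatives, and true positives.

```python
# Confusion matrix reported above: rows are true classes, columns are predictions
# (assumed scikit-learn layout).
confusion = [[47, 46],
             [48, 60]]

(tn, fp), (fn, tp) = confusion
total = tn + fp + fn + tp

# Accuracy is the share of predictions that fall on the diagonal.
accuracy = (tn + tp) / total
print(total, round(accuracy, 4))
```

The counts sum to 201 test observations, and 107 correct predictions out of 201 reproduces the accuracy of 0.5323 shown earlier.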


In [None]:
print ('precision = ' + str(round(None,4)))

precision = 0.566


In [None]:
print ('recall = ' + str(round(None,4)))

recall = 0.5556


In [None]:
print ('f1 score = ' + str(round(None,4)))

f1 score = 0.5607
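
As a cross-check, the three scores above can be reproduced with `sklearn.metrics` from label vectors rebuilt to match the reported confusion matrix. The vectors are reconstructed only from the four counts, so their ordering is arbitrary, and the `y_true` and `y_hat` names are illustrative.

```python
import numpy as np
from sklearn import metrics

# Rebuild label vectors that reproduce the confusion matrix [[47, 46], [48, 60]].
tn, fp, fn, tp = 47, 46, 48, 60
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_hat = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

precision = metrics.precision_score(y_true, y_hat)  # tp / (tp + fp)
recall = metrics.recall_score(y_true, y_hat)        # tp / (tp + fn)
f1 = metrics.f1_score(y_true, y_hat)                # harmonic mean of the two
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Precision asks how many predicted "up" days were actually up, and recall asks how many actual "up" days were caught. F1 balances the two.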


In [None]:
fpr, tpr, threshold = None
roc_auc = metrics.auc(None, None)
plt.rc("figure", figsize=(16, 8))
plt.rc("font", size=12)

plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.4f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
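
The ROC inputs can be sketched on a tiny made-up example. `roc_curve` expects true labels and a continuous score for the positive class. For a fitted tree, that score would typically come from something like `clf.predict_proba(X_test)[:, 1]`, which is an assumption about how the blanks above are meant to be filled. The `_demo` names and values below are illustrative.

```python
from sklearn import metrics

# Tiny made-up example: true labels and predicted probabilities for class 1.
y_true_demo = [0, 0, 1, 1]
scores_demo = [0.1, 0.4, 0.35, 0.8]

# roc_curve sweeps the decision threshold and returns matching fpr/tpr arrays.
fpr_demo, tpr_demo, thresholds_demo = metrics.roc_curve(y_true_demo, scores_demo)

# auc integrates the curve with the trapezoidal rule.
auc_demo = metrics.auc(fpr_demo, tpr_demo)
print(auc_demo)
```

An AUC of 0.5 corresponds to the dashed diagonal in the plot, so values near 0.5 indicate little ability to rank "up" days above "down" days.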