# Chapter 1

### Decision Tree

<center><img src="images/01.01.png"  style="width: 400px; height: 300px;"/></center>

- A sequence of if-else questions about individual features
- Consists of a hierarchy of nodes. Each node asks a question or makes a prediction.
- Root node : Has no parent, has children
- Internal node : Has a parent, has children
- Leaf node : Has a parent but no children. It is where predictions are made
- Goal : Search for patterns that produce the purest leaves. Each leaf contains a pattern for one dominant label.
- Information Gain : At each node, find the feature and split point that give the maximum information gain, i.e. the largest drop from the parent's impurity to the size-weighted impurity of its children. If even the best split gives information gain = 0 (for example, the node is already pure), there is nothing more to learn at that node and it becomes a leaf. Otherwise keep splitting recursively (the recursion can be stopped early by specifying a maximum depth).
- Measure of impurity in a node:
    - Gini index: For classification
    - Entropy: For classification
    - MSE : For regression
- Captures non-linear relationships between features and labels / real values
- Does not require feature scaling
- At each split, only one feature is involved
- Decision region : Region of the feature space where all instances are assigned to one label / value
- Decision Boundary : Surface that separates different decision regions
- Steps of building a decision tree:
    1. Choose an attribute (column) of the dataset
    2. Calculate the significance of that attribute for splitting the data, using entropy.
        A good split leaves less entropy (disorder / randomness) in the resulting branches.
    3. Find the attribute with the most significance and use it
        to split the data
    4. For each branch, repeat the process (recursive partitioning) to get the best
        information gain (the split that reduces entropy the most).
- Limitations:
    - Can only produce axis-aligned decision boundaries (orthogonal to the feature axes)
    - Sensitive to small variations in the training set
    - High variance: an unconstrained tree tends to overfit the training set
- Solution : Ensemble learning
    - Train different models on the same dataset
    - Let each model make its prediction
    - Aggregate the predictions of the individual models (eg: hard-voting)
    - One model's weakness is covered by another model's strength in that particular task
    - Final model is a combination of models that are skillful in different ways
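
The impurity measures and the information-gain search described above can be sketched in plain Python. This is only an illustration for a single numeric feature, not how scikit-learn implements it; the helper names `gini`, `entropy`, `information_gain` and `best_threshold` are made up for this example.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted

def best_threshold(values, labels, impurity=entropy):
    """Try every midpoint between sorted feature values; return (threshold, gain)."""
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = information_gain(labels, left, right, impurity)
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy data: the two classes separate cleanly between 2.0 and 3.0
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
print(best_threshold(values, labels))
```

A real tree repeats this search over every feature at every node and then recurses into the two children. If `best_threshold` returns `None` as the threshold, no split improves purity and the node would be declared a leaf.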

### Classification Tree

```
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
# X, y: feature matrix and labels, assumed to be loaded already
# Split the dataset into 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)
# Instantiate the Classification Tree
cl_dt = DecisionTreeClassifier(max_depth=2, random_state=42, criterion='gini')
# Train the model
cl_dt.fit(X_train,y_train)
# Predict using test set
y_pred = cl_dt.predict(X_test)
# Evaluate the test set accuracy
print(accuracy_score(y_test, y_pred))
# To check for overfitting, compare the mean cross-validated log loss
# with the training-set log loss
# (the scorer returns negative log loss, so negate it)
log_loss_cv = -cross_val_score(cl_dt, X_train, y_train, cv=10, scoring='neg_log_loss', n_jobs=-1)
```

### Regression Tree

```
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import cross_val_score
# X, y: feature matrix and real-valued targets, assumed to be loaded already
# Split the dataset into 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Instantiate the Regression Tree
reg_dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.1, random_state=3)
# Train with training data
reg_dt.fit(X_train, y_train)
# Predict 
y_pred = reg_dt.predict(X_test)
# Compute RMSE for testing data
mse_reg_dt = MSE(y_test, y_pred)
rmse_reg_dt = mse_reg_dt**(1/2)
print(rmse_reg_dt)
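# To check for overfitting, compare the cross-validated RMSE with the training-set RMSE
# (the scorer returns negative MSE, so negate it)
MSE_CV = -cross_val_score(reg_dt, X_train, y_train, cv=10, scoring='neg_mean_squared_error', n_jobs=-1)
# Average the 10 fold scores, then take the square root
rmse_cv = MSE_CV.mean()**(1/2)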
```

# Chapter 2

### Bias-Variance Trade-off

<center><img src="images/02.04.png"  style="width: 400px; height: 300px;"/></center>
<center><img src="images/02.05.png"  style="width: 400px; height: 300px;"/></center>
<center><img src="images/02.03.png"  style="width: 400px; height: 300px;"/></center>


- Overfitting : 
    - Model also memorises the noise that resides within the training data.
    - Model performs well when evaluated on training data but does not perform well on unseen data
    - High variance is responsible for this error, because the model also captures noise.
    - Diagnosis: cross-validation error is much higher than the training-set error
    - Possible remedy : Decrease model complexity, gather more data
- Underfitting :
    - Model is too simple to capture the underlying pattern.
    - Model performs badly on both training and unseen data
    - Model is not flexible enough to approximate the true values
    - High bias is responsible for this error
    - Diagnosis: cross-validation error and training-set error are roughly equal, but both are undesirably high
    - Possible remedy : Increase model complexity, gather more features
- Bias-Variance trade-off :
    - Generalization error = bias^2 + variance + irreducible error (noise)
    - bias = error term that tells how, on average, the predicted value differs from the real value
    - variance = error term that tells how much the predicted value varies over different training sets
    - When model complexity increases, variance increases and bias decreases
    - When model complexity decreases, variance decreases and bias increases
    - The sweet spot is the model complexity that minimises the generalization error, which gives the best model
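
To make the bias and variance terms concrete, here is a small simulation using only the standard library. Everything in it is an assumption chosen for illustration: the true function `sin(2*pi*x)`, the noise level, the test point `x0 = 0.25`, and the two toy models (a constant predictor as the "too simple" model and a 1-nearest-neighbour predictor as the "too flexible" one). It draws many training sets, records each model's prediction at `x0`, and estimates bias^2 and variance from those predictions.

```python
import math
import random
from statistics import mean, pvariance

def true_f(x):
    """Assumed ground-truth function; unknown in a real problem."""
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=20, noise_sd=0.3):
    """Draw n noisy observations of true_f on [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def predict_constant(xs, ys, x0):
    """Low-complexity model: always predict the mean training target."""
    return mean(ys)

def predict_1nn(xs, ys, x0):
    """High-complexity model: copy the target of the nearest training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias_variance(predict, x0=0.25, n_sets=500, seed=0):
    """Estimate bias^2 and variance of predict at x0 over many training sets."""
    rng = random.Random(seed)
    preds = [predict(*sample_training_set(rng), x0) for _ in range(n_sets)]
    bias_sq = (mean(preds) - true_f(x0)) ** 2
    variance = pvariance(preds)
    return bias_sq, variance

simple = bias_variance(predict_constant)
flexible = bias_variance(predict_1nn)
print("constant model: bias^2=%.3f variance=%.3f" % simple)
print("1-NN model:     bias^2=%.3f variance=%.3f" % flexible)
```

In this setup the constant model should show the larger bias^2 and the 1-NN model the larger variance, which is the trade-off listed above. The irreducible error is the noise variance (`noise_sd ** 2`), and neither model can remove it.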

### Ensemble Learning

<center><img src="images/02.01.png"  style="width: 400px; height: 300px;"/></center>
<center><img src="images/02.02.png"  style="width: 400px; height: 300px;"/></center>

```
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.ensemble import VotingClassifier

# X, y: feature matrix and labels, assumed to be loaded already
# Split data into 70% train and 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Instantiate individual classifiers
lr = LogisticRegression(random_state=42)
knn = KNN()
dt = DecisionTreeClassifier(random_state=42)
classifiers = [('Logistic Regression', lr),
                ('K Nearest Neighbours', knn),
                ('Classification Tree', dt)]
# Instantiate an ensemble VotingClassifier 'vc'
vc = VotingClassifier(estimators=classifiers)
# Train using training set
vc.fit(X_train, y_train)
# Predict with test set
y_pred = vc.predict(X_test)
# Evaluate accuracy 
print(accuracy_score(y_test, y_pred))
```
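
`VotingClassifier` uses hard voting by default: each model predicts a label and the majority label wins. The idea can be sketched without scikit-learn as below. `hard_vote` is a hypothetical helper written for this note, and its tie-breaking (the label seen first among the models wins) is a simplification that may differ from scikit-learn's behaviour.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote across models.

    predictions: one list of predicted labels per model, all of equal length.
    Returns one voted label per sample.
    """
    voted = []
    # zip(*predictions) regroups the labels sample by sample
    for sample_preds in zip(*predictions):
        label, _count = Counter(sample_preds).most_common(1)[0]
        voted.append(label)
    return voted

# Three hypothetical models predicting labels for three samples
model_a = [0, 1, 1]
model_b = [0, 0, 1]
model_c = [1, 0, 1]
print(hard_vote([model_a, model_b, model_c]))
```

Using an odd number of models on a binary problem, as in the three-classifier ensemble above, avoids ties altogether.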