# Decision Trees, Random Forests, and some unorthodox usages
```In this exercise we will experiment with Decision Trees and Random Forests, two algorithms widely used in machine learning problems. We will cover the basics of both and then look at some less orthodox uses for them.```

```During the exercise you will be asked some questions. You can identify the questions by the appearance of a question mark. Please answer the questions in your notebook for further discussion with your instructor.
Enjoy!```

```~Ittai Haran```

## Part I
### Making things work
```Here we will experiment with Decision Trees and Random Forests. During this part you will explore their different hyperparameters and plot your results for further discussion with your instructor. Whenever an exploration task is marked with (*), you are asked to plot two graphs on the same plot: the training score against the explored hyperparameter, and the test score against it.```

```Before you start, make sure you understand how Decision Trees work (Wikipedia will do). Make sure you understand how the splits in the tree are chosen. Talk about it with your instructor.```

In [None]:
import numpy as np
import pandas as pd

from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
%matplotlib inline

```Read the dataset. It provides over a hundred variables describing attributes of life insurance applicants. The task is to predict the "Response" variable for each Id.```

In [None]:
df = pd.read_csv('data/insurance.csv')
# Keep only the non-object columns and fill missing values with 0
columns = list(df.dtypes[df.dtypes.apply(lambda x: str(x) != 'object')].index)
df = df[columns].fillna(0)
# Separate the target from the features; 'Id' is an identifier, not a feature
target = df['Response']
df = df.drop(['Response', 'Id'], axis = 1)
df_train, df_test, target_train, target_test = train_test_split(df, target, train_size = 0.7, test_size = 0.3)

```We will start with Decision Trees. Fit a simple DecisionTreeClassifier with default values, then predict on both your train set and your test set. Evaluate the model using the accuracy metric, which you can find in sklearn (sklearn.metrics.accuracy_score).```
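
A minimal sketch of this first step, not a reference solution. Since `data/insurance.csv` is not available here, it assumes sklearn's built-in breast cancer dataset as stand-in data; in the notebook you would use `df_train`, `df_test`, `target_train` and `target_test` instead. The fixed `random_state` is also an assumption, added only to make the run repeatable.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with the insurance train/test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

# A tree with default hyperparameters grows until its leaves are pure.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# Accuracy on the data the tree has seen versus data it has not seen.
train_acc = accuracy_score(y_train, tree.predict(X_train))
test_acc = accuracy_score(y_test, tree.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

A large gap between the two numbers is the sign of overfitting to look for.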

```Unfortunately, the model overfits. Let's try to do better. Try playing with the max depth of the tree (set max_depth when initializing the object). Vary the depth from 1 to 25. (*) (That means you are asked to plot some graphs, remember? :) )```
```Choose the optimal max_depth based on the graph you got.```
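
One possible shape for a (*) exploration task is sketched below: train one tree per `max_depth` value and draw the train and test scores on the same axes. It assumes the breast cancer dataset as stand-in data and a fixed `random_state`; neither comes from the exercise itself.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with the insurance train/test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

depths = list(range(1, 26))
train_scores, test_scores = [], []
for depth in depths:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    # .score() returns the mean accuracy for classifiers.
    train_scores.append(model.score(X_train, y_train))
    test_scores.append(model.score(X_test, y_test))

# Both curves on the same plot, as the (*) tasks ask for.
fig, ax = plt.subplots()
ax.plot(depths, train_scores, label="train")
ax.plot(depths, test_scores, label="test")
ax.set_xlabel("max_depth")
ax.set_ylabel("accuracy")
ax.legend()

# The depth with the highest test score (first one in case of ties).
best_depth = depths[test_scores.index(max(test_scores))]
print("best max_depth:", best_depth)
```

The same loop works for any other hyperparameter: swap the list of values and the keyword argument passed to the model.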

```Take the best max_depth you found. Now try playing with min_samples_leaf. Use the following values:
[1, 10, 100, 300, 700, 1000]. Do it also with max_depth = 20. What can we learn from the graphs? (*)```

```Decision Tree is a very nice algorithm, especially because it is very intuitive and explainable. We can even draw it!
Train a simple Decision Tree with max_depth = 3. Call it basic_tree and run the cell below.```

In [None]:
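from sklearn.tree import export_graphviz
# Write the tree in Graphviz .dot format
export_graphviz(basic_tree, out_file = 'tree.dot', filled  = True,
                rounded = True, feature_names = list(df.columns))
# Render to PNG (requires the Graphviz `dot` executable), then delete the .dot file.
# `del` is the Windows command; use `rm` on Linux/macOS.
!dot -Tpng tree.dot -o tree.png & del tree.dot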

```Look at the tree you got. Which features would you say are the most important?
Do it again, this time using the tree's feature_importances_ attribute. Make sure you understand what you see :)```
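
As a rough illustration of reading `feature_importances_`, the sketch below fits a depth-3 tree and lists its most important features by name. The breast cancer dataset is an assumed stand-in for the insurance data.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): replace with df / target from the notebook.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

basic_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# One importance per input column; the values are normalized to sum to 1.
importances = pd.Series(basic_tree.feature_importances_, index=X.columns)
top = importances.sort_values(ascending=False).head(5)
print(top)
```

A tree of depth 3 has at most 7 split nodes, so at most 7 features can receive a non-zero importance. Every other feature gets exactly 0.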

```We will now move on to Random Forest. Repeat the exploration tasks with a Random Forest of 100 trees. In addition, vary the number of trees between 10 and 400 while keeping max_depth low (*), and vary the max_features parameter between 0.1 and 1 (*). Try to explain the graphs you see. Use n_jobs = -1 in your experiments to speed up computation. Make sure you understand where your model overfits.```
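
A sketch of one of these sweeps, here over `max_features`, under the same stand-in assumption (breast cancer data instead of the insurance data). The grid values and `max_depth=5` are illustrative choices, not prescribed by the exercise.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): replace with the insurance train/test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

# A float max_features is the fraction of features considered at each split.
max_features_grid = [0.1, 0.25, 0.5, 0.75, 1.0]
scores = {}
for max_features in max_features_grid:
    forest = RandomForestClassifier(
        n_estimators=100,
        max_depth=5,
        max_features=max_features,
        n_jobs=-1,  # use all available cores
        random_state=0,
    )
    forest.fit(X_train, y_train)
    scores[max_features] = (
        forest.score(X_train, y_train),
        forest.score(X_test, y_test),
    )

for max_features, (train_acc, test_acc) in scores.items():
    print(f"max_features={max_features}: train={train_acc:.3f}, test={test_acc:.3f}")
```

With `max_features=1.0` every split sees all the features, so the trees differ only through bootstrap sampling.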

```Use the Random Forest to defeat the Decision Tree.```

## Part II
### Feature selection
```We will now use the feature_importances_ we get from our models to do a wise feature selection.```

```Do the following:```
- ```Take the best Random Forest model you got in the previous part.```
- ```Select the 20 features with the greatest feature importance.```
- ```Train a KNN model on all the features with n_neighbors = 8 (and n_jobs = -1). What is its accuracy?```
- ```Train a KNN model using only the 20 features you found, again with n_neighbors = 8 (and n_jobs = -1). What is its accuracy? What can we learn from it?```
- ```Repeat the last two tasks, this time with the best Random Forest configuration you found. Can you explain the results?```
- ```Draw a graph where the y axis is a cumulative sum of feature importances, and the x axis is the number of features you must take to reach this sum. How can this graph help you explain your results?```
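
The steps above can be sketched roughly as follows. The breast cancer dataset (30 features) is an assumed stand-in for the insurance data, and the helper `knn_accuracy` is a hypothetical name introduced only for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): replace with the insurance train/test split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X_train, y_train)

# Column indices sorted from most to least important; keep the first 20.
order = np.argsort(forest.feature_importances_)[::-1]
top_k = order[:20]


def knn_accuracy(columns):
    """Test accuracy of an 8-neighbor KNN restricted to the given columns."""
    knn = KNeighborsClassifier(n_neighbors=8, n_jobs=-1)
    knn.fit(X_train[:, columns], y_train)
    return knn.score(X_test[:, columns], y_test)


acc_all = knn_accuracy(np.arange(X.shape[1]))
acc_top = knn_accuracy(top_k)
print(f"all features: {acc_all:.3f}, top 20 features: {acc_top:.3f}")

# cumulative[i] is the importance captured by the (i + 1) best features.
cumulative = np.cumsum(forest.feature_importances_[order])
```

Plotting `cumulative` against `range(1, len(cumulative) + 1)` gives the graph asked for in the last bullet.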

```We will now implement a primitive reduction of Seffi's feature selection method (you are encouraged to ask him about it):```
- ```Take the best Random Forest model you found.```
- ```Create a dictionary that holds {feature: its importance}.```
- ```Transpose the df matrix, so we may think of the columns as 'samples' and vice versa.```
- ```Normalize each 'sample', so its length is 1.```
- ```Use KMeans with n_clusters = 20 on the 'samples'.```
- ```From each cluster take the feature with the highest feature importance.```
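
A minimal sketch of the steps listed above, as read from this description only; it is not Seffi's actual implementation. It assumes the breast cancer dataset as stand-in data, interprets "length" as the Euclidean norm, and fixes `random_state` and `n_init` for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): replace with df / target from the notebook.
data = load_breast_cancer(as_frame=True)
df, target = data.data, data.target

forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(df, target)

# {feature: its importance}
importance = dict(zip(df.columns, forest.feature_importances_))

# Transpose: each row is now one feature, described by its values over all rows of df.
features_as_samples = df.T.to_numpy(dtype=float)

# Scale every row to Euclidean length 1 (all-zero rows are left untouched).
norms = np.linalg.norm(features_as_samples, axis=1, keepdims=True)
norms[norms == 0] = 1.0
unit_rows = features_as_samples / norms

kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(unit_rows)

# From each cluster keep the member feature with the highest importance.
selected = []
for cluster in range(kmeans.n_clusters):
    members = df.columns[kmeans.labels_ == cluster]
    if len(members) > 0:
        selected.append(max(members, key=importance.get))

print(selected)
```

Features that land in the same cluster point in similar directions after normalization, so the method keeps one representative per group of similar features.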

```Repeat the KNN experiment, this time with the features you got using Seffi's method. Which features are better? Can you think of something you would change in the method described above? Why?```

## Part III
### Model explaining - LIMING things up
```We will try to explain complicated models using Decision Trees, and will get to know some concepts in the field.```

```We will use the MNIST data set.```

In [None]:
df = pd.read_csv('data/MNIST_train.csv')
target = df['label']
df = df.drop('label', axis = 1)
df_train, df_test, target_train, target_test = train_test_split(df, target, train_size = 0.7, test_size = 0.3)

```Train a LGBMClassifier on the data set, using max_depth=10, n_estimators=250 (and, of course, n_jobs = -1). How good is your model?```

```Read the short paper Model-Agnostic Interpretability of Machine Learning by Ribeiro et al., which you can find in the papers directory. We will try to make the LIME method proposed in the paper work, using a Decision Tree. We would like to train a simple, interpretable model that locally approximates the predictions of our LGBMClassifier.
Create the targets (train and test) for the interpretable model, as described in the paper.```

```Create a function that gets a sample and returns a weight function. The weight function gets a sample (or multiple samples) and returns each sample's weight, using the following formula: ``` $\text{weight}(y) = \frac{1}{|x-y|+1}$

```where x is the original sample and y is the sample whose weight we want to compute.
Note: we will use this function to define the "neighborhood" of a point. This definition is crucial in LIME.```
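
One way to write such a function is as a closure, sketched below. The name `make_weight_function` is hypothetical, and reading |x-y| as the Euclidean distance is an assumption; the exercise does not say which norm to use.

```python
import numpy as np


def make_weight_function(x):
    """Return a function that weights samples by their closeness to x."""
    x = np.asarray(x, dtype=float)

    def weight(samples):
        # Accept a single sample (1-D) or a batch of samples (2-D).
        samples = np.atleast_2d(np.asarray(samples, dtype=float))
        # Assumption: |x - y| is taken to be the Euclidean distance.
        distances = np.linalg.norm(samples - x, axis=1)
        return 1.0 / (distances + 1.0)

    return weight


x = np.array([0.0, 0.0])
weight = make_weight_function(x)
# x itself is at distance 0 (weight 1); (3, 4) is at distance 5 (weight 1/6).
print(weight([[0.0, 0.0], [3.0, 4.0]]))
```

The weight is 1 at the sample itself and decays towards 0 as the distance grows, so it never becomes negative or undefined.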

```Make sure your function behaves well: pick a sample and print the images of the 20 highest-weighted samples in the data set, and compare them to the sample you picked.```

```For every digit, do the following:```
- ```Take a sample of it from the dataset.```
- ```Our interpretable model is going to be a decision tree with max_depth = 5. Create the model.```
- ```Train it using the weights derived from the sample picked and the predictions of the complex LGBMClassifier model.```
- ```Create an empty image, and paint it with the feature importance of each pixel, which you can get from the interpretable model.```
- ```Display the image. Can you learn something from it?```
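
The loop above might look roughly like the sketch below. To stay self-contained it makes two loud substitutions: sklearn's 8x8 `load_digits` images stand in for MNIST, and a `RandomForestClassifier` stands in for the `LGBMClassifier`. Using the complex model's predicted labels as the surrogate's targets is also an assumption about how to read the paper. The function name `explain` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): 8x8 digit images, 64 pixel features per sample.
X, y = load_digits(return_X_y=True)

# Stand-in for the complex LGBMClassifier (assumption).
complex_model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
complex_model.fit(X, y)

# Targets for the interpretable model: what the complex model predicts.
complex_predictions = complex_model.predict(X)


def explain(sample):
    """Fit a locally weighted depth-5 tree and return its per-pixel importances."""
    # weight(y) = 1 / (|x - y| + 1), with |.| read as the Euclidean distance.
    weights = 1.0 / (np.linalg.norm(X - sample, axis=1) + 1.0)
    surrogate = DecisionTreeClassifier(max_depth=5, random_state=0)
    surrogate.fit(X, complex_predictions, sample_weight=weights)
    # Put each pixel's importance back at its position in the image.
    return surrogate.feature_importances_.reshape(8, 8)


importance_images = {}
for digit in range(10):
    # First sample in the data set labeled with this digit.
    sample = X[np.flatnonzero(y == digit)[0]]
    importance_images[digit] = explain(sample)
```

Each entry of `importance_images` can then be shown with `plt.imshow`, next to the sample it explains. For MNIST the reshape would be to 28x28 instead of 8x8.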

```What are the biggest problems with LIME? What makes it limited? In your answer, consider how the concept of "locality" is defined.```

```The field of interpreting models might be very important, especially when working with complex models or with unsupervised learning. This part was a small taste of it, and you are encouraged to ``` **deepen** ``` (see what I did here? :) ) your knowledge about it.```