# Tutorial Overview
This set of five tutorials (installation, package setup, data setup, running, and analyzing) explains the UncertaintyForest class. After following the steps below, you should be able to run the code on your own machine and interpret the results.

If you haven't already, take a look at the first three parts of this set of tutorials: `UncertaintyForest_Tutorials_1-Installation`, `UncertaintyForest_Tutorial_2-Package-Setup`, and `UncertaintyForest_Tutorial_3-Data-Setup`.

# 4: Running
## *Goal: Train the UncertaintyForest classifier on some training data and produce a metric of accuracy on some test data*

### 1: Rerunning the Code from Before
First, we'll rerun the code from the previous parts. Recall that we imported packages, defined a function to generate data, and set some parameters for the forest.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

from tqdm.notebook import tqdm
from joblib import Parallel, delayed

from proglearn.forest import UncertaintyForest
```

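```python
def generate_data(n, d, var):
    '''
    Generate two Gaussian classes, one centered at -1 and one at +1 in every feature.

    Parameters
    ----------
    n : int
        The number of data points to generate (half per class; an odd n yields n - 1 points)
    d : int
        The number of features to generate for each data point
    var : float
        The variance of each feature

    Returns
    -------
    X : ndarray of shape (2 * (n // 2), d)
        The generated data, with all class-0 points before all class-1 points
    y : ndarray of shape (2 * (n // 2),)
        The class labels (0. or 1.)
    '''
    # class means: class 0 is centered at -1 and class 1 at +1 in every feature
    means = [np.ones(d) * -1, np.ones(d)]

    # draw n/2 points per class from an isotropic Gaussian with covariance var * I
    X = np.concatenate([np.random.multivariate_normal(mean, var * np.eye(len(mean)),
                                                      size=int(n / 2)) for mean in means])

    # label the points: the first n/2 rows get label 0, the next n/2 get label 1
    y = np.concatenate([np.ones(int(n / 2)) * mean_idx for mean_idx in range(len(means))])

    return X, y
```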
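A quick way to see what `generate_data` produces is to call it on a tiny sample. The sketch below repeats the function so it runs on its own (the sample sizes and the seed are arbitrary choices for illustration); it shows that the labels come out sorted by class and that an odd `n` is rounded down to an even number of points.

```python
import numpy as np

def generate_data(n, d, var):
    # same construction as the tutorial: two Gaussian classes centered at -1 and +1
    means = [np.ones(d) * -1, np.ones(d)]
    X = np.concatenate([np.random.multivariate_normal(mean, var * np.eye(len(mean)),
                                                      size=int(n / 2)) for mean in means])
    y = np.concatenate([np.ones(int(n / 2)) * mean_idx for mean_idx in range(len(means))])
    return X, y

np.random.seed(0)  # fixed seed so the sample is repeatable

X_small, y_small = generate_data(10, 3, 0.25)
print(X_small.shape)  # (10, 3): 10 points with 3 features each
print(y_small)        # [0. 0. 0. 0. 0. 1. 1. 1. 1. 1.]: class-0 labels come first

X_odd, y_odd = generate_data(11, 3, 0.25)
print(X_odd.shape)    # (10, 3): int(11 / 2) = 5 points per class, so one point is lost
```

The sorted label order is harmless for testing, but it matters as soon as you take a slice of the data for training.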

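```python
# Real Params.
n_train = 50         # number of training points
n_test = 10000       # number of test points
d = 100              # number of features per point
var = 0.25           # variance of each feature
num_trials = 10      # number of repeated trials (not used in this part)
n_estimators = 100   # number of trees in the forest
```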

#### We've done a lot. Can we just run it now? Yes!

### 2: Creating & Training our UncertaintyForest 
First, generate our data:

```python
X, y = generate_data(n_train + n_test, d, var)
```

Now, split that data into training and testing data. We don't want to accidentally train on our test data.

```python
# generate_data returns all class-0 points before all class-1 points, so slicing
# off the first n_train rows would give a training set containing only class 0.
# train_test_split shuffles before splitting; stratify=y keeps both classes balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=n_train, test_size=n_test, stratify=y
)
```
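Why shuffle before splitting? Because `generate_data` emits its points sorted by label, the first `n_train` rows all belong to class 0. The self-contained sketch below uses placeholder features (only the label order matters here) to compare plain slicing with scikit-learn's `train_test_split`; `random_state=0` is an arbitrary choice to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_train, n_test = 50, 10000

# labels ordered like generate_data's output: 5025 zeros followed by 5025 ones
y = np.concatenate([np.zeros(5025), np.ones(5025)])
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features for illustration only

# slicing without shuffling: the training set sees a single class
print(np.unique(y[:n_train]))  # [0.]

# shuffled, stratified split: both classes appear in equal proportion
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=n_train, test_size=n_test,
                                          stratify=y, random_state=0)
print(np.bincount(y_tr.astype(int)))  # [25 25]
```

A forest trained on the sliced version has never seen class 1, so it can only ever predict class 0.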

Then, create our forest:

```python
UF = UncertaintyForest(n_estimators=n_estimators)
```

Then fit our learner:

```python
UF.fit(X_train, y_train)
```

<proglearn.forest.UncertaintyForest at 0x158283780>
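The `<proglearn.forest.UncertaintyForest at 0x...>` line is the notebook displaying the value returned by `fit`, which, judging by the output, is the forest object itself; the hex number is a memory address and will differ on your machine. This matches the usual scikit-learn convention of `fit` returning `self`. Below is a minimal sketch of that convention using scikit-learn's `RandomForestClassifier` (already imported above) as a stand-in; it is not `UncertaintyForest` itself, and the toy data are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # toy features, for illustration only
y = np.array([0, 1] * 10)      # toy labels

clf = RandomForestClassifier(n_estimators=10, random_state=0)
returned = clf.fit(X, y)       # fit trains the model in place and returns the estimator

print(returned is clf)         # True: the same object, which is why the notebook echoes it
predictions = clf.fit(X, y).predict(X)  # returning self also allows chaining calls
```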

Well, we're done. Exciting, right?

### 3: Producing a Metric of Accuracy for Our Learner
We've now created our learner and trained it. To check whether it actually predicts class labels well, we'll create some new test data (drawn from the same distribution as the training data) and see how much of it we classify correctly.

```python
X_test, y_test = generate_data(n_test, d, var)  # creates the test data
```

```python
predictions = UF.predict(X_test)  # predict the class labels of the test data
```

To measure the learner's accuracy, we'll compare the predictions with the true test labels: count the number of correct predictions and divide by the number of test points.

```python
accuracy = np.mean(predictions == y_test)  # fraction of test points classified correctly
```
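Accuracy is just the fraction of matching labels, so there are several equivalent ways to compute it. The sketch below uses tiny made-up label arrays to show that counting matches and dividing, taking the mean of the boolean comparison, and scikit-learn's `accuracy_score` all agree.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# toy labels and predictions, for illustration only
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0])

correct = y_pred == y_true                # element-wise comparison -> boolean array
acc_manual = correct.sum() / len(y_true)  # number of True values / number of points
acc_mean = np.mean(correct)               # mean of a boolean array = fraction of True values
acc_sklearn = accuracy_score(y_true, y_pred)

print(acc_manual, acc_mean, acc_sklearn)  # 0.6 0.6 0.6
```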

And, let's take a look at our accuracy:

```python
print(accuracy)
```

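Ta-da. That's an uncertainty forest at work. The exact number will vary from run to run, since the data are random. With two balanced classes, random guessing scores about 0.5, but these classes are well separated (their means sit at -1 and +1 in each of the d = 100 features, with a variance of only 0.25), so a correctly trained forest should score well above that. If you see an accuracy of about 0.5, check that your training set contains both classes: `generate_data` returns all class-0 points before all class-1 points, so taking the first `n_train` rows without shuffling trains the forest on class 0 alone. In the next tutorial, the actual importance of uncertainty forests will be discussed.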
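How separable are these two classes? For two equally likely Gaussian classes that share the covariance `var * I`, the lowest achievable error (the Bayes error) is `Phi(-||mu1 - mu0|| / (2 * sigma))`, where `sigma = sqrt(var)` and `Phi` is the standard normal CDF; the optimal rule assigns each point to the nearer class mean. The sketch below evaluates that formula with the tutorial's `d` and `var`, then checks it empirically with the nearer-mean rule, which for means at -1 and +1 reduces to checking whether a point's features sum to more than 0. It draws points with NumPy's `Generator.normal` instead of `multivariate_normal`, which is equivalent for this diagonal covariance; the sample size and seed are arbitrary.

```python
import math
import numpy as np

d, var = 100, 0.25                   # same values as the tutorial's parameters
mu0, mu1 = -np.ones(d), np.ones(d)   # class means used by generate_data
sigma = math.sqrt(var)               # per-feature standard deviation

# Bayes error Phi(-delta / (2 * sigma)), written with the complementary error function:
# Phi(-z) = 0.5 * erfc(z / sqrt(2))
delta = np.linalg.norm(mu1 - mu0)    # = 2 * sqrt(d) = 20
bayes_error = 0.5 * math.erfc(delta / (2 * sigma) / math.sqrt(2))
print(delta, bayes_error)

# empirical check: predict class 1 when a point's features sum to more than 0
rng = np.random.default_rng(0)
n = 1000
X = np.concatenate([rng.normal(mu, sigma, size=(n // 2, d)) for mu in (mu0, mu1)])
y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
acc = np.mean((X.sum(axis=1) > 0) == y)
print(acc)
```

A Bayes error this close to zero is why an accuracy near 0.5 on this data points to a problem with the training set rather than with the difficulty of the task.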
## What's next? 
Next, we'll produce some plots to demonstrate the power of uncertainty forests.

## You're done with part 4 of the tutorial!

## Move on to part 5 (called "UncertaintyForest_Tutorial_5-Analyzing")