# Uncertainty Forest (UF) Demo
This demo provides the basic use cases of the `uncertainty_forest` module.

In [1]:
import numpy as np
from uncertainty_forest.uncertainty_forest import UncertaintyForest

## Hyperparameter Specification
Random forest hyperparameters, such as the minimum leaf size and maximum depth, can be passed to the UF constructor.

In [2]:
# Notation from the current paper.
max_depth = 30        # D
min_samples_leaf = 1  # k
max_features = None   # m
n_estimators = 200    # B
max_samples = 0.5     # s // 2
bootstrap = False     # Whether to subsample with replacement.

uf = UncertaintyForest(
    max_depth=max_depth,
    min_samples_leaf=min_samples_leaf,
    max_features=max_features,
    n_estimators=n_estimators,
    max_samples=max_samples,
    bootstrap=bootstrap,
)

In [None]:
# Or, you can just use the defaults.
uf = UncertaintyForest()

## Estimate the Conditional Probability of `Y` given `X = x`
Use `X_train` and `y_train` to estimate the posterior, and evaluate this posterior at `X_eval`.

In [None]:
n_class = 25
d = 10
classes = [-1, 0, 1]

X_train = np.concatenate([np.random.multivariate_normal(k*np.ones(d), np.eye(d), n_class) for k in classes])
y_train = np.concatenate([k*np.ones(n_class) for k in classes])
print(X_train.shape)
print(y_train.shape)

X_eval = np.array([c*np.ones(d) for c in range(-2, 3)])
print(X_eval)

The computation can be parallelized across trees by passing `parallel=True` to any of the functions below (the default is `False`). For data of this size, the parallelization overhead usually outweighs the benefit.

In [None]:
uf.fit(X_train, y_train)
cond_probability = uf.predict_proba(X_eval, parallel = True)
print("P(Y|X = x) for x in X_eval:")
print(cond_probability)
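
Conceptually, the forest's estimate of `P(Y | X = x)` combines class distributions produced by the individual trees, which is what makes per-tree parallelism possible. The sketch below assumes plain averaging across trees and uses made-up per-tree posteriors (`tree_posteriors` is a hypothetical array, not an attribute of `UncertaintyForest`); it is not necessarily how the module combines its trees.

```python
import numpy as np

# Hypothetical per-tree posteriors with shape (n_trees, n_eval, n_classes).
# The values are made up for illustration: 2 trees, 2 evaluation points, 3 classes.
tree_posteriors = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.6, 0.3]],
    [[0.6, 0.4, 0.0], [0.3, 0.4, 0.3]],
])

# Assumed combination rule: average the trees' class distributions (axis 0).
# Each tree's contribution is independent, so this step can be split by tree.
forest_posterior = tree_posteriors.mean(axis=0)
print(forest_posterior)

# Every row is still a probability distribution over the classes.
print(forest_posterior.sum(axis=1))
```

Because the average of probability vectors is itself a probability vector, each row of the result sums to 1, just like the rows of `cond_probability`.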

## Estimate the Conditional Entropy
If no data is supplied, `H(Y | X)` is computed from the stored conditional probabilities of the evaluation set `X_eval`.

In [None]:
cond_entropy = uf.estimate_cond_entropy()
print("0 <= H(Y|X) = %f <= log2(3) = %f" % (cond_entropy, np.log2(3.0)))
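
One way to turn a posterior matrix like `cond_probability` into a single number is the plug-in estimate: compute the Shannon entropy (in bits) of each row and average over the evaluation points. Each row entropy lies between 0 and log2(K) for K classes, so their average obeys the same bounds printed above. The helper `plugin_cond_entropy` is a hypothetical sketch, not the module's implementation.

```python
import numpy as np

def plugin_cond_entropy(posteriors):
    """Hypothetical helper: mean Shannon entropy (bits) of the rows of an
    (n_eval, n_classes) posterior matrix."""
    p = np.asarray(posteriors, dtype=float)
    # Treat 0 * log2(0) as 0 by taking logs of positive entries only.
    logs = np.zeros_like(p)
    mask = p > 0
    logs[mask] = np.log2(p[mask])
    row_entropy = -(p * logs).sum(axis=1)
    return row_entropy.mean()

# Made-up posteriors for two evaluation points over 3 classes.
P = np.array([
    [1.0, 0.0, 0.0],        # certain prediction: 0 bits
    [1/3, 1/3, 1/3],        # uniform prediction: log2(3) bits
])
print(plugin_cond_entropy(P))  # (0 + log2(3)) / 2, about 0.792
```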

New data can be supplied directly to the `estimate_cond_entropy` function. You must provide **both** training data and evaluation data.

In [None]:
n_class = 10
d = 2

classes = range(1, 5) # Class labels can be arbitrary; they are preprocessed into {0, 1, ..., K - 1} for K classes.

X_train = np.concatenate([np.random.multivariate_normal(k*np.ones(d), 0.5*k*np.eye(d), n_class) for k in classes])
y_train = np.concatenate([k*np.ones(n_class) for k in classes])
X_eval = np.array([c*np.ones(d) for c in range(-2, 3)])

uf = UncertaintyForest()

cond_entropy = uf.estimate_cond_entropy(X_train, y_train, X_eval)
print("H(Y|X) = %f" % cond_entropy)
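
The comment in the cell above says arbitrary class labels are mapped to consecutive integers. A common way to do such a mapping in NumPy is `np.unique(..., return_inverse=True)`; whether `uncertainty_forest` uses this exact call is an assumption, and the labels below are made up.

```python
import numpy as np

# Arbitrary (made-up) class labels, like the 1.0 ... 4.0 labels above.
y = np.array([3.0, 1.0, 4.0, 1.0, 2.0])

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the position of each original label in that sorted array: 0, 1, ..., K - 1.
labels, y_encoded = np.unique(y, return_inverse=True)
print(labels)     # [1. 2. 3. 4.]
print(y_encoded)  # [2 0 3 0 1]

# The mapping is invertible: indexing the sorted labels recovers the originals.
print(labels[y_encoded])
```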

## Estimate the Mutual Information
Mutual information is estimated the same way as conditional entropy, using `estimate_mutual_info`. It accepts the same optional training and evaluation data.

In [None]:
mutual_info = uf.estimate_mutual_info()
print("I(X; Y) = %f" % mutual_info)
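
Mutual information and conditional entropy are linked by the identity `I(X; Y) = H(Y) - H(Y | X)`. The sketch below assumes `H(Y)` is the plug-in entropy of the training labels' class frequencies and takes a conditional-entropy estimate as input; the helper names and inputs are hypothetical, and this is one standard way to combine the two quantities, not necessarily how `estimate_mutual_info` computes it.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability vector, with 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mutual_info_from_estimates(y_train, cond_entropy):
    """Hypothetical helper: I(X; Y) = H(Y) - H(Y | X), where H(Y) comes
    from the empirical class frequencies of y_train."""
    _, counts = np.unique(y_train, return_counts=True)
    h_y = entropy_bits(counts / counts.sum())
    return h_y - cond_entropy

# Made-up inputs: 4 balanced classes (so H(Y) = 2 bits) and an
# assumed conditional-entropy estimate of 0.5 bits.
y = np.repeat([1, 2, 3, 4], 10)
print(mutual_info_from_estimates(y, 0.5))  # 1.5
```

Since `H(Y | X)` cannot exceed `H(Y)`, the true mutual information is nonnegative; a finite-sample estimate can still fall slightly below zero when the two terms are estimated separately.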