In [None]:
# !pip install dime-torch
import torch

# Out-of-Distribution (OOD) Detection: Examples and Strategies

## Detecting covariate shifts with Random Forests

One example from [Deep Learning for Coders with fastai and PyTorch](https://www.amazon.com/Deep-Learning-Coders-fastai-PyTorch/dp/1492045527) shows how to cleverly use a random forest (RF) to detect covariate shift, i.e., a change in the distribution of the input features between training and test data that can leave the test data OOD.

Strategy:
1. Create your train/test split. For forecasting, a good split is often NOT a random shuffle: instead, the training set covers an earlier period and the test set a later one. It is then important to check whether covariate shifts occur as things change over time.
2. Train an RF to predict whether a row comes from the training set or the test set. If the model performs well (clearly better than chance), there is a detectable shift in your data; in this example, a shift over time.
3. Examine the RF feature importances to explain what is changing over time.

Reasoning:
* RFs are fast, easy to train, and robust to most hyperparameter choices, so your results will not depend much on having picked a bad model configuration.
* Your data does need to be tabular and RF-friendly (ideally, no very high-cardinality categorical features).
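The three steps above can be sketched in a few lines. This is a minimal sketch, not the book's code: it assumes scikit-learn (not otherwise used in this notebook) and purely numeric NumPy feature arrays, and `detect_covariate_shift` is a hypothetical helper name. It scores the RF with out-of-fold ROC AUC (about 0.5 means train and test rows are indistinguishable) and ranks features by the RF's impurity-based importances. The toy data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def detect_covariate_shift(X_train, X_test, feature_names, seed=0):
    """Train an RF to tell training rows from test rows (hypothetical helper).

    Returns the out-of-fold ROC AUC (about 0.5 means no detectable shift)
    and the features sorted by importance (most shift-revealing first).
    """
    X = np.vstack([X_train, X_test])
    # Label 0 = row came from the training set, 1 = row came from the test set
    y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Out-of-fold probabilities, so the AUC is not inflated by memorization
    proba = cross_val_predict(rf, X, y, cv=5, method="predict_proba")[:, 1]
    auc = roc_auc_score(y, proba)

    # Refit on all rows to see which features separate train from test
    rf.fit(X, y)
    ranking = sorted(zip(feature_names, rf.feature_importances_),
                     key=lambda pair: pair[1], reverse=True)
    return auc, ranking


# Synthetic toy data: "trend" drifts between the two periods, "noise" does not
rng = np.random.default_rng(0)
n = 500
X_train = np.column_stack([rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n)])
X_test = np.column_stack([rng.normal(2.0, 1.0, n), rng.normal(0.0, 1.0, n)])

auc, ranking = detect_covariate_shift(X_train, X_test, ["trend", "noise"])
print(f"train-vs-test AUC: {auc:.2f}")
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Using out-of-fold predictions matters here: scoring the RF on the same rows it was fit on would overstate how separable the two sets are, even with no real shift.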

# OOD for Neural Networks

Most neural networks (NNs) operate under a "closed-world" assumption: at test time they are expected to see only the classes they were trained on. It is possible to modify a network's training so that it can detect distribution shifts, and thus flag an input as novel, but this requires re-training models that have already been trained and complicates the implementation. Here, I discuss 3 methods for detecting OOD samples that do not require re-training.

Once you have a trained model and an OOD detector, you can use them in tandem like this:

In [None]:
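import numpy as np


class OpenWorldClassifier:
    """Wraps a closed-world classifier with an OOD detector.

    `ood_detector(X)` should return a boolean array that is True for
    samples judged to be in-distribution (ID).
    """

    def __init__(self, closed_world_model, ood_detector):
        self.model = closed_world_model
        self.ood_detector = ood_detector

    def predict(self, X):
        X = np.asarray(X)

        # 1. Check which samples are 'in-distribution'
        in_distribution = np.asarray(self.ood_detector(X), dtype=bool)

        # 2. Default to 'UNKNOWN'; dtype=object keeps longer labels from being truncated
        predictions = np.full(X.shape[0], 'UNKNOWN', dtype=object)

        # 3. If ID, use the closed-world model (skipped when no sample is ID)
        if in_distribution.any():
            predictions[in_distribution] = self.model.predict(X[in_distribution])

        return predictions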

## Softmax Confidence Scores
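A minimal sketch of the maximum softmax probability (MSP) baseline (Hendrycks & Gimpel, 2017), assuming a PyTorch classifier that maps a float tensor of inputs to logits: an input is treated as in-distribution when the model's top softmax probability reaches a threshold. `max_softmax_score`, `make_softmax_detector`, and `threshold` are hypothetical names; in practice the threshold would be chosen on held-out in-distribution data. The returned callable produces a boolean NumPy mask, which is what `OpenWorldClassifier` expects from `ood_detector`.

```python
import torch
import torch.nn.functional as F


def max_softmax_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability per sample; higher means more confident."""
    return F.softmax(logits, dim=-1).max(dim=-1).values


def make_softmax_detector(model: torch.nn.Module, threshold: float):
    """Return a callable mapping inputs to a boolean in-distribution mask
    (True when the top softmax probability is at least `threshold`)."""
    def detector(X):
        model.eval()  # note: switches the model to eval mode (e.g. dropout off)
        with torch.no_grad():
            logits = model(torch.as_tensor(X, dtype=torch.float32))
        return (max_softmax_score(logits) >= threshold).cpu().numpy()
    return detector


logits = torch.tensor([[4.0, 0.0, 0.0],    # one class clearly dominates
                       [0.1, 0.0, -0.1]])  # nearly uniform over classes
print(max_softmax_score(logits))  # the first row scores far higher
```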

## Energy-based OOD
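A minimal sketch of the energy score (Liu et al., 2020), under the same assumption of a PyTorch model that outputs logits $f(x)$. The free energy is $E(x) = -T \log \sum_i e^{f_i(x)/T}$; in-distribution inputs tend to have lower energy, so the sketch returns the negative energy as a score and thresholds it. Unlike the maximum softmax probability, this score is not invariant to adding a constant to every logit, so it retains information about the overall scale of the logits that softmax discards. `energy_score`, `make_energy_detector`, `threshold`, and `temperature` are hypothetical names.

```python
import torch


def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Negative free energy, T * logsumexp(logits / T); higher means more ID-like."""
    return temperature * torch.logsumexp(logits / temperature, dim=-1)


def make_energy_detector(model: torch.nn.Module, threshold: float,
                         temperature: float = 1.0):
    """Return a callable mapping inputs to a boolean in-distribution mask."""
    def detector(X):
        model.eval()  # note: switches the model to eval mode (e.g. dropout off)
        with torch.no_grad():
            logits = model(torch.as_tensor(X, dtype=torch.float32))
        return (energy_score(logits, temperature) >= threshold).cpu().numpy()
    return detector


# Adding 4 to every logit leaves the softmax unchanged but raises the score by 4
a = torch.tensor([[4.0, 0.0, 0.0]])
b = a + 4.0
print(torch.softmax(a, dim=-1).max().item(), torch.softmax(b, dim=-1).max().item())
print(energy_score(a).item(), energy_score(b).item())
```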

## DIME