# "Mahalanobis" out-of-distribution detection

- Idea: reject a new sample as out of distribution if its Mahalanobis distance from the fitted training Gaussian exceeds a high percentile (e.g. the 99th) of the training data's own Mahalanobis distances.
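Written out for training points x_1, ..., x_n with sample mean μ̂ and sample covariance Σ̂ (the `mean` and `covariance` computed in the cells below), with Q_{0.99} denoting the empirical 99th percentile, the rule above reads:

```latex
d(x) = \sqrt{(x - \hat{\mu})^\top \hat{\Sigma}^{-1} (x - \hat{\mu})}, \qquad
\tau = Q_{0.99}\big(\{ d(x_i) \}_{i=1}^{n}\big), \qquad
\text{reject } x_{\text{new}} \iff d(x_{\text{new}}) > \tau .
```

By construction about 1% of the training points lie beyond τ, so roughly 1% of fresh in-distribution samples should be rejected as well, provided they come from the same distribution as the training data.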

## Imports

In [12]:
import plotly.express as px
# import tensorflow_datasets as tfds
import numpy as np
import sys
sys.path.append("../src/")
import gda
import scipy.stats
import torchvision

## Test data

In [2]:
# 10,000 samples from a 2-D Gaussian with mean (1, 0) and identity covariance
size = (10000, 2)
center = [1, 0]
data = np.random.normal(loc=np.array(center), size=size)

In [3]:
px.scatter(x=data[:,0], y=data[:,1])

In [4]:
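# Fit a Gaussian: sample mean and covariance (rowvar=False treats columns as variables)
mean = data.mean(axis=0)
covariance = np.cov(data, rowvar=False)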

In [29]:
import tensorflow_datasets as tfds  # the top-level import is commented out, so import it here
train, test = tfds.load('mnist', split=['train', 'test'], data_dir='../data', as_supervised=True)

In [5]:
distances = gda.mahalanobis(data, mean, covariance)
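`gda.mahalanobis` comes from the project's own `../src/gda.py`, which is not shown here. As a hypothetical stand-in (assuming it returns the unsquared distance, which is what the chi(df=2) comparison below implies), a NumPy version together with the 99th-percentile rejection rule might look like this. The helper names `fit_threshold` and `reject` are illustrative, not part of `gda`.

```python
import numpy as np


def mahalanobis(x: np.ndarray, mean: np.ndarray, covariance: np.ndarray) -> np.ndarray:
    """Unsquared Mahalanobis distance of each row of x from Gaussian(mean, covariance).

    Hypothetical stand-in for gda.mahalanobis; the real signature may differ.
    """
    diff = x - mean                              # (n, d) centred samples
    # Solve covariance @ z = diff.T rather than forming an explicit inverse.
    z = np.linalg.solve(covariance, diff.T)      # (d, n)
    squared = np.einsum("ij,ji->i", diff, z)     # row-wise dot product diff_i . z_i
    return np.sqrt(np.maximum(squared, 0.0))     # clip tiny negative round-off


def fit_threshold(train_distances: np.ndarray, percentile: float = 99.0) -> float:
    """Rejection threshold: the given percentile of the training distances."""
    return float(np.percentile(train_distances, percentile))


def reject(distances: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask: True where a sample is flagged as out of distribution."""
    return distances > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(loc=[1.0, 0.0], size=(10_000, 2))
    mu, cov = train.mean(axis=0), np.cov(train, rowvar=False)

    threshold = fit_threshold(mahalanobis(train, mu, cov))
    shifted = rng.normal(loc=[5.0, 0.0], size=(1_000, 2))  # deliberately shifted data
    print("threshold:", threshold)
    print("fraction rejected (train):  ", reject(mahalanobis(train, mu, cov), threshold).mean())
    print("fraction rejected (shifted):", reject(mahalanobis(shifted, mu, cov), threshold).mean())
```

Using `np.linalg.solve` instead of `np.linalg.inv` is the usual numerically safer choice when the covariance is poorly conditioned.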

In [6]:
px.histogram(distances)

In [7]:
# Reference distribution for Mahalanobis distances of 2-D Gaussian data
distribution = scipy.stats.chi(df=2)

In [10]:
x_vals = np.linspace(start=0, stop=4.5, num=100)
y_vals = distribution.pdf(x_vals)

In [11]:
px.line(x=x_vals, y=y_vals)
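Why the chi distribution with 2 degrees of freedom fits: if x ~ N(μ, Σ) in d dimensions, the squared Mahalanobis distance (x − μ)ᵀ Σ⁻¹ (x − μ) is a sum of d squared independent standard normals, i.e. chi-squared with d degrees of freedom, so the distance itself is chi-distributed with d degrees of freedom. That gives an analytic counterpart to the empirical 99th-percentile threshold. The sketch below regenerates data like the synthetic cells above (the fixed seed is an assumption added for reproducibility), computes the distances directly with NumPy instead of `gda.mahalanobis`, and compares the two thresholds; agreement is only up to the estimation error in the sample mean and covariance.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
d = 2
data = rng.normal(loc=[1.0, 0.0], size=(10_000, d))  # same setup as the synthetic cells
mean = data.mean(axis=0)
precision = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - mean
# Row-wise quadratic form diff_i^T @ precision @ diff_i
distances = np.sqrt(np.einsum("ij,jk,ik->i", diff, precision, diff))

empirical = np.percentile(distances, 99)
analytic = scipy.stats.chi(df=d).ppf(0.99)  # equals sqrt(-2 ln 0.01) for d = 2
print(f"empirical 99th percentile: {empirical:.3f}")
print(f"chi(df={d}) 99th percentile: {analytic:.3f}")
```

The analytic threshold is only trustworthy when the training data really are Gaussian; the empirical percentile does not need that assumption.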

## Non-synthetic data

In [16]:
data_root = "../data/"

In [17]:
mnist_train = torchvision.datasets.MNIST(root=data_root,
                                         train=True,
                                         download=True)
mnist_test = torchvision.datasets.MNIST(root=data_root,
                                         train=False,
                                         download=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz


9913344it [00:02, 4589767.78it/s]                             


Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz


29696it [00:00, 803253.23it/s]           


Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz


1649664it [00:01, 1498617.58it/s]                             


Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz


5120it [00:00, 5244160.31it/s]          

Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw
In [21]:
kmnist_train = torchvision.datasets.KMNIST(root=data_root,
                                           train=True,
                                           download=True)
kmnist_test = torchvision.datasets.KMNIST(root=data_root,
                                          train=False,
                                          download=True)

Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-images-idx3-ubyte.gz
Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-images-idx3-ubyte.gz to ../data/KMNIST/raw/train-images-idx3-ubyte.gz


18165760it [00:48, 377428.96it/s]                              


Extracting ../data/KMNIST/raw/train-images-idx3-ubyte.gz to ../data/KMNIST/raw

Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-labels-idx1-ubyte.gz
Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-labels-idx1-ubyte.gz to ../data/KMNIST/raw/train-labels-idx1-ubyte.gz


29696it [00:00, 96546.72it/s]                           


Extracting ../data/KMNIST/raw/train-labels-idx1-ubyte.gz to ../data/KMNIST/raw

Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-images-idx3-ubyte.gz
Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-images-idx3-ubyte.gz to ../data/KMNIST/raw/t10k-images-idx3-ubyte.gz


3041280it [00:08, 365251.51it/s]                             


Extracting ../data/KMNIST/raw/t10k-images-idx3-ubyte.gz to ../data/KMNIST/raw

Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-labels-idx1-ubyte.gz
Downloading http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-labels-idx1-ubyte.gz to ../data/KMNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 5120/5120 [00:00<00:00, 6093880.95it/s]

Extracting ../data/KMNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/KMNIST/raw
## Classifier model

In [22]:
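# IMAGENET1K_V1 are the weights `pretrained=True` loaded; `pretrained` is deprecated since torchvision 0.13
model = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)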

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /home/mfagan/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:15<00:00, 6.55MB/s]


In [27]:
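# Raw MNIST images are uint8 with shape (N, 28, 28), which raised "RuntimeError: Number of dimensions of repeat dims
# can not be smaller than number of dimensions of tensor". ResNet-50 expects float input of shape (N, 3, H, W).
model.eval()(mnist_train.data[0:3].unsqueeze(1).repeat(1, 3, 1, 1).float() / 255.0)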

In [25]:
mnist_train.data.shape

torch.Size([60000, 28, 28])