# Assignment: Reject Option
In this assignment, your task is to implement a reject option in order to improve the predictions of a model.
That is, data points in regions with high ambiguity should be rejected; for all other data points, the predictions of the original model should be used.
The model is trained on the full data set provided, and we only care about its performance on this data set, not about its ability to generalize.
So there is no need for cross-validation or any other experiments based on a train/validation split.
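One common way to realize such a reject option, assumed here for illustration rather than prescribed by the assignment, is to threshold the classifier's confidence: a point is rejected when its highest predicted class probability falls below a threshold. The sketch below uses synthetic overlapping blobs (`make_blobs`) as a stand-in for the assignment data; the helper `predict_with_reject` and the threshold `0.8` are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (not the assignment data): two overlapping 2-D blobs
# whose overlap creates an ambiguous region
X, y = make_blobs(n_samples=500, centers=[(-1, 0), (1, 0)], cluster_std=1.0, random_state=42)

# A small MLP trained on the full data set (no train/validation split, as in the assignment)
clf = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=42).fit(X, y)


def predict_with_reject(model, X, threshold=0.8):
    """Hypothetical helper: return the predicted label for each row of X,
    or 'reject' if the highest class probability is below `threshold`."""
    proba = model.predict_proba(X)
    labels = model.classes_[proba.argmax(axis=1)]
    # object dtype so that labels and the string 'reject' fit in one array
    out = labels.astype(object)
    out[proba.max(axis=1) < threshold] = "reject"
    return out


pred = predict_with_reject(clf, X)
accepted = pred != "reject"
print(f"rejected: {1 - accepted.mean():.1%}")
print(f"accuracy on accepted points: {(pred[accepted] == y[accepted]).mean():.3f}")
```

Confidence thresholding is only one option; a reject rule based on the model's observed errors in a region is another, and the two can disagree where the MLP is confidently wrong.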

## Prerequisites

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path

from sklearn import preprocessing
from sklearn.neural_network import MLPClassifier

from matplotlib import pyplot as plt
import seaborn as sns
sns.set_context('notebook')

random_state = 42

# general plot configuration
SMALL_SIZE = 10
MEDIUM_SIZE = 16
LARGE_SIZE = 20
HUGE_SIZE = 24

plt.rc('figure', figsize=(12, 12))        # default figure size
plt.rc('figure', titlesize=HUGE_SIZE)     # fontsize of the figure title
plt.rc('figure', titleweight='bold')      # weight of the figure title
plt.rc('font', size=MEDIUM_SIZE)          # default text sizes
plt.rc('axes', titlesize=LARGE_SIZE)      # fontsize of the axes title
plt.rc('axes', titleweight='bold')        # weight of the axes title
plt.rc('axes', labelsize=MEDIUM_SIZE)     # fontsize of the x and y labels
plt.rc('xtick', labelsize=MEDIUM_SIZE)    # fontsize of the tick labels
plt.rc('ytick', labelsize=MEDIUM_SIZE)    # fontsize of the tick labels
plt.rc('legend', fontsize=MEDIUM_SIZE)    # legend fontsize

## Specify data location

In [2]:
DATA_HOME = Path(Path.cwd(), "data")

## Importing data

In [3]:
df = sns.load_dataset("assignment_3", data_home=DATA_HOME)  # loads the local copy DATA_HOME/assignment_3.csv

print(len(df))

2157


# Your tasks
- Explore the data and look for ambiguities
- Train an MLPClassifier with *as few neurons as possible* that does not overfit (i.e., it does not need to work well on ambiguous data)
- Implement a reject option for the regions of the data where there are ambiguities and the model cannot be trusted (e.g., due to a high error rate on the training data)
- Combine everything into a system that outputs either 'reject' or one of the labels as a result
- Evaluate your system *thoroughly* and present your insights and reasoning behind your experiments
- Document your approach in this notebook (no separate slides necessary) and present it
- Discuss possible further steps that might help, but would take too long for this assignment
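The reject criterion from the task list, a high error rate according to the training data, can be sketched by remembering which training points the MLP misclassifies and rejecting any point whose k nearest training neighbours contain too many of these errors. This is one possible interpretation, not the prescribed solution: the class `RejectOptionSystem`, its parameters `hidden_units`, `k` and `max_local_error`, and the synthetic `make_blobs` data are all hypothetical. The wrapper returns either `'reject'` or a label, as the combined system should.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier


class RejectOptionSystem:
    """Hypothetical wrapper: predict a label, or 'reject' where the MLP's error
    rate among the k nearest training points exceeds max_local_error."""

    def __init__(self, hidden_units=4, k=15, max_local_error=0.3, random_state=42):
        # hidden_units is the knob to minimize ("as few neurons as possible")
        self.model = MLPClassifier(hidden_layer_sizes=(hidden_units,), max_iter=2000,
                                   random_state=random_state)
        self.k = k
        self.max_local_error = max_local_error

    def fit(self, X, y):
        self.model.fit(X, y)
        # Boolean mask of training points the model itself gets wrong
        self.train_errors_ = self.model.predict(X) != y
        self.nn_ = NearestNeighbors(n_neighbors=self.k).fit(X)
        return self

    def local_error_rate(self, X):
        # Fraction of misclassified training points among the k nearest neighbours;
        # when X is the training data, each point counts itself as a neighbour
        _, idx = self.nn_.kneighbors(X)
        return self.train_errors_[idx].mean(axis=1)

    def predict(self, X):
        out = self.model.predict(X).astype(object)
        out[self.local_error_rate(X) > self.max_local_error] = "reject"
        return out


# Synthetic stand-in data (not the assignment data)
X, y = make_blobs(n_samples=500, centers=[(-1, 0), (1, 0)], cluster_std=1.0, random_state=42)
system = RejectOptionSystem().fit(X, y)
pred = system.predict(X)
print(f"rejected: {(pred == 'reject').mean():.1%}")
```

Because the assignment evaluates on the training data itself, each query point is usually its own nearest neighbour; whether to exclude it (for example by requesting one extra neighbour and dropping the closest) is a design choice worth testing, as are `k` and `max_local_error`.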

## Tips:
- Note that there is no label for ambiguous data, so there is some room for interpretation
- Be especially careful not to reject too much (i.e., data that is not ambiguous)
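To check that a system does not reject too much, one option is to sweep the rejection threshold and track the rejection rate together with the model's accuracy on the accepted and on the rejected points. If the accuracy on the rejected points is high, the system is discarding data the model already classifies correctly. The sketch assumes the confidence-threshold rule and synthetic `make_blobs` data; the threshold grid is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (not the assignment data)
X, y = make_blobs(n_samples=500, centers=[(-1, 0), (1, 0)], cluster_std=1.0, random_state=42)
clf = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=42).fit(X, y)

proba = clf.predict_proba(X)
confidence = proba.max(axis=1)
correct = clf.classes_[proba.argmax(axis=1)] == y

rows = []
for threshold in np.linspace(0.5, 0.95, 10):
    accepted = confidence >= threshold
    rows.append({
        "threshold": threshold,
        "rejection_rate": 1 - accepted.mean(),
        # accuracy on the points the system still answers
        "acc_accepted": correct[accepted].mean() if accepted.any() else np.nan,
        # accuracy the model would have had on the rejected points;
        # values well above chance suggest the system rejects unambiguous data
        "acc_rejected": correct[~accepted].mean() if (~accepted).any() else np.nan,
    })

report = pd.DataFrame(rows)
print(report.round(3))
```

Plotting `acc_accepted` against `rejection_rate` gives an accuracy-rejection curve; a threshold is attractive where accuracy on accepted points has mostly saturated while the rejection rate is still small.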