# Task B: Meta-Learning Performance Prediction

In this task, you will use training parameters and metadata from multiple OpenML datasets to train a performance predictor that also performs well on unseen datasets. You are provided with configuration parameters and metafeatures for six datasets. The datasets are split into training and test datasets, and you should train only on the training datasets.

For questions, you can contact zimmerl@informatik.uni-freiburg.

__Note: Please use the data loading and splits provided in this notebook.__

## Specifications:

* Data: six_datasets_lw.json
* Number of datasets: 6
* Training datasets: higgs, vehicle, adult, volkert
* Test datasets: Fashion-MNIST, jasmine
* Number of configurations: 2000
* Available data: architecture parameters and hyperparameters, metafeatures 
* Target: final validation accuracy
* Evaluation metric: MSE
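
Since MSE is the evaluation metric, it may help to see it written out. The sketch below uses made-up accuracy values (not taken from the benchmark) and computes the mean of the squared residuals by hand, which is the quantity that sklearn's mean_squared_error returns for one-dimensional targets.

```python
import numpy as np

# Made-up target and predicted accuracies, for illustration only.
y_true = np.array([80.0, 65.0, 90.0])
y_pred = np.array([78.0, 70.0, 90.0])

# MSE is the mean of the squared residuals.
residuals = y_true - y_pred
mse_manual = np.mean(residuals ** 2)
print(mse_manual)
```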

## Importing and splitting data

Note: Each learning curve has 51 logged steps: 50 epochs plus epoch 0, which is recorded before any weight updates.
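
As a small illustration of that layout, the sketch below builds a made-up curve with 51 entries and picks out the entry before training and the final one. The values are invented; only the indexing mirrors how the target is read from a curve later in this notebook.

```python
import numpy as np

# Made-up validation-accuracy curve: index 0 is logged before any weight
# update, indices 1..50 correspond to the 50 training epochs.
curve = np.linspace(10.0, 85.0, 51)

initial_accuracy = curve[0]   # epoch 0, untrained network
final_accuracy = curve[-1]    # the regression target: final validation accuracy
print(len(curve), initial_accuracy, final_accuracy)
```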

In [1]:
%%capture
%cd ..
import numpy as np
import json
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

from api import Benchmark

In [2]:
bench_dir = "cached/six_datasets_lw.json"
bench = Benchmark(bench_dir, cache=False)

==> Loading data...
==> No cached data found or cache set to False.
==> Reading json data...
==> Done.


In [3]:
with open("cached/metafeatures.json", "r") as f:
    metafeatures = json.load(f)

In [4]:
# Dataset split
dataset_names = bench.get_dataset_names()
print(dataset_names)

train_datasets = ['adult', 'higgs', 'vehicle', 'volkert']
test_datasets = ['Fashion-MNIST', 'jasmine']

['Fashion-MNIST', 'adult', 'higgs', 'jasmine', 'vehicle', 'volkert']


In [5]:
# Prepare data
def read_data(datasets):
    n_configs = bench.get_number_of_configs(datasets[0])
    data = [bench.query(dataset_name=d, tag="Train/val_accuracy", config_id=ind) for d in datasets for ind in range(n_configs)]
    configs = [bench.query(dataset_name=d, tag="config", config_id=ind) for d in datasets for ind in range(n_configs)]
    dataset_names = [d for d in datasets for ind in range(n_configs)]
    
    y = np.array([curve[-1] for curve in data])
    return np.array(configs), y, np.array(dataset_names)

class TrainValSplitter():
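    """Holds out 25% of the data as a validation split, stratified by dataset name."""
    
    def __init__(self, dataset_names):
        # Use the length of the argument rather than the global X, so the class is self-contained.
        self.ind_train, self.ind_val = train_test_split(np.arange(len(dataset_names)), test_size=0.25, stratify=dataset_names)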
        
    def split(self, a):
        return a[self.ind_train], a[self.ind_val]

X, y, dataset_names = read_data(train_datasets)
X_test, y_test, dataset_names_test = read_data(test_datasets)

tv_splitter = TrainValSplitter(dataset_names=dataset_names)

X_train, X_val = tv_splitter.split(X)
y_train, y_val = tv_splitter.split(y)
dataset_names_train, dataset_names_val = tv_splitter.split(dataset_names)

print("X_train:", X_train.shape)
print("X_test:", X_test.shape)
print("X_val:", X_val.shape)

X_train: (6000,)
X_test: (4000,)
X_val: (2000,)
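
To see what stratifying by dataset name does, here is a minimal standalone sketch. It assumes four datasets with 2000 configurations each, as in the specifications, and fixes random_state (an addition not present in the notebook) so the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# One label per datapoint: 2000 configurations for each training dataset.
names = np.repeat(["adult", "higgs", "vehicle", "volkert"], 2000)

# Split indices rather than data, stratifying on the dataset label.
ind_train, ind_val = train_test_split(
    np.arange(len(names)), test_size=0.25, stratify=names, random_state=0
)

# Count how many validation points come from each dataset.
_, counts = np.unique(names[ind_val], return_counts=True)
print(len(ind_train), len(ind_val), counts)
```

With stratification, every dataset contributes the same share of its configurations to the validation split, so no dataset is over- or under-represented there.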


Each entry of X is the configuration dictionary of one training run.

__Note__: Not all parameters vary across configurations. The varying parameters are batch_size, max_dropout, max_units, num_layers, learning_rate, momentum, and weight_decay.
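
Because the configurations are dictionaries, most regressors need them converted to a numeric matrix first. The following is a minimal sketch of one way to do that, restricted to the seven varying parameters listed above. The helper name encode_configs and the rounded example values are assumptions for illustration, not part of the provided API.

```python
import numpy as np

# The seven parameters the notebook lists as varying across configurations.
VARYING_PARAMS = ["batch_size", "max_dropout", "max_units", "num_layers",
                  "learning_rate", "momentum", "weight_decay"]

def encode_configs(configs):
    """Turn an iterable of config dicts into a float matrix of shape (n, 7)."""
    return np.array([[float(c[p]) for p in VARYING_PARAMS] for c in configs])

# Hypothetical config in the notebook's format (values rounded for brevity).
example = {"batch_size": 99, "max_dropout": 0.797, "max_units": 68,
           "num_layers": 4, "learning_rate": 0.0019, "momentum": 0.208,
           "weight_decay": 0.032, "optimizer": "sgd"}

features = encode_configs([example])
print(features.shape)
```

Constant entries such as optimizer are ignored here because they carry no information. Whether to rescale or log-transform learning_rate and weight_decay is a modelling choice the notebook does not prescribe.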

In [6]:
# Take a look at one datapoint
datapoint_id = 1
config = X_train[datapoint_id]
dataset_name = dataset_names_train[datapoint_id]
example_metafeature = metafeatures[dataset_name]["NumberOfClasses"]

print("Config example:", config, sep="\n")
print("\nMeta-feature 'Number of classes':", example_metafeature)

Config example:
{'batch_size': 99, 'imputation_strategy': 'mean', 'learning_rate_scheduler': 'cosine_annealing', 'loss': 'cross_entropy_weighted', 'network': 'shapedmlpnet', 'max_dropout': 0.7970306455890422, 'normalization_strategy': 'standardize', 'optimizer': 'sgd', 'cosine_annealing_T_max': 50, 'cosine_annealing_eta_min': 1e-08, 'activation': 'relu', 'max_units': 68, 'mlp_shape': 'funnel', 'num_layers': 4, 'learning_rate': 0.0019122082568439622, 'momentum': 0.20798731345013868, 'weight_decay': 0.03236170214482822}

Meta-feature 'Number of classes': 4.0


In [7]:
# Look at some metafeatures
# Print the first five metafeatures of the example's dataset
for feature, value in list(metafeatures[dataset_name].items())[:5]:
    print(feature, value)

AutoCorrelation 0.2579881656804734
CfsSubsetEval_DecisionStumpAUC 0.8287300415025641
CfsSubsetEval_DecisionStumpErrRate 0.3191489361702128
CfsSubsetEval_DecisionStumpKappa 0.5743705558785386
CfsSubsetEval_NaiveBayesAUC 0.8287300415025641


## Training and scoring

In [8]:
class ConstantPerformancePredictor():
    """A predictor that predicts the mean of the performances seen on the training data."""
    
    def __init__(self):
        self.constant_prediction = 0
        
    def fit(self, X, y, dataset_names, metafeatures):
        self.constant_prediction = np.mean(y)
    
    def predict(self, X):
        predictions = [self.constant_prediction] * len(X)
        return predictions
    
def score(y_true, y_pred):
    return mean_squared_error(y_true, y_pred)

In [9]:
# Train and validate
predictor = ConstantPerformancePredictor()
predictor.fit(X_train, y_train, dataset_names_train, metafeatures)
preds = predictor.predict(X_val)
mse = score(y_val, preds)
print(mse)

283.5325483378039
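
The validation MSE is a single number pooled over all training datasets. A per-dataset breakdown may make it easier to see where a predictor fails. The helper mse_per_dataset below is a hypothetical addition, shown on made-up values with a constant prediction equal to the overall mean.

```python
import numpy as np

def mse_per_dataset(y_true, y_pred, dataset_names):
    """Return a dict mapping each dataset name to the MSE on its datapoints."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    dataset_names = np.asarray(dataset_names)
    scores = {}
    for name in np.unique(dataset_names):
        mask = dataset_names == name
        scores[str(name)] = float(np.mean((y_true[mask] - y_pred[mask]) ** 2))
    return scores

# Made-up accuracies for two hypothetical datasets with different accuracy levels.
y_true = np.array([80.0, 82.0, 40.0, 44.0])
names = np.array(["ds_a", "ds_a", "ds_b", "ds_b"])
y_pred = np.full(len(y_true), y_true.mean())  # constant prediction

scores = mse_per_dataset(y_true, y_pred, names)
print(scores)
```

In this toy case most of the error comes from the gap between the two datasets' accuracy levels, which a constant predictor cannot capture.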


In [10]:
# Final evaluation
final_preds = predictor.predict(X_test)
final_score = score(y_test, final_preds)
print("Final test score:", final_score)

Final test score: 477.5415315131301
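
The constant predictor ignores its inputs entirely. As a next step, one could fit a model on the configuration parameters while keeping the same fit/predict interface. The sketch below is one such baseline, a plain least-squares fit on the seven varying parameters, run on synthetic data. The class name LinearConfigPredictor, the sampling ranges, and the synthetic target are all assumptions for illustration; it uses no metafeatures and is not a recommended solution.

```python
import numpy as np

PARAMS = ["batch_size", "max_dropout", "max_units", "num_layers",
          "learning_rate", "momentum", "weight_decay"]

class LinearConfigPredictor:
    """Least-squares baseline on the varying config parameters (no metafeatures)."""

    def _encode(self, X):
        A = np.array([[float(c[p]) for p in PARAMS] for c in X])
        return np.hstack([A, np.ones((len(A), 1))])  # append a bias column

    def fit(self, X, y, dataset_names=None, metafeatures=None):
        y = np.asarray(y, dtype=float)
        self.coef_, *_ = np.linalg.lstsq(self._encode(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._encode(X) @ self.coef_

# Synthetic configs and targets; ranges and the target formula are made up.
rng = np.random.default_rng(0)
configs = np.array([
    {"batch_size": int(rng.integers(16, 512)), "max_dropout": rng.uniform(0, 1),
     "max_units": int(rng.integers(64, 1024)), "num_layers": int(rng.integers(1, 5)),
     "learning_rate": rng.uniform(1e-4, 1e-1), "momentum": rng.uniform(0.1, 0.9),
     "weight_decay": rng.uniform(1e-5, 1e-1)}
    for _ in range(200)
])
y = np.array([70 - 20 * c["max_dropout"] + 5 * c["momentum"] for c in configs])
y = y + rng.normal(0.0, 1.0, size=len(y))

# Fit on the first 150 configs, validate on the remaining 50.
model = LinearConfigPredictor().fit(configs[:150], y[:150])
mse_linear = np.mean((y[150:] - model.predict(configs[150:])) ** 2)
mse_const = np.mean((y[150:] - y[:150].mean()) ** 2)
print(mse_linear, mse_const)
```

To make use of metafeatures at prediction time, predict would also need the dataset names, which the interface in this notebook does not pass.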
