# kl-divergence-optim

## Optimization of final Kullback-Leibler divergence

In this notebook, we run optimization experiments on 6 datasets to find the hyperparameters that yield the lowest final KL divergence of t-SNE.

## Tools & Libraries

We use **`Python`**. The following modules are used:

* **pandas:** reading, writing and manipulating data.
* **numpy:** vectorized calculations and other relevant math functions.
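* **scipy:** scientific computing functions, including statistics (here, Spearman rank correlation).
* **matplotlib & seaborn:** data visualization.
* **sklearn:** comprehensive machine learning library (t-SNE, PCA and evaluation metrics).
* **hdbscan:** density-based clustering, used to evaluate the embeddings.
* **hyperopt:** random search and TPE for hyperparameter optimization.
* **bayes_opt (`BayesianOptimization`):** Bayesian optimization with Gaussian Processes.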

In [None]:
# opening up a console as the notebook starts
%qtconsole

# making plots stay on the notebook (no extra windows!)
%matplotlib inline

# show figures with highest resolution 
%config InlineBackend.figure_format = 'retina'

# changing working directory
import os
os.chdir('C:\\Users\\Guilherme\\Documents\\TCC\\tsne-optim')

# importing modules
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
from bayes_opt import BayesianOptimization
from hyperopt import fmin, rand, tpe, hp, STATUS_OK, Trials
from hdbscan import HDBSCAN
from sklearn.metrics import adjusted_mutual_info_score
from scipy.stats import spearmanr
from sklearn.metrics.pairwise import euclidean_distances

## 1. Search spaces

Let us define our hyperparameter search space for both **`hyperopt`** and **`BayesianOptimization`**. We also define a **baseline** set of parameters, based on those used by van der Maaten and Hinton in the [2008 paper](http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf) that first proposed t-SNE.
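
As a rough illustration of what the hyperopt expressions in the following cells draw, the sketch below emulates them with plain numpy. `hp.uniform(label, low, high)` is a uniform draw. `hp.quniform(label, low, high, q)` is documented as `round(uniform(low, high) / q) * q`, so it still returns a float. An `hp.choice` over a single option always returns that option, which is how perplexity, PCA dims and whitening are held fixed. The helper `sample_kl_space` is hypothetical and is not part of hyperopt.

```python
import numpy as np

def sample_kl_space(rng, perp=30, pca_dims=30, whitening=False):
    """Hypothetical numpy stand-in for one draw from the hyperopt search space."""
    return {
        # hp.choice over a single option: the value never changes
        'perplexity': perp,
        # hp.uniform(low, high)
        'early_exaggeration': rng.uniform(1.0, 12.0),
        'learning_rate': rng.uniform(50, 1000),
        # hp.quniform(low, high, q) -> round(uniform(low, high) / q) * q, still a float
        'n_iter': np.round(rng.uniform(200, 5000) / 100) * 100,
        'angle': rng.uniform(0.2, 0.8),
        'pca_dims': pca_dims,
        'whitening_flag': whitening,
    }

rng = np.random.RandomState(0)
draws = [sample_kl_space(rng) for _ in range(1000)]
```

Because `n_iter` comes back as a float such as `2300.0`, the hyperopt wrapper further down casts it to `int` before passing it to t-SNE.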

In [None]:
# choice of perplexity, PCA dims and whitening
perp_default = 30; pca_dims_default = 30; whitening_default = False

In [None]:
# search space definition - hyperopt #

space_hp = {'perplexity': hp.choice('perp',[perp_default]), # as we are optimizing KL divergence, perplexity stays fixed
            'early_exaggeration': hp.uniform('exag', 1.0, 12.0),
            'learning_rate': hp.uniform('lr', 50, 1000),
            'n_iter': hp.quniform('ni', 200, 5000, 100),
            'angle': hp.uniform('angle', 0.2, 0.8),
            'pca_dims': hp.choice('pca', [pca_dims_default]), # as we are optimizing KL divergence, PCA dims stay fixed
            'whitening_flag': hp.choice('white', [whitening_default])} # as we are optimizing KL divergence, whitening stays fixed

In [None]:
# search space definition - bayesian optimization #

space_bo = {'perplexity': (perp_default,perp_default), # as we are optimizing KL divergence, perplexity stays fixed
            'early_exaggeration': (1.0, 12.0),
            'learning_rate': (50, 1000),
            'n_iter': (200, 5000),
            'angle': (0.2,0.8),
            'pca_dims': (pca_dims_default,pca_dims_default),
            'whitening_flag': (int(whitening_default), int(whitening_default))}

In [None]:
# set of hyperparameters definition - baseline #

space_base = {'perplexity': 30,
              'early_exaggeration': 4.0,
              'learning_rate': 100,
              'n_iter': 1000,
              'angle': 0.5,
              'pca_dims': 30,
              'whitening_flag': False}

## 2. Target function

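Let us define our optimization objective. `optim_target` reduces the data with PCA, runs t-SNE `n_runs` times with different random seeds, and keeps the run with the lowest final KL divergence, which is the value being optimized. It also reports auxiliary quality measures. The first is the adjusted mutual information (AMI) between HDBSCAN clusterings of the high- and low-dimensional data. The second is the AMI of each clustering with the target labels. The third is a rank-order correlation between the arrangements of cluster centers in both spaces.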
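
For reference, the quantity being minimized is the t-SNE cost $KL(P\|Q) = \sum_{i \neq j} p_{ij} \log(p_{ij}/q_{ij})$, where $q_{ij}$ uses a Student-t kernel on the embedding. The numpy sketch below computes it exactly for a small embedding. It is illustrative only, for two reasons. First, scikit-learn's `kl_divergence_` comes from its own optimizer, which by default uses the Barnes-Hut approximation controlled by `angle`, so the reported value may be approximate. Second, a second embedding's affinities stand in here for the perplexity-calibrated high-dimensional $P$. That substitution is an assumption made to keep the example short.

```python
import numpy as np

def student_t_affinities(Y):
    """Joint probabilities q_ij of a t-SNE embedding Y (Student-t kernel, 1 dof)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(kernel, 0.0)  # q_ii is defined as 0
    return kernel / kernel.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) = sum p_ij log(p_ij / q_ij); terms with p_ij = 0 contribute 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

rng = np.random.RandomState(42)
Y_ref = rng.normal(size=(50, 2))
P = student_t_affinities(Y_ref)   # stand-in for the high-dimensional affinities (assumption)
Q_same = student_t_affinities(Y_ref)
Q_other = student_t_affinities(rng.normal(size=(50, 2)))

print(kl_divergence(P, Q_same))   # 0.0: identical distributions
print(kl_divergence(P, Q_other))  # strictly positive
```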

In [None]:
# defining cost function: KL divergence #

# the function takes a search space sample as parameter #
def optim_target(data, perplexity, early_exaggeration, learning_rate, n_iter, angle, pca_dims, whitening_flag, n_runs=3):    
    
    # setting random seed
    np.random.seed(42)
    
    # store target info
    target_var = data['TARGET']; data = data.drop('TARGET', axis=1)
    
    # HDBSCAN to compute clusters on high dimensional space
    clusterer_highd = HDBSCAN(min_cluster_size=10, min_samples=1, allow_single_cluster=True)

    # clustering points
    cluster_assign_highd = clusterer_highd.fit_predict(PCA(n_components=100).fit_transform(data))        
    
    # PCA first to reduce data
    reducer = PCA(n_components=pca_dims, whiten=whitening_flag)
    
    # reducing
    reduced_data = reducer.fit_transform(data)
    
    # let us run t-SNE n_runs times (different seeds) and keep the lowest KL div #
    
    # best KL divergence so far, initialized with +infinity
    KL_div = np.inf
    
    # loop for each t-SNE run
    for i in range(n_runs):
        
        # configuring t-SNE
        embedder = TSNE(perplexity=perplexity, early_exaggeration=early_exaggeration,
                        learning_rate=learning_rate, n_iter=n_iter,
                        angle=angle, random_state=i)
        
        # fitting
        temp_data = embedder.fit_transform(reduced_data)
    
        # KL divergence result after optimization
        temp_div = embedder.kl_divergence_
        
        # if smaller than last experiment, update
        if temp_div < KL_div:
            
            # updating values
            KL_div = temp_div
            embedded_data = temp_data
    
    # data frame from embedded_data
    embedded_data = pd.DataFrame(embedded_data, columns=['x', 'y'])
    
    # computing adjusted mutual information over clusterings #

    # HDBSCAN to compute clusters on embedded space
    clusterer_lowd = HDBSCAN(min_cluster_size=10, min_samples=1, allow_single_cluster=True)

    # clustering points - low-dim
    cluster_assign_lowd = clusterer_lowd.fit_predict(embedded_data)

    # adjusted mutual info score
    AMI_score = adjusted_mutual_info_score(cluster_assign_highd, cluster_assign_lowd)
    
    # adjusted mutual info on target and dimensions
    AMI_target_highd = adjusted_mutual_info_score(target_var, cluster_assign_highd)
    AMI_target_lowd = adjusted_mutual_info_score(target_var, cluster_assign_lowd)

    # computing global geometry #
    
    # handling errors raised by empty or degenerate clusterings
    try:
        # cluster centers on high dimensional space (noise label -1 excluded, looked up by label)
        data['assignment'] = cluster_assign_highd; c_groups = data.groupby('assignment').mean()
        centers_highd = [np.array(c_groups.loc[i]) for i in c_groups.index if not i == -1]

        # distances on high dimensional space
        dists_highd = euclidean_distances(centers_highd)
        closest_highd = [np.argsort(dists_highd[:,i])[1:] for i in range(dists_highd.shape[0])]
        closest_highd_df = pd.DataFrame(np.array(closest_highd))

        # centers on low dimensional space, using the high dimensional cluster assignment
        embedded_data['assignment'] = cluster_assign_highd; c_groups = embedded_data.groupby('assignment').mean()
        centers_lowd = [np.array(c_groups.loc[i]) for i in c_groups.index if not i == -1]

        # distances on low dimensional space
        dists_lowd = euclidean_distances(centers_lowd)
        closest_lowd = [np.argsort(dists_lowd[:,i])[1:] for i in range(dists_lowd.shape[0])]
        closest_lowd_df = pd.DataFrame(np.array(closest_lowd))

        # correlations
        rank_order_cor = [spearmanr(closest_lowd_df.iloc[i,:], closest_highd_df.iloc[i,:]).correlation for i in closest_lowd_df.index]
        rank_order_cor_score = np.mean(rank_order_cor)
        
        # if missing value, make it -1.0
        if pd.isnull(rank_order_cor_score):
            rank_order_cor_score = -1.0
        
    # lowest value for rank-order corr if error
    except ValueError:
        rank_order_cor_score = -1.0
    
    # organizing parameters to return
    params = {'perplexity': perplexity,
              'early_exaggeration': early_exaggeration,
              'learning_rate': learning_rate,
              'n_iter': n_iter,
              'angle': angle,
              'pca_dims': pca_dims,
              'whitening_flag': whitening_flag}
    
    # adding target variable to embedded data
    embedded_data.loc[:,'target'] = target_var
    
    # printing results
    print 'KL divergence:', KL_div, '| AMI score:', AMI_score
    print 'AMI target-highd:', AMI_target_highd, '| AMI target-lowd:', AMI_target_lowd
    print 'Rank-order correlation:', rank_order_cor_score
    print 'Parameters:', params
    print ' '

    # returning values
    return KL_div, AMI_score, AMI_target_highd, AMI_target_lowd, rank_order_cor_score, embedded_data, params
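
The rank-order correlation above compares how cluster centers are arranged relative to each other in the two spaces. A minimal numpy sketch of a closely related formulation follows. For every center, it computes the Spearman correlation between that center's distances to the other centers in the high- and low-dimensional spaces, then averages over centers. Note that this correlates the ranks of the distances themselves, whereas the notebook's code correlates the orderings of neighbour indices returned by `np.argsort`. The function name `mean_center_rank_corr` is hypothetical, and ties between distances are assumed absent.

```python
import numpy as np

def _ranks(x):
    """Ranks 0..n-1 of the entries of x (assumes no ties)."""
    return np.argsort(np.argsort(x)).astype(float)

def mean_center_rank_corr(centers_high, centers_low):
    """Average per-center Spearman correlation of distance profiles (needs >= 3 centers)."""
    d_high = np.linalg.norm(centers_high[:, None, :] - centers_high[None, :, :], axis=-1)
    d_low = np.linalg.norm(centers_low[:, None, :] - centers_low[None, :, :], axis=-1)
    n = d_high.shape[0]
    corrs = []
    for i in range(n):
        others = np.arange(n) != i          # drop the center's distance to itself
        r_high = _ranks(d_high[i, others])
        r_low = _ranks(d_low[i, others])
        # Spearman correlation = Pearson correlation of the ranks
        corrs.append(np.corrcoef(r_high, r_low)[0, 1])
    return float(np.mean(corrs))

rng = np.random.RandomState(0)
high = rng.normal(size=(8, 100))
print(mean_center_rank_corr(high, 2.0 * high))                 # same arrangement: ~1.0
print(mean_center_rank_corr(high, rng.normal(size=(8, 2))))    # unrelated arrangement
```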

In [None]:
# testing the function #

# let us use the COIL-20 dataset with the baseline parameters
test_df = pd.read_csv('data/final/coil-20.csv')

# keeping the target variable (optim_target drops it from the features internally)
test_target = test_df['TARGET']

# running target function
kl_div, ami, ami_th, ami_tl, spearman, embed, params = optim_target(test_df, n_runs=1, **space_base)

In [None]:
# let us study the result #
sns.lmplot(x='x', y='y', hue='target', data=embed, fit_reg=False, size=6, aspect=1.6); plt.title('Test of target function');

## 3. Wrappers

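We need wrappers around the target function so that it fits the calling conventions of the two optimization packages. hyperopt minimizes a function of a single parameter dictionary and expects a result dictionary. `BayesianOptimization` passes the parameters as keyword arguments and maximizes the returned value.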
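
The two calling conventions can be shown side by side without either library, around a hypothetical toy objective (`toy_kl`, standing in for `optim_target`). hyperopt's `fmin` passes one dictionary sampled from the space and expects a dictionary with at least `'loss'` and `'status'`. `BayesianOptimization` passes keyword arguments and maximizes a scalar. The sketch also shows the sign flip, and gives a failure a value that is the worst possible in each optimizer's own direction.

```python
STATUS_OK, STATUS_FAIL = 'ok', 'fail'  # same string values as hyperopt's constants

def toy_kl(learning_rate, n_iter):
    """Hypothetical stand-in for optim_target: returns a fake 'KL divergence'."""
    if n_iter <= 0:
        raise ValueError('n_iter must be positive')
    return abs(learning_rate - 200.0) / 1000.0 + 1000.0 / n_iter

def hp_style_target(space):
    """hyperopt convention: one dict in, a dict with 'loss' and 'status' out (minimized)."""
    try:
        loss = toy_kl(space['learning_rate'], int(space['n_iter']))  # quniform yields floats
        return {'loss': loss, 'status': STATUS_OK}
    except ValueError:
        return {'loss': 1e9, 'status': STATUS_FAIL}   # very large = worst when minimizing

def bo_style_target(learning_rate, n_iter):
    """bayes_opt convention: keyword arguments in, a scalar out (maximized)."""
    try:
        return -toy_kl(learning_rate, int(n_iter))     # maximizing -KL minimizes KL
    except ValueError:
        return -1e9                                    # very small = worst when maximizing
```

Both calls agree up to sign: `hp_style_target({'learning_rate': 200.0, 'n_iter': 1000.0})['loss']` is `1.0` and `bo_style_target(learning_rate=200.0, n_iter=1000.0)` is `-1.0`.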

In [None]:
# wrapper for hyperopt #

# hyperopt minimizes its objective, so the KL divergence is returned directly as the loss
class hp_wrapper:
    
    def __init__(self, data, save_path=None, n_runs=5):
        self.data = data
        self.save_path = save_path
        self.n_runs = n_runs
    
    def target(self, space):
        
        # pre-processing space
        space['pca_dims'] = int(space['pca_dims'])
        space['n_iter'] = int(space['n_iter'])
        
        # trying to ignore errors
        try:        
            # running target function
            kl_div, ami, ami_th, ami_tl, spearman, embed, params = optim_target(self.data, n_runs=self.n_runs, **space)    

            # if we want to save
            if self.save_path is not None:

                # creating path if necessary
                if not os.path.exists(self.save_path):
                    os.makedirs(self.save_path)

                # save name of the plot
                save_name = str(max([int(e.split('.')[0]) for e in os.listdir(self.save_path)]+[-1]) + 1) + '.png'

                # title of the plot
                plot_title = 'KL divergence: {0:.3f} | AMI score: {1:.3f} | AMI target-highd: '.format(kl_div, ami) + \
                             '{0:.3f} | AMI target-lowd: {1:.3f} | Rank-order correlation: {2:.3f}'.format(ami_th, ami_tl, spearman)

                # subtitle showing parameters
                subtitle = '{}'.format(params)

                # creating plot
                fig = sns.lmplot(x='x', y='y', hue='target', data=embed, fit_reg=False, size=9, aspect=1.6, legend=False); 
                plt.title(plot_title); plt.xlabel(subtitle)

                # saving
                fig.savefig(os.path.join(self.save_path,save_name))
                
                # do not show plot on jupyter
                plt.close()

            # a dict with 'loss' and 'status' is required
            return {'loss': kl_div,
                    'status': STATUS_OK,
                    'parameters': params,
                    'embedding': embed}
        
        # catching exception
        except Exception as e:
            print 'An Error occurred:', e
            
            # a dict with 'loss' and 'status' is required
            return {'loss': 1e9,
                    'status': 'fail',
                    'parameters': space,
                    'embedding': 'error'}
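
Both wrappers, the hyperopt one above and the `BayesianOptimization` one below, name each saved figure with the next integer after the largest numeric file name already in `save_path`. Here is a hedged sketch of that rule as a standalone function; `next_plot_name` is hypothetical. Unlike the one-liner in the wrappers, it skips file names that do not start with a number, so a stray file in the folder does not raise a `ValueError`.

```python
def next_plot_name(filenames, ext='.png'):
    """Return '<k><ext>' where k is one more than the largest numeric file stem (0 if none)."""
    numbers = []
    for name in filenames:
        stem = name.split('.')[0]
        if stem.isdigit():          # ignore e.g. 'notes.txt' or '.DS_Store'
            numbers.append(int(stem))
    return str(max(numbers + [-1]) + 1) + ext
```

Inside a wrapper it would be called as `next_plot_name(os.listdir(self.save_path))`.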


In [None]:
# wrapper for BayesianOptimization #

# the Gaussian process proposes continuous (float) values, so integer parameters are cast to int
# and the whitening flag is thresholded at 0.5
class bo_wrapper:
    
    def __init__(self, data, save_path=None, n_runs=5):
        self.data = data
        self.save_path = save_path
        self.n_runs = n_runs
        
    def target(self, perplexity, early_exaggeration, learning_rate, n_iter, angle, pca_dims, whitening_flag):
    
        # pre-processing space
        pca_dims = int(pca_dims)
        n_iter = int(n_iter)

        # proxy for whitening flag
        if whitening_flag < 0.50:
            whitening_flag = False
        else:
            whitening_flag = True
        
        # handling errors
        try:
    
            # running target function
            kl_div, ami, ami_th, ami_tl, spearman, embed, params = optim_target(self.data, perplexity, early_exaggeration, 
                                                                                learning_rate, n_iter, angle, pca_dims, 
                                                                                whitening_flag, n_runs=self.n_runs)
            # if we want to save
            if self.save_path is not None:

                # creating path if necessary
                if not os.path.exists(self.save_path):
                    os.makedirs(self.save_path)

                # save name of the plot
                save_name = str(max([int(e.split('.')[0]) for e in os.listdir(self.save_path)]+[-1]) + 1) + '.png'

                # title of the plot
                plot_title = 'KL divergence: {0:.3f} | AMI score: {1:.3f} | AMI target-highd: '.format(kl_div, ami) + \
                             '{0:.3f} | AMI target-lowd: {1:.3f} | Rank-order correlation: {2:.3f}'.format(ami_th, ami_tl, spearman)

                # subtitle showing parameters
                subtitle = '{}'.format(params)

                # creating plot
                fig = sns.lmplot(x='x', y='y', hue='target', data=embed, fit_reg=False, size=9, aspect=1.6, legend=False); 
                plt.title(plot_title); plt.xlabel(subtitle)

                # saving
                fig.savefig(os.path.join(self.save_path,save_name))
                
                # do not show plot on jupyter
                plt.close()

            # returning the target value: negative because BO maximizes functions
            return -kl_div
        
        # catching exception
        except Exception as e:
            print 'An Error occurred:', e
            
            # BO maximizes, so a failed run gets a very low value
            return -1e9

In [None]:
# testing hyperopt wrapper
opt_task = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/test_figs', n_runs=1)
opt_task.target(space_base)['loss']

In [None]:
# testing bayes_opt wrapper
opt_task = bo_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/test_figs', n_runs=1)
opt_task.target(**space_base)

## 4. Optimization Tasks

Let us now create and execute the optimization tasks. We run three procedures (Random Search, TPE and Gaussian Processes) on each of the 6 datasets, giving each optimizer 50 evaluations. The analysis is organized by dataset.
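
The cells below repeat the same three runs for each dataset, so one possible way to organize them is a single loop that fills `experiment_dict`. The sketch is hypothetical. `run_optimizer` stands in for one per-method cell: reading the dataset's CSV, then either hyperopt `fmin` with `rand.suggest` or `tpe.suggest`, or `BayesianOptimization.maximize`. The dataset and method keys mirror the ones used in this notebook.

```python
DATASETS = ['well_sep', 'well_sep_noise', 'gaussian_noise', 'topology', 'coil_20', 'olivetti']
METHODS = ['rand', 'tpe', 'gp']

def run_all(run_optimizer, datasets=DATASETS, methods=METHODS):
    """Fill a nested results dict indexed as results[dataset][method]."""
    results = {dataset: {} for dataset in datasets}
    for dataset in datasets:
        for method in methods:
            # hypothetical callback standing in for one optimization cell
            results[dataset][method] = run_optimizer(dataset, method)
    return results

# usage with a dummy callback that just records what it was asked to run
fake = run_all(lambda dataset, method: {'dataset': dataset, 'method': method})
```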

In [None]:
# let us define an experiment dict #
# and index the results by dataset and optimization method #
# e.g. experiment_dict['well_sep']['rand'] refers to the results #
# of random search on the well-separated clusters data #

# defining and initializing the dict
datasets = ['well_sep', 'well_sep_noise', 'gaussian_noise', 'topology', 'coil_20', 'olivetti']
optim_methods = ['rand', 'tpe', 'gp']
experiment_dict = {dataset: {optim: dict() for optim in optim_methods} for dataset in datasets}

# checking the result
experiment_dict

### 4.1 Well-separated clusters

Set of 8 well-separated gaussian blobs in a 100-dimensional space.
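
The CSV itself is read from disk, and its generation settings are not given here. Purely as an illustration of this kind of data, the sketch below draws 8 isotropic Gaussian blobs in 100 dimensions with numpy. The number of points per blob, the spread of the centers and the within-blob standard deviation are all assumptions. With these settings, every point ends up closer to its own blob's center than to any other center, which is what "well-separated" means here.

```python
import numpy as np

def make_separated_blobs(n_per_blob=100, n_blobs=8, dim=100, center_scale=10.0, std=1.0, seed=0):
    """Hypothetical generator: n_blobs Gaussian blobs in `dim` dimensions, with labels."""
    rng = np.random.RandomState(seed)
    centers = rng.normal(scale=center_scale, size=(n_blobs, dim))
    X = np.vstack([c + rng.normal(scale=std, size=(n_per_blob, dim)) for c in centers])
    y = np.repeat(np.arange(n_blobs), n_per_blob)   # plays the role of the TARGET column
    return X, y, centers

X, y, centers = make_separated_blobs()
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
nearest = dists.argmin(axis=1)   # index of the closest center for every point
```

To mimic the layout that `optim_target` expects, the labels would be stored in a `TARGET` column next to the features.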

In [None]:
# reading the data
test_df = pd.read_csv('data/final/well-sep.csv')

In [None]:
# random search #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/well_sep_rand')

# trials object stores the evaluations
trials_rand = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=rand.suggest, space=space_hp, max_evals=50, trials=trials_rand)

# storing the results
experiment_dict['well_sep']['rand'] = [trials_rand.trials[i]['result'] for i in range(len(trials_rand.trials))]

In [None]:
# TPE #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/well_sep_tpe')

# trials object stores the evaluations
trials_tpe = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=tpe.suggest, space=space_hp, max_evals=50, trials=trials_tpe)

# storing the results
experiment_dict['well_sep']['tpe'] = [trials_tpe.trials[i]['result'] for i in range(len(trials_tpe.trials))]

In [None]:
# gaussian processes #

# initializing wrapper - BayesianOptimization
opt_task_bo = bo_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/well_sep_gp')

# defining optimization object
bo = BayesianOptimization(opt_task_bo.target, space_bo, verbose=0)

# optimizing
bo.maximize(init_points=10, n_iter=40, acq='ucb', kappa=10)

# writing results
experiment_dict['well_sep']['gp'] = bo.res['all']

In [None]:
# computing figures of the optimization process #

# function to transform the experiment dict into a data frame - hyperopt #
def exp_dict_to_df_hp(exp_dict):

    # dictionary to store intermediate values
    temp_dict = dict()

    # for each experiment
    for dict_entry in exp_dict:

        # add parameters to dict
        for key in dict_entry['parameters'].keys():
            try:
                temp_dict[key].append(dict_entry['parameters'][key])
            except KeyError:
                temp_dict[key] = [dict_entry['parameters'][key]]

        # add loss to dict
        try:
            temp_dict['loss'].append(dict_entry['loss'])
        except KeyError:
            temp_dict['loss'] = [dict_entry['loss']]
    
    # return a data frame
    return pd.DataFrame(temp_dict)

# function to transform the experiment dict into a data frame - bayesian optim #
def exp_dict_to_df_bo(exp_dict):

    # dictionary to store intermediate values
    temp_dict = dict()

    # for each experiment
    for dict_entry in exp_dict['params']:

        # add parameters to dict
        for key in dict_entry.keys():
            try:
                temp_dict[key].append(dict_entry[key])
            except KeyError:
                temp_dict[key] = [dict_entry[key]]
    
    # adding loss
    temp_dict['loss'] = exp_dict['values']
    
    # correcting whitening_flag
    temp_dict['whitening_flag'] = [False if e < 0.50 else True for e in temp_dict['whitening_flag']]
    
    # return a data frame
    return pd.DataFrame(temp_dict)

# random search #

# results data frame
rand_res_df = exp_dict_to_df_hp(experiment_dict['well_sep']['rand'])

# plotting hyperparameter importance
g = sns.PairGrid(rand_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/well_sep_rand.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(rand_res_df['loss'])), rand_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (Random Search)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/well_sep_rand.pdf');

# tpe #

# results data frame
tpe_res_df = exp_dict_to_df_hp(experiment_dict['well_sep']['tpe'])

# plotting hyperparameter importance
g = sns.PairGrid(tpe_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/well_sep_tpe.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(tpe_res_df['loss'])), tpe_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (TPE)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/well_sep_tpe.pdf');

# gaussian processes #

# results data frame
gp_res_df = exp_dict_to_df_bo(experiment_dict['well_sep']['gp'])

# plotting hyperparameter importance
g = sns.PairGrid(gp_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/well_sep_gp.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(gp_res_df['loss'])), -gp_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (GP)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/well_sep_gp.pdf');

In [None]:
# showing best embedded spaces #
print "Random Search best:", [i for i,e in enumerate(experiment_dict['well_sep']['rand']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['well_sep']['rand']])]
print "TPE best:", [i for i,e in enumerate(experiment_dict['well_sep']['tpe']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['well_sep']['tpe']])]
print "GP best:", [i for i,e in enumerate(experiment_dict['well_sep']['gp']['values']) if e == np.max(experiment_dict['well_sep']['gp']['values'])]
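
The three `print` lines above list every round that attains the best value: the lowest loss for hyperopt, and the highest `-KL` for GP. A hedged numpy sketch of the same lookup as one hypothetical helper, `best_rounds`, which handles the GP values by negating them first:

```python
import numpy as np

def best_rounds(losses):
    """Indices of all rounds whose loss equals the minimum (ties included)."""
    losses = np.asarray(losses, dtype=float)
    return [int(i) for i in np.flatnonzero(losses == losses.min())]

# hyperopt results: a list of dicts with a 'loss' key
hp_results = [{'loss': 1.3}, {'loss': 0.9}, {'loss': 1.1}, {'loss': 0.9}]
print(best_rounds([r['loss'] for r in hp_results]))   # [1, 3]

# GP results store -KL in 'values', so negate before taking the minimum
gp_values = [-1.2, -0.8, -1.0]
print(best_rounds([-v for v in gp_values]))           # [1]
```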

### 4.2 Well-separated clusters with noise

Set of 8 well-separated gaussian blobs in a 100-dimensional space under uniform noise.

In [None]:
# reading the data
test_df = pd.read_csv('data/final/well-sep-noise.csv')

In [None]:
# keeping the target variable (optim_target drops it from the features internally)
test_target = test_df['TARGET']

In [None]:
# random search #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/well_sep_noise_rand')

# trials object stores the evaluations
trials_rand = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=rand.suggest, space=space_hp, max_evals=50, trials=trials_rand)

# storing the results
experiment_dict['well_sep_noise']['rand'] = [trials_rand.trials[i]['result'] for i in range(len(trials_rand.trials))]

In [None]:
# TPE #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/well_sep_noise_tpe')

# trials object stores the evaluations
trials_tpe = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=tpe.suggest, space=space_hp, max_evals=50, trials=trials_tpe)

# storing the results
experiment_dict['well_sep_noise']['tpe'] = [trials_tpe.trials[i]['result'] for i in range(len(trials_tpe.trials))]

In [None]:
# gaussian processes #

# initializing wrapper - BayesianOptimization
opt_task_bo = bo_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/well_sep_noise_gp')

# defining optimization object
bo = BayesianOptimization(opt_task_bo.target, space_bo, verbose=0)

# optimizing
bo.maximize(init_points=10, n_iter=40, acq='ucb', kappa=10)

# writing results
experiment_dict['well_sep_noise']['gp'] = bo.res['all']

In [None]:
# computing figures of the optimization process #

# random search #

# results data frame
rand_res_df = exp_dict_to_df_hp(experiment_dict['well_sep_noise']['rand'])

# plotting hyperparameter importance
g = sns.PairGrid(rand_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/well_sep_noise_rand.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(rand_res_df['loss'])), rand_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (Random Search)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/well_sep_noise_rand.pdf');

# tpe #

# results data frame
tpe_res_df = exp_dict_to_df_hp(experiment_dict['well_sep_noise']['tpe'])

# plotting hyperparameter importance
g = sns.PairGrid(tpe_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/well_sep_noise_tpe.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(tpe_res_df['loss'])), tpe_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (TPE)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/well_sep_noise_tpe.pdf');

# gaussian processes #

# results data frame
gp_res_df = exp_dict_to_df_bo(experiment_dict['well_sep_noise']['gp'])

# plotting hyperparameter importance
g = sns.PairGrid(gp_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/well_sep_noise_gp.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(gp_res_df['loss'])), -gp_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (GP)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/well_sep_noise_gp.pdf');

In [None]:
# showing best embedded spaces #
print "Random Search best:", [i for i,e in enumerate(experiment_dict['well_sep_noise']['rand']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['well_sep_noise']['rand']])]
print "TPE best:", [i for i,e in enumerate(experiment_dict['well_sep_noise']['tpe']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['well_sep_noise']['tpe']])]
print "GP best:", [i for i,e in enumerate(experiment_dict['well_sep_noise']['gp']['values']) if e == np.max(experiment_dict['well_sep_noise']['gp']['values'])]

### 4.3 200-dimensional gaussian noise

Gaussian noise centered at the origin, with no clusters or other significant structure.
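
One way to see why isotropic Gaussian noise offers no cluster structure: in high dimensions, pairwise distances between such points concentrate around a single value, so no group of points is markedly closer together than the rest. The numpy sketch below measures this with the coefficient of variation (standard deviation over mean) of the pairwise distances, for standard normal samples in 200 and in 2 dimensions. The sample size and unit variance are assumptions, since the CSV's generation settings are not given here.

```python
import numpy as np

def distance_spread(n_points=300, dim=200, seed=0):
    """Mean and coefficient of variation of pairwise Euclidean distances of N(0, I) samples."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(n_points, dim))
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X.dot(X.T)   # squared distances
    iu = np.triu_indices(n_points, k=1)                 # each pair once, no diagonal
    d = np.sqrt(np.maximum(d2[iu], 0.0))
    return d.mean(), d.std() / d.mean()

mean_d, cv = distance_spread()             # 200 dims: distances sit near sqrt(2 * 200) = 20
mean_2d, cv_2d = distance_spread(dim=2)    # 2 dims, for contrast: a much wider spread
```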

In [None]:
# reading the data
test_df = pd.read_csv('data/final/gaussian-noise.csv')

In [None]:
# keeping the target variable (optim_target drops it from the features internally)
test_target = test_df['TARGET']

In [None]:
# random search #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/gaussian_noise_rand')

# trials object stores the evaluations
trials_rand = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=rand.suggest, space=space_hp, max_evals=50, trials=trials_rand)

# storing the results
experiment_dict['gaussian_noise']['rand'] = [trials_rand.trials[i]['result'] for i in range(len(trials_rand.trials))]

In [None]:
# TPE #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/gaussian_noise_tpe')

# trials object stores the evaluations
trials_tpe = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=tpe.suggest, space=space_hp, max_evals=50, trials=trials_tpe)

# storing the results
experiment_dict['gaussian_noise']['tpe'] = [trials_tpe.trials[i]['result'] for i in range(len(trials_tpe.trials))]

In [None]:
# gaussian processes #

# initializing wrapper - BayesianOptimization
opt_task_bo = bo_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/gaussian_noise_gp')

# defining optimization object
bo = BayesianOptimization(opt_task_bo.target, space_bo, verbose=0)

# optimizing
bo.maximize(init_points=10, n_iter=40, acq='ucb', kappa=10)

# writing results
experiment_dict['gaussian_noise']['gp'] = bo.res['all']

In [None]:
# computing figures of the optimization process #

# random search #

# results data frame
rand_res_df = exp_dict_to_df_hp(experiment_dict['gaussian_noise']['rand'])

# plotting hyperparameter importance
g = sns.PairGrid(rand_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/gaussian_noise_rand.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(rand_res_df['loss'])), rand_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (Random Search)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/gaussian_noise_rand.pdf');

# tpe #

# results data frame
tpe_res_df = exp_dict_to_df_hp(experiment_dict['gaussian_noise']['tpe'])

# plotting hyperparameter importance
g = sns.PairGrid(tpe_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/gaussian_noise_tpe.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(tpe_res_df['loss'])), tpe_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (TPE)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/gaussian_noise_tpe.pdf');

# gaussian processes #

# results data frame
gp_res_df = exp_dict_to_df_bo(experiment_dict['gaussian_noise']['gp'])

# plotting hyperparameter importance
g = sns.PairGrid(gp_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/gaussian_noise_gp.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(gp_res_df['loss'])), -gp_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (GP)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/gaussian_noise_gp.pdf');

In [None]:
# showing best embedded spaces #
print "Random Search best:", [i for i,e in enumerate(experiment_dict['gaussian_noise']['rand']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['gaussian_noise']['rand']])]
print "TPE best:", [i for i,e in enumerate(experiment_dict['gaussian_noise']['tpe']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['gaussian_noise']['tpe']])]
print "GP best:", [i for i,e in enumerate(experiment_dict['gaussian_noise']['gp']['values']) if e == np.max(experiment_dict['gaussian_noise']['gp']['values'])]

### 4.4 Two Gaussian distributions with different densities

Two Gaussian distributions, both centered at the origin, with different standard deviations (and hence different densities).
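
The following sketch shows how data with this structure could be generated. The function name, the number of points, the dimensionality and the standard deviations are all illustrative assumptions. The notebook only reads the prepared file `data/final/topology.csv`, and its generation parameters are not shown here.

```python
import numpy as np

def make_two_density_gaussians(n_per_cluster=500, n_dims=10, stds=(1.0, 5.0), seed=0):
    """Hypothetical generator: two isotropic Gaussians centered at the origin
    that differ only in standard deviation (i.e. in density).

    All defaults are illustrative assumptions, not the notebook's settings.
    """
    rng = np.random.RandomState(seed)
    blocks, labels = [], []
    for label, std in enumerate(stds):
        # same mean (the origin), different spread per cluster
        blocks.append(rng.normal(loc=0.0, scale=std, size=(n_per_cluster, n_dims)))
        labels.append(np.full(n_per_cluster, label))
    return np.vstack(blocks), np.concatenate(labels)

X, y = make_two_density_gaussians()
print(X.shape)  # → (1000, 10)
```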

In [None]:
# reading the data
test_df = pd.read_csv('data/final/topology.csv')

In [None]:
# storing the target variable (labels) separately
test_target = test_df['TARGET']

In [None]:
# random search #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/topology_rand')

# trials object stores the evaluations
trials_rand = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=rand.suggest, space=space_hp, max_evals=50, trials=trials_rand)

# storing the results
experiment_dict['topology']['rand'] = [trials_rand.trials[i]['result'] for i in range(len(trials_rand.trials))]

In [None]:
# TPE #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/topology_tpe')

# trials object stores the evaluations
trials_tpe = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=tpe.suggest, space=space_hp, max_evals=50, trials=trials_tpe)

# storing the results
experiment_dict['topology']['tpe'] = [trials_tpe.trials[i]['result'] for i in range(len(trials_tpe.trials))]

In [None]:
# gaussian processes #

# initializing wrapper - BayesianOptimization
opt_task_bo = bo_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/topology_gp')

# defining optimization object
bo = BayesianOptimization(opt_task_bo.target, space_bo, verbose=0)

# optimizing
bo.maximize(init_points=10, n_iter=40, acq='ucb', kappa=10)

# writing results
experiment_dict['topology']['gp'] = bo.res['all']
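
The two libraries optimize in opposite directions. hyperopt's `fmin` minimizes the `loss` entry of the dictionary returned by the objective, while `BayesianOptimization.maximize` maximizes the returned number. This is why the GP results are negated before they are compared with random search and TPE.

The sketch below wraps one KL-returning function in two adapters, one per library. The adapter names and `toy_kl` are hypothetical stand-ins, not the notebook's `hp_wrapper` and `bo_wrapper`, whose code is not shown in this section. The `'parameters'` key mirrors what `exp_dict_to_df_hp` reads, and `'ok'` is the value of `hyperopt.STATUS_OK`.

```python
# Hypothetical adapters (not the notebook's hp_wrapper / bo_wrapper).

def make_hyperopt_objective(kl_fn):
    # fmin passes the sampled point as a dict and minimizes result['loss'];
    # 'ok' is the value of hyperopt.STATUS_OK
    def objective(params):
        return {'loss': kl_fn(**params), 'status': 'ok', 'parameters': params}
    return objective

def make_bayes_opt_objective(kl_fn):
    # BayesianOptimization calls f(**params) and maximizes the result,
    # so the KL divergence is negated
    def objective(**params):
        return -kl_fn(**params)
    return objective

# toy stand-in for "run t-SNE and return its final KL divergence"
def toy_kl(perplexity, learning_rate):
    return (perplexity - 30.0) ** 2 / 100.0 + abs(learning_rate - 200.0) / 1000.0 + 1.0

hp_obj = make_hyperopt_objective(toy_kl)
bo_obj = make_bayes_opt_objective(toy_kl)
print(hp_obj({'perplexity': 30.0, 'learning_rate': 200.0})['loss'])  # → 1.0
print(bo_obj(perplexity=30.0, learning_rate=200.0))                   # → -1.0
```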

In [None]:
# computing figures of the optimization process #

# function to transform the experiment dict into a data frame - hyperopt #
def exp_dict_to_df_hp(exp_dict):

    # dictionary to store intermediate values
    temp_dict = dict()

    # for each experiment
    for dict_entry in exp_dict:

        # add parameters to dict
        for key in dict_entry['parameters'].keys():
            try:
                temp_dict[key].append(dict_entry['parameters'][key])
            except KeyError:
                temp_dict[key] = [dict_entry['parameters'][key]]

        # add loss to dict
        try:
            temp_dict['loss'].append(dict_entry['loss'])
        except KeyError:
            temp_dict['loss'] = [dict_entry['loss']]
    
    # return a data frame
    return pd.DataFrame(temp_dict)

# function to transform the experiment dict into a data frame - bayesian optim #
def exp_dict_to_df_bo(exp_dict):

    # dictionary to store intermediate values
    temp_dict = dict()

    # for each experiment
    for dict_entry in exp_dict['params']:

        # add parameters to dict
        for key in dict_entry.keys():
            try:
                temp_dict[key].append(dict_entry[key])
            except KeyError:
                temp_dict[key] = [dict_entry[key]]
    
    # adding loss
    temp_dict['loss'] = exp_dict['values']
    
    # BO samples whitening_flag as a continuous number, so threshold it at 0.5 to recover a boolean
    temp_dict['whitening_flag'] = [False if e < 0.50 else True for e in temp_dict['whitening_flag']]
    
    # return a data frame
    return pd.DataFrame(temp_dict)

# random search #

# results data frame
rand_res_df = exp_dict_to_df_hp(experiment_dict['topology']['rand'])

# plotting hyperparameter importance
g = sns.PairGrid(rand_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/topology_rand.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(rand_res_df['loss'])), rand_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (Random Search)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/topology_rand.pdf');

# tpe #

# results data frame
tpe_res_df = exp_dict_to_df_hp(experiment_dict['topology']['tpe'])

# plotting hyperparameter importance
g = sns.PairGrid(tpe_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/topology_tpe.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(tpe_res_df['loss'])), tpe_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (TPE)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/topology_tpe.pdf');

# gaussian processes #

# results data frame
gp_res_df = exp_dict_to_df_bo(experiment_dict['topology']['gp'])

# plotting hyperparameter importance
g = sns.PairGrid(gp_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/topology_gp.pdf');

# plotting loss function evolution (BO stores the maximized -KL, so the sign is flipped back)
fig = plt.figure(figsize=[16,9]); plt.step(range(len(gp_res_df['loss'])), -gp_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (GP)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/topology_gp.pdf');
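
The step plots above show the loss of each individual round, which mixes exploration with actual progress. A complementary view, not part of the original analysis, is the best-so-far curve: the running minimum of the loss up to each round. The following is a minimal numpy sketch with made-up numbers.

```python
import numpy as np

def best_so_far(losses):
    """Running minimum of per-round losses (lower KL divergence is better)."""
    return np.minimum.accumulate(np.asarray(losses, dtype=float))

# made-up per-round losses, for illustration only
print(best_so_far([1.8, 1.6, 1.9, 1.5, 1.7]).tolist())  # → [1.8, 1.6, 1.6, 1.5, 1.5]
```

For the GP results, flip the sign first (`-gp_res_df['loss']`). The curve can then be drawn with the same `plt.step(..., where='mid')` call used above.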

In [None]:
# showing best embedded spaces #
print "Random Search best:", [i for i,e in enumerate(experiment_dict['topology']['rand']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['topology']['rand']])]
print "TPE best:", [i for i,e in enumerate(experiment_dict['topology']['tpe']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['topology']['tpe']])]
print "GP best:", [i for i,e in enumerate(experiment_dict['topology']['gp']['values']) if e == np.max(experiment_dict['topology']['gp']['values'])]

### 4.5 COIL-20

Images of 20 different objects, each photographed at many rotation angles.

In [None]:
# reading the data
test_df = pd.read_csv('data/final/coil-20.csv')

In [None]:
# storing the target variable (labels) separately
test_target = test_df['TARGET']

In [None]:
# random search #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/coil_20_rand')

# trials object stores the evaluations
trials_rand = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=rand.suggest, space=space_hp, max_evals=50, trials=trials_rand)

# storing the results
experiment_dict['coil_20']['rand'] = [trials_rand.trials[i]['result'] for i in range(len(trials_rand.trials))]

In [None]:
# TPE #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/coil_20_tpe')

# trials object stores the evaluations
trials_tpe = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=tpe.suggest, space=space_hp, max_evals=50, trials=trials_tpe)

# storing the results
experiment_dict['coil_20']['tpe'] = [trials_tpe.trials[i]['result'] for i in range(len(trials_tpe.trials))]

In [None]:
# gaussian processes #

# initializing wrapper - BayesianOptimization
opt_task_bo = bo_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/coil_20_gp')

# defining optimization object
bo = BayesianOptimization(opt_task_bo.target, space_bo, verbose=0)

# optimizing
bo.maximize(init_points=10, n_iter=40, acq='ucb', kappa=10)

# writing results
experiment_dict['coil_20']['gp'] = bo.res['all']

In [None]:
# computing figures of the optimization process #

# random search #

# results data frame
rand_res_df = exp_dict_to_df_hp(experiment_dict['coil_20']['rand'])

# plotting hyperparameter importance
g = sns.PairGrid(rand_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/coil_20_rand.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(rand_res_df['loss'])), rand_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (Random Search)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/coil_20_rand.pdf');

# tpe #

# results data frame
tpe_res_df = exp_dict_to_df_hp(experiment_dict['coil_20']['tpe'])

# plotting hyperparameter importance
g = sns.PairGrid(tpe_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/coil_20_tpe.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(tpe_res_df['loss'])), tpe_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (TPE)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/coil_20_tpe.pdf');

# gaussian processes #

# results data frame
gp_res_df = exp_dict_to_df_bo(experiment_dict['coil_20']['gp'])

# plotting hyperparameter importance
g = sns.PairGrid(gp_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/coil_20_gp.pdf');

# plotting loss function evolution (BO stores the maximized -KL, so the sign is flipped back)
fig = plt.figure(figsize=[16,9]); plt.step(range(len(gp_res_df['loss'])), -gp_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (GP)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/coil_20_gp.pdf');

In [None]:
# showing best embedded spaces #
print "Random Search best:", [i for i,e in enumerate(experiment_dict['coil_20']['rand']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['coil_20']['rand']])]
print "TPE best:", [i for i,e in enumerate(experiment_dict['coil_20']['tpe']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['coil_20']['tpe']])]
print "GP best:", [i for i,e in enumerate(experiment_dict['coil_20']['gp']['values']) if e == np.max(experiment_dict['coil_20']['gp']['values'])]

### 4.6 Olivetti faces

Face images of 40 people (10 images per person), with small variations in lighting, facial expression and facial details.

In [None]:
# reading the data
test_df = pd.read_csv('data/final/olivetti-faces.csv')

In [None]:
# storing the target variable (labels) separately
test_target = test_df['TARGET']

In [None]:
# random search #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/olivetti_rand')

# trials object stores the evaluations
trials_rand = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=rand.suggest, space=space_hp, max_evals=50, trials=trials_rand)

# storing the results
experiment_dict['olivetti']['rand'] = [trials_rand.trials[i]['result'] for i in range(len(trials_rand.trials))]

In [None]:
# TPE #

# initializing wrapper - hyperopt
opt_task_hp = hp_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/olivetti_tpe')

# trials object stores the evaluations
trials_tpe = Trials()

# using the fmin function from hyperopt
best = fmin(fn=opt_task_hp.target, algo=tpe.suggest, space=space_hp, max_evals=50, trials=trials_tpe)

# storing the results
experiment_dict['olivetti']['tpe'] = [trials_tpe.trials[i]['result'] for i in range(len(trials_tpe.trials))]

In [None]:
# gaussian processes #

# initializing wrapper - BayesianOptimization
opt_task_bo = bo_wrapper(test_df, save_path='vis/kl_div_optim2/all_plots/olivetti_gp')

# defining optimization object
bo = BayesianOptimization(opt_task_bo.target, space_bo, verbose=0)

# optimizing
bo.maximize(init_points=10, n_iter=40, acq='ucb', kappa=10)

# writing results
experiment_dict['olivetti']['gp'] = bo.res['all']

In [None]:
# computing figures of the optimization process #

# random search #

# results data frame
rand_res_df = exp_dict_to_df_hp(experiment_dict['olivetti']['rand'])

# plotting hyperparameter importance
g = sns.PairGrid(rand_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/olivetti_rand.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(rand_res_df['loss'])), rand_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (Random Search)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/olivetti_rand.pdf');

# tpe #

# results data frame
tpe_res_df = exp_dict_to_df_hp(experiment_dict['olivetti']['tpe'])

# plotting hyperparameter importance
g = sns.PairGrid(tpe_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/olivetti_tpe.pdf');

# plotting loss function evolution
fig = plt.figure(figsize=[16,9]); plt.step(range(len(tpe_res_df['loss'])), tpe_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (TPE)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/olivetti_tpe.pdf');

# gaussian processes #

# results data frame
gp_res_df = exp_dict_to_df_bo(experiment_dict['olivetti']['gp'])

# plotting hyperparameter importance
g = sns.PairGrid(gp_res_df, hue='loss');
g.map_diag(plt.hist);
g.map_offdiag(plt.scatter);
g.savefig('vis/kl_div_optim2/hyperparam_maps/olivetti_gp.pdf');

# plotting loss function evolution (BO stores the maximized -KL, so the sign is flipped back)
fig = plt.figure(figsize=[16,9]); plt.step(range(len(gp_res_df['loss'])), -gp_res_df['loss'], where='mid'); 
plt.title('Loss over rounds (GP)'); plt.xlabel('Round'); plt.ylabel('Loss'); 
fig.savefig('vis/kl_div_optim2/loss_over_rounds/olivetti_gp.pdf');

In [None]:
# showing best embedded spaces #
print "Random Search best:", [i for i,e in enumerate(experiment_dict['olivetti']['rand']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['olivetti']['rand']])]
print "TPE best:", [i for i,e in enumerate(experiment_dict['olivetti']['tpe']) if e['loss'] == np.min([e['loss'] for e in experiment_dict['olivetti']['tpe']])]
print "GP best:", [i for i,e in enumerate(experiment_dict['olivetti']['gp']['values']) if e == np.max(experiment_dict['olivetti']['gp']['values'])]

## Final Data & Plots

Let us wrap up the data for publication, which includes generating new plots.
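
The tables in this section are built with explicit loops over datasets and optimizers. The same aggregates can be expressed as a pandas `groupby` on a long-format table with one row per trial. The sketch below uses toy data; the numbers are invented, not experiment results.

```python
import pandas as pd

# toy long-format results: one row per trial (made-up numbers)
results = pd.DataFrame({
    'dataset': ['topology'] * 4 + ['olivetti'] * 4,
    'optim':   ['rand', 'rand', 'tpe', 'gp'] * 2,
    'loss':    [1.50, 1.40, 1.30, 1.35, 0.90, 0.80, 0.85, 0.70],
})

# mean / max / min loss per (dataset, optimizer)
summary = (results.groupby(['dataset', 'optim'])['loss']
                  .agg(['mean', 'max', 'min'])
                  .reset_index())

# k lowest-loss trials per dataset, pooled across optimizers (k=2 here)
top_k = (results.sort_values('loss')
                .groupby('dataset')
                .head(2))

print(summary)
print(top_k)
```

You could build such a long table once, for example by concatenating the per-optimizer data frames after adding `dataset` and `optim` columns. Both the summary and the top-5 selection could then reuse it.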

In [None]:
# final helper functions turning experiment dicts into data frames (the BO version now returns the KL divergence itself) #

# function to transform the experiment dict into a data frame - hyperopt #
def exp_dict_to_df_hp(exp_dict):

    # dictionary to store intermediate values
    temp_dict = dict()

    # for each experiment
    for dict_entry in exp_dict:

        # add parameters to dict
        for key in dict_entry['parameters'].keys():
            try:
                temp_dict[key].append(dict_entry['parameters'][key])
            except KeyError:
                temp_dict[key] = [dict_entry['parameters'][key]]

        # add loss to dict
        try:
            temp_dict['loss'].append(dict_entry['loss'])
        except KeyError:
            temp_dict['loss'] = [dict_entry['loss']]
    
    # return a data frame
    return pd.DataFrame(temp_dict)

# function to transform the experiment dict into a data frame - bayesian optim #
def exp_dict_to_df_bo(exp_dict):

    # dictionary to store intermediate values
    temp_dict = dict()

    # for each experiment
    for dict_entry in exp_dict['params']:

        # add parameters to dict
        for key in dict_entry.keys():
            try:
                temp_dict[key].append(dict_entry[key])
            except KeyError:
                temp_dict[key] = [dict_entry[key]]
    
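    # adding loss: BO maximizes -KL, so negating the stored values recovers the KL divergence
    temp_dict['loss'] = -1*np.array(exp_dict['values'])
    
    # BO samples whitening_flag as a continuous number, so threshold it at 0.5 to recover a boolean
    temp_dict['whitening_flag'] = [False if e < 0.50 else True for e in temp_dict['whitening_flag']]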
    
    # return a data frame
    return pd.DataFrame(temp_dict)
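
`BayesianOptimization` treats every hyperparameter as a continuous variable. Its raw samples therefore have to be mapped back to valid settings, and its stored `values` are the maximized objective (the negated KL divergence). The sketch below performs both conversions. The helper names are hypothetical. The 0.5 threshold mirrors `exp_dict_to_df_bo` above, while rounding `n_iter` and `pca_dims` to integers is an assumption about how the wrapper treats them.

```python
import numpy as np

def decode_bo_params(raw, int_keys=('n_iter', 'pca_dims')):
    """Map continuous bayes_opt samples back to usable t-SNE settings.

    Hypothetical helper: the 0.5 threshold mirrors exp_dict_to_df_bo;
    rounding the keys in int_keys is an assumption.
    """
    params = dict(raw)
    # boolean hyperparameter encoded as a continuous value
    params['whitening_flag'] = bool(params['whitening_flag'] >= 0.5)
    # integer-valued hyperparameters (assumed) are rounded
    for key in int_keys:
        if key in params:
            params[key] = int(round(params[key]))
    return params

def bo_values_to_kl(values):
    # bayes_opt stores the maximized objective, i.e. the negated KL divergence
    return -np.asarray(values, dtype=float)

print(decode_bo_params({'whitening_flag': 0.73, 'pca_dims': 29.6, 'perplexity': 31.2}))
print(bo_values_to_kl([-1.2, -0.9]).tolist())  # → [1.2, 0.9]
```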

In [None]:
# opening figure
fig = plt.figure(figsize=[16,20], dpi=300)

# subplot counter
count = 0

# axis limits per dataset
x_limits = {'well_sep': [0.4,2.2], 'well_sep_noise': [0.4,1.6], 'gaussian_noise': [1.4,2.1],
            'topology': [1.2,1.9], 'coil_20': [0.5, 3.0], 'olivetti': [0.5, 1.2]}
y_limits = {'well_sep': [0.0, 12.0], 'well_sep_noise': [0.0, 14.0], 'gaussian_noise': [0.0,16.0],
            'topology': [0.0,10.0], 'coil_20': [0.0, 16.0], 'olivetti': [0.0, 9.0]}

# building the joint figure: one row per dataset, one column per optimizer
for dataset in ['well_sep', 'well_sep_noise', 'gaussian_noise', 'topology', 'coil_20', 'olivetti']:
        
    for optim in ['rand', 'tpe', 'gp']:
        
        # choosing the right conversion function
        if optim == 'gp':

            # results data frame
            res_df = exp_dict_to_df_bo(experiment_dict[dataset][optim])
            
        # if not gp, use the hyperopt data-treating function
        else:
            
            # results data frame
            res_df = exp_dict_to_df_hp(experiment_dict[dataset][optim])

        # histogram of final KL divergences (losses >= 100 are left out so they don't distort the axis)
        plt.subplot(6, 3, count + 1); sns.distplot(res_df.loc[res_df['loss'] < 100,'loss'], label=optim, bins=30, kde=False)
        plt.title('Optimization results [{}-{}]'.format(dataset, optim)); plt.xlabel('Final KL divergence'); plt.ylabel('Frequency'); 
        plt.xlim(x_limits[dataset]); plt.ylim(y_limits[dataset])

        # adding to count
        count += 1

# adjusting subplot spacing once, after all subplots exist
plt.tight_layout()

In [None]:
fig.savefig('vis/kl_div_optim2/loss_over_rounds/kl-div-loss-histo.pdf');
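
The histograms show final t-SNE costs. t-SNE minimizes $KL(P \| Q) = \sum_{i \neq j} p_{ij} \log(p_{ij}/q_{ij})$, where $P$ holds the high-dimensional affinities and $Q$ the low-dimensional ones. scikit-learn reports the value reached at the end of optimization as `TSNE.kl_divergence_`. The following is a minimal numpy sketch of the sum itself for two small, already normalized affinity matrices. The values are toy numbers. Building $P$ from perplexity-calibrated Gaussians and $Q$ from a Student-t kernel is not shown.

```python
import numpy as np

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij), skipping entries with p_ij == 0."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    mask = P > 0  # terms with p_ij == 0 contribute 0 by convention
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# toy joint affinities (zero diagonal, entries sum to 1)
P = np.array([[0.0, 0.3, 0.2], [0.3, 0.0, 0.0], [0.2, 0.0, 0.0]])
Q = np.array([[0.0, 0.25, 0.25], [0.25, 0.0, 0.0], [0.25, 0.0, 0.0]])

print(kl_divergence(P, P))  # → 0.0
print(kl_divergence(P, Q))  # positive, since Q differs from P
```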

In [None]:
# table with mean, max and min loss per dataset and optimizer
summary_rows = []

# one row per (dataset, optimizer) pair
for dataset in ['well_sep', 'well_sep_noise', 'gaussian_noise', 'topology', 'coil_20', 'olivetti']:
        
    for optim in ['rand', 'tpe', 'gp']:
        
        # choosing the right conversion function
        if optim == 'gp':

            # results data frame
            res_df = exp_dict_to_df_bo(experiment_dict[dataset][optim])
            
        # if not gp, use the hyperopt data-treating function
        else:
            
            # results data frame
            res_df = exp_dict_to_df_hp(experiment_dict[dataset][optim])

        # accumulating summary statistics of the loss
        summary_rows.append({'dataset': dataset, 'optim': optim,
                             'mean_loss': res_df['loss'].mean(),
                             'max_loss': res_df['loss'].max(),
                             'min_loss': res_df['loss'].min()})

# assembling the table once, with a fixed column order
summary_df = pd.DataFrame(summary_rows, columns=['dataset', 'optim', 'mean_loss', 'max_loss', 'min_loss'])

In [None]:
summary_df.drop('dataset', axis=1)

In [None]:
# finding the 5 lowest-loss trials per dataset, pooled across optimizers #

# accumulation dict
top5_dict = {}

# one entry per dataset
for dataset in ['well_sep', 'well_sep_noise', 'gaussian_noise', 'topology', 'coil_20', 'olivetti']:
    
    # df for storing all the experiments of this dataset
    dataset_res_df = pd.DataFrame()
        
    for optim in ['rand', 'tpe', 'gp']:
        
        # choosing the right conversion function
        if optim == 'gp':

            # results data frame
            res_df = exp_dict_to_df_bo(experiment_dict[dataset][optim])
            
        # if not gp, use the hyperopt data-treating function
        else:
            
            # results data frame
            res_df = exp_dict_to_df_hp(experiment_dict[dataset][optim])
        
        # tagging the optimizer and accumulating
        res_df['optim'] = optim
        dataset_res_df = pd.concat([dataset_res_df, res_df])
    
    # filtering top 5 trials (copy, so the dtype change below does not act on a slice)
    top5 = dataset_res_df.sort_values('loss', ascending=True).head(5).copy()
    top5['whitening_flag'] = top5['whitening_flag'].astype(str)
    
    # accumulating
    top5_dict[dataset] = top5.loc[:, ['loss','angle','early_exaggeration','learning_rate','n_iter','pca_dims','perplexity','whitening_flag','optim']]

In [None]:
top5_dict['olivetti'].to_latex('results_temp.txt', index=False)

In [None]:
# function to save the experiments dictionary (pickle protocol 0, the ASCII format)
import pickle
def save_obj(obj, name):
    with open(name + '.pkl', 'wb') as f:
        pickle.dump(obj, f, 0)
        
# saving the results dictionary
save_obj(experiment_dict, 'trials/kl-div2')

In [None]:
# function to load a pickled experiments dictionary
def load_obj(name):
    with open(name + '.pkl', 'rb') as f:
        return pickle.load(f)
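
`save_obj` above uses pickle protocol 0, an ASCII format that is portable but large and slow for big nested dictionaries. The sketch below shows a binary variant with a round-trip check. The names `save_pickle` and `load_pickle` are hypothetical, and `tempfile.TemporaryDirectory` requires Python 3.

```python
import os
import pickle
import tempfile

def save_pickle(obj, name):
    # binary variant of save_obj: most compact protocol available
    with open(name + '.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

def load_pickle(name):
    # pickle.load detects the protocol automatically
    with open(name + '.pkl', 'rb') as f:
        return pickle.load(f)

# round trip in a temporary directory with a toy experiment dict
toy = {'topology': {'rand': [{'loss': 1.4, 'parameters': {'perplexity': 30}}]}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'kl-div-toy')
    save_pickle(toy, path)
    restored = load_pickle(path)
print(restored == toy)  # → True
```

Protocol 0 remains the safer choice if Python 2 also needs to read the file, because Python 2 cannot load pickle protocols 3 and above.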