# Gridsearch results aggregation

### Content

+ [1. Notebook Description](#1.-Notebook-Description)
+ [2. Aggregate and Export](#2.-Aggregate-and-Export)

---

## 1. Notebook Description

This notebook imports the individual `GridSearchCV` result files and aggregates them into a single large dataframe.
The dataframe is then exported to `csv` so that the plotting notebook, which uses an R kernel, can import the data easily.

---

**Imports:**

In [None]:
from digits.utils import dotdict, getoutname, gettypename

import numpy as np  # needed below for np.load and np.arange
import pandas as pd
import glob
from itertools import combinations

## 2. Aggregate and Export

Individual result files are saved in compressed NumPy format (`.npz`).
They contain the cross-validation results for all individual instances of the meta search.
To insert the data into a dataframe efficiently, I fill plain Python lists and create the dataframe only once, at the very end. Appending to the dataframe row by row would copy it on every step and have a huge performance impact, so it should be avoided.
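The list-then-build pattern described above can be sketched as follows. This is a minimal illustration, not the aggregation itself: the `records` tuples and the `demo_df` name are hypothetical stand-ins for the values parsed from the result files.

```python
import pandas as pd

# Hypothetical (subject, test, score) rows standing in for parsed result files
records = [("s01", "cv", 0.91), ("s01", "final", 0.89), ("s02", "cv", 0.87)]

subjects, tests, scores = [], [], []
for subject, test, score in records:
    # Appending to a list is cheap; no dataframe is copied inside the loop
    subjects.append(subject)
    tests.append(test)
    scores.append(score)

# Build the dataframe once, after all rows have been collected
demo_df = pd.DataFrame({"subject": subjects, "test": tests, "score": scores})
```

Each list becomes one column, so all lists must end up with the same length.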

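Each NumPy data file stores two objects, `results` and `config`. `results` holds `params`, which contains the CV results, and `data`, which contains the final test score from a 10% holdout; both are indexed by digit pair. `config` describes the run, for example the subject.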
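To make the file layout concrete, here is a small round-trip sketch. It assumes the files were written with `np.savez_compressed` and that the stored objects are plain dictionaries; the contents (`"s01"`, the `svc__kernel` key, the scores) are made up for illustration and do not come from the real results.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the contents of one result file
results = {
    "params": {(0, 1): [0.90, 0.92]},
    "data": {(0, 1): [(0.91, {"svc__kernel": "rbf"})]},
}
config = {"subject": "s01"}

path = os.path.join(tempfile.mkdtemp(), "demo.npz")
# Each dict is wrapped in a 0-d object array and pickled inside the archive
np.savez_compressed(path, results=results, config=config)

# Object arrays are only unpickled when allow_pickle=True is passed
with np.load(path, allow_pickle=True) as objs:
    # reshape(-1)[0] unwraps the 0-d object array back into the dict
    loaded_results = objs["results"].reshape(-1)[0]
    loaded_config = objs["config"].reshape(-1)[0]
```

Because loading relies on pickle, such files should only be opened when they come from a trusted source.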

In [None]:
s_subj, s_type, s_kernel, s_score, s_test, s_d1, s_d2 = ([], [], [], [], [], [], [])

for resfile in glob.glob('results/*.npz'):
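    # the stored dicts are pickled object arrays, which recent NumPy only loads on request
    objs = np.load(resfile, allow_pickle=True)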
    results = dotdict(objs["results"].reshape(-1)[0])
    config = dotdict(objs["config"].reshape(-1)[0])
    params = results.params
    data = results.data
    typename = gettypename(config)

    # we could just use params length, but this way we double check
    for comb in combinations(np.arange(10), 2):
        
        # put each digit CV(!) score in the df, we can average when plotting
        for cvscore in params[comb]:
            s_subj.append(config['subject'])
            s_type.append(typename)
            kernel = 'none'
            for param in cvscore.parameters:
                if 'kernel' in param:
                    kernel = cvscore.parameters[param]
            s_kernel.append(kernel)
            s_score.append(cvscore.mean_validation_score)
            s_test.append("cv")
            s_d1.append(comb[0])
            s_d2.append(comb[1])
            
        # final score, may be more than one (rbf+linear for instance)
        for finalscore in data[comb]:
            s_subj.append(config['subject'])
            s_type.append(typename)
            s_score.append(finalscore[0])
            s_test.append("final")
            kernel = 'none'
            for k,v in finalscore[1].items():
                if 'kernel' in k:
                    kernel = v
            s_kernel.append(kernel)
            s_d1.append(comb[0])
            s_d2.append(comb[1])

df = pd.DataFrame({"subject":s_subj, "type":s_type, "score":s_score,
                   "test":s_test, 'd1':s_d1, 'd2':s_d2, 'kernel':s_kernel})
df.to_csv("all_results.csv")
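
The plotting notebook does the averaging in R, but a quick check of the aggregated frame can also be done in pandas. The sketch below uses a small hypothetical frame (`demo`) with the same columns as `df`; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical rows with the same columns as the exported dataframe
demo = pd.DataFrame({
    "subject": ["s01"] * 4,
    "type": ["typeA"] * 4,
    "kernel": ["rbf"] * 4,
    "test": ["cv", "cv", "cv", "final"],
    "d1": [0, 0, 0, 0],
    "d2": [1, 1, 1, 1],
    "score": [0.8, 0.9, 1.0, 0.85],
})

# Average all scores that share subject, type, kernel, test kind and digit pair
group_cols = ["subject", "type", "kernel", "test", "d1", "d2"]
mean_scores = demo.groupby(group_cols, as_index=False)["score"].mean()
```

With `as_index=False` the grouping columns stay ordinary columns, which keeps the result in the same long format as the exported file.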

---