Hyperparameters Selection

Rafael Garcia Leiva edited this page Jan 3, 2020 · 4 revisions

Nescience is a measure of how well we understand a problem represented by a dataset and described by a model. It is a generic concept based on the metrics of miscoding, inaccuracy and surfeit. Nescience allows us to compare different combinations of data subsets and models in search of the optimal ones.

The class fastautoml.Nescience implements the metric of Nescience.

Hyperparameters Search

In this example we train a DecisionTreeClassifier on the digits dataset. We are interested in finding the optimal value of the hyperparameter max_depth (the maximum depth allowed for the tree). We will use a grid search approach based on the concept of Nescience, and we will compare our method with classical cross validation.

from fastautoml.fastautoml import Nescience
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score

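X, y = load_digits(return_X_y=True)

# Create the Nescience object before fitting it
# (assumption: default constructor arguments; check your fastautoml version)
nescience = Nescience()
nescience.fit(X, y)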

lnescience  = list()
lcrossval   = list()

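# Depths 1 to 20, the range reported in the results
for i in range(1, 21):

    tree = DecisionTreeClassifier(max_depth=i)
    tree.fit(X, y)
    # One nescience evaluation per fitted tree
    lnescience.append(nescience.nescience(tree))
    # 30-fold cross validation refits the tree 30 times
    scores = cross_val_score(tree, X, y, cv=30)
    lcrossval.append(1 - scores.mean())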

The next figure shows the results of computing the Nescience and the cross validation score for different depths of the tree (from 1 to 20).

Figure: Nescience vs. Cross Validation

Both approaches produce similar results. Nescience proposes a depth of 14 and cross validation a depth of 15. The difference is that cross validation requires training and testing the tree 30 times per depth level, whereas Nescience requires only one training and testing.
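
The selection step described here amounts to taking the depth at which each curve reaches its minimum, and the cost difference comes down to how many times the tree is fitted. A minimal sketch, assuming the scores were collected by a loop over consecutive depths starting at 1. The helper names best_depth and model_fits and the sample values are hypothetical; they are not part of fastautoml and are not the results reported above.

```python
def best_depth(values, first_depth=1):
    """Return the depth whose score is lowest (ties go to the shallowest tree)."""
    if not values:
        raise ValueError("no scores to compare")
    # Index of the smallest value; min() keeps the first one on ties
    best_index = min(range(len(values)), key=values.__getitem__)
    return first_depth + best_index


def model_fits(n_depths, cv_folds):
    """Count the fits each strategy needs over a grid of n_depths values."""
    # One fit per depth for the nescience score,
    # cv_folds fits per depth for cross validation
    return {"nescience": n_depths, "cross_validation": n_depths * cv_folds}


# Made-up error curve, for illustration only
toy_curve = [0.9, 0.6, 0.4, 0.35, 0.37, 0.41]
print(best_depth(toy_curve))
print(model_fits(n_depths=20, cv_folds=30))
```

With the lists from the example, best_depth(lnescience) and best_depth(lcrossval) would give the depth proposed by each method.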

For more uses of the Nescience class see the following blog entries:

  • Evolution of Nescience (TBD)
  • Model Selection with Nescience (TBD)
  • Hyperparameters Grid search with Nescience (TBD)

Mathematical Formulation

The nescience of a dataset, a target variable and a model is given by a function of the miscoding, inaccuracy and surfeit of these elements.
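
This page does not say which function combines the three quantities. Purely as an illustration, and assuming each quantity is a number between 0 and 1, the sketch below shows two possible ways of folding them into a single value (arithmetic mean and Euclidean norm). The function name combine, the method labels and the sample values are hypothetical, and neither formula should be read as the one fastautoml uses.

```python
import math


def combine(miscoding, inaccuracy, surfeit, method="arithmetic"):
    """Fold three scores into one number (illustrative only, not fastautoml's formula)."""
    parts = (miscoding, inaccuracy, surfeit)
    if method == "arithmetic":
        # Plain average of the three quantities
        return sum(parts) / len(parts)
    if method == "euclid":
        # Length of the vector (miscoding, inaccuracy, surfeit)
        return math.sqrt(sum(p * p for p in parts))
    raise ValueError(f"unknown method: {method}")


# Made-up values, for illustration only
print(combine(0.2, 0.4, 0.6))
print(combine(0.0, 0.3, 0.4, method="euclid"))
```

Whatever the actual function is, the idea is the same: a lower combined value means that the dataset, target and model are better understood together.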
