## Hyper-parameter tuning

There are essentially three parameters to tune in the polynomial regression class. The first, and most obvious, is the polynomial order, set with the keyword ``order`` in the constructor. The next is the type of polynomial interaction terms, ``mindex_type``. "total_order" is the default, but an alternative is "hyperbolic", which has even fewer interaction terms and places more emphasis on higher-order terms. In practice, this rarely leads to a better polynomial, but we can try it anyway. Last, but not least, there is the polynomial ``fit_type``, which determines the solver used for the least squares problem (note that even though polynomials are non-linear in the inputs, the model is linear in its coefficients, so the fitting boils down to a linear problem). Several algorithms are available, but the three most widely used are 'linear', 'LassoCV', and 'ElasticNetCV'.
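The remark that the fitting boils down to a linear problem can be illustrated with a minimal NumPy sketch. This is not tesuract's implementation: it assumes a one-dimensional Legendre basis from ``numpy.polynomial.legendre`` and made-up data, and the names ``x_demo``, ``y_demo``, and ``basis`` are purely illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

# Illustrative data only: a quadratic sampled on [-1, 1].
x_demo = np.linspace(-1.0, 1.0, 20)
y_demo = 1.0 - 2.0 * x_demo + 0.5 * x_demo**2

# Column j holds the Legendre polynomial P_j evaluated at x_demo.
# The model is non-linear in x but linear in the coefficients.
basis = legendre.legvander(x_demo, 2)

# Ordinary least squares for the coefficients of P_0, P_1, P_2.
coef, *_ = np.linalg.lstsq(basis, y_demo, rcond=None)
print(coef)
```

Swapping the ordinary least squares call for a regularized solver changes how the coefficients are estimated, but not the linear structure of the problem.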

With these parameters in mind, we create a parameter grid just like one would when using the GridSearchCV class in sklearn.

In [9]:
pce_grid = {
    'order': list(range(1,12)),
    'mindex_type': ['total_order','hyperbolic'],
    'fit_type': ['linear','ElasticNetCV','LassoCV'],
    }

Now we use the regression wrapper CV class which wraps the PCEReg class in sklearn's grid search CV functionality. 

In [10]:
# hyper-parameter tune the PCE regression class using all available cores
pce = tesuract.RegressionWrapperCV(
    regressor='pce',
    reg_params=pce_grid,
    n_jobs=-1,
    scorer='r2')
pce.fit(X,y)
print("Hyper-parameter CV PCE score is {0:.3f}".format(pce.best_score_))

Fitting 5 folds for each of 66 candidates, totalling 330 fits
Hyper-parameter CV PCE score is 0.998


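Why so many fits? The grid contains 11 orders × 2 index types × 3 fit types = 66 parameter combinations, and each combination is fit once on each of the 5 cross-validation folds, for 330 fits in total. The five fold scores for each combination are then averaged to get its cross-validation score.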
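The count can be checked by hand with the standard library. The sketch below repeats the grid from the cell above and takes the fold count of 5 from the log; it only enumerates the combinations and does not run any fits.

```python
from itertools import product

# Same grid as defined earlier in the tutorial.
pce_grid = {
    'order': list(range(1, 12)),
    'mindex_type': ['total_order', 'hyperbolic'],
    'fit_type': ['linear', 'ElasticNetCV', 'LassoCV'],
}
n_folds = 5  # fold count reported in the fitting log

# Every combination of one value per parameter is one candidate.
candidates = list(product(*pce_grid.values()))
n_candidates = len(candidates)
n_fits = n_candidates * n_folds
print(n_candidates, n_fits)  # 66 330
```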

Look at that! We got all the way to an R2 score of basically 1! How did we do that? One of our parameter combinations must have been really good. Which one was it? We can easily find out by checking the ``best_params_`` attribute.

In [12]:
pce.best_params_

{'fit_type': 'ElasticNetCV', 'mindex_type': 'total_order', 'order': 4}

So it seems like the 8th order polynomial was way too high and probably overfit, and a fourth order polynomial was much better. Elastic net regularization, which uses a mix of l1 and l2 penalties, also seemed to work the best.
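For reference, the elastic net objective as implemented in scikit-learn is written out below. Whether tesuract's 'ElasticNetCV' fit type delegates to that estimator is an assumption here. The symbols are notation introduced for this note: $\Phi$ is the matrix of polynomial basis evaluations, $w$ the coefficients, $n$ the number of samples, $\alpha$ the overall penalty strength, and $\rho$ the l1/l2 mixing weight (``l1_ratio`` in scikit-learn).

$$
\min_{w}\; \frac{1}{2n}\lVert y - \Phi w\rVert_2^2 \;+\; \alpha\,\rho\,\lVert w\rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w\rVert_2^2
$$

Setting $\rho = 1$ recovers the lasso penalty, and $\rho = 0$ gives a pure l2 (ridge-type) penalty.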

Now, to be fair, we should probably hyper-parameter tune the MLP regressor as well to make the comparison completely fair, and it may ultimately give us a better model. In general, however, neural networks are much harder to hyper-parameter tune and take longer to train, so the polynomial model can be preferred when both accuracy and simplicity are required.

## Feature importances

One last thing before we move on to more advanced use cases, in particular tensor surrogates. With orthogonal polynomials like those in PCEs, we can (almost) automatically obtain feature importances in the form of Sobol sensitivity indices. In particular, we can use the ``feature_importances_`` attribute to obtain normalized (summed to one) Sobol total order indices.

In [21]:
# obtain the best estimator after hyper-parameter tuning
pce_best = pce.best_estimator_
# compute the normalized (sum to 1) Sobol total order indices
pce_best.feature_importances_

array([0.25341252, 0.2553728 , 0.08742023, 0.31989295, 0.08390151])

Now technically, the Sobol total order indices shouldn't be normalized, but we do it for consistency, with only some loss of generality. For the original total order indices, use the ``computeSobol()`` method.

In [19]:
pce_best.computeSobol()

[0.27158784222563226,
 0.27368872135122857,
 0.09369020534664568,
 0.34283640074485666,
 0.0899191144641273]

The larger the value, the more "important" the parameter is. So the first, second, and fourth parameters are more important features in the model than the remaining two.
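The two outputs above are consistent with a simple rescaling: dividing each total order index by the sum of all five reproduces the normalized importances to the printed precision. The check below uses only the numbers shown above. The variable names are illustrative, and the rescaling is inferred from those numbers rather than taken from the library's source.

```python
import math

# Values copied from the computeSobol() output above.
total_order = [
    0.27158784222563226,
    0.27368872135122857,
    0.09369020534664568,
    0.34283640074485666,
    0.0899191144641273,
]
# Values copied from the feature_importances_ output above.
reported = [0.25341252, 0.2553728, 0.08742023, 0.31989295, 0.08390151]

# Total order indices can sum to more than 1, because each interaction
# term is counted once for every parameter that takes part in it.
total = sum(total_order)
normalized = [s / total for s in total_order]

# Rank parameters from most to least important (0-based indices).
ranking = sorted(range(len(total_order)),
                 key=lambda i: total_order[i], reverse=True)
print(ranking)  # [3, 1, 0, 2, 4]
```

In 0-based indexing the top three entries are parameters 3, 1, and 0, that is, the fourth, second, and first parameters.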