Benefit of the gradients on the 2-dimensional Rosenbrock function #388

Closed · BenoitPauwels opened this issue Nov 24, 2022 · 9 comments

@BenoitPauwels

Hello,

I'm experimenting with KPLS and GEKPLS on the 2-dimensional Rosenbrock function by measuring their prediction accuracy (in terms of relative L2-distance).
In the results I get, GEKPLS does not seem to benefit from the gradients.
I was expecting GEKPLS to be significantly more accurate than KPLS.
You'll find my script and the results below. Am I doing something wrong?

from matplotlib import pyplot
from scipy.linalg import norm
from smt.surrogate_models import GEKPLS
from smt.surrogate_models import KPLS
from smt.problems import Rosenbrock
from smt.sampling_methods import LHS


class PredictionAccuracy:
    """Prediction accuracy of GEKPLS."""
    MODELS_CLASSES = [KPLS, GEKPLS]

    def __init__(self, problem, validation_size):
        self.__problem = problem
        self.__sampling_method = LHS(xlimits=problem.xlimits)
        self.__validation_inputs = self.__sampling_method(validation_size)
        self.__validation_outputs = self.__problem(self.__validation_inputs)

    def plot(self, sample_size, number_of_trainings, **options):
        """Plot the distributions of the prediction accuracy."""
        measures = self.__train_models(sample_size, number_of_trainings, **options)
        pyplot.boxplot(
            [measures[model_class] for model_class in self.MODELS_CLASSES],
            labels=[model_class.__name__ for model_class in self.MODELS_CLASSES],
            showmeans=True,
        )
        pyplot.ylim(bottom=0)
        pyplot.title("Distribution of the prediction accuracy")
        pyplot.savefig("prediction_accuracy.png")
        pyplot.close()

    def __train_models(self, sample_size, number_of_trainings, **options):
        """Train models and measure their prediction accuracy."""
        accuracy = {model_class: [] for model_class in self.MODELS_CLASSES}
        for _ in range(number_of_trainings):

            # Generate a sample of training inputs
            training_inputs = self.__sampling_method(sample_size)

            # Train the surrogate models and measure their prediction accuracy
            for model_class in self.MODELS_CLASSES:
                accuracy[model_class].append(
                    self.__measure_accuracy(
                        self.__train_model(
                            model_class,
                            training_inputs,
                            self.__problem(training_inputs),
                            **options
                        )
                    )
                )

        return accuracy

    def __train_model(self, model_class, inputs, outputs, **options):
        """Train a surrogate model."""
        # Prepare the options of the surrogate model
        model_options = dict(options)
        if model_class == GEKPLS:
            model_options["xlimits"] = self.__problem.xlimits

        # Set the training data
        model = model_class(**model_options)
        model.set_training_values(inputs, outputs)
        if model_class == GEKPLS:
            # Set the training derivatives
            for index in range(self.__problem.options["ndim"]):
                model.set_training_derivatives(
                    inputs, self.__problem(inputs, kx=index), index
                )

        # Train the model
        model.train()
        return model

    def __measure_accuracy(self, model):
        """Measure the prediction accuracy of a surrogate model."""
        return norm(
            model.predict_values(self.__validation_inputs) - self.__validation_outputs
        ) / norm(self.__validation_outputs)


if __name__ == "__main__":
    # Train the surrogate models and measure their prediction accuracy.
    PredictionAccuracy(Rosenbrock(ndim=2), validation_size=1000).plot(
        sample_size=10,
        number_of_trainings=20,
        n_comp=2,
        n_start=20,
    )

[figure: prediction_accuracy.png — boxplots of the relative L2 errors of KPLS and GEKPLS]

I've tried changing the number of starting points and switching the optimizer to TNC but the results were similar.

Thank you for your time,
Benoît

@relf (Member) commented Dec 7, 2022

Hi Benoit. Maybe Rosenbrock is not the best function to make GEK shine. The surface changes smoothly, plain kriging already fits it reasonably well, and the derivatives do not bring much more information (at least they do not worsen the prediction 😅). I would say that the benefits should show up with a function that changes more strongly between training points.
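
For instance, one could re-run the same experiment on a less smooth test problem. A rough sketch, reusing the PredictionAccuracy class from the script above and assuming smt.problems.TensorProduct (with its func option) is available in the installed SMT version:

from smt.problems import TensorProduct

# Same experiment as above, but on a tensor-product test function that, depending
# on its parameters, varies more sharply between sample points than Rosenbrock.
PredictionAccuracy(TensorProduct(ndim=2, func="cos"), validation_size=1000).plot(
    sample_size=10,
    number_of_trainings=20,
    n_comp=2,
    n_start=20,
)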

@BenoitPauwels (Author)

Hi Rémi,

Thank you for your answer.

On the figure attached to my previous message, we can see that the median of the relative L2 errors of both KPLS and GEKPLS is about 50%. So neither KPLS nor GEKPLS seems to fit the Rosenbrock function very well.

You will find below a graph of the KPLS model that achieves median accuracy and of the GEKPLS model trained on the same inputs (in orange; they look almost identical), together with the Rosenbrock function (in blue).
[figure: models — KPLS and GEKPLS predictions (orange) vs. the Rosenbrock function (blue)]
As you can see, the KPLS model does not fit the Rosenbrock function very well; there is clearly room for improvement. Adding the derivatives to the training data with GEKPLS does not really improve the accuracy.

Isn't it surprising? I would expect the derivatives to bring a lot of information, especially for such a smooth function.

Thank you for your time,
Benoît

@relf (Member) commented Dec 15, 2022

We have detected some problems in GEKPLS in SMT GitHub master, not present in SMT 1.3. Do you use SMT 1.3?

@BenoitPauwels (Author)

Yes, I use SMT 1.3.

@Paul-Saves (Contributor) commented Dec 15, 2022

Hello Benoit,

  • KPLS and GEKPLS are meant to reduce the dimension of the problem. Here you chose n_comp=2 but Rosenbrock(ndim=2). Therefore, the PLS matrices are just the identity matrix, up to numerical errors, for both KPLS and GEKPLS (see the sketch below).

  • You chose to build a model with only 10 points (sample_size=10). But the Rosenbrock function grows quickly on the sides. Therefore, if you have no points in those zones, the model just can't predict them.

=> What you are comparing is therefore a prediction error related to the sample size and some noise from an ill-conditioned PLS matrix.
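
To illustrate the first point, here is a rough sketch with scikit-learn's PLSRegression, which is assumed here to behave like the PLS step inside KPLS: with as many components as input dimensions, the fitted rotation is a full-rank 2x2 matrix, i.e. a change of basis rather than an actual reduction.

from sklearn.cross_decomposition import PLSRegression
from smt.problems import Rosenbrock
from smt.sampling_methods import LHS

problem = Rosenbrock(ndim=2)
inputs = LHS(xlimits=problem.xlimits)(10)
outputs = problem(inputs)

# With n_components equal to the number of inputs, no direction is dropped:
# the "reduced" space is just a (numerically noisy) re-combination of x1 and x2.
pls = PLSRegression(n_components=2).fit(inputs, outputs)
print(pls.x_rotations_)  # full-rank 2x2 matrix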

If you choose an actual dimension reduction, from Rosenbrock(ndim=5) to n_comp=2, and a sample size of 50 points, you obtain the result below.

PredictionAccuracy(Rosenbrock(ndim=5), validation_size=1000).plot(
    sample_size=50,
    number_of_trainings=20,
    n_comp=2,
    n_start=20,
)
[figure: prediction_accuracy.png — boxplots for Rosenbrock(ndim=5) with sample_size=50]

So GEKPLS helps, but it needs a certain number of points and an effective dimension reduction in the model.

@BenoitPauwels (Author)

Hello Paul,

Thank you for your answer.

I totally understand your point about dimension reduction, but I was more interested in observing the contribution of the derivatives.

I chose a relatively small training sample on purpose: I wanted to measure the contribution of the derivatives when the data is scarce. I hoped that the derivatives could compensate for the data scarcity to some extent.
I tried doubling the size of the training sample for the 2-dimensional Rosenbrock function: KPLS becomes more accurate than GEKPLS.
[figure: prediction_accuracy.png — boxplots for the 2-dimensional case with a doubled training sample]
It's counterintuitive to me that the derivatives worsen the accuracy. (Maybe it comes from ill-conditioning.)

Thank you for the 5-dimensional example. I observe that we gain only 5% in accuracy at the cost of computing 50 gradients.

I take note that GEKPLS requires an effective dimension reduction.

Cheers,
Benoît

@Paul-Saves (Contributor)

Hi,

I totally understand your point. What you want to compare is kriging vs. gradient-enhanced kriging (GEK). In that case, we should indeed obtain better performance with GEK; you are totally right!

Unfortunately, GEK is not implemented in SMT (it has a lot of limitations for a large number of points or a high-dimensional problem).

You are also right about the bad accuracy: as we add more badly approximated directions with GEKPLS, the numerical errors add up and the accuracy decreases.

In fact, GEKPLS is based on the derivatives along the PLS-reduced principal components. In this case, not only are the PLS directions ill-conditioned, but the method does not correspond to GEK either, since GEK does not use any dimension reduction, at the price of being more expensive.

https://arxiv.org/pdf/1708.02663.pdf
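
If you still want to feed more of the gradient information into the model, GEKPLS has (to the best of my knowledge, in SMT 1.3) the extra_points and delta_x options, which add extra first-order Taylor points along the PLS directions. A rough sketch:

from smt.surrogate_models import GEKPLS
from smt.problems import Rosenbrock
from smt.sampling_methods import LHS

problem = Rosenbrock(ndim=5)
inputs = LHS(xlimits=problem.xlimits)(50)
outputs = problem(inputs)

model = GEKPLS(
    n_comp=2,
    xlimits=problem.xlimits,
    extra_points=1,  # extra Taylor-approximated point per training point (assumed option)
    delta_x=1e-2,    # step of the first-order Taylor approximation (assumed option)
)
model.set_training_values(inputs, outputs)
for kx in range(problem.options["ndim"]):
    model.set_training_derivatives(inputs, problem(inputs, kx=kx), kx)
model.train()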

@BenoitPauwels (Author)

Thank you Paul for the explanation.
Cheers,
Benoît

@relf (Member) commented Jan 11, 2023

Thanks Paul!

relf closed this as completed on Jan 11, 2023