Compute log-derivative of correlation matrix in reference prior computation #5

Open · wants to merge 2 commits into base: gu

Conversation

josephmure

This improvement was suggested, among many others, by @regislebrun at CT#11

The idea is to compute the (i,j)-th element of the Fisher information matrix as the trace of the following product:

correlationMatrix_logDerivative_wrt_i * correlationMatrix_logDerivative_wrt_j

where correlationMatrix_logDerivative_wrt_i = pseudoInverseCorrelationMatrix * correlationMatrix_derivative_wrt_i

This matches Gu's implementation in his R package RobustGaSP (see file RobustGaSP/src/functions.cpp line 395).
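
In OpenTURNS terms, a minimal sketch of that element computation could look as follows (pseudoInverseCorrelationMatrix and the derivative matrices are placeholders assumed to be available as SquareMatrix objects; this is not the actual code of the pull request):

    // logDerivative_k = pseudoInverseCorrelationMatrix * dR/dtheta_k, computed once per parameter
    const SquareMatrix logDerivative_i(pseudoInverseCorrelationMatrix * correlationMatrix_derivative_wrt_i);
    const SquareMatrix logDerivative_j(pseudoInverseCorrelationMatrix * correlationMatrix_derivative_wrt_j);
    // The (i, j) element of the Fisher information matrix is the trace of their product
    const Scalar fisher_ij = (logDerivative_i * logDerivative_j).computeTrace();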

All currently implemented unit tests pass.

jschueller and others added 2 commits March 13, 2020 08:42
We add optimization of the kriging parameters according to gu2016:
http://doc.openturns.org/papers/gu2016.pdf

It consists of penalizing the integrated likelihood by a chosen prior.
We implement the two proposed types of prior:
  - jointly-robust prior
  - reference prior

In addition to the new likelihood estimation options, we add the
parametrization of the covariance model scale parameter used in the
reference prior computation:
  - default
  - inverse
  - log-inverse
This matches the implementation in the R package RobustGaSP.
All currently implemented unit tests pass.
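
For reference, a minimal sketch of what the three scale parametrizations amount to, assuming the option names map to theta, 1/theta and log(1/theta) as in RobustGaSP (hypothetical helper, not the actual code):

    // Hypothetical illustration of the three scale parametrizations
    Point transformScale(const Point & scale, const String & parametrization)
    {
      const UnsignedInteger dimension = scale.getDimension();
      Point result(dimension);
      for (UnsignedInteger i = 0; i < dimension; ++i)
      {
        if (parametrization == "default") result[i] = scale[i];            // theta
        else if (parametrization == "inverse") result[i] = 1.0 / scale[i]; // 1 / theta
        else result[i] = std::log(1.0 / scale[i]);                         // log(1 / theta)
      }
      return result;
    }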
@josephmure (Author) commented Mar 17, 2020

@mbaudin47, this may be of interest to you. All your tests from #4 now pass.

@regislebrun

Hi all,
I am currently working on Julien's branch, and I have already made that change and many others. So give me an additional half-day and I will come back with comprehensive code & benchmarks!

@regislebrun

For the most impatient, here is the part I modified in GeneralLinearModelAlgorithm.cxx:

    case REFERENCE:
    {
      SymmetricMatrix iTheta(inputDimension + 1);

      // discretize gradient wrt scale
      Collection<SquareMatrix> dCds(inputDimension, SquareMatrix(size));
      for (UnsignedInteger k1 = 0; k1 < size; ++ k1)
      {
        for (UnsignedInteger k2 = 0; k2 <= k1; ++ k2)
        {
          Matrix parameterGradient(reducedCovarianceModel_.parameterGradient(normalizedInputSample_[k1], normalizedInputSample_[k2]));
          for (UnsignedInteger j = 0; j < inputDimension; ++ j)
          {
            // assume scale gradient is at the n first components
            const Scalar value = parameterGradient(j, 0);
            dCds[j](k1, k2) = value;
            dCds[j](k2, k1) = value;
          }
        }
      }
      // TODO: cache sigmaTheta
      SquareMatrix Linv(covarianceCholeskyFactor_.solveLinearSystem(IdentityMatrix(covarianceCholeskyFactor_.getNbRows())).getImplementation());
      CovarianceMatrix SigmaInv(Linv.computeGram(true).getImplementation());
      CovarianceMatrix sigmaTheta;
      if (F_.getNbColumns() == 0)
        sigmaTheta = SigmaInv;
      else
      {
        CovarianceMatrix G((Linv * F_).computeGram(true).getImplementation());
        TriangularMatrix Lg(G.computeCholesky(false)); // false = G is not reused
        CovarianceMatrix correction((Lg.solveLinearSystem(F_.transpose() * SigmaInv)).computeGram(true).getImplementation());
        sigmaTheta = CovarianceMatrix((SigmaInv - correction).getImplementation());
      }

      // lower triangle
      for (UnsignedInteger i = 0; i < inputDimension; ++ i)
      {
        dCds[i] = sigmaTheta * dCds[i];
        for (UnsignedInteger j = 0; j <= i; ++ j)
        {
          iTheta(i, j) = (dCds[i] * dCds[j]).computeTrace();
        }
      }
      // bottom line
      for (UnsignedInteger j = 0; j < inputDimension; ++ j)
      {
        iTheta(inputDimension, j) = dCds[j].computeTrace();
      }
      // bottom right corner
      iTheta(inputDimension, inputDimension) = size - beta_.getSize();

      Scalar sign = 1.0;
      penalizationFactor = 0.5 * iTheta.computeLogAbsoluteDeterminant(sign, false);
      break;
    }

So I reuse dCds and had to promote the collection of SymmetricMatrix into a collection of SquareMatrix to store sigmaTheta*dCds in place. It has the exact same memory footprint (instead of doubling the memory with a copy) and the same execution time.

The main point is that these changes alone have no effect on the total execution time, as almost all the time is spent computing the gradients wrt the parameters. I also worked on that part, and now the effect of the changes in GeneralLinearModelAlgorithm starts to be noticeable.

@regislebrun

Here are the results of some experiments. I compare the following cases:

  1. The current gu branch
  2. The current branch with an improved AbsoluteExponential covariance model (i.e. with a specific implementation of parameterGradient)
  3. The improved AbsoluteExponential model and the improved GLMAlgorithm as given above

I made the experiments in four different settings: with or without TBB, with OMP_NUM_THREADS=1 or OMP_NUM_THREADS=4.

[Benchmark plots for sizes 50, 100, 200, 500 and 1000: bench_size_0050, bench_size_0100, bench_size_0200, bench_size_0500, bench_size_1000]

We see that the most effective modification is the implementation of parameterGradient(), taking care of memory use and other tricks driven by flame graphs. We also see the bad interaction between TBB and OpenMP, the best combination being TBB without OpenMP. The improvement in GLMAlgo is visible only when the linear algebra part becomes significant compared to the discretization step. The comparison between the flame graphs of the current branch and my local branch gives clues on where to work to improve things. The most prominent point is to use only non-persistent objects to get rid of calls to IdFactory.

Current branch:
[flame graph: perf_ref]
With improved AbsoluteExponential:
[flame graph: perf_new_1]
With improved AbsoluteExponential and GLMAlgo:
[flame graph: perf_new_2]

@regislebrun

To complete this feedback, here is the code for AbsoluteExponential and CovarianceModelImplementation:

/* Gradient wrt parameters */
Matrix AbsoluteExponential::parameterGradient(const Point & s,
    const Point & t) const
{
  // If the active parameters are *exactly* the scale parameters
  Matrix gradient(inputDimension_, 1);
  if (onlyScale_)
  {
    if (s.getDimension() != inputDimension_) throw InvalidArgumentException(HERE) << "Error: the point s has dimension=" << s.getDimension() << ", expected dimension=" << inputDimension_;
    if (t.getDimension() != inputDimension_) throw InvalidArgumentException(HERE) << "Error: the point t has dimension=" << t.getDimension() << ", expected dimension=" << inputDimension_;
    Point tauOverTheta(inputDimension_);
    for (UnsignedInteger i = 0; i < inputDimension_; ++i) tauOverTheta[i] = (s[i] - t[i]) / scale_[i];
    const Scalar norm1 = tauOverTheta.norm1();
    const Scalar sigma2 = amplitude_[0] * amplitude_[0];
    // Norm1 is null iff all the components of tau are zero
    // In that case the gradient is null
    if (norm1 == 0.0)
      return gradient;
    // General case: d/dtheta_i [sigma2 * exp(-norm1)] = value * |tau_i / theta_i| / theta_i
    const Scalar value = std::exp(-norm1) * sigma2;
    for (UnsignedInteger i = 0; i < inputDimension_; ++i)
      gradient(i, 0) = std::abs(tauOverTheta[i]) * value / scale_[i];
    return gradient;
  }
  return CovarianceModelImplementation::parameterGradient(s, t);
}

and

/* Indices of the active parameters */
void CovarianceModelImplementation::setActiveParameter(const Indices & active)
{
  if (!active.isIncreasing()) throw InvalidArgumentException(HERE) << "Error: the active parameter indices must be given in increasing order, here active=" << active;
  activeParameter_ = active;
  // onlyScale_ is true iff the active parameters are exactly the scale indices {0, ..., inputDimension_ - 1}
  onlyScale_ = (activeParameter_.getSize() == inputDimension_) && (activeParameter_.check(inputDimension_));
}
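
As a hypothetical usage sketch (not part of the pull request), the fast path of parameterGradient is triggered exactly when the active parameters are the scale indices:

    AbsoluteExponential model(Point(2, 1.0)); // 2-d model with unit scales
    Indices active(2);
    active.fill();                            // {0, 1}: exactly the scale indices
    model.setActiveParameter(active);         // onlyScale_ becomes true, enabling the analytical gradient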

@jschueller (Owner)

Oh, I see you're cheating :)
We also thought that implementing the full analytical gradients for every covariance model would be a huge task.

@josephmure (Author)

Actually, we could use the partial derivative with respect to the inputs to compute the derivative with respect to the scale parameter, so it could be implemented generically in CovarianceModel or CovarianceModelImplementation.
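
A minimal sketch of that idea, assuming a stationary model that depends on the inputs only through tau / theta (componentwise) and using the partial gradient with respect to the first argument; the helper name and the use of partialGradient are assumptions, not the actual implementation:

    // Chain rule: dC/dtheta_i(s, t) = -(s_i - t_i) / theta_i * dC/ds_i(s, t)
    Matrix scaleGradientFromPartialGradient(const CovarianceModel & model,
        const Point & s, const Point & t)
    {
      const UnsignedInteger inputDimension = model.getInputDimension();
      const Point scale(model.getScale());
      const Matrix dCds(model.partialGradient(s, t)); // derivative wrt the first argument s
      Matrix gradient(inputDimension, 1);
      for (UnsignedInteger i = 0; i < inputDimension; ++i)
        gradient(i, 0) = -(s[i] - t[i]) / scale[i] * dCds(i, 0);
      return gradient;
    }

On AbsoluteExponential this reproduces the analytical parameterGradient above, since dC/ds_i = -sign(s_i - t_i) / theta_i * sigma^2 * exp(-norm1).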

jschueller pushed a commit that referenced this pull request May 10, 2023
* Better reference values for unit tests

---------

Co-authored-by: MerlinKeller <43726507+MerlinKeller@users.noreply.github.com>
jschueller pushed a commit that referenced this pull request May 16, 2024
CombinationsDistribution: Fix unused variable