Compute log-derivative of correlation matrix in reference prior computation #5

Open · wants to merge 2 commits into base: gu

Conversation

josephmure

This improvement was suggested, among many others, by @regislebrun at CT#11

The idea is to compute the (i,j)-th element of the Fisher information matrix as the trace of the following product:

correlationMatrix_logDerivative_wrt_i * correlationMatrix_logDerivative_wrt_j

where correlationMatrix_logDerivative_wrt_i = pseudoInverseCorrelationMatrix * correlationMatrix_derivative_wrt_i

This matches Gu's implementation in his R package RobustGaSP (see file RobustGaSP/src/functions.cpp line 395).
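
In OpenTURNS terms, a minimal sketch of that element computation could look as follows (pseudoInverseCorrelationMatrix and the derivative matrices are placeholders assumed to be available as SquareMatrix objects; this is not the actual code of the pull request):

    // logDerivative_k = pseudoInverseCorrelationMatrix * dR/dtheta_k, computed once per parameter
    const SquareMatrix logDerivative_i(pseudoInverseCorrelationMatrix * correlationMatrix_derivative_wrt_i);
    const SquareMatrix logDerivative_j(pseudoInverseCorrelationMatrix * correlationMatrix_derivative_wrt_j);
    // The (i, j) element of the Fisher information matrix is the trace of their product
    const Scalar fisher_ij = (logDerivative_i * logDerivative_j).computeTrace();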

All currently implemented unit tests pass.

jschueller and others added 2 commits March 13, 2020 08:42
We add optimization of the kriging parameters according to gu2016:
http://doc.openturns.org/papers/gu2016.pdf

It consists of penalizing the integrated likelihood by a chosen prior.
We implement the two proposed types of prior:
  - jointly-robust prior
  - reference prior

In addition to the new likelihood estimation options, we add the
parametrization of the covariance model scale parameter used in the
reference prior computation:
  - default
  - inverse
  - log-inverse
This matches the implementation in the R package RobustGaSP.
All currently implemented unit tests pass.
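
For reference, a minimal sketch of what the three scale parametrizations amount to, assuming the option names map to theta, 1/theta and log(1/theta) as in RobustGaSP (hypothetical helper, not the actual code):

    // Hypothetical illustration of the three scale parametrizations
    Point transformScale(const Point & scale, const String & parametrization)
    {
      const UnsignedInteger dimension = scale.getDimension();
      Point result(dimension);
      for (UnsignedInteger i = 0; i < dimension; ++i)
      {
        if (parametrization == "default") result[i] = scale[i];            // theta
        else if (parametrization == "inverse") result[i] = 1.0 / scale[i]; // 1 / theta
        else result[i] = std::log(1.0 / scale[i]);                         // log(1 / theta)
      }
      return result;
    }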
@josephmure (Author) commented Mar 17, 2020

@mbaudin47, this may be of interest to you. All your tests from #4 now pass.

@regislebrun

Hi all,
I am currently working on Julien's branch, and I have already made that change and many others. So give me an additional half-day and I will come back with comprehensive code & benchmarks!

@regislebrun

For the most impatient, here is the part I modified in GeneralLinearModelAlgorithm.cxx:

    case REFERENCE:
    {
      SymmetricMatrix iTheta(inputDimension + 1);

      // discretize gradient wrt scale
      Collection<SquareMatrix> dCds(inputDimension, SquareMatrix(size));
      for (UnsignedInteger k1 = 0; k1 < size; ++ k1)
      {
        for (UnsignedInteger k2 = 0; k2 <= k1; ++ k2)
        {
          Matrix parameterGradient(reducedCovarianceModel_.parameterGradient(normalizedInputSample_[k1], normalizedInputSample_[k2]));
          for (UnsignedInteger j = 0; j < inputDimension; ++ j)
          {
            // assume scale gradient is at the n first components
            const Scalar value = parameterGradient(j, 0);
            dCds[j](k1, k2) = value;
            dCds[j](k2, k1) = value;
          }
        }
      }
      // TODO: cache sigmaTheta
      SquareMatrix Linv(covarianceCholeskyFactor_.solveLinearSystem(IdentityMatrix(covarianceCholeskyFactor_.getNbRows())).getImplementation());
      CovarianceMatrix SigmaInv(Linv.computeGram(true).getImplementation());
      CovarianceMatrix sigmaTheta;
      if (F_.getNbColumns() == 0)
        sigmaTheta = SigmaInv;
      else
      {
        CovarianceMatrix G((Linv * F_).computeGram(true).getImplementation());
        TriangularMatrix Lg(G.computeCholesky(false)); // false = G is not reused
        CovarianceMatrix correction((Lg.solveLinearSystem(F_.transpose() * SigmaInv)).computeGram(true).getImplementation());
        sigmaTheta = CovarianceMatrix((SigmaInv - correction).getImplementation());
      }

      // lower triangle
      for (UnsignedInteger i = 0; i < inputDimension; ++ i)
      {
        dCds[i] = sigmaTheta * dCds[i];
        for (UnsignedInteger j = 0; j <= i; ++ j)
        {
          iTheta(i, j) = (dCds[i] * dCds[j]).computeTrace();
        }
      }
      // bottom line
      for (UnsignedInteger j = 0; j < inputDimension; ++ j)
      {
        iTheta(inputDimension, j) = dCds[j].computeTrace();
      }
      // bottom right corner
      iTheta(inputDimension, inputDimension) = size - beta_.getSize();

      Scalar sign = 1.0;
      penalizationFactor = 0.5 * iTheta.computeLogAbsoluteDeterminant(sign, false);
      break;
    }

So I reuse dCds and had to promote the collection of SymmetricMatrix into a collection of SquareMatrix to store sigmaTheta*dCds in place. It has the exact same memory footprint (instead of doubling the memory with a copy) and the same execution time.

The main point is that these changes alone have no effect on the total execution time, as almost all the time is spent computing the gradients wrt the parameters. I also worked on that part, and now the effect of the changes in GeneralLinearModelAlgorithm starts to be noticeable.

@regislebrun

Here are the results of some experiments. I compare the following cases:

  1. The current gu branch
  2. The current branch with an improved AbsoluteExponential covariance model (i.e. with a specific implementation of parameterGradient)
  3. The improved AbsoluteExponential model and the improved GLMAlgorithm as given above

I made the experiments in four different settings: with or without TBB, with OMP_NUM_THREADS=1 or OMP_NUM_THREADS=4.

[Benchmark plots for sizes 50, 100, 200, 500 and 1000: bench_size_0050, bench_size_0100, bench_size_0200, bench_size_0500, bench_size_1000]

We see that the most effective modification is the implementation of parameterGradient(), taking care of memory use and other tricks driven by flame graphs. We also see the bad interaction between TBB and OpenMP, the best combination being TBB without OpenMP. The improvement in GLMAlgo is visible only when the linear algebra part becomes significant compared to the discretization step. The comparison between the flame graphs of the current branch and my local branch gives clues on where to work to improve things. The most prominent point is to use only non-persistent objects to get rid of calls to IdFactory.

Current branch:
[flame graph: perf_ref]
With improved AbsoluteExponential:
[flame graph: perf_new_1]
With improved AbsoluteExponential and GLMAlgo:
[flame graph: perf_new_2]

@regislebrun

To complete this feedback, here is the code for AbsoluteExponential and CovarianceModelImplementation:

/* Gradient wrt parameters */
Matrix AbsoluteExponential::parameterGradient(const Point & s,
    const Point & t) const
{
  // If the active parameters are *exactly* the scale parameters
  Matrix gradient(inputDimension_, 1);
  if (onlyScale_)
  {
    if (s.getDimension() != inputDimension_) throw InvalidArgumentException(HERE) << "Error: the point s has dimension=" << s.getDimension() << ", expected dimension=" << inputDimension_;
    if (t.getDimension() != inputDimension_) throw InvalidArgumentException(HERE) << "Error: the point t has dimension=" << t.getDimension() << ", expected dimension=" << inputDimension_;
    Point tauOverTheta(inputDimension_);
    for (UnsignedInteger i = 0; i < inputDimension_; ++i) tauOverTheta[i] = (s[i] - t[i]) / scale_[i];
    const Scalar norm1 = tauOverTheta.norm1();
    const Scalar sigma2 = amplitude_[0] * amplitude_[0];
    // Norm1 is null iff all the components of tau are zero
    // In that case the gradient is null
    if (norm1 == 0.0)
      return gradient;
    // General case: d/dtheta_i [sigma2 * exp(-norm1)] = value * |tau_i / theta_i| / theta_i
    const Scalar value = std::exp(-norm1) * sigma2;
    for (UnsignedInteger i = 0; i < inputDimension_; ++i)
      gradient(i, 0) = std::abs(tauOverTheta[i]) * value / scale_[i];
    return gradient;
  }
  return CovarianceModelImplementation::parameterGradient(s, t);
}

and

/* Indices of the active parameters */
void CovarianceModelImplementation::setActiveParameter(const Indices & active)
{
  if (!active.isIncreasing()) throw InvalidArgumentException(HERE) << "Error: the active parameter indices must be given in increasing order, here active=" << active;
  activeParameter_ = active;
  // onlyScale_ is true iff the active parameters are exactly the scale indices {0, ..., inputDimension_ - 1}
  onlyScale_ = (activeParameter_.getSize() == inputDimension_) && (activeParameter_.check(inputDimension_));
}
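
As a hypothetical usage sketch (not part of the pull request), the fast path of parameterGradient is triggered exactly when the active parameters are the scale indices:

    AbsoluteExponential model(Point(2, 1.0)); // 2-d model with unit scales
    Indices active(2);
    active.fill();                            // {0, 1}: exactly the scale indices
    model.setActiveParameter(active);         // onlyScale_ becomes true, enabling the analytical gradient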

@jschueller (Owner)

Oh, I see you're cheating :)
We also thought that implementing the full analytical gradients for every covariance model would be a huge task.

@josephmure (Author)

Actually, we could use the partial derivative with respect to the inputs to compute the derivative with respect to the scale parameter, so it could be implemented generically in CovarianceModel or CovarianceModelImplementation.
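
A minimal sketch of that idea, assuming a stationary model that depends on the inputs only through tau / theta (componentwise) and using the partial gradient with respect to the first argument; the helper name and the use of partialGradient are assumptions, not the actual implementation:

    // Chain rule: dC/dtheta_i(s, t) = -(s_i - t_i) / theta_i * dC/ds_i(s, t)
    Matrix scaleGradientFromPartialGradient(const CovarianceModel & model,
        const Point & s, const Point & t)
    {
      const UnsignedInteger inputDimension = model.getInputDimension();
      const Point scale(model.getScale());
      const Matrix dCds(model.partialGradient(s, t)); // derivative wrt the first argument s
      Matrix gradient(inputDimension, 1);
      for (UnsignedInteger i = 0; i < inputDimension; ++i)
        gradient(i, 0) = -(s[i] - t[i]) / scale[i] * dCds(i, 0);
      return gradient;
    }

On AbsoluteExponential this reproduces the analytical parameterGradient above, since dC/ds_i = -sign(s_i - t_i) / theta_i * sigma^2 * exp(-norm1).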

jschueller pushed a commit that referenced this pull request May 10, 2023
* Better reference values for unit tests

---------

Co-authored-by: MerlinKeller <43726507+MerlinKeller@users.noreply.github.com>
jschueller pushed a commit that referenced this pull request May 16, 2024
CombinationsDistribution: Fix unused variable