Calculation error in gpytorch #3

Closed

kkyamada opened this issue May 10, 2022 · 8 comments
@kkyamada

Hello!
I tried to run the model-based genetic baseline by following your sample command.
python scripts/black_box_opt.py optimizer=mb_genetic optimizer/algorithm=soga optimizer.encoder_obj=mll task=regex tokenizer=protein surrogate=multi_task_exact_gp acquisition=nehvi
However, it caused the following error.

[2022-05-10 21:34:54,070][root][ERROR] - Input is not a valid correlation matrix
Traceback (most recent call last):
  File "scripts/black_box_opt.py", line 55, in main
    metrics = optimizer.optimize(
  File "/home/keisuke-yamada/lambo/lambo/optimizers/pymoo.py", line 189, in optimize
    problem = self._create_inner_task(
  File "/home/keisuke-yamada/lambo/lambo/optimizers/pymoo.py", line 389, in _create_inner_task
    records = self.surrogate_model.fit(
  File "/home/keisuke-yamada/lambo/lambo/models/gp_models.py", line 321, in fit
    return fit_gp_surrogate(**fit_kwargs)
  File "/home/keisuke-yamada/lambo/lambo/models/gp_utils.py", line 238, in fit_gp_surrogate
    enc_sup_loss = fit_encoder_only(
  File "/home/keisuke-yamada/lambo/lambo/models/gp_utils.py", line 106, in fit_encoder_only
    loss = gp_train_step(surrogate, optimizer, inputs, targets, mll)
  File "/home/keisuke-yamada/lambo/lambo/models/gp_utils.py", line 91, in gp_train_step
    loss = -mll(output, targets).mean()
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/mlls/exact_marginal_log_likelihood.py", line 63, in forward
    res = self._add_other_terms(res, params)
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/mlls/exact_marginal_log_likelihood.py", line 43, in _add_other_terms
    res.add_(prior.log_prob(closure(module)).sum())
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/priors/lkj_prior.py", line 134, in log_prob
    log_prob_corr = self.correlation_prior.log_prob(correlations)
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/priors/lkj_prior.py", line 60, in log_prob
    raise ValueError("Input is not a valid correlation matrix")
ValueError: Input is not a valid correlation matrix

It looks like the code fails to compute a valid correlation matrix inside gpytorch.priors.lkj_prior.LKJCovariancePrior.log_prob. Do you have any idea why this happens?

Thanks!
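For reference, a "valid correlation matrix" here means one with a unit diagonal that is symmetric and positive semidefinite. The check below is a sketch of that condition, illustrative only and not the actual gpytorch source:

    import torch

    def is_valid_correlation_matrix(corr, tol=1e-4):
        # A correlation matrix must have ones on the diagonal, be symmetric,
        # and be positive semidefinite (all eigenvalues >= 0, up to tolerance).
        unit_diag = torch.all((corr.diagonal(dim1=-2, dim2=-1) - 1).abs() < tol)
        symmetric = torch.allclose(corr, corr.transpose(-2, -1), atol=tol)
        psd = torch.all(torch.linalg.eigvalsh(corr) > -tol)
        return bool(unit_diag and symmetric and psd)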

@kkyamada (Author) commented May 10, 2022

When I inspected the calculation, it appears that torch.matmul loses precision here.
I manually instrumented gpytorch.priors.lkj_prior.LKJCovariancePrior.log_prob as follows.

    def log_prob(self, X):
        # debug prints inserted to trace the covariance-to-correlation normalization
        print("\n\nLKJCovariancePrior.log_prob, input:\n", X)
        marginal_var = torch.diagonal(X, dim1=-2, dim2=-1)
        print("\nLKJCovariancePrior.log_prob, marginal_var:\n", marginal_var)
        if not torch.all(marginal_var >= 0):
            raise ValueError("Variance(s) cannot be negative")
        marginal_sd = marginal_var.sqrt()
        print("\nLKJCovariancePrior.log_prob, marginal_sd:\n", marginal_sd)
        sd_diag_mat = _batch_form_diag(1 / marginal_sd)
        print("\nLKJCovariancePrior.log_prob, sd_diag_mat:\n", sd_diag_mat)
        correlations = torch.matmul(torch.matmul(sd_diag_mat, X), sd_diag_mat)
        print("\nLKJCovariancePrior.log_prob, corrs:\n", correlations, "\n")
        log_prob_corr = self.correlation_prior.log_prob(correlations)
        log_prob_sd = self.sd_prior.log_prob(marginal_sd)
        return log_prob_corr + log_prob_sd

and the output was the following.

LKJCovariancePrior.log_prob, input:
 tensor([[ 2.7795,  0.3485,  0.0401],
        [ 0.3485,  2.1543, -2.6390],
        [ 0.0401, -2.6390,  5.3450]], device='cuda:0')

LKJCovariancePrior.log_prob, marginal_var:
 tensor([2.7795, 2.1543, 5.3450], device='cuda:0')

LKJCovariancePrior.log_prob, marginal_sd:
 tensor([1.6672, 1.4677, 2.3119], device='cuda:0')

LKJCovariancePrior.log_prob, sd_diag_mat:
 tensor([[0.5998, 0.0000, 0.0000],
        [0.0000, 0.6813, 0.0000],
        [0.0000, 0.0000, 0.4325]], device='cuda:0')

LKJCovariancePrior.log_prob, corrs:
 tensor([[ 0.9990,  0.1424,  0.0104],
        [ 0.1424,  0.9998, -0.7774],
        [ 0.0104, -0.7776,  1.0004]], device='cuda:0') 

[2022-05-10 21:36:44,990][root][ERROR] - Input is not a valid correlation matrix

whereas the exact result should be

LKJCovariancePrior.log_prob, corrs:
 tensor([[ 1.0000,  0.1424,  0.0104],
        [ 0.1424,  1.0000, -0.7777],
        [ 0.0104, -0.7777,  1.0000]]) 
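As a standalone illustration (outside the repo), redoing the normalization on the covariance printed above shows the drift directly. The unit diagonal is exact only in exact arithmetic; in float32, and especially on GPU matmul paths that may use reduced internal precision, the error can exceed the prior's tolerance:

    import torch

    # Covariance matrix copied from the debug output above.
    X = torch.tensor([[ 2.7795,  0.3485,  0.0401],
                      [ 0.3485,  2.1543, -2.6390],
                      [ 0.0401, -2.6390,  5.3450]])

    def to_corr(cov):
        # Same normalization as LKJCovariancePrior.log_prob:
        # D^{-1} @ cov @ D^{-1}, with D the diagonal of marginal std devs.
        sd_inv = torch.diag_embed(cov.diagonal(dim1=-2, dim2=-1).sqrt().reciprocal())
        return sd_inv @ cov @ sd_inv

    print(to_corr(X).diagonal())           # float32: only approximately 1.0
    print(to_corr(X.double()).diagonal())  # float64: 1.0 to machine precision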

@samuelstanton (Owner)

Sorry for the delayed response.

You're correct, this is a numerical precision issue, though I'm surprised you're having problems; I've run this code many times and never seen this specific error. If you're sure this run wasn't a fluke, you have a couple of options (see the sketch after this list):

  • switch from single to double precision (if running on a GPU)
  • tweak GPyTorch settings (https://docs.gpytorch.ai/en/stable/settings.html)
  • dig into the source of numerical instability (if it happens in the middle of model training, the learning rate may be a bit too aggressive)
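For the first two options, here is a minimal sketch on a toy single-task exact GP (placeholder data and model for illustration, not lambo's actual objects):

    import torch
    import gpytorch

    # Toy training data kept in float64 from the start (option 1).
    train_x = torch.linspace(0, 1, 10, dtype=torch.double).unsqueeze(-1)
    train_y = torch.sin(6 * train_x).squeeze(-1)

    class ToyGP(gpytorch.models.ExactGP):
        def __init__(self, x, y, likelihood):
            super().__init__(x, y, likelihood)
            self.mean_module = gpytorch.means.ConstantMean()
            self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

        def forward(self, x):
            return gpytorch.distributions.MultivariateNormal(
                self.mean_module(x), self.covar_module(x)
            )

    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ToyGP(train_x, train_y, likelihood).double()  # cast parameters too
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    model.train()

    # Option 2: tune GPyTorch's numerical settings for a block of code,
    # e.g. the jitter added to covariance matrices before Cholesky.
    with gpytorch.settings.cholesky_jitter(1e-4):
        loss = -mll(model(train_x), train_y)
    print(loss.item())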

@kkyamada (Author)

Thank you for the response!
Changing the dtype of the inputs to the GP heads from torch.float to torch.double resolved the error!

@samuelstanton (Owner)

Glad to hear it! Closing the issue.

@Thomaswbt

@kkyamada Hello! May I ask what specific modifications you made to the code to solve the numerical issues? I tried several ways of changing the inputs of the GP heads from torch.float to torch.double (e.g., in the gp_train_step function of gp_utils.py), but changing the tensor type resulted in further errors inside the gpytorch package ("RuntimeError: expected scalar type Float but found Double"). How did you avoid this kind of type inconsistency? Thanks a lot in advance!
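That RuntimeError usually means the input tensors and the module parameters ended up with different dtypes; both have to be cast consistently. A minimal illustration with a toy module (not lambo's code):

    import torch

    layer = torch.nn.Linear(4, 2)              # parameters default to float32
    x = torch.randn(8, 4, dtype=torch.double)  # float64 inputs

    # layer(x) here raises "RuntimeError: expected scalar type ..." because
    # the input and parameter dtypes disagree.
    layer = layer.double()                     # cast the parameters as well
    out = layer(x)                             # OK: everything is float64

The same applies to the GP stack: the surrogate's parameters (e.g. via .double()) and every tensor fed into it must share one dtype.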

@samuelstanton (Owner)

I just pushed a commit that should make it much easier to change dtypes

In short, just change this line to torch.double

Hopefully this resolves your issue.

@Thomaswbt commented Oct 22, 2022

Thank you for your response! It's so nice of you to modify the code and that really helps! Changing torch.float to torch.double resolved my issue :)

@samuelstanton (Owner)

That's great!
