Calculation error in gpytorch #3

Closed

kkyamada opened this issue May 10, 2022 · 8 comments
@kkyamada

Hello!
I tried to run the model-based genetic baseline by following your sample command.
python scripts/black_box_opt.py optimizer=mb_genetic optimizer/algorithm=soga optimizer.encoder_obj=mll task=regex tokenizer=protein surrogate=multi_task_exact_gp acquisition=nehvi
However, it caused the following error.

[2022-05-10 21:34:54,070][root][ERROR] - Input is not a valid correlation matrix
Traceback (most recent call last):
  File "scripts/black_box_opt.py", line 55, in main
    metrics = optimizer.optimize(
  File "/home/keisuke-yamada/lambo/lambo/optimizers/pymoo.py", line 189, in optimize
    problem = self._create_inner_task(
  File "/home/keisuke-yamada/lambo/lambo/optimizers/pymoo.py", line 389, in _create_inner_task
    records = self.surrogate_model.fit(
  File "/home/keisuke-yamada/lambo/lambo/models/gp_models.py", line 321, in fit
    return fit_gp_surrogate(**fit_kwargs)
  File "/home/keisuke-yamada/lambo/lambo/models/gp_utils.py", line 238, in fit_gp_surrogate
    enc_sup_loss = fit_encoder_only(
  File "/home/keisuke-yamada/lambo/lambo/models/gp_utils.py", line 106, in fit_encoder_only
    loss = gp_train_step(surrogate, optimizer, inputs, targets, mll)
  File "/home/keisuke-yamada/lambo/lambo/models/gp_utils.py", line 91, in gp_train_step
    loss = -mll(output, targets).mean()
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/mlls/exact_marginal_log_likelihood.py", line 63, in forward
    res = self._add_other_terms(res, params)
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/mlls/exact_marginal_log_likelihood.py", line 43, in _add_other_terms
    res.add_(prior.log_prob(closure(module)).sum())
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/priors/lkj_prior.py", line 134, in log_prob
    log_prob_corr = self.correlation_prior.log_prob(correlations)
  File "/home/keisuke-yamada/lambo/.venv/src/gpytorch/gpytorch/priors/lkj_prior.py", line 60, in log_prob
    raise ValueError("Input is not a valid correlation matrix")
ValueError: Input is not a valid correlation matrix

It looks like the code fails to compute a valid correlation matrix inside gpytorch.priors.lkj_prior.LKJCovariancePrior.log_prob. Do you have any idea why this happens?

Thanks!
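For reference, a "valid correlation matrix" here means one with a unit diagonal that is symmetric and positive semidefinite. The check below is a sketch of that condition, illustrative only and not the actual gpytorch source:

    import torch

    def is_valid_correlation_matrix(corr, tol=1e-4):
        # A correlation matrix must have ones on the diagonal, be symmetric,
        # and be positive semidefinite (all eigenvalues >= 0, up to tolerance).
        unit_diag = torch.all((corr.diagonal(dim1=-2, dim2=-1) - 1).abs() < tol)
        symmetric = torch.allclose(corr, corr.transpose(-2, -1), atol=tol)
        psd = torch.all(torch.linalg.eigvalsh(corr) > -tol)
        return bool(unit_diag and symmetric and psd)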

@kkyamada (Author) commented May 10, 2022

When I inspected the calculation, it appears that torch.matmul loses precision here.
I manually instrumented gpytorch.priors.lkj_prior.LKJCovariancePrior.log_prob as follows.

    def log_prob(self, X):
        # debug prints inserted to trace the covariance-to-correlation normalization
        print("\n\nLKJCovariancePrior.log_prob, input:\n", X)
        marginal_var = torch.diagonal(X, dim1=-2, dim2=-1)
        print("\nLKJCovariancePrior.log_prob, marginal_var:\n", marginal_var)
        if not torch.all(marginal_var >= 0):
            raise ValueError("Variance(s) cannot be negative")
        marginal_sd = marginal_var.sqrt()
        print("\nLKJCovariancePrior.log_prob, marginal_sd:\n", marginal_sd)
        sd_diag_mat = _batch_form_diag(1 / marginal_sd)
        print("\nLKJCovariancePrior.log_prob, sd_diag_mat:\n", sd_diag_mat)
        correlations = torch.matmul(torch.matmul(sd_diag_mat, X), sd_diag_mat)
        print("\nLKJCovariancePrior.log_prob, corrs:\n", correlations, "\n")
        log_prob_corr = self.correlation_prior.log_prob(correlations)
        log_prob_sd = self.sd_prior.log_prob(marginal_sd)
        return log_prob_corr + log_prob_sd

and the output was the following.

LKJCovariancePrior.log_prob, input:
 tensor([[ 2.7795,  0.3485,  0.0401],
        [ 0.3485,  2.1543, -2.6390],
        [ 0.0401, -2.6390,  5.3450]], device='cuda:0')

LKJCovariancePrior.log_prob, marginal_var:
 tensor([2.7795, 2.1543, 5.3450], device='cuda:0')

LKJCovariancePrior.log_prob, marginal_sd:
 tensor([1.6672, 1.4677, 2.3119], device='cuda:0')

LKJCovariancePrior.log_prob, sd_diag_mat:
 tensor([[0.5998, 0.0000, 0.0000],
        [0.0000, 0.6813, 0.0000],
        [0.0000, 0.0000, 0.4325]], device='cuda:0')

LKJCovariancePrior.log_prob, corrs:
 tensor([[ 0.9990,  0.1424,  0.0104],
        [ 0.1424,  0.9998, -0.7774],
        [ 0.0104, -0.7776,  1.0004]], device='cuda:0') 

[2022-05-10 21:36:44,990][root][ERROR] - Input is not a valid correlation matrix

whereas the exact result should be

LKJCovariancePrior.log_prob, corrs:
 tensor([[ 1.0000,  0.1424,  0.0104],
        [ 0.1424,  1.0000, -0.7777],
        [ 0.0104, -0.7777,  1.0000]]) 
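As a standalone illustration (outside the repo), redoing the normalization on the covariance printed above shows the drift directly. The unit diagonal is exact only in exact arithmetic; in float32, and especially on GPU matmul paths that may use reduced internal precision, the error can exceed the prior's tolerance:

    import torch

    # Covariance matrix copied from the debug output above.
    X = torch.tensor([[ 2.7795,  0.3485,  0.0401],
                      [ 0.3485,  2.1543, -2.6390],
                      [ 0.0401, -2.6390,  5.3450]])

    def to_corr(cov):
        # Same normalization as LKJCovariancePrior.log_prob:
        # D^{-1} @ cov @ D^{-1}, with D the diagonal of marginal std devs.
        sd_inv = torch.diag_embed(cov.diagonal(dim1=-2, dim2=-1).sqrt().reciprocal())
        return sd_inv @ cov @ sd_inv

    print(to_corr(X).diagonal())           # float32: only approximately 1.0
    print(to_corr(X.double()).diagonal())  # float64: 1.0 to machine precision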

@samuelstanton (Owner)

Sorry for the delayed response.

You're correct, this is a numerical precision issue, though I'm surprised you're having problems; I've run this code many times and never seen this specific error. If you're sure this run wasn't a fluke, you have a couple of options (see the sketch after this list):

  • switch from single to double precision (if running on a GPU)
  • tweak GPyTorch settings (https://docs.gpytorch.ai/en/stable/settings.html)
  • dig into the source of numerical instability (if it happens in the middle of model training, the learning rate may be a bit too aggressive)
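For the first two options, here is a minimal sketch on a toy single-task exact GP (placeholder data and model for illustration, not lambo's actual objects):

    import torch
    import gpytorch

    # Toy training data kept in float64 from the start (option 1).
    train_x = torch.linspace(0, 1, 10, dtype=torch.double).unsqueeze(-1)
    train_y = torch.sin(6 * train_x).squeeze(-1)

    class ToyGP(gpytorch.models.ExactGP):
        def __init__(self, x, y, likelihood):
            super().__init__(x, y, likelihood)
            self.mean_module = gpytorch.means.ConstantMean()
            self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

        def forward(self, x):
            return gpytorch.distributions.MultivariateNormal(
                self.mean_module(x), self.covar_module(x)
            )

    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ToyGP(train_x, train_y, likelihood).double()  # cast parameters too
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    model.train()

    # Option 2: tune GPyTorch's numerical settings for a block of code,
    # e.g. the jitter added to covariance matrices before Cholesky.
    with gpytorch.settings.cholesky_jitter(1e-4):
        loss = -mll(model(train_x), train_y)
    print(loss.item())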

@kkyamada (Author)

Thank you for the response!
Changing the dtype of the inputs to the GP heads from torch.float to torch.double resolved the error!

@samuelstanton (Owner)

Glad to hear it! Closing the issue.

@Thomaswbt

@kkyamada Hello! May I ask what specific modifications you made to the code to solve the numerical issues? I tried several ways of changing the inputs of the GP heads from torch.float to torch.double (e.g., in the gp_train_step function of gp_utils.py), but changing the tensor type resulted in further errors inside the gpytorch package ("RuntimeError: expected scalar type Float but found Double"). How did you avoid this kind of type inconsistency? Thanks a lot in advance!
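That RuntimeError usually means the input tensors and the module parameters ended up with different dtypes; both have to be cast consistently. A minimal illustration with a toy module (not lambo's code):

    import torch

    layer = torch.nn.Linear(4, 2)              # parameters default to float32
    x = torch.randn(8, 4, dtype=torch.double)  # float64 inputs

    # layer(x) here raises "RuntimeError: expected scalar type ..." because
    # the input and parameter dtypes disagree.
    layer = layer.double()                     # cast the parameters as well
    out = layer(x)                             # OK: everything is float64

The same applies to the GP stack: the surrogate's parameters (e.g. via .double()) and every tensor fed into it must share one dtype.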

@samuelstanton (Owner)

I just pushed a commit that should make it much easier to change dtypes

In short, just change this line to torch.double

Hopefully this resolves your issue.

@Thomaswbt commented Oct 22, 2022

Thank you for your response! It's so nice of you to modify the code and that really helps! Changing torch.float to torch.double resolved my issue :)

@samuelstanton (Owner)

That's great!
