
[Bug] SingleTaskGP's wrong gradients when batch_size = 1 #279

Closed
yeahrmek opened this issue Sep 27, 2019 · 3 comments
Labels: bug (Something isn't working)

🐛 Bug

When I try to calculate the gradient of the loss w.r.t. the input, I get a different result on every run if I pass a single point to the GP.

To reproduce

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood

# Training data: 100 random 1-d points with a quadratic target
X = torch.randn(100, 1)
y = X.sum(dim=1, keepdim=True)**2

gp = SingleTaskGP(X, y)

mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)

x_test = torch.randn(1, 1)
x_test.requires_grad_(True)

gp.eval()

# Calculate the gradient w.r.t. the same input point 5 times
for _ in range(5):
    loss = gp(x_test).mean.sum()
    loss.backward()

    print(x_test.grad)
    # Re-create the leaf tensor so gradients don't accumulate across iterations
    x_test = x_test.detach()
    x_test.requires_grad_(True)

The output looks like this:

tensor([[0.0597]])
tensor([[-4.8402]])
tensor([[-6.9655e+37]])
tensor([[-2.6707e+17]])
tensor([[nan]])

Expected Behavior

I expect the same result in each iteration. The bug appears only when I evaluate the gradient at a single input point. If I use a batch size greater than 1:

x_test = torch.randn(2, 1)
x_test.requires_grad_(True)

gp.eval()

# Calculate the gradient w.r.t. the same input point 5 times
for _ in range(5):
    loss = gp(x_test).mean.sum()
    loss.backward()

    print(x_test.grad)
    x_test = x_test.detach()
    x_test.requires_grad_(True)

the output is correct:

tensor([[2.2270],
        [2.7313]])
tensor([[2.2270],
        [2.7313]])
tensor([[2.2270],
        [2.7313]])
tensor([[2.2270],
        [2.7313]])
tensor([[2.2270],
        [2.7313]])

System information

  • Botorch==0.1.3
  • GPyTorch==0.3.5
  • PyTorch==1.2.0
  • OS: Ubuntu 19.04
yeahrmek added the bug label Sep 27, 2019
yeahrmek (Author) commented:

Looks like there is a problem with MaternKernel. At least when I change it to RBFKernel, everything works correctly.
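
A minimal sketch of that workaround, for anyone stuck on the affected versions: swap the model's covariance module for an RBF kernel before fitting. The ScaleKernel(RBFKernel(...)) replacement below is an assumption on my part (it drops the priors that SingleTaskGP normally attaches to its default Matern kernel), not code posted in the thread.

from gpytorch.kernels import RBFKernel, ScaleKernel

gp = SingleTaskGP(X, y)
# Replace the default Matern kernel with an RBF kernel (no priors) before fitting
gp.covar_module = ScaleKernel(RBFKernel(ard_num_dims=X.shape[-1]))

mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_model(mll)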

Balandat (Contributor) commented Sep 27, 2019:

Yeah, there was a bug in PyTorch's cdist function. Try running this on the latest GPyTorch master; that should fix this (by not using torch.cdist at all).
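
(To pick up that fix before a release, one would typically install GPyTorch directly from its master branch, e.g. pip install --upgrade git+https://github.com/cornellius-gp/gpytorch.git; the exact command is an assumption here, not quoted from the thread.)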

yeahrmek (Author) commented:

On gpytorch master it works correctly, thank you!
