[Bug] fit_gpytorch_model complains of tensors on multiple devices when using the KroneckerMultiTaskGP #1323
Comments
Hmm, this looks like a bug. @esantorella would you be able to take a look at this next week?
@Balandat Is there any workaround for this you can recommend?
I tried this out but wasn't able to reproduce this issue. I had to make some changes since I don't have the
If I modify
Would it be possible to share the exact code you're using? Without that, it'll be hard to debug this.
The backend I'm using for some of my problem calculations is a bit involved to install; luckily, I have a more minimal example based on your code above:
This still produces the same error:
Hmm, looks like your example doesn't do the
Explicitly adding this to the model initialization line, as in my original code example, results in the same error. It's worth noting that even in the example without that code, the model parameters are reported by torch to be on the GPU.
Interesting. I don't run into this error (this is on the current dev versions of pytorch and gpytorch). My first guess would be that this is either some change in type promotion on the pytorch side, or some of the changes in the recent gpytorch setup (some of the changes from gpytorch 1.7.0 -> 1.8.0 deal with moving tensors to the correct devices). Can you try running this on pytorch 1.12 with gpytorch 1.8.0 to see if that fixes the issue?
Thank you for the help! I can confirm that the above script works with pytorch 1.12.0, gpytorch 1.8.0, and botorch 0.6.5 with no device errors. As an aside (I can open a separate issue for this as well, but any intuition now would be helpful): when training on the CPU, after a long period of optimizing over my custom problem function with no issue, my training script seemingly at random throws the following error:
Looking at the source, I can't understand why this property of the model would change at any point during training; I am following the same loop as in the MOBO tutorial in the docs, except that I am initializing the model and computing the mll as above. This issue seems to appear after running for a few hours.
So, you'd only call
Looks like both
I guess it needs to be provided for you to be able to sample from the priors and set those values on the model.
This is most likely the culprit: botorch/botorch/models/multitask.py, line 476 at commit fd30429.
If you trace this down, this is registered here: https://github.com/cornellius-gp/gpytorch/blob/d171863c50ab16b5bfb7035e579dcbe53169e703/gpytorch/kernels/index_kernel.py#L71
Basically this would need a
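For background: gpytorch's `register_prior` accepts a getter closure plus an optional setter (`setting_closure`), and sampling values from a prior back onto the model only works when the setter is provided. A toy sketch of that pattern — `ToyKernel` is an illustrative stand-in, not gpytorch's actual `IndexKernel`:

```python
import torch

class ToyKernel:
    """Toy stand-in for an IndexKernel-like module (illustrative only)."""

    def __init__(self, num_tasks: int, rank: int):
        self.covar_factor = torch.randn(num_tasks, rank)
        self._priors = {}

    def register_prior(self, name, prior, param_closure, setting_closure=None):
        # Mirrors the shape of gpytorch's Module.register_prior: a getter
        # (param_closure) plus an optional setter (setting_closure). Without
        # the setter, values sampled from the prior cannot be written back.
        self._priors[name] = (prior, param_closure, setting_closure)

    def sample_from_prior(self, name):
        prior, _getter, setter = self._priors[name]
        if setter is None:
            raise RuntimeError(f"Prior {name!r} has no setting_closure")
        setter(self, prior.sample())

kernel = ToyKernel(num_tasks=3, rank=2)
kernel.register_prior(
    "covar_factor_prior",
    torch.distributions.Normal(torch.zeros(3, 2), torch.ones(3, 2)),
    lambda m: m.covar_factor,
    setting_closure=lambda m, v: setattr(m, "covar_factor", v),
)
kernel.sample_from_prior("covar_factor_prior")  # works only because the setter exists
```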
cc @j-wilson, who I just talked to about LKJ priors earlier today...
Hi, just wondering about the status of this issue, and whether there is anything I can do in the meantime to make use of a multitask GP. In my use case, all inputs correspond to all objective functions, which is why I defaulted to the Kronecker model above.
Does fit_gpytorch_model(mll) only work on CPU and not GPU? I have to move the mll from cuda to CPU, run fit_gpytorch_model(mll), and then move it back to GPU to complete one BO loop. Is there a GPU-only way to do this?
@jackliu333 what makes you think that
Yes, I got the following error:
which disappears after I move the mll to CPU. The model is on the GPU:
Moving this to a new issue: #1566
Closing in favor of #1860 for clarity, since the original issue is resolved.
🐛 Bug
Despite my model and all tensors in the script being on the GPU, fit_gpytorch_model complains about tensors existing on both cuda:0 and CPU.
To reproduce
This code works when I just use a MultiOutput model with a series of SingleExact GPs and the corresponding SumMarginalLogLikelihoods, but trying to implement this with the multitask GP causes an error. This doesn't seem to be related to my problem function, since it works with other models using essentially the same code; all I've changed is the model and the likelihood calculation.
Code snippet to reproduce
Stack trace/error message
Additional Info
When I run this script and change the problem initialization line by removing `.to(**tkwargs)` (i.e. `problem = reconstruction_problem(device='cuda', **problem_kwargs)`), the script throws a different error:
Is fit_gpytorch_model incompatible with this model?
Expected Behavior
The fit_gpytorch_model method should run without error.
System information
Please complete the following information:
BoTorch 0.6.5
GPyTorch 1.7.0
PyTorch 1.11.0post202
Linux 5.4.0-117-generic