I recently attempted to bring TPU support to gossipy. However, when using it, the estimated time to complete my simulation is 4 hours and 30 minutes, significantly slower than the GPU training time of approximately 45 minutes. I was wondering if you could help make it work.
I have read that TPUs are supposed to be significantly faster, but my attempt does not reflect that!
Here is what I changed:
First, after installing `torch_xla`, I imported it and extended `GlobalSettings`:
```python
import torch
import torch_xla.core.xla_model as xm


# Singleton is the metaclass already used by gossipy's GlobalSettings.
class GlobalSettings(metaclass=Singleton):
    """Global settings for the library."""

    _device = 'cpu'

    def auto_device(self) -> torch.device:
        """Set device to TPU if available, otherwise CUDA if available, otherwise CPU.

        Returns
        -------
        torch.device
            The device.
        """
        try:
            # torch_xla has no xla_device_exists(); probing is the usual check.
            self._device = xm.xla_device()
        except RuntimeError:
            if torch.cuda.is_available():
                self._device = torch.device('cuda')
            else:
                self._device = torch.device('cpu')
        return self._device

    def set_device(self, device_name: str) -> torch.device:
        """Set the device.

        Parameters
        ----------
        device_name : str
            Name of the device to set (possible values are 'auto', 'tpu',
            'cuda', and 'cpu'). When device_name is 'auto', TPU is used if
            available, otherwise CUDA if available, otherwise CPU.

        Returns
        -------
        torch.device
            The device.
        """
        if device_name == "auto":
            return self.auto_device()
        elif device_name == "tpu":
            self._device = xm.xla_device()
        else:
            self._device = torch.device(device_name)
        return self._device

    def get_device(self) -> torch.device:
        """Get the device.

        Returns
        -------
        torch.device
            The device.
        """
        return self._device
```
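As a side note, `torch_xla` does not provide an `xla_device_exists()` helper as far as I know; the usual pattern is to probe for a device inside a try/except. A minimal sketch of that selection logic, with the availability checks factored out as plain booleans so the preference order can be exercised without an accelerator (the names `pick_device_name` and `tpu_is_available` are my own, not gossipy's):

```python
def pick_device_name(tpu_available: bool, cuda_available: bool) -> str:
    """Return the preferred device name given availability flags.

    Preference order mirrors auto_device(): TPU, then CUDA, then CPU.
    """
    if tpu_available:
        return 'xla'
    if cuda_available:
        return 'cuda'
    return 'cpu'


def tpu_is_available() -> bool:
    """Probe for a TPU by trying to acquire an XLA device."""
    try:
        import torch_xla.core.xla_model as xm
        xm.xla_device()  # raises if no XLA device is configured
        return True
    except Exception:
        return False
```

Separating the pure selection logic from the probing makes the preference order unit-testable on any machine.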
TPU: (screenshot of the simulation progress bar showing ~4 h 30 min to completion)
GPU: (screenshot of the simulation progress bar showing ~45 min to completion)
Hi @ParsaMokhtariHessar, thank you for your interest in gossipy. What you have done looks correct, so it's hard to say what is wrong. Have you tried training a neural net on TPUs outside gossipy, to be sure the problem comes from the framework? In any case (and this happens with GPUs too), these simulations involve many models, of which only one is loaded into GPU/TPU memory at a time, so the time overhead of moving models in and out of device memory usually outweighs the benefit of the accelerator. This is my take on it, but I honestly have to say that I did not dig into it.
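To put a rough number on that transfer overhead, one could time a full host-to-device-and-back round trip of a model's parameters. A minimal sketch (my own code, not part of gossipy; it falls back to CPU when no CUDA device is present, in which case the "transfer" is just a copy):

```python
import time
import torch


def round_trip_seconds(model: torch.nn.Module, device: torch.device) -> float:
    """Time moving a model's parameters to `device` and back to the CPU."""
    start = time.perf_counter()
    model.to(device)
    if device.type == 'cuda':
        torch.cuda.synchronize()  # .to() on CUDA can be asynchronous
    model.to('cpu')
    return time.perf_counter() - start


# A small model, roughly the size of the ones used in such simulations.
net = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(),
                          torch.nn.Linear(256, 10))
dev = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
cost = round_trip_seconds(net, dev)
print(f"one in/out trip on {dev}: {cost:.4f}s")
```

If the simulation performs one such round trip per node per round, multiplying `cost` by the number of model moves gives a quick estimate of how much of the wall-clock time is pure transfer.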
I see! And I guess it would take too much memory to keep all the models on the TPU at once. Nonetheless, the "loading trip" is significantly faster in the case of a GPU. I am going to time a simple MNIST training run on CPU, GPU, and TPU and get back to you.
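For what it's worth, a comparison along those lines can be sketched without torchvision by timing training steps on synthetic MNIST-shaped batches (all names, sizes, and hyperparameters here are my own choices for illustration, not gossipy's):

```python
import time
import torch


def time_training(device: torch.device, steps: int = 50) -> float:
    """Time `steps` SGD steps of a small MLP on a synthetic 28x28 batch."""
    model = torch.nn.Sequential(torch.nn.Flatten(),
                                torch.nn.Linear(784, 128), torch.nn.ReLU(),
                                torch.nn.Linear(128, 10)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    x = torch.randn(64, 1, 28, 28, device=device)   # fake MNIST batch
    y = torch.randint(0, 10, (64,), device=device)  # fake labels
    start = time.perf_counter()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return time.perf_counter() - start


cpu_time = time_training(torch.device('cpu'))
print(f"CPU: {cpu_time:.2f}s for 50 steps")
```

Running the same function with a CUDA or XLA device (for XLA, an `xm.mark_step()` per iteration would be needed to actually execute the graph) should make the per-step difference between the backends visible.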