
TPU support for gossipy #7

Open
ParsaMokhtariHessar opened this issue Aug 25, 2023 · 2 comments

Comments

@ParsaMokhtariHessar

I recently attempted to bring TPU support to gossipy. However, when I use it, the estimated time to complete my simulation is 4 hours and 30 minutes, significantly slower than the GPU training time of approximately 45 minutes. I have read that TPUs are supposed to be significantly faster, but my attempt does not reflect that! I was wondering if you could make it work.
Here is how I changed things.
First, installing torch_xla (torch is also imported, since GlobalSettings uses torch.device):

!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl

import torch
import torch_xla.core.xla_model as xm
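
Before touching gossipy, a quick sanity check that the runtime actually sees the TPU (a minimal sketch, assuming the torch_xla 2.0 API):

# An empty list here means no XLA/TPU device is visible to this runtime.
print(xm.get_xla_supported_devices())
print(xm.xla_device())  # e.g. an xla:0 device on a Colab TPU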

class GlobalSettings(metaclass=Singleton):
    """Global settings for the library.""" 
    
    _device = 'cpu'

    def auto_device(self) -> torch.device:
        """Set device to TPU if available, otherwise cuda if available, otherwise cpu.
        
        Returns
        -------
        torch.device
            The device.
        """
        # Note: xla_device_exists() is not a torch_xla API;
        # get_xla_supported_devices() returns a (possibly empty) list of
        # visible XLA devices.
        if xm.get_xla_supported_devices():
            self._device = xm.xla_device()
        elif torch.cuda.is_available():
            self._device = torch.device('cuda')
        else:
            self._device = torch.device('cpu')
        return self._device
    
    def set_device(self, device_name: str) -> torch.device:
        """Set the device.
    
        Parameters
        ----------
        device_name: name of the device to set (possible values are 'auto', 'cuda', 'cpu', and 'tpu').
        When device_name is 'auto', the TPU is used if available, otherwise 'cuda' if available, otherwise 'cpu'.
        
        Returns
        -------
        torch.device
            The device.
        """

        if device_name == "auto":
            return GlobalSettings().auto_device()
        elif device_name == "tpu" and xm.xla_device():
            self._device = xm.xla_device()
        else:
            self._device = torch.device(device_name)
        
        return self._device
    
    def get_device(self):
        """Get the device.

        Returns
        -------
        torch.device
            The device.
        """
        return self._device
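
A hypothetical usage sketch (the model and shapes below are my own illustration, not gossipy code):

import torch.nn as nn

# Select the device once via the singleton, then move models/tensors onto it.
settings = GlobalSettings()
device = settings.set_device("auto")   # TPU if visible, else CUDA, else CPU
model = nn.Linear(10, 2).to(device)
out = model(torch.randn(4, 10).to(device))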

[Screenshots: estimated TPU run time (CaptureTPU) and GPU run time (CaptureGPU)]

@makgyver
Owner

Hi @ParsaMokhtariHessar, thank you for your interest in gossipy. What you have done seems correct, so it's hard to say what is wrong. Have you tried working with TPUs on a neural net outside gossipy, to confirm that the problem comes from the framework? In any case (and this happens with GPUs too), these simulations involve a bunch of models, and only one of them is loaded into GPU/TPU memory at a time; the time overhead of moving the models in and out of memory usually outweighs the benefit of using a GPU/TPU. This is my take, but I honestly have to say that I did not dig into it.
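
A minimal sketch of the effect (my own illustration, not gossipy code): timing repeated host/device swaps of many small models against a single resident model.

import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
models = [nn.Linear(1024, 1024) for _ in range(100)]
x = torch.randn(64, 1024, device=device)

start = time.perf_counter()
for m in models:
    m.to(device)   # transfer weights to the device
    _ = m(x)       # one forward pass
    m.to("cpu")    # transfer weights back to the host
if device.type == "cuda":
    torch.cuda.synchronize()
print(f"swap in/out each step: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
resident = models[0].to(device)
for _ in range(100):
    _ = resident(x)
if device.type == "cuda":
    torch.cuda.synchronize()
print(f"single resident model: {time.perf_counter() - start:.2f}s")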

@ParsaMokhtariHessar
Author

I see! And I guess it would take too much memory to move all the models into TPU memory at once. Nonetheless, the loading round trip is significantly faster in the case of a GPU. I am going to time a simple MNIST training run on CPU, GPU, and TPU and get back to you.
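
Something along these lines (a hypothetical harness; random MNIST-shaped batches stand in for the real dataset):

import time
import torch
import torch.nn as nn

def time_training(device: torch.device, steps: int = 200) -> float:
    """Train a small MLP for a fixed number of steps and return elapsed seconds."""
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(),
                          nn.Linear(128, 10)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(64, 1, 28, 28, device=device)
    y = torch.randint(0, 10, (64,), device=device)
    start = time.perf_counter()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        # On a TPU one would pass xm.xla_device() above and call
        # xm.mark_step() here so the lazy XLA graph actually executes.
    if device.type == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - start

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for name in devices:
    print(name, f"{time_training(torch.device(name)):.2f}s")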
