Error in Importing XLA #3215

Closed
ma-batita opened this issue Nov 15, 2021 · 5 comments
Labels: bug (Something isn't working), stale (Has not had recent activity)

Comments

@ma-batita

🐛 Bug

I am trying to import torch_xla, but I keep getting a bizarre import error:

ImportError: /usr/local/lib/python3.7/dist-packages/_XLAC.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at13_foreach_erf_EN3c108ArrayRefINS_6TensorEEE

I followed the exact same steps as in this tutorial. This is not the first time I have used XLA/TPU, but this error is new to me, which is why I am opening this issue.

To Reproduce

Steps to reproduce the behavior:

  1. Set up the Colab runtime to use a TPU
  2. Install cloud-tpu-client (latest version, 0.10)
  3. Import torch, torch_xla, and torch_xla.core.xla_model (as xm):
import os
assert os.environ['COLAB_TPU_ADDR']

import torch

import torch_xla
import torch_xla.core.xla_model as xm

The error is:


WARNING:root:Waiting for TPU to be start up with version pytorch-1.9...
WARNING:root:Waiting for TPU to be start up with version pytorch-1.9...
WARNING:root:TPU has started up successfully with version pytorch-1.9
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-5-ebe519c076f6> in <module>()
      3 
      4 # imports the torch_xla package
----> 5 import torch_xla
      6 import torch_xla.core.xla_model as xm

/usr/local/lib/python3.7/dist-packages/torch_xla/__init__.py in <module>()
    140 import torch
    141 from ._patched_functions import _apply_patches
--> 142 import _XLAC
    143 
    144 

ImportError: /usr/local/lib/python3.7/dist-packages/_XLAC.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at13_foreach_erf_EN3c108ArrayRefINS_6TensorEEE

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

Expected behavior

After importing torch_xla, we should get a successful start-up without any error:

WARNING:root:Waiting for TPU to be start up with version pytorch-1.9...
WARNING:root:Waiting for TPU to be start up with version pytorch-1.9...
WARNING:root:TPU has started up successfully with version pytorch-1.9

Environment

  • Reproducible on XLA backend [CPU/TPU]: TPU
  • torch_xla version: 1.9
@blancsw

blancsw commented Nov 15, 2021

Same error for me on Google Colab. It worked well last week.

@blancsw

blancsw commented Nov 16, 2021

@BttMA
I found the problem: Google Colab upgraded the default PyTorch version from 1.9.0+cu111 to 1.10.0+cu111, while the prebuilt torch_xla 1.9 wheel is still built against torch 1.9.
So we need to downgrade the PyTorch version like this:

pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchtext==0.10.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html

Then you can import XLA

import torch
print(torch.__version__)
import torch_xla
import torch_xla.core.xla_model as xm

Enjoy!
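
As an extra sanity check (not from the original thread, just a sketch assuming the 1.9 wheels above were installed), you can verify the runtime's torch version before importing torch_xla, since the prebuilt torch_xla 1.9 wheel only loads against a matching torch 1.9.x:

import torch

# _XLAC.so in the torch_xla 1.9 wheel is built against the torch 1.9 ABI;
# any other torch version typically fails at import time with an
# "undefined symbol" error like the one reported above.
if not torch.__version__.startswith("1.9"):
    raise RuntimeError(
        f"Found torch {torch.__version__}, but torch_xla 1.9 needs torch 1.9.x; "
        "re-run the pip downgrade commands and restart the Colab runtime."
    )

import torch_xla
import torch_xla.core.xla_model as xm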

@ma-batita
Author

@blancsw Thanks man 👍 (It takes a few minutes to install and run, but ... it works!)

@JackCaoG @miaoshasha: you need to think about updating cloud-tpu-client, torch and torch_xla (with their dependencies) ;)

@ma-batita
Author

@JackCaoG @miaoshasha Now I have to tell you that this raises another warning, related to the dataloader.

UserWarning: Your `val_dataloader` has `shuffle=True`,it is strongly recommended that you turn this off for val/test/predict dataloaders.
f"Your `{mode.dataloader_prefix}_dataloader` has `shuffle=True`,"

And there is no shuffle=True in my code (only in the train dataloader):

class dataModule(pl.LightningDataModule):

#....

  def train_dataloader(self):
    return DataLoader(self.train_dataset,
                      sampler=torch.utils.data.distributed.DistributedSampler(self.train_dataset, num_replicas=xm.xrt_world_size(), rank=xm.get_ordinal(), shuffle=True),
                      batch_size=self.batch_size,
                      num_workers=8)

  def val_dataloader(self):
    return DataLoader(self.val_dataset,
                      sampler=torch.utils.data.distributed.DistributedSampler(self.val_dataset, num_replicas=xm.xrt_world_size(), rank=xm.get_ordinal()),
                      batch_size=self.batch_size,
                      num_workers=8)

  def test_dataloader(self):
    return DataLoader(self.test_dataset,
                      sampler=torch.utils.data.distributed.DistributedSampler(self.test_dataset, num_replicas=xm.xrt_world_size(), rank=xm.get_ordinal()),
                      batch_size=self.batch_size,
                      num_workers=8)

What should I do, please?

I even eliminated shuffle=True from the train_dataloader, but the UserWarning persists and training freezes at epoch 0.

@blancsw It would be great if you could lead me through this please :)
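
One likely cause, offered here as an assumption rather than something confirmed in the thread: torch.utils.data.distributed.DistributedSampler defaults to shuffle=True, so the val/test samplers in the snippet above shuffle even though shuffle=True never appears in the code, and Lightning's check flags that. Passing shuffle=False explicitly for evaluation loaders should silence the warning. A minimal sketch (make_eval_loader is a hypothetical helper, not part of the original code):

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
import torch_xla.core.xla_model as xm

def make_eval_loader(dataset, batch_size):
    # DistributedSampler shuffles by default; turn it off explicitly for
    # val/test loaders so Lightning does not warn about shuffled eval data.
    sampler = DistributedSampler(dataset,
                                 num_replicas=xm.xrt_world_size(),
                                 rank=xm.get_ordinal(),
                                 shuffle=False)
    return DataLoader(dataset, sampler=sampler,
                      batch_size=batch_size, num_workers=8)

The same shuffle=False argument can simply be added to the DistributedSampler calls inside val_dataloader and test_dataloader above.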

@stale

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Has not had recent activity label Mar 2, 2022
@stale stale bot closed this as completed Apr 16, 2022