Runs on GPU, error on TPU: Computation requires more parameters (546) than supported (limit 236) #1963

@hrbigelow

Description

❓ Questions and Help

Hi all,

Could anyone give me a clue as to what might be going wrong? I ran this commit from this colab,

which produced this output: debug run

Some lines from it are:

Exception in device=TPU:0: Invalid argument: From /job:tpu_worker/replica:0/task:0:
Computation requires more parameters (546) than supported (limit 236).
         [[{{node XRTCompile}}]]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 119, in _start_fn
    fn(gindex, *args)
  File "ae-wavenet/train.py", line 56, in _mp_fn
    m.train(index)
  File "/content/ae-wavenet/chassis.py", line 127, in train
    loss = self.optim_step_fn()
  File "/content/ae-wavenet/chassis.py", line 95, in <lambda>
    optimizer_args={'closure': self.loss_fn}))
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 538, in optimizer_step
    loss = optimizer.step(**optimizer_args)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/adam.py", line 62, in step
    loss = closure()
  File "/content/ae-wavenet/chassis.py", line 178, in loss_fn
    self.run_batch()
  File "/content/ae-wavenet/chassis.py", line 170, in run_batch
    batch = next(self.data_iter)
  File "/content/ae-wavenet/chassis.py", line 34, in __next__
    vb = self.per_dev_loader.__next__()[0]
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/parallel_loader.py", line 31, in __next__
    return self.next()
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/parallel_loader.py", line 34, in next
    xm.mark_step()
  File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 477, in mark_step
    wait=xu.getenv_as('XLA_SYNC_WAIT', bool, False))
RuntimeError: Invalid argument: From /job:tpu_worker/replica:0/task:0:
Computation requires more parameters (546) than supported (limit 236).
         [[{{node XRTCompile}}]]
Writing run results to /tmp/debug_run-eef90b0a0f8e-root-0
XLA Environment:
  XRT_TPU_CONFIG=tpu_worker;0;10.74.90.234:8470
  TF_FORCE_GPU_ALLOW_GROWTH=true
  XLA_IR_DEBUG=1
  XLA_HLO_DEBUG=1
  TF_CPP_LOG_THREAD_ID=1
  TF_CPP_VMODULE=tensor=5,computation_client=5,xrt_computation_client=5,aten_xla_type=1
  XLA_SAVE_TENSORS_FILE=/tmp/debug_run-eef90b0a0f8e-root-0/graphs
  XLA_METRICS_FILE=/tmp/debug_run-eef90b0a0f8e-root-0/metrics
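For context, the traceback shows the closure-style optimizer step: chassis.py passes `optimizer_args={'closure': self.loss_fn}` to `xm.optimizer_step`, so the loss function (which also pulls the next batch and hits `xm.mark_step()`) runs inside `Adam.step`. Here is a minimal plain-Python sketch of that closure pattern, with hypothetical names (`ToyOptimizer`, `loss_fn`) standing in for `torch.optim.Adam` and the real `loss_fn`:

```python
class ToyOptimizer:
    """Mimics torch.optim.Optimizer.step(closure): the closure re-evaluates
    the loss (and, in real code, recomputes gradients) before the update."""

    def __init__(self, params, lr=0.1):
        # params: list of (value, gradient) pairs; a stand-in for tensors
        self.params = params
        self.lr = lr

    def step(self, closure):
        loss = closure()  # forward pass; in the issue this also fetches a batch
        # plain gradient-descent update in place of Adam's update rule
        self.params = [(p - self.lr * g, g) for p, g in self.params]
        return loss


opt = ToyOptimizer([(1.0, 2.0)])

def loss_fn():
    # in chassis.py this calls run_batch(), which advances the TPU
    # parallel loader and triggers xm.mark_step() / graph compilation
    return sum(p * p for p, _ in opt.params)

loss = opt.step(loss_fn)  # loss computed before the parameter update
```

This is only to illustrate where in the call chain the XLA compilation (and hence the parameter-limit error) is triggered; it is not the actual torch_xla code path.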

The same code has run successfully in my GTX 1070 Max-Q laptop environment with PyTorch version 1.3.1.

I've never seen this error before (though it has been several months since I last used torch_xla).

Thanks in advance!
