📚 Documentation
There's a subtle difference between `torch.foo(device=xla)` and `torch.foo().to(xla)`, and we should document it in a FAQ section or similar. The first runs `foo` directly on the TPU. The second runs `foo` on the CPU and then moves the resulting buffer to the TPU.
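A minimal sketch of the two patterns, which might serve as a starting point for the FAQ entry. `torch.ones` stands in for the generic `torch.foo`, the helper names `make_on_device` and `make_then_move` are hypothetical, and the CPU fallback is only an assumption added so the snippet still runs where `torch_xla` is not installed.

```python
import torch

try:
    import torch_xla  # noqa: F401  (importing registers the "xla" device type)
    device = torch.device("xla")
except ImportError:
    # Assumption for illustration only: fall back to CPU without torch_xla.
    device = torch.device("cpu")


def make_on_device(shape, device):
    # Pattern 1: the tensor is created directly on the target device.
    return torch.ones(shape, device=device)


def make_then_move(shape, device):
    # Pattern 2: the tensor is created on the CPU first, then copied over.
    return torch.ones(shape).to(device)


a = make_on_device((2, 3), device)
b = make_then_move((2, 3), device)

# Both results live on the same device and hold the same values;
# the difference is where the op ran and whether a host-to-device copy occurred.
print(a.device, b.device)
```

For a deterministic op like `torch.ones` the results are identical, so the difference is only in where the work happens and the extra transfer in the second pattern.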