Document the difference between `device=` and `.to(device)`

## 📚 Documentation

There's a subtle difference between `torch.foo(device=xla)` and `torch.foo().to(xla)`, and we should document it in a FAQ section or similar. The first runs `foo` on the TPU. The second runs `foo` on the CPU and then moves the resulting buffer to the TPU.
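
A minimal sketch of the two forms, in case it helps as a starting point for the FAQ entry. It uses `torch.ones` as a stand-in for `foo` and assumes `torch_xla` is installed and exposes `xm.xla_device()`. The CPU fallback is my addition so the snippet still runs on a machine without an XLA device; on CPU the two forms are indistinguishable.

```python
import torch

try:
    # Assumption: torch_xla is installed and an XLA device (e.g. a TPU) is available.
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
except Exception:
    # Fallback for illustration only: on CPU both forms below behave the same.
    device = torch.device("cpu")

# Form 1: the op is created directly on the target device.
a = torch.ones(2, 3, device=device)

# Form 2: the op runs on the CPU first, then the result is copied to the device.
b = torch.ones(2, 3).to(device)

# Both tensors end up on the same device with the same contents;
# only where the op ran (and whether a host-to-device copy happened) differs.
print(a.device, b.device)
```

Both tensors end up in the same place, so the difference is easy to miss. The `device=` form avoids the intermediate CPU allocation and the extra host-to-device transfer.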