Place XLAShard data on CPU on creation #5032

jonb377 · 2023-05-18T18:22:55Z

As a follow-up from #5016, we should place the shards on CPU by default. This makes it clear to the user that the shards will not remain up-to-date with the underlying XLAShardedTensor, and it eases interoperability of the shards' data with other tensors.

Expanding on the second point, the shards are backed by PjRtData when returned from _get_local_shards. In SPMD mode, we expect all computation inputs to have PjRtShardedData handles, which makes the on-device shards incompatible with SPMD execution. To use the shards in an on-device computation, they will now need to be transferred back to the device, which implicitly replicates them.
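To illustrate the snapshot semantics described above, here is a minimal pure-Python sketch. It is a toy model, not the torch_xla implementation: `ToyShard`, `ToyShardedTensor`, `add_`, and `local_shards` are hypothetical names invented for this example, and plain lists stand in for tensors. The point it shows is that shards copied to the host on creation do not track later updates to the global tensor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyShard:
    """Hypothetical stand-in for XLAShard: a host-side copy of one slice."""
    data: List[float]  # copied to the host when the shard is created
    device: str        # device the slice lived on when the copy was taken


class ToyShardedTensor:
    """Hypothetical stand-in for XLAShardedTensor, split evenly on one axis."""

    def __init__(self, values: List[float], devices: List[str]) -> None:
        if len(values) % len(devices) != 0:
            raise ValueError("values must divide evenly across devices")
        self._values = list(values)
        self._devices = list(devices)

    def add_(self, delta: float) -> None:
        # In-place update of the global tensor (assumed to happen on device).
        self._values = [v + delta for v in self._values]

    def local_shards(self) -> List[ToyShard]:
        # Each shard gets its own copy of the slice, so it is a snapshot
        # detached from the global tensor.
        n = len(self._values) // len(self._devices)
        return [
            ToyShard(data=list(self._values[i * n:(i + 1) * n]), device=d)
            for i, d in enumerate(self._devices)
        ]


def demo():
    t = ToyShardedTensor([0.0, 1.0, 2.0, 3.0], ["TPU:0", "TPU:1"])
    shards = t.local_shards()  # snapshot taken before the update
    t.add_(10.0)               # global tensor changes afterwards
    fresh = t.local_shards()   # a new snapshot reflects the update
    return shards, fresh
```

In this sketch the first list of shards keeps the old values after `add_`, and only a fresh call to `local_shards` sees the update, which is the behavior placing the shards on CPU is meant to make obvious.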

test/spmd/test_xla_sharding.py

alanwaketan

LGTM.

yeounoh · 2023-05-20T00:48:23Z

torch_xla/experimental/xla_sharded_tensor.py

  # Shards on the devices are materialized/available after the lazy
  # execution of the SPMDPartitioned HLO graph. Each XLAShard points
-  # to torch.Tensor (xla::device_data). The shards represent a snapshot
+  # to torch.Tensor (xla::device_data). The shards represent a snapshot on CPU


nit: Could we update the comment here to "execution of the partitioned HLO graph. Each XLAShard points to torch.Tensor. The shards represent a snapshot on CPU, detached from the global tensor."?

yeounoh

Left a minor comment, LGTM.

jonb377 added the distributed label (SPMD and other distributed things) May 18, 2023

jonb377 requested review from JackCaoG, alanwaketan and yeounoh May 18, 2023 18:22

JackCaoG reviewed May 18, 2023

test/spmd/test_xla_sharding.py

jonb377 force-pushed the jonbolin-cpu-shard branch from c777c1a to 6811b8e May 19, 2023 20:01

alanwaketan approved these changes May 19, 2023

JackCaoG approved these changes May 19, 2023

yeounoh reviewed May 20, 2023

yeounoh approved these changes May 20, 2023

jonb377 force-pushed the jonbolin-cpu-shard branch 2 times, most recently from 1b605d7 to 9ceadd3 May 20, 2023 00:54

Place XLAShard data on CPU on creation

d967760

jonb377 force-pushed the jonbolin-cpu-shard branch from 9ceadd3 to d967760 May 20, 2023 01:47

jonb377 merged commit 0464095 into master May 22, 2023

jonb377 deleted the jonbolin-cpu-shard branch May 22, 2023 17:57


5 participants
