Pin all layouts by default and use all_reduce-based all_gather when layout is pinned #3568
In PyTorch/XLA we need to pin the layout of all cross-core communication ops, since the program is built separately for every core. However, there is no easy way to pin the layout of `all_gather` on TPU. In this PR I changed `all_gather` to use `all_reduce` when the user actually wants to pin the layout. I also changed layout pinning to `True` by default; it was previously `False` because `all_gather` could not be pinned.
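
A minimal sketch of the idea, assuming a torch_xla version where `xm.all_reduce` accepts a `pin_layout` keyword; this is not the exact code from the PR, and `all_gather_via_all_reduce` is a hypothetical helper name. Each core writes its shard into its own slot of a zero tensor, and an `all_reduce` sum then reconstructs the gathered result while the layout stays pinned:

```python
# Sketch only: emulate all_gather with a layout-pinned all_reduce.
# Assumes execution on TPU cores spawned via torch_xla (e.g. xmp.spawn).
import torch
import torch_xla.core.xla_model as xm

def all_gather_via_all_reduce(value, dim=0, pin_layout=True):
    """Gather `value` from all cores by summing zero-padded copies.

    Each core places its shard in its own slot of a zero tensor;
    an all_reduce(sum) fills every slot exactly once, and all_reduce
    supports layout pinning on TPU where all_gather does not.
    """
    world_size = xm.xrt_world_size()
    ordinal = xm.get_ordinal()
    # Output holds `world_size` shards stacked along `dim`.
    padded = torch.zeros(
        *value.shape[:dim],
        value.shape[dim] * world_size,
        *value.shape[dim + 1:],
        dtype=value.dtype,
        device=value.device,
    )
    # Write the local shard into this core's slot; others stay zero.
    start = ordinal * value.shape[dim]
    padded.narrow(dim, start, value.shape[dim]).copy_(value)
    # Summing across cores reconstructs the full gathered tensor.
    return xm.all_reduce(xm.REDUCE_SUM, padded, pin_layout=pin_layout)
```

The trade-off is that the `all_reduce` operates on the full zero-padded tensor, so this can move more data than a native `all_gather`; it is the price of getting a pinned layout.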
@hjm-aws I am guessing you still prefer to use `all_gather` directly, so I set `pin_layout` to `False` for all `xla_process_group` collectives. Let me know if that works for you.

FYI @hjm-aws @ronghanghu