
@steventk-g
Collaborator

No description provided.

device=xm.xla_device())
xs.mark_sharding(xt1, self._get_mesh((1, self.n_devices)), partition_spec)
self.assertIn("VirtualDeviceUsage", met.counter_names())
self.assertNotEqual(met.counter_value("VirtualDeviceUsage"), 0)
Contributor

This looks good to me. Could we add another test case showing that model parameter sharding doesn't use the virtual device?

Collaborator Author

Added a test for sharding on model weights. It looks like the virtual device is still used there: it delays the transfer of the model weights in nn.Linear(128, 64).to(xm.xla_device()) until the sharding is applied.
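
For readers unfamiliar with the behavior described in this comment, here is a toy model of it in plain Python. It is only an illustration under stated assumptions, not code from this PR and not the torch_xla API: `VirtualTensor`, `eager_transfer_count`, and the `OutboundElements` counter are hypothetical names, and only the `VirtualDeviceUsage` counter name comes from the test above. The idea is that a placeholder holds the data on the host and nothing is sent to a device until a sharding is applied, so each element is sent once.

```python
from collections import Counter

# Hypothetical metrics registry for this sketch; not torch_xla's `met` module.
counters = Counter()


class VirtualTensor:
    """Host-side placeholder that defers the device transfer (toy model)."""

    def __init__(self, host_data):
        self.host_data = list(host_data)
        self.shards = None  # populated once a sharding is applied
        counters["VirtualDeviceUsage"] += 1

    def mark_sharding(self, n_devices):
        # Split along the single axis and "transfer" one shard per device.
        step = -(-len(self.host_data) // n_devices)  # ceiling division
        self.shards = [
            self.host_data[i:i + step]
            for i in range(0, len(self.host_data), step)
        ]
        counters["OutboundElements"] += sum(len(s) for s in self.shards)
        return self


def eager_transfer_count(host_data):
    """Baseline with no placeholder: full transfer first, shards sent again later."""
    n = len(list(host_data))
    return n + n


weights = VirtualTensor(range(8))          # stands in for moving a layer to the device
assert counters["OutboundElements"] == 0   # nothing has been sent yet
weights.mark_sharding(4)                   # the transfer happens here, shard by shard
print(counters["VirtualDeviceUsage"], counters["OutboundElements"])
```

In this toy model the deferred path sends 8 elements, while the eager baseline would send 16, which is the kind of reduction in outbound data that the follow-up commit listed further down sets out to verify.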

Contributor

Nice!

Contributor

@yeounoh yeounoh left a comment

LGTM; I requested one more quick test case.

@steventk-g steventk-g force-pushed the virtual-device-metrics-test branch 2 times, most recently from eb0f306 to 69d829d Compare December 15, 2022 00:18
@steventk-g steventk-g requested a review from yeounoh December 15, 2022 00:19
@steventk-g steventk-g force-pushed the virtual-device-metrics-test branch from 69d829d to ac36708 Compare December 15, 2022 00:41
Contributor

@yeounoh yeounoh left a comment

LGTM

@steventk-g steventk-g force-pushed the virtual-device-metrics-test branch from ac36708 to 3f17c4a Compare December 15, 2022 19:12
@steventk-g steventk-g marked this pull request as ready for review December 15, 2022 19:12
@steventk-g steventk-g merged commit 3f01528 into master Dec 15, 2022
steventk-g added a commit that referenced this pull request Dec 15, 2022
#4331)

* Add test to verify virtual device usage metrics (#4330)

* Add test to verify that virtual device reduces outbound data size for SPMD

* Update env var manipulation for outbound data test

* Revert "Update env var manipulation for outbound data test"

This reverts commit 15d986a.

* Unwrap metric

4 participants