
Conversation


@fmassa fmassa commented Jul 1, 2025

Before, we were only removing shardings that were invalid for the inputs of the ops. Now we also remove those that are invalid for the outputs. With that, we can drop the solver constraint that banned invalid views, as those no longer appear.

There was also a slight issue with the way we were banning invalid views in the solver; this PR fixes that as well.

Additionally, this also removes uneven shardings for the parameters / buffers of the model.

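As a rough illustration of that last point (a hypothetical sketch, not the actual PR code): a parameter or buffer sharding counts as "even" only if every sharded dimension is divisible by the size of the corresponding mesh dimension.

```python
# Minimal sketch with a made-up helper name, not the PR implementation:
# a placement list shards a tensor evenly only if every Shard(dim) splits a
# dimension whose size is divisible by the corresponding mesh dimension size.
from torch.distributed.tensor import Shard


def is_evenly_sharded(shape, placements, mesh):
    # placements[i] describes how the tensor is placed along mesh dimension i.
    for mesh_dim, placement in enumerate(placements):
        if isinstance(placement, Shard) and shape[placement.dim] % mesh.size(mesh_dim) != 0:
            return False
    return True
```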
@fmassa fmassa requested review from bdhirsh and wconstab July 1, 2025 13:05
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jul 1, 2025

wconstab commented Jul 1, 2025

Could you clarify the different types of invalid sharding that we were facing, and which ones should be fixed at the DTensor level?

I think because DTensor supports uneven sharding, it is less clear what counts as invalid. For auto parallel we want things to be symmetric across ranks, so that's an additional constraint on our end?


fmassa commented Jul 1, 2025

Hi Will,

I've commented in #22 just now about some of my thoughts.

I think that ultimately we will want to support uneven sharding, but for development it is preferable to keep the setup simpler (as we then only need to inspect a single GPU to verify the output).

The kind of thing we want to fix is perhaps to enforce is_tensor_shardable for all ops, so that we can be sure we can rely on the outputs of the sharding propagator.
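As a rough sketch of what that could look like (illustrative only: the import path of is_tensor_shardable is an assumption, since the helper has lived under different private DTensor modules across PyTorch versions, and keep_strategy is a hypothetical function):

```python
# Rough sketch, not the actual autoparallel code: drop strategies whose input *or*
# output DTensorSpecs fail is_tensor_shardable. The import path below is assumed.
from torch.distributed.tensor._ops.utils import is_tensor_shardable


def keep_strategy(strategy):
    output_specs = strategy.output_specs
    if not isinstance(output_specs, (list, tuple)):
        output_specs = [output_specs]
    # input_specs may be None for some strategies, so guard before concatenating.
    specs = list(strategy.input_specs or []) + list(output_specs)
    # Each spec's tensor_meta records the global shape it describes.
    return all(is_tensor_shardable(spec.tensor_meta.shape, spec) for spec in specs)
```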

output_specs = strategy.output_specs
if isinstance(output_specs, DTensorSpec):
    output_specs = [output_specs]
specs = list(strategy.input_specs) + list(output_specs)
Contributor

nit: list(output_specs) seems redundant to the line above?

Contributor Author

strategy.output_specs can also be a tuple of DTensorSpec, so I'm just trying to make sure we are not concatenating lists and tuples together.

if len(orig_shape) > len(shape):
    # TODO: FIXME as I think we should also handle this case
    continue
# print("in heeeererereer", orig_shape, shape)
Contributor

lol

@wconstab wconstab merged commit 5726d7c into main Jul 2, 2025
4 checks passed
@wconstab wconstab deleted the fmassa/remove_further_constraints branch July 2, 2025 21:38