Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[export] Serialize constrain_as_size ops #107386

Closed
wants to merge 1 commit into from
Closed

Conversation

angelayi
Copy link
Contributor

Since constrain_as_size has been fixed, I tried serializing it, but ran into some issues.
Notably, after each .transform call, I added a helper _get_updated_range_constraints to update the range constrains list. This is because when we retrace in a pass, the symbolic values being used changes, so we need to update this dictionary.

@angelayi angelayi marked this pull request as ready for review August 17, 2023 16:08
@pytorch-bot
Copy link

pytorch-bot bot commented Aug 17, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107386

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c7027cc with merge base aa9f6a4 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

k: RangeConstraint(v.lower, v.upper) for k, v in shape_env.var_to_range.items()
}
return range_constraints

transformed_ep = ExportedProgram(
Copy link
Contributor

@zhxchen17 zhxchen17 Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the root problem is not only about range constraints themselves, but rather because we use ExportedProgram directly here, therefore some metadata is not guaranteed to be in sync by design.

transformed_ep = ExportedProgram(
transformed_gm,
transformed_gm.graph,
copy.deepcopy(self.graph_signature),
copy.deepcopy(self.call_spec),
self.state_dict,
copy.deepcopy(self.range_constraints),
_get_updated_range_constraints(transformed_gm),
copy.deepcopy(self.equality_constraints),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about equality constraints? I'm assuming they will be out of sync as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, I'd like to understand what about shape env has changed. You seem to suggest that only constrain_as_size is affected, which would explain why you focused only on range constraints. But I'm not sure yet what actually changed, so can't judge whether your PR is complete.

Copy link
Contributor Author

@angelayi angelayi Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we write a pass, we retrace the graph in a pass, it creates new symbolic values. For example, if a .item call originally resulted in the symint i0, after retracing, a new symint, i1, will be produced. However, the i1 will not show up in the range constraints, causing some errors when deserializing.

Equality constraints won't be as out of sync, since it just stores the input string name and it's dimension. It won't be affected unless any passes modify the inputs. But that's the same problem to solve as the graph signature going out of sync.

As Zhengxu mentioned, the best solution would be to just retrace after .transform, but I think that will break downstream users which is why I added this helper function for now to update the range constraints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah. @gmagogsfm made a similar change to update the signature, but assuming that the structure of inputs / outputs hasn't changed. In your case it's indeed harder to assume that transform() will preserve structure so that simple renaming is enough. So I agree that the best solution would be to retrace. :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you're not just renaming. Could you comment on when your solution might be incorrect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angelayi

but I think that will break downstream users which is why I added this helper function for now to update the range constraints.

Could you clarify what will be broken at downstream? I think downstream shouldn't treat transform() as a stable API to rely on at all. If something is broken, we need to fix forward them on the other side when possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment on when your solution might be incorrect?

I can't think of when my solution will be incorrect, but it's just that retracing is the most stable solution.

Could you clarify what will be broken at downstream? I think downstream shouldn't treat transform() as a stable API to rely on at all.

Well, currently transform() is treated as a stable API, so that's a problem. And for what will be broken, I think edge dialect will be broken as every time we retrace we go back to ATen dialect. A fix would be to then run the aten to edge passes after each transform() call. But un-stabilizing the transform() API and using retracing during passes feels like a separate issue from this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angelayi Did we ever tell people transform() is a stable API in the first place? I don't quite remember.

Putting this aside, I think we should make def transform() as def _transform() in a separate diff, so that this doesn't confuse people any further.

I still think this function fundamentally breaks our invariant of only introducing ExportedProgram with torch.export(), therefore we shouldn't let people use it any time soon.

Copy link
Contributor

@avikchaudhuri avikchaudhuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at the code of _get_updated_range_constraints, it seems like we could always reconstruct range constraints like this, no?

I'm still curious what makes us think this code might be incorrect, but I don't think it's too bad. Thus approving to unblock.

@zhxchen17
Copy link
Contributor

to be clear I think this PR is technically correct, but I'm super duper concerned about the approach we're taking to patch ExportedProgram each time when some metadata is out of sync. imho ExportedProgram should be a result of exporting, rather than transformations (and the name suggests this).

I can let this specific PR go, but it could get complicated if we keep this approach moving forward because of all the metadata sync issues.

@angelayi
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 18, 2023
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@angelayi
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled. If you believe this is a mistake,then you can re trigger it through pytorch-bot.

@angelayi
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: export
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants