Make LazyGraphExecutor extensible #87218

antoniojkim
Collaborator

Add LazyGraphExecutor to the backend interface so that it is extensible by a vendor backend.

I've made some preliminary methods virtual. Not sure if we want to make all methods in LazyGraphExecutor virtual.

@pytorch-bot

pytorch-bot bot commented Oct 18, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87218

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 1 Pending

As of commit 11d1021:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@JackCaoG
Collaborator

@alanwaketan Does this align with what you plan to implement?

Collaborator

@alanwaketan alanwaketan left a comment

Overall LGTM.

}

LazyGraphExecutor::~LazyGraphExecutor() = default;
Collaborator

Is there any reason why this isn't in the header?

@alanwaketan
Collaborator

I guess we don't necessarily need to make all methods virtual at once. We can make them virtual on demand.

@@ -41,7 +42,7 @@ class TORCH_API BackendImplInterface {

virtual const IrBuilder* GetIrBuilder() const = 0;

virtual bool ShouldSyncTensor(const LazyTensorPtr tensor) const;
virtual LazyGraphExecutor* GetLazyGraphExecutor() const;
Collaborator

Since this is an API now, maybe a few words to describe it would be great.
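
For illustration, the accessor under review could be documented and overridden roughly as follows. This is a minimal, self-contained sketch and not the code in this PR: the `accessor_sketch` namespace, the `Name()` method, `VendorGraphExecutor`, and `VendorBackend` are hypothetical stand-ins, and the default body of `GetLazyGraphExecutor()` is an assumption.

```cpp
#include <string>

namespace accessor_sketch {

// Simplified stand-in for the real executor class.
class LazyGraphExecutor {
 public:
  virtual ~LazyGraphExecutor() = default;

  // Hypothetical method, used here only to tell implementations apart.
  virtual std::string Name() const { return "default"; }
};

// Simplified stand-in for the backend interface.
class BackendImplInterface {
 public:
  virtual ~BackendImplInterface() = default;

  // Returns the executor this backend uses to run lazy graphs.
  // The pointer is non-owning and stays valid for the life of the process.
  // A vendor backend overrides this to supply its own executor subclass.
  // (Assumed default: a shared, process-wide base executor.)
  virtual LazyGraphExecutor* GetLazyGraphExecutor() const {
    static LazyGraphExecutor default_executor;
    return &default_executor;
  }
};

// Hypothetical vendor executor.
class VendorGraphExecutor : public LazyGraphExecutor {
 public:
  std::string Name() const override { return "vendor"; }
};

// Hypothetical vendor backend that swaps in its own executor.
class VendorBackend : public BackendImplInterface {
 public:
  LazyGraphExecutor* GetLazyGraphExecutor() const override {
    static VendorGraphExecutor vendor_executor;
    return &vendor_executor;
  }
};

}  // namespace accessor_sketch
```

With this shape, callers that hold a `BackendImplInterface` reference reach the vendor executor through one extra virtual call on the backend.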

Collaborator

Also, have you compared this approach with providing a registration method directly in the LazyGraphExecutor class?

Collaborator Author

Also, have you compared this approach with providing a registration method directly in the LazyGraphExecutor class?

I have not. Do we anticipate the additional layer of indirection to have much impact on performance?

Collaborator

I don't think performance is the main concern. Since the LazyGraphExecutor becomes a backend interface anyway, it just doesn't feel necessary to have the registration part in the BackendInterface to make the design cleaner.

Collaborator Author

it just doesn't feel necessary to have the registration part in the BackendInterface to make the design cleaner

I'm not sure what you mean by this. When you say registration, are you referring to the static object initialization?

Collaborator

I'm talking about things similar to BackendRegistrar.

Collaborator Author

Ah, I see what you mean now. I can implement that for LazyGraphExecutor.
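
A registrar kept inside the executor class, in the spirit of the BackendRegistrar suggestion above, might look like the following. This is a hedged sketch under stated assumptions and not the merged implementation: the `registrar_sketch` namespace, `TensorData`, `VendorGraphExecutor`, the static `Register`/`Get` pair, and the counters are hypothetical, and the error text is borrowed from the CI message quoted elsewhere in this thread.

```cpp
#include <atomic>
#include <memory>
#include <stdexcept>
#include <utility>

namespace registrar_sketch {

// Stand-in for LazyTensor::Data; the real type carries far more state.
struct TensorData {
  long unique_id;
};

// Simplified executor that owns its own registration point, so nothing
// extra has to be added to the backend interface.
class LazyGraphExecutor {
 public:
  // Hypothetical entry point: a backend registers its executor once at
  // start-up. The registry does not take ownership of the pointer.
  static void Register(LazyGraphExecutor* executor) {
    Registry().store(executor);
  }

  // Returns the registered executor, or throws if none was registered.
  static LazyGraphExecutor* Get() {
    LazyGraphExecutor* executor = Registry().load();
    if (executor == nullptr) {
      throw std::runtime_error("Lazy graph executor not registered.");
    }
    return executor;
  }

  virtual ~LazyGraphExecutor() = default;

  // Virtual so that a vendor backend can hook tensor bookkeeping.
  virtual void RegisterTensor(std::shared_ptr<TensorData> data) {
    if (data) {
      ++live_tensors_;
    }
  }

  int live_tensors() const { return live_tensors_; }

 private:
  // Function-local static avoids static initialization order problems.
  static std::atomic<LazyGraphExecutor*>& Registry() {
    static std::atomic<LazyGraphExecutor*> registry{nullptr};
    return registry;
  }

  int live_tensors_ = 0;
};

// Hypothetical vendor executor that records extra information and then
// defers to the base-class behavior.
class VendorGraphExecutor : public LazyGraphExecutor {
 public:
  void RegisterTensor(std::shared_ptr<TensorData> data) override {
    if (data) {
      last_registered_id_ = data->unique_id;
    }
    LazyGraphExecutor::RegisterTensor(std::move(data));
  }

  long last_registered_id() const { return last_registered_id_; }

 private:
  long last_registered_id_ = -1;
};

}  // namespace registrar_sketch
```

One consequence of this design is that any backend that never calls `Register` fails on the first `Get()`, so every existing backend needs a matching registration call when such a change lands.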

void UnregisterTensor(LazyTensor::Data* data);
virtual ~LazyGraphExecutor();

virtual void RegisterTensor(std::shared_ptr<LazyTensor::Data> data);
Collaborator

Since these become APIs now, maybe provide a few words explaining why people want to override these methods.

Contributor

@wconstab wconstab left a comment

LGTM, thanks for doing this @antoniojkim, and for the feedback @alanwaketan! I like the idea of having the registrar in LazyGraphExecutor, to avoid piling stuff into the backend interface.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 19, 2022
@antoniojkim antoniojkim force-pushed the antoniojkim/extensible_lazy_graph_executor branch from ef4cc8c to fa0d4ab Compare October 19, 2022 14:34
@antoniojkim
Collaborator Author

@JackCaoG I don't know if PyTorch/XLA is using the lazy graph executor yet, but if so, this PR will break PyTorch/XLA. If not, please ignore.

@JackCaoG
Collaborator

Yeah, I saw `RuntimeError: Lazy graph executor not registered.` in the CI. @alanwaketan, can you arrange a fix?

@alanwaketan
Collaborator

alanwaketan commented Oct 19, 2022

Working on a companion pull request now. @antoniojkim, here is the README on how to land such patches together, in case you haven't seen it before. Basically, the PyTorch PR needs to update the XLA pin to point to the companion PR, and then we land the PR once the CI is all green.

@alanwaketan
Collaborator

alanwaketan commented Oct 19, 2022

Here is the companion PR: pytorch/xla#4106. You will need to update https://github.com/pytorch/pytorch/blob/master/.github/ci_commit_pins/xla.txt#L1 with eff277e81fcfdeccba71e75ff40b6e2f3e29e27b.

@antoniojkim antoniojkim requested a review from a team as a code owner October 19, 2022 19:26
@antoniojkim
Collaborator Author

@pytorchbot merge -g

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 additional jobs have failed, first few of them are: trunk, trunk / linux-bionic-cuda11.7-py3.10-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu)

Details for Dev Infra team Raised by workflow job

@antoniojkim antoniojkim force-pushed the antoniojkim/extensible_lazy_graph_executor branch 2 times, most recently from e1cf63a to 08cbf54 Compare October 20, 2022 15:37
@antoniojkim
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

@antoniojkim antoniojkim force-pushed the antoniojkim/extensible_lazy_graph_executor branch from 08cbf54 to 2946121 Compare October 20, 2022 18:17
@antoniojkim antoniojkim force-pushed the antoniojkim/extensible_lazy_graph_executor branch from 2946121 to 11d1021 Compare October 21, 2022 02:11
@antoniojkim
Collaborator Author

@pytorchbot merge -g

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: HTTP Error 502: Bad Gateway

@antoniojkim
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@github-actions

Hey @antoniojkim.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

sgrigory pushed a commit to sgrigory/pytorch that referenced this pull request Oct 28, 2022
Add `LazyGraphExecutor` to the backend interface so that it is extensible by a vendor backend.

I've made some preliminary methods virtual. Not sure if we want to make all methods in `LazyGraphExecutor` virtual.

Pull Request resolved: pytorch#87218
Approved by: https://github.com/wconstab, https://github.com/alanwaketan
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Add `LazyGraphExecutor` to the backend interface so that it is extensible by a vendor backend.

I've made some preliminary methods virtual. Not sure if we want to make all methods in `LazyGraphExecutor` virtual.

Pull Request resolved: pytorch#87218
Approved by: https://github.com/wconstab, https://github.com/alanwaketan
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Add `LazyGraphExecutor` to the backend interface so that it is extensible by a vendor backend.

I've made some preliminary methods virtual. Not sure if we want to make all methods in `LazyGraphExecutor` virtual.

Pull Request resolved: pytorch#87218
Approved by: https://github.com/wconstab, https://github.com/alanwaketan
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged open source