Skip to content

Conversation

kimishpatel
Copy link
Contributor

Summary:
During cleanup phase, calling recordReferencedAttrs would record
the attributes which are referenced and hence kept.
However, if you have two instances of the same type which are preserved
through freezing process, as the added testcase shows, then during
recording the attributes which are referenced, we iterate through the
type INSTANCES that we have seen so far and record those ones.
Thus if we have another instance of the same type, we will just look at
the first instance in the list, and record that instances.
This PR fixes that by traversing the getattr chains and getting the
actual instance of the getattr output.

Test Plan:
python test/test_jit.py TestFreezing

Reviewers:

Subscribers:

Tasks:

Tags:

Fixes #{issue number}

@kimishpatel kimishpatel requested a review from bzinodev August 3, 2020 14:47
@kimishpatel kimishpatel requested a review from apaszke as a code owner August 3, 2020 14:47
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Aug 3, 2020
@kimishpatel kimishpatel requested a review from vkuzo August 3, 2020 14:47
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dr-ci
Copy link

dr-ci bot commented Aug 3, 2020

💊 CI failures summary and remediations

As of commit e8bf0e2 (more details on the Dr. CI page):


  • 1/4 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)
  • 3/4 broken upstream at merge base ebc7ebc on Aug 13 from 5:14am to 11:40am PDT (11 commits; d39cb84 - fd5ed4b)

🚧 3 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch (expand for instructions)

Since your merge base is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 49 times.

@vkuzo
Copy link
Contributor

vkuzo commented Aug 3, 2020

hmm, I'm probably not the best person to review this code properly. Can someone from JIT review?

@vkuzo vkuzo removed their request for review August 3, 2020 17:16
@kimishpatel
Copy link
Contributor Author

hmm, I'm probably not the best person to review this code properly. Can someone from JIT review?

Yeah, I put you here for FYI. I put Zino for review.

@kimishpatel kimishpatel force-pushed the fix_freeze_module_for_shared_types branch 2 times, most recently from 73ad43b to be7fbfb Compare August 3, 2020 23:10
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -11,6 +12,20 @@ namespace torch {
namespace jit {

namespace {
ModulePtr getModulePtrForGetAttrNode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use findConstantAttr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. It would require some refactoring of that function, because it does more than just follow getAttr chain. If getModulePtrForGetAttrNode was too big I would probably do that. If you prefer that I can think of refactoring that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It am okay to keep them separate for now. There is some duplication of code but not too much....


#include <torch/csrc/jit/ir/alias_analysis.h>
#include <torch/csrc/jit/passes/inliner.h>
#include <torch/csrc/jit/passes/quantization/helper.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove quantanzation/helper.h dependency?
you are invoking findChildModule (small routine that can be inlined.

@kimishpatel kimishpatel force-pushed the fix_freeze_module_for_shared_types branch from be7fbfb to 226bb37 Compare August 6, 2020 15:11
@kimishpatel kimishpatel requested a review from bzinodev August 7, 2020 14:07
@kimishpatel kimishpatel force-pushed the fix_freeze_module_for_shared_types branch from 226bb37 to 0b69301 Compare August 8, 2020 02:40
@kimishpatel kimishpatel force-pushed the fix_freeze_module_for_shared_types branch from 0b69301 to 86529f6 Compare August 10, 2020 21:11
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zou3519
Copy link
Contributor

zou3519 commented Aug 12, 2020

This seems to have broken the mac build (see https://circleci.com/gh/pytorch/pytorch/6664211?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link status on master).

That matches one of the failing builds when this PR was submitted (https://circleci.com/gh/pytorch/pytorch/6620836?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link), so I'm reverting this PR.

@kimishpatel
Copy link
Contributor Author

This seems to have broken the mac build (see https://circleci.com/gh/pytorch/pytorch/6664211?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link status on master).

That matches one of the failing builds when this PR was submitted (https://circleci.com/gh/pytorch/pytorch/6620836?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link), so I'm reverting this PR.

Thanks. My bad. Let me check.

@facebook-github-bot
Copy link
Contributor

@kimishpatel merged this pull request in 4665f3f.

@kimishpatel kimishpatel reopened this Aug 12, 2020
@kimishpatel kimishpatel force-pushed the fix_freeze_module_for_shared_types branch from 86529f6 to 52e1bce Compare August 13, 2020 16:31
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@kimishpatel kimishpatel force-pushed the fix_freeze_module_for_shared_types branch 2 times, most recently from bb7c6fe to de23d86 Compare August 14, 2020 17:32
Summary:
During cleanup phase, calling recordReferencedAttrs would record
the attributes which are referenced and hence kept.
However, if you have two instances of the same type which are preserved
through freezing process, as the added testcase shows, then during
recording the attributes which are referenced, we iterate through the
type INSTANCES that we have seen so far and record those ones.
Thus if we have another instance of the same type, we will just look at
the first instance in the list, and record that instances.
This PR fixes that by traversing the getattr chains and getting the
actual instance of the getattr output.

Test Plan:
python test/test_jit.py TestFreezing

Reviewers:

Subscribers:

Tasks:

Tags:
@kimishpatel kimishpatel force-pushed the fix_freeze_module_for_shared_types branch from de23d86 to e8bf0e2 Compare August 14, 2020 18:43
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kimishpatel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@eellison
Copy link
Contributor

Can we revert this ? it's responsible for #45902

@kimishpatel
Copy link
Contributor Author

@bzinodev, would mind relanding this if it is reverted? @vkuzo is this fix actually hepful for quantization anywhere?
@eellison my guess is that the reason of the failure for torchvision stuff is that you have some create_object in the scripted module? Do you know anymore as to why it fails.

@kimishpatel
Copy link
Contributor Author

@eellison btw this fails for me:

model = torchvision.models.detection.keypoint_rcnn.keypointrcnn_resnet50_fpn(pretrained=True, min_size=200, max_size=300)
model.eval()
script_module = torch.jit.script(model)

Failure:
scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
File "/data/users/kimishpatel/pytorch_new/pytorch/torch/jit/_recursive.py", line 428, in create_script_module_impl
create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
File "/data/users/kimishpatel/pytorch_new/pytorch/torch/jit/_recursive.py", line 320, in create_methods_and_properties_from_stubs
concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
RuntimeError: Can't redefine method: forward on class: torch.torchvision.models.detection.rpn.RegionProposalNetwork

@vkuzo
Copy link
Contributor

vkuzo commented Oct 12, 2020

@vkuzo is this fix actually hepful for quantization anywhere?

I remember filing the original issue #42039 because I saw this on a toy model, and it was blocking writing a good test plan for an unrelated quantization PR. I don't have any data one way or the other on the impact of this on real models, but seems like the original issue should be fixed in some way (whether that's this PR or something else).

@eellison
Copy link
Contributor

eellison commented Oct 13, 2020

I think this is fixing the issue in the wrong way, and it's also breaking freezing for OSS users. As far as I can tell the issue is in quantization - we should fix the issue there. I wouldn't be surprised if that quant issue is also responsible for #46054

@kimishpatel
Copy link
Contributor Author

I think this is fixing the issue in the wrong way, and it's also breaking freezing for OSS users. As far as I can tell the issue is in quantization - we should fix the issue there. I wouldn't be surprised if that quant issue is also responsible for #46054

Independent of the quantization issue, the bug fixed by this PR is real. Effectively the previous code was looking at the first instance of the given type rather than finding the actual instance.

@vkuzo
Copy link
Contributor

vkuzo commented Oct 13, 2020

I think this is fixing the issue in the wrong way, and it's also breaking freezing for OSS users. As far as I can tell the issue is in quantization - we should fix the issue there. I wouldn't be surprised if that quant issue is also responsible for #46054

I'd love to learn more context on what is pointing to a problem in quantization code.

@eellison
Copy link
Contributor

I'd love to learn more context on what is pointing to a problem in quantization code.

Yea, maybe this premature, but the test case used for the PR is quantization. If it's not a quant issue it'd be nice to get an independent repro.

@vkuzo
Copy link
Contributor

vkuzo commented Oct 13, 2020

ah, ic, cool. Yeah, it seemed like an issue in freezing (since the quantization code used to surface it is super simple and has been stable for awhile), but definitely let us know if something is fishy.

@eellison
Copy link
Contributor

@kimishpatel could you provide a repro that doesn't depend on quantization?

@kimishpatel
Copy link
Contributor Author

@kimishpatel could you provide a repro that doesn't depend on quantization?

I dont have a repro without quantization right now. It will take some time produce something that does that.

@kimishpatel
Copy link
Contributor Author

@kimishpatel could you provide a repro that doesn't depend on quantization?

I dont have a repro without quantization right now. It will take some time produce something that does that.

@eellison also do you know more about the reason of the failure? As in, is it something specific to the model, since I have seen this error before when we could not follow chains of getattr due to something else.

@eellison
Copy link
Contributor

eellison commented Oct 13, 2020

@kimishpatel if you look at the linked PR, the test case in this PR was fixed by #46250. The failure seems to be a quant bug of not updating the getAttr mapped type.

Copying myself from the linked PR:

In the future it would be good to provide repro's for freezing issues without including a quantization dependency; there was another another issue in freezing (see: #46054) who's root cause was the same quantization issue #46250.

@kimishpatel
Copy link
Contributor Author

@eellison sounds good. I will try to repro without quantization and see if this still holds.

facebook-github-bot pushed a commit that referenced this pull request Oct 15, 2020
Summary:
Fixes #45902 by reverting #42457

The test case introduced by #42457 was fixed by #46250, which I'm assuming is the real source of the bug.

In the future it would be good to provide repro's for freezing issues without including a quantization dependency; there was another another issue in freezing (see: #46054) who's root cause was the same quantization issue #46250.

Pull Request resolved: #46285

Reviewed By: bdhirsh

Differential Revision: D24288739

Pulled By: eellison

fbshipit-source-id: b69ee8c713f749cd93d5eba370c3eafed86568bb
@facebook-github-bot facebook-github-bot deleted the fix_freeze_module_for_shared_types branch January 27, 2021 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merged oncall: jit Add this issue/PR to JIT oncall triage queue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants