Break graph on `manual_seed`. #107594

ysiraichi · 2023-08-21T14:21:57Z

Stack from ghstack (oldest at bottom):

-> Break graph on manual_seed. #107594

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @anijain2305

Fix: #107187 [ghstack-poisoned]

pytorch-bot · 2023-08-21T14:22:00Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107594

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 278fa5a with merge base b7624fc ():

FLAKY - The following job failed but was likely due to flakiness present on trunk:

win-vs2019-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral) (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Fix: #107187 cc voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx ipiszy chenyang78 aakhundov [ghstack-poisoned]

ysiraichi · 2023-08-21T19:22:49Z

test/inductor/test_torchinductor_opinfo.py

+# OpInfo that call wrapper_set_seed.
+# Ref: https://github.com/pytorch/pytorch/issues/107187
+# Since then, manual_seed calls graph-breaks. Thus, in that case, we need to allow
+# Python code.
+inductor_nopython_error = {
+    "item",
+    "cauchy",
+    "exponential",
+    "geometric",
+    "log_normal",
+    "normal",
+    "normal.in_place",
+    "normal.number_mean",
+    "uniform",
+    "nn.functional.fractional_max_pool2d",
+    "nn.functional.fractional_max_pool3d",
+    "nn.functional.rrelu",
+    "nn.functional.scaled_dot_product_attention",
+    "svd_lowrank",
+    "pca_lowrank",
+    "randn",
+    "randn_like",
+    "rand_like",
+    "randint",
+    "randint_like",
+    "empty_strided",
+    "multinomial",
+    "bernoully",
+    "nn.functional.dropout",
+    "nn.functional.dropout2d",
+    "nn.functional.dropout3d",
+    "nn.functional.alpha_dropout",
+    "nn.functional.feature_alpha_dropout",
+    "nn.functional.multi_head_attention_forward",
+}


There are many tests failing because they called torch.manual_seed in their OpInfo. Since we are graph-breaking on manual_seed calls, we need to allow Python code for them.

Can we instead monkey-patch manual_seed to be a no-op in the tests, and then set manual_seed ourselves outside the test invocation ?

Would be nice to get coverage for these ops. I think that and adding "fallback_random": True to below would cause a lot to pass.

@torch._inductor.config.patch( {"implicit_fallbacks": False, "triton.autotune_pointwise": False} )

Hmm. Apparently, it worked without fallback_random, but simply making wrapper_set_seed a noop. Am I missing anything?

The opfinos for all the rng functions expect failures right now: https://github.com/pytorch/pytorch/blob/main/test/inductor/test_torchinductor_opinfo.py#L281

I think why it worked

But, take a look at this part of check_model function. We already set torch.manual_seed(0) before calling both the plain model and the optimized one. So, maybe this is not the problem!?

pytorch/test/inductor/test_torchinductor.py

Lines 289 to 312 in c68d0a7

torch.manual_seed(0)

correct = model(*ref_inputs, **ref_kwargs)

# downcast the model back if needed

if reference_in_float and has_lowp_args:

if hasattr(model, "to"):

model = model.to(original_lowp_dtype)

torch._inductor.metrics.reset()

called = False

def compile_fx_wrapper(model_, example_inputs_):

nonlocal called

called = True

return compile_fx(model_, example_inputs_)

def run(*ex, **kwargs):

return model(*ex, **kwargs)

run = torch._dynamo.optimize(compile_fx_wrapper, nopython=nopython)(run)

torch.manual_seed(0)

actual = run(*example_inputs, **kwargs)

For randint and some others (guess), I think they have different pseudo-generating algorithms:

CPU eager: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/CPUGeneratorImpl.h

CPU Inductor: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/core/PhiloxRNGEngine.h

Plus, as far as I've seen, randint doesn't seem to get its seed from the default_generator (which is the one affected by manual_seed).

Fix: #107187 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305 [ghstack-poisoned]

Fix: #107187 ghstack-source-id: ef61dd5 Pull Request resolved: #107594

Fix: #107187 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305 [ghstack-poisoned]

Fix: #107187 ghstack-source-id: bf1921e Pull Request resolved: #107594

Fix: #107187 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305 [ghstack-poisoned]

Fix: #107187 ghstack-source-id: 3596033 Pull Request resolved: #107594

Fix: #107187 cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov anijain2305 [ghstack-poisoned]

Fix: #107187 ghstack-source-id: d76a72e Pull Request resolved: #107594

eellison · 2023-08-29T00:13:34Z

test/inductor/test_torchinductor_opinfo.py



-torch.testing._internal.common_methods_invocations.wrapper_set_seed = wrapper_set_seed
+torch.testing._internal.common_methods_invocations.wrapper_set_seed = (


We should be patching this inside of the test, with unittest.patch, so we dont inadvertently affect other test and maybe be calling freeze_rng_state as well

ysiraichi · 2023-08-29T12:57:42Z

@pytorchbot merge

pytorchmergebot · 2023-08-29T12:59:50Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

huydhn · 2023-09-04T17:59:05Z

@pytorchbot revert -m 'Sorry for reverting your change, but it has an import issue that breaks internal code' -c ghfirst

Here is the import error from the diff D48890949:

======================================================================
ERROR: test_intersect_ray_with_plane_behind_sphere (arvr.projects.eye_tracking.Fringe.test.test_geometry.TestStructuredLightGeometry)
Test intersection with a plane behind a sphere.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/par_unpack.mGtwZSdDV/arvr/projects/eye_tracking/Fringe/test/test_geometry.py", line 23, in setUp
    torch.random.manual_seed(random_seed)
  File "/tmp/par_unpack.mGtwZSdDV/torch/_compile.py", line 22, in inner
    import torch._dynamo
  File "/tmp/par_unpack.mGtwZSdDV/torch/_dynamo/__init__.py", line 2, in <module>
    from . import allowed_functions, convert_frame, eval_frame, resume_execution
  File "/tmp/par_unpack.mGtwZSdDV/torch/_dynamo/convert_frame.py", line 23, in <module>
    from . import config, exc
  File "/tmp/par_unpack.mGtwZSdDV/torch/_dynamo/exc.py", line 15, in <module>
    from torch.fb.exportdb.logging import exportdb_error_message
ModuleNotFoundError: No module named 'torch.fb'
----------------------------------------------------------------------

The internal module torch.fb was pulled via from ._compile import _disable_dynamo. cc @eellison As you have access to the internal diff, could you please help @ysiraichi fix the internal import error and reland the change?

pytorchmergebot · 2023-09-04T18:00:31Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2023-09-04T18:00:42Z

@ysiraichi your PR has been successfully reverted.

This reverts commit 6ad5568. Reverted #107594 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it has an import issue that breaks internal code ([comment](#107594 (comment)))

eellison · 2023-09-05T22:36:26Z

@ysiraichi would you try adding changing this to the following and resubmitting? :

if is_fbcode():
    try:
        from torch.fb.exportdb.logging import exportdb_error_message
    except ModuleNotFoundError:

        def exportdb_error_message(case_name):
            return ""

Trying to re-land #107594. Pull Request resolved: #108647 Approved by: https://github.com/eellison

Re-landing: #108647 (old #107594) [ghstack-poisoned]