
DISABLED test_profiler (__main__.TestJit) #65521

Closed
mruberry opened this issue Sep 23, 2021 · 12 comments
Labels
  high priority
  module: tests (Issues related to tests, not the torch.testing module)
  oncall: jit (Add this issue/PR to JIT oncall triage queue)
  oncall: profiler (profiler-related issues: cpu, gpu, kineto)
  skipped (Denotes a (flaky) test currently skipped in CI)
  triage review

mruberry (Collaborator) commented Sep 23, 2021

See https://github.com/pytorch/pytorch/runs/3682832751. Relevant snippet:

======================================================================
ERROR [0.013s]: test_profiler (__main__.TestJit)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_jit.py", line 2728, in test_profiler
    traced_fn(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/profiler.py", line 202, in __exit__
    self.kineto_results = _disable_profiler()
RuntimeError: cpu_trace->activities.size() == kineto_events_.size()INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/autograd/profiler_kineto.cpp":239, please report a bug to PyTorch. 

======================================================================
ERROR [0.017s]: test_always_alive_values (jit.test_freezing.TestMKLDNNReinplacing)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/jit/test_freezing.py", line 2134, in test_always_alive_values
    self.checkResults(mod_eager, mod)
  File "/var/lib/jenkins/workspace/test/jit/test_freezing.py", line 2091, in checkResults
    self.assertEqual(mod1(inp), mod2(inp))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1106, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 406, in prof_meth_call
    return prof_callable(meth_call, *args, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 400, in prof_callable
    return callable(*args, **kwargs)
RuntimeError: Couldn't find method: '_conv_forward' on class: '__torch__.torch.nn.modules.conv.___torch_mangle_2152.Conv2d (of Python compilation unit at: 0x56250e1ebd80)'

======================================================================
ERROR [0.010s]: test_merge_liveness (jit.test_freezing.TestMKLDNNReinplacing)
----------------------------------------------------------------------
  File "test_jit.py", line 5156, in test_cat
    self.assertAutodiffNode(func2.graph_for(x, y), True, ['aten::cat'], [])
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_jit.py", line 282, in assertAutodiffNode
    found_all_nonfusible_nodes and found_all_fusible_nodes, err_msg)
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1946, in assertEqual
    super().assertTrue(x == y, msg=msg)
AssertionError: False is not true : 
Failure in testing nodes' autodifferentiation. One or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Specifically:
  ['aten::cat'] were not in one of the DifferentiableGraphs when they were expected to be. Did you intend for these nodes to be autodiffed? If not, remove them from the list of nonfusible nodes.

======================================================================
FAIL [0.008s]: test_stack (__main__.TestScript)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_jit.py", line 5198, in test_stack
    self.assertAutodiffNode(func2.graph_for(x, y), True, ['aten::stack'], [])
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_jit.py", line 282, in assertAutodiffNode
    found_all_nonfusible_nodes and found_all_fusible_nodes, err_msg)
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1946, in assertEqual
    super().assertTrue(x == y, msg=msg)
AssertionError: False is not true : 
Failure in testing nodes' autodifferentiation. One or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Specifically:
  ['aten::stack'] were not in one of the DifferentiableGraphs when they were expected to be. Did you intend for these nodes to be autodiffed? If not, remove them from the list of nonfusible nodes.

======================================================================
FAIL [0.004s]: test_tensor_requires_grad (__main__.TestScript)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_jit.py", line 7165, in test_tensor_requires_grad
    self.assertFalse(out_inp[2].requires_grad())
AssertionError: True is not false

======================================================================
FAIL [0.004s]: test_size_and_sizes (jit.test_symbolic_shape_analysis.TestSymbolicShapeAnalysis)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/jit/test_symbolic_shape_analysis.py", line 160, in test_size_and_sizes
    self.assertEqual(next(graph.outputs()).type().symbolic_sizes(), [5, 8, sym1])
  File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1957, in assertEqual
    super().assertEqual(x, y, msg=msg)
AssertionError: None != [5, 8, -4896]

======================================================================
FAIL [0.007s]: test_canonicalize_tensor_iterator (jit.test_tracer.TestTracer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/jit/test_tracer.py", line 244, in test_canonicalize_tensor_iterator
    self.assertTrue(str(traced.graph_for(x)).count(': int = prim::Constant') == 5)
AssertionError: False is not true

======================================================================
FAIL [0.002s]: test_inplace_check (jit.test_tracer.TestTracer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/jit/test_tracer.py", line 342, in test_inplace_check
    ge(x)
AssertionError: RuntimeError not raised

----------------------------------------------------------------------
Ran 2564 tests in 86.085s

FAILED (failures=33, errors=16, skipped=91, expected failures=1)
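
For reference, here is a minimal sketch (an assumption, not the verbatim test bodies) of the autodiff pattern that the test_cat and test_stack failures above exercise: script a small function, run it a few times so the profiling executor can specialize the graph, then inspect graph_for for a prim::DifferentiableGraph group containing the expected op.

import torch

# Assumed illustration: the real tests call torch.cat / torch.stack inside a scripted
# function and then assert on the resulting differentiable subgraphs via assertAutodiffNode.
@torch.jit.script
def func2(x, y):
    return torch.cat([x, y], dim=0)

x = torch.randn(4, 4, requires_grad=True)
y = torch.randn(4, 4, requires_grad=True)
for _ in range(3):
    func2(x, y)  # warm-up runs so the profiling executor records shapes and optimizes

graph = func2.graph_for(x, y)
print(graph)  # the check expects aten::cat to appear inside a prim::DifferentiableGraph subgraph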

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @mruberry @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git

@mruberry mruberry added high priority oncall: jit Add this issue/PR to JIT oncall triage queue module: tests Issues related to tests (not the torch.testing module) labels Sep 23, 2021
@github-actions github-actions bot added this to Need triage in JIT Triage Sep 23, 2021
suo (Member) commented Sep 23, 2021

@eellison can you/someone look into the freezing tests?

eellison (Contributor) commented

@mruberry do you know when these started failing?

mruberry (Collaborator, Author) commented

> @mruberry do you know when these started failing?

I just saw them fail yesterday.

@mruberry mruberry added the oncall: profiler profiler-related issues (cpu, gpu, kineto) label Sep 29, 2021
@mruberry mruberry changed the title Some jit tests are intermittently failing Some jit tests, including test_profiler, are intermittently failing Sep 29, 2021
mruberry (Collaborator, Author) commented

Happened again just now: https://github.com/pytorch/pytorch/runs/3741154257. Maybe an issue with the profiler or test_profiler triggering a cascade of failures?

mruberry (Collaborator, Author) commented Oct 4, 2021

Continuing to occur: https://github.com/pytorch/pytorch/runs/3784790656

eellison (Contributor) commented Oct 4, 2021

I'll disable the test_profiler tests or look into them... weird

suo (Member) commented Oct 5, 2021

hm, another example: https://github.com/pytorch/pytorch/runs/3804782432.

@eellison can I disable these tests for now pending an investigation?

eellison (Contributor) commented Oct 5, 2021

Sure, can you disable https://github.com/pytorch/pytorch/blob/master/test/test_profiler.py? That seems to be the source of the error.

suo (Member) commented Oct 5, 2021

I think it's actually the test_profiler method in test_jit, but I will change this issue to disable that.

@suo suo changed the title Some jit tests, including test_profiler, are intermittently failing DISABLED test_profiler (__main__.TestJit) Oct 5, 2021
@suo suo added the skipped Denotes a (flaky) test currently skipped in CI. label Dec 21, 2021
davidberard98 added a commit that referenced this issue Mar 24, 2022
fixes #65521

[ghstack-poisoned]
davidberard98 added a commit that referenced this issue Mar 24, 2022
fixes #65521

ghstack-source-id: 3202e492b185b981782c5df292d74806b18e25b6
Pull Request resolved: #74697
davidberard98 added a commit that referenced this issue Mar 24, 2022
fixes #65521

[ghstack-poisoned]
davidberard98 added a commit that referenced this issue Mar 24, 2022
fixes #65521

ghstack-source-id: e491e682c4d96e11590ecba27a6f59a7b6f810c6
Pull Request resolved: #74703
davidberard98 added a commit that referenced this issue Apr 4, 2022
fixes #65521

ghstack-source-id: c7ae46718fa046fd4dbafad12a04d1773a885d45
Pull Request resolved: #74703
davidberard98 added a commit that referenced this issue Apr 4, 2022
davidberard98 added a commit that referenced this issue Apr 4, 2022
fixes #65521

[ghstack-poisoned]
davidberard98 added a commit that referenced this issue Apr 8, 2022
davidberard98 added a commit that referenced this issue Apr 8, 2022
fixes #65521

ghstack-source-id: d5b12957e034fde5b92b720232c6396e9e74ff56
Pull Request resolved: #74703
davidberard98 added a commit that referenced this issue Apr 8, 2022
fixes #65521

[ghstack-poisoned]
pytorch-bot (bot) commented Aug 3, 2022

Another case of trunk flakiness has been found here. Please verify the issue was opened after this instance, that the platforms list includes all of [linux], or the disable bot might not be working as expected.

sraikund16 (Contributor) commented

This test does not seem to be in test_profiler anymore.

JIT Triage automation moved this from Need triage to Done May 28, 2024
davidberard98 (Contributor) commented

@sraikund16 I think this is actually this test:

pytorch/test/test_jit.py, lines 370 to 377 at 4154c83:

def test_profiler(self):
    torch._C._set_graph_executor_optimize(False)

    def other_fn(x):
        return x * 2

    x = torch.rand(3, 4)
    traced_other_fn = torch.jit.trace(other_fn, x)
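
As a usage note, here is a minimal sketch (an assumption based on the traceback at the top of this issue, not the verbatim remainder of the test) of how the traced function would be exercised under the autograd profiler; the INTERNAL ASSERT fires when the profiler context exits and collects kineto events.

import torch

def other_fn(x):
    return x * 2

x = torch.rand(3, 4)
traced_other_fn = torch.jit.trace(other_fn, x)

# Assumed wrapper: the traceback shows the failure surfacing in profiler.__exit__,
# i.e. when a "with profile()" block like this one closes and _disable_profiler()
# reconciles the collected kineto events.
with torch.autograd.profiler.profile() as prof:
    traced_other_fn(x)
print(prof.key_averages().table())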
