[inductor][cpu] Reformer multiple thread CPP wrapper performance passed with static shape but failed with dynamic shape #132286

### 🐛 Describe the bug

[inductor_single_run.sh](https://github.com/chuanqi129/inductor-tools/blob/main/scripts/modelbench/inductor_single_run.sh)
static shape:
```
/workspace/pytorch# bash inductor_single_run.sh multiple inference performance huggingface Reformer float32 first static cpp
Testing with cpp wrapper.
Testing with freezing on.
multi-threads testing....
/opt/conda/lib/python3.8/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
loading model: 0it [00:00, ?it/s]config.num_buckets is not set. Setting config.num_buckets to 128...
config.num_buckets is not set. Setting config.num_buckets to 128...
config.num_buckets is not set. Setting config.num_buckets to 128...
loading model: 0it [00:01, ?it/s]
cpu  eval  Reformer
config.num_buckets is not set. Setting config.num_buckets to 128...
config.num_buckets is not set. Setting config.num_buckets to 128...
config.num_buckets is not set. Setting config.num_buckets to 128...
W0731 16:18:38.740529 32259 torch/_dynamo/variables/tensor.py:746] [4/0] Graph break from `Tensor.item()`, consider setting:
W0731 16:18:38.740529 32259 torch/_dynamo/variables/tensor.py:746] [4/0]     torch._dynamo.config.capture_scalar_outputs = True
W0731 16:18:38.740529 32259 torch/_dynamo/variables/tensor.py:746] [4/0] or:
W0731 16:18:38.740529 32259 torch/_dynamo/variables/tensor.py:746] [4/0]     env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1
W0731 16:18:38.740529 32259 torch/_dynamo/variables/tensor.py:746] [4/0] to include these operations in the captured graph.
W0731 16:18:38.740529 32259 torch/_dynamo/variables/tensor.py:746] [4/0]
running benchmark: 100%|█████████████████████████████████████████████████████████████████████| 50/50 [01:24<00:00,  1.70s/it]
1.283x
WARNING:common:Trying to call the empty_gpu_cache for device: cpu, which is not in list [cuda, xpu]
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,Reformer,16,1.283031,743.495714,74.732269,0.839061,371.038618,442.207027,533,7,5,2,0,0,0
```
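
The header/row pair that closes the static-shape log is easier to compare across runs once it is parsed. A minimal sketch using only the Python standard library; the function name `parse_benchmark_csv` is hypothetical, and the two CSV lines are copied from the log above:

```python
import csv
import io

# Header and data row copied verbatim from the static-shape run.
RAW = (
    "dev,name,batch_size,speedup,abs_latency,compilation_latency,"
    "compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,"
    "unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,"
    "autograd_compiles,cudagraph_skips\n"
    "cpu,Reformer,16,1.283031,743.495714,74.732269,0.839061,371.038618,"
    "442.207027,533,7,5,2,0,0,0\n"
)


def _convert(value):
    """Convert a CSV field to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def parse_benchmark_csv(text):
    """Return one dict per data row, keyed by the CSV header names."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: _convert(value) for key, value in row.items()} for row in reader]


result = parse_benchmark_csv(RAW)[0]
print(result["name"], result["speedup"], result["graph_breaks"])
```

The dynamic-shape run ends with `warmup_failed` and prints no such row, so there is nothing to parse for it.
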
dynamic shape:
```
/workspace/pytorch# bash inductor_single_run.sh multiple inference performance huggingface Reformer float32 first dynamic cpp
Testing with dynamic shapes.
Testing with cpp wrapper.
Testing with freezing on.
multi-threads testing....
/opt/conda/lib/python3.8/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
loading model: 0it [00:00, ?it/s]config.num_buckets is not set. Setting config.num_buckets to 128...
config.num_buckets is not set. Setting config.num_buckets to 128...
config.num_buckets is not set. Setting config.num_buckets to 128...
loading model: 0it [00:01, ?it/s]
cpu  eval  Reformer
config.num_buckets is not set. Setting config.num_buckets to 128...
config.num_buckets is not set. Setting config.num_buckets to 128...
config.num_buckets is not set. Setting config.num_buckets to 128...
W0731 16:21:58.194067 32489 torch/_dynamo/variables/tensor.py:746] [4/0] Graph break from `Tensor.item()`, consider setting:
W0731 16:21:58.194067 32489 torch/_dynamo/variables/tensor.py:746] [4/0]     torch._dynamo.config.capture_scalar_outputs = True
W0731 16:21:58.194067 32489 torch/_dynamo/variables/tensor.py:746] [4/0] or:
W0731 16:21:58.194067 32489 torch/_dynamo/variables/tensor.py:746] [4/0]     env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1
W0731 16:21:58.194067 32489 torch/_dynamo/variables/tensor.py:746] [4/0] to include these operations in the captured graph.
W0731 16:21:58.194067 32489 torch/_dynamo/variables/tensor.py:746] [4/0]
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0] Error while creating guard:
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0] Name: ''
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     Source: shape_env
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     Create Function: SHAPE_ENV
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     Guard Types: None
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     Code List: None
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     Object Weakref: None
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     Guarded Class Weakref: None
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0] Traceback (most recent call last):
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]   File "/workspace/pytorch/torch/_guards.py", line 259, in create
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     return self.create_fn(builder, self)
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]   File "/workspace/pytorch/torch/_dynamo/guards.py", line 1765, in SHAPE_ENV
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     guards = output_graph.shape_env.produce_guards(
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]   File "/workspace/pytorch/torch/fx/experimental/symbolic_shapes.py", line 4161, in produce_guards
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]     raise ConstraintViolationError(
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0] torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['labels'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]   - Not all values of RelaxedUnspecConstraint(L['labels'].size()[0]) are valid because L['labels'].size()[0] was inferred to be a constant (16).
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0] Created at:
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0]   File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 592, in transform
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0]     tracer = InstructionTranslator(
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0]   File "/workspace/pytorch/torch/_dynamo/symbolic_convert.py", line 2455, in __init__
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0]     output=OutputGraph(
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0]   File "/workspace/pytorch/torch/_dynamo/output_graph.py", line 313, in __init__
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0]     self.init_ambient_guards()
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0]   File "/workspace/pytorch/torch/_dynamo/output_graph.py", line 452, in init_ambient_guards
E0731 16:23:01.501225 32489 torch/_guards.py:263] [16/0]     self.guards.add(ShapeEnvSource().make_guard(GuardBuilder.SHAPE_ENV))
ERROR:common:Backend dynamo failed in warmup()
Traceback (most recent call last):
  File "/workspace/pytorch/benchmarks/dynamo/common.py", line 3110, in warmup
    fn(model, example_inputs)
  File "/workspace/pytorch/torch/_dynamo/eval_frame.py", line 448, in _fn
    return fn(*args, **kwargs)
  File "benchmarks/dynamo/huggingface.py", line 597, in forward_pass
    return mod(**inputs)
  File "/workspace/pytorch/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/pytorch/torch/nn/modules/module.py", line 1727, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/reformer/modeling_reformer.py", line 2404, in forward
    reformer_outputs = self.reformer(
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 1170, in __call__
    return self._torchdynamo_orig_callable(
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 979, in __call__
    result = self._inner_convert(
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 499, in __call__
    return _compile(
  File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 850, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/workspace/pytorch/torch/_dynamo/utils.py", line 246, in time_wrapper
    r = func(*args, **kwargs)
  File "/workspace/pytorch/torch/_utils_internal.py", line 85, in wrapper_function
    return StrobelightCompileTimeProfiler.profile_compile_time(
  File "/workspace/pytorch/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
    return func(*args, **kwargs)
  File "/workspace/pytorch/torch/_dynamo/convert_frame.py", line 759, in compile_inner
    check_fn = CheckFunctionManager(
  File "/workspace/pytorch/torch/_dynamo/guards.py", line 2179, in __init__
    guard.create(builder)
  File "/workspace/pytorch/torch/_guards.py", line 259, in create
    return self.create_fn(builder, self)
  File "/workspace/pytorch/torch/_dynamo/guards.py", line 1765, in SHAPE_ENV
    guards = output_graph.shape_env.produce_guards(
  File "/workspace/pytorch/torch/fx/experimental/symbolic_shapes.py", line 4161, in produce_guards
    raise ConstraintViolationError(
torch.fx.experimental.symbolic_shapes.ConstraintViolationError: Constraints violated (L['labels'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".
  - Not all values of RelaxedUnspecConstraint(L['labels'].size()[0]) are valid because L['labels'].size()[0] was inferred to be a constant (16).


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

warmup_failed
```
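
For triage, the input dimension and the constant it was specialized to can be pulled out of the log mechanically. The sketch below is an illustrative helper, not part of PyTorch: `find_specialized_dims` and the regular expression are hypothetical and only match the message wording that appears in this log.

```python
import re

# Matches the explanatory line printed under the ConstraintViolationError.
# The wording is taken from the log above; other PyTorch versions may differ.
CONSTANT_RE = re.compile(
    r"Not all values of (?P<constraint>\w+)\((?P<source>.+?)\) are valid "
    r"because .+? was inferred to be a constant \((?P<value>-?\d+)\)"
)

# Two occurrences copied from the dynamic-shape log (guard error and traceback).
LOG = """
E0731 16:23:01.499804 32489 torch/_guards.py:261] [16/0]   - Not all values of RelaxedUnspecConstraint(L['labels'].size()[0]) are valid because L['labels'].size()[0] was inferred to be a constant (16).
  - Not all values of RelaxedUnspecConstraint(L['labels'].size()[0]) are valid because L['labels'].size()[0] was inferred to be a constant (16).
"""


def find_specialized_dims(log_text):
    """Return unique (source, constant) pairs in the order they first appear."""
    seen = []
    for match in CONSTANT_RE.finditer(log_text):
        pair = (match.group("source"), int(match.group("value")))
        if pair not in seen:
            seen.append(pair)
    return seen


print(find_specialized_dims(LOG))
```

Here the only reported dimension is the first dimension of `labels`, which was inferred to be the constant 16. That matches the `batch_size` of 16 in the static-shape result row.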

### Versions

<p>SW info</p><table border="1" class="dataframe table">
  <thead>
    <tr style="text-align: right;">
      <th>name</th>
      <th>target_branch</th>
      <th>target_commit</th>
      <th>refer_branch</th>
      <th>refer_commit</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>torchbench</td>
      <td>main</td>
      <td>23512dbe</td>
      <td>main</td>
      <td>23512dbe</td>
    </tr>
    <tr>
      <td>torch</td>
      <td>main</td>
      <td>500aea8d5033fd3540c6ed325dd80e7e1420b0f3</td>
      <td>main</td>
      <td>1d1d074072ecb0aa6ca95e3f43221d2275e16d74</td>
    </tr>
    <tr>
      <td>torchvision</td>
      <td>main</td>
      <td>0.19.0a0+d23a6e1</td>
      <td>main</td>
      <td>0.19.0a0+d23a6e1</td>
    </tr>
    <tr>
      <td>torchtext</td>
      <td>main</td>
      <td>0.16.0a0+b0ebddc</td>
      <td>main</td>
      <td>0.16.0a0+b0ebddc</td>
    </tr>
    <tr>
      <td>torchaudio</td>
      <td>main</td>
      <td>2.4.0a0+69b2a0a</td>
      <td>main</td>
      <td>2.4.0a0+69b2a0a</td>
    </tr>
    <tr>
      <td>torchdata</td>
      <td>main</td>
      <td>0.7.1a0+0790338</td>
      <td>main</td>
      <td>0.7.1a0+0790338</td>
    </tr>
    <tr>
      <td>dynamo_benchmarks</td>
      <td>main</td>
      <td>nightly</td>
      <td>main</td>
      <td>nightly</td>
    </tr>
  </tbody>
</table>


Repro:
[inductor_single_run.sh](https://github.com/chuanqi129/inductor-tools/blob/main/scripts/modelbench/inductor_single_run.sh)
static shape:
bash inductor_single_run.sh multiple inference performance huggingface Reformer float32 first static cpp
dynamic shape:
bash inductor_single_run.sh multiple inference performance huggingface Reformer float32 first dynamic cpp
This is the first time Reformer has been run with dynamic shapes, so there is no known-good baseline and no guilty commit.
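
The two repro commands differ only in the shape-mode token, so they can be generated from one template when scripting both runs. A small sketch: `build_command` is a hypothetical helper, and the positional arguments are copied as-is from the commands above without assuming what each position means to `inductor_single_run.sh`.

```python
import shlex

SCRIPT = "inductor_single_run.sh"

# Argument string copied from the repro commands; only {shape} varies.
TEMPLATE = (
    "multiple inference performance huggingface Reformer float32 first {shape} cpp"
)


def build_command(shape):
    """Return the argv list for the static or dynamic repro run."""
    if shape not in ("static", "dynamic"):
        raise ValueError(f"unexpected shape mode: {shape!r}")
    return ["bash", SCRIPT, *TEMPLATE.format(shape=shape).split()]


for shape in ("static", "dynamic"):
    print(shlex.join(build_command(shape)))
```

Each list can be passed to `subprocess.run(cmd, cwd=...)` from the directory that holds the script (the logs above were produced from `/workspace/pytorch`).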
cc @ezyang @chauhang @penguinwu @WeizhuoZhang-intel @chuanqi129
