
[Inductor] [CPU] Torchbench model soft_actor_critic performance regression > 10% on ww02.3 #93505

@yudongsi

Description


🐛 Describe the bug

Compared with the TorchInductor CPU Performance Dashboard results on ww02.2, the Torchbench model soft_actor_critic shows a performance regression of more than 10% on ww02.3, as shown below:

|        | batch_size | speedup | inductor (s) | eager (s)   |
|--------|------------|---------|--------------|-------------|
| ww02.3 | 256        | 1.6536  | 0.0004336    | 0.000717001 |
| ww02.2 | 256        | 1.8405  | 0.0003333    | 0.000613439 |

Ratios (ww02.3 vs. ww02.2): speedup ratio 0.9, eager ratio 0.86, inductor ratio 0.77.
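For clarity, a short sketch of how the ratio columns are derived (the direction of each ratio is an assumption inferred from the reported values: speedup is new/old, while the latency ratios are old/new, so values below 1.0 indicate a regression):

```python
# Reproduce the ratio columns of the comparison table from the raw numbers.
ww023 = {"speedup": 1.6536, "inductor_s": 0.0004336, "eager_s": 0.000717001}
ww022 = {"speedup": 1.8405, "inductor_s": 0.0003333, "eager_s": 0.000613439}

# Speedup ratio: ww02.3 speedup relative to ww02.2 (higher is better).
speedup_ratio = ww023["speedup"] / ww022["speedup"]
# Latency ratios: ww02.2 time relative to ww02.3 (below 1.0 means ww02.3 got slower).
eager_ratio = ww022["eager_s"] / ww023["eager_s"]
inductor_ratio = ww022["inductor_s"] / ww023["inductor_s"]

print(round(speedup_ratio, 2), round(eager_ratio, 2), round(inductor_ratio, 2))
```

The inductor latency regressed more (0.77) than the eager latency (0.86), which is why the end-to-end speedup also dropped.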

WW02.3 SW info:

| SW          | Nightly commit | Master/Main commit |
|-------------|----------------|--------------------|
| PyTorch     | fac4361        | 73e5379            |
| Torchbench  | /              | 354378b            |
| torchaudio  | ecc2781        | 4a037b0            |
| torchtext   | 112d757        | c7cc5fc            |
| torchvision | ac06efe        | 35f68a0            |
| torchdata   | 049fb62        | c0934b9            |

WW02.2 SW info:

| SW          | Nightly commit | Master/Main commit |
|-------------|----------------|--------------------|
| PyTorch     | fac4361        | 73e5379            |
| Torchbench  | /              | ff361c6            |
| torchaudio  | 1c98d76        | 0be8423            |
| torchtext   | 6cbfd3e        | 7c7b640            |
| torchvision | b7637f6        | 0dceac0            |
| torchdata   | 0d9aa37        | 0a0ae5d            |

Error logs

graph.py of this model on ww02.3:
GRAPH_INDEX:0
class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[1024, 3], arg1_1: f32[1024], arg2_1: f32[2638049, 1], arg3_1: f32[1024, 1024], arg4_1: f32[1024], arg5_1: f32[3490017, 1], arg6_1: f32[2, 1024], arg7_1: f32[2], arg8_1: f32[2310369, 1], arg9_1: f32[256, 3]):
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:114, code: x = F.relu(self.fc1(state))
        _mkl_linear: f32[256, 1024] = torch.ops.mkl._mkl_linear.default(arg9_1, arg2_1, arg0_1, arg1_1, 256);  arg9_1 = arg2_1 = arg0_1 = arg1_1 = None
        relu: f32[256, 1024] = torch.ops.aten.relu.default(_mkl_linear);  _mkl_linear = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:115, code: x = F.relu(self.fc2(x))
        _mkl_linear_1: f32[256, 1024] = torch.ops.mkl._mkl_linear.default(relu, arg5_1, arg3_1, arg4_1, 256);  relu = arg5_1 = arg3_1 = arg4_1 = None
        relu_1: f32[256, 1024] = torch.ops.aten.relu.default(_mkl_linear_1);  _mkl_linear_1 = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:116, code: out = self.fc3(x)
        _mkl_linear_2: f32[256, 2] = torch.ops.mkl._mkl_linear.default(relu_1, arg8_1, arg6_1, arg7_1, 256);  relu_1 = arg8_1 = arg6_1 = arg7_1 = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:117, code: mu, log_std = out.chunk(2, dim=1)
        split = torch.ops.aten.split.Tensor(_mkl_linear_2, 1, 1);  _mkl_linear_2 = None
        getitem: f32[256, 1] = split[0]
        getitem_1: f32[256, 1] = split[1];  split = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:119, code: log_std = torch.tanh(log_std)
        tanh: f32[256, 1] = torch.ops.aten.tanh.default(getitem_1);  getitem_1 = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:122, code: ) * (log_std + 1)
        add: f32[256, 1] = torch.ops.aten.add.Tensor(tanh, 1);  tanh = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:120, code: log_std = self.log_std_low + 0.5 * (
        mul: f32[256, 1] = torch.ops.aten.mul.Tensor(add, 6.0);  add = None
        add_1: f32[256, 1] = torch.ops.aten.add.Tensor(mul, -10.0);  mul = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:123, code: std = log_std.exp()
        exp: f32[256, 1] = torch.ops.aten.exp.default(add_1);  add_1 = None
        return (getitem, exp, getitem, exp)
        

graph.py of this model on ww02.2:
GRAPH_INDEX:0
class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: f32[1024, 3], arg1_1: f32[1024], arg2_1: f32[2638049, 1], arg3_1: f32[1024, 1024], arg4_1: f32[1024], arg5_1: f32[3490017, 1], arg6_1: f32[2, 1024], arg7_1: f32[2], arg8_1: f32[2310369, 1], arg9_1: f32[256, 3]):
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:114, code: x = F.relu(self.fc1(state))
        _mkl_linear: f32[256, 1024] = torch.ops.mkl._mkl_linear.default(arg9_1, arg2_1, arg0_1, arg1_1, 256);  arg9_1 = arg2_1 = arg0_1 = arg1_1 = None
        relu: f32[256, 1024] = torch.ops.aten.relu.default(_mkl_linear);  _mkl_linear = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:115, code: x = F.relu(self.fc2(x))
        _mkl_linear_1: f32[256, 1024] = torch.ops.mkl._mkl_linear.default(relu, arg5_1, arg3_1, arg4_1, 256);  relu = arg5_1 = arg3_1 = arg4_1 = None
        relu_1: f32[256, 1024] = torch.ops.aten.relu.default(_mkl_linear_1);  _mkl_linear_1 = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:116, code: out = self.fc3(x)
        _mkl_linear_2: f32[256, 2] = torch.ops.mkl._mkl_linear.default(relu_1, arg8_1, arg6_1, arg7_1, 256);  relu_1 = arg8_1 = arg6_1 = arg7_1 = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:117, code: mu, log_std = out.chunk(2, dim=1)
        split = torch.ops.aten.split.Tensor(_mkl_linear_2, 1, 1);  _mkl_linear_2 = None
        getitem: f32[256, 1] = split[0]
        getitem_1: f32[256, 1] = split[1];  split = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:119, code: log_std = torch.tanh(log_std)
        tanh: f32[256, 1] = torch.ops.aten.tanh.default(getitem_1);  getitem_1 = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:122, code: ) * (log_std + 1)
        add: f32[256, 1] = torch.ops.aten.add.Tensor(tanh, 1);  tanh = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:120, code: log_std = self.log_std_low + 0.5 * (
        mul: f32[256, 1] = torch.ops.aten.mul.Tensor(add, 6.0);  add = None
        add_1: f32[256, 1] = torch.ops.aten.add.Tensor(mul, -10.0);  mul = None
        
        # File: /workspace/benchmark/torchbenchmark/models/soft_actor_critic/nets.py:123, code: std = log_std.exp()
        exp: f32[256, 1] = torch.ops.aten.exp.default(add_1);  add_1 = None
        return (getitem, exp, getitem, exp)
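
The two graph dumps above are identical, so the captured graph itself did not change between ww02.2 and ww02.3; the regression must come from elsewhere (e.g. dependency updates or codegen). For reference, here is a minimal sketch of the eager model this graph corresponds to. The class name, constructor signature, and the `log_std_low=-10.0` / `log_std_high=2.0` bounds are assumptions reconstructed from the inline nets.py comments and the `mul 6.0` / `add -10.0` constants in the graph, not the actual Torchbench source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticActorSketch(nn.Module):
    """Hypothetical reconstruction of the traced soft_actor_critic policy net."""

    def __init__(self, state_dim=3, hidden=1024, act_dim=1,
                 log_std_low=-10.0, log_std_high=2.0):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, 2 * act_dim)  # outputs mu and log_std halves
        self.log_std_low = log_std_low
        self.log_std_high = log_std_high

    def forward(self, state):
        x = F.relu(self.fc1(state))            # nets.py:114
        x = F.relu(self.fc2(x))                # nets.py:115
        out = self.fc3(x)                      # nets.py:116
        mu, log_std = out.chunk(2, dim=1)      # nets.py:117
        log_std = torch.tanh(log_std)          # nets.py:119
        # Matches the graph's mul-by-6.0 then add -10.0:
        # low + 0.5 * (high - low) * (tanh(log_std) + 1)
        log_std = self.log_std_low + 0.5 * (
            self.log_std_high - self.log_std_low
        ) * (log_std + 1)                      # nets.py:120-122
        std = log_std.exp()                    # nets.py:123
        return mu, std

model = StochasticActorSketch()
mu, std = model(torch.randn(256, 3))  # batch_size 256, as in the benchmark
```

The `torch.ops.mkl._mkl_linear` calls in the dump are the MKL-prepacked equivalents of the three `nn.Linear` layers above.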
        

Minified repro

python -m torch.backends.xeon.run_cpu --node_id 0 benchmarks/dynamo/torchbench.py --performance --float32 -dcpu --output=inductor_log/ww022.csv -n50 --inductor --no-skip --dashboard --only soft_actor_critic --cold_start_latency

cc @ezyang @soumith @msaroufim @wconstab @ngimel @bdhirsh

Status: Done