[inductor][cpu]lennard_jones, pyhpc_isoneutral_mixing and pyhpc_equation_of_state performance regression in 2024-05-12 nightly release #126293

zxd1997066 · 2024-05-15T15:33:31Z

🐛 Describe the bug

fp32 static shape default wrapper

suite	name	thread	batch_size_new	speed_up_new	inductor_new	eager_new	compilation_latency_new	batch_size_old	speed_up_old	inductor_old	eager_old	compilation_latency_old	Ratio Speedup(New/old)	Eager Ratio(old/new)	Inductor Ratio(old/new)	Compilation_latency_Ratio(old/new)
torchbench	lennard_jones	single	1	1.572944	4.5809e-05	7.2054991696e-05	3.945146	1.0	1.834938	3.7955999999999994e-05	6.964690672799999e-05	5.762626	0.86	0.97	0.83	1.46
torchbench	pyhpc_isoneutral_mixing	single	1	53.873001	5.0139e-05	0.002701138397139	10.347831	1.0	64.445867	4.2233e-05	0.0027217423010110005	12.185837	0.84	1.01	0.84	1.18
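
The tables do not define their derived columns. The following is a minimal sketch, assuming `speed_up` is eager latency divided by Inductor latency and that each ratio column divides the values named in its header; the formulas are inferred from the numbers, not taken from the benchmark scripts. The function and key names are hypothetical.

```python
# Raw measurements copied from the lennard_jones row of the
# "fp32 static shape default wrapper" table.
row = {
    "inductor_new": 4.5809e-05,
    "eager_new": 7.2054991696e-05,
    "compilation_latency_new": 3.945146,
    "inductor_old": 3.7955999999999994e-05,
    "eager_old": 6.964690672799999e-05,
    "compilation_latency_old": 5.762626,
}


def derived_columns(r):
    """Recompute the derived columns (formulas are an assumption)."""
    speed_up_new = r["eager_new"] / r["inductor_new"]
    speed_up_old = r["eager_old"] / r["inductor_old"]
    return {
        "speed_up_new": speed_up_new,
        "speed_up_old": speed_up_old,
        # Below 1.0 means the new nightly gains less over eager than the old one.
        "ratio_speedup_new_over_old": speed_up_new / speed_up_old,
        # Close to 1.0 means eager latency is essentially unchanged.
        "eager_ratio_old_over_new": r["eager_old"] / r["eager_new"],
        # Below 1.0 means the Inductor-compiled model got slower.
        "inductor_ratio_old_over_new": r["inductor_old"] / r["inductor_new"],
        "compilation_ratio_old_over_new": (
            r["compilation_latency_old"] / r["compilation_latency_new"]
        ),
    }


derived = derived_columns(row)
for name, value in derived.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the eager ratio stays near 1.0 while the Inductor ratio falls well below it, which suggests the drop in speedup comes from the compiled path rather than from eager mode.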

fp32 dynamic shape default wrapper

suite	name	thread	batch_size_new	speed_up_new	inductor_new	eager_new	compilation_latency_new	batch_size_old	speed_up_old	inductor_old	eager_old	compilation_latency_old	Ratio Speedup(New/old)	Eager Ratio(old/new)	Inductor Ratio(old/new)	Compilation_latency_Ratio(old/new)
torchbench	lennard_jones	single	1	1.539954	4.5865e-05	7.062999021e-05	3.927617	1.0	1.817732	3.8005e-05	6.908290466e-05	5.76899	0.85	0.98	0.83	1.47
torchbench	pyhpc_equation_of_state	single	1	20.214927	5.2469e-05	0.0010606570047630001	6.882186	1.0	23.225694	4.4226e-05	0.001027179542844	8.763813	0.87	0.97	0.84	1.27
torchbench	pyhpc_isoneutral_mixing	single	1	54.378022	5.0307e-05	0.002735595152754	10.30333	1.0	64.706471	4.1711999999999996e-05	0.0026990363183519994	12.204166	0.84	0.99	0.83	1.18

fp32 static shape cpp wrapper

suite	name	thread	batch_size_new	speed_up_new	inductor_new	eager_new	compilation_latency_new	batch_size_old	speed_up_old	inductor_old	eager_old	compilation_latency_old	Ratio Speedup(New/old)	Eager Ratio(old/new)	Inductor Ratio(old/new)	Compilation_latency_Ratio(old/new)
torchbench	lennard_jones	single	1	1.908596	3.6968e-05	7.0556976928e-05	12.051354	1.0	2.093192	3.2567e-05	6.816898386400001e-05	13.964945	0.91	0.97	0.88	1.16
torchbench	pyhpc_isoneutral_mixing	single	1	49.319962	5.5797e-05	0.002751905919714	18.445618	1.0	55.909156	4.9216000000000004e-05	0.002751625021696	20.460509	0.88	1.0	0.88	1.11

fp32 dynamic shape cpp wrapper

suite	name	thread	batch_size_new	speed_up_new	inductor_new	eager_new	compilation_latency_new	batch_size_old	speed_up_old	inductor_old	eager_old	compilation_latency_old	Ratio Speedup(New/old)	Eager Ratio(old/new)	Inductor Ratio(old/new)	Compilation_latency_Ratio(old/new)
torchbench	lennard_jones	single	1	1.867954	3.7521e-05	7.0087502034e-05	11.966901	1.0	2.199674	3.1592000000000005e-05	6.949210100800001e-05	13.932152	0.85	0.99	0.84	1.16
torchbench	pyhpc_isoneutral_mixing	single	1	48.235191	5.6397e-05	0.002720320066827	18.342588	1.0	54.152881	4.9823e-05	0.002698058990063	20.393653	0.89	0.99	0.88	1.11

AMP static shape default wrapper

suite	name	thread	batch_size_new	speed_up_new	inductor_new	eager_new	compilation_latency_new	batch_size_old	speed_up_old	inductor_old	eager_old	compilation_latency_old	Ratio Speedup(New/old)	Eager Ratio(old/new)	Inductor Ratio(old/new)	Compilation_latency_Ratio(old/new)
torchbench	pyhpc_equation_of_state	single	1	18.280486	2.5393e-05	0.00046419638099799997	5.111035	1.0	21.430476	2.1813e-05	0.00046746297298799993	6.583431	0.85	1.01	0.86	1.29

AMP dynamic shape default wrapper

suite	name	thread	batch_size_new	speed_up_new	inductor_new	eager_new	compilation_latency_new	batch_size_old	speed_up_old	inductor_old	eager_old	compilation_latency_old	Ratio Speedup(New/old)	Eager Ratio(old/new)	Inductor Ratio(old/new)	Compilation_latency_Ratio(old/new)
torchbench	pyhpc_equation_of_state	single	1	18.205576	2.5667e-05	0.000467282519192	5.100246	1.0	21.220869	2.1705e-05	0.000460598961645	6.595176	0.86	0.99	0.85	1.29

AMP dynamic shape cpp wrapper

suite	name	thread	batch_size_new	speed_up_new	inductor_new	eager_new	compilation_latency_new	batch_size_old	speed_up_old	inductor_old	eager_old	compilation_latency_old	Ratio Speedup(New/old)	Eager Ratio(old/new)	Inductor Ratio(old/new)	Compilation_latency_Ratio(old/new)
torchbench	pyhpc_isoneutral_mixing	single	1	37.581918	3.8873e-05	0.001460921898414	14.335784	1.0	45.019578	3.2526e-05	0.001464306794028	15.774886	0.83	1.0	0.84	1.1

SW info

name	target_branch	target_commit	refer_branch	refer_commit
torchbench	main	d6015d42	main	d6015d42
torch	main	`02093b6`	main	`fc183f0`
torchvision	main	0.19.0a0+d23a6e1	main	0.19.0a0+06ad737
torchtext	main	0.16.0a0+b0ebddc	main	0.16.0a0+b0ebddc
torchaudio	main	2.2.0a0+ea437b3	main	2.2.0a0+ea437b3
torchdata	main	0.7.1a0+0790338	main	0.7.1a0+0790338
dynamo_benchmarks	main	nightly	main	nightly

Repro:
inductor_single_run.sh
bash inductor_single_run.sh single inference performance torchbench model float32/amp first dynamic/static default/cpp
Suspected guilty commit: b23b6e7
torchbench-pyhpc_isoneutral_mixing-inference-float32-static-default-single-performance-drop_guilty_commit.log
cc @WeizhuoZhang-intel @chuanqi129
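
The repro template above has slash-separated placeholders (`float32/amp`, `dynamic/static`, `default/cpp`). As a rough sketch, the helper below expands the template into concrete command strings; it only builds text and does not run anything. The helper name `repro_command` is hypothetical, and the argument order is taken as-is from the template.

```python
from itertools import product

SCRIPT = "inductor_single_run.sh"  # script name as given in the issue

PRECISIONS = ("float32", "amp")
SHAPES = ("static", "dynamic")
WRAPPERS = ("default", "cpp")


def repro_command(model, precision, shape, wrapper):
    """Fill the placeholders of the repro command template (hypothetical helper)."""
    if precision not in PRECISIONS:
        raise ValueError(f"unexpected precision: {precision}")
    if shape not in SHAPES:
        raise ValueError(f"unexpected shape mode: {shape}")
    if wrapper not in WRAPPERS:
        raise ValueError(f"unexpected wrapper: {wrapper}")
    return (
        f"bash {SCRIPT} single inference performance torchbench "
        f"{model} {precision} first {shape} {wrapper}"
    )


# All eight precision/shape/wrapper combinations for one model.
commands = [
    repro_command("lennard_jones", p, s, w)
    for p, s, w in product(PRECISIONS, SHAPES, WRAPPERS)
]
for cmd in commands:
    print(cmd)
```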


zxd1997066 · 2024-05-20T14:50:26Z

196a0b1

/workspace/pytorch# bash inductor_single_run.sh single inference performance torchbench pyhpc_isoneutral_mixing amp first dynamic cpp
Testing with dynamic shapes.
Testing with cpp wrapper.
Testing with freezing on.
single-thread testing....
loading model: 0it [00:00, ?it/s]
cpu  eval  pyhpc_isoneutral_mixing
running benchmark: 100%|█████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 509.93it/s]
50.073x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,pyhpc_isoneutral_mixing,1,50.072613,0.033701,7.295003,0.795865,38.535168,48.419226,746,1,0,0,0,0,0

/workspace/pytorch# bash inductor_single_run.sh single inference performance torchbench pyhpc_equation_of_state amp first static default
Testing with freezing on.
single-thread testing....
loading model: 0it [00:00, ?it/s]
cpu  eval  pyhpc_equation_of_state
running benchmark: 100%|████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 1344.97it/s]
24.374x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,pyhpc_equation_of_state,1,24.374139,0.022332,4.709370,0.823529,38.535168,46.792704,368,1,0,0,0,0,0

/workspace/pytorch# bash inductor_single_run.sh single inference performance torchbench lennard_jones float32 first static default
Testing with freezing on.
single-thread testing....
loading model: 0it [00:00, ?it/s]
cpu  eval  lennard_jones
running benchmark: 100%|████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 4385.33it/s]
1.866x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,lennard_jones,1,1.866308,0.021217,3.615042,0.849057,38.928384,45.848986,9,1,0,0,0,0,0

b23b6e7

/workspace/pytorch# bash inductor_single_run.sh single inference performance torchbench pyhpc_isoneutral_mixing amp first dynamic cpp
Testing with dynamic shapes.
Testing with cpp wrapper.
Testing with freezing on.
single-thread testing....
loading model: 0it [00:00, ?it/s]
cpu  eval  pyhpc_isoneutral_mixing
running benchmark: 100%|█████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 512.97it/s]
43.962x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,pyhpc_isoneutral_mixing,1,43.961587,0.038023,7.248890,0.800464,38.757171,48.418406,746,1,0,0,0,0,0

/workspace/pytorch# bash inductor_single_run.sh single inference performance torchbench pyhpc_equation_of_state amp first static default
Testing with freezing on.
single-thread testing....
loading model: 0it [00:00, ?it/s]
cpu  eval  pyhpc_equation_of_state
running benchmark: 100%|████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 1338.79it/s]
20.614x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,pyhpc_equation_of_state,1,20.614192,0.026325,4.696770,0.824128,38.613811,46.854144,368,1,0,0,0,0,0

/workspace/pytorch# bash inductor_single_run.sh single inference performance torchbench lennard_jones float32 first static default
Testing with freezing on.
single-thread testing....
loading model: 0it [00:00, ?it/s]
cpu  eval  lennard_jones
running benchmark: 100%|████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 4308.39it/s]
1.614x
dev,name,batch_size,speedup,abs_latency,compilation_latency,compression_ratio,eager_peak_mem,dynamo_peak_mem,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks,autograd_captures,autograd_compiles,cudagraph_skips
cpu,lennard_jones,1,1.613721,0.025152,2.632870,0.852487,39.085670,45.848986,9,1,0,0,0,0,0
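
To make the comparison between the two commits explicit, here is a small sketch that parses the CSV result lines from the logs above and reports the relative drop in speedup per model. Only the first six CSV columns are copied, and treating `196a0b1` as the baseline for `b23b6e7` is an assumption based on the order of the logs. Each model was run in a different configuration, as shown in the commands above.

```python
import csv
import io

# First six columns of the CSV result lines printed at each commit.
RESULTS_196A0B1 = """dev,name,batch_size,speedup,abs_latency,compilation_latency
cpu,pyhpc_isoneutral_mixing,1,50.072613,0.033701,7.295003
cpu,pyhpc_equation_of_state,1,24.374139,0.022332,4.709370
cpu,lennard_jones,1,1.866308,0.021217,3.615042
"""

RESULTS_B23B6E7 = """dev,name,batch_size,speedup,abs_latency,compilation_latency
cpu,pyhpc_isoneutral_mixing,1,43.961587,0.038023,7.248890
cpu,pyhpc_equation_of_state,1,20.614192,0.026325,4.696770
cpu,lennard_jones,1,1.613721,0.025152,2.632870
"""


def speedups(csv_text):
    """Map model name to the reported speedup over eager."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: float(row["speedup"]) for row in reader}


before = speedups(RESULTS_196A0B1)
after = speedups(RESULTS_B23B6E7)

# Percentage drop in speedup when moving from 196a0b1 to b23b6e7.
drop_pct = {
    name: (1 - after[name] / before[name]) * 100
    for name in before
}
for name, pct in drop_pct.items():
    print(f"{name}: {before[name]:.3f}x -> {after[name]:.3f}x ({pct:.1f}% lower)")
```

These are single runs, so the percentages are indicative rather than precise.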

Hi @aorenste, according to the bisect log and the test results above, PR #122074 may have introduced a performance regression on CPU. Could you please double-check it?

The original change was about 9.5% slower than the backout. This improves it to be only about 1.41% slower than the backout.

Fixes #126293

Ran torchbench 3 times on each change. Perf values before (stable), after (fix), and with #122074 backed out (backout):

```
../inductor-tools/scripts/modelbench/inductor_single_run.sh single inference performance torchbench pyhpc_isoneutral_mixing amp first dynamic cpp
stable:
43.948x
45.754x
44.906x
fix:
47.505x
49.987x
47.493x
backout:
48.243x
48.199x
48.192x

../inductor-tools/scripts/modelbench/inductor_single_run.sh single inference performance torchbench pyhpc_equation_of_state amp first static default
stable:
15.224x
13.286x
15.354x
fix:
16.402x
16.370x
16.183x
backout:
16.554x
16.675x
16.787x

../inductor-tools/scripts/modelbench/inductor_single_run.sh single inference performance torchbench lennard_jones float32 first static default
stable:
1.712x
1.651x
1.640x
fix:
1.804x
1.798x
1.792x
backout:
1.864x
1.824x
1.836x
```

[ghstack-poisoned]

The original change was about 9.5% slower than before pytorch#122074. This improves it to be only about 1.4% slower. Also touched up some unrelated nits that the linter complained about.

Fixes pytorch#126293

Ran torchbench 3 times on each change. Perf values before (stable), after (fix), and with pytorch#122074 backed out (backout):

```
../inductor-tools/scripts/modelbench/inductor_single_run.sh single inference performance torchbench pyhpc_isoneutral_mixing amp first dynamic cpp
stable:
43.948x
45.754x
44.906x
fix:
47.505x
49.987x
47.493x
backout:
48.243x
48.199x
48.192x

../inductor-tools/scripts/modelbench/inductor_single_run.sh single inference performance torchbench pyhpc_equation_of_state amp first static default
stable:
15.224x
13.286x
15.354x
fix:
16.402x
16.370x
16.183x
backout:
16.554x
16.675x
16.787x

../inductor-tools/scripts/modelbench/inductor_single_run.sh single inference performance torchbench lennard_jones float32 first static default
stable:
1.712x
1.651x
1.640x
fix:
1.804x
1.798x
1.792x
backout:
1.864x
1.824x
1.836x
```

Pull Request resolved: pytorch#126996
Approved by: https://github.com/jansel
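
The commit message does not say how the 9.5% and 1.4% figures were derived. One reading that lands close to both numbers, offered here as an assumption rather than the author's stated method, is to average the three runs per model, compute each model's slowdown relative to the backout, and then average across the three models. The function name `slowdown_vs_backout` is hypothetical.

```python
from statistics import mean

# Speedup values (x over eager) copied from the commit message.
runs = {
    "pyhpc_isoneutral_mixing": {
        "stable": [43.948, 45.754, 44.906],
        "fix": [47.505, 49.987, 47.493],
        "backout": [48.243, 48.199, 48.192],
    },
    "pyhpc_equation_of_state": {
        "stable": [15.224, 13.286, 15.354],
        "fix": [16.402, 16.370, 16.183],
        "backout": [16.554, 16.675, 16.787],
    },
    "lennard_jones": {
        "stable": [1.712, 1.651, 1.640],
        "fix": [1.804, 1.798, 1.792],
        "backout": [1.864, 1.824, 1.836],
    },
}


def slowdown_vs_backout(variant):
    """Mean percentage slowdown of `variant` relative to the backout.

    Assumed method: per-model mean of three runs, then an unweighted
    mean of the per-model slowdowns.
    """
    per_model = [
        (1 - mean(r[variant]) / mean(r["backout"])) * 100
        for r in runs.values()
    ]
    return mean(per_model)


stable_pct = slowdown_vs_backout("stable")
fix_pct = slowdown_vs_backout("fix")
print(f"stable: {stable_pct:.2f}% slower than backout")
print(f"fix:    {fix_pct:.2f}% slower than backout")
```

With this method the fix is marginally faster than the backout on `pyhpc_isoneutral_mixing` and about 2% slower on the other two models, so the remaining gap is small compared with the run-to-run spread visible in the stable numbers.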

chuanqi129 added the "oncall: cpu inductor" label (CPU Inductor issues for Intel team to triage) May 15, 2024

leslie-fang-intel assigned aorenste May 23, 2024

aorenste mentioned this issue May 23, 2024

Fix perf regression caused by #122074 #126996

Closed

pytorchmergebot closed this as completed in 70dc59c May 24, 2024
