Add support for building CUDA extension on Windows #396

gau-nernst · 2024-06-18T13:15:13Z

Continues the work from #305.

To make sure the FP6-LLM kernel is compiled correctly, run:

python benchmarks/benchmark_fp6_llm.py

Expected output up to m=8 (run on a 4070 Ti SUPER, Windows 11, PyTorch 2.3, MSVC 14.38.33130, CUDA Toolkit 12.3):

m	k	n	fp6_latency (ms)	fp16_latency (ms)	speedup (d/s)	correct
1	8192	8192	0.0387494	0.223442	5.76635	1
1	10240	8192	0.113143	0.288529	2.55014	1
1	57344	8192	0.598838	1.53449	2.56244	1
1	8192	28672	0.289421	0.789677	2.72847	1
2	8192	8192	0.0430889	0.226758	5.26255	1
2	10240	8192	0.107766	0.280107	2.59921	1
2	57344	8192	0.569473	1.52654	2.68062	1
2	8192	28672	0.29369	0.759794	2.58706	1
4	8192	8192	0.0469865	0.228517	4.86347	1
4	10240	8192	0.108724	0.280926	2.58386	1
4	57344	8192	0.567743	1.53572	2.70496	1
4	8192	28672	0.300316	0.763842	2.54346	1
8	8192	8192	0.0527722	0.228795	4.33551	1
8	10240	8192	0.110667	0.280704	2.53649	1
8	57344	8192	0.570488	1.52786	2.67815	1
8	8192	28672	0.310384	0.776489	2.50171	1
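
The speedup column appears to be the ratio fp16_latency / fp6_latency. This is inferred from the numbers in the table, not from the benchmark script itself, which is not shown in this thread. A minimal sketch that checks the assumption against a few rows of the Windows table (the `rows` list and the `speedup` helper are illustrative names):

```python
# A few rows copied from the Windows table above:
# (m, k, n, fp6_latency_ms, fp16_latency_ms, reported_speedup)
rows = [
    (1, 8192, 8192, 0.0387494, 0.223442, 5.76635),
    (1, 10240, 8192, 0.113143, 0.288529, 2.55014),
    (8, 8192, 28672, 0.310384, 0.776489, 2.50171),
]


def speedup(fp6_ms: float, fp16_ms: float) -> float:
    """Assumed definition: FP16 latency divided by FP6 latency."""
    return fp16_ms / fp6_ms


for m, k, n, fp6_ms, fp16_ms, reported in rows:
    computed = speedup(fp6_ms, fp16_ms)
    # Latencies are printed rounded, so allow a small relative tolerance.
    assert abs(computed - reported) / reported < 1e-3
    print(f"m={m} k={k} n={n}: computed {computed:.4f}, reported {reported}")
```

If a freshly built kernel produces a `correct` value of 1 and ratios in this range, the build is likely behaving as in the reference run.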

The speedup on Windows is worse than on Ubuntu for k=8192, n=8192 at small m (for example, 5.77x vs. 8.42x at m=1). To make sure there is no regression on Ubuntu, here is the output from the same machine running Ubuntu 22.04:

m	k	n	fp6_latency (ms)	fp16_latency (ms)	speedup (d/s)	correct
1	8192	8192	0.0257488	0.216757	8.41815	1
1	10240	8192	0.105257	0.267286	2.53936	1
1	57344	8192	0.597281	1.58615	2.65562	1
1	8192	28672	0.286471	0.753127	2.62898	1
2	8192	8192	0.0290186	0.222586	7.67045	1
2	10240	8192	0.105451	0.27609	2.61818	1
2	57344	8192	0.560466	1.51298	2.69951	1
2	8192	28672	0.290177	0.740288	2.55116	1
4	8192	8192	0.0364796	0.226285	6.20305	1
4	10240	8192	0.107137	0.27615	2.57755	1
4	57344	8192	0.605757	1.60981	2.65751	1
4	8192	28672	0.319282	0.758238	2.37483	1
8	8192	8192	0.0521174	0.225405	4.32494	1
8	10240	8192	0.110171	0.279006	2.53248	1
8	57344	8192	0.569247	1.52264	2.67484	1
8	8192	28672	0.308445	0.7467	2.42085	1

There is no significant difference from the previous results in #223.
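
To put a number on the Windows vs. Ubuntu difference for k=8192, n=8192, here is a small sketch using the speedup values copied from the two tables above. The `relative_gap` helper is an illustrative name, not part of the benchmark script.

```python
# Speedups for k=8192, n=8192 copied from the two tables above, keyed by m.
windows = {1: 5.76635, 2: 5.26255, 4: 4.86347, 8: 4.33551}
ubuntu = {1: 8.41815, 2: 7.67045, 4: 6.20305, 8: 4.32494}


def relative_gap(win: float, ubu: float) -> float:
    """Fraction by which the Windows speedup falls short of the Ubuntu one."""
    return (ubu - win) / ubu


gaps = {m: relative_gap(windows[m], ubuntu[m]) for m in windows}
for m, gap in gaps.items():
    # A negative value means Windows was marginally ahead for that m.
    print(f"m={m}: Windows speedup is {gap:.1%} lower than Ubuntu")
```

The gap shrinks as m grows and has essentially disappeared at m=8.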

pytorch-bot · 2024-06-18T13:15:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/396

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 02fd6ad with merge base f5b6ec9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

* Enable FP6-LLM kernel build on Windows
* fix benchmark script
* update setup.py
* update
* fix indent
* add -t=0 for linux

---------

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
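
The commit list above mentions updating setup.py and adding `-t=0`, but the actual flag set is not shown in this thread. As a rough sketch only, platform-dependent compile flags for a CUDA extension might be selected like this. The `extra_compile_args` helper and the specific flag choices are assumptions for illustration. The returned dict uses the `{"cxx": ..., "nvcc": ...}` shape that `torch.utils.cpp_extension.CUDAExtension` accepts for its `extra_compile_args` argument.

```python
import sys


def extra_compile_args(platform: str = sys.platform, debug: bool = False) -> dict:
    """Return hypothetical per-compiler flag lists for a CUDA extension build."""
    is_windows = platform.startswith("win")
    if is_windows:
        # MSVC spells optimization flags differently from GCC/Clang.
        cxx = ["/Od" if debug else "/O2", "/permissive-"]
    else:
        cxx = ["-O0" if debug else "-O3"]
    # -t=0 asks nvcc to use as many compile threads as there are CPUs.
    nvcc = ["-O0" if debug else "-O3", "-t=0"]
    return {"cxx": cxx, "nvcc": nvcc}


print(extra_compile_args("win32"))
print(extra_compile_args("linux"))
```

Keeping the host-compiler flags separate from the nvcc flags is what lets the same setup.py drive MSVC on Windows and GCC on Linux.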

matthewdouglas and others added 2 commits June 3, 2024 11:17

Enable FP6-LLM kernel build on Windows

b7d8ba1

Merge branch 'main' into windows

d96f49b

facebook-github-bot added the CLA Signed label Jun 18, 2024

gau-nernst added 4 commits June 18, 2024 21:21

fix benchmark script

0a61e59

update setup.py

ce2b9fc

update

54f4594

fix indent

8e93557

gau-nernst marked this pull request as ready for review June 18, 2024 14:53

gau-nernst requested a review from msaroufim June 18, 2024 14:59

add -t=0 for linux

02fd6ad

msaroufim approved these changes Jun 18, 2024

msaroufim merged commit d0af941 into pytorch:main Jun 18, 2024
13 checks passed

gau-nernst deleted the windows branch June 18, 2024 21:01

This was referenced Jun 19, 2024

FP6 dtype! #208

Open

Enable FP6-LLM kernel build on Windows #305

Closed

gau-nernst mentioned this pull request Sep 30, 2024

is this only for linux? #957

Open
