[Pass] Support two-stage softmax #2220

MasterJH5574 · 2024-04-25T16:58:54Z

This PR introduces a compiler pass that rewrites the standard softmax into a two-stage softmax. It is motivated by our finding that, when the vocabulary size is large, the standard softmax does not expose enough parallelism on GPU. The pass therefore partitions the workload into two stages for better parallelism and better performance.
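To illustrate the idea, here is a minimal numerical sketch in plain Python of how a softmax over a long vector can be split into two stages. It is not the compiler pass from this PR, and the function names and the `chunk_size` parameter are hypothetical. Stage 1 computes a local maximum and a local exponential sum per chunk, and each chunk is independent, so this work could be spread across many GPU thread blocks. Stage 2 combines the per-chunk results and normalizes.

```python
import math


def two_stage_softmax(logits, chunk_size=4):
    """Softmax over a 1-D list, computed in two stages (illustrative only)."""
    if not logits:
        raise ValueError("logits must be non-empty")

    # Stage 1: per-chunk max and per-chunk sum of exp(x - chunk_max).
    # Each chunk is independent of the others.
    chunk_max = []
    chunk_sum = []
    for start in range(0, len(logits), chunk_size):
        chunk = logits[start:start + chunk_size]
        m = max(chunk)
        chunk_max.append(m)
        chunk_sum.append(sum(math.exp(v - m) for v in chunk))

    # Stage 2: merge the chunk statistics into a global max and global sum,
    # rescaling each chunk sum from its local max to the global max.
    global_max = max(chunk_max)
    global_sum = sum(
        s * math.exp(m - global_max) for m, s in zip(chunk_max, chunk_sum)
    )

    # Final normalization uses only the two global statistics.
    return [math.exp(v - global_max) / global_sum for v in logits]


def reference_softmax(logits):
    """Single-pass, numerically stable softmax used for comparison."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Both functions should agree up to floating-point rounding. The two-stage form only changes how the reduction is organized, not the result.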

MasterJH5574 · 2024-04-25T16:59:14Z

~~Depending on apache/tvm#16923. Mark as draft for now.~~


MasterJH5574 marked this pull request as draft April 25, 2024 16:59

MasterJH5574 force-pushed the 04-25-two-stage-softmax branch from 876f399 to 7176219 Compare April 25, 2024 21:57

MasterJH5574 marked this pull request as ready for review April 26, 2024 14:27

tqchen merged commit ff72113 into mlc-ai:main Apr 26, 2024
2 checks passed
