Support Gemma model shape #130

yzh119 · 2024-02-21T20:23:27Z

Gemma uses head_dim=256, which is not enabled in the pip wheels by default. We should compile kernels for head_dim=256 and tune some kernel parameters for best performance in this case.
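A rough illustration of why head_dim=256 may call for different kernel parameters: the memory needed to stage a tile of K and V grows linearly with head_dim. This is only a back-of-the-envelope sketch. The helper `kv_tile_bytes` and the 64-row tile size are hypothetical, chosen for illustration, and are not taken from the library's kernels.

```python
def kv_tile_bytes(tile_rows: int, head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes needed to hold one K tile plus one V tile.

    tile_rows   -- number of KV positions in the tile (hypothetical value)
    head_dim    -- per-head feature dimension (128 or 256 here)
    dtype_bytes -- element size, 2 for fp16/bf16
    """
    # Factor 2 accounts for staging both K and V.
    return 2 * tile_rows * head_dim * dtype_bytes


if __name__ == "__main__":
    for head_dim in (128, 256):
        print(f"head_dim={head_dim}: {kv_tile_bytes(64, head_dim)} bytes per K/V tile")
```

With the same tile size, doubling head_dim doubles the staging footprint, so a kernel tuned for head_dim=128 may need smaller tiles or a different launch configuration at head_dim=256.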

As mentioned in #130, the kernels for `head_dim=256` are not compiled by default. This PR exposes these attention kernels in the pip wheels and adds unit tests and benchmarks for `head_dim=256`.

yzh119 · 2024-02-25T02:15:33Z

Fixed in #132.

yzh119 mentioned this issue Feb 22, 2024

feat: enable head_dim=256 for attention kernels #132

Merged

yzh119 closed this as completed Feb 25, 2024

yzh119 mentioned this issue Feb 27, 2024

[Roadmap] 0.0.3 Release Checklist #138

Closed

3 tasks
