optimize generation caching #12

neverix · 2021-11-03T13:46:48Z

Over 10x speedup, adds MLP caching and optimizes attention caching.
Uses changes from the notebook.
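
For readers unfamiliar with the idea, here is a minimal NumPy sketch of attention and MLP caching during autoregressive generation. It is not the code from this PR: `ToyLayer`, `forward_full`, `forward_cached`, and the `cache` dict layout are hypothetical names chosen for illustration, and the layer is a simplified single-head block without layer norm. The assumption is that attention is causal, so outputs at earlier positions never change and only the newest token needs computing.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


class ToyLayer:
    """Hypothetical single-head causal attention + MLP block (illustration only)."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w1 = rng.standard_normal((dim, 4 * dim)) / np.sqrt(dim)
        self.w2 = rng.standard_normal((4 * dim, dim)) / np.sqrt(4 * dim)

    def mlp(self, h):
        # Position-wise MLP: each row depends only on its own input row.
        return np.maximum(h @ self.w1, 0.0) @ self.w2

    def forward_full(self, x):
        # Baseline: recompute every position on every call. x has shape (seq, dim).
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        scores = q @ k.T / np.sqrt(self.dim)
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)  # causal mask
        h = x + softmax(scores) @ v
        return h + self.mlp(h)

    def forward_cached(self, x_new, cache):
        # Cached path: x_new has shape (1, dim) and holds only the newest token.
        q = x_new @ self.wq
        k_new, v_new = x_new @ self.wk, x_new @ self.wv
        # Attention cache: reuse keys/values of earlier tokens.
        k = np.concatenate([cache["k"], k_new]) if "k" in cache else k_new
        v = np.concatenate([cache["v"], v_new]) if "v" in cache else v_new
        cache["k"], cache["v"] = k, v
        h = x_new + softmax(q @ k.T / np.sqrt(self.dim)) @ v
        out_new = h + self.mlp(h)  # MLP runs on one row instead of the whole sequence
        # Output cache: earlier rows are reused as-is, only the new row is appended.
        out = np.concatenate([cache["out"], out_new]) if "out" in cache else out_new
        cache["out"] = out
        return out


rng = np.random.default_rng(0)
layer = ToyLayer(8, rng)
x = rng.standard_normal((5, 8))

cache = {}
for t in range(len(x)):
    cached_out = layer.forward_cached(x[t:t + 1], cache)

full_out = layer.forward_full(x)
assert np.allclose(cached_out, full_out)
```

In this sketch the cached path does work proportional to the current length per step, while calling `forward_full` on the growing prefix at every step repeats all earlier computation. The actual speedup in the PR depends on the real model and is not reproduced here.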


HetagKoroev · 2021-11-03T14:25:08Z

@neverix indeed, the generation rate has increased by more than 10 times. Thanks!

optimize generation caching

56590a5

Over 10x speedup, adds MLP caching and optimizes attention caching. Uses changes from https://t.co/BTwo6NKq9H.

neverix mentioned this pull request Nov 3, 2021

Smaller / Distilled model? #6

Closed

shonenkov merged commit 47de7a2 into ai-forever:master Nov 3, 2021

ouhenio mentioned this pull request Nov 3, 2021

Add cache optimization ouhenio/en2ru-DALLE_notebook#2

Closed

borzunov mentioned this pull request Jan 10, 2022

Implement cached inference lucidrains/DALLE-pytorch#409

Merged
