[flax] unfreeze initial cache in gpt models #14535

patil-suraj · 2021-11-26T12:27:22Z

What does this PR do?

Fix Flax `generate` for GPT models when the initial `seq_len` is 1.

The issue is that the `init_cache` method of Flax GPT-2 returns the cache as a `FrozenDict`, but the model's forward pass returns the cache as a plain `dict`.

It works when `seq_len > 1` because, in that case, we call `body_fn` once outside the while loop: `body_fn` calls the model's forward pass, which returns the cache as a `dict`.

We then iterate `body_fn` with `lax.while_loop`, and this works because the cache now has the same pytree structure (a `dict`) on both the input and the output of the loop body.

It breaks when `seq_len = 1` because we then pass the initial state directly to `lax.while_loop`. The cache entering the loop is therefore still a `FrozenDict`, while the forward pass inside `body_fn` returns a `dict`, which raises this error:

`body_fun output and input must have same type structure, got PyTreeDe...`

cc @Narsil

Narsil

LGTM,

Thanks for looking into it!

unfreeze initial cache in gpt models

d2931f9

patil-suraj requested a review from patrickvonplaten November 26, 2021 12:36

patil-suraj changed the title ~~unfreeze initial cache in gpt models~~ [flax] unfreeze initial cache in gpt models Nov 26, 2021

patil-suraj requested a review from LysandreJik November 26, 2021 12:47

Narsil approved these changes Nov 26, 2021


patil-suraj merged commit 69511cd into huggingface:master Nov 26, 2021

patil-suraj deleted the fix-flax-generate-gpt branch November 26, 2021 12:51
