Add support for the Falcon new decoder architecture #253

Merged

Conversation

@danieldk danieldk (Contributor) commented Jul 19, 2023

Description

The 40B Falcon model uses the so-called new decoder architecture. This change adds support for that architecture, which necessitates a number of changes across the board:

  • So far, we supported either a uniform number of query/key/value heads or full sharing of a single key/value head across all query heads. The new decoder architecture instead provides a configurable number of key/value heads, where the number of query heads is a multiple of the number of key/value heads. To support this, we replace the `QkvHeadSharing` enum with an `AttentionHeads` class that allows more flexible configurations (see the first sketch below). The attention layer is extended to support this new scenario.

  • The new decoder architecture's transformer layer is much more canonical, allowing us to reuse the shared decoder layer. However, in contrast to the other decoders that use the shared layer, Falcon puts the dropout after parallel attention. To accommodate more flexible dropout configurations, we introduce the `TransformerDropouts` class, which works like the `TransformerLayerNorms` class, but for dropouts (see the second sketch below).

  • Split the HF configuration parsing for Falcon into separate functions for the `RefinedWebModel` and `falcon` model types (see the third sketch below).

This change also adds two new models to test the new decoder architecture.
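
To make the head configuration concrete, here is a minimal sketch of what an `AttentionHeads`-style class could look like. The names, constructors, and validation are illustrative, not necessarily the exact API merged in this PR:

```python
from dataclasses import dataclass


@dataclass
class AttentionHeads:
    """Head configuration for multi-head attention.

    Replaces an enum of fixed sharing schemes with explicit head counts.
    """

    n_query_heads: int
    n_key_value_heads: int

    def __post_init__(self):
        if self.n_query_heads % self.n_key_value_heads != 0:
            raise ValueError(
                f"Number of query heads ({self.n_query_heads}) must be a "
                f"multiple of the number of key/value heads "
                f"({self.n_key_value_heads})."
            )

    @classmethod
    def uniform(cls, n_attention_heads: int) -> "AttentionHeads":
        # Vanilla multi-head attention: one key/value head per query head.
        return cls(n_attention_heads, n_attention_heads)

    @classmethod
    def multi_query(cls, n_query_heads: int) -> "AttentionHeads":
        # Multi-query attention: one key/value head shared by all query heads.
        return cls(n_query_heads, 1)

    @classmethod
    def key_value_broadcast(
        cls, *, n_query_heads: int, n_key_value_heads: int
    ) -> "AttentionHeads":
        # Grouped configuration as used by the Falcon new decoder
        # architecture: each key/value head serves a group of query heads.
        return cls(n_query_heads, n_key_value_heads)


# Falcon-40B uses 128 query heads and 8 key/value heads, so each key/value
# head is broadcast over 128 // 8 = 16 query heads.
heads = AttentionHeads.key_value_broadcast(n_query_heads=128, n_key_value_heads=8)
```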
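
Similarly, a minimal sketch of a `TransformerDropouts`-style container, assuming PyTorch modules; field names and constructors are again illustrative rather than the exact API:

```python
from typing import Optional

import torch.nn as nn


class TransformerDropouts(nn.Module):
    """Container for the dropouts used in a transformer layer.

    Analogous to ``TransformerLayerNorms``: each attribute holds the dropout
    applied at one point in the layer, with ``nn.Identity`` disabling it.
    """

    def __init__(
        self,
        *,
        attn_output_dropout: Optional[nn.Module] = None,
        ffn_output_dropout: Optional[nn.Module] = None,
        parallel_attn_dropout: Optional[nn.Module] = None,
    ):
        super().__init__()
        self.attn_output_dropout = attn_output_dropout or nn.Identity()
        self.ffn_output_dropout = ffn_output_dropout or nn.Identity()
        self.parallel_attn_dropout = parallel_attn_dropout or nn.Identity()

    @classmethod
    def layer_output_dropouts(cls, p: float) -> "TransformerDropouts":
        # Typical decoders: dropout after the attention and feed-forward
        # sublayer outputs.
        return cls(
            attn_output_dropout=nn.Dropout(p),
            ffn_output_dropout=nn.Dropout(p),
        )

    @classmethod
    def parallel_attention_dropout(cls, p: float) -> "TransformerDropouts":
        # Falcon-style parallel attention: a single dropout applied after
        # summing the attention and feed-forward outputs.
        return cls(parallel_attn_dropout=nn.Dropout(p))
```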
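
Finally, a sketch of how the HF configuration parsing could dispatch on the `model_type` field. The converter functions and the HF field names (`n_head`, `n_head_kv`, `num_attention_heads`, `num_kv_heads`) are illustrative assumptions about the two checkpoint formats, not a verbatim excerpt from this PR:

```python
from typing import Any, Callable, Dict


def _convert_refined_web_model_config(hf_config: Dict[str, Any]) -> Dict[str, Any]:
    # Older RefinedWebModel-style checkpoints.
    return {
        "n_query_heads": hf_config["n_head"],
        "n_key_value_heads": hf_config.get("n_head_kv", 1),
    }


def _convert_falcon_config(hf_config: Dict[str, Any]) -> Dict[str, Any]:
    # Newer falcon-style checkpoints.
    return {
        "n_query_heads": hf_config["num_attention_heads"],
        "n_key_value_heads": hf_config.get("num_kv_heads", 1),
    }


_CONFIG_CONVERTERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "RefinedWebModel": _convert_refined_web_model_config,
    "falcon": _convert_falcon_config,
}


def convert_hf_config(hf_config: Dict[str, Any]) -> Dict[str, Any]:
    # Select the conversion function based on the checkpoint's model type.
    model_type = hf_config["model_type"]
    converter = _CONFIG_CONVERTERS.get(model_type)
    if converter is None:
        raise ValueError(f"Unsupported model type: {model_type}")
    return converter(hf_config)
```
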
Types of change

Feature

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@danieldk danieldk added the type/feature, feat/model, and feat/layers labels Jul 19, 2023
Review threads (all resolved):

  • curated_transformers/layers/attention.py
  • curated_transformers/layers/transformer.py
  • curated_transformers/models/falcon/_hf.py
  • curated_transformers/models/falcon/config.py
@shadeMe shadeMe (Collaborator) left a comment

LGTM! One minor fix.

Review thread on curated_transformers/layers/attention.py (resolved).
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
@shadeMe shadeMe merged commit 6de9b98 into explosion:main Jul 20, 2023
7 checks passed
@danieldk danieldk deleted the maintenance/falcon-shared-decoder-layer branch August 2, 2023 17:23