
Hybrid Engine Refactor and Llama Inference Support #3425

Merged
merged 44 commits into from May 4, 2023

Conversation

@cmikeh2 (Contributor) commented May 1, 2023

This PR introduces a number of features and bugfixes:

  • The Hybrid Engine integration with Containers has been refactored. Models that support the Hybrid Engine now inherit from a feature container, either the HybridEngineContainer itself or something more specialized for the particular model architecture.
  • Llama support for both inference and RLHF training acceleration via the Hybrid Engine
  • Additional BF16 compilation support
  • Additional unit test coverage for new operators and data types
  • Cleanup of unused code
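The refactored container pattern described above might look roughly like the following. This is a hedged sketch only: the class bodies, attribute names, and the `LlamaContainer` weight-collection logic are illustrative stand-ins, not DeepSpeed's actual implementation.

```python
# Illustrative sketch of the feature-container pattern this PR describes.
# HybridEngineContainer here is a simplified stand-in for the real class.

class HybridEngineContainer:
    """Feature container that models inherit to opt into the Hybrid Engine."""

    def initialize_tensors(self):
        # Subclasses must wire up their LoRA-eligible parameters.
        self.set_lora_params()

    def set_lora_params(self):
        raise NotImplementedError(
            "Hybrid Engine containers must implement set_lora_params()")


class LlamaContainer(HybridEngineContainer):
    """Hypothetical model container: gathers the LoRA-eligible weights."""

    def __init__(self, policy_weights):
        self.policy_weights = policy_weights  # name -> tensor mapping
        self.lora_params = []

    def set_lora_params(self):
        # Collect projection weights that LoRA may have wrapped (illustrative rule).
        self.lora_params = [
            w for name, w in self.policy_weights.items()
            if name.endswith("proj.weight")
        ]
```

A subclass that forgets to define `set_lora_params()` fails loudly at initialization time, which is the behavior the follow-up BLOOM fix below relies on.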

```diff
@@ -29,6 +29,32 @@ def create_module(self, config=None):
         self.module.config.scale_attention = self.scale_attention
         return self.module

     def set_lora_params(self):
         """
         Necessry to implement for `HybridEngineContainer`
```
Review comment from a Contributor on the docstring above: spell check (flagging the "Necessry" typo)

@jeffra (Contributor) left a review comment:

🚀🚀🚀

@jeffra jeffra enabled auto-merge (squash) May 2, 2023 23:07
@jeffra jeffra disabled auto-merge May 3, 2023 18:21
@jeffra jeffra requested a review from loadams as a code owner May 3, 2023 18:43
@jeffra jeffra merged commit 0a61d5d into master May 4, 2023
18 checks passed
@jeffra jeffra deleted the cholmes/llama-inference-v2 branch May 4, 2023 00:20
lekurile added a commit that referenced this pull request May 23, 2023
This PR fixes Hybrid Engine (HE) support for the BLOOM model, which was accidentally broken during the HE refactor in GH-3425.

The BLOOM container now inherits the HybridEngineContainer feature and defines a set_lora_params() function necessary for the feature to work. get_lora_params() is correspondingly removed from the BLOOM policy class as well.

GPT-NeoX was also cleaned up by removing a get_lora_params() function from its policy due to it no longer being used.
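The fix described above can be sketched as follows. This is a hedged illustration, not DeepSpeed's real code: the container and parameter names mirror BLOOM's module layout, but the bodies are simplified, and `client_module` is modeled as a plain dict for clarity.

```python
# Hedged sketch of the described fix: the BLOOM container inherits
# HybridEngineContainer and owns set_lora_params(), replacing the
# get_lora_params() that previously lived on the BLOOM policy class.

class HybridEngineContainer:  # simplified stand-in for the real feature class
    def set_lora_params(self):
        raise NotImplementedError


class BloomContainer(HybridEngineContainer):
    def __init__(self, client_module):
        # client_module modeled as a name -> weight mapping for illustration.
        self.client_module = client_module
        self.lora_params = []

    def set_lora_params(self):
        # Gather the attention/MLP weights LoRA can wrap in a BLOOM block.
        m = self.client_module
        self.lora_params = [
            m["self_attention.query_key_value"],
            m["self_attention.dense"],
            m["mlp.dense_h_to_4h"],
            m["mlp.dense_4h_to_h"],
        ]
```

Moving this logic from the policy into the container keeps all Hybrid Engine feature hooks in one place, which is the point of the refactor that originally broke BLOOM.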
molly-smith pushed a commit that referenced this pull request Jun 23, 2023, carrying the same BLOOM Hybrid Engine fix described above.