Hybrid Engine Refactor and Llama Inference Support #3425

cmikeh2 · 2023-05-01T22:00:57Z

This PR introduces a number of features and bugfixes:

* The Hybrid Engine integration with Containers has been refactored. Models that support the Hybrid Engine now inherit from a feature container, either the HybridEngineContainer itself or something more specialized for the particular model architecture.
* Llama support for inference and for RLHF training acceleration through the Hybrid Engine
* Additional BF16 compilation support
* Additional unit test coverage for new operators and data types
* Cleanup of unused code
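To illustrate the feature-container idea from the first item, here is a minimal sketch in plain Python. It does not use DeepSpeed's real classes: `BaseContainer`, `HybridEngineFeature`, `ToyOPTContainer`, `ToyLegacyContainer`, and `hybrid_engine_ready` are hypothetical names chosen for this example. The only detail taken from the PR is that a container opting in to the Hybrid Engine must implement `set_lora_params()`.

```python
from abc import ABC, abstractmethod


class BaseContainer:
    """Hypothetical stand-in for a model container."""

    def __init__(self, name):
        self.name = name


class HybridEngineFeature(ABC):
    """Hypothetical feature mixin; not DeepSpeed's HybridEngineContainer."""

    @abstractmethod
    def set_lora_params(self):
        """Each opting-in container must say where its LoRA params live."""


class ToyOPTContainer(HybridEngineFeature, BaseContainer):
    """Opts in to the feature by inheriting the mixin."""

    def set_lora_params(self):
        # A real container would read these from the wrapped module.
        self.lora_params = []


class ToyLegacyContainer(BaseContainer):
    """Does not opt in, so it carries no Hybrid Engine obligations."""


def hybrid_engine_ready(container):
    # Support is decided by inheritance rather than by flags on a policy.
    return isinstance(container, HybridEngineFeature)


opt = ToyOPTContainer("opt")
legacy = ToyLegacyContainer("legacy")
print(hybrid_engine_ready(opt), hybrid_engine_ready(legacy))
```

Under this assumed design, a container that inherits the mixin but forgets `set_lora_params()` fails at instantiation with a `TypeError`, which surfaces the omission early.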

* Refactor changes out of base container
* Re-add to OPT
* Align HE API
* Break down release into components and add to feature containers
* Consolidate copy methods
* Align policy API
* Missed file
* Add missing arguments to qkv_gemm call
* Aligning attributes, data types
* Handle TP scaling
* Python interface alignment, TP>1 now working
* HybridEngineContainer, refactor changes out of Policy

awan-10 · 2023-05-02T17:32:26Z

deepspeed/module_inject/containers/opt.py

@@ -29,6 +29,32 @@ def create_module(self, config=None):
        self.module.config.scale_attention = self.scale_attention
        return self.module

+    def set_lora_params(self):
+        """
+        Necessry to implement for `HybridEngineContainer`


spell check

jeffra

🚀🚀🚀

Follow-up PR #3580 (Fix Hybrid Engine for BLOOM) fixes Hybrid Engine (HE) support for the BLOOM model, which was accidentally broken during the HE refactor in GH-3425. The BLOOM container now inherits the HybridEngineContainer feature and defines the set_lora_params() function that the feature requires. get_lora_params() is correspondingly removed from the BLOOM policy class. GPT-NeoX was also cleaned up by removing get_lora_params() from its policy, because it is no longer used.
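As a rough illustration of moving LoRA parameter collection from the policy onto the container, the sketch below uses toy classes only. `ToyLinear`, `ToyBlock`, `ToyBloomContainer`, `collect_lora`, and the `lora` attribute are assumptions made for this example and are not DeepSpeed's actual BLOOM container or its layer layout. Only the method name `set_lora_params()` comes from the text above.

```python
class ToyLinear:
    """Hypothetical layer; `lora` is (right, left, scaling) or None."""

    def __init__(self, lora=None):
        self.lora = lora


class ToyBlock:
    """Hypothetical transformer block with four projection layers."""

    def __init__(self):
        self.fc_in = ToyLinear(lora=("r_in", "l_in", 0.5))
        self.fc_out = ToyLinear()
        self.qkv = ToyLinear(lora=("r_qkv", "l_qkv", 0.5))
        self.attn_out = ToyLinear()


def collect_lora(layer):
    # Return the LoRA pieces if the layer was wrapped, else an empty list.
    return list(layer.lora) if layer.lora is not None else []


class ToyBloomContainer:
    """The container, not the policy, owns the LoRA lookup."""

    def __init__(self, client_module):
        self.client_module = client_module
        self.lora_params = []

    def set_lora_params(self):
        block = self.client_module
        layers = (block.fc_in, block.fc_out, block.qkv, block.attn_out)
        # One entry per layer, in a fixed order, so later fusing can index it.
        self.lora_params = [collect_lora(layer) for layer in layers]


container = ToyBloomContainer(ToyBlock())
container.set_lora_params()
print(container.lora_params)
```

Keeping the lookup on the container means the policy no longer needs a `get_lora_params()` method, which matches the cleanup described for BLOOM and GPT-NeoX.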

cmikeh2 added 16 commits April 12, 2023 18:04

Rebase changes onto correct git history

21b0540

Further generalize rotate half rotary position embeddings

bd74c32

Upgrade local clang-format to match CI

56e1de9

Restore GeGLU behavior and template for SiLU. Add unit test.

264d49b

Restore experimental qkv reset

14c6a9b

Switch to named constant to improve readability

b01e7ea

Name refactor to align with functionality rather than implementation

0215867

Finish hybrid engine integration

0f371d8

Fix for MLP dimensions

48db17b

Update explanations

88a821d

Complete merge, fix BF16 integration

e42859b

Merge master

6bad46d

BF16_AVAILABLE should derive solely from the op_builder

eebfcdf

Merge remote-tracking branch 'public/master' into cholmes/llama-inference-v2

6ad5f0f

Refactor on top of additional model support

63fe26f

cmikeh2 requested review from jeffra, tjruwase, RezaYazdaniAminabadi, mrwyattii, awan-10 and arashb as code owners May 1, 2023 22:00

cmikeh2 added 8 commits May 1, 2023 22:16

Guard is_bf16_supported check

e9137d1

Even stronger guards

838b6f4

Remove deprecated policy members

a496dd1

Merge branch 'master' into cholmes/llama-inference-v2

123e195

Another guard

c670344

Merge branch 'cholmes/llama-inference-v2' of github.com:microsoft/DeepSpeed into cholmes/llama-inference-v2

6830f5a

Bad check for BF16 support

708fb45

Merge fix

ace1967

awan-10 reviewed May 2, 2023

awan-10 approved these changes May 2, 2023

cmikeh2 added 3 commits May 2, 2023 18:22

BF16 model inference support

9725f09

Remove debug code

e0e70fe

Review feedback

9d64515

jeffra approved these changes May 2, 2023

Merge branch 'master' into cholmes/llama-inference-v2

c94c67b

lekurile approved these changes May 2, 2023

Fix inheritance

79ad7d0

jeffra enabled auto-merge (squash) May 2, 2023 23:07

Merge branch 'cholmes/llama-inference-v2' of github.com:microsoft/DeepSpeed into cholmes/llama-inference-v2

a044359

jeffra disabled auto-merge May 3, 2023 18:21

don't use cache dir for torch installs

5245e0a

jeffra requested a review from loadams as a code owner May 3, 2023 18:43

Merge branch 'master' into cholmes/llama-inference-v2

8c10a29

loadams approved these changes May 3, 2023

cmikeh2 and others added 6 commits May 3, 2023 20:22

Align APIs

cd1b617

Merge branch 'cholmes/llama-inference-v2' of github.com:microsoft/DeepSpeed into cholmes/llama-inference-v2

149e193

add HE unit test for OPT

8adc056

fix typo, missing policy ref to client module

995c3ae

Merge branch 'master' into cholmes/llama-inference-v2

1596640

Merge branch 'master' into cholmes/llama-inference-v2

d124817

jeffra merged commit 0a61d5d into master May 4, 2023
18 checks passed

jeffra deleted the cholmes/llama-inference-v2 branch May 4, 2023 00:20

lekurile mentioned this pull request May 19, 2023

Fix Hybrid Engine for BLOOM #3580

Merged

tokestermw mentioned this pull request Jun 12, 2023

[BUG] Pythia (GPT-NeoX based) models degrade in generation quality using DeepSpeed Inference #2855

Closed

lekurile mentioned this pull request Aug 2, 2023

Fix Stable Diffusion Injection #4078

Merged

sakogan mentioned this pull request Sep 22, 2023

Fix a bug in DeepSpeedMLP #4389

Merged

cmikeh2 commented May 1, 2023 • edited by jeffra
