Support gpt-j-6b model for Habana (#1170)
* Support gpt-j-6b for Habana graph mode

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add deepspeed script

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add other graph mode models for habana

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* mv models and habana code for both inference and finetuning use

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* update models path

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* support stream output for SPR and Habana

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* add load_model and predict_stream for customer

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* fix issue

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>

* use main to test load_model/predict/predict_stream, add StoppingCriteria (#1194)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

---------

Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
lvliang-intel and sywangyi committed Jul 17, 2023
1 parent a84eabe commit 9ef6ad8
Showing 6 changed files with 786 additions and 187 deletions.
110 changes: 110 additions & 0 deletions workflows/chatbot/habana/gaudi_spawn.py
@@ -0,0 +1,110 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
A simple launcher script for distributed training on HPUs.

Single node:
::
    >>> python gaudi_spawn.py --world_size=NUM_CARDS_YOU_HAVE --use_mpi
               YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other
               arguments of your training script)

Multi node:
::
    >>> python gaudi_spawn.py --hostfile=PATH_TO_HOSTFILE --use_deepspeed
               YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other
               arguments of your training script)
"""


import sys
from argparse import REMAINDER, ArgumentParser

from optimum.habana.distributed import DistributedRunner
from optimum.utils import logging


logger = logging.get_logger(__name__)


def parse_args():
    """
    Helper function parsing the command line options.
    @retval ArgumentParser
    """
    parser = ArgumentParser(
        description=(
            "Habana Gaudi distributed training launch helper utility that will spawn up multiple distributed"
            " processes."
        )
    )

    # Optional arguments for the launch helper
    parser.add_argument("--world_size", type=int, default=1, help="Number of HPUs to use (1 or 8)")
    parser.add_argument("--hostfile", type=str, default=None, help="Path to the file where hosts are specified.")
    parser.add_argument("--use_mpi", action="store_true", help="Use MPI for distributed training")
    parser.add_argument("--use_deepspeed", action="store_true", help="Use DeepSpeed for distributed training")

    # positional
    parser.add_argument(
        "training_script",
        type=str,
        help=(
            "The full path to the single HPU training "
            "program/script to be launched in parallel, "
            "followed by all the arguments for the "
            "training script."
        ),
    )

    # rest from the training program
    parser.add_argument("training_script_args", nargs=REMAINDER)

    return parser.parse_args()


def main():
    args = parse_args()

    if args.use_deepspeed:
        from transformers.deepspeed import is_deepspeed_available

        if not is_deepspeed_available():
            raise ImportError(
                "--use_deepspeed requires deepspeed: `pip install"
                " git+https://github.com/HabanaAI/DeepSpeed.git@1.10.0`."
            )

    # Patch sys.argv
    sys.argv = [args.training_script] + args.training_script_args
    # Handle the case where arguments contain whitespaces
    argv = ['"{}"'.format(arg) if " " in arg and arg[0] != '"' and arg[-1] != '"' else arg for arg in sys.argv]
    command_list = [" ".join(argv)]

    distributed_runner = DistributedRunner(
        command_list=command_list,
        world_size=args.world_size,
        hostfile=args.hostfile,
        use_mpi=args.use_mpi,
        use_deepspeed=args.use_deepspeed,
    )

    ret_code = distributed_runner.run()
    sys.exit(ret_code)


if __name__ == "__main__":
    main()
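
Note on the whitespace handling in `main()`: the sketch below isolates the quoting step from gaudi_spawn.py so it can be run without `optimum.habana` installed. The helper name `quote_args` is hypothetical and is not part of this commit; the expression inside it is copied from the script. It illustrates that arguments containing spaces are wrapped in double quotes before being joined into the single command string passed to `DistributedRunner`.

```python
from typing import List


def quote_args(argv: List[str]) -> str:
    """Join argv into one command string (hypothetical helper, not in the commit).

    An argument is wrapped in double quotes only if it contains a space and
    does not already start or end with a double quote, mirroring the list
    comprehension in gaudi_spawn.py.
    """
    quoted = [
        '"{}"'.format(arg) if " " in arg and arg[0] != '"' and arg[-1] != '"' else arg
        for arg in argv
    ]
    return " ".join(quoted)


if __name__ == "__main__":
    # The --instructions value contains spaces, so it is re-quoted.
    print(quote_args(["generate.py", "--instructions", "The tree is rotten."]))
```

This approach does not escape double quotes embedded inside an argument. If that case matters, the standard library's `shlex.join` may be a more robust alternative, although it produces POSIX-shell quoting rather than the double-quote style used here.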

20 changes: 18 additions & 2 deletions workflows/chatbot/inference/README.md
@@ -186,15 +186,31 @@ You can use the [generate.py](./generate.py) script for performing direct infere
python generate.py --base_model_path "./mpt-7b-chat" \
--habana \
--tokenizer_name "EleutherAI/gpt-neox-20b" \
--use_hpu_graphs \
--use_kv_cache \
--instructions "Transform the following sentence into one that shows contrast. The tree is rotten."
```

-And you can use `deepspeed` to speedup the inference.
+And you can use `deepspeed` to speed up the inference. Currently, TP is not supported for mpt.

```bash
-python ../gaudi_spawn.py --use_deepspeed --world_size 8 generate.py \
+python ../habana/gaudi_spawn.py --use_deepspeed --world_size 8 generate.py \
--base_model_path "./mpt-7b-chat" \
--habana \
--tokenizer_name "EleutherAI/gpt-neox-20b" \
--use_hpu_graphs \
--use_kv_cache \
--instructions "Transform the following sentence into one that shows contrast. The tree is rotten."
```

Habana supports HPU graph mode for inference speedup. It is available for bloom, gpt2, opt, gptj, and gpt_neox; the mpt and llama models do not support this mode yet. You can use the parameter `use_hpu_graphs` to speed up the inference.

```bash
python generate.py --base_model_path "EleutherAI/gpt-j-6b" \
--habana \
--use_kv_cache \
--use_hpu_graphs \
--tokenizer_name "EleutherAI/gpt-j-6b" \
--instructions "Transform the following sentence into one that shows contrast. The tree is rotten."
```

Empty file.
2 changes: 1 addition & 1 deletion workflows/chatbot/inference/docker/Dockerfile
@@ -93,7 +93,7 @@ RUN git clone https://github.com/huggingface/optimum-habana.git && \
apt-get install git-lfs && \
git-lfs install

-RUN pip install optimum[habana] && \
+RUN pip install git+https://github.com/huggingface/optimum-habana.git && \
pip install peft && \
pip install einops && \
pip install datasets && \
2 changes: 2 additions & 0 deletions workflows/chatbot/inference/docker/README.md
@@ -53,5 +53,7 @@ python generate.py \
--base_model_path "./mpt-7b-chat" \
--tokenizer_name "EleutherAI/gpt-neox-20b" \
--habana \
--use_hpu_graphs \
--use_kv_cache \
--instructions "Transform the following sentence into one that shows contrast. The tree is rotten."
```
