# Train gpt-neo-2.7B using deepspeed and transformers (27/9/21)!

In [None]:
!pip install -r requirements.txt

In [None]:
# Check that DeepSpeed's requirements are met (the async_io op is not needed here)
!ds_report


Next, we use DeepSpeed to fine-tune the model on the training set and evaluate it on the validation set.

Note: checkpointing is possible using `save_steps`, but each checkpoint uses a large amount of disk space, so we disable it for now.
However, if you're using a preemptible instance, enabling it might be a good idea.
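
To get a feel for how much disk space is involved, here is a rough back-of-the-envelope estimate. It assumes a nominal 2.7 billion parameters, fp16 weights, and an Adam-style optimizer that keeps fp32 master weights plus two fp32 moment tensors. The actual size depends on the contents of `ds_config_gptneo.json`, so treat the numbers as an approximation rather than a measurement.

```python
# Rough per-checkpoint disk estimate (assumptions, not measured values)
N_PARAMS = 2.7e9           # nominal parameter count of gpt-neo-2.7B
BYTES_FP16_WEIGHTS = 2     # one fp16 copy of the weights
BYTES_FP32_OPTIMIZER = 12  # fp32 master weights + two Adam moments, 4 bytes each


def size_gb(n_params, bytes_per_param):
    """Size in gigabytes (10^9 bytes) for a given per-parameter cost."""
    return n_params * bytes_per_param / 1e9


weights_gb = size_gb(N_PARAMS, BYTES_FP16_WEIGHTS)
optimizer_gb = size_gb(N_PARAMS, BYTES_FP32_OPTIMIZER)
total_gb = weights_gb + optimizer_gb

print(f"weights:   ~{weights_gb:.1f} GB")
print(f"optimizer: ~{optimizer_gb:.1f} GB")
print(f"total:     ~{total_gb:.1f} GB per checkpoint")
```

Under these assumptions, a handful of checkpoints already adds up to well over 100 GB, which is why limiting how many are kept matters on small disks.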

If evaluation is happening too frequently, you can increase `eval_steps`.
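
As a quick sanity check on how often evaluation runs, the sketch below works out the number of training samples seen between evaluations. The values are copied from the command in the next cell. It assumes `eval_steps` counts optimizer updates (that is, steps after gradient accumulation), which is how the Trainer counts steps as far as I know.

```python
# Values taken from the training command in this notebook
num_gpus = 1
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
eval_steps = 30

# Samples consumed by one optimizer update across all GPUs
samples_per_update = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Samples seen between two consecutive evaluation runs
samples_between_evals = samples_per_update * eval_steps

print(f"effective batch size: {samples_per_update}")
print(f"samples between evaluations: {samples_between_evals}")
```

If the training set is small, an evaluation every few hundred samples can dominate the run time, so raising `eval_steps` is a cheap way to speed things up.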

In [6]:
!deepspeed --num_gpus=1 run_clm.py \
--deepspeed ds_config_gptneo.json \
--model_name_or_path EleutherAI/gpt-neo-2.7B \
--train_file train.csv \
--validation_file validation.csv \
--do_train \
--do_eval \
--fp16 \
--overwrite_cache \
--overwrite_output_dir \
--evaluation_strategy="steps" \
--output_dir chat-model \
--num_train_epochs 1 \
--eval_steps 30 \
--gradient_accumulation_steps 2 \
--per_device_train_batch_size 4 \
--use_fast_tokenizer False \
--learning_rate 5e-06 \
--warmup_steps 10

[2021-09-26 02:16:07,944] [INFO] [runner.py:360:main] cmd = /opt/conda/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 run_clm.py --deepspeed ds_config_gptneo.json --model_name_or_path EleutherAI/gpt-neo-2.7B --train_file train.csv --validation_file validation.csv --do_train --do_eval --fp16 --overwrite_cache --overwrite_output_dir --evaluation_strategy=steps --output_dir jy-chat-model --num_train_epochs 1 --eval_steps 30 --gradient_accumulation_steps 2 --per_device_train_batch_size 4 --use_fast_tokenizer False --learning_rate 5e-06 --warmup_steps 10
[2021-09-26 02:16:08,764] [INFO] [launch.py:80:main] WORLD INFO DICT: {'localhost': [0]}
[2021-09-26 02:16:08,764] [INFO] [launch.py:89:main] nnodes=1, num_local_procs=1, node_rank=0
[2021-09-26 02:16:08,764] [INFO] [launch.py:101:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2021-09-26 02:16:08,764] [INFO] [launch.py:102:main] dist_wo

Now let's evaluate the model with some prompts

In [7]:
import time

import deepspeed
import torch
from transformers import GPTNeoForCausalLM, AutoTokenizer

# casting to fp16 "half" gives a large speedup during model loading
model = GPTNeoForCausalLM.from_pretrained("chat-model").half().to("cuda:0")
tokenizer = AutoTokenizer.from_pretrained("chat-model")
tokenizer.pad_token = tokenizer.eos_token

# using deepspeed inference is optional: it gives about a 2x speed up
deepspeed.init_inference(model, mp_size=1, dtype=torch.half, replace_method='auto')


In [8]:
def infer_deepspeed(text):
    start_time = time.time()
    input_ids = tokenizer(text, padding=True, return_tensors='pt').to('cuda:0').input_ids
    # Character length of the decoded prompt, used below to strip the prompt from the output text
    prompt_length = len(tokenizer.decode(input_ids[0], skip_special_tokens=True))
    gen_tokens = model.generate(input_ids,
                                top_p=0.9,
                                temperature=0.9,
                                # max_length is measured in tokens, so add to the prompt's token count
                                max_length=input_ids.shape[1] + 300,
                                do_sample=True,
                                use_cache=True)  # Without this you get a dimension error

    gen_text = tokenizer.batch_decode(gen_tokens, skip_special_tokens=True)[0][prompt_length:]

    print(f'Took {time.time() - start_time:.2f}s')
    print(f"\033[1m{text}\033[0m{gen_text}")
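
The function above removes the prompt by slicing the decoded string at the prompt's character length, which relies on the tokenizer decoding the prompt back to exactly the same text. An alternative is to slice the token IDs before decoding, since a causal LM's `generate` output starts with the prompt tokens. The helper below is a hypothetical illustration using plain lists and toy IDs; with the tensors in this notebook the equivalent slice would be `gen_tokens[0][input_ids.shape[1]:]`.

```python
def continuation_ids(prompt_ids, output_ids):
    """Return the token IDs that follow the prompt in a generated sequence."""
    n = len(prompt_ids)
    # Guard against outputs that do not begin with the prompt
    if list(output_ids[:n]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt")
    return list(output_ids[n:])


# Toy token IDs, for illustration only
prompt = [15, 42, 7]
generated = [15, 42, 7, 99, 3, 50256]

new_tokens = continuation_ids(prompt, generated)
print(new_tokens)
```

Decoding only `new_tokens` then yields the generated text without any string slicing.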

In [27]:
infer_deepspeed("""Person A: Some text

Person B: Some reply
""")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Took 12.70s
2021-09-19T22:17:15 Nicholas L: I don't love you anymore

2021-09-19T22:17:21 Jing❤️: you know i’ve thought about this before

2021-09-19T22:17:30 Jing❤️: what you said and what you didn’t say

2021-09-19T22:17:41 Jing❤️: how you said it

2021-09-19T22:17:55 Jing❤️: how you meant it

2021-09-19T22:18:07 Nicholas L: Hmm

2021-09-19T22:18:19 Jing❤️: i was thinking about it today

2021-09-19T22:18:24 Jing❤️: the way you said it

2021-09-19T22:18:31 Jing❤️: it hurt

2021-09-19T22:18:38 Jing❤️: and i’ve been hurt before

2021-09-19T22:18:42 Nicholas L: No.

2021-09-19T22:18:46 Nicholas L: I don't believe you

2021-09-19T22:18:52 Jing❤️: i know

2021-09-19T22:18:56 Nicholas L: I never said that

2021-09-19T22:18:57 Jing❤️: you’ve hurt me before

2021
