## Testing Qwen 1.5

- Qwen1.5 14B AWQ example : https://huggingface.co/Qwen/Qwen1.5-14B-Chat-AWQ?text=%EB%82%B4%EC%9D%BC+%EA%B8%88%EC%9A%94%EC%9D%BC%EC%9D%B8%EB%8D%B0+%EC%A0%80%EB%85%81%EC%8B%9D%EC%82%AC%EB%A1%9C+%ED%95%A0+%EB%A7%8C%ED%95%9C+%EA%B2%83%EC%A2%80+%EC%B6%94%EC%B2%9C%ED%95%B4%EC%A4%98



In [1]:
# !pip install -q transformers accelerate sentencepiece bitsandbytes tiktoken autoawq

In [None]:
# autoawq sometimes causes errors when installed from pip; install it from source instead.
# refer to https://github.com/casper-hansen/AutoAWQ/issues/298#issuecomment-1943919894
!git clone https://github.com/casper-hansen/AutoAWQ; cd AutoAWQ; pip install -e .

In [1]:
!pip list | grep autoawq

autoawq                       0.1.8+cu118     /home/ec2-user/SageMaker/efs/aiml/gen-ai-sagemaker/playground/AutoAWQ
autoawq_kernels               0.0.4+cu118


In [2]:
# !pip list | grep transformers
!pip list | grep torch

torch                         2.0.1
torch-model-archiver          0.8.2b20230828
torch-workflow-archiver       0.2.11b20231012
torchaudio                    2.0.2
torchdata                     0.6.1
torchserve                    0.8.2b20230828
torchtext                     0.15.2
torchvision                   0.15.2


In [1]:
import sagemaker
import transformers
print(sagemaker.__version__)
print(transformers.__version__)

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml
2.207.1
4.37.2


In [2]:
from huggingface_hub import snapshot_download
from pathlib import Path
import os

local_model_path = Path("./pretrained-models")
local_model_path.mkdir(exist_ok=True)
model_name = "Qwen/Qwen1.5-14B-Chat-AWQ"
allow_patterns = ["*.json", "*.pt", "*.bin", "*.txt", "*.model", "*.py", "*.safetensors"]

model_download_path = snapshot_download(
    repo_id=model_name,
    cache_dir=local_model_path,
    allow_patterns=allow_patterns,
)

Fetching 11 files:   0%|          | 0/11 [00:00<?, ?it/s]
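To preview which files an `allow_patterns` list like the one above would keep, the patterns can be tried against candidate filenames before downloading. This is a minimal sketch that assumes glob-style matching as implemented by Python's `fnmatch`; the helper `matches_any` is hypothetical (not part of `huggingface_hub`) and the filenames are made up for illustration.

```python
from fnmatch import fnmatch

allow_patterns = ["*.json", "*.pt", "*.bin", "*.txt", "*.model", "*.py", "*.safetensors"]


def matches_any(filename, patterns):
    """Return True if the filename matches at least one glob pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Illustrative filenames only; this is not the actual repository listing.
candidates = [
    "config.json",
    "model-00001-of-00003.safetensors",
    "README.md",
    ".gitattributes",
]
kept = [name for name in candidates if matches_any(name, allow_patterns)]
print(kept)
```

Under these assumptions, documentation files such as `README.md` are skipped while config and weight files are kept.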

In [3]:
print(f"Local model download path: {model_download_path}")

Local model download path: pretrained-models/models--Qwen--Qwen1.5-14B-Chat-AWQ/snapshots/e1da15d0ab8fcca8d19269b0279eed02598daa91


In [4]:
s3_model_prefix = "llm/qwen1.5/model"  # folder where model checkpoint will go

In [5]:
base_model_s3 = f"{s3_model_prefix}/qwen1.5-14b-awq"

In [6]:
sagemaker_session = sagemaker.Session()
# s3_model_artifact = sagemaker_session.upload_data(path=model_download_path, key_prefix=base_model_s3)

In [7]:
# print(f"Model s3 uri : {s3_model_artifact}")

### Testing the model locally

- Note that loading the AWQ model requires `autoawq` to be installed.
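Before loading the model, it can help to confirm that `autoawq` is actually visible to the current kernel. The sketch below uses only the standard library; `package_status` is a hypothetical helper, and it relies on AutoAWQ being imported as `awq` while being distributed as `autoawq`.

```python
from importlib import metadata, util


def package_status(import_name, dist_name=None):
    """Return (importable, version_or_None) without importing the package."""
    importable = util.find_spec(import_name) is not None
    try:
        version = metadata.version(dist_name or import_name)
    except metadata.PackageNotFoundError:
        version = None
    return importable, version


# "awq" is the import name, "autoawq" the distribution name.
print(package_status("awq", "autoawq"))
```

If the first element is `False`, loading the AWQ checkpoint with `AutoModelForCausalLM.from_pretrained` is likely to fail until the package is installed.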

In [8]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

In [9]:
tokenizer = AutoTokenizer.from_pretrained(model_download_path)
model = AutoModelForCausalLM.from_pretrained(
    model_download_path,
    device_map='auto'
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [46]:
character = "flying cat and cute puppy"
prompt = f"Tell me a creative fairy tale for children. The main character are {character}"

In [47]:

instruction = """
You are a story teller for kids. Please make a story for kids
- The story should start with \"Title:\"
- The end of story should finished by \"The end.\" and stop to make story.
- Please break out the sentences appropriately.
- Make the story as long as possible.
"""

messages = [
    {"role": "system", "content": instruction},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)


In [48]:
print(text)

<|im_start|>system

You are a story teller for kids. Please make a story for kids to Korean
- The story should start with "Title:"
- The end of story should finished by "The end." and stop to make story.
- Please break out the sentences appropriately.
- Make the story as long as possible.
<|im_end|>
<|im_start|>user
Tell me a creative fairy tale for children. The main character are flying cat and cute puppy<|im_end|>
<|im_start|>assistant
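The printed prompt follows the ChatML layout: each message is wrapped in `<|im_start|>role` and `<|im_end|>`, and an open assistant turn is appended at the end. The function below is a hand-written approximation of that layout for illustration only; the real template is stored with the tokenizer and may differ in details (for example, how a missing system message is handled). `render_chatml` is a hypothetical name.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the ChatML layout seen in the printed prompt."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo_messages = [
    {"role": "system", "content": "You are a story teller for kids."},
    {"role": "user", "content": "Tell me a fairy tale."},
]
print(render_chatml(demo_messages))
```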



In [52]:
params = {
    "max_new_tokens": 4096,
    "temperature": 0.9,
    "top_p": 0.9,
}

In [53]:
%%time

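# Tokenize the rendered prompt and move the tensors to the GPU.
model_inputs = tokenizer([text], return_tensors="pt").to("cuda")

# Only input_ids are passed here, which triggers the attention-mask warning below.
generated_ids = model.generate(
    model_inputs.input_ids,
    **params
)
# generate() returns prompt + completion; keep only the newly generated tokens.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]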

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.


CPU times: user 35.2 s, sys: 79.1 ms, total: 35.3 s
Wall time: 35.2 s
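The list comprehension in the cell above removes the echoed prompt from each generated sequence, because a decoder-only model returns the prompt tokens followed by the new tokens. A toy version with plain Python lists shows the slicing; the token IDs are invented and `strip_prompt` is a hypothetical helper name.

```python
def strip_prompt(input_ids_batch, output_ids_batch):
    """Drop the echoed prompt tokens from the front of each generated sequence."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Toy token IDs for illustration only.
prompt_batch = [[101, 7, 8]]
generated_batch = [[101, 7, 8, 42, 43, 2]]
print(strip_prompt(prompt_batch, generated_batch))  # [[42, 43, 2]]
```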


In [54]:
print(response)

Title: The Flying Cat and the Loyal Puppy

Once upon a time, in a magical land called Cloudland, there lived a little flying cat named Fluffy. Fluffy was a special cat with soft, fluffy fur and bright green eyes. She had wings that allowed her to soar through the sky like a feathered angel. One day, Fluffy met a cute little puppy named Sparky. Sparky was a tiny, happy dog with a wagging tail and a big smile.

Fluffy and Sparky became the best of friends. They lived in a beautiful forest filled with vibrant flowers and tall trees. Together, they explored the world, sharing adventures and playing together. 

One day, while they were playing, Fluffy found a magic amulet in a hidden cave. The amulet was glowing with a bright light and had the power to grant wishes. Excitedly, she wished for a magical adventure for both of them. Suddenly, they were whisked away to a mysterious island called Dreamland.

On the island, they discovered a magical garden where every flower bloomed with a differe

### Note

- Qwen 1.5 is good at generating creative stories.
- The 14B AWQ model works well on `g4dn.xlarge`, and the speed is acceptable (the generation above took about 35 s of wall time).
- However, its Korean understanding and generation are not good.


### Test DJL deployment

- Deploy Qwen with DJL
- Note that `autoawq` is not stable, and its CUDA build has to match the container: a recent `autoawq` installed from pip targets CUDA 12.1, but the DeepSpeed LMI DJL container uses CUDA 11.8, so a matching version has to be picked from the [release](https://github.com/casper-hansen/AutoAWQ/releases) page.
- Check `qwen15-14b-src/requirements.txt` for details.
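One quick way to spot a CUDA mismatch is to compare the `cuXXX` tag in the package version with the tag in the container image name. The sketch below is only a string check, under the assumption that both sides encode the CUDA version this way; `cuda_tag` is a hypothetical helper. The two strings are copied from outputs in this notebook (the local `pip list` and the retrieved image URI), so the version actually pinned in `requirements.txt` still needs to be checked separately.

```python
import re


def cuda_tag(text):
    """Extract a CUDA tag such as 'cu118' from a version string or image URI."""
    match = re.search(r"cu(\d{3,})", text)
    return match.group(0) if match else None


autoawq_version = "0.1.8+cu118"  # from the local `pip list` output
image_uri = (
    "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
    "djl-inference:0.25.0-deepspeed0.11.0-cu118"
)

same = cuda_tag(autoawq_version) == cuda_tag(image_uri)
print(cuda_tag(autoawq_version), cuda_tag(image_uri), "match" if same else "mismatch")
```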


In [2]:
from sagemaker.utils import name_from_base
from sagemaker import image_uris

In [3]:
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
sm_client = sagemaker_session.sagemaker_client
sm_runtime_client = sagemaker_session.sagemaker_runtime_client
default_bucket = sagemaker_session.default_bucket()

In [4]:
llm_engine = "deepspeed"
# llm_engine = "fastertransformer"

In [5]:
framework_name = f"djl-{llm_engine}"
inference_image_uri = image_uris.retrieve(
    framework=framework_name, region=sagemaker_session.boto_session.region_name, version="0.25.0"
)
print(f"Inference container uri: {inference_image_uri}")

Inference container uri: 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.25.0-deepspeed0.11.0-cu118


In [6]:
src_dir_name = "qwen15-14b-src"
s3_target = f"s3://{sagemaker_session.default_bucket()}/llm/qwen1.5/code/"

In [7]:
!rm -rf {src_dir_name}.tar.gz
!tar --exclude=".ipynb_checkpoints" --exclude="__pycache__" -zcvf {src_dir_name}.tar.gz {src_dir_name}
!aws s3 cp {src_dir_name}.tar.gz {s3_target}

qwen15-14b-src/
qwen15-14b-src/model.py
qwen15-14b-src/requirements.txt
qwen15-14b-src/serving.properties
upload: ./qwen15-14b-src.tar.gz to s3://sagemaker-us-west-2-723597067299/llm/qwen1.5/code/qwen15-14b-src.tar.gz


In [8]:
model_uri = f"{s3_target}{src_dir_name}.tar.gz"
print(model_uri)

s3://sagemaker-us-west-2-723597067299/llm/qwen1.5/code/qwen15-14b-src.tar.gz
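The packaging step can also be done from Python instead of shelling out to `tar`, which keeps the exclusion rules in one place. This is a sketch using the standard `tarfile` module; `skip_excluded` and `make_tarball` are hypothetical helper names, and the excluded directory names mirror the ones in the shell command above.

```python
import tarfile
from pathlib import Path

EXCLUDED_NAMES = {".ipynb_checkpoints", "__pycache__"}


def skip_excluded(tarinfo):
    """Return None (skip) for members whose path contains an excluded directory."""
    if EXCLUDED_NAMES & set(Path(tarinfo.name).parts):
        return None
    return tarinfo


def make_tarball(src_dir, out_path):
    """Create a gzip tarball of src_dir without notebook or bytecode caches."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name, filter=skip_excluded)
    return out_path


# Example call (requires the source directory to exist):
# make_tarball("qwen15-14b-src", "qwen15-14b-src.tar.gz")
```

The resulting archive can then be uploaded with `aws s3 cp` as in the cell above.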


In [9]:
model_name = name_from_base("qwen15-14b-djl")
print(model_name)

create_model_response = sm_client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={"Image": inference_image_uri, "ModelDataUrl": model_uri},
)
model_arn = create_model_response["ModelArn"]

print(f"Created Model: {model_arn}")

qwen15-14b-djl-2024-02-15-16-24-07-731
Created Model: arn:aws:sagemaker:us-west-2:723597067299:model/qwen15-14b-djl-2024-02-15-16-24-07-731


In [10]:
instance_type = "ml.g4dn.xlarge"

endpoint_config_name = f"{model_name}-config"
endpoint_name = f"{model_name}-endpoint"

endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "ContainerStartupHealthCheckTimeoutInSeconds": 600,
        },
    ],
)
print(endpoint_config_response)

{'EndpointConfigArn': 'arn:aws:sagemaker:us-west-2:723597067299:endpoint-config/qwen15-14b-djl-2024-02-15-16-24-07-731-config', 'ResponseMetadata': {'RequestId': '970a626b-7286-4bab-a62c-85351a84acef', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '970a626b-7286-4bab-a62c-85351a84acef', 'content-type': 'application/x-amz-json-1.1', 'content-length': '126', 'date': 'Thu, 15 Feb 2024 16:24:08 GMT'}, 'RetryAttempts': 0}}


In [11]:
create_endpoint_response = sm_client.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
print(f"Created Endpoint: {create_endpoint_response['EndpointArn']}")

Created Endpoint: arn:aws:sagemaker:us-west-2:723597067299:endpoint/qwen15-14b-djl-2024-02-15-16-24-07-731-endpoint


In [12]:
import time

resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)

Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: InService
Arn: arn:aws:sagemaker:us-west-2:723597067299:endpoint/qwen15-14b-djl-2024-02-15-16-24-07-731-endpoint
Status: InService
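The polling loop above runs until the status leaves `Creating`, with no upper bound on the wait. A variant with a timeout might look like the sketch below; `wait_for_endpoint` is a hypothetical helper, the default intervals are arbitrary, and the `describe` argument is expected to behave like `sm_client.describe_endpoint`.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=60,
                      timeout_seconds=3600, sleep=time.sleep):
    """Poll until the endpoint status leaves 'Creating' or the timeout expires."""
    waited = 0
    while True:
        status = describe(EndpointName=endpoint_name)["EndpointStatus"]
        if status != "Creating":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{endpoint_name} still Creating after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Example call against the real client:
# status = wait_for_endpoint(sm_client.describe_endpoint, endpoint_name)
```

The returned status should still be checked, since the loop also exits on `Failed`. boto3 additionally offers `sm_client.get_waiter("endpoint_in_service")` as a built-in alternative.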


In [13]:
import json

In [21]:
character = "flying cat and cute puppy"
prompt = f"Tell me a creative fairy tale for children. The main character are {character}"
print(prompt)

Tell me a creative fairy tale for children. The main character are flying cat and cute puppy


In [22]:
%%time
# prompts = [prompt]

instruction = """
You are a story teller for kids. Please make a story for kids in english
- The story should start with \"Title:\"
- The end of story should finished by \"The end.\" and stop to make story.
- Please break out the sentences appropriately.
- Make the story as long as possible.
"""

response_model = sm_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=json.dumps(
        {
            "input_text": prompt,
            "instruction": instruction,
            "parameters": {
                "max_new_tokens": 4096,
                "temperature": 0.9,
                "top_p": 0.9,
            },
        }
    ),
    ContentType="application/json",
)

CPU times: user 4.2 ms, sys: 693 µs, total: 4.89 ms
Wall time: 34 s
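When the endpoint is called repeatedly, the request body can be built by a small helper so the field names stay consistent. The keys below (`input_text`, `instruction`, `parameters`) are the ones used in the request above and presumably match what `model.py` in `qwen15-14b-src` reads; that file is not shown here, so treat this as an assumption. `build_request_body` is a hypothetical name and the range checks are illustrative.

```python
import json


def build_request_body(prompt, instruction, max_new_tokens=4096,
                       temperature=0.9, top_p=0.9):
    """Serialize the request fields used in this notebook into a JSON string."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    payload = {
        "input_text": prompt,
        "instruction": instruction,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_p": top_p,
        },
    }
    return json.dumps(payload)


body = build_request_body("Tell me a fairy tale.", "You are a story teller for kids.")
print(body)
```

The returned string can be passed as `Body=` to `invoke_endpoint` together with `ContentType="application/json"`.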


In [23]:
output = str(response_model["Body"].read(), "utf-8")
print(output)

Title: The Magical Flight of Feline and Pup

Once upon a time, in a magical forest called Whimsywood, there lived a curious flying cat named Felix, who had soft fur the color of sunshine and enormous ears that allowed him to hear the whispers of the wind. His best friend was a cheerful little puppy named Paws, who had a fluffy coat as white as snow and a wagging tail that could light up even the darkest nights.

Felix and Paws were inseparable, spending their days exploring the enchanted trees and talking to the friendly forest creatures. One sunny afternoon, while they were playing near a sparkling pond, they discovered an ancient, enchanted amulet. It glimmered under the sunlight, and as they touched it, a warm glow spread across them.

Suddenly, Paws transformed into a talking dog, his barks turned into sentences, and he was filled with boundless energy. Felix, now a magnificent feline superhero, leaped higher than ever before, feeling the power of the amulet within him. They were o