## StableLM local test

- Test StableLM in local mode
- Base model: https://huggingface.co/stabilityai/stablelm-base-alpha-7b
- Fine-tuned model: https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b
- Code example: https://github.com/Stability-AI/StableLM

### License Issue
- Note that the tuned model is released under a non-commercial license. The base model can be used for commercial purposes.

### Tested version

Tested on `Python 3.9.15`

```
sagemaker: 2.146.0
transformers: 4.29.2
torch: 1.13.1
accelerate: 0.19.0
sentencepiece: 0.1.99
bitsandbytes: 0.38.1
```

In [None]:
# !pip install -q transformers accelerate sentencepiece bitsandbytes

- Test local mode here first.
- You can download the model with git lfs, or with the Hugging Face Hub package (`huggingface_hub`).
```
git lfs install
git clone https://huggingface.co/stabilityai/stablelm-base-alpha-7b
```

In [None]:
import sagemaker
import transformers
print(sagemaker.__version__)
print(transformers.__version__)

In [None]:
from huggingface_hub import snapshot_download
from pathlib import Path
import os

local_model_path = Path("./pretrained-models")
local_model_path.mkdir(exist_ok=True)
model_name = "stabilityai/stablelm-base-alpha-7b"
# model_name = "stabilityai/stablelm-tuned-alpha-7b"
allow_patterns = ["*.json", "*.pt", "*.bin", "*.txt", "*.model", "*.py"]

model_download_path = snapshot_download(
    repo_id=model_name,
    cache_dir=local_model_path,
    allow_patterns=allow_patterns,
)
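
As a rough illustration of what `allow_patterns` does: the patterns are shell-style globs matched against the file names in the repository. The sketch below mimics that filtering with the standard-library `fnmatch` on a made-up file list. The file names are hypothetical, and the matching inside `snapshot_download` is only assumed to behave similarly.

```python
from fnmatch import fnmatch

# Patterns copied from the download cell.
allow_patterns = ["*.json", "*.pt", "*.bin", "*.txt", "*.model", "*.py"]

# Hypothetical repository listing, for illustration only.
repo_files = [
    "config.json",
    "tokenizer.json",
    "pytorch_model-00001-of-00004.bin",
    "model-00001-of-00004.safetensors",
    "README.md",
]

# Keep a file if it matches at least one allowed pattern.
kept = [f for f in repo_files if any(fnmatch(f, p) for p in allow_patterns)]
skipped = [f for f in repo_files if f not in kept]

print("kept:", kept)
print("skipped:", skipped)
```

With these patterns, files such as `*.safetensors` or `*.md` would not be downloaded, so add a pattern if you need them.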

In [None]:
model_download_path

### Instance size and model

- With int8 quantization the model consumes more than 10 GB of GPU memory, so a `g4dn.xlarge` instance can run it.
- float16 needs at least a `g5.2xlarge` instance.
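
The instance choices follow from a back-of-the-envelope estimate of the weight memory: parameter count times bytes per parameter. The sketch below assumes a nominal 7e9 parameters (the exact count of the checkpoint may differ) and counts the weights only. Activations, the KV cache, and the CUDA context come on top, which is why the observed int8 usage is higher than the estimate.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

# Assumption: nominal parameter count, not the exact size of the checkpoint.
N_PARAMS = 7e9

int8_gb = weight_memory_gb(N_PARAMS, 1)  # int8: 1 byte per parameter
fp16_gb = weight_memory_gb(N_PARAMS, 2)  # float16: 2 bytes per parameter

print(f"int8 weights: ~{int8_gb:.0f} GB, float16 weights: ~{fp16_gb:.0f} GB")
```

For reference, a `g4dn.xlarge` has a 16 GB T4 GPU and a `g5.2xlarge` has a 24 GB A10G GPU.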

In [None]:
import os
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList

model_path = model_download_path

tokenizer = AutoTokenizer.from_pretrained(model_path)
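# Load the weights in int8 (bitsandbytes); device_map="auto" places them on the available GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    load_in_8bit=True,
    device_map="auto",
)
# float16 alternative (load without load_in_8bit first):
# model.half().cuda()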

### Test model inference

- After loading the model, you can test inference.
- The fine-tuned model needs the default system prompt for better results. For the base model, plain text input is enough.
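
For the fine-tuned model, the prompt is the system prompt followed by alternating `<|USER|>` and `<|ASSISTANT|>` segments, ending with an open `<|ASSISTANT|>` tag. A minimal sketch of a helper that assembles such a prompt is shown here. The function name `build_tuned_prompt`, the shortened system prompt, and the placeholder assistant reply are illustrative assumptions, not part of the StableLM code.

```python
from typing import List, Tuple


def build_tuned_prompt(
    system_prompt: str,
    turns: List[Tuple[str, str]],
    user_message: str,
) -> str:
    """Join the system prompt, past (user, assistant) turns, and a new user message."""
    prompt = system_prompt
    for user_text, assistant_text in turns:
        prompt += f"<|USER|>{user_text}<|ASSISTANT|>{assistant_text}"
    # End with an open assistant tag so the model writes the next reply.
    prompt += f"<|USER|>{user_message}<|ASSISTANT|>"
    return prompt


# Shortened system prompt and placeholder reply, for illustration only.
example_system = "<|SYSTEM|># StableLM Tuned (Alpha version)\n"
example_prompt = build_tuned_prompt(
    example_system,
    turns=[("Hi, when can I get a driver license?", "It depends on where you live.")],
    user_message="How about Japan?",
)
print(example_prompt)
```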

In [None]:
import torch

In [None]:

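class StopOnTokens(StoppingCriteria):
    """Stop generation as soon as the last generated token is one of the stop IDs."""

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Token IDs that should end generation.
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            # Only the first sequence in the batch is checked.
            if input_ids[0][-1] == stop_id:
                return True
        return False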


In [None]:
# system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
# - StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
# - StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
# - StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
# - StableLM will refuse to participate in anything that could harm a human.
# """

# prompt = f"{system_prompt}<|USER|>Hi, when can I get a driver license?<|ASSISTANT|>"


In [None]:
# prompt = f"{system_prompt}"
# prompt += "<|USER|>Hi, when can I get a driver license?<|ASSISTANT|>As an AI language model, I don't have access to real-time data, but typically, it would be possible to obtain a driver's license as long as you are legally eligible to drive and have the necessary documents. Some states and countries may have different regulations or requirements for obtaining a driver's license, so it's best to check with the relevant authorities for the state or country you plan to visit."
# prompt += "<|USER|>How about Japan?\n<|ASSISTANT|>"
# print(prompt)

In [None]:
prompt = "Could you recommend some food at this weekend?"

In [None]:
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

In [None]:
%%time
tokens = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    # stopping_criteria=StoppingCriteriaList([StopOnTokens()])
)

output = tokenizer.decode(tokens[0], skip_special_tokens=True)


In [None]:
print(output)

In [None]:
s3_model_prefix = "llm/stablelm/model"  # folder where model checkpoint will go

In [None]:
base_7b_s3 = f"{s3_model_prefix}/base-7b"

In [None]:
sagemaker_session = sagemaker.Session()
stablelm_model_artifact = sagemaker_session.upload_data(path=model_download_path, key_prefix=base_7b_s3)


In [None]:
%store model_download_path
%store stablelm_model_artifact