In [None]:
!pip install transformers accelerate

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [2]:
def get_generator(model_id: str):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

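    def generator(prompt: str):
        # Move the inputs to the model's device instead of hard-coding "cuda"
        model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
        # Sample up to 100 new tokens; the decoded text still includes the prompt
        generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
        return tokenizer.batch_decode(generated_ids)[0]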

    return generator
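
The generator above returns the prompt together with the continuation. A minimal sketch of one way to keep only the continuation is shown below. `strip_prompt` is a hypothetical helper, not part of transformers, and it assumes the decoded text begins with the prompt verbatim. That may not hold when the tokenizer prepends special tokens such as a BOS marker, in which case the text is returned unchanged.

```python
def strip_prompt(prompt: str, decoded: str) -> str:
    """Return only the text that follows `prompt` in `decoded`.

    Hypothetical helper: falls back to the full decoded text when the
    prompt is not an exact prefix (for example, when special tokens
    were prepended during decoding).
    """
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded


# Example with a hand-written string standing in for real model output
example_prompt = "USER: Hi\nASSISTANT:"
example_decoded = "USER: Hi\nASSISTANT: Hi! How can I help you?"
answer = strip_prompt(example_prompt, example_decoded)
print(answer)
```

With a real model this would be called as `strip_prompt(prompt, generator(prompt))`.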

In [3]:
MODELS = [
    "lmsys/vicuna-7b-v1.5",
    "openai-community/gpt2",
    "microsoft/wavecoder-ultra-6.7b",
    "Qwen/Qwen2-7B-Instruct",
]

In [4]:
QUESTION = "What are large language models?"

# Only question
TEMPLATE_0 = QUESTION

# Question-Answer template
TEMPLATE_1 = f"""SYSTEM: A chat between a curious user and an AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: {QUESTION}
ASSISTANT:"""

# Template for instruction-tuned models
TEMPLATE_2 = f"""[INST] <s>
You are an AI assistant. You are supposed to give helpful, detailed, and polite answers to the user's questions.
</s>
Answer the following question: {QUESTION}[/INST]"""

# Template using few-shot technique
TEMPLATE_3 = f"""SYSTEM: A chat between a curious user and an AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: Hi
ASSISTANT: Hi! How can I help you?
USER: {QUESTION}
ASSISTANT:"""
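
The SYSTEM/USER/ASSISTANT templates above share one layout, so they can also be assembled programmatically. The sketch below is illustrative only: `build_chat_prompt` and `SYSTEM_PROMPT` are hypothetical names, and the layout simply mirrors the hand-written question-answer and few-shot templates.

```python
SYSTEM_PROMPT = (
    "A chat between a curious user and an AI assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_chat_prompt(question, examples=(), system=SYSTEM_PROMPT):
    """Build a SYSTEM/USER/ASSISTANT prompt (hypothetical helper).

    `examples` is a sequence of (user_message, assistant_reply) pairs
    that are placed before the real question as few-shot turns.
    """
    lines = [f"SYSTEM: {system}"]
    for user_msg, assistant_msg in examples:
        lines.append(f"USER: {user_msg}")
        lines.append(f"ASSISTANT: {assistant_msg}")
    lines.append(f"USER: {question}")
    lines.append("ASSISTANT:")  # left open for the model to complete
    return "\n".join(lines)


question = "What are large language models?"
zero_shot = build_chat_prompt(question)
few_shot = build_chat_prompt(question, examples=[("Hi", "Hi! How can I help you?")])
print(few_shot)
```

For instruction-tuned checkpoints, the tokenizer's own `apply_chat_template` method is usually the safer way to get the exact format the model was trained on.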

In [5]:
generator = get_generator(MODELS[3])

In [6]:
generator(TEMPLATE_1)

'SYSTEM: A chat between a curious user and an AI assistant. The assistant gives helpful, detailed, and polite answers to the user\'s questions.\nUSER: What are large language models?\nASSISTANT: Large language models refer to sophisticated computer algorithms designed to generate human-like text. These models are trained on vast amounts of textual data, allowing them to learn patterns, syntax, and semantics of natural language. They can be used for various applications such as language translation, text summarization, chatbot creation, content generation, and more.\n\nThe term "large" in large language models typically refers to their size, which is measured by factors like the number of parameters they contain. Larger models generally have more parameters'

In [8]:
generator(TEMPLATE_2)

"[INST] <s>\nYou are an AI assistant. You are supposed to give helpful, detailed, and polite answers to the user's questions.\n</s>\nAnswer the following question: What are large language models?[/INST] [INST] <s>\nLarge language models refer to sophisticated artificial intelligence systems designed to understand, generate, and manipulate human language. These models are typically based on deep learning algorithms, specifically neural networks, which have been trained on vast amounts of textual data.\n\nThe primary goal of large language models is to learn patterns and structures in language, enabling them to perform various natural language processing (NLP) tasks such as text generation, translation, summarization, question answering, and more. They can be categorized"

In [9]:
generator(TEMPLATE_3)

"SYSTEM: A chat between a curious user and an AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nUSER: Hi\nASSISTANT: Hi! How can I help you?\nUSER: What are large language models?\nASSISTANT: Large language models refer to complex computer algorithms that are designed to generate human-like text based on patterns learned from vast amounts of textual data. These models are typically trained using deep learning techniques, such as those based on neural networks.\n\nThere are several types of large language models:\n\n1. Generative Pretrained Transformer (GPT) series: Developed by the AI research lab at Alibaba Cloud, GPT is a family of models that uses a transformer architecture to generate text. The original GPT model was trained"
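
The cells above call the generator once per template. A small loop makes side-by-side comparison easier. This is a sketch under stated assumptions: `run_templates` and `echo_generator` are hypothetical names, and the stand-in generator only exists so the example runs without downloading a model.

```python
from typing import Callable, Dict


def run_templates(generator: Callable[[str], str], templates: Dict[str, str]) -> Dict[str, str]:
    """Call `generator` once per template and collect the raw outputs by name."""
    return {name: generator(prompt) for name, prompt in templates.items()}


def echo_generator(prompt: str) -> str:
    # Stand-in for a real model: echoes the prompt plus a placeholder
    return prompt + " <model output would follow here>"


templates = {
    "question_only": "What are large language models?",
    "question_answer": "USER: What are large language models?\nASSISTANT:",
}
outputs = run_templates(echo_generator, templates)
for name, text in outputs.items():
    print(f"--- {name} ---\n{text}\n")
```

In the notebook, the real `generator` and the `TEMPLATE_*` strings would replace the stand-ins. Because generation uses `do_sample=True`, outputs differ between runs. Calling `transformers.set_seed` beforehand should make them repeatable.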