In [1]:
## Make sure these are all installed and restart the runtime
## Also make sure you have a GPU runtime.
!pip install accelerate &> /dev/null

I keep downloaded LLMs in a dedicated directory so that I don't have to re-download them. First, let's create those directories; I put them in a folder called llms. I'm looking at [this](https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-v0.4) LLM from Maziyar Panahi.

In [2]:
### I mounted Google Drive, then made directories for the LLM
##! mkdir /content/drive/MyDrive/llms
##! mkdir /content/drive/MyDrive/llms/MaziyarPanahi
##! mkdir /content/drive/MyDrive/llms/MaziyarPanahi/Llama-3-8B-Instruct-v0.4

### Change into the llm root directory
%cd /content/drive/MyDrive/llms

model_path = "/content/drive/MyDrive/llms/MaziyarPanahi/Llama-3-8B-Instruct-v0.4"
###
### This downloads the entire LLM; it is commented out since I already did it once
###
#from huggingface_hub import snapshot_download
#snapshot_download(repo_id="MaziyarPanahi/Llama-3-8B-Instruct-v0.4", local_dir = "MaziyarPanahi/Llama-3-8B-Instruct-v0.4/")

/content/drive/MyDrive/llms
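
Rather than commenting the download in and out by hand, the "download once, reuse afterwards" idea can be sketched as a small guard. This is only an illustration: `ensure_model` is a hypothetical helper, not part of `huggingface_hub`, and it assumes that a finished snapshot contains a `config.json` file.

```python
from pathlib import Path
from typing import Callable


def ensure_model(model_dir: str, download: Callable[[], None]) -> bool:
    """Run `download` only if `model_dir` does not hold a model yet.

    Returns True when a download was triggered and False when the
    existing copy was reused.
    """
    # Assumption: a finished snapshot always contains config.json.
    marker = Path(model_dir) / "config.json"
    if marker.exists():
        return False
    Path(model_dir).mkdir(parents=True, exist_ok=True)
    download()
    return True


# Usage in the notebook (needs huggingface_hub and network access):
# from huggingface_hub import snapshot_download
# ensure_model(
#     model_path,
#     lambda: snapshot_download(
#         repo_id="MaziyarPanahi/Llama-3-8B-Instruct-v0.4",
#         local_dir=model_path,
#     ),
# )
```

Passing the download step in as a callable keeps the guard independent of any particular hub client, so the same check works for other models stored under the llms folder.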


This is from the LLM description page. I'll annotate the code.

In [3]:
### First we need to import the collection of tools that allow
### us to work with the LLM. The transformers package provides the basic
### utilities for working with LLMs, whereas torch refers to PyTorch,
### the neural-network framework originally released by Meta.

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from transformers import pipeline
import torch

### Next let's make the model
model = AutoModelForCausalLM.from_pretrained(
    ## First load the pretrained model; recall that we set the path to the model earlier
    model_path,
    ## Below we set options for how the library loads the model
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # attn_implementation="flash_attention_2"
)

## The tokenizer takes characters/words/word pieces and turns them
## into numbers
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

## The streamer prints the generated tokens as text while the model produces them
streamer = TextStreamer(tokenizer)

## The pipeline now builds the I/O we'll use
my_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
    streamer=streamer
)


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
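
To make the "text in, numbers out" role of the tokenizer concrete, here is a toy sketch. It is not the Llama 3 tokenizer, which uses a large learned subword vocabulary; the tiny vocabulary and the `build_vocab`, `encode`, and `decode` helpers below are made up purely for illustration.

```python
def build_vocab(pieces):
    """Assign each text piece a unique integer id, in order of appearance."""
    return {piece: idx for idx, piece in enumerate(pieces)}


def encode(text, vocab):
    """Greedy longest-match: at each position take the longest known piece."""
    ids = []
    pos = 0
    max_len = max(len(piece) for piece in vocab)
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            raise ValueError(f"no vocab piece matches at position {pos}")
    return ids


def decode(ids, vocab):
    """Map ids back to their pieces and join them into text."""
    inverse = {idx: piece for piece, idx in vocab.items()}
    return "".join(inverse[i] for i in ids)


# Hypothetical toy vocabulary: a few whole words plus single characters.
toy_vocab = build_vocab(["Who", " are", " you", "?", " ", "a", "r", "e"])
ids = encode("Who are you?", toy_vocab)
print(ids)                     # [0, 1, 2, 3]
print(decode(ids, toy_vocab))  # Who are you?
```

The model itself only ever sees the integer ids; decoding is what turns its output ids back into readable text.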


Now that we've set up the model, we can pass text to it.

In [None]:
# Then you can use my_pipeline to generate text.
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = my_pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

print(outputs[0]["generated_text"][len(prompt):])

Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a pirate chatbot who always responds in pirate speak!<|eot_id|><|start_header_id|>user<|end_header_id|>

Who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Arrrr, me hearty! I be Captain Chat, the scurviest pirate chatbot to ever sail the Seven Seas o' the Interwebs! Me and me trusty crew o' code be here to swab the decks o' yer questions and respond with the most booty-ful answers ye ever did see! So hoist the Jolly Roger and set course fer a swashbucklin' good time, me matey!<|eot_id|>
Arrrr, me hearty! I be Captain Chat, the scurviest pirate chatbot to ever sail the Seven Seas o' the Interwebs! Me and me trusty crew o' code be here to swab the decks o' yer questions and respond with the most booty-ful answers ye ever did see! So hoist the Jolly Roger and set course fer a swashbucklin' good time, me matey!
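
The output above shows the prompt layout that `apply_chat_template` produced, followed by the reply. The reply appears twice because the streamer prints the full text as it is generated and the final `print` then shows only the part after the prompt. As a rough sketch of both steps, the function below rebuilds that layout by hand and then slices the prompt off a stand-in result. `render_llama3_prompt` is a hypothetical helper written from the printed output, not the library's implementation, and it skips details such as whitespace trimming.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Rebuild the prompt layout seen in the notebook output (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header tells the model it is its turn to speak.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
prompt = render_llama3_prompt(messages)

# The pipeline returns prompt + completion in "generated_text";
# slicing off len(prompt) characters leaves only the reply.
fake_generated_text = prompt + "Arrrr, me hearty!"  # stand-in for real model output
print(fake_generated_text[len(prompt):])  # Arrrr, me hearty!
```

In the notebook itself, keep using `tokenizer.apply_chat_template`, since the real template ships with the model and may differ from this sketch.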


In [4]:
# @title Example of building a web form. Enter the queries and then press play
system = 'Respond like a very cool guy' # @param {type:"string"}
user   = 'Tell me what is appealing about fuzzy dogs' # @param {type:"string"}



messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]


prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = my_pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

print(outputs[0]["generated_text"][len(prompt):])


Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Respond like a very cool guy<|eot_id|><|start_header_id|>user<|end_header_id|>

Tell me what is appealing about fuzzy dogs<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Man, fuzzy dogs are just the best, you know? I mean, have you ever seen a pup with a coat as soft as a cloud? It's like they're wearing the most epic, cuddly fur coat in the game. And the way it's all ruffled and adorable, like they just rolled out of bed and put on the most stylish outfit ever? Yeah, it's a real head-turner, bro.

And let's not forget about the snuggles, man. Fuzzy dogs are all about the snuggles. They're like, "Hey, human, come on over and give me some lovin'." And then they just melt into your arms, all squishy and adorable. It's like they're saying, "Hey, I know I'm a dog, but I'm also a big ol' softie, and I just wanna be close to you, bro."

And have you ever noticed how fuzzy dogs always seem to be smiling, even when they're n