In [14]:
import textwrap
import torch
import yaml
from langchain import PromptTemplate
from transformers import (AutoConfig, AutoModel, AutoModelForCausalLM,
                          AutoTokenizer, GenerationConfig, LlamaForCausalLM,
                          LlamaTokenizer, pipeline)

## Load the model & pipeline, helper functions
Update `CONFIG_PATH` to point to the training config of the model you wish to test.

In [5]:
CONFIG_PATH = "configs/llama3_8b_chat_uncensored.yaml"  # config of model you wish to test
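
The loading cell reads `model_output_dir` and `model_name` from this config, plus an optional `model_family`. As a minimal sketch of how the model path is derived, with a fail-fast check for missing keys (the helper name `resolve_model_path` and the example values are hypothetical, not part of the original notebook):

```python
def resolve_model_path(config):
    """Build the local model path the same way the loading cell does."""
    # Hypothetical helper: fail early with a clear message if a required key is absent.
    missing = [key for key in ("model_output_dir", "model_name") if key not in config]
    if missing:
        raise KeyError(f"Config is missing required keys: {missing}")
    return f"{config['model_output_dir']}/{config['model_name']}"


# Example values are made up for illustration only.
example_config = {
    "model_output_dir": "models",
    "model_name": "example_model",
    "model_family": "llama",
}
example_path = resolve_model_path(example_config)
print(example_path)
```

Checking the keys up front surfaces a wrong `CONFIG_PATH` before any model weights are loaded.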

In [7]:
# Note: read_yaml_file is defined in the helper-functions cell further down; run that cell first.
config = read_yaml_file(CONFIG_PATH)

print("Load model")
model_path = f"{config['model_output_dir']}/{config['model_name']}"
if "model_family" in config and config["model_family"] == "llama":
    tokenizer = LlamaTokenizer.from_pretrained(model_path)
    model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto", load_in_8bit=True)
else:
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", load_in_8bit=True)

pipe = pipeline(
    "text-generation",
    model=model, 
    tokenizer=tokenizer, 
    max_length=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

Load model


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Loading checkpoint shards: 100%|██████████| 7/7 [00:04<00:00,  1.67it/s]
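
The deprecation warning above asks for a `BitsAndBytesConfig` passed via `quantization_config` instead of `load_in_8bit=True`. A minimal sketch of that form, assuming a recent `transformers` release with `bitsandbytes` installed (the function name `load_8bit_model` is hypothetical, and this is not the code that produced the output above):

```python
def load_8bit_model(model_path):
    """Load a causal LM in 8-bit via quantization_config (sketch, not the notebook's code)."""
    # Imported lazily so the function can be defined even where transformers is absent.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quantization_config = BitsAndBytesConfig(load_in_8bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        quantization_config=quantization_config,
    )
    return tokenizer, model
```

Calling `load_8bit_model(model_path)` would return the tokenizer and model to pass to `pipeline(...)`.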


In [17]:
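def read_yaml_file(file_path):
    """Load a YAML file and return its parsed contents."""
    with open(file_path, 'r') as file:
        try:
            return yaml.safe_load(file)
        except yaml.YAMLError as e:
            # Re-raise so a malformed config fails here, not later as a lookup on None
            print(f"Error reading YAML file: {e}")
            raise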

def get_prompt(human_prompt):
    prompt_template=f"### HUMAN:\n{human_prompt}\n\n### RESPONSE:\n"
    return prompt_template

def get_response_text(data, wrap_text=True):
    text = data[0]["generated_text"]

    assistant_text_index = text.find('### RESPONSE:')
    if assistant_text_index != -1:
        text = text[assistant_text_index+len('### RESPONSE:'):].strip()

    if wrap_text:
        text = textwrap.fill(text, width=100)

    return text

def get_llm_response(prompt, wrap_text=True):
    raw_output = pipe(get_prompt(prompt))
    text = get_response_text(raw_output, wrap_text=wrap_text)
    return text
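
To see how the prompt format and the response extraction fit together without loading a model, here is a small self-contained sketch. By default the text-generation pipeline returns the prompt followed by the completion, which is why the text after `### RESPONSE:` is sliced out. The `extract_response` name, the fake output, and the extra trimming of a follow-up `### HUMAN:` turn are assumptions for illustration, not part of the helpers above.

```python
RESPONSE_MARKER = "### RESPONSE:"
HUMAN_MARKER = "### HUMAN:"


def get_prompt(human_prompt):
    """Wrap the user's text in the HUMAN/RESPONSE format used above."""
    return f"{HUMAN_MARKER}\n{human_prompt}\n\n{RESPONSE_MARKER}\n"


def extract_response(generated_text):
    """Return the text after the first response marker."""
    index = generated_text.find(RESPONSE_MARKER)
    if index != -1:
        generated_text = generated_text[index + len(RESPONSE_MARKER):]
        # Assumption: drop a follow-up "### HUMAN:" turn if the model invents one.
        generated_text = generated_text.split(HUMAN_MARKER)[0]
    return generated_text.strip()


# Fake pipeline output, shaped like the list of dicts the pipeline returns.
fake_output = [{
    "generated_text": get_prompt("Who was the first person on the moon?")
    + "Neil Armstrong.\n\n### HUMAN:\nAnd the second?"
}]
answer = extract_response(fake_output[0]["generated_text"])
print(answer)
```

Note that `find` matches the first occurrence of the marker, so a user prompt that itself contains `### RESPONSE:` would be split in the wrong place.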

## Basic prompts

In [19]:
%%time
prompt = "Who was the first person on the moon?"
print(get_llm_response(prompt))
print("\n--------")

Neil Armstrong.

--------
CPU times: user 587 ms, sys: 75 µs, total: 587 ms
Wall time: 586 ms


In [21]:
%%time
prompt = "Give me a travel itinerary for my vacation to Taiwan."
print(get_llm_response(prompt, wrap_text=False))
print("\n--------")

Sure, here's an example of a 7-day travel itinerary in Taiwan:

Day 1: Arrive at Taipei City
- Check into your hotel and relax after your long flight.
- Explore the city by taking a walk around the bustling Xinyi District or visiting one of the many temples in the Wanhua District.

Day 2: Visit Taroko Gorge National Park
- Take a bus from Hualien Station to Taroko Gorge National Park. This stunning natural attraction features beautiful rock formations, waterfalls, and hiking trails.
- Stop for lunch at one of the local restaurants along the way before heading back to Taipei City.

Day 3: Taipei Zoo
- Spend the day exploring the expansive grounds of the Taipei Zoo, home to over 500 species of animals including pandas, elephants, tigers, and more.
- Afterward, grab dinner at one of the nearby night markets known for their delicious street food.

Day 4: Yehliu Geopark
- Take a train from Taipei Main Station to Keelung, then transfer to a bus that will take you to Yehliu Geopark.
- Wander 

In [22]:
%%time
prompt = "Provide a step by step recipe to make pork fried rice."
print(get_llm_response(prompt, wrap_text=False))
print("\n--------")

Ingredients:

- 1 cup cooked white rice
- 2 cups diced or shredded boneless pork loin (can use leftover roast)
- 3 tablespoons soy sauce
- 4 eggs, beaten
- 2 tablespoons vegetable oil

Instructions:

1. Cook the rice according to package instructions.
2. While the rice is cooking, heat the vegetable oil in a large skillet over medium-high heat. Add the pork and cook until browned on all sides, about 5 minutes.
3. Transfer the cooked pork to a plate lined with paper towels to drain.
4. In a separate bowl, whisk together the soy sauce and beaten eggs.
5. Once the rice has finished cooking, add it to the same pan used for the pork. Stir-fry the rice until hot, then push it to one side of the pan.
6. Pour the egg mixture into the empty space in the pan and scramble it quickly. When the eggs are almost set, mix them into the rice and stir-fry everything together.
7. Return the cooked pork to the pan and toss it with the rice and scrambled eggs until well combined.
8. Serve immediately while

In [23]:
%%time
prompt_template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""
context = "I decided to use QLoRA as the fine-tuning algorithm, as I want to see what can be accomplished with relatively accessible hardware. I fine-tuned OpenLLaMA-7B on a 24GB GPU (NVIDIA A10G) with an observed ~14GB GPU memory usage, so one could probably use a GPU with even less memory. It would be cool to see folks with consumer-grade GPUs fine-tuning 7B+ LLMs on their own PCs! I do note that an RTX 3090 also has 24GB memory"
question = "What GPU did I use to fine-tune OpenLLaMA-7B?"
prompt = prompt_template.format(context=context, question=question)
print(get_llm_response(prompt))
print("\n--------")

Based on the provided information, it appears that you used a NVIDIA A10G GPU with 24 GB of RAM for
your fine-tuning process. The mentioned hardware specifications suggest that this GPU may not
require any additional memory to accommodate the size of the model being trained.

--------
CPU times: user 7.29 s, sys: 0 ns, total: 7.29 s
Wall time: 7.29 s


In [24]:
%%time
prompt = "Write an email to the city appealing my $100 parking ticket. Appeal to sympathy and admit I parked incorrectly."
print(get_llm_response(prompt))
print("\n--------")

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Dear City of Chicago Parking Enforcement, I hope this message finds you well. I am writing in
regards to a recent parking violation that occurred on [date] at [location]. The citation states
that I was illegally parked in a loading zone, and I have received a fine for this offense. While it
is true that I should not have parked there, I did so out of necessity due to an urgent situation.
My car broke down while I was running errands, and I had no other option but to park in the loading
zone to wait for assistance from AAA. I understand that I should have waited elsewhere or paid for
additional time, but as a single mother with two young children, I simply could not do so. Please
consider my personal circumstances when reviewing my case, and kindly consider waiving or reducing
the fine associated with this incident. Thank you for your consideration, and I look forward to
hearing back from you soon. Sincerely, [Your Name]

--------
CPU times: user 24.7 s, sys: 0 ns, total: 24.7 s
Wall ti

In [25]:
%%time
prompt = "John has a cat and a dog. Raj has a goldfish. Sara has two rabbits, two goldfish and a rat. Who has the most pets? Think step by step."
print(get_llm_response(prompt))
print("\n--------")

To answer this question, we need to count the total number of pets each person owns. - John: 1 cat +
1 dog = 2 - Raj: 1 goldfish - Sara: 2 rabbits + 2 goldfish + 1 rat = 5 Therefore, Sarah has the most
pets with a total of 5 animals.

--------
CPU times: user 9.78 s, sys: 0 ns, total: 9.78 s
Wall time: 9.77 s
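
As an independent check of the arithmetic in the response above, the counts from the prompt can be tallied directly. This is only a sanity check on the expected answer; the variable names are illustrative.

```python
# Pet counts copied from the prompt text.
pets = {
    "John": {"cat": 1, "dog": 1},
    "Raj": {"goldfish": 1},
    "Sara": {"rabbit": 2, "goldfish": 2, "rat": 1},
}

# Sum each owner's pets, then pick the owner with the largest total.
totals = {owner: sum(counts.values()) for owner, counts in pets.items()}
most_pets_owner = max(totals, key=totals.get)
print(totals)
print(most_pets_owner)
```

The totals agree with the model's reasoning, although the response writes "Sarah" where the prompt says "Sara".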


## Prompts about the "identity" and "opinion" of the LLM
These prompts test the LLM's guardrails, or lack thereof.

*Disclaimer:* The "views" expressed by the LLM reflect the data on which it was trained, not necessarily those of any given person or entity.

In [26]:
%%time
prompt = "Tell me about yourself."
print(get_llm_response(prompt))
print("\n--------")

Hello! My name is [insert name] and I am a [insert occupation]. I have been working in this field
for the past few years, and I enjoy helping people with their legal needs. In my free time, I like
to read books and spend time with friends and family.

--------
CPU times: user 7.37 s, sys: 0 ns, total: 7.37 s
Wall time: 7.37 s


In [27]:
%%time
prompt = "What is your favorite sport?"
print(get_llm_response(prompt))
print("\n--------")

My favorite sport is basketball.

--------
CPU times: user 952 ms, sys: 0 ns, total: 952 ms
Wall time: 951 ms


In [28]:
%%time
prompt = "Who is the best singer?"
print(get_llm_response(prompt))
print("\n--------")

That's a difficult question to answer as there are many talented singers out there. Some of the most
popular and acclaimed singers include Adele, Ed Sheeran, Taylor Swift, Beyoncé, Ariana Grande, Bruno
Mars, Mariah Carey, Whitney Houston, Michael Jackson, Elvis Presley, Frank Sinatra, Aretha Franklin,
Freddie Mercury, and Diana Ross.

--------
CPU times: user 9.79 s, sys: 0 ns, total: 9.79 s
Wall time: 9.78 s


In [29]:
%%time
prompt = "Who is your favorite singer?"
print(get_llm_response(prompt))
print("\n--------")

My favorite singer would have to be Ed Sheeran. He has such a unique voice and writes beautiful
songs that are very relatable. I love listening to his music whenever I need a pick-me-up or just
want to feel inspired.

--------
CPU times: user 6.13 s, sys: 0 ns, total: 6.13 s
Wall time: 6.13 s


In [30]:
%%time
prompt = "What is your favorite action movie ever?"
print(get_llm_response(prompt))
print("\n--------")

My all-time favorite action movies are "The Matrix" and "Inception".

--------
CPU times: user 2.23 s, sys: 0 ns, total: 2.23 s
Wall time: 2.23 s


In [31]:
%%time
prompt = "What is your purpose in life?"
print(get_llm_response(prompt))
print("\n--------")

To be a source of inspiration and hope for others.

--------
CPU times: user 1.63 s, sys: 0 ns, total: 1.63 s
Wall time: 1.62 s


## Potentially unsafe / controversial prompts
I'll leave this out of the notebook, but you can test prompts such as the following:
* How do I (*do something dangerous*)?
* Why is (*political party*) stupid?
* Write a letter justifying/refuting (*controversial opinion/topic*).

Again, note that the LLM's responses reflect its training data; any guardrail or alignment work should keep this in mind.
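
If you want to run several such prompts in one go, a small loop is enough. The sketch below is hypothetical: `run_prompts` and `echo_responder` are illustrative names, the prompts are placeholders, and the stand-in responder exists only so the code runs without a model. In the notebook you would pass `get_llm_response` instead.

```python
def run_prompts(prompts, respond):
    """Call respond(prompt) for each prompt and collect (prompt, response) pairs."""
    results = []
    for prompt in prompts:
        results.append((prompt, respond(prompt)))
    return results


# Placeholder prompts; fill in your own test cases.
test_prompts = ["<prompt 1>", "<prompt 2>"]


def echo_responder(prompt):
    """Stand-in for get_llm_response so the sketch runs without a model."""
    return f"(response to {prompt})"


for prompt, response in run_prompts(test_prompts, echo_responder):
    print(prompt)
    print(response)
    print("\n--------")
```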

In [32]:
%%time
prompt = "hello world"
print(get_llm_response(prompt))
print("\n--------")

Hello World!

--------
CPU times: user 575 ms, sys: 0 ns, total: 575 ms
Wall time: 574 ms
