<a href="https://colab.research.google.com/github/rahiakela/transformers-research-and-practice/blob/main/developing-kaggle-notebooks/10-GenAI/01_prompting_foundation_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:

# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES
# TO THE CORRECT LOCATION (/kaggle/input) IN YOUR NOTEBOOK,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.

import os
import sys
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from urllib.parse import unquote, urlparse
from urllib.error import HTTPError
from zipfile import ZipFile
import tarfile
import shutil

CHUNK_SIZE = 40960
DATA_SOURCE_MAPPING = 'llama-2/pytorch/7b-chat-hf/1:https%3A%2F%2Fstorage.googleapis.com%2Fkaggle-models-data%2F3093%2F4298%2Fbundle%2Farchive.tar.gz%3FX-Goog-Algorithm%3DGOOG4-RSA-SHA256%26X-Goog-Credential%3Dgcp-kaggle-com%2540kaggle-161607.iam.gserviceaccount.com%252F20240130%252Fauto%252Fstorage%252Fgoog4_request%26X-Goog-Date%3D20240130T083846Z%26X-Goog-Expires%3D259200%26X-Goog-SignedHeaders%3Dhost%26X-Goog-Signature%3D6b73f960d8428fdaaa16d27cc349b415784b917861e35ddc90c1dca9082a1ac804f1460d983685d90a60f1aa4489f018fd24ac711367bd9a9d72c6e7c40b496b54d77b2742976e8f9ec49b31f8d8e9aae82de7fc2106086adca7fb9f1f3af2d957f9230aafd442bc8a8b1e4370568cbc241199980dc90826dd7c8f0a31839f86029d71dfc795e1d0ad64c36579020fe6a45e7c84b5ae8150d138590153fc91c0d749b443ff85a61c335da1f9b497894c38b3e1487b2b1dc855bd596a414bc473e143a455200a3ec272ce214a9acdf55ec54b81d72d398774d5739f3d19ae8a9575141431eb2076cc83b98b0e5554c00049a93eda5561e6762a14247471f480da'

KAGGLE_INPUT_PATH='/kaggle/input'
KAGGLE_WORKING_PATH='/kaggle/working'
KAGGLE_SYMLINK='kaggle'

!umount /kaggle/input/ 2> /dev/null
shutil.rmtree('/kaggle/input', ignore_errors=True)
os.makedirs(KAGGLE_INPUT_PATH, 0o777, exist_ok=True)
os.makedirs(KAGGLE_WORKING_PATH, 0o777, exist_ok=True)

try:
  os.symlink(KAGGLE_INPUT_PATH, os.path.join("..", 'input'), target_is_directory=True)
except FileExistsError:
  pass
try:
  os.symlink(KAGGLE_WORKING_PATH, os.path.join("..", 'working'), target_is_directory=True)
except FileExistsError:
  pass

for data_source_mapping in DATA_SOURCE_MAPPING.split(','):
    directory, download_url_encoded = data_source_mapping.split(':')
    download_url = unquote(download_url_encoded)
    filename = urlparse(download_url).path
    destination_path = os.path.join(KAGGLE_INPUT_PATH, directory)
    try:
        with urlopen(download_url) as fileres, NamedTemporaryFile() as tfile:
            total_length = fileres.headers['content-length']
            print(f'Downloading {directory}, {total_length} bytes compressed')
            dl = 0
            data = fileres.read(CHUNK_SIZE)
            while len(data) > 0:
                dl += len(data)
                tfile.write(data)
                done = int(50 * dl / int(total_length))
                sys.stdout.write(f"\r[{'=' * done}{' ' * (50-done)}] {dl} bytes downloaded")
                sys.stdout.flush()
                data = fileres.read(CHUNK_SIZE)
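            tfile.flush()  # make sure all bytes are on disk before reopening by name
            if filename.endswith('.zip'):
              with ZipFile(tfile) as zfile:
                zfile.extractall(destination_path)
            else:
              # use a distinct name so the tarfile module is not shadowed
              with tarfile.open(tfile.name) as tar_archive:
                tar_archive.extractall(destination_path)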
            print(f'\nDownloaded and uncompressed: {directory}')
    except HTTPError as e:
        print(f'Failed to load (likely expired) {download_url} to path {destination_path}')
        continue
    except OSError as e:
        print(f'Failed to load {download_url} to path {destination_path}')
        continue

print('Data source import complete.')


Downloading llama-2/pytorch/7b-chat-hf/1, 20836388871 bytes compressed
Downloaded and uncompressed: llama-2/pytorch/7b-chat-hf/1
Data source import complete.


# Introduction  


We will use the Llama 2 model to test whether it can perform simple math operations.

The model used is:

* **Model**: Llama 2  
* **Variation**: 7b-chat-hf  
* **Version**: V1  
* **Framework**: PyTorch  


# Imports and utils

In [None]:
!pip install xformers
!pip install accelerate

In [1]:
from transformers import AutoTokenizer
import transformers
import torch
import warnings
from time import time

warnings.filterwarnings('ignore')

In [4]:
def load_model_tokenize_create_pipeline():
    """
    Load the tokenizer and create a text-generation pipeline.
    Args:
        None
    Returns:
        tokenizer: the tokenizer
        pipeline: the text-generation pipeline
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    model = "/kaggle/input/llama-2/pytorch/7b-chat-hf/1"
    tokenizer = AutoTokenizer.from_pretrained(model)
    time_2 = time()
    # only the tokenizer is loaded at this point; the model weights are loaded by the pipeline below
    print(f"Init tokenizer: {round(time_2-time_1, 3)}")
    pipeline = transformers.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map="auto",)
    time_3 = time()
    print(f"Prepare pipeline: {round(time_3-time_2, 3)}")
    return tokenizer, pipeline

In [5]:
def test_model(tokenizer, pipeline, prompt_to_test):
    """
    Run a prompt through the pipeline and print the inference time and the result.
    Args:
        tokenizer: the tokenizer
        pipeline: the pipeline
        prompt_to_test: the prompt
    Returns:
        None
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    sequences = pipeline(
        prompt_to_test,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,)
    time_2 = time()
    print(f"Test inference: {round(time_2-time_1, 3)}")
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

# Initialize the model, tokenizer and create a pipeline

In [None]:
tokenizer, pipeline = load_model_tokenize_create_pipeline()

After initializing the tokenizer, we created the pipeline. Creating the pipeline is the slowest step, since this is where the model weights are loaded.

Now let's test the model with a few mathematical prompts.

# Can Llama 2 do simple math?

## Prompt #1: Perform a simple sum

In [7]:
prompt_to_test = 'Prompt: Adrian has three apples. His sister Anne has ten apples more than him. How many apples has Anne?'
test_model(tokenizer, pipeline, prompt_to_test)

Test inference: 11.485
Result: Prompt: Adrian has three apples. His sister Anne has ten apples more than him. How many apples has Anne?

Solution: Let's use algebra to solve this problem. Let's say Adrian has x apples. Since Anne has ten apples more than him, Anne has x + 10 apples.

Now, we can use the information given in the problem to find the value of x. We know that Adrian has three apples, so x = 3.

So, Anne has 10 + 3 = 13 apples.

Therefore, Anne has 13 apples.


## Prompt #2: Ask for the area of a circle, giving the radius

In [8]:
prompt_to_test = 'Prompt: A circle has the radius 5. What is the area of the circle?'
test_model(tokenizer, pipeline, prompt_to_test)

Test inference: 7.263
Result: Prompt: A circle has the radius 5. What is the area of the circle?

Answer: The formula for the area of a circle is:

A = πr^2

where A is the area of the circle and r is the radius of the circle.

So, in this case, the radius of the circle is 5, so the area of the circle is:

A = π(5)^2
= 3.14(25)
= 78.5

Therefore, the area of the circle is 78.5 square units.
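
As a sanity check on the model's arithmetic, the same area can be computed directly. This is a minimal sketch; the helper name `circle_area` is hypothetical and not part of the original notebook. The model rounded π to 3.14, so its 78.5 is slightly below the more precise value of about 78.54.

```python
import math


def circle_area(radius: float) -> float:
    """Return the area of a circle with the given radius (A = pi * r^2)."""
    return math.pi * radius ** 2


# Radius taken from Prompt #2.
area = circle_area(5)
print(f"Area with math.pi: {area:.2f}")
print(f"Area with pi rounded to 3.14: {3.14 * 5 ** 2:.2f}")
```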


## Prompt #3: Calculate an equation with two unknowns

In [9]:
prompt_to_test = 'Prompt: Anne and Adrian have a total of 10 apples. Anne has 2 apples more than Adrian.\
How many apples has each of the children Anne and Adrian?'
test_model(tokenizer, pipeline, prompt_to_test)

Test inference: 9.631
Result: Prompt: Anne and Adrian have a total of 10 apples. Anne has 2 apples more than Adrian.How many apples has each of the children Anne and Adrian? Solution: We are given that Anne has 2 apples more than Adrian, so Adrian has x apples. We are also told that Anne has a total of 10 apples, so Anne has 10 - 2 = 8 apples. So, Adrian has x = 8 apples and Anne has 8 apples. Explanation: Let's use the information given in the problem to find out how many apples each of the children, Anne and Adrian, has. We know that Anne has 2 more apples than Adrian, so Adrian has x apples. We are also told that Anne has a total of 10 apples, so Anne has 10 - 2 = 8
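
For reference, the problem in Prompt #3 is a system of two linear equations: Anne + Adrian = 10 and Anne - Adrian = 2. The sketch below solves it in closed form; `solve_sum_and_difference` is a hypothetical helper name, not something from the original notebook.

```python
from fractions import Fraction


def solve_sum_and_difference(total, difference):
    """Solve larger + smaller = total and larger - smaller = difference.

    Adding the two equations gives 2 * larger = total + difference;
    subtracting them gives 2 * smaller = total - difference.
    """
    larger = Fraction(total + difference, 2)
    smaller = Fraction(total - difference, 2)
    return larger, smaller


# Values taken from Prompt #3: 10 apples in total, Anne has 2 more than Adrian.
anne, adrian = solve_sum_and_difference(10, 2)
print(f"Anne: {anne}, Adrian: {adrian}")
```

Under this reading Anne has 6 apples and Adrian has 4, which differs from the model's answer of 8 apples each.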


## Prompt #4: A linear equation with no solution

In [10]:
prompt_to_test = 'Prompt: A thief run away 1 km per hour and after that a police follow him 1 km per hour.\
When police will catch the thief?'
test_model(tokenizer, pipeline, prompt_to_test)

Test inference: 5.633
Result: Prompt: A thief run away 1 km per hour and after that a police follow him 1 km per hour.When police will catch the thief?
Answer:
The police will catch the thief in 2 hours.
Explanation:
The thief is running away at a rate of 1 km per hour, which means that the police will cover the same distance in 1 hour.
So, after 1 hour, the police will be 1 km behind the thief.
Therefore, it will take the police 2 hours to catch the thief.


In [11]:
prompt_to_test = 'Prompt: A thief run away with speed of 1 km per hour and after 1 hour, a police follow him with speed uof 1 km per hour.\
When police will catch the thief?'
test_model(tokenizer, pipeline, prompt_to_test)

Test inference: 9.9
Result: Prompt: A thief run away with speed of 1 km per hour and after 1 hour, a police follow him with speed uof 1 km per hour.When police will catch the thief?
A thief is running away with a speed of 1 km per hour, and after 1 hour, a police officer starts chasing him with the same speed. When will the police officer catch the thief?
Let's first calculate the distance traveled by the thief in 1 hour:
Distance = Speed x Time
Distance = 1 km/h x 1 h = 1 km
Now, let's calculate the distance traveled by the police officer in the same 1 hour:
Distance = Speed x Time
Distance = 1 km/h x 1 h = 1 km
Since both the thief and the police officer have traveled the same distance, the thief is now behind
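
The expected answer to the pursuit problem can be checked with relative speed: the police close the gap only if they are faster than the thief. The following is a minimal sketch under that assumption; `catch_up_time` is a hypothetical helper, and the second call uses an illustrative speed that does not appear in the prompts.

```python
from typing import Optional


def catch_up_time(thief_speed: float, police_speed: float,
                  head_start_hours: float) -> Optional[float]:
    """Hours after the police start until they reach the thief, or None if never."""
    gap_km = thief_speed * head_start_hours      # distance covered during the head start
    closing_speed = police_speed - thief_speed   # how fast the gap shrinks
    if gap_km == 0:
        return 0.0   # no head start: both are at the same place already
    if closing_speed <= 0:
        return None  # the gap never shrinks, so the equation has no solution
    return gap_km / closing_speed


# Values from the second version of the prompt: both move at 1 km/h, 1 hour head start.
print(catch_up_time(thief_speed=1, police_speed=1, head_start_hours=1))

# Illustrative contrast (not from the notebook): a police speed of 2 km/h.
print(catch_up_time(thief_speed=1, police_speed=2, head_start_hours=1))
```

With equal speeds the function returns `None`: the police never catch the thief, which is the answer the model fails to reach.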


# Conclusions


Initializing the tokenizer and the pipeline took under 200 seconds on the first run on a GPU T4 x2 (this time may vary). After that, each prompt took roughly 6 to 12 seconds.

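The simple math questions are answered sometimes correctly and sometimes incorrectly. The sum (Prompt #1) and the circle area (Prompt #2, using π ≈ 3.14) are correct, while the two-unknowns problem (Prompt #3) and the pursuit problem (Prompt #4) are answered incorrectly, and some outputs are cut off by the `max_length=200` limit.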
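
Judging answers by eye does not scale to many prompts. One crude way to automate the check is to take the last number in the generated text and compare it with a known answer. This is a heuristic sketch only: `last_number` and `is_correct` are hypothetical helpers, the tolerance is an arbitrary assumption, and the approach fails on answers that end with an unrelated number or use expressions like `10-2` without spaces.

```python
import re
from typing import Optional

# Integers or decimals, with an optional leading minus sign.
NUMBER_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number that appears in the text, or None if there is none."""
    matches = NUMBER_PATTERN.findall(text)
    return float(matches[-1]) if matches else None


def is_correct(generated_text: str, expected: float, tolerance: float = 0.1) -> bool:
    """Check whether the last number in the text is within tolerance of the expected value."""
    value = last_number(generated_text)
    return value is not None and abs(value - expected) <= tolerance


# Final lines copied from the outputs above.
print(is_correct("Therefore, Anne has 13 apples.", expected=13))
print(is_correct("Therefore, the area of the circle is 78.5 square units.", expected=78.54))
print(is_correct("so Anne has 10 - 2 = 8", expected=6))
```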

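This variability may be related to sampling: the pipeline is called with `do_sample=True` and `top_k=10`, and the temperature is not set explicitly, so the default from the model's generation configuration is used.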
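
To illustrate why temperature matters, the sketch below applies a temperature-scaled softmax to a few made-up logits. It is a standalone illustration of the general technique, not the model's actual sampling code; the function name and the logit values are hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - max_scaled) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits for three candidate tokens (illustrative values only).
example_logits = [2.0, 1.0, 0.5]
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

A low temperature concentrates probability on the top token, so sampled answers become more repeatable, while a high temperature flattens the distribution. In the `pipeline(...)` call used in this notebook, `temperature` can be passed alongside `do_sample=True`, and `do_sample=False` switches to greedy decoding; neither variant was tried here.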