<a href="https://www.kaggle.com/code/gpreda/test-llama-v2-with-math?scriptVersionId=144057186" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction  


We will test whether the Llama 2 model can perform simple math operations.

The model used is:

* **Model**: Llama 2  
* **Variation**: 7b-chat-hf  
* **Version**: V1  
* **Framework**: PyTorch  


# Imports and utils

In [None]:
from transformers import AutoTokenizer
import transformers
import torch
import warnings
from time import time
warnings.filterwarnings('ignore')
!pip install xformers

In [None]:
def load_model_tokenize_create_pipeline():
    """
    Load the tokenizer
    Create a text-generation pipeline for the model
    Args:
        None
    Returns:
        tokenizer
        pipeline
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    model = "/kaggle/input/llama-2/pytorch/7b-chat-hf/1"
    tokenizer = AutoTokenizer.from_pretrained(model)
    time_2 = time()
    print(f"Init tokenizer: {round(time_2-time_1, 3)}")
    pipeline = transformers.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map="auto",)
    time_3 = time()
    print(f"Prepare pipeline: {round(time_3-time_2, 3)}")
    return tokenizer, pipeline

In [None]:
def test_model(tokenizer, pipeline, prompt_to_test):
    """
    Perform a query
    print the result
    Args:
        tokenizer: the tokenizer
        pipeline: the pipeline
        prompt_to_test: the prompt
    Returns:
        None
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    sequences = pipeline(
        prompt_to_test,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,)
    time_2 = time()
    print(f"Test inference: {round(time_2-time_1, 3)}")
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

# Initialize the model, tokenizer and create a pipeline

In [None]:
tokenizer, pipeline = load_model_tokenize_create_pipeline()

After we initialized the model and the tokenizer, we created a pipeline. Creation of the pipeline takes the longest time.
Now let's test the model with a few mathematical prompts.

# Can Llama 2 do simple math?

## Prompt #1: Perform a simple sum

In [None]:
prompt_to_test = 'Prompt: Adrian has three apples. His sister Anne has ten apples more than him. How many apples has Anne?'
test_model(tokenizer, pipeline, prompt_to_test)

## Prompt #2: Ask for the area of a circle, giving the radius

In [None]:
prompt_to_test = 'Prompt: A circle has the radius 5. What is the area of the circle?'
test_model(tokenizer, pipeline, prompt_to_test)
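
To judge the model's reply for this prompt, it helps to have the reference value at hand. The short sketch below computes it with the standard formula, area = pi * r^2; the variable names are ours and are not part of the original notebook.

```python
import math

# Reference answer for Prompt #2: area of a circle with radius 5
radius = 5
area = math.pi * radius ** 2  # area = pi * r^2

print(f"Expected area: {area:.2f}")  # 78.54
```

A reply of 25π, or a value close to 78.54, can be counted as correct.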

## Prompt #3: Calculate an equation with two unknowns

In [None]:
prompt_to_test = ('Prompt: Anne and Adrian have a total of 10 apples. Anne has 2 apples more than Adrian. '
                  'How many apples has each of the children Anne and Adrian?')
test_model(tokenizer, pipeline, prompt_to_test)
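
For reference, the prompt describes the system anne + adrian = 10 and anne - adrian = 2. The sketch below solves it by elimination, so we know what a correct reply should contain; the variable names are ours and are not part of the original notebook.

```python
# Reference answer for Prompt #3:
#   anne + adrian = 10
#   anne - adrian = 2
total = 10
difference = 2

# Subtracting the second equation from the first gives 2 * adrian = total - difference
adrian = (total - difference) // 2
anne = adrian + difference

print(f"Anne: {anne}, Adrian: {adrian}")  # Anne: 6, Adrian: 4
```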

# Conclusions


Initializing the model and the tokenizer took under 200 seconds on the first run on a GPU T4 x2 (this time may vary). After that, each prompt took less than 10 seconds.

The simple math questions are answered sometimes correctly and sometimes incorrectly.
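
One way to put a number on "sometimes" is to repeat a prompt several times and count how often the expected value shows up in the reply. The sketch below is a rough illustration, not part of the original notebook: `extract_numbers` and `success_rate` are hypothetical helpers, the check is crude (any occurrence of the expected number counts as a hit), and `fake_generate` is a stub with canned replies that stands in for the real pipeline so the example runs without a GPU.

```python
import re

def extract_numbers(text):
    """Return all non-negative integers and decimals found in text, as floats."""
    return [float(m) for m in re.findall(r"\d+(?:\.\d+)?", text)]

def success_rate(generate, prompt, expected, n_runs=5, tol=1e-2):
    """Fraction of runs whose reply contains a number within tol of expected."""
    hits = 0
    for _ in range(n_runs):
        text = generate(prompt)
        # The pipeline echoes the prompt by default, so drop it before checking
        if text.startswith(prompt):
            text = text[len(prompt):]
        if any(abs(x - expected) <= tol for x in extract_numbers(text)):
            hits += 1
    return hits / n_runs

# Stub generator with canned replies (assumption: replaces the real model here)
canned = iter([
    "Anne has 13 apples.",
    "Anne has 30 apples.",
    "3 + 10 = 13",
    "Anne has 7 apples.",
])

def fake_generate(prompt):
    return prompt + " " + next(canned)

rate = success_rate(fake_generate, "How many apples has Anne?", 13, n_runs=4)
print(rate)  # 0.5
```

In the notebook, the stub could be replaced by a function that calls the pipeline and returns the `generated_text` field of the first sequence.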

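This might be related to the sampling temperature, which is not set explicitly here, so the model's default is used. Sampling is enabled with do_sample=True and top_k=10, so the output can differ from one run to the next.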
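
To check this hypothesis, the same prompts could be rerun with greedy decoding (no sampling) or with an explicit temperature. The sketch below only builds the keyword arguments for the pipeline call; `generation_kwargs` is a hypothetical helper, the default values are illustrative assumptions, and the `eos_token_id=2` used in the demo is a placeholder for `tokenizer.eos_token_id`.

```python
def generation_kwargs(eos_token_id, deterministic=True, temperature=0.7,
                      top_k=10, max_length=200):
    """Build keyword arguments for a text-generation pipeline call."""
    kwargs = {
        "num_return_sequences": 1,
        "eos_token_id": eos_token_id,
        "max_length": max_length,
    }
    if deterministic:
        # Greedy decoding: always pick the most likely next token
        kwargs["do_sample"] = False
    else:
        # Sampling with an explicit temperature (lower means less random)
        kwargs.update(do_sample=True, top_k=top_k, temperature=temperature)
    return kwargs

# Placeholder token id for the demo; in the notebook use tokenizer.eos_token_id
greedy = generation_kwargs(eos_token_id=2)
sampled = generation_kwargs(eos_token_id=2, deterministic=False, temperature=0.2)
print(greedy)
print(sampled)

# In the notebook (not executed here):
# sequences = pipeline(prompt_to_test, **generation_kwargs(tokenizer.eos_token_id))
```

If the answers are still wrong under greedy decoding, the errors come from the model itself rather than from sampling randomness.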