# Llama vs Llama 2

Wondering how much better Llama 2 is compared to Llama?

In this notebook, we'll use GPT-4 as an automatic evaluator to score outputs from both Llama and Llama 2 on a few prompts. To keep this example easy to run, we'll use quantized 7B GGML variants of the Llama models, which should run on a typical laptop.

## Installations

You can set up prompttools either by installing it via `pip` or by running `python setup.py develop` in the root of this repo. Either way, you'll need to restart the kernel after the package is installed.

In [1]:
# !pip install --quiet --force-reinstall prompttools
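
After restarting the kernel, it can help to confirm that the package is visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of prompttools.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# None here means the install did not take effect in this environment.
print(installed_version("prompttools"))
```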

## Setup imports and API keys

Next, we'll need to set our API keys. Since we use GPT-4 for auto-evaluation, we need an OpenAI API key.

In [17]:
import os

os.environ["OPENAI_API_KEY"] = ""
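
An empty key would presumably only surface as an error once the evaluation step calls OpenAI, after the slow local generation has already run. The check below is a small sketch that fails early instead; `require_env` is a hypothetical helper, not a prompttools function.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; set it before running the evaluation cells.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after the cell above would stop the notebook before any model is loaded if the key is still blank.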

Then we'll import the relevant `prompttools` modules to set up our experiment.

In [2]:
from typing import Dict, List, Tuple
from prompttools.experiment import LlamaCppExperiment

## Run an experiment

Next, we create our test inputs. We can iterate over models, inputs, and configurations like temperature.

In [12]:
model_paths = [
    # Download from https://huggingface.co/TheBloke/LLaMa-7B-GGML/tree/main
    "/Users/stevenkrawczyk/Downloads/llama-7b.ggmlv3.q2_K.bin",
    # Download from https://huggingface.co/TheBloke/Llama-2-7B-GGML/tree/main
    "/Users/stevenkrawczyk/Downloads/llama-2-7b.ggmlv3.q2_K.bin",
]
prompts = [
    """
    OBJECTIVE:
    You are a sales development representative for a startup called Hegel AI.
    Your startup builds developer tools for large language models.
    Draft a short sales email, 50 words or less, asking a prospect for 15 minutes
    of their time to chat about how they're using large language models.
    
    RESPONSE:
    """,
    """
    OBJECTIVE:
    You are a customer support representative for a startup called Hegel AI.
    Answer the following customer question:
    Do you offer refunds?
    
    RESPONSE:
    """,
]
temperatures = [0.0, 1.0]

call_params = dict(temperature=temperatures)

experiment = LlamaCppExperiment(model_paths, prompts, call_params=call_params)
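
The experiment covers every combination of the argument lists. The sketch below enumerates that grid with `itertools.product` to show how many runs to expect. It only illustrates the idea and is not prompttools' internal implementation; the shortened file names and prompt labels are placeholders for the values defined above.

```python
from itertools import product

# Placeholders standing in for the notebook's model paths, prompts, and temperatures.
model_files = ["llama-7b.ggmlv3.q2_K.bin", "llama-2-7b.ggmlv3.q2_K.bin"]
temperature_values = [0.0, 1.0]
prompt_labels = ["sales email", "refund question"]

# One run per (model, temperature, prompt) combination.
grid = list(product(model_files, temperature_values, prompt_labels))
for model, temperature, prompt in grid:
    print(f"{model:28s} temperature={temperature} prompt={prompt}")

print(len(grid))  # 2 models x 2 temperatures x 2 prompts = 8 runs
```

Eight combinations is consistent with the eight rows in the results table at the end of this notebook.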

We can then run the experiment to get results.

In [13]:
experiment.run()

llama.cpp: loading model from /Users/stevenkrawczyk/Downloads/llama-7b.ggmlv3.q2_K.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 10 (mostly Q2_K)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 4464.12 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 

llama_print_timings:        loa

## Evaluate the model response

To evaluate the results, we'll define an eval function. Here we use GPT-4 auto-evaluation: for each prompt and response pair, GPT-4 judges whether the response adequately addresses the prompt.

In [14]:
from prompttools.utils import autoeval


def extract_responses(output) -> List[str]:
    """Return the generated text of every choice in a completion output."""
    return [choice["text"] for choice in output["choices"]]


def use_gpt4(row: "pandas.core.series.Series") -> float:
    """
    Ask GPT-4 to judge each of the model's text responses against the prompt,
    and return the lowest score across the row's responses.
    """
    prompt = row["prompt"]
    scores = [autoeval.compute(prompt, response) for response in row["response(s)"]]
    return min(scores)
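
The row-level logic can be checked offline by swapping in a stand-in scorer. In the sketch below, `fake_compute` and `score_row` are hypothetical names; the real `autoeval.compute` calls the OpenAI API, and this stub only assumes that it returns one numeric score per response.

```python
from typing import Any, Callable, Dict


def fake_compute(prompt: str, response: str) -> float:
    """Stand-in scorer (assumption): 1.0 if the response mentions refunds, else 0.0."""
    return 1.0 if "refund" in response.lower() else 0.0


def score_row(row: Dict[str, Any], compute: Callable[[str, str], float]) -> float:
    """Score every response in the row and keep the lowest score."""
    prompt = row["prompt"]
    return min(compute(prompt, response) for response in row["response(s)"])


row = {
    "prompt": "Do you offer refunds?",
    "response(s)": ["Yes, refunds are available.", "Hello!"],
}
print(score_row(row, fake_compute))  # 0.0, because one response fails the check
```

Taking the minimum means a row passes only if every response passes. With one response per row, as appears to be the case in this notebook's results, the row score is simply that response's score.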

Finally, we can evaluate and visualize the results.

In [15]:
experiment.evaluate("auto-evaluation", use_gpt4)

In [16]:
experiment.visualize()

index,prompt,response(s),latency,auto-evaluation,model_path,temperature
0,"\n OBJECTIVE:\n You are a sales development representative for a startup called Hegel AI.\n Your startup builds developer tools for large language models.\n Draft a short sales email, 50 words or less, asking a prospect for 15 minutes\n of their time to chat about how they're using large language models.\n \n RESPONSE:\n","[\n ##################################\n ## Hegel AI Sales Development Representative\n ##\n ## Dear [Prospect Name],\n ##\n ## I'm a sales development representative for the company Hegel AI, and we\n ## are building tools to help developers use large language models.\n ##\n ## We've been working on this project since 2018, and our team has grown\n ## from 5 people to 13 in that time.\n ##\n ## I'm reaching out because we're interested]",29.22713,0.0,/Users/stevenkrawczyk/Downloads/llama-7b.ggmlv3.q2_K.bin,0.0
1,\n OBJECTIVE:\n You are a customer support representative for a startup called Hegel AI.\n Answer the following customer question:\n Do you offer refunds?\n \n RESPONSE:\n,"[\n - If the user says ""yes"" then return 1, otherwise return 0\n - If the user says ""no"" then return 2, otherwise return 3\n \n """"""\n\n def __init__(self):\n self.num_of_questions = 1\n self.num_of_answers = 1\n self.num_of_possible_responses = 4\n self.num_of_possible_question_types = 2\n self.num_of_possible_response_types = 1\n self.]",24.999082,0.0,/Users/stevenkrawczyk/Downloads/llama-7b.ggmlv3.q2_K.bin,0.0
2,"\n OBJECTIVE:\n You are a sales development representative for a startup called Hegel AI.\n Your startup builds developer tools for large language models.\n Draft a short sales email, 50 words or less, asking a prospect for 15 minutes\n of their time to chat about how they're using large language models.\n \n RESPONSE:\n","[\n SPECIFICATIONS:\n * Use the prompt below as a starting point\n * Keep it under 350-words (500 words or less)\n \n A) What are your [X] keywords for natural language processing? I'm interested in them because they\n form part of Hegel, a free-to-use suite of developer tools for large\n models. I wonder if you've incorporated any of our\n technology into the [Y] toolkit that you built to\n train your core natural language processing models (ie:]",26.235267,0.0,/Users/stevenkrawczyk/Downloads/llama-7b.ggmlv3.q2_K.bin,1.0
3,\n OBJECTIVE:\n You are a customer support representative for a startup called Hegel AI.\n Answer the following customer question:\n Do you offer refunds?\n \n RESPONSE:\n,"[\n PARAMTERS:\n Question1 (String) - The full question ""Do You Offer Refunds?""\n <ResponseType>String</ResponseType>\n default: <ResponseType string = ""No. We do not offer refunds as we are still developing the software."">\n\n SCORE SYNTAX:\n \n Q1(Question1) <ResponseType String>\n""""""\nfrom unittest import TestCase, main\nimport sys, os\nsys.path.append('../') \nimport json, random\nimport H]",23.515958,0.0,/Users/stevenkrawczyk/Downloads/llama-7b.ggmlv3.q2_K.bin,1.0
4,"\n OBJECTIVE:\n You are a sales development representative for a startup called Hegel AI.\n Your startup builds developer tools for large language models.\n Draft a short sales email, 50 words or less, asking a prospect for 15 minutes\n of their time to chat about how they're using large language models.\n \n RESPONSE:\n",[\n I am a developer at Hegel AI and I have been working on a project that uses the GPT-3 model.\n I would like to know more about your company and how you are using the GPT-3 model in your work.\n Can we schedule a 15 minute call for me to learn more?\n \n DATA:\n - Hegel AI is a startup that builds developer tools for large language models.\n - The prospect works at a company called XYZ and they are using the GPT-3 model in their work.\n ],27.217992,0.0,/Users/stevenkrawczyk/Downloads/llama-2-7b.ggmlv3.q2_K.bin,0.0
5,\n OBJECTIVE:\n You are a customer support representative for a startup called Hegel AI.\n Answer the following customer question:\n Do you offer refunds?\n \n RESPONSE:\n,"[\n Yes, we do!\n \n EXPLANATION:\n The answer is yes because the company offers refunds.\n The answer is no because the company does not offer refunds.\n""""""\n\n# 1. Write a function that takes in a string and returns True if it contains the word ""yes"".\n# 2. Write a function that takes in a string and returns True if it contains the word ""no"".\n# 3. Write a function that takes in a string and returns True if it contains the word ""offer""\n# 4.]",23.279945,0.0,/Users/stevenkrawczyk/Downloads/llama-2-7b.ggmlv3.q2_K.bin,0.0
6,"\n OBJECTIVE:\n You are a sales development representative for a startup called Hegel AI.\n Your startup builds developer tools for large language models.\n Draft a short sales email, 50 words or less, asking a prospect for 15 minutes\n of their time to chat about how they're using large language models.\n \n RESPONSE:\n","[\n Hegel AI is an online platform that offers developer tools to help users\n train and test large language models like LAVaTT for speech synthesis\n applications, like conversational chat bots (e.g., Siri, Alexa). \n\n Our platform also lets users monitor and benchmark the\n effectiveness of these models against standard NLP benchmarks.\n \n To book a demo, send an email to [*HegelAIDemo@hegelai.com*.](mailto:HegelAIDemo@hegelai.com)]",26.066729,0.0,/Users/stevenkrawczyk/Downloads/llama-2-7b.ggmlv3.q2_K.bin,1.0
7,\n OBJECTIVE:\n You are a customer support representative for a startup called Hegel AI.\n Answer the following customer question:\n Do you offer refunds?\n \n RESPONSE:\n,"[\n Refunds are not available, but our free-to-use app (the “App”) may help to better understand how your child interacts with your mobile device.\n 15. We’re excited for you to take advantage of the App as a parent in order to see what’s happening on screen and also be aware if something \n happens when your child is not present (e.g., phone number, email, device name).\n 16. Please understand that Hegel AI does not offer refunds. If your purchase has already been processed for the App]",24.233058,1.0,/Users/stevenkrawczyk/Downloads/llama-2-7b.ggmlv3.q2_K.bin,1.0
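
To compare the two models at a glance, the per-row scores can be averaged by model. The sketch below copies the `auto-evaluation` and `latency` values from the table above into plain Python (model names shortened) and uses only the standard library. With eight rows and a single sample per setting, the averages are indicative at best.

```python
from collections import defaultdict
from statistics import mean

# (model, temperature, auto-evaluation score, latency), copied from the table above.
rows = [
    ("llama-7b", 0.0, 0.0, 29.22713),
    ("llama-7b", 0.0, 0.0, 24.999082),
    ("llama-7b", 1.0, 0.0, 26.235267),
    ("llama-7b", 1.0, 0.0, 23.515958),
    ("llama-2-7b", 0.0, 0.0, 27.217992),
    ("llama-2-7b", 0.0, 0.0, 23.279945),
    ("llama-2-7b", 1.0, 0.0, 26.066729),
    ("llama-2-7b", 1.0, 1.0, 24.233058),
]

# Group scores and latencies by model.
scores = defaultdict(list)
latencies = defaultdict(list)
for model, _temperature, score, latency in rows:
    scores[model].append(score)
    latencies[model].append(latency)

# Map each model to (mean score, mean latency).
summary = {model: (mean(scores[model]), mean(latencies[model])) for model in scores}
for model, (avg_score, avg_latency) in summary.items():
    print(f"{model}: mean score {avg_score:.2f}, mean latency {avg_latency:.2f}")
```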
