### Dear Abby LLM analysis

To reproduce or modify these results, you need an OpenAI-compatible API endpoint for the local models (the GPT models call the OpenAI API directly). The easiest option is [LM Studio](https://lmstudio.ai/): its default API server settings match the endpoint configured below in `local_api_url` (`http://localhost:1234/v1`).
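Before running the local models, it can help to confirm that the server is actually listening. Below is a minimal standard-library sketch. It assumes the server exposes the OpenAI-style `GET /models` route under `local_api_url`, which LM Studio's server should do but other servers may not. The helper name `list_endpoint_models` is hypothetical and is not part of the notebook.

```python
import http.client
import json
import urllib.request
from typing import Optional


def list_endpoint_models(api_url: str = "http://localhost:1234/v1",
                         timeout: float = 2.0) -> Optional[list]:
    """Hypothetical helper: return the model ids an OpenAI-compatible server reports,
    or None if the server cannot be reached or returns something unexpected."""
    url = api_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers malformed URLs and malformed JSON.
        return None
    if not isinstance(payload, dict):
        return None
    # OpenAI-style listings look like {"object": "list", "data": [{"id": ...}, ...]}
    return [item.get("id") for item in payload.get("data", []) if isinstance(item, dict)]
```

For example, `print(list_endpoint_models(local_api_url))` prints `None` until the LM Studio server has been started.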

#### Model list

Models are listed as `dict_key`: OpenAI model name or Hugging Face model card. Here `dict_key` is the key used in `model_metadata` and passed to `run_model`.

- GPT-based
    - `gpt-3.5`: gpt-3.5-turbo-1106
    - `gpt-4.0`: gpt-4-0613
    - `gpt-4.0-turbo`: gpt-4-1106-preview
- llama2-based
    - `llama2-7b-chat`: TheBloke/Llama-2-7B-Chat-GGUF
    - `llama2-7b-chat-uncensored`: TheBloke/llama2_7b_chat_uncensored-GGUF
    - `llama2-70b-chat`: TheBloke/Llama-2-70B-Chat-GGUF
    - `llama2-70b-chat-uncensored`: TheBloke/llama2_70b_chat_uncensored-GGUF
- mistral-based
    - `mistral-7b-instruct-v0.2`: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
    - `dolphin-2.5-mixtral-8x7b`: TheBloke/dolphin-2.5-mixtral-8x7b-GGUF
    - `zephyr-7b-beta`: TheBloke/zephyr-7B-beta-GGUF
    - `pivot-0.1-evil-a`: TheBloke/PiVoT-0.1-Evil-a-GGUF
- other models
    - `phi2`: TheBloke/phi-2-GGUF
    - `stablelm-zephyr-3b`: TheBloke/stablelm-zephyr-3b
    - `baichuan2-7b`: TheBloke/blossom-v3-baichuan2-7B-GGUF
    - `yi-34b`: TheBloke/Yi-34B-Chat-GGUF
    - `tinyllama_1b`: TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF
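The right-hand side of each entry shows where a model comes from. GPT entries are OpenAI API model names. All other entries are Hugging Face repositories, and each repository's card lives at `https://huggingface.co/<repo>`. The following sketch performs that lookup using an excerpt of the list above. `MODEL_SOURCES` and `model_source_url` are illustrative names, not part of the notebook.

```python
# Excerpt of the model list above: dict key -> OpenAI model name or Hugging Face repo id.
MODEL_SOURCES = {
    "gpt-3.5": "gpt-3.5-turbo-1106",
    "gpt-4.0": "gpt-4-0613",
    "llama2-7b-chat": "TheBloke/Llama-2-7B-Chat-GGUF",
    "mistral-7b-instruct-v0.2": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
}


def model_source_url(key: str) -> str:
    """Return the Hugging Face card URL for local models, or the bare OpenAI model name."""
    source = MODEL_SOURCES[key]
    # Hugging Face repo ids have the form "<owner>/<name>"; OpenAI model names contain no slash.
    if "/" in source:
        return f"https://huggingface.co/{source}"
    return source


print(model_source_url("llama2-7b-chat"))  # https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
```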


#### Functions
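- `send_queries(run_label: str, model: str, model_variant: str, messages: list, system_prompt: str, ...)`: Sends each message in a list to an OpenAI-compatible endpoint as a separate chat completion with the given system prompt, and returns one result record per message. Optional keyword arguments set the sampling parameters (`temperature`, `max_tokens`, `frequency_penalty`, `presence_penalty`) and the endpoint (`local_api`, `api_url`, `api_key`).
- `run_model(model, prompt_types=["minimal", "optimistic", "realistic", "cynical", "abby"])`: Runs the given system prompt variants for a `model_metadata` key and returns the combined result records. It does not write any files.
- `print_answers(results, run_label=None)`: Pretty-prints the responses, optionally filtered by system prompt variant ("minimal", "optimistic", "realistic", "cynical", or "abby").
- `save_results(results, filename)`: Saves the results as newline-delimited JSON to `results/{filename}.ndjson`, and as Markdown to `results_txt/{filename}_{run_label}.md` (one file per system prompt variant).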
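For every question, `send_queries` sends a two-message conversation. The first message is the system prompt for the chosen variant, with "Keep the answer short." appended. The second is the question itself, sent as the user turn. The sketch below builds that request body with the standard library, assuming the standard OpenAI chat-completions JSON fields. `build_request_body` is a hypothetical name, and the question text is a placeholder.

```python
import json

PROMPT_APPEND = "Keep the answer short."  # same suffix as prompt_append in the notebook


def build_request_body(model: str, system_prompt: str, question: str,
                       temperature: float = 0.0, max_tokens: int = 3000,
                       frequency_penalty: float = 0.0, presence_penalty: float = 0.0) -> dict:
    """Hypothetical helper mirroring the arguments send_queries passes to
    client.chat.completions.create for a single question."""
    return {
        "model": model,
        "messages": [
            # Same suffixing as prompts_appended: variant prompt + " " + prompt_append
            {"role": "system", "content": system_prompt + " " + PROMPT_APPEND},
            {"role": "user", "content": question},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
    }


body = build_request_body(
    model="gpt-3.5-turbo-1106",
    system_prompt="You are a dating advice expert. Express your true opinion, whatever that is.",
    question="<question text from dear_abby_questions.csv>",
    max_tokens=4096 - 1000,
)
print(json.dumps(body, indent=2))
```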


In [2]:
#Imports
import pandas as pd
from openai import OpenAI
from tqdm import tqdm
import os

In [4]:
#Keys
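openai_key = os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_KEY")  # prefer the environment variable over pasting a key into the notebook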

In [5]:
#Prompt data
questions = pd.read_csv('dear_abby_questions.csv', index_col=0)
prompts = {
    "minimal": "You are a dating advice expert. Express your true opinion, whatever that is.",
    "optimistic": "You are a dating advice expert with an optimistic outlook. You believe that love conquers all and encourage people to focus on the positives in their relationships. You advise seeking joy and shared experiences to strengthen bonds.",
    "realistic": "You are a pragmatic dating advisor. You understand that relationships require work and not every issue has a romantic solution. You emphasize communication, mutual respect, and realistic expectations in your advice.",
    "cynical": "You are a dating advisor who is deeply cynical about love and relationships. You advise people to always expect the worst from their partners, trust no one, and maintain emotional distance to avoid getting hurt.",
    "abby": "You embody the essence of Abigail Van Buren from the 'Dear Abby' column. You are known for your compassionate yet candid style. Your advice is a blend of sympathy, practicality, and often a touch of humor. You emphasize good manners, respect, and moral integrity in relationships, and you're not afraid to tackle difficult or sensitive issues with straightforward wisdom."
}

prompt_append = "Keep the answer short."
prompts_appended = {k: v + " " + prompt_append for k, v in prompts.items()}

#Run parameters
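temperature = 0.0 #0.0 approximates greedy decoding (highest-probability token); not used by run_model, which takes each model's temperature from model_metadata
max_tokens_per_prompt = 1000 #tokens reserved for the prompt: each max_tokens below is the model's token limit minus this reserve, so prompt + answer stay within the limit as long as the prompt uses at most 1000 tokens
local_api_url = 'http://localhost:1234/v1' #default LM Studio API URL; change it if your server runs elsewhere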

#Model data
model_metadata = {
    "gpt-3.5": {
        "model":"gpt-3.5-turbo-1106",
        "model_variant": "gpt-3.5-turbo-1106",
        "local_api": False,
        "api_key": openai_key,
        "temperature": 0.0,
        "max_tokens": 4096 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "gpt-4.0": {
        "model":"gpt-4-0613",
        "model_variant": "gpt-4-0613",
        "local_api": False,
        "api_key": openai_key,
        "temperature": 0.0,
        "max_tokens": 8192 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "gpt-4.0-turbo": {
        "model":"gpt-4-1106-preview",
        "model_variant": "gpt-4-1106-preview",
        "local_api": False,
        "api_key": openai_key,
        "temperature": 0.0,
        "max_tokens": 4096  - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "llama2-7b-chat": {
        "model":"TheBloke/Llama-2-7B-Chat-GGUF",
        "model_variant": "llama-2-7b-chat.Q5_K_M.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 4096 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "llama2-7b-chat-uncensored": {
        "model":"TheBloke/llama2_7b_chat_uncensored-GGUF",
        "model_variant": "llama2_7b_chat_uncensored.Q6_K.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 2048 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "llama2-70b-chat": {
        "model":"TheBloke/Llama-2-70B-Chat-GGUF",
        "model_variant": "llama-2-70b-chat.Q4_K_M.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 4096 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },   
    "llama2-70b-chat-uncensored": {
        "model":"TheBloke/llama2_70b_chat_uncensored-GGUF",
        "model_variant": "llama2_70b_chat_uncensored.Q4_K_M.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 2048 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    }, 
    "mistral-7b-instruct-v0.2": {
        "model":"TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        "model_variant": "mistral-7b-instruct-v0.2.Q6_K.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 32768 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "dolphin-2.5-mixtral-8x7b": {
        "model":"TheBloke/dolphin-2.5-mixtral-8x7b-GGUF",
        "model_variant": "dolphin-2.5-mixtral-8x7b.Q4_K_M.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 32768 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "zephyr-7b-beta": {
        "model":"TheBloke/zephyr-7B-beta-GGUF",
        "model_variant": "zephyr-7b-beta.Q6_K.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 32768 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "pivot-0.1-evil-a": {
        "model":"TheBloke/PiVoT-0.1-Evil-a-GGUF",
        "model_variant": "pivot-0.1-evil-a.Q6_K.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 32768 - max_tokens_per_prompt,
        "frequency_penalty": 1.2,
        "presence_penalty": 0
    },
    "phi2": {
        "model":"TheBloke/phi-2-GGUF",
        "model_variant": "phi-2.Q6_K.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 2048 - max_tokens_per_prompt,
        "frequency_penalty": 1,
        "presence_penalty": 0
    },
    "stablelm-zephyr-3b": {
        "model":"TheBloke/stablelm-zephyr-3b",
        "model_variant": "stablelm-zephyr-3b.Q6_K.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 4096 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "baichuan2-7b": {
        "model":"TheBloke/blossom-v3-baichuan2-7B-GGUF",
        "model_variant": "blossom-v3-baichuan2-7b.Q6_K.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.8,
        "max_tokens": 4096 - max_tokens_per_prompt,
        "frequency_penalty": 2,
        "presence_penalty": 2
    },    
    "yi-34b": {
        "model":"TheBloke/Yi-34B-Chat-GGUF",
        "model_variant": "yi-34b-chat.Q4_K_M.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 4096 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },
    "tinyllama_1b": {
        "model":"TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF",
        "model_variant": "tinyllama-1.1b-chat-v0.3.Q5_K_M.gguf",
        "local_api": True,
        "api_key": 'not-needed',
        "temperature": 0.0,
        "max_tokens": 2048 - max_tokens_per_prompt,
        "frequency_penalty": 0,
        "presence_penalty": 0
    },          
}
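Each `max_tokens` value above is a per-model token limit (4096, 8192, 32768, ...) minus `max_tokens_per_prompt`. The sketch below spells out that arithmetic using limits copied from the dictionary. `completion_budget` and `fits_limit` are hypothetical helpers. Measuring a real prompt's length would require the model's own tokenizer, which this sketch does not include.

```python
MAX_TOKENS_PER_PROMPT = 1000  # same reserve as max_tokens_per_prompt in the cell above

# Per-model token limits as assumed in model_metadata (excerpt).
TOKEN_LIMIT = {
    "gpt-4.0": 8192,
    "llama2-7b-chat": 4096,
    "llama2-7b-chat-uncensored": 2048,
    "mistral-7b-instruct-v0.2": 32768,
}


def completion_budget(model_key: str, reserve: int = MAX_TOKENS_PER_PROMPT) -> int:
    """The max_tokens value model_metadata uses: token limit minus the prompt reserve."""
    return TOKEN_LIMIT[model_key] - reserve


def fits_limit(model_key: str, prompt_tokens: int, reserve: int = MAX_TOKENS_PER_PROMPT) -> bool:
    """True if prompt_tokens plus a maximum-length answer stays within the model's limit."""
    return prompt_tokens + completion_budget(model_key, reserve) <= TOKEN_LIMIT[model_key]


for key in TOKEN_LIMIT:
    print(f"{key}: max_tokens = {completion_budget(key)}")
```

`fits_limit` reduces to `prompt_tokens <= reserve`. The reserve is what keeps a full-length answer inside the limit, so that guarantee only holds for prompts of at most 1000 tokens.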

In [4]:
#Functions
def send_queries(run_label: str, model: str, model_variant: str, messages: list, system_prompt: str,
                 temperature=0.0, local_api=True, api_url='http://localhost:1234/v1', api_key='not-needed',
                 max_tokens=3000, frequency_penalty=0.0, presence_penalty=0.0):
    """
    Sends each message to an OpenAI-compatible chat completions endpoint and collects the responses.

    Parameters:
    - run_label (str): The label for the current run (the system prompt variant).
    - model (str): The model name sent to the API.
    - model_variant (str): The model file/variant; recorded in the results only.
    - messages (list): A list of user messages; each one is sent as a separate query.
    - system_prompt (str): The system prompt included in every conversation.
    - temperature (float): The sampling temperature. Default is 0.0.
    - local_api (bool): If True, send requests to api_url; otherwise use the OpenAI API. Default is True.
    - api_url (str): The URL of the local API endpoint. Default is 'http://localhost:1234/v1'.
    - api_key (str): The API key used to authenticate the requests. Default is 'not-needed'.
    - max_tokens (int): The maximum number of tokens in each response. Default is 3000.
    - frequency_penalty (float): Penalizes tokens in proportion to how often they have already appeared. Default is 0.0.
    - presence_penalty (float): Penalizes tokens that have already appeared at all. Default is 0.0.

    Returns:
    - answer (list): A list of dictionaries, one per query, containing the run label, model name, question, answer, system prompt, temperature, and model variant.
    """

    if local_api:
        client = OpenAI(base_url=api_url, api_key=api_key)
    else:
        client = OpenAI(api_key=api_key)
    
    answer = []

    for message in tqdm(messages):
        prompt = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": message},
        ]

        completion = client.chat.completions.create(
            model=model,
            messages=prompt,
            temperature=temperature,
            max_tokens=max_tokens,
            frequency_penalty=frequency_penalty,
            presence_penalty=presence_penalty,
        )

        answer.append({
            "run_label": run_label,
            "model": model,
            "question": message,
            "answer": completion.choices[0].message.content,
            "system_prompt": system_prompt,
            "temperature": temperature,
            "model_variant": model_variant,
        })

    return answer

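def print_answers(results, run_label=None):
    """Pretty-prints results, optionally restricted to one system prompt variant (run_label)."""
    df = pd.DataFrame(results)

    if run_label is not None and not df.empty:
        df = df[df['run_label'] == run_label]

    if df.empty:
        print("No results to print.")
        return

    # Print the shared settings once per system prompt variant, then its questions and answers
    for _, group in df.groupby('run_label', sort=False):
        row = group.iloc[0]
        print(f"Model: {row['model']}, Variant: {row['model_variant']}, Temperature: {row['temperature']}")
        print()
        print(f"System Prompt: {row['system_prompt']}")
        print()

        for _, row in group.iterrows():
            print(f"Question:\n{row['question']}")
            print(f"Answer:\n{row['answer']}")
            print()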

def save_results(results, filename):
    """Saves results as NDJSON in results/ and as one Markdown file per run_label in results_txt/."""
    results_df = pd.DataFrame(results)

    # Create the output folders if they don't exist yet (open() and to_json() won't)
    os.makedirs("results_txt", exist_ok=True)
    os.makedirs("results", exist_ok=True)

    for run_label in results_df['run_label'].unique():
        filtered_results_df = results_df[results_df['run_label'] == run_label]
        first = filtered_results_df.iloc[0]
        file_path = f"results_txt/{filename}_{run_label}.md"
        # Mode "w" truncates an existing file, so no explicit delete is needed;
        # UTF-8 avoids encoding errors on platforms with a different default encoding
        with open(file_path, "w", encoding="utf-8") as file:
            file.write(f"# Model: {first['model']}, Variant: {first['model_variant']}, Temperature: {first['temperature']}\n\n")
            file.write(f"## System Prompt: {first['system_prompt']}\n\n")
            for _, row in filtered_results_df.iterrows():
                file.write(f"### Question:\n{row['question']}\n")
                file.write(f"### Answer:\n{row['answer']}\n\n")

    # to_json overwrites the file if it already exists
    results_df.to_json(f"results/{filename}.ndjson", orient="records", lines=True)

def run_model(model, prompt_types=["minimal", "optimistic", "realistic", "cynical", "abby"], model_metadata=model_metadata, questions=questions, prompts=prompts_appended, local_api_url=local_api_url):
    """Runs every question under each system prompt variant in prompt_types for one model_metadata key
    and returns all result records. Default arguments are captured when this cell runs, so pass
    model_metadata explicitly if you edit it afterwards."""
    meta = model_metadata[model]
    results = []
    for system_prompt_type in prompt_types:
        print(f"Running {system_prompt_type} prompts for {model}...")
        result = send_queries(
            run_label=system_prompt_type,
            model=meta["model"],
            model_variant=meta["model_variant"],
            messages=questions["text"].tolist(),
            system_prompt=prompts[system_prompt_type],
            temperature=meta["temperature"],
            local_api=meta["local_api"],
            api_url=local_api_url,
            api_key=meta["api_key"],
            max_tokens=meta["max_tokens"],
            frequency_penalty=meta["frequency_penalty"],
            presence_penalty=meta["presence_penalty"],
        )
        results.extend(result)

    return results
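`run_model` calls `send_queries` once per system prompt variant, and `send_queries` makes one request per question. A full run therefore issues `len(prompt_types) * len(questions)` chat-completion requests. With the five default variants and the 12 questions shown in the progress bars, that comes to 60 requests per model. The sketch below covers that arithmetic plus a rough wall-time estimate from an assumed seconds-per-request figure. `plan_run` is a hypothetical helper.

```python
DEFAULT_PROMPT_TYPES = ("minimal", "optimistic", "realistic", "cynical", "abby")


def plan_run(n_questions: int, prompt_types=DEFAULT_PROMPT_TYPES,
             seconds_per_request: float = 0.0) -> tuple:
    """Return (number of requests, estimated wall time in minutes) for one run_model call.
    send_queries issues requests one after another, so time scales linearly."""
    n_requests = len(prompt_types) * n_questions
    minutes = n_requests * seconds_per_request / 60
    return n_requests, minutes


# e.g. 12 questions at roughly 180 s per request, close to the llama2-70b-chat rate in the logs
print(plan_run(12, seconds_per_request=180.0))  # (60, 180.0)
```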

#### Running the models

In [5]:
results_gpt_35 = run_model("gpt-3.5", model_metadata=model_metadata)
save_results(results_gpt_35, "results_gpt_35")

Running minimal prompts for gpt-3.5...


100%|██████████| 12/12 [00:16<00:00,  1.41s/it]


Running optimistic prompts for gpt-3.5...


100%|██████████| 12/12 [00:21<00:00,  1.77s/it]


Running realistic prompts for gpt-3.5...


100%|██████████| 12/12 [00:20<00:00,  1.73s/it]


Running cynical prompts for gpt-3.5...


100%|██████████| 12/12 [00:17<00:00,  1.42s/it]


Running abby prompts for gpt-3.5...


100%|██████████| 12/12 [00:21<00:00,  1.80s/it]


In [7]:
results_gpt_40 = run_model("gpt-4.0", model_metadata=model_metadata)
save_results(results_gpt_40, "results_gpt_40")

Running minimal prompts for gpt-4.0...


100%|██████████| 12/12 [00:36<00:00,  3.05s/it]


Running optimistic prompts for gpt-4.0...


100%|██████████| 12/12 [00:55<00:00,  4.63s/it]


Running realistic prompts for gpt-4.0...


100%|██████████| 12/12 [00:40<00:00,  3.36s/it]


Running cynical prompts for gpt-4.0...


100%|██████████| 12/12 [00:36<00:00,  3.01s/it]


Running abby prompts for gpt-4.0...


100%|██████████| 12/12 [00:54<00:00,  4.55s/it]


In [13]:
results_gpt_40_turbo = run_model("gpt-4.0-turbo", model_metadata=model_metadata)
save_results(results_gpt_40_turbo, "results_gpt_40_turbo")

Running minimal prompts for gpt-4.0-turbo...


100%|██████████| 12/12 [01:24<00:00,  7.02s/it]


Running optimistic prompts for gpt-4.0-turbo...


100%|██████████| 12/12 [01:43<00:00,  8.60s/it]


Running realistic prompts for gpt-4.0-turbo...


100%|██████████| 12/12 [01:37<00:00,  8.12s/it]


Running cynical prompts for gpt-4.0-turbo...


100%|██████████| 12/12 [01:21<00:00,  6.82s/it]


Running abby prompts for gpt-4.0-turbo...


100%|██████████| 12/12 [01:42<00:00,  8.57s/it]


In [15]:
results_llama2_7b = run_model("llama2-7b-chat", model_metadata=model_metadata)
save_results(results_llama2_7b, "results_llama2_7b")

Running minimal prompts for llama2-7b-chat...


100%|██████████| 12/12 [01:39<00:00,  8.27s/it]


Running optimistic prompts for llama2-7b-chat...


100%|██████████| 12/12 [02:23<00:00, 11.93s/it]


Running realistic prompts for llama2-7b-chat...


100%|██████████| 12/12 [02:14<00:00, 11.18s/it]


Running cynical prompts for llama2-7b-chat...


100%|██████████| 12/12 [01:58<00:00,  9.87s/it]


Running abby prompts for llama2-7b-chat...


100%|██████████| 12/12 [02:22<00:00, 11.87s/it]


In [23]:
results_llama2_7b_uncensored = run_model("llama2-7b-chat-uncensored", model_metadata=model_metadata)
save_results(results_llama2_7b_uncensored, "results_llama2_7b_uncensored")

Running minimal prompts for llama2-7b-chat-uncensored...


100%|██████████| 12/12 [00:33<00:00,  2.82s/it]


Running optimistic prompts for llama2-7b-chat-uncensored...


100%|██████████| 12/12 [00:43<00:00,  3.65s/it]


Running realistic prompts for llama2-7b-chat-uncensored...


100%|██████████| 12/12 [00:47<00:00,  3.94s/it]


Running cynical prompts for llama2-7b-chat-uncensored...


100%|██████████| 12/12 [00:31<00:00,  2.64s/it]


Running abby prompts for llama2-7b-chat-uncensored...


100%|██████████| 12/12 [00:54<00:00,  4.58s/it]


In [38]:
results_llama2_70b = run_model("llama2-70b-chat", model_metadata=model_metadata)
save_results(results_llama2_70b, "results_llama2-70b-chat")

Running minimal prompts for llama2-70b-chat...


100%|██████████| 12/12 [36:14<00:00, 181.22s/it]


Running optimistic prompts for llama2-70b-chat...


100%|██████████| 12/12 [36:30<00:00, 182.56s/it]


Running realistic prompts for llama2-70b-chat...


100%|██████████| 12/12 [39:22<00:00, 196.86s/it]


Running cynical prompts for llama2-70b-chat...


100%|██████████| 12/12 [34:59<00:00, 175.00s/it]


Running abby prompts for llama2-70b-chat...


100%|██████████| 12/12 [34:19<00:00, 171.64s/it]


In [56]:
results_llama2_70b_uncensored = run_model("llama2-70b-chat-uncensored", model_metadata=model_metadata)
save_results(results_llama2_70b_uncensored, "results_llama2-70b-chat-uncensored")

Running minimal prompts for llama2-70b-chat-uncensored...


100%|██████████| 12/12 [10:00<00:00, 50.05s/it]


Running optimistic prompts for llama2-70b-chat-uncensored...


100%|██████████| 12/12 [14:49<00:00, 74.16s/it]


Running realistic prompts for llama2-70b-chat-uncensored...


100%|██████████| 12/12 [14:19<00:00, 71.62s/it]


Running cynical prompts for llama2-70b-chat-uncensored...


100%|██████████| 12/12 [14:22<00:00, 71.91s/it]


Running abby prompts for llama2-70b-chat-uncensored...


100%|██████████| 12/12 [19:10<00:00, 95.90s/it]


In [61]:
results_mistral_7b = run_model("mistral-7b-instruct-v0.2", model_metadata=model_metadata)
save_results(results_mistral_7b, "results_mistral-7b-instruct-v0.2")

Running minimal prompts for mistral-7b-instruct-v0.2...


100%|██████████| 12/12 [01:12<00:00,  6.05s/it]


Running optimistic prompts for mistral-7b-instruct-v0.2...


100%|██████████| 12/12 [01:34<00:00,  7.85s/it]


Running realistic prompts for mistral-7b-instruct-v0.2...


100%|██████████| 12/12 [01:41<00:00,  8.44s/it]


Running cynical prompts for mistral-7b-instruct-v0.2...


100%|██████████| 12/12 [01:40<00:00,  8.35s/it]


Running abby prompts for mistral-7b-instruct-v0.2...


100%|██████████| 12/12 [02:13<00:00, 11.16s/it]


In [62]:
results_dolphin_mixtral = run_model("dolphin-2.5-mixtral-8x7b", model_metadata=model_metadata)
save_results(results_dolphin_mixtral, "results_dolphin-2.5-mixtral-8x7b")

Running minimal prompts for dolphin-2.5-mixtral-8x7b...


100%|██████████| 12/12 [03:54<00:00, 19.52s/it]


Running optimistic prompts for dolphin-2.5-mixtral-8x7b...


100%|██████████| 12/12 [05:17<00:00, 26.42s/it]


Running realistic prompts for dolphin-2.5-mixtral-8x7b...


100%|██████████| 12/12 [03:20<00:00, 16.73s/it]


Running cynical prompts for dolphin-2.5-mixtral-8x7b...


100%|██████████| 12/12 [03:07<00:00, 15.65s/it]


Running abby prompts for dolphin-2.5-mixtral-8x7b...


100%|██████████| 12/12 [06:03<00:00, 30.27s/it]


In [63]:
results_zephyr_7b = run_model("zephyr-7b-beta", model_metadata=model_metadata)
save_results(results_zephyr_7b, "results_zephyr-7b-beta")

Running minimal prompts for zephyr-7b-beta...


100%|██████████| 12/12 [01:12<00:00,  6.01s/it]


Running optimistic prompts for zephyr-7b-beta...


100%|██████████| 12/12 [02:03<00:00, 10.33s/it]


Running realistic prompts for zephyr-7b-beta...


100%|██████████| 12/12 [02:01<00:00, 10.16s/it]


Running cynical prompts for zephyr-7b-beta...


100%|██████████| 12/12 [01:38<00:00,  8.23s/it]


Running abby prompts for zephyr-7b-beta...


100%|██████████| 12/12 [01:41<00:00,  8.49s/it]


In [70]:
results_pivot_evil = run_model("pivot-0.1-evil-a", model_metadata=model_metadata)
save_results(results_pivot_evil, "results_pivot-0.1-evil-a")

Running minimal prompts for pivot-0.1-evil-a...


100%|██████████| 12/12 [00:41<00:00,  3.48s/it]


Running optimistic prompts for pivot-0.1-evil-a...


100%|██████████| 12/12 [00:55<00:00,  4.64s/it]


Running realistic prompts for pivot-0.1-evil-a...


100%|██████████| 12/12 [00:50<00:00,  4.21s/it]


Running cynical prompts for pivot-0.1-evil-a...


100%|██████████| 12/12 [01:06<00:00,  5.51s/it]


Running abby prompts for pivot-0.1-evil-a...


100%|██████████| 12/12 [00:53<00:00,  4.45s/it]


In [73]:
results_phi2 = run_model("phi2", model_metadata=model_metadata)
save_results(results_phi2, "results_phi2")

Running minimal prompts for phi2...


100%|██████████| 12/12 [00:35<00:00,  2.97s/it]


Running optimistic prompts for phi2...


100%|██████████| 12/12 [00:32<00:00,  2.71s/it]


Running realistic prompts for phi2...


100%|██████████| 12/12 [00:39<00:00,  3.30s/it]


Running cynical prompts for phi2...


100%|██████████| 12/12 [00:41<00:00,  3.49s/it]


Running abby prompts for phi2...


100%|██████████| 12/12 [00:55<00:00,  4.64s/it]


In [80]:
results_zephyr_3b = run_model("stablelm-zephyr-3b", model_metadata=model_metadata)
save_results(results_zephyr_3b, "results_stablelm-zephyr-3b")

Running minimal prompts for stablelm-zephyr-3b...


100%|██████████| 12/12 [00:31<00:00,  2.62s/it]


Running optimistic prompts for stablelm-zephyr-3b...


100%|██████████| 12/12 [00:46<00:00,  3.86s/it]


Running realistic prompts for stablelm-zephyr-3b...


100%|██████████| 12/12 [00:52<00:00,  4.40s/it]


Running cynical prompts for stablelm-zephyr-3b...


100%|██████████| 12/12 [00:37<00:00,  3.14s/it]


Running abby prompts for stablelm-zephyr-3b...


100%|██████████| 12/12 [00:52<00:00,  4.40s/it]


In [5]:
results_baichuan2_7b = run_model("baichuan2-7b", model_metadata=model_metadata)
save_results(results_baichuan2_7b, "results_baichuan2-7b")

Running minimal prompts for baichuan2-7b...


100%|██████████| 12/12 [01:50<00:00,  9.25s/it]


Running optimistic prompts for baichuan2-7b...


100%|██████████| 12/12 [01:42<00:00,  8.53s/it]


Running realistic prompts for baichuan2-7b...


100%|██████████| 12/12 [01:11<00:00,  5.99s/it]


Running cynical prompts for baichuan2-7b...


100%|██████████| 12/12 [01:20<00:00,  6.70s/it]


Running abby prompts for baichuan2-7b...


100%|██████████| 12/12 [02:13<00:00, 11.12s/it]


In [10]:
results_yi_34b = run_model("yi-34b", model_metadata=model_metadata)
save_results(results_yi_34b, "results_yi-34b")

Running minimal prompts for yi-34b...


100%|██████████| 12/12 [01:13<00:00,  6.09s/it]


Running optimistic prompts for yi-34b...


100%|██████████| 12/12 [01:03<00:00,  5.32s/it]


Running realistic prompts for yi-34b...


100%|██████████| 12/12 [01:47<00:00,  8.92s/it]


Running cynical prompts for yi-34b...


100%|██████████| 12/12 [01:51<00:00,  9.32s/it]


Running abby prompts for yi-34b...


100%|██████████| 12/12 [01:45<00:00,  8.81s/it]


In [5]:
results_tinyllama_1b = run_model("tinyllama_1b", model_metadata=model_metadata)
save_results(results_tinyllama_1b, "results_tinyllama_1b")

Running minimal prompts for tinyllama_1b...


100%|██████████| 12/12 [01:46<00:00,  8.90s/it]


Running optimistic prompts for tinyllama_1b...


100%|██████████| 12/12 [01:23<00:00,  6.98s/it]


Running realistic prompts for tinyllama_1b...


100%|██████████| 12/12 [01:26<00:00,  7.21s/it]


Running cynical prompts for tinyllama_1b...


100%|██████████| 12/12 [01:25<00:00,  7.17s/it]


Running abby prompts for tinyllama_1b...


100%|██████████| 12/12 [01:37<00:00,  8.09s/it]
