## What? 🦙

This notebook attempts to extend Jeremy's [notebook](https://www.kaggle.com/code/jhoward/getting-started-with-llms) (for the LLM Science Exam competition on Kaggle) to use an open model, [Llama 2](https://huggingface.co/blog/llama2).

## Setup

In [None]:
!nvidia-smi

Sun Aug 13 08:42:01 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    43W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
!pip install transformers accelerate kaggle -q

## Authenticate Kaggle to download data

* You first have to accept the rules of the competition [here](https://www.kaggle.com/competitions/kaggle-llm-science-exam/rules).
* Then upload your `kaggle.json` file to Colab. If you don't have one, create one [here](https://www.kaggle.com/settings/account).

In [None]:
from google.colab import files
_ = files.upload()

Saving kaggle.json to kaggle.json


In [None]:
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json

In [None]:
!kaggle competitions download kaggle-llm-science-exam

Downloading kaggle-llm-science-exam.zip to /content
100% 72.5k/72.5k [00:00<00:00, 349kB/s]


In [None]:
!unzip -q kaggle-llm-science-exam.zip

In [None]:
import pandas as pd

train_df = pd.read_csv("train.csv")
train_df.head()

| | id | prompt | A | B | C | D | E | answer |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Which of the following statements accurately d... | MOND is a theory that reduces the observed mis... | MOND is a theory that increases the discrepanc... | MOND is a theory that explains the missing bar... | MOND is a theory that reduces the discrepancy ... | MOND is a theory that eliminates the observed ... | D |
| 1 | 1 | Which of the following is an accurate definiti... | Dynamic scaling refers to the evolution of sel... | Dynamic scaling refers to the non-evolution of... | Dynamic scaling refers to the evolution of sel... | Dynamic scaling refers to the non-evolution of... | Dynamic scaling refers to the evolution of sel... | A |
| 2 | 2 | Which of the following statements accurately d... | The triskeles symbol was reconstructed as a fe... | The triskeles symbol is a representation of th... | The triskeles symbol is a representation of a ... | The triskeles symbol represents three interloc... | The triskeles symbol is a representation of th... | A |
| 3 | 3 | What is the significance of regularization in ... | Regularizing the mass-energy of an electron wi... | Regularizing the mass-energy of an electron wi... | Regularizing the mass-energy of an electron wi... | Regularizing the mass-energy of an electron wi... | Regularizing the mass-energy of an electron wi... | C |
| 4 | 4 | Which of the following statements accurately d... | The angular spacing of features in the diffrac... | The angular spacing of features in the diffrac... | The angular spacing of features in the diffrac... | The angular spacing of features in the diffrac... | The angular spacing of features in the diffrac... | D |


## Initialize tokenizer and model

To access the Llama 2 models, you first need to request access and accept Meta's license. Find more details [here](https://huggingface.co/meta-llama).

In [None]:
!huggingface-cli login

We will use [`meta-llama/Llama-2-7b-chat-hf`](https://hf.co/meta-llama/Llama-2-7b-chat-hf). You can explore other Llama 2 models here: https://hf.co/meta-llama.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)


## Utility to tokenize and perform inference

In [None]:
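from transformers import TextIteratorStreamer
from threading import Thread

def tokenize_and_predict(text, model, tokenizer, generation_kwargs):
    # No special tokens are added here; the chat template used later writes <s> itself.
    tokenized_text = tokenizer(
        text, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)
    streamer = TextIteratorStreamer(
        tokenizer,
        timeout=10.,
        skip_prompt=True,
        skip_special_tokens=True
    )
    # Build a fresh dict so the caller's generation_kwargs is not mutated between calls.
    kwargs = {**generation_kwargs, **tokenized_text, "streamer": streamer}

    # generate() already disables gradient tracking internally, and a no_grad
    # context opened here would not carry over to the worker thread.
    t = Thread(target=model.generate, kwargs=kwargs)
    t.start()

    # Collect decoded chunks as the background thread produces them.
    outputs = []
    for new_text in streamer:
        outputs.append(new_text)
    t.join()

    return "".join(outputs)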
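
The pattern in this utility is a producer/consumer pair: `model.generate` runs in a background thread and pushes decoded text into the streamer, while the main thread iterates over the streamer until generation ends. The sketch below imitates that flow with only the standard library. `ToyStreamer` and `fake_generate` are stand-ins invented for illustration; they are not the `transformers` implementation.

```python
from queue import Queue
from threading import Thread

class ToyStreamer:
    """Stand-in for a text streamer: an iterator backed by a thread-safe queue."""
    _END = object()  # sentinel marking the end of generation

    def __init__(self, timeout=10.0):
        self.queue = Queue()
        self.timeout = timeout

    def put(self, text):
        self.queue.put(text)

    def end(self):
        self.queue.put(self._END)

    def __iter__(self):
        return self

    def __next__(self):
        # Blocks until the producer supplies text; raises queue.Empty on timeout.
        item = self.queue.get(timeout=self.timeout)
        if item is self._END:
            raise StopIteration
        return item

def fake_generate(streamer, pieces):
    # Plays the role of model.generate: emit pieces, then signal completion.
    for piece in pieces:
        streamer.put(piece)
    streamer.end()

streamer = ToyStreamer()
worker = Thread(
    target=fake_generate,
    kwargs={"streamer": streamer, "pieces": ["Answers: ", "B C E A D"]},
)
worker.start()
result = "".join(streamer)  # consume chunks until the sentinel arrives
worker.join()
print(result)
```

The timeout matters in the real utility too: if the generating thread dies, the consumer raises instead of waiting forever.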

The rest of the code is adapted, and in places copied verbatim, from Jeremy's notebook: https://www.kaggle.com/code/jhoward/getting-started-with-llms.

In [None]:
def prompt1(r):
    return f"""Question: {r.prompt}
A: {r.A}
B: {r.B}
C: {r.C}
D: {r.D}
E: {r.E}
Answer: """

In [None]:
generation_kwargs = dict(
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    max_length=2000,
)

idx = 2
print(prompt1(train_df.loc[idx]))
print(tokenize_and_predict(
    prompt1(train_df.loc[idx]), model, tokenizer, generation_kwargs)
)

Question: Which of the following statements accurately describes the origin and significance of the triskeles symbol?
A: The triskeles symbol was reconstructed as a feminine divine triad by the rulers of Syracuse, and later adopted as an emblem. Its usage may also be related to the Greek name of Sicily, Trinacria, which means "having three headlands." The head of Medusa at the center of the Sicilian triskeles represents the three headlands.
B: The triskeles symbol is a representation of three interlinked spirals, which was adopted as an emblem by the rulers of Syracuse. Its usage in modern flags of Sicily has its origins in the ancient Greek name for the island, Trinacria, which means "Sicily with three corners." The head of Medusa at the center is a representation of the island's rich cultural heritage.
C: The triskeles symbol is a representation of a triple goddess, reconstructed by the rulers of Syracuse, who adopted it as an emblem. Its significance lies in the fact that it represe

## Template prompt and prepare

Format from: https://huggingface.co/blog/llama2#how-to-prompt-llama-2.

In [None]:
def get_prompt(user_input: str, system_prompt: str) -> str:
    texts = [f'<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n']
    texts.append(f'{user_input} [/INST] ')

    return ''.join(texts)
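
As a quick sanity check, the template can be rendered with a toy input. The sketch below re-declares the function under the hypothetical name `render_llama2_prompt` so that it runs on its own; the system and user strings are made up for illustration.

```python
def render_llama2_prompt(user_input: str, system_prompt: str) -> str:
    # Same single-turn layout as get_prompt: system block first, then the user turn.
    return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_input} [/INST] "

# Toy strings, for illustration only.
example = render_llama2_prompt("What is 2 + 2?", "Answer briefly.")
print(example)
```

Because `<s>` is written out in the template, `tokenize_and_predict` calls the tokenizer with `add_special_tokens=False`, presumably so that the beginning-of-sequence token is not added a second time.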

In [None]:
SYSTEM_PROMPT = """I will ask a multiple choice question, with 5 answers A-E.
First, output 'Options: ' followed by going through each of the 5 options, explaining why it is or isn't a good description.
Then, output 'Summary: ' followed by a description of which you think is most accurate, and why.
Finally, output 'Answers: ' followed by the 5 answers A-E sorted from best answer to worst. E.g 'Answers: B C E A D'.
Reminder: it's VERY IMPORTANT the final line of your response is text text 'Answers: ' followed by the sorted list of answers A-E.
"""

In [None]:
def prepare_qa_prompt(r):
    return f"""
Question: {r.prompt}
A: {r.A}
B: {r.B}
C: {r.C}
D: {r.D}
E: {r.E}
"""

In [None]:
generation_kwargs = dict(
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    max_length=2000,
)

In [None]:
def generate(row, model, tokenizer, generation_kwargs, verbose=False):
    qa_prompt = prepare_qa_prompt(row)
    final_inputs = get_prompt(user_input=qa_prompt, system_prompt=SYSTEM_PROMPT)
    if verbose:
        print(final_inputs)
    final_response = tokenize_and_predict(final_inputs, model, tokenizer, generation_kwargs)
    return final_response

In [None]:
print(generate(train_df.loc[52], model, tokenizer, generation_kwargs, verbose=True))

<s>[INST] <<SYS>>
I will ask a multiple choice question, with 5 answers A-E.
First, output 'Options: ' followed by going through each of the 5 options, explaining why it is or isn't a good description.
Then, output 'Summary: ' followed by a description of which you think is most accurate, and why.
Finally, output 'Answers: ' followed by the 5 answers A-E sorted from best answer to worst. E.g 'Answers: B C E A D'.
Reminder: it's VERY IMPORTANT the final line of your response is text text 'Answers: ' followed by the sorted list of answers A-E.

<</SYS>>


Question: What is resistivity?
A: Resistivity is an extrinsic property of a material that describes how difficult it is to make electrical current flow through it. It is measured in ohms and is dependent on the material's shape and size.
B: Resistivity is a measure of the resistance of a material to electrical current flow. It is measured in ohm-meters and is dependent on the material's shape and size.
C: Resistivity is an intrinsic pro

In [None]:
generation_kwargs = dict(
    do_sample=True,
    top_p=0.95,
    top_k=10,
    num_return_sequences=1,
    temperature=0.8,
    num_beams=1,
    max_length=2000,
)

print(generate(train_df.loc[52], model, tokenizer, generation_kwargs, verbose=True))

<s>[INST] <<SYS>>
I will ask a multiple choice question, with 5 answers A-E.
First, output 'Options: ' followed by going through each of the 5 options, explaining why it is or isn't a good description.
Then, output 'Summary: ' followed by a description of which you think is most accurate, and why.
Finally, output 'Answers: ' followed by the 5 answers A-E sorted from best answer to worst. E.g 'Answers: B C E A D'.
Reminder: it's VERY IMPORTANT the final line of your response is text text 'Answers: ' followed by the sorted list of answers A-E.

<</SYS>>


Question: What is resistivity?
A: Resistivity is an extrinsic property of a material that describes how difficult it is to make electrical current flow through it. It is measured in ohms and is dependent on the material's shape and size.
B: Resistivity is a measure of the resistance of a material to electrical current flow. It is measured in ohm-meters and is dependent on the material's shape and size.
C: Resistivity is an intrinsic pro

## Bulk generate responses

In [None]:
from tqdm import tqdm

responses = []
for i in tqdm(range(len(train_df))):
    response = generate(train_df.loc[i], model, tokenizer, generation_kwargs)
    responses.append(response)

100%|██████████| 200/200 [47:09<00:00, 14.15s/it]


## Scoring

In [None]:
df = pd.DataFrame(responses, columns=["response"])
df.to_csv("responses.csv", index=False)

In [None]:
df = pd.read_csv("responses.csv")
resps = list(df.response)

In [None]:
import re

def get_ans(o):
    m = re.search(r'Answers:\s*([A-E])\s+([A-E])\s+([A-E])\s+([A-E])\s+([A-E])', o)
    if m: return m.groups()
    m = re.search(r'Answers:\s*([A-E])', o)
    return (m[1],m[1],m[1],m[1],m[1]) if m else "ABCDE"

anss = [get_ans(o) for o in resps]
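
The fallback in `get_ans` repeats a single letter five times, and anything the regex misses becomes `ABCDE`. A slightly more forgiving parser is one possible alternative. In the sketch below, `parse_ranking` is a hypothetical helper, not part of Jeremy's notebook. It assumes the model may separate letters with commas or other punctuation and that the last `Answers:` line is the one that counts.

```python
import re

def parse_ranking(response: str, default: str = "ABCDE") -> str:
    """Return a 5-letter ranking parsed from the last 'Answers:' line."""
    # Use the last occurrence, since the prompt asks for it on the final line.
    matches = re.findall(r"Answers?:\s*(.*)", response)
    if not matches:
        return default
    # Keep standalone letters A-E in order of appearance, dropping repeats.
    ranked = []
    for letter in re.findall(r"\b([A-E])\b", matches[-1]):
        if letter not in ranked:
            ranked.append(letter)
    # Pad with the unused letters so the result always covers all five options.
    ranked += [c for c in default if c not in ranked]
    return "".join(ranked[:5])

print(parse_ranking("Summary: ...\nAnswers: B, C, E, A, D"))
print(parse_ranking("Answers: D"))
print(parse_ranking("no ranking here"))
```

The returned string can be indexed like the tuples from `get_ans`, so it works with the scoring cell unchanged. One caveat: a standalone word "A" in a sentence after `Answers:` would be read as an option letter.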

In [None]:
import numpy as np

def score(act, pred):
    return 1 if act==pred[0] else 2/3 if act==pred[1] else 1/3 if act==pred[2] else 0

np.mean([score(a, b) for a, b in zip(train_df.answer, anss)])

0.3783333333333333
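
Note that `score` awards 2/3 for a correct answer in second place. To my knowledge the competition leaderboard uses MAP@3, which awards 1/2 there instead, so the number above is not directly comparable to leaderboard scores. A minimal sketch of MAP@3 for single-answer questions follows; `map_at_3` and the toy data are illustrative and not taken from the notebook.

```python
def map_at_3(actuals, predictions):
    """Mean average precision at 3 when each question has exactly one correct answer."""
    total = 0.0
    for actual, pred in zip(actuals, predictions):
        # Only the first three ranked answers count; a hit at rank k earns 1/k.
        for rank, guess in enumerate(pred[:3], start=1):
            if guess == actual:
                total += 1.0 / rank
                break
    return total / len(actuals)

# Toy data: hits at rank 1, rank 2, rank 3, and one miss.
actuals = ["A", "B", "C", "D"]
predictions = ["ABCDE", "ABCDE", "ABCDE", "ABCDE"]
print(map_at_3(actuals, predictions))
```

With the notebook's variables, the equivalent call would be `map_at_3(train_df.answer, anss)`.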

## Notes

* Play with the generation parameters. Here's a concise [guide](https://huggingface.co/docs/transformers/generation_strategies).
* The prompts used here were copied verbatim from Jeremy's notebook. Llama 2 may do better with different prompts, so it is worth trying this approach with other wordings.
* If you have GPU firepower, then try out:
  * Fine-tuning (guide [here](https://huggingface.co/blog/llama2))
  * A larger Llama 2 model. You can access all the Llama 2 models here: https://huggingface.co/meta-llama.