# Introduction to Fine-Tuning with LLaMA.cpp

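This notebook provides a framework for fine-tuning the behaviour of LLaMA models on a multiple-choice question-answering task using **LLaMA.cpp**, a lightweight and efficient C/C++ implementation of Meta's LLaMA models. Here, that adaptation is done through prompt engineering (a few-shot system prompt) rather than by updating the model weights. LLaMA.cpp runs LLaMA models on modest hardware by using optimizations such as weight quantization, which reduce memory usage and speed up inference. The model loaded below, for example, uses the 4-bit `q4_0` format. This makes LLaMA.cpp a good option when computational resources are limited.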

LLaMA models come in several sizes. LLaMA 2, for example, is released with 7B, 13B, and 70B parameters. Larger models have higher memory and compute requirements, so they may need more powerful hardware. LLaMA.cpp makes it feasible to adapt and deploy these models, such as the 7B chat model used here, even on devices with limited resources.
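To make the hardware point concrete, the weights alone take roughly (number of parameters × bits per weight ÷ 8) bytes. The sketch below applies that rule of thumb using nominal parameter counts. It assumes ggml's current `q4_0` layout, where each block holds 32 four-bit weights plus one 16-bit scale, which works out to 4.5 bits per weight. It ignores the KV cache, activations, and runtime overhead, so actual memory use will be higher. `weight_memory_gb` is a hypothetical helper, not part of the notebook.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bytes.
# Ignores the KV cache, activations and runtime overhead, so real usage is higher.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed q4_0 layout: blocks of 32 four-bit weights plus one 16-bit scale,
# i.e. (32 * 4 + 16) / 32 = 4.5 bits per weight.
Q4_0_BITS = (32 * 4 + 16) / 32

for size in (7e9, 13e9, 70e9):  # nominal LLaMA 2 sizes
    fp16 = weight_memory_gb(size, 16)
    q4 = weight_memory_gb(size, Q4_0_BITS)
    print(f"{size / 1e9:.0f}B: fp16 ~{fp16:.1f} GB, q4_0 ~{q4:.1f} GB")
```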

# Imports and dependencies

In [None]:
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage, SystemMessage, AIMessage
from langchain.memory import ConversationBufferWindowMemory
from langchain.agents import load_tools, initialize_agent
import langchain
langchain.debug = False

import pandas as pd
from tqdm.notebook import tqdm

# from google.colab import drive
# drive.mount('/content/drive/')

In [None]:
# Callbacks support token-wise streaming
# Verbose is required to pass to the callback manager
# callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Instantiating the LLaMA model with parameters

In [None]:
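# LLaMA 2 7B chat model, loaded through llama-cpp-python via LangChain's LlamaCpp wrapper
llm = LlamaCpp(
    # 4-bit (q4_0) quantized weights in the older GGML v3 format;
    # recent llama-cpp-python releases expect GGUF model files instead
    model_path = "files/llama-2-7b-chat.ggmlv3.q4_0.bin",
    temperature = 0.1,   # low temperature for near-deterministic answers
    max_tokens = 2000,   # upper bound on the number of generated tokens
    top_p = 1,           # no nucleus truncation: sample from the full distribution
    # callback_manager = callback_manager,
    verbose = True,
    n_gpu_layers=35,     # layers offloaded to the GPU (requires a GPU-enabled build)
    n_batch = 512,       # tokens evaluated per batch while processing the prompt
    n_ctx=4096           # context window; LLaMA 2 was trained with 4096 tokens
)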
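The prompt and the generated tokens share the `n_ctx` window. With `n_ctx=4096` and `max_tokens=2000`, a full-length answer is only possible when the prompt is at most 4096 − 2000 = 2096 tokens, and the few-shot system prompt uses part of that budget on every call. The sketch below makes the arithmetic explicit. `generation_budget` is a hypothetical helper, and it assumes generation is simply capped by whatever space remains. Exact prompt token counts could be obtained from the model itself, for instance with LangChain's `llm.get_num_tokens(text)`.

```python
N_CTX = 4096       # matches n_ctx above
MAX_TOKENS = 2000  # matches max_tokens above

def generation_budget(prompt_tokens: int, n_ctx: int = N_CTX, max_tokens: int = MAX_TOKENS) -> int:
    """Tokens left for generation: the smaller of max_tokens and what remains of n_ctx."""
    if prompt_tokens >= n_ctx:
        # The prompt alone fills the window, so nothing can be generated
        raise ValueError(f"prompt ({prompt_tokens} tokens) does not fit in n_ctx={n_ctx}")
    return min(max_tokens, n_ctx - prompt_tokens)

print(generation_budget(1000))  # short prompt: the full max_tokens is available
print(generation_budget(3000))  # long prompt: generation is limited by n_ctx
```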

# Setting up the agent and the conversation window

In [None]:
# Window memory: keeps the last k = 2 exchanges (up to 4 messages) of the chat history
memory = ConversationBufferWindowMemory(
    memory_key = 'chat_history', k = 2, return_messages = True, output_key = "output")

# Agent tools: 'llm-math' is a calculator tool driven by the LLM
tools = load_tools(['llm-math'], llm)

# Agent initialisation
agent = initialize_agent(
    agent = "chat-conversational-react-description",
    tools = tools,
    llm = llm,
    verbose = True,
    early_stopping_method = 'generate',
    memory = memory,
    handle_parsing_errors = True)
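`ConversationBufferWindowMemory` with `k = 2` keeps only the two most recent human/AI exchanges, so older turns drop out of the prompt. The class below is a hypothetical plain-Python illustration of that sliding window, built on a bounded `deque`. It is not LangChain's implementation.

```python
from collections import deque

class WindowMemory:
    """Keeps only the last k (human, ai) exchanges, like a sliding window."""

    def __init__(self, k: int = 2):
        # A deque with maxlen drops the oldest exchange automatically
        self.exchanges = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))

    def messages(self) -> list:
        """Flatten the window into alternating human/AI messages."""
        return [msg for pair in self.exchanges for msg in pair]

memory_sketch = WindowMemory(k=2)
for turn in range(3):
    memory_sketch.save(f"question {turn}", f"answer {turn}")
print(memory_sketch.messages())  # the first exchange has been dropped
```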

# Loading the training data from CSV and sorting it

In [None]:
train_data = pd.read_csv('data/train.csv')
# 'answer' holds the correct letter (A-E); row[row['answer']] looks up that option's text
train_data['answer_contents'] = train_data.apply(lambda row: row[row['answer']], axis = 1)
test_data = pd.read_csv('data/test.csv')
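The `row[row['answer']]` lookup works because the `answer` column stores the letter of the correct option, and each option's text lives in the column whose name is that letter. The toy DataFrame below shows the result. Its rows are made up, and its column names are taken from how the notebook uses `train.csv`.

```python
import pandas as pd

# Made-up rows with the same columns the notebook reads: prompt, options A-E, answer letter
toy = pd.DataFrame({
    'prompt': ['What is the chemical formula of water', 'Which of these is a noble gas'],
    'A': ['H2O', 'Oxygen'],
    'B': ['O2', 'Neon'],
    'C': ['NaCl', 'Nitrogen'],
    'D': ['C2H5OH', 'Carbon'],
    'E': ['O3', 'Hydrogen'],
    'answer': ['A', 'B'],
})

# For each row, read the column named by the answer letter
toy['answer_contents'] = toy.apply(lambda row: row[row['answer']], axis=1)
print(toy['answer_contents'].tolist())
```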

In [None]:
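# Order the rows by prompt length, shortest first
# (the inference loop below iterates over train_data, not train_data_sorted)
new_idx = train_data['prompt'].str.len().sort_values().index
train_data_sorted = train_data.reindex(new_idx)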
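The cell above sorts by prompt length in two steps: it sorts the lengths, then calls `reindex`. pandas can also do this in one call with the `key` argument of `sort_values`, which is available since pandas 1.1. The toy example below uses made-up prompts of distinct lengths and checks that both approaches give the same order. With equal lengths, the tie order could differ, because the default sort is not stable.

```python
import pandas as pd

toy_sort = pd.DataFrame({
    'id': [0, 1, 2],
    'prompt': ['medium prompt', 'a much longer prompt here', 'short'],
})

# Two-step approach used in the notebook: sorted lengths, then reindex by their labels
by_reindex = toy_sort.reindex(toy_sort['prompt'].str.len().sort_values().index)

# One-step alternative: sort the 'prompt' column by a key function (string length)
by_key = toy_sort.sort_values('prompt', key=lambda s: s.str.len())

print(by_reindex['id'].tolist(), by_key['id'].tolist())
```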

# Initializing tags for prompt engineering

In [None]:
# LLaMA 2 chat tags: [INST] ... [/INST] wraps a user turn, <<SYS>> ... <</SYS>> wraps the system prompt
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
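For reference, the LLaMA 2 chat format in Meta's reference code, as we understand it, places the system block inside the first `[INST]` turn: `[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]`. The tokenizer adds the `<s>` (BOS) token. The notebook defines `B_INST` and `E_INST` but only uses them in commented-out code, and it prepends `<s>` as part of the prompt text. The hypothetical helper `llama2_single_turn` below sketches a single-turn prompt in the reference layout. It is an illustration only and not what the inference loop sends.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def llama2_single_turn(system: str, user: str) -> str:
    """Single-turn LLaMA 2 chat prompt: the system block sits inside the first [INST]."""
    # <s> (BOS) is normally added by the tokenizer, so it is not included in the text
    return f"{B_INST} {B_SYS}{system.strip()}{E_SYS}{user.strip()} {E_INST}"

print(llama2_single_turn("Answer with letters only.", "What is the chemical formula of water?"))
```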

# Function to format a question and its options as a prompt

In [None]:
def human_prompt(n: int = 0) -> str:
    """Format row n of train_data as a 'User:' multiple-choice question with options A-E."""
    message = train_data.iloc[n]

    q = message['prompt']
    A = message['A']
    B = message['B']
    C = message['C']
    D = message['D']
    E = message['E']

    # instruction = B_INST + "Pick the most accurate letter of the next multi choice question:" + E_INST

    question = """
    \nUser: {question}

    A. {answer_1}
    B. {answer_2}
    C. {answer_3}
    D. {answer_4}
    E. {answer_5}
    """

    prompt = PromptTemplate(
        template = question,
        input_variables = ['question', 'answer_1', 'answer_2', 'answer_3', 'answer_4', 'answer_5'])
    # final_prompt = instruction + prompt.format(question = q, answer_1 = A, answer_2 = B, answer_3 = C, answer_4 = D, answer_5 = E)
    final_prompt = prompt.format(question = q, answer_1 = A, answer_2 = B, answer_3 = C, answer_4 = D, answer_5 = E)

    return final_prompt
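The triple-quoted template in `human_prompt` is indented inside the function body, so each option line of the rendered prompt starts with four spaces. The few-shot examples in the system prompt are not indented. If matching the examples exactly matters, a plain-Python formatter produces unindented output. `format_question` below is a hypothetical alternative to `PromptTemplate` and is not part of the notebook. Applying `textwrap.dedent` to the template would be another option.

```python
def format_question(question: str, options: dict) -> str:
    """Render a question and its options A-E with no leading indentation."""
    option_lines = [f"{letter}. {options[letter]}" for letter in "ABCDE"]
    # A blank line separates the question from its options, as in the few-shot examples
    return "\n".join([f"User: {question}", ""] + option_lines)

example = format_question(
    "What is the chemical formula of water",
    {"A": "H2O", "B": "O2", "C": "NaCl", "D": "C2H5OH", "E": "O3"},
)
print(example)
```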

In [None]:
system_prompt = "<s>" + B_SYS + """Assistant will answer a multiple-choice question by giving 3 and only 3 letters from the options given. The letters will be separated by commas. The answers given by Assistant are ordered from the most likely to be correct to the least likely.
No explanation is needed for the answers. Assistant never asks for anything. Assistant never asks for answers.

Here is a previous conversation between the Assistant and the User:

\nUser: What is the chemical formula of water

A. H2O
B. O2
C. NaCl
D. C2H5OH
E. O3

Assistant: (A, B, E).


\nUser: What type of organism is commonly used in preparation of foods such as cheese and yogurt

A. viruses
B. protozoa
C. cells
D. gymnosperms
E. mesophilic organisms

Assistant: (E, C, B).


\nUser: What is the least dangerous radioactive decay

A. zeta decay
B. beta decay
C. gamma decay
D. alpha decay
E. all of the above

Assistant: (D, C, B).


\nUser: What phenomenon makes global winds blow northeast to southwest or the reverse in the northern hemisphere and northwest to southeast or the reverse in the southern hemisphere?

A. hurricanes
B. tropical effect
C. muon effect
D. centrifugal effect
E. coriolis effect

Assistant: (E, C, A).


\nUser: Kilauea in hawaii is the world\u2019s most continuously active volcano. very active volcanoes characteristically eject red-hot rocks and lava rather than this?

A. carbon and smog
B. smoke and ash
C. greenhouse gases
D. magma
E. fire

Assistant: (B, E, A).""" + E_SYS
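# Replace the agent's default system message with the few-shot prompt above
# (note: the inference loop below calls llm directly, so this agent prompt is not used there)
new_prompt = agent.agent.create_prompt(system_message = system_prompt, tools = tools)
agent.agent.llm_chain.prompt = new_prompt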
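The few-shot examples are written out by hand inside the system prompt string. One way to make them easier to edit or swap is to store them as data and render them in the same layout. `few_shot_block` below is a hypothetical helper, not used by the notebook. It reproduces that layout for a list of (question, options, ranked letters) tuples, so its output could be concatenated between the instruction text and `E_SYS`.

```python
def few_shot_block(examples: list) -> str:
    """Render (question, options, ranked_letters) examples in the system prompt's layout."""
    parts = []
    for question, options, ranked in examples:
        option_lines = "\n".join(f"{letter}. {text}" for letter, text in zip("ABCDE", options))
        parts.append(f"\nUser: {question}\n\n{option_lines}\n\nAssistant: ({', '.join(ranked)}).")
    # Examples are separated by blank lines, as in the hand-written prompt
    return "\n\n\n".join(parts)

block = few_shot_block([
    ("What is the chemical formula of water", ["H2O", "O2", "NaCl", "C2H5OH", "O3"], ["A", "B", "E"]),
    ("What is the least dangerous radioactive decay",
     ["zeta decay", "beta decay", "gamma decay", "alpha decay", "all of the above"], ["D", "C", "B"]),
])
print(block)
```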

# Inference loop

In [None]:
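ans = []        # one row per question: [id, answer, prediction1, prediction2, prediction3]
model_ans = []  # raw model outputs, kept for inspection
for i in tqdm(range(train_data.shape[0])):
    row = train_data.iloc[i]
    t = [row['id'], row['answer']]
    try:
        res = llm(system_prompt + human_prompt(i))
        model_ans.append((row['id'], res))
        # An output such as "Assistant: (A, B, E)." is parsed into ['A', 'B', 'E']
        letters = res.split(':')[1].split('.')[0].strip().replace('(', '').replace(')', '').strip().split(', ')
        if len(letters[0]) == 1:
            # Keep at most 3 letters so every row matches the 5 DataFrame columns
            t.extend(letters[:3])
        else:
            # The output could not be parsed into letters: record empty predictions
            t.extend(['-'] * 3)
    except Exception:
        # No ':' in the output (IndexError) or a generation error: record empty predictions
        t.extend(['-'] * 3)
    ans.append(t)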
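The split-based parsing in the loop depends on the output containing a colon and on the letters appearing before the first period after it. A more tolerant alternative, swapped in here as a sketch rather than the notebook's method, is to search for standalone capital letters A to E with a regular expression. The hypothetical `parse_letters` keeps the first three distinct letters and pads with `'-'`, so every row always gets exactly three predictions. One caveat: it would also match the English article "A" if the model answers in free text.

```python
import re

def parse_letters(res: str, n: int = 3) -> list:
    """Extract up to n distinct option letters (A-E), padding with '-' when fewer are found."""
    # If the model echoed the role name, look only at the text after the last "Assistant:"
    tail = res.rsplit("Assistant:", 1)[-1]
    letters = []
    for letter in re.findall(r"\b[A-E]\b", tail):
        if letter not in letters:  # drop repeated letters, keeping the first occurrence
            letters.append(letter)
    letters = letters[:n]
    return letters + ["-"] * (n - len(letters))

print(parse_letters(" Assistant: (A, B, E)."))
print(parse_letters("I don't know"))
```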

# Formatting and saving the results

In [None]:
# Rows with fewer than 3 predicted letters have missing values; mark them with '-'
ans = pd.DataFrame(ans, columns=['id', 'answer', 'prediction1', 'prediction2', 'prediction3'])
ans.fillna('-', inplace=True)

In [None]:
# Join the three predictions into one space-separated string, e.g. "A B E"
cols = ['prediction1', 'prediction2', 'prediction3']
ans['prediction'] = ans[cols].apply(lambda x: ' '.join(x.values.astype(str)), axis=1)
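Because the training data includes the correct `answer`, the ranked predictions can be scored before that column is dropped. The top-3, space-separated submission format suggests a metric like mean average precision at 3 (MAP@3), but the notebook does not state the metric, so treat this as an assumption. With one correct answer per question, a question scores 1/rank if the correct letter is in the top 3, and 0 otherwise. `map_at_3` is a hypothetical helper.

```python
def map_at_3(answers: list, predictions: list) -> float:
    """Mean average precision at 3, assuming one correct letter per question.

    Each prediction is a space-separated ranking such as "A B E".
    """
    total = 0.0
    for answer, prediction in zip(answers, predictions):
        ranked = prediction.split()[:3]
        if answer in ranked:
            # index() returns the first occurrence, so repeated letters are not double-counted
            total += 1.0 / (ranked.index(answer) + 1)
    return total / len(answers)

print(map_at_3(['A', 'B'], ['A B C', 'C B A']))  # (1 + 1/2) / 2
```

In the notebook this could be called as `map_at_3(ans['answer'].tolist(), ans['prediction'].tolist())` before the `answer` column is removed.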

In [None]:
# Keep only the id and prediction columns, then write the submission file
cols_to_delete = ['answer', 'prediction1', 'prediction2', 'prediction3']
ans.drop(cols_to_delete, axis=1, inplace=True)
ans.to_csv('submission.csv', index=False)