Llama 2 7B fine-tuned to revive Rick
Get llama.cpp (all commands are run in the terminal) (you need git)
git clone https://github.com/ggerganov/llama.cpp.git
Build for M1/M2 (you need make)
cd llama.cpp
LLAMA_METAL=1 make
cd ..
Get Llama 2 7B (you need curl)
mkdir models
curl -L "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q3_K_M.bin" -o models/llama-2-7b-chat.ggmlv3.q3_K_M.bin
I used llama-2-7b-chat.ggmlv3.q3_K_M.bin
- Llama 2 7B quantized to 3 bits, 3.28 GB in size, with 5.78 GB max RAM required
According to the model card on Hugging Face:
New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K
Probably a wise choice
Run it
./llama.cpp/main -m ./models/llama-2-7b-chat.ggmlv3.q3_K_M.bin \
--color \
--ctx_size 2048 \
-n -1 \
-ins -b 256 \
--top_k 10000 \
--temp 0.2 \
--repeat_penalty 1.1 \
-t 8
Now when I prompt the model with "Pretend you're Rick Sanchez from Rick and Morty cartoon.", it comes back with a really good response
Woah, dude! adjusts sunglasses What's up? It's me, Rick Sanchez, the coolest, most brilliant scientist in the multiverse! takes a hit from a bong You know, I've been working on some pretty wild stuff lately. Like, have you heard of my latest invention? coughs It's a device that can manipulate reality itself! adjusts sunglasses With this baby, I can make anything happen! laughs maniacally But enough about that. What's your problem, dude? Need some help with something? cracks knuckles Just give me a shout and I'll be there in a jiffy! adjusts sunglasses again Don't worry, man, I'm here to save the day... or at least make it more interesting. winks
It sounds a bit like Rick, right? But I hope we can make it better
Let's figure out how to fine-tune it
I find this dataset with a Rick and Morty transcript, which we may use for fine-tuning
Btw, clone this repository if you haven't done so yet
git clone https://github.com/jmaczan/c-137.git
Ok, so I'm going to use QLoRA, which is an efficient way of fine-tuning: the base model's weights are quantized to a low bit width (4-bit) and only small LoRA adapter weights are trained on top of it. Don't expect me to provide more details, I'm still learning as well!
Wait a second. A quick look at the mlabonne/guanaco-llama2-1k dataset makes me wonder whether the transcript needs similar pre-processing as well. So all segments likely need to be in the format
<s>[INST] {human_text} [/INST] {assistant_text} </s>
or
<s>[INST] {human_text} [/INST] </s>
What is the human text and what is the assistant text in the case of the transcript? That's a decision to make. We have info about the speaker and their line, so maybe we can loop through all rows, merging all non-Rick dialogue into human_text, and once Rick is speaking, put his line into assistant_text, repeating this until we reach the end of the transcript. But I dislike the descriptions of actions that appear in the dialogues, because they are indistinguishable from the words a character actually says. I find another transcript dataset on Kaggle which seems not to have these action descriptions
I find this neat colab and decide to go with fine-tuning their way. They're using the AlexanderDoria/novel17_test dataset which, when you inspect it, is a JSONL-style file (it wraps all content in an array, so you can keep all the data on one line, which saves disk space). A sample object has this structure
{"text":"### Human: Some human text going here### Assistant: Assistant's response here"},{"text":"... and so on
The dataset I'm going to use is a CSV with the following columns: index, season no., episode no., episode name, name, line
We can skip all columns except name and line, unless we want to do some fancy stuff like making Rick recognize in which episode he said a given line
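As a sketch, loading just those two columns with pandas could look like this (the CSV file name here is an assumption; use whatever the downloaded Kaggle file is called on your machine):

```python
import pandas as pd

# File name is an assumption; adjust to the actual downloaded Kaggle CSV
df = pd.read_csv("RickAndMortyScripts.csv", usecols=["name", "line"])
print(df.head())
```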
My initial approach to data processing will be to iterate through the file as follows (a runnable sketch of this loop comes right after the list)
GLOBAL INITIALIZATION
- once, at the beginning of processing, create a prompt template
{"text":"### Human: {other_lines} ### Assistant: {rick_lines}"}
- create temporary empty strings other_lines and rick_lines
- we will have two pointers (one local and one global)
- global pointer initialized to -1
- global output initialized to an empty string
- calculate the total number of lines and store it in a total_lines_count variable
NEXT GLOBAL LINE
- global pointer += 1
- if the global pointer is equal to total_lines_count, stop the script and save the global output to a data.jsonl file
- local pointer initialized to 0
- reset other_lines and rick_lines to empty strings
NEXT LOCAL LINE
- get the row at index = global + local
- if the current line is not Rick's line and rick_lines is not empty:
  - add the template filled with other_lines and rick_lines to the global output and append ',' at the end of the prompt template
  - go to NEXT GLOBAL LINE
- preprocess the row's line content and store it in a local variable preprocessed_line
  - remove all double quotes from lines
- if the current line is not Rick's line, then append it: other_lines += preprocessed_line + ' '
- if the current line is Rick's line, then append it: rick_lines += preprocessed_line + ' '
- local pointer += 1
- go to NEXT LOCAL LINE
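To make this concrete, here is a minimal Python/pandas sketch of that loop. It collapses the two-pointer scheme into a single pass (one sample per Rick turn), strips double quotes as described, and writes everything to data.jsonl. The file name and the assumption that Rick's lines are labeled "Rick" in the name column come from the Kaggle dataset above, so adjust them if yours differ.

```python
import pandas as pd

# File name and the "Rick" label are assumptions based on the Kaggle dataset above
df = pd.read_csv("RickAndMortyScripts.csv", usecols=["name", "line"])

TEMPLATE = '{{"text":"### Human: {other_lines} ### Assistant: {rick_lines}"}}'

samples = []
other_lines, rick_lines = "", ""

for _, row in df.iterrows():
    # Remove double quotes so the dialogue doesn't break the JSON-like template
    preprocessed_line = str(row["line"]).replace('"', "")
    if row["name"] == "Rick":
        rick_lines += preprocessed_line + " "
    else:
        if rick_lines:
            # Rick finished speaking: emit one sample and start collecting the next one
            if other_lines:
                samples.append(TEMPLATE.format(other_lines=other_lines.strip(),
                                               rick_lines=rick_lines.strip()))
            other_lines, rick_lines = "", ""
        other_lines += preprocessed_line + " "

with open("data.jsonl", "w") as f:
    # Samples separated by commas, one per line
    f.write(",\n".join(samples))
```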
We can also think of
- remove the first row, because it doesn't have another person's text before Rick's line
- create a bunch of questions and answers to teach Rick who he is, who the people around him are, etc.
- if we wanted to clear Rick's memory, we could remove all names from the dataset and replace them with "you" or something
- combining all names and lines before Rick's line into one prompt, like
{"text":"### Human: Jerry said 'Damn it!'. Beth said 'Jerry!'. Jerry said 'Beth!'. Summer said 'Oh my god, my parents are so loud, I want to die.' ### Assistant: Mm, there is no God, Summer. You gotta rip that band-aid off now. You'll thank me later."}
I'm not sure which way of combining a few lines into the Human prompt would be the most effective, so I guess it needs some experiments
Let's start implementing the script for preparing the training data
First, create a venv
python3 -m venv venv
I have Python installed as python3; you may need to replace it with python or whatever you named it. The same goes for pip3 -> pip
Activate it
source venv/bin/activate
I use pandas to manipulate the CSV data
pip install pandas
Since the dataset is on Kaggle (you need to sign up first), let's install kaggle as well
pip install kaggle
Go to settings in the Kaggle web app and click Create New Token. It generates and downloads an API token, which you need to copy into ~/.kaggle/kaggle.json (you need to mkdir ~/.kaggle first)
Set the correct permissions on the file
chmod 600 ~/.kaggle/kaggle.json
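With the token in place you can fetch the transcript either with the Kaggle CLI or from Python; here is a small sketch using the Python API (the dataset slug comes from the Kaggle link at the bottom of this post):

```python
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads ~/.kaggle/kaggle.json
# Download and unzip the Rick and Morty transcript dataset into ./data
api.dataset_download_files("andradaolteanu/rickmorty-scripts", path="data", unzip=True)
```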
Let's install langchain to help with some stuff; for now I'm just going to use it for prompt formatting
pip install langchain
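For the prompt formatting part, something like PromptTemplate is all I need for now; a rough sketch (the dialogue and variable values here are just illustrative):

```python
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["other_lines", "rick_lines"],
    template="### Human: {other_lines} ### Assistant: {rick_lines}",
)

# Illustrative dialogue, not from the real transcript
print(prompt.format(other_lines="Rick, are you okay?", rick_lines="Never better, Morty."))
```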
Ok, the code to prepare the data is done. You can check it here
I pushed the dataset to Hugging Face, so you can reuse it
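If you just want the processed data, a short sketch of pulling it with the datasets library (the repo id is the one linked at the bottom of this post; the train split and the text column are my assumptions):

```python
from datasets import load_dataset

# Repo id taken from the links below; split and column name are assumptions
dataset = load_dataset("jmaczan/rick-and-morty-scripts-llama-2", split="train")
print(dataset[0]["text"])
```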
I take this great Google Colab, adjust it a little to use my dataset and run it
And that's it actually. The finished model is available here on Hugging Face
The model seems to be crappy. I'm not sure what I did wrong. I think the prompt format could be bad, actually. I learn that it's likely the Vicuna 1 format
Time for second round of training
I do another training based on this article
In order to make it work, I need to adjust the data. You can see a script here
It produces dataset.csv, which needs some post-processing, because it appends some extra empty columns
I publish the correct dataset on Hugging Face and update the old one to have proper naming
Anyway, here it is! A Google Colab with fine-tuning via PEFT, using Torch, Transformers, trl for SFT, and QLoRA on 4 bits. The foundation model is meta-llama/Llama-2-7b-chat-hf
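The core of that setup boils down to something like the sketch below: load the base model in 4-bit, attach a LoRA adapter, and run supervised fine-tuning with trl. Treat it as an outline rather than the exact notebook; the hyperparameters are placeholders and the exact SFTTrainer arguments depend on your trl version.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

base_model = "meta-llama/Llama-2-7b-chat-hf"
dataset = load_dataset("jmaczan/rick-and-morty-scripts-llama-2", split="train")

# QLoRA: the frozen base weights are loaded in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Only the small LoRA adapter weights get trained
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # column holding the formatted prompts
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="results",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()
trainer.model.save_pretrained("llama-2-7b-qlora-rick-sanchez-c-137")
```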
To see how it performs, you can click here to scroll to the example
The model is hosted on Hugging Face as well
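A rough sketch of loading it for inference with peft, assuming the repo contains a LoRA adapter to apply on top of the base chat model (if the weights are merged instead, load the repo directly with AutoModelForCausalLM); the prompt is just the one from earlier in this post:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", device_map="auto")
model = PeftModel.from_pretrained(base, "jmaczan/llama-2-7b-qlora-rick-sanchez-c-137")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "Pretend you're Rick Sanchez from Rick and Morty cartoon."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```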
🤗 Thanks for reading
https://replicate.com/blog/run-llama-locally
https://github.com/ggerganov/llama.cpp
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML
https://towardsdatascience.com/fine-tune-your-own-llama-2-model-in-a-colab-notebook-df9823a04a32
https://huggingface.co/datasets/Prarabdha/Rick_and_Morty_Transcript
https://www.kaggle.com/datasets/andradaolteanu/rickmorty-scripts
https://gist.github.com/younesbelkada/9f7f75c94bdc1981c8ca5cc937d4a4da
https://github.com/artidoro/qlora
https://colab.research.google.com/drive/12dVqXZMIVxGI0uutU6HG9RWbWPXL3vts
https://huggingface.co/datasets/jmaczan/rick-and-morty-scripts-llama-2
https://huggingface.co/jmaczan/llama-2-7b-qlora-rick-sanchez-c-137
Inspiration: https://towardsdatascience.com/make-your-own-rick-sanchez-bot-with-transformers-and-dialogpt-fine-tuning-f85e6d1f4e30