Llama 2 7B fine-tuned to revive Rick
Get llama.cpp (all commands are run in the terminal) (you need git)
git clone https://github.com/ggerganov/llama.cpp.git
Build for M1/M2 (you need make)
cd llama.cpp
LLAMA_METAL=1 make
cd ..
Get Llama 2 7B (you need curl)
mkdir models
curl -L "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q3_K_M.bin" -o models/llama-2-7b-chat.ggmlv3.q3_K_M.bin
I used llama-2-7b-chat.ggmlv3.q3_K_M.bin
- Llama 2 7B quantized to 3 bits, 3.28 GB in size, with 5.78 GB max RAM required
According to the model card on Hugging Face:
New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K
Probably a wise choice
Run it
./llama.cpp/main -m ./models/llama-2-7b-chat.ggmlv3.q3_K_M.bin \
--color \
--ctx_size 2048 \
-n -1 \
-ins -b 256 \
--top_k 10000 \
--temp 0.2 \
--repeat_penalty 1.1 \
-t 8
Now when I prompt the model with "Pretend you're Rick Sanchez from Rick and Morty cartoon.", it comes back with a really good response
Woah, dude! adjusts sunglasses What's up? It's me, Rick Sanchez, the coolest, most brilliant scientist in the multiverse! takes a hit from a bong You know, I've been working on some pretty wild stuff lately. Like, have you heard of my latest invention? coughs It's a device that can manipulate reality itself! adjusts sunglasses With this baby, I can make anything happen! laughs maniacally But enough about that. What's your problem, dude? Need some help with something? cracks knuckles Just give me a shout and I'll be there in a jiffy! adjusts sunglasses again Don't worry, man, I'm here to save the day... or at least make it more interesting. winks
It sounds a bit like Rick, right? But I hope we can make it better
Let's figure out how to fine-tune it
I find this dataset with a Rick and Morty transcript, which we may use for fine-tuning
Btw, clone this repository if you haven't done so yet
git clone https://github.com/jmaczan/c-137.git
Ok, so I'm going to use QLoRA, which is an efficient way of fine-tuning: the base model's weights are quantized to a low bit width (4-bit) and only small LoRA adapter weights are trained on top of it. Don't expect me to provide more details, I'm still learning as well!
Wait a second. A quick look at the mlabonne/guanaco-llama2-1k dataset makes me wonder whether the transcript needs similar pre-processing as well. So all segments likely need to be in the format
<s>[INST] {human_text} [/INST] {assistant_text} </s>
or
<s>[INST] {human_text} [/INST] </s>
What is the human text and what is the assistant text in the case of the transcript? That's a decision to make. We have info about the speaker and their line, so maybe we can loop through all rows, merging all non-Rick dialogue into human_text, and once Rick is speaking, put his line into assistant_text, repeating this until we reach the end of the transcript. But I dislike the descriptions of actions that appear in the dialogues, because they are indistinguishable from the words a character actually says. I find another transcript dataset on Kaggle which seems not to have these action descriptions
I find this neat colab and decide to go with fine-tuning their way. They're using the AlexanderDoria/novel17_test dataset which, when you inspect it, is a JSONL-style file (it wraps all content in an array, so you can keep all the data on one line, which saves disk space). A sample object has this structure
{"text":"### Human: Some human text going here### Assistant: Assistant's response here"},{"text":"... and so on
The dataset I'm going to use is a CSV with the following columns: index, season no., episode no., episode name, name, line
We can skip all columns except name and line, unless we want to do some fancy stuff like making Rick recognize in which episode he said a given line
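As a sketch, loading just those two columns with pandas could look like this (the CSV file name here is an assumption; use whatever the downloaded Kaggle file is called on your machine):

```python
import pandas as pd

# File name is an assumption; adjust to the actual downloaded Kaggle CSV
df = pd.read_csv("RickAndMortyScripts.csv", usecols=["name", "line"])
print(df.head())
```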
My initial approach to data processing will be to iterate through the file as follows (a runnable sketch of this loop comes right after the list)
GLOBAL INITIALIZATION
- once, at the beginning of processing, create a prompt template
{"text":"### Human: {other_lines} ### Assistant: {rick_lines}"}
- create temporary empty strings other_lines and rick_lines
- we will have two pointers (one local and one global)
- global pointer initialized to -1
- global output initialized to an empty string
- calculate the total number of lines and store it in a total_lines_count variable
NEXT GLOBAL LINE
- global pointer += 1
- if the global pointer is equal to total_lines_count, stop the script and save the global output to a data.jsonl file
- local pointer initialized to 0
- reset other_lines and rick_lines to empty strings
NEXT LOCAL LINE
- get the row at index = global + local
- if the current line is not Rick's line and rick_lines is not empty:
  - add the template filled with other_lines and rick_lines to the global output and append ',' at the end of the prompt template
  - go to NEXT GLOBAL LINE
- preprocess the row's line content and store it in a local variable preprocessed_line
  - remove all double quotes from lines
- if the current line is not Rick's line, then append it: other_lines += preprocessed_line + ' '
- if the current line is Rick's line, then append it: rick_lines += preprocessed_line + ' '
- local pointer += 1
- go to NEXT LOCAL LINE
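To make this concrete, here is a minimal Python/pandas sketch of that loop. It collapses the two-pointer scheme into a single pass (one sample per Rick turn), strips double quotes as described, and writes everything to data.jsonl. The file name and the assumption that Rick's lines are labeled "Rick" in the name column come from the Kaggle dataset above, so adjust them if yours differ.

```python
import pandas as pd

# File name and the "Rick" label are assumptions based on the Kaggle dataset above
df = pd.read_csv("RickAndMortyScripts.csv", usecols=["name", "line"])

TEMPLATE = '{{"text":"### Human: {other_lines} ### Assistant: {rick_lines}"}}'

samples = []
other_lines, rick_lines = "", ""

for _, row in df.iterrows():
    # Remove double quotes so the dialogue doesn't break the JSON-like template
    preprocessed_line = str(row["line"]).replace('"', "")
    if row["name"] == "Rick":
        rick_lines += preprocessed_line + " "
    else:
        if rick_lines:
            # Rick finished speaking: emit one sample and start collecting the next one
            if other_lines:
                samples.append(TEMPLATE.format(other_lines=other_lines.strip(),
                                               rick_lines=rick_lines.strip()))
            other_lines, rick_lines = "", ""
        other_lines += preprocessed_line + " "

with open("data.jsonl", "w") as f:
    # Samples separated by commas, one per line
    f.write(",\n".join(samples))
```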
We can also think of
- remove the first row, because it doesn't have another person's text before Rick's line
- create a bunch of questions and answers to teach Rick who he is, who the people around him are, etc.
- if we wanted to clear Rick's memory, we could remove all names from the dataset and replace them with "you" or something
- combining all names and lines before Rick's line into one prompt, like
{"text":"### Human: Jerry said 'Damn it!'. Beth said 'Jerry!'. Jerry said 'Beth!'. Summer said 'Oh my god, my parents are so loud, I want to die.' ### Assistant: Mm, there is no God, Summer. You gotta rip that band-aid off now. You'll thank me later."}
I'm not sure which way of combining a few lines into the Human prompt would be the most effective, so I guess it needs some experiments
Let's start implementing the script for preparing the training data
First, create a venv
python3 -m venv venv
I have Python installed as python3; you may need to replace it with python or whatever you named it. The same goes for pip3 -> pip
Activate it
source venv/bin/activate
I use pandas to manipulate the CSV data
pip install pandas
Since the dataset is on Kaggle (you need to sign up first), let's install kaggle as well
pip install kaggle
Go to settings in the Kaggle web app and click Create New Token. It generates and downloads an API token, which you need to copy into ~/.kaggle/kaggle.json (you need to mkdir ~/.kaggle first)
Set the correct permissions on the file
chmod 600 ~/.kaggle/kaggle.json
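With the token in place you can fetch the transcript either with the Kaggle CLI or from Python; here is a small sketch using the Python API (the dataset slug comes from the Kaggle link at the bottom of this post):

```python
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads ~/.kaggle/kaggle.json
# Download and unzip the Rick and Morty transcript dataset into ./data
api.dataset_download_files("andradaolteanu/rickmorty-scripts", path="data", unzip=True)
```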
Let's install langchain to help with some stuff; for now I'm just going to use it for prompt formatting
pip install langchain
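For the prompt formatting part, something like PromptTemplate is all I need for now; a rough sketch (the dialogue and variable values here are just illustrative):

```python
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["other_lines", "rick_lines"],
    template="### Human: {other_lines} ### Assistant: {rick_lines}",
)

# Illustrative dialogue, not from the real transcript
print(prompt.format(other_lines="Rick, are you okay?", rick_lines="Never better, Morty."))
```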
Ok, the code to prepare the data is done. You can check it here
I pushed the dataset to Hugging Face, so you can reuse it
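If you just want the processed data, a short sketch of pulling it with the datasets library (the repo id is the one linked at the bottom of this post; the train split and the text column are my assumptions):

```python
from datasets import load_dataset

# Repo id taken from the links below; split and column name are assumptions
dataset = load_dataset("jmaczan/rick-and-morty-scripts-llama-2", split="train")
print(dataset[0]["text"])
```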
I take this great Google Colab, adjust it a little to use my dataset and run it
And that's it actually. The finished model is available here on Hugging Face
The model seems to be crappy. I'm not sure what I did wrong. I think the prompt format could be bad, actually. I learn that it's likely the Vicuna 1 format
Time for second round of training
I do another training based on this article
In order to make it work, I need to adjust the data. You can see a script here
It produces dataset.csv, which needs some post-processing, because it appends some extra empty columns
I publish the correct dataset on Hugging Face and update the old one to have proper naming
Anyway, here it is! A Google Colab with fine-tuning via PEFT, using Torch, Transformers, trl for SFT, and QLoRA on 4 bits. The foundation model is meta-llama/Llama-2-7b-chat-hf
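The core of that setup boils down to something like the sketch below: load the base model in 4-bit, attach a LoRA adapter, and run supervised fine-tuning with trl. Treat it as an outline rather than the exact notebook; the hyperparameters are placeholders and the exact SFTTrainer arguments depend on your trl version.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

base_model = "meta-llama/Llama-2-7b-chat-hf"
dataset = load_dataset("jmaczan/rick-and-morty-scripts-llama-2", split="train")

# QLoRA: the frozen base weights are loaded in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Only the small LoRA adapter weights get trained
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # column holding the formatted prompts
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="results",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
)
trainer.train()
trainer.model.save_pretrained("llama-2-7b-qlora-rick-sanchez-c-137")
```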
To see how it performs, you can click here to scroll to the example
The model is hosted on Hugging Face as well
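A rough sketch of loading it for inference with peft, assuming the repo contains a LoRA adapter to apply on top of the base chat model (if the weights are merged instead, load the repo directly with AutoModelForCausalLM); the prompt is just the one from earlier in this post:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", device_map="auto")
model = PeftModel.from_pretrained(base, "jmaczan/llama-2-7b-qlora-rick-sanchez-c-137")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "Pretend you're Rick Sanchez from Rick and Morty cartoon."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```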
🤗 Thanks for reading
https://replicate.com/blog/run-llama-locally
https://github.com/ggerganov/llama.cpp
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML
https://towardsdatascience.com/fine-tune-your-own-llama-2-model-in-a-colab-notebook-df9823a04a32
https://huggingface.co/datasets/Prarabdha/Rick_and_Morty_Transcript
https://www.kaggle.com/datasets/andradaolteanu/rickmorty-scripts
https://gist.github.com/younesbelkada/9f7f75c94bdc1981c8ca5cc937d4a4da
https://github.com/artidoro/qlora
https://colab.research.google.com/drive/12dVqXZMIVxGI0uutU6HG9RWbWPXL3vts
https://huggingface.co/datasets/jmaczan/rick-and-morty-scripts-llama-2
https://huggingface.co/jmaczan/llama-2-7b-qlora-rick-sanchez-c-137
Inspiration: https://towardsdatascience.com/make-your-own-rick-sanchez-bot-with-transformers-and-dialogpt-fine-tuning-f85e6d1f4e30