- Review the `lora.yml` config
  - Update your HuggingFace repository + remember to log in via the CLI (`huggingface-cli login`)
  - Update your Weights & Biases project + remember to log in via the CLI (`wandb login`)
  - Update the dataset path + output path
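
The two logins only need to be done once per machine. In Axolotl configs the values to update usually live under keys like `hub_model_id`, `wandb_project`, the dataset `path`, and `output_dir`, but check your `lora.yml` for the exact names. The CLI logins:

```bash
# Run once on the training machine so Axolotl can push the adapter and log the run.
huggingface-cli login   # paste a HuggingFace token with write access
wandb login             # paste your Weights & Biases API key
```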
- Generate the dataset with `prepare_data.jl`. It will create `leaderboard_code_gen_data_trainset11.jsonl`.
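
For example (assuming the script runs standalone from the project root; you may need to activate the project environment first):

```bash
# Generates the Axolotl training file in the project root.
julia prepare_data.jl
# -> leaderboard_code_gen_data_trainset11.jsonl
```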
**You require a GPU from here onward.**
- Fine-tune with Axolotl. Follow the 3 steps in `axolotl_script.sh`
  - All steps are executed from the `axolotl` folder (the main folder of the Docker container / VS Code)
  - Preprocess the datasets
  - Fine-tune the model based on the `lora.yml` config
  - Merge the adapter into the base model
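
The three steps roughly map onto the standard Axolotl CLI; a minimal sketch (check `axolotl_script.sh` for the exact flags and output paths used here):

```bash
# Run from the axolotl/ folder.
python -m axolotl.cli.preprocess lora.yml          # 1) tokenize & cache the dataset
accelerate launch -m axolotl.cli.train lora.yml    # 2) LoRA fine-tune
python -m axolotl.cli.merge_lora lora.yml \
  --lora_model_dir="./lora-out"                    # 3) merge the adapter into the base model
                                                   #    (path = output_dir from lora.yml)
```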
You could stop here.
- The adapter will be automatically uploaded to the HuggingFace Hub (it is only c. 85 MB).
- The training run will be recorded by Weights & Biases, so you can investigate your loss curves (and pick the right checkpoint).
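
If you do stop here, you can pull the adapter back down on any machine; a sketch (the repo id is whatever you set in `lora.yml`, shown here as a placeholder):

```bash
# Download the LoRA adapter that Axolotl pushed to the Hub (replace with your repo id).
huggingface-cli download your-username/cheater-7b-lora
```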
- Convert your model into GGUF and quantize it - script `gguf_conversion.sh`
  - It sets up a new Python environment and changes the current directory to `llama.cpp/`
  - Downloads and compiles llama.cpp
  - Installs its Python dependencies
  - Converts the LoRA-merged model into an FP16 file
  - Quantizes the model into the selected format (e.g., Q5_K_M)
  - Uploads to the HuggingFace Hub. Remember to change the HuggingFace repo name!
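
A rough sketch of what the script does (file names, paths, and repo names below are placeholders; see `gguf_conversion.sh` for the exact ones, and note that the conversion/quantization entry points have been renamed in newer llama.cpp releases):

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make              # build llama.cpp, including the quantize tool
pip install -r requirements.txt   # Python deps for the conversion script

# Convert the merged (base + LoRA) model to a 16-bit file, then quantize it.
python convert.py ../merged-model --outtype f16 --outfile cheater-7b.fp16.gguf
./quantize cheater-7b.fp16.gguf cheater-7b.Q5_K_M.gguf Q5_K_M

# Upload the quantized file to your own HF repo (change the repo name!).
huggingface-cli upload your-username/cheater-7b-gguf cheater-7b.Q5_K_M.gguf
```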
## [OPTIONAL] Benchmarking the results
- Install Julia if you're running remotely - script `julia_install.sh`
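
A minimal sketch of a remote install via juliaup, the official installer (`julia_install.sh` may pin a specific version instead):

```bash
# Installs juliaup and the latest stable Julia (interactive prompt).
curl -fsSL https://install.julialang.org | sh
```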
- Install vLLM and launch the server - script `vllm_server.sh` (executed from the main project directory `axolotl/`)
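
A minimal sketch of launching an OpenAI-compatible vLLM server; the model id and flags below are assumptions, `vllm_server.sh` has the exact ones:

```bash
pip install vllm
# Serve the base model (swap in your merged model as needed).
python -m vllm.entrypoints.openai.api_server \
  --model cognitivecomputations/dolphin-2.6-mistral-7b \
  --port 8000
```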
- Benchmark the base model (easiest with the vLLM server) - see `benchmark_finetune.jl`
- Benchmark the base model + LoRA adapter - see `benchmark_finetune.jl`
You can shut down the vLLM server now.
- Start the llama.cpp server - see `llama_server.sh` (executed from the directory `llama.cpp/`)
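
A minimal sketch of serving the quantized file (the file name is a placeholder; the server binary is `./server` in older llama.cpp builds and `llama-server` in newer ones):

```bash
# Run from llama.cpp/ after building; serves an HTTP API on port 8080 by default.
./server -m cheater-7b.Q5_K_M.gguf -c 2048 --port 8080
```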
Benchmark the GGUF quantized model - see
benchmark_finetune.jl
## Tips
- If you get GPU / CUDA memory errors, reduce the `micro_batch_size`
- If your loss quickly starts increasing, your `learning_rate` is too high. Halve it.
- Watch your GPU utilization and GPU memory stats with `watch -n0.1 nvidia-smi`. If utilization is not close to 100%, you're wasting time :)
- Set the LoRA hyperparameters based on https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-in-depth-analysis-with-llama-2
## Training dataset
- File: `leaderboard_code_gen_data_trainset11.jsonl`
- It contains 11 of the 14 test cases, using the top 50 samples that scored full points (100 points).
- The held-out test cases are `q_and_a_extractor`, `extract_julia_code`, and `pig_latinify`.
## Overall Results
| Model | Elapsed | Score | Score Std Deviation | Count Zero Score | Count Full Score | Cost Cents |
|---|---|---|---|---|---|---|
| cheater-7b.Q5_K_M | 2.1 | 86.7 | 26.3 | 5 | 265 | 0.0 |
| cheater-7b | 6.1 | 85.0 | 27.4 | 11 | 242 | 0.0 |
| gpt-4-1106-preview | 22.5 | 68.9 | 35.0 | 50 | 149 | 1.21 |
| dolphin-2.6-mistral | 8.2 | 57.9 | 30.4 | 32 | 66 | 0.0 |
## By test case
| Test case | cheater-7b | cheater-7b.Q5_K_M | dolphin-2.6-mistral | gpt-4-1106-preview |
|---|---|---|---|---|
| timezone_bumper | 97.0 | 100.0 | 84.4 | 91.1 |
| FloatWithUnits | 100.0 | 100.0 | 82.0 | 67.9 |
| clean_column | 95.8 | 100.0 | 70.8 | 80.8 |
| count_model_rows | 96.0 | 100.0 | 51.5 | 87.9 |
| weather_data_analyzer | 98.0 | 92.0 | 57.8 | 86.2 |
| keep_only_names | 100.0 | 100.0 | 48.5 | 81.2 |
| wrap_string | 89.5 | 85.0 | 67.4 | 87.4 |
| event_scheduler | 97.8 | 100.0 | 56.6 | 63.0 |
| ispersonal | 100.0 | 100.0 | 60.0 | 50.0 |
| add_yearmonth | 95.0 | 100.0 | 45.2 | 65.0 |
| audi_filter | 93.8 | 100.0 | 55.5 | 51.8 |
| q_and_a_extractor | 55.0 | 73.3 | 42.3 | 47.6 |
| extract_julia_code | 48.4 | 38.5 | 55.6 | 50.0 |
| pig_latinify | 23.5 | 25.0 | 32.8 | 54.8 |