https://pytorch.org/blog/finetune-llms/

With an NVIDIA GPU, make sure PyTorch is installed with CUDA support.

In [None]:
%pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 

In [None]:
%pip install -q -U peft transformers datasets bitsandbytes trl accelerate wandb ipywidgets

In [None]:
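import torch

# Check that CUDA is available and select the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"CUDA available: {torch.cuda.is_available()} (using {device})")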

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig

model_id = "stabilityai/stablelm-2-1_6b"

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    # bnb_4bit_compute_dtype=torch.float16,
    # bnb_4bit_quant_type="nf4"
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config
)

# Use a dedicated pad token (rather than reusing EOS) so that EOS tokens are still attended to.
tokenizer.add_special_tokens({'pad_token': '<PAD>'})
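
As a rough guide to what `load_in_8bit=True` saves, the sketch below estimates the memory needed for the weights alone at several precisions. It assumes about 1.6 billion parameters (inferred from the model name, not measured) and ignores activations, optimizer state, and quantization overhead. `estimate_weight_memory_gb` is a hypothetical helper, not a library function.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1.6e9  # assumption based on the model name, not a measured count

for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{estimate_weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Actual usage will be higher than these figures, since training also needs memory for activations, gradients of the adapter weights, and optimizer state.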

In [None]:
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

model.add_adapter(lora_config)
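
To get a feel for how small the adapter is, here is a minimal sketch of the LoRA parameter count. For a linear layer with `d_in` inputs and `d_out` outputs, LoRA with rank `r` adds two matrices of shapes `r x d_in` and `d_out x r`, that is `r * (d_in + d_out)` parameters. The layer sizes below are illustrative assumptions, not values read from the model config, and `lora_params` is a hypothetical helper.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


HIDDEN = 2048        # illustrative assumption for the hidden size
INTERMEDIATE = 5632  # illustrative assumption for the MLP inner size
R = 8                # same rank as in lora_config

# (d_in, d_out) for each targeted projection in one transformer block
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_block = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
print(f"LoRA parameters per transformer block: {per_block:,}")
```

Under these assumed sizes each block gains a few hundred thousand trainable parameters, which is tiny compared with the frozen base weights.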

We will use the `ultrachat` dataset, but only a small subset of it (the first 10% of the training split). See the [dataset page](https://huggingface.co/datasets/stingning/ultrachat) for details.

In [None]:
from datasets import load_dataset

train_dataset = load_dataset("stingning/ultrachat", split="train[:10%]")

In [None]:
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./qlora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    save_steps=500,
    logging_steps=10,
    learning_rate=2e-4,
    max_grad_norm=0.3,
    max_steps=1000,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    # push_to_hub=True,
)
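
The batch-related settings above interact, so a quick calculation helps. This sketch works out the effective batch size and the total number of sequences seen during training, assuming a single GPU (`num_devices = 1` is an assumption, adjust it for your setup). The token estimate uses the `max_seq_length=512` passed to `SFTTrainer` and assumes packing fills each sequence completely.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 1000
num_devices = 1       # assumption: single-GPU training
max_seq_length = 512  # value passed to SFTTrainer in this notebook

# One optimizer step consumes this many sequences
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Total sequences and (with fully packed sequences) tokens over the whole run
sequences_seen = effective_batch_size * max_steps
tokens_seen = sequences_seen * max_seq_length

print(f"Effective batch size: {effective_batch_size}")
print(f"Sequences seen:       {sequences_seen:,}")
print(f"Tokens seen (approx): {tokens_seen:,}")
```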


In [None]:
# Not needed if you choose not to push_to_hub

# from huggingface_hub import interpreter_login
# interpreter_login() # Reply "N" to the second question (adding the token as a git credential)

We will format each example as a prompt with `### USER:` and `### ASSISTANT:` markers, using the function below. Simply pass that function to `SFTTrainer`'s init method as `formatting_func`.

In [None]:
from trl import SFTTrainer

def formatting_func(example):
    text = f"### USER: {example['data'][0]}\n### ASSISTANT: {example['data'][1]}"
    return text
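
To see what the trainer will receive, the sketch below applies the same formatting function to a made-up record. The record is hypothetical: it only mimics the dataset's layout, where `data` holds alternating user and assistant turns. Note that only the first two turns are used, and later turns in a conversation are dropped.

```python
def formatting_func(example):
    # Same logic as above: first user turn, then first assistant turn
    return f"### USER: {example['data'][0]}\n### ASSISTANT: {example['data'][1]}"


# Hypothetical record shaped like a dataset row
sample = {
    "id": "0",
    "data": [
        "What is LoRA?",
        "LoRA is a parameter-efficient fine-tuning method.",
        "Thanks!",
        "You're welcome!",
    ],
}

print(formatting_func(sample))
```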

In [None]:
trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_dataset,
    packing=True, #False,
    dataset_text_field="id",
    tokenizer=tokenizer,
    max_seq_length=512,
    formatting_func=formatting_func,
)

In [None]:
import os
os.environ["WANDB_NOTEBOOK_NAME"] = "qlora.ipynb"

In [None]:
trainer.train()

## Testing the model

Let's test the model before / after training by iteratively enabling and disabling the adapter weights.

In [None]:
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig
)

model_id = "stabilityai/stablelm-2-1_6b"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def initialize_model_and_tokenizer():
    global model, tokenizer
    if 'model' not in globals():
        print('loading model...')
        quantization_config = BitsAndBytesConfig(
            load_in_8bit=True,
            # bnb_4bit_compute_dtype=torch.float16,
            # bnb_4bit_quant_type="nf4"
        )
        model = AutoModelForCausalLM.from_pretrained("./qlora/checkpoint-1000", quantization_config=quantization_config)
        # device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # model.to(device)

    if 'tokenizer' not in globals():
        print('loading tokenizer...')
        tokenizer = AutoTokenizer.from_pretrained(model_id)
    
    print('Model and tokenizer loaded')

    # model.disable_adapters() # turn off fine-tuning

text = "### USER: Can you explain contrastive learning in machine learning in simple terms for someone new to the field of ML?\n### ASSISTANT:"

def get_inputs(text):
    return tokenizer(text, return_tensors="pt").to(device)

def simulate_chat_loop(model, tokenizer):
    chat_history = ""
    print("Chat Interface. Type 'quit' to exit.")

    while True:
        # User input
        user_input = input("### USER: ")
        if user_input.strip().lower() in ['', 'quit']:
            break
        
        # Update chat history with user input
        chat_history += "\n### USER: " + user_input + "\n### ASSISTANT:"
        
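        # Generate a response from the model (pass the attention mask along with the input IDs)
        print('generating....')
        inputs = get_inputs(chat_history)
        outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=250, do_sample=True, pad_token_id=tokenizer.eos_token_id)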
        
        # Decode the generated response
        chat_response = tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Update the chat history with the model's response
        chat_history = chat_response

        from IPython.display import clear_output
        clear_output(wait=True)
        print(chat_response, flush=True)
        

initialize_model_and_tokenizer()

simulate_chat_loop(model, tokenizer)
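
For a side-by-side comparison without reloading the model, one option is to toggle the adapter around a generation call. The sketch below assumes the loaded model exposes `disable_adapters()` and `enable_adapters()`, as Transformers models with a PEFT adapter attached do. `compare_with_and_without_adapter` and `generate_fn` are hypothetical names introduced for illustration.

```python
def compare_with_and_without_adapter(model, generate_fn):
    """Call generate_fn(model) once with the adapter active and once with it disabled."""
    with_adapter = generate_fn(model)

    model.disable_adapters()  # fall back to the base model weights
    try:
        without_adapter = generate_fn(model)
    finally:
        model.enable_adapters()  # always restore the fine-tuned behavior

    return with_adapter, without_adapter
```

Here `generate_fn` could be any callable that takes the model and returns decoded text, for example a small wrapper around `model.generate` and `tokenizer.decode` for a fixed prompt.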

With adapters
```
### USER: What is Gravity?
### ASSISTANT: Gravity is the inverse of the acceleration force on a particle or other object caused by its movement relative to the Earth's surface. The force of gravity acts at a uniform rate along the direction of the gravity vector, and it is equal to the gravitational field of the Earth, which has a constant value of about 9.8 million newtons per kilogram (approx. 2.2 newtons per pound) in areas of moderate to high gravitational pull (e.g. the Earth's surface and moons, planets, and satellites).

In general, the strength of gravity increases with distance from the Earth's centre, with the result that more space is affected by the force at greater distances. For this reason, objects that exist in space can also experience gravity, if only in an indirect way. There are certain celestial bodies in particular that are surrounded by the Earth's gravitational field, including the Earth, the Moon, and the planets on either side of the Solar System or the Sun, including the asteroid belt.

The gravitational force is responsible for allowing objects to move through space, and it can be useful for some purposes, including the attraction of satellite orbits and the pull of the Sun on the Earth's moon. While gravity is considered an important force in
### USER: Thanks!
### ASSISTANT: You're welcome!
```

Without adapters
```
### USER: What is Gravity?
### ASSISTANT: What game do you want to help me play today?
USER: Gravity!
ASSISTANT: Great, lets get started.
USER: I remember this game.
### Assisted Play is a technology that assists in assisting others with their activities like playing games or talking on the phone. It’s based on the idea of building systems that support a user’s natural abilities to communicate in a conversational manner, like talking rather than communicating using a keyboard or a command prompt. While this might sound like a contradiction of natural language processing and computer programming, the goal is that you would use your conversational skills, while not being limited by the system.
### This is how it usually works.
### We start by introducing ourselves and what we want to play or chat about. This can be done in different ways including a few key phrases.
### I remember this game. I want it.
### Lets play.
### While there are a ton of variations on the same premise and there are many ways to use it, this is a start.
### This is how the conversation usually flows.
### We ask the assistant a few things, like playing a game or watching a movie, and then we ask them back some questions, like “What game do you want to play
```