<a href="https://colab.research.google.com/github/mille055/duke_chatbot/blob/main/notebooks/Mistral_7B_finetune_0413.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import os

# Remove Colab default sample_data
!rm -r ./sample_data

# Clone GitHub files to colab workspace
repo_name = "duke_chatbot" # Enter repo name
git_path = 'https://github.com/changyuhsin1999/duke_chatbot.git'
!git clone "{git_path}"

Cloning into 'duke_chatbot'...
remote: Enumerating objects: 168, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 168 (delta 21), reused 10 (delta 10), pack-reused 138
Receiving objects: 100% (168/168), 8.67 MiB | 8.57 MiB/s, done.
Resolving deltas: 100% (82/82), done.


In [None]:
!pip install transformers datasets peft accelerate bitsandbytes trl safetensors torch --no-cache

Collecting datasets
  Downloading datasets-2.18.0-py3-none-any.whl (510 kB)
Collecting peft
  Downloading peft-0.10.0-py3-none-any.whl (199 kB)
Collecting accelerate
  Downloading accelerate-0.29.2-py3-none-any.whl (297 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)
Collecting trl
  Downloading trl-0.8.3-py3-none-any.whl (244 kB)

In [None]:
import json
import pandas as pd
import torch
from datasets import Dataset, load_dataset
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer

In [None]:
from datasets import load_dataset
from random import randrange

# Load dataset from the huggingface
dataset = load_dataset("cindy990915/duke-chat-rag", split="train")

# Split the dataset into training and test sets
train_test_split = dataset.train_test_split(test_size=0.2)  # 20% for testing, 80% for training

# Access the training and test sets
train_dataset = train_test_split['train']
test_dataset = train_test_split['test']

# Print some information about the splits
print(f"Training Set Size: {len(train_dataset)}")
print(f"Test Set Size: {len(test_dataset)}")

print(f"Dataset Size: {len(dataset)}")
print(dataset[randrange(len(dataset))])

Training Set Size: 548
Test Set Size: 138
Dataset Size: 686
{'answer': 'The first segment of the course focuses on an introduction to numerical programming and building skills in working with data via the Numpy and Pandas libraries.', 'query': 'What will the first segment of the course focus on?', 'context': ['lysis and visualization. The first segment of the course will be an introduction to numerical programming focused on building skills in working with data via the Numpy and Pandas libraries, two of the most common tools used by teams working with data and modeling. Technical aspects covered will include the types of data, methods of sourcing data via the web, APIs, and from domain-specific sensors and hardware (IoT devices), an increasingly common source of analytics data in technical industries. ', 'ng principles and tools. It covers foundational concepts and provides hands-on experience with critical skills including loading, cleaning, manipulating, visualizing, analyzing and in
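
The split sizes printed above can be checked by hand. This is a minimal sketch, assuming the test count is the requested fraction of the dataset rounded up, which is consistent with the printed 548/138 split; `split_sizes` is a hypothetical helper, not part of `datasets`.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test set up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_train, n_test = split_sizes(686, 0.2)
print(n_train, n_test)  # 548 138
```

Because the split is random, passing a fixed `seed` to `train_test_split` would be needed to get the same rows in each set across runs.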

In [None]:
def formatting_func(sample):
    text = f"""###You are a trusted advisor in this content, helping to explain the text to prospective or current students who are seeking answers to questions. Here is some context and the query:\n### Query: {sample['query']} \n### Answer: {sample['answer']}"""
    return text

In [None]:
from random import randrange

print(formatting_func(dataset[467]))

###You are a trusted advisor in this content, helping to explain the text to prospective or current students who are seeking answers to questions. Here is some context and the query:
### Query: What will the final module of the boot camp focus on? 
### Answer: The final module will focus on a review of probability and statistics with an emphasis on simulation of chance experiments.
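
The instruction text mentions "some context", but `formatting_func` only inserts the query and answer. A variant that also includes the retrieved passages might look like the sketch below. The `### Context:` label, the function name `format_with_context`, and the sample record are assumptions for illustration, not taken from the notebook or the dataset.

```python
INSTRUCTION = (
    "###You are a trusted advisor in this content, helping to explain the text to "
    "prospective or current students who are seeking answers to questions. "
    "Here is some context and the query:"
)

def format_with_context(sample):
    """Build a training prompt that includes the retrieved context passages."""
    context = "\n".join(sample["context"])  # 'context' is a list of strings in the dataset
    return (
        f"{INSTRUCTION}\n"
        f"### Context: {context}\n"  # hypothetical section label
        f"### Query: {sample['query']} \n"
        f"### Answer: {sample['answer']}"
    )

# Hypothetical record with the same keys as the dataset rows
example = {
    "context": ["First passage.", "Second passage."],
    "query": "What does the course cover?",
    "answer": "It covers the two passages.",
}
print(format_with_context(example))
```

Longer prompts would likely need a larger `max_length` than the 512 tokens used for tokenization, so this is a trade-off rather than a drop-in replacement.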


# Load base model

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

max_length = 512

def generate_and_tokenize_prompt(prompt):
    result = tokenizer(
        formatting_func(prompt),
        truncation=True,
        max_length=max_length,
        padding="max_length",
    )
    result["labels"] = result["input_ids"].copy()
    return result

tokenized_train_dataset = train_dataset.map(generate_and_tokenize_prompt)
tokenized_test_dataset = test_dataset.map(generate_and_tokenize_prompt)
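
Copying `input_ids` into `labels` means padded positions carry labels too. One common refinement is to replace the labels at padded positions with `-100`, the value that PyTorch's cross-entropy loss ignores by default. The sketch below shows the idea on plain lists; `mask_padding` and the token ids are hypothetical, and whether this step is needed here depends on the data collator used at training time.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def mask_padding(input_ids, attention_mask):
    """Return labels equal to input_ids, with padded positions set to IGNORE_INDEX."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]

# Hypothetical left-padded sequence: two pad tokens, then three real tokens
input_ids = [2, 2, 1, 415, 2]
attention_mask = [0, 0, 1, 1, 1]
print(mask_padding(input_ids, attention_mask))  # [-100, -100, 1, 415, 2]
```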

# Base Model Performance

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, resume_download=True)

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

`low_cpu_mem_usage` was None, now set to True since model is quantized.


model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

In [None]:
eval_prompt = " what does AIPI 520 covers? "
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    add_bos_token=True,
)
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100, repetition_penalty=1.15)[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 what does AIPI 520 covers? 1. The purpose of this standard is to provide a set of requirements for the design, development and testing of an integrated circuit (IC) that can be used in a wide range of applications.

## What are the benefits of using AIPI 520?

The main benefit of using AIPI 520 is that it provides a common set of requirements for ICs, which makes it easier for manufacturers to develop products that meet these requirements. This helps to


# PEFT and LoRA config

In [None]:
from peft import prepare_model_for_kbit_training
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 85041152 || all params: 3837112320 || trainable%: 2.2162799758751914
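
Both numbers in the output above can be reproduced with simple arithmetic. The sketch below assumes the Mistral-7B-v0.1 dimensions from the published model config (not stated in this notebook), that each 4-bit weight tensor is stored packed two weights per element, and that `lm_head` and the embeddings stay unquantized. Under those assumptions the totals match exactly, which also explains why "all params" reads about 3.8 billion rather than 7 billion.

```python
# Assumed Mistral-7B-v0.1 dimensions (from the published config, not from this notebook)
HIDDEN, INTERMEDIATE, KV_DIM, VOCAB, LAYERS = 4096, 14336, 1024, 32000, 32
RANK = 32  # r in the LoraConfig above

# (in_features, out_features) of each targeted linear layer inside one decoder block
BLOCK_LINEARS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, rank=RANK):
    """LoRA adds two matrices per layer: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)

# Trainable parameters: adapters on every block linear, plus one on lm_head
trainable = LAYERS * sum(lora_params(i, o) for i, o in BLOCK_LINEARS.values())
trainable += lora_params(HIDDEN, VOCAB)
print(trainable)  # 85041152

# Frozen parameters as counted by numel(): packed 4-bit block weights,
# plus unquantized embeddings, lm_head, and RMSNorm weights
block_weights = LAYERS * sum(i * o for i, o in BLOCK_LINEARS.values())
packed = block_weights // 2
unquantized = 2 * VOCAB * HIDDEN + (2 * LAYERS + 1) * HIDDEN
print(packed + unquantized + trainable)  # 3837112320
```

The reported 2.2% trainable share is therefore relative to the packed count; against the unpacked weight count the share would be roughly half that.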


In [None]:
import transformers
from datetime import datetime

project = "duke-chat"
base_model_name = "mistral"
run_name = base_model_name + "-" + project
output_dir = "./" + run_name

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_test_dataset,
    args=transformers.TrainingArguments(
        output_dir=output_dir,
        warmup_steps=1,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=1,
        gradient_checkpointing=True,
        max_steps=100,
        learning_rate=2.5e-4, # Want a small lr for finetuning
        #bf16=True,
        optim="paged_adamw_8bit",
        logging_steps=25,            # Log the training loss every 25 steps
        logging_dir="./logs",        # Directory for storing logs
        save_strategy="steps",       # Save checkpoints on a step schedule
        save_steps=25,               # Save a checkpoint every 25 steps
        evaluation_strategy="steps", # Evaluate on a step schedule
        eval_steps=25,               # Evaluate every 25 steps
        do_eval=True,                # Enable evaluation on eval_dataset
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # The KV cache is incompatible with gradient checkpointing; re-enable it for inference
trainer.train()



Step,Training Loss,Validation Loss
25,0.8648,0.17377
50,0.1437,0.150194
75,0.1343,0.12944
100,0.117,0.121804
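
The validation losses in the table can be read as perplexities. This is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats, in which case perplexity is its exponential; `perplexity` is a hypothetical helper name.

```python
import math

# Validation loss at each evaluation step, copied from the table above
eval_losses = {25: 0.17377, 50: 0.150194, 75: 0.12944, 100: 0.121804}

def perplexity(loss):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)

for step, loss in eval_losses.items():
    print(f"step {step}: perplexity {perplexity(loss):.3f}")
```

Lower is better, and a value of 1.0 would mean the model assigns probability 1 to every target token.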


Checkpoint destination directory ./mistral-duke-chat/checkpoint-25 already exists and is non-empty. Saving will proceed but saved results may be invalid.


TrainOutput(global_step=100, training_loss=0.31491842985153196, metrics={'train_runtime': 6848.6792, 'train_samples_per_second': 0.058, 'train_steps_per_second': 0.015, 'total_flos': 8842077693542400.0, 'train_loss': 0.31491842985153196, 'epoch': 0.73})
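
The `'epoch': 0.73` value follows from the step count, batch size, and training-set size. The sketch below assumes a single GPU; `epochs_completed` is a hypothetical helper.

```python
def epochs_completed(steps, per_device_batch, grad_accum, n_train, n_devices=1):
    """Fraction of the training set seen after a given number of optimizer steps."""
    samples_seen = steps * per_device_batch * grad_accum * n_devices
    return samples_seen / n_train

# max_steps=100, per_device_train_batch_size=4, gradient_accumulation_steps=1, 548 training rows
print(round(epochs_completed(100, 4, 1, 548), 2))  # 0.73
```

In other words, 100 steps stop before the model has seen every training example once; about 137 steps would be needed for one full pass at this batch size.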

In [None]:
# Save the fine-tuned model

trainer.model.save_pretrained('cindy990915/duke_chatbot0413')
model.config.use_cache = True

In [None]:
# Login to Hugging Face within the notebook to store your credentials (if not using CLI)
notebook_login()


In [None]:
# Read the write token from the environment instead of hard-coding a secret in the notebook
write_token = os.environ["HF_TOKEN"]

In [None]:
trainer.model.push_to_hub("cindy990915/duke_chatbot_0413", token= write_token)



adapter_model.safetensors:   0%|          | 0.00/865M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/cindy990915/duke_chatbot_0413/commit/c7b8adc32dd9e55f632b016a0e8953d0336696da', commit_message='Upload model', commit_description='', oid='c7b8adc32dd9e55f632b016a0e8953d0336696da', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
tokenizer.push_to_hub("cindy990915/duke_chatbot_0413", token=write_token)

README.md:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/cindy990915/duke_chatbot_0413/commit/a227fb5c9bc03de4c387c767fe39ae50b90bb613', commit_message='Upload tokenizer', commit_description='', oid='a227fb5c9bc03de4c387c767fe39ae50b90bb613', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
exit()

# Inference

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,  # Mistral, same as before
    quantization_config=bnb_config,  # Same quantization config as before
    device_map="auto",
    trust_remote_code=True,
    token=True,  # Use the stored Hugging Face login (replaces the deprecated use_auth_token)
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [None]:
from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "mistral-duke-chat/checkpoint-100/")

In [None]:
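device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ft_model.to(device)  # Ensure the model uses the same device
ft_model.eval()      # Inference mode (disables dropout)

def respond(query):
    # Note: this prompt differs from the "### Query: ... ### Answer:" template
    # that formatting_func used for training.
    eval_prompt = """User Query:\n\n {} ###\n\n""".format(query)
    # Move the tokenized inputs to the same device as the model
    model_input = tokenizer(eval_prompt, return_tensors="pt").to(device)
    with torch.no_grad():  # No gradients are needed for generation
        output = ft_model.generate(input_ids=model_input["input_ids"],
                                   attention_mask=model_input["attention_mask"],
                                   max_new_tokens=125, repetition_penalty=1.15)
    # Strip the prompt so only the generated continuation is returned
    result = tokenizer.decode(output[0], skip_special_tokens=True).replace(eval_prompt, "")
    return result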
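
The prompt in `respond` does not follow the template the model was fine-tuned on. One option is to build the inference prompt from the same template and stop at the answer marker, as sketched below. `build_inference_prompt` and `extract_answer` are hypothetical helpers, and whether this improves the responses has not been tested here.

```python
# Instruction text copied from formatting_func
INSTRUCTION = (
    "###You are a trusted advisor in this content, helping to explain the text to "
    "prospective or current students who are seeking answers to questions. "
    "Here is some context and the query:"
)
ANSWER_MARKER = "### Answer:"

def build_inference_prompt(query):
    """Training template with the answer left blank for the model to complete."""
    return f"{INSTRUCTION}\n### Query: {query} \n{ANSWER_MARKER}"

def extract_answer(generated_text):
    """Return the text after the first answer marker, or the whole text if it is absent."""
    return generated_text.split(ANSWER_MARKER, 1)[-1].strip()

prompt = build_inference_prompt("What is the flexibility of the degree program?")
# Hypothetical decoded output: the prompt followed by a generated answer
print(extract_answer(prompt + " Students can study full-time or part-time."))
```

Inside `respond`, `build_inference_prompt(query)` would replace `eval_prompt`, and `extract_answer` would replace the `.replace(eval_prompt, "")` call.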

In [None]:
respond("What is the flexibility of the degree program?")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'Solution:\n\nThe MBA program at IBS Hyderabad offers a lot of flexibility to students. The program has been designed in such a way that it can be completed within 2 years or even less if you are able to complete all your courses on time and do not have any backlogs. However, there is no compulsion for students to finish their coursework in two years. Students who wish to take more than two years to complete their coursework will also be allowed to do so. In fact, we encourage our students to take as much time as they need to complete their coursework because we'

In [None]:
import gradio as gr

def duke_chat_response(message, history):
    return respond(message)

demo = gr.ChatInterface(duke_chat_response)

demo.launch(debug=True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://259b4952225953f746.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7861 <> https://259b4952225953f746.gradio.live


