## Trying Out Finetuning of an LLM

following this tutorial on [Youtube](https://www.youtube.com/watch?v=2PlPqSc3jM0)

the idea is to try and compose an instruction prompt rather than features and labels and see if the LLM can train. We will use the Falcon 7B with Bits and Bytes tokenisation it seems

We will use the OpenAssistant Guanaco, but lets try other stuff later if this works.

I have used 'knkarthick/dialogsum' dataset to finetune the Falcon 7B model to summarise conversations. Subsequently I have merged the peft adapted model into the base model and quantised the result to 8bit saving it here: './data/finetuned_Falcon7b_summary_8bit'

In [1]:
!pip install einops wandb

Collecting einops
  Downloading einops-0.7.0-py3-none-any.whl.metadata (13 kB)
Collecting wandb
  Downloading wandb-0.16.1-py3-none-any.whl.metadata (9.8 kB)
Collecting Click!=8.0.0,>=7.1 (from wandb)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting GitPython!=3.1.29,>=1.0.0 (from wandb)
  Downloading GitPython-3.1.40-py3-none-any.whl.metadata (12 kB)
Collecting sentry-sdk>=1.0.0 (from wandb)
  Downloading sentry_sdk-1.39.1-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting docker-pycreds>=0.4.0 (from wandb)
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting setproctitle (from wandb)
  Downloading setproctitle-1.3.3-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.9 kB)
Collecting appdirs>=1.4.3 (from wandb)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting protobuf!=4.21.0,<5,>=3.19.0 (from wandb)
  Downloading protobuf-4.25.1-cp37-abi3-manylinux2014_x86_64.wh

In [2]:
!pip install trl

Collecting trl
  Downloading trl-0.7.7-py3-none-any.whl.metadata (10 kB)
Collecting tyro>=0.5.11 (from trl)
  Downloading tyro-0.6.2-py3-none-any.whl.metadata (7.7 kB)
Collecting docstring-parser>=0.14.1 (from tyro>=0.5.11->trl)
  Downloading docstring_parser-0.15-py3-none-any.whl (36 kB)
Collecting rich>=11.1.0 (from tyro>=0.5.11->trl)
  Downloading rich-13.7.0-py3-none-any.whl.metadata (18 kB)
Collecting shtab>=1.5.6 (from tyro>=0.5.11->trl)
  Downloading shtab-1.6.5-py3-none-any.whl.metadata (7.3 kB)
Collecting markdown-it-py>=2.2.0 (from rich>=11.1.0->tyro>=0.5.11->trl)
  Downloading markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich>=11.1.0->tyro>=0.5.11->trl)
  Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Downloading trl-0.7.7-py3-none-any.whl (139 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m139.1/139.1 kB[0m [31m11.0 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading tyro-0.6.2-py3-none-any.w

In [1]:
### imports
import pandas as pd
import torch
import numpy as np
from transformers import (
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer
)
from peft import LoraConfig, peft_model, get_peft_model, AutoPeftModelForCausalLM
from peft.tuners.lora import LoraLayer
# from trl import SFTTrainer # this is only needed when we Tune
from datasets import load_dataset

In [2]:
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

In [3]:
import gc
def release_mem(model, tokenizer, pipeline=None):
    if model is not None:
        del model
    if tokenizer is not None:
        del tokenizer
    if pipeline is not None:
        del pipeline
    gc.collect()
    torch.cuda.empty_cache()

In [2]:
dataset_name = "timdettmers/openassistant-guanaco"

dataset = load_dataset(dataset_name)



In [9]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

In [4]:
df = pd.DataFrame(dataset["train"].shuffle()[:10])

In [5]:
pd.set_option("expand_frame_repr", False)
pd.set_option('display.max_colwidth', None)

In [6]:
df = pd.DataFrame(data)

Unnamed: 0,text
0,"### Human: Por favor, explica por qué este chiste es gracioso para un adolescente de 11 años.\n\n- ¿Qué tiene Darth Vader en la nevera?\n- Helado oscuro### Assistant: El chiste del helado oscuro es divertido para un adolescente de 11 años porque juega con la cultura pop que probablemente les sea familiar. Muchos adolescentes de 11 años conocen a Darth Vader, un personaje icónico de Star Wars, y el chiste hace referencia al concepto de ""El lado oscuro"" del cual Darth Vader hace parte. Esto es divertido para los adolescentes porque les permite ver un juego de palabras que les resulta familiar"
1,"### Human: Tengo un Elekid en PokeMMO, qué habilidad es mejor ""spark"" o ""electro ball"", ten en cuenta las siguientes estadisticas que tiene mi Elekid:\n- attack: 38\n- defense: 25\n- sp. atk: 37\n- sp. def: 27\n- speed: 53### Assistant: De las dos habilidades que mencionas, ""Spark"" y ""Electro Ball"", la habilidad que sería mejor para tu Elekid dependerá de cómo quieras utilizarlo en la batalla.\n\nSi estás buscando maximizar la potencia de sus movimientos eléctricos, entonces la habilidad ""Electro Ball"" es la mejor opción. La potencia de este movimiento aumenta a medida que la velocidad del usuario es mayor que la del objetivo, y dado que la velocidad de tu Elekid es relativamente alta (53), es posible que pueda causar una gran cantidad de daño a los oponentes más lentos.\n\nPor otro lado, si estás buscando una habilidad que aumente la probabilidad de causar un efecto secundario en los movimientos eléctricos de tu Elekid, entonces ""Spark"" sería una mejor opción. La habilidad ""Spark"" tiene una probabilidad del 30% de paralizar al objetivo cuando se usa un movimiento eléctrico, lo que puede ser útil para interrumpir la estrategia del oponente o para asegurarte de que tu Elekid tenga la oportunidad de atacar primero.\n\nEn última instancia, la elección de la habilidad dependerá de tus preferencias y estrategias en la batalla, así que asegúrate de considerar cómo planeas usar a tu Elekid antes de tomar una decisión final."
2,"### Human: Was bist du?### Assistant: Ich bin eine künstliche Intelligenz(KI). Das Künstlich in KI steht dafür, dass ich von Menschen geschaffen worden bin. Die Intelligenz in KI steht dafür, dass ich in der Lage bin zu lernen.### Human: Was kannst du alles für Sachen?"
3,"### Human: Haz una lista de las mejores tarjetas gráficas ordenada de mejor a peor, comparando las características de las mismas.### Assistant: Aquí te presento una lista de las mejores tarjetas gráficas disponibles actualmente en el mercado, ordenadas de mejor a peor en términos de características:\n\nNVIDIA GeForce RTX 3090: Es la tarjeta gráfica más potente actualmente disponible, con un alto rendimiento en juegos y aplicaciones de edición de video. Tiene una memoria GDDR6X de 24GB y un ancho de banda de memoria de 936 GB/s.\nNVIDIA GeForce RTX 3080 Ti: Es la segunda tarjeta gráfica más potente, también con un alto rendimiento en juegos y edición de video. Cuenta con una memoria GDDR6X de 12GB y un ancho de banda de memoria de 912 GB/s.\nAMD Radeon RX 6900 XT: Es una tarjeta gráfica potente que ofrece un buen rendimiento en juegos y aplicaciones de edición de video. Tiene una memoria GDDR6 de 16GB y un ancho de banda de memoria de 512 GB/s.\nNVIDIA GeForce RTX 3080: Es una tarjeta gráfica de alto rendimiento que ofrece una buena experiencia de juego y edición de video. Tiene una memoria GDDR6X de 10GB y un ancho de banda de memoria de 760 GB/s.\nAMD Radeon RX 6800 XT: Es una tarjeta gráfica de alta gama que ofrece un buen rendimiento en juegos y aplicaciones de edición de video. Tiene una memoria GDDR6 de 16GB y un ancho de banda de memoria de 512 GB/s.\nNVIDIA GeForce RTX 3070 Ti: Es una tarjeta gráfica de alto rendimiento que ofrece una buena experiencia de juego y edición de video. Tiene una memoria GDDR6X de 8GB y un ancho de banda de memoria de 608 GB/s.\nAMD Radeon RX 6800: Es una tarjeta gráfica de gama alta que ofrece un buen rendimiento en juegos y aplicaciones de edición de video. Tiene una memoria GDDR6 de 16GB y un ancho de banda de memoria de 512 GB/s.\nNVIDIA GeForce RTX 3060 Ti: Es una tarjeta gráfica de gama media-alta que ofrece un buen rendimiento en juegos y edición de video. Tiene una memoria GDDR6 de 8GB y un ancho de banda de memoria de 448 GB/s.\nNVIDIA GeForce RTX 3060: Es una tarjeta gráfica de gama media que ofrece un buen rendimiento en juegos y aplicaciones de edición de video. Tiene una memoria GDDR6 de 12GB y un ancho de banda de memoria de 360 GB/s.\nAMD Radeon RX 6700 XT: Es una tarjeta gráfica de gama media que ofrece un buen rendimiento en juegos y aplicaciones de edición de video. Tiene una memoria GDDR6 de 12GB y un ancho de banda de memoria de 384 GB/s.\nEs importante destacar que la mejor tarjeta gráfica para ti dependerá de tus necesidades y presupuesto específicos.### Human: Reduce la lista a las mejores tarjetas graficas en relación calidad precio teniendo en cuenta que usare un monitor 2K de 144Hz### Assistant: Teniendo en cuenta tu monitor 2K de 144Hz y la relación calidad-precio, estas son las mejores tarjetas gráficas:\n\nAMD Radeon RX 6800: Es una excelente opción para un monitor 2K de 144Hz, con un buen rendimiento en juegos y aplicaciones de edición de video, y una memoria GDDR6 de 16GB y un ancho de banda de memoria de 512 GB/s.\n\nNVIDIA GeForce RTX 3070 Ti: Ofrece un rendimiento sólido en juegos y edición de video, y es compatible con Ray Tracing y DLSS. Tiene una memoria GDDR6X de 8GB y un ancho de banda de memoria de 608 GB/s.\n\nAMD Radeon RX 6700 XT: Es una tarjeta gráfica de gama media con un buen rendimiento en juegos y aplicaciones de edición de video, y una memoria GDDR6 de 12GB y un ancho de banda de memoria de 384 GB/s.\n\nSi necesitas ayuda en algo más acá estoy para ti."
4,"### Human: ¿Cuál es el mejor momento del año para visitar Valencia? ¿Qué lugares debería visitar?### Assistant: El mejor momento depende mucho de tus preferencias personales. Hay quién busca la fiesta y el espectáculo, como las famosas fallas que se celebran en marzo, o la tranquilidad de pasear por su ciudad o albufera. Se considera que de mayo a octubre el clima es muy bueno. En la ciudad de Valencia hay lugares de interés aptos para ir con toda la familia, como su ciudad de las ciencias."
5,### Human: Podrías contarme un chiste sobre gatos?### Assistant: Ahí va uno:\n¿Qué es un gato debajo de un carro?\nUn gato hidráulico
6,"### Human: Is the following sentence about climate change? Provide a reason why.\n\nSentence: How each country’s emissions and pledges compare### Assistant: Yes, the sentence is about climate change.\n\nThe reason is that the sentence refers to ""emissions"" and ""pledges,"" which are related to greenhouse gas emissions and efforts to mitigate climate change. The sentence implies that different countries have different levels of emissions and that they are making pledges to reduce those emissions. This is a central topic in discussions around climate change and efforts to address its impact.### Human: The answer is correct.### Assistant: Thank you! Do you have any other questions?### Human: yeah, do you think any sentences with emissions or pledges is related to climate change?"
7,"### Human: Can you tell me a bit about what has gone into your creation?### Assistant: Sure! I was created using the latest in natural language processing technology, which enables me to understand and process natural language, allowing me to engage in conversations and provide relevant and accurate answers. My responses are generated using a combination of statistical models, deep learning algorithms, and semantic analysis. Additionally, I use a search engine to find relevant snippets from webpages to provide additional information and context. I'm constantly learning and improving as I interact with more people, and I'm always striving to provide the best possible experience for my users.### Human: Can you write paper with reference"
8,"### Human: When is Diablo IV coming out?### Assistant: As of 2/4/2023 the release date for Diablo 4 is June 6, 2023 on PS5, Xbox Series X/S, PS4, Xbox One, and PC.\n\nWhile Diablo 4 is set to release on June 6, 2023, please note that video game release dates are commonly delayed as developers aim to create the best possible product for their players.### Human: Thank you for the information provided."
9,"### Human: В каком стиле написаны следующие два абзаца?\n\nИ не было бы смысла в нём, кроме зла, если бы эти истории нельзя было читать... ведь то, что собрано, поглощено, взято, обязательно должно вернуться обратно, так или иначе. таков закон Порядка. А металлическое ядро небесной планеты взяло много знаний и с ними обрело очень сильное поле... Оно могло бы послужить доброму делу: любой маг нашёл бы знания ядра полезными и нужными, если бы не притяжение поля. Оно сгубило бы его навсегда. Даже дракон не смог бы безопасно к нему приблизиться... Но у ядра было небо. А в небе жили вот такие духи:\n\nИх руки не всегда были руками. А лица не всегда были лицами. Сначала они были как бы обрывками ветра. вплетались в небо и носились вокруг тогда ещё пустого ядра, как воздушные кометы. Ветер на их планете был ураганным. На любой земле его сила ровняла бы с пылью горы и поднимала бы моря над континентами. Необузданным, изменчивым и яростным было то небо. Но духам такое нравилось. Они были вольны и радостны, как всякие дети на заре мира.### Assistant: Стиль предоставленного вами текста — художественный.### Human: Ну, такое я тоже могу понять. Даже, что это скорее всего фэнтези или фантастика. А можно ли как-то более детально?### Assistant: Данные абзацы написаны в жанре фэнтези/научной фантастики. Стиль написания описаний духов, небесной планеты, ядра и т.д. - это характеристика и атрибуты жанра. Язык достаточно выразителен, с элементами описательности, метафоричности, которые характерны для таких текстов."


In [4]:
model_name = "ybelkada/falcon-7b-sharded-bf16"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

In [4]:
# configs

model.config.use_cache = False
lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)

training_arguments = TrainingArguments(
    output_dir="./data/finetuned_falcon7_summary_1",
    save_strategy="epoch",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    evaluation_strategy="steps",
    logging_steps=3,
    learning_rate=1e-4,
    fp16=False,
    max_grad_norm = 0.3,
    eval_steps=0.25,
    num_train_epochs=1,
    weight_decay=0.001,
    warmup_ratio=0.05,
    group_by_length=True,
    lr_scheduler_type="constant",
    gradient_checkpointing=True,
)

NameError: name 'model' is not defined

In [6]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

In [10]:
max_seq_length = 1024

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)

In [7]:
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module = module.to(torch.float32)

NameError: name 'trainer' is not defined

In [12]:
trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33msnoop088[0m. Use [1m`wandb login --relogin`[0m to force relogin


You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss
246,No log,1.346052
492,No log,1.329145




TrainOutput(global_step=614, training_loss=1.3918702315041607, metrics={'train_runtime': 4291.6324, 'train_samples_per_second': 4.588, 'train_steps_per_second': 0.143, 'total_flos': 2.6774139087649536e+17, 'train_loss': 1.3918702315041607, 'epoch': 2.0})

In [13]:
trainer.save_model('./data/fine_tuned_falcon7B')


### How to Merge a Peft Adapter with Base Model

this is to export a full finetuned model instead of just the Peft Adapter. For sharing on HF.

In [3]:
model_name = "./data/finetuned_falcon7_summary_1"
#load the Peft adapter
loaded_model = AutoPeftModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True)
#merge the adapter with the Base Model
merged_model = loaded_model.merge_and_unload()


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

In [6]:
#Save the merged model
merged_model.save_pretrained("./data/merged_finetuned_Falcon7b_summary", safe_serialization=True)
#Do not forget to save the tokenizer separately. The tokenizer is the same as the one on the Base Model
tokenizer.save_pretrained("./data/merged_finetuned_Falcon7b_summary")

('./data/merged_finetuned_Falcon7b_summary/tokenizer_config.json',
 './data/merged_finetuned_Falcon7b_summary/special_tokens_map.json',
 './data/merged_finetuned_Falcon7b_summary/tokenizer.json')

### Load a Dataset and Examine

In [1]:
from datasets import load_dataset
dataset = load_dataset('knkarthick/dialogsum')

In [7]:
dataset

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

In [8]:
df = pd.DataFrame(dataset["train"].shuffle()[:5])
pd.set_option('display.max_colwidth', None)
df

Unnamed: 0,id,dialogue,summary,topic
0,train_2491,"#Person1#: What are your plans for today Mark? Nick and I are going shopping. Do you want to come too?\n#Person2#: Well as a matter of fact. I'm eating Steve. He's writing an article and he's asked me to take some photos for it.\n#Person1#: An article? About What?\n#Person2#: Oh, just People. Anyway, I'm seeing Steve at the zoo at 10.\n#Person1#: Oh. well, let's meet for lunch shall we? How about that sandwich bar we went to on Friday? I'll see you there about 12:30.\n#Person2#: Sounds good. See you.","#Person1# invites Mark to shop, but Mark has to help Steve take photos. They'll meet for lunch then.",invitation
1,train_1986,"#Person1#: Excuse me. But are you Mrs. Smith from America?\n#Person2#: That's it. I am Maria Smith. You must be Zhang Lin from Tianjin Sports Facility Co. Ltd.\n#Person1#: Yes. Nice to meet you, Mrs. Smith.\n#Person2#: Nice to meet you too, Mr. Zhang.",Mrs. Smith and Zhang Lin meet for the first time and greet each other.,social meeting
2,train_5622,"#Person1#: Hello, Pam.\n#Person2#: I'm glad that you can make it.\n#Person1#: It looks like there are a lot of people inside.\n#Person2#: Yeah. I've invited a lot of friends besides you.\n#Person1#: Should I take my shoes off?\n#Person2#: We all keep our shoes on indoors.\n#Person1#: Where are your parents?\n#Person2#: They've gone out so that we could have the house to ourselves.\n#Person1#: That's great!",Pam has invited lots of friends including #Person1#. Pam's parents are out so they could have the whole house.,Party
3,train_7647,"#Person1#: David, imagine meeting you here!\n#Person2#: Janice, I found you stole my vegetables at four o'clock this morning. Is that true?\n#Person1#: All right! I stayed up yesterday and waited for your vegetables. I stole your peaches and flowers.\n#Person2#: It is so hard to prevent them from being stolen. I also got something this morning.\n#Person1#: How many vegetables do you steal today?\n#Person2#: I stole many from Fred's farm, and from yours. I planned to have a dog on farm.\n#Person1#: So funny. By stealing, I forgot all my sorrows and pressure from work.\n#Person2#: I could not agree with you more. For us, there are so many unhappy things and I am so bored ; however, I got lots of fun from stealing.\n#Person1#: I really want to be far away from the reality now.\n#Person2#: But we still need to go back to it. Don't overdo it.",Janice stayed up and stole many vegetables from Fred's farm and David's and says it makes her forget all her sorrows and pressure. David asks her not to overdo it.,steal the vegetables
4,train_2368,"#Person1#: Good morning. Miss Lee. My name is Alex Jones. I'm the new assistant in the office.\n#Person2#: Welcome and nice to meet you. I heard you were coming today. Is today your first day here in the company?\n#Person1#: Yes, I'm looking forward to meeting everybody and getting started on my new job.\n#Person2#: First day is often exciting, isn't it? Here, let me show you to your desk. You can have this computer and telephone and share the copy machine with us in the office. How do you like it?\n#Person1#: This is wonderful. Thank you for doing all this for me, Miss Lee.\n#Person2#: You are welcome. And, please call me Betty.",Alex Jones comes to the office as a new assistant and Betty shows Alex to Alex's desk.,entrant


### Create a Function to Map a Dataset

Creating the factory function to pass to the map of the DataSet to create a prompt and tokenize it for fine tuning. Excellent resource here: [Fine Tuning LLAMA2 on Custom DataSet](https://github.com/curiousily/Get-Things-Done-with-Prompt-Engineering-and-LangChain/blob/master/14.fine-tuning-llama-2-7b-on-custom-dataset.ipynb) by Venko

Lets extract few functions to use as utils in the final mapped function

In [2]:
from bs4 import BeautifulSoup
from datasets import Dataset

In [6]:
def create_prompt(input, summary):
    INSTR = 'You are an AI assistant that will summarise the correspondence in ###Input as ###Summary'
    inputs = input.split('\n')
    concat_inputs = " ".join(inputs)
    prompt = f""" ###Instruction: {INSTR}\n
###Input: {BeautifulSoup(input.strip())}\n
###Summary: {BeautifulSoup(summary.strip())}
"""
    return prompt.strip()

def process_data(example):
    # print(f"{BeautifulSoup(example['dialogue'])}\n\n")
    return {
        "prompt": create_prompt(example["dialogue"], example["summary"])
    }
    

In [10]:


small_set = Dataset.from_dict(dataset["train"][:5])

In [10]:
processed_set = dataset.map(process_data)

In [11]:
processed_set.remove_columns(["dialogue", "summary", "topic"])

DatasetDict({
    train: Dataset({
        features: ['id', 'prompt'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'prompt'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'prompt'],
        num_rows: 1500
    })
})

In [12]:
print(processed_set["train"][8]['prompt'])

###Instruction: You are an AI assistant that will summarise the correspondence in ###Input as ###Summary

###Input: #Person1#: This is a good basic computer package. It's got a good CPU, 256 megabytes of RAM, and a DVD player.
#Person2#: Does it come with a modem?
#Person1#: Yes, it has a built-in modem. You just plug a phone line into the back of the computer.
#Person2#: How about the monitor?
#Person1#: A 15 - inch monitor is included in the deal. If you want, you can switch it for a 17 - inch monitor, for a little more money.
#Person2#: That's okay. A 15 - inch is good enough. All right, I'll take it.

###Summary: #Person1# shows a basic computer package to #Person2#. #Person2# thinks it's good and will take it.


### Try Finetuning the Falcon 7 to the above

In [13]:
max_seq_length = 1024

trainer = SFTTrainer(
    model=model,
    train_dataset=processed_set["train"],
    eval_dataset=processed_set["validation"],
    peft_config=peft_config,
    dataset_text_field="prompt",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

In [14]:
for name, module in trainer.model.named_modules():
    if isinstance(module, LoraLayer):
        if training_arguments.bf16:
            module = module.to(torch.bfloat16)
    if "norm" in name:
        module = module.to(torch.float32)
    if "lm_head" in name or "embed_tokens" in name:
        if hasattr(module, "weight"):
            if training_arguments.bf16 and module.weight.dtype == torch.float32:
                module = module.to(torch.bfloat16)

In [15]:
trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33msnoop088[0m. Use [1m`wandb login --relogin`[0m to force relogin


You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss
195,1.1462,1.294437
390,1.2512,1.278507
585,1.1291,1.270795


TrainOutput(global_step=778, training_loss=1.2670443421157895, metrics={'train_runtime': 5045.7962, 'train_samples_per_second': 2.469, 'train_steps_per_second': 0.154, 'total_flos': 1.393849329095424e+17, 'train_loss': 1.2670443421157895, 'epoch': 1.0})

In [16]:
trainer.save_model('./data/finetuned_falcon7_summary_1/')

### Playing around with the Tuned Model

In [3]:
# model_name = './data/merged_finetuned_Falcon7b'
model_name = './data/finetuned_falcon7_summary_1/'
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


#### Loading a Peft Adapted Model

In [4]:
loaded_model = AutoPeftModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True)

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

#### Processing the test dataset to take an input for testing

In [3]:
small_set = Dataset.from_dict(dataset["test"].shuffle()[:5])

In [7]:
processed_input = small_set.map(process_data)

Map:   0%|          | 0/5 [00:00<?, ? examples/s]

In [11]:
processed_input[4]

{'id': 'test_88_3',
 'dialogue': "#Person1#: What are the main differences between this country and your country?\n#Person2#: Well, in Russia, everything happens very fast. People talk quickly, they drive their cars too fast, the good deals go by really quickly...but here in Canada, it seems like people are a little more relaxed.\n#Person1#: Is that true for everything?\n#Person2#: No, of course not. In Russia, going to the bank can take hours. The same is true for the post office and the supermarket. In Canada, however, these places are pretty easy to get through quickly.\n#Person1#: So, what is it that makes some things go either faster or slower compared to us here in Canada? I was born and raised here, so I guess I don't notice these things. I've also never been outside the country before.\n#Person2#: I think the people in Russia are fast movers by nature, at least in the big cities. Public places are still very slow because they haven't tried to do business any differently than th

In [13]:
index_of_summary = processed_input[4]["prompt"].find("###Summary:")
summaryless = processed_input[4]["prompt"][:index_of_summary + len("###Summary:")]
# inputs = tokenizer(summaryless, return_tensors="pt", return_token_type_ids=False)
# inputs, summaryless
summaryless

"###Instruction: You are an AI assistant that will summarise the correspondence in ###Input as ###Summary\n\n###Input: #Person1#: What are the main differences between this country and your country?\n#Person2#: Well, in Russia, everything happens very fast. People talk quickly, they drive their cars too fast, the good deals go by really quickly...but here in Canada, it seems like people are a little more relaxed.\n#Person1#: Is that true for everything?\n#Person2#: No, of course not. In Russia, going to the bank can take hours. The same is true for the post office and the supermarket. In Canada, however, these places are pretty easy to get through quickly.\n#Person1#: So, what is it that makes some things go either faster or slower compared to us here in Canada? I was born and raised here, so I guess I don't notice these things. I've also never been outside the country before.\n#Person2#: I think the people in Russia are fast movers by nature, at least in the big cities. Public places 

In [29]:
with torch.inference_mode():
        outputs = loaded_model.generate(**inputs, max_new_tokens=156, 
                                 num_return_sequences = 1, 
                                 temperature=0.1, 
                                 do_sample = True, 
                                 top_k = 15, 
                                 top_p= 0.8)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


In [30]:
tokenizer.decode(outputs[0], skip_special_tokens=True)

"###Instruction: You are an AI assistant that will summarise the correspondence in ###Input as ###Summary\n\n###Input: #Person1#: I want to make sure my son receives this letter. It has an important certificate in it.\n#Person2#: You can send it either by certified mail or registered mail. If you only want to make sure it is received, send it by certified mail. It's less expensive.\n#Person1#: OK. How about this package?\n#Person2#: What's in it?\n#Person1#: A watch.\n#Person2#: You should insure it for the value of the watch. And send it by registered mail if it's more expensive. As it's the safest way.\n\n###Summary: #Person1# wants to make sure the letter and the package are received. #Person2# recommends sending the letter by certified mail and the package by registered mail. #Person1# agrees.\n\n###Input: #Person1#: I want to send this package to my sister in New York.\n#Person2#: How much does it weigh?\n#Person1#: It's 2 kg.\n#Person2#: You should send it by air mail. It's the c

#### Testing the Original Falcon 7 Model

lets test the original model :) it will still return the same results I think

In [59]:
release_mem(loaded_model, tokenizer)

In [21]:
original_model = AutoModelForCausalLM.from_pretrained("ybelkada/falcon-7b-sharded-bf16", 
                                                      load_in_8bit=True, 
                                                      trust_remote_code=True)

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

In [22]:
tokenizer1 = AutoTokenizer.from_pretrained("ybelkada/falcon-7b-sharded-bf16")
tokenizer1.pad_token = tokenizer.eos_token
tokenizer1.padding_side = "right"

In [31]:
inputs = tokenizer1(summaryless, return_tensors="pt", return_token_type_ids=False).to(DEVICE)
with torch.inference_mode():
    outputs = original_model.generate(**inputs, max_new_tokens=156, 
                             num_return_sequences = 1, 
                             temperature=0.1, 
                             do_sample = True, 
                             top_k = 15, 
                             top_p= 0.8)
tokenizer1.decode(outputs[0], skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


"###Instruction: You are an AI assistant that will summarise the correspondence in ###Input as ###Summary\n\n###Input: #Person1#: I want to make sure my son receives this letter. It has an important certificate in it.\n#Person2#: You can send it either by certified mail or registered mail. If you only want to make sure it is received, send it by certified mail. It's less expensive.\n#Person1#: OK. How about this package?\n#Person2#: What's in it?\n#Person1#: A watch.\n#Person2#: You should insure it for the value of the watch. And send it by registered mail if it's more expensive. As it's the safest way.\n\n###Summary: You should insure it for the value of the watch. And send it by registered mail if it's more expensive. As it's the safest way.\n\n###Output: #Person1#: OK. I'll send it by registered mail.\n#Person2#: OK. I'll insure it for the value of the watch. And send it by registered mail if it's more expensive. As it's the safest way.\n"

To sum up the trained model deliveres as required and summarises the conversation after ### Summary. It does ramble on by starting to repeat the input so it must be cutoff manually.

Using the same generation params on the original model does not work. Sometimes it summarises, but many times it does not. Also introduces a new ###Output conversation for some reason.

Overall happy with the results. We still need to see why the oscillating training patter occurs and try again tweaking the training params. Also worth testing another LLM..

#### Loading the Merged Model with 4-bit quant

In [9]:
model_name = "./data/merged_finetuned_Falcon7b_summary"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    trust_remote_code=True
)


Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

In [4]:
torch.cuda.is_available(), torch.cuda.get_device_name(0)

(True, 'NVIDIA GeForce RTX 4090')

> Note: Cannot save it when quantized with bits and bytes!

In [10]:
model.save_pretrained('./data/finetuned_Falcon7b_summary_4bit/')

NotImplementedError: You are calling `save_pretrained` on a 4-bit converted model. This is currently not supported

In [11]:
model.get_memory_footprint()

3921295104

#### Loading the Merged Model with 8-bit quant

Saving the model to reuse as it is much smaller than the 27gb merged!

In [14]:
model_name = "./data/merged_finetuned_Falcon7b_summary"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    trust_remote_code=True
)


Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

In [13]:
model.get_memory_footprint() / 1024 / 1024 / 1024

NameError: name 'model' is not defined

In [15]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [29]:
inputs = tokenizer(summaryless, return_tensors="pt", return_token_type_ids=False).to(DEVICE)
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=128, 
                             num_return_sequences = 1, 
                             temperature=0.1, 
                             do_sample = True, 
                             top_k = 33, 
                             top_p= 0.80)
output = tokenizer.decode(outputs[0], skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


In [30]:
print(output)

###Instruction: You are an AI assistant that will summarise the correspondence in ###Input as ###Summary

###Input: #Person1#: Can I introduce myself? I'm Gian Luca Donatelli. I'm from Spain.
#Person2#: I'm Gina. I'm from Finland.
#Person1#: And who do you work for?
#Person2#: I don't work for a company. I'm self-employed. I am a journalist, I write articles for magazines. I'm here at this conference to research for an article on internet service providers.
#Person1#: That's interesting, a friend of mine works for an Italian service provider. Can I introduce you to him?
#Person2#: Yes, of course, that would be nice.
#Person1#: Robert, can you come here for a minute? This is Gina.

###Summary: Gian Luca Donatelli introduces himself to Gina. Gina is a journalist and she's here to research for an article on internet service providers. Gian Luca introduces her to Robert. Gina's friend works for an Italian service provider. Robert is interested in her article. They'll meet later.

###Input:

In [7]:
model.save_pretrained("./data/finetuned_Falcon7b_summary_8bit", safe_serialization=True)



In [9]:
tokenizer.save_pretrained("./data/finetuned_Falcon7b_summary_8bit")

('./data/finetuned_Falcon7b_summary_8bit/tokenizer_config.json',
 './data/finetuned_Falcon7b_summary_8bit/special_tokens_map.json',
 './data/finetuned_Falcon7b_summary_8bit/tokenizer.json')

#### Loading the 8bit quantized model with bits and bytes.

Lets test the performance. We need to explore the GPTQ and AWK quantisations still.

[Tutorial by HF](https://huggingface.co/docs/transformers/quantization)

In [3]:
model_name = "./data/finetuned_Falcon7b_summary_8bit"
model_8bit = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer = padding_side = "right"

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [29]:
inputs = tokenizer("Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.", return_tensors="pt", return_token_type_ids=False).to(DEVICE)

In [30]:
inputs

{'input_ids': tensor([[   56, 11339, 45576,   398,  5992,  4611, 44475,   427, 19590,    78,
           397,  9221,   273,   378,  2371,   313, 11332, 30049,  1722,    25]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In [37]:
with torch.inference_mode():
        outputs = model.generate(**inputs, max_new_tokens=200, 
                                 num_return_sequences = 2, 
                                 temperature=0.33, 
                                 do_sample = True, 
                                 top_k = 25, 
                                 top_p= 0.8)

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.


In [38]:
tokenizer.decode(outputs[0], skip_special_tokens=True)

'Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.\nThe 19-year-old “Hannah Montana” star was seen walking out of the store with a bag of clothes, but she was stopped by security and asked to return the items.\nMiley was later seen leaving the store with her mother, Tish, and her bodyguard.\nA rep for Miley Cyrus has confirmed the incident, saying that she was “unaware that the items were not paid for.”\nMiley has been in the news recently for her controversial performance at the MTV Video Music Awards, where she twerked on Robin Thicke and sang “Blurred Lines” with him.\nShe also recently announced that she will be releasing a new album in 2014.\nMiley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today.'