## Describe your model -> fine-tuned LLaMA 2
Adapted from a contribution by Matt Shumer (https://twitter.com/mattshumer_)

This notebook experiments with a simple way to build a task-specific model for your use case: describe the model you want, generate training data for it, and fine-tune LLaMA 2 on that data.

First, select the best GPU available (Runtime -> Change runtime type).

To create your model, go to the first code cell and describe the model you want to build in the prompt. Be descriptive and clear.

Select a temperature (high = creative, low = precise) and the number of training examples to generate. From there, just run all the cells.

You can change the model you want to fine-tune by changing `model_name` in the `Define Hyperparameters` cell.

# Data generation step (optional: skip to section 2 unless you want a custom model for another use case)

Write your prompt here. Make it as descriptive as possible!

Then, choose the temperature (between 0 and 1) to use when generating data. Lower values are great for precise tasks, like writing code, whereas larger values are better for creative tasks, like writing stories.

Finally, choose how many examples you want to generate. The more you generate, a) the longer it takes and b) the more expensive data generation will be. But generally, more examples will lead to a higher-quality model. 100 is usually the minimum to start.

In [None]:
prompt = "A model that takes in a puzzle-like reasoning-heavy question in English, and responds with a well-reasoned, step-by-step thought out response in Spanish."
temperature = .4
number_of_examples = 100

Run this to generate the dataset.

In [None]:
!pip install openai


In [None]:
import os
import openai
import random

openai.api_key = "YOUR KEY HERE"

def generate_example(prompt, prev_examples, temperature=.5):
    messages=[
        {
            "role": "system",
            "content": f"You are generating data which will be used to train a machine learning model.\n\nYou will be given a high-level description of the model we want to train, and from that, you will generate data samples, each with a prompt/response pair.\n\nYou will do so in this format:\n```\nprompt\n-----------\n$prompt_goes_here\n-----------\n\nresponse\n-----------\n$response_goes_here\n-----------\n```\n\nOnly one prompt/response pair should be generated per turn.\n\nFor each turn, make the example slightly more complex than the last, while ensuring diversity.\n\nMake sure your samples are unique and diverse, yet high-quality and complex enough to train a well-performing model.\n\nHere is the type of model we want to train:\n`{prompt}`"
        }
    ]

    if len(prev_examples) > 0:
        if len(prev_examples) > 10:
            prev_examples = random.sample(prev_examples, 10)
        for example in prev_examples:
            messages.append({
                "role": "assistant",
                "content": example
            })

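    # openai>=1.0 (installed above) replaced openai.ChatCompletion with chat.completions
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=messages,
        temperature=temperature,
        max_tokens=1354,
    )

    # Responses are typed objects in openai>=1.0, so use attribute access
    return response.choices[0].message.content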

# Generate examples
prev_examples = []
for i in range(number_of_examples):
    print(f'Generating example {i}')
    example = generate_example(prompt, prev_examples, temperature)
    prev_examples.append(example)

print(prev_examples)

We also need to generate a system message.

In [None]:
def generate_system_message(prompt):

    # openai>=1.0 replaced openai.ChatCompletion with chat.completions
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
          {
            "role": "system",
            "content": "You will be given a high-level description of the model we are training, and from that, you will generate a simple system prompt for that model to use. Remember, you are not generating the system message for data generation -- you are generating the system message to use for inference. A good format to follow is `Given $INPUT_DATA, you will $WHAT_THE_MODEL_SHOULD_DO.`.\n\nMake it as concise as possible. Include nothing but the system prompt in your response.\n\nFor example, never write: `\"$SYSTEM_PROMPT_HERE\"`.\n\nIt should be like: `$SYSTEM_PROMPT_HERE`."
          },
          {
              "role": "user",
              "content": prompt.strip(),
          }
        ],
        temperature=temperature,
        max_tokens=500,
    )

    # Responses are typed objects in openai>=1.0, so use attribute access
    return response.choices[0].message.content

system_message = generate_system_message(prompt)

print(f'The system message is: `{system_message}`. Feel free to re-run this cell if you want a better result.')

Now let's put our examples into a dataframe and turn them into a final pair of datasets.

In [None]:
import pandas as pd

# Initialize lists to store prompts and responses
prompts = []
responses = []

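# Parse out prompts and responses from examples
for example in prev_examples:
  split_example = example.split('-----------')
  # A well-formed example has the prompt at index 1 and the response at index 3
  if len(split_example) < 4:
    continue  # skip malformed examples so prompts and responses stay aligned
  prompts.append(split_example[1].strip())
  responses.append(split_example[3].strip())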

# Create a DataFrame
df = pd.DataFrame({
    'prompt': prompts,
    'response': responses
})

# Remove duplicates
df = df.drop_duplicates()

print('There are ' + str(len(df)) + ' successfully-generated examples. Here are the first few:')

df.head()
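
The parsing step above relies on the delimiter format requested in the data generation system prompt. As a minimal sketch, assuming every generated example follows that format, the same extraction can be written as a small function that returns `None` for malformed text. `parse_example` and the sample string are hypothetical and not part of the original notebook.

```python
DELIMITER = "-----------"

def parse_example(example: str):
    """Return (prompt, response) from one generated example, or None if malformed."""
    parts = example.split(DELIMITER)
    # Expected layout: ["prompt", PROMPT, "response", RESPONSE, trailing text]
    if len(parts) < 4:
        return None
    prompt, response = parts[1].strip(), parts[3].strip()
    if not prompt or not response:
        return None
    return prompt, response

# Hypothetical example in the requested format
sample = (
    "prompt\n-----------\nWhat is 2 + 2?\n-----------\n\n"
    "response\n-----------\nLa respuesta es 4.\n-----------"
)
print(parse_example(sample))
print(parse_example("no delimiters here"))
```

Returning the pair together keeps the prompt and response lists the same length, which the DataFrame constructor requires.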

Split into train and test sets.

In [None]:
# Split the data into train and test sets, with 90% in the train set
train_df = df.sample(frac=0.9, random_state=42)
test_df = df.drop(train_df.index)

# Save the dataframes to .jsonl files
train_df.to_json('train.jsonl', orient='records', lines=True)
test_df.to_json('test.jsonl', orient='records', lines=True)

# Alternative: importing the dataset from Confluence

In [1]:
!pip install python-dotenv requests langchain-community
!pip install langchain
!pip install atlassian-python-api
!pip install lxml
!pip install tiktoken
!pip install pandas
!pip install boto3



In [13]:
import os
import sys
from dotenv import load_dotenv, find_dotenv
from langchain.document_loaders import ConfluenceLoader
from langchain.text_splitter import MarkdownHeaderTextSplitter, RecursiveCharacterTextSplitter
# ConfluenceLoader parses pages with BeautifulSoup and lxml; fail early if either is missing
try:
    from bs4 import BeautifulSoup  # noqa: F401
    import lxml  # noqa: F401
except ImportError as e:
    raise ImportError("Please make sure 'beautifulsoup4' and 'lxml' are installed. Use: pip install beautifulsoup4 lxml") from e


# Load environment variables
sys.path.append('../')
load_dotenv(find_dotenv())

# Confluence configuration
CONFLUENCE_URL = os.getenv("CONFLUENCE_URL")
CONFLUENCE_API_KEY = os.getenv("CONFLUENCE_API_KEY")
CONFLUENCE_USERNAME = os.getenv("CONFLUENCE_USERNAME")
CONFLUENCE_SPACE_KEY = os.getenv("CONFLUENCE_SPACE_KEY")

# Load documents from Confluence
loader = ConfluenceLoader(
    url=CONFLUENCE_URL,
    username=CONFLUENCE_USERNAME,
    api_key=CONFLUENCE_API_KEY
)
docs = loader.load(
    space_key=CONFLUENCE_SPACE_KEY,
    # limit=1,
    # max_pages=5,
    keep_markdown_format=True
)

# Split documents on Markdown headers, then into fixed-size chunks
def split_markdown_documents(docs):
    # Markdown
    headers_to_split_on = [
            ("#", "Title 1"),
            ("##", "Subtitle 1"),
            ("###", "Subtitle 2"),
    ]
    markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

    # Split based on markdown and add original metadata
    md_docs = []
    for doc in docs:
        md_doc = markdown_splitter.split_text(doc.page_content)
        for i in range(len(md_doc)):
            md_doc[i].metadata = md_doc[i].metadata | doc.metadata
        md_docs.extend(md_doc)

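    # RecursiveCharacterTextSplitter: cap each chunk at 512 characters.
    # The separators are regex patterns, so is_separator_regex must be True.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=20,
        separators=[r"\n\n", r"\n", r"(?<=\. )", " ", ""],
        is_separator_regex=True,
    )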

    splitted_docs = splitter.split_documents(md_docs)
    return splitted_docs

# Run the full pipeline
texts = split_markdown_documents(docs)

# Print the first 10 chunks
def pretty_print(chunks, limit=10):
    for i, chunk in enumerate(chunks[:limit]):
        print(f"Chunk {i+1} Content:\n{chunk.page_content}\n---\nMetadata:\n{chunk.metadata}\n{'='*50}\n")

# pretty_print(texts) 


Received runtime arguments {'space_key': 'EN', 'limit': 1, 'max_pages': 5}. Passing runtime args to `load` is deprecated. Please pass arguments during initialization instead.


Chunk 1 Content:
Welcome to the Engineering Portal Search: large Search Favorite Pages Product Domains On-Call Responsibilities Core Values Coding Style Guides Key Links https://app.gusto.com/login https://login.okta.com/ https://www.datadoghq.com/ https://github.com/webconnex Recently updated page, comment, blogpost, spacedesc 10 true concise Other Spaces of Interest: Infrastructure Space Enterprise Foundation Team Space App Development Space Product Red Squadron Blizzard Force
---
Metadata:
{'title': 'Engineering', 'id': '24969371', 'source': 'https://webconnex.atlassian.net/wiki/spaces/EN/overview', 'when': '2023-10-09T04:46:45.355Z'}

Chunk 2 Content:
While this is a living document that will always be changing, as of right now this it is considered Complete Green true and can/should be referenced. Webconnex draws it's style guides from the Golang community. We rely heavily on the following documents. Effective Go Effective Go gives tips for writing clear, idiomatic Go code. Review
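
To make the header-based step concrete, here is a simplified, standard-library stand-in for what splitting on Markdown headers does. It is not LangChain's implementation; `split_on_headers` and the sample text are hypothetical, and only the header labels (`Title 1`, `Subtitle 1`, `Subtitle 2`) come from the cell above.

```python
HEADERS = [("#", "Title 1"), ("##", "Subtitle 1"), ("###", "Subtitle 2")]

def split_on_headers(text):
    """Split Markdown text into sections, tagging each with its enclosing headers."""
    # Check longer markers first so "##" is never mistaken for "#"
    markers = sorted(HEADERS, key=lambda h: len(h[0]), reverse=True)
    sections, current, body = [], {}, []

    def flush():
        # Emit the text collected so far with a copy of the active headers
        content = "\n".join(body).strip()
        if content:
            sections.append({"content": content, "metadata": dict(current)})
        body.clear()

    for line in text.splitlines():
        for marker, label in markers:
            if line.startswith(marker + " "):
                flush()
                # A new header clears itself and any deeper header levels
                for m, l in HEADERS:
                    if len(m) >= len(marker):
                        current.pop(l, None)
                current[label] = line[len(marker) + 1:].strip()
                break
        else:
            body.append(line)
    flush()
    return sections

# Hypothetical page content
doc = "# Guide\nIntro text\n## Style\nUse gofmt."
for section in split_on_headers(doc):
    print(section)
```

Each section carries the headers it sits under, which is the information the notebook then merges with the page-level metadata before the character-based chunking.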

In [5]:
import boto3
from langchain_community.chat_models import BedrockChat

# AWS configuration
session = boto3.Session(profile_name='881184462287-/okta-admin-user')
bedrock_client = session.client("bedrock-runtime", region_name="us-east-1")

# Generate a question for a chunk using AWS Bedrock
def generate_question(chunk_text, metadata):
    llm = BedrockChat(model_id="anthropic.claude-3-sonnet-20240229-v1:0", region_name="us-east-1", client=bedrock_client)

    # Build the prompt and wrap it in a user message
    prompt = f"Consider this text: '{chunk_text}' and this metadata: '{metadata}'. What would be an appropriate question that a developer would ask for which this text is the answer? Return only the question, with no additional text."
    user_messages = [{"role": "user", "content": prompt}]

    try:
        response = llm.invoke(user_messages)
        # Extract the text of the response
        question = response.content
        return question
    except Exception as e:
        print(f"Error during API call: {e}")
        return None


#chunk_text = "The Python Software Foundation manages the open-source licensing for Python version 3.7 and later."
#question = generate_question(chunk_text)
#print("Generated Question:", question)


In [None]:
import pandas as pd

# List of prompt-response pairs, together with their metadata
prompt_response_pairs = []

# Process each chunk in the set of texts
for document in texts:
    chunk_text = document.page_content  # the chunk text lives in the 'page_content' attribute
    metadata = document.metadata if hasattr(document, 'metadata') else {}
    try:
        question = generate_question(chunk_text, metadata)
        if question:  # keep the pair only if a question was generated
            prompt_response_pairs.append({
                'prompt': question,
                'response': chunk_text,
                'metadata': metadata
            })
    except Exception as e:
        print(f"Error generating a question for the chunk: {e}")

# Build a DataFrame from the collected pairs
df = pd.DataFrame(prompt_response_pairs)

# Show the first rows as a sanity check
print(df.head())

# Optional: save the DataFrame as a CSV file
df.to_csv("prompt_response_pairs.csv", index=False)


In [None]:
import pandas as pd

df = pd.read_csv("prompt_response_pairs.csv")
# Append the metadata (title, id, source URL) to each response so answers carry their source
df['response'] = df['response'] + df['metadata']
df

Unnamed: 0,prompt,response,metadata
0,What information and links are available on th...,Welcome to the Engineering Portal Search: larg...,"{'title': 'Engineering', 'id': '24969371', 'so..."
1,What are the coding standards and best practic...,While this is a living document that will alwa...,"{'title': 'GoLang (Go)', 'id': '40894467', 'so..."
2,How do we communicate and collaborate effectiv...,Because the current team structure consists of...,"{'title': 'Communication', 'id': '40894484', '..."
3,What are the best practices and guidelines for...,This document is a place where we can note and...,"{'title': 'Best Practices and Guidelines', 'id..."
4,What are the coding style guidelines and best ...,For the most part we follow idiomatic Golang p...,"{'title': 'Coding Style Guides', 'id': '409600..."
...,...,...,...
565,How can I modify the API rate limits for an ac...,1 1 false decimal list false Set the API limit...,"{'title': 'Public API Runbook', 'id': '2393047..."
566,Why did the release of the Purchase Protection...,Postmortem summary Postmortem owner Incident d...,{'title': '2024/04/19 - Purchase Protection Ch...
567,What details should be included in a postmorte...,Postmortem summary Postmortem owner Incident d...,{'title': '2024/04/26 - Tickets displaying Pen...
568,What steps should be taken to address the issu...,refactor of the codebase to improve readabili...,{'title': '2024/04/26 - Tickets displaying Pen...


In [None]:
# Load the second DataFrame from another CSV
df_additional = pd.read_csv("github - Sheet1-2.csv")

# Merge both DataFrames
df_combined = pd.concat([df, df_additional], ignore_index=True)

# Optional: drop duplicates if needed
df_combined = df_combined.drop_duplicates()

In [None]:
df = df_combined.copy()
df.describe()

Unnamed: 0,prompt,response,metadata
count,582,582,570
unique,582,575,413
top,readme for shebang repo?,Shebang\n\nShebang contains various API's used...,"{'title': 'Python', 'id': '254509193', 'source..."
freq,1,6,33


In [None]:
system_message = "This model provides professional answers to specific questions from Webconnex employees, including metadata and a URL that refer to the source of the answer. The model answers using its own knowledge; if it doesn't know the answer, it responds <I don't have the answer>."
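
The training cell below reads `/content/train.jsonl` and `/content/test.jsonl`, but the Confluence path above only writes a CSV. Here is a minimal sketch of the missing split step, assuming the records come from something like `df[['prompt', 'response']].to_dict(orient='records')`. `split_records` and `write_jsonl` are hypothetical helpers, the sample records are made up, and the 90/10 split with seed 42 mirrors the split used in the data generation section.

```python
import json
import os
import random
import tempfile

def split_records(records, train_frac=0.9, seed=42):
    """Shuffle records reproducibly and split them into train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def write_jsonl(records, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Hypothetical records; in the notebook they would come from the DataFrame
records = [{"prompt": f"question {i}", "response": f"answer {i}"} for i in range(10)]
train, test = split_records(records)

# A temporary directory keeps the sketch side-effect free;
# in Colab the targets would be /content/train.jsonl and /content/test.jsonl
with tempfile.TemporaryDirectory() as out_dir:
    write_jsonl(train, os.path.join(out_dir, "train.jsonl"))
    write_jsonl(test, os.path.join(out_dir, "test.jsonl"))

print(len(train), len(test))
```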



# Install necessary libraries

In [None]:
!pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer

# Define Hyperparameters

In [None]:
model_name = "NousResearch/llama-2-7b-chat-hf" # use this if you have access to the official LLaMA 2 model "meta-llama/Llama-2-7b-chat-hf", though keep in mind you'll need to pass a Hugging Face key argument
dataset_name = "/content/train.jsonl"
new_model = "llama-2-7b-custom"
lora_r = 64
lora_alpha = 16
lora_dropout = 0.1
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False
output_dir = "./results"
num_train_epochs = 1
fp16 = False
bf16 = False
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
gradient_checkpointing = True
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
warmup_ratio = 0.03
group_by_length = True
save_steps = 25
logging_steps = 5
max_seq_length = None
packing = False
device_map = {"": 0}
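
A few of these values interact. As a rough sketch, assuming a single GPU and the standard LoRA formulation in which the adapter update is scaled by `lora_alpha / lora_r`, the effective batch size and the approximate number of optimizer steps per epoch can be worked out as follows. The dataset size of 511 is taken from the training output later in this notebook, `training_budget` is a hypothetical helper, and the Trainer's exact rounding may differ when gradient accumulation is above 1.

```python
import math

def training_budget(num_examples, per_device_batch_size, grad_accum_steps, epochs, num_devices=1):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

lora_r, lora_alpha = 64, 16
print("LoRA scaling factor:", lora_alpha / lora_r)
print(training_budget(num_examples=511, per_device_batch_size=4, grad_accum_steps=1, epochs=1))
```

With `save_steps = 25` and `logging_steps = 5`, the step count also tells you roughly how many checkpoints and log lines to expect from one epoch.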

# Load Datasets and Train

In [None]:
# Load datasets
train_dataset = load_dataset('json', data_files='/content/train.jsonl', split="train")
valid_dataset = load_dataset('json', data_files='/content/test.jsonl', split="train")

# Preprocess datasets: wrap each example in the LLaMA 2 chat template
def format_data(example):
    formatted_text = f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n{example["prompt"]} [/INST] {example["response"]}'
    return {'text': formatted_text}

# Apply the formatting to both datasets
train_dataset_mapped = train_dataset.map(format_data, batched=False)
valid_dataset_mapped = valid_dataset.map(format_data, batched=False)
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map
)
model.config.use_cache = False
model.config.pretraining_tp = 1
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
)
# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="all",
    evaluation_strategy="steps",
    eval_steps=5  # Evaluate every 5 steps
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset_mapped,
    eval_dataset=valid_dataset_mapped,  # Pass validation dataset here
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
trainer.model.save_pretrained(new_model)

# Cell 4: Test the model
logging.set_verbosity(logging.CRITICAL)
prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\nWrite a function that reverses a string. [/INST]" # replace the command here with something relevant to your task
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)
print(result[0]['generated_text'])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.





Map:   0%|          | 0/511 [00:00<?, ? examples/s]

Map:   0%|          | 0/57 [00:00<?, ? examples/s]

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss
5,2.7302,2.886348
10,2.6994,2.713974
15,2.5001,2.561215
20,2.4658,2.408849
25,2.0121,2.268775
30,1.3453,2.171365
35,1.8745,2.117732
40,2.1201,2.08724
45,2.1716,2.070149
50,1.9087,2.040256
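
The losses above are mean cross-entropy values in nats per token, so exponentiating them gives perplexity, which can be easier to compare across runs. A small sketch using the first and last validation losses from the table; `perplexity` is a hypothetical helper.

```python
import math

def perplexity(cross_entropy_loss):
    """Convert a mean cross-entropy loss (in nats) into perplexity."""
    return math.exp(cross_entropy_loss)

# Validation losses at steps 5 and 50 in the table above
first_val_loss, last_val_loss = 2.886348, 2.040256
print(f"validation perplexity: {perplexity(first_val_loss):.2f} -> {perplexity(last_val_loss):.2f}")
```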




# Run Inference

In [None]:
from transformers import pipeline

prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\nexplain to me the steps for a new hire in the company. [/INST]" # replace the command here with something relevant to your task
num_new_tokens = 100  # change to the number of new tokens you want to generate

# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))
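
The prompt string above follows the same `[INST] <<SYS>>` template used during training. Here is a minimal sketch of helpers that build the prompt and strip it from the generated text, assuming the pipeline echoes the prompt at the start of its output. `build_prompt` and `extract_completion` are hypothetical names, and the sample strings are made up.

```python
def build_prompt(system_message, user_message):
    """Format a single-turn prompt in the LLaMA 2 chat template used for training."""
    return f"[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n{user_message} [/INST]"

def extract_completion(generated_text, prompt):
    """Drop the echoed prompt, but only when it appears as a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()

# Hypothetical system message, question, and model output
prompt = build_prompt("Answer briefly.", "What is LoRA?")
generated = prompt + " A low-rank adapter method."
print(extract_completion(generated, prompt))
```

Stripping by prefix avoids deleting a later repetition of the prompt text, which `str.replace` would also remove.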

# Merge the model and store in Google Drive

In [None]:
# Merge and save the fine-tuned model
from google.colab import drive
import torch

# Free all unused cached GPU memory
torch.cuda.empty_cache()
drive.mount('/content/drive')

model_path = "/content/drive/MyDrive/llama-2-7b-custom"  # change to your preferred path

# Reload model in FP16 and merge it with LoRA weights
try:
    base_model = AutoModelForCausalLM.from_pretrained(
        model_name,
        low_cpu_mem_usage=True,
        use_cache=False,
        torch_dtype=torch.float16,
        device_map=device_map,
    )
    model = PeftModel.from_pretrained(base_model, new_model)
    model = model.merge_and_unload()
    print("Model loaded and merged successfully")
except RuntimeError as e:
    print(f"Could not load the model due to a memory error: {e}")
    torch.cuda.empty_cache()
    raise  # stop here so the un-merged model is not saved below

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Save the merged model
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

# Load a fine-tuned model from Drive and run inference

In [None]:
from google.colab import drive
from transformers import AutoModelForCausalLM, AutoTokenizer

drive.mount('/content/drive')

model_path = "/content/drive/MyDrive/llama-2-7b-custom"  # change to the path where your model is saved

model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

In [None]:
from transformers import pipeline

prompt = "What is 2 + 2?"  # change to your desired prompt
gen = pipeline('text-generation', model=model, tokenizer=tokenizer)
result = gen(prompt)
print(result[0]['generated_text'])