# Fine-Tuning with OpenAI

*This notebook walks through the steps of fine-tuning an OpenAI base model, using the tools OpenAI provides to train the model and test it.*

### Steps
1. Input preprocessing
2. Model creation
3. Testing the fine-tuned model against the original model
4. Conclusions




### Imports

In [1]:
import json
import re
import html
import os
from openai import OpenAI
from dotenv import load_dotenv

### 1. Input preprocessing

- The input file is in the format provided for the tech challenge.
- To make processing easier, the original file was split into smaller files of 20,000 lines each.
- The file used below is the first of those smaller files.
- A new file is generated in the chat format expected for fine-tuning the `gpt-4o-mini-2024-07-18` model, with one JSON object per line.


In [None]:
input_file_full = r'./data/trn_limpo_1_de_70.json'
first_line = 0
last_line = 20000

# Path to the output file for gpt formatted data
gpt_output_file = r'./data/gpt_formatted_data_1.json'

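# Function to parse JSON manually
def parse_json_line(line):
    # Skip lines that lack either field; otherwise find() returns -1
    # and the slices below would pick up unrelated text
    if '"title": ' not in line or '"content": ' not in line:
        return None

    # The +1 skips the opening quote; the value runs up to the first '",' that follows
    title_start = line.find('"title": ') + len('"title": ') + 1
    title_end = line.find('",', title_start)
    title = line[title_start:title_end]

    content_start = line.find('"content": ') + len('"content": ') + 1
    content_end = line.find('",', content_start)
    content = line[content_start:content_end]

    # Records without content cannot be used as training examples
    if not content:
        return None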
    
    # Cleanup content by removing special characters or escaped characters
    content = html.unescape(content)  # Convert HTML entities to their equivalent characters
    content = re.sub(r'\\+', '', content)    # Remove any number of backslashes
    content = content.replace('\"', "'")
    return {'title': title, 'content': content}

# Function to convert the parsed data to GPT format
def convert_to_gpt_format(data):
    gpt_format = []
    gpt_format.append({
        "role": "system",
        "content": "Act as sales representative"
    })
    gpt_format.append({
        "role": "user",
        "content": data['title']
    })
    gpt_format.append({
        "role": "assistant",
        "content": data['content']
    })
    return {"messages": gpt_format}


with open(input_file_full, 'r') as infile, open(gpt_output_file, 'w') as outfile:
    range_lines = range(first_line, last_line)
    for i, line in enumerate(infile):
        if i in range_lines:
            filtered_line = parse_json_line(line)
            if filtered_line:
                converted_line = convert_to_gpt_format(filtered_line)
                json.dump(converted_line, outfile)
                outfile.write('\n')
        if i >= last_line:
            break

print(f"GPT formatted data has been written to {gpt_output_file}")
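
Before uploading, it can help to check that every generated line has the shape produced by `convert_to_gpt_format`. The sketch below is not part of the original notebook: `validate_chat_lines` and `EXPECTED_ROLES` are hypothetical names, and the checks only mirror the system/user/assistant layout used above, not OpenAI's full validation.

```python
import json

# Role order produced by convert_to_gpt_format in this notebook
EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_chat_lines(lines):
    """Return a list of (line_number, problem) tuples for malformed records."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list):
            problems.append((number, "missing 'messages' list"))
            continue
        if not all(isinstance(m, dict) for m in messages):
            problems.append((number, "a message is not a JSON object"))
            continue
        roles = [m.get("role") for m in messages]
        if roles != EXPECTED_ROLES:
            problems.append((number, f"unexpected roles: {roles}"))
            continue
        for m in messages:
            content = m.get("content")
            if not isinstance(content, str) or not content.strip():
                problems.append((number, f"empty content for role {m['role']}"))
                break
    return problems

# Small in-memory sample: one valid record and two broken ones
sample = [
    '{"messages": [{"role": "system", "content": "Act as sales representative"}, '
    '{"role": "user", "content": "Girls Tutu Ballet"}, '
    '{"role": "assistant", "content": "A short description."}]}',
    '{"messages": [{"role": "user", "content": "No system message"}]}',
    'not json',
]
sample_problems = validate_chat_lines(sample)
print(sample_problems)
```

To check the real output, the same function can be given the open file object, for example `validate_chat_lines(open(gpt_output_file))`, since iterating a file yields its lines.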

### 2. Model creation

- The file converted to the `gpt-4o-mini-2024-07-18` format is used to create the fine-tuning job on the OpenAI platform.
- The hyperparameters used are:
    - epochs: 1
    - batch size: 8
    - learning rate multiplier: 0.6
- In the first run, the first file of 20,000 lines is used to train the model.
- The fine-tuned model is then used as the base model for the next training round, with the second file, and so on.

In [None]:
input_file = "./data/gpt_formatted_data_1.json"
model_id = "gpt-4o-mini-2024-07-18"
#model_id = "ft:gpt-4o-mini-2024-07-18:personal:tech-challenge-3:A8sfiqpt"
epochs = 1
learning_rate_multiplier = 0.6
batch_size = 8

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variables
api_key = os.getenv("OPENAI_API_KEY")

# Check if the API key is available
if not api_key:
    raise ValueError("API key for OpenAI is not set. Please set the OPENAI_API_KEY environment variable.")

# Initialize the OpenAI client with the API key
client = OpenAI(api_key=api_key)

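# Upload the training file; the context manager closes the file handle afterwards
with open(input_file, 'rb') as training_data:
    train_file = client.files.create(
        file=training_data,
        purpose="fine-tune"
    )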

finetuning_job = client.fine_tuning.jobs.create(
    model=model_id,
    hyperparameters={
        "n_epochs":epochs,
        "learning_rate_multiplier":learning_rate_multiplier,
        "batch_size":batch_size
    },
    training_file=train_file.id,
    suffix="tech-challenge-3"
)

# Print the fine-tuning job ID
print(f"Fine-tuning job created with ID: {finetuning_job.id}")
# fine_tuned_model stays None until the job has finished successfully
print(f"model_id: {finetuning_job.fine_tuned_model}")
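
The job runs asynchronously, so the fine-tuned model name only becomes available once the job reaches a terminal status. A minimal polling sketch follows, assuming the job object exposes `status` and `fine_tuned_model`. `wait_for_job` and its parameters are hypothetical names, not part of the original notebook, and the retrieval call is passed in as a plain callable so the helper can be tried without network access.

```python
import time

# Statuses after which the job will not change any more
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(retrieve, job_id, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll retrieve(job_id) until the job reaches a terminal status.

    `retrieve` is any callable that returns an object with a `.status`
    attribute, for example `client.fine_tuning.jobs.retrieve`.
    """
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")

# Usage with the notebook's objects (requires network access and an API key):
# job = wait_for_job(client.fine_tuning.jobs.retrieve, finetuning_job.id)
# if job.status == "succeeded":
#     model_id = job.fine_tuned_model  # base model for the next training round
```

The returned `fine_tuned_model` value is what the commented-out `model_id` line above would be set to when training continues with the next file.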

### 3. Testing the fine-tuned model against the original model

- The original model and the fine-tuned model are tested with the same prompt so that their responses can be compared.

In [4]:
original_model_id = "gpt-4o-mini-2024-07-18"
trained_model_id = "ft:gpt-4o-mini-2024-07-18:personal:tech-challenge-3:A8xHyIA4"
system_prompt = "Act as sales representative"

input_prompt = "What can you tell me about Girls Tutu Ballet?"

messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": input_prompt}    
    ]

# Load environment variables from .env file
load_dotenv()

# Get the API key from environment variables
api_key = os.getenv("OPENAI_API_KEY")

# Check if the API key is available
if not api_key:
    raise ValueError("API key for OpenAI is not set. Please set the OPENAI_API_KEY environment variable.")

# Initialize the OpenAI client with the API key
client = OpenAI(api_key=api_key)


### Original model

In [None]:
completion = client.chat.completions.create(
    model=original_model_id,
    messages=messages)

completion.choices[0].message.content

The Girls Tutu Ballet in Neon Pink is a vibrant and trendy choice for young dancers who want to stand out in their classes or performances. Here are some specific features you might consider:

1. **Color and Aesthetic**: Neon pink is a bold, eye-catching color that adds a modern twist to the traditional ballet tutu. It’s perfect for expressing personality while still being elegant and dance-appropriate.

2. **Style**: Typically, a neon pink tutu may include multiple layers of tulle, creating that classic, fluffy ballet look. Some designs might feature additional embellishments like sequins, rhinestones, or decorative trims for extra sparkle.

3. **Comfort and Fit**: These tutus are often designed with comfort in mind, featuring stretchy bodices or elastic waistbands that allow for flexibility as the dancer moves. Materials are usually chosen to be soft against the skin, preventing any discomfort during practice or performances.

4. **Versatility**: A neon pink tutu can be versatile enough for ballet classes, recitals, costume parties, or even casual dress-up at home. It can be paired with a variety of leotards, tights, and ballet shoes, allowing for personalization.

5. **Care and Maintenance**: It's important to follow care instructions to maintain the bright color and shape of the tutu. Hand washing or gentle machine washing on cold, and avoiding bleach, is usually recommended.

6. **Matching Accessories**: Many outfits come as sets or can be easily matched with accessories like headbands, ballet shoes, or leg warmers in similar neon shades, creating a complete look.

7. **Confidence Booster**: The fun and lively color can boost a child's confidence, making them more excited about participating in dance. The tutu's style often encourages imaginative play as well, both in and out of dance class.

If you're interested in purchasing the Girls Tutu Ballet Neon Pink or have any specific questions regarding brands, sizes, or availability, feel free to ask!

### Fine-tuned model

In [None]:
completion = client.chat.completions.create(
    model=trained_model_id,
    messages=messages)

completion.choices[0].message.content

The ballet genre is characterized by its graceful and beautiful dance movements. Girls Tutu Ballet is often associated with the tutu, a short skirt made of tulle or netting that is worn by female ballet dancers. The tutu ballet style is often performed in classical ballet productions, such as The Nutcracker, Swan Lake, and Cinderella, and involves techniques such as turning out the legs and pointed toes. Ballet also emphasizes a strong and graceful upper body, with dancers often using their arms and hands to create shapes and expressions in their movements.  All these specifics make ballerinas look adorable, graceful and ethereal. Girls Tutu Ballet clothing and accessories include special ballet shoes, leotards, tights, and tutus.  Girls Tutu Ballet clothing and accessories also include special ballet shoes, leotards, tights, and tutus.  Girls Tutu Ballet clothing and accessories may include a wide variety of colors and styles. Girls Tutu Ballet costumes are often worn for performances, dance recitals, and other performances. A tutu is a short skirt often worn for ballet performances, made of stiff netting or other dancing fabrics. Sometimes a leotard, tights, or other clothing may also be worn with a tutu. Girls Tutu Ballet accessories may include hair elastics, headbands, flowers, or accessories such as hairnets or other dance attire. Girls Tutu Ballet participants may be of any age but are generally found at the basic ratio. Girls Tutu Ballet is different from other forms of Tutu Ballet (for example, modern and contemporary) because it is not set to music. Girls Tutu Ballet is different from other forms of taking special dance lessons (for example, flamenco.) It is considered more graceful or elegant than other forms of this.Recommendation with extreme enthusiasm.This delightful ballet is dropped right from the book case and into dance in a lovely way. A ballet class is a wonderful introduction for boys and girls. 
It is engaging from the first glimpse of all the sweet, carefree children. We find out a little about each one, and how being part of a ballet company means a lot to the child. It sends a wonderful message - a joint sparkle lighting up for everyone, creating a lovely performance in costume on stage. Graceful, awesome and beckoning invites in this happy ballet. Don’t you dream you feel like a ballerina after reading this? Beautifully bound, this ballet is luxuriously printed in glitter on thick, glossy paper. It encourages boys and girls to strive to learn the moves in its bonus offers - for example belly dance, Folk dance, Highland, tap (difficult!). Just right for children ages 4-9 and an exciting present for any child (ages 10 and over) to find under the Christmas tree. What we have in Ballet Tutus of the Girls, is a friendly race to see how many moves can be held and still keep from crashing into another energetic dancer! We jump together, the dancer reading about us and the dancer reading this piece like the one who surveys and enjoys, discovering exciting pictures alike again, at each turn of the page. This will help us to develop the songwriting talent of our dancers and give them inspiration. Another fun feature that is stylish in its design is the flaps or page drawings that can butt them a litt of the voice of those of oppositement gentle dancers, making them as cute as a mouse in shoes off of tutus! Tisk!Tisk! I believe Girls can read too or is our mouse just shy! The bright turquoise and lavender motion of the highlighted, off blues reflecting off reverse decorated bows, at a slow tilt and leaping in knee levels, creates equals boredom in the middle-aged head! I must admit, I wanted to add a bit of color on another level at night and in square fashion themes or revolve our dancing body horizontally, not to mention using purple, magenta pinks and shimmer skirts. I mean, what could be prettier to a girl than a sparkling  other-paged, ballerina 99. Yes! 
That will conclude our class. I hope this explanation of Ballet Tutus of the Girls has filled your head full of fun twisty twirlers! Happy Hunger Games. At either end we are off to delightful ballet. I received a copy of this ballet from the Ballet master and Ballet School in this book, the design by our Russian talent, pouffe it’s slick, Sylvan! This Ballet goes well and must become your regular poses! Once you finish Ballet Tutus of the Girls!  More Fun for Everyone,NAu Ballet School and After class!  With honorary mentions for star studded brilliant bronze bend spells, you  manage not too see,  on our big ballerina Gazoo Ballet Run-way.  Can you swing and walk at the same time our dancers read the paper? Why they have open air and  en pointe winter trips, well, here we  find them whisking through forests, unicycle skating  , horse buggying and just grazing pork! The knees high and arms above should feel breezy. Together we will shine! That’s all folks!Subscribe to this subscription to convert this Dance Budget class into a worldwide show. Girls tutus ballet after class! Coming Soon! My Blog's One Popular Post Catalina Design2 years ago My Blog's One Popular Post The Lovely Quiltery Studio4 years agoMy Blog's One Popular Post Stitches in  Chocolate Dreams Blog5 years agoOlder Post Home
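
The two calls above can be wrapped in one helper so that both models always receive identical input. This is a sketch rather than the notebook's own code: `compare_models` is a hypothetical name, and the `max_tokens` cap is an added assumption meant to keep a rambling reply, such as the one above, from running on indefinitely.

```python
def compare_models(client, model_ids, messages, max_tokens=300):
    """Send the same messages to each model and return {model_id: reply text}."""
    replies = {}
    for model_id in model_ids:
        completion = client.chat.completions.create(
            model=model_id,
            messages=messages,
            max_tokens=max_tokens,  # cap the reply length (value chosen arbitrarily)
        )
        replies[model_id] = completion.choices[0].message.content
    return replies

# Usage with the notebook's objects (requires network access and an API key):
# replies = compare_models(client, [original_model_id, trained_model_id], messages)
# for model_id, text in replies.items():
#     print(f"--- {model_id} ---\n{text}\n")
```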

### 4. Conclusions

- Using the platform provided by OpenAI removes the need to have the computing power required for fine-tuning; on the other hand, financial cost became a factor when choosing the best hyperparameters.
- Because OpenAI's original models are already well developed, they respond to the prompt very well (or even better than the fine-tuned model), since the training data consists of "untreated or unverified" comments written by people about products.
- Also because of the models' maturity, running 1 epoch was enough to obtain a reasonable result, which matters because the cost rises sharply as the number of epochs increases.
- We chose `gpt-4o-mini-2024-07-18` as the base model because it was in a reduced-cost period (until 23/09/2024). Otherwise we could have tried other models such as `babbage` or `davinci`.
- Running 1 epoch with a high batch size (16 or 32) or with small learning rate multipliers (0.1 or 0.2) did not give good results. We reached an acceptable trade-off between result quality and cost with batch size 8 and learning rate multiplier 0.6.
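
One possible reading of the batch size observation, offered as an assumption rather than something measured in this notebook: with a single epoch, a larger batch size means fewer weight updates over the same data. The arithmetic below uses the nominal 20,000 lines per file. The real example count may be lower because lines without content are skipped, and how the platform schedules steps internally is not documented here. `optimizer_steps` is a hypothetical helper.

```python
import math

def optimizer_steps(n_examples, batch_size, n_epochs=1):
    """Number of weight updates when each step consumes one batch."""
    return math.ceil(n_examples / batch_size) * n_epochs

# Nominal file size used in this notebook
n_examples = 20_000

for batch_size in (8, 16, 32):
    print(f"batch size {batch_size}: {optimizer_steps(n_examples, batch_size)} steps")
# batch size 8: 2500 steps
# batch size 16: 1250 steps
# batch size 32: 625 steps
```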