## Reverse Neutralization Example to generate a dataset for fine-tuning

In [1]:
# pip install --upgrade --quiet openai python-dotenv tqdm

In [2]:
import json
import os
import time
from dotenv import load_dotenv
from openai import OpenAI
from tqdm.auto import tqdm

load_dotenv("../saved_keys.env")

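# Fail with a clear message if the key is missing (instead of a KeyError) or malformed
assert os.environ.get("OPENAI_API_KEY", "").startswith("sk"), \
       "Please sign up for access to the OpenAI API and provide an access token in the saved_keys.env file"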

In [3]:
# Initialize the OpenAI client
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"]
)

In [4]:
# read the collection of emails
emails = []
with open("hawaiian_emails.jsonl", "r") as f:
    for line in f:
        emails.append(json.loads(line))


In [5]:
# Neutralize the emails

prompt = """
Neutralize the tone and style from the following email to make it professional and suitable for communication between executives who may not know each other very well.

{email}
"""

neutralized_emails = []

for email in tqdm(emails):
    prompt_with_email = prompt.format(email=email)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt_with_email}]
    )

    neutralized_emails.append(response.choices[0].message.content)


  0%|          | 0/198 [00:00<?, ?it/s]

### Generate the dataset for fine-tuning

In [6]:
dataset = []

for email, neutralized_email in zip(emails, neutralized_emails):
    dataset.append({
        "messages": [
            {"role": "system", "content": "You are a helpful assistant converting the neutralized email into personalized email."},
            {"role": "user", "content": neutralized_email},
            {"role": "assistant", "content": email}
        ]
    })

# write out the dataset to a jsonl file
with open("dataset.jsonl", "w") as f:
    for item in dataset:
        f.write(json.dumps(item) + "\n")
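
Before uploading, it can help to sanity-check that every line of the file follows the chat format used above (a `messages` list of `role`/`content` pairs ending with the assistant's reply). The sketch below is a minimal local check, not OpenAI's own validator; the names `validate_chat_record` and `validate_jsonl_lines` are hypothetical helpers introduced here for illustration.

```python
import json

# Roles used in the records built above
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_record(record):
    """Return a list of problems found in one training record (empty if none)."""
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["record has no non-empty 'messages' list"]

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {message.get('role')!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")

    # Each training example should end with the target (assistant) reply
    last = messages[-1]
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message is not from the assistant")
    return problems


def validate_jsonl_lines(lines):
    """Validate an iterable of JSONL lines; return {line_number: problems}."""
    report = {}
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[number] = [f"invalid JSON: {exc.msg}"]
            continue
        problems = validate_chat_record(record)
        if problems:
            report[number] = problems
    return report
```

Passing the open file handle for `dataset.jsonl` to `validate_jsonl_lines` should return an empty dictionary when every record is well formed.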


In [7]:
# Compare the neutralized email with the original personalized email (first example only)

print(f"Neutralized email: {neutralized_emails[0]}")
print("-"*100)
print(f"Personalized email: {emails[0]}")


Neutralized email: Subject: Summary of Marketing Team Meeting - {date}

Dear Marketing team,

I trust this email finds you well. I wanted to provide a brief summary of our recent team meeting held on {date} at {time} in {location}.

In our discussions regarding the Q2 roadmap, we came to agreements on:
- Prioritizing key objectives for the upcoming quarter
- Identifying challenges and exploring potential solutions
- Updating the timeline and outlining deliverables

Moving forward, we plan to:
- Arrange a follow-up meeting for next week
- Request that all updated documentation be shared by Friday

Should you have any queries or if any crucial points were omitted, please do not hesitate to reach out.

Warm regards,

Evelyn
----------------------------------------------------------------------------------------------------
Personalized email: Subject: Aloha from the Marketing Team Meeting Summary - {date}

Body: Aloha Marketing team,

I hope this email finds you all in good spirits. I'm w

In [8]:
# Debug: Check current files before upload
print("Current files in OpenAI:")
try:
    files = client.files.list()
    for file in files.data[:5]:  # Show first 5 files
        print(f"  - {file.id}: {file.filename} ({file.purpose}) - {file.status}")
    print(f"Total files: {len(files.data)}")
except Exception as e:
    print(f"Error listing files: {e}")

# Also check that dataset.jsonl exists locally (os is already imported above)
if os.path.exists("dataset.jsonl"):
    file_size = os.path.getsize("dataset.jsonl")
    print(f"Local dataset.jsonl exists - Size: {file_size} bytes")
else:
    print("ERROR: dataset.jsonl not found locally!")


Current files in OpenAI:
  - file-FcXPJovX8KHTQiEmNN3sen: 9797739828e95409.jsonl (fine-tune) - processed
  - file-XIoQr4yA8PiaZ7rLbzO3pgdx: parties.csv (assistants) - processed
  - file-MshL6yZB8ORDEct3iH19mKyu: global-parties.json (assistants) - processed
  - file-RRK6DEb4oAnAqFf4PLXAAUPm: compiled_results.csv (fine-tune-results) - processed
  - file-YDeBQK0v4BRWMzQiIsxvHSLi: compiled_results.csv (fine-tune-results) - processed
Total files: 31
Local dataset.jsonl exists - Size: 257313 bytes


### Observation:

The base model can handle the task, but it may get the placeholder tags (such as `{date}`) wrong or miss the tone you prefer. Let's address both issues by fine-tuning a model.
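
One way to make the placeholder issue measurable is to compare the `{...}` tags in an input email with those in the generated email. The following is a minimal sketch under the assumption that placeholders always look like `{name}` or `{date}` (a single identifier in curly braces); `placeholder_diff` is a hypothetical helper, not part of any library.

```python
import re

# Assumption: a placeholder is one identifier wrapped in curly braces, e.g. {date}
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def placeholder_diff(source_text, generated_text):
    """Report placeholders that were dropped from or added to the generated text."""
    expected = set(PLACEHOLDER.findall(source_text))
    found = set(PLACEHOLDER.findall(generated_text))
    return {
        "missing": sorted(expected - found),      # in the source but not the output
        "unexpected": sorted(found - expected),   # in the output but not the source
    }
```

For example, `placeholder_diff(neutralized_emails[0], emails[0])` would show whether a pair in the dataset uses the same tags on both sides; an empty `missing` and `unexpected` list means the tags match.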


In [9]:
# Upload the training file (the context manager closes the file handle afterwards)
with open("dataset.jsonl", "rb") as f:
    training_file = client.files.create(
        file=f,
        purpose="fine-tune"
    )

# Print the file info to confirm the upload was successful
print("File uploaded successfully!")
print(f"File ID: {training_file.id}")
print(f"File status: {training_file.status}")
print(f"File bytes: {training_file.bytes}")
print(f"File purpose: {training_file.purpose}")

# Wait a moment for the file to be processed
time.sleep(5)

# Verify the file exists by listing files
files = client.files.list()
uploaded_file_exists = any(file.id == training_file.id for file in files.data)
print(f"File exists in OpenAI: {uploaded_file_exists}")

File uploaded successfully!
File ID: file-Fz9PnC3tqdYjQrHG57tER8
File status: processed
File bytes: 257313
File purpose: fine-tune
File exists in OpenAI: True


In [10]:

# Create a fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo"  # Base model to fine-tune
)


In [11]:
# Continuously check the status of the fine-tuning job
while True:
    job_status = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Job status: {job_status.status}")

    # Stop on any terminal state; a cancelled job would otherwise loop forever
    if job_status.status in ['succeeded', 'failed', 'cancelled']:
        break

    print("Waiting 120 seconds...")
    time.sleep(120)

if job_status.status == 'succeeded':
    print(f"Fine-tuning complete! You can now use model: {job_status.fine_tuned_model}")
else:
    print(f"Fine-tuning ended with status '{job_status.status}'. Check the job for more information.")

# Once the job is complete, you can use the fine-tuned model
# The fine-tuned model ID will be available in job_status.fine_tuned_model

Job status: validating_files
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: running
Waiting 120 seconds...
Job status: succeeded
Fine-tuning complete! You can now use model: ft:gpt-3.5-turbo-0125:digits::BqTY2RHl
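
The polling pattern above can also be wrapped in a small helper that stops on any terminal state and gives up after a timeout. This is a sketch, not part of the OpenAI SDK: `wait_for_terminal_status`, its parameters, and the four-hour default timeout are assumptions made for illustration. The status values are the ones seen in the output above plus `failed` and `cancelled`.

```python
import time

# Job states after which polling should stop
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_terminal_status(get_status, poll_seconds=120,
                             timeout_seconds=4 * 60 * 60,
                             sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state or the timeout passes.

    get_status is any zero-argument callable returning the job status string.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job still in state {status!r} after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
```

In this notebook it could be called as `wait_for_terminal_status(lambda: client.fine_tuning.jobs.retrieve(job.id).status)`.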


In [12]:
# Now we can use the fine-tuned model to generate the email from neutralized_emails[20].
# Generally it is bad practice to test a model with a sample input from its training data,
# but here we only want a quick look at the output of the fine-tuned model.

# Test the fine-tuned model with the sample input neutralized_emails[20]
completion = client.chat.completions.create(
    model=job_status.fine_tuned_model,  # Use the fine-tuned model
    messages=[
        {"role": "system", "content": "You are a helpful assistant converting the neutralized email into personalized email."},
        {"role": "user", "content": neutralized_emails[20]},
    ]
)

# Print the generated response
print("Generated personalized email:")
print(completion.choices[0].message.content)

Generated personalized email:
Subject: Feedback on mobile app

Body: Aloha {name},

Mahalo for sharing the mobile app with me. I've reviewed it and would like to provide some constructive feedback.

Here are my main observations:
- Reviewed current progress and milestones
- Aligned on priorities for the next quarter
- Discussed challenges and potential solutions

I believe implementing these suggestions would further strengthen the mobile app. Please let me know if you'd like to talk about any of these points in more detail.

Warm aloha,  
{name}


In [13]:
# Now we can use the fine-tuned model to generate the email

test_email = """
Subject: Request for Project Timeline Update

Body: Hi Sam,

I am writing to request an update on the project timeline. Please provide the update by the end of the day, as it is important for our upcoming steps.

Thank you.

Best,
Alex
"""

# Test the fine-tuned model with a sample input
completion = client.chat.completions.create(
    model=job_status.fine_tuned_model,  # Use the fine-tuned model
    messages=[
        {"role": "system", "content": "You are a helpful assistant converting the neutralized email into personalized email."},
        {"role": "user", "content": test_email}
    ]
)

# Print the generated response
print("Generated personalized email:")
print(completion.choices[0].message.content)


Generated personalized email:
Subject: Request for project timeline update

Body: Aloha Sam,

I hope you’re doing well. I’m reaching out because I need an update on the project timeline.

Could you please share this with me by the end of the day? It’s needed for our next steps.

Mahalo in advance for your help.

Best,  
Alex


## Inference Example

In [14]:
prompt = """
Write a short email to Gretl inviting her to give a presentation on the marketing campaign around the 2026 FIFA World Cup.
"""

# Generate a neutral email with the base model first
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant writing letters suitable for communication between executives."},
        {"role": "user", "content": prompt}
    ]
)

neutral_email = response.choices[0].message.content

print(neutral_email)

Subject: Invitation to Present on 2026 FIFA World Cup Marketing Campaign

Dear Gretl,

I hope this email finds you well. I am writing to invite you to give a presentation on the marketing campaign surrounding the 2026 FIFA World Cup. Your expertise and insight would greatly benefit our team in understanding the strategies and efforts being implemented for this prestigious event.

We believe that your presentation will be instrumental in providing valuable information and inspiration as we work towards our own marketing objectives.

Please let me know your availability and preferred dates for the presentation. We are flexible and will accommodate your schedule accordingly.

Thank you in advance for considering this invitation. We are looking forward to hearing from you soon.

Best regards,

[Your Name]
[Your Title]
[Company Name]


In [15]:
completion = client.chat.completions.create(
    model=job_status.fine_tuned_model,  # Use the fine-tuned model
    messages=[
        {"role": "system", "content": "You are a helpful assistant converting the neutralized email into personalized email."},
        {"role": "user", "content": neutral_email}
    ]
)

# Print the generated response
print("Generated personalized email:")
print(completion.choices[0].message.content)

Generated personalized email:
Subject: Aloha and invitation to present on 2026 FIFA World Cup marketing campaign

Body: Aloha Gretl,

I hope this email finds you in good spirits. I’m reaching out because I’d love for you to share your mana‘o on the marketing campaign for the 2026 FIFA World Cup.

I believe your perspective would be incredibly valuable for our team, helping us gain a better understanding of the strategies and initiatives in place for this exciting event.

I’m sure your presentation would be both insightful and inspiring, and would help us further our own marketing efforts.

Could you please let me know your availability to present? I’m flexible and can work around your schedule.

Mahalo in advance for considering my request. I’m really looking forward to hearing from you.

Warmest aloha,  
[Your Name]  
[Your Title]  
[Company Name]
