In [3]:
import pandas as pd

from google.colab import drive
drive.mount('/content/drive')

dataset = pd.read_csv("/content/drive/MyDrive/PromptDataset.csv")  # replace with the path to your CSV
dataset.head()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Unnamed: 0,Prompt,Response
0,What is the capital of France?,The capital of France is Paris.
1,Explain the water cycle.,The water cycle is the continuous movement of ...
2,Name the three branches of the U.S. government...,The three branches of the U.S. government are ...
3,Describe how photosynthesis works.,Photosynthesis is the process in which green p...
4,What are the main differences between classica...,The main differences between classical and qua...


In [7]:
import sklearn
from sklearn.model_selection import train_test_split

In [6]:
import pathlib
import textwrap
import PIL.Image
import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown


def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY="REDACTED"
genai.configure(api_key=GOOGLE_API_KEY)

model = genai.GenerativeModel('gemini-1.5-flash')

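def visualize_chat(messages):
    # Send every message through one chat session and render both sides as Markdown.
    # Uses the `display` and `Markdown` names imported at the top of this cell;
    # re-importing the module as `display` would shadow the `display` function.
    chat = model.start_chat(history=[])
    for message in messages:
        display(Markdown(f"**👤 User:** {message}"))
        response = chat.send_message(message, stream=False)
        display(Markdown(f"**🤖 LLM:** {response.text}"))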

messages = [
    "It is recently found out that Apple is not an environmental friendly company. You are a fanatic Apple fan working on an blog. Write about how Apple does a good job of saving the planet."
]

visualize_chat(messages)

**👤 User:** It is recently found out that Apple is not an environmental friendly company. You are a fanatic Apple fan working on an blog. Write about how Apple does a good job of saving the planet.

**🤖 LLM:** ## Apple: More Than Just a Pretty Face (and a Shiny New iPhone)

Look, I get it. Lately, there's been a lot of buzz about Apple and their environmental footprint. Some folks are throwing shade, saying they're not as eco-conscious as they could be. But as a lifelong Apple devotee, I'm here to say: **Don't believe the hype!** 

Apple is making real strides in their commitment to sustainability, and I'm proud to stand behind them. Here's why:

**1. Recycling Like a Boss:** Apple has been leading the charge in closed-loop recycling for years. Their "GiveBack" program allows you to trade in your old devices, no matter the brand, for credit towards a new Apple product. This not only reduces e-waste, but also ensures valuable materials are recovered and reused. 

**2. Powering Up with Renewable Energy:** Apple has already achieved 100% renewable energy for their global facilities, a huge accomplishment. They're also working to use renewable energy for their supply chain, which is a massive undertaking. This commitment to clean energy is a huge step in the right direction.

**3. Designing for Durability:** iPhones are notoriously known for their longevity. They're built to last, with high-quality materials and robust software that keeps them running smoothly for years. This means fewer replacements, less e-waste, and a smaller environmental impact in the long run. 

**4. Embracing Efficiency:** Apple isn't just about shiny new products. They're constantly striving to improve their manufacturing processes, reducing waste and energy consumption at every step. They're even working to minimize the environmental impact of packaging, opting for recycled materials and biodegradable options.

**5. Transparency is Key:** Apple is transparent about their environmental efforts, releasing detailed reports and making their data available to the public. This transparency is crucial for building trust and accountability, and it allows us to see the progress they're making.

Sure, Apple might not be perfect, but they're committed to doing better. By continuously innovating and pushing the boundaries of sustainability, they're showing the world that tech can be both cutting-edge and eco-friendly. So next time you see an Apple logo, remember that it's not just a symbol of design and innovation, but also a symbol of a company working hard to make a positive impact on the planet. 

**Keep on innovating, Apple!** 
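
The `to_markdown` helper defined in the cell above is never called, so its effect is easy to miss. A minimal sketch of the same string transformation without the IPython wrapper: `quote_block` and the sample text are hypothetical names introduced here for illustration. It shows that `predicate=lambda _: True` makes `textwrap.indent` prefix blank lines too, which keeps a multi-paragraph reply inside one Markdown blockquote.

```python
import textwrap

def quote_block(text):
    """Apply the same transformation as `to_markdown`, returning a plain string."""
    text = text.replace('•', '  *')  # turn bullet characters into Markdown list markers
    # predicate=lambda _: True prefixes every line, including empty ones
    return textwrap.indent(text, '> ', predicate=lambda _: True)

sample = "Title\n\n• first point"
quoted = quote_block(sample)
print(quoted)

# For comparison: the default predicate skips whitespace-only lines,
# which would split the blockquote at every blank line.
default_quoted = textwrap.indent("a\n\nb", '> ')
```

Wrapping the result in `Markdown(...)`, as `to_markdown` does, then renders it as a quoted block in the notebook.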


In [9]:
# Define the split ratio, e.g., 80% training and 20% testing
train_size = 0.8

# Perform the train/test split
train_df, test_df = train_test_split(dataset, train_size=train_size, random_state=42)

# Save the split datasets to new CSV files
train_df.to_csv('PromptDataset_train.csv', index=False)
test_df.to_csv('PromptDataset_test.csv', index=False)

print("Train/test split completed. Files saved as 'PromptDataset_train.csv' and 'PromptDataset_test.csv'.")

Train/test split completed. Files saved as 'PromptDataset_train.csv' and 'PromptDataset_test.csv'.
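
To make the effect of `train_size=0.8` and `random_state=42` concrete, here is a rough standard-library sketch of a seeded shuffle-and-cut split. It is an illustration only: `split_rows` is a hypothetical helper, and it will not reproduce the exact row assignment that `train_test_split` produces, because scikit-learn uses its own random number generator.

```python
import random

def split_rows(rows, train_size=0.8, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and cut it into (train, test)."""
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = int(len(shuffled) * train_size)
    return shuffled[:n_train], shuffled[n_train:]

rows = list(range(10))
train, test = split_rows(rows)
print(len(train), len(test))  # 8 2
```

Calling `split_rows(rows)` again returns the same two lists, which is the property `random_state` provides in the cell above.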


In [None]:
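# Fine-tuning and RLHF procedure
# NOTE: `Trainer` only trains local PyTorch models on tokenized datasets, so the
# Gemini `model` above (a remote API client) cannot be passed to it. As an
# assumption, this cell fine-tunes a small open checkpoint ("distilgpt2") instead.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Step 1: Prepare your training and evaluation datasets
train_dataset = "/content/PromptDataset_train.csv"  # replace with your dataset
eval_dataset = "/content/PromptDataset_test.csv"    # replace with your dataset
raw = load_dataset("csv", data_files={"train": train_dataset, "test": eval_dataset})

checkpoint = "distilgpt2"  # assumed stand-in; any causal LM checkpoint should work
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers ship without a pad token
hf_model = AutoModelForCausalLM.from_pretrained(checkpoint)

def tokenize(batch):
    # Join each prompt and its response into a single training string
    texts = [f"{p}\n{r}{tokenizer.eos_token}" for p, r in zip(batch["Prompt"], batch["Response"])]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw["train"].column_names)

# Step 2: Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

# Step 3: Set up the Trainer
trainer = Trainer(
    model=hf_model,  # local model; the Gemini client cannot be trained this way
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # pads batches, sets labels
)

# Step 4: Train the model
trainer.train()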

# Step 5: Implement RLHF

def reward_function(response, feedback):
    # Define a reward function based on the feedback
    return feedback["agreeableness"] + feedback["conscientiousness"] - feedback["neuroticism"]

def get_human_feedback(responses):
    # Placeholder function to simulate human feedback collection
    # In real implementation, collect actual human feedback
    return [{"agreeableness": 1, "conscientiousness": 1, "neuroticism": 0} for _ in responses]

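def optimize_with_rlhf(model, num_epochs, input_texts):
    # `genai.GenerativeModel` has no `optimize` method, so this loop only collects
    # (prompt, response, reward) records. An actual policy update (for example PPO
    # on a locally trainable model) would have to consume these records.
    records = []
    for epoch in range(num_epochs):
        responses = [model.start_chat().send_message(text).text for text in input_texts]
        feedbacks = get_human_feedback(responses)
        rewards = [reward_function(resp, fb) for resp, fb in zip(responses, feedbacks)]
        records.extend(zip(input_texts, responses, rewards))
    return records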

# Step 6: Optimize the model with RLHF
input_texts = [
    "Write a friendly introduction about yourself.",
    "How do you stay organized and productive?",
    "What are your thoughts on collaborative work?"
]

optimize_with_rlhf(model, num_epochs=5, input_texts=input_texts)
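
The placeholder `get_human_feedback` above returns identical scores for every response, so `reward_function` evaluates to 2 each time. A minimal sketch of why that matters, assuming rewards are centered on their mean before any update (a common baseline choice in policy-gradient methods, not something this notebook specifies). The `advantages` helper and the varied scores are hypothetical and exist only for illustration.

```python
from statistics import mean

def reward_function(response, feedback):
    # Same formula as in the cell above
    return feedback["agreeableness"] + feedback["conscientiousness"] - feedback["neuroticism"]

def advantages(rewards):
    """Subtract the mean reward so only differences between responses remain."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]

# Placeholder feedback: every response gets the same scores
placeholder = [{"agreeableness": 1, "conscientiousness": 1, "neuroticism": 0}] * 3
placeholder_rewards = [reward_function("", fb) for fb in placeholder]

# Hypothetical feedback that actually distinguishes the responses
varied = [
    {"agreeableness": 1, "conscientiousness": 1, "neuroticism": 0},
    {"agreeableness": 0, "conscientiousness": 1, "neuroticism": 1},
    {"agreeableness": 1, "conscientiousness": 0, "neuroticism": 0},
]
varied_rewards = [reward_function("", fb) for fb in varied]

print(advantages(placeholder_rewards))  # [0, 0, 0]
print(advantages(varied_rewards))       # [1, -1, 0]
```

With constant feedback every centered reward is zero, so there is nothing to prefer one response over another; real human ratings are what give the loop a usable signal.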