In [11]:
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

def generate_quote_with_gpt2(prompt, model, tokenizer, max_length=50):
    input_ids = tokenizer.encode(prompt, return_tensors='tf')
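    # top_k and top_p only take effect when do_sample=True; as written, generate()
    # decodes greedily and ignores both.
    output = model.generate(input_ids, max_length=max_length, num_return_sequences=1,
                            no_repeat_ngram_size=2, top_k=50, top_p=0.95)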
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text

# Load pre-trained GPT-2 model and tokenizer for TensorFlow
model_name = "gpt2"  # Other checkpoints to try: "gpt2-medium", "gpt2-large", "gpt2-xl".
model = TFGPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Replace detected_emotion with the actual detected emotion
detected_emotion = "sad"
prompt = f"Generate a quote for someone feeling {detected_emotion}."

# Generate a quote using GPT-2
generated_quote = generate_quote_with_gpt2(prompt, model, tokenizer)
print(f"Detected emotion: {detected_emotion}\nGenerated Quote: {generated_quote}")


All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Detected emotion: sad
Generated Quote: Generate a quote for someone feeling sad.

"I'm not going to be a sad person. I'm going be happy. And I think that's what I want to do. That's why I've been doing this for so long

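The warning in the output above asks for an explicit `attention_mask` and `pad_token_id`, and the `top_k`/`top_p` arguments only matter once sampling is switched on. The following is a minimal sketch of one way to pass them, assuming the same `model` and `tokenizer` objects as in the cell above. The helper names `build_generation_kwargs` and `generate_quote` are illustrative and not part of the original notebook.

```python
def build_generation_kwargs(attention_mask, eos_token_id, max_length=50, sample=False):
    """Collect keyword arguments for model.generate().

    Hypothetical helper: it only assembles a dict, so it can be inspected
    without loading a model.
    """
    kwargs = {
        "attention_mask": attention_mask,  # marks real tokens vs. padding
        "pad_token_id": eos_token_id,      # GPT-2 has no pad token, so reuse EOS
        "max_length": max_length,          # counts the prompt tokens too
        "num_return_sequences": 1,
        "no_repeat_ngram_size": 2,
    }
    if sample:
        # top_k / top_p are sampling filters; greedy decoding ignores them.
        kwargs.update(do_sample=True, top_k=50, top_p=0.95)
    return kwargs


def generate_quote(prompt, model, tokenizer, max_length=50, sample=False):
    """Generate text with an explicit attention mask (assumes a TF GPT-2 model)."""
    # Calling the tokenizer directly returns both input_ids and attention_mask.
    inputs = tokenizer(prompt, return_tensors="tf")
    kwargs = build_generation_kwargs(
        inputs["attention_mask"], tokenizer.eos_token_id, max_length, sample
    )
    output = model.generate(inputs["input_ids"], **kwargs)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate_quote(prompt, model, tokenizer, sample=True)` would turn sampling on, so repeated calls may return different text; with `sample=False` the result should match greedy decoding.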

In [13]:
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

def generate_quote_with_gpt2(prompt, model, tokenizer, max_length=50):
    input_ids = tokenizer.encode(prompt, return_tensors='tf')
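    # top_k and top_p only take effect when do_sample=True; as written, generate()
    # decodes greedily and ignores both.
    output = model.generate(input_ids, max_length=max_length, num_return_sequences=1,
                            no_repeat_ngram_size=2, top_k=50, top_p=0.95)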
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text

# Load pre-trained GPT-2 model and tokenizer for TensorFlow
model_name = "gpt2"  # Other checkpoints to try: "gpt2-medium", "gpt2-large", "gpt2-xl".
model = TFGPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Replace detected_emotion with the actual detected emotion
detected_emotion = "surprise"
prompt = f"Generate a quote for someone feeling {detected_emotion}."

# Generate a quote using GPT-2
generated_quote = generate_quote_with_gpt2(prompt, model, tokenizer)
print(f"Detected emotion: {detected_emotion}\nGenerated Quote: {generated_quote}")


All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Detected emotion: surprise
Generated Quote: Generate a quote for someone feeling surprise.

"I'm not sure if it's a good idea to use a word like 'excuse me' or 'I don't know' in a sentence," he said. "I think it

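Both generated outputs above begin by repeating the prompt and stop mid-sentence once `max_length` is reached. The sketch below shows one possible post-processing step in plain Python; `clean_generated_quote` is a hypothetical helper, and the sample string is the start of the "sad" output shown earlier.

```python
def clean_generated_quote(generated_text, prompt):
    """Remove the echoed prompt and any trailing unfinished sentence."""
    text = generated_text
    # GPT-2 returns prompt + continuation, so drop the prompt if it leads the text.
    if text.startswith(prompt):
        text = text[len(prompt):]
    text = text.strip()
    # Keep everything up to the last sentence-ending punctuation mark, if any.
    last_end = max(text.rfind(mark) for mark in ".!?")
    if last_end != -1:
        text = text[: last_end + 1]
    return text


prompt = "Generate a quote for someone feeling sad."
raw = prompt + "\n\n\"I'm not going to be a sad person. I'm going be happy. And I"
print(clean_generated_quote(raw, prompt))
```

This trims only at `.`, `!` or `?`, so an opening quotation mark can be left unbalanced; whether that matters depends on how the quote is displayed.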

In [4]:
import pandas as pd

# Load the dataset from the Excel file
file_path = r"C:\Users\chand\Desktop\New folder\FinalYearProject\QuotesExcel.xlsx"
df = pd.read_excel(file_path)

# Check the structure of your DataFrame
print(df.head())

# Now you can access the 'Quote' and 'Cat' columns for further processing
quotes = df['Quote']
categories = df['Cat']

# Example: Print the first 5 quotes and their corresponding categories
for i in range(5):
    print(f"Quote: {quotes[i]}\t Category: {categories[i]}")

# The quotes and categories are now available for the emotion detection and
# quote generation steps of the project.


                                               Quote  Cat
0  Holding on to anger is like grasping a hot coa...    1
1  Anger is an acid that can do more harm to the ...    1
2  Anger is an acid that can do more harm to the ...    1
3  While seeking revenge, dig two graves - one fo...    1
4  Anybody can become angry - that is easy, but t...    1
Quote: Holding on to anger is like grasping a hot coal with the intent of throwing it at someone else; you are the one who gets burned.	 Category: 1
Quote: Anger is an acid that can do more harm to the vessel in which it is stored than to anything on which it is poured.
	 Category: 1
Quote: Anger is an acid that can do more harm to the vessel in which it is stored than to anything on which it is poured.
	 Category: 1
Quote: While seeking revenge, dig two graves - one for yourself.
	 Category: 1
Quote: Anybody can become angry - that is easy, but to be angry with the right person and to the right degree and at the right time and for the right pu
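The preview above lists one quote twice, and several quotes end in a newline that pushes the category onto the next line. Below is a minimal cleanup sketch, written against plain `(quote, category)` pairs so it runs without the spreadsheet. `clean_quotes` is a hypothetical helper, and the sample rows are copied from the output above.

```python
def clean_quotes(rows):
    """Trim whitespace and drop repeated (quote, category) pairs.

    rows: iterable of (quote, category) tuples.
    The order of first appearance is preserved.
    """
    seen = set()
    cleaned = []
    for quote, category in rows:
        # split/join trims both ends and collapses inner runs of whitespace.
        quote = " ".join(str(quote).split())
        key = (quote.lower(), category)
        if quote and key not in seen:
            seen.add(key)
            cleaned.append((quote, category))
    return cleaned


rows = [
    ("Anger is an acid that can do more harm to the vessel in which it is "
     "stored than to anything on which it is poured.\n", 1),
    ("Anger is an acid that can do more harm to the vessel in which it is "
     "stored than to anything on which it is poured.\n", 1),
    ("While seeking revenge, dig two graves - one for yourself.\n", 1),
]
for quote, category in clean_quotes(rows):
    print(f"Quote: {quote}\t Category: {category}")
```

With the DataFrame from the cell above, the pairs could come from `zip(df['Quote'], df['Cat'])`. A pandas-only variant would be `df['Quote'].str.strip()` followed by `df.drop_duplicates(subset=['Quote', 'Cat'])`.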

In [25]:
import pandas as pd
import torch
from torch.optim import AdamW  # the AdamW class in transformers is deprecated and removed in recent releases
from torch.utils.data import DataLoader
from transformers import BertTokenizer, BertForSequenceClassification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tqdm import tqdm

# Load the dataset from Excel
file_path = r"C:\Users\chand\Desktop\New folder\FinalYearProject\QuotesExcel.xlsx"
df = pd.read_excel(file_path)

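# Map category numbers to corresponding emotions.
# NOTE: the rows printed earlier with Cat == 1 are quotes about anger, while this
# mapping names category 1 "disgusted". Check how the spreadsheet numbers its
# categories before relying on these names.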
category_mapping = {
    0: "angry",
    1: "disgusted",
    2: "fearful",
    3: "happy",
    4: "neutral",
    5: "sad",
    6: "surprised"
}

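# Select the quotes that belong to one emotion category.
# NOTE: every example then carries the same label, so the classifier only ever sees
# one class and the validation accuracy it reports is trivially 1.0. To train an
# actual emotion classifier, use df['Quote'] and df['Cat'] across all categories.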
emotion_category = 1  # Change this number to choose a different emotion category
quotes_in_category = df[df['Cat'] == emotion_category]['Quote'].tolist()
labels = [emotion_category] * len(quotes_in_category)

# Split the dataset into training and validation sets
train_texts, val_texts, train_labels, val_labels = train_test_split(quotes_in_category, labels, test_size=0.2, random_state=42)

# Tokenize the input texts
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=128, return_tensors='pt')
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=128, return_tensors='pt')

# Convert labels to PyTorch tensors
train_labels = torch.tensor(train_labels)
val_labels = torch.tensor(val_labels)

# Define the BERT model for sequence classification
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=len(category_mapping))

# Set up training parameters
learning_rate = 2e-5
num_train_epochs = 3
batch_size = 8

# Set device (cuda if available, otherwise cpu)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
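# The encodings and labels stay on the CPU here: each batch is moved to the device
# inside the training and validation loops. (Tensor.to() is not in-place, so calling
# train_labels.to(device) without assigning the result had no effect anyway.)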

# Create PyTorch DataLoader for training and validation sets
train_dataset = torch.utils.data.TensorDataset(train_encodings['input_ids'], train_encodings['attention_mask'], train_labels)
val_dataset = torch.utils.data.TensorDataset(val_encodings['input_ids'], val_encodings['attention_mask'], val_labels)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

# Define the optimizer. No separate loss function is needed: the model returns a
# cross-entropy loss in outputs.loss whenever labels are passed to it.
optimizer = AdamW(model.parameters(), lr=learning_rate)

# Fine-tune the model
for epoch in range(num_train_epochs):
    model.train()
    total_loss = 0
    for batch in tqdm(train_loader, desc=f"Epoch {epoch + 1}"):
        input_ids, attention_mask, labels = batch
        input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        total_loss += loss.item()

        loss.backward()
        optimizer.step()

    avg_loss = total_loss / len(train_loader)
    print(f"Epoch {epoch + 1}/{num_train_epochs}, Average Training Loss: {avg_loss:.4f}")

    # Evaluation on the validation set
    model.eval()
    all_preds = []
    with torch.no_grad():
        for val_batch in tqdm(val_loader, desc="Validation"):
            input_ids, attention_mask, labels = val_batch
            input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)

            outputs = model(input_ids, attention_mask=attention_mask)
            logits = outputs.logits
            preds = torch.argmax(logits, dim=1)
            all_preds.extend(preds.cpu().numpy())

    accuracy = accuracy_score(val_labels.cpu().numpy(), all_preds)
    print(f"Epoch {epoch + 1}/{num_train_epochs}, Validation Accuracy: {accuracy:.4f}")

# Save the fine-tuned model
model.save_pretrained("./bert-finetuned")
tokenizer.save_pretrained("./bert-finetuned")


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1: 100%|████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00,  5.67s/it]


Epoch 1/3, Average Training Loss: 1.5747


Validation: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.37it/s]


Epoch 1/3, Validation Accuracy: 1.0000


Epoch 2: 100%|████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:09<00:00,  4.84s/it]


Epoch 2/3, Average Training Loss: 1.2998


Validation: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.37it/s]


Epoch 2/3, Validation Accuracy: 1.0000


Epoch 3: 100%|████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:09<00:00,  4.89s/it]


Epoch 3/3, Average Training Loss: 1.2017


Validation: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.27it/s]


Epoch 3/3, Validation Accuracy: 1.0000


('./bert-finetuned\\tokenizer_config.json',
 './bert-finetuned\\special_tokens_map.json',
 './bert-finetuned\\vocab.txt',
 './bert-finetuned\\added_tokens.json')
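
A validation accuracy of 1.0000 from the first epoch, while the training loss is still above 1.2, is what one would expect when every example shares a single label: the run above trains on `emotion_category = 1` only. One quick sanity check, sketched here in plain Python with a hypothetical `majority_baseline` helper, is to compare the model against always predicting the most common label.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (most_common_label, accuracy of always predicting that label)."""
    if not labels:
        raise ValueError("labels must not be empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Single-category setup, as in the training cell: every label is 1.
print(majority_baseline([1] * 4))          # (1, 1.0)
# Illustrative mixed labels (made-up values, not taken from the spreadsheet).
print(majority_baseline([0, 1, 1, 3, 5]))  # (1, 0.4)
```

With the tensors from the training cell, `majority_baseline(val_labels.tolist())` should report an accuracy of 1.0, which suggests the score above reflects the data selection rather than anything the model learned. A meaningful comparison needs labels from several categories.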