**This notebook generates inspirational quotes with a DistilGPT-2 model that has already been fine-tuned for that task. No further fine-tuning happens here: the code loads the pre-trained model and uses seed text from a dataset as the prompt for each quote. The key steps are:**

1. **Use a Pre-trained Model with a Pipeline:**
   - The code first uses the Transformers library to create a text-generation pipeline with the model "noelmathewisaac/inspirational-quotes-distilgpt2", a pre-trained model designed for generating inspirational quotes.

2. **Load Model Directly:**
   - Alternatively, the code shows how to load the tokenizer and the model directly. This approach gives advanced users more control over how the model is configured and called.

3. **Check for GPU Availability:**
   - The code checks for the availability of a GPU. If a GPU is available, it sets the device to "cuda" to leverage GPU acceleration; otherwise, it uses the CPU.

4. **Load Tokenizer and Model:**
   - The tokenizer and model are loaded from the pre-trained "noelmathewisaac/inspirational-quotes-distilgpt2" model. The model is loaded on the specified device (GPU or CPU).

5. **Load Preprocessed Dataset:**
   - The code loads a preprocessed dataset from a CSV file; replace the path with the location of your own copy. The column that holds the seed text (in this case, "first_3_words") is selected by name.

6. **Generate Inspirational Quotes:**
   - The code defines a function, `generate_quotes_for_each_row`, which generates one inspirational quote for each row in the dataset. It loops through the rows, uses the text from the selected column as the seed, and generates a continuation with the fine-tuned model. Each generated quote is stored together with a label for its source row (for example, "Row 1").

7. **Create a DataFrame for Generated Quotes:**
   - The generated quotes are organized into a new DataFrame. Each row in the DataFrame contains a generated quote and the source row it was based on.

8. **Save Generated Quotes:**
   - Finally, the code saves the generated quotes to a new CSV file named "generated_quotes.csv", without the DataFrame index.
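
The seed column in step 5 is named "first_3_words", which suggests that each seed is the opening of an existing quote. The preprocessing itself is not part of this notebook, so the following is only a sketch of how such a column might be derived. The helper name `first_n_words` and the sample quote are hypothetical.

```python
def first_n_words(text, n=3):
    """Return the first n whitespace-separated words of text as a seed string.

    Hypothetical helper: the real preprocessing for the dataset is not shown
    in this notebook, so this only illustrates one plausible approach.
    """
    if not isinstance(text, str):
        return ""  # treat missing values (e.g. NaN) as an empty seed
    return " ".join(text.split()[:n])


seed = first_n_words("The best way to predict the future is to create it.")
print(seed)  # prints: The best way
```

With pandas, the same helper could be applied to whichever column holds the full quotes, for example through `Series.apply`.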


In [2]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="noelmathewisaac/inspirational-quotes-distilgpt2")
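
The cell above builds the pipeline but never calls it. As a minimal sketch, a text-generation pipeline can be called with a seed string and returns a list of dictionaries, each with a `generated_text` key. The wrapper name `quotes_from_seed`, the `demo` function, and the seed "Believe in yourself" are illustrative assumptions, not part of the original notebook.

```python
def quotes_from_seed(pipe, seed_text, num_quotes=1, max_length=50):
    """Call a text-generation pipeline and return the generated strings.

    `pipe` is expected to behave like a transformers text-generation pipeline:
    called with a prompt, it returns a list of dicts with a "generated_text" key.
    """
    outputs = pipe(
        seed_text,
        max_length=max_length,            # total length in tokens, prompt included
        num_return_sequences=num_quotes,  # more than one needs sampling or beam search
        do_sample=True,
    )
    return [item["generated_text"] for item in outputs]


def demo():
    """Download the model and print two quotes (not called automatically)."""
    from transformers import pipeline

    pipe = pipeline("text-generation", model="noelmathewisaac/inspirational-quotes-distilgpt2")
    for quote in quotes_from_seed(pipe, "Believe in yourself", num_quotes=2):
        print(quote)
```

Calling `demo()` downloads the model on first use. Because sampling is enabled, the printed quotes differ from run to run.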

In [3]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("noelmathewisaac/inspirational-quotes-distilgpt2")
model = AutoModelForCausalLM.from_pretrained("noelmathewisaac/inspirational-quotes-distilgpt2")

In [None]:
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM

# Check if a GPU is available and use it
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and model on the specified device
tokenizer = AutoTokenizer.from_pretrained("noelmathewisaac/inspirational-quotes-distilgpt2")
model = AutoModelForCausalLM.from_pretrained("noelmathewisaac/inspirational-quotes-distilgpt2").to(device)

# Load the preprocessed dataset
preprocessed_data = pd.read_csv('/kaggle/input/pre-pro-quio/preprocessed_quotes_no_quote_author_category.csv')  # Replace with your dataset path

# Specify the column containing text data
text_column = "first_3_words"  # Adjust this to the actual column name containing text data

# Create an empty list to store generated quotes
generated_quotes = []


# Maximum total length in tokens (seed text plus generated continuation)
max_length = 100


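def generate_quotes_for_each_row(dataframe, max_rows):
    """Generate one quote per row and append it to the global generated_quotes list."""
    # head() limits by position, so this also works when the index is not 0..n-1
    for position, (_, row) in enumerate(dataframe.head(max_rows).iterrows(), start=1):
        seed_text = row[text_column]
        # Skip rows with a missing or empty seed; the tokenizer needs a string
        if not isinstance(seed_text, str) or not seed_text.strip():
            continue

        input_ids = tokenizer.encode(seed_text, return_tensors="pt").to(device)
        attention_mask = torch.ones_like(input_ids)
        output = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_length=max_length,
            do_sample=True,  # top_k only takes effect when sampling is enabled
            top_k=50,
            no_repeat_ngram_size=20,
            pad_token_id=model.config.eos_token_id,
        )
        quote = tokenizer.decode(output[0], skip_special_tokens=True)

        generated_quotes.append({"Generated_Quote": quote, "Source_Row": f"Row {position}"})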

# Generate one quote per row, for every row in the dataset
generate_quotes_for_each_row(preprocessed_data, max_rows=len(preprocessed_data))

# Create a DataFrame from the generated quotes
generated_quotes_data = pd.DataFrame(generated_quotes)

# Save the generated quotes to a new CSV file
generated_quotes_data.to_csv('generated_quotes.csv', index=False)
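
The `top_k=50` argument passed to `model.generate` refers to top-k sampling: at each step only the k most likely next tokens are kept, their probabilities are renormalized, and one token is drawn from that reduced set. In the Transformers library this filter applies when sampling is enabled (`do_sample=True`). The sketch below illustrates the idea in plain Python on a toy vocabulary. The function name `top_k_sample` and the toy scores are made up for illustration, and this is not how the library implements it internally.

```python
import math
import random


def top_k_sample(logits, k, rng=random):
    """Draw one token index from the k highest-scoring entries of logits.

    logits: list of unnormalized scores, one per vocabulary entry (k >= 1).
    """
    # Keep the indices of the k largest logits
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the kept logits only (subtract the max for stability)
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # choices() normalizes the weights and draws one index
    return rng.choices(top, weights=weights, k=1)[0]


toy_vocab = ["dream", "believe", "success", "banana", "the"]
toy_logits = [2.0, 1.5, 1.0, -3.0, 0.2]
rng = random.Random(0)  # fixed seed so the demo is repeatable
picks = [toy_vocab[top_k_sample(toy_logits, k=3, rng=rng)] for _ in range(5)]
print(picks)  # only "dream", "believe" or "success" can appear
```

A smaller k makes the output more predictable, while a larger k lets less likely words through.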

