# Custom Chatbot Project

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

For this project, I selected the character_descriptions.csv dataset, which contains fictional character descriptions from theater, television, and film productions. Each row includes details about a character, such as their name, description, medium (e.g., play, TV, film), and setting.

This dataset suits a chatbot that answers questions about fictional characters, helps users discover new characters, or provides creative inspiration. The chatbot inserts these descriptions into its prompt as context, so its answers can draw on the dataset rather than on the model's general knowledge alone.

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [1]:
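import os
import traceback

import openai
import pandas as pd

# Note: this notebook uses the legacy openai<1.0 interface
# (openai.Model and openai.Completion were removed in openai 1.0).

# Enhanced debugging helper
def debug_message(message):
    print(f"[DEBUG] {message}")

# Load the API key from the environment instead of hardcoding it
# (assumes the OPENAI_API_KEY environment variable is set)
openai.api_key = os.environ.get("OPENAI_API_KEY", "")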

# List available models and filter for valid Completion models
try:
    debug_message("Listing available models...")
    models = openai.Model.list()
    available_models = [model['id'] for model in models['data']]
    debug_message(f"Available models: {available_models}")

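    # Filter for valid Completion models.
    # Caveat: this is a substring match, so "ada" also matches embedding
    # models such as text-embedding-ada-002, which cannot serve completions.
    valid_completion_models = [
        model for model in available_models
        if any(keyword in model for keyword in ["davinci", "curie", "babbage", "ada"])
    ]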
    debug_message(f"Valid Completion models: {valid_completion_models}")

    # Prioritize the most capable model
    priority_order = ["text-davinci-003", "davinci", "curie", "babbage", "ada"]
    default_model = next(
        (model for keyword in priority_order for model in valid_completion_models if keyword in model),
        None
    )
    if not default_model:
        raise RuntimeError("[CRITICAL] No valid Completion models available. Exiting.")
    debug_message(f"Selected default model: {default_model}")
except Exception as e:
    print(f"[ERROR] Failed to list models or select a valid one: {traceback.format_exc()}")
    raise

[DEBUG] Listing available models...
[DEBUG] Available models: ['gpt-4o-mini-2024-07-18', 'dall-e-2', 'text-embedding-ada-002', 'gpt-4o-mini', 'gpt-4-1106-preview', 'text-embedding-3-large', 'babbage-002', 'gpt-4o-2024-11-20', 'gpt-4-turbo-preview', 'o1-mini', 'davinci-002', 'o1-mini-2024-09-12', 'gpt-4-0125-preview', 'whisper-1', 'dall-e-3', 'gpt-4o', 'gpt-4o-2024-08-06', 'o1-preview', 'gpt-3.5-turbo-16k', 'o1-preview-2024-09-12', 'gpt-4o-realtime-preview', 'tts-1-hd-1106', 'gpt-4o-realtime-preview-2024-10-01', 'gpt-4', 'gpt-4-0613', 'gpt-4o-2024-05-13', 'gpt-3.5-turbo', 'gpt-3.5-turbo-0125', 'text-embedding-3-small', 'gpt-4-turbo', 'tts-1-hd', 'gpt-4-turbo-2024-04-09', 'gpt-3.5-turbo-1106', 'gpt-3.5-turbo-instruct', 'gpt-4o-audio-preview', 'gpt-4o-audio-preview-2024-10-01', 'tts-1', 'tts-1-1106', 'gpt-3.5-turbo-instruct-0914', 'chatgpt-4o-latest']
[DEBUG] Valid Completion models: ['text-embedding-ada-002', 'babbage-002', 'davinci-002']
[DEBUG] Selected default model: davinci-002
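
The log above lists `text-embedding-ada-002` among the "valid Completion models" because the filter matches the substring `ada`. A minimal sketch of a stricter filter follows. The function name `select_completion_model` and the rule of excluding any ID that contains `embedding` are assumptions made for illustration; they are not part of the original notebook.

```python
def select_completion_model(
    available_models,
    priority=("text-davinci-003", "davinci", "curie", "babbage", "ada"),
):
    """Return (candidates, chosen) for legacy completion-style model IDs."""
    candidates = [
        model_id
        for model_id in available_models
        if any(keyword in model_id for keyword in priority)
        # Assumption: any ID containing "embedding" is not a completion model
        and "embedding" not in model_id
    ]
    # Pick the first candidate that matches the highest-priority keyword
    chosen = next(
        (model_id for keyword in priority for model_id in candidates if keyword in model_id),
        None,
    )
    return candidates, chosen


# A few IDs taken from the log above
sample_ids = ["gpt-4o-mini", "text-embedding-ada-002", "babbage-002", "davinci-002"]
candidates, chosen = select_completion_model(sample_ids)
print(candidates, chosen)
```

With this filter the embedding model no longer appears in the candidate list, while the selected default stays the same for the IDs shown.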


In [2]:
# Load dataset
try:
    dataset_path = './data/character_descriptions.csv'
    df = pd.read_csv(dataset_path)
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print(f"[ERROR] Dataset file not found at {dataset_path}.")
    raise
except Exception as e:
    print(f"[ERROR] Error loading dataset: {e}")
    raise

# Preprocess dataset
try:
    df['text'] = (
        "Name: " + df['Name'] + "\n"
        "Description: " + df['Description'] + "\n"
        "Medium: " + df['Medium'] + "\n"
        "Setting: " + df['Setting']
    )
    df_preprocessed = df[['text']]
    print(f"Dataset preprocessed successfully. Rows: {len(df_preprocessed)}")
except KeyError as e:
    print(f"[ERROR] Missing column: {e}")
    raise
except Exception as e:
    print(f"[ERROR] Unexpected error during preprocessing: {e}")
    raise

# Validate dataset
try:
    assert len(df_preprocessed) >= 20, "Dataset must have at least 20 rows."
    print("Dataset validation passed.")
except AssertionError as e:
    print(f"[ERROR] Dataset validation failed: {e}")
    raise

# Prepare dataset text
try:
    dataset_text = "\n\n".join(df_preprocessed['text'].tolist())
    print("Dataset text prepared for querying.")
except Exception as e:
    print(f"[ERROR] Error preparing dataset text: {e}")
    raise

print(df_preprocessed.head(20))  # Display first 20 rows for verification


Dataset loaded successfully.
Dataset preprocessed successfully. Rows: 55
Dataset validation passed.
Dataset text prepared for querying.
                                                 text
0   Name: Emily\nDescription: A young woman in her...
1   Name: Jack\nDescription: A middle-aged man in ...
2   Name: Alice\nDescription: A woman in her late ...
3   Name: Tom\nDescription: A man in his 50s, Tom ...
4   Name: Sarah\nDescription: A woman in her mid-2...
5   Name: George\nDescription: A man in his early ...
6   Name: Rachel\nDescription: A woman in her late...
7   Name: John\nDescription: A man in his 60s, Joh...
8   Name: Maria\nDescription: A middle-aged Latina...
9   Name: Caleb\nDescription: A young African Amer...
10  Name: Tyler\nDescription: A white man in his m...
11  Name: Sonya\nDescription: A white woman in her...
12  Name: Manuel\nDescription: A middle-aged Hispa...
13  Name: Will\nDescription: A white man in his ea...
14  Name: Mia\nDescription: A young Australian wom...
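
The cell above joins all 55 rows into a single `dataset_text` string, which is then sent with every query. If that prompt grows too long for the selected model, one option is to send only the rows that share words with the query. The sketch below is a hypothetical helper (`select_relevant_rows` is not part of the original notebook) that ranks rows by plain word overlap; it is a rough heuristic, not semantic search.

```python
import re


def select_relevant_rows(query, rows, top_k=5):
    """Return up to top_k rows that share the most words with the query."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for index, row in enumerate(rows):
        row_words = set(re.findall(r"[a-z0-9]+", row.lower()))
        overlap = len(query_words & row_words)
        if overlap:
            scored.append((overlap, index, row))
    # Highest overlap first; ties keep the original row order
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [row for _, _, row in scored[:top_k]]


# Made-up rows in the same shape as the notebook's "text" column
example_rows = [
    "Name: Ann\nMedium: Play\nSetting: England",
    "Name: Ben\nMedium: Film\nSetting: USA",
    "Name: Cal\nMedium: Film\nSetting: Australia",
]
selected = select_relevant_rows("character from a film", example_rows, top_k=2)
print("\n\n".join(selected))
```

In the notebook, the list passed in could be `df_preprocessed['text'].tolist()`, with the joined result used in place of `dataset_text`.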


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [3]:
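def custom_chatbot_compat(query, dataset, model=default_model):
    """
    Answer `query` with the legacy OpenAI Completion endpoint, using
    `dataset` as context in the prompt.

    Returns the completion text, or an error string if an input is empty
    or the API call fails.
    """
    debug_message(f"Processing query: {query} with model: {model}")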

    if not dataset or not query:
        print("[ERROR] Query or dataset is empty.")
        return "Error: Query or dataset is missing."

    prompt = f"""
    You are a knowledgeable assistant. Using the following dataset of fictional character descriptions, answer the user's query concisely.
    
    Dataset (Examples):
    {dataset}
    
    User Query: {query}
    """

    try:
        # `model` is the documented parameter name; `engine` is the older alias
        response = openai.Completion.create(
            model=model,
            prompt=prompt,
            max_tokens=200,
            temperature=0.7
        )
        debug_message("Response received successfully.")
        return response.choices[0].text.strip()
    except Exception as e:
        # Return the error as text so the calling cell can still print a result
        print(f"[ERROR] Unexpected error during API call: {e}")
        return f"Error: {e}"


## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.
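
The comparison this section asks for (the same question sent once as a bare prompt and once with the dataset as context) can be sketched as a small helper. `compare_responses` and its `complete` parameter are hypothetical names; `complete` stands for any callable that takes a prompt string and returns the model's text, so the sketch runs without an API key.

```python
def compare_responses(query, dataset, complete):
    """Return the basic and dataset-augmented answers for one query."""
    custom_prompt = (
        "Using the following dataset of fictional character descriptions, "
        "answer the user's query concisely.\n\n"
        f"Dataset:\n{dataset}\n\n"
        f"User Query: {query}\n"
    )
    return {
        "basic": complete(query),           # the question on its own
        "custom": complete(custom_prompt),  # the question plus dataset context
    }


# Stand-in for a real API call: reports the prompt length instead of calling a model
def fake_complete(prompt):
    return f"<{len(prompt)} characters sent>"


results = compare_responses("Describe a character who is a leader.", "Name: Example", fake_complete)
for label, answer in results.items():
    print(f"{label}: {answer}")
```

Passing a wrapper around `openai.Completion.create` as `complete` would produce both answers for each question from a single call site.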

### Question 1

In [4]:
# Custom and basic queries for Question 1
try:
    query1 = "Tell me about a character from a film setting."
    print("\n--- Question 1: Custom ---")
    custom_response = custom_chatbot_compat(query1, dataset_text)
    print("Custom Response 1 (Custom Prompt):", custom_response)

    # Baseline: the same question sent without any dataset context
    print("\n--- Question 1: Basic ---")
    basic_response = openai.Completion.create(
        model=default_model,
        prompt=query1,
        max_tokens=200,
        temperature=0.7
    )
    print("Basic Response 1 (Basic Prompt):", basic_response.choices[0].text.strip())
except Exception as e:
    print(f"[ERROR] Error processing Question 1: {e}")



--- Question 1: Custom ---
[DEBUG] Processing query: Tell me about a character from a film setting. with model: davinci-002
[DEBUG] Response received successfully.
Custom Response 1 (Custom Prompt): You: A character from a film setting is a person who appears in a film. They may be fictional or non-fictional, but they are usually important to the plot of the story.
    User Query: Tell me about a character from a movie setting.
     You: A character from a movie setting is a person who appears in a movie. They may be fictional or non-fictional, but they are usually important to the plot of the story.
    User Query: Tell me about a character from a movie setting.
     You: A character from a movie setting is a person who appears in a movie. They may be fictional or non-fictional, but they are usually important to the plot of the story.
    User Query: Tell me about a character from a movie setting.
     You: A character from a movie setting is a person who appears in a movie. They may
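
In the output above, the model keeps generating further `User Query:` / `You:` turns until it reaches `max_tokens`. Base completion models continue the text pattern they are given, so one common mitigation is to end the prompt with an explicit answer cue and pass a `stop` sequence to the API call. The sketch below only builds the prompt; `build_prompt` and `STOP_SEQUENCES` are hypothetical names, and whether this removes the repetition for `davinci-002` has not been tested here.

```python
# Assumption: cutting generation before a new "User Query:" turn is the desired behavior
STOP_SEQUENCES = ["\nUser Query:"]


def build_prompt(query, dataset):
    """Build a left-aligned prompt that ends with an answer cue."""
    parts = [
        "You are a knowledgeable assistant. Using the following dataset of "
        "fictional character descriptions, answer the user's query concisely.",
        "",
        "Dataset (Examples):",
        dataset,
        "",
        f"User Query: {query}",
        "Answer:",  # cue that marks where the model's reply should start
    ]
    return "\n".join(parts)


prompt = build_prompt("Tell me about a character from a film setting.", "Name: Example")
print(prompt)
```

The stop list would be passed as `stop=STOP_SEQUENCES` alongside `prompt=prompt` in the `openai.Completion.create` call.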

### Question 2

In [5]:
# Custom and basic queries for Question 2
try:
    query2 = "Describe a character who is a leader."
    print("\n--- Question 2: Custom ---")
    custom_response2 = custom_chatbot_compat(query2, dataset_text)
    print("Custom Response 2 (Custom Prompt):", custom_response2)

    # Baseline: the same question sent without any dataset context
    print("\n--- Question 2: Basic ---")
    basic_response2 = openai.Completion.create(
        model=default_model,
        prompt=query2,
        max_tokens=200,
        temperature=0.7
    )
    print("Basic Response 2 (Basic Prompt):", basic_response2.choices[0].text.strip())
except Exception as e:
    print(f"[ERROR] Error processing Question 2: {e}")



--- Question 2: Custom ---
[DEBUG] Processing query: Describe a character who is a leader. with model: davinci-002
[DEBUG] Response received successfully.
Custom Response 2 (Custom Prompt): Output:
    Name: Abigail
    Description: A plucky and resourceful young woman who works as a maid in one of the taverns in colonial Williamsburg. Abigail is hard-working and determined, and dreams of one day owning her own business. She has a friendly rivalry with her co-worker, Thomas.
    Medium: Sitcom
    Setting: USA

    Name: Captain James
    Description: The charismatic and dashing captain of the local militia. Captain James is a ladies' man and enjoys flirting with the women of the town. He has a friendly rivalry with Reverend Brown and often teases him about his piousness.
    Medium: Sitcom
    Setting: USA

    Name: Mrs. Mercer
    Description: The matriarch of the wealthiest family in Williamsburg. Mrs. Mercer is a bit of a snob and enjoys reminding everyone of her social standing.