## Custom Chatbot

- **Dataset:** `data/character_descriptions.csv`. This dataset contains character descriptions from theater, television, and film productions. Each entry includes the character's name, description, medium, and setting. All characters were originally generated by an OpenAI model.
- **Objective:** Build a chatbot that recommends characters from the dataset based on personality traits or other query inputs, such as MBTI types, so that users can discover characters matching their preferences.
- **OpenAI Version Compatibility:** The code has been tested with versions `0.28.0` and `1.52.1` of the `openai` Python package.

#### Initialize OpenAI Client

This cell sets up the OpenAI client by defining the API base URL and retrieving the API key from the environment. Make sure to set your `OPENAI_API_KEY` as an environment variable before running this cell.

- `api_base`: The base URL for the OpenAI API.
- `api_key`: API key retrieved from the environment (can also be directly set as a string).
- `client`: OpenAI client instance to use throughout the code.

In [1]:
import os
from openai_api_wrapper import OpenAIClient


api_base = "https://openai.vocareum.com/v1"
api_key = os.environ.get("OPENAI_API_KEY")

# create client
client = OpenAIClient(api_base, api_key)

OpenAI version: 1.52.1
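
The internals of `openai_api_wrapper` are not shown in this notebook. As a rough sketch of how a wrapper could record which generation of the `openai` package it is running against, the helper below maps a version string to a small index. The name `version_index_from` and the 0/1 mapping are assumptions made for illustration; they are not taken from the wrapper itself.

```python
def version_index_from(version: str) -> int:
    """Map an openai package version string to an API-generation index.

    Assumption for this sketch: 0 stands for the legacy 0.x
    module-level API and 1 for the 1.x client-object API.
    """
    major = int(version.split(".")[0])
    return 0 if major < 1 else 1


# The two versions the notebook reports as tested
for tested in ("0.28.0", "1.52.1"):
    print(tested, "->", version_index_from(tested))
```

An index like this lets downstream files (for example, the embeddings CSV) be named per API generation, so results from the two versions do not overwrite each other.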


#### Preprocess and Embed Data

This cell loads character descriptions, preprocesses the text data, and generates embeddings using the OpenAI client. If the embedding file already exists, you may skip this cell and proceed with the next step.

- `input_filename`: Filename for the input character descriptions CSV file.
- `output_filename`: Filename for the output embeddings CSV file, incorporating the OpenAI client version for tracking.
- `show_preview`: If `True`, displays a preview of embeddings.

After running this cell, check `data/{output_filename}.csv` for the processed embeddings.

In [2]:
import pandas as pd
from data_processor import preprocess_data
from embedding_generator import embed_dataframe


# 'character_descriptions.csv': contains character descriptions from theater, television, and film productions
input_filename = 'character_descriptions'
output_filename = f'{input_filename}_embeddings_v{client.version_index}'
show_preview = True


# load the data
input_path = f'data/{input_filename}.csv'
df = pd.read_csv(input_path)

# widen the column display so long descriptions are readable
pd.set_option("display.max_colwidth", 100)
display(df.head(5))

# preprocess the data
df = preprocess_data(df)
display(df.head(5))

# Embed the data
output_path = f'data/{output_filename}.csv'
embed_dataframe(df, output_path, client.get_embeddings, show_preview=show_preview)

Unnamed: 0,Name,Description,Medium,Setting
0,Emily,"A young woman in her early 20s, Emily is an aspiring actress and Alice's daughter. She has a bub...",Play,England
1,Jack,"A middle-aged man in his 40s, Jack is a successful businessman and Sarah's boss. He has a no-non...",Play,England
2,Alice,"A woman in her late 30s, Alice is a warm and nurturing mother of two, including Emily. She's kin...",Play,England
3,Tom,"A man in his 50s, Tom is a retired soldier and John's son. He has a no-nonsense approach to life...",Play,England
4,Sarah,"A woman in her mid-20s, Sarah is a free-spirited artist and Jack's employee. She's creative, unc...",Play,England


Unnamed: 0,text
0,"Name: Emily; Medium: Play; Setting: England; Description: A young woman in her early 20s, Emily ..."
1,"Name: Jack; Medium: Play; Setting: England; Description: A middle-aged man in his 40s, Jack is a..."
2,"Name: Alice; Medium: Play; Setting: England; Description: A woman in her late 30s, Alice is a wa..."
3,"Name: Tom; Medium: Play; Setting: England; Description: A man in his 50s, Tom is a retired soldi..."
4,"Name: Sarah; Medium: Play; Setting: England; Description: A woman in her mid-20s, Sarah is a fre..."


                                                                                                                                                    text  \
0  Name: Emily; Medium: Play; Setting: England; Description: A young woman in her early 20s, Emily is an aspiring actress and Alice's daughter. She h...   
1  Name: Jack; Medium: Play; Setting: England; Description: A middle-aged man in his 40s, Jack is a successful businessman and Sarah's boss. He has a...   
2  Name: Alice; Medium: Play; Setting: England; Description: A woman in her late 30s, Alice is a warm and nurturing mother of two, including Emily. S...   
3  Name: Tom; Medium: Play; Setting: England; Description: A man in his 50s, Tom is a retired soldier and John's son. He has a no-nonsense approach t...   
4  Name: Sarah; Medium: Play; Setting: England; Description: A woman in her mid-20s, Sarah is a free-spirited artist and Jack's employee. She's creat...   
5  Name: George; Medium: Play; Setting: England; Description: A 
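
The preprocessed `text` column in the previews above appears to join each row's fields into a single `Field: value` string. A minimal sketch of that step follows, using only the standard library. The helper name `row_to_text` and the in-memory sample row are hypothetical, and the actual `preprocess_data` implementation may differ.

```python
import csv
import io

# Field order as it appears in the preprocessed preview
FIELDS = ["Name", "Medium", "Setting", "Description"]


def row_to_text(row: dict) -> str:
    """Join the selected fields as 'Field: value' pairs separated by '; '."""
    return "; ".join(f"{field}: {row[field].strip()}" for field in FIELDS)


# Tiny in-memory CSV standing in for data/character_descriptions.csv
# (shortened, illustrative row)
sample = io.StringIO(
    "Name,Description,Medium,Setting\n"
    'Rachel,"A shy and introverted librarian.",Play,England\n'
)
for row in csv.DictReader(sample):
    print(row_to_text(row))
```

Folding the metadata into the text means the medium and setting contribute to the embedding, not just the free-form description.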

#### Set Up and Initialize RAG Model

This cell configures and initializes the Retrieval-Augmented Generation (RAG) model, using the embeddings file created in the previous cell.

- `filename`: Embeddings filename created from the character descriptions.
- `max_input_tokens`: Maximum input tokens for context in the RAG model.
- `max_output_tokens`: Maximum output tokens for generated responses.
- `verbose`: If `True`, enables detailed output during RAG operations.
- `format`: Output format setting for the RAG model.

In [3]:
from rag_generator import RAGModel


# set configurations
filename = f"character_descriptions_embeddings_v{client.version_index}.csv"
max_input_tokens = 600
max_output_tokens = 400
verbose = False
format = 2  # note: this name shadows Python's built-in format() in the notebook

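# bundle the token limits; these are passed to get_answer() in later cells
llm_kwargs = dict(max_input_tokens=max_input_tokens, max_output_tokens=max_output_tokens)

# create RAG model from the embeddings file(s)
file_names = [f"data/{filename}"]
rag_model = RAGModel(client, file_names)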

Combined dataframe has 55 rows and 2 columns
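
How `RAGModel` selects context is not shown in this notebook. A common approach, assumed here purely for illustration, is to embed the query and rank the stored rows by cosine similarity to it. The function names and the toy two-dimensional vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, rows):
    """rows is a list of (text, embedding); return texts, most similar first."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Toy 2-D embeddings; real embeddings have many more dimensions
rows = [
    ("Name: A", [1.0, 0.0]),
    ("Name: B", [0.0, 1.0]),
    ("Name: C", [0.7, 0.7]),
]
print(rank_by_similarity([1.0, 0.1], rows))
```

With 55 rows, a linear scan like this is cheap; a vector index only becomes worthwhile for much larger collections.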


##### Question 1

In [4]:
# Define the user query
user_query = "My MBTI type is INTJ. Could you recommend three characters that suit me best and explain why?"

In [5]:
# Get the answer without retrieved context (rag_flag=False)
rag_flag = False
output = rag_model.get_answer(user_query, rag_flag=rag_flag, llm_kwargs=llm_kwargs, format=format, verbose=verbose)
print(f"********** NO RAG **********\n{output}")

********** NO RAG **********


My MBTI type is INTJ. Could you recommend three characters that suit me best and explain why?


Certainly! As an INTJ, you are often described as strategic, analytical, independent, and confident. You tend to be a planner, valuing logic and intellect over emotion, and you often have a clear vision of the future. Here are three fictional characters that might resonate with you:

1. **Sherlock Holmes (from Arthur Conan Doyle's "Sherlock Holmes" series)**:
   - **Reason**: Sherlock Holmes is the quintessential INTJ. He is highly logical, analytical, and observant, with an unmatched ability to piece together clues and solve complex mysteries. His strategic thinking and ability to stay calm under pressure align with the INTJ personality. Holmes often works independently and values competence and intelligence, which are key traits of an INTJ.

2. **Gandalf (from J.R.R. Tolkien's "The Lord of the Rings")**:
   - **Reason**: Gandalf is a strategic thinker and a m

In [6]:
# Get the answer to the query using the RAG model
rag_flag = True
output = rag_model.get_answer(user_query, rag_flag=rag_flag, llm_kwargs=llm_kwargs, format=format, verbose=verbose)
print(f"********** RAG **********\n{output}")

********** RAG **********


Instruction: 
Please answer the question using the context provided below.

Context: 
 – Name: Marcus; Medium: Reality Show; Setting: USA; Description: A charming and successful entrepreneur, Marcus is used to getting what he wants. He's a smooth talker with a magnetic personality, but can sometimes come across as a bit too self-centered. He's looking for someone who can challenge him and keep him on his toes.;
 – Name: George; Medium: Play; Setting: England; Description: A man in his early 30s, George is a charming and charismatic businessman who is in a relationship with Emily. He's ambitious, confident, and always looking for the next big opportunity. However, he's also prone to bending the rules to get what he wants.;
 – Name: Rachel; Medium: Play; Setting: England; Description: A woman in her late 20s, Rachel is a shy and introverted librarian who is in a relationship with Tom. She's intelligent, thoughtful, and has a deep love of books. However, she st
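
The prompt printed above starts with an instruction, followed by a list of retrieved character entries. One way such a prompt could be assembled under a `max_input_tokens` budget is sketched below. The whitespace-based `count_tokens` is a crude stand-in for a real tokenizer, and the `Question:`/`Answer:` footer is an assumption, since the printed prompt is cut off before its end.

```python
HEADER = (
    "Instruction: \n"
    "Please answer the question using the context provided below.\n\n"
    "Context: \n"
)


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_prompt(question: str, ranked_texts: list, max_input_tokens: int) -> str:
    """Add ranked context entries until the token budget is used up."""
    footer = f"\n\nQuestion: {question}\nAnswer:"  # assumed footer layout
    budget = max_input_tokens - count_tokens(HEADER) - count_tokens(footer)
    context_lines = []
    for text in ranked_texts:
        line = f" – {text};"
        cost = count_tokens(line)
        if cost > budget:
            break  # stop at the first entry that no longer fits
        context_lines.append(line)
        budget -= cost
    return HEADER + "\n".join(context_lines) + footer


print(build_prompt("Who suits an INTJ?", ["Name: A B", "Name: C D"], 600))
```

Because entries are added in ranked order, a smaller `max_input_tokens` drops the least similar characters first.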

##### Question 2

In [7]:
# Define the user query
user_query = "My MBTI type is ESFP. Could you recommend three characters that suit me best and explain why?"

In [8]:
# Get the answer without retrieved context (rag_flag=False)
rag_flag = False
output = rag_model.get_answer(user_query, rag_flag=rag_flag, llm_kwargs=llm_kwargs, format=format, verbose=verbose)
print(f"********** NO RAG **********\n{output}")

********** NO RAG **********


My MBTI type is ESFP. Could you recommend three characters that suit me best and explain why?


Certainly! As an ESFP, you're often described as outgoing, spontaneous, and energetic. You thrive on social interaction and have a keen sense of aesthetics and enjoyment of life. Here are three characters that might resonate with you:

1. **Maui from *Moana***: Maui is charismatic, adventurous, and has a flair for drama, much like an ESFP. He thrives on being the center of attention and loves to entertain others with his stories and songs. His adventurous spirit and desire to explore new things align well with the ESFP's love for excitement and spontaneity.

2. **Samantha Jones from *Sex and the City***: Samantha is the epitome of an outgoing and vivacious ESFP. She is bold, confident, and lives life to the fullest, embracing new experiences with enthusiasm. Her ability to live in the moment and her strong social skills are quintessential ESFP traits, making he

In [9]:
# Get the answer to the query using the RAG model
rag_flag = True
output = rag_model.get_answer(user_query, rag_flag=rag_flag, llm_kwargs=llm_kwargs, format=format, verbose=verbose)
print(f"********** RAG **********\n{output}")

********** RAG **********


Instruction: 
Please answer the question using the context provided below.

Context: 
 – Name: George; Medium: Play; Setting: England; Description: A man in his early 30s, George is a charming and charismatic businessman who is in a relationship with Emily. He's ambitious, confident, and always looking for the next big opportunity. However, he's also prone to bending the rules to get what he wants.;
 – Name: Emily; Medium: Play; Setting: England; Description: A young woman in her early 20s, Emily is an aspiring actress and Alice's daughter. She has a bubbly personality and a quick wit, but struggles with self-doubt and insecurity. She's also in a relationship with George.;
 – Name: Rachel; Medium: Play; Setting: England; Description: A woman in her late 20s, Rachel is a shy and introverted librarian who is in a relationship with Tom. She's intelligent, thoughtful, and has a deep love of books. However, she struggles with social anxiety and often feels like a