# Custom Chatbot Project

In this project I use the **2023 Fashion Trends** dataset, which contains information about recent fashion developments and styles. The dataset suits a chatbot demonstration because it lets the bot answer questions about fashion trends, designers, and style changes in 2023 from supplied context rather than from the model's general knowledge alone.

## Data Wrangling

In [None]:
# Some imports required for this notebook
import pandas as pd
import numpy as np
import requests
import os
from dotenv import load_dotenv

from utils import *

In [None]:
# Load the OpenAI API key and base URL from the .config.env file and register them.
load_dotenv(".config.env")
OPENAI_KEY = os.getenv('OPENAI_API_KEY')
OPENAI_URL = os.getenv('OPENAI_BASE_URL')
set_api_key(OPENAI_KEY, OPENAI_URL)

# Create a configuration object
config = create_config({
    'EMBEDDING_MODEL_NAME': 'text-embedding-ada-002',
    'COMPLETION_MODEL_NAME': 'gpt-4o-mini',
    'ENCODING': 'cl100k_base',
    'MAX_PROMPT_TOKENS': 2000,
    'MAX_RESPONSE_TOKENS': 150,
    'BATCH_SIZE': 100,
    'FROM_SCRATCH': True
})
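
`create_config` comes from `utils`, which is not shown in this notebook. As a rough sketch of what such a helper might do, assuming it only exposes the dictionary keys as attributes (the name `create_config_sketch` is hypothetical and not part of `utils`):

```python
from types import SimpleNamespace


def create_config_sketch(settings: dict) -> SimpleNamespace:
    """Hypothetical stand-in for utils.create_config.

    Exposes every dictionary key as an attribute, so that
    settings['FROM_SCRATCH'] can be read as config.FROM_SCRATCH.
    """
    return SimpleNamespace(**settings)


sketch_config = create_config_sketch({
    'MAX_PROMPT_TOKENS': 2000,
    'MAX_RESPONSE_TOKENS': 150,
    'FROM_SCRATCH': True,
})
print(sketch_config.MAX_PROMPT_TOKENS)
```

Attribute access is what the later cells rely on (`config.FROM_SCRATCH`); the real helper may add validation or defaults.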

In [None]:
if config.FROM_SCRATCH:
    df = pd.read_csv('./data/source/2023_fashion_trends.csv')
    df = clean_csv_data(df)
else:
    # Load cleaned data from the CSV file
    df = pd.read_csv('./data/results/df_preprocessed.csv', index_col=0)

In [None]:
# Embeddings
import ast

if config.FROM_SCRATCH:
    # Get embeddings for all text rows from OpenAI and store them in a CSV file
    df['embeddings'] = get_embeddings(df, config)
    df.to_csv('./data/results/df_embeddings.csv', index=False)
    df['embeddings'] = df['embeddings'].apply(np.array)
else:
    # Load preprocessed data with embeddings from the CSV file.
    # The file is written with index=False above, so no index column is read here.
    df = pd.read_csv('./data/results/df_embeddings.csv')
    # Embeddings are stored as list literals; literal_eval parses them without executing code
    df['embeddings'] = df['embeddings'].apply(ast.literal_eval).apply(np.array)
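
The `BATCH_SIZE` setting suggests that `get_embeddings` sends the text rows to the embedding endpoint in chunks. Its implementation lives in `utils` and is not shown, so the following is only a sketch of the batching idea. The names `embed_in_batches` and `fake_embed_batch` are hypothetical, and the fake embedder stands in for a real API call:

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int,
) -> List[List[float]]:
    """Call embed_batch on consecutive slices of texts and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        embeddings.extend(embed_batch(batch))
    return embeddings


# Illustration only: a fake embedder that maps each text to a
# one-dimensional vector holding the text length. A real embedder
# would call the embedding model here.
def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(
    ["red", "denim", "sheer lace"], fake_embed_batch, batch_size=2
)
print(vectors)
```

Batching keeps each request small and preserves row order, so the returned list can be assigned directly to a DataFrame column.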

In [None]:
df.head()

## Custom Query Completion

In [None]:
answer = answer_question("What are the top fashion trends in 2023?", df, config, custom=True)
print(answer)
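
With `custom=True`, `answer_question` presumably retrieves the dataset rows most relevant to the question before querying the model. The helper is defined in `utils` and not shown, so this is a sketch of one common approach, ranking rows by cosine similarity between embeddings. The function name, the `text` column, and the toy two-dimensional vectors are assumptions made for illustration:

```python
import numpy as np
import pandas as pd


def rank_by_cosine_similarity(df: pd.DataFrame, query_embedding) -> pd.DataFrame:
    """Return a copy of df sorted by cosine similarity to query_embedding, most similar first."""
    matrix = np.vstack(df['embeddings'].tolist())   # shape: (rows, dimensions)
    query = np.asarray(query_embedding, dtype=float)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    ranked = df.copy()
    ranked['similarity'] = matrix @ query / norms
    return ranked.sort_values('similarity', ascending=False)


# Toy data: real embeddings have many more dimensions.
toy_df = pd.DataFrame({
    'text': ['bold red tones', 'wide-leg denim', 'sheer fabrics'],
    'embeddings': [
        np.array([1.0, 0.0]),
        np.array([0.0, 1.0]),
        np.array([0.7, 0.7]),
    ],
})
ranked = rank_by_cosine_similarity(toy_df, [1.0, 0.1])
print(ranked[['text', 'similarity']])
```

The query vector would come from embedding the question with the same model used for the dataset rows.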

In [None]:
# Example question and answer
answer = answer_question("Which designer influenced the 2023 fashion trends the most?", df, config, custom=True)
print(answer)
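
The `MAX_PROMPT_TOKENS` setting implies that the retrieved context is trimmed to a token budget before it is sent to the completion model. A minimal sketch of that step follows, assuming the context snippets arrive already ranked by relevance. The function `build_prompt` and the whitespace-based `count_tokens` are hypothetical; the notebook's configuration points to the `cl100k_base` encoding, which a real implementation would use for counting instead:

```python
from typing import Callable, Iterable


def build_prompt(
    question: str,
    ranked_texts: Iterable[str],
    max_prompt_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> str:
    """Add context snippets in rank order until the token budget is used up."""
    header = "Answer using only the context below.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    used = count_tokens(header) + count_tokens(footer)
    context = []
    for text in ranked_texts:
        cost = count_tokens(text)
        if used + cost > max_prompt_tokens:
            break  # the next snippet would exceed the budget
        context.append(text)
        used += cost
    return header + "\n\n".join(context) + footer


prompt = build_prompt(
    "What colors are popular?",
    ["Red is everywhere this season", "Denim returns in wide-leg cuts"],
    max_prompt_tokens=20,
)
print(prompt)
```

In this toy run the budget fits only the first snippet, so the second one is dropped. Calling with `custom=False` would correspond to sending the question with no context section at all.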

## Custom Performance Demonstration

The cells below demonstrate the performance of the custom query using two questions. For each question, the answer from the custom query (`custom=True`, with dataset context) is shown next to the answer from a basic completion query (`custom=False`, without context).

### Question 1

In [None]:
# Question with context
print(answer_question('What colors are popular in 2023 fashion trends?', df, config, custom=True))

In [None]:
# Same question without context
print(answer_question('What colors are popular in 2023 fashion trends?', df, config, custom=False))

### Question 2

In [None]:
print(answer_question('Which materials are most used in 2023 fashion collections?', df, config, custom=True))

In [None]:
print(answer_question('Which materials are most used in 2023 fashion collections?', df, config, custom=False))

## Chatbot

In [None]:
print('Hello, what do you want to know?\n')
while True:
    # An empty (or whitespace-only) input ends the conversation
    question = input('You: ').strip()
    if question:
        print(f'\nBot: {answer_question(question, df, config, custom=True)}', end='\n\n')
    else:
        print('\nGoodbye!')
        break