# Custom Chatbot Project

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

I wanted to work with a CSV dataset, and this one on "2023 fashion trends" is a good fit. It contains reports and descriptions of individual trends, each with its source article, so the chatbot can answer fashion-related questions with specific, named trends. Its answers are then grounded in real examples from the dataset rather than in generic predictions.
 


## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [1]:
# Import all libraries
import openai
import pandas as pd
import numpy as np
import requests
from dateutil.parser import parse
from openai.embeddings_utils import get_embedding, distances_from_embeddings
import tiktoken


openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = "YOUR API KEY"

In [2]:
# Load fashion dataset
dataset_path = "data/2023_fashion_trends.csv"  
df = pd.read_csv(dataset_path)
df = df.rename(columns={"Trends": "text"}) 

# Clean up text: Remove empty lines & headings
df = df[(df["text"].str.len() > 0) & (~df["text"].str.startswith("=="))]

# In some cases, dates are used as headings instead of being part of the text sample
# Adjust so that dated text samples start with dates
prefix = ""
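for i, row in df.iterrows():
    if " – " not in row["text"]:
        try:
            # If the whole line parses as a date, treat it as a heading
            parse(row["text"])
            prefix = row["text"]
        except (ValueError, OverflowError):
            # Not a date: prepend the most recent date heading.
            # Write through df.at, because assigning to `row` may only change a copy.
            df.at[i, "text"] = prefix + " – " + row["text"]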

df = df[df["text"].str.contains(" – ")].reset_index(drop=True)
print("First 5 rows after cleaning:\n", df.head())


First 5 rows after cleaning:
                                                  URL  \
0  https://www.refinery29.com/en-us/fashion-trend...   
1  https://www.refinery29.com/en-us/fashion-trend...   
2  https://www.refinery29.com/en-us/fashion-trend...   
3  https://www.refinery29.com/en-us/fashion-trend...   
4  https://www.refinery29.com/en-us/fashion-trend...   

                                                text  \
0   – 2023 Fashion Trend: Red. Glossy red hues to...   
1   – 2023 Fashion Trend: Cargo Pants. Utilitaria...   
2  2023 Fashion Trend: Sheer Clothing. "Bare it a...   
3   – 2023 Fashion Trend: Denim Reimagined. From ...   
4  2023 Fashion Trend: Shine For The Daytime. The...   

                                              Source  
0  7 Fashion Trends That Will Take Over 2023 — Sh...  
1  7 Fashion Trends That Will Take Over 2023 — Sh...  
2  7 Fashion Trends That Will Take Over 2023 — Sh...  
3  7 Fashion Trends That Will Take Over 2023 — Sh...  
4  7 Fashion Trends T

In [3]:
# Check dataset length (the dataset must contain at least 20 rows)
if len(df) < 20:
    raise ValueError("Dataset must have at least 20 rows of text data.")
print("Sample Text Data:\n", df["text"].head())

Sample Text Data:
 0     – 2023 Fashion Trend: Red. Glossy red hues to...
1     – 2023 Fashion Trend: Cargo Pants. Utilitaria...
2    2023 Fashion Trend: Sheer Clothing. "Bare it a...
3     – 2023 Fashion Trend: Denim Reimagined. From ...
4    2023 Fashion Trend: Shine For The Daytime. The...
Name: text, dtype: object


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [4]:
# Generate embeddings using OpenAI’s embedding model
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
batch_size = 100
embeddings = []

for i in range(0, len(df), batch_size):
    response = openai.Embedding.create(
        input=df.iloc[i:i+batch_size]["text"].tolist(),
        engine=EMBEDDING_MODEL_NAME
    )
    embeddings.extend([data["embedding"] for data in response["data"]])

df["embeddings"] = embeddings
df.to_csv("data/embedded_dataset.csv", index=False)
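
One caveat when reusing the saved file: `to_csv` writes each embedding list as a string, so reading the CSV back yields strings rather than lists of floats. The sketch below shows one way to handle the round trip. It uses a tiny made-up DataFrame and an in-memory buffer in place of `data/embedded_dataset.csv`, both of which are assumptions for illustration only.

```python
import ast
import io

import pandas as pd

# Toy stand-in for the embedded dataset (real ada-002 embeddings have 1536 dimensions).
toy_df = pd.DataFrame({
    "text": ["red", "cargo pants"],
    "embeddings": [[0.1, 0.2], [0.3, 0.4]],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy_df.to_csv(buffer, index=False)
buffer.seek(0)

reloaded = pd.read_csv(buffer)
# Each embedding comes back as a string such as "[0.1, 0.2]";
# ast.literal_eval turns it back into a list of floats.
reloaded["embeddings"] = reloaded["embeddings"].apply(ast.literal_eval)
```

With the column converted back to lists, the reloaded DataFrame can be passed to the relevance function without regenerating embeddings.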


In [5]:
def get_rows_sorted_by_relevance(question, df):
    question_embedding = get_embedding(question, engine=EMBEDDING_MODEL_NAME)
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embedding, df_copy["embeddings"].values, distance_metric="cosine"
    )
    return df_copy.sort_values("distances", ascending=True)
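
For intuition about what the `"cosine"` metric measures, here is an illustrative re-implementation with NumPy. It is not the library's code; `cosine_distance` and `rank_by_distance` are hypothetical names, and the two-dimensional vectors are made up.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - float(similarity)


def rank_by_distance(query_vec, candidate_vecs):
    """Return candidate indices ordered from closest to farthest."""
    distances = [cosine_distance(query_vec, v) for v in candidate_vecs]
    return sorted(range(len(distances)), key=lambda i: distances[i])


# Toy vectors: the second candidate points the same way as the query.
order = rank_by_distance([1, 0], [[0, 1], [1, 0], [1, 1]])
```

A smaller distance means the row's embedding points in nearly the same direction as the question's, which is why the results are sorted in ascending order.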


In [6]:
query = "What are the top fashion trends in 2023?"
relevant_results = get_rows_sorted_by_relevance(query, df)
print(relevant_results.head())

                                                  URL  \
2   https://www.refinery29.com/en-us/fashion-trend...   
4   https://www.refinery29.com/en-us/fashion-trend...   
63  https://www.whowhatwear.com/spring-summer-2023...   
0   https://www.refinery29.com/en-us/fashion-trend...   
44  https://www.whowhatwear.com/spring-summer-2023...   

                                                 text  \
2   2023 Fashion Trend: Sheer Clothing. "Bare it a...   
4   2023 Fashion Trend: Shine For The Daytime. The...   
63   – "Every season, there is a trend that speaks...   
0    – 2023 Fashion Trend: Red. Glossy red hues to...   
44   – I get it. Some of the trends on this list m...   

                                               Source  \
2   7 Fashion Trends That Will Take Over 2023 — Sh...   
4   7 Fashion Trends That Will Take Over 2023 — Sh...   
63  Spring/Summer 2023 Fashion Trends: 21 Expert-A...   
0   7 Fashion Trends That Will Take Over 2023 — Sh...   
44  Spring/Summer 2023 Fashio

In [7]:
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"

def create_prompt(question, df, max_token_count=1800):
    tokenizer = tiktoken.get_encoding("cl100k_base")
    prompt_template = """
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know."

Context: 

{}

---

Question: {}
Answer:
"""
    # Start the budget with the tokens used by the template and the question
    current_token_count = len(tokenizer.encode(prompt_template)) + len(tokenizer.encode(question))
    context = []
    # Add rows from most to least relevant until the token budget is exhausted
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)
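
The context loop in `create_prompt` is a greedy budget: rows are taken in relevance order until the next one would push the prompt past the token limit. The sketch below isolates that logic so it can be tried offline. `select_context` and `count_words` are hypothetical names, and the word count is only a crude stand-in for a real tokenizer.

```python
def select_context(texts, count_tokens, budget):
    """Keep texts, in order, until adding the next one would exceed the budget."""
    selected = []
    used = 0
    for text in texts:
        cost = count_tokens(text)
        if used + cost > budget:
            break
        selected.append(text)
        used += cost
    return selected


def count_words(text):
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


rows = [
    "sheer clothing is back",              # 4 words
    "cargo pants",                         # 2 words
    "cobalt blue everywhere this season",  # 5 words
]
context = select_context(rows, count_words, budget=6)
```

Here the first two rows fit the budget of 6 and the third is dropped. In the notebook the budget would also have to cover the template and the question, which `create_prompt` counts before the loop starts.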


## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [8]:
question1 = "What are the top fashion trends in 2023?"

print("Basic Completion Model Answer:")
basic_answer1 = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=question1,
    max_tokens=150
)["choices"][0]["text"].strip()
print(basic_answer1)

Basic Completion Model Answer:
1. Sustainability: As the fashion industry becomes more conscious of its environmental impact, sustainability will continue to be a dominant trend in 2023. This includes using eco-friendly materials, implementing ethical production practices, and promoting circular fashion.

2. Retro and Vintage Revival: Nostalgia for past decades will continue to influence fashion trends in 2023. Expect to see a resurgence of 70s bohemian style, 80s power dressing, and 90s grunge fashion.

3. Bold Colors and Prints: Bright, bold colors and statement prints will make a comeback in 2023. From floral patterns to geometric designs, expect to see eye-catching prints and patterns on everything from clothing to accessories.

4. Athle


In [9]:
print("\nCustom Completion Model Answer:")
custom_answer1 = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=create_prompt(question1, df),
    max_tokens=150
)["choices"][0]["text"].strip()
print(custom_answer1)


Custom Completion Model Answer:
- Sheer Clothing
- Shine for the Daytime
- Red Hues
- Denim Reimagined
- Cobalt Blue
- Elevated Basics
- Maxi Skirts
- Cargo Pants
- Mesh
- Green
- Tailored Look
- Indie Sleaze.


### Question 2

In [10]:
question2 = "What are the key colors trending in fashion for 2023?"

print("Basic Completion Model Answer:")
basic_answer2 = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=question2,
    max_tokens=150
)["choices"][0]["text"].strip()
print(basic_answer2)




Basic Completion Model Answer:
1. Pastel shades: Soft and delicate pastel colors like lavender, baby blue, pale yellow, and mint green are expected to be popular in 2023. These colors give a calming and soothing vibe and will be seen in both clothing and accessories.

2. Earthy tones: Rich and warm earthy tones like rust orange, mustard yellow, olive green, and burgundy will continue to dominate the fashion scene in 2023. These colors evoke a sense of nature and comfort, making them perfect for fall and winter fashion.

3. Metallics: Metallic shades such as silver, gold, copper, and bronze will add a touch of glamour to fashion in 2023. These colors will be seen in both clothing and accessories,


In [11]:
print("\nCustom Completion Model Answer:")
custom_answer2 = openai.Completion.create(
    model=COMPLETION_MODEL_NAME,
    prompt=create_prompt(question2, df),
    max_tokens=150
)["choices"][0]["text"].strip()
print(custom_answer2)



Custom Completion Model Answer:
Cobalt blue, red, lime green, saffron, and cherry red.
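
The four demonstration cells repeat the same call pattern. One possible way to run both variants for any question is sketched below. `compare_answers` and its `complete` and `build_prompt` arguments are hypothetical names, and fake functions stand in for the API call and for `create_prompt` so the sketch runs offline.

```python
def compare_answers(question, complete, build_prompt):
    """Return the basic and the context-augmented answer for one question."""
    return {
        "basic": complete(question).strip(),
        "custom": complete(build_prompt(question)).strip(),
    }


# Offline stand-ins. In the notebook, `complete` would wrap
# openai.Completion.create and `build_prompt` would wrap create_prompt(question, df).
def fake_complete(prompt):
    if "Context:" in prompt:
        return " answered with context "
    return " answered without context "


def fake_build_prompt(question):
    return "Context: ...\n\nQuestion: " + question


answers = compare_answers("What colors are trending?", fake_complete, fake_build_prompt)
```

Passing the completion and prompt-building functions as arguments keeps the comparison logic separate from the API, so it can be checked without network access.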
