# Custom Chatbot Project

I chose the `2023_fashion_trends.csv` dataset because it has 82 rows, and each row holds a trend description, its source article, and the article URL. It feels closer to a real-world use case than the other options, and it is also trickier to work with.
With it, we can ask questions about colors, styles, and so on. I believe a larger dataset of this kind would make a chatbot like this genuinely useful in practice.

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [2]:
import pandas as pd
data = pd.read_csv('2023_fashion_trends.csv')
data.sample(5)

,URL,Trends,Source
5,https://www.refinery29.com/en-us/fashion-trend...,2023 Fashion Trend: Maxi Skirts. In response t...,7 Fashion Trends That Will Take Over 2023 — Sh...
27,https://www.vogue.com/article/spring-2023-tren...,"Sculptural Statement Earrings. For me, this sp...",These Are the Spring 2023 Trends Vogue Editors...
8,https://www.instyle.com/spring-2023-fashion-tr...,All of the Pastels. Pastels are classic (albei...,"The Top 6 Trends to Wear for Spring 2023, Acco..."
77,https://www.whowhatwear.com/spring-summer-2023...,"If lime green isn't your vibe, rest assured th...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...
79,https://www.whowhatwear.com/spring-summer-2023...,"""Combine this design shift with the fact that ...",Spring/Summer 2023 Fashion Trends: 21 Expert-A...


In [7]:
print(data.columns.tolist())
print(f"\nDataset shape: {data.shape}\n")

# info() prints its summary itself and returns None, so it is not wrapped in print()
data.info()

# Check for missing values
print(data.isnull().sum())

['URL', 'Trends', 'Source']

Dataset shape: (82, 3)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82 entries, 0 to 81
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   URL     82 non-null     object
 1   Trends  82 non-null     object
 2   Source  82 non-null     object
dtypes: object(3)
memory usage: 2.1+ KB
URL       0
Trends    0
Source    0
dtype: int64


In [10]:
data[['Source', 'Trends']].head(3)

,Source,Trends
0,7 Fashion Trends That Will Take Over 2023 — Shop Them Now,"2023 Fashion Trend: Red. Glossy red hues took over the Fall 2023 runways ranging from Sandy Liang and PatBo to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags, jewelry)."
1,7 Fashion Trends That Will Take Over 2023 — Shop Them Now,"2023 Fashion Trend: Cargo Pants. Utilitarian wear is in for 2023, which sets the stage for the return of the cargo pant. But these aren't the shapeless, low-rise pants of the Y2K era. For spring, this trend is translated into tailored silhouettes, interesting pocket placements, elevated fabrics like silk and organza, and colors that go beyond khaki and olive."
2,7 Fashion Trends That Will Take Over 2023 — Shop Them Now,"2023 Fashion Trend: Sheer Clothing. ""Bare it all"" has been the motto since the end of the lockdown. In 2023, naked dressing makes its way from the red carpet – where celebrities like Cher and Rihanna have been sporting the trend forever – to street style. From a cellophane-like dress, worn over a boldly hued maxi skirt at Tory Burch's spring 2023 show, to a frothy frock revealing undergarments at Victoria Beckham, the previously risqué trend is coming not only for your weekend wardrobe but even workwear."


In [18]:
# Check for very short entries (< 50 characters); no rows are returned, so nothing needs removing
data[data['Trends'].str.len() < 50]

,URL,Trends,Source,text


In [19]:
data['text'] = "Source: " + data['Source'] + "\nTrends: " +\
               data['Trends'] + "\n URL: " + data['URL']


In [20]:
updated_data = data[['text']]
updated_data.head(3)

,text
0,"Source: 7 Fashion Trends That Will Take Over 2023 — Shop Them Now\nTrends: 2023 Fashion Trend: Red. Glossy red hues took over the Fall 2023 runways ranging from Sandy Liang and PatBo to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags, jewelry).\n URL: https://www.refinery29.com/en-us/fashion-trends-2023"
1,"Source: 7 Fashion Trends That Will Take Over 2023 — Shop Them Now\nTrends: 2023 Fashion Trend: Cargo Pants. Utilitarian wear is in for 2023, which sets the stage for the return of the cargo pant. But these aren't the shapeless, low-rise pants of the Y2K era. For spring, this trend is translated into tailored silhouettes, interesting pocket placements, elevated fabrics like silk and organza, and colors that go beyond khaki and olive.\n URL: https://www.refinery29.com/en-us/fashion-trends-2023"
2,"Source: 7 Fashion Trends That Will Take Over 2023 — Shop Them Now\nTrends: 2023 Fashion Trend: Sheer Clothing. ""Bare it all"" has been the motto since the end of the lockdown. In 2023, naked dressing makes its way from the red carpet – where celebrities like Cher and Rihanna have been sporting the trend forever – to street style. From a cellophane-like dress, worn over a boldly hued maxi skirt at Tory Burch's spring 2023 show, to a frothy frock revealing undergarments at Victoria Beckham, the previously risqué trend is coming not only for your weekend wardrobe but even workwear.\n URL: https://www.refinery29.com/en-us/fashion-trends-2023"


In [21]:
print(f"Final dataset has {len(updated_data)} rows")


Final dataset has 82 rows


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [22]:
import openai
from openai import OpenAI

from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()


client = OpenAI(
    api_key=os.getenv("VOC_OPENAI_API_KEY"),
    base_url="https://openai.vocareum.com/v1"
)
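
`os.getenv` returns `None` when a variable is missing, and the error the client raises afterwards may not mention `VOC_OPENAI_API_KEY` at all. A small guard like the one below fails fast with a clearer message. This is a sketch: `require_env` is a hypothetical helper, not part of `python-dotenv` or the OpenAI SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, raising if it is unset or empty.

    Hypothetical helper for illustration; not a library function.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file "
            "or export it before creating the client."
        )
    return value


# Usage (assumes load_dotenv() has already run):
# client = OpenAI(api_key=require_env("VOC_OPENAI_API_KEY"),
#                 base_url="https://openai.vocareum.com/v1")
```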

In [23]:
import tiktoken 
def count_tokens(text, model="gpt-3.5-turbo"):
    """
    Count the number of tokens in a text string.
    """
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

In [45]:
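def chunk_text_with_overlap(text, max_tokens=150, overlap_tokens=50):
    """
    Split text into chunks of at most max_tokens tokens, where consecutive
    chunks share overlap_tokens tokens.
    """
    # A non-positive step would either crash range() or silently return no chunks
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap_tokens must be >= 0 and smaller than max_tokens")

    # Step 1: Tokenize text
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    tokens = encoding.encode(text)  # Text → Tokens

    chunks = []
    step = max_tokens - overlap_tokens  # how far each window advances

    # Step 2: Slide a window of max_tokens over the tokens
    for i in range(0, len(tokens), step):
        chunk_tokens = tokens[i:i + max_tokens]

        # Step 3: Decode tokens back to text
        chunk_text = encoding.decode(chunk_tokens)
        chunks.append(chunk_text)

        # Stop once this window has reached the end of the text
        if i + max_tokens >= len(tokens):
            break

    return chunks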

# Apply chunking with overlap
chunked_texts = []
for text in updated_data['text']:
    chunks = chunk_text_with_overlap(text, max_tokens=100, overlap_tokens=50)
    chunked_texts.extend(chunks)

chunked_df = pd.DataFrame({'text': chunked_texts})
print(f"Total chunks: {len(chunked_df)}")

Total chunks: 191
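
The loop in `chunk_text_with_overlap` advances by `step = max_tokens - overlap_tokens` and stops once a window reaches the end of the token list. The sketch below reproduces only that index arithmetic on a token count, with no tokenizer, so the window boundaries can be checked by hand; `window_spans` is a hypothetical helper name.

```python
from typing import List, Tuple


def window_spans(n_tokens: int, max_tokens: int, overlap_tokens: int) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets from the same sliding-window loop as
    chunk_text_with_overlap; end is exclusive and clipped to n_tokens."""
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("need 0 <= overlap_tokens < max_tokens")
    step = max_tokens - overlap_tokens
    spans = []
    for start in range(0, n_tokens, step):
        spans.append((start, min(start + max_tokens, n_tokens)))
        # Same stopping rule as the notebook: this window already reached the end
        if start + max_tokens >= n_tokens:
            break
    return spans


# With the settings used above (max_tokens=100, overlap_tokens=50):
print(window_spans(120, 100, 50))  # [(0, 100), (50, 120)]
print(window_spans(250, 100, 50))  # [(0, 100), (50, 150), (100, 200), (150, 250)]
```

Under these settings, any record between 101 and 150 tokens long yields exactly two chunks, the second repeating the last 50 tokens of the first. That is consistent with rows 0 and 1 of `chunked_df` below.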


In [46]:
chunked_df

,text
0,"Source: 7 Fashion Trends That Will Take Over 2023 — Shop Them Now\nTrends: 2023 Fashion Trend: Red. Glossy red hues took over the Fall 2023 runways ranging from Sandy Liang and PatBo to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags"
1,"to Tory Burch and Wiederhoeft. Think: Juicy reds with vibrant orange undertones that would look just as good in head-to-toe looks (see: a pantsuit) as accent accessory pieces (shoes, handbags, jewelry).\n URL: https://www.refinery29.com/en-us/fashion-trends-2023"
2,"Source: 7 Fashion Trends That Will Take Over 2023 — Shop Them Now\nTrends: 2023 Fashion Trend: Cargo Pants. Utilitarian wear is in for 2023, which sets the stage for the return of the cargo pant. But these aren't the shapeless, low-rise pants of the Y2K era. For spring, this trend is translated into tailored silhouettes, interesting pocket placements, elevated fabrics like silk and organza, and colors that go beyond"
3,"pant. But these aren't the shapeless, low-rise pants of the Y2K era. For spring, this trend is translated into tailored silhouettes, interesting pocket placements, elevated fabrics like silk and organza, and colors that go beyond khaki and olive.\n URL: https://www.refinery29.com/en-us/fashion-trends-2023"
4,"Source: 7 Fashion Trends That Will Take Over 2023 — Shop Them Now\nTrends: 2023 Fashion Trend: Sheer Clothing. ""Bare it all"" has been the motto since the end of the lockdown. In 2023, naked dressing makes its way from the red carpet – where celebrities like Cher and Rihanna have been sporting the trend forever – to street style. From a cellophane-like dress, worn over a boldly hued maxi skirt at Tory"
...,...
186,"Source: Spring/Summer 2023 Fashion Trends: 21 Expert-Approved Looks You Need to See\nTrends: Thought party season ended at the stroke of midnight on December 31? Think again! The spring/summer runways bear a striking resemblance to all of my favourite coming-of-age movies from the '80s—Sixteen Candles, Pretty in Pink… basically anything starring Molly Ringwald. Between frothy fabrications, high-shine lamé, lashings of leopard print"
187,"striking resemblance to all of my favourite coming-of-age movies from the '80s—Sixteen Candles, Pretty in Pink… basically anything starring Molly Ringwald. Between frothy fabrications, high-shine lamé, lashings of leopard print, and the glorious return of the puffball hemline, this season invites you to play dress-up and not take yourself too seriously in the process.\n URL: https://www.whowhatwear.com/spring-summer-2023-fashion-trends"
188,", and the glorious return of the puffball hemline, this season invites you to play dress-up and not take yourself too seriously in the process.\n URL: https://www.whowhatwear.com/spring-summer-2023-fashion-trends/"
189,"Source: Spring/Summer 2023 Fashion Trends: 21 Expert-Approved Looks You Need to See\nTrends: ""This season, we saw the revival of the bubble skirt. Styled with printed snakeskin and powerful shoulders at Khaite, longer versions at Proenza Schouler reimagined with a low-waist silhouette and Simone Rocha's metallic mini bubble; these were all highlights,"" says Wiggins.\n URL: https://www.whowhatwear.com/spring-"


In [47]:
def get_embedding(text, model="text-embedding-ada-002"):
    """Get embedding for a text string"""
    text = text.replace("\n", " ")
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding

# Generate embeddings for all chunks (this may take a minute)
chunked_df['embedding'] = chunked_df['text'].apply(lambda x: get_embedding(x))
print("Done!")

Done!
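
`apply(get_embedding)` sends one request per chunk, which means 191 requests here. The embeddings endpoint also accepts a list of inputs in a single call, so chunks can be embedded in batches. The sketch below keeps the API call behind a caller-supplied function `embed_batch` so the batching logic can be checked offline. `embed_batch` is hypothetical; for example, it could wrap `client.embeddings.create(input=batch, model="text-embedding-ada-002")` and return `[d.embedding for d in response.data]`. The sketch assumes it returns one vector per input, in input order.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 100,
) -> List[Vector]:
    """Embed texts in slices of batch_size, making one embed_batch call per slice.

    Newlines are replaced with spaces, mirroring get_embedding above.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    embeddings: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = [t.replace("\n", " ") for t in texts[start:start + batch_size]]
        vectors = embed_batch(batch)
        if len(vectors) != len(batch):
            raise RuntimeError("embed_batch returned a different number of vectors")
        embeddings.extend(vectors)
    return embeddings


# Offline check with a fake embedder that records each batch size
calls = []


def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]


vectors = embed_in_batches(["a\nb"] * 191, fake_embed, batch_size=100)
print(len(vectors), calls)  # 191 [100, 91]
```

In the notebook, the result could be assigned with `chunked_df['embedding'] = embed_in_batches(chunked_df['text'].tolist(), embed_batch)`.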


In [48]:
import numpy as np

def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors"""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def get_relevant_chunks_embedding(question, chunked_dataframe, top_k=5):
    """
    Find most relevant chunks using embedding similarity.
    """
    # Get question embedding
    question_embedding = get_embedding(question)
    
    # Calculate similarity scores without adding a column to the caller's dataframe
    similarities = chunked_dataframe['embedding'].apply(
        lambda x: cosine_similarity(x, question_embedding)
    )
    
    # Get the index labels of the top_k most similar chunks (highest first)
    top_indices = similarities.nlargest(top_k).index
    
    return chunked_dataframe.loc[top_indices, 'text'].tolist()
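
Cosine similarity is the dot product divided by the product of the vector lengths, cos(a, b) = a·b / (‖a‖ ‖b‖). It therefore depends only on the angle between the vectors: identical directions score 1 and orthogonal ones score 0. The dependency-free sketch below performs the same kind of ranking as `get_relevant_chunks_embedding`, using `heapq.nlargest` in place of pandas `nlargest`. The names `cosine` and `top_k_by_cosine` are hypothetical, and the toy 2-D vectors are invented stand-ins for the 1536-dimensional `text-embedding-ada-002` embeddings.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity a·b / (|a| |b|); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs most similar to query, highest score first."""
    scored = ((text, cosine(vec, query)) for text, vec in docs)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 2-D "embeddings", invented for illustration only
docs = [
    ("red runway looks", [1.0, 0.0]),
    ("cargo pants", [0.0, 1.0]),
    ("red cargo pants", [1.0, 1.0]),
]
for text, score in top_k_by_cosine([1.0, 0.2], docs, k=2):
    print(f"{score:.3f}  {text}")
# 0.981  red runway looks
# 0.832  red cargo pants
```

Returning the scores alongside the text can help when checking how close the top hits actually are to the question.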

In [49]:
def fashion_chatbot(question, chunked_dataframe, top_k=5):
    """
    Abstractive chatbot that generates answers using retrieved context.
    """
    # Step 1: Retrieve relevant chunks
    relevant_chunks = get_relevant_chunks_embedding(question, chunked_dataframe, top_k)
    context = "\n\n".join(relevant_chunks)
    
    # Step 2: Build prompt with context
    system_message = "You are a fashion trends expert for 2023. Answer based on the provided information."
    
    user_prompt = f"""Based on these 2023 fashion trends:

{context}

Question: {question}

Provide a clear, concise answer:"""
    
    # Step 3: Call OpenAI for abstractive answer
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_prompt}
        ],
        max_tokens=200,
        temperature=0.7
    )
    
    return response.choices[0].message.content
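
`count_tokens` is defined earlier but never called. One possible use, sketched below, is to cap how much retrieved text goes into the prompt by adding chunks in ranked order until the next one would exceed a token budget. `fit_to_budget` and the budget value are illustrative assumptions. The check below uses a whitespace word count as a stand-in for `count_tokens` so that it runs without `tiktoken`.

```python
from typing import Callable, List


def fit_to_budget(chunks: List[str], count: Callable[[str], int], budget: int,
                  separator: str = "\n\n") -> str:
    """Join chunks (assumed sorted most relevant first) until the next one would exceed budget."""
    selected: List[str] = []
    for chunk in chunks:
        candidate = separator.join(selected + [chunk])
        if count(candidate) > budget:
            break
        selected.append(chunk)
    return separator.join(selected)


def word_count(text: str) -> int:
    # Stand-in for count_tokens(text); real token counts differ from word counts
    return len(text.split())


ranked = ["one two three", "four five", "six seven eight nine"]
print(repr(fit_to_budget(ranked, word_count, budget=5)))  # 'one two three\n\nfour five'
```

Inside `fashion_chatbot`, this could replace the plain `"\n\n".join(relevant_chunks)`, e.g. `context = fit_to_budget(relevant_chunks, count_tokens, budget=1000)`. Stopping at the first chunk that does not fit, instead of skipping it, keeps the context in strict relevance order.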


## Custom Performance Demonstration

## Question 1

### Basic Answer: Extractive method

In [50]:
# Test it
test_question = "What colors are trending?"
relevant_chunks = get_relevant_chunks_embedding(test_question, chunked_df)
print("Question: ", test_question)
print(f"Found {len(relevant_chunks)} relevant chunks")
print(f"\nFirst chunk:\n{relevant_chunks[0]}...")

Question:  What colors are trending?
Found 5 relevant chunks

First chunk:
Source: Spring/Summer 2023 Fashion Trends: 21 Expert-Approved Looks You Need to See
Trends: If lime green isn't your vibe, rest assured there was another bold hue that practically jumped off the runway and my screen, which was evident from the moment I began this lengthy research process. Striking red ensembles, where shades of saffron were styled top to toe, were present in the form of tights and tuxedos at Ferragamo, sharp...


### Custom Answer: Abstractive method

In [51]:
# Test abstractive chatbot
answer = fashion_chatbot(test_question, chunked_df)
print("Question: ", test_question)
print(f"Answer:\n{answer}")

Question:  What colors are trending?
Answer:
The trending colors for 2023 are lime green, saffron red, cobalt blue, pastels like lilac, pale yellow, and baby blue, as well as bold red shades with orange undertones.


## Question 2

In [52]:
test_question = "Any insights on social media influence on fashion trends?"

### Basic Answer: Extractive method

In [None]:
relevant_chunks = get_relevant_chunks_embedding(test_question, chunked_df)
print("Question: ", test_question)
print(f"Found {len(relevant_chunks)} relevant chunks")
print(f"\nFirst chunk:\n{relevant_chunks[0]}...")

Question:  Any insights on social media influence on fashion trends?
Found 5 relevant chunks

First chunk:
Source: Spring/Summer 2023 Fashion Trends: 21 Expert-Approved Looks You Need to See
Trends: "It's no surprise that a post-lockdown world is leaning towards more relaxed silhouettes, especially when it comes to our denim choices. I spend a lot of my days on social media (for work, naturally), and the jeans styles that I'm seeing across TikTok, Instagram and Pinterest are so relaxed they might as well be joggers. As the world's...


### Custom Answer: Abstractive method

In [54]:
# Test abstractive chatbot
answer = fashion_chatbot(test_question, chunked_df)
print("Question: ", test_question)
print(f"Answer:\n{answer}")

Question:  Any insights on social media influence on fashion trends?
Answer:
Social media plays a significant role in shaping fashion trends for 2023, with relaxed denim styles popular on platforms like TikTok and Instagram. Sheer tops have gained traction through increased searches, reflecting a shift towards subtle nudity. Additionally, the influence of social media can be seen in the emergence of motocross-inspired athleisure trends and digitally manipulated blurry prints, showcasing how digital platforms impact fashion choices and preferences.
