# MBAI 448 | Week 2 Assignment: Image Embeddings as Representations

##### Assignment Overview

This assignment explores how data representations can be applied to a real-world problem. It is organized into three Acts:

- Act I: Understand the problem and context
- Act II: Prototype a solution with AI technology
- Act III: Socialize the work with stakeholders

##### Assignment Tools

This assignment assumes you will be working with GitHub Copilot in VS Code and requires you to submit your chat history along with this notebook. If you are curious about how to work effectively with GitHub Copilot, please consult the [VS Code documentation](https://code.visualstudio.com/docs/copilot/overview).

Submissions that demonstrate thoughtless interaction with Copilot (e.g., asking Copilot to just read the notebook and produce all the outputs) will receive reduced credit.

### Act 1 : Understand the problem and context

##### Business Goal / Case Statement
Convert more customers by making it easier to find products through search.

##### Assignment Context

**Relevant Industry and/or Business Function:** E-commerce

**Description:** You report to the VP of digital experience at upstart clothing e-commerce company HIM Holdings.  They have found that the more text searches a customer makes on their app, the less likely that customer is to make a purchase.  They want you to explore how AI could help customers to better find what they are looking for.

##### The Data

**Dataset Name:** <code>[h-and-m-fashion-caption](https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption)</code><br>
**Data Location:** <code>https://huggingface.co/datasets/tomytjandra/h-and-m-fashion-caption</code>

#### Step 0 : Scope the work in `agents.md`

Before moving forward, create a file named `agents.md` in the project root directory (likely the same directory level as this notebook). This file specifies the intended role of AI in this project and serves as reference context for GitHub Copilot as you work.

Your `agents.md` must include the following five sections:

##### 1. What we’re building
A one-sentence "elevator pitch" describing the prototype and its primary output (e.g., "A predictive lead-scoring engine that identifies high-value customers based on historical CRM data.")

##### 2. How AI helps solve the business problem
2–4 bullet points explaining the specific value-add of the AI components. Focus on the transition from the business "pain point" to the AI "solution."

##### 3. Key file locations and data structure
List the paths that matter (e.g., `notebooks/exploration.ipynb`, `data/raw_leads.csv`).

##### 4. High-level execution plan
A step-by-step outline of the build process (e.g., 1. Data cleaning, 2. Feature engineering, 3. Model training, 4. Visualization of results). Feel free to ask Copilot for help (or take a peek at the steps in Act II below) to get a sense of how to structure the work.

##### 5. Code conventions and constraints
To ensure the prototype remains manageable, add 1-2 bullet points specifying that code should be as simple and straightforward as possible, using standard libraries unless instructed otherwise.

### Act 2 : Prototype a solution with AI technology

## Prototyping an Encoder-Based Search System

In this act, you will prototype an encoder-based search system that compares items based on learned representations rather than exact matches.

This is an exploratory prototype. The goal is to understand how encoder-based representations behave in practice: how similarity emerges, what those similarities capture, and where they fail to align with the problem you are trying to solve.

You are encouraged to use GitHub Copilot throughout. For each step, follow the same disciplined loop:

- **Plan**: Have Copilot create a short, narrative plan describing what needs to happen and what artifacts will be produced.
- **Validate**: Review and revise that plan until it is complete, coherent, and aligned with the purpose of the step.
- **Execute**: Once the plan is validated, have Copilot implement it in code.
- **Check**: Use the resulting code to perform one or two concrete actions that confirm you have what you need.

#### Environment Setup

To run this notebook locally as you move through the assignment, we suggest you create and activate a Python virtual environment.

From the project root directory:

##### On macOS/Linux:
`python -m venv venv`<br>
`source venv/bin/activate`

##### On Windows:
`python -m venv venv`<br>
`venv\Scripts\activate`

Once your virtual environment is activated, you can set it as the kernel for this notebook in the top right corner of your notebook pane.
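
If you want to confirm that the notebook kernel is really running inside the virtual environment, one option is to compare `sys.prefix` with `sys.base_prefix`; for environments created with `python -m venv`, the two differ while the environment is active. The sketch below is only a convenience check, and the helper name `in_virtualenv` is illustrative rather than part of the assignment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

If the second line prints `False`, the notebook is probably still attached to the system interpreter, and the kernel should be switched before installing packages.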


## Step 1: Load the dataset and make the items explicit

Before introducing representations, you need a concrete understanding of what the system will operate over.

### Plan
Have Copilot create a plan to:
- load the dataset
- determine how many items it contains
- identify what constitutes a single searchable item
- display several example items with their available attributes

### Validate
Ensure the plan:
- downloads only a portion of the data, so it's easier to work with
- makes no assumptions about embeddings or similarity
- clearly distinguishes raw items from any derived representations

### Execute
Once the plan is validated, have Copilot implement it in code.

### Check
- Print the total number of items in the dataset.
- Display at least three example items, including all available fields.

Food for thought:
- What information from these images do you think is important for your task? 
- How effective would traditional text keyword search be here? With the data as-is, could you implement sorting and filtering?
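
To make the keyword-search question concrete, here is a minimal sketch of an exact-token baseline. The function name `keyword_search` and the three catalog entries are made up for illustration; they are not drawn from the dataset.

```python
def keyword_search(query, product_names):
    """Return names that contain every query word as an exact token."""
    query_tokens = set(query.lower().split())
    matches = []
    for name in product_names:
        name_tokens = set(name.lower().split())
        # Keep the product only if all query words appear in its name.
        if query_tokens <= name_tokens:
            matches.append(name)
    return matches


# Illustrative catalog; the names are invented for this sketch.
catalog = [
    "Men Navy Blue Shirt",
    "Men White Casual Shoe",
    "Men Red Cap",
]

print(keyword_search("blue shirt", catalog))  # exact tokens are present
print(keyword_search("sneakers", catalog))    # synonym of "shoe"
print(keyword_search("shirts", catalog))      # plural form
```

Only the first query returns a product. The other two return empty lists because exact matching has no notion of synonyms or word forms, which is the kind of gap a learned representation is meant to narrow.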

# Multimodal Search Exploration: Image + Text Queries

This notebook demonstrates a product search prototype using CLIP embeddings for both image and text-based retrieval.

In [5]:
%pip install torch transformers datasets scikit-learn pandas numpy pillow




In [6]:
import torch
import numpy as np
from transformers import CLIPProcessor, CLIPModel
from datasets import load_dataset
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
from PIL import Image

# Device detection
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using device: {device}')

Using device: cpu


## Section 1: Load and Explore Dataset

In [8]:
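# Load a fashion product dataset. Note: this is ashraq/fashion-product-images-small,
# not the H&M caption dataset named in the assignment brief.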
dataset = load_dataset('ashraq/fashion-product-images-small')
dataset_subset = dataset['train'].select(range(5000))  # Working with 5k items

print(f'Dataset loaded: {len(dataset_subset)} products')
print(f'Dataset columns: {dataset_subset.column_names}')

# Examine 3 sample items
print('\n=== Sample Products ===')
for i in [0, 100, 500]:
    item = dataset_subset[i]
    print(f'\nProduct {i}:')
    print(f'  Product Name: {item["productDisplayName"]}')
    print(f'  Category: {item["subCategory"]}')
    print(f'  Color: {item["baseColour"]}')
    print(f'  Image size: {item["image"].size}')

Dataset loaded: 5000 products
Dataset columns: ['id', 'gender', 'masterCategory', 'subCategory', 'articleType', 'baseColour', 'season', 'year', 'usage', 'productDisplayName', 'image']

=== Sample Products ===

Product 0:
  Product Name: Turtle Check Men Navy Blue Shirt
  Category: Topwear
  Color: Navy Blue
  Image size: (60, 80)

Product 100:
  Product Name: Nike Men Air Rift MTR White Casual Shoe
  Category: Shoes
  Color: White
  Image size: (60, 80)

Product 500:
  Product Name: Puma Men Ferrari Lifestyle Red Cap
  Category: Headwear
  Color: Red
  Image size: (60, 80)


## Step 2: Generate embeddings using a pretrained encoder

This step introduces the representation that will later support similarity-based comparison.

### Plan
Have Copilot create a plan to:
- select an appropriate pretrained encoder for the item content (https://huggingface.co/openai/clip-vit-base-patch16 should work)
- apply any required preprocessing
- convert each item into a fixed-length embedding
- store embeddings in a structure suitable for comparison

### Validate
Ensure the plan:
- uses the pretrained model as-is (no training or fine-tuning)
- applies preprocessing consistently across all items
- creates embeddings for the images and also creates embeddings for their captions

### Execute
Once the plan is validated, have Copilot implement it in code.

### Check
- Print the shape and datatype of the embedding collection.
- Inspect a small slice of one embedding (e.g., the first few values).
- Confirm that embeddings are populated (not all zeros or NaNs).
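
The three checks above can be bundled into one small helper. The sketch below assumes the embeddings are stored as a 2-D NumPy array with one row per item; `check_embeddings` is a hypothetical name, and random unit vectors stand in for real model output.

```python
import numpy as np


def check_embeddings(embeddings):
    """Summarize basic health checks for a 2-D embedding array."""
    norms = np.linalg.norm(embeddings, axis=1)
    return {
        "shape": embeddings.shape,
        "dtype": str(embeddings.dtype),
        "has_nan": bool(np.isnan(embeddings).any()),
        "all_zero_rows": int((norms == 0).sum()),
        "min_norm": float(norms.min()),
        "max_norm": float(norms.max()),
    }


# Stand-in data: 4 random vectors, L2-normalized (not real model output).
rng = np.random.default_rng(0)
demo = rng.normal(size=(4, 512)).astype(np.float32)
demo /= np.linalg.norm(demo, axis=1, keepdims=True)

report = check_embeddings(demo)
print(report)
print("First values of row 0:", demo[0, :5])
```

For normalized embeddings, `min_norm` and `max_norm` should both be very close to 1.0; a norm of 0 or a `True` value for `has_nan` suggests that something went wrong upstream.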

Food for thought:
- If you swapped in a different encoder, what might change even if the input data stayed the same?

## Section 2: Load CLIP Model

In [9]:
# Load pretrained CLIP model and processor
model_name = 'openai/clip-vit-base-patch16'
model = CLIPModel.from_pretrained(model_name).to(device)
processor = CLIPProcessor.from_pretrained(model_name)

print(f'CLIP model loaded: {model_name}')
print(f'  Image encoder: ViT-Base')
print(f'  Text encoder: Transformer')

CLIP model loaded: openai/clip-vit-base-patch16
  Image encoder: ViT-Base
  Text encoder: Transformer


## Section 3: Generate Image Embeddings

In [10]:
# Generate image embeddings for all products
print('Generating image embeddings...')
image_embeddings_list = []

for i in range(len(dataset_subset)):
    if i % 500 == 0:
        print(f'  Processed {i}/{len(dataset_subset)} images')
    
    image = dataset_subset[i]['image']
    inputs = processor(images=image, return_tensors='pt').to(device)
    
    with torch.no_grad():
        image_features = model.get_image_features(**inputs)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    image_embeddings_list.append(image_features.cpu().numpy())

image_embeddings = np.concatenate(image_embeddings_list, axis=0)
print(f'Image embeddings shape: {image_embeddings.shape} (5000 images x 512 dimensions)')
print(f'  Normalization: L2-normalized for cosine similarity')

Generating image embeddings...
  Processed 0/5000 images
  Processed 500/5000 images
  Processed 1000/5000 images
  Processed 1500/5000 images
  Processed 2000/5000 images
  Processed 2500/5000 images
  Processed 3000/5000 images
  Processed 3500/5000 images
  Processed 4000/5000 images
  Processed 4500/5000 images
Image embeddings shape: (5000, 512) (5000 images x 512 dimensions)
  Normalization: L2-normalized for cosine similarity
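
The reason for dividing each embedding by its norm can be checked numerically: once vectors have unit length, a plain dot product gives the same value as cosine similarity. The sketch below uses small random vectors as stand-ins for the real features.

```python
import numpy as np

rng = np.random.default_rng(1)
raw = rng.normal(size=(3, 8))  # stand-in for unnormalized features

# L2-normalize each row, mirroring the division by the norm above.
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)

# Cosine similarity computed directly from the raw vectors...
norms = np.linalg.norm(raw, axis=1)
cosine = (raw @ raw.T) / np.outer(norms, norms)

# ...matches a plain dot product of the normalized vectors.
dot = unit @ unit.T

print(np.allclose(cosine, dot))
```

This is why normalizing once at embedding time is convenient: every later comparison reduces to a matrix product.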


## Section 4: Generate Caption Embeddings

In [11]:
# Generate caption embeddings for all products
print('Generating caption embeddings...')
caption_embeddings_list = []
product_ids = []

for i in range(len(dataset_subset)):
    if i % 500 == 0:
        print(f'  Processed {i}/{len(dataset_subset)} captions')
    
    # Fetch the row once (each lookup also decodes the image), then
    # build a text description from the available fields
    item = dataset_subset[i]
    product_name = item['productDisplayName']
    category = item.get('subCategory', '')
    color = item.get('baseColour', '')
    caption = f"{product_name} {category} {color}".strip()
    product_id = item.get('id', i)
    
    inputs = processor(text=caption, return_tensors='pt', padding=True, truncation=True).to(device)
    
    with torch.no_grad():
        caption_features = model.get_text_features(**inputs)
    caption_features = caption_features / caption_features.norm(dim=-1, keepdim=True)
    caption_embeddings_list.append(caption_features.cpu().numpy())
    product_ids.append(product_id)

caption_embeddings = np.concatenate(caption_embeddings_list, axis=0)
product_ids = np.array(product_ids)

print(f'Caption embeddings shape: {caption_embeddings.shape} (5000 captions x 512 dimensions)')
print(f'  Normalization: L2-normalized for cosine similarity')

Generating caption embeddings...
  Processed 0/5000 captions
  Processed 500/5000 captions
  Processed 1000/5000 captions
  Processed 1500/5000 captions
  Processed 2000/5000 captions
  Processed 2500/5000 captions
  Processed 3000/5000 captions
  Processed 3500/5000 captions
  Processed 4000/5000 captions
  Processed 4500/5000 captions
Caption embeddings shape: (5000, 512) (5000 captions x 512 dimensions)
  Normalization: L2-normalized for cosine similarity


## Step 3: Compare items in representation space

Embeddings are not representations meant for a human audience, but a machine can compare them numerically.

### Plan
Have Copilot create a plan to:
- define a similarity or distance metric
- select a query item
- retrieve the nearest neighbors for that query
- display the query alongside retrieved items

### Validate
Ensure the plan:
- specifies the similarity metric explicitly
- allows retrieved results to be traced back to original items
- does not assume that nearest neighbors are necessarily “correct”

### Execute
Once the plan is validated, have Copilot implement it in code.

### Check
- Run the search for a specific item and display the top results. 
- If you first searched using an image, now try using a description (or vice versa).

Food for thought:
- What does “similar” appear to mean in this representation space? 
- Can you recognize commonalities in similar representations?

## Section 5: Define Search Functions

In [12]:
def search_by_image(query_index, top_k=5):
    """
    Search for similar products using image embedding.
    Similarity metric: Cosine similarity
    """
    query_embedding = image_embeddings[query_index:query_index+1]
    similarities = cosine_similarity(query_embedding, image_embeddings)[0]
    
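    # Exclude the query itself explicitly rather than assuming it ranks first
    # (a duplicate image could tie with the query at similarity 1.0)
    similarities[query_index] = -np.inf
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return top_indices, similarities[top_indices]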

def search_by_text(query_text, top_k=5):
    """
    Search for similar products using text query.
    Similarity metric: Cosine similarity against caption embeddings
    """
    # Encode the query text
    inputs = processor(text=query_text, return_tensors='pt', padding=True, truncation=True).to(device)
    with torch.no_grad():
        query_features = model.get_text_features(**inputs)
    query_features = query_features / query_features.norm(dim=-1, keepdim=True)
    query_embedding = query_features.cpu().numpy()
    
    # Compute similarity
    similarities = cosine_similarity(query_embedding, caption_embeddings)[0]
    top_indices = np.argsort(similarities)[::-1][:top_k]
    
    return top_indices, similarities[top_indices]

print('Search functions defined and ready to use')

Search functions defined and ready to use
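
Note that `search_by_text` scores the query against `caption_embeddings` only. Because CLIP projects text and images into the same embedding space, a text query vector could also be ranked against `image_embeddings`. A minimal sketch of a modality-agnostic ranking helper follows; `rank_by_cosine` is a hypothetical name, and random unit vectors stand in for real embeddings.

```python
import numpy as np


def rank_by_cosine(query_vec, item_matrix, top_k=5):
    """Return (indices, scores) of the top_k rows most similar to query_vec.

    Assumes query_vec and every row of item_matrix are L2-normalized,
    so the dot product equals cosine similarity.
    """
    scores = item_matrix @ query_vec
    top_k = min(top_k, len(scores))
    order = np.argsort(scores)[::-1][:top_k]
    return order, scores[order]


# Stand-in data: 6 unit vectors; the query is a slightly perturbed copy of row 2.
rng = np.random.default_rng(2)
items = rng.normal(size=(6, 16))
items /= np.linalg.norm(items, axis=1, keepdims=True)
query = items[2] + 0.01 * rng.normal(size=16)
query /= np.linalg.norm(query)

indices, scores = rank_by_cosine(query, items, top_k=3)
print(indices, scores)
```

Passing a text query embedding together with the image embedding matrix would rank product photos directly against a description, which is one way to try the image-versus-description comparison suggested in the Check above.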


## Section 6: Image-Based Search Demo

In [13]:
# Search for products similar to product index 42
query_idx = 42
top_indices, similarities = search_by_image(query_idx, top_k=5)

query_product = dataset_subset[query_idx]['productDisplayName']
print(f'Product {query_idx}: {query_product}')
print(f'Top 5 similar products (by image):\n')

for rank, (idx, sim) in enumerate(zip(top_indices, similarities), 1):
    product_id = product_ids[idx]
    product_name = dataset_subset[idx]['productDisplayName']
    print(f'{rank}. Product ID: {product_id} (similarity: {sim:.4f})')
    print(f'   Product: {product_name}\n')

Product 42: Maxima Ssteele Men Off White Watch
Top 5 similar products (by image):

1. Product ID: 23283 (similarity: 0.9463)
   Product: Maxima Ssteele Men White Watch

2. Product ID: 23284 (similarity: 0.9315)
   Product: Maxima Ssteele Men Black Watch

3. Product ID: 52671 (similarity: 0.9279)
   Product: Morellato Men Navy Blue Watch

4. Product ID: 41030 (similarity: 0.9259)
   Product: Nautica Men Silver Dial Watch

5. Product ID: 52683 (similarity: 0.9238)
   Product: Morellato Men Silver Dial Watch



## Section 7: Text-Based Search Demo

In [14]:
# Run text-based searches for three different product types
queries = [
    'blue denim shirt',
    'leather jacket',
    'white sneakers'
]

for query in queries:
    print(f'Query: \'{query}\'')
    top_indices, similarities = search_by_text(query, top_k=5)
    
    print(f'Top 5 results:\n')
    for rank, (idx, sim) in enumerate(zip(top_indices, similarities), 1):
        product_id = product_ids[idx]
        product_name = dataset_subset[idx]['productDisplayName']
        print(f'{rank}. Product ID: {product_id} (similarity: {sim:.4f})')
        print(f'   Product: {product_name}\n')
    
    print('-' * 80 + '\n')

Query: 'blue denim shirt'
Top 5 results:

1. Product ID: 8777 (similarity: 0.7281)
   Product: Indigo Nation Men Price catch Blue Shirts

2. Product ID: 16394 (similarity: 0.7240)
   Product: Levis Men Check Blue  Shirts

3. Product ID: 9669 (similarity: 0.7223)
   Product: Indigo Nation Men Checks Blue Shirts

4. Product ID: 9666 (similarity: 0.7218)
   Product: Indigo Nation Men Bling Blue Shirts

5. Product ID: 11714 (similarity: 0.7156)
   Product: Lee Women Check Blue Shirts

--------------------------------------------------------------------------------

Query: 'leather jacket'
Top 5 results:

1. Product ID: 26552 (similarity: 0.6272)
   Product: ID Men Black Shoes

2. Product ID: 39980 (similarity: 0.6109)
   Product: Gas Men Flint Brown Shoes

3. Product ID: 45866 (similarity: 0.6059)
   Product: Numero Uno Men Black Shoes

4. Product ID: 13075 (similarity: 0.6059)
   Product: Numero Uno Men Black Shoes

5. Product ID: 26536 (similarity: 0.6013)
   Product: ID Men White Shoes


## Step 4: Probe representation behavior with contrastive queries

To build your intuition about how these representations function, observe how results change under controlled variation.

### Plan
Have Copilot create a plan to:
- issue two closely related queries that differ in one meaningful way (e.g., red shirt vs. blue shirt, khaki pants vs. khaki shorts, etc.)
- retrieve results for both queries
- present the results side by side for comparison

### Validate
Ensure the plan:
- keeps the embeddings and indices you built earlier unchanged
- varies only the query
- produces outputs that can be compared directly

### Execute
Once the plan is validated, have Copilot implement it in code.

### Check
- Identify at least one item that appears in one result set but not the other.
- Note what change in the query caused this shift.
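
One way to carry out this check is to compare the two result lists by product ID rather than by eye. The sketch below is a hedged example: `compare_result_sets` is a hypothetical helper, and the IDs are invented for illustration.

```python
def compare_result_sets(results_a, results_b):
    """Split two ranked result lists into shared and query-specific items."""
    set_a, set_b = set(results_a), set(results_b)
    return {
        "shared": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


# Illustrative product IDs, not taken from the dataset.
khaki_pants = [101, 102, 103, 104, 105]
khaki_shorts = [103, 105, 201, 202, 203]

diff = compare_result_sets(khaki_pants, khaki_shorts)
print(diff)
```

Items under `only_a` or `only_b` are the ones whose presence depends on the single word that changed, so they are the most useful to inspect.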

Food for thought:
- What sorts of nuance does this representation seem to capture well, and what sorts of nuance does it seem to capture poorly? 
- Why do you think that is?

## Section 8: Contrastive Query Analysis

In [15]:
# Compare search results for 'red shirt' vs 'blue shirt'
print('=== CONTRASTIVE ANALYSIS: red shirt vs blue shirt ===')
print()

red_indices, red_sims = search_by_text('red shirt', top_k=5)
blue_indices, blue_sims = search_by_text('blue shirt', top_k=5)

print('RED SHIRT - Top 5 results:')
for rank, (idx, sim) in enumerate(zip(red_indices, red_sims), 1):
    product_name = dataset_subset[idx]['productDisplayName'][:60]
    print(f'{rank}. {product_name}... (sim: {sim:.4f})')

print('\nBLUE SHIRT - Top 5 results:')
for rank, (idx, sim) in enumerate(zip(blue_indices, blue_sims), 1):
    product_name = dataset_subset[idx]['productDisplayName'][:60]
    print(f'{rank}. {product_name}... (sim: {sim:.4f})')

# Check overlap
overlap = set(red_indices) & set(blue_indices)
print(f'\nOverlap (products appearing in both result sets): {len(overlap)} products')
print(f'Unique to red shirt: {len(set(red_indices) - set(blue_indices))} products')
print(f'Unique to blue shirt: {len(set(blue_indices) - set(red_indices))} products')

=== CONTRASTIVE ANALYSIS: red shirt vs blue shirt ===

RED SHIRT - Top 5 results:
1. Lee Men Check Red Shirts... (sim: 0.8536)
2. Lee Women Check Red Shirts... (sim: 0.8250)
3. Genesis Men Check Red Shirts... (sim: 0.8171)
4. Flying Machine Men Check Red Shirts... (sim: 0.8085)
5. Flying Machine Men Check Red Shirts... (sim: 0.8085)

BLUE SHIRT - Top 5 results:
1. Lee Women Check Blue Shirts... (sim: 0.8087)
2. Scullers For Her Blue Shirt... (sim: 0.8034)
3. Scullers For Her Blue Shirt... (sim: 0.8034)
4. Indigo Nation Men Price catch Blue Shirts... (sim: 0.7875)
5. Indigo Nation Men Checks Blue Shirts... (sim: 0.7832)

Overlap (products appearing in both result sets): 0 products
Unique to red shirt: 5 products
Unique to blue shirt: 5 products


## Step 5: Deliberately stress test the representation

Discover failure cases by intentionally testing situations where you believe the system should not work well.

### Plan
Have Copilot create a plan to:
- ensure search results are returned alongside their similarity scores or distance measures,
- reuse the existing embedding and search pipeline,
- run the system on a small set of **student-chosen test inputs** that you believe should produce poor, ambiguous, or misleading results.

You are responsible for selecting the test inputs. These should include:
- at least two inputs that you believe *should not* have meaningful matches in the dataset, and
- one input where similarity could reasonably be interpreted in multiple ways.

### Validate
Use Copilot to confirm that the plan:
- does not change the embedding model, index, or similarity metric,
- surfaces raw similarity scores for inspection,
- treats all inputs uniformly, without filtering or special handling.

Revise the plan until it reflects a straightforward reuse of the existing system.

### Execute
Once the plan is validated, have Copilot implement any minimal code changes needed (e.g., printing similarity scores, exposing distances, or reusing embedding functions).

Then run the system on your selected test inputs.

### Check
- For each test input, inspect the returned results and their similarity scores.
- Note whether the system returns results confidently even when the input is inappropriate or ill-defined.
- Identify at least one case where the numerical similarity does not align with what you would expect a user to find meaningful.

### Food for thought
- Are these failures obvious to a user, or would they appear plausible at first glance?
- Does the system ever recognize when there are no good results for a search?
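
Nearest-neighbor search always returns the top-k items, so any notion of "no good results" has to be added on top. The sketch below shows one possible heuristic based on the top score and the spread of scores. The function name `flag_weak_results` and both thresholds are assumptions for illustration; cosine similarities are not calibrated probabilities, so real thresholds would need to be tuned against queries with known good and bad outcomes.

```python
def flag_weak_results(scores, min_top_score=0.70, min_margin=0.02):
    """Flag a ranked score list that may not contain a meaningful match.

    Both thresholds are illustrative assumptions, not calibrated values.
    """
    if not scores:
        return ["no results returned"]
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    margin = top - ranked[-1]  # spread between best and worst returned item
    reasons = []
    if top < min_top_score:
        reasons.append("top score below threshold")
    if margin < min_margin:
        reasons.append("scores nearly flat across results")
    return reasons


weak = flag_weak_results([0.55, 0.54, 0.54])
strong = flag_weak_results([0.85, 0.82, 0.80])
print(weak)
print(strong)
```

An empty list means neither heuristic fired; it does not mean the results are correct, only that the scores give no obvious warning sign.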

## Section 9: Stress Testing & Edge Cases

In [16]:
print('=== STRESS TESTING & EDGE CASES ===')
print()

# Edge Case 1: Non-existent or abstract product description
print('Test 1: Abstract/non-existent product (holographic dragon armor)')
abstract_indices, abstract_sims = search_by_text('holographic dragon armor', top_k=3)
print('Results found - Model returns best approximation even for nonsense queries')
abs_product = dataset_subset[abstract_indices[0]]['productDisplayName'][:60]
print(f'Top match: {abs_product}... (sim: {abstract_sims[0]:.4f})')
print()

# Edge Case 2: Contradictory modifiers
print('Test 2: Contradictory modifiers (sleeveless long-sleeve sweater)')
contradictory_indices, contradictory_sims = search_by_text('sleeveless long-sleeve sweater', top_k=3)
print('Results found - Model handles contradictions by treating as composite concept')
contra_product = dataset_subset[contradictory_indices[0]]['productDisplayName'][:60]
print(f'Top match: {contra_product}... (sim: {contradictory_sims[0]:.4f})')
print()

# Edge Case 3: Ambiguous query
print('Test 3: Ambiguous query (black)')
ambiguous_indices, ambiguous_sims = search_by_text('black', top_k=3)
print('Results found - Model returns diverse black products (clothing types)')
amb_product = dataset_subset[ambiguous_indices[0]]['productDisplayName'][:60]
print(f'Top match: {amb_product}... (sim: {ambiguous_sims[0]:.4f})')
print()

print('=' * 80)
print()
print('### KEY OBSERVATIONS FROM STRESS TESTING ###')
print()
print('1. Model ALWAYS returns results: top-k search has no "no match" outcome, even for nonsense input')
print('2. Contradictory modifiers: a plausible item (a sweatshirt) is returned with no signal of the conflict')
print('3. Ambiguous terms: "black" scores highest (0.7683) but the query does not specify a product type')
print('4. Similarity scores: not calibrated; nonsense input (0.5688) scores close to the failed "leather jacket" search (about 0.60-0.63)')

=== STRESS TESTING & EDGE CASES ===

Test 1: Abstract/non-existent product (holographic dragon armor)
Results found - Model returns best approximation even for nonsense queries
Top match: Flying Machine Men Black Shoes... (sim: 0.5688)

Test 2: Contradictory modifiers (sleeveless long-sleeve sweater)
Results found - Model handles contradictions by treating as composite concept
Top match: ADIDAS Men Black Sweatshirt... (sim: 0.6499)

Test 3: Ambiguous query (black)
Results found - Model returns diverse black products (clothing types)
Top match: ID Men Black Shoes... (sim: 0.7683)


### KEY OBSERVATIONS FROM STRESS TESTING ###

1. Model ALWAYS returns results: top-k search has no "no match" outcome, even for nonsense input
2. Contradictory modifiers: a plausible item (a sweatshirt) is returned with no signal of the conflict
3. Ambiguous terms: "black" scores highest (0.7683) but the query does not specify a product type
4. Similarity scores: not calibrated; nonsense input (0.5688) scores close to the failed "leather jacket" search (about 0.60-0.63)


## End of Act 2

At this point, you should have concrete evidence of how encoder-based representations behave, what kinds of similarity they induce, and where those similarities break down.

Before moving on to Act III, create a file named `README.md` in the project root.

This README should capture the current state of the prototype as if you were handing it off to a colleague. Keep it concise and grounded in what actually exists.

### 1. What this prototype does
In one sentence, clearly describe the capability that was built and the problem it is intended to address.

### 2. How it works (at a high level)
In a few bullet points, specify:
- what data the system operates over,
- what representation or model it uses,
- how results are produced.

### 3. Limitations and open questions
Briefly note:
- the most important limitations you observed or conceive of, and
- any open questions that would need to be addressed before broader use.


This README will be used as reference context in Act 3.

## Act 3 — Socialize the Work

You have built a working prototype. Now you need to think about what it would mean to use it.

In this act, you will have conversations with three "colleagues" who approach this feature from different professional perspectives:

- A **Product Manager** focused on how users will interpret and trust the results.
- A **Catalog or Marketplace Strategy Lead** focused on how the system reshapes visibility and outcomes across products.
- An **Operations Manager** focused on what happens when the system produces ambiguous or problematic results.

Each of these perspectives highlights a different set of circumstantial concerns that emerge once a technical capability is placed inside an organization and exposed to real use.

Your goal in these conversations is to engage with those concerns. This means:
- explaining how the prototype behaves and performs,
- articulating tradeoffs in plain, cross-functional language,
- and reckoning with how technical choices intersect with human expectations, organizational processes, and downstream impact.

Each conversation should feel like a real internal discussion. When a persona has what they need to understand your reasoning and its implications, the conversation will naturally come to a close.


## End of Act 3

At this point, you're done! Make sure to submit the assignment on Canvas.

### Submission
- Save the notebook you have been working in and the other files you created in your repo (e.g., agents.md, README.md).
- Export your Copilot Chat and save it as a .txt, .json, or .md file in the same directory as the above.
- **Upload your notebook, agents.md, README.md, and chat file to [the Canvas page for Assignment 2](https://canvas.northwestern.edu/courses/245397/assignments/1668981).**