import numpy as np
import pandas as pd
from pathlib import Path
data_dir = Path("../data").absolute()

df = pd.read_parquet(data_dir / "product_images.parquet")
df.sample(10)

| | asin | title | primary_image |
|---|---|---|---|
| 3996 | B0949GBY28 | Cincinnati Bengals NFL Mens Gradient Wordmark ... | https://m.media-amazon.com/images/I/41pNkBE+zC... |
| 78956 | B0829NLSX7 | USB C Coiled Cable for Car, Baseus Retractable... | https://m.media-amazon.com/images/I/41uqUkIsvG... |
| 52885 | B091D171NH | Mexican Slaps Lollipop Candy Green Apple Flavo... | https://m.media-amazon.com/images/I/51cZMusUWh... |
| 53735 | B08S721WDG | MICPANG Knife Sharpener 3 Stage Knife Sharpeni... | https://m.media-amazon.com/images/I/41fIlOCdMa... |
| 54315 | B08LPMTSCN | HONOR Band 6 Smart Watch Fitness Tracker Watch... | https://m.media-amazon.com/images/I/41A98p+ro0... |
| 34528 | B089Y75PN2 | Artistic Weavers Gaillard Modern Abstract Runn... | https://m.media-amazon.com/images/I/61+UP0RkLY... |
| 59390 | B09GV6HQBV | BXYJDJ Men's Running Shoes Walking Trainers Sn... | https://m.media-amazon.com/images/I/41wFtILOyF... |
| 57204 | B09SWKGR8H | 50FT Expandable Garden Hose Water Hose with 10... | https://m.media-amazon.com/images/I/61lEcqpuWL... |
| 59688 | B09ZQXJSB8 | BTFBM Women Casual Long Sleeve Ruched Wrap Dre... | https://m.media-amazon.com/images/I/410OOtbl+-... |
| 47911 | B09JNJX39P | Under Armour Mens ArmourFleece Twist Hoodie , ... | https://m.media-amazon.com/images/I/41U8hownKn... |


# Task description
## The Data
The dataframe contains the top 100k best-selling items on Amazon (as of November 2022) and has 3 columns:

1. `asin` - The Amazon identifier.
1. `title` - The product title, as listed on the Amazon store.
1. `primary_image` - The image to be listed in search results.

## Goal
The goal of the task is to be able to search products both by textual similarity and by image similarity.

For example, a customer walking down the street could take a picture of a red dress she likes and get similar items from Amazon.

Alternatively, that same customer might open the Amazon website, search for "red dress", and find items that match that query.

## Implementation

### Embedding
We will use [CLIP](https://github.com/openai/CLIP) embeddings for this task.
<img src="https://openaiassets.blob.core.windows.net/$web/clip/draft/20210104b/overview-b.svg" width="400">

CLIP links images with their textual descriptions by mapping both into the same embedding space, so an image and a caption that describe the same thing end up close to each other.
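As a minimal sketch of what "mapping both into the same space" looks like in practice, the helpers below encode titles and product images with the `openai/CLIP` Python package. This assumes `clip`, `torch`, `requests`, and `Pillow` are installed; the model name `ViT-B/32` and the helper names `l2_normalize`, `embed_texts`, and `embed_image_urls` are illustrative choices, not part of the task's code. The heavy imports are deferred into the functions so the module loads without them.

```python
import io
from functools import lru_cache

import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length, so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against all-zero rows


@lru_cache(maxsize=None)
def _load_clip(model_name: str = "ViT-B/32"):
    """Load a CLIP model once and reuse it (assumes the openai/CLIP package)."""
    import clip
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)
    return model, preprocess, device


def embed_texts(texts, model_name: str = "ViT-B/32") -> np.ndarray:
    """Encode product titles (or a text query) with CLIP's text encoder."""
    import clip
    import torch

    model, _, device = _load_clip(model_name)
    # truncate=True cuts titles longer than CLIP's 77-token context instead of raising.
    tokens = clip.tokenize(list(texts), truncate=True).to(device)
    with torch.no_grad():
        feats = model.encode_text(tokens)
    # .float() undoes the fp16 weights CLIP uses on GPU before converting to NumPy.
    return l2_normalize(feats.float().cpu().numpy())


def embed_image_urls(urls, model_name: str = "ViT-B/32") -> np.ndarray:
    """Download images (e.g. the `primary_image` column) and encode them with CLIP."""
    import requests
    import torch
    from PIL import Image

    model, preprocess, device = _load_clip(model_name)
    batch = []
    for url in urls:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        img = Image.open(io.BytesIO(resp.content)).convert("RGB")
        batch.append(preprocess(img))  # resize, crop and normalise as CLIP expects
    with torch.no_grad():
        feats = model.encode_image(torch.stack(batch).to(device))
    return l2_normalize(feats.float().cpu().numpy())
```

Both functions return unit-length rows of the same dimension, so a text vector can be compared directly against image vectors. For the full 100k catalog you would want to call these in batches and cache the resulting matrix to disk rather than re-encoding on every run.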

### Similarity search

Once the embedding is done, we need to run a nearest-neighbor search using the `cosine` similarity measure.

The products that are closest to the query vector should (hopefully) be similar to the customer's intentions.

The query vector could be a result of either `CLIP` image embedding or `CLIP` textual embedding.
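Because both CLIP encoders land in the same space, the search step does not care where the query vector came from. A brute-force cosine nearest-neighbor search fits in a few lines of NumPy; this is a stand-in to illustrate the idea rather than the `vecsim` API the task uses, and `cosine_top_k` is a hypothetical name.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, catalog: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k catalog rows most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of every product to the query
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k in linear time
    top = top[np.argsort(-scores[top])]  # sort only those k, best first
    return top, scores[top]
```

Exact search over 100k vectors is a single matrix-vector product, so it is a reasonable baseline to sanity-check the results of a dedicated similarity-search module against.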

We will use the [vecsim](https://github.com/argmaxml/vecsim) module to do the similarity search.

### Serving

We used [Flask](https://flask.palletsprojects.com/en/2.2.x/) to implement the web server; the code is in `server.py`.

**Note**: The server code contains several `TODO:` comments marking the parts you need to implement. The server is already functional, but it currently returns random results.
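One way the random results could be replaced is sketched below. Everything here is hypothetical: the `create_app` factory, the `/search` route, the `q` and `k` parameters, and the injected `embed_text` callable are assumptions, not names taken from `server.py`, and scoring uses brute-force NumPy instead of `vecsim` so the example stays self-contained (it does require Flask).

```python
import numpy as np
from flask import Flask, jsonify, request


def create_app(asins, catalog_vecs, embed_text):
    """Hypothetical app factory; `embed_text` maps a string to a 1-D embedding."""
    app = Flask(__name__)
    # Normalise the catalog once at startup so each request is one matrix-vector product.
    unit = catalog_vecs / np.linalg.norm(catalog_vecs, axis=1, keepdims=True)

    @app.route("/search")
    def search():
        query = request.args.get("q", "")
        k = request.args.get("k", default=10, type=int)
        if not query:
            return jsonify(error="missing query parameter 'q'"), 400
        q = np.asarray(embed_text(query), dtype=float)
        q = q / np.linalg.norm(q)
        scores = unit @ q  # cosine similarity against every product
        top = np.argsort(-scores)[:k]
        return jsonify([{"asin": asins[int(i)], "score": float(scores[i])} for i in top])

    return app
```

Injecting `embed_text` keeps the route independent of the model, so it can be tested with a fake encoder and later wired to a real CLIP text encoder.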

# Submission


1. Please clone this repo into a **private** repo on your GitHub account.
1. Implement the missing parts.
1. Please fill in [this form](https://forms.gle/apMr8zPLbBf9pQY7A).
1. Once done, please schedule an interview with Uri to review the code.

## Submission deadline:
December 21st, 2022

## Good luck!
