![title.png](img/title.png)

![line.png](img/line.png)

![objectives.png](img/objectives.png)

![line.png](img/line.png)

![history.png](img/history.png)

![line.png](img/line.png)

![keyword_search_1](img/keyword_search_1.png)

![line.png](img/line.png)

![keyword_search_2.png](img/keyword_search_2.png)

![line.png](img/line.png)

![vector_embeddings.png](img/vector_embeddings.png)

![line.png](img/line.png)

![semantic_search.png](img/semantic_search.png)

![line.png](img/line.png)

![workshop_beginning.png](img/workshop_beginning.png)

# Setup environment

Install required libraries:

In [None]:
! pip install \
    pandas==2.2.2 \
    pyarrow==17.0.0 \
    vantage-sdk==0.9.3 \
    datasets==3.0.1

Set up utility functions (just execute the following cell):

In [None]:
from utils import (
    display_row,
    display_query_results,
    display_shopping_assistant_results,
    display_images
)

# Dataset

The dataset consists of ~37k fashion items scraped from: `https://asos.com`.


In [None]:
RAW_DATASET = "smartcat/asos-data-embedded"

In [None]:
from datasets import load_dataset

data = load_dataset(RAW_DATASET, split="train")
dataset = data.to_pandas()
dataset.head(3)

In [None]:
display_row(dataset, 1)

# Setup Vantage

Create a new Vantage account (if you don't have one): https://www.vantagediscovery.com/

Vantage console: https://console.vanta.ge/

Vantage platform docs: https://docs.vantagediscovery.com/docs/about

Copy your Vantage API key from: https://console.vanta.ge/api

In [None]:
from getpass import getpass

ACCOUNT_ID: str = "YOUR_ACCOUNT_ID"
VANTAGE_API_KEY: str = getpass("Enter your Vantage API key: ")
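
If typing the key on every run gets tedious, one option is to read it from an environment variable and fall back to the prompt. This is a minimal sketch: the helper `read_secret` and the environment variable name in the commented usage line are assumptions for illustration, not something the Vantage SDK requires.

```python
import os
from getpass import getpass


def read_secret(env_var: str, prompt: str) -> str:
    """Return the secret from the environment, or prompt for it if unset."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input.
    return getpass(prompt)


# Hypothetical usage; the environment variable name is an assumption.
# VANTAGE_API_KEY = read_secret("VANTAGE_API_KEY", "Enter your Vantage API key: ")
```

Keeping the `getpass` fallback means the notebook still works for attendees who have not exported anything.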

Create Vantage client:

In [None]:
from vantage_sdk import VantageClient

vantage = VantageClient.using_vantage_api_key(
    vantage_api_key=VANTAGE_API_KEY,
    account_id=ACCOUNT_ID,
)

vantage.get_account()

The Vantage platform can use popular models to compute item embeddings (or we can calculate them ourselves and upload them). In this case, we'll use OpenAI to get embeddings, so we need to add the OpenAI API key. There are two ways to do this:
1. Use the Vantage Console to add a new Model API Key.
2. Run the cell below.

In [None]:
OPENAI_API_KEY: str = getpass("Enter your OpenAI API key: ")

In [None]:
model_key = vantage.create_external_key(
    llm_provider = "OpenAI",
    llm_secret = OPENAI_API_KEY,
)

In [None]:
model_key

# Create collection
A collection is a fundamental object of the Vantage Platform that enables you to organize, manage, and search your data sets within the platform. Your data records, called documents, are ingested into a collection.

There are 2 types of collections:
1. Vantage Managed Embeddings (VME) - the Vantage Platform manages the translation of your data to AI embeddings.
2. User Provided Embeddings (UPE) - You upload embeddings taken from the LLM of your choice into your collection.

We'll create a VME collection with OpenAI embeddings:

In [None]:
from vantage_sdk.model.collection import OpenAICollection

collection = OpenAICollection(
    collection_id="test-collection-4",
    embeddings_dimension=1536,
    llm="text-embedding-ada-002",
    external_key_id=model_key.external_key_id
)
created_collection = vantage.create_collection(collection=collection)

In [None]:
print(created_collection)

# Indexing

The dataset needs to be processed to match the Vantage ingestion format.

The documents we upload into the collection must conform to specific field and format requirements.

<div>
<img src="img/field_format.png" width="700"/>
</div>

Required fields:
1. `id` - This represents your ID for this document. It will be handed back to you in search results.
2. `text` for VME collections - text that will be embedded using your provided model
3. `embeddings` for UPE collections - array of 32-bit floating point numbers. The array length should match the Dimension Size of the collection you're putting data into.

Optional fields:
1. `operation` - `update` (default), `delete`, `add`
2. `meta_` fields - fields prefixed with `meta_` are used for search query filtering
3. `meta_ordered_` fields - fields prefixed with `meta_ordered_` are used for sorting search query results
4. `variants` - used to specify multiple variants of the product

The data can be uploaded:
- As `parquet` or `jsonl` files via Console, SDK, or API
- Using document upload API
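
To make the field rules above concrete, here is a small sketch of what individual documents could look like when written as `jsonl`. All IDs, values, the `meta_color` and `meta_ordered_price` examples, and the 4-element embedding are made up for illustration; a real UPE collection would need vectors whose length equals the collection's dimension.

```python
import json

# A document for a VME collection: `id` plus the `text` to be embedded.
vme_document = {
    "id": "example-1",
    "text": "Title: White sneakers\nDescription: Low-top leather sneakers",
    "meta_color": "WHITE",        # `meta_` prefix: usable in search filters
    "meta_ordered_price": 59.99,  # `meta_ordered_` prefix: usable for sorting
}

# A document for a UPE collection: `id` plus a precomputed embedding.
EMBEDDING_DIMENSION = 4  # toy value; must match the collection's dimension
upe_document = {
    "id": "example-2",
    "embeddings": [0.12, -0.05, 0.33, 0.9],
    "operation": "add",
}
assert len(upe_document["embeddings"]) == EMBEDDING_DIMENSION

# JSON Lines: one JSON object per line.
jsonl = "\n".join(json.dumps(doc) for doc in (vme_document, upe_document))
print(jsonl)
```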

### Prepare input data

In [None]:
dataset.head(1)

We'll create a `parquet` file and upload it to the platform:

Create filter columns:

In [None]:
vantage_data = dataset.rename(
    columns={
        "availability": "meta_availability",
        "brand": "meta_brand",
        "gender": "meta_gender",
        "product_type": "meta_product_type",
        "style": "meta_style",
        "color": "meta_color",
        "price": "meta_ordered_price",
        "price_range": "meta_price_range",
    }
)
vantage_data.head(1)

Create text column:

In [None]:
import pandas as pd


def get_text(row: pd.Series) -> str:
    return "Title: " + row["title"] + "\nDescription: " + row["description"]


vantage_data["text"] = vantage_data.apply(get_text, axis=1)
vantage_data.drop(columns=["description", "image_url", "title"], inplace=True)
vantage_data.head(3)

In [None]:
import os

INPUT_FILE_PATH: str = "data/input.parquet"

os.makedirs("data", exist_ok=True)  # make sure the target directory exists
vantage_data.to_parquet(INPUT_FILE_PATH, index=False)
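
Before uploading, it can be worth checking that every record has the required fields. The sketch below is one way to do that for a VME collection. `validate_vme_records` is a hypothetical helper, and the rules it checks (presence of `id` and `text`, unique IDs, non-empty text) are taken from the required-fields list earlier in this section rather than from the platform's own validation.

```python
from typing import Any, Dict, List


def validate_vme_records(records: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    seen_ids = set()
    for position, record in enumerate(records):
        doc_id = record.get("id")
        if doc_id is None or str(doc_id) == "":
            problems.append(f"record {position}: missing `id`")
        elif doc_id in seen_ids:
            problems.append(f"record {position}: duplicate `id` {doc_id!r}")
        else:
            seen_ids.add(doc_id)
        text = record.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"record {position}: missing or empty `text`")
    return problems


# Toy records for illustration only.
sample = [
    {"id": "1", "text": "Title: Dress\nDescription: Floral midi dress"},
    {"id": "1", "text": "Title: Shirt\nDescription: Linen shirt"},
    {"id": "2", "text": "   "},
]
for problem in validate_vme_records(sample):
    print(problem)
```

With the notebook's DataFrame, the records could be obtained with `vantage_data.to_dict("records")`, assuming the frame carries an `id` column.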

### Index data

Upload data into the collection:

In [None]:
vantage.upload_documents_from_parquet_file(
    collection_id=created_collection.collection_id,
    parquet_file_path=INPUT_FILE_PATH
)

# Search

The Vantage platform provides different types of search:

<div>
<img src="img/search_types.png" width="1000"/>
</div>

1. Semantic search (for VME collections)
2. Embedding search (for UPE collections)
3. More-like-this search
4. More-like-these search
5. Vantage Vibe
6. Shopping assistant

### Semantic search

It takes a text query (typically entered by the end user) and returns IDs and relevance scores of the semantically closest items.
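
Conceptually, "semantically closest" means the items whose embeddings are most similar to the query embedding. The sketch below ranks a few toy 3-dimensional vectors by cosine similarity. The vectors and item IDs are invented, and the platform's actual scoring and indexing may differ.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_items(
    query: List[float], items: Dict[str, List[float]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Return the top_k (item_id, score) pairs, highest score first."""
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in items.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy embeddings; the collection in this notebook uses 1536 dimensions.
item_embeddings = {
    "sneaker": [0.9, 0.1, 0.0],
    "boot": [0.7, 0.3, 0.1],
    "dress": [0.0, 0.2, 0.9],
}
query_embedding = [1.0, 0.0, 0.0]
print(rank_items(query_embedding, item_embeddings))
```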

#### **Simple semantic search:**

In [None]:
semantic_results = vantage.semantic_search(
    text="adidas sneakers",
    collection_id=created_collection.collection_id,
)

In [None]:
display_query_results(dataset, semantic_results.results)

#### **Semantic search with filters**

Search results can be filtered by `meta_` columns and ordered by `meta_ordered_` columns:

In [None]:
from vantage_sdk.model.search import Filter, Sort, Pagination

results_with_filters = vantage.semantic_search(
    text="adidas sneakers",
    collection_id=created_collection.collection_id,
    filter=Filter(boolean_filter="(gender:\"Men\" AND color:\"WHITE\")"),
    sort=Sort(field="price", order="asc", mode="semantic_threshold"),
    pagination=Pagination(page=0, count=10, threshold=10)
)

In [None]:
display_query_results(dataset, results_with_filters.results)
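
The filter string in the query above needs escaped quotes, which is easy to get wrong by hand. As a convenience, a small helper can assemble the same kind of expression from a dictionary. `build_and_filter` is a hypothetical helper, not part of the SDK, and it only covers the `field:"value"` and `AND` forms shown in this notebook.

```python
from typing import Dict


def build_and_filter(conditions: Dict[str, str]) -> str:
    """Join field:"value" clauses with AND, wrapping multiple clauses in parentheses."""
    if not conditions:
        raise ValueError("at least one condition is required")
    clauses = [f'{field}:"{value}"' for field, value in conditions.items()]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " AND ".join(clauses) + ")"


print(build_and_filter({"gender": "Men", "color": "WHITE"}))
```

The resulting string can be passed as `Filter(boolean_filter=...)` in the same way as the hand-written one. Values that themselves contain double quotes are not handled by this sketch.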

#### **More semantic search examples**

In [None]:
semantic_results_2 = vantage.semantic_search(
    text="outfit for gym cardio sessions",
    filter=Filter(boolean_filter="gender:\"Men\""),
    collection_id=created_collection.collection_id,
)

In [None]:
display_query_results(dataset, semantic_results_2.results)

In [None]:
semantic_results_3 = vantage.semantic_search(
    text="Elegant dinner party outfit",
    collection_id=created_collection.collection_id,
)

In [None]:
display_query_results(dataset, semantic_results_3.results)

In [None]:
semantic_results_4 = vantage.semantic_search(
    text="Light and comfortable clothing for a summer vacation",
    collection_id=created_collection.collection_id,
)

In [None]:
display_query_results(dataset, semantic_results_4.results)

#### More like this

This type of search takes a document ID and finds similar results in your collection.

In [None]:
more_like_this_results = vantage.more_like_this_search(
    document_id="205357122",
    collection_id=created_collection.collection_id,
)

In [None]:
display_query_results(dataset, more_like_this_results.results)

#### More like these

<div>
<img src="img/more_like_these.png" width="800"/>
</div>

This type of search blends `text`, `items`, and `embeddings` to produce personalized search results, usually based on user data or external sources.
It supports multi-modal inputs, such as blending image embeddings or text descriptions with the current search.

In [None]:
from vantage_sdk.model.search import MoreLikeTheseItem

these = [
    MoreLikeTheseItem(query_text="outfit for long walks", weight=0.4),
    MoreLikeTheseItem(query_document_id="205629611", weight=0.4),
    MoreLikeTheseItem(query_document_id="206263911", weight=0.2)
]
more_like_these_results = vantage.more_like_these_search(
    more_like_these=these,
    collection_id=created_collection.collection_id,
    filter=Filter(boolean_filter="gender:\"Men\""),
)

In [None]:
display_query_results(dataset, more_like_these_results.results)
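
One way to picture the blending is as a weighted average of the vectors behind each input. The sketch below does this for toy vectors with the same weights as the example (0.4, 0.4, 0.2). It illustrates the idea only; `blend_vectors` is a hypothetical helper, and the platform may combine inputs differently.

```python
from typing import List, Sequence, Tuple


def blend_vectors(weighted: Sequence[Tuple[List[float], float]]) -> List[float]:
    """Weighted average of equal-length vectors; weights are normalized to sum to 1."""
    total_weight = sum(weight for _, weight in weighted)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    dimension = len(weighted[0][0])
    blended = [0.0] * dimension
    for vector, weight in weighted:
        if len(vector) != dimension:
            raise ValueError("all vectors must have the same length")
        for i, value in enumerate(vector):
            blended[i] += value * (weight / total_weight)
    return blended


# Toy 2-dimensional vectors standing in for a text query and two documents.
inputs = [
    ([1.0, 0.0], 0.4),  # e.g. the embedded query text
    ([0.0, 1.0], 0.4),  # e.g. the first document
    ([1.0, 1.0], 0.2),  # e.g. the second document
]
print(blend_vectors(inputs))
```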

#### **Vantage Vibe**

This feature allows you to search over collections with a specific "vibe" or thematic focus. It leverages both visual and semantic inputs to refine and personalize search results.

We need to create a Vibe configuration first:

In [None]:
vibe = vantage.create_vibe_configuration(
    name="Floral vibe",
    external_account_id=model_key.external_key_id,
    llm_model_name="gpt-4o-mini",
)
vibe

Images that will be used for Vibe search:

In [None]:
from typing import List

VIBE_IMAGES: List[str] = [
    "https://cdna.lystit.com/520/650/n/photos/asos/bc43b994/nobodys-child-Multi-floral-Aurora-Bloom-Bandeau-Midi-Dress.jpeg",
    "https://product-images.thecoolhour.com/images/asos_annorlunda_annorlunda_barb_wire_floral_asymmetric_fluted_sleeve_midaxi_dress_in_pink_xl.jpg",
    "https://images.asos-media.com/products/french-connection-long-sleeve-mini-mesh-dress-in-white-floral/204319668-1-whitefloral?$n_960w$&wid=952&fit=constrain",
    "https://is4.revolveassets.com/images/p4/n/z/ASTR-WD159_V1.jpg"
]

display_images(VIBE_IMAGES)

Vibe search:

In [None]:
from vantage_sdk.model.search import VantageVibeImageUrl

images = [VantageVibeImageUrl(url=image) for image in VIBE_IMAGES]
text = "romantic dinner"

vibe_results = vantage.vantage_vibe_search(
    vibe_id=vibe.id,
    collection_id=created_collection.collection_id,
    images=images,
    text=text,
)

In [None]:
display_query_results(dataset, vibe_results.results)

#### **Shopping assistant**

With the shopping assistant, users can create a detailed specification that defines how search results should be grouped. These groups are created based on the criteria set during the assistant's creation. Once the assistant is set up, users can perform searches by sending a text query and a `shopping_assistant_id`.

We need to create a Shopping assistant configuration first:

In [None]:
shopping_assistant = vantage.create_shopping_assistant(
    name="Full outfit assistant",
    groups=["shirt", "pants", "shoes"],
    external_account_id=model_key.external_key_id,
    llm_model_name="gpt-4o-mini",
)

In [None]:
shopping_assistant

Query using shopping assistant:

In [None]:
response = vantage.shopping_assistant_search(
    collection_id=created_collection.collection_id,
    text="outfit for business meeting",
    shopping_assistant_id=shopping_assistant.shopping_assistant_id,
    max_groups=3,
    filter=Filter(boolean_filter="gender:\"Men\""),
)

In [None]:
response

In [None]:
display_shopping_assistant_results(dataset, response.groups)