# Understand Vector Sampler

`VectorSampler` is a utility for directly accessing and exporting the vectors stored in an `Index`. It returns them as a `VectorCollection`, which is essentially a numpy array plus a list of ids, one per row.

In [1]:
%pip install superlinked==36.3.0

In [2]:
import pandas as pd
from superlinked import framework as sl

pd.set_option("display.max_colwidth", 100)

## Load data into Superlinked

In [3]:
class Paragraph(sl.Schema):
    id: sl.IdField
    body: sl.String
    category: sl.StringList


paragraph = Paragraph()

In [4]:
body_space = sl.TextSimilaritySpace(text=paragraph.body, model="sentence-transformers/all-mpnet-base-v2")
category_space = sl.CategoricalSimilaritySpace(
    category_input=paragraph.category,
    categories=["category-1", "category-2", "category-3"],
)

paragraph_index = sl.Index([body_space, category_space])

In [5]:
source: sl.InMemorySource = sl.InMemorySource(paragraph)
executor = sl.InMemoryExecutor(sources=[source], indices=[paragraph_index])
app = executor.run()

In [6]:
source.put(
    [
        {
            "id": "paragraph-1",
            "body": "Glorious animals live in the wilderness.",
            "category": "category-2",
        },
        {
            "id": "paragraph-2",
            "body": "Growing computation power enables advancements in AI.",
            "category": "category-3",
        },
        {
            "id": "paragraph-3",
            "body": "Processed foods are generally worse for your health than raw vegetables.",
            "category": "category-1",
        },
        {
            "id": "paragraph-4",
            "body": "The fauna of distant places can surprise travelers.",
            "category": "category-2",
        },
    ]
)

## Using a Vector Sampler

A `VectorSampler` is created from an `app`, the running instance returned by `executor.run()`. It can then export vectors from an index into a `VectorCollection`, one schema at a time. A collection can contain all vectors of that schema or only those matching a given id or list of ids.

In [7]:
vector_sampler = sl.VectorSampler(app=app)

### Get a subset of vectors

A `VectorCollection` object is essentially a numpy array, `vectors`, with shape `(num_entities, vector_dims)`, and a corresponding `id_list`, where `id_list[i]` is the id of `vectors[i, :]`.
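
The row-to-id correspondence can be sketched with plain numpy, without a running app. The array and ids below are made-up stand-ins for `vectors` and `id_list`, and `vector_for` is a hypothetical helper, not part of Superlinked.

```python
import numpy as np

# Stand-in data: in the notebook these would come from a VectorCollection's
# `vectors` and `id_list`; the values here are made up for illustration.
vectors = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
id_list = ["paragraph-1", "paragraph-4", "paragraph-3"]

# Build an id -> row index lookup so a vector can be fetched by its id.
row_of = {id_: row for row, id_ in enumerate(id_list)}


def vector_for(id_: str) -> np.ndarray:
    """Return the row of `vectors` that belongs to `id_` (hypothetical helper)."""
    return vectors[row_of[id_]]


print(vectors.shape)  # (num_entities, vector_dims)
print(vector_for("paragraph-4"))
```

Because `id_list[i]` names `vectors[i, :]`, a dictionary built once from `id_list` is enough to look up rows by id.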

In [8]:
singular_vector_collection = vector_sampler.get_vectors_by_ids(
    id_="paragraph-1", index=paragraph_index, schema=paragraph
)
singular_vector_collection

VectorCollection of 1 vector.

In [9]:
singular_vector_collection.id_list  # the id we requested

['paragraph-1']

In [10]:
# 1 vector, 768 dimensions for text embedding, 4 for categorical embedding (3 categories and other)
(len(singular_vector_collection.vectors), len(singular_vector_collection.vectors[0]))

(1, 772)
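
To look at the two parts of the 772-dimensional vector separately, one option is to split the array by column. This is a minimal sketch on random stand-in data. It assumes the text segment comes first and the categorical segment second (the order the spaces were passed to the index), which this notebook does not confirm, and it ignores any scaling applied when the segments are combined.

```python
import numpy as np

TEXT_DIMS = 768  # text embedding size, per the comment above
CATEGORICAL_DIMS = 4  # 3 categories plus "other"

# Stand-in for singular_vector_collection.vectors: random values, shape (1, 772).
rng = np.random.default_rng(0)
vectors = rng.random((1, TEXT_DIMS + CATEGORICAL_DIMS))

# Assumption: columns [0, 768) hold the text part, columns [768, 772) the
# categorical part. Verify the segment order before relying on this.
text_part, categorical_part = np.split(vectors, [TEXT_DIMS], axis=1)

print(text_part.shape, categorical_part.shape)
```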

### Get all vectors

In [11]:
vector_collection = vector_sampler.get_all_vectors(
    index=paragraph_index, schema=paragraph
)  # return all vectors of a schema in an index
id_list, vector_array = vector_collection.id_list, vector_collection.vectors

In [12]:
vector_collection

VectorCollection of 4 vectors.

In [13]:
id_list  # all 4 vector ids

['paragraph-1', 'paragraph-4', 'paragraph-3', 'paragraph-2']
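
Note that the ids above are not in insertion order, so rows should be matched through `id_list` rather than by position. If a specific row order is needed, one possible approach is to reindex the array, sketched here with a stand-in array; `wanted_order` is a hypothetical name introduced for illustration.

```python
import numpy as np

# The id order shown in the output above.
id_list = ["paragraph-1", "paragraph-4", "paragraph-3", "paragraph-2"]
# Stand-in for vector_collection.vectors: row i is filled with the value i.
vector_array = np.arange(4, dtype=float).reshape(4, 1) * np.ones((1, 5))

# Hypothetical target order, e.g. the order in which the data was ingested.
wanted_order = ["paragraph-1", "paragraph-2", "paragraph-3", "paragraph-4"]

# Map each id to its current row, then pick the rows in the wanted order.
row_of = {id_: row for row, id_ in enumerate(id_list)}
rows = [row_of[id_] for id_ in wanted_order]
ordered_vectors = vector_array[rows]

print(rows)
print(ordered_vectors[:, 0])
```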

In [14]:
# 4 vectors, 768 dimensions for text embedding, 4 for categorical embedding (3 categories and other)
(len(vector_collection.vectors), len(vector_collection.vectors[0]))

(4, 772)