# Querying options

A query can be run with either of the following:
- a vectorized version of an input supplied when running the query,
- an object from our storage, in which case the query returns the items most relevant to that object.

In [1]:
%pip install superlinked==22.3.0

In [2]:
import pandas as pd
from superlinked import framework as sl

pd.set_option("display.max_colwidth", 100)

In [3]:
class Paragraph(sl.Schema):
    id: sl.IdField
    body: sl.String
    category: sl.String


paragraph = Paragraph()

body_space = sl.TextSimilaritySpace(text=paragraph.body, model="sentence-transformers/all-mpnet-base-v2")
category_space = sl.CategoricalSimilaritySpace(
    category_input=paragraph.category, categories=["IT", "environment"], uncategorized_as_category=True
)
paragraph_index = sl.Index([body_space, category_space])

Now let's fire up a running executor and add some data.

In [4]:
source: sl.InMemorySource = sl.InMemorySource(paragraph)
executor = sl.InMemoryExecutor(sources=[source], indices=[paragraph_index])
app = executor.run()

In [5]:
source.put(
    [
        {"id": "paragraph-1", "body": "Glorious animals live in the wilderness.", "category": "environment"},
        {
            "id": "paragraph-2",
            "body": "Growing computation power enables advancements in AI.",
            "category": "IT",
        },
        {
            "id": "paragraph-3",
            "body": "The flora and fauna of a specific habitat highly depend on the weather.",
            "category": "environment",
        },
    ]
)

## Using the .similar clause

The `.similar` clause lets us supply query input that is unrelated to the stored vectors.

In [6]:
# We create a Param so that the query can be reused with different inputs.
# For more info, check the `dynamic_parameters.ipynb` feature notebook in this same folder.
similar_query = sl.Query(paragraph_index).find(paragraph).similar(body_space, sl.Param("similar_input")).select_all()

In [7]:
similar_result_weather = app.query(similar_query, similar_input="rainfall")
sl.PandasConverter.to_pandas(similar_result_weather)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 0.337601 |
| 1 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.094036 |
| 2 | Growing computation power enables advancements in AI. | IT | paragraph-2 | 0.044686 |


In [8]:
similar_result_it = app.query(similar_query, similar_input="progress in AI")
sl.PandasConverter.to_pandas(similar_result_it)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | Growing computation power enables advancements in AI. | IT | paragraph-2 | 0.598644 |
| 1 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.007107 |
| 2 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | -0.121885 |
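
As a rough mental model of what a `.similar` query does, the sketch below ranks stored vectors by cosine similarity to a query vector. It is not Superlinked's implementation: the three-dimensional vectors, the `cosine` and `rank` helpers, and the stand-in query vector are all made up for illustration, whereas real embeddings come from the sentence-transformers model configured in `body_space`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings" of the stored paragraph bodies.
stored = {
    "paragraph-1": [0.9, 0.1, 0.0],
    "paragraph-2": [0.0, 0.1, 0.9],
    "paragraph-3": [0.7, 0.7, 0.0],
}


def rank(query_vector, items):
    """Return (id, score) pairs sorted from most to least similar."""
    scored = [(item_id, cosine(query_vector, vec)) for item_id, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in for the embedded query text; it does not need to match any stored item.
query_vector = [0.1, 0.0, 1.0]
ranking = rank(query_vector, stored)
print([item_id for item_id, _ in ranking])
```

The point of the sketch is only that the query vector is built at query time and compared against every stored vector, so the input can be any text at all.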


## Using the .with_vector clause

The `.with_vector` clause lets us search with the vector of an object that is already in our database. This is useful, for example, for recommending items to a user based on that user's vector.

In [9]:
with_vector_query = sl.Query(paragraph_index).find(paragraph).with_vector(paragraph, "paragraph-3", 1.0).select_all()

In this case the weight in the clause does not really matter, as there are no other competing clauses. Stay tuned, because this is not always the case!
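
One way to see why a lone weight has no effect is that cosine similarity ignores the length of the query vector. The sketch below assumes that scores behave like a cosine similarity on the weighted query vector, which this notebook does not spell out; the vectors and the `cosine` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors: the query object and one stored object.
query = [0.2, 0.5, 0.3]
stored = [0.1, 0.4, 0.6]

# Scaling the whole query by a positive weight leaves the score unchanged.
for weight in (0.1, 1.0, 5.0):
    weighted = [weight * x for x in query]
    print(weight, round(cosine(weighted, stored), 6))
```

Weights start to matter once several clauses, or several spaces, contribute to the same query vector, because then they change its direction and not just its length.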

In [10]:
with_vector_result = app.query(with_vector_query)
sl.PandasConverter.to_pandas(with_vector_result)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 1.0 |
| 1 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.655296 |
| 2 | Growing computation power enables advancements in AI. | IT | paragraph-2 | -0.00953 |


The first result is the paragraph we are searching with, the second is the more closely related one, and the least related paragraph body comes last.

Note, however, that `with_vector` queries can be weighted on a per-space basis as well!

In [11]:
weight_dict: dict[sl.Space, float] = {body_space: 0.0, category_space: 1.0}
with_vector_query_space_weights = (
    sl.Query(paragraph_index).find(paragraph).with_vector(paragraph, "paragraph-3", weight_dict).select_all()
)
with_vector_result_space_weights = app.query(with_vector_query_space_weights)
sl.PandasConverter.to_pandas(with_vector_result_space_weights)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | Glorious animals live in the wilderness. | environment | paragraph-1 | 1.0 |
| 1 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 1.0 |
| 2 | Growing computation power enables advancements in AI. | IT | paragraph-2 | 0.0 |


As we can see, the results above are based only on the `category` information.

Below, by contrast, only the body of the text influences the similarities.

In [12]:
weight_dict_alt: dict[sl.Space, float] = {body_space: 1.0, category_space: 0.0}
with_vector_query_space_weights_alt = (
    sl.Query(paragraph_index).find(paragraph).with_vector(paragraph, "paragraph-3", weight_dict_alt).select_all()
)
with_vector_result_space_weights_alt = app.query(with_vector_query_space_weights_alt)
sl.PandasConverter.to_pandas(with_vector_result_space_weights_alt)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 1.0 |
| 1 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.310591 |
| 2 | Growing computation power enables advancements in AI. | IT | paragraph-2 | -0.019059 |
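
The two per-space results can be pictured with a toy model in which every item is the concatenation of one vector per space, and the query-side weights scale each segment before the query is normalised. This is a sketch under that assumption, not a description of Superlinked's internals; the per-space vectors and the `concat`, `normalize` and `scores` helpers are hypothetical.

```python
import math

SPACES = ("body", "category")

# Hypothetical unit-length vectors per space; category is one-hot over (IT, environment).
items = {
    "paragraph-1": {"body": [0.6, 0.8], "category": [0.0, 1.0]},
    "paragraph-2": {"body": [-0.8, 0.6], "category": [1.0, 0.0]},
    "paragraph-3": {"body": [1.0, 0.0], "category": [0.0, 1.0]},
}


def concat(item, weights=None):
    """Concatenate the per-space segments, scaling each by its weight."""
    weights = weights or {space: 1.0 for space in SPACES}
    return [weights[space] * x for space in SPACES for x in item[space]]


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def scores(query_id, weights):
    """Score every item against the weighted, normalised vector of `query_id`."""
    query = normalize(concat(items[query_id], weights))
    return {
        item_id: round(sum(q * s for q, s in zip(query, concat(item))), 3)
        for item_id, item in items.items()
    }


category_only = scores("paragraph-3", {"body": 0.0, "category": 1.0})
body_only = scores("paragraph-3", {"body": 1.0, "category": 0.0})
print(category_only)
print(body_only)
```

With a zero weight a segment drops out of the query entirely, so in this toy model both `environment` paragraphs tie when only the category counts, and only the body vectors separate them when the category weight is zero.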


## Combine them

With weights, any combination of inputs is possible. Imagine searching for a term, `similar_input`, among the paragraphs that are relevant to a specific paragraph, denoted by `paragraph_id`. The `input_weight` `Param` weights the search term, and the `context_weight` `Param` weights the context in which the search takes place, so the ratio of the two decides which one dominates. Note that the `Param` names are entirely arbitrary; it is the clauses that matter.

In [13]:
# we are using dynamic parameters again
combined_query = (
    sl.Query(paragraph_index)
    .find(paragraph)
    .similar(body_space, sl.Param("similar_input"), weight=sl.Param("input_weight"))
    .with_vector(paragraph, sl.Param("paragraph_id"), weight=sl.Param("context_weight"))
    .select_all()
)

In [14]:
# equal weight
combined_result = app.query(
    combined_query,
    similar_input="progress in AI",
    paragraph_id="paragraph-3",
    input_weight=1,
    context_weight=1,
)
sl.PandasConverter.to_pandas(combined_result)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 0.831307 |
| 1 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.619865 |
| 2 | Growing computation power enables advancements in AI. | IT | paragraph-2 | 0.218673 |


In [15]:
# upweight context - notice the score differences
combined_result_context = app.query(
    combined_query,
    similar_input="progress in AI",
    paragraph_id="paragraph-3",
    input_weight=0.25,
    context_weight=1,
)
sl.PandasConverter.to_pandas(combined_result_context)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 0.984387 |
| 1 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.656062 |
| 2 | Growing computation power enables advancements in AI. | IT | paragraph-2 | 0.06525 |


In [16]:
# give more weight to query time input - the most relevant document changes
combined_result_input = app.query(
    combined_query,
    similar_input="progress in AI",
    paragraph_id="paragraph-3",
    input_weight=1,
    context_weight=0.1,
)
sl.PandasConverter.to_pandas(combined_result_input)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.519222 |
| 1 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 0.488978 |
| 2 | Growing computation power enables advancements in AI. | IT | paragraph-2 | 0.300537 |
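
The shift in ranking as the weights change can be illustrated with a toy model that forms the query as a weighted sum of an input vector and a context vector. This is only a sketch: it assumes a single two-dimensional space, ignores the category space used in this notebook (so its rankings differ from the outputs above), and the vectors and the `top_hit` helper are hypothetical.

```python
import math

# Hypothetical unit-length body vectors.
stored = {
    "paragraph-1": [0.8, 0.6],
    "paragraph-2": [0.0, 1.0],
    "paragraph-3": [1.0, 0.0],
}

input_vector = [0.0, 1.0]  # stand-in for the embedded search term
context_vector = stored["paragraph-3"]  # the object passed to with_vector


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_hit(input_weight, context_weight):
    """Return the id closest to the weighted sum of input and context."""
    combined = normalize(
        [input_weight * i + context_weight * c for i, c in zip(input_vector, context_vector)]
    )
    return max(stored, key=lambda item_id: sum(q * s for q, s in zip(combined, stored[item_id])))


print(top_hit(input_weight=0.25, context_weight=1.0))  # context dominates
print(top_hit(input_weight=1.0, context_weight=0.1))   # search term dominates
```

Only the ratio of the two weights matters in this model, since the combined vector is normalised before it is compared with the stored ones.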


To use per-space weights, the weight has to be given as a dict keyed by space when the query is defined; the values in that dict can themselves be `Param`s.

In [17]:
# we are using dynamic parameters again
combined_query_dict_context_weights = (
    sl.Query(paragraph_index)
    .find(paragraph)
    .similar(body_space, sl.Param("similar_input"), weight=sl.Param("input_weight"))
    .with_vector(
        paragraph,
        sl.Param("paragraph_id"),
        weight={body_space: sl.Param("body_context_weight"), category_space: sl.Param("category_context_weight")},
    )
    .select_all()
)
# Specific per-space weights can be used for the context too, as seen before.
combined_result_input = app.query(
    combined_query_dict_context_weights,
    similar_input="progress in AI",
    paragraph_id="paragraph-3",
    input_weight=1,
    body_context_weight=0.15,
    category_context_weight=0.05,
)
sl.PandasConverter.to_pandas(combined_result_input)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.527039 |
| 1 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 0.514157 |
| 2 | Growing computation power enables advancements in AI. | IT | paragraph-2 | 0.30001 |
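
Because every weight is a `Param`, the same query definition can be reused for many weight combinations. The sketch below only builds the keyword arguments for such a sweep in plain Python; the parameter names are the ones defined in the cell above, while the grid values other than 0.15 and 0.05 are arbitrary, and the actual `app.query` call is left as a comment because it needs the notebook's running app.

```python
from itertools import product

# Parameters that stay fixed across the sweep (names as defined in the query above).
base_params = {
    "similar_input": "progress in AI",
    "paragraph_id": "paragraph-3",
    "input_weight": 1,
}

# Candidate per-space context weights; 0.15 and 0.05 come from the cell above, the rest are arbitrary.
body_weights = (0.05, 0.15)
category_weights = (0.0, 0.05)

param_grid = [
    {**base_params, "body_context_weight": body, "category_context_weight": category}
    for body, category in product(body_weights, category_weights)
]

for params in param_grid:
    # With the notebook's objects in scope this would be:
    # app.query(combined_query_dict_context_weights, **params)
    print(params)
```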


## Filter results based on score or position

In [18]:
# let's use the combined query above with some preset params
params = {
    "similar_input": "progress in AI",
    "paragraph_id": "paragraph-3",
    "input_weight": 1,
    "context_weight": 0.25,
}

In [19]:
# return top 2 items
combined_query_limit_result = app.query(combined_query.limit(2), **params)
sl.PandasConverter.to_pandas(combined_query_limit_result)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 0.564008 |
| 1 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.542344 |


In [20]:
# return items with scores larger than 0.5
combined_query_radius_result = app.query(combined_query.radius(0.5), **params)
sl.PandasConverter.to_pandas(combined_query_radius_result)

| | body | category | id | similarity_score |
|---|---|---|---|---|
| 0 | The flora and fauna of a specific habitat highly depend on the weather. | environment | paragraph-3 | 0.564008 |
| 1 | Glorious animals live in the wilderness. | environment | paragraph-1 | 0.542344 |
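
The difference between the two filters can be sketched on a plain list of scored results: one cuts by position, the other by score. The scores below are made up, and the threshold filter simply mirrors the comment in the cell above; the exact relation between the `radius` value and the similarity score is an assumption here, not something this notebook states.

```python
# Hypothetical (id, score) pairs, already sorted from best to worst.
results = [
    ("paragraph-3", 0.9),
    ("paragraph-1", 0.6),
    ("paragraph-2", 0.2),
]


def keep_top(results, count):
    """Position-based filter: keep the first `count` items."""
    return results[:count]


def keep_above(results, min_score):
    """Score-based filter: keep items scoring strictly above `min_score`."""
    return [(item_id, score) for item_id, score in results if score > min_score]


print(keep_top(results, 2))
print(keep_above(results, 0.5))
```

On this toy data both filters happen to keep the same two items, just as in the outputs above, but a position filter always returns a fixed number of items while a score filter may return none.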
