# Basic building blocks

<div class="alert alert-block alert-warning">
Replace <code>YOUR_GITHUB_TOKEN</code> in the install script. To get your token, follow the instructions in the <a href="../README.md">README.md</a>.
</div>

In [15]:
%pip install 'https://us-central1-data-359211.cloudfunctions.net/github-proxy/superlinked-2.1.0.post5+git.c534f821-py3-none-any.whl?token=YOUR_GITHUB_TOKEN'

Import the dependencies.

In [16]:
import pandas as pd

from superlinked.framework.common.schema.schema import schema
from superlinked.framework.common.schema.schema_object import String
from superlinked.framework.dsl.executor.in_memory.in_memory_executor import InMemoryExecutor
from superlinked.framework.dsl.index.index import Index
from superlinked.framework.dsl.query.param import Param
from superlinked.framework.dsl.query.query import Query
from superlinked.framework.dsl.source.in_memory_source import InMemorySource
from superlinked.framework.dsl.space.text_similarity_space import TextSimilaritySpace

Create a schema for your data.

In [17]:
@schema
class ParagraphSchema:
    body: String

Create an instance of your schema to start the pipeline definition.

In [18]:
paragraph = ParagraphSchema()

Create a space that will run a transformers model on the body of the paragraph.

In [19]:
relevance_space = TextSimilaritySpace(text=paragraph.body, model="all-MiniLM-L6-v2")
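
To give a feel for what a text similarity space does conceptually, the sketch below compares embedding vectors with cosine similarity. The vectors are made-up 3-dimensional values, not real `all-MiniLM-L6-v2` outputs (that model produces 384-dimensional vectors), and `cosine_similarity` is a hypothetical helper rather than part of Superlinked. Whether Superlinked uses exactly this measure internally is an assumption here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy embeddings, for illustration only.
query_vec = [0.9, 0.1, 0.0]
close_vec = [0.8, 0.2, 0.1]  # points in nearly the same direction as the query
far_vec = [0.0, 0.1, 0.9]    # points in a very different direction

print(cosine_similarity(query_vec, close_vec))
print(cosine_similarity(query_vec, far_vec))
```

Texts with similar meaning should end up with vectors that point in similar directions, so the first score is much higher than the second.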

Group your space in an index to make it retrievable.

In [20]:
paragraph_index = Index(relevance_space)

Define a query that will search for similar paragraphs in the index. The `query_text` parameter is a placeholder that gets its value when the query is run.

In [21]:
query = (
    Query(paragraph_index)
    .find(paragraph)
    .similar(relevance_space.text, Param("query_text"))
)
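
The idea of a placeholder that is filled at query time can be sketched in plain Python. `Placeholder` and `QueryTemplate` below are hypothetical stand-ins, not Superlinked classes; they only illustrate defining a query once and binding `query_text` later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    """Names a value that is supplied only when the query runs."""
    name: str


@dataclass(frozen=True)
class QueryTemplate:
    """A reusable query definition with one deferred parameter."""
    field_name: str
    value: Placeholder

    def bind(self, **params):
        # Resolve the placeholder against the keyword arguments given at run time.
        if self.value.name not in params:
            raise KeyError(f"missing parameter: {self.value.name}")
        return {"field": self.field_name, "value": params[self.value.name]}


template = QueryTemplate(field_name="body", value=Placeholder("query_text"))

# The same template serves several concrete queries.
first = template.bind(query_text="This is a happy person")
second = template.bind(query_text="This is a happy dog")
print(first)
print(second)
```

Separating the definition from the parameter values means the query can be built once and reused with different inputs.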

Create an in-memory source and executor to try out your configuration.

In [22]:
source: InMemorySource = InMemorySource(paragraph)
executor = InMemoryExecutor(sources=[source], indices=[paragraph_index])
app = executor.run()

Insert some example data.

In [23]:
source.put([{"id": "happy_dog", "body": "That is a happy dog"}])
source.put([{"id": "happy_person", "body": "That is a very happy person"}])
source.put([{"id": "sunny_day", "body": "Today is a sunny day"}])

Query your data.

In [24]:
result = app.query(query, query_text="This is a happy person")

pd.DataFrame([entry.stored_object for entry in result.entries])

|   | body | id |
|---|------|----|
| 0 | That is a very happy person | happy_person |
| 1 | That is a happy dog | happy_dog |
| 2 | Today is a sunny day | sunny_day |


Check how a different query can produce different results.

In [25]:
result = app.query(query, query_text="This is a happy dog")

pd.DataFrame([entry.stored_object for entry in result.entries])

|   | body | id |
|---|------|----|
| 0 | That is a happy dog | happy_dog |
| 1 | That is a very happy person | happy_person |
| 2 | Today is a sunny day | sunny_day |
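
The change in ordering between the two queries can be reproduced with a deliberately crude stand-in for the embedding model. The sketch below scores each paragraph by cosine similarity over word counts. This is an assumption for illustration only: the notebook uses `all-MiniLM-L6-v2`, which captures meaning rather than plain word overlap, and `embed`, `cosine`, and `rank` are hypothetical helpers, not Superlinked functions.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (not a transformer model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Same example paragraphs as inserted into the source above.
documents = {
    "happy_dog": "That is a happy dog",
    "happy_person": "That is a very happy person",
    "sunny_day": "Today is a sunny day",
}


def rank(query_text):
    """Return document ids ordered from most to least similar to the query."""
    query_vec = embed(query_text)
    scored = [(cosine(query_vec, embed(body)), doc_id) for doc_id, body in documents.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


print(rank("This is a happy person"))
print(rank("This is a happy dog"))
```

Even with this simple scoring, the paragraph that shares the most with the query text moves to the top, while the unrelated "sunny day" paragraph stays last for both queries.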
