# Optional schema fields

`SchemaField`s can be declared optional, allowing the user to ingest records where that particular field is missing. Non-optional `SchemaField`s raise at ingestion if the data is missing.

In [1]:
%pip install superlinked==37.0.0

In [2]:
import pandas as pd
from superlinked import framework as sl

pd.set_option("display.max_colwidth", 100)

## Set up optional fields

In [3]:
class Paragraph(sl.Schema):
    id: sl.IdField
    body: sl.String
    like_count: sl.Integer | None  # configuring an optional SchemaField

This way one can ingest records where `like_count` is missing.

Now let's set up a basic config to see it working.

In [4]:
paragraph = Paragraph()

body_space = sl.TextSimilaritySpace(text=paragraph.body, model="sentence-transformers/all-mpnet-base-v2")
like_space = sl.NumberSpace(number=paragraph.like_count, min_value=0, max_value=100, mode=sl.Mode.MAXIMUM)

paragraph_index = sl.Index([body_space, like_space])

source: sl.InMemorySource = sl.InMemorySource(paragraph)
executor = sl.InMemoryExecutor(sources=[source], indices=[paragraph_index])
app = executor.run()

## Ingesting records with missing data

The system treats `like_count` as missing data in two cases: when its value is `None`, and when the record does not contain the key at all.

In [5]:
source.put(
    [
        {
            "id": "paragraph-1",
            "body": "Glorious animals live in the wilderness.",
            "like_count": 10,
        },
        {
            "id": "paragraph-1-missing-key",
            "body": "Glorious animals live in the wilderness.",
        },
        {
            "id": "paragraph-2",
            "body": "Growing computation power enables advancements in AI.",
            "like_count": 100,
        },
        {
            "id": "paragraph-2-missing-None",
            "body": "Growing computation power enables advancements in AI.",
            "like_count": None,
        },
    ]
)
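
To make the two forms of missing data concrete, here is a minimal plain-Python sketch, independent of Superlinked, that treats an absent key and an explicit `None` the same way. The helper name `is_missing` is hypothetical and not part of the library; it only illustrates the rule described above, not how the framework implements it.

```python
from typing import Any


def is_missing(record: dict[str, Any], field: str) -> bool:
    """Return True if the key is absent or its value is None (hypothetical helper)."""
    # dict.get returns None both for an absent key and for an explicit None value.
    return record.get(field) is None


# Trimmed copies of the records ingested above (body omitted for brevity).
records = [
    {"id": "paragraph-1", "like_count": 10},
    {"id": "paragraph-1-missing-key"},
    {"id": "paragraph-2-missing-None", "like_count": None},
]

missing_ids = [r["id"] for r in records if is_missing(r, "like_count")]
print(missing_ids)
```

In this sketch a value of `0` is not missing; only `None` or an absent key counts.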

By contrast, `body` is not configured as optional, so it raises for any input that constitutes a missing value, whether that is `None` or an absent key.

In [6]:
try:
    source.put(
        [
            {
                "id": "paragraph-x",
                "body": None,
                "like_count": 10,
            }
        ]
    )
except Exception as e:
    print(e)

("The SchemaField Paragraph.body doesn't have a default value and was not provided in the ParsedSchema.",)
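
If raising on incomplete input is not desirable, one option is to filter records before handing them to `source.put`. The sketch below is plain Python and makes several assumptions: `split_by_required` and `REQUIRED_FIELDS` are hypothetical names introduced for illustration, the required fields are listed by hand rather than read from the `Paragraph` schema, and the `paragraph-y` record is made-up sample data.

```python
from typing import Any

# Assumption: the non-optional fields of Paragraph, listed by hand for this sketch.
REQUIRED_FIELDS = ("id", "body")


def split_by_required(
    records: list[dict[str, Any]], required: tuple[str, ...] = REQUIRED_FIELDS
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition records into (complete, incomplete) by their required fields."""
    complete, incomplete = [], []
    for record in records:
        # A required field counts as missing if the key is absent or the value is None.
        if any(record.get(field) is None for field in required):
            incomplete.append(record)
        else:
            complete.append(record)
    return complete, incomplete


batch = [
    {"id": "paragraph-x", "body": None, "like_count": 10},  # same record as above
    {"id": "paragraph-y", "body": "Some text."},  # hypothetical record; like_count may be absent
]
complete, incomplete = split_by_required(batch)
print([r["id"] for r in complete])    # candidates for source.put
print([r["id"] for r in incomplete])  # records that need fixing first
```

Only the `complete` list would then be passed to `source.put`, while the `incomplete` records can be logged or repaired separately.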


## Querying items with missing values

Missing values do not influence query results: they effectively produce a zero score for that particular attribute. Let's showcase that with a query!

In [7]:
query = (
    sl.Query(paragraph_index, weights={body_space: sl.Param("body_weight"), like_space: sl.Param("like_weight")})
    .find(paragraph)
    .similar(body_space, sl.Param("query_text"))
    .select_all()
)

In [8]:
result = app.query(
    query, query_text="Growing computation power enables advancements in AI.", body_weight=1.0, like_weight=1.0
)

sl.PandasConverter.to_pandas(result)

| | body | like_count | id | similarity_score | rank |
|---|---|---|---|---|---|
| 0 | Growing computation power enables advancements in AI. | 100.0 | paragraph-2 | 1.0 | 0 |
| 1 | Growing computation power enables advancements in AI. | | paragraph-2-missing-None | 0.5 | 1 |
| 2 | Glorious animals live in the wilderness. | 10.0 | paragraph-1 | 0.096102 | 2 |
| 3 | Glorious animals live in the wilderness. | | paragraph-1-missing-key | 0.017885 | 3 |


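Because the query text is identical to the body of `paragraph-2`, and that record also has the maximum like count, it gets a perfect score of 1.0. The paragraph with the same text but a missing like count (`paragraph-2-missing-None`) scores exactly half of that, 0.5. With both weights set to 1.0, this is consistent with the missing `like_count` contributing nothing to the score.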
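
The same reading can be checked with a little arithmetic on the scores from the result table. This is a sketch under one assumption: two rows with identical `body` text receive the same body contribution, so the difference between their scores can be attributed to `like_count`. The helper `like_contribution` is a hypothetical name, and the numbers are copied from the output above.

```python
# Similarity scores copied from the result table above.
scores = {
    "paragraph-2": 1.0,
    "paragraph-2-missing-None": 0.5,
    "paragraph-1": 0.096102,
    "paragraph-1-missing-key": 0.017885,
}


def like_contribution(with_like: str, without_like: str) -> float:
    """Score difference between two rows sharing a body, one with and one without like_count."""
    return round(scores[with_like] - scores[without_like], 6)


# like_count = 100 (the configured max_value) versus missing
contrib_p2 = like_contribution("paragraph-2", "paragraph-2-missing-None")
# like_count = 10 versus missing
contrib_p1 = like_contribution("paragraph-1", "paragraph-1-missing-key")

print(contrib_p2)
print(contrib_p1)
```

Under that assumption, the maximum like count accounts for 0.5 of the score and a like count of 10 for roughly 0.078, while the rows with a missing `like_count` keep only their body contribution.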