# Filtering out elements based on hard criteria

Sometimes hard filtering is needed: certain items must never appear in the result set, no matter how far we scroll into the results. This can be achieved with the `.filter` clause of the `Query`.
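To make the idea concrete before touching the library, here is a minimal plain-Python sketch of what "hard" means. It is only an illustration and not how Superlinked implements filtering; the `search` helper, the `only_adam` predicate, and the ranking order are all hypothetical.

```python
from typing import Callable, Optional

Row = dict[str, str]


def search(
    ranked_rows: list[Row],
    limit: int,
    keep: Optional[Callable[[Row], bool]] = None,
) -> list[Row]:
    """Return the first `limit` rows of an already-ranked list.

    Rows rejected by the hard filter `keep` are dropped before the limit
    is applied, so they cannot reappear at any depth.
    """
    candidates = ranked_rows if keep is None else [r for r in ranked_rows if keep(r)]
    return candidates[:limit]


def only_adam(row: Row) -> bool:
    """Hypothetical hard criterion: the author must be Adam."""
    return row["author"] == "Adam"


# Hypothetical ranking, best match first.
ranked = [
    {"id": "paragraph-2", "author": "Bob"},
    {"id": "paragraph-1", "author": "Adam"},
    {"id": "paragraph-3", "author": "Adam"},
]

unfiltered_ids = [r["id"] for r in search(ranked, limit=3)]
filtered_ids = [r["id"] for r in search(ranked, limit=3, keep=only_adam)]

print(unfiltered_ids)  # ['paragraph-2', 'paragraph-1', 'paragraph-3']
print(filtered_ids)    # ['paragraph-1', 'paragraph-3']
```

However large `limit` gets, Bob's paragraph never shows up in the filtered result. That is the difference from merely ranking it lower.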

In [1]:
%pip install superlinked==6.1.0

In [2]:
import pandas as pd

from superlinked.framework.common.schema.id_schema_object import IdField
from superlinked.framework.common.schema.schema import schema
from superlinked.framework.common.schema.schema_object import String
from superlinked.framework.dsl.index.index import Index
from superlinked.framework.dsl.space.text_similarity_space import TextSimilaritySpace

from superlinked.framework.dsl.executor.in_memory.in_memory_executor import (
    InMemoryExecutor,
)
from superlinked.framework.dsl.source.in_memory_source import InMemorySource
from superlinked.framework.dsl.query.query import Query

pd.set_option("display.max_colwidth", 100)

In [3]:
@schema
class Paragraph:
    id: IdField
    body: String
    author: String


paragraph = Paragraph()

body_space = TextSimilaritySpace(
    text=paragraph.body, model="sentence-transformers/all-mpnet-base-v2"
)
author_space = TextSimilaritySpace(
    text=paragraph.author, model="sentence-transformers/all-mpnet-base-v2"
)

<div class="alert alert-block alert-info"><b>NOTE:</b> 
The index definition must list every field we plan to filter on in its <code>fields</code> argument.
</div>

In [4]:
paragraph_index = Index(
    [body_space, author_space], fields=[paragraph.author, paragraph.body]
)

Now let's add some data and try it out!

In [5]:
source: InMemorySource = InMemorySource(paragraph)
executor = InMemoryExecutor(sources=[source], indices=[paragraph_index])
app = executor.run()

In [6]:
source.put(
    [
        {
            "id": "paragraph-1",
            "body": "The first thing Adam wrote.",
            "author": "Adam",
        },
        {
            "id": "paragraph-2",
            "body": "The first thing Bob wrote.",
            "author": "Bob",
        },
        {
            "id": "paragraph-3",
            "body": "The second thing Adam wrote.",
            "author": "Adam",
        },
    ]
)

## Using the .filter clause

The `.filter` clause lets us restrict the result set with hard conditions. For example, we can ask for paragraphs written by Adam...

In [7]:
adam_query = Query(paragraph_index).find(paragraph).filter(paragraph.author == "Adam")
adam_result = app.query(adam_query)

adam_result.to_pandas()

|   | body | author | id |
|---|------|--------|----|
| 0 | The first thing Adam wrote. | Adam | paragraph-1 |
| 1 | The second thing Adam wrote. | Adam | paragraph-3 |


...or not Adam.

In [8]:
bob_query = Query(paragraph_index).find(paragraph).filter(paragraph.author != "Adam")
bob_result = app.query(bob_query)

bob_result.to_pandas()

|   | body | author | id |
|---|------|--------|----|
| 0 | The first thing Bob wrote. | Bob | paragraph-2 |


We can also stack multiple filters to form an AND relationship: a row is returned only if it satisfies every filter.

In [9]:
stacked_query = (
    Query(paragraph_index)
    .find(paragraph)
    .filter(paragraph.author == "Adam")
    .filter(paragraph.body == "The first thing Adam wrote.")
)
stacked_result = app.query(stacked_query)

stacked_result.to_pandas()

|   | body | author | id |
|---|------|--------|----|
| 0 | The first thing Adam wrote. | Adam | paragraph-1 |
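The semantics of the three queries above can be mirrored in a few lines of plain Python. This is a hedged sketch for intuition only: `apply_filters` is a hypothetical helper, not part of Superlinked, and it evaluates the conditions over an in-memory list rather than an index.

```python
import operator
from typing import Any, Callable

Row = dict[str, str]
# A condition is (field name, comparison operator, value to compare against).
Condition = tuple[str, Callable[[Any, Any], bool], str]

rows: list[Row] = [
    {"id": "paragraph-1", "body": "The first thing Adam wrote.", "author": "Adam"},
    {"id": "paragraph-2", "body": "The first thing Bob wrote.", "author": "Bob"},
    {"id": "paragraph-3", "body": "The second thing Adam wrote.", "author": "Adam"},
]


def apply_filters(rows: list[Row], conditions: list[Condition]) -> list[Row]:
    """Keep only the rows that satisfy every condition (logical AND)."""
    return [
        row
        for row in rows
        if all(op(row[field], value) for field, op, value in conditions)
    ]


def ids(result: list[Row]) -> list[str]:
    """Return the ids of the given rows, preserving their order."""
    return [row["id"] for row in result]


# Equivalent of .filter(paragraph.author == "Adam")
adam_ids = ids(apply_filters(rows, [("author", operator.eq, "Adam")]))

# Equivalent of .filter(paragraph.author != "Adam")
not_adam_ids = ids(apply_filters(rows, [("author", operator.ne, "Adam")]))

# Two stacked conditions: both must hold for a row to be kept.
stacked_ids = ids(
    apply_filters(
        rows,
        [
            ("author", operator.eq, "Adam"),
            ("body", operator.eq, "The first thing Adam wrote."),
        ],
    )
)

print(adam_ids)      # ['paragraph-1', 'paragraph-3']
print(not_adam_ids)  # ['paragraph-2']
print(stacked_ids)   # ['paragraph-1']
```

The ids match the three result tables above. Because the conditions are combined with `all`, adding another condition can only shrink the result, never grow it.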


## Summary

The `.filter` clause currently supports:

* the `==` and `!=` operators
* AND relationships by stacking filters