# Sineps Self Query

>[Sineps](https://sineps.io) provides fast and cost-effective NLP models. [Filter Extractor](https://sineps.io/filter-extractor) is a model that extracts a metadata filter from the user's query, so it can be used as a self-query retriever. In LangChain, the Filter Extractor is implemented as `SinepsSelfQueryRetriever`.

In this walkthrough, we'll demo the `SinepsSelfQueryRetriever` with a simple example.

## Setup

You will need to have an API key to use Sineps. You can get one [here](https://platform.sineps.io/).

In [1]:
import os
import getpass

os.environ["SINEPS_API_KEY"] = getpass.getpass("Sineps API Key:")

We'll also use `OpenAIEmbeddings`, so we need an OpenAI API key.

In [2]:
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
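
If you rerun the notebook often, it can be convenient to prompt only for keys that are not already set. The following is a minimal sketch of that idea; `ensure_env` is a hypothetical helper name introduced here for illustration and is not part of LangChain or Sineps.

```python
import getpass
import os


def ensure_env(name, prompt_fn=getpass.getpass):
    """Return os.environ[name], prompting for it only when it is missing or empty.

    `ensure_env` is a hypothetical helper, not a LangChain or Sineps API.
    """
    if not os.environ.get(name):
        os.environ[name] = prompt_fn(f"{name}: ")
    return os.environ[name]


# Usage in the notebook (prompts only for keys that are not set yet):
# ensure_env("SINEPS_API_KEY")
# ensure_env("OPENAI_API_KEY")
```

The `prompt_fn` parameter exists only so the helper can be exercised without an interactive prompt.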

## Creating a Chroma vector store

First we'll want to create a Chroma vector store and seed it with some data. We've created a small demo set of documents that contain summaries of movies.

**Note:** The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `langchain-chroma` package.

In [None]:
%pip install --upgrade --quiet lark

In [None]:
%pip install --upgrade --quiet langchain-chroma

In [3]:
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [4]:
docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
            "rating": 9.9,
        },
    ),
]
vectorstore = Chroma.from_documents(docs, embeddings)

## Creating the Sineps self-querying retriever

Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents.

Note that we use `SinepsAttributeInfo` to specify the metadata fields instead of `AttributeInfo`. 

In [5]:
from langchain_community.retrievers.sineps import (
    SinepsAttributeInfo,
    SinepsSelfQueryRetriever,
)

sineps_metadata_field_info = [
    SinepsAttributeInfo(
        name="genre",
        description="The genre of the movie.",
        type="string",
        values=[
            "science fiction",
            "comedy",
            "drama",
            "thriller",
            "romance",
            "action",
            "animated",
        ],
    ),
    SinepsAttributeInfo(
        name="year",
        description="The year the movie was released",
        type="number",
    ),
    SinepsAttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
        values=[
            "Christopher Nolan",
            "Satoshi Kon",
            "Greta Gerwig",
            "Andrei Tarkovsky",
        ],
    ),
    SinepsAttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="number"
    ),
]
retriever = SinepsSelfQueryRetriever(
    vectorstore=vectorstore,
    sineps_metadata_field_info=sineps_metadata_field_info,
    verbose=True,
)
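
Before relying on the extracted filters, it can help to check that the seeded documents actually match the declared fields. The sketch below does this in plain Python. It mirrors the declarations above as ordinary dicts and assumes that `values` is meant to enumerate the allowed values for a field. `FIELD_SPECS` and `check_metadata` are hypothetical names used only for this illustration; the retriever itself takes `SinepsAttributeInfo` objects.

```python
# Plain-dict mirror of the field declarations above (illustrative only).
FIELD_SPECS = {
    "genre": {
        "type": "string",
        "values": {
            "science fiction", "comedy", "drama", "thriller",
            "romance", "action", "animated",
        },
    },
    "year": {"type": "number"},
    "director": {
        "type": "string",
        "values": {
            "Christopher Nolan", "Satoshi Kon",
            "Greta Gerwig", "Andrei Tarkovsky",
        },
    },
    "rating": {"type": "number"},
}


def check_metadata(metadata, specs=FIELD_SPECS):
    """Return a list of human-readable problems found in one metadata dict."""
    problems = []
    for key, value in metadata.items():
        spec = specs.get(key)
        if spec is None:
            problems.append(f"undeclared field: {key}")
            continue
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if spec["type"] == "number" and not is_number:
            problems.append(f"{key} should be a number, got {value!r}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"{key} should be a string, got {value!r}")
        elif "values" in spec and value not in spec["values"]:
            problems.append(f"{key} has undeclared value {value!r}")
    return problems


# A document from the demo set passes; a misspelled or new genre would not.
print(check_metadata({"year": 1993, "rating": 7.7, "genre": "science fiction"}))
print(check_metadata({"genre": "horror"}))
```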

## Testing it out
And now we can try actually using our retriever!

In [6]:
# This example only specifies a relevant query
retriever.invoke("What are some movies about dinosaurs")

Generated Query: query='What are some movies about dinosaurs' filter=None limit=None


[Document(metadata={'genre': 'science fiction', 'rating': 7.7, 'year': 1993}, page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose'),
 Document(metadata={'genre': 'animated', 'year': 1995}, page_content='Toys come alive and have a blast doing so'),
 Document(metadata={'director': 'Satoshi Kon', 'rating': 8.6, 'year': 2006}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea'),
 Document(metadata={'director': 'Christopher Nolan', 'rating': 8.2, 'year': 2010}, page_content='Leo DiCaprio gets lost in a dream within a dream within a dream within a ...')]

In [7]:
# This example only specifies a filter
retriever.invoke("I want to watch a movie rated higher than 8.5")

[Document(metadata={'director': 'Satoshi Kon', 'rating': 8.6, 'year': 2006}, page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea'),
 Document(metadata={'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'rating': 9.9, 'year': 1979}, page_content='Three men walk into the Zone, three men walk out of the Zone')]

In [8]:
# This example specifies a query and a filter
retriever.invoke("Has Greta Gerwig directed any movies about women")

[Document(metadata={'director': 'Greta Gerwig', 'rating': 8.3, 'year': 2019}, page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them')]

In [9]:
# This example specifies a composite filter
retriever.invoke("What's a highly rated (above 8.5) science fiction film?")

[Document(metadata={'director': 'Andrei Tarkovsky', 'genre': 'science fiction', 'rating': 9.9, 'year': 1979}, page_content='Three men walk into the Zone, three men walk out of the Zone')]
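
To see why only one document comes back, we can apply the same condition by hand. The sketch below is a plain-Python stand-in for the composite filter (rating above 8.5 AND genre equal to "science fiction"). It is not the filter expression that Sineps or Chroma actually produce, and it assumes that a document missing a field never matches a condition on that field.

```python
# Metadata of the six demo movies, copied from the documents above.
movies = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {
        "year": 1979,
        "director": "Andrei Tarkovsky",
        "genre": "science fiction",
        "rating": 9.9,
    },
]


def matches(metadata):
    """rating > 8.5 AND genre == 'science fiction'; missing fields never match."""
    rating = metadata.get("rating")
    return (
        rating is not None
        and rating > 8.5
        and metadata.get("genre") == "science fiction"
    )


hits = [m for m in movies if matches(m)]
print(hits)
```

The Satoshi Kon film passes the rating condition (8.6) but has no `genre` field, so only the Tarkovsky film satisfies both parts.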

In [10]:
# This example specifies a query and composite filter
retriever.invoke(
    "What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"
)

[Document(metadata={'genre': 'animated', 'year': 1995}, page_content='Toys come alive and have a blast doing so')]

## Advanced Functionality

Now we'll look at the date-related functionality of Sineps' Filter Extractor.
ChromaDB does not support a date type directly, so we'll use OpenSearch for this example.

Set up OpenSearch yourself; we assume it is reachable at `http://localhost:9200`.
In this example, we'll use a date-type metadata field.

In [None]:
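# Requires the `opensearch-py` package.
from langchain_community.vectorstores import OpenSearchVectorSearch

docs = [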
    Document(
        page_content="An in-depth analysis of how AIs are reshaping the job market and economy.",
        metadata={
            "published_date": "2020-01-15",
            "rating": 9.0,
        },
    ),
    Document(
        page_content="Exploring the economic consequences of AI advancements in 2024, focusing on job displacement.",
        metadata={
            "published_date": "2024-03-22",
            "rating": 8.7,
        },
    ),
    Document(
        page_content="The rise of AI: Examining the societal impact of large language models on employment and growth.",
        metadata={
            "published_date": "2024-05-10",
            "rating": 8.9,
        },
    ),
    Document(
        page_content="A retrospective on the influence of large language models on economic trends in 2024.",
        metadata={
            "published_date": "2024-11-05",
            "rating": 8.5,
        },
    ),
    Document(
        page_content="Comparing the job market of 2024 with previous years: The role of AI and automation.",
        metadata={
            "published_date": "2021-07-18",
            "rating": 8.8,
        },
    ),
    Document(
        page_content="The future of work: How large language models are redefining careers in 2024.",
        metadata={
            "published_date": "2024-09-30",
            "rating": 9.1,
        },
    ),
]
vectorstore = OpenSearchVectorSearch.from_documents(
    docs,
    OpenAIEmbeddings(),
    index_name="opensearch-self-query-demo",
    opensearch_url="http://localhost:9200",
)

sineps_metadata_field_info = [
    SinepsAttributeInfo(
        name="published_date",
        description="The date the article was published.",
        type="date",
    ),
    SinepsAttributeInfo(
        name="rating", description="A 1-10 rating for the article", type="number"
    ),
]
retriever = SinepsSelfQueryRetriever(
    vectorstore=vectorstore,
    sineps_metadata_field_info=sineps_metadata_field_info,
    verbose=True,
)

You can specify a date range in the query.

In [None]:
retriever.invoke(
    "Find articles about the impact of Large language models on jobs and the economy, published in 2024"
)
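
Conceptually, "published in 2024" corresponds to an inclusive range on `published_date`. The sketch below applies such a range to the demo dates by hand. It is a plain-Python illustration under that assumption, not the filter that Sineps generates, and `year_range` is a hypothetical helper name.

```python
from datetime import date

# published_date values of the six demo articles, in document order.
published = [
    "2020-01-15",
    "2024-03-22",
    "2024-05-10",
    "2024-11-05",
    "2021-07-18",
    "2024-09-30",
]


def year_range(year):
    """Inclusive first and last day of a calendar year."""
    return date(year, 1, 1), date(year, 12, 31)


start, end = year_range(2024)
in_2024 = [d for d in published if start <= date.fromisoformat(d) <= end]
print(in_2024)
```

Note that the article published on 2021-07-18 mentions 2024 in its text but falls outside the range, which is the kind of case a metadata filter handles better than similarity search alone.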

You can also specify a date range relative to today.

In [None]:
retriever.invoke(
    "Find articles about the impact of Large language models on jobs and the economy published in the last 1 year"
)
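
A relative phrase such as "in the last 1 year" has to be resolved against the current date, so the result of this query depends on when you run it. How Sineps resolves the phrase internally is not covered here; the sketch below shows one plausible interpretation, with `last_n_years` as a hypothetical helper name.

```python
from datetime import date


def last_n_years(today, n=1):
    """Inclusive (start, end) range covering the n years up to `today`."""
    try:
        start = today.replace(year=today.year - n)
    except ValueError:
        # `today` is Feb 29 and the target year is not a leap year.
        start = today.replace(year=today.year - n, day=28)
    return start, today


# A fixed date keeps the example reproducible; use date.today() in practice.
start, end = last_n_years(date(2025, 1, 1))
print(start.isoformat(), end.isoformat())
```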