# 2a - Metadata Filtering

## Prerequisites

Complete <a href="../../../../nbclassic/notebooks/graphrag-toolkit/2-Querying.ipynb"><b>Exercise 2 - Querying</b></a> before beginning these additional exercises.

## Overview

Metadata filtering allows for constrained retrieval of information based on specific criteria. It can be applied at various stages of the process, including document extraction, graph building, and querying.

When querying a lexical graph, metadata filtering restricts retrieval to the statements, sources, and topics that match the specified metadata filters and values. This is useful when only part of a graph is relevant to a question, for example sources published within a particular date range.
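
Conceptually, a metadata filter is a predicate over each source document's metadata: content extracted from a source is eligible for retrieval only if that source passes the filter. The following stdlib-only sketch illustrates the idea. The tuple-based filter format, the `region` key, the source IDs `s1` to `s3`, and the statements are all hypothetical. The sketch mirrors the shape of nested AND/OR filters but is not how the GraphRAG Toolkit implements filtering.

```python
import operator
from typing import Any, Dict

# Hypothetical filter representation (not the toolkit's internal format):
#   leaf:   (key, op, value), e.g. ("pub_date", "<", "2025-09-01")
#   branch: ("AND" | "OR", [child, child, ...])
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(metadata: Dict[str, Any], expr: tuple) -> bool:
    """Return True if a source's metadata satisfies the (possibly nested) filter."""
    if expr[0] in ("AND", "OR"):
        results = [matches(metadata, child) for child in expr[1]]
        return all(results) if expr[0] == "AND" else any(results)
    key, op, value = expr
    if key not in metadata:
        return False  # assumption: a missing key never matches
    return OPS[op](metadata[key], value)


# Hypothetical source metadata and the statements extracted from each source.
sources = {
    "s1": {"pub_date": "2025-06-12", "region": "UK"},
    "s2": {"pub_date": "2025-09-15", "region": "UK"},
    "s3": {"pub_date": "2025-08-30", "region": "US"},
}
statements = [("s1", "statement A"), ("s2", "statement B"), ("s3", "statement C")]

# pub_date < 2025-09-01 AND (region == UK OR region == EU)
expr = ("AND", [
    ("pub_date", "<", "2025-09-01"),
    ("OR", [("region", "==", "UK"), ("region", "==", "EU")]),
])

# Evaluate the filter per source, then keep only statements from matching sources.
allowed = {sid for sid, md in sources.items() if matches(md, expr)}
kept = [text for sid, text in statements if sid in allowed]
print(allowed, kept)  # {'s1'} ['statement A']
```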

The GraphRAG Toolkit uses LlamaIndex types such as `MetadataFilters`, `MetadataFilter`, `FilterOperator`, and `FilterCondition` to specify filter criteria. You can combine these types to build compound filter expressions, including nested AND/OR conditions.

### Metadata filtering versus multi-tenancy

Metadata filtering and multi-tenancy work well together. Multi-tenancy restricts access to one of many _wholly separate_ lexical graphs within the same underlying graph and vector stores. Metadata filtering constrains retrieval to one or more _subgraphs_ within a particular lexical graph:

![Metadata Filtering](../images/metadata-filtering.png)

### 🎯 2a.1 Query the data using graph-enhanced search with metadata filtering

The following query applies a metadata filter that restricts the data to source documents whose `pub_date` is before 1st September 2025. Because the news of the blockage in the Turquoise Canal was published after 1st September, the answer returned by the query does not take account of this unfortunate development, and is thus more optimistic about Example Corp's fortunes in the UK.
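
The query uses `FilterOperator.LT`, a strict "less than" comparison, so a source dated exactly 1st September 2025 would also be excluded. The sketch below assumes `pub_date` values are zero-padded ISO-8601 date strings (`YYYY-MM-DD`). Under that assumption, comparing the strings gives the same result as comparing the dates they encode. The sample dates are hypothetical.

```python
from datetime import date

cutoff = "2025-09-01"
# Hypothetical pub_date values for four source documents.
pub_dates = ["2025-08-31", "2025-09-01", "2025-09-02", "2024-12-25"]

# Zero-padded ISO-8601 strings sort in the same order as the dates they encode,
# so a string "<" agrees with a date "<" for every value here.
for d in pub_dates:
    assert (d < cutoff) == (date.fromisoformat(d) < date.fromisoformat(cutoff))

# LT is strict: the document dated exactly 2025-09-01 is excluded.
included = [d for d in pub_dates if d < cutoff]
print(included)  # ['2025-08-31', '2024-12-25']
```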

```python
%reload_ext dotenv
%dotenv

import os

from graphrag_toolkit.lexical_graph import LexicalGraphQueryEngine
from graphrag_toolkit.lexical_graph.storage import GraphStoreFactory
from graphrag_toolkit.lexical_graph.storage import VectorStoreFactory
from graphrag_toolkit.lexical_graph.metadata import FilterConfig

from llama_index.core.vector_stores.types import FilterOperator, MetadataFilter

# GRAPH_STORE and VECTOR_STORE are read from the environment, populated from .env by %dotenv
with (
    GraphStoreFactory.for_graph_store(os.environ['GRAPH_STORE']) as graph_store,
    VectorStoreFactory.for_vector_store(os.environ['VECTOR_STORE']) as vector_store
):

    query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
        graph_store,
        vector_store,
        streaming=True,
        tenant_id='ecorp',
        # Restrict retrieval to source documents whose pub_date is before 2025-09-01
        filter_config=FilterConfig(
            MetadataFilter(
                key='pub_date',
                value='2025-09-01',
                operator=FilterOperator.LT
            )
        ),
        no_cache=True
    )

    response = query_engine.query("What are the sales prospects for Example Corp in the UK?")

response.print_response_stream()

# Timings reported in the response metadata
print(f"""

retrieve_ms: {int(response.metadata['retrieve_ms'])}
answer_ms  : {int(response.metadata['answer_ms'])}
total_ms   : {int(response.metadata['total_ms'])}
""")
```

```python
# Print the retrieved results that are passed to the LLM as context
for n in response.source_nodes:
    print(n.text)
```