# How to use a graph vectorstore as a retriever

A graph vector store retriever is a retriever that uses a graph vector store to retrieve documents. It is similar to a vector store retriever, except that it uses both vector similarity and graph connections to retrieve documents. It uses the search methods implemented by the graph vector store, such as similarity search and MMR, to query the texts in the store.

In this guide we will cover:

1. How to instantiate a retriever from a graph vectorstore;
2. How to specify the search type for the retriever;
3. How to specify additional search parameters, such as threshold scores and top-k.

## Creating a retriever from a graph vectorstore

You can build a retriever from a graph vectorstore using its [.as_retriever](https://api.python.langchain.com/en/latest/vectorstores/langchain_core.graph_vectorstores.GraphVectorStore.html#langchain_core.vectorstores.VectorStore.as_retriever) method. Let's walk through an example.

First we instantiate a graph vectorstore. We will use [CassandraGraphVectorStore](https://api.python.langchain.com/en/latest/graph_vectorstores/langchain_community.graph_vectorstores.cassandra.CassandraGraphVectorStore.html), a graph vectorstore backed by Cassandra:

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_community.graph_vectorstores import CassandraGraphVectorStore
from langchain_community.graph_vectorstores.extractors import (
    KeybertLinkExtractor,
    LinkExtractorTransformer,
)
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load the source text and split it into chunks of roughly 1000 characters.
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Extract keyword-based links so that related chunks are connected in the graph.
pipeline = LinkExtractorTransformer([KeybertLinkExtractor()])
texts = pipeline.transform_documents(texts)

# Embed the chunks and store them, together with their links, in Cassandra.
embeddings = OpenAIEmbeddings()
graph_vectorstore = CassandraGraphVectorStore.from_documents(texts, embeddings)

We can then instantiate a retriever:

In [2]:
retriever = graph_vectorstore.as_retriever()

This creates a retriever (specifically a [GraphVectorStoreRetriever](https://api.python.langchain.com/en/latest/graph_vectorstores/langchain_core.graph_vectorstores.base.GraphVectorStoreRetriever.html)), which we can use in the usual way:

In [3]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")

## Maximum marginal relevance traversal retrieval

By default, the graph vector store retriever uses similarity search, then expands the retrieved set by following a fixed number of graph edges. If the underlying graph vector store supports maximum marginal relevance traversal, you can specify that as the search type.

MMR traversal is a retrieval method that combines MMR with graph traversal. The strategy first retrieves the top `fetch_k` results by similarity to the question. It then iteratively expands the set of fetched documents by following `adjacent_k` graph edges and selects the top `k` results by maximum marginal relevance, using the given `lambda_mult`.

In [4]:
retriever = graph_vectorstore.as_retriever(search_type="mmr")

In [5]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")
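To make the MMR traversal strategy described above more concrete, here is a minimal, in-memory sketch in plain Python. It is not the library's implementation: the `mmr_traversal` and `cosine` helpers, the toy two-dimensional vectors, and the `edges` dictionary are all hypothetical stand-ins chosen for illustration. It assumes the common MMR scoring rule (relevance weighted by `lambda_mult`, minus redundancy weighted by `1 - lambda_mult`).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_traversal(query, vectors, edges, k, fetch_k, adjacent_k, lambda_mult):
    """Toy MMR traversal. `vectors` maps id -> embedding, `edges` maps id -> neighbor ids."""

    def by_query_similarity(doc_id):
        return cosine(query, vectors[doc_id])

    def mmr_score(doc_id):
        # Relevance to the query, minus a penalty for resembling an already-selected doc.
        redundancy = max(
            (cosine(vectors[doc_id], vectors[s]) for s in selected), default=0.0
        )
        return lambda_mult * by_query_similarity(doc_id) - (1 - lambda_mult) * redundancy

    # Step 1: seed the candidate pool with the fetch_k docs most similar to the query.
    candidates = set(sorted(vectors, key=by_query_similarity, reverse=True)[:fetch_k])
    selected = []

    while candidates and len(selected) < k:
        # Step 2: move the candidate with the best MMR score into the results.
        best = max(sorted(candidates), key=mmr_score)
        candidates.remove(best)
        selected.append(best)

        # Step 3: follow up to adjacent_k edges from the chosen doc to grow the pool.
        new_neighbors = [
            n for n in edges.get(best, []) if n not in selected and n not in candidates
        ]
        new_neighbors.sort(key=by_query_similarity, reverse=True)
        candidates.update(new_neighbors[:adjacent_k])

    return selected


# Hypothetical toy data: "b" is a near-duplicate of "a"; "d" is linked to "a" by an edge.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.6, 0.8]}
edges = {"a": ["d"]}
result = mmr_traversal(
    [1.0, 0.0], vectors, edges, k=2, fetch_k=2, adjacent_k=1, lambda_mult=0.3
)
print(result)  # ['a', 'd']
```

In this toy run, `d` is not among the top `fetch_k` similarity hits; it enters the candidate pool only through the edge from `a`, and it is then preferred over the near-duplicate `b` because the low `lambda_mult` favors diversity.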

## Passing search parameters

We can pass parameters to the underlying vectorstore's search methods using `search_kwargs`.

### Specifying graph traversal depth

For example, we can set the graph traversal depth so that only documents reachable within a given number of graph edges are returned.

In [6]:
retriever = graph_vectorstore.as_retriever(search_kwargs={"depth": 3})

In [7]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")
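As a rough illustration of what a depth limit means, the sketch below runs a breadth-first traversal over a small hand-written graph. The `traverse` function, the `edges` dictionary, and the `seeds` list are hypothetical and stand in for the store's internals; the sketch assumes that depth counts edges outward from the documents found by similarity search.

```python
from collections import deque


def traverse(seeds, edges, depth):
    """Return every doc id within `depth` edges of any seed, with its distance."""
    visited = {seed: 0 for seed in seeds}  # doc id -> distance from the nearest seed
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if visited[current] == depth:
            continue  # do not follow edges past the depth limit
        for neighbor in edges.get(current, []):
            if neighbor not in visited:
                visited[neighbor] = visited[current] + 1
                queue.append(neighbor)
    return visited


# Hypothetical chain of documents: a -> b -> c -> d -> e
edges = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
seeds = ["a"]  # stand-in for the documents returned by similarity search

reachable_0 = traverse(seeds, edges, depth=0)
reachable_3 = traverse(seeds, edges, depth=3)
print(sorted(reachable_0))  # ['a']
print(sorted(reachable_3))  # ['a', 'b', 'c', 'd']
```

With a depth of zero only the seed documents come back, while a depth of three also brings in every document up to three edges away.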

### Specifying MMR parameters

When using search type `mmr`, several parameters of the MMR algorithm can be configured.

The `fetch_k` parameter determines how many documents are fetched using vector similarity, and the `adjacent_k` parameter determines how many documents are fetched using graph edges. The `lambda_mult` parameter controls how the MMR re-ranking weighs similarity to the query string against diversity among the retrieved documents as fetched documents are selected for the set of `k` final results.

In [None]:
retriever = graph_vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"fetch_k": 20, "adjacent_k": 20, "lambda_mult": 0.25},
)

In [None]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")
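The role of `lambda_mult` is easiest to see in the scoring rule itself. Assuming the retriever follows the common MMR formulation (an assumption; check the store's implementation for the exact form), each candidate document $d$ is scored against the query $q$ and the set $S$ of documents already selected:

$$
\text{score}(d) = \lambda \cdot \text{sim}(d, q) - (1 - \lambda) \cdot \max_{s \in S} \text{sim}(d, s)
$$

Here $\lambda$ is `lambda_mult`. Under this formulation, a value of 1 ranks purely by similarity to the query, a value of 0 ranks purely by dissimilarity to what has already been selected, and the value of 0.25 used above weights the diversity penalty three times as heavily as relevance.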

### Specifying top k

We can also limit the number of documents `k` returned by the retriever.

Note that if `depth` is greater than zero, the retriever may return more than `k` documents, because it returns both the `k` documents retrieved by vector similarity and any documents connected to them via graph edges.

In [8]:
retriever = graph_vectorstore.as_retriever(search_kwargs={"k": 1})

In [9]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")
len(docs)

1

### Similarity score threshold retrieval

We can also set a similarity score threshold so that only documents with a score above that threshold are returned.

In [None]:
retriever = graph_vectorstore.as_retriever(search_kwargs={"score_threshold": 0.5})

In [None]:
docs = retriever.invoke("what did the president say about ketanji brown jackson?")
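Conceptually, a score threshold is a filter applied to scored search results. The sketch below shows the idea on hand-written data; the `filter_by_threshold` helper, the document ids, and the scores are hypothetical, and treating the boundary as inclusive is an assumption rather than documented behavior.

```python
def filter_by_threshold(scored_docs, score_threshold):
    """Keep (doc, score) pairs whose score meets the threshold (inclusive by assumption)."""
    return [(doc, score) for doc, score in scored_docs if score >= score_threshold]


# Hypothetical similarity scores for three documents.
scored_docs = [("doc-1", 0.82), ("doc-2", 0.55), ("doc-3", 0.31)]
kept = filter_by_threshold(scored_docs, score_threshold=0.5)
print([doc for doc, _ in kept])  # ['doc-1', 'doc-2']
```

Unlike `k`, a threshold does not fix the number of results: a query with no close matches can return an empty list.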