# DocArrayRetriever: Usage

[DocArray](https://github.com/docarray/docarray) is an open-source tool for managing multi-modal data. It lets you define your own data schema and store and search that data using a range of document index backends. You can also use a DocArray document index to create a DocArrayRetriever and build LangChain apps on top of it.

This notebook uses the HnswDocumentIndex backend to show how to create and use a DocArrayRetriever. To learn about the other supported backends, see this notebook:
https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/docarray_backends.html

In [1]:
movies = [
    {
        "title": "Inception",
        "description": "A thief who steals corporate secrets through the use of dream-sharing technology is given the task of planting an idea into the mind of a CEO.",
        "director": "Christopher Nolan",
        "rating": 8.8,
    },
    {
        "title": "The Dark Knight",
        "description": "When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice.",
        "director": "Christopher Nolan",
        "rating": 9.0,
    },
    {
        "title": "Interstellar",
        "description": "Interstellar explores the boundaries of human exploration as a group of astronauts venture through a wormhole in space. In their quest to ensure the survival of humanity, they confront the vastness of space-time and grapple with love and sacrifice.",
        "director": "Christopher Nolan",
        "rating": 8.6,
    },
    {
        "title": "Pulp Fiction",
        "description": "The lives of two mob hitmen, a boxer, a gangster's wife, and a pair of diner bandits intertwine in four tales of violence and redemption.",
        "director": "Quentin Tarantino",
        "rating": 8.9,
    },
    {
        "title": "Reservoir Dogs",
        "description": "When a simple jewelry heist goes horribly wrong, the surviving criminals begin to suspect that one of them is a police informant.",
        "director": "Quentin Tarantino",
        "rating": 8.3,
    },
    {
        "title": "The Godfather",
        "description": "An aging patriarch of an organized crime dynasty transfers control of his empire to his reluctant son.",
        "director": "Francis Ford Coppola",
        "rating": 9.2,
    },
]


In [2]:
import getpass
import os

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')

OpenAI API Key: ········


In [3]:
from docarray import BaseDoc, DocList
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings

# define schema for your movie documents
class MyDoc(BaseDoc):
    title: str
    description: str
    description_embedding: NdArray[1536]
    rating: float
    director: str
    

embeddings = OpenAIEmbeddings()


# get "description" embeddings, and create documents
docs = DocList[MyDoc](
    [
        MyDoc(
            description_embedding=embeddings.embed_query(movie["description"]), **movie
        )
        for movie in movies
    ]
)



In [4]:
from docarray.index import HnswDocumentIndex

# initialize the index
db = HnswDocumentIndex[MyDoc](work_dir='movie_search')

# add data
db.index(docs)

## Normal Retriever

In [5]:
from langchain.retrievers import DocArrayRetriever

# create a retriever
retriever = DocArrayRetriever(
    index=db, 
    embeddings=embeddings, 
    search_field='description_embedding', 
    content_field='description'
)

# find the relevant document
doc = retriever.get_relevant_documents('movie about dreams')
print(doc)

[Document(page_content='A thief who steals corporate secrets through the use of dream-sharing technology is given the task of planting an idea into the mind of a CEO.', metadata={'id': '50c0be61c9a36dba01c3d4d840c829f7', 'title': 'Inception', 'rating': 8.8, 'director': 'Christopher Nolan'})]
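
Conceptually, the retriever embeds the query text and returns the document whose `description_embedding` is closest to it. The following is only a minimal sketch of that idea, not the retriever's actual implementation: it uses a brute-force scan rather than an HNSW graph, cosine similarity as an assumed metric (the index may be configured with a different one), and made-up 3-dimensional vectors in place of real 1536-dimensional embeddings. The names `cosine_similarity`, `top_k_by_similarity`, `toy_index`, and `toy_query` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_similarity(query_vec, items, k=1):
    """Return the titles of the k items most similar to query_vec.

    items is a list of (title, vector) pairs.
    """
    ranked = sorted(
        items,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]


# Toy vectors invented for illustration; real embeddings come from the model.
toy_index = [
    ("Inception", [0.9, 0.1, 0.0]),
    ("Interstellar", [0.1, 0.9, 0.1]),
    ("The Godfather", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]  # stands in for the embedded query text

result = top_k_by_similarity(toy_query, toy_index, k=1)
print(result)
```

A single result is returned here because `k=1`, which mirrors the one-document output above where `top_k` was not set.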


## Retriever with Filters

In [6]:
from langchain.retrievers import DocArrayRetriever

# create a retriever
retriever = DocArrayRetriever(
    index=db, 
    embeddings=embeddings, 
    search_field='description_embedding', 
    content_field='description',
    filters={'director': {'$eq': 'Christopher Nolan'}},
    top_k=2,
)

# find relevant documents
docs = retriever.get_relevant_documents('space travel')
print(docs)

[Document(page_content='Interstellar explores the boundaries of human exploration as a group of astronauts venture through a wormhole in space. In their quest to ensure the survival of humanity, they confront the vastness of space-time and grapple with love and sacrifice.', metadata={'id': '6ff65e170c32f15b2250f2fc06f63112', 'title': 'Interstellar', 'rating': 8.6, 'director': 'Christopher Nolan'}), Document(page_content='A thief who steals corporate secrets through the use of dream-sharing technology is given the task of planting an idea into the mind of a CEO.', metadata={'id': '50c0be61c9a36dba01c3d4d840c829f7', 'title': 'Inception', 'rating': 8.8, 'director': 'Christopher Nolan'})]
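
The `filters` argument uses operator-style conditions such as `$eq` and `$gte`. To make their meaning concrete, here is a small pure-Python sketch that applies the same kind of condition to plain dictionaries. It is an illustration under assumptions, not how the index evaluates filters internally: only the two operators that appear in this notebook are handled, and the names `SUPPORTED_OPS`, `matches`, and `sample` are hypothetical.

```python
import operator

# Only the operators used in this notebook; the backend may support more.
SUPPORTED_OPS = {
    "$eq": operator.eq,
    "$gte": operator.ge,
}


def matches(record, filters):
    """Return True if record satisfies every condition in filters.

    filters has the shape {field: {operator_name: expected_value}}.
    """
    for field, conditions in filters.items():
        for op_name, expected in conditions.items():
            if not SUPPORTED_OPS[op_name](record[field], expected):
                return False
    return True


# A trimmed copy of three of the movies from the first cell.
sample = [
    {"title": "Inception", "director": "Christopher Nolan", "rating": 8.8},
    {"title": "Interstellar", "director": "Christopher Nolan", "rating": 8.6},
    {"title": "Pulp Fiction", "director": "Quentin Tarantino", "rating": 8.9},
]

nolan_titles = [
    m["title"]
    for m in sample
    if matches(m, {"director": {"$eq": "Christopher Nolan"}})
]
high_rated_titles = [
    m["title"] for m in sample if matches(m, {"rating": {"$gte": 8.7}})
]
print(nolan_titles)
print(high_rated_titles)
```

In the retriever, conditions like these narrow the set of candidates, and `top_k` then limits how many of the matching documents are returned.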


## Retriever with MMR search

In [7]:
from langchain.retrievers import DocArrayRetriever

# create a retriever
retriever = DocArrayRetriever(
    index=db, 
    embeddings=embeddings, 
    search_field='description_embedding', 
    content_field='description',
    filters={'rating': {'$gte': 8.7}},
    search_type='mmr',
    top_k=3,
)

# find relevant documents
docs = retriever.get_relevant_documents('action movies')
print(docs)

[Document(page_content="The lives of two mob hitmen, a boxer, a gangster's wife, and a pair of diner bandits intertwine in four tales of violence and redemption.", metadata={'id': '69e7bebffc153fa1d8878e6bdf4675ee', 'title': 'Pulp Fiction', 'rating': 8.9, 'director': 'Quentin Tarantino'}), Document(page_content='A thief who steals corporate secrets through the use of dream-sharing technology is given the task of planting an idea into the mind of a CEO.', metadata={'id': '50c0be61c9a36dba01c3d4d840c829f7', 'title': 'Inception', 'rating': 8.8, 'director': 'Christopher Nolan'}), Document(page_content='When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice.', metadata={'id': 'ac2e0a0bdf98c27796cad47a4cb19c7d', 'title': 'The Dark Knight', 'rating': 9.0, 'director': 'Christopher Nolan'})]
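
Maximal marginal relevance (MMR) trades off relevance to the query against similarity to documents already selected, so near-duplicate results are pushed down. The sketch below shows the general technique on made-up 2-dimensional vectors. It is not the code the retriever runs: the weighting parameter name `lambda_mult`, its value of 0.5, the use of cosine similarity, and the helper names `cosine` and `mmr` are all assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidates, k, lambda_mult=0.5):
    """Greedy MMR: return indices of up to k candidates.

    Each step picks the candidate maximizing
    lambda_mult * relevance - (1 - lambda_mult) * redundancy,
    where redundancy is the highest similarity to anything already picked.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but distinct.
query = [1.0, 0.0]
vectors = [[1.0, 0.2], [1.0, 0.21], [1.0, -0.5]]

diverse = mmr(query, vectors, k=2, lambda_mult=0.5)
relevance_only = mmr(query, vectors, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain similarity ranking, which is why the two calls pick different second items.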
