<a href="https://colab.research.google.com/github/rvernica/notebook/blob/main/mongodb/langchain-hybrid-search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain MongoDB Integration - Hybrid Search

This notebook is a companion to the [LangChain Hybrid Search](https://www.mongodb.com/docs/atlas/ai-integrations/langchain/hybrid-search/) page. Refer to the page for set-up instructions and detailed explanations.


In [1]:
!pip install --quiet --upgrade \
  langchain \
  langchain-community \
  langchain-core \
  langchain-mongodb \
  langchain-voyageai \
  langchain-google-genai \
  pymongo \
  pypdf

In [2]:
import os
from google.colab import userdata

os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")
os.environ["VOYAGE_API_KEY"] = userdata.get("VOYAGE_API_KEY")
MONGODB_URI = userdata.get("MONGODB_URI")

In [3]:
# Print this runtime's public IP address so you can add it to your Atlas IP access list
!curl ifconfig.me

34.30.53.135

In [4]:
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_voyageai import VoyageAIEmbeddings

# Create the vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
   connection_string = MONGODB_URI,
   embedding = VoyageAIEmbeddings(model = "voyage-3-large", output_dimension = 2048),
   namespace = "sample_mflix.embedded_movies",
   text_key = "plot",
   embedding_key = "plot_embedding_voyage_3_large",
   relevance_score_fn = "dotProduct"
)

In [5]:
# Use helper method to create the vector search index
vector_store.create_vector_search_index(
   dimensions = 2048, # The dimensions of the vector embeddings to be indexed
   wait_until_complete = 60 # Number of seconds to wait for the index to build (can take around a minute)
)
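
For reference, the helper above creates an Atlas Vector Search index on the embedding field. The following is a rough sketch of what an equivalent index definition might look like, assuming the field name, dimensions, and similarity function configured on the vector store earlier. The function name `vector_index_definition` is hypothetical, and the exact definition the helper sends may differ between `langchain-mongodb` versions.

```python
def vector_index_definition(path, dimensions, similarity):
    """Build an Atlas Vector Search index definition (illustrative only)."""
    return {
        "fields": [
            {
                "type": "vector",              # index this field as a vector
                "path": path,                  # document field holding the embedding
                "numDimensions": dimensions,   # must match the embedding model output
                "similarity": similarity,      # e.g. "dotProduct", "cosine", "euclidean"
            }
        ]
    }

# Values taken from the vector store configuration in this notebook
definition = vector_index_definition(
    path="plot_embedding_voyage_3_large",
    dimensions=2048,
    similarity="dotProduct",
)
print(definition)
```

The `numDimensions` value has to agree with the `output_dimension` passed to the embedding model; otherwise the embeddings cannot be indexed.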

In [6]:
from langchain_mongodb.index import create_fulltext_search_index
from pymongo import MongoClient

# Connect to your cluster
client = MongoClient(MONGODB_URI)

# Use helper method to create the search index
create_fulltext_search_index(
   collection = client["sample_mflix"]["embedded_movies"],
   field = "plot",
   index_name = "search_index",
   wait_until_complete = 60
)
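
The full-text helper indexes a single field rather than the whole document. As a sketch, an Atlas Search index definition with a static mapping on `plot` could look like the following. The function name `fulltext_index_definition` is hypothetical, and the assumption that the helper uses a static (non-dynamic) string mapping is not confirmed by this notebook.

```python
def fulltext_index_definition(field):
    """Build an Atlas Search index definition for one string field (illustrative only)."""
    return {
        "mappings": {
            "dynamic": False,                       # index only the fields listed below
            "fields": {field: [{"type": "string"}]},
        }
    }

search_definition = fulltext_index_definition("plot")
print(search_definition)
```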

In [7]:
from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever

# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
    vectorstore = vector_store,
    search_index_name = "search_index",
    top_k = 5,
    fulltext_penalty = 50,
    vector_penalty = 50,
    post_filter = [
        # Exclude the large embedding fields from the returned documents
        {
            "$project": {
                "plot_embedding": 0,
                "plot_embedding_voyage_3_large": 0
            }
        }
    ]
)

# Define your query
query = "time travel"

# Print results
documents = retriever.invoke(query)
for doc in documents:
   print("Title: " + doc.metadata["title"])
   print("Plot: " + doc.page_content)
   print("Search score: {}".format(doc.metadata["fulltext_score"]))
   print("Vector Search score: {}".format(doc.metadata["vector_score"]))
   print("Total score: {}\n".format(doc.metadata["fulltext_score"] + doc.metadata["vector_score"]))

Title: The Time Traveler's Wife
Plot: A romantic drama about a Chicago librarian with a gene that causes him to involuntarily time travel, and the complications it creates for his marriage.
Search score: 0.0196078431372549
Vector Search score: 0
Total score: 0.0196078431372549

Title: Timecop
Plot: An officer for a security agency that regulates time travel, must fend for his life against a shady politician who has a tie to his past.
Search score: 0.019230769230769232
Vector Search score: 0
Total score: 0.019230769230769232

Title: My iz budushchego
Plot: My iz budushchego, or We Are from the Future, is a movie about time travel. Four 21st century treasure seekers are transported back into the middle of a WWII battle in Russia. The movie's ...
Search score: 0.018867924528301886
Vector Search score: 0
Total score: 0.018867924528301886

Title: A.P.E.X.
Plot: A time-travel experiment in which a robot probe is sent from the year 2073 to the year 1973 goes terribly wrong thrusting one of th
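
The scores printed above are consistent with reciprocal rank fusion: `0.0196078431372549` is 1/51, which is 1/(rank + penalty) for rank 1 and a penalty of 50. The sketch below assumes that formula with 1-based ranks. The document IDs and the `fuse` helper are hypothetical; the retriever itself runs its ranking as an aggregation pipeline on the server, which may differ in detail.

```python
def rrf_score(rank, penalty):
    """Reciprocal rank score for a 1-based rank and a penalty constant."""
    return 1.0 / (rank + penalty)

def fuse(fulltext_ids, vector_ids, fulltext_penalty=50, vector_penalty=50, top_k=5):
    """Combine two ranked ID lists by summing their reciprocal rank scores."""
    scores = {}
    for rank, doc_id in enumerate(fulltext_ids, start=1):
        entry = scores.setdefault(doc_id, {"fulltext_score": 0.0, "vector_score": 0.0})
        entry["fulltext_score"] = rrf_score(rank, fulltext_penalty)
    for rank, doc_id in enumerate(vector_ids, start=1):
        entry = scores.setdefault(doc_id, {"fulltext_score": 0.0, "vector_score": 0.0})
        entry["vector_score"] = rrf_score(rank, vector_penalty)
    # Sort by total score, highest first, and keep the top_k documents
    ranked = sorted(
        scores.items(),
        key=lambda item: item[1]["fulltext_score"] + item[1]["vector_score"],
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical ranked results from each search
fulltext_ids = ["a", "b", "c"]
vector_ids = ["c", "d"]

results = fuse(fulltext_ids, vector_ids)
for doc_id, parts in results:
    print(doc_id, parts["fulltext_score"] + parts["vector_score"])
```

A larger penalty flattens the differences between ranks for that search type, so lowering `fulltext_penalty` relative to `vector_penalty` gives the top full-text matches more weight, and vice versa. A document that appears in both lists, like `c` here, accumulates both scores and rises to the top.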

In [8]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI

# Define a prompt template
template = """
   Use the following pieces of context to answer the question at the end.
   {context}
   Question: Can you recommend some movies about {query}?
"""
prompt = PromptTemplate.from_template(template)
model = ChatGoogleGenerativeAI(model="gemini-2.5-flash")

# Construct a chain to answer questions on your data
chain = (
   {"context": retriever, "query": RunnablePassthrough()}
   | prompt
   | model
   | StrOutputParser()
)

# Prompt the chain
query = "time travel"
answer = chain.invoke(query)
print(answer)

Here are some movies about time travel:

*   **The Time Traveler's Wife** (2009)
*   **Timecop** (1994)
*   **My iz budushchego** (We Are from the Future) (2008)
*   **A.P.E.X.** (1994)
*   **Rubinrot** (2013)
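
The chain above passes the retriever's output to the prompt as-is, so `{context}` receives the string form of a list of documents. A common variation is to format the documents into plain text first. The sketch below shows one way to do that; `Doc` is a hypothetical stand-in for a LangChain document so that the example runs without any dependencies, and `format_docs` is an illustrative name, not part of the library.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs):
    """Join each document's title and plot into one context string."""
    return "\n\n".join(
        "{}: {}".format(doc.metadata.get("title", "Untitled"), doc.page_content)
        for doc in docs
    )

# Sample documents shaped like the retriever results in this notebook
docs = [
    Doc("A romantic drama about a Chicago librarian.", {"title": "The Time Traveler's Wife"}),
    Doc("An officer for a security agency that regulates time travel."),
]
context = format_docs(docs)
print(context)
```

In the chain, a function like this could be piped after the retriever, for example `{"context": retriever | format_docs, "query": RunnablePassthrough()}`, so the model sees only titles and plots rather than full metadata.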
