# Retrieval: Vectorstore-Backed Retriever

In [1]:
# Run the line of code below to check the version of langchain installed in the current environment.
# Substitute "langchain" with any other package name to check its version.

In [2]:
%pip show langchain

Name: langchain
Version: 1.2.3
Summary: Building applications with LLMs through composability
Home-page: 
Author: 
Author-email: 
License: MIT
Location: C:\Premasis\Development\Pycharm\LangChainSample\.venv\Lib\site-packages
Requires: langchain-core, langgraph, pydantic
Required-by: 
Note: you may need to restart the kernel to use updated packages.
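You can also read a package version from Python itself, without going through pip, by using the standard library's `importlib.metadata`. This is a minimal sketch. The helper name `installed_version` is hypothetical and not part of langchain.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not present in the current environment
        return None


# Prints a version string such as the one reported by pip, or None
print(installed_version("langchain"))
```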


In [3]:
%load_ext dotenv
%dotenv
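The `%dotenv` magic comes from the `python-dotenv` IPython extension. It loads variables from a `.env` file into `os.environ`, and `OpenAIEmbeddings()` reads its key from `OPENAI_API_KEY` by default. If the `.env` file is not found, the failure only appears later, at the first embedding call. The sketch below checks for the key up front, assuming the key lives in `OPENAI_API_KEY`. The helper `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing loudly if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check that the .env file was loaded")
    return value


# In the notebook, call this right after %dotenv, e.g.:
#   require_env("OPENAI_API_KEY")
```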

In [4]:
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_chroma import Chroma

from langchain_core.documents import Document

In [5]:
embedding = OpenAIEmbeddings()

vectorstore = Chroma(persist_directory="./local-database",
                     embedding_function=embedding)

In [6]:
len(vectorstore.get()['documents'])

63
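With the default `include` setting, `vectorstore.get()` returns a dictionary of parallel lists, including `ids`, `documents` and `metadatas`. The 63 above is therefore the number of stored chunks. The sketch below counts chunks per lecture from a dictionary of that shape. The sample data is hypothetical; in the notebook you would pass `vectorstore.get()` instead.

```python
from collections import Counter


def chunks_per_lecture(store_contents: dict) -> Counter:
    """Count stored chunks per 'Lecture Title', given a dict shaped like Chroma's get() result."""
    # 'documents'[i] and 'metadatas'[i] describe the same chunk; metadata entries may be None
    metadatas = store_contents.get("metadatas") or []
    return Counter((m or {}).get("Lecture Title", "<missing>") for m in metadatas)


# Hypothetical sample mirroring the keys used in this notebook
sample = {
    "ids": ["a", "b", "c"],
    "documents": ["chunk 1", "chunk 2", "chunk 3"],
    "metadatas": [
        {"Lecture Title": "Tools"},
        {"Lecture Title": "Tools"},
        {"Course Title": "Intro"},
    ],
}
print(chunks_per_lecture(sample))
```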

In [7]:
retriever = vectorstore.as_retriever(search_type='mmr',
                                     search_kwargs={'k': 3, 'lambda_mult': 0.7})
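`search_type='mmr'` ranks results by maximal marginal relevance (MMR) rather than by similarity alone. Here, q is the query, C is the candidate pool, S is the set of documents already selected, sim is a similarity measure such as cosine similarity, and λ is `lambda_mult`. The candidate pool is the `fetch_k` nearest neighbours; LangChain's Chroma integration defaults `fetch_k` to 20. At each step, MMR greedily adds the candidate

```latex
d^{*} = \operatorname*{arg\,max}_{d \in C \setminus S}
        \Big[ \lambda \, \mathrm{sim}(q, d) - (1 - \lambda) \max_{s \in S} \mathrm{sim}(d, s) \Big]
```

and this repeats until `k` documents are chosen. On the first step S is empty, so the redundancy term is taken as 0 and the most similar document is picked. With λ = 1 the ranking reduces to plain similarity search. With λ = 0 only diversity counts. The value 0.7 used here leans toward relevance.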

In [8]:
retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000002618BC5DD90>, search_type='mmr', search_kwargs={'k': 3, 'lambda_mult': 0.7})
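The `search_kwargs` shown in this repr (`k=3`, `lambda_mult=0.7`) drive a greedy selection loop. Below is a from-scratch, pure-Python sketch of that loop, written to make the role of `lambda_mult` concrete. It is not LangChain's own implementation. The helper names `cosine` and `mmr` are hypothetical, and the 2-D vectors are toy stand-ins for OpenAI embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query: Sequence[float], candidates: List[Sequence[float]],
        k: int = 3, lambda_mult: float = 0.7) -> List[int]:
    """Return indices of up to k candidates chosen greedily by maximal marginal relevance."""
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best, best_score = remaining[0], -math.inf
        for i in remaining:
            # Redundancy: similarity to the closest already-selected candidate (0 on the first pick)
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [1.0, 0.0], [0.7, 0.7]]  # candidates 0 and 1 are exact duplicates
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # → [0, 1]  (pure relevance keeps the duplicate)
print(mmr(query, candidates, k=2, lambda_mult=0.3))  # → [0, 2]  (diversity penalty skips it)
```

On this toy data, `lambda_mult=0.7` still returns `[0, 1]`: the duplicate's relevance outweighs its redundancy penalty. How much diversity a given value buys depends on how spread out the candidate embeddings are.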

In [9]:
question = "What software do data scientists use?"

In [10]:
retrieved_docs = retriever.invoke(question)

In [11]:
retrieved_docs

[Document(id='bdd23c68-0d6a-416e-99ac-0eacdfe3d42a', metadata={'Course Title': 'Introduction to Data and Data Science', 'Lecture Title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need'}, page_content='As you can see from the infographic, R, and Python are the two most popular tools across all columns. Their biggest advantage is that they can manipulate data and are integrated within multiple data and data science software platforms. They are not just suitable for mathematical and statistical computations. In other words, R, and Python are adaptable. They can solve a wide variety of business and data-related problems from beginning to the end'),
 Document(id='ffaddb0d-dffe-43d6-a3ec-34cd96d2a59c', metadata={'Lecture Title': 'Programming Languages & Software Employed in Data Science - All the Tools You Need', 'Course Title': 'Introduction to Data and Data Science'}, page_content='It’s actually a software framework which was designed to address the com

In [12]:
for doc in retrieved_docs:
    print(f"Page Content: {doc.page_content}\n----------\nLecture Title:{doc.metadata['Lecture Title']}\n")

Page Content: As you can see from the infographic, R, and Python are the two most popular tools across all columns. Their biggest advantage is that they can manipulate data and are integrated within multiple data and data science software platforms. They are not just suitable for mathematical and statistical computations. In other words, R, and Python are adaptable. They can solve a wide variety of business and data-related problems from beginning to the end
----------
Lecture Title:Programming Languages & Software Employed in Data Science - All the Tools You Need

Page Content: It’s actually a software framework which was designed to address the complexity of big data and its computational intensity. Most notably, Hadoop distributes the computational tasks on multiple computers which is basically the way to handle big data nowadays. Power BI, SaS, Qlik, and especially Tableau are top-notch examples of software designed for business intelligence visualizations
----------
Lecture Title:
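The loop above indexes metadata with `['Lecture Title']`, which raises `KeyError` for any retrieved chunk whose metadata lacks that key. The sketch below is a more tolerant variant that uses `dict.get` and returns one string instead of printing piece by piece. The name `format_docs` is hypothetical. The `SimpleNamespace` objects only stand in for the `Document` objects returned by `retriever.invoke`, which expose the same `page_content` and `metadata` attributes.

```python
from types import SimpleNamespace
from typing import Iterable


def format_docs(docs: Iterable, title_key: str = "Lecture Title") -> str:
    """Render retrieved documents as text, tolerating chunks whose metadata lacks `title_key`."""
    blocks = []
    for doc in docs:
        title = doc.metadata.get(title_key, "<no title>")
        blocks.append(f"Page Content: {doc.page_content}\n----------\n{title_key}: {title}\n")
    return "\n".join(blocks)


# Hypothetical stand-ins; in the notebook, call print(format_docs(retrieved_docs))
docs = [
    SimpleNamespace(page_content="R and Python are popular.", metadata={"Lecture Title": "Tools"}),
    SimpleNamespace(page_content="Hadoop distributes computation.", metadata={}),
]
print(format_docs(docs))
```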