Instead of calculating these embeddings every time, we store the vectors in a vector database. In many cases, vector databases also offer to calculate the embeddings for you, using the model you select.
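
To make the store-once, reuse-many-times idea concrete, here is a minimal sketch in plain Python. The `toy_embed` function and the `TinyVectorStore` class are hypothetical stand-ins invented for this illustration: a real setup would call an embedding model and a vector database instead of counting words in memory.

```python
import math
from collections import Counter


def toy_embed(text: str) -> dict:
    """Hypothetical stand-in for an embedding model: a normalized word-count vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


class TinyVectorStore:
    """Hypothetical in-memory store that embeds each text once and keeps the vector."""

    def __init__(self, embed):
        self.embed = embed
        self.vectors = {}      # text -> stored embedding
        self.embed_calls = 0   # how often a document had to be embedded

    def add(self, text: str) -> None:
        # Only embed texts we have not seen before; known texts reuse the stored vector.
        if text not in self.vectors:
            self.vectors[text] = self.embed(text)
            self.embed_calls += 1

    def search(self, query: str, k: int = 1):
        # The query is embedded on every search; the documents are not.
        q = self.embed(query)
        scored = [
            (sum(q.get(word, 0.0) * value for word, value in vec.items()), text)
            for text, vec in self.vectors.items()
        ]
        scored.sort(reverse=True)
        return [(text, score) for score, text in scored[:k]]


store = TinyVectorStore(toy_embed)
store.add("the first devopsdays was held in ghent")
store.add("the devops handbook was written by several authors")
store.add("the first devopsdays was held in ghent")  # already stored, not embedded again
best_match = store.search("who wrote the devops handbook", k=1)
```

Adding the same text twice triggers only one embedding calculation, which is the saving a vector database gives us when the embedding call is slow or billed per request.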

In [1]:
%pip install -q langchain python-dotenv langchain-openai langchain-community tiktoken chromadb 

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip -q install python-dotenv
from dotenv import load_dotenv
load_dotenv()

Note: you may need to restart the kernel to use updated packages.


True

We first split the Markdown document again.

In [3]:
# Load the raw Markdown text; it is a long document that we split up below
from langchain.text_splitter import MarkdownHeaderTextSplitter

with open("data/history.md") as f:
    history_raw_text = f.read()

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
md_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
raw_documents = md_splitter.split_text(history_raw_text)

from pprint import pprint
pprint(raw_documents)

[Document(page_content='Devopsdays is a worldwide series of technical conferences covering topics of software development, IT infrastructure operations, and the intersection between them. Each event is run by volunteers from the local area.  \nMost devopsdays events feature a combination of curated talks (see open Calls for Proposals) and self organized open space content. Topics often include automation, testing, security, and organizational culture.', metadata={'Header 1': 'A history lesson on Devops', 'Header 2': 'Devopsdays'}),
 Document(page_content='The first devopsdays was held in Ghent, Belgium in 2009. Since then, devopsdays events have multiplied, and if there isn’t one in your city, check out the information about organizing one yourself!', metadata={'Header 1': 'A history lesson on Devops', 'Header 2': 'Devopsdays', 'Header 3': 'History'}),
 Document(page_content='The devopsdays global core team guides local organizers in hosting their own devopsdays events worldwide. Activ
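
The metadata in the output above shows that each chunk remembers the headers it sits under. As a rough sketch of that idea (not the actual `MarkdownHeaderTextSplitter` implementation), the hypothetical `split_by_headers` helper below walks the text line by line, starts a new chunk at every configured header, and records the active headers as metadata. It ignores details such as code fences, which are an assumption left out for brevity.

```python
def split_by_headers(markdown_text, headers_to_split_on):
    """Hypothetical, simplified header splitter returning dicts instead of Documents."""
    # Check longer markers first so "###" is not mistaken for "#".
    markers = sorted(headers_to_split_on, key=lambda pair: len(pair[0]), reverse=True)
    chunks = []
    current_meta = {}    # header name -> header title currently in effect
    current_lines = []   # body lines collected since the last header

    def flush():
        # Close the running chunk, if it has any content.
        content = "\n".join(current_lines).strip()
        if content:
            chunks.append({"page_content": content, "metadata": dict(current_meta)})
        current_lines.clear()

    for line in markdown_text.splitlines():
        stripped = line.strip()
        for marker, name in markers:
            if stripped.startswith(marker + " "):
                flush()
                # A new header replaces headers at the same or a deeper level.
                for other_marker, other_name in headers_to_split_on:
                    if len(other_marker) >= len(marker):
                        current_meta.pop(other_name, None)
                current_meta[name] = stripped[len(marker):].strip()
                break
        else:
            # Not a header line: it belongs to the current chunk.
            current_lines.append(line)
    flush()
    return chunks


sample = "# A\n## B\nintro\n### C\ndetail\n## D\nother"
sample_chunks = split_by_headers(
    sample, [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]
)
```

In this sketch the chunk under `### C` carries all three header levels, while the chunk under `## D` drops the stale `Header 3` entry.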

Now we use ChromaDB as the vector database.
Note: we first delete any existing collection, because we rerun this notebook for demos.

In [4]:


# Optional: reset the whole ChromaDB database through the direct API.
# Left commented out; below we only delete our own collection.

#import chromadb
#from chromadb import Settings
#client = chromadb.Client(settings=Settings(allow_reset=True))
#client.reset()

import chromadb

collection_name = "my_langchain"
chroma_client = chromadb.PersistentClient(path="./chromadb")

# Delete the collection if it already exists, so every demo run starts clean
collections = chroma_client.list_collections()
for collection in collections:
    if collection.name == collection_name:
        print(f"deleting {collection_name}")
        chroma_client.delete_collection(collection_name)

deleting my_langchain


Given the embeddings function and our documents, we ask the vector database to calculate and store the embeddings for us.

In [5]:
# Set the embeddings function
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()

# The vector database calculates the embeddings using the embeddings_model provided
# and stores the embedding for each doc in its database
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=raw_documents,
    embedding=embeddings_model,
    client=chroma_client,
    collection_name=collection_name
    # client_settings
)

Once the documents are stored, we can ask the vector database to find related documents through their embeddings.
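
The search in the next cell passes `k=4` and `score_threshold=0.7`. Conceptually, such a search scores the stored vectors against the query vector, drops everything below the threshold, and returns the best `k` matches. The sketch below shows that logic with cosine similarity. The `cosine_similarity` and `top_k_above_threshold` helpers and the three-dimensional vectors are made up for illustration. A real vector database typically relies on an index instead of scanning every vector, and its relevance scores depend on the configured distance metric, so the numbers here are not comparable to real output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_above_threshold(query_vec, stored, k=4, score_threshold=0.7):
    """Hypothetical brute-force search: filter by threshold, then keep the k best scores."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    kept = [item for item in scored if item[1] >= score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:k]


# Made-up 3-dimensional "embeddings"; real embeddings have far more dimensions.
stored_vectors = {
    "handbook": [0.9, 0.1, 0.0],
    "devopsdays": [0.1, 0.9, 0.1],
    "core-team": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

# Only "handbook" clears the 0.7 threshold, so fewer than k results come back.
results = top_k_above_threshold(query_vector, stored_vectors, k=4, score_threshold=0.7)
```

Note that `k` is an upper bound: when the threshold filters out most candidates, the search returns fewer than `k` documents.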

In [6]:
query = "Who wrote the Devops Handbook? return the results as json and use the field firstname and lastname"
#Return the result as json and use the field firstname and lastname"
docs = vectorstore.similarity_search_with_relevance_scores(query, k=4, score_threshold=0.7)
pprint(docs)

[(Document(page_content='The Devops Handbook was written by the following authors: Gene Kim, Jez Humble , John Willis , Patrick Debois and John Allspaw.  \nGene Kim is a multiple award-winning entrepreneur, the founder and former CTO of Tripwire and a researcher. He is passionate about IT operations, security and compliance, and how IT organizations successfully transform from "good to great."\x9d He lives in Portland, Oregon.  \nJez Humble is an award-winning author and researcher on software who has spent his career tinkering with code, infrastructure, and product development in organizations of varying sizes across three continents. He works at 18F, teaches at UC Berkeley, and is co-founder of DevOps Research and Assessment LLC.  \nPatrick Debois is an independent IT-consultant who is bridging the gap between projects and operations by using Agile techniques both in development, project management and system administration.  \nJohn Willis has worked in the IT management industry for