# ChromaDB

Chroma is an AI-native, open-source vector database for storing, querying, and managing embeddings.

In [2]:
## Building a simple vector DB
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain.text_splitter import CharacterTextSplitter

In [3]:
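# Load the speech as a list of Documents
text = TextLoader("speech.txt").load()
# Split into chunks of at most ~1000 characters with a 20-character overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
docs = text_splitter.split_documents(text)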
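
The chunks produced above follow paragraph boundaries because `CharacterTextSplitter` splits on a separator (`"\n\n"` by default) and then merges the pieces back together until `chunk_size` would be exceeded. The sketch below is a simplified illustration of that idea in plain Python. It is not LangChain's implementation: `split_paragraphs` is a hypothetical helper, and it ignores `chunk_overlap` entirely.

```python
def split_paragraphs(text, chunk_size=1000, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    Hypothetical helper for illustration only; chunk_overlap is not modeled.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces caused by repeated separators
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            # Either the chunk is still empty (an oversized piece is kept whole)
            # or the merged text still fits.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 600 + "\n\n" + "b" * 300 + "\n\n" + "c" * 500
print([len(c) for c in split_paragraphs(sample)])  # [902, 500]
```

The first two paragraphs fit together under 1000 characters, so they land in one chunk; the third would push the total over the limit and starts a new one.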

In [4]:
docs

[Document(metadata={'source': 'speech.txt'}, page_content='Today we are launching a campaign called HeForShe. I am reaching out to you because we need your help. We want to end gender inequality, and to do this, we need everyone involved. This is the first campaign of its kind at the UN. We want to try to mobilize as many men and boys as possible to be advocates for change. And, we don’t just want to talk about it. We want to try and make sure that it’s tangible. \n\nI was appointed as Goodwill Ambassador for UN Women six months ago. And, the more I spoke about feminism, the more I realized that fighting for women’s rights has too often become synonymous with man-hating. If there is one thing I know for certain, it is that this has to stop. \n\nFor the record, feminism by definition is the belief that men and women should have equal rights and opportunities. It is the theory of political, economic and social equality of the sexes.'),
 Document(metadata={'source': 'speech.txt'}, page_co

In [7]:
embeddings = OllamaEmbeddings(model="mxbai-embed-large:latest")
vectordb = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    persist_directory="vectordb"
)

In [8]:
vectordb

<langchain_chroma.vectorstores.Chroma at 0x778de13b34c0>

In [9]:
## Query the vector store

query = "When her male friends were unable to express their feelings ?"
result = vectordb.similarity_search(query)
result

[Document(id='c37cffd5-9f41-40b5-bafa-f1010439330c', metadata={'source': 'speech.txt'}, page_content='I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'),
 Document(id='b923394f-814b-4ed7-a868-6ec2ae8e5e2b', metadata={'source': 'speech.txt'},

In [10]:
result[0].page_content

'I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'
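
Conceptually, `similarity_search` embeds the query with the same model used for the documents and returns the chunks whose vectors are closest to it. The toy example below sketches that ranking step with hand-written 2-D vectors. The assumptions are loud ones: `cosine_similarity` and `top_k` are hypothetical helpers, cosine similarity is just one common choice (the metric Chroma actually uses depends on how the collection is configured), and a real store uses an approximate index rather than a full scan. The default `k=4` mirrors what I understand to be LangChain's default for `similarity_search`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=4):
    """Return indices of the k vectors most similar to query_vec (brute force)."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings": index 0 and 2 point in nearly the same direction as the query.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
print(top_k([1.0, 0.0], doc_vecs, k=2))  # [0, 2]
```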

In [11]:
## Load the persisted vector store from disk
db2 = Chroma(persist_directory="vectordb", embedding_function=embeddings)
db2

<langchain_chroma.vectorstores.Chroma at 0x778de13b2dd0>

In [12]:
docs = db2.similarity_search(query)
docs

[Document(id='c37cffd5-9f41-40b5-bafa-f1010439330c', metadata={'source': 'speech.txt'}, page_content='I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'),
 Document(id='b923394f-814b-4ed7-a868-6ec2ae8e5e2b', metadata={'source': 'speech.txt'},

In [13]:
## Retriever options
retriever = vectordb.as_retriever()
# invoke() returns a list of Documents; keep only the top match
doc = retriever.invoke(query)[0]
doc.page_content


'I started questioning gender-based assumptions a long time ago. When I was 8, I was confused for being called bossy because I wanted to direct the plays that we would put on for our parents, but the boys were not. When at 14, I started to be sexualized by certain elements of the media. When at 15, my girlfriends started dropping out of sports teams because they didn’t want to appear muscly. When at 18, my male friends were unable to express their feelings. \n\nI decided that I was a feminist, and this seemed uncomplicated to me. But my recent research has shown me that feminism has become an unpopular word. Women are choosing not to identify as feminists. Apparently, I’m among the ranks of women whose expressions are seen as too strong, too aggressive, isolating, and anti-men. Unattractive, even.'
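
Besides the default similarity search, `as_retriever` also accepts a `search_type` such as `"mmr"` (maximal marginal relevance), which trades some relevance for diversity among the returned chunks. The sketch below shows the MMR idea on toy vectors in plain Python. It is an illustration under assumptions, not LangChain's or Chroma's code: `cosine` and `mmr` are hypothetical helpers, and the `lambda_mult` name is chosen here only to label the relevance/diversity weight.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Pick k indices, balancing relevance to the query against redundancy.

    lambda_mult=1.0 reduces to plain similarity ranking; lower values favor diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected vector (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Vectors 0 and 1 are near-duplicates; vector 2 is less relevant but different.
doc_vecs = [[1.0, 0.1], [1.0, 0.12], [0.7, -0.7]]
print(mmr([1.0, 0.0], doc_vecs, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr([1.0, 0.0], doc_vecs, k=2, lambda_mult=1.0))  # [0, 1]
```

With the balanced weight, the near-duplicate at index 1 is skipped in favor of the more distinct vector at index 2; with pure relevance, the two near-duplicates are returned.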