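# Example: using ChromaDB as an embedding database

This tutorial shows how to build a small vector (embedding) database with ChromaDB: it stores two song lyrics as documents, embeds them with a Hugging Face model, and runs a similarity query against them.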

In [1]:
import chromadb

In [2]:
# the first document, text to store in the database
strSantaClaus = '''You better watch out, you better not cry
You better not pout, I'm telling you why
Santa Claus is coming to town
He's making a list and checking it twice
He's gonna find out who's naughty and nice
Santa Claus is coming to town
He sees you when you're sleeping
He knows when you're awake
He knows if you've been bad or good
So be good for goodness' sake
Oh, you better watch out, you better not cry
You better not pout, I'm telling you why
Santa Claus is coming to town '''

# the second document, text to store in the database
strFrostyTheSnowman = '''Who's debonair with the tall silk hat?
Muffler of wool and a tummy that's fat?
King for a day, and he loves the road
With a broomstick cane, and a heart of gold
That's Frosty the Snowman
He's a jolly, happy soul
With a corncob pipe, and a button nose
And two eyes made out of coal
Frosty the Snowman is a fairytale they say
He was made of snow
But the children know
How he came to life one day
There must have been some magic
In that old silk hat they found
For when they placed it on his head
He began to dance around
Oh, Frosty the Snowman
Was alive as he could be
And the children say
He could laugh and play just the same as you and me
Frosty the Snowman
Knew the sun was hot that day
So, he said let's run
And we'll have some fun now before I melt away
So down to the village
With a broomstick in his hand
Running here and there all around the square
Saying, "Catch me if you can"
He led them down the streets of town
Right to the traffic cop
And he only paused a moment when
He heard him holler, "Stop!"
For Frosty the Snowman had to hurry on his way
But he waved goodbye
Saying don't you cry
I'll be back again some day, (they listening)
Thumpety, thump-thump, thumpety, thump-thump
Look at Frosty go (hey, look at him)
Thumpety, thump-thump, thumpety, thump-thump
Over the hills of snow '''

In [3]:
import chromadb.utils.embedding_functions as embedding_functions

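# Load a Hugging Face model through sentence-transformers.
# "mstaron/SingBERTa" is not a native sentence-transformers model, so the
# library wraps it with mean pooling (see the warning in the output).
# A ready-made sentence model such as "all-MiniLM-L6-v2" would also work here.
local_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="mstaron/SingBERTa"  # or path to your local model directory
)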

No sentence-transformers model found with name mstaron/SingBERTa. Creating a new one with mean pooling.
Some weights of RobertaModel were not initialized from the model checkpoint at mstaron/SingBERTa and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
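
The warning above says the model is wrapped "with mean pooling". As a rough illustration of what that means, the sketch below averages per-token vectors into a single sentence vector while skipping padding positions. The function name `mean_pool` and the toy numbers are assumptions made for this example; the real pooling runs inside sentence-transformers on tensors.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask == 1), ignoring padding."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Three token positions with 2-dimensional vectors; the last one is padding.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
attention_mask = [1, 1, 0]

sentence_vector = mean_pool(token_vectors, attention_mask)
print(sentence_vector)  # [2.0, 3.0]
```

Because the model was not trained for sentence similarity, the pooled vectors are usable but not tuned, which is worth keeping in mind when reading the distances later on.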


In [4]:
# ChromaDB in in-memory (ephemeral) mode: data is lost when the process ends
client = chromadb.EphemeralClient()

# alternative: keep the data on disk between runs
#client = chromadb.PersistentClient(path="./chroma_db")

# create a new collection (Chroma's unit of storage) that uses our embedding function
collection = client.create_collection("generative-ai", embedding_function=local_ef)

collection.add(
    # Chroma handles tokenization, embedding, and indexing automatically;
    # you can skip that and pass your own vectors via `embeddings=` instead
    documents=[strSantaClaus, strFrostyTheSnowman],
    metadatas=[{"source": "Bing Cosby"}, {"source": "Ella Fitzgerald"}], # filter on these!
    ids=["Santa", "Frosty"], # unique for each doc
)

In [5]:
# Query for the single most similar document (n_results=1).
# You can also fetch documents directly by id with collection.get
results = collection.query(
    query_texts=["Where is Santa Claus?"], # the text to search for
    n_results=1,
    # where={"metadata_field": "is_equal_to_this"}, # optional metadata filter
    where_document={"$contains":"Santa"} # only consider documents containing this text
)
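
To make the interaction between `where_document` and `n_results` concrete, here is a conceptual sketch in plain Python: candidates are first restricted to documents that contain the given text, and only the survivors are ranked by distance to the query vector. The names `toy_query` and `store`, the 2-dimensional vectors, and the use of a plain substring test are all assumptions for illustration; this is not how Chroma is implemented internally.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_query(store, query_vector, n_results, contains=None):
    """Filter by substring first, then return the n closest (id, distance) pairs."""
    scored = [
        (squared_l2(vector, query_vector), doc_id)
        for doc_id, text, vector in store
        if contains is None or contains in text
    ]
    scored.sort()
    return [(doc_id, distance) for distance, doc_id in scored[:n_results]]


# Hypothetical store: (id, document text, embedding)
store = [
    ("Santa", "Santa Claus is coming to town", [1.0, 0.0]),
    ("Frosty", "Frosty the Snowman", [0.0, 1.0]),
]

# The query vector is closer to "Frosty", but the filter only admits "Santa".
hits = toy_query(store, [0.0, 0.5], n_results=1, contains="Santa")
print(hits)  # [('Santa', 1.25)]
```

The practical consequence is that a document filter can return a hit that is not the nearest neighbour overall, so the distance of a filtered result should be read with that in mind.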

In [6]:
import pprint

print("Query results:")

print(f"ID: {results['ids'][0]}")
print(f"Metadata: {results['metadatas'][0]}")
# note: Chroma returns a distance here, so a lower value means a closer match
print(f"Score: {results['distances'][0]}")
pprint.pprint(results["documents"][0])

Query results:
ID: ['Santa']
Metadata: [{'source': 'Bing Cosby'}]
Score: [368.8697814941406]
['You better watch out, you better not cry\n'
 "You better not pout, I'm telling you why\n"
 'Santa Claus is coming to town\n'
 "He's making a list and checking it twice\n"
 "He's gonna find out who's naughty and nice\n"
 'Santa Claus is coming to town\n'
 "He sees you when you're sleeping\n"
 "He knows when you're awake\n"
 "He knows if you've been bad or good\n"
 "So be good for goodness' sake\n"
 'Oh, you better watch out, you better not cry\n'
 "You better not pout, I'm telling you why\n"
 'Santa Claus is coming to town ']
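
The large value in the output above is a distance, not a similarity. Chroma's documentation lists squared L2 as the default distance, which grows with the length of the vectors, so unnormalized embeddings can produce numbers in the hundreds. The sketch below contrasts squared L2 with cosine distance on two toy vectors that point in the same direction; the helper names and the vectors are assumptions for illustration.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance: sensitive to vector length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [3.0, 4.0]
b = [30.0, 40.0]  # same direction as a, ten times longer

print(squared_l2(a, b))       # 2025.0
print(cosine_distance(a, b))  # 0.0
```

If a scale-independent measure is preferred, Chroma can be configured per collection to use cosine distance; older releases document this as a `"hnsw:space"` entry in the collection metadata, but the exact option depends on the installed version and should be checked against its documentation.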


In [None]:
# set up persistent client

