# Chroma DB Server

Source:

Rany ElHousieny, Exploring Chroma DB: A Python Approach in Jupyter Notebooks

https://www.linkedin.com/pulse/exploring-chroma-db-python-approach-jupyter-notebooks-rany-dn3bc/

In [2]:
# !pip install chromadb

#### Run in a terminal window:

chroma run --path /Users/mjack6/chromadb

In [5]:
#!chroma run --path /Users/mjack6/chromadb  # Or from a notebook cell (the cell blocks while the server runs)

In [3]:
import chromadb
# With no arguments, HttpClient connects to localhost on port 8000.
client = chromadb.HttpClient()

In [5]:
collection = client.create_collection("new_embeddings")

**client:** The client object is the interface between the Python script and the Chroma DB server. The script uses it to send commands to the database.

**.create_collection():** A method of the Chroma DB client that creates a new collection. A collection in Chroma DB (as in many NoSQL databases) is analogous to a table in a relational database: a container for related data, in this case embeddings.

**"new_embeddings":** The name of the new collection. The collection will store embeddings, which are high-dimensional vectors that represent complex data such as text or images in a form that a machine learning model can process.
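To make the idea of comparing embeddings concrete, here is a minimal sketch in plain Python. The three-dimensional vectors, the names `toy_embeddings` and `query_vector`, and the helper `squared_l2` are all made up for illustration, and the squared Euclidean (L2) distance is assumed here to match the metric Chroma uses by default.

```python
# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
# All vectors and names here are hypothetical, for illustration only.
toy_embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
}
query_vector = [0.9, 0.1, 0.0]


def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Rank documents by distance to the query (smaller means more similar).
ranked = sorted(
    toy_embeddings,
    key=lambda name: squared_l2(toy_embeddings[name], query_vector),
)
print(ranked)
```

A vector database does the same kind of ranking, but over many high-dimensional vectors and usually with an approximate index rather than a full sort.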

In [6]:
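documents = ["The quick brown fox", "Jumps over the lazy dog"]
metadatas = [{"text_length": 19}, {"text_length": 23}]  # character count of each document
ids = ["doc1", "doc2"]  # one unique ID per document

# No embeddings are passed, so they are computed from the documents by the collection's embedding function.
collection.add(documents=documents, metadatas=metadatas, ids=ids)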

/Users/mjack6/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:10<00:00, 7.93MiB/s]


In [7]:
query_results = collection.query(query_texts=["Find documents similar to this text"], n_results=2)

print(query_results)

{'ids': [['doc1', 'doc2']], 'distances': [[1.8206077221353594, 1.890621868768198]], 'embeddings': None, 'metadatas': [[{'text_length': 19}, {'text_length': 23}]], 'documents': [['The quick brown fox', 'Jumps over the lazy dog']], 'uris': None, 'data': None, 'included': ['distances', 'documents', 'metadatas']}
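The result is a dictionary of nested lists: the outer list has one entry per query text, and each inner list has one entry per hit. The sketch below flattens the printed result into one record per hit. It uses a copy of the output above rather than a live query, and `flatten_hits` is a hypothetical helper, not part of the Chroma API.

```python
# Result dictionary copied from the printed output (fields that were None are omitted).
query_results = {
    "ids": [["doc1", "doc2"]],
    "distances": [[1.8206077221353594, 1.890621868768198]],
    "metadatas": [[{"text_length": 19}, {"text_length": 23}]],
    "documents": [["The quick brown fox", "Jumps over the lazy dog"]],
}


def flatten_hits(results, query_index=0):
    """Return one dict per hit for the query at position query_index."""
    return [
        {"id": i, "distance": d, "metadata": m, "document": doc}
        for i, d, m, doc in zip(
            results["ids"][query_index],
            results["distances"][query_index],
            results["metadatas"][query_index],
            results["documents"][query_index],
        )
    ]


hits = flatten_hits(query_results)
for hit in hits:
    print(f'{hit["id"]}: {hit["document"]!r} (distance {hit["distance"]:.3f})')
```

Since only one query text was sent, everything of interest sits at index 0 of each outer list.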


In [8]:
query_results = collection.query(
    query_texts=["Find documents similar to this text"],
    n_results=2,
    include=['embeddings']  # return embeddings instead of the default fields; or include=['metadatas']
)

In [9]:
print(query_results)

{'ids': [['doc1', 'doc2']], 'distances': None, 'embeddings': [array([[ 2.76630162e-03,  3.32646891e-02, -6.87495165e-04,
         4.29980755e-02,  3.61491181e-02, -3.33429165e-02,
         5.55060990e-02, -1.04811341e-01,  1.37403337e-02,
        -1.24267042e-02,  6.47485629e-03, -3.19384336e-02,
        -6.04878180e-02,  1.06673902e-02, -3.22642773e-02,
        -2.86279786e-02, -5.72624709e-03, -5.08103594e-02,
        -2.72680540e-03, -4.73132767e-02, -1.44443437e-01,
         5.21714706e-03,  1.17100147e-03,  4.02128212e-02,
        -5.35303392e-02, -4.14886698e-02, -4.10799272e-02,
         1.36552751e-02,  6.58519790e-02, -1.75081670e-01,
        -1.50110573e-02,  5.92095330e-02,  2.02138741e-02,
         2.12893188e-02, -8.53960812e-02, -5.25634177e-02,
         4.53948304e-02, -3.43269855e-03,  2.79617179e-02,
         5.03437631e-02,  2.13018227e-02, -2.32071802e-02,
        -1.28373858e-02,  7.35622868e-02, -1.11349538e-01,
         2.08493005e-02, -5.61520420e-02,  8.51751957

#### Advanced Querying with Metadata Filters:

In [10]:
filtered_results = collection.query(
    query_texts=["Find documents similar to this text"],
    n_results=2,
    where={"text_length": {"$gt": 20}}  # keep only documents with text_length > 20
)

print(filtered_results)

{'ids': [['doc2']], 'distances': [[1.890621868768198]], 'embeddings': None, 'metadatas': [[{'text_length': 23}]], 'documents': [['Jumps over the lazy dog']], 'uris': None, 'data': None, 'included': ['distances', 'documents', 'metadatas']}
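Only `doc2` is returned because its `text_length` of 23 exceeds 20, while `doc1` has 19. The following plain-Python sketch mimics how such a `where` condition selects records. It is an illustration of the filter semantics only, not Chroma's implementation; `OPERATORS` and `matches` are hypothetical names, and only simple comparison operators are covered.

```python
import operator

# Comparison operators handled by this sketch.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$eq": operator.eq,
    "$ne": operator.ne,
}


def matches(metadata, where):
    """Return True if metadata satisfies every {field: {op: value}} condition."""
    for field, condition in where.items():
        if field not in metadata:
            return False
        for op, value in condition.items():
            if not OPERATORS[op](metadata[field], value):
                return False
    return True


# Metadata of the two documents added earlier in the notebook.
records = {"doc1": {"text_length": 19}, "doc2": {"text_length": 23}}
where = {"text_length": {"$gt": 20}}

kept = [doc_id for doc_id, meta in records.items() if matches(meta, where)]
print(kept)  # ['doc2']
```

In the real query, the similarity ranking is combined with this metadata filter, which is why `n_results=2` still yields a single hit.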
