# Chroma DB Tests

You can configure Chroma to save and load from your local machine. Data will be persisted automatically and loaded on start (if it exists).

In [4]:
import chromadb

client = chromadb.PersistentClient(path="playDB")

client.heartbeat() # returns a nanosecond heartbeat. Useful for making sure the client remains connected.

1711129986311719300
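The heartbeat value above looks like the current time in nanoseconds since the Unix epoch, so it can be turned into a readable timestamp. A minimal sketch under that assumption; `heartbeat_to_datetime` is a hypothetical helper and not part of Chroma.

```python
from datetime import datetime, timezone


def heartbeat_to_datetime(heartbeat_ns: int) -> datetime:
    """Convert a nanosecond epoch timestamp to a UTC datetime (hypothetical helper)."""
    # Integer arithmetic avoids float rounding on a 19-digit value.
    seconds, nanos = divmod(heartbeat_ns, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only stores microseconds, so the last three digits are dropped.
    return base.replace(microsecond=nanos // 1000)


stamp = heartbeat_to_datetime(1711129986311719300)
print(stamp.isoformat())
```

Comparing two successive heartbeats in this form is a quick way to confirm that the client is still responding.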

The embedding function takes text as input and performs tokenization and embedding. If no embedding function is supplied, Chroma uses the Sentence Transformers `all-MiniLM-L6-v2` model by default.

Existing collections can be retrieved by name with `.get_collection`, and deleted with `.delete_collection`. You can also use `.get_or_create_collection` to get a collection if it exists, or create it if it doesn't.

In [23]:
# get_or_create_collection returns the collection if it already exists and
# creates it otherwise, so there is no need to inspect client.list_collections()
# (indexing [0] there fails on an empty database and only checks the first entry).
collection = client.get_or_create_collection(name="my_collection")

In [25]:
collection.peek() # returns a list of the first 10 items in the collection
collection.count() # returns the number of items in the collection
collection.modify(name="new_name") # Rename the collection

In [26]:
collection = client.create_collection(
        name="collection_name",
        metadata={"hnsw:space": "cosine"} # "l2" (squared L2) is the default; other options are "ip" (inner product) and "cosine"
    )
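To make the three `hnsw:space` options more concrete, here is a plain-Python sketch of the distance each one corresponds to, assuming the usual definitions (squared L2, one minus the inner product, one minus the cosine similarity). The function names are illustrative and are not Chroma APIs; Chroma computes these inside its index.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance, the metric behind "l2"."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a, b):
    """One minus the dot product, the metric behind "ip"."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity, the metric behind "cosine"."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 1.0]  # orthogonal to a
c = [2.0, 0.0]  # same direction as a, twice the length

for name, fn in [("l2", squared_l2), ("ip", inner_product_distance), ("cosine", cosine_distance)]:
    print(name, fn(a, b), fn(a, c))
```

Cosine distance ignores vector length (`a` and `c` are at distance 0), while squared L2 and inner product do not, which is the main reason to choose one space over another.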

### Adding data 

If Chroma is passed a list of `documents`, it will automatically tokenize and embed them with the collection's embedding function. Chroma will also store the `documents` themselves. If a document is too large to embed with the chosen embedding function, an exception is raised.

Alternatively, you can supply a list of document-associated `embeddings` directly, in which case Chroma stores the documents without embedding them itself.

In [31]:
collection.add(
    documents=["tiger is a animal", "cat is close family of tiger", "dogs are wolves"],
    # embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2]],
    metadatas=[{"wild":1}, {"domestic":2}, {"kinda cute":3}],
    ids=["id1", "id2", "id3"]
)

Add of existing embedding ID: id1
Add of existing embedding ID: id2
Add of existing embedding ID: id3
Insert of existing embedding ID: id1
Insert of existing embedding ID: id2
Insert of existing embedding ID: id3


Trying to `.add` an ID that already exists keeps only the initial value, as the log messages above show. An optional list of `metadatas` dictionaries can be supplied, one per document.

**You can also store documents elsewhere, and just supply a list of embeddings and metadata to Chroma**

### Querying a Collection 

In [34]:
collection.query(
    query_texts=["tiger"],  # a list of query strings; results come back as one list per query
    # query_embeddings=[[11.1, 12.1, 13.1],[1.1, 2.3, 3.2], ...],
    n_results=1,
    # where={"wild": 1},
    # where_document={"$contains":"search_string"}
)

{'ids': [['id1']],
 'distances': [[0.14945050916109792]],
 'metadatas': [[{'wild': 1}]],
 'embeddings': None,
 'documents': [['tiger is a animal']],
 'uris': None,
 'data': None}
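Every field in the result is nested one level per query text, which is why the single hit appears as `[['id1']]`. The sketch below flattens the result printed above into one record per hit; `hits_for_query` is a hypothetical helper written for this notebook, not a Chroma function.

```python
# The result dictionary exactly as printed by the query above.
result = {
    "ids": [["id1"]],
    "distances": [[0.14945050916109792]],
    "metadatas": [[{"wild": 1}]],
    "embeddings": None,
    "documents": [["tiger is a animal"]],
    "uris": None,
    "data": None,
}


def hits_for_query(result, query_index=0):
    """Flatten the results of one query into a list of per-item dicts."""
    return [
        {"id": item_id, "distance": distance, "metadata": metadata, "document": document}
        for item_id, distance, metadata, document in zip(
            result["ids"][query_index],
            result["distances"][query_index],
            result["metadatas"][query_index],
            result["documents"][query_index],
        )
    ]


hits = hits_for_query(result)
for hit in hits:
    print(hit["id"], round(hit["distance"], 3), hit["document"])
```

With several query texts, `query_index` selects which query's hits to unpack.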

### Using Where filters

Chroma supports filtering queries by metadata and document contents. The `where` filter is used to filter by metadata, and the `where_document` filter is used to filter by document contents.

https://docs.trychroma.com/usage-guide#using-where-filters
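To show what a `where` dictionary expresses, here is a simplified plain-Python stand-in that evaluates a filter against the metadata used earlier in this notebook. It covers only a subset of operators (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$and`, `$or`) and is not Chroma's implementation. In particular, treating a missing key as a non-match is an assumption of this sketch.

```python
import operator

# Comparison operators supported by this sketch.
COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata, where):
    """Return True if `metadata` satisfies `where` (simplified stand-in)."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            # A bare value is shorthand for {"$eq": value}.
            if not isinstance(condition, dict):
                condition = {"$eq": condition}
            # Assumption: a key absent from the metadata never matches.
            if key not in metadata:
                return False
            for op, operand in condition.items():
                if not COMPARATORS[op](metadata[key], operand):
                    return False
    return True


# Metadata from the collection.add(...) call earlier in this notebook.
items = {
    "id1": {"wild": 1},
    "id2": {"domestic": 2},
    "id3": {"kinda cute": 3},
}


def filter_ids(where):
    """Return the IDs whose metadata satisfies the filter."""
    return [item_id for item_id, meta in items.items() if matches(meta, where)]


print(filter_ids({"wild": 1}))
print(filter_ids({"$or": [{"wild": 1}, {"domestic": {"$gte": 2}}]}))
```

The same dictionaries can be passed as the `where` argument of `collection.query`, as in the commented-out line of the query cell.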


### Updating data in a collection

If an `id` is not found in the collection, an error is logged and the update is ignored. If `documents` are supplied without corresponding `embeddings`, the embeddings are recomputed with the collection's embedding function.

In [40]:
collection.update(
    ids=["id1"],
    # embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
    # metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}],
    documents=["doc1"],
)

Chroma also supports an `upsert` operation, which updates existing items, or adds them if they don't yet exist.

In [41]:
collection.upsert(
    ids=["id4"],
    # embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...],
    # metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
    documents=["fish in the bowl"],
)
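The difference between `add`, `update`, and `upsert`, as described in this notebook, can be summarized with a toy dictionary-backed store. `TinyStore` is purely illustrative: it keeps only documents, skips embeddings entirely, and is not how Chroma is implemented.

```python
class TinyStore:
    """Toy stand-in that mimics the ID handling described in this notebook."""

    def __init__(self):
        self.items = {}

    def add(self, ids, documents):
        # Existing IDs keep their initial value; only new IDs are inserted.
        for item_id, document in zip(ids, documents):
            self.items.setdefault(item_id, document)

    def update(self, ids, documents):
        # Unknown IDs are ignored; known IDs are overwritten.
        for item_id, document in zip(ids, documents):
            if item_id in self.items:
                self.items[item_id] = document

    def upsert(self, ids, documents):
        # Known IDs are overwritten and unknown IDs are inserted.
        for item_id, document in zip(ids, documents):
            self.items[item_id] = document


store = TinyStore()
store.add(ids=["id1"], documents=["tiger is a animal"])
store.add(ids=["id1"], documents=["ignored duplicate"])   # keeps the first value
store.update(ids=["id9"], documents=["no such id"])       # ignored
store.update(ids=["id1"], documents=["doc1"])             # overwrites id1
store.upsert(ids=["id4"], documents=["fish in the bowl"])  # inserts id4
print(store.items)
```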

### Deleting data from a collection

Chroma supports deleting items from a collection by `id` using `.delete`. The `embeddings`, `documents`, and `metadata` associated with each item will be deleted. `.delete` also accepts a `where` filter, as in the commented-out line below.

In [42]:
collection.delete(
    ids=["id1"],
    # where={"chapter": "20"}
)