# ChromaDB Experiment Example

## Installations

In [1]:
# !pip install --quiet --force-reinstall prompttools

## Set up imports and API keys

We'll import the relevant `prompttools` modules to set up our experiment.

In [2]:
from prompttools.experiment import ChromaDBExperiment
import chromadb

## Run an experiment

One common use case is to compare two embedding functions and see how each one affects document retrieval. We define the embedding functions we'd like to test here.

Note: If you haven't previously downloaded these embedding models, this may kick off downloads.

In [3]:
from chromadb.utils import embedding_functions


emb_fns = [
    embedding_functions.SentenceTransformerEmbeddingFunction(model_name="paraphrase-MiniLM-L3-v2"),
    embedding_functions.DefaultEmbeddingFunction(),
]  # default is "all-MiniLM-L6-v2"
emb_fns_names = ["paraphrase-MiniLM-L3-v2", "default"]



Next, we create our test inputs. In this case, we would like to create a new ChromaDB collection.

During the experiment, for each embedding function, a new ChromaDB collection will be temporarily created. The documents will be added into it. Then, we will query from it and examine the results.

In [4]:
chroma_client = chromadb.Client()
# You can also create and use `chromadb.PersistentClient` or `chromadb.HttpClient`
TEST_COLLECTION_NAME = "TEMPORARY_COLLECTION"
try:
    chroma_client.delete_collection(TEST_COLLECTION_NAME)
except Exception:
    pass
collection_name = TEST_COLLECTION_NAME

use_existing_collection = False  # Specify that we want to create a collection during the experiment

# Documents that will be added into the database
add_to_collection_params = {
    "documents": ["This is a document", "This is another document", "This is the document."],
    "metadatas": [{"source": "my_source"}, {"source": "my_source"}, {"source": "my_source"}],
    "ids": ["id1", "id2", "id3"],
}

# Our test queries
query_collection_params = {"query_texts": ["This is a query document", "This is a another query document"]}


# Set up the experiment
experiment = ChromaDBExperiment(
    chroma_client,
    collection_name,
    use_existing_collection,
    query_collection_params,
    emb_fns,
    emb_fns_names,
    add_to_collection_params,
)

We can then run the experiment to get results.

In [5]:
experiment.run()



We can visualize the result. In this case, the two embedding functions rank the documents differently for the second query, "This is a another query document":

paraphrase-MiniLM-L3-v2: [id2, id3, id1]

default (all-MiniLM-L6-v2): [id2, id1, id3]

In [6]:
experiment.visualize()

Unnamed: 0,query_texts,embed_fn,top doc ids,distances,documents,latency
0,This is a query document,paraphrase-MiniLM-L3-v2,"[id1, id3, id2]","[14.106966018676758, 14.294026374816895, 18.137874603271484]","[This is a document, This is the document., This is another document]",0.008508
1,This is a another query document,paraphrase-MiniLM-L3-v2,"[id2, id3, id1]","[13.375584602355957, 16.815608978271484, 16.913410186767578]","[This is another document, This is the document., This is a document]",0.006097
2,This is a query document,default,"[id1, id3, id2]","[0.7111212611198425, 0.8084275126457214, 1.010977029800415]","[This is a document, This is the document., This is another document]",0.020234
3,This is a another query document,default,"[id2, id1, id3]","[0.7673601508140564, 0.8709302544593811, 0.9072309732437134]","[This is another document, This is a document, This is the document.]",0.020889
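
For larger result sets, spotting such differences by eye gets tedious. The following is a minimal sketch, not part of `prompttools`, that flags the queries on which the embedding functions disagree. The `results` dictionary and the helper name `queries_with_different_rankings` are hypothetical; the rankings are copied by hand from the table above.

```python
# Rankings copied by hand from the table above:
# (query, embedding function name) -> top doc ids.
results = {
    ("This is a query document", "paraphrase-MiniLM-L3-v2"): ["id1", "id3", "id2"],
    ("This is a another query document", "paraphrase-MiniLM-L3-v2"): ["id2", "id3", "id1"],
    ("This is a query document", "default"): ["id1", "id3", "id2"],
    ("This is a another query document", "default"): ["id2", "id1", "id3"],
}


def queries_with_different_rankings(results):
    """Return the queries for which the embedding functions disagree on the ranking."""
    rankings_by_query = {}
    for (query, _embed_fn), ranking in results.items():
        # Collect the distinct rankings seen for each query.
        rankings_by_query.setdefault(query, set()).add(tuple(ranking))
    return [query for query, rankings in rankings_by_query.items() if len(rankings) > 1]


differing = queries_with_different_rankings(results)
print(differing)  # ['This is a another query document']
```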


## Evaluate the model response

To evaluate the results, we'll define an evaluation function. Sometimes you know the order in which the most relevant documents should be returned for a given query, and you can compute the correlation between the expected ranking and the actual ranking.

Note: there is a built-in version of this function that you can import (scroll further below to see an example).

In [7]:
import scipy.stats as stats

# For each query, you can define what the expected ranking is.
EXPECTED_RANKING = {
    "This is a query document": ["id1", "id3", "id2"],
    "This is a another query document": ["id2", "id3", "id1"],
}


def measure_correlation(row: "pandas.core.series.Series", ranking_column_name: str = "top doc ids") -> float:
    r"""
    A simple test that compares the expected ranking for a given query with the actual ranking produced
    by the embedding function being tested.
    """
    input_query = row["query_texts"]
    correlation, _ = stats.spearmanr(row[ranking_column_name], EXPECTED_RANKING[input_query])
    return correlation

Finally, we can evaluate and visualize the results.

In [8]:
experiment.evaluate("ranking_correlation", measure_correlation)

In [9]:
experiment.visualize()

Unnamed: 0,query_texts,embed_fn,top doc ids,distances,documents,latency,ranking_correlation
0,This is a query document,paraphrase-MiniLM-L3-v2,"[id1, id3, id2]","[14.106966018676758, 14.294026374816895, 18.137874603271484]","[This is a document, This is the document., This is another document]",0.008508,1.0
1,This is a another query document,paraphrase-MiniLM-L3-v2,"[id2, id3, id1]","[13.375584602355957, 16.815608978271484, 16.913410186767578]","[This is another document, This is the document., This is a document]",0.006097,1.0
2,This is a query document,default,"[id1, id3, id2]","[0.7111212611198425, 0.8084275126457214, 1.010977029800415]","[This is a document, This is the document., This is another document]",0.020234,1.0
3,This is a another query document,default,"[id2, id1, id3]","[0.7673601508140564, 0.8709302544593811, 0.9072309732437134]","[This is another document, This is a document, This is the document.]",0.020889,-1.0
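
To see where the -1.0 in the last row comes from, here is a hand-rolled sketch of Spearman's rank correlation for the tie-free case. It assumes the string IDs are ranked in sorted order before being correlated, which is consistent with the values shown above; the helper names `to_ranks` and `spearman_no_ties` are hypothetical and are not part of `scipy` or `prompttools`.

```python
def to_ranks(values):
    """Rank distinct values by sorted order (1 = smallest); assumes there are no ties."""
    order = {value: rank for rank, value in enumerate(sorted(values), start=1)}
    return [order[value] for value in values]


def spearman_no_ties(actual, expected):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties."""
    n = len(actual)
    d_squared = sum((a - e) ** 2 for a, e in zip(to_ranks(actual), to_ranks(expected)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))


actual = ["id2", "id1", "id3"]    # ranking returned by the default embedding function
expected = ["id2", "id3", "id1"]  # expected ranking for the second query

# Ranks are [2, 1, 3] and [2, 3, 1], so sum(d^2) = 0 + 4 + 4 = 8.
rho = spearman_no_ties(actual, expected)
print(rho)  # -1.0
```

Note that the score drops to -1.0 even though both rankings agree on the top document: with only three documents, swapping the last two positions is enough to reverse the correlation.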


You can also import the built-in version of the rank correlation function.

In [10]:
from prompttools.utils import ranking_correlation

EXPECTED_RANKING_LIST = [
    ["id1", "id3", "id2"],
    ["id2", "id3", "id1"],
    ["id1", "id3", "id2"],
    ["id2", "id3", "id1"],
]

experiment.run()
experiment.evaluate("ranking_correlation", ranking_correlation, expected_ranking=EXPECTED_RANKING_LIST)
experiment.visualize()



Unnamed: 0,query_texts,embed_fn,top doc ids,distances,documents,latency,ranking_correlation
0,This is a query document,paraphrase-MiniLM-L3-v2,"[id1, id3, id2]","[14.106966018676758, 14.294026374816895, 18.137874603271484]","[This is a document, This is the document., This is another document]",0.005949,1.0
1,This is a another query document,paraphrase-MiniLM-L3-v2,"[id2, id3, id1]","[13.375584602355957, 16.815608978271484, 16.913410186767578]","[This is another document, This is the document., This is a document]",0.006486,1.0
2,This is a query document,default,"[id1, id3, id2]","[0.7111212611198425, 0.8084275126457214, 1.010977029800415]","[This is a document, This is the document., This is another document]",0.020484,1.0
3,This is a another query document,default,"[id2, id1, id3]","[0.7673601508140564, 0.8709302544593811, 0.9072309732437134]","[This is another document, This is a document, This is the document.]",0.021018,-1.0
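
The expected-ranking list used here appears to hold one entry per result row, ordered by embedding function and then by query. Assuming that row order, the list can be derived from a per-query dictionary instead of being written out by hand. This is only a sketch; the variable name `expected_ranking_list` is illustrative.

```python
# Expected ranking for each query, keyed by query text.
EXPECTED_RANKING = {
    "This is a query document": ["id1", "id3", "id2"],
    "This is a another query document": ["id2", "id3", "id1"],
}
emb_fns_names = ["paraphrase-MiniLM-L3-v2", "default"]
query_texts = ["This is a query document", "This is a another query document"]

# One expected ranking per result row: outer loop over embedding functions,
# inner loop over queries (assumed to match the row order of the results table).
expected_ranking_list = [
    EXPECTED_RANKING[query] for _name in emb_fns_names for query in query_texts
]
print(expected_ranking_list)
```

If the experiment's row order differs from this assumption, the list order needs to be adjusted to match.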


You can also use auto evaluation. We will add an example of this in the near future.