## How does Chroma DB work?
https://www.analyticsvidhya.com/blog/2023/07/guide-to-chroma-db-a-vector-store-for-your-generative-ai-llms/

Chroma DB is an open-source vector database. It stores embeddings (numeric vectors) together with the documents and metadata they describe, and retrieves the entries closest to a query vector. Its main parts work as follows:

- Data Structure: Data is organized into collections. Each entry in a collection has a unique ID and an embedding, plus an optional document (the original text) and an optional metadata dictionary.
- Storage: Chroma stores the embeddings with their documents and metadata, either in memory or on local disk.
- Indexing: Chroma builds a vector index over the embeddings (HNSW, an approximate nearest-neighbor index), so similar vectors can be found without comparing the query against every stored vector.
- Querying: Users query a collection with one or more embeddings (or with text, which Chroma embeds first) and get back the n_results nearest entries. Results can also be filtered by metadata.
- Analysis: Each match comes with a distance, so results can be ranked or thresholded for applications such as semantic search, recommendation, and retrieval-augmented generation with LLMs.
- Optimization: Because the index is approximate, queries stay fast as a collection grows, at the cost of occasionally missing a true nearest neighbor.
- Integration: Chroma can run embedded in a Python process or as a separate server, and it integrates with frameworks such as LangChain and LlamaIndex.
- Continued Improvement: Chroma is under active development, so APIs and defaults can change between releases. Check the documentation for the version you have installed.
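
To make the Indexing and Querying steps concrete, here is a minimal sketch of exact nearest-neighbor search in plain Python. It is not Chroma's implementation: the helper names squared_l2 and nearest and the toy vectors are assumptions made for illustration. It scans every stored vector, which is exactly the work an approximate index is meant to avoid.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, stored, n_results=2):
    """Return the n_results closest (distance, id) pairs, nearest first.

    `stored` maps an id to its embedding (hypothetical toy structure).
    Every vector is compared against the query: a brute-force scan.
    """
    scored = ((squared_l2(query, emb), doc_id) for doc_id, emb in stored.items())
    return heapq.nsmallest(n_results, scored)


# Toy data, not taken from the notebook.
stored = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.9, 0.1, 0.0],
}

matches = nearest([1.0, 0.0, 0.0], stored, n_results=2)
for distance, doc_id in matches:
    print(doc_id, distance)
```

The scan costs one distance computation per stored vector for every query, which is fine for a handful of vectors but not for millions.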

## Steps
https://colab.research.google.com/drive/1QEzFyqnoFxq7LUGyP1vzR4iLt9PpCDXv?usp=sharing

1. Create the client
2. Create the collection
3. Load data
4. Query data by nearest embedding

In [None]:
import chromadb

# In-memory client: the data is lost when the process exits.
client = chromadb.Client()
# Fetch the collection named "test", creating it if it does not exist yet.
collection = client.get_or_create_collection("test")


Add 8 documents, each with its own embedding, metadata, and ID.

In [None]:
collection.add(
    embeddings=[
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
    ],
    metadatas=[
        {"uri": "img1.png", "style": "style1"},
        {"uri": "img2.png", "style": "style2"},
        {"uri": "img3.png", "style": "style1"},
        {"uri": "img4.png", "style": "style1"},
        {"uri": "img5.png", "style": "style1"},
        {"uri": "img6.png", "style": "style1"},
        {"uri": "img7.png", "style": "style1"},
        {"uri": "img8.png", "style": "style1"},
    ],
    documents=["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"],
    ids=["id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8"],
)

In [None]:
query_result = collection.query(query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]], n_results=2)

In [None]:
query_result

## Result Interpretation

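[1.1, 2.3, 3.2] -> two of id1, id3, id5, id7 (for example id1 and id3)

[5.1, 4.3, 2.2] -> two of id2, id4, id6, id8 (for example id2 and id8)

Because n_results is 2, the query returns only the 2 nearest vectors for each query embedding. Several stored embeddings are identical, so they are tied at the same distance, and which of the tied entries come back is not guaranteed.

The distances are squared Euclidean (L2) distances, the default for a collection. The first query matches its neighbors exactly, so its distance is about 0. The second query is (5.1 - 4.5)^2 + (4.3 - 6.9)^2 + (2.2 - 4.4)^2 = 11.96 away from its neighbors.

The result contains:
- ids
- distances
- metadatas
- documents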
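
The nested lists in the result can be awkward to read, so the following sketch flattens them into one row per match and rechecks one distance by hand. The dictionary is copied from the query output shown at the end of this notebook. The helpers flatten and squared_l2 are hypothetical names, and the check assumes the collection uses squared Euclidean (L2) distance, which the numbers are consistent with.

```python
# Result copied from the persistent-client query output in this notebook
# (keys that were None in that output are omitted here).
query_result = {
    "ids": [["id1", "id3"], ["id2", "id8"]],
    "distances": [
        [5.1159076593562386e-15, 5.1159076593562386e-15],
        [11.960000915527363, 11.960000915527363],
    ],
    "metadatas": [
        [{"style": "style1", "uri": "img1.png"}, {"style": "style1", "uri": "img3.png"}],
        [{"style": "style2", "uri": "img2.png"}, {"style": "style1", "uri": "img8.png"}],
    ],
    "documents": [["doc1", "doc3"], ["doc2", "doc8"]],
}


def flatten(result):
    """Turn the per-query nested lists into one dict per match."""
    rows = []
    for query_index, ids in enumerate(result["ids"]):
        for rank, doc_id in enumerate(ids):
            rows.append({
                "query": query_index,  # position of the query embedding
                "rank": rank,          # 0 is the nearest match
                "id": doc_id,
                "distance": result["distances"][query_index][rank],
                "metadata": result["metadatas"][query_index][rank],
                "document": result["documents"][query_index][rank],
            })
    return rows


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


rows = flatten(query_result)
for row in rows:
    print(row["query"], row["rank"], row["id"], row["distance"], row["metadata"]["uri"])

# Recompute the distance from the second query to the stored vector of id2.
recomputed = squared_l2([5.1, 4.3, 2.2], [4.5, 6.9, 4.4])
print(recomputed)
```

The recomputed value differs from the stored one only in the last few digits, which is what single-precision rounding of the embeddings would produce.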

## Another Example

Persistent Client

To create a local persistent client, use the PersistentClient class. This client stores all data locally in a directory on your machine at the path you specify.

Parameters:

- path - a local path on the machine where Chroma is running. If the path does not exist, it will be created. The path can be relative or absolute. If the path is not specified, the default is ./chroma in the current working directory.
- settings - Chroma settings object.
- tenant - the tenant to use. Default is default_tenant.
- database - the database to use. Default is default_database.

Uses of Persistent Client
The persistent client is useful for:

- Local development: You can use the persistent client to develop locally and test out ChromaDB.
- Embedded applications: You can use the persistent client to embed ChromaDB in your application. For example, if you are building a web application, you can use the persistent client to store data locally on the server.

The data is saved in a SQLite file (chroma.sqlite3) inside the directory given by path.

In [None]:
import chromadb
from chromadb.config import DEFAULT_TENANT, DEFAULT_DATABASE, Settings

In [None]:
client = chromadb.PersistentClient(
    path="test",
    settings=Settings(),
    tenant=DEFAULT_TENANT,
    database=DEFAULT_DATABASE,
)

In [None]:
collection = client.get_or_create_collection("test")

In [None]:
collection.add(
    embeddings=[
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
        [1.1, 2.3, 3.2],
        [4.5, 6.9, 4.4],
    ],
    metadatas=[
        {"uri": "img1.png", "style": "style1"},
        {"uri": "img2.png", "style": "style2"},
        {"uri": "img3.png", "style": "style1"},
        {"uri": "img4.png", "style": "style1"},
        {"uri": "img5.png", "style": "style1"},
        {"uri": "img6.png", "style": "style1"},
        {"uri": "img7.png", "style": "style1"},
        {"uri": "img8.png", "style": "style1"},
    ],
    documents=["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8"],
    ids=["id1", "id2", "id3", "id4", "id5", "id6", "id7", "id8"],
)

In [None]:
query_result = collection.query(query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]], n_results=2)

In [None]:
query_result

## Reuse the local vector store

In [2]:
import chromadb
from chromadb.config import DEFAULT_TENANT, DEFAULT_DATABASE, Settings

client = chromadb.PersistentClient(
    path="test",
    settings=Settings(),
    tenant=DEFAULT_TENANT,
    database=DEFAULT_DATABASE,
)

# Open the existing "test" collection that was persisted by the previous example
collection = client.get_or_create_collection("test")


In [3]:
# Query the persisted vectors directly, without adding the data again
query_result = collection.query(query_embeddings=[[1.1, 2.3, 3.2], [5.1, 4.3, 2.2]], n_results=2)

In [4]:
query_result

{'ids': [['id1', 'id3'], ['id2', 'id8']],
 'distances': [[5.1159076593562386e-15, 5.1159076593562386e-15],
  [11.960000915527363, 11.960000915527363]],
 'metadatas': [[{'style': 'style1', 'uri': 'img1.png'},
   {'style': 'style1', 'uri': 'img3.png'}],
  [{'style': 'style2', 'uri': 'img2.png'},
   {'style': 'style1', 'uri': 'img8.png'}]],
 'embeddings': None,
 'documents': [['doc1', 'doc3'], ['doc2', 'doc8']],
 'uris': None,
 'data': None}