ChromaDB PHP API Client

ChromaDB is an open-source vector database that allows you to store and query vector embeddings. This package provides a PHP client for the ChromaDB API.

Installation

You can install the package via composer:

composer require helgesverre/chromadb

In a Laravel application, you can publish the config file with:

php artisan vendor:publish --tag="chromadb-config"

These are the contents of the published config/chromadb.php file:

return [
    'token' => env('CHROMADB_TOKEN'),
    'host' => env('CHROMADB_HOST', 'localhost'),
    'port' => env('CHROMADB_PORT', '19530'),
];

Usage

$chromadb = new \HelgeSverre\Chromadb\Chromadb(
    token: 'test-token-chroma-local-dev',
    host: 'http://localhost',
    port: '8000'
);

// Create a new collection with optional metadata
$chromadb->collections()->create(
    name: 'my_collection',
);

// Count the number of collections
$chromadb->collections()->count();

// Retrieve a specific collection by name
$chromadb->collections()->get(
    collectionName: 'my_collection'
);

// Delete a collection by name
$chromadb->collections()->delete(
    collectionName: 'my_collection'
);

// Update a collection's name and/or metadata
$chromadb->collections()->update(
    collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
    newName: 'new_collection_name',
);

// Add items to a collection with optional embeddings, metadata, and documents
// ('embedding1' etc. are placeholders; real embeddings are arrays of floats)
$chromadb->items()->add(
    collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
    ids: ['item1', 'item2'],
    embeddings: ['embedding1', 'embedding2'],
    documents: ['doc1', 'doc2']
);

// Update items in a collection with new embeddings, metadata, and documents
$chromadb->items()->update(
    collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
    ids: ['item1', 'item2'],
    embeddings: ['new_embedding1', 'new_embedding2'],
    documents: ['new_doc1', 'new_doc2']
);

// Upsert items in a collection (insert if not exist, update if exist)
$chromadb->items()->upsert(
    collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
    ids: ['item'],
    metadatas: [['title' => 'metadata']],
    documents: ['document']
);

// Retrieve specific items from a collection by their IDs
$chromadb->items()->get(
    collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
    ids: ['item1', 'item2']
);

// Delete specific items from a collection by their IDs
$chromadb->items()->delete(
    collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
    ids: ['item1', 'item2']
);

// Count the number of items in a collection
$chromadb->items()->count(
    collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3'
);

// Query items in a collection based on embeddings, texts, and other filters
// (createTestVector() is a placeholder for any function that returns a numeric embedding vector)
$chromadb->items()->query(
    collectionId: '3ea5a914-e2ab-47cb-b285-8e585c9af4f3',
    queryEmbeddings: [createTestVector(0.8)],
    include: ['documents', 'metadatas', 'distances'],
    nResults: 5
);
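The query above relies on createTestVector() and asks for distances in the response. As a minimal sketch, assuming the helper simply returns a fixed-length vector filled with one value, and assuming the collection ranks matches by squared L2 distance (believed to be Chroma's default, but check your collection's settings), the idea looks like this. Both function bodies and the default of 128 dimensions are illustrative assumptions, not code from this package.

```php
<?php

// Hypothetical stand-in for the createTestVector() helper used above.
// Returns a vector of $dimensions floats, all set to $value.
function createTestVector(float $value, int $dimensions = 128): array
{
    return array_fill(0, $dimensions, $value);
}

// Squared Euclidean (L2) distance between two equal-length vectors.
function squaredL2Distance(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same number of dimensions.');
    }

    $sum = 0.0;
    foreach ($a as $i => $x) {
        $diff = $x - $b[$i];
        $sum += $diff * $diff;
    }

    return $sum;
}

$query = createTestVector(0.8, 4);
$near = createTestVector(0.7, 4);
$far = createTestVector(0.1, 4);

// The nearer vector has the smaller distance, so it would rank first.
var_dump(squaredL2Distance($query, $near) < squaredL2Distance($query, $far)); // bool(true)
```

Smaller distances mean closer matches, which is why the nearer vector would appear first in the results.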

Example: Semantic Search with ChromaDB and OpenAI Embeddings

This example demonstrates how to perform a semantic search in ChromaDB using embeddings generated from OpenAI.

The full code is available in SemanticSearchTest.php.

Prepare Your Data

First, create an array of data you wish to index. In this example, we'll use blog posts with titles, summaries, and tags.

$blogPosts = [
    [
        'title' => 'Exploring Laravel',
        'summary' => 'A deep dive into Laravel frameworks...',
        'tags' => ['PHP', 'Laravel', 'Web Development']
    ],
    [
        'title' => 'Introduction to React',
        'summary' => 'Understanding the basics of React and how it revolutionizes frontend development.',
        'tags' => ['JavaScript', 'React', 'Frontend']
    ],
];

Generate Embeddings

Use OpenAI's embeddings API to convert the summaries of your blog posts into vector embeddings.

$summaries = array_column($blogPosts, 'summary');
$embeddingsResponse = OpenAI::client('sk-your-openai-api-key')
    ->embeddings()
    ->create([
        'model' => 'text-embedding-ada-002',
        'input' => $summaries,
    ]);

foreach ($embeddingsResponse->embeddings as $embedding) {
    $blogPosts[$embedding->index]['vector'] = $embedding->embedding;
}

Create ChromaDB Collection

Create a collection in ChromaDB to store your blog post embeddings.

$createCollectionResponse = $chromadb->collections()->create(
    name: 'blog_posts',
);

$collectionId = $createCollectionResponse->json('id');

Insert into ChromaDB

Insert these embeddings, along with other blog post data, into your ChromaDB collection.

foreach ($blogPosts as $post) {
    $chromadb->items()->add(
        collectionId: $collectionId,
        ids: [$post['title']],
        // 'vector' is the key the embeddings were stored under in the previous step
        embeddings: [$post['vector']],
        metadatas: [$post]
    );
}
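Since add() takes parallel arrays of ids, embeddings, and metadatas, the posts can also be sent in a single call. The sketch below builds those arrays with a hypothetical helper, buildAddPayload(), which is not part of this package. It assumes that metadata values should be scalars, so it leaves the vector out of the metadata and joins the tags into one string. Check the Chroma documentation for what your server version accepts.

```php
<?php

// Hypothetical helper: turns the $blogPosts array into the parallel arrays
// that items()->add() takes (ids, embeddings, metadatas).
function buildAddPayload(array $posts): array
{
    $payload = ['ids' => [], 'embeddings' => [], 'metadatas' => []];

    foreach ($posts as $post) {
        $payload['ids'][] = $post['title'];
        $payload['embeddings'][] = $post['vector'];
        // Assumption: metadata values should be scalars, so the vector is
        // left out and the tags array is joined into one string.
        $payload['metadatas'][] = [
            'title' => $post['title'],
            'summary' => $post['summary'],
            'tags' => implode(', ', $post['tags']),
        ];
    }

    return $payload;
}

$payload = buildAddPayload([
    [
        'title' => 'Exploring Laravel',
        'summary' => 'A deep dive into Laravel frameworks...',
        'tags' => ['PHP', 'Laravel', 'Web Development'],
        'vector' => [0.1, 0.2, 0.3],
    ],
]);

echo $payload['metadatas'][0]['tags'], "\n"; // PHP, Laravel, Web Development
```

The result can then be passed as ids: $payload['ids'], embeddings: $payload['embeddings'], and metadatas: $payload['metadatas']. With this shape the tags come back as one string, so code that reads them no longer needs implode().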

Creating a Search Vector with OpenAI

Generate a search vector for your query, akin to how you processed the blog posts.

$searchEmbedding = getOpenAIEmbedding('laravel framework');
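getOpenAIEmbedding() is not defined in this README. One possible sketch, reusing the same client calls as the Generate Embeddings step, is shown here. The optional $client parameter and the OPENAI_API_KEY environment variable are assumptions made for illustration; the real helper in the test suite may look different.

```php
<?php

// Hypothetical definition of the getOpenAIEmbedding() helper used above.
// $client is any object exposing embeddings()->create([...]) like the
// OpenAI client in the "Generate Embeddings" step; it is injectable so the
// function can be exercised without network access.
function getOpenAIEmbedding(string $text, ?object $client = null): array
{
    // Assumption: the API key lives in the OPENAI_API_KEY environment variable.
    $client ??= \OpenAI::client((string) getenv('OPENAI_API_KEY'));

    $response = $client->embeddings()->create([
        'model' => 'text-embedding-ada-002', // same model as the indexed summaries
        'input' => $text,
    ]);

    // A single input yields a single embedding at index 0.
    return $response->embeddings[0]->embedding;
}
```

Use the same model for the query as for the indexed summaries, because vectors from different models are not comparable.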

Searching using the Embedding in ChromaDB

Use the ChromaDB client to perform a search with the generated embedding.

$searchResponse = $chromadb->items()->query(
    collectionId: $collectionId,
    queryEmbeddings: [$searchEmbedding],
    nResults: 3,
    include: ['metadatas']
);

// Output the search results
foreach ($searchResponse->json('results') as $result) {
    echo "Title: " . $result['metadatas']['title'] . "\n";
    echo "Summary: " . $result['metadatas']['summary'] . "\n";
    echo "Tags: " . implode(', ', $result['metadatas']['tags']) . "\n\n";
}
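Depending on the Chroma version, the query response may not contain a results key. As far as we know, Chroma returns parallel arrays (ids, metadatas, distances, and so on), each nested one level per query embedding. Assuming that shape, a hypothetical helper like rowsFromQueryResponse() can turn the response into one row per match. The helper name and the sample data are illustrative only.

```php
<?php

// Hypothetical helper: converts a column-oriented query response into one
// row per match for a single query embedding (index 0 by default).
function rowsFromQueryResponse(array $response, int $queryIndex = 0): array
{
    $ids = $response['ids'][$queryIndex] ?? [];
    $rows = [];

    foreach ($ids as $i => $id) {
        $rows[] = [
            'id' => $id,
            // Fields not requested via "include" may be missing or null.
            'metadata' => $response['metadatas'][$queryIndex][$i] ?? null,
            'distance' => $response['distances'][$queryIndex][$i] ?? null,
        ];
    }

    return $rows;
}

// Sample data in the assumed shape (values are made up for illustration).
$sample = [
    'ids' => [['Exploring Laravel', 'Introduction to React']],
    'metadatas' => [[
        ['title' => 'Exploring Laravel'],
        ['title' => 'Introduction to React'],
    ]],
    'distances' => [[0.12, 0.47]],
];

foreach (rowsFromQueryResponse($sample) as $row) {
    echo $row['id'], ' (distance ', $row['distance'], ")\n";
}
```

With the client, this would be called as rowsFromQueryResponse($searchResponse->json()), assuming json() without a key returns the whole decoded body.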

Running ChromaDB in Docker

To get started quickly, you can run ChromaDB in Docker:

# Download the docker-compose.yml file (raw URL; the /blob/ URL returns an HTML page)
wget https://raw.githubusercontent.com/HelgeSverre/chromadb/main/docker-compose.yml

# Start ChromaDB
docker compose up -d

The auth token is set to test-token-chroma-local-dev by default.

You can change this by editing the CHROMA_SERVER_AUTH_CREDENTIALS environment variable in the docker-compose.yml file.

To stop ChromaDB, run docker compose down. To also wipe all the data, run docker compose down -v.

NOTE

The docker-compose.yml file in this repo is provided only as an example and should not be used in production.

Go to the ChromaDB deployment documentation for more information on deploying Chroma in production.

Testing

cp .env.example .env

docker compose up -d
 
composer test
composer analyse src

License

The MIT License (MIT). Please see License File for more information.