# 🧾 Parameters in Vector Databases
This notebook outlines key parameters commonly used when working with vector databases, such as collections, documents, metadata, IDs, and embeddings.


> ⚙️ **Note:**  
Parameters in vector databases generally fall into three levels:
- **Client Configuration** (e.g., memory vs persistent storage)
- **Collection Parameters** (e.g., embedding functions, default metadata)
- **Query Parameters** (e.g., `n_results`, `where` filters, output fields)

This notebook focuses on basic `collection.add()` parameters. Advanced topics like persistence and filtering are covered in upcoming notebooks.


## 🔑 Common Parameters
- **Collection Name**: Logical grouping of related documents/vectors.
- **Documents**: The raw content being stored and searched, typically text such as sentences or paragraphs.
- **IDs**: Unique identifiers for each document/vector.
- **Metadata**: Key-value pairs attached to vectors (e.g., tags, categories).
- **Embeddings**: Vector representations of documents used for similarity search.
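
In `collection.add()`, these values are passed as parallel lists: entry *i* of each list describes the same record. As a minimal sketch of that contract, the check below verifies that the lists line up before anything is sent to the database. The helper name `check_add_args` is hypothetical and not part of ChromaDB; it only illustrates the idea.

```python
def check_add_args(ids, documents=None, metadatas=None, embeddings=None):
    """Hypothetical pre-flight check for the parallel lists passed to collection.add()."""
    # Every record needs its own identifier.
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")

    # Each optional list must have exactly one entry per id.
    optional = (("documents", documents), ("metadatas", metadatas), ("embeddings", embeddings))
    for name, values in optional:
        if values is not None and len(values) != len(ids):
            raise ValueError(f"{name} has {len(values)} entries but there are {len(ids)} ids")

    # All vectors in one collection must share the same dimensionality.
    if embeddings is not None and len({len(vec) for vec in embeddings}) > 1:
        raise ValueError("all embeddings must have the same number of dimensions")
    return True


check_add_args(
    ids=["doc1", "doc2"],
    documents=["Red apple", "Yellow banana"],
    metadatas=[{"category": "fruit"}, {"category": "fruit"}],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
)
```

A check like this is optional, but it tends to surface mismatched lists earlier and with a clearer message than an error raised deep inside the client.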


In [None]:
# 📦 Install required packages
%pip install chromadb

In [None]:
# Example: Adding documents to ChromaDB
import chromadb
chroma_client = chromadb.Client()

# Create or get collection
collection = chroma_client.get_or_create_collection(name="my_collection")

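# Add documents with key parameters. The lists are parallel: entry i of each
# list describes the same record. The embeddings here are toy 3-dimensional
# vectors; if `embeddings` is omitted, ChromaDB computes them with the
# collection's embedding function.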
collection.add(
    documents=["Red apple", "Yellow banana"],
    metadatas=[{"category": "fruit"}, {"category": "fruit"}],
    ids=["doc1", "doc2"],
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
)
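
To make the role of `embeddings` concrete, the sketch below ranks the two toy vectors from the cell above against a query vector using squared Euclidean distance. It is plain Python for illustration only: the `squared_l2` helper and the query vector are made up for this example, and a real vector database uses an index and whichever distance metric the collection is configured with.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# The same toy embeddings that were added to the collection, keyed by ID.
stored = {
    "doc1": [0.1, 0.2, 0.3],
    "doc2": [0.4, 0.5, 0.6],
}

# Hypothetical query vector, chosen to sit close to doc1.
query = [0.1, 0.2, 0.25]

# Smaller distance means more similar, so sort in ascending order.
ranked = sorted(stored, key=lambda doc_id: squared_l2(stored[doc_id], query))
print(ranked)  # ['doc1', 'doc2']
```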

## ✅ Summary
Understanding these parameters helps you structure vector data so that similarity search, filtering, and metadata-based queries return the results you expect.


### 🔄 Additional Parameters to Know

Besides the basic `collection.add()` parameters (`documents`, `ids`, `metadatas`, and `embeddings`), here are more parameters you'll commonly encounter when working with ChromaDB:

- **`embedding_function`**: Set when creating a collection. It defines how documents are converted to vectors when you don't supply embeddings yourself.
- **`n_results`**: Used in `.query()` to control how many similar documents to retrieve per query.
- **`include`**: Specifies which fields to return in query results (e.g., `documents`, `distances`, `metadatas`).
- **`where` / `where_document`**: Filters on structured metadata (`where`) or on raw document content (`where_document`), similar to a SQL `WHERE` clause.
- **Persistence (Client type)**:
  - `chromadb.Client()` – stores everything in memory (lost on restart)
  - `chromadb.PersistentClient(path="...")` – stores vectors on disk and supports reuse

These parameters give you full control over how vectors are stored, searched, filtered, and retrieved.

➡️ You will see these in action in later notebooks (e.g., OpenAI integration, persistence, RAG).
