This page covers how to use ClickHouse Vector Search within LangChain.
ClickHouse is a open source real-time OLAP database with full SQL support and a wide range of functions to assist users in writing analytical queries. Some of these functions and data structures perform distance operations between vectors, enabling ClickHouse to be used as a vector database.
Due to the fully parallelized query pipeline, ClickHouse can process vector search operations very quickly, especially when performing exact matching through a linear scan over all rows, delivering processing speed comparable to dedicated vector databases.
High compression levels, tunable through custom compression codecs, enable very large datasets to be stored and queried. ClickHouse is not memory-bound, allowing multi-TB datasets containing embeddings to be queried.
The capabilities for computing the distance between two vectors are just another SQL function and can be effectively combined with more traditional SQL filtering and aggregation capabilities. This allows vectors to be stored and queried alongside metadata, and even rich text, enabling a broad array of use cases and applications.
Finally, experimental ClickHouse capabilities like Approximate Nearest Neighbour (ANN) indices support faster approximate matching of vectors and provide a promising development aimed to further enhance the vector matching capabilities of ClickHouse.
- Install clickhouse server by binary or docker image
- Install the Python SDK with
pip install clickhouse-connect
Customize ClickhouseSettings
object with parameters
```python
from langchain.vectorstores import ClickHouse, ClickhouseSettings
config = ClickhouseSettings(host="<clickhouse-server-host>", port=8123, ...)
index = Clickhouse(embedding_function, config)
index.add_documents(...)
```
supported functions:
add_texts
add_documents
from_texts
from_documents
similarity_search
asimilarity_search
similarity_search_by_vector
asimilarity_search_by_vector
similarity_search_with_relevance_scores
There exists a wrapper around open source Clickhouse database, allowing you to use it as a vectorstore, whether for semantic search or similar example retrieval.
To import this vectorstore:
from langchain.vectorstores import Clickhouse
For a more detailed walkthrough of the MyScale wrapper, see this notebook