community[minor]: Add support for Upstash Vector (#20824)
## Description

Adding `UpstashVectorStore` to utilize [Upstash
Vector](https://upstash.com/docs/vector/overall/getstarted)!

#17012 was opened to add Upstash Vector to langchain but was closed to
wait for filtering. Now that filtering has been added to Upstash Vector,
we are opening a new PR. Additionally, the [embedding
feature](https://upstash.com/docs/vector/features/embeddingmodels) was
added, and we support it in our vectorstore as well.

## Dependencies

[upstash-vector](https://pypi.org/project/upstash-vector/) should be
installed to use `UpstashVectorStore`. Didn't update dependencies
because of [this comment in the previous
PR](#17012 (review)).

## Tests

Tests are added and they pass. Tests are naturally network bound since
Upstash Vector is offered through an API.

There was [a discussion in the previous PR about mocking the
unittests](#17012 (review)).
We didn't make changes to this end yet. We can update the tests if you
can explain how the tests should be mocked.

---------

Co-authored-by: ytkimirti <yusuftaha9@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
4 people committed Apr 29, 2024
1 parent 1a2ff56 commit cc6191c
Showing 37 changed files with 11,751 additions and 4 deletions.
165 changes: 162 additions & 3 deletions docs/docs/integrations/providers/upstash.mdx
@@ -1,6 +1,166 @@
# Upstash Redis
Upstash offers developers serverless databases and messaging
platforms to build powerful applications without having to worry
about the operational complexity of running databases at scale.

One significant advantage of Upstash is that their databases support HTTP and all of their SDKs use HTTP.
This means that you can use them on serverless platforms, at the edge, or on any platform that does not support TCP connections.

Currently, there are two Upstash integrations available for LangChain:
Upstash Vector as a vector embedding database and Upstash Redis as a cache and memory store.

# Upstash Vector

Upstash Vector is a serverless vector database that can be used to store and query vectors.

## Installation

Create a new serverless vector database at the [Upstash Console](https://console.upstash.com/vector).
Select your preferred distance metric and dimension count according to your model.
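
If you are unsure which dimension count your model produces, one way to find out is to embed a short probe string and measure the result. The sketch below assumes only that the model exposes an `embed_query` method, as LangChain `Embeddings` objects do. `embedding_dimension` and `FakeEmbeddings` are hypothetical names used for illustration; the fake model stands in for a real one so the example runs offline.

```python
from typing import List, Protocol


class SupportsEmbedQuery(Protocol):
    """Anything with an `embed_query` method, e.g. a LangChain Embeddings object."""

    def embed_query(self, text: str) -> List[float]: ...


def embedding_dimension(model: SupportsEmbedQuery) -> int:
    """Return the length of the vector the model produces for a probe string."""
    return len(model.embed_query("dimension probe"))


class FakeEmbeddings:
    """Hypothetical stand-in for a real embedding model (assumption)."""

    def embed_query(self, text: str) -> List[float]:
        return [0.0] * 1536


print(embedding_dimension(FakeEmbeddings()))  # 1536
```

With a real model, pass the `Embeddings` object instead of `FakeEmbeddings()` and use the printed value as the dimension count when creating the index.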


Install the Upstash Vector Python SDK with `pip install upstash-vector`.
The Upstash Vector integration in langchain is a wrapper for the Upstash Vector Python SDK. That's why the `upstash-vector` package is required.

## Integrations

Create a `UpstashVectorStore` object using credentials from the Upstash Console.
You also need to pass in an `Embeddings` object which can turn text into vector embeddings.

```python
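import os

from langchain_community.vectorstores.upstash import UpstashVectorStore
from langchain_openai import OpenAIEmbeddings

os.environ["UPSTASH_VECTOR_REST_URL"] = "<UPSTASH_VECTOR_REST_URL>"
os.environ["UPSTASH_VECTOR_REST_TOKEN"] = "<UPSTASH_VECTOR_REST_TOKEN>"

# Any LangChain `Embeddings` implementation works; OpenAIEmbeddings is one example.
embeddings = OpenAIEmbeddings()

# The credentials are read from the environment variables set above.
store = UpstashVectorStore(
    embedding=embeddings
)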
```

An alternative way to configure `UpstashVectorStore` is to pass `embedding=True`. This is a unique
feature of the `UpstashVectorStore`, made possible by the ability of Upstash Vector indexes
to have an associated embedding model. In this configuration, the documents we want to insert and the
queries we want to search for are simply sent to Upstash Vector as text. In the background,
Upstash Vector embeds this text and executes the request with the resulting embeddings. To use this
feature, [create an Upstash Vector index by selecting a model](https://upstash.com/docs/vector/features/embeddingmodels#using-a-model)
and simply pass `embedding=True`:

```python
from langchain_community.vectorstores.upstash import UpstashVectorStore
import os

os.environ["UPSTASH_VECTOR_REST_URL"] = "<UPSTASH_VECTOR_REST_URL>"
os.environ["UPSTASH_VECTOR_REST_TOKEN"] = "<UPSTASH_VECTOR_REST_TOKEN>"

store = UpstashVectorStore(
    embedding=True
)
```

See [Upstash Vector documentation](https://upstash.com/docs/vector/features/embeddingmodels)
for more detail on embedding models.

### Inserting Vectors

```python
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.upstash import UpstashVectorStore
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("../../modules/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Create a new embeddings object
embeddings = OpenAIEmbeddings()

# Create a new UpstashVectorStore object
store = UpstashVectorStore(
    embedding=embeddings
)

# Insert the document embeddings into the store
store.add_documents(docs)
```

When documents are inserted, they are first embedded using the `Embeddings` object.

Most embedding models can embed multiple documents at once, so the documents are batched and embedded in parallel.
The size of the batch can be controlled using the `embedding_chunk_size` parameter.

The embedded vectors are then stored in the Upstash Vector database. When they are sent, multiple vectors are batched together to reduce the number of HTTP requests.
The size of the batch can be controlled using the `batch_size` parameter. Upstash Vector has a limit of 1000 vectors per batch in the free tier.

```python
store.add_documents(
    documents,
    batch_size=100,
    embedding_chunk_size=200
)
```
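
To make the interaction of the two parameters concrete, here is a minimal sketch of the batching arithmetic in plain Python. It is not the library's implementation: `chunked` and `plan_requests` are hypothetical helpers, and the sketch assumes that upload batches are formed within each embedding chunk.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_requests(
    n_texts: int, embedding_chunk_size: int, batch_size: int
) -> List[Tuple[int, List[int]]]:
    """For each embedding call, list the sizes of the upload batches it yields.

    Assumption: every embedding chunk is split into upload batches on its own.
    """
    plan = []
    for chunk in chunked(range(n_texts), embedding_chunk_size):
        uploads = [len(batch) for batch in chunked(chunk, batch_size)]
        plan.append((len(chunk), uploads))
    return plan


# 450 texts, embedded 200 at a time, uploaded 100 vectors per HTTP request.
print(plan_requests(450, embedding_chunk_size=200, batch_size=100))
# [(200, [100, 100]), (200, [100, 100]), (50, [50])]
```

Under these assumptions, 450 texts lead to three embedding calls and five upload requests. A larger `batch_size` reduces the number of HTTP requests, up to the per-batch limit mentioned above.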

### Querying Vectors

Vectors can be queried using a text query or another vector.

The returned value is a list of Document objects.

```python
result = store.similarity_search(
    "The United States of America",
    k=5
)
```

Or using a vector:

```python
vector = embeddings.embed_query("Hello world")

result = store.similarity_search_by_vector(
    vector,
    k=5
)
```

When searching, you can also utilize the `filter` parameter which will allow you to filter by metadata:

```python
result = store.similarity_search(
    "The United States of America",
    k=5,
    filter="type = 'country'"
)
```

See [Upstash Vector documentation](https://upstash.com/docs/vector/features/filtering)
for more details on metadata filtering.
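
When filter values come from variables, it can help to build the filter string in one place. The helper below is a hypothetical sketch, not part of LangChain or the Upstash SDK. It assumes only equality conditions joined with `AND`, in the same form as the `type = 'country'` example above, and it rejects values it cannot format safely instead of guessing at escaping rules.

```python
from typing import Mapping, Union

FilterValue = Union[str, int, float]


def equality_filter(conditions: Mapping[str, FilterValue]) -> str:
    """Build a filter string such as "type = 'country' AND year = 2024"."""
    parts = []
    for field, value in conditions.items():
        if isinstance(value, bool):
            # bool is a subclass of int; reject it rather than guess its syntax.
            raise TypeError(f"unsupported value for {field!r}: {value!r}")
        if isinstance(value, str):
            if "'" in value:
                # Quote escaping is not covered by this sketch.
                raise ValueError(f"value for {field!r} contains a single quote")
            parts.append(f"{field} = '{value}'")
        elif isinstance(value, (int, float)):
            parts.append(f"{field} = {value}")
        else:
            raise TypeError(f"unsupported value for {field!r}: {value!r}")
    return " AND ".join(parts)


print(equality_filter({"type": "country"}))
# type = 'country'
print(equality_filter({"type": "country", "year": 2024}))
# type = 'country' AND year = 2024
```

The resulting string can be passed as the `filter` argument of `similarity_search`. The `year` field is only an illustration; use the metadata keys your documents actually carry.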

### Deleting Vectors

Vectors can be deleted by their IDs.

```python
store.delete(["id1", "id2"])
```

### Getting information about the store

You can get information about your database, such as the distance metric and dimension, using the `info` function.

When vectors are inserted, the database indexes them. While indexing is in progress, the new vectors cannot be queried. `pendingVectorCount` represents the number of vectors that are currently being indexed.

```python
info = store.info()
print(info)

# Output:
# {'vectorCount': 44, 'pendingVectorCount': 0, 'indexSize': 2642412, 'dimension': 1536, 'similarityFunction': 'COSINE'}
```
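
Because new vectors cannot be queried while they are being indexed, a script that inserts and then immediately searches may want to wait until `pendingVectorCount` drops to zero. The following is a minimal polling sketch. `wait_until_indexed` is a hypothetical helper, and it assumes the info call returns a mapping shaped like the output above; if your version returns an object with attributes instead, adapt the lookup accordingly. A stand-in replaces `store.info` so the example runs without a network connection.

```python
import time
from typing import Any, Callable, Dict


def wait_until_indexed(
    get_info: Callable[[], Dict[str, Any]],
    timeout: float = 30.0,
    poll_interval: float = 0.5,
) -> Dict[str, Any]:
    """Poll `get_info` until no vectors are pending, then return the last info."""
    deadline = time.monotonic() + timeout
    while True:
        info = get_info()
        if info["pendingVectorCount"] == 0:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError("vectors were still being indexed at the deadline")
        time.sleep(poll_interval)


# Stand-in for `store.info` (assumption): first 3 vectors pending, then none.
_responses = iter([{"pendingVectorCount": 3}, {"pendingVectorCount": 0}])
info = wait_until_indexed(lambda: next(_responses), poll_interval=0.01)
print(info["pendingVectorCount"])  # 0
```

With a real store, the call would look like `wait_until_indexed(store.info)`, followed by the search.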

# Upstash Redis

This page covers how to use [Upstash Redis](https://upstash.com/redis) with LangChain.

@@ -12,7 +172,6 @@ This page covers how to use [Upstash Redis](https://upstash.com/redis) with Lang
## Integrations
All of the Upstash-LangChain integrations are wrappers around the `upstash-redis` Python SDK.
The SDK connects to an Upstash Redis database using the `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` parameters from the console.
One significant advantage of this SDK is that it uses a REST API. This means you can run it on serverless platforms, at the edge, or on any platform that does not support TCP connections.


### Cache