Improve upsert throughput by 3x (#334)
## Problem

Python SDK upsert throughput is low compared to other SDKs. For example,
I can achieve 880 vector upserts/sec with the Python SDK, compared to
3500 upserts/sec with the Java SDK.

Profiling the Python SDK performing these upserts shows a large
percentage of time in gRPC / protobuf serialisation / deserialisation.
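A profile like the one described can be collected with the standard-library `cProfile` and `pstats` modules. The sketch below is not the profiling run behind this commit: `fake_serialise`, `fake_upsert` and `profile_upserts` are hypothetical names, and `json.dumps` stands in for protobuf serialisation only so the example runs without protobuf or the Pinecone SDK installed. The sizes are kept small so it finishes quickly.

```python
import cProfile
import json
import pstats
import random


def fake_serialise(batch):
    # Hypothetical stand-in for protobuf serialisation: json.dumps is used only
    # so this sketch runs without protobuf or the Pinecone SDK installed.
    return json.dumps({"vectors": batch}).encode("utf-8")


def fake_upsert(batch):
    # Hypothetical stand-in for one upsert request: serialise, then "send"
    # (the network call is omitted; the payload size is returned instead).
    return len(fake_serialise(batch))


def profile_upserts(num_batches=5, batch_size=50, dim=768):
    """Profile num_batches fake upserts and return the pstats.Stats object."""
    rng = random.Random(0)
    # Build the input outside the profiled region so it does not skew results.
    batches = [
        [{"id": f"v{b}-{i}", "values": [rng.random() for _ in range(dim)]}
         for i in range(batch_size)]
        for b in range(num_batches)
    ]
    profiler = cProfile.Profile()
    profiler.enable()
    for batch in batches:
        fake_upsert(batch)
    profiler.disable()
    return pstats.Stats(profiler)


if __name__ == "__main__":
    # Show the five entries with the highest cumulative time.
    profile_upserts().sort_stats("cumulative").print_stats(5)
```

Sorting by cumulative time surfaces the call chain that dominates, which is how a serialisation-heavy hot path would show up in a real SDK profile.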

## Solution

Upgrade protobuf from v3 to v4. This adds a number of performance
improvements in parsing / serialization as documented at
https://protobuf.dev/news/2022-05-06/#python-updates
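Because the gain depends on which protobuf runtime is actually loaded, it can be worth checking it. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`. `protobuf_runtime_info` is a hypothetical helper, and it relies on `google.protobuf.internal.api_implementation`, an internal module that may change between releases. With protobuf 4.x the expected backend is `upb`, the implementation the linked notes describe.

```python
import importlib.metadata


def protobuf_runtime_info():
    """Return (version, backend) for the installed protobuf package.

    Either element is None when it cannot be determined (for example when
    protobuf is not installed at all).
    """
    try:
        version = importlib.metadata.version("protobuf")
    except importlib.metadata.PackageNotFoundError:
        return None, None
    try:
        # google.protobuf.internal is not public API; Type() reports which
        # implementation is active: "upb", "cpp" or "python".
        from google.protobuf.internal import api_implementation
        backend = api_implementation.Type()
    except (ImportError, AttributeError):
        backend = None
    return version, backend


if __name__ == "__main__":
    print(protobuf_runtime_info())
```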

This increases upsert() throughput by roughly 3x (measured by upserting 1M
768-dimension vectors to a pod-based index in batches of 500):

* Before:  880 vectors/sec
* After:  2580 vectors/sec
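The measurement above can be approximated with a small timing harness. This is a minimal sketch, not the benchmark used for this commit: `measure_upsert_throughput`, `speedup` and the `(id, values)` record shape are hypothetical, and `upsert_fn` stands in for whatever callable performs one batched upsert against a real index.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple

# Hypothetical record shape: (id, values). The real SDK accepts richer types.
Vector = Tuple[str, List[float]]


def measure_upsert_throughput(
    upsert_fn: Callable[[Sequence[Vector]], object],
    num_vectors: int,
    dim: int = 768,
    batch_size: int = 500,
) -> float:
    """Upsert num_vectors random vectors in batches; return vectors per second.

    Only the time spent inside upsert_fn is counted, so generating the
    random test data does not distort the result.
    """
    rng = random.Random(42)
    sent = 0
    elapsed = 0.0
    while sent < num_vectors:
        n = min(batch_size, num_vectors - sent)
        batch = [(f"id-{sent + i}", [rng.random() for _ in range(dim)])
                 for i in range(n)]
        start = time.perf_counter()
        upsert_fn(batch)
        elapsed += time.perf_counter() - start
        sent += n
    return sent / elapsed if elapsed > 0 else float("inf")


def speedup(before: float, after: float) -> float:
    """Ratio of the new throughput to the old one."""
    return after / before


if __name__ == "__main__":
    # Figures reported in this commit, in vectors/sec.
    print(f"{speedup(880, 2580):.2f}x")  # 2.93x
```

With the reported figures the ratio is 2580 / 880 ≈ 2.93, which rounds to the 3x in the title. Upserting 1M vectors in batches of 500 is 2000 upsert calls, and the quoted Java figure of 3500 upserts/sec is still about 1.36x the new Python figure.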

As per the documentation, this is an incompatible change for the
_generated_ Python code, so it depends on a related change to
pinecone-protos that updates the protobuf version used to generate the
Python code there.
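Since the regenerated `vector_service_pb2.py` targets protobuf 4.x, code that imports it may want to fail fast on an older runtime. The commit does not show whether the SDK performs such a check; `parse_major` and `require_protobuf_major` are hypothetical helpers, and the minimum of 4 is taken from the v3 to v4 upgrade described above.

```python
def parse_major(version: str) -> int:
    """Return the major component of a version string such as "4.25.3"."""
    return int(version.split(".", 1)[0])


def require_protobuf_major(installed: str, minimum: int = 4) -> None:
    """Raise RuntimeError if `installed` is older than the required major."""
    if parse_major(installed) < minimum:
        raise RuntimeError(
            f"protobuf {installed} is installed, but this generated code "
            f"assumes protobuf >= {minimum}.0"
        )


if __name__ == "__main__":
    require_protobuf_major("4.25.3")  # passes silently
    try:
        require_protobuf_major("3.20.3")
    except RuntimeError as exc:
        print(exc)
```

At runtime the `installed` argument could come from `importlib.metadata.version("protobuf")`.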

## Type of Change

- [x] None of the above: Performance improvement.

## Test Plan

Use existing regression tests.
daverigby committed May 1, 2024
1 parent e123da1 commit 82dbd7e
Showing 7 changed files with 866 additions and 2,076 deletions.
14 changes: 6 additions & 8 deletions .github/workflows/testing-dependency.yaml
@@ -56,12 +56,11 @@ jobs:
# - 4.1.0
- 4.3.3
protobuf_version:
- 3.20.3
- 4.25.3
protoc-gen-openapiv2:
- 0.0.1
googleapis_common_protos_version:
- 1.53.0
- 1.62.0
grpc_gateway_protoc_gen_openapiv2_version:
- 0.1.0
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/test-dependency-grpc
@@ -92,12 +91,11 @@ jobs:
- 3.1.3
- 4.3.3
protobuf_version:
- 3.20.3
- 4.25.3
protoc-gen-openapiv2:
- 0.0.1
googleapis_common_protos_version:
- 1.53.0
- 1.62.0
grpc_gateway_protoc_gen_openapiv2_version:
- 0.1.0
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/test-dependency-grpc
1,817 changes: 189 additions & 1,628 deletions pinecone/core/grpc/protos/vector_service_pb2.py

