@jhamon jhamon commented Nov 4, 2025

Add Support for Read Capacity and Metadata Schema Configuration for Serverless Indexes

Summary

This PR adds support for configuring read_capacity and schema (metadata schema) for serverless indexes in the Pinecone Python client. These features allow users to:

  • Configure dedicated read capacity nodes for better performance and cost predictability
  • Limit metadata indexing to specific fields for improved performance
  • Configure these settings at index creation, and update read_capacity on existing indexes

Features Added

1. Read Capacity Configuration

Serverless indexes can now be configured with either OnDemand (default) or Dedicated read capacity modes. Dedicated mode allocates dedicated read nodes for your workload, providing more predictable performance and costs.

2. Metadata Schema Configuration

Users can now specify which metadata fields are filterable, limiting metadata indexing to only the fields needed for query filtering. This improves index building and query performance when dealing with large amounts of metadata.

Code Examples

Creating a Serverless Index with Dedicated Read Capacity

```python
from pinecone import Pinecone, ServerlessSpec, CloudProvider, GcpRegion, Metric

pc = Pinecone(api_key='YOUR_API_KEY')

# Create an index with dedicated read capacity
pc.create_index(
    name='my-index',
    dimension=1536,
    metric=Metric.COSINE,
    spec=ServerlessSpec(
        cloud=CloudProvider.GCP,
        region=GcpRegion.US_CENTRAL1,
        read_capacity={
            "mode": "Dedicated",
            "dedicated": {
                "node_type": "t1",
                "scaling": "Manual",
                "manual": {
                    "shards": 2,
                    "replicas": 2
                }
            }
        }
    )
)
```

Creating a Serverless Index with Metadata Schema

```python
from pinecone import Pinecone, ServerlessSpec, CloudProvider, AwsRegion, Metric

pc = Pinecone(api_key='YOUR_API_KEY')

# Create an index with metadata schema configuration
pc.create_index(
    name='my-index',
    dimension=1536,
    metric=Metric.COSINE,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_WEST_2,
        schema={
            "genre": {"filterable": True},
            "year": {"filterable": True},
            "description": {"filterable": True}
        }
    )
)
```

Creating an Index for Model with Read Capacity and Schema

```python
from pinecone import Pinecone, CloudProvider, AwsRegion, EmbedModel

pc = Pinecone(api_key='YOUR_API_KEY')

# Create an index for a model with dedicated read capacity and schema
pc.create_index_for_model(
    name='my-index',
    cloud=CloudProvider.AWS,
    region=AwsRegion.US_EAST_1,
    embed={
        "model": EmbedModel.Multilingual_E5_Large,
        "field_map": {"text": "my-sample-text"}
    },
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "t1",
            "scaling": "Manual",
            "manual": {"shards": 1, "replicas": 1}
        }
    },
    schema={
        "category": {"filterable": True},
        "tags": {"filterable": True}
    }
)
```

Configuring Read Capacity on an Existing Index

```python
from pinecone import Pinecone

pc = Pinecone(api_key='YOUR_API_KEY')

# Switch to OnDemand read capacity
pc.configure_index(
    name='my-index',
    read_capacity={"mode": "OnDemand"}
)

# Switch to Dedicated read capacity with manual scaling
pc.configure_index(
    name='my-index',
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "t1",
            "scaling": "Manual",
            "manual": {
                "shards": 3,
                "replicas": 2
            }
        }
    }
)

# Scale up by increasing shards and replicas
pc.configure_index(
    name='my-index',
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "t1",
            "scaling": "Manual",
            "manual": {
                "shards": 4,
                "replicas": 3
            }
        }
    }
)

# Verify the configuration was applied
desc = pc.describe_index("my-index")
assert desc.spec.serverless.read_capacity.mode == "Dedicated"
```

Async Examples

All functionality is also available in the async client:

```python
import asyncio
from pinecone import PineconeAsyncio, ServerlessSpec, CloudProvider, AwsRegion, Metric

async def main():
    async with PineconeAsyncio(api_key='YOUR_API_KEY') as pc:
        # Create index with dedicated read capacity
        await pc.create_index(
            name='my-index',
            dimension=1536,
            metric=Metric.COSINE,
            spec=ServerlessSpec(
                cloud=CloudProvider.AWS,
                region=AwsRegion.US_EAST_1,
                read_capacity={
                    "mode": "Dedicated",
                    "dedicated": {
                        "node_type": "t1",
                        "scaling": "Manual",
                        "manual": {"shards": 2, "replicas": 2}
                    }
                }
            )
        )
        
        # Configure read capacity later
        await pc.configure_index(
            name='my-index',
            read_capacity={
                "mode": "Dedicated",
                "dedicated": {
                    "node_type": "t1",
                    "scaling": "Manual",
                    "manual": {"shards": 3, "replicas": 2}
                }
            }
        )

asyncio.run(main())
```

Type Safety Improvements

This PR also improves type hints throughout the codebase by replacing `Any` with specific TypedDict and OpenAPI model types for better IDE support and type checking. The following types are now exported from the top-level package (a short usage sketch follows the list):

  • ReadCapacityDict
  • ReadCapacityOnDemandDict
  • ReadCapacityDedicatedDict
  • ReadCapacityDedicatedConfigDict
  • ScalingConfigManualDict
  • MetadataSchemaFieldConfig
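
A minimal sketch of how these exported types can be used as annotations, assuming `ReadCapacityDict` mirrors the dictionary shape shown in the examples above (the exact TypedDict definitions live in the package):

```python
from pinecone import Pinecone, ReadCapacityDict

pc = Pinecone(api_key='YOUR_API_KEY')

# Annotating the payload with the exported TypedDict lets a type checker
# validate the nested structure before the request is sent.
read_capacity: ReadCapacityDict = {
    "mode": "Dedicated",
    "dedicated": {
        "node_type": "t1",
        "scaling": "Manual",
        "manual": {"shards": 2, "replicas": 2},
    },
}

pc.configure_index(name='my-index', read_capacity=read_capacity)
```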

Changes

Core Functionality

  • Added read_capacity and schema parameters to ServerlessSpec class
  • Extended create_index to support read_capacity and schema via ServerlessSpec
  • Extended create_index_for_model to support read_capacity and schema
  • Extended configure_index to support read_capacity for serverless indexes
  • Added helper methods __parse_read_capacity and __parse_schema in request factory
  • Improved type hints throughout the codebase (replacing Any with specific types)

Documentation

  • Updated create_index docstrings in both sync and async interfaces
  • Updated create_index_for_model docstrings in both sync and async interfaces
  • Updated configure_index docstrings in both sync and async interfaces
  • Added comprehensive examples in docs/db_control/serverless-indexes.md
  • Added code examples showing how to configure read capacity

Testing

  • Added integration tests for create_index with read_capacity and schema
  • Added integration tests for create_index_for_model with read_capacity and schema
  • Added integration tests for configure_index with read_capacity
  • Tests cover both sync and async clients
  • Tests cover edge cases including transitions between read capacity modes

Breaking Changes

None. All changes are additive and backward compatible.

@jhamon jhamon changed the title from "Add Support for Dedicated Read Capacity and Metadata Schema Configuration for Serverless Indexes" to "Dedicated Read Capacity and Metadata Schema Configuration for Serverless Indexes" on Nov 4, 2025
@jhamon jhamon marked this pull request as ready for review November 4, 2025 09:11
@jhamon jhamon merged commit b63a907 into release-candidate/2025-10 Nov 4, 2025
34 checks passed
@jhamon jhamon deleted the jhamon/dedicated-reads branch November 4, 2025 09:11
jhamon added a commit that referenced this pull request Nov 18, 2025
⚠️ **Python 3.9 is no longer supported.** The SDK now requires Python 3.10 or later. Python 3.9 reached end-of-life on October 2, 2025. Users must upgrade to Python 3.10+ to continue using the SDK.

⚠️ **Namespace parameter default behavior changed.** The SDK no longer applies default values for the `namespace` parameter in GRPC methods. When `namespace=None`, the parameter is omitted from requests, allowing the API to handle namespace defaults appropriately. This change affects `upsert_from_dataframe` methods in GRPC clients. The API is moving toward `"__default__"` as the default namespace value, and this change ensures the SDK doesn't override API defaults.
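
As a hedged sketch of what this means in practice (index host and namespace are placeholders), pass the namespace explicitly if you do not want to rely on the API-side default:

```python
import pandas as pd
from pinecone.grpc import PineconeGRPC

pc = PineconeGRPC(api_key='YOUR_API_KEY')
index = pc.Index(host="your-index-host")

df = pd.DataFrame([
    {"id": "vec1", "values": [0.1, 0.2, 0.3]},
    {"id": "vec2", "values": [0.4, 0.5, 0.6]},
])

# When namespace is None it is now omitted from the request entirely,
# so pass it explicitly if you want a specific namespace.
index.upsert_from_dataframe(df, namespace="my-namespace")
```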

Note: The official SDK package was renamed last year from `pinecone-client` to `pinecone` beginning in version 5.1.0. If you are upgrading from an earlier version, remove `pinecone-client` from your project dependencies and add `pinecone` instead to get the latest updates.

You can now configure dedicated read nodes for your serverless indexes, giving you more control over query performance and capacity planning. By default, serverless indexes use OnDemand read capacity, which automatically scales based on demand. With dedicated read capacity, you can allocate specific read nodes with manual scaling control.

**Create an index with dedicated read capacity:**

```python
from pinecone import (
    Pinecone,
    ServerlessSpec,
    CloudProvider,
    AwsRegion,
    Metric
)

pc = Pinecone()

pc.create_index(
    name='my-index',
    dimension=1536,
    metric=Metric.COSINE,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_EAST_1,
        read_capacity={
            "mode": "Dedicated",
            "dedicated": {
                "node_type": "t1",
                "scaling": "Manual",
                "manual": {
                    "shards": 2,
                    "replicas": 2
                }
            }
        }
    )
)
```

**Configure read capacity on an existing index:**

You can switch between OnDemand and Dedicated modes, or adjust the number of shards and replicas for dedicated read capacity:

```python
from pinecone import Pinecone

pc = Pinecone()

pc.configure_index(
    name='my-index',
    read_capacity={"mode": "OnDemand"}
)

pc.configure_index(
    name='my-index',
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "t1",
            "scaling": "Manual",
            "manual": {
                "shards": 3,
                "replicas": 2
            }
        }
    }
)

pc.configure_index(
    name='my-index',
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "t1",
            "scaling": "Manual",
            "manual": {
                "shards": 4,
                "replicas": 3
            }
        }
    }
)
```

When you change read capacity configuration, the index will transition to the new configuration. You can use `describe_index` to check the status of the transition.
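
For example, a minimal sketch of waiting for the transition to settle (this assumes the `status.ready` flag the SDK exposes on `describe_index` results):

```python
import time
from pinecone import Pinecone

pc = Pinecone()

# Poll until the index reports ready and the new read capacity mode is reflected,
# mirroring the describe_index check shown earlier.
while True:
    desc = pc.describe_index("my-index")
    if desc.status.ready and desc.spec.serverless.read_capacity.mode == "Dedicated":
        break
    time.sleep(5)
```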

See [PR #528](#528) for details.

You can now fetch vectors using metadata filters instead of vector IDs. This is especially useful when you need to retrieve vectors based on their metadata properties.

```python
from pinecone import Pinecone

pc = Pinecone()
index = pc.Index(host="your-index-host")

response = index.fetch_by_metadata(
    filter={'genre': {'$in': ['comedy', 'drama']}, 'year': {'$eq': 2019}},
    namespace='my_namespace',
    limit=50
)
print(f"Found {len(response.vectors)} vectors")

for vec_id, vector in response.vectors.items():
    print(f"ID: {vec_id}, Metadata: {vector.metadata}")
```

**Pagination support:**

When fetching large numbers of vectors, you can use pagination tokens to retrieve results in batches:

```python
response = index.fetch_by_metadata(
    filter={'status': 'active'},
    limit=100
)

if response.pagination and response.pagination.next:
    next_response = index.fetch_by_metadata(
        filter={'status': 'active'},
        pagination_token=response.pagination.next,
        limit=100
    )
```
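
A sketch of draining all pages with a loop, reusing the `index` handle from above and the same assumptions as the snippet just shown:

```python
all_vectors = {}
pagination_token = None

# Keep requesting pages until no pagination token is returned.
while True:
    page = index.fetch_by_metadata(
        filter={'status': 'active'},
        limit=100,
        pagination_token=pagination_token,
    )
    all_vectors.update(page.vectors)
    if page.pagination and page.pagination.next:
        pagination_token = page.pagination.next
    else:
        break

print(f"Fetched {len(all_vectors)} vectors in total")
```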

The `update` method used to require a vector `id`, but now you can pass a metadata filter instead. This is useful for bulk metadata updates across many vectors.

There is also a dry_run option that allows you to preview the number of vectors that would be changed by the update before performing the operation.

```python
from pinecone import Pinecone

pc = Pinecone()
index = pc.Index(host="your-index-host")

response = index.update(
    set_metadata={'status': 'active'},
    filter={'genre': {'$eq': 'drama'}},
    dry_run=True
)
print(f"Would update {response.matched_records} vectors")

response = index.update(
    set_metadata={'status': 'active'},
    filter={'genre': {'$eq': 'drama'}}
)
```

A new `FilterBuilder` utility class provides a type-safe, fluent interface for constructing metadata filters. While perhaps a bit verbose, it can help prevent common errors like misspelled operator names and provides better IDE support.

When you chain `.build()` onto the `FilterBuilder`, it emits a Python dictionary representing the filter. Methods that take metadata filters as arguments will continue to accept dictionaries as before.

```python
from pinecone import Pinecone, FilterBuilder

pc = Pinecone()
index = pc.Index(host="your-index-host")

filter1 = FilterBuilder().eq("genre", "drama").build()

filter2 = (FilterBuilder().eq("genre", "drama") &
           FilterBuilder().gt("year", 2020)).build()

filter3 = (FilterBuilder().eq("genre", "comedy") |
           FilterBuilder().eq("genre", "drama")).build()

filter4 = ((FilterBuilder().eq("genre", "drama") &
            FilterBuilder().gte("year", 2020)) |
           (FilterBuilder().eq("genre", "comedy") &
            FilterBuilder().lt("year", 2000))).build()

response = index.fetch_by_metadata(filter=filter2, limit=50)

index.update(
    set_metadata={'status': 'archived'},
    filter=filter3
)
```

The FilterBuilder supports all Pinecone filter operators: `eq`, `ne`, `gt`, `gte`, `lt`, `lte`, `in_`, `nin`, and `exists`. Compound expressions are built with `&` (and) and `|` (or).
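
As an illustration, here is a sketch using a few of the remaining operators; the exact argument shapes for `in_` and `exists` are assumptions based on the underlying `$in` and `$exists` filter operators:

```python
from pinecone import FilterBuilder

# $in: the field's value must be one of the listed options (assumed signature: field, list)
filter_in = FilterBuilder().in_("genre", ["comedy", "drama"]).build()

# $exists: the field must be present in the vector's metadata (assumed signature: field, bool)
filter_exists = FilterBuilder().exists("rating", True).build()

# Operators combine with & and | exactly as in the examples above
filter_combined = (FilterBuilder().in_("genre", ["comedy", "drama"]) &
                   FilterBuilder().exists("rating", True)).build()
```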

See [PR #529](#529) for `fetch_by_metadata`, [PR #544](#544) for `update()` with filter, and [PR #531](#531) for FilterBuilder.

You can now create namespaces in serverless indexes directly from the SDK:

```python
from pinecone import Pinecone

pc = Pinecone()
index = pc.Index(host="your-index-host")

namespace = index.create_namespace(name="my-namespace")
print(f"Created namespace: {namespace.name}, Vector count: {namespace.vector_count}")

namespace = index.create_namespace(
    name="my-namespace",
    schema={
        "fields": {
            "genre": {"filterable": True},
            "year": {"filterable": True}
        }
    }
)
```

**Note:** This operation is not supported for pod-based indexes.

See [PR #532](#532) for details.

For sparse indexes with integrated embedding configured to use the `pinecone-sparse-english-v0` model, you can now specify which terms must be present in search results:

```python
from pinecone import Pinecone, SearchQuery

pc = Pinecone()
index = pc.Index(host="your-index-host")

response = index.search(
    namespace="my-namespace",
    query=SearchQuery(
        inputs={"text": "Apple corporation"},
        top_k=10,
        match_terms={
            "strategy": "all",
            "terms": ["apple", "corporation"]
        }
    )
)
```

The `match_terms` parameter ensures that all specified terms must be present in the text of each search hit. Terms are normalized and tokenized before matching, and order does not matter.

See [PR #530](#530) for details.

**Update API keys, projects, and organizations:**

```python
from pinecone import Admin

admin = Admin() # Auth with PINECONE_CLIENT_ID and PINECONE_CLIENT_SECRET

api_key = admin.api_key.update(
    api_key_id='my-api-key-id',
    name='updated-api-key-name',
    roles=['ProjectEditor', 'DataPlaneEditor']
)

project = admin.project.update(
    project_id='my-project-id',
    name='updated-project-name',
    max_pods=10,
    force_encryption_with_cmek=True
)

organization = admin.organization.update(
    organization_id='my-org-id',
    name='updated-organization-name'
)
```

**Delete organizations:**

```python
from pinecone import Admin

admin = Admin()

admin.organization.delete(organization_id='my-org-id')
```

See [PR #527](#527) and [PR #543](#543) for details.

You can now configure which metadata fields are filterable when creating serverless indexes. This helps optimize performance by only indexing metadata fields that you plan to use for filtering:

```python
from pinecone import (
    Pinecone,
    ServerlessSpec,
    CloudProvider,
    AwsRegion,
    Metric
)

pc = Pinecone()

pc.create_index(
    name='my-index',
    dimension=1536,
    metric=Metric.COSINE,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS,
        region=AwsRegion.US_EAST_1,
        schema={
            "genre": {"filterable": True},
            "year": {"filterable": True},
            "rating": {"filterable": True}
        }
    )
)
```

When using schemas, only fields marked as `filterable: True` in the schema can be used in metadata filters.
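
For instance, a sketch of filtering on one of the fields declared filterable above (the index host is a placeholder):

```python
index = pc.Index(host="your-index-host")

# "genre" was declared filterable in the schema, so it can be used in a filter.
response = index.query(
    vector=[0.1] * 1536,
    top_k=10,
    filter={"genre": {"$eq": "drama"}},
)

# A field left out of the schema (e.g. "description") could not be used
# in a filter against this index.
```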

See [PR #528](#528) for details.

The SDK now exposes header information from API responses. This information is available in response objects via the `_response_info` attribute and can be useful for debugging and monitoring.

```python
from pinecone import Pinecone

pc = Pinecone()
index = pc.Index(host="your-index-host")

response = index.query(
    vector=[0.1, 0.2, 0.3, ...],
    top_k=10,
    namespace='my_namespace'
)

for k, v in response._response_info.get('raw_headers').items():
    print(f"{k}: {v}")
```

See [PR #539](#539) for details.

We've replaced Python's standard library `json` module with `orjson`, a fast JSON library written in Rust. This provides significant performance improvements for both serialization and deserialization of request payloads:

- **Serialization (dumps)**: 10-23x faster depending on payload size
- **Deserialization (loads)**: 4-7x faster depending on payload size

These improvements are especially beneficial for:
- High-throughput applications making many API calls
- Applications handling large vector payloads
- Real-time applications where latency matters

No code changes are required - the API remains the same, and you'll automatically benefit from these performance improvements.

See [PR #556](#556) for details.

We've optimized gRPC response parsing by replacing `json_format.MessageToDict` with direct protobuf field access. This optimization provides approximately 2x faster response parsing for gRPC operations.

Special thanks to [@yorickvP](https://github.com/yorickvP) for surfacing the `json_format.MessageToDict` refactor opportunity. While we didn't merge that specific PR, the insight led us to implement a similar optimization that significantly improves gRPC performance.

See [PR #553](#553) for details.

- **Type hints and IDE support**: Comprehensive type hints throughout the SDK improve IDE autocomplete and type checking. The SDK now uses Python 3.10+ type syntax throughout.
- **Documentation**: Updated docstrings with RST formatting and code examples for better developer experience.
- **Dependency updates**: Updated protobuf to 5.29.5 to address security vulnerabilities. Updated `pinecone-plugin-assistant` to version 3.0.1.
- **Build system**: Migrated from poetry to uv for faster dependency management.

- [@yorickvP](https://github.com/yorickvP) - Thanks for surfacing the gRPC response parsing optimization opportunity!