[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/read-units-demonstrated.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/read-units-demonstrated.ipynb)

# Demonstrating the retrieval of new `Read Units` (RUs)

[Pinecone serverless](https://www.pinecone.io/blog/serverless/) (`pinecone-client` >= 3.0.0) runs on an entirely new infrastructure. This major change includes a [novel pricing structure](https://docs.pinecone.io/docs/understanding-cost) based on the [serverless model](https://www.pinecone.io/blog/serverless-architecture/), in which reads and writes have separate cost structures.

This notebook will take users through building a Pinecone serverless index, populating that index, and retrieving the related Read Units (RUs) associated with different types of queries.

## Install the latest Pinecone client

Read about the [newest Pinecone client](https://docs.pinecone.io/docs/new-api).

Specifically, we'll be installing the [gRPC version](https://docs.pinecone.io/docs/upsert-data#grpc-python-client) of the newest Pinecone client to maximize performance of upserts and other data operations.

In [3]:
!pip install "pinecone-client[grpc]==3.0.0"



In [2]:
# Python version this notebook was tested with:

!python --version

Python 3.10.12


## Connect to Pinecone and create an index

In [9]:
import os
from getpass import getpass

api_key = os.getenv('PINECONE_API_KEY') or getpass("Pinecone API key:")  # Make sure this is the API key associated with your *Serverless* project (app.pinecone.io)

In [12]:
# Input the name of the index you want to create

index_name = input("Index name: ")

Index name:  rus-demo


In [13]:
from pinecone import ServerlessSpec
from pinecone.grpc import PineconeGRPC

pc = PineconeGRPC(api_key=api_key)

if index_name not in pc.list_indexes().names():
    print(f'Creating index \"{index_name}\"...\n')
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud='aws', region='us-west-2')
    )
    if index_name in pc.list_indexes().names():
        print(f'Successfully created index \"{index_name}\"!')
else:
    print(f'An index named \"{index_name}\" already exists!')


Creating index "rus-demo"...

Successfully created index "rus-demo"!


In [14]:
# Instantiate index object with your index_name

index = pc.Index(index_name)

In [15]:
index.describe_index_stats()  # Great, we have created our index; it currently has no vectors in it, and we have not created any namespaces yet

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 0}},
 'total_vector_count': 0}

## [Skip this section if your index exists already]

## Batch [upsert](https://docs.pinecone.io/docs/upsert-data) vectors into different namespaces

We'll create and populate three [namespaces](https://docs.pinecone.io/docs/namespaces) with 50k, 100k, and 200k vectors, respectively. Namespaces are optional, but they are a best practice for limiting queries to relevant records, which both speeds up queries and reduces the RUs consumed.

In [None]:
# Takes ~2 mins to run on Google Colab; longer if running locally

import uuid
import numpy as np
from tqdm import tqdm

# Define sizes of namespaces as a list of tuples. Each tuple contains a name and a size.
NAMESPACE_SIZES = [('50k', 50_000), ('100k', 100_000), ('200k', 200_000)]
BATCH_SIZE = 100  # Number of items in each batch
BATCHES = 100  # Number of batches
DIMS = 1536  # Dimension of the random vectors; must match the index dimension (1536)

# Loop through each namespace and its corresponding size
for namespace, size in NAMESPACE_SIZES:
    print(f'Populating namespace "{namespace}":')

    # Calculate the total number of iterations needed
    total_iterations = size // (BATCH_SIZE * BATCHES)

    # Process data in batches using tqdm for progress indication
    for _ in tqdm(range(total_iterations)):

        # Initialize an empty list to store all batches
        all_batches = []

        # Outer loop to create multiple batches
        for _ in range(BATCHES):

            # Initialize an empty list for a single batch
            single_batch = []

            # Inner loop to create each item in the batch
            for _ in range(BATCH_SIZE):

                # Generate a random vector value using numpy
                random_vector_value = np.random.rand(DIMS)

                # Create a tuple with a unique ID, the random vector value, and some toy metadata
                item = (str(uuid.uuid4()), random_vector_value, {'metadata': 'some toy metadata'})

                # Add the created item to the single batch
                single_batch.append(item)

            # Add the single batch to the list of all batches
            all_batches.append(single_batch)

        # Upsert (update/insert) each batch asynchronously and collect future objects
        async_results = []
        for batch in all_batches:
            async_result = index.upsert(vectors=batch, async_req=True, namespace=namespace)
            async_results.append(async_result)

        # Wait for all asynchronous operations to complete
        responses = [async_result.result() for async_result in async_results]

Populating namespace "50k":


100%|██████████████████████████████████████████████████████████████████████████████████| 5/5 [03:34<00:00, 42.93s/it]


Populating namespace "100k":


 60%|████████████████████████████████████████████████▌                                | 6/10 [04:16<02:50, 42.63s/it]
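
The progress-bar totals above (`5` and `10` outer iterations) follow directly from the loop's integer division, and they can be checked without touching Pinecone. The following is a minimal sketch that reuses the constants from the upsert cell; `upsert_plan` is a hypothetical helper introduced here for illustration and is not part of the Pinecone client.

```python
# Constants copied from the upsert cell above.
NAMESPACE_SIZES = [('50k', 50_000), ('100k', 100_000), ('200k', 200_000)]
BATCH_SIZE = 100  # Vectors per upsert request
BATCHES = 100     # Upsert requests issued per outer iteration


def upsert_plan(size, batch_size=BATCH_SIZE, batches=BATCHES):
    """Return (outer_iterations, upsert_requests, vectors_written) for one namespace.

    Hypothetical helper for illustration. It mirrors the integer division
    used in the loop, so a size that is not a multiple of
    batch_size * batches would leave the remainder unwritten.
    """
    per_iteration = batch_size * batches
    iterations = size // per_iteration
    return iterations, iterations * batches, iterations * per_iteration


for name, size in NAMESPACE_SIZES:
    iterations, requests, written = upsert_plan(size)
    print(f'{name}: {iterations} iterations, {requests} requests, {written} vectors')
```

Because the three namespace sizes are exact multiples of `BATCH_SIZE * BATCHES`, no vectors are lost to the integer division here; other sizes would need an extra partial batch.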

Validate that everything looks as expected.

In [13]:
# Now we can see our namespaces and how many vectors they hold ("50k" has 50k, "100k" has 100k, and "200k" has 200k):
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'100k': {'vector_count': 100000},
                '200k': {'vector_count': 200000},
                '50k': {'vector_count': 50000}},
 'total_vector_count': 350000}

## Inspect Read Costs

We'll now execute a simple query on the first namespace (`'50k'`) and inspect its response.

You should see `'usage': {'read_units': 5}` at the very bottom. Those are our `RUs`!

In [49]:
# Create a dummy query to pass to our vector DB

dummy_vectorized_query = [0] * 1536  # Must have same dimensions as our indexed vectors (1536)

In [50]:
# Issue the query and get response

index.query(vector=dummy_vectorized_query, top_k=10, namespace='50k')

{'matches': [{'id': '7ba8b4aa-f883-4c4e-b0b9-b7fc43179750',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'e83c6d3e-4706-4ada-acbe-e825d11bcdf4',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'd334f5f6-e62d-47ac-ad7b-3124dcc4cb4d',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '303aa521-b7b0-45d8-9cef-8ab3c8c23f9f',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': '9fe84421-3ee1-46f5-9af6-a158983791f5',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'ind

Since every query consumes read units, every query's response will have a `usage` field. This `usage` field contains the exact number of `RUs` your query incurred.

We can drill down to _only_ our query's corresponding cost in `RUs` by doing the following:

In [51]:
# Grab only RUs:
index.query(vector=dummy_vectorized_query, top_k=10, namespace='50k')['usage']['read_units']

5

Querying the `"50k"` namespace consumed `5` `RUs`, which is the minimum value a query can use.
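
Since each response reports its own cost, the same lookup can be used to keep a running total across several queries. The sketch below is illustrative only: `total_read_units` is a hypothetical helper, and plain dictionaries stand in for real query responses, on the assumption that responses support the same `['usage']['read_units']` indexing used in the cell above.

```python
def total_read_units(responses):
    """Sum the `read_units` reported in each response's `usage` field."""
    return sum(response['usage']['read_units'] for response in responses)


# Stand-in responses shaped like the query output shown above (matches omitted).
example_responses = [
    {'matches': [], 'usage': {'read_units': 5}},
    {'matches': [], 'usage': {'read_units': 5}},
]

print(total_read_units(example_responses))
```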

Let's query the `"100k"` namespace to see how the result changes:

In [53]:
index.query(vector=dummy_vectorized_query, top_k=10, namespace='100k')['usage']['read_units']

6

When we queried the `"50k"` namespace, we consumed `5` `RUs`. When we now query a namespace that has `2x` the vectors (the `"100k"` namespace), we see that we only consumed `1` extra `RU`.


Let's see what happens when we `2x` the size again, querying the `"200k"` namespace:

In [55]:
index.query(vector=dummy_vectorized_query, top_k=10, namespace='200k')['usage']['read_units']

8

When we query the `"200k"` namespace, our `RU` cost goes from `6` to `8`. Doubling the number of vectors again did not double the cost: this is sub-linear scaling in action!
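
To make the sub-linear trend concrete, the three measurements so far can be compared directly. This is a small arithmetic sketch that uses only the numbers observed in this notebook; the variable names are illustrative, and your own index may report different values.

```python
# (vector_count, read_units) pairs observed in the cells above with top_k=10.
observed = [(50_000, 5), (100_000, 6), (200_000, 8)]

for vectors, rus in observed:
    rus_per_100k = rus * 100_000 / vectors
    print(f'{vectors:>7,} vectors: {rus} RUs ({rus_per_100k:.1f} RUs per 100k vectors)')

# Growth factors between the smallest and the largest namespace.
size_growth = observed[-1][0] / observed[0][0]
cost_growth = observed[-1][1] / observed[0][1]
print(f'{size_growth:.1f}x the vectors cost {cost_growth:.1f}x the RUs')
```

The cost per vector scanned falls as the namespace grows, which is what sub-linear scaling means in practice.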

### Toggling `top_k`

Now let's keep querying the `"200k"` namespace, but increase our `top_k` value from `10` to `100` to see its effect:

In [56]:
index.query(vector=dummy_vectorized_query, top_k=100, namespace='200k')['usage']['read_units']

8

Increasing our `top_k` from `10` to `100` in the `"200k"` namespace has _not_ changed the number of `RUs` incurred.

This is because Pinecone's initial scan of the `"200k"` namespace was enough to produce the IDs of the `top_k` results for both `10` and `100`.


### Toggling `include_metadata`


But what if we set `include_metadata` to `True`? This _should_ trigger a "post-scan" `Fetch` stage with an additional cost of `1` `RU` per `10` items in our result set:

In [57]:
index.query(vector=dummy_vectorized_query, top_k=100, namespace='200k', include_metadata=True)['usage']['read_units']

18

Looks like that worked! By including metadata in our query's response (`include_metadata=True`), we went from a cost of `8` `RUs` to a cost of `18` `RUs`: the original `8` `RUs` for the scan, plus `10` `RUs` for the fetch stage (`1` `RU` per `10` items, and setting `top_k` to `100` returned `100` items).


### Putting it all together

Now let's increase the `top_k` even more to see how it affects the RU cost:

In [58]:
index.query(vector=dummy_vectorized_query, top_k=1000, namespace='200k', include_metadata=True)['usage']['read_units']

108

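By increasing our `top_k` from `100` to `1000`, and continuing to include metadata in our response, we are now at a cost of `108` `RUs`: the same `8` `RUs` for the scan, plus `100` `RUs` for fetching `1000` items at `1` `RU` per `10` items.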
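
The four measurements taken on the `"200k"` namespace fit a simple pattern: a fixed scan cost, plus `1` `RU` per `10` returned items when metadata is included. Here is a minimal sketch of that pattern. `estimate_read_units` and its `scan_rus` parameter are hypothetical names, the formula is inferred only from the numbers observed in this notebook rather than taken from Pinecone's pricing documentation, and rounding partial groups of ten up is an assumption.

```python
import math


def estimate_read_units(scan_rus, top_k, include_metadata=False):
    """Estimate a query's RUs from the pattern observed in this notebook.

    scan_rus: RUs for the initial namespace scan (8 for "200k" above).
    Assumption: the fetch stage adds 1 RU per 10 results, rounded up.
    """
    fetch_rus = math.ceil(top_k / 10) if include_metadata else 0
    return scan_rus + fetch_rus


# Compare the estimate against the values measured above for the "200k" namespace.
measured = {(10, False): 8, (100, False): 8, (100, True): 18, (1000, True): 108}
for (top_k, include_metadata), rus in measured.items():
    estimate = estimate_read_units(8, top_k, include_metadata)
    print(f'top_k={top_k}, include_metadata={include_metadata}: '
          f'estimated {estimate}, measured {rus}')
```

Treat this as a rough planning aid only; the `usage` field in each response remains the authoritative figure.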


---



[Play with the cost of your queries](https://docs.pinecone.io/docs/managing-cost) on your own and [let us know](https://community.pinecone.io/) what you find!