[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/quick-tour/namespacing.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/quick-tour/namespacing.ipynb)

# Namespacing with Pinecone

Namespacing is a feature in Pinecone that allows you to partition your data in an index. When you read from or write to a namespace in an index, you only access data in that particular namespace. Namespacing is useful when you want to reuse the same data processing pipeline but maintain strict separation between subsets of your data.

If your use case tempts you to create multiple indexes programmatically, consider whether the multitenancy provided by namespaces would be a better way to isolate different parts of your data.

For example, if you were building a movie recommender system, you could use namespacing to separate recommendations by genre. But if you need more flexibility in how you group and search records, putting genre information into metadata and using metadata filtering would probably be a better fit.
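
To make that trade-off concrete, here is a toy, in-memory sketch in plain Python. It is not Pinecone's implementation or API. `ToyNamespacedStore` and `filter_by_genre` are hypothetical names, and the genres are made up for illustration. With namespaces, each genre lives in its own partition, so a read only ever sees one genre. With metadata, all movies share one partition and a filter can select any combination of genres.

```python
# Toy illustration only: this is not Pinecone's implementation or API.


class ToyNamespacedStore:
    """Hypothetical in-memory store whose records are partitioned by namespace."""

    def __init__(self):
        # namespace name -> {record_id: metadata}
        self._namespaces = {}

    def upsert(self, record_id, metadata, namespace=""):
        # The same id can exist independently in several namespaces.
        self._namespaces.setdefault(namespace, {})[record_id] = metadata

    def list_ids(self, namespace=""):
        # Reads only ever see the requested namespace.
        return sorted(self._namespaces.get(namespace, {}))


def filter_by_genre(records, genres):
    """Hypothetical metadata filter: keep ids whose 'genre' is in `genres`."""
    return sorted(rid for rid, meta in records.items() if meta["genre"] in genres)


# Namespace approach: one partition per genre, read one at a time.
store = ToyNamespacedStore()
store.upsert("Up", {"genre": "adventure"}, namespace="adventure")
store.upsert("Wall-E", {"genre": "romantic-comedy"}, namespace="romantic-comedy")
print(store.list_ids(namespace="romantic-comedy"))  # ['Wall-E']

# Metadata approach: a single partition with the genre stored on each record,
# so one lookup can select several genres at once.
movies = {
    "Up": {"genre": "adventure"},
    "Wall-E": {"genre": "romantic-comedy"},
    "Toy Story": {"genre": "comedy"},
}
print(filter_by_genre(movies, {"romantic-comedy", "comedy"}))  # ['Toy Story', 'Wall-E']
```

In Pinecone itself, the metadata approach roughly corresponds to storing `genre` as record metadata and passing a filter such as `{"genre": {"$in": ["comedy", "romantic-comedy"]}}` to `index.query`.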

## Prerequisites

Install dependencies.

```python
!pip install -qU pandas==2.2.3 pinecone==6.0.2
```

## Creating an Index

We begin by instantiating the Pinecone client. To do this, we need a [free API key](https://app.pinecone.io).

```python
import os
from pinecone import Pinecone

# Get API key at app.pinecone.io
api_key = os.environ.get("PINECONE_API_KEY") or "PINECONE_API_KEY"

# Instantiate the client
pc = Pinecone(api_key=api_key)
```
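
If the environment variable is unset, the fallback above passes the literal placeholder string `"PINECONE_API_KEY"` to the client. The mistake would then only surface once a request is rejected. Here is a minimal sketch of failing fast instead, assuming you keep the key in an environment variable. `require_api_key` is a hypothetical helper, not part of the Pinecone SDK.

```python
import os


def require_api_key(var_name="PINECONE_API_KEY"):
    """Hypothetical helper: return the API key or raise a clear error early."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable (get a key at app.pinecone.io)."
        )
    return key


# Usage (not executed here):
# pc = Pinecone(api_key=require_api_key())
```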

### Creating a Pinecone Index

When creating the index, we need to define several configuration properties:

- `name` can be anything we like. The name is used as an identifier for the index when performing other operations such as `describe_index`, `delete_index`, and so on.
- `metric` specifies the similarity metric that will be used later when you make queries to the index.
- `dimension` should correspond to the dimension of the dense vectors produced by your embedding model. In this quick start, we are using made-up two-dimensional data, so a small value is simplest.
- `spec` holds a specification that tells Pinecone how you would like to deploy your index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/troubleshooting/available-cloud-regions).

There are more configurations available, but this minimal set will get us started.
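
The `dimension` is fixed when the index is created, so every vector you upsert must have exactly that many values. A small pre-flight check can catch mismatches before any request is sent. `check_dimensions` is a hypothetical helper sketched in plain Python. It assumes records are `(id, values)` pairs like the ones upserted later in this notebook.

```python
def check_dimensions(records, dimension):
    """Hypothetical helper: validate (id, values) pairs against the index dimension."""
    records = list(records)  # accept lists, tuples, or iterators such as zip(...)
    bad = [(rid, len(values)) for rid, values in records if len(values) != dimension]
    if bad:
        details = ", ".join(f"{rid!r} has {n}" for rid, n in bad)
        raise ValueError(f"Expected {dimension} values per vector; {details}")
    return len(records)


print(check_dimensions([("Wall-E", [1.0, 1.0]), ("Up", [2.0, 2.0])], dimension=2))  # 2
```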

```python
index_name = "pinecone-namespacing"
```

```python
# Delete the demo index if it already exists
if pc.has_index(name=index_name):
    pc.delete_index(index_name)
```

```python
from pinecone import ServerlessSpec, Metric, CloudProvider, AwsRegion

# Create an index
pc.create_index(
    name=index_name,
    dimension=2,
    metric=Metric.EUCLIDEAN,
    spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),
)
```

```
{
    "name": "pinecone-namespacing",
    "metric": "euclidean",
    "host": "pinecone-namespacing-dojoi3u.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 2,
    "deletion_protection": "disabled",
    "tags": null
}
```

```python
# You can look up the index configuration for an existing
# index using describe_index
index_config = pc.describe_index(name=index_name)
print(f"The index host is {index_config.host}")
```

```
The index host is pinecone-namespacing-dojoi3u.svc.aped-4627-b74a.pinecone.io
```

## Working with the Index

Data operations such as `upsert` and `query` are sent directly to the index host instead of `api.pinecone.io`, so we use a different client object for these operations. When you construct this client with the `.Index()` helper method, it automatically inherits your API key and any other configuration from the parent `Pinecone` instance.

```python
# Instantiate an index client
index = pc.Index(host=index_config.host)
```

### Generate movie data

For this simple example scenario, we will make up some small vectors to represent different movies.

```python
# Generate some data
import pandas as pd

df = pd.DataFrame()
df["id"] = ["Wall-E", "Up", "Ratatouille", "Toy Story"]
df["vector"] = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
df
```

|   | id | vector |
|---|----|--------|
| 0 | Wall-E | [1.0, 1.0] |
| 1 | Up | [2.0, 2.0] |
| 2 | Ratatouille | [3.0, 3.0] |
| 3 | Toy Story | [4.0, 4.0] |
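
The upsert that follows sends all four `(id, values)` pairs in a single call, which is fine for a toy dataset. For larger datasets, it is common to send upserts in smaller batches. Here is a minimal sketch of a batching helper in plain Python. `batched_pairs` and the batch size used here are illustrative assumptions, not Pinecone defaults.

```python
from itertools import islice


def batched_pairs(ids, vectors, batch_size):
    """Hypothetical helper: yield lists of at most batch_size (id, values) pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pairs = zip(ids, vectors)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


ids = ["Wall-E", "Up", "Ratatouille", "Toy Story"]
vectors = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
for batch in batched_pairs(ids, vectors, batch_size=2):
    print(batch)
    # With a live index you would send each batch, e.g. index.upsert(vectors=batch)
```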


### Insert vectors without specifying a namespace

```python
# Insert vectors without specifying a namespace
# (upsert expects a list of (id, values) tuples)
index.upsert(vectors=list(zip(df.id, df.vector)))
```

```
{'upserted_count': 4}
```

```python
import time


def is_fresh(index):
    stats = index.describe_index_stats()
    vector_count = stats.total_vector_count
    return vector_count > 0


while not is_fresh(index):
    # It takes a few moments for vectors we just upserted
    # to become available for querying
    time.sleep(5)

# View index stats
index.describe_index_stats()
```

```
{'dimension': 2,
 'index_fullness': 0.0,
 'metric': 'euclidean',
 'namespaces': {'': {'vector_count': 4}},
 'total_vector_count': 4,
 'vector_type': 'dense'}
```
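
The `while not is_fresh(index)` loop above has no upper bound, so it would spin forever if the upserted vectors never became visible. Here is a minimal sketch of a bounded polling loop in plain Python. `wait_until` is a hypothetical helper, and the timeout and interval values are arbitrary choices for illustration.

```python
import time


def wait_until(predicate, timeout=60.0, interval=5.0):
    """Hypothetical helper: poll predicate() until it is True or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with the notebook's is_fresh helper (not executed here):
# if not wait_until(lambda: is_fresh(index), timeout=60, interval=5):
#     raise TimeoutError("Upserted vectors did not become visible in time")
```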

### Insert vectors into a namespace

```python
romantic_comedies = ["Wall-E", "Ratatouille"]
romcom_df = df[df.id.isin(romantic_comedies)]
romcom_df
```

|   | id | vector |
|---|----|--------|
| 0 | Wall-E | [1.0, 1.0] |
| 2 | Ratatouille | [3.0, 3.0] |


```python
# Insert vectors into a namespace for romantic comedies
index.upsert(
    vectors=list(zip(romcom_df.id, romcom_df.vector)),
    namespace="romantic-comedy",
)
index.describe_index_stats()
```

```
{'dimension': 2,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4},
                'romantic-comedy': {'vector_count': 2}},
 'total_vector_count': 6}
```
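
The stats now report six vectors even though there are only four distinct movie ids. `Wall-E` and `Ratatouille` exist once in the default namespace, shown as the empty string `''`, and again as independent records in `romantic-comedy`. The sketch below works through that arithmetic on a plain dictionary copied from the output above, rather than on the SDK's response object. `namespace_counts` is a hypothetical helper, not an SDK function.

```python
def namespace_counts(stats):
    """Hypothetical helper: map each namespace to its vector count."""
    return {
        (name or "(default)"): info["vector_count"]
        for name, info in stats["namespaces"].items()
    }


# Copied from the describe_index_stats() output above.
stats = {
    "dimension": 2,
    "index_fullness": 0.0,
    "namespaces": {"": {"vector_count": 4}, "romantic-comedy": {"vector_count": 2}},
    "total_vector_count": 6,
}
counts = namespace_counts(stats)
print(counts)  # {'(default)': 4, 'romantic-comedy': 2}
print(sum(counts.values()) == stats["total_vector_count"])  # True
```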

### Query top-3 results, without a namespace

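```python
# Look up the Wall-E vector by id (robust to the DataFrame's index labels)
wall_e_vector = df.loc[df.id == "Wall-E", "vector"].iloc[0]
query_results = index.query(vector=wall_e_vector, top_k=3)
query_results
```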

```
{'matches': [{'id': 'Wall-E', 'score': 0.0, 'values': []},
             {'id': 'Up', 'score': 1.99999905, 'values': []},
             {'id': 'Ratatouille', 'score': 7.99999809, 'values': []}],
 'namespace': ''}
```
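
With the `euclidean` metric, these scores are distances, so lower means more similar. They match the squared Euclidean distance between the query vector `[1.0, 1.0]` and each stored vector, up to the small floating-point differences in the output. For `Up` the distance is (2 - 1)² + (2 - 1)² = 2, and for `Ratatouille` it is (3 - 1)² + (3 - 1)² = 8. Here is a plain-Python sketch that reproduces the ranking. `squared_euclidean` and `brute_force_top_k` are hypothetical helpers; this brute-force scan is not how Pinecone searches internally.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(query, records, top_k):
    """Hypothetical helper: rank (id, values) records by ascending distance."""
    scored = [(rid, squared_euclidean(query, values)) for rid, values in records]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


movies = [
    ("Wall-E", [1.0, 1.0]),
    ("Up", [2.0, 2.0]),
    ("Ratatouille", [3.0, 3.0]),
    ("Toy Story", [4.0, 4.0]),
]
print(brute_force_top_k([1.0, 1.0], movies, top_k=3))
# [('Wall-E', 0.0), ('Up', 2.0), ('Ratatouille', 8.0)]
```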

### Query top-3 results, with a namespace

We should expect to see only romantic comedies in the query results.

```python
query_results = index.query(
    vector=df.loc[df.id == "Wall-E", "vector"].iloc[0],
    top_k=3,
    namespace="romantic-comedy",
)
query_results
```

```
{'matches': [{'id': 'Wall-E', 'score': 0.0, 'values': []},
             {'id': 'Ratatouille', 'score': 7.99999809, 'values': []}],
 'namespace': 'romantic-comedy'}
```
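
Only two matches come back even though `top_k=3`. A namespaced query ranks only the records inside that namespace, and `romantic-comedy` contains just two. Here is a self-contained plain-Python sketch of that behavior. It assumes a hypothetical dictionary that maps each namespace name to its `(id, values)` records, and `query_namespace` is a hypothetical helper, not the SDK's `query`.

```python
def query_namespace(store, namespace, query, top_k):
    """Hypothetical helper: rank only the records stored in one namespace."""
    records = store.get(namespace, [])
    scored = sorted(
        (sum((q - v) ** 2 for q, v in zip(query, values)), rid)
        for rid, values in records
    )
    # top_k is an upper bound: a small namespace simply returns fewer matches.
    return [(rid, score) for score, rid in scored[:top_k]]


store = {
    "": [
        ("Wall-E", [1.0, 1.0]),
        ("Up", [2.0, 2.0]),
        ("Ratatouille", [3.0, 3.0]),
        ("Toy Story", [4.0, 4.0]),
    ],
    "romantic-comedy": [("Wall-E", [1.0, 1.0]), ("Ratatouille", [3.0, 3.0])],
}
print(query_namespace(store, "romantic-comedy", [1.0, 1.0], top_k=3))
# [('Wall-E', 0.0), ('Ratatouille', 8.0)]
```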

### Delete the index

Once we're done, delete the index to save resources.

```python
# Delete the index
pc.delete_index(name=index_name)
```