# Namespacing with Pinecone

Namespacing is a feature of a Pinecone service that lets you partition the data in an index. When you read from or write to a namespace, you access only the data in that namespace. As a result, two namespaces can hold items with the same ids but different values. Namespacing is useful when you want to reuse the same data processing pipeline but query only a subset of your data. For example, in a movie recommender system, you could use namespaces to separate recommendations by genre.
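To make the partitioning idea concrete, here is a toy in-memory model in plain Python. It is only a sketch of the concept and not how Pinecone stores data: the `store`, `upsert`, and `fetch` names are hypothetical, and using the empty string for "no namespace specified" is an assumption made for illustration.

```python
from collections import defaultdict

# Toy stand-in for a namespaced index: namespace -> {id: vector}.
# The empty string plays the role of "no namespace" (an assumption).
store = defaultdict(dict)


def upsert(items, namespace=""):
    """Insert or overwrite (id, vector) pairs inside one namespace only."""
    for item_id, vector in items:
        store[namespace][item_id] = vector


def fetch(item_id, namespace=""):
    """Read an id from one namespace; other namespaces are not consulted."""
    return store[namespace].get(item_id)


# The same id can live in two namespaces with different values.
upsert([("Wall-E", [1, 1]), ("Up", [2, 2])])
upsert([("Wall-E", [9, 9])], namespace="romantic-comedy")

print(fetch("Wall-E"))                               # [1, 1]
print(fetch("Wall-E", namespace="romantic-comedy"))  # [9, 9]
print(fetch("Up", namespace="romantic-comedy"))      # None
```

Writing to one namespace never touches another, which is why the two "Wall-E" entries above can coexist.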

## Prerequisites

Install dependencies.

In [1]:
!pip install -qU pip pinecone-client pandas

Set up Pinecone.

In [2]:
import pinecone
import os

api_key = os.getenv("PINECONE_API_KEY") or "USE_YOUR_API_KEY"
pinecone.init(api_key=api_key)

Check Pinecone version compatibility.

In [3]:
import pinecone.info

version_info = pinecone.info.version()
server_version = ".".join(version_info.server.split(".")[:2])
client_version = ".".join(version_info.client.split(".")[:2])
notebook_version = "0.8"

assert (
    notebook_version == server_version
), "This notebook is outdated. Consider using the latest version of the notebook."
assert client_version == server_version, "Please upgrade pinecone-client."

## Namespacing

In [4]:
import pinecone.graph
import pinecone.service
import pinecone.connector
import pandas as pd

In [5]:
service_name = "pinecone-namespacing"

# Deploy a service
graph = pinecone.graph.IndexGraph(metric="euclidean")
pinecone.service.deploy(service_name=service_name, graph=graph)

# Create a connection
conn = pinecone.connector.connect(service_name)

### Generate movie data

In [6]:
# Generate some data

df = pd.DataFrame()
df["id"] = ["Wall-E", "Up", "Ratatouille", "Toy Story"]
df["vector"] = [[1, 1], [2, 2], [3, 3], [4, 4]]
df

|   | id | vector |
|---|----|--------|
| 0 | Wall-E | [1, 1] |
| 1 | Up | [2, 2] |
| 2 | Ratatouille | [3, 3] |
| 3 | Toy Story | [4, 4] |


### Insert vectors without specifying a namespace

In [7]:
# Insert vectors without specifying a namespace
conn.upsert(items=zip(df.id, df.vector)).collect()
conn.info()

InfoResult(index_size=4)

### Insert vectors into a namespace

In [8]:
romantic_comedies = ["Wall-E", "Ratatouille"]
romcom_df = df[df.id.isin(romantic_comedies)]
romcom_df

|   | id | vector |
|---|----|--------|
| 0 | Wall-E | [1, 1] |
| 2 | Ratatouille | [3, 3] |


In [9]:
# Insert vectors into a namespace
conn.upsert(
    items=zip(romcom_df.id, romcom_df.vector), namespace="romantic-comedy"
).collect()
conn.info(namespace="romantic-comedy")

InfoResult(index_size=2)

### Query top-3 results, without a namespace

In [10]:
query_results = conn.query(queries=df[df.id == "Wall-E"].vector, top_k=3).collect()
query_results

[QueryResult(ids=['Wall-E', 'Up', 'Ratatouille'], scores=[0.0, -2.0, -8.0], data=None)]
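The scores above are consistent with the negative squared Euclidean distance between the query vector and each stored vector. That reading is inferred from the numbers in this output, not taken from Pinecone documentation, so treat the following plain-Python check as a sketch; `neg_squared_euclidean` is a hypothetical helper.

```python
def neg_squared_euclidean(a, b):
    """Negative squared Euclidean distance (assumed scoring rule)."""
    return float(-sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1, 1]  # the Wall-E vector
candidates = {
    "Wall-E": [1, 1],
    "Up": [2, 2],
    "Ratatouille": [3, 3],
    "Toy Story": [4, 4],
}

scores = {name: neg_squared_euclidean(query, vec) for name, vec in candidates.items()}

# Higher (less negative) scores are closer matches.
top_3 = sorted(scores, key=scores.get, reverse=True)[:3]

print(top_3)                       # ['Wall-E', 'Up', 'Ratatouille']
print([scores[n] for n in top_3])  # [0.0, -2.0, -8.0]
```

Under this assumption, "Toy Story" scores -18.0 and falls outside the top three.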

### Query top-3 results, with a namespace

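We expect to see only romantic comedies in the query results. Because the romantic-comedy namespace holds only two vectors, the query returns two results even though top_k=3.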

In [11]:
query_results = conn.query(
    queries=df[df.id == "Wall-E"].vector, top_k=3, namespace="romantic-comedy"
).collect()
print(query_results)

[QueryResult(ids=['Wall-E', 'Ratatouille'], scores=[0.0, -8.0], data=None)]
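The effect of the namespace argument can be sketched in plain Python as a nearest-neighbor search restricted to one partition. This is a simplified illustration, not Pinecone's implementation: `partitions` and `query_namespace` are hypothetical names, and ranking by squared Euclidean distance is an assumption based on the metric chosen earlier.

```python
def query_namespace(partitions, vector, top_k, namespace=""):
    """Return up to top_k ids from a single partition, nearest first."""
    items = partitions.get(namespace, {})

    def squared_distance(item_id):
        return sum((x - y) ** 2 for x, y in zip(vector, items[item_id]))

    # Slicing never pads: a partition with fewer than top_k items
    # simply yields fewer results.
    return sorted(items, key=squared_distance)[:top_k]


partitions = {
    "": {"Wall-E": [1, 1], "Up": [2, 2], "Ratatouille": [3, 3], "Toy Story": [4, 4]},
    "romantic-comedy": {"Wall-E": [1, 1], "Ratatouille": [3, 3]},
}

print(query_namespace(partitions, [1, 1], top_k=3))
# ['Wall-E', 'Up', 'Ratatouille']
print(query_namespace(partitions, [1, 1], top_k=3, namespace="romantic-comedy"))
# ['Wall-E', 'Ratatouille']
```

"Up" is closer to the query than "Ratatouille", yet it never appears in the namespaced result because it was not inserted into that partition.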


### Stop the service

In [12]:
# Stop the service
pinecone.service.stop(service_name=service_name)

{'success': True}