[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/quick-tour/interacting-with-the-index.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/quick-tour/interacting-with-the-index.ipynb)

# Interacting with a Pinecone index

Pinecone creates an index for your input vectors,
and it lets you query their nearest neighbors.
A Pinecone index supports the following operations:

* `upsert`: insert data formatted as `(id, vector)` tuples into the index, or replace the vector stored under an existing id. Optionally, attach metadata to each vector so that queries can filter on it; an upserted record then looks like `(id, vector, metadata)`.
* `delete`: delete vectors by id.
* `query`: query the index and retrieve the top-k nearest neighbors according to the index's similarity metric (dot product, cosine similarity, or Euclidean distance).
* `fetch`: fetch vectors stored in the index by id.
* `describe_index_stats`: get statistics about the index.
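
To make the three metrics named in the `query` bullet concrete, here is a minimal sketch that computes them by hand for two small vectors. These are the textbook definitions written in plain Python; the helper names are illustrative, and the exact score values Pinecone reports may be scaled differently.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product of the two vectors after normalizing each to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [1.0, 1.0]
candidate = [2.0, 2.0]

print(dot_product(query, candidate))         # 4.0
print(cosine_similarity(query, candidate))   # approximately 1.0 (same direction)
print(euclidean_distance(query, candidate))  # approximately 1.414
```

Note that the two vectors point in the same direction, so cosine similarity treats them as nearly identical even though their Euclidean distance is not zero.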

## Prerequisites

Install dependencies.

In [1]:
!pip install -qU \
  pinecone==6.0.1 \
  pandas

## Creating an Index

We begin by instantiating the Pinecone client. To do this we need a [free API key](https://app.pinecone.io).

In [2]:
import os
from pinecone import Pinecone

# Get API key at app.pinecone.io
api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'

# Instantiate the client
pc = Pinecone(api_key=api_key)


### Creating a Pinecone Index

When creating the index we need to define several configuration properties. 

- `name` can be anything we like. The name is used as an identifier for the index when performing other operations such as `describe_index`, `delete_index`, and so on. 
- `metric` specifies the similarity metric that will be used later when you make queries to the index.
- `dimension` should correspond to the dimension of the dense vectors produced by your embedding model. In this quick start, we are using made-up data so a small value is simplest.
- `spec` holds a specification that tells Pinecone how you would like to deploy your index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

There are more configurations available, but this minimal set will get us started.
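
Because every vector written to the index must have exactly `dimension` values, it can help to check the data on the client side before sending it. The sketch below is plain Python and does not call Pinecone; `check_dimensions` and `INDEX_DIMENSION` are hypothetical names introduced for illustration.

```python
def check_dimensions(vectors, dimension):
    """Return the ids of vectors whose length differs from the index dimension."""
    return [vec_id for vec_id, values in vectors if len(values) != dimension]

INDEX_DIMENSION = 2  # assumed to match the `dimension` used when creating the index

vectors = [
    ("A", [1.0, 1.0]),
    ("B", [2.0, 2.0, 2.0]),  # deliberately wrong: three values instead of two
]

mismatched = check_dimensions(vectors, INDEX_DIMENSION)
print(mismatched)  # ['B']
```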

In [3]:
index_name = "interacting-with-the-index"

In [4]:
# Delete the demo index if it already exists
if pc.has_index(name=index_name):
    pc.delete_index(index_name)

In [5]:
from pinecone import ServerlessSpec, Metric, CloudProvider, AwsRegion

pc.create_index(
    name=index_name, 
    dimension=2, 
    metric=Metric.EUCLIDEAN,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS, 
        region=AwsRegion.US_EAST_1
    )
)

{
    "name": "interacting-with-the-index",
    "metric": "euclidean",
    "host": "interacting-with-the-index-dojoi3u.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 2,
    "deletion_protection": "disabled",
    "tags": null
}

The index configuration is returned by the create command, but we can look it up again at any time by calling the `describe_index` method.

In [6]:
index_config = pc.describe_index(name=index_name)

print(f"The index host is {index_config.host}")

The index host is interacting-with-the-index-dojoi3u.svc.aped-4627-b74a.pinecone.io


## Using the index

Data operations such as `upsert` and `query` are sent directly to the index host instead of `api.pinecone.io`, so we use a different client object for these operations. Constructing this index client with the `pc.Index()` helper method means it automatically inherits your API key and any other configuration from the parent `Pinecone` instance.

In [9]:
# Instantiate an index client
index = pc.Index(host=index_config.host)

### Insert vectors

In a real use case, the vectors we insert would represent embeddings of our data. But for this simple demo, we will make up some small values just to illustrate the shape of the interface.

In [13]:
# Create some fake sample data
import pandas as pd

df = pd.DataFrame()
df["id"] = ["A", "B", "C", "D", "E"]
df["vector"] = [[1., 1.], [2., 2.], [3., 3.], [4., 4.], [5., 5.]]
df

|   | id | vector |
|---|----|--------|
| 0 | A | [1.0, 1.0] |
| 1 | B | [2.0, 2.0] |
| 2 | C | [3.0, 3.0] |
| 3 | D | [4.0, 4.0] |
| 4 | E | [5.0, 5.0] |


Next we upsert the vectors into the index. An upsert inserts a new vector, or overwrites the stored vector if the id is already present.

In [14]:
# Upsert the vectors
index.upsert(vectors=zip(df.id, df.vector))

{'upserted_count': 5}
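
The optional `(id, vector, metadata)` form mentioned at the start of this guide can be built the same way. The sketch below only prepares the tuples in plain Python; the `genre` field, the record layout, and the `to_upsert_tuples` helper are made up for illustration.

```python
# Hypothetical records; the "genre" field is invented for this example.
records = [
    {"id": "A", "vector": [1.0, 1.0], "genre": "drama"},
    {"id": "B", "vector": [2.0, 2.0], "genre": "comedy"},
]

def to_upsert_tuples(records):
    """Convert records into (id, vector, metadata) tuples."""
    return [
        (r["id"], r["vector"], {"genre": r["genre"]})
        for r in records
    ]

vectors_with_metadata = to_upsert_tuples(records)
print(vectors_with_metadata[0])  # ('A', [1.0, 1.0], {'genre': 'drama'})
```

A list like this could then be passed as `index.upsert(vectors=vectors_with_metadata)`, and a later query could restrict results with a metadata condition such as `filter={"genre": {"$eq": "drama"}}`.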

### Fetch vectors

In [15]:
# Fetch vectors by ID
fetch_results = index.fetch(ids=["A", "B"])
fetch_results

FetchResponse(namespace='', vectors={'A': Vector(id='A', values=[1.0, 1.0], metadata=None, sparse_values=None), 'B': Vector(id='B', values=[2.0, 2.0], metadata=None, sparse_values=None)}, usage={'read_units': 1})

### Query top-k vectors

In [16]:
# Query top-k nearest neighbors
query_results = index.query(vector=[1.1, 1.1], top_k=2)
query_results

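{'matches': [], 'namespace': '', 'usage': {'read_units': 1}}

The empty `matches` list does not mean the vectors are missing. Serverless indexes are eventually consistent, so a query issued right after an upsert can run before the new vectors are searchable. Rerunning the cell after a short wait should return the two nearest neighbors.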
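
To see which neighbors this query is expected to find for the sample data, the ranking can be reproduced locally with a brute-force search. This is a sketch in plain Python using Euclidean distance, not a description of how Pinecone executes queries; `brute_force_query` is a hypothetical helper.

```python
import math

# The same sample vectors that were upserted earlier.
sample = {
    "A": [1.0, 1.0],
    "B": [2.0, 2.0],
    "C": [3.0, 3.0],
    "D": [4.0, 4.0],
    "E": [5.0, 5.0],
}

def brute_force_query(vectors, query, top_k):
    """Return the top_k (id, distance) pairs, closest first."""
    scored = [(vec_id, math.dist(values, query)) for vec_id, values in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]

matches = brute_force_query(sample, [1.1, 1.1], top_k=2)
print([vec_id for vec_id, _ in matches])  # ['A', 'B']
```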

### Update vectors by ID

In [17]:
# Fetch current vectors by ID
fetch_result = index.fetch(ids=["A"])
fetch_result

FetchResponse(namespace='', vectors={'A': Vector(id='A', values=[1.0, 1.0], metadata=None, sparse_values=None)}, usage={'read_units': 1})

In [18]:
# Update vectors by ID
index.upsert(vectors=[("A",[0.1, 0.1])])

{'upserted_count': 1}

In [25]:
# Fetch vector by the same ID again
fetch_result = index.fetch(ids=["A"])
fetch_result

FetchResponse(namespace='', vectors={'A': Vector(id='A', values=[0.1, 0.1], metadata=None, sparse_values=None)}, usage={'read_units': 1})
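
The insert-or-replace behavior shown in this section can be modeled with a dictionary keyed by id. The `ToyIndex` class below is a hypothetical in-memory stand-in meant only to illustrate the semantics; it is not how Pinecone stores vectors.

```python
class ToyIndex:
    """A dict-backed stand-in that mimics insert-or-replace by id."""

    def __init__(self):
        self._vectors = {}

    def upsert(self, vectors):
        """Insert new ids and overwrite existing ones; return the count written."""
        count = 0
        for vec_id, values in vectors:
            self._vectors[vec_id] = list(values)
            count += 1
        return {"upserted_count": count}

    def fetch(self, ids):
        """Return only the requested ids that are present."""
        return {i: self._vectors[i] for i in ids if i in self._vectors}

toy = ToyIndex()
toy.upsert([("A", [1.0, 1.0]), ("B", [2.0, 2.0])])
result = toy.upsert([("A", [0.1, 0.1])])  # same id: the stored values are replaced
print(result)            # {'upserted_count': 1}
print(toy.fetch(["A"]))  # {'A': [0.1, 0.1]}
```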

### Delete vectors by ID

In [26]:
# Delete vectors by ID
index.delete(ids=["A"])

{}

In [28]:
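# Deleted ids are omitted from the fetch response once the delete has propagated
fetch_results = index.fetch(ids=["A", "B"])
fetch_results

FetchResponse(namespace='', vectors={'A': Vector(id='A', values=[0.1, 0.1], metadata=None, sparse_values=None), 'B': Vector(id='B', values=[2.0, 2.0], metadata=None, sparse_values=None)}, usage={'read_units': 1})

In the output captured above, `A` is still returned, most likely because the fetch ran before the delete had propagated; reads can briefly lag behind writes. The index statistics in the next section report four vectors, which is consistent with the delete having taken effect.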

### Get index statistics

In [29]:
# Index statistics
index.describe_index_stats()

{'dimension': 2,
 'index_fullness': 0.0,
 'metric': 'euclidean',
 'namespaces': {'': {'vector_count': 4}},
 'total_vector_count': 4,
 'vector_type': 'dense'}
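
Because reads can briefly lag behind writes, the vector count from the statistics is a convenient signal for checking that earlier upserts or deletes have become visible. The sketch below is a generic polling helper in plain Python; `wait_for_count` is a hypothetical name, and the demo uses a fake counter rather than a real index.

```python
import time

def wait_for_count(get_count, expected, timeout=30.0, interval=1.0):
    """Poll get_count() until it returns at least `expected` or the timeout elapses.

    Returns True if the expected count was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_count() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Stand-in for the index: reports 0 vectors twice, then 5.
_fake_counts = iter([0, 0, 5])
reached = wait_for_count(lambda: next(_fake_counts), expected=5, interval=0.01)
print(reached)  # True
```

With a real index, `get_count` could be something like `lambda: index.describe_index_stats().total_vector_count`, assuming the response exposes the `total_vector_count` value shown in the output above as an attribute.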

### Delete the index

In [30]:
# Delete the index
pc.delete_index(name=index_name)