[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/quick-tour/hello-pinecone.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/docs/quick-tour/hello-pinecone.ipynb)

# Hello, Pinecone!

This notebook will walk through the steps to get a simple Pinecone index up and running.


## Prerequisites

First, we need to install a few dependencies.

In [1]:
!pip install -qU pandas==2.2.3 pinecone==6.0.2

## Getting started

We begin by instantiating the Pinecone client. To do this we need a [free API key](https://app.pinecone.io).

In [2]:
import os
from pinecone import Pinecone

# Get your API key at app.pinecone.io
api_key = os.environ.get("PINECONE_API_KEY") or "PINECONE_API_KEY"

# Instantiate the Pinecone client
pc = Pinecone(api_key=api_key)
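
The fallback string in the cell above is only a placeholder, so a missing key would most likely go unnoticed until the first API call fails. As a minimal sketch of failing early instead, the helper below reads the variable and raises a clear error when it is absent. The name `require_env` is hypothetical and not part of the Pinecone client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for this notebook, not a Pinecone API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Create an API key at app.pinecone.io and export it first."
        )
    return value


# Usage (uncomment once the variable is set):
# api_key = require_env("PINECONE_API_KEY")
```

With this in place, `Pinecone(api_key=require_env("PINECONE_API_KEY"))` would stop immediately when the key has not been exported.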


## Pinecone quickstart

With Pinecone you can create a vector index in which to store and search your vector embeddings.

In [3]:
# Giving our index a name
index_name = "hello-pinecone"

In [4]:
# Delete the index if an index of the same name already exists
if pc.has_index(name=index_name):
    pc.delete_index(name=index_name)

### Creating a Pinecone Index

When creating the index we need to define several configuration properties. 

- `name` can be anything we like. The name is used as an identifier for the index when performing other operations such as `describe_index`, `delete_index`, and so on. 
- `metric` specifies the similarity metric that will be used later when you make queries to the index.
- `dimension` should correspond to the dimension of the dense vectors produced by your embedding model. In this quick start, we are using made-up data so a small value is simplest.
- `spec` holds a specification that tells Pinecone how you would like to deploy your index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/troubleshooting/available-cloud-regions).

There are more configurations available, but this minimal set will get us started.
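
To make the `metric` setting more concrete, here is a minimal pure-Python sketch of cosine similarity, the metric chosen in this notebook. It illustrates the formula only and makes no claim about how Pinecone computes scores internally. The function name `cosine_similarity` is ours.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Scaling a vector does not change its direction, so the score stays close to 1.
print(cosine_similarity([1.0, 1.0, 1.0], [2.0, 2.0, 2.0]))
# Orthogonal vectors score 0.
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Because cosine similarity depends only on direction, the magnitude of a query vector does not affect the ranking of results.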

In [5]:
from pinecone import ServerlessSpec, CloudProvider, AwsRegion, Metric

pc.create_index(
    name=index_name,
    metric=Metric.COSINE,
    dimension=3,
    spec=ServerlessSpec(cloud=CloudProvider.AWS, region=AwsRegion.US_EAST_1),
)

{
    "name": "hello-pinecone",
    "metric": "cosine",
    "host": "hello-pinecone-dojoi3u.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 3,
    "deletion_protection": "disabled",
    "tags": null
}

We can look up the configuration of the index at any time with `describe_index`.

In [6]:
description = pc.describe_index(name=index_name)
description

{
    "name": "hello-pinecone",
    "metric": "cosine",
    "host": "hello-pinecone-dojoi3u.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 3,
    "deletion_protection": "disabled",
    "tags": null
}

## Upserting data into the index

The description shows that the index is ready. Now we will create some simple vectors to serve as our examples.

In [7]:
# Instantiate an Index client
index = pc.Index(host=description.host)

In [8]:
import random
import pandas as pd


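def create_simulated_data_in_df(num_vectors):
    """Build a DataFrame of random vectors matching the index dimension."""
    df = pd.DataFrame(
        data={
            "id": [f"id-{i}" for i in range(num_vectors)],
            "vector": [
                # One random float in [0, 1) per dimension of the index
                [random.random() for _ in range(description.dimension)]
                for _ in range(num_vectors)
            ],
        }
    )
    return df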


df = create_simulated_data_in_df(10)

df.head()

| | id | vector |
|---|---|---|
| 0 | id-0 | [0.19415077620011345, 0.12315138914527213, 0.9... |
| 1 | id-1 | [0.8274728097660323, 0.8350750339818135, 0.961... |
| 2 | id-2 | [0.9630530808168708, 0.46559222532176947, 0.04... |
| 3 | id-3 | [0.5879443851815274, 0.5590457108385455, 0.924... |
| 4 | id-4 | [0.6104298712136548, 0.2665978264705289, 0.858... |
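
Each vector's length has to match the index `dimension`. As a small sketch of checking this before sending data, the helper below lists the positions of any vectors with the wrong length. The name `check_dimensions` is hypothetical and belongs to neither pandas nor Pinecone.

```python
def check_dimensions(vectors, expected_dim):
    """Return the positions of vectors whose length differs from expected_dim.

    Hypothetical helper for illustration; not part of pandas or Pinecone.
    """
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dim]


sample = [[0.1, 0.2, 0.3], [0.4, 0.5], [0.6, 0.7, 0.8]]
bad_rows = check_dimensions(sample, expected_dim=3)
print(bad_rows)  # [1]
```

Applied to the data above, `check_dimensions(df.vector, description.dimension)` should return an empty list.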


Next, we `upsert` the vectors into our index. This call inserts a new vector into the index, or updates the existing vector if its id is already present.

In [9]:
index.upsert(vectors=zip(df.id, df.vector))  # insert vectors

{'upserted_count': 10}
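
The insert-or-update behavior of `upsert` can be pictured with a plain dictionary keyed by id. The sketch below models those semantics only. The `store` dict and the local `upsert` function are stand-ins for illustration and do not reflect how Pinecone stores data.

```python
def upsert(store, records):
    """Insert each (id, values) pair, overwriting any existing entry with that id.

    `store` is a plain dict standing in for an index; this models the
    semantics only and is not how Pinecone stores data.
    """
    for record_id, values in records:
        store[record_id] = list(values)
    return {"upserted_count": len(records)}


store = {}
upsert(store, [("id-0", [0.1, 0.2, 0.3]), ("id-1", [0.4, 0.5, 0.6])])
# Re-using "id-0" replaces its values instead of adding a second record.
upsert(store, [("id-0", [0.9, 0.9, 0.9])])
print(len(store))     # 2
print(store["id-0"])  # [0.9, 0.9, 0.9]
```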

In [10]:
import time


def is_fresh(index):
    stats = index.describe_index_stats()
    vector_count = stats.total_vector_count
    print("Vector count: ", vector_count)
    return vector_count > 0


while not is_fresh(index):
    # It takes a few moments for vectors we just upserted
    # to become available for querying
    time.sleep(5)

Vector count:  0
Vector count:  0
Vector count:  0
Vector count:  0
Vector count:  0
Vector count:  0
Vector count:  0
Vector count:  0
Vector count:  0
Vector count:  10
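
The loop above keeps polling until the count becomes positive, so it would never return if the vectors did not appear. A hedged variant adds a deadline. `wait_until` is a hypothetical helper, not part of the Pinecone client, and a fake predicate stands in for `is_fresh(index)` so the sketch runs without an index.

```python
import time


def wait_until(predicate, timeout=60.0, interval=5.0):
    """Call predicate() repeatedly until it returns True or timeout elapses.

    Returns True on success and False on timeout. Hypothetical helper,
    not part of the Pinecone client.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for is_fresh(index): reports success on the third call.
calls = {"n": 0}


def fake_is_fresh():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_is_fresh, timeout=1.0, interval=0.01))  # True
```

In the notebook this could be called as `wait_until(lambda: is_fresh(index), timeout=120)`, with the timeout value chosen to taste.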


In [11]:
# View index stats
index.describe_index_stats()

{'dimension': 3,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'': {'vector_count': 10}},
 'total_vector_count': 10,
 'vector_type': 'dense'}

## Running a query

Next we can run a query.

In a more realistic scenario, the `vector` value passed into `query` would be an embedding of something meaningful. For this simple walkthrough we will use made-up values. The query will succeed as long as the vector's dimension matches the dimension of our index.

`top_k` specifies the number of results we would like returned. The method returns up to `top_k` results, but may return fewer if the index holds fewer than `top_k` vectors or if metadata filters exclude some of them.

In [12]:
# In a more realistic scenario, this would be an embedding vector
# that encodes something meaningful. For this simple demo, we will
# make up a vector that matches the dimension of our index.
query_embedding = [2.0, 2.0, 2.0]

index.query(vector=query_embedding, top_k=5, include_values=True)

{'matches': [{'id': 'id-1',
              'score': 0.997445405,
              'values': [0.827472806, 0.835075, 0.961279154]},
             {'id': 'id-3',
              'score': 0.972650886,
              'values': [0.587944388, 0.559045732, 0.924576044]},
             {'id': 'id-9',
              'score': 0.958372176,
              'values': [0.323696256, 0.569944143, 0.704567373]},
             {'id': 'id-4',
              'score': 0.921065927,
              'values': [0.610429883, 0.266597837, 0.858841419]},
             {'id': 'id-7',
              'score': 0.906878114,
              'values': [0.2434071, 0.738737464, 0.967846]}],
 'namespace': '',
 'usage': {'read_units': 1}}
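
To illustrate what `top_k` means, the sketch below runs an exhaustive in-memory search: it scores every stored vector against the query with cosine similarity and keeps the best `top_k`. It is a toy model with made-up records. The names `cosine` and `brute_force_query` are ours, and nothing here describes how Pinecone executes queries.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_query(records, vector, top_k):
    """Score every stored vector against `vector` and keep the best top_k.

    `records` maps id -> values. Exhaustive search for illustration only.
    """
    scored = ((cosine(vector, values), rid) for rid, values in records.items())
    best = heapq.nlargest(top_k, scored)
    return [{"id": rid, "score": score} for score, rid in best]


records = {
    "a": [1.0, 1.0, 1.0],
    "b": [1.0, 0.0, 0.0],
    "c": [0.0, 0.0, 1.0],
}
matches = brute_force_query(records, [2.0, 2.0, 2.0], top_k=5)
print(len(matches))      # 3, because only three vectors exist
print(matches[0]["id"])  # a
```

As with the real query, asking for more results than there are vectors simply returns everything available, ordered from most to least similar.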

## Delete the Index

Delete the index once you are sure that you no longer want to use it. **Deletion is permanent**. Once the index is deleted, you cannot use it again.

In [13]:
pc.delete_index(name=index_name)