[![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/search/semantic-search/hello-pinecone-aws.ipynb)

# Hello, Pinecone!

This notebook will walk through the steps to get a simple Pinecone index up and running on AWS.


## Prerequisites

Install dependencies.

In [1]:
!pip install -qU \
  pinecone==6.0.1 \
  pandas==2.2.2

## Initializing the Index

We need a place to store vectors and run efficient similarity search over them. Pinecone provides this: get a [free API key](https://app.pinecone.io/) and enter it below, where we initialize our connection to Pinecone. We create the index itself in a later step.

In [None]:
import os
from pinecone import Pinecone

# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'

# configure client
pc = Pinecone(api_key=api_key)
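
The cell above falls back to the literal string `'PINECONE_API_KEY'` when the environment variable is unset, so a missing key only surfaces at the first API call. A minimal sketch of a stricter alternative, assuming you would rather fail fast; `require_api_key` is a hypothetical helper, not part of the Pinecone client:

```python
import os


def require_api_key(env=None, name="PINECONE_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook")
    return key


# Usage in the notebook (uncomment once the variable is set):
# pc = Pinecone(api_key=require_api_key())
```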

## Pinecone quickstart

With Pinecone you can create a vector index where you can store and search through your vectors.

In [6]:
pc.list_indexes().names()

['hybrid-test', 'index', 'rerankers']

In [4]:
# Giving our index a name
index_name = "hello-pinecone"

In [5]:
# Delete the index, if an index of the same name already exists
if pc.has_index(name=index_name):
    pc.delete_index(name=index_name)

### Creating a Pinecone Index

When creating the index we need to define several configuration properties. 

- `name` can be anything we like. The name is used as an identifier for the index when performing other operations such as `describe_index`, `delete_index`, and so on. 
- `metric` specifies the similarity metric that will be used later when you make queries to the index.
- `dimension` should correspond to the dimension of the dense vectors produced by your embedding model. In this quick start, we are using made-up data so a small value is simplest.
- `spec` holds a specification that tells Pinecone how we would like to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

There are more configurations available, but this minimal set will get us started.

In [7]:
from pinecone import ServerlessSpec, CloudProvider, AwsRegion, Metric

pc.create_index(
    name=index_name,
    metric=Metric.COSINE,
    dimension=3,
    spec=ServerlessSpec(
        cloud=CloudProvider.AWS, 
        region=AwsRegion.US_EAST_1
    )
)

{
    "name": "hello-pinecone",
    "metric": "cosine",
    "host": "hello-pinecone-96ix5ds.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 3,
    "deletion_protection": "disabled",
    "tags": null
}

In [9]:
index = pc.Index(name=index_name)

We have the index ready. Now we will create some simple vectors that will serve as our examples.

In [11]:
import pandas as pd

df = pd.DataFrame(
    data={
        "id": ["A", "B"],
        "vector": [[1., 1., 1.], [1., 2., 3.]]
    })
df

|   | id | vector |
|---|----|--------|
| 0 | A | [1.0, 1.0, 1.0] |
| 1 | B | [1.0, 2.0, 3.0] |
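
Because the index was created with `dimension=3`, every vector we send must have exactly three values. A small sketch of checking this locally before writing anything; `check_dimensions` is a hypothetical helper, and the data is copied from the DataFrame above:

```python
def check_dimensions(ids, vectors, dimension):
    """Return the ids whose vectors do not have exactly `dimension` values."""
    return [i for i, v in zip(ids, vectors) if len(v) != dimension]


ids = ["A", "B"]
vectors = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]

# An empty list means every vector matches the index dimension.
bad_ids = check_dimensions(ids, vectors, dimension=3)
print(bad_ids)
```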


Next we upsert the vectors into our index. An upsert inserts a new vector, or overwrites the existing vector if the id is already present.

In [12]:
# insert vectors
index.upsert(
    vectors=zip(df.id, df.vector)
    )

{'upserted_count': 2}
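
The call above passes `(id, values)` tuples. The Pinecone Python client also accepts records as dictionaries with `id` and `values` keys, which can be easier to read. A small sketch of building that form from the same data; the variable name `records` is ours, and the upsert call is left commented out because it needs a live index:

```python
ids = ["A", "B"]
vectors = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]

# One dictionary per vector, using the "id" and "values" keys.
records = [{"id": i, "values": v} for i, v in zip(ids, vectors)]
print(records)

# In the notebook, with the live `index` object:
# index.upsert(vectors=records)
```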

In [13]:
index.describe_index_stats()

{'dimension': 3,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {},
 'total_vector_count': 0,
 'vector_type': 'dense'}
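
The stats above report `total_vector_count: 0` even though the upsert returned `upserted_count: 2`. This most likely reflects a short delay before newly written vectors become visible in a serverless index, not a failed write. A minimal sketch of waiting for the count to catch up; `wait_until` is a hypothetical helper, and the timeout and interval values are arbitrary assumptions:

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0):
    """Call `predicate` repeatedly until it returns True or `timeout` seconds pass.

    Returns True if the predicate succeeded, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage in the notebook (needs the live `index` object):
# wait_until(lambda: index.describe_index_stats().total_vector_count >= 2)
```

Rerunning `describe_index_stats()` after the wait succeeds should then show both vectors.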

In [14]:
# returns top_k matches
index.query(
    vector=[2., 2., 2.],
    top_k=1,
    include_values=True
)

{'matches': [], 'namespace': '', 'usage': {'read_units': 1}}
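
The `matches` list above is empty, presumably because the query ran before the upserted vectors became searchable. For reference, here is a local sketch of what ranking by the `cosine` metric gives for the same data. It is plain Python and not how Pinecone computes results internally; `cosine_similarity` is our own helper:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


stored = {"A": [1.0, 1.0, 1.0], "B": [1.0, 2.0, 3.0]}
query = [2.0, 2.0, 2.0]

# Score every stored vector against the query and keep the best one (top_k=1).
scores = {vid: cosine_similarity(query, vec) for vid, vec in stored.items()}
best_id = max(scores, key=scores.get)
print(best_id, round(scores[best_id], 4))
```

Vector `A` points in the same direction as the query, so with `top_k=1` we would expect the index to return `A` once the data is visible.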

## Delete the Index

Delete the index once you are sure that you no longer need it. Deletion is permanent: a deleted index and the vectors it held cannot be recovered.

In [16]:
pc.delete_index(name=index_name)

---