# Developer quickstart

## Initialize a client
Initialize a client connection to Pinecone and begin managing your data.

### Install the SDK
Use an official Pinecone SDK for convenient access to the Pinecone API:

In [1]:
# !pip install pinecone

### Uninstall pinecone-client (DEPRECATED)
- Deprecated package: https://pypi.org/project/pinecone-client/
- Current package: https://pypi.org/project/pinecone/

In [2]:
# !pip show pinecone
# !pip show pinecone-client

# !pip uninstall pinecone-client pinecone
# !pip install pinecone

### Get your API key
You need an API key to make API calls to your Pinecone project.

In [3]:
import os
from dotenv import load_dotenv

In [4]:
# Load environment variables from .env file
load_dotenv()

pinecone_api_key = os.getenv('PINECONE_API_KEY')
pinecone_environment = os.getenv('PINECONE_ENVIRONMENT')

# print(pinecone_api_key)
print(pinecone_environment)  # region

us-east-1
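
`os.getenv` returns `None` when a variable is missing, so a forgotten `.env` entry only surfaces later as a confusing client error. A minimal sketch of a fail-fast lookup follows; the helper name `require_env` and the `EXAMPLE_SETTING` variable are hypothetical and exist only for this illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demo with a made-up variable so the sketch runs without a real API key.
os.environ["EXAMPLE_SETTING"] = "demo"
print(require_env("EXAMPLE_SETTING"))
```

In the notebook, the same helper could wrap the `PINECONE_API_KEY` and `PINECONE_ENVIRONMENT` lookups.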


### Initialize a client
Using your API key, initialize your client connection to Pinecone:

In [5]:
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=pinecone_api_key)

In [6]:
print(pc.list_indexes().names())

[]


## Upsert data
Now that you are connected to Pinecone, the next critical step is to set up an index to store your data.

### Create a serverless index
An index defines the dimension of vectors to be stored and the similarity metric to be used when querying them.

Create a serverless index with a dimension and similarity metric based on the embedding model you’ll use to create the vector embeddings:

In [7]:
index_name = "quickstart"

pc.create_index(
    name=index_name,
    dimension=1024,  # Replace with your model dimensions
    metric="cosine",  # Replace with your model metric
    spec=ServerlessSpec(
        cloud="aws",
        region=pinecone_environment  # e.g. "us-east-1"
    )
)

{
    "name": "quickstart",
    "metric": "cosine",
    "host": "quickstart-4dhlx64.svc.aped-4627-b74a.pinecone.io",
    "spec": {
        "serverless": {
            "cloud": "aws",
            "region": "us-east-1"
        }
    },
    "status": {
        "ready": true,
        "state": "Ready"
    },
    "vector_type": "dense",
    "dimension": 1024,
    "deletion_protection": "disabled",
    "tags": null
}
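
To make the `cosine` metric concrete, here is a small pure-Python sketch of cosine similarity between two vectors. It illustrates the formula only and is not how Pinecone computes scores internally; the function name and the toy vectors are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal
```

The length check mirrors the role of the index `dimension`: vectors of different sizes cannot be compared.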

### Create vector embeddings
A vector embedding is a series of numerical values that represent the meaning and relationships of words, sentences, and other data.

Use Pinecone Inference to generate embeddings from sentences related to the word "apple":

In [8]:
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}
]

embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in data],
    parameters={"input_type": "passage", "truncate": "END"}
)
print(embeddings[0])

{'vector_type': dense, 'values': [0.04931640625, -0.01328277587890625, ..., -0.0196380615234375, -0.010955810546875]}


### Upsert data
Upsert the six generated vector embeddings into a new ns1 namespace in your index:

In [9]:
import time

# Wait for the index to be ready
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)

index = pc.Index(index_name)

vectors = []
for d, e in zip(data, embeddings):
    vectors.append({
        "id": d['id'],
        "values": e['values'],
        "metadata": {'text': d['text']}
    })

index.upsert(
    vectors=vectors,
    namespace="ns1"
)

{'upserted_count': 6}
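
Six vectors fit comfortably in one request, but larger datasets are usually upserted in batches. The sketch below shows one way to split records into batches. The `chunked` helper, the stand-in records, and the batch size of 3 are illustrative assumptions; check Pinecone's documentation for current request limits.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in records; in the notebook these would be the `vectors` list.
records = [{"id": f"vec{i}", "values": [0.0, 0.0]} for i in range(1, 8)]
for batch in chunked(records, batch_size=3):
    # A real loop would call index.upsert(vectors=batch, namespace="ns1") here.
    print([r["id"] for r in batch])
```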

### Check the index
Use the describe_index_stats operation to check if the current vector count matches the number of vectors you upserted (6):

In [10]:
print(index.describe_index_stats())

{'dimension': 1024,
 'index_fullness': 0.0,
 'metric': 'cosine',
 'namespaces': {'ns1': {'vector_count': 6}},
 'total_vector_count': 6,
 'vector_type': 'dense'}
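
If the count is checked immediately after an upsert, it may briefly lag behind, and a bare `while` loop would wait forever if something went wrong. A minimal sketch of polling with a timeout follows; `wait_until` is a hypothetical helper, and the counter below is a stand-in for a real stats call.

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0):
    """Call `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a stats check. In the notebook, the predicate might instead
# compare the total vector count from index.describe_index_stats() with 6
# (an assumption about how you would wire it up, not shown here).
counts = iter([0, 3, 6])
print(wait_until(lambda: next(counts) >= 6, timeout=5.0, interval=0.01))
```

The same helper could replace the index readiness loop, so that a stuck index produces a clear failure instead of an endless wait.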


## Query
Search through the data to find items that are semantically similar to a query vector.
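
Conceptually, a similarity search scores every stored vector against the query vector and keeps the best `top_k`. The brute-force sketch below illustrates that idea on toy 2-dimensional vectors. It is not Pinecone's implementation, which uses its own indexing, and all names and values here are made up for illustration.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, records, k):
    """Return the k (id, score) pairs most similar to `query`, best first."""
    scored = ((rec_id, cosine(query, values)) for rec_id, values in records.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors, invented for this example.
toy_records = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
print(top_k([1.0, 0.1], toy_records, k=2))
```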

### Create a query vector
Use Pinecone Inference to convert a question about the tech company "Apple" into a query vector:

In [11]:
query = "Tell me about the tech company known as Apple."

embedding = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={
        "input_type": "query"
    }
)

### Run a similarity search
Query the ns1 namespace for the three vectors that are most similar to the query vector, i.e., the vectors that represent the most relevant answers to your question:

In [13]:
results = index.query(
    namespace="ns1",
    vector=embedding[0].values,
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)

{'matches': [{'id': 'vec2',
              'metadata': {'text': 'The tech company Apple is known for its '
                                   'innovative products like the iPhone.'},
              'score': 0.872664154,
              'values': []},
             {'id': 'vec4',
              'metadata': {'text': 'Apple Inc. has revolutionized the tech '
                                   'industry with its sleek designs and '
                                   'user-friendly interfaces.'},
              'score': 0.851996362,
              'values': []},
             {'id': 'vec6',
              'metadata': {'text': 'Apple Computer Company was founded on '
                                   'April 1, 1976, by Steve Jobs, Steve '
                                   'Wozniak, and Ronald Wayne as a '
                                   'partnership.'},
              'score': 0.850099862,
              'values': []}],
 'namespace': 'ns1',
 'usage': {'read_units': 6}}
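
Downstream code often needs only the IDs and scores, sometimes filtered by a minimum score. The sketch below works on a plain dict copied from the printed output above, with metadata omitted. The helper `matches_above` and the `0.851` cutoff are hypothetical, and whether the SDK's response object supports the same dict-style access is an assumption to verify.

```python
# A plain dict shaped like the printed query output (metadata omitted).
sample_results = {
    "matches": [
        {"id": "vec2", "score": 0.872664154},
        {"id": "vec4", "score": 0.851996362},
        {"id": "vec6", "score": 0.850099862},
    ],
    "namespace": "ns1",
}


def matches_above(results, min_score):
    """Return (id, score) pairs whose score is at least `min_score`."""
    return [(m["id"], m["score"]) for m in results["matches"] if m["score"] >= min_score]


print(matches_above(sample_results, min_score=0.851))
```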


In [14]:
print(pc.list_indexes().names())

['quickstart']
