# Lab #1
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/basic-operations-workshop/blob/main/lab1.ipynb)
1. Install Pinecone client
2. Initialize Pinecone client and create your first index
3. Insert vectors and get statistics about your index
4. Query for top_k=10 with metadata filter
5. TEARDOWN: Delete the index

# 1. Install Pinecone client 
Use the following shell command to install Pinecone:

In [7]:
!pip install -U "pinecone-client[grpc]" "python-dotenv"

try:
    import pinecone
    import dotenv
    import numpy
    print("SUCCESS: lab dependencies are installed.")
except ImportError as ie:
    print(f"ERROR: key dependencies are not installed: {ie}")


SUCCESS: lab dependencies are installed.


# 2. Initialize Pinecone client and create your first index

* To use Pinecone, you must have an API key. To find your API key, open the [Pinecone console](https://app.pinecone.io/organizations/-NF9xx-MFLRfp0AAuCon/projects/us-east4-gcp:55a4eee/indexes) and click API Keys. This view also displays the environment for your project. Note both your API key and your environment.
* Create a `.env` file and make sure the following properties are specified:

```
PINECONE_API_KEY=[YOUR_PINECONE_API_KEY]
PINECONE_ENVIRONMENT=[YOUR_PINECONE_ENVIRONMENT]
PINECONE_INDEX_NAME=[YOUR_INDEX_NAME]
DIMENSIONS="768"
METRIC="euclidean"
```

* Creating the index takes roughly one minute. Once it completes, the description of the index is printed.

In [8]:
import os

from dotenv import load_dotenv
load_dotenv('.env')

PINECONE_INDEX_NAME = os.environ['PINECONE_INDEX_NAME']
PINECONE_API_KEY = os.environ['PINECONE_API_KEY']
PINECONE_ENVIRONMENT = os.environ['PINECONE_ENVIRONMENT']
DIMENSIONS = int(os.environ['DIMENSIONS'])
METRIC = os.environ['METRIC']

# print all values to verify (the API key is masked so it is not exposed in the output)
print(f"PINECONE_INDEX_NAME: {PINECONE_INDEX_NAME}")
print(f"PINECONE_ENVIRONMENT: {PINECONE_ENVIRONMENT}")
print(f"PINECONE_API_KEY: ****{PINECONE_API_KEY[-4:]}")
print(f"DIMENSIONS: {DIMENSIONS}")
print(f"METRIC: {METRIC}")

PINECONE_INDEX_NAME: james-williams
PINECONE_ENVIRONMENT: us-east4-gcp
PINECONE_API_KEY: ****d793
DIMENSIONS: 512
METRIC: euclidean
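
A typo in `.env` usually surfaces only later, when the index is created. The following is a minimal sketch of validating the settings up front. The helper `load_settings` and the constants `REQUIRED_KEYS` and `VALID_METRICS` are hypothetical names, not part of the lab or of the Pinecone client. The metric names are assumed to be the three that Pinecone documents.

```python
# Hypothetical helper: validate the lab's settings before they are used.
REQUIRED_KEYS = (
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX_NAME",
    "DIMENSIONS",
    "METRIC",
)
# Assumption: the distance metrics accepted when creating an index.
VALID_METRICS = {"cosine", "dotproduct", "euclidean"}


def load_settings(env):
    """Return a validated settings dict built from a mapping such as os.environ."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing settings: {', '.join(missing)}")

    dimensions = int(env["DIMENSIONS"])  # raises ValueError if not an integer
    if dimensions <= 0:
        raise ValueError("DIMENSIONS must be a positive integer")

    metric = env["METRIC"].lower()
    if metric not in VALID_METRICS:
        raise ValueError(f"METRIC must be one of {sorted(VALID_METRICS)}")

    settings = {key: env[key] for key in REQUIRED_KEYS}
    settings["DIMENSIONS"] = dimensions
    settings["METRIC"] = metric
    return settings
```

After `load_dotenv('.env')`, calling `load_settings(os.environ)` would raise a `ValueError` naming the problem instead of failing later inside the client.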


In [35]:
import pinecone

pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)

if PINECONE_INDEX_NAME not in pinecone.list_indexes():
    pinecone.create_index(PINECONE_INDEX_NAME, dimension=DIMENSIONS, metric=METRIC,
                          pods=1, replicas=1, pod_type="s1.x1")
else:
    print(f"Index {PINECONE_INDEX_NAME} already exists")

print(f"Index Description: {pinecone.describe_index(name=PINECONE_INDEX_NAME)}")

Index Description: IndexDescription(name='james-williams', metric='euclidean', replicas=1, dimension=512.0, shards=1, pods=1, pod_type='s1.x1', status={'ready': True, 'state': 'Ready'}, metadata_config=None, source_collection='')
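
The description above includes a `status` field with a `ready` flag. If index creation ever returns before the index is ready, that flag could be polled. The following is a minimal sketch under that assumption. The helper `wait_until_ready` and its parameters are hypothetical. It accepts any callable that returns an object with a dict-like `status`, as `pinecone.describe_index` does in the output above.

```python
import time


def wait_until_ready(describe, name, timeout_s=120.0, interval_s=5.0, sleep=time.sleep):
    """Poll describe(name) until status['ready'] is truthy or the timeout elapses.

    Returns True if the index became ready, False if the timeout was reached.
    """
    waited = 0.0
    while True:
        status = describe(name).status
        if status.get("ready"):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Hypothetical usage with the lab's objects:
# wait_until_ready(pinecone.describe_index, PINECONE_INDEX_NAME)
```

Passing `describe` and `sleep` as arguments keeps the helper independent of the client, so it can be exercised with a stub.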


# 3. Insert vectors and get statistics about your index

* The upsert operation inserts a new vector into the index, or updates the vector if one with the same ID is already present.
* The following code upserts a batch of 500 vectors with metadata into your index.

In [38]:
import numpy as np
import random
import time

def generate_vectors(dimensions):
    vectors = []
    id_seed = 1
    value_seed = 0.1

    for _ in range(500):
        meta_data = {"category": random.choice(["one", "two", "three"]),
                     "timestamp": time.time()}
        embeddings = np.full(shape=dimensions, fill_value=value_seed).tolist()
        vectors.append({'id': str(id_seed),
                        'values': embeddings,
                        'metadata': meta_data})
        id_seed = id_seed + 1
        value_seed = value_seed + 0.1
    return vectors

index = pinecone.Index(PINECONE_INDEX_NAME)
index.upsert(generate_vectors(DIMENSIONS))
print(f"Index Stats: {index.describe_index_stats()}")

Index Stats: {'dimension': 512,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 500}},
 'total_vector_count': 500}
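
The 500 vectors above go to the index in a single upsert call. Larger datasets are commonly sent in smaller batches to keep each request small. The following is a minimal sketch of that pattern. The helper `chunked` and the batch size of 100 are assumptions for illustration, not values taken from the lab.

```python
from itertools import islice


def chunked(items, batch_size=100):
    """Yield consecutive lists of at most batch_size items from any iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage with the lab's objects:
# for batch in chunked(generate_vectors(DIMENSIONS), batch_size=100):
#     index.upsert(vectors=batch)
```

Because upserts are keyed by ID, re-sending a batch after a failure overwrites the same vectors rather than duplicating them.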


# 4. Query for top_k=10 with metadata filter

The following example queries the index for the 10 vectors that are most similar to the query embedding and whose `category` metadata equals `"one"`.

In [48]:
embedding = np.full(DIMENSIONS, 0.5).tolist()

query_results = index.query(
    vector=embedding,
    top_k=10,
    include_values=False,
    include_metadata=True,
    filter={"category": {"$eq": "one"}},
).matches
print(f"Query results: {query_results}")

Query results: [{'id': '5',
 'metadata': {'category': 'one', 'timestamp': 1691440549.641388},
 'score': 0.0,
 'values': []}, {'id': '4',
 'metadata': {'category': 'one', 'timestamp': 1691440549.641373},
 'score': 5.1199646,
 'values': []}, {'id': '6',
 'metadata': {'category': 'one', 'timestamp': 1691440549.641402},
 'score': 5.11999512,
 'values': []}, {'id': '7',
 'metadata': {'category': 'one', 'timestamp': 1691440549.6414149},
 'score': 20.480011,
 'values': []}, {'id': '8',
 'metadata': {'category': 'one', 'timestamp': 1691440549.641431},
 'score': 46.0799561,
 'values': []}, {'id': '9',
 'metadata': {'category': 'one', 'timestamp': 1691440549.641448},
 'score': 81.920166,
 'values': []}, {'id': '11',
 'metadata': {'category': 'one', 'timestamp': 1691440549.641475},
 'score': 184.319946,
 'values': []}, {'id': '12',
 'metadata': {'category': 'one', 'timestamp': 1691440549.6414988},
 'score': 250.880066,
 'values': []}, {'id': '13',
 'metadata': {'category': 'one', 'timestamp': 169
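
The scores in the output above appear consistent with squared Euclidean distance. Stored vector `n` is filled with roughly `n * 0.1` and the query is filled with `0.5`, so for id 4 the value `512 * (0.5 - 0.4)^2 = 5.12` matches the reported score. The following is a minimal pure-Python sketch of that arithmetic, assuming the 512-dimensional index from this run. The helper name `squared_euclidean` is hypothetical.

```python
DIMENSIONS = 512  # dimension reported in the index stats of this run


def squared_euclidean(a, b):
    """Sum of squared component differences (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Scores copied from the query output above, keyed by vector id.
reported_scores = {
    4: 5.1199646,
    5: 0.0,
    6: 5.11999512,
    7: 20.480011,
    8: 46.0799561,
}

query = [0.5] * DIMENSIONS
computed_scores = {}
for vector_id in reported_scores:
    # generate_vectors fills vector n with approximately n * 0.1
    stored = [vector_id * 0.1] * DIMENSIONS
    computed_scores[vector_id] = squared_euclidean(query, stored)

for vector_id, reported in reported_scores.items():
    print(vector_id, round(computed_scores[vector_id], 4), reported)
```

With the `euclidean` metric a lower score means a closer match, which is why id 5 (identical to the query) comes first with a score of 0.0.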

# 5. TEARDOWN: Delete the index

Free up project pod resources by deleting this index. It is no longer needed.

In [34]:
pinecone.delete_index(PINECONE_INDEX_NAME)