# Lab #2
1. Install dependencies
2. Create a wrongly sized Pinecone index - s1.x2 should be s1.x1
3. Insert data and get statistics about your index
4. Create a backup (aka collection) and delete the misconfigured index
5. Restore the index - s1.x1 with high cardinality meta-data filter exclusion
6. Query for top_k=10 with meta-data filter
7. TEARDOWN: Delete the index and backup (aka collection)

# 1. Install Pinecone client 
Use the following shell command to install the Pinecone client (with gRPC support) and python-dotenv:

In [None]:
!pip install -U "pinecone-client[grpc]" "python-dotenv"

# 2. Create a wrongly sized Pinecone index - s1.x2 should be s1.x1

* To use Pinecone, you must have an API key. To find it, open the [Pinecone console](https://app.pinecone.io/organizations/-NF9xx-MFLRfp0AAuCon/projects/us-east4-gcp:55a4eee/indexes) and click API Keys. This view also displays the environment for your project. Note both your API key and your environment.
* Create a .env file that defines the properties read by the next cell: PINECONE_API_KEY, PINECONE_ENVIRONMENT, PINECONE_INDEX_NAME, DIMENSIONS, and METRIC.
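
A missing property only surfaces as a KeyError when the configuration is loaded, so a quick pre-check can help. The sketch below is not part of the lab: `missing_settings` is a hypothetical helper, the sample values are made up, and the key list is simply the set of variables the configuration cell reads.

```python
import os

# Keys read by the configuration cell of this lab.
REQUIRED_KEYS = (
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "PINECONE_INDEX_NAME",
    "DIMENSIONS",
    "METRIC",
)


def missing_settings(env=os.environ):
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Example with a hypothetical, partially filled configuration.
sample = {"PINECONE_API_KEY": "xxxx", "DIMENSIONS": "1536", "METRIC": "cosine"}
print(missing_settings(sample))
```

Calling `missing_settings()` with no argument checks the real process environment, which is what you would do after `load_dotenv('.env')`.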

In [None]:
import os
from dotenv import load_dotenv

load_dotenv('.env')

PINECONE_API_KEY = os.environ['PINECONE_API_KEY']
PINECONE_ENVIRONMENT = os.environ['PINECONE_ENVIRONMENT']
PINECONE_INDEX_NAME = os.environ['PINECONE_INDEX_NAME']
PINECONE_COLLECTION_NAME = PINECONE_INDEX_NAME
DIMENSIONS = int(os.environ['DIMENSIONS'])
METRIC = os.environ['METRIC']

# print all values to verify (the API key is masked so the secret is not left in the notebook output)
print(f"PINECONE_API_KEY: {PINECONE_API_KEY[:4]}...")
print(f"PINECONE_ENVIRONMENT: {PINECONE_ENVIRONMENT}")
print(f"PINECONE_INDEX_NAME: {PINECONE_INDEX_NAME}")
print(f"PINECONE_COLLECTION_NAME: {PINECONE_COLLECTION_NAME}")
print(f"DIMENSIONS: {DIMENSIONS}")
print(f"METRIC: {METRIC}")


In [None]:
# initialize connection to pinecone
import pinecone

pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)
# deliberately oversized: pod_type is s1.x2 here and is corrected to s1.x1 in step 5
pinecone.create_index(PINECONE_INDEX_NAME, dimension=DIMENSIONS, metric=METRIC, pods=1, replicas=1, pod_type="s1.x2")
pinecone.list_indexes()

# 3. Insert data and get statistics about your index

* The upsert operation inserts a new vector in the index or updates the vector if a vector with the same ID is already present.
* The following cell upserts a large batch of vectors with metadata into your index.
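
The contents of lu.generate_vectors are not shown in this lab, so the following is only a sketch of what such a batch might look like: a list of (id, values, metadata) tuples, which is one of the formats index.upsert accepts, split into smaller chunks. The helpers `make_vectors` and `chunked`, the metadata values, and the batch size of 100 are all assumptions for illustration.

```python
import random


def make_vectors(count, dimensions, seed=0):
    """Build (id, values, metadata) tuples with made-up values."""
    rng = random.Random(seed)
    categories = ["one", "two", "three"]  # hypothetical category values
    return [
        (
            f"vec-{i}",
            [rng.random() for _ in range(dimensions)],
            {"category": categories[i % len(categories)], "score": rng.random()},
        )
        for i in range(count)
    ]


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


vectors = make_vectors(count=250, dimensions=8)
batches = list(chunked(vectors, size=100))
print([len(batch) for batch in batches])

# With a live index, each batch would be sent separately:
# for batch in batches:
#     index.upsert(vectors=batch)
```

Sending smaller batches keeps each request small; the right batch size depends on the vector dimension and the amount of metadata.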

In [None]:
import numpy as np
import lab_utils as lu

index = pinecone.Index(PINECONE_INDEX_NAME)
index.upsert(lu.generate_vectors(DIMENSIONS))
pinecone.describe_index(PINECONE_INDEX_NAME)

In the output above, notice that it says metadata_config=None. We are going to change that when we create the new index.

# 4. Create a backup (aka collection) and delete the misconfigured index

In [None]:
import time
pinecone.create_collection(name=PINECONE_COLLECTION_NAME, source=PINECONE_INDEX_NAME)

while pinecone.describe_collection(name=PINECONE_COLLECTION_NAME).status != "Ready":
    print("collection initializing, please hold...")
    time.sleep(10)
print(pinecone.describe_collection(name=PINECONE_COLLECTION_NAME))

# the collection is Ready at this point, so the source index can be deleted safely
pinecone.delete_index(PINECONE_INDEX_NAME)

### WARNING: You must wait for the collection status to be 'Ready' before moving on
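
A polling loop like the one in the cell above runs forever if the status never changes. One possible variation, sketched here under assumptions, wraps the loop in a helper with a timeout. `wait_until` and its parameters are hypothetical names, and the status check is replaced by a stand-in so the sketch runs without a Pinecone connection.

```python
import time


def wait_until(is_ready, timeout_s=600.0, interval_s=10.0, sleep=time.sleep):
    """Poll `is_ready()` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Stand-in for the real status check: reports ready on the third poll.
polls = {"count": 0}


def fake_status_check():
    polls["count"] += 1
    return polls["count"] >= 3


result = wait_until(fake_status_check, interval_s=0.0)
print(result, polls["count"])
```

In the lab, the check would be something like `lambda: pinecone.describe_collection(name=PINECONE_COLLECTION_NAME).status == "Ready"`, and the index would be deleted only if `wait_until` returns True.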

# 5. Restore the index - s1.x1 with high cardinality meta-data filter exclusion

Create a new index from the PINECONE_COLLECTION_NAME collection, this time with a metadata_config and the right pod size (scaled down to s1.x1).

In [None]:
# check if index already exists (it shouldn't because we just deleted it)
if PINECONE_INDEX_NAME not in pinecone.list_indexes():
    # if does not exist, create index
    pinecone.create_index(
        PINECONE_INDEX_NAME,
        dimension=DIMENSIONS,
        metric=METRIC,
        replicas=1,
        pods=1,
        pod_type='s1.x1',
        source_collection=PINECONE_COLLECTION_NAME,
        # Only "category" is indexed; all other fields will be stored-only.
        # If no fields need to be indexed, you can put a dummy value here as a placeholder.
        metadata_config={"indexed": ["category"]},
    )

In [None]:
# Describe index
pinecone.describe_index(PINECONE_INDEX_NAME)

Notice now it says metadata_config={'indexed': ['category']}

As a result, only the metadata field 'category' is indexed, which means you can use it in query filters. All other fields are stored-only: you can retrieve them with your results, but you cannot filter on them.
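
The distinction can be pictured with a toy model in plain Python. This is not how Pinecone is implemented; `matches`, `search`, the sample records, and their values are made up. The model only encodes the behavior described here: a filter is evaluated against indexed fields alone, so a condition on a stored-only field matches nothing.

```python
INDEXED_FIELDS = {"category"}  # mirrors metadata_config={"indexed": ["category"]}

# Hypothetical records; every metadata field is stored, only some are indexed.
records = [
    {"id": "a", "metadata": {"category": "one", "score": 0.9}},
    {"id": "b", "metadata": {"category": "two", "score": 0.4}},
    {"id": "c", "metadata": {"category": "one", "score": 0.1}},
]


def matches(metadata, equality_filter):
    """True if every filtered field is indexed and equals the wanted value."""
    for field, wanted in equality_filter.items():
        if field not in INDEXED_FIELDS:
            return False  # stored-only fields are invisible to the filter
        if metadata.get(field) != wanted:
            return False
    return True


def search(equality_filter):
    """Return the ids of all records that pass the filter."""
    return [r["id"] for r in records if matches(r["metadata"], equality_filter)]


by_category = search({"category": "one"})
by_score = search({"score": 0.9})
print(by_category, by_score)
```

Filtering on "category" finds records, while filtering on "score" finds none, even though record "a" stores exactly that score and would return it as metadata.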

We have also resized the index from s1.x2 down to s1.x1, which brings the pod size down to what is appropriate in this case.

# 6. Query for top_k=10 with meta-data filter

BONUS: You can add a filter for "score", but since only "category" is indexed, adding a "score" filter should make the query return 0 results.
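
For the bonus, a combined filter might be written as below. This is a sketch only: it assumes "score" is a numeric metadata field, and the 0.5 threshold and the `build_filter` helper are made up for illustration. Listing several fields in one filter dictionary combines them with a logical AND.

```python
def build_filter(category, min_score=None):
    """Build a metadata filter dict; fields in one dict are ANDed together."""
    conditions = {"category": {"$eq": category}}
    if min_score is not None:
        conditions["score"] = {"$gte": min_score}
    return conditions


category_only = build_filter("one")
with_score = build_filter("one", min_score=0.5)
print(category_only)
print(with_score)

# With the restored index and a query vector named `embedding`, the bonus query would look like:
# index.query(vector=embedding, top_k=10, include_metadata=True, filter=with_score)
```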

In [None]:
# query vector: every component set to 0.5
embedding = np.full(DIMENSIONS, 0.5).tolist()

index.query(
    vector=embedding,
    top_k=10,
    include_values=False,
    include_metadata=True,
    filter={"category": {"$eq": "one"}},
)

# 7. TEARDOWN: Delete the index and backup (aka collection)
### WARNING: This next step deletes the PINECONE_INDEX_NAME index, the PINECONE_COLLECTION_NAME collection, and all data in them. Do not run it until you are ready, or remove them manually instead.

In [None]:
if PINECONE_INDEX_NAME in pinecone.list_indexes():
    pinecone.delete_index(PINECONE_INDEX_NAME)
if PINECONE_COLLECTION_NAME in pinecone.list_collections():
    pinecone.delete_collection(PINECONE_COLLECTION_NAME)

# print both lists; a notebook cell only displays its last expression
print(pinecone.list_indexes())
print(pinecone.list_collections())