# Lab #3 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/basic-operations-workshop/blob/main/lab3.ipynb)
1. Install dependencies
2. Create a Pinecone index
3. Load a public image dataset (fashion-mnist) and create vector embeddings from the dataset
4. Create a local parquet backup of your image embeddings
5. Insert the fashion-mnist embeddings into Pinecone
6. Run a nearest neighbor search on a sample image that is not in the training dataset
7. Run a nearest neighbor search on 100 random test images that are not in the training dataset
8. Run a load test script to simulate 10 concurrent users querying the index
9. TEARDOWN: Delete the index 

# 1. Install Pinecone client 
Use the following shell command to install the Pinecone client and the other lab dependencies:

In [1]:
!pip install -U "pinecone-client[grpc]" "python-dotenv" "torch" "torchvision" "pillow" "ftfy" "regex" "git+https://github.com/openai/clip.git" "datasets" "locust"

try:
    import pinecone
    import dotenv
    import numpy
    import torch
    import clip
    import datasets
    print("SUCCESS: lab dependencies are installed.")
except ImportError as ie:
    print(f"ERROR: key dependencies are not installed: {ie}")

Collecting git+https://github.com/openai/clip.git
  Cloning https://github.com/openai/clip.git to /private/var/folders/n7/j7krsnmx3wl7_bjrwhx2z7ym0000gn/T/pip-req-build-ob81lah7
  Running command git clone --filter=blob:none --quiet https://github.com/openai/clip.git /private/var/folders/n7/j7krsnmx3wl7_bjrwhx2z7ym0000gn/T/pip-req-build-ob81lah7
  Resolved https://github.com/openai/clip.git to commit a1d071733d7111c9c014f024669f959182114e33
  Preparing metadata (setup.py) ... done


  from tqdm.autonotebook import tqdm


SUCCESS: lab dependencies are installed.


# 2. Create a Pinecone index
We will create an index that will be used to load and query a Hugging Face dataset.

In [4]:
from dotenv import load_dotenv
import os
import pinecone

load_dotenv('.env')

PINECONE_INDEX_NAME = os.environ['PINECONE_INDEX_NAME']
PINECONE_API_KEY = os.environ['PINECONE_API_KEY']
PINECONE_ENVIRONMENT = os.environ['PINECONE_ENVIRONMENT']
METRIC = os.environ['METRIC']
DIMENSIONS = int(os.environ['DIMENSIONS'])
INDEX_NAMESPACE = os.environ['INDEX_NAMESPACE']

pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)

if PINECONE_INDEX_NAME not in pinecone.list_indexes():
    pinecone.create_index(PINECONE_INDEX_NAME, dimension=DIMENSIONS, metric=METRIC, pods=1, replicas=1, pod_type="s1.x1")
else:
    print(f"Index {PINECONE_INDEX_NAME} already exists")

print(f"Index Description: {pinecone.describe_index(name=PINECONE_INDEX_NAME)}")

Index Description: IndexDescription(name='james-williams', metric='euclidean', replicas=1, dimension=512.0, shards=1, pods=1, pod_type='s1.x1', status={'ready': True, 'state': 'Ready'}, metadata_config=None, source_collection='')
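
The cell above reads six settings from `.env` and raises a bare `KeyError` if one of them is missing. As a minimal sketch of a friendlier check, the helper below validates the same six names and parses `DIMENSIONS` as an integer. The name `read_settings` is hypothetical and not part of the workshop repository, and the sample values are placeholders (only the `euclidean` metric and 512 dimensions mirror the index description shown above).

```python
from typing import Mapping

# The six variable names read by the index-creation cell.
REQUIRED_KEYS = (
    "PINECONE_INDEX_NAME",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
    "METRIC",
    "DIMENSIONS",
    "INDEX_NAMESPACE",
)


def read_settings(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: validate the lab settings and parse DIMENSIONS as int."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing settings: {', '.join(missing)}")
    settings = {key: env[key] for key in REQUIRED_KEYS}
    settings["DIMENSIONS"] = int(settings["DIMENSIONS"])
    return settings


# Placeholder values for illustration only.
example_env = {
    "PINECONE_INDEX_NAME": "my-index",
    "PINECONE_API_KEY": "not-a-real-key",
    "PINECONE_ENVIRONMENT": "my-environment",
    "METRIC": "euclidean",
    "DIMENSIONS": "512",
    "INDEX_NAMESPACE": "my-namespace",
}
settings = read_settings(example_env)
print(settings["DIMENSIONS"])
```

In the notebook you could pass `os.environ` after `load_dotenv('.env')` instead of the placeholder dictionary.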


# 3. Load a public image dataset (fashion-mnist) and create vector embeddings from the dataset

Use the following code to download the [fashion-mnist](https://huggingface.co/datasets/fashion_mnist) training dataset from Hugging Face and create vector embeddings that carry each image's label (image class) as metadata. The label mappings are:

| Label  | Description |
| ------ | ----------- |
| 0      | T-shirt/top |
| 1      | Trouser     |
| 2      | Pullover    |
| 3      | Dress       |
| 4      | Coat        |
| 5      | Sandal      |
| 6      | Shirt       |
| 7      | Sneaker     |
| 8      | Bag         |
| 9      | Ankle boot  |

The Fashion-MNIST dataset is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

The accuracy you can achieve depends on the model and the preprocessing steps you use. Here's a rough guideline for what you might expect with some classic machine learning algorithms:

1. **Random Forest:** Around 85-89% accuracy.
2. **Support Vector Machines (SVM):** Around 85-90% accuracy, depending on kernel and hyperparameters.
3. **k-Nearest Neighbors (k-NN):** Around 85-88% accuracy.
4. **Logistic Regression:** Around 82-85% accuracy.
5. **Gradient Boosting Machines (e.g., XGBoost):** Around 87-90% accuracy.

Keep in mind these numbers are approximate and can vary with the exact preprocessing, feature extraction, and hyperparameter tuning you do. In general, deep learning models, especially Convolutional Neural Networks (CNNs), tend to perform better on image classification tasks like Fashion-MNIST, potentially reaching 90-95% accuracy or higher.

But for classic machine learning models, anything in the 85-90% range can be considered a reasonable result for the Fashion-MNIST dataset. It reflects a model that has learned something meaningful from the data but isn't necessarily state-of-the-art for this particular task.

In [5]:
from datasets import load_dataset
from tqdm.auto import tqdm  # progress bar
from PIL import Image
import torch
import clip
import time
import numpy as np

#  Load the fashion-mnist dataset - only retrieve 6K random images (10% of total dataset)
dataset = load_dataset("fashion_mnist")['train'].shuffle(seed=42).select(range(0,6000))
#dataset = load_dataset("fashion_mnist")['train']

label_descriptions = {0: "T-shirt/top", 
           1: "Trouser",
           2: "Pullover",
           3: "Dress",
           4: "Coat",
           5: "Sandal",
           6: "Shirt",
           7: "Sneaker",
           8: "Bag",
           9: "Ankle boot"}

# Check to see if a GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load('ViT-B/32', device=device)
   
# Generate vector embeddings for each image in the dataset
id = 0
vectors = []
for img in tqdm(dataset, total=dataset.num_rows, desc='Images', position=0):
    with torch.no_grad():
        id += 1
        image_pp = preprocess(img['image']).unsqueeze(0).to(device)
        image_features = model.encode_image(image_pp)
        embedding = image_features.cpu().numpy().squeeze().tolist()
        meta_data = {"description": label_descriptions[img["label"]], "timestamp": time.time()}
        vectors.append({'id': str(id),
                        'values': embedding,
                        'metadata': meta_data})

Images: 100%|██████████| 6000/6000 [03:34<00:00, 27.91it/s]
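
The progress bar above reports 6,000 images in 3 minutes 34 seconds (27.91 images per second) on the machine that produced this output. As a rough, back-of-the-envelope sketch, the same rate can be extrapolated to the full 60,000-image training set. The function name is illustrative, the estimate assumes a constant throughput, and the rate on your hardware will differ.

```python
def estimate_minutes(num_images: int, images_per_second: float) -> float:
    """Estimate embedding time in minutes at a constant throughput."""
    return num_images / images_per_second / 60


RATE = 27.91  # images/second, taken from the progress bar output above

subset_minutes = estimate_minutes(6_000, RATE)
full_minutes = estimate_minutes(60_000, RATE)
print(f"6,000 images: about {subset_minutes:.1f} min")
print(f"60,000 images: about {full_minutes:.1f} min")
```

At that rate the full training set would take roughly 36 minutes, which is one reason the lab embeds only a 10% sample.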


# 4. Create a local parquet backup of your image embeddings

This is good practice because generating embeddings can be expensive and time-consuming when calling hosted models such as OpenAI's. As you can see, even locally generated embeddings are time-consuming.

In [17]:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from IPython.display import display

df = pd.DataFrame(vectors)

# Display the first 5 rows of the DataFrame
pd.set_option('display.max_colwidth', None)
display(df.head())

df.to_parquet('fashion-mnist-clip.parquet')

Unnamed: 0,id,values,metadata
0,1,"[0.008759277872741222, -0.20710694789886475, -0.3265663683414459, 0.05358453840017319, 0.34647610783576965, -0.3348323106765747, 0.16168484091758728, 0.9060096740722656, 0.3548886775970459, -0.04384024068713188, 0.08166541904211044, 0.2393338978290558, 0.6599251627922058, -0.32549598813056946, -0.2640305757522583, -0.072362519800663, 0.4160095453262329, 0.2451312243938446, 0.04245516657829285, 0.3809719681739807, -0.5581574440002441, 0.4062442183494568, 0.35916459560394287, -0.026508750393986702, -0.2544264793395996, -0.14446818828582764, -0.26534393429756165, -0.028064025565981865, 0.13279257714748383, -0.31222209334373474, 0.026539580896496773, -0.07840023189783096, 0.17045815289020538, -0.07644832134246826, -0.4124664068222046, -0.6990364193916321, -0.07593540847301483, 0.054352905601263046, -0.5394606590270996, -0.3521934747695923, -0.6636286973953247, -0.19592992961406708, -0.27161911129951477, -0.2148979753255844, -0.2216959297657013, 0.4042796194553375, -0.25046131014823914, 0.07209423184394836, 0.3029167950153351, -0.2142917960882187, 0.29765626788139343, 0.2673562467098236, 0.36764630675315857, -0.12876002490520477, -0.06875656545162201, 0.04690822958946228, 0.34548094868659973, -0.22725483775138855, -0.23107187449932098, -0.08655727654695511, 0.35345324873924255, -0.06281877309083939, -0.05350608006119728, -0.5631681084632874, 0.03890611603856087, 0.20459145307540894, 0.03106505237519741, -0.4141904413700104, -0.11418329179286957, -0.17679740488529205, -0.15880605578422546, 0.011449756100773811, 0.6305040717124939, -0.19833338260650635, -0.2886725068092346, 0.2577667534351349, 0.1635541468858719, 0.01885959319770336, 0.008384674787521362, -0.6606494188308716, -0.1398177146911621, -0.14677472412586212, 0.3102433383464813, 0.10178500413894653, 0.311718612909317, 0.19127772748470306, -0.19770747423171997, 0.061263423413038254, -0.29144954681396484, -0.2858257293701172, 0.2098928987979889, -0.3228234350681305, -8.724929809570312, 0.23969081044197083, 
0.46594494581222534, 0.094818115234375, -0.09828225523233414, -0.08568913489580154, -0.11303391307592392, -0.43053391575813293, ...]","{'description': 'Dress', 'timestamp': 1691504808.082121}"
1,2,"[0.18876351416110992, 0.06803330779075623, -0.3696630001068115, -0.14380639791488647, -0.02326127700507641, 0.13572858273983002, 0.10369477421045303, 0.8926424384117126, 0.33766523003578186, 0.32612326741218567, 0.02722463198006153, 0.23189237713813782, 0.3104938566684723, -0.2723168432712555, -0.4215393662452698, 0.031867191195487976, 0.6063439846038818, 0.3050440847873688, 0.05424952134490013, -0.0072557092644274235, -0.5087708830833435, 0.4959316849708557, 0.0006730547174811363, -0.08289677649736404, -0.08398040384054184, 0.186918705701828, 0.04200270399451256, -0.1423955261707306, 0.22204463183879852, -0.040548380464315414, 0.26215046644210815, -0.40756309032440186, -0.09166142344474792, 0.16716548800468445, -0.3270414471626282, -0.7162606120109558, 0.11883895844221115, -0.038985736668109894, -0.2603166699409485, -0.4541040360927582, -0.6146935224533081, -0.44350990653038025, -0.28967905044555664, -0.32448065280914307, -0.21425308287143707, 1.066301703453064, -0.04915701597929001, 0.32419881224632263, 0.16908292472362518, -0.31009650230407715, 0.13615752756595612, 0.2629549503326416, 0.2929893732070923, 0.09002819657325745, 0.11788462847471237, 0.08777409046888351, 0.446664035320282, -0.13061974942684174, -0.3295435607433319, -0.06152321770787239, 0.6601959466934204, -0.0014343899674713612, -0.06366293877363205, -0.5479024648666382, -0.1107802614569664, 0.0633939728140831, -0.1153319701552391, -0.10133379697799683, -0.1450570672750473, -0.1259893774986267, -0.13065451383590698, -0.06449749320745468, 0.5432891845703125, -0.23775529861450195, -0.3399951756000519, 0.028745664283633232, 0.30851590633392334, 0.048755671828985214, -0.005466883536428213, -0.39040830731391907, -0.05489792302250862, -0.08486843854188919, 0.17055505514144897, -0.08973710983991623, 0.18062835931777954, 0.21384674310684204, 0.4122605621814728, -0.10543888062238693, -0.16396668553352356, -0.34760582447052, 0.02305014617741108, -0.11443872004747391, -9.118592262268066, 
0.5628551840782166, 0.3428255617618561, 0.10250519216060638, -0.1313256025314331, 0.02671143412590027, -0.04703758284449577, -0.5244741439819336, ...]","{'description': 'Trouser', 'timestamp': 1691504808.1213}"
2,3,"[-0.0689052864909172, -0.24113823473453522, -0.20118270814418793, -0.4833519458770752, 0.09727362543344498, -0.2675895094871521, -0.020339583978056908, 1.1140083074569702, -0.05232321470975876, -0.03634607419371605, 0.5994221568107605, 0.08585227280855179, 0.31650084257125854, -0.4538556635379791, -0.2566702365875244, -0.08810650557279587, 0.46170738339424133, 0.3897814452648163, 0.09176426380872726, 0.12823130190372467, -0.25821778178215027, 0.26991575956344604, -0.0813613012433052, -0.134430930018425, 0.11423317342996597, 0.10929729044437408, -0.09957825392484665, -0.1810482293367386, 0.08895909786224365, -0.23236432671546936, 0.22146423161029816, 0.019082998856902122, 0.028852149844169617, 0.10649435967206955, -0.47525933384895325, -0.4023389518260956, 0.0858968049287796, 0.16029813885688782, -0.39258435368537903, -0.4986351430416107, -0.2471800297498703, -0.25033605098724365, -0.15675035119056702, -0.25271299481391907, -0.09313449263572693, 0.0805295929312706, 0.1340564787387848, 0.05452653020620346, 0.08584167808294296, -0.42951011657714844, 0.07351091504096985, 0.291505366563797, 0.19050799310207367, -0.08113886415958405, 0.29932859539985657, 0.4376024007797241, 0.40727776288986206, -0.16221225261688232, -0.03228463605046272, -0.10865467041730881, 0.011538180522620678, -0.30818888545036316, -0.16451837122440338, -0.40158218145370483, 0.04758229851722717, 0.02026527002453804, -0.05106479674577713, -0.053193364292383194, 0.21861541271209717, -0.2972756028175354, -0.3236633539199829, -0.12276126444339752, 0.7355660796165466, -0.055913638323545456, -0.406389445066452, -0.037936337292194366, -0.02244177833199501, -0.14847588539123535, -0.03227519243955612, -0.5808702707290649, 0.04662225395441055, -0.09997803717851639, 0.28525882959365845, -0.37203115224838257, 0.48536500334739685, 0.4557044506072998, 0.5164453983306885, 0.09406036138534546, -0.2020888328552246, -0.4507751762866974, -0.18802617490291595, -0.18002259731292725, -9.416439056396484, 
0.959967315196991, 0.317121297121048, -0.1809791922569275, -0.2601853609085083, 0.026448408141732216, -0.2786314785480499, -0.668617308139801, ...]","{'description': 'Sandal', 'timestamp': 1691504808.159588}"
3,4,"[0.032351866364479065, -0.038666073232889175, -0.11967869102954865, -0.06897217035293579, -0.002791015896946192, 0.20970094203948975, 0.061749640852212906, 0.6183357834815979, 0.4085032045841217, 0.02068261057138443, 0.24875719845294952, 0.41432294249534607, 0.3338814377784729, -0.006913925986737013, -0.430025190114975, 0.34311509132385254, 1.1195499897003174, 0.43254655599594116, 0.2026023119688034, -0.22285257279872894, -0.36505255103111267, 0.004285162780433893, 0.17786572873592377, -0.7703995704650879, -0.20161977410316467, 0.16758060455322266, -0.06746743619441986, -0.5621646046638489, 0.2821822166442871, -0.21033014357089996, 0.4643497169017792, -0.2004319131374359, 0.04296662658452988, -0.22127608954906464, -0.46918365359306335, -0.6742099523544312, -0.3035784959793091, 0.12907010316848755, -0.8149935603141785, 0.8003113865852356, -0.5574088096618652, -0.31190380454063416, -0.2129656970500946, -0.15413236618041992, 0.010149301961064339, -0.9657955765724182, -0.038530103862285614, 0.055279456079006195, 0.3607432544231415, -0.3506191670894623, 0.3458269536495209, 0.11574611812829971, 0.48403143882751465, 0.20651818811893463, -0.11928628385066986, 0.24215367436408997, 0.2330978512763977, 0.06286066770553589, -0.6623694896697998, 0.2713583707809448, 0.4085535407066345, -0.34743592143058777, -0.07235443592071533, -0.40269407629966736, -0.0454251803457737, 0.0944112166762352, -0.10007898509502411, -0.5058452486991882, 0.015480926260352135, -0.28707095980644226, -0.49144425988197327, 0.1372714340686798, 0.0024407340679317713, -0.1084127128124237, -0.577633798122406, -0.008835817687213421, 0.19589757919311523, -0.05672351270914078, -0.1362789422273636, 0.022641867399215698, -0.23428402841091156, -0.3679942786693573, -0.04574134573340416, 0.13944695889949799, 0.584304928779602, 0.14184124767780304, 0.717440664768219, -0.27413734793663025, -0.4442986249923706, -0.23384477198123932, 0.15787501633167267, 0.006960373837500811, -8.006991386413574, 0.3179628551006317, 
0.21536323428153992, 0.10105756670236588, -0.30606406927108765, -0.4297695457935333, -0.17305879294872284, -0.1855991631746292, ...]","{'description': 'Pullover', 'timestamp': 1691504808.1948211}"
4,5,"[-0.08263376355171204, -0.38928911089897156, -0.08394722640514374, -0.5106083154678345, -0.08207684010267258, -0.11943580955266953, 0.08709175139665604, 0.8987720012664795, 0.010261411778628826, 0.029808735474944115, 0.3707394301891327, -0.040922388434410095, 0.3061405122280121, -0.31735801696777344, -0.28630179166793823, -0.26594552397727966, 0.28357577323913574, 0.23190176486968994, 0.24657593667507172, 0.11728725582361221, -0.6328873634338379, 0.3196503520011902, -0.1207200139760971, 0.13479866087436676, -0.12376412749290466, 0.02629721350967884, 0.21213266253471375, 0.019875146448612213, 0.1805470734834671, -0.3996647596359253, 0.20594412088394165, -0.4920916259288788, 0.09771454334259033, -0.02145044319331646, -0.42277029156684875, -0.6677929162979126, 0.011684575118124485, 0.0337013304233551, -0.3224979341030121, 0.06780113279819489, -0.19551607966423035, -0.31933021545410156, -0.09055880457162857, -0.24618174135684967, 0.25629717111587524, -0.009876216761767864, 0.28964537382125854, 0.039219290018081665, 0.18881544470787048, -0.5140396356582642, 0.22076252102851868, 0.15376636385917664, 0.15864083170890808, -0.16149616241455078, 0.04787571355700493, 0.39662933349609375, 0.7252520322799683, -0.1712072193622589, -0.15385127067565918, 0.09808285534381866, 0.21358722448349, -0.08756324648857117, 0.05073364078998566, -0.36503785848617554, -0.19787932932376862, -0.036247774958610535, -0.21517899632453918, -0.6125231981277466, 0.14731432497501373, -0.35193097591400146, -0.27350446581840515, 0.14344742894172668, 0.329067200422287, -0.09914449602365494, -0.32244032621383667, 0.1885375827550888, 0.1635562926530838, -0.36681780219078064, 0.016357146203517914, -0.6180407404899597, -0.14663337171077728, -0.062249407172203064, 0.17959529161453247, -0.38284048438072205, 0.3635861575603485, 0.24055302143096924, -0.06931470334529877, 0.050895169377326965, -0.19824868440628052, -0.4473084509372711, -0.34041956067085266, -0.3118037283420563, -8.997475624084473, 
0.5643855929374695, 0.3810853958129883, -0.0004098073986824602, -0.4682497978210449, -0.09806080162525177, -0.09130991995334625, -0.39755672216415405, ...]","{'description': 'Ankle boot', 'timestamp': 1691504808.2317219}"


# 5. Insert the fashion-mnist embeddings into Pinecone

The best way to do bulk updates is by batching the dataset. We will also use a namespace for the data. 

In [None]:
from tqdm.auto import tqdm  # progress bar
import pinecone
import itertools

# Read Parquet file into a DataFrame
df = pd.read_parquet('fashion-mnist-clip.parquet')
df['values'] = df['values'].apply(lambda x: x.tolist())

# Convert DataFrame to a list of dictionaries
data_list = df.to_dict(orient='records')

def chunks(iterable, batch_size=100):
    """A helper function to break an iterable into chunks of size batch_size."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))

index = pinecone.Index(PINECONE_INDEX_NAME)

# Upsert the embeddings in batches of 100
batch_size = 100
num_batches = -(-len(data_list) // batch_size)  # ceiling division so tqdm gets an integer total
for vector_batch in tqdm(chunks(data_list, batch_size=batch_size), total=num_batches):
    index.upsert(vector_batch, namespace=INDEX_NAMESPACE)
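
To see how the batching helper behaves, here is a small self-contained sketch that runs the same `chunks` generator on a toy list and computes the number of batches with `math.ceil`, which gives an integer suitable for a progress-bar total. The toy data is made up for illustration.

```python
import itertools
import math


def chunks(iterable, batch_size=100):
    """Yield successive tuples of at most batch_size items from iterable."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))


toy_records = list(range(7))  # stand-in for the list of vector dictionaries
batch_size = 3

batches = list(chunks(toy_records, batch_size=batch_size))
num_batches = math.ceil(len(toy_records) / batch_size)

print(batches)      # [(0, 1, 2), (3, 4, 5), (6,)]
print(num_batches)  # 3
```

Note that the last batch is shorter than `batch_size`, so the batch count must be rounded up rather than truncated.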

# 6. Run a nearest neighbor search on a sample image that is not in the training dataset

Download a sneaker image file from GitHub and use it to run a query to see whether the Pinecone search returns the correct description, "Sneaker". A query returns the correct result if the most common description among the top_k matches is the test image's description.

You can change top_k from 10 to 1 or 20 to see whether the ANN results vary.

In [None]:
import pinecone
from PIL import Image
import torch
import clip
import requests

# Check to see if a GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load('ViT-B/32', device=device) 

def image_to_embedding():
    
    url = "https://github.com/pinecone-io/basic-operations-workshop/blob/main/sneaker.jpeg?raw=true"
    response = requests.get(url)
    with open("sneaker.jpeg", "wb") as file:
      file.write(response.content)
    image_pp = preprocess(Image.open("./sneaker.jpeg")).unsqueeze(0).to(device)
    with torch.no_grad():
      embedding = model.encode_image(image_pp).squeeze().tolist()
    
    return embedding

index = pinecone.Index(PINECONE_INDEX_NAME)
top_k = 10

query_result = index.query(
  vector = image_to_embedding(),
  namespace=INDEX_NAMESPACE,
  top_k=top_k,
  include_values=False,
  include_metadata=True
)

top_k_success = False
match_cnt = 0
miss_categories = set()

my_list = query_result.matches
descriptions = [entry['metadata']['description'] for entry in my_list]
most_common_item = max(set(descriptions), key=descriptions.count)

if most_common_item == "Sneaker":
      top_k_success = True

for match in query_result.matches:
  if match.metadata['description'] == "Sneaker":
    match_cnt += 1
  else:
    miss_categories.add(match.metadata['description'])

print(f"Most common item matching result: {top_k_success}")
print(f"top_k: {top_k} match percentage is: {match_cnt/top_k * 100}%")
print(f"Match miss categories: {miss_categories} expected 'Sneaker'")
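
Conceptually, the query above asks the index for the `top_k` stored vectors closest to the query vector and then takes a majority vote over their descriptions. The sketch below illustrates that idea with an exact, brute-force search over a handful of made-up 2-D vectors using Euclidean distance. It is not how Pinecone implements search, and the data and function names are hypothetical.

```python
import math
from collections import Counter


def nearest_labels(query, stored, top_k):
    """Return the labels of the top_k stored vectors closest to query (Euclidean)."""
    ranked = sorted(stored, key=lambda item: math.dist(query, item["values"]))
    return [item["label"] for item in ranked[:top_k]]


def majority_label(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Made-up 2-D "embeddings"; real CLIP embeddings in this lab have 512 dimensions.
stored = [
    {"values": [0.0, 0.0], "label": "Sneaker"},
    {"values": [0.1, 0.2], "label": "Sneaker"},
    {"values": [0.3, 0.1], "label": "Ankle boot"},
    {"values": [5.0, 5.0], "label": "Bag"},
    {"values": [5.1, 4.9], "label": "Bag"},
]
query = [0.1, 0.1]

labels = nearest_labels(query, stored, top_k=3)
print(labels)                  # ['Sneaker', 'Sneaker', 'Ankle boot']
print(majority_label(labels))  # Sneaker
```

The vote succeeds here even though one of the three neighbors is an "Ankle boot", which is the same tolerance the lab's success check relies on.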

# 7. Run a nearest neighbor search on 100 random test images that are not in the training dataset

Select 100 random test images. Keep in mind that these images come from the test split, so none of them were inserted into the index. Obtain the percentage of Pinecone queries that return the correct result in top_k. Feel free to experiment with the top_k setting to see if you can increase the hit percentage.

A query returns the correct result if the most common top_k description matches the test image.

In [None]:
import clip
import torch
import pinecone
from datasets import load_dataset
from tqdm.auto import tqdm 
from collections import Counter

test_dataset = load_dataset("fashion_mnist")['test'].shuffle().select(range(0, 100))
#test_dataset = load_dataset("fashion_mnist")['test']

# Check to see if a GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load('ViT-B/32', device=device) 

label_descriptions = {0: "T-shirt/top", 
           1: "Trouser",
           2: "Pullover",
           3: "Dress",
           4: "Coat",
           5: "Sandal",
           6: "Shirt",
           7: "Sneaker",
           8: "Bag",
           9: "Ankle boot"}

# Generate vector embeddings for each image in the test dataset
test_vectors = []
for img in tqdm(test_dataset, total=test_dataset.num_rows):
  image_pp = preprocess(img['image']).unsqueeze(0).to(device)
  with torch.no_grad():  # inference only, so skip gradient tracking
    embedding = model.encode_image(image_pp).squeeze().tolist()
    
  test_vectors.append({'embedding': embedding,
                        'description': label_descriptions[img["label"]]})
    
index = pinecone.Index(PINECONE_INDEX_NAME)
top_k = 10
top_k_success_cnt = 0

for v in test_vectors:

  top_k_success = False

  query_result = index.query(
    vector = v['embedding'],
    namespace=INDEX_NAMESPACE,
    top_k=top_k,
    include_values=False,
    include_metadata=True
  )

  my_list = query_result.matches
  descriptions = [entry['metadata']['description'] for entry in my_list]
  most_common_item = max(set(descriptions), key=descriptions.count)

  if most_common_item == v['description']:
      top_k_success = True
  
  if top_k_success:
    top_k_success_cnt += 1

print(f"top_k success rate: {top_k_success_cnt / (len(test_vectors)) * 100}%")
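
One way to explore different `top_k` values without re-querying the index is to query once with the largest `top_k` of interest, keep the ranked descriptions, and evaluate smaller values by slicing. This sketch assumes the matches come back ordered from most to least similar. The ranked label lists below are invented for illustration and are not results from the lab.

```python
from collections import Counter


def success_rate(ranked_labels, true_labels, top_k):
    """Percentage of queries whose majority label among the first top_k neighbors is correct."""
    hits = 0
    for neighbors, truth in zip(ranked_labels, true_labels):
        winner = Counter(neighbors[:top_k]).most_common(1)[0][0]
        hits += winner == truth
    return hits / len(true_labels) * 100


# Invented neighbor labels for four test images, ordered nearest first.
ranked_labels = [
    ["Shirt", "T-shirt/top", "T-shirt/top", "T-shirt/top", "Shirt"],
    ["Sneaker", "Sneaker", "Ankle boot", "Sneaker", "Sneaker"],
    ["Coat", "Pullover", "Pullover", "Coat", "Coat"],
    ["Bag", "Bag", "Bag", "Bag", "Bag"],
]
true_labels = ["T-shirt/top", "Sneaker", "Pullover", "Bag"]

rates = {k: success_rate(ranked_labels, true_labels, k) for k in (1, 3, 5)}
print(rates)  # {1: 50.0, 3: 100.0, 5: 75.0}
```

In this toy data a larger `top_k` is not always better: the third image is classified correctly at `top_k=3` but not at `top_k=5`, so it is worth trying several values.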

# 8. Run a load test script to simulate 10 concurrent users querying the index

Locust (locust.io) is an open-source load testing tool written in Python. It allows you to define user behaviour with Python code and simulate millions of simultaneous users to bombard a system with traffic to test its resilience under heavy load. The [locustfile.py](./locustfile.py) script reuses the logic in step #6 to query Pinecone. It has a custom event hook that records a failure if the top_k result set does not match the search image description. This script will likely report a low error rate, but you can increase top_k to get a 100% pass rate. The Locust summary includes P50 to P100 response time percentiles and QPS (req/s).

In [None]:
%%bash
locust -f locustfile.py --headless -u 10 -r 1 --run-time 60s --host https://pinecone.io --only-summary
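
The percentile and throughput figures in a load-test summary can be understood with a small sketch. The code below computes nearest-rank percentiles and requests per second from a list of response times. The timings and run duration are made up, and this illustrates the statistics only; it is not Locust's actual implementation.

```python
import math


def percentile(sorted_times, pct):
    """Nearest-rank percentile of an ascending list of response times."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_times)))
    return sorted_times[rank - 1]


# Hypothetical response times in milliseconds for 10 requests.
response_times_ms = sorted([42, 38, 55, 47, 120, 40, 44, 61, 39, 45])
run_time_seconds = 2.0  # hypothetical wall-clock duration of the run

p50 = percentile(response_times_ms, 50)
p95 = percentile(response_times_ms, 95)
p100 = percentile(response_times_ms, 100)
qps = len(response_times_ms) / run_time_seconds

print(f"P50={p50} ms, P95={p95} ms, P100={p100} ms, QPS={qps}")
```

P100 is simply the slowest request, so a single outlier (120 ms here) dominates the upper percentiles while leaving the median almost untouched.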

# 9. TEARDOWN: Delete the index 
**WARNING:** This next step deletes the PINECONE_INDEX_NAME index and all data in it. Do not run it until you are ready, or remove the index manually instead.

In [3]:
if PINECONE_INDEX_NAME in pinecone.list_indexes():
    pinecone.delete_index(PINECONE_INDEX_NAME)

print(f"{PINECONE_INDEX_NAME} index should not exist in index list: {pinecone.list_indexes()}")

james-williams index should not exist in index list: []
