# Test AI tooling

## FAISS Similarity Search

From [How to Use FAISS to Build Your First Similarity Search](https://medium.com/loopio-tech/how-to-use-faiss-to-build-your-first-similarity-search-bf0f708aa772)


### Step 1: Create a dataframe with the existing text and categories

In [2]:
import pandas as pd

# Each row pairs an example text with its known category.
data = [['Where are your headquarters located?', 'location'],
        ['Throw my cellphone in the water', 'random'],
        ['Network Access Control?', 'networking'],
        ['Address', 'location']]
df = pd.DataFrame(data, columns=['text', 'category'])

display(df)

| | text | category |
|---|---|---|
| 0 | Where are your headquarters located? | location |
| 1 | Throw my cellphone in the water | random |
| 2 | Network Access Control? | networking |
| 3 | Address | location |


### Step 2: Create vectors from the text

In [3]:
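from sentence_transformers import SentenceTransformer

# Pass a plain list of strings to the encoder.
text = df['text'].tolist()
encoder = SentenceTransformer("paraphrase-mpnet-base-v2")
# One embedding per text, returned as a 2-D NumPy array.
vectors = encoder.encode(text)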


### Step 3: Build a FAISS index from the vectors

In [4]:
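import faiss

# Embedding width; the index must be created with the same dimension.
vector_dimension = vectors.shape[1]
# Flat (exhaustive) index that reports squared L2 distances.
index = faiss.IndexFlatL2(vector_dimension)
# Normalize in place to unit length so distances track cosine similarity.
faiss.normalize_L2(vectors)
index.add(vectors)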
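
To make concrete what a flat index does, the sketch below mimics an exhaustive L2 search in plain Python, so it runs without Faiss or NumPy: it normalizes every vector to unit length, computes the squared Euclidean distance from the query to each stored vector, and returns the `k` smallest. The helper names (`normalize`, `flat_l2_search`) and the toy 2-D vectors are illustrative assumptions, not part of Faiss or the original article.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm (hypothetical helper, same idea as faiss.normalize_L2)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def flat_l2_search(stored, query, k):
    """Compare `query` against every stored vector (hypothetical helper).

    Returns (distances, indices) for the k nearest vectors, ordered by
    increasing squared L2 distance.
    """
    scored = [
        (sum((a - b) ** 2 for a, b in zip(vec, query)), i)
        for i, vec in enumerate(stored)
    ]
    scored.sort()
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy 2-D "embeddings" (made-up values for illustration only).
stored = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
query = normalize([2.0, 0.1])

distances, indices = flat_l2_search(stored, query, k=3)
print(indices)    # row positions, nearest first
print(distances)  # squared L2 distances, ascending
```

The cost of this approach grows linearly with the number of stored vectors, which is fine for a four-row example like the one in this walkthrough.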

### Step 4: Create a search vector

In [5]:
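import numpy as np

search_text = 'where is your office?'
search_vector = encoder.encode(search_text)
# Faiss expects a 2-D array of shape (n_queries, dimension).
_vector = np.array([search_vector])
# Normalize the query the same way as the indexed vectors.
faiss.normalize_L2(_vector)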

### Step 5: Search

In [6]:
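# Retrieve every indexed vector, ranked by distance (k = index size).
k = index.ntotal
# `ann` holds the row positions in `df` of the nearest neighbors.
distances, ann = index.search(_vector, k=k)
results = pd.DataFrame({'distances': distances[0], 'ann': ann[0]})
display(results)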

| | distances | ann |
|---|---|---|
| 0 | 0.584873 | 0 |
| 1 | 1.17595 | 3 |
| 2 | 1.644266 | 2 |
| 3 | 1.919768 | 1 |
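
Because both the indexed vectors and the query were L2-normalized, these distances have a direct reading: `IndexFlatL2` reports squared Euclidean distances, and for unit vectors the squared distance equals `2 - 2 * cos(theta)`. The sketch below checks that identity on two toy vectors and then converts the distances from the results above into cosine similarities. The function names and toy vectors are illustrative assumptions, not taken from the article.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def l2sq_to_cosine(d2):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - d2 / 2.0

# Check the identity d^2 = 2 - 2*cos on toy unit vectors.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])
assert math.isclose(squared_l2(a, b), 2.0 - 2.0 * cosine(a, b))

# Distances copied from the search results above.
reported = [0.584873, 1.17595, 1.644266, 1.919768]
similarities = [l2sq_to_cosine(d) for d in reported]
print([round(s, 3) for s in similarities])
```

Read this way, the best match has a cosine similarity of roughly 0.71, while the last-ranked text is close to orthogonal to the query.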


### Step 6: Get category for the search text

In [10]:
# Join on results.ann == df.index
merge = pd.merge(results, df, left_on='ann', right_index=True)
display(merge)
labels = df['category']
# Category of the single nearest neighbor (first row of the results).
category = labels[ann[0][0]]
print(category)

| | distances | ann | text | category |
|---|---|---|---|---|
| 0 | 0.584873 | 0 | Where are your headquarters located? | location |
| 1 | 1.17595 | 3 | Address | location |
| 2 | 1.644266 | 2 | Network Access Control? | networking |
| 3 | 1.919768 | 1 | Throw my cellphone in the water | random |


location
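
Relying on the single nearest neighbor can be brittle when the closest entry happens to be an outlier. One common variation, which is not part of the original walkthrough, is a majority vote over the top `k` neighbors. The sketch below replays the search results shown above as plain lists; `vote_category` is a hypothetical helper, and the choice of `k=3` is an assumption for illustration.

```python
from collections import Counter

def vote_category(neighbor_ids, labels, k=3):
    """Return the most common label among the k nearest neighbors.

    `neighbor_ids` must be ordered nearest first, as returned by the search.
    Ties go to the label encountered first, i.e. the closer neighbor's label.
    """
    top_labels = [labels[i] for i in neighbor_ids[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Categories in DataFrame row order, and neighbor positions from the search.
labels = ['location', 'random', 'networking', 'location']
neighbor_ids = [0, 3, 2, 1]

top1 = vote_category(neighbor_ids, labels, k=1)  # nearest neighbor only
top3 = vote_category(neighbor_ids, labels, k=3)  # majority of the three nearest
print(top1, top3)
```

For this dataset both choices agree on the same category, since the two nearest rows share a label.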
