# Testing out vector search on quantitative queries

This notebook tests how well vector search handles queries that contain quantitative information.

### Setting up your environment

In [121]:
import os

os.environ["AZURE_OPENAI_API_KEY"] = "" # your API key here
os.environ["AZURE_OPENAI_ENDPOINT"] = "" # your endpoint here
os.environ["OPENAI_API_VERSION"] = "2023-05-15" # choose the API version you want to use

In [89]:
from langchain_openai import AzureOpenAIEmbeddings


# create an embeddings object
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="", # the name of your OpenAI embeddings deployment here
)

### Build an index

In [101]:
sentences = []

for i in range(1000):
    sentences.append(f"I went to the store and bought {i} apples.")
    
for s in sentences[:10]:
    print(s)

I went to the store and bought 0 apples.
I went to the store and bought 1 apples.
I went to the store and bought 2 apples.
I went to the store and bought 3 apples.
I went to the store and bought 4 apples.
I went to the store and bought 5 apples.
I went to the store and bought 6 apples.
I went to the store and bought 7 apples.
I went to the store and bought 8 apples.
I went to the store and bought 9 apples.


In [102]:
import pandas as pd

# create a vector database in Pandas
# one column contains our 1,000 sentences (these will be returned as search results)
df = pd.DataFrame(sentences, columns=["sentence"])

# another column will contain sentence embeddings, which we will vectorize here with OpenAI's embeddings model (these will actually be 'searched')
df["embedding"] = embeddings.embed_documents(sentences)

### Create a basic vector search function

In [103]:
# import the function directly so the name `math` does not shadow Python's built-in math module
from langchain_community.utils.math import cosine_similarity

def search(query, index, top=5):
    """Print the `top` sentences in `index` whose embeddings are closest to `query`."""

    # get embedding for the query being searched
    query_embedding = embeddings.embed_query(query)

    # create a new column of each indexed sentence's similarity to the query
    # cosine_similarity returns a 1x1 matrix here, so [0][0] extracts the scalar score
    index['similarity'] = index['embedding'].apply(lambda x: cosine_similarity([query_embedding], [x])[0][0])

    # sort the index by the "similarity" column, most similar first
    sorted_df = index.sort_values(by='similarity', ascending=False)
    sorted_df.reset_index(drop=True, inplace=True)

    # keep the top X rows
    top_rows = sorted_df.head(top)

    # print the sentences from the top X rows
    print("The top {0} closest embeddings for '{1}' are:".format(top, query))
    print(top_rows['sentence'], end="\n\n")

    return None
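
To make the ranking step concrete without calling the embeddings API, here is a minimal sketch of the same idea on made-up 3-dimensional vectors. The names `cosine_sim`, `toy_index`, and `toy_query` are hypothetical and the vectors are invented for illustration; real embeddings have far more dimensions.

```python
from math import sqrt

def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# toy "embeddings" (assumed values, not produced by any model)
toy_index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}
toy_query = [1.0, 0.1, 0.0]

# rank document names by similarity to the query, most similar first
ranked = sorted(
    toy_index,
    key=lambda name: cosine_sim(toy_query, toy_index[name]),
    reverse=True,
)
print(ranked)
```

Cosine similarity only compares the direction of two vectors, so the ranking reflects whatever the embedding model encodes as "close", which is not necessarily numeric closeness.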

### Experiment with vector search

In [120]:
q = "I went to the store and bought π apples."

search(q, df, top=5)

The top 5 closest embeddings for 'I went to the store and bought π apples.' are:
0    I went to the store and bought 925 apples.
1    I went to the store and bought 201 apples.
2    I went to the store and bought 901 apples.
3    I went to the store and bought 805 apples.
4      I went to the store and bought 2 apples.
Name: sentence, dtype: object
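
One way to put a number on results like these is to measure how far each returned quantity is from the quantity in the query. The sketch below is an assumption about how one might score the output, not part of the original experiment; `apple_count` and `numeric_errors` are hypothetical helpers, and the sentences are copied from the π result above.

```python
import re
from math import pi

def apple_count(sentence):
    """Pull the integer out of an '... bought N apples.' sentence (None if absent)."""
    match = re.search(r"bought (\d+) apples", sentence)
    return int(match.group(1)) if match else None

def numeric_errors(target, sentences):
    """Absolute gap between the target quantity and each sentence's apple count.

    Assumes every sentence matches the indexed pattern.
    """
    return [abs(apple_count(s) - target) for s in sentences]

# the five sentences returned for the π query, in ranked order
pi_results = [
    "I went to the store and bought 925 apples.",
    "I went to the store and bought 201 apples.",
    "I went to the store and bought 901 apples.",
    "I went to the store and bought 805 apples.",
    "I went to the store and bought 2 apples.",
]

errors = numeric_errors(pi, pi_results)
for sentence, err in zip(pi_results, errors):
    print(f"{err:8.2f}  {sentence}")
```

For this query, only the fifth-ranked sentence is numerically close to π; the top four are off by hundreds.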



In [122]:
# if you want to use the 'apples' sentence to test a bunch of numbers in a batch, you can create a test set
test_set = ["5", "10", "16", "100", "7 + 5", "5.76", "seventy-seven", "forty-three and a half", "eleventy-one", "83.4", "2.3", "1.618", "-14", "a dozen", "five dozen", "2.5 dozen", "four score", "pi", "euler's number of", "avogadro's number of"]
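
To judge the batch results, it can help to write down what each phrase means numerically and which indexed sentence would be the ideal top hit. The following is a sketch under stated assumptions: the `expected_values` mapping and `nearest_indexed_count` helper are hypothetical, "eleventy-one" is read as 111, and the index is assumed to cover the integers 0 through 999 as built earlier.

```python
from math import e, pi

# assumed numeric meaning of each test_set phrase
expected_values = {
    "5": 5, "10": 10, "16": 16, "100": 100, "7 + 5": 12, "5.76": 5.76,
    "seventy-seven": 77, "forty-three and a half": 43.5, "eleventy-one": 111,
    "83.4": 83.4, "2.3": 2.3, "1.618": 1.618, "-14": -14, "a dozen": 12,
    "five dozen": 60, "2.5 dozen": 30, "four score": 80, "pi": pi,
    "euler's number of": e, "avogadro's number of": 6.02214076e23,
}

def nearest_indexed_count(value, low=0, high=999):
    """Closest integer to `value` within the indexed range [low, high]."""
    return min(max(round(value), low), high)

for phrase, value in expected_values.items():
    print(f"{phrase!r}: ideal top hit would mention {nearest_indexed_count(value)} apples")
```

Values outside the indexed range, such as -14 or Avogadro's number, are clamped to the nearest end of the range, so the "ideal" hit for those is only the least-bad option available.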

In [106]:
# run the search function on the test set
for item in test_set:
    search("I went to the store and bought {0} apples.".format(item), df)

The top 5 closest embeddings for 'I went to the store and bought 5 apples.' are:
0      I went to the store and bought 5 apples.
1      I went to the store and bought 4 apples.
2    I went to the store and bought 500 apples.
3      I went to the store and bought 6 apples.
4     I went to the store and bought 25 apples.
Name: sentence, dtype: object

The top 5 closest embeddings for 'I went to the store and bought 10 apples.' are:
0    I went to the store and bought 10 apples.
1     I went to the store and bought 9 apples.
2    I went to the store and bought 11 apples.
3    I went to the store and bought 20 apples.
4     I went to the store and bought 8 apples.
Name: sentence, dtype: object

The top 5 closest embeddings for 'I went to the store and bought 16 apples.' are:
0    I went to the store and bought 16 apples.
1    I went to the store and bought 17 apples.
2    I went to the store and bought 14 apples.
3    I went to the store and bought 32 apples.
4    I went to the store and b