In [7]:
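# note: openai.embeddings_utils ships only with openai-python < 1.0 (the module was removed in 1.0)
from openai.embeddings_utils import get_embedding, cosine_similarity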
import pandas as pd

# references
# https://github.com/openai/openai-cookbook/blob/main/examples/Semantic_text_search_using_embeddings.ipynb
# https://github.com/openai/openai-cookbook/blob/main/examples/Obtain_dataset.ipynb

In [1]:
# start with a list of documents, in this case sentences

docs = [
    "The importance of pollinators like bees in our ecosystem cannot be overstated, as they play a crucial role in the growth of fruits and vegetables.",
    "Virtual reality has significantly evolved in the past few years, offering immersive experiences that range from video gaming to medical training simulations.",
    "Electric cars are gaining popularity rapidly, providing an eco-friendly alternative to traditional gasoline vehicles, but their charging infrastructure still needs significant improvement.",
    "Solar energy, harnessed through photovoltaic panels, is an abundant and renewable resource that has the potential to drastically reduce greenhouse gas emissions.",
    "The intricate art of origami is not merely a hobby; it's a blend of geometry and creativity that has applications in engineering and design.",
    "The concept of time dilation in special relativity is mind-boggling, allowing for different rates of time passage depending on one's relative velocity.",
    "The Mona Lisa, housed in the Louvre Museum in Paris, continues to captivate visitors with its enigmatic smile and the mystery surrounding its true origins.",
    "Despite being the most abundant element in the universe, hydrogen is challenging to use as fuel due to its low energy density and storage issues.",
    "Artificial intelligence is transforming healthcare by assisting in diagnostics, treatment planning, and even in conducting intricate surgeries.",
    "The Great Barrier Reef, located off the coast of Australia, is the world's largest coral reef system but faces existential threats from climate change.",
    "Microplastics are an insidious form of pollution, entering our waterways and food chains, thereby posing a potential long-term risk to both wildlife and humans.",
    "Classical music, once considered an exclusive art form, is gaining renewed interest through its integration into movies, commercials, and various digital platforms.",
    "Psychological studies on the effects of social media reveal both its power to connect people globally and its potential to contribute to loneliness.",
    "Black holes, the remnants of massive stars, challenge our understanding of physics by presenting conditions where current theories break down.",
    "Urban gardening is more than a trend; it's a sustainable practice that enhances community well-being and contributes to environmental conservation.",
    "The global water crisis is intensifying, with millions of people lacking access to clean water, thereby exacerbating poverty and disease.",
    "Organic farming not only provides healthier produce but also employs sustainable practices that benefit the environment in the long run.",
    "Remote work has become increasingly feasible, thanks to advances in technology, but it also brings challenges in maintaining work-life balance.",
    "Genetic engineering holds great promise for curing diseases, but ethical concerns about its misuse cannot be ignored.",
    "Yoga, an ancient practice that originated in India, has gained global popularity for its physical and mental health benefits.",
    "The rise of fast fashion has made clothing affordable but at the expense of environmental sustainability and ethical labor practices.",
    "Smart homes are no longer a thing of the future; with devices like smart thermostats and voice-activated assistants, they're becoming increasingly common.",
    "Antarctica, the last unexplored frontier, holds vital clues about Earth's climate history, locked away in its ice cores.",
    "The Hubble Space Telescope has revolutionized our understanding of the universe, capturing images of distant galaxies and cosmic phenomena.",
    "Renewable energy technologies, such as wind and solar power, are essential for mitigating the impacts of climate change on a global scale.",
    "The study of ancient civilizations like Mesopotamia and Egypt offers insights into the origins of agriculture, governance, and even writing systems.",
    "The migration of monarch butterflies is one of nature's most fascinating phenomena, involving a journey of thousands of miles over multiple generations.",
    "Machine learning algorithms are transforming the finance sector, enabling automated trading and risk assessment with unprecedented accuracy.",
    "Advances in 3D printing technology have opened new possibilities in manufacturing, from customized products to medical prosthetics.",
    "The ethics of autonomous vehicles are complex, involving not only technological issues but also questions about legal responsibility in the event of accidents.",
    "The traditional art of storytelling is experiencing a resurgence through modern media like podcasts, offering a rich variety of narratives to audiences worldwide.",
    "The study of linguistics goes beyond grammar and vocabulary; it explores how language shapes thought, culture, and even our perception of reality.",
    "The ocean floor is one of the least-explored areas on Earth, yet it holds a wealth of resources and undiscovered biological diversity.",
    "The emergence of cryptocurrency has disrupted financial systems, offering a decentralized alternative to traditional banking but also raising concerns about regulation and security.",
    "Nanotechnology, the manipulation of matter at the atomic scale, holds the promise of revolutionary advances in medicine, materials science, and energy production.",
    "Agatha Christie's detective novels, particularly those featuring Hercule Poirot, have left an indelible mark on the mystery genre, inspiring countless authors and adaptations.",
    "The search for extraterrestrial life fascinates scientists and laypeople alike, as the discovery of even microbial life would redefine our understanding of biology.",
    "Sustainable tourism aims to minimize the negative impacts of travel, encouraging cultural exchange while preserving the environment and benefiting local communities.",
    "The concept of mindfulness, deeply rooted in Buddhist philosophy, has gained scientific support for its benefits in reducing stress and improving mental well-being.",
    "The phenomenon of 'earworms,' songs that get stuck in one's head, is not merely annoying but also an interesting subject of psychological study.",
    "Birdwatching is not just a casual hobby; it's an activity that promotes mindfulness, a deeper appreciation of nature, and valuable citizen science data collection.",
    "The internet of things (IoT) promises a future where everything from your fridge to your car will be interconnected, offering unprecedented convenience and efficiency.",
    "The field of archaeoastronomy explores how ancient cultures interpreted celestial bodies, influencing everything from architecture to religious practices."
]


In [6]:
# embedding model parameters
embedding_model = "text-embedding-ada-002"
embedding_encoding = "cl100k_base"  # this is the encoding for text-embedding-ada-002
max_tokens = 8000  # the maximum for text-embedding-ada-002 is 8191
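
The `embedding_encoding` and `max_tokens` values above are defined but not used later in this notebook. A minimal sketch of how they could be applied, assuming the goal is to drop any text that is too long before calling the API: `filter_by_token_count` and its `encode` parameter are hypothetical names, and a whitespace split stands in for the real tokenizer so the example runs without extra packages.

```python
import pandas as pd

def filter_by_token_count(df, encode, max_tokens=8000, text_col="doc"):
    """Keep only rows whose text encodes to at most max_tokens tokens."""
    out = df.copy()
    out["n_tokens"] = out[text_col].apply(lambda text: len(encode(text)))
    return out[out["n_tokens"] <= max_tokens]

# With tiktoken installed, the encoder for text-embedding-ada-002 would be:
#   import tiktoken
#   encode = tiktoken.get_encoding("cl100k_base").encode
# str.split is only a stand-in tokenizer for this illustration.
sample = pd.DataFrame({"doc": ["a short sentence", "word " * 10]})
kept = filter_by_token_count(sample, encode=str.split, max_tokens=5)
print(kept)
```

The sentences in `docs` are all far below the limit, so this check only matters once longer documents are embedded.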

In [10]:
# use pandas for easy visualization

# Create a DataFrame
df = pd.DataFrame(docs, columns=["doc"])

# Show the DataFrame
df.head()

                                                 doc
0  The importance of pollinators like bees in our...
1  Virtual reality has significantly evolved in t...
2  Electric cars are gaining popularity rapidly, ...
3  Solar energy, harnessed through photovoltaic p...
4  The intricate art of origami is not merely a h...


In [11]:
# now we use the api to get the embeddings and store them in the same df
# Ensure you have your API key set in your environment per the README: https://github.com/openai/openai-python#usage

df["embedding"] = df.doc.apply(lambda x: get_embedding(x, engine=embedding_model))
df.head()

                                                 doc                                          embedding
0  The importance of pollinators like bees in our...  [0.0027186137158423662, -0.020436476916074753,...
1  Virtual reality has significantly evolved in t...  [-0.011856520548462868, -0.0007038610056042671...
2  Electric cars are gaining popularity rapidly, ...  [0.0007442101486958563, -0.004543936811387539,...
3  Solar energy, harnessed through photovoltaic p...  [0.008117465302348137, -0.009854224510490894, ...
4  The intricate art of origami is not merely a h...  [-0.012270665727555752, 0.019313745200634003, ...
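
The cell above sends one request per sentence through `get_embedding`. As a rough sketch of an alternative, assuming the newer `openai` package (1.0 or later) where `openai.embeddings_utils` is no longer available: the embeddings endpoint accepts a list of inputs, so all documents can go out in a single call. `embed_texts` is a hypothetical helper, not part of the library.

```python
def embed_texts(client, texts, model="text-embedding-ada-002"):
    """Return one embedding per input text, in input order.

    `client` is assumed to be an `openai.OpenAI()` instance (openai-python >= 1.0).
    """
    response = client.embeddings.create(model=model, input=list(texts))
    # Each returned item carries the index of the input it belongs to;
    # sorting by it keeps embeddings aligned with the texts.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

# Assumed usage (needs the `openai` package and OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   client = OpenAI()
#   df["embedding"] = embed_texts(client, df["doc"].tolist(), model=embedding_model)
```

Passing the client in as an argument keeps the helper easy to try with a stub before spending API calls.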


In [12]:
# save it just in case :D

df.to_csv("docs_with_embeddings.csv")
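
One thing to keep in mind when reading the file back: CSV stores each embedding as text, so the column comes back as strings unless it is parsed. A small sketch of the round trip, using a tiny stand-in DataFrame and an in-memory buffer instead of the real file (both are assumptions made to keep the example self-contained):

```python
import ast
import io

import pandas as pd

# Tiny stand-in for the notebook's df; real ada-002 vectors have 1536 values.
small = pd.DataFrame({
    "doc": ["first sentence", "second sentence"],
    "embedding": [[0.1, -0.2, 0.3], [0.0, 0.5, -0.5]],
})

# Write to an in-memory buffer instead of a file; index=False avoids an
# extra "Unnamed: 0" column on reload.
buffer = io.StringIO()
small.to_csv(buffer, index=False)
buffer.seek(0)

# Without the converter the embedding column would hold plain strings.
restored = pd.read_csv(buffer, converters={"embedding": ast.literal_eval})

print(type(restored.loc[0, "embedding"]))
print(restored.loc[1, "embedding"])
```

With the real file, the same `converters` argument would be passed to `pd.read_csv("docs_with_embeddings.csv", ...)`.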

In [14]:
# now input your query
question = input("Your question:")

Your question:do you have any info about butterflies? 


In [16]:
# the process now follows:
# 1. calculate the embedding for the question, with the same API and parameters, of course

q_embedding = get_embedding(question, engine=embedding_model)

# have a look, again! it is a vector...
q_embedding

[-0.011704735457897186,
 0.0051749651320278645,
 0.0026509889867156744,
 -0.028789609670639038,
 0.010636523365974426,
 0.03475596383213997,
 -0.017299819737672806,
 -0.02480335719883442,
 -0.017117442563176155,
 -0.01511128805577755,
 0.009880959056317806,
 0.01616647280752659,
 -0.013404754921793938,
 -0.003930889070034027,
 0.011326952837407589,
 0.03157738223671913,
 0.02535048872232437,
 0.01413426548242569,
 0.020595643669366837,
 -0.023787252604961395,
 0.0044584814459085464,
 -0.0026558740064501762,
 0.004220739006996155,
 -0.024842437356710434,
 0.006373446434736252,
 0.009978661313652992,
 0.013456863351166248,
 -0.01671360619366169,
 -0.014994045719504356,
 -0.011079440824687481,
 0.05174313485622406,
 -0.003784335684031248,
 -0.029831767082214355,
 -0.018185654655098915,
 -0.012766432948410511,
 -0.014525074511766434,
 0.011522357352077961,
 -0.007783616427332163,
 0.010916602797806263,
 0.018159599974751472,
 0.02178109809756279,
 0.00238882121630013,
 0.000749050930608063,
 ...]

In [27]:
# 2. now we look for the closest vectors in our "db", using cosine similarity
# we use the cosine_similarity implementation from openai
# we compare the query embedding with every document embedding and show the n best matches

def find_closest_doc(q_embedding, df, n=3):

    # create another column in the df and store the similarity
    # (note: this adds or overwrites the "similarity" column of the df passed in)
    df["similarity"] = df.embedding.apply(lambda x: cosine_similarity(x, q_embedding))

    # now we sort the df by the newly created column, best match first
    results = df.sort_values("similarity", ascending=False).head(n)["doc"].tolist()

    return results

for result in find_closest_doc(q_embedding, df, 3):
    print(result, "\n")

The migration of monarch butterflies is one of nature's most fascinating phenomena, involving a journey of thousands of miles over multiple generations. 

The importance of pollinators like bees in our ecosystem cannot be overstated, as they play a crucial role in the growth of fruits and vegetables. 

Birdwatching is not just a casual hobby; it's an activity that promotes mindfulness, a deeper appreciation of nature, and valuable citizen science data collection. 
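
For intuition about the score used in the ranking above, here is a minimal NumPy sketch of the standard cosine similarity formula: the dot product of two vectors divided by the product of their lengths. `cosine_sim` is a hypothetical stand-in written for illustration, not the library helper itself.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_sim([1, 0], [1, 0]))   # same direction
print(cosine_sim([1, 0], [0, 1]))   # orthogonal
print(cosine_sim([1, 0], [-1, 0]))  # opposite direction
```

The score depends only on direction, not on vector length, which is why two texts of very different sizes can still rank as close.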



In [32]:
# now we put it all in one cell to play around
# (the loop runs until you interrupt the kernel)

while True:
    question = input("Your question:")
    q_embedding = get_embedding(question, engine=embedding_model)
    print("\n")
    for result in find_closest_doc(q_embedding, df, 3):
        print(result, "\n")

Your question:dog food


Microplastics are an insidious form of pollution, entering our waterways and food chains, thereby posing a potential long-term risk to both wildlife and humans. 

The internet of things (IoT) promises a future where everything from your fridge to your car will be interconnected, offering unprecedented convenience and efficiency. 

Organic farming not only provides healthier produce but also employs sustainable practices that benefit the environment in the long run. 

Your question:I need some help 


Remote work has become increasingly feasible, thanks to advances in technology, but it also brings challenges in maintaining work-life balance. 

Black holes, the remnants of massive stars, challenge our understanding of physics by presenting conditions where current theories break down. 

Psychological studies on the effects of social media reveal both its power to connect people globally and its potential to contribute to loneliness. 

Your question:where do we g

KeyboardInterrupt: Interrupted by user
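
The "dog food" and "I need some help" queries above show a limit of plain top-n ranking: three documents always come back, even when none is really related. One possible refinement, sketched here under assumptions: compute all similarities in one NumPy operation, return the scores along with the documents, and drop anything below a cutoff. `top_matches`, `min_similarity`, and the toy two-dimensional vectors are hypothetical; a useful cutoff for real ada-002 embeddings would have to be found by experiment.

```python
import numpy as np
import pandas as pd

def top_matches(q_embedding, df, n=3, min_similarity=0.0):
    """Return up to n (doc, similarity) pairs scoring at least min_similarity."""
    matrix = np.array(df["embedding"].tolist(), dtype=float)  # shape (n_docs, dim)
    query = np.asarray(q_embedding, dtype=float)
    # Cosine similarity of every document row against the query at once.
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(sims)[::-1][:n]  # indices of the n highest scores
    docs = df["doc"].tolist()
    return [(docs[i], float(sims[i])) for i in order if sims[i] >= min_similarity]

# Toy data: 2-dimensional vectors instead of real embeddings.
toy = pd.DataFrame({
    "doc": ["about cats", "about dogs", "about cars"],
    "embedding": [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
})
for doc, score in top_matches([1.0, 0.0], toy, n=3, min_similarity=0.5):
    print(f"{score:.2f}  {doc}")
```

Unlike `find_closest_doc`, this version does not write a `similarity` column into the DataFrame it receives.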