# <u>What have I done?</u>
* OpenAI's `text-embedding-ada-002` has an embedding size of 1536
* Can I batch embed, or do I need to pass in the text one-by-one?
* Carefully consider the payloads and the IDs. IDs will help you update data points later if needed.
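
On the batching question: the OpenAI embeddings endpoint accepts a list of strings as `input`, so several descriptors can be sent in one request instead of one call per row. The following is a minimal sketch, assuming a `client` created as in the cells further down. The helpers `chunked`, `embed_in_batches`, and `openai_embed_batch`, as well as the batch size of 100, are illustrative choices and not part of the original notebook.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 100,  # arbitrary illustrative value
) -> List[List[float]]:
    """Embed `texts` batch by batch, preserving the input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def openai_embed_batch(client, model: str = "text-embedding-ada-002"):
    """Build an `embed_batch` callable backed by an OpenAI client (hypothetical helper)."""
    def _embed(batch: Sequence[str]) -> List[List[float]]:
        response = client.embeddings.create(input=list(batch), model=model)
        # Sort by the returned index so vectors line up with the inputs.
        ordered = sorted(response.data, key=lambda item: item.index)
        return [item.embedding for item in ordered]
    return _embed
```

With the dataframe from this notebook, usage could look like `df["embedding"] = embed_in_batches(df["descriptor"].to_list(), openai_embed_batch(client))`. Passing the embedding function in as an argument also makes the batching logic easy to test without network access.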

In [1]:
import pandas as pd
import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.http.models import VectorParams, Distance
from openai import OpenAI
import os
import random

In [None]:
df = pd.read_pickle("bbe_12-03-2024-2134.pkl")

In [10]:
# Spot-check a random row by ID (returns an empty frame if no row has the drawn ID).
df.loc[df["ID"] == random.randint(0, len(df)-1)]

| Unnamed: 0 | ID | bookId | title | series | author | rating | description | language | isbn | genres | ... | setting | coverImg | bbeScore | bbeVotes | price | descriptor | descriptor_lengths | description_lengths | genre_lengths | genres_list |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 11163 | 10283 | 9766063-ico | Ico: Castle in the Mist | | Miyuki Miyabe | 3.64 | When a boy named Ico grows long curved horns o... | English | 9781421540634 | Fantasy, Fiction, Japanese Literature, Young A... | ... | [] | https://i.gr-assets.com/images/S/compressed.ph... | 200 | 2 | 8.86 | Description: When a boy named Ico grows long c... | 695 | 552 | 120 | [Fantasy, Fiction, Japanese Literature, Young ... |


In [6]:
qdrant_api_key = os.environ["QDRANT_API_KEY"]
qdrant_url = os.environ["QDRANT_URL"]
# qdrant_url = "https://fd45ddab-8b96-4a40-9339-0671dbe64dcc.europe-west3-0.gcp.cloud.qdrant.io:6333"
openai_api_key = os.environ["OPENAI_API_KEY"]

In [7]:
client = OpenAI(api_key=openai_api_key)

In [8]:
qdrant_client = QdrantClient(
    url=qdrant_url,
    api_key=qdrant_api_key,
)

print(qdrant_client.get_collections())

collections=[CollectionDescription(name='best-book-ever'), CollectionDescription(name='hip-hop-speech'), CollectionDescription(name='qa_collection')]


In [18]:
text = df.iloc[1].descriptor
print(text)

Description: There is a door at the end of a silent corridor. And it’s haunting Harry Pottter’s dreams. Why else would he be waking in the middle of the night, screaming in terror?Harry has a lot on his mind for this, his fifth year at Hogwarts: a Defense Against the Dark Arts teacher with a personality like poisoned honey; a big surprise on the Gryffindor Quidditch team; and the looming terror of the Ordinary Wizarding Level exams. But all these things pale next to the growing threat of He-Who-Must-Not-Be-Named - a threat that neither the magical government nor the authorities at Hogwarts can stop.As the grasp of darkness tightens, Harry must discover the true depth and strength of his friends, the importance of boundless loyalty, and the shocking price of unbearable sacrifice.His fate depends on them all.

Genres: 'Fantasy', 'Young Adult', 'Fiction', 'Magic', 'Childrens', 'Adventure', 'Audiobook', 'Middle Grade', 'Classics', 'Science Fiction Fantasy'


In [19]:
response = client.embeddings.create(input=text, model="text-embedding-ada-002")

In [20]:
len(response.data[0].embedding)

1536

In [14]:
df.columns

Index(['ID', 'bookId', 'title', 'series', 'author', 'rating', 'description',
       'language', 'isbn', 'genres', 'characters', 'bookFormat', 'edition',
       'pages', 'publisher', 'publishDate', 'firstPublishDate', 'awards',
       'numRatings', 'ratingsByStars', 'likedPercent', 'setting', 'coverImg',
       'bbeScore', 'bbeVotes', 'price', 'descriptor', 'descriptor_lengths',
       'description_lengths', 'genre_lengths', 'genres_list'],
      dtype='object')

In [23]:
df.iloc[0].genres_list

['Young Adult',
 'Fiction',
 'Dystopia',
 'Fantasy',
 'Science Fiction',
 'Romance',
 'Adventure',
 'Teen',
 'Post Apocalyptic',
 'Action']

In [29]:
payloads = df.loc[:, ["title", "series", "author", "rating", "descriptor", "language", "isbn", "pages", "firstPublishDate", "numRatings", "genres_list"]].to_dict(orient="records")
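
Missing values are worth a check before these records become payloads: pandas usually represents an empty field such as `series` as `NaN`, which is not a valid JSON value, whereas `None` maps to JSON `null`. Below is a small sketch of one way to normalise the records. `clean_payload` is a hypothetical helper, and the notebook does not show whether such a step was applied.

```python
import math
from typing import Any, Dict


def clean_payload(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


# Illustrative record with a missing series.
example = {"title": "Ico: Castle in the Mist", "series": float("nan"), "rating": 3.64}
print(clean_payload(example))
```

Applied to the list built above, this would be `payloads = [clean_payload(p) for p in payloads]`.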

In [34]:
ids = df["ID"].to_list()

In [3]:
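def create_embedding(text):
    """Embed a single string with text-embedding-ada-002 and return the vector."""
    response = client.embeddings.create(input=text, model="text-embedding-ada-002")
    return response.data[0].embedding

# Uncomment to embed every row (one API call per row).
# The df["embedding"] column must exist before the cells below are run.
# df["embedding"] = df["descriptor"].apply(create_embedding)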

In [42]:
df.to_pickle("bbe_12-03-2024-embedded")

In [43]:
embeddings = df["embedding"].to_list()
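
Because `ids`, `payloads`, and `embeddings` are built in separate cells, a quick consistency check before uploading can catch misaligned lists or vectors of the wrong size. This is a sketch only: `check_upload_inputs` is a hypothetical helper, and the default dimension of 1536 comes from the `text-embedding-ada-002` note at the top.

```python
from typing import Any, Dict, Sequence


def check_upload_inputs(
    ids: Sequence[int],
    vectors: Sequence[Sequence[float]],
    payloads: Sequence[Dict[str, Any]],
    dim: int = 1536,
) -> None:
    """Raise ValueError if the three lists cannot be uploaded as aligned points."""
    if not (len(ids) == len(vectors) == len(payloads)):
        raise ValueError(
            f"length mismatch: {len(ids)} ids, {len(vectors)} vectors, "
            f"{len(payloads)} payloads"
        )
    if len(set(ids)) != len(ids):
        # A repeated ID would overwrite an earlier point on upload.
        raise ValueError("ids must be unique")
    bad = [i for i, vec in zip(ids, vectors) if len(vec) != dim]
    if bad:
        raise ValueError(
            f"{len(bad)} vectors do not have dimension {dim}, e.g. id {bad[0]}"
        )
```

Calling `check_upload_inputs(ids, embeddings, payloads)` right before the upload cell returns silently when everything lines up.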

In [10]:
collection_name = "best-book-ever"
if not qdrant_client.collection_exists(collection_name):
    qdrant_client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
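
`Distance.COSINE` means that search scores are cosine similarities between the query vector and the stored vectors, so they fall between -1 and 1, with higher meaning closer. The formula is small enough to sketch in plain Python. `cosine_similarity` here is an illustrative helper, not something the Qdrant client exposes.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
```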

In [44]:
qdrant_client.upload_collection(
    collection_name=collection_name,
    vectors=embeddings,
    payload=payloads,
    ids=ids,
)

In [12]:
query = "I'm looking for a graphic novel about the monkey king."

query_vector = create_embedding(query)

In [13]:
search_results = qdrant_client.search(
    collection_name=collection_name,
    query_vector=query_vector,
    limit=5,
)
search_results

[ScoredPoint(id=5816, version=90, score=0.8579728, payload={'title': 'Monkey: The Journey to the West', 'series': None, 'author': "Wu Cheng'en, Arthur Waley (Translator/Adapter), Hu Shih (Introduction)", 'rating': 4.01, 'descriptor': 'Description: Probably the most popular book in the history of the Far East, this classic sixteenth century novel is a combination of picaresque novel and folk epic that mixes satire, allegory, and history into a rollicking adventure. It is the story of the roguish Monkey and his encounters with major and minor spirits, gods, demigods, demons, ogres, monsters, and fairies. This translation, by the distinguished scholar Arthur Waley, is the first accurate English version; it makes available to the Western reader a faithful reproduction of the spirit and meaning of the original.\n\nGenres: Classics, Fiction, Fantasy, China, Mythology, Literature, Chinese Literature, Asia, Adventure, Novels', 'language': 'English', 'isbn': '9780802130860', 'pages': '306', 'fi

In [24]:
for hit in search_results:
    print(hit.payload['title'])
    print(hit.payload['genres_list'])
    print("")

Monkey: The Journey to the West
['Classics', 'Fiction', 'Fantasy', 'China', 'Mythology', 'Literature', 'Chinese Literature', 'Asia', 'Adventure', 'Novels']

American Born Chinese
['Graphic Novels', 'Young Adult', 'Comics', 'Fiction', 'Fantasy', 'Graphic Novels Comics', 'Teen', 'Coming Of Age', 'School', 'Cultural']

Monkey King
['Fiction', 'China', 'Contemporary']

Red Earth and Pouring Rain
['Fiction', 'India', 'Fantasy', 'Magical Realism', 'Indian Literature', 'Literature', 'Asia', 'Literary Fiction', 'Novels', 'Historical Fiction']

Prince Valiant Vol. 1: The Prophecy
['Comics', 'Fantasy', 'Graphic Novels', 'Fiction']
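
The top hit is a classic novel rather than a graphic novel, although the query asks for one. One way to tighten this, sketched here, is to post-filter the hits on the `genres_list` payload field. `filter_by_genre` is a hypothetical helper that only assumes each hit has a `.payload` dict, as the `ScoredPoint` objects above do. The demo objects are stand-ins with abbreviated genre lists.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def filter_by_genre(hits: Iterable[Any], genre: str) -> List[Any]:
    """Keep hits whose payload lists `genre`, preserving the ranking order."""
    return [hit for hit in hits if genre in (hit.payload.get("genres_list") or [])]


# Stand-in objects mimicking the first two hits above (only .payload is used).
demo_hits = [
    SimpleNamespace(payload={"title": "Monkey: The Journey to the West",
                             "genres_list": ["Classics", "Fiction", "Fantasy"]}),
    SimpleNamespace(payload={"title": "American Born Chinese",
                             "genres_list": ["Graphic Novels", "Young Adult", "Comics"]}),
]
for hit in filter_by_genre(demo_hits, "Graphic Novels"):
    print(hit.payload["title"])
```

On the real results, `filter_by_genre(search_results, "Graphic Novels")` would keep "American Born Chinese" and "Prince Valiant Vol. 1: The Prophecy", going by the genre lists printed above. Filtering after the search shrinks the result list, so requesting a larger `limit` first may be needed. Qdrant can also filter on payload fields server-side via the `query_filter` argument of `search`.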

