In this notebook we will build a semantic search app that matches on meaning and context rather than keywords.

In [3]:
# !pip install cohere
# !pip install python-dotenv
# !pip install weaviate-client
# !pip install pandas

We first install the libraries we need for the search engine: ***Cohere*** as the base LLM and ***Weaviate*** as the search engine and vector database.

In [2]:
import cohere
from dotenv import load_dotenv
import os
import pandas as pd
import weaviate
import weaviate.classes as wvc
import weaviate.classes.config as wc
import time

from weaviate.classes.config import Integrations


In [3]:
# Load the API keys and the cluster URL from the .env file
# You can use either an online cluster or a local cluster
load_dotenv()
cohere_api = os.getenv("COHERE_API_KEY")
weaviate_api = os.getenv("WEAVIATE_API_KEY")
clusterUrl = os.getenv("WEAVIATE_CLUSTER_URL")
rpm_embeddings=100
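
`os.getenv` returns `None` when a variable is not set, so a missing key only shows up later, at the first API call. A minimal sketch of an early check is shown here; `require_env` is a hypothetical helper, not part of `python-dotenv`, and the example uses an explicit mapping with placeholder values instead of the real environment.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for each name, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example with an explicit mapping (placeholder values, not real keys)
settings = require_env(
    ["COHERE_API_KEY", "WEAVIATE_API_KEY"],
    env={"COHERE_API_KEY": "demo-key", "WEAVIATE_API_KEY": "demo-key-2"},
)
print(sorted(settings))
```

In the notebook this could be called right after `load_dotenv()`, listing only the variables that the chosen connection mode (local or online) actually needs.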

In [4]:
# Create a Cohere client
co = cohere.Client(cohere_api) 

In [None]:
# Connect to an online Weaviate cluster (alternative to the local connection)

# authConfig = weaviate.auth.AuthApiKey(weaviate_api)
# client = weaviate.connect_to_wcs(
#     cluster_url=clusterUrl,
#     auth_credentials=authConfig,
#     headers={'X-Cohere-Api-Key': cohere_api},
#     skip_init_checks=True
# )

In [5]:
# Connect to a local Weaviate instance and configure the Cohere integration
client = weaviate.connect_to_local(
    skip_init_checks=True
)
integrations=[
    Integrations.cohere(
        api_key=cohere_api,
        requests_per_minute_embeddings=rpm_embeddings,
    )
]
client.integrations.configure(integrations)

Now we can verify that our client is connected.

In [6]:
print(client.is_connected())

True


<h2>Vector Database Population</h2><br>
Before populating the database, we create a class for Book.<br> To keep the system stable and coherent, we first delete any existing Book class, then define its properties. Only _Title_, _Categories_ and _Description_ serve as semantic search criteria: they are the only properties that are embedded, while all others are excluded from vectorization.<br>



In [10]:
# Uncomment to delete an existing Book collection (this removes all of its objects)
# client.collections.delete(name='Book')

In [7]:
books = client.collections.create(
    name="Book",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_cohere(),
    generative_config=wc.Configure.Generative.cohere(),
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="isbn10", data_type=wc.DataType.TEXT, skip_vectorization=True),
        wc.Property(name="isbn13", data_type=wc.DataType.NUMBER, skip_vectorization=True),
        wc.Property(name="categories", data_type=wc.DataType.TEXT),
        wc.Property(name="thumbnail", data_type=wc.DataType.TEXT, skip_vectorization=True),
        wc.Property(name="description", data_type=wc.DataType.TEXT),
        wc.Property(name="num_pages", data_type=wc.DataType.NUMBER, skip_vectorization=True),
        wc.Property(name="average_rating", data_type=wc.DataType.NUMBER, skip_vectorization=True),
        wc.Property(name="published_year", data_type=wc.DataType.NUMBER, skip_vectorization=True),
        wc.Property(name="authors", data_type=wc.DataType.TEXT, skip_vectorization=True),
    ],
)

UnexpectedStatusCodeError: Collection may not have been created properly.! Unexpected status code: 422, with response body: {'error': [{'message': 'class name Book already exists'}]}.

In [8]:
def process(chunk, collection):
    """Insert every row of a DataFrame chunk into the collection, one object per request."""
    for i, book in chunk.iterrows():
        properties = {
            "title": book['title'],
            "isbn10": book['isbn10'],
            "isbn13": book['isbn13'],
            "categories": book['categories'],
            "thumbnail": book['thumbnail'],
            "description": book['description'],
            "num_pages": book['num_pages'],
            "average_rating": book['average_rating'],
            "published_year": book['published_year'],
            "authors": book['authors'],
        }
        try:
            uuid = collection.data.insert(properties)
            print(f"Inserting : {book['title']}: {uuid}")
        except Exception as e:
            print(f"Exception : {e}")            
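
pandas reads empty CSV fields as `NaN`, and `NaN` is not a valid JSON value, so a row with a missing field may be rejected or stored in an unexpected way. Whether `books.csv` has such rows is an assumption. A minimal sketch that drops missing fields before insertion is shown here; `drop_missing` is a hypothetical helper and the sample row is made up.

```python
import math


def drop_missing(row):
    """Return a plain dict of the row's fields, leaving out float NaN values."""
    return {
        key: value
        for key, value in row.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


# Made-up row for illustration; a pandas row exposes .items() in the same way
example_row = {"title": "Example Title", "categories": float("nan"), "num_pages": 100.0}
print(drop_missing(example_row))
```

Inside `process`, the cleaned dict could be built with `drop_missing(properties)` just before the insert call, so that missing fields are omitted from the stored object.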

In [9]:
book_collections = client.collections.get('Book')

# Read the CSV in chunks of 100 rows to keep memory use low
chunksize = 100
chunks = pd.read_csv("./books.csv", chunksize=chunksize)

for chunk in chunks:
    process(chunk,book_collections)

Inserting : Gilead: aca770b5-d89e-4955-85fe-7868fbf5703e
Inserting : Spider's Web: 458d80cb-2f3f-4c61-9767-11b29805f856
Inserting : The One Tree: e9bda804-cb0b-4af3-a333-2c55bd823152
Inserting : Rage of angels: e830893e-7d39-40b6-b913-396c108ea459
Inserting : The Four Loves: 99c6841d-ba21-4f3c-8f64-6745034e96a5
Inserting : The Problem of Pain: e4564fe5-25df-4934-9c39-8537db731d6c
Inserting : An Autobiography: 255d02f2-8cfb-49a3-9159-efa0f03ac118
Inserting : Empires of the Monsoon: 25fdd7e0-a1c3-42d4-acb4-19a6cc06b032
Inserting : The Gap Into Madness: 28ad69b5-ef3d-4fcd-84ee-801438f117b6
Inserting : Master of the Game: 6b7814d7-a4a7-4ec4-93f1-6536944b9504
Inserting : If Tomorrow Comes: 68f18f87-e398-41aa-88a7-a014e05d790d
Inserting : Assassin's Apprentice: 6ee1ee8f-506a-469e-8cc1-8ae6260b1289
Inserting : Warhost of Vastmark: b1364047-b3fa-43f4-b774-4d031fa6142f
Inserting : The Once and Future King: 1a5cc985-7314-4cff-9c5c-6d777c3ff715
Inserting : Murder in LaMut: 7c882f7d-7ea0-40ea-95ae

KeyboardInterrupt: 

In [10]:
books_collection = client.collections.get('Book')

for item in books_collection.iterator():
    print(item.uuid, item.properties)

0039950c-9a5c-418e-98e9-6e2e74703464 {'description': '"The two works \'On fairy-stories\' and \'Leaf by Niggle\' were first brought together to form the book \'Tree and leaf\' in 1964. In this new edition a third element is added: the poem Mythopoeia, the making of myths..."--Preface.', 'thumbnail': 'http://books.google.com/books/content?id=aPb_AAIcwZ0C&printsec=frontcover&img=1&zoom=1&source=gbs_api', 'authors': 'John Ronald Reuel Tolkien', 'categories': 'Literary Collections', 'average_rating': 4.09, 'title': 'Tree and Leaf', 'isbn10': '0007105045', 'num_pages': 176.0, 'isbn13': 9780007105045.0, 'published_year': 2001.0}
03e1ae04-953b-4945-98fc-1e144e4d2554 {'description': "'Girls' Night In' features stories about growing up, growing out of, moving out, moving on, falling apart and getting it all together. So turn off your cell phone and curl up on the couch: this is one 'Girls' Night In' you won't want to miss.", 'thumbnail': 'http://books.google.com/books/content?id=xLwHHQAACAAJ&pr

Now we will implement semantic search using Weaviate.
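
Conceptually, a semantic query is turned into a vector and compared with the stored book vectors, and the closest books are returned. A toy sketch of that ranking step follows. It uses cosine similarity as one common choice of measure, and the 3-dimensional vectors and document names are made up; real embeddings come from the Cohere model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only
documents = {
    "cell biology textbook": [0.9, 0.1, 0.0],
    "space thriller": [0.4, 0.8, 0.1],
    "cookbook": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

# Rank documents from most to least similar to the query
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
print(ranked)
```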

In [11]:
# Semantic search: return the 3 books closest in meaning to the query
book_collection = client.collections.get('Book')

response = book_collection.query.near_text(
    query="biology",
    limit=3
)

print()
for book in response.objects:
    print(book.properties['title'])
    print(book.properties['description'])
    print(book.properties['categories'])
    print('---')


Gravity
Emma Watson a research physician has been training for the mission of a lifetime: to study living organisms in the microgravity of space. But the true and lethal nature of the experiment has not been revealed to NASA and once aboard the space station things start to go wrong. A culture of single-celled Archaeons, gathered from the deep sea, begin to rapidly multiply and infect the crew - with deadly and agonising results. As her estranged husband and ground crew at NASA work against the clock to launch a rescue Emma stuggles to contain the lethal microbe. But with the contagion threatening Earth's population, there are those who would leave the astronauts stranded in orbit, quarantined aboard the station.
Science fiction
---
Gravity
Emma Watson a research physician has been training for the mission of a lifetime: to study living organisms in the microgravity of space. But the true and lethal nature of the experiment has not been revealed to NASA and once aboard the space stati
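
The result list shows "Gravity" twice, which suggests that the same row was inserted in more than one run. This is an assumption, since the output does not show the object IDs. One way to avoid such duplicates is to derive each object's UUID from its ISBN, so that a repeated insert targets an ID that already exists. A sketch using only the standard library follows; `book_uuid` and the `books.example` namespace are hypothetical.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes
BOOK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "books.example")


def book_uuid(isbn13):
    """Derive a stable UUID from an ISBN-13 given as a number or a digit string."""
    return uuid.uuid5(BOOK_NAMESPACE, str(int(isbn13)))


first = book_uuid(9780007105045.0)   # as read from the CSV (float)
second = book_uuid("9780007105045")  # same ISBN as text
print(first == second)
```

The value could then be passed as `collection.data.insert(properties, uuid=book_uuid(book['isbn13']))`. A second insert with an ID that is already stored should then be rejected rather than saved twice, and the existing `try`/`except` in `process` would report it.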

In [12]:
# Generative search: retrieve the 2 closest books, then run the prompt once per book
response = book_collection.generate.near_text(
    query="technology, data structures and algorithms, distributed systems",
    limit=2,
    single_prompt="Explain why this book might be interesting to someone who likes playing the violin, rock climbing, and doing yoga. the book's title is {title}, with a description: {description}, and is in the genre: {categories}."
)
print(response.objects[0].generated)

Microserfs could be an intriguing read for someone who enjoys violin playing, rock climbing, and yoga because it offers a unique perspective on the lives of computer programmers, a world away from the activities you enjoy. 

The book provides an insightful glimpse into the lives of a group of Microsoft employees, known as microserfs, who are trying to find a balance between their demanding jobs and their personal lives. The characters' struggles to establish a sense of self and fulfill their passions outside of work could resonate with readers who have dedicated time and energy to mastering the violin, the physical challenges of rock climbing, and the mental and spiritual practice of yoga. 

The characters in the novel, despite being skilled and successful in their field, find themselves at a crossroads where they realize that their work-life balance is severely skewed. This realization is a common one for individuals passionate about their hobbies and could provide a sense of reflecti
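
In the generative query above, `single_prompt` is a template: for each retrieved book, the `{title}`, `{description}` and `{categories}` placeholders are replaced by that book's property values before the prompt is sent to the model. A rough sketch of that substitution follows, using Python's `str.format` as a stand-in. Weaviate performs the substitution itself; `fill_prompt` and the sample values are hypothetical.

```python
def fill_prompt(template, properties):
    """Replace {name} placeholders in the template with the matching property values."""
    return template.format(**properties)


template = "The book's title is {title}, with a description: {description}, in the genre: {categories}."

# Made-up object properties; extra keys that the template does not use are ignored
example_properties = {
    "title": "Example Title",
    "description": "An example description.",
    "categories": "Example Genre",
    "num_pages": 100.0,
}

prompt = fill_prompt(template, example_properties)
print(prompt)
```

Because the prompt runs once per object, `response.objects[0].generated` holds the text for the first retrieved book only, and each further object carries its own `generated` value.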