Hi, in this notebook we will create a semantic search app that works on meaning and context rather than keywords.

In [140]:
# !pip install cohere
# !pip install python-dotenv
# !pip install weaviate-client
# !pip install pandas

We start by installing the libraries the search engine needs: ***Cohere*** as the base LLM (for embeddings and generation) and ***Weaviate*** as the search engine and vector database, plus `python-dotenv` to load the API keys and `pandas` to read the dataset.

In [2]:
import cohere
from dotenv import load_dotenv
import os
import pandas as pd
import weaviate
import weaviate.classes as wvc
import weaviate.classes.config as wc
import time

from weaviate.classes.config import Integrations


In [28]:
# Load the API keys and the cluster URL from the .env file
# You can use either a Weaviate Cloud cluster or a local instance
load_dotenv()
cohere_api = os.getenv("COHERE_API_KEY")
weaviate_api = os.getenv("WEAVIATE_API_KEY")
clusterUrl = os.getenv("WEAVIATE_CLUSTER_URL")
rpm_embeddings=100
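If a variable is missing from `.env`, `os.getenv` silently returns `None`, and the problem only surfaces later as an authentication error. A small guard, sketched below with a hypothetical helper `require_env`, fails fast instead; the variable names passed to it are the ones loaded above.

```python
import os

def require_env(*names):
    """Return the given environment variables, raising if any is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values

# Hypothetical usage (the WEAVIATE_* variables are only needed for a cloud cluster):
# keys = require_env("COHERE_API_KEY")
```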

In [29]:
# Create a Cohere client
co = cohere.Client(cohere_api) 

In [30]:
# Create a Weaviate client for a Weaviate Cloud cluster (uncomment to use it instead of the local instance below)

# authConfig = weaviate.auth.AuthApiKey(weaviate_api)
# client = weaviate.connect_to_wcs(
#     cluster_url=clusterUrl,
#     auth_credentials=authConfig,
#     headers={'X-Cohere-Api-Key': cohere_api},
#     skip_init_checks=True
# )

In [31]:
# Connect to a local Weaviate instance and pass the Cohere API key and embeddings rate limit through the integrations config
client = weaviate.connect_to_local(
    skip_init_checks=True
)
integrations=[
    Integrations.cohere(
        api_key=cohere_api,
        requests_per_minute_embeddings=rpm_embeddings,
    )
]
client.integrations.configure(integrations)
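`requests_per_minute_embeddings=100` is handed to Weaviate as a rate limit for its own embedding calls to Cohere. If you call Cohere directly from the notebook instead (for example with `co.embed`), you have to pace requests yourself: 100 requests per minute means at least 60 / 100 = 0.6 seconds between calls. A minimal client-side throttle, assuming a fixed minimum interval rather than a token bucket, could look like this (the `Throttle` class is hypothetical):

```python
import time

class Throttle:
    """Enforce a minimum gap between calls, derived from a requests-per-minute limit."""

    def __init__(self, requests_per_minute):
        self.min_interval = 60.0 / requests_per_minute  # e.g. 100 rpm -> 0.6 s
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect the limit, then record the call time."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()

# Hypothetical usage before each direct Cohere call:
# throttle = Throttle(rpm_embeddings)
# throttle.wait()  # then call co.embed(...)
```

A fixed interval is the simplest policy; it never bursts, at the cost of not using idle time to catch up.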

Now we can verify that our client is connected.

In [32]:
print(client.is_connected())

True


<h2>Vector Database Population</h2><br><br>
Before populating the database, we create an explicit `Book` collection (class) so the schema stays stable and coherent.<br> We start by deleting any existing `Book` collection, then define its properties, keeping only _title_, _categories_ and _description_ as semantic search criteria: every other property is created with `skip_vectorization=True`, so only these three are embedded.



In [33]:
client.collections.delete(name='Book')

In [34]:
# Only title, categories and description are vectorized; every other property sets skip_vectorization=True
book_collection = client.collections.create(
    name="Book",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_cohere(),
    generative_config=wc.Configure.Generative.cohere(),
    properties=[
        wc.Property(name="title", data_type=wc.DataType.TEXT),
        wc.Property(name="isbn10", data_type=wc.DataType.TEXT, skip_vectorization=True),
        wc.Property(name="isbn13", data_type=wc.DataType.NUMBER, skip_vectorization=True),
        wc.Property(name="categories", data_type=wc.DataType.TEXT),
        wc.Property(name="thumbnail", data_type=wc.DataType.TEXT, skip_vectorization=True),
        wc.Property(name="description", data_type=wc.DataType.TEXT),
        wc.Property(name="num_pages", data_type=wc.DataType.NUMBER, skip_vectorization=True),
        wc.Property(name="average_rating", data_type=wc.DataType.NUMBER, skip_vectorization=True),
        wc.Property(name="published_year", data_type=wc.DataType.NUMBER, skip_vectorization=True),
        wc.Property(name="authors", data_type=wc.DataType.TEXT, skip_vectorization=True),
    ],
)
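The `create` call lists every property once, and the import function below repeats the same names. One way to keep the two in sync, sketched here with a hypothetical `BOOK_SCHEMA` table, is to describe each property once (name, data type, whether it is embedded) and derive both lists from it. It is plain Python, so it can be checked without a running Weaviate instance; building the `wc.Property` objects from it is shown only as a comment.

```python
# Hypothetical single source of truth mirroring the create() call:
# property name -> (Weaviate data type name, embedded by text2vec-cohere?)
BOOK_SCHEMA = {
    "title": ("TEXT", True),
    "isbn10": ("TEXT", False),
    "isbn13": ("NUMBER", False),
    "categories": ("TEXT", True),
    "thumbnail": ("TEXT", False),
    "description": ("TEXT", True),
    "num_pages": ("NUMBER", False),
    "average_rating": ("NUMBER", False),
    "published_year": ("NUMBER", False),
    "authors": ("TEXT", False),
}

# Columns the import loop reads from books.csv
COLUMNS = list(BOOK_SCHEMA)

# Properties that feed the embedding, i.e. the semantic search criteria
VECTORIZED = [name for name, (_, embedded) in BOOK_SCHEMA.items() if embedded]

# With weaviate-client installed, the properties list could then be built as:
# [wc.Property(name=n, data_type=getattr(wc.DataType, t), skip_vectorization=not e)
#  for n, (t, e) in BOOK_SCHEMA.items()]
```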
    

In [35]:
def process(chunk, collection):
    """Insert each book (row) of a pandas chunk into the given Weaviate collection."""
    columns = ["title", "isbn10", "isbn13", "categories", "thumbnail",
               "description", "num_pages", "average_rating",
               "published_year", "authors"]
    for _, book in chunk.iterrows():
        # Missing CSV cells come back from pandas as NaN, which fits neither a TEXT
        # property nor valid JSON, so those properties are simply left unset
        properties = {col: book[col] for col in columns if not pd.isna(book[col])}
        try:
            uuid = collection.data.insert(properties)
            print(f"Inserting : {book['title']}: {uuid}")
        except Exception as e:
            print(f"Exception : {e}")

In [42]:
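book_collections = client.collections.get('Book')

chunksize = 100
# Read isbn10 as text: ISBN-10s can start with 0 or end in X, and pandas infers dtypes per chunk
chunks = pd.read_csv("./books.csv", chunksize=chunksize, dtype={"isbn10": str})

for chunk in chunks:
    process(chunk, book_collections)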



Inserting : Gilead: 91cc21bd-e447-4cbb-bb04-1c335115a61b
Inserting : Spider's Web: bb970110-3385-4f93-a290-e2113fe57bf7
Inserting : The One Tree: ced19992-baaf-453e-a77a-9771ca6760f5
Inserting : Rage of angels: d01a3f05-17e4-448c-9a64-14b3eb76d11e
Inserting : The Four Loves: f49f2b32-ba3d-4b25-a826-ce7596f0f334
Inserting : The Problem of Pain: 7b99d185-ee2e-46ce-b719-2465bae90b29
Inserting : An Autobiography: ed0afd1b-79cc-485e-8ba9-e9ba5d2474b5
Inserting : Empires of the Monsoon: 0be5a2a8-61a6-4520-8d3b-8be9febeb52e
Inserting : The Gap Into Madness: d9f5b9ae-3e87-475d-9312-003c8cfd0346
Inserting : Master of the Game: 2c9caf9c-7b41-41eb-a360-f147a9f3c022
Inserting : If Tomorrow Comes: b588827b-f711-4a74-8beb-26ab5c254a13
Inserting : Assassin's Apprentice: 57c2b574-79bb-4c2f-98c0-b53def2603b9
Inserting : Warhost of Vastmark: 594f88c6-24b6-424c-8dfd-9062ae994669
Inserting : The Once and Future King: acac1c9e-c7c2-45c2-8880-91276fb9ae88
Inserting : Murder in LaMut: 1d94481b-b37d-4c0c-a689

KeyboardInterrupt: 
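`pd.read_csv(..., chunksize=100)` returns an iterator of DataFrames with at most 100 rows each, so the whole CSV never has to sit in memory; the last chunk holds the remainder. The toy example below uses an in-memory CSV with made-up rows (not `books.csv`) to show the chunk sizes. Since the run above was interrupted, the `Book` collection holds only part of the dataset at this point. The v4 client also offers batch imports through `collection.batch.dynamic()`, which is usually faster than one `insert` call per row.

```python
import io

import pandas as pd

# 250 made-up rows standing in for books.csv
csv_text = "title,num_pages\n" + "\n".join(f"Book {i},{100 + i}" for i in range(250))

chunk_sizes = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=100):
    # Each chunk is an ordinary DataFrame, so process(chunk, collection) works on it unchanged
    chunk_sizes.append(len(chunk))

print(chunk_sizes)  # [100, 100, 50]
```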

In [47]:
books = client.collections.get('Book')
print(books)

<weaviate.Collection config={
  "name": "Book",
  "description": null,
  "generative_config": {
    "generative": "generative-cohere",
    "model": {}
  },
  "inverted_index_config": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    },
    "cleanup_interval_seconds": 60,
    "index_null_state": false,
    "index_property_length": false,
    "index_timestamps": false,
    "stopwords": {
      "preset": "en",
      "additions": null,
      "removals": null
    }
  },
  "multi_tenancy_config": {
    "enabled": false,
    "auto_tenant_creation": false,
    "auto_tenant_activation": false
  },
  "properties": [
    {
      "name": "title",
      "description": null,
      "data_type": "text",
      "index_filterable": true,
      "index_searchable": true,
      "nested_properties": null,
      "tokenization": "word",
      "vectorizer_config": {
        "skip": false,
        "vectorize_property_name": true
      },
      "vectorizer": "text2vec-cohere"
    },
    {
      "name": "isbn10

Now we will implement semantic search using Weaviate.
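Conceptually, `near_text` has Weaviate embed the query with the same Cohere model used at import time and return the objects whose vectors lie closest to it (Weaviate's default metric is cosine, reported as a distance of 1 minus the cosine similarity). The toy example below reproduces only that ranking step with hand-made 3-dimensional vectors; the numbers are invented for illustration, while real Cohere embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Invented vectors; dimensions loosely mean (life science, religion, computing)
books = {
    "Gravity": [0.9, 0.1, 0.2],
    "The Problem of Pain": [0.2, 0.9, 0.1],
    "Microserfs": [0.1, 0.0, 0.95],
}
query_vector = [0.95, 0.05, 0.1]  # pretend embedding of "biology"

# Rank books by similarity to the query, most similar first
ranked = sorted(books, key=lambda t: cosine_similarity(query_vector, books[t]), reverse=True)
print(ranked[:2])  # ['Gravity', 'The Problem of Pain']
```

Note that no word of the query has to appear in a book's text for it to rank highly; only the vectors are compared.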

In [43]:
# Semantic search
book_collection = client.collections.get('Book')

response = book_collection.query.near_text(
    query="biology",
    limit=3
)

print()
for book in response.objects:
    print(book.properties['title'])
    print(book.properties['description'])
    print(book.properties['categories'])
    print('---')


Gravity
Emma Watson a research physician has been training for the mission of a lifetime: to study living organisms in the microgravity of space. But the true and lethal nature of the experiment has not been revealed to NASA and once aboard the space station things start to go wrong. A culture of single-celled Archaeons, gathered from the deep sea, begin to rapidly multiply and infect the crew - with deadly and agonising results. As her estranged husband and ground crew at NASA work against the clock to launch a rescue Emma stuggles to contain the lethal microbe. But with the contagion threatening Earth's population, there are those who would leave the astronauts stranded in orbit, quarantined aboard the station.
Science fiction
---
The Problem of Pain
"In The Problem of Pain, C.S. Lewis, one of the most renowned Christian authors and thinkers, examines a universally applicable question within the human condition: If God is good and all-powerful, why does he allow his creatures to suf

In [53]:
# Generative search: retrieve books with near_text, then have Cohere write a response for each one
response = book_collection.generate.near_text(
    query="technology, data structures and algorithms, distributed systems",
    limit=2,
    single_prompt="Explain why this book might be interesting to someone who likes playing the violin, rock climbing, and doing yoga. The book's title is {title}, with a description: {description}, and is in the genre: {categories}."
)
print(response.objects[0].generated)

Microserfs could be an intriguing read for someone who enjoys violin playing, rock climbing, and yoga because it offers a unique perspective on the lives of computer programmers, a world away from the activities you enjoy. 

The book provides an insightful glimpse into the lives of a group of Microsoft employees, known as microserfs, who are trying to find a balance between their demanding jobs and their personal lives. The characters' struggles to establish a sense of self and fulfill their passions outside of work could resonate with readers who have dedicated time and energy to mastering the violin, the physical challenges of rock climbing, and the mindfulness of yoga. 

The novel is also likely to provide an intriguing look into a different lifestyle and career path—one centered around computer programming. It might offer a fascinating glimpse into a world the reader might not otherwise experience, with an insider's view of the culture and work ethic at a giant tech corporation. 
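With `single_prompt`, the prompt is sent once per returned object, with each `{property}` placeholder filled from that object's properties. The sketch below mimics that substitution locally with `str.format_map` so a template can be checked before spending API calls; it is an illustration, not Weaviate's implementation, and `template_fields`/`render_prompt` plus the example properties are hypothetical.

```python
from string import Formatter

def template_fields(template):
    """Names of the {placeholders} used in a prompt template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def render_prompt(template, properties):
    """Fill a single_prompt-style template from one object's properties, failing on unknown names."""
    missing = template_fields(template) - set(properties)
    if missing:
        raise KeyError(f"Template uses unknown properties: {sorted(missing)}")
    return template.format_map(properties)

prompt = ("Explain why this book might be interesting to someone who likes playing the violin, "
          "rock climbing, and doing yoga. The book's title is {title}, with a description: "
          "{description}, and is in the genre: {categories}.")

# Made-up properties standing in for one object returned by the query
example = {"title": "Example Title", "description": "A short description.", "categories": "Fiction"}
print(render_prompt(prompt, example))
```

Catching a misspelled placeholder such as `{category}` this way is cheaper than discovering it in the generated text.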

