# Building a book recommendation system with IRIS - Vector Search

In [6]:
import os 
import time

In [3]:
import pandas as pd

# Load the CSV file
df = pd.read_csv('/home/irissys/Data/books.csv')

# View the data
df.head(1)
print(f"The dataset has {len(df)} entries.")

The dataset has 6810 entries.


In [33]:
### Remove entries with no description, rating, or page count
df = df.fillna("")
# After fillna(""), missing descriptions are empty strings, so one comparison is enough
df = df[df["description"] != ""]
df = df[(df['average_rating'] != '') & (df['num_pages'] != '')]

df.drop(columns=["ratings_count"], inplace=True)

print(f"After removing entries with no description, there are now {len(df)} entries")

After removing entries with no description, there are now 6511 entries




## Create vector embeddings of the description column

To perform a vector search, we first have to transform each book description into a vector (an embedding). We can do this with the sentence-transformers library:

In [14]:
from sentence_transformers import SentenceTransformer

# Load a pre-trained sentence transformer model. This model's output vectors are of size 384

model = SentenceTransformer('all-MiniLM-L6-v2') 

# Generate embeddings for all descriptions at once. Batch processing makes it faster
embeddings = model.encode(df['description'].tolist(), normalize_embeddings=True)

# Add the embeddings to the DataFrame
df['description_vector'] = embeddings.tolist()
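
Because `normalize_embeddings=True` returns vectors of length 1, the cosine similarity between two embeddings is the same as their dot product. Here is a minimal pure-Python sketch of that relationship using toy 3-dimensional vectors instead of real 384-dimensional embeddings. The helper names `l2_normalize`, `dot` and `cosine` are made up for illustration and are not part of sentence-transformers.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed from un-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for two description embeddings
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

a_unit = l2_normalize(a)
b_unit = l2_normalize(b)

# For unit vectors, the dot product and the cosine similarity coincide
print(dot(a_unit, b_unit), cosine(a, b))
```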

In [15]:
df.head(1)

Unnamed: 0,isbn13,isbn10,title,subtitle,authors,categories,thumbnail,description,published_year,average_rating,num_pages,description_vector
0,9780002005883,2005883,Gilead,,Marilynne Robinson,Fiction,http://books.google.com/books/content?id=KQZCP...,A NOVEL THAT READERS and critics have been eag...,2004.0,3.85,247.0,"[-0.04012785106897354, -0.0027914962265640497,..."


## Build the IRIS-SQL table

This is where my application deviates from the hackathon demo. For some reason I could not get the IRIS-Python DB API to work at all (I can share more details), so this is the workaround I came up with:

In [10]:
import irisnative 

In [11]:
conn = irisnative.createConnection("localhost", 1972, "USER", "SuperUser", "SYS")
cursor = conn.cursor()

In [21]:
table_name = "BookRecommender.Books"

create_table_query = f'''
CREATE TABLE {table_name} (
  isbn13     VARCHAR(32),
  isbn10        VARCHAR(32),
  title          VARCHAR(1024) NOT NULL,
  subtitle            VARCHAR(1024),
  authors    VARCHAR(255),
  categories     VARCHAR(1024),
  thumbnail        VARCHAR(1024),
  description       LONGVARCHAR,
  published_year    INTEGER,
  average_rating    DOUBLE,
  num_pages     INTEGER ,
  description_vector   VECTOR(DOUBLE, 384)
)

'''

In [37]:
cursor.execute(create_table_query)

0

In [36]:
### In case a reset is needed!
cursor.execute(f"Drop TABLE {table_name}" )

0

In [28]:
print(", ".join(list(df.columns)))
print("'], row['".join(list(df.columns)))


isbn13, isbn10, title, subtitle, authors, categories, thumbnail, description, published_year, average_rating, num_pages, description_vector
isbn13'], row['isbn10'], row['title'], row['subtitle'], row['authors'], row['categories'], row['thumbnail'], row['description'], row['published_year'], row['average_rating'], row['num_pages'], row['description_vector


In [34]:
df = df[(df['average_rating'] != '') & (df['num_pages'] != '')]


In [38]:
def add_row(row):
    # Insert one DataFrame row; the embedding is sent as a string and converted by TO_VECTOR
    sql = f'Insert into {table_name} (isbn13, isbn10, title, subtitle, authors, categories, thumbnail, description, published_year, average_rating, num_pages, description_vector) values (?,?,?,?,?,?,?,?,?,?,?,TO_VECTOR(?))'
    values = [row['isbn13'], row['isbn10'], row['title'], row['subtitle'], row['authors'], row['categories'], row['thumbnail'], row['description'], row['published_year'], row['average_rating'], row['num_pages'], str(row['description_vector'])]
    try:
        cursor.execute(sql, values)
    except Exception as e:
        # Report rows that fail to insert instead of silently skipping them
        print(f"Could not insert '{row['title']}': {e}")
df.apply(add_row, axis=1)

0       None
1       None
2       None
3       None
4       None
        ... 
6803    None
6804    None
6805    None
6808    None
6809    None
Length: 6511, dtype: object
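
Since `fillna("")` turns missing numbers into empty strings, some rows may be rejected (or coerced) when they reach the `INTEGER` and `DOUBLE` columns. One possible way to prepare the parameters is sketched below in plain Python. `COLUMNS`, `NUMERIC_COLUMNS` and `to_params` are hypothetical names, the sample row is invented, and the sketch assumes the driver binds `None` as SQL `NULL`, which I have not verified against IRIS.

```python
# Column order matches the INSERT statement used in add_row
COLUMNS = [
    "isbn13", "isbn10", "title", "subtitle", "authors", "categories",
    "thumbnail", "description", "published_year", "average_rating",
    "num_pages", "description_vector",
]
NUMERIC_COLUMNS = {"published_year", "average_rating", "num_pages"}

def to_params(row):
    """Build the parameter list for one row (hypothetical helper)."""
    params = []
    for col in COLUMNS:
        value = row[col]
        if col == "description_vector":
            # str() on a list gives "[0.1, 0.2, ...]", the same text form
            # that add_row passes to TO_VECTOR
            value = str(list(value))
        elif col in NUMERIC_COLUMNS and value == "":
            # Assumption: None is bound as NULL by the driver
            value = None
        params.append(value)
    return params

# Invented sample row with a missing publication year
sample_row = {col: "" for col in COLUMNS}
sample_row.update({
    "title": "Example Title",
    "average_rating": 4.2,
    "num_pages": 300,
    "description_vector": [0.1, 0.2],
})
params = to_params(sample_row)
```

Inside `add_row`, `values` could then be replaced by `to_params(row)`.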

## Vector Search 


In [64]:
user_prompt = "Recommend me a book about the mafia"
search_vector =  model.encode(user_prompt, normalize_embeddings=True).tolist() 

num_results = 3
minimum_rating = 4

In [75]:
def vector_search(user_prompt,num_results = 3):
    search_vector =  model.encode(user_prompt, normalize_embeddings=True).tolist() 
    
    searchSQL = f"""
        SELECT TOP ? title, authors, average_rating, description, categories
        FROM {table_name}
        ORDER BY VECTOR_COSINE(description_vector, TO_VECTOR(?,double)) DESC
    """
    cursor.execute(searchSQL,[num_results,str(search_vector)])
    
    results = cursor.fetchall()
    return results
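
Conceptually, `TOP ?` combined with `ORDER BY VECTOR_COSINE(...) DESC` ranks the rows by cosine similarity to the query vector and keeps the best few. Below is a brute-force sketch of that idea over an in-memory list. The toy 2-dimensional vectors, the titles and the `top_k_by_cosine` helper are invented for illustration, and the sketch says nothing about how IRIS executes the query internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_by_cosine(query, rows, k=3):
    """Return the titles of the k rows most similar to the query.

    rows is a list of (title, vector) pairs.
    """
    scored = [(cosine_similarity(query, vec), title) for title, vec in rows]
    # Highest similarity first, mirroring ORDER BY ... DESC
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]

# Invented toy data: (title, embedding)
toy_rows = [
    ("crime saga", [0.9, 0.1]),
    ("dragon atlas", [0.1, 0.9]),
    ("mob history", [0.8, 0.3]),
]
ranked = top_k_by_cosine([1.0, 0.0], toy_rows, k=2)
print(ranked)
```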

In [66]:
results = vector_search(user_prompt, num_results)
results

[Row(title='The Gangs of New York', authors='Herbert Asbury', average_rating=3.59, description='The Gangs of New York is a tour through a now unrecognisable city of abysmal poverty and habitual violence cobbled from legend, memory, police records, the self-aggrandizements of aging crooks, popular journalism, and solid historical research. Asbury presents the definitive work on this subject, an illumination of the gangs of old New York that ultimately gave rise to the modem Mafia and its depiction in films like The Godfather.'),
 Row(title='The Godfather', authors='Mario Puzo', average_rating=4.37, description='A portrait of a Mafia family focuses on the life and times of patriarch Don Vito Corleone, a Sicilian-American godfather, and his sons.'),
 Row(title='Organized Crime', authors='Michael D. Lyman;Gary W. Potter', average_rating=3.83, description='Dispelling current myths regarding organized crime, Lyman and Potter’s fourth edition reveals a truer picture of organized crime and cri

## Creating the Text model response

Here I am using a basic text-generation model that is downloaded and run locally. I chose it because it is free; a more capable model would be a better choice for more formal uses.

In [67]:
from transformers import pipeline
text_model = pipeline(task="text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")

Device set to use cpu


In [None]:
def text_model_query(user_prompt, results):
    # The pipeline is rebuilt on every call, which is why "Device set to use cpu" is printed each time
    text_model = pipeline(task="text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")
    system_query = """Answer the following prompt using the search results provided below. 
    Do not refer to the search itself, but use the results. If none of these results seem to be a good fit, say so.  """
    results_string = "Search Results: "+str(results)
 
    # The ///ENDPROMPT marker lets us separate the model's answer from the echoed prompt
    model_query =  system_query + results_string + "Chat prompt: What kind of book would you like? User prompt: " + user_prompt  + "///ENDPROMPT"
    output = text_model(model_query)

    # generated_text contains the prompt followed by the completion; keep only the completion
    response = output[0]["generated_text"].split("///ENDPROMPT")[1]
    
    return response
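
Passing `str(results)` hands the model a raw Python repr of the rows. One alternative is to format the results as a numbered list before building the prompt. The sketch below assumes each result can be indexed by position in the column order of the `SELECT` (title, authors, average_rating, description, categories). `format_results` and `build_prompt` are hypothetical helpers and the sample row is invented.

```python
def format_results(results):
    """Turn result rows into a numbered, human-readable list."""
    lines = []
    for i, row in enumerate(results, start=1):
        # Positional access; assumes the SELECT column order described above
        title, authors, rating, description = row[0], row[1], row[2], row[3]
        lines.append(f'{i}. "{title}" by {authors} (rating {rating}): {description}')
    return "\n".join(lines)

def build_prompt(user_prompt, results):
    """Assemble the full prompt, ending with the ///ENDPROMPT marker."""
    return (
        "Answer the following prompt using the search results provided below.\n"
        "Search Results:\n"
        f"{format_results(results)}\n"
        f"User prompt: {user_prompt}\n"
        "///ENDPROMPT"
    )

# Invented sample row, not taken from the dataset
sample_results = [
    ("Example Title", "A. Author", 4.2, "A short description.", "Fiction"),
]
prompt = build_prompt("Recommend me a book about the mafia", sample_results)
print(prompt)
```

Keeping the `///ENDPROMPT` marker at the very end means the existing `split("///ENDPROMPT")[1]` step would still work unchanged.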

In [69]:
text_model_query(user_prompt, results)

'\nBased on the provided search results:\n\n1. "The Gangs of New York" by Herbert Asbury - This book provides a detailed look at the history of the Italian-American Mafia, presented as a series of legends, memories, and factual accounts.\n2. "The Godfather" by Mario Puzo - A novel focusing on the life of Don Vito Corleone and his family, exploring the dynamics within the Mafia family.\n\nGiven your user prompt which specifically asks for a book about the Mafia, both of these options appear relevant and could be considered suitable recommendations. However, if you have any preferences or specific interests within the broader context of organized crime or related topics, please let me know! \n\nFor example:\n- You might prefer a historical account,\n- Or perhaps one with more focus on contemporary issues rather than past events.\n\nWithout additional guidance, I would suggest either "The Gangs of New York" or "The Godfather," depending on your preference between historical detail versus 

In [83]:
def rag_chatbot():
    user_input = input("What kind of book would you like? \n")
    search_results = vector_search(user_input)
    output = text_model_query(user_input, search_results)
    print(output)

In [84]:
rag_chatbot()

What kind of book would you like? 
 A book about dragons


Device set to use cpu


 The search results provide several options for books related to dragons:

1. "The Search for Power" by Margaret Weis - This is an anthology of short fiction that explores the world of Krynn's diverse dragons and the characters who interact with them.

2. "Dragonology" by Ernest Drake - This juvenile non-fiction book offers an introduction to dragonology, including how to catch dragons, details about their natural history, and accounts of legendary dragons and those involved in studying them.

3. "The Book of the Dragon" by unknown author(s) - This fictional work features illustrations of dragons alongside detailed information on various aspects of fantasy creatures, such as their cultures and customs, legends, and types of dragons.

Given your interest in dragons, I recommend reading either "The Search for Power" or "The Book of the Dragon." Both are suitable choices based on your request. However, if you're looking for something more focused on learning about dragons from a scientifi