In [1]:
import pandas as pd
from collections import OrderedDict
import re
import sys
sys.path.append("../models/")
import utilities as ut

In [2]:
SEPARATOR = "###################"
MISTRAL_ANSWERS = []
demo_path = "../demo/"

# Collect demo data

In [3]:
# Mistral
with open(demo_path + "demo_mistral_answers.txt", "r") as file:
    mistral_file_content = file.read()
mistral_original_answers = mistral_file_content.split(SEPARATOR)

# GPT-3.5 turbo
with open(demo_path + "demo_gpt_answers.txt", "r") as gpt_file:
    gpt_file_content = gpt_file.read()
GPT_ANSWERS = gpt_file_content.split(SEPARATOR)

# Extract recommendations

In [4]:
for mistral_answer in mistral_original_answers:
    MISTRAL_ANSWERS.append(ut.filter_llm_output(mistral_answer, model="mistral"))

# MISTRAL ANSWERS

In [5]:
MISTRAL_ANSWERS

['I recommend you the following books: \n0002247399.',
 'Based on your interests, I would suggest the following books:\n\n1. "The Tarot Reader" by Nancy Shavick: This book is about the history and practice of tarot reading, and it would be a great addition to your collection if you are interested in spirituality and self-discovery.\n2. "Coming Apart: Why Relationships End and How to Live Through the Ending of Yours" by Daphne Rose Kingma: This book is a self-help guide that focuses on the end of relationships and how to navigate through the process. It would be a great addition to your collection if you are interested in personal growth and self-improvement.\n3. "Confessions of a Shopaholic" by Sophie Kinsella: This book is a light-hearted and humorous novel about a woman who is addicted to shopping. It would be a great addition to your collection if you are interested in romance and women\'s fiction.\n\nI hope these suggestions help you find the perfect books for your collection!',
 "

# Potential Parsing Methods

I check the average length of the item IDs in two sample datasets: if both averages are equal, a regular expression matching IDs of that length could be used to find them in the LLM's answers.

In [6]:
booksDF = pd.read_csv("../data/test/books_2_5__1.csv")
moviesDF = pd.read_csv("../data/test/movies_5_10.csv")

In [7]:
print(f"AVG Len of Item IDs in booksDF: {int(booksDF['item_id'].str.len().mean())}")
print(f"AVG Len of Item IDs in moviesDF: {int(moviesDF['item_id'].str.len().mean())}")

AVG Len of Item IDs in booksDF: 10
AVG Len of Item IDs in moviesDF: 10
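
An average of 10 does not by itself guarantee that every ID has exactly 10 characters, since shorter and longer IDs could average out and `int()` truncates the mean. A minimal sketch of a stricter check follows. The `sample_ids` list and the `id_length_counts` helper are hypothetical stand-ins for the real `item_id` columns:

```python
from collections import Counter

# Hypothetical sample; in the notebook this would be booksDF['item_id'] or moviesDF['item_id']
sample_ids = ["0002247399", "B00022LOTM", "1573306894", "B0000009S5"]

def id_length_counts(ids):
    """Count how many IDs have each length."""
    return Counter(len(str(item_id)) for item_id in ids)

length_counts = id_length_counts(sample_ids)
# True only if 10 is the single length that occurs
all_ten = set(length_counts) == {10}
print(length_counts, all_ten)
```

If `all_ten` is false for a dataset, a fixed-length pattern would miss or truncate some IDs.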


In [8]:
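# Retrieves every run of exactly 10 uppercase alphanumeric characters, e.g. 1234567890, ABCDEFGHIL, 123ABCDF34, or AB540943XX.
# The pattern has no word boundaries, so it also matches the first 10 characters of a longer uppercase token.
# Returns: list of unique item ids in order of first appearance, or None if nothing matches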
def extract_suggestion_id(sentence):
    pattern = r'[0-9A-Z]{10}'
    match = re.findall(pattern, sentence)
    
    if match:
        return list(OrderedDict.fromkeys(match))
    else:
        return None

# Retrieves all titles written as: "Title" by Author/Actor
# Returns: list of unique titles in order of first appearance, or None if nothing matches
def extract_suggestion_title(sentence):
    pattern = r'"([^"]+)" by [\w\s]+'
    match = re.findall(pattern, sentence)
    
    if match:
        return list(OrderedDict.fromkeys(match))
    else:
        return None

## Examples of Parsing Results

In [9]:
sentence = 'Yesterday I watched "The Tarot Reader" by Nancy Shavick, "The Sium Sium" by Ronaldo and the films with ids "1234567890" and "AB540943XX".'
title = extract_suggestion_title(sentence)
ids = extract_suggestion_id(sentence)

if title and ids:
    print(f'Movie Title: {title}')
    print(f"Found ids: {ids}")
else:
    print('No matching pattern found.')

Movie Title: ['The Tarot Reader', 'The Sium Sium']
Found ids: ['1234567890', 'AB540943XX']
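
Because the ID pattern is unanchored, it can also pick up the first 10 characters of a longer uppercase token. The sketch below compares it with a stricter variant that requires word boundaries. `extract_suggestion_id_strict` is a hypothetical name and not one of the notebook's functions, and the sample text is made up for illustration:

```python
import re

def extract_suggestion_id_strict(sentence):
    """Assumed variant: only accept 10-character tokens delimited by word boundaries."""
    matches = re.findall(r'\b[0-9A-Z]{10}\b', sentence)
    # dict.fromkeys drops duplicates while preserving order of first appearance
    return list(dict.fromkeys(matches)) or None

text = 'Ids "1234567890" and "AB540943XX", plus the longer token ABCDEFGHIJKLMN.'
loose_ids = re.findall(r'[0-9A-Z]{10}', text)      # same pattern as extract_suggestion_id
strict_ids = extract_suggestion_id_strict(text)
print(loose_ids)
print(strict_ids)
```

The loose pattern returns a truncated piece of the 14-character token, while the strict variant ignores it.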


## Parsing Applied on Mistral/GPT Answers

In [27]:
print(MISTRAL_ANSWERS[5])

Based on the given information, it seems that the person enjoys a variety of genres, including action, adventure, comedy, and fantasy. They also seem to have a particular interest in movies and TV shows from the 20th century.

Given this information, I would suggest the book "A Dance with Dragons" by George R.R. Martin. This book is part of the popular "Game of Thrones" series, which is known for its complex characters, intricate plot, and epic battles. It is also a fantasy novel, which fits the person's interest in that genre. Additionally, the book has received high acclaim and has won numerous awards, making it a popular choice among readers.


In [11]:
for i, answer in enumerate(MISTRAL_ANSWERS, start=1):
    title = extract_suggestion_title(answer)
    ids = extract_suggestion_id(answer)

    if title:
        print(f"{i}° ================================================\nMovie Title: {title}")
    else:
        print(f"{i}° ================================================\nMovie Title: No Matches!")

    if ids:
        print(f"Found ids: {ids}")
    else:
        print("Found ids: No Matches!")

Movie Title: No Matches!
Found ids: ['0002247399']
Movie Title: ['The Tarot Reader', 'Coming Apart: Why Relationships End and How to Live Through the Ending of Yours', 'Confessions of a Shopaholic']
Found ids: No Matches!
Movie Title: No Matches!
Found ids: No Matches!
Movie Title: ['The Prophet', "The Queen's Fool"]
Found ids: No Matches!
Movie Title: ['The Hobbit: An Unexpected Journey']
Found ids: No Matches!
Movie Title: ['A Dance with Dragons']
Found ids: No Matches!


In [15]:
print(MISTRAL_ANSWERS[2])

Based on your interest in Monty Python and the Holy Grail and Sweeney Todd: The Demon Barber of Fleet Street, I would suggest the following books:

1. The Other Daughter: A Novel by Lisa Gardner - This book is a mystery thriller that follows a woman who is trying to uncover the truth about her family's past. It has elements of suspense and intrigue that could appeal to you.
2. The Survivors Club: A Thriller by Lisa Gardner - This book is another mystery thriller that follows a group of people who are connected to a tragic event. It has elements of suspense and intrigue that could appeal to you.
3. One Door Away From Heaven by Dean Koontz - This book is a mystery thriller that follows a woman who is trying to uncover the truth about a missing person case. It has elements of suspense and intrigue that could appeal to you.

All three of these books have elements of mystery, thriller, and suspense that could appeal to you based on your interest in Monty Python and Sweeney Todd.
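
This answer lists its titles without quotation marks, which is why the title pattern finds no match for it. One possible fallback, assuming the answer uses numbered lines shaped like `N. Title by Author`, is sketched below. `extract_numbered_titles` is a hypothetical helper, and the sample is an abbreviated copy of the answer above:

```python
import re

def extract_numbered_titles(answer):
    """Hypothetical fallback: capture titles from lines shaped like 'N. Title by Author'."""
    # Lazy capture stops at the first ' by ' on each numbered line
    pattern = r'^\s*\d+\.\s+(.+?) by '
    matches = re.findall(pattern, answer, flags=re.MULTILINE)
    # Strip optional surrounding quotes and drop duplicates, preserving order
    titles = [m.strip('"') for m in matches]
    return list(dict.fromkeys(titles)) or None

# Abbreviated version of the Mistral answer shown above
sample = (
    "1. The Other Daughter: A Novel by Lisa Gardner - This book is a mystery thriller.\n"
    "2. The Survivors Club: A Thriller by Lisa Gardner - This book is another mystery thriller.\n"
    "3. One Door Away From Heaven by Dean Koontz - This book is a mystery thriller.\n"
)
numbered_titles = extract_numbered_titles(sample)
print(numbered_titles)
```

A title that itself contains " by " would be cut short, so this is only a heuristic for this answer format.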


# GPT ANSWERS

In [20]:
for i, answer in enumerate(GPT_ANSWERS, start=1):
    title = extract_suggestion_title(answer)
    ids = extract_suggestion_id(answer)

    if title:
        print(f"{i}° ================================================\nMovie Title: {title}")
    else:
        print(f"{i}° ================================================\nMovie Title: No Matches!")

    if ids:
        print(f"Found ids: {ids}")
    else:
        print("Found ids: No Matches!")

Movie Title: No Matches!
Found ids: ['1573306894', '6301650662', 'B00022LOTM']
Movie Title: No Matches!
Found ids: ['5558160063', '5559291986', 'B0000009S5']
Movie Title: No Matches!
Found ids: ['6301334175', 'B000002ROL', 'B000000SRO']
Movie Title: ['Glass Houses', 'Transformed Man']
Found ids: ['B00005Q3AN', 'B0000025I4', '5559291986']
Movie Title: No Matches!
Found ids: ['B0000024P5', '5558160063', '1573306894']
Movie Title: No Matches!
Found ids: ['B00006IUGM', 'B000002W3R', '1573306894']


In [17]:
print(GPT_ANSWERS[2])


To suggest 3 CDs from the given list, we can use a content-based filtering approach. We will consider the categories and brands of the CDs, as well as the user's preferences for books.

Step 1: Analyze the user's preferences for books
The user liked books in various categories such as Biographies & Memoirs, Historical, Arts & Photography, Performing Arts, Engineering & Transportation, and Children's Books. Based on this, we can assume that the user has a diverse range of interests.

Step 2: Identify relevant CDs based on categories and brands
We will consider the categories and brands of the CDs to find the most relevant ones for the user.

- CD 1: Pink Floyd In Concert: Delicate Sound Of Thunder VHS
    Categories: Rock, Progressive, Progressive Rock
    Brand: Pink Floyd

- CD 2: Transformed Man
    Categories: Comedy & Spoken Word, Spoken Word
    Brand: William Shatner

- CD 3: Special Bulletin TV-Movie VHS
    Categories: Special Interest, Instructional
    Brand: Ed Flanders

- 

# **We could use each suggestion's position in the list as a ranking signal, so that the first element is given greater consideration than the others.**
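
One way to encode this idea is to turn list positions into weights. The sketch below uses reciprocal-rank weighting (1 for the first suggestion, 1/2 for the second, and so on). Both the weighting scheme and the `rank_weights` helper are assumptions for illustration, not something the notebook defines:

```python
def rank_weights(suggestions):
    """Map each suggestion to 1 / rank (rank starts at 1), keeping first occurrences only."""
    weights = {}
    for item in suggestions:
        if item not in weights:
            # The rank is the number of distinct items seen so far, plus one
            weights[item] = 1.0 / (len(weights) + 1)
    return weights

# IDs taken from the first GPT answer above, in the order they were extracted
weights = rank_weights(["1573306894", "6301650662", "B00022LOTM"])
print(weights)
```

Any decreasing function of the rank would serve the same purpose; reciprocal rank is simply a common and easy choice.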