##### ❗Attention: Fill in the names of your team members

1st member's name: Nikolaos Katsaidonis
1st member's email: el21868@mail.ntua.gr

2nd member's name: Georgios Tzamouranis
2nd member's email: el21141@mail.ntua.gr


# Artificial Intelligence: Unit 2
---

The goal of this assignment is to build a movie recommendation system. Recommendations will be based both on the characteristics of each movie and on the ratings given by each user.

The data consists of the file movies_metadata.csv, which contains the characteristics of each movie (genre, director, actors, keywords, etc.) taken from IMDb, and the ratings files, which contain real user ratings split into train and test sets.


# Assignment
In this assignment you are asked to study and implement the following:<br>

### Part 1
In Parts 1 and 2 you will work only with the movies_metadata.csv file, while in Part 3 you will also work with the ratings files.
#### Question 1a

First, after studying the structure and characteristics of movies_metadata.csv, you should build a knowledge base for Prolog which will essentially constitute the world with which you will work later. The predicates that will be created will also help you in building the recommender and will be of the form:

```
director(Movie, Director).
genre(Movie, Genre).
```
#### Question 1b

After creating the problem world, you are then asked to create, in Prolog, simple rules that will find all movies with:
1. Common theme (some genre words in common)
2. Quite common theme (some fewer genre words in common, e.g. 3)
3. Relatively common theme (a few genre words in common, e.g. 1)
4. Common director
5. Exactly the same plot (some plot keywords in common)
6. Relatively the same plot (some fewer keywords in common)
7. Same main actors (all 3)
8. Quite the same main actors (some main actors in common, e.g. 2)
9. Relatively the same actors (e.g. 1 of 3)
10. Same language
11. Both in color or both in black and white
12. Same production studio
13. Same production country
14. Same decade

Note that you can add more queries or change the granularity of the queries you construct (beyond "same", "quite the same" and "relatively the same", e.g. a scale from 1 to 5 where possible), since these queries will later be used by the recommender. In this way you can add queries that produce better recommendations (the movies_metadata.csv file contains much more information for each movie, such as release year, IMDb rating, facebook_likes, etc.). This part of the work is preparatory: the better and richer the queries you construct here, the better the recommendation systems in the following parts will perform.
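The graded rules above all reduce to counting how many attribute values two movies share and comparing that count with a threshold. The following is a minimal Python sketch of that idea, not the Prolog rules themselves. The toy `GENRES` table and the helper names `shared_count` and `similar_at_least` are assumptions made for illustration.

```python
# Hypothetical toy data: movie title -> set of genres (not taken from the dataset).
GENRES = {
    "Movie A": {"Action", "Adventure", "Fantasy", "Science Fiction"},
    "Movie B": {"Action", "Adventure", "Fantasy"},
    "Movie C": {"Action", "Crime"},
    "Movie D": {"Drama"},
}


def shared_count(movie, other, facts):
    """Number of attribute values the two movies have in common."""
    return len(facts[movie] & facts[other])


def similar_at_least(movie, threshold, facts):
    """Movies (other than `movie`) sharing at least `threshold` values with it."""
    return sorted(
        other
        for other in facts
        if other != movie and shared_count(movie, other, facts) >= threshold
    )


print(similar_at_least("Movie A", 3, GENRES))  # ['Movie B']
print(similar_at_least("Movie A", 1, GENRES))  # ['Movie B', 'Movie C']
```

The same pattern applies to keywords, actors or studios by swapping the table. Raising the threshold gives the stricter levels of the scale.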

More information about the dataset can be read at this [Link](https://www.kaggle.com/georgefila/movies-metadata).
### Part 2: Recommendation System
At this point, based on what you did in Part 1, you are asked to construct queries that return movies with similar features. These queries will be graded: some will return very similar movies, while others will return progressively less similar ones (on a scale, e.g. from 1 to 5). For example:

```
find_similar_movies_5("Pirates Of The Caribbean", M).
M = "Pirates Of The Caribbean: On Stranger Tides"
M = "The Chronicles Of Narnia"
M = "Prince Of Persia: The Sands Of Time"
...
```
The above query returns movies whose content is very similar to that of "Pirates Of The Caribbean". There will also be corresponding queries that find less similar movies. The similarity index is arbitrary and you can define it as you wish, as long as it has some logical connection to the data in the movies_metadata.csv file.

Therefore, the function that makes the recommendations takes a movie as input and must return or print a list of the recommended movies in descending order of similarity.
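One way to obtain a list in descending order of similarity is to merge the results of the graded queries, keeping each movie at the highest level at which it appears. The sketch below assumes the per-level results are already available as Python sets. The function name `rank_recommendations` and the example data are hypothetical.

```python
def rank_recommendations(results_by_level):
    """Merge per-level result sets into one list, most similar first.

    `results_by_level` maps a similarity level (higher = more similar) to the
    set of movies returned at that level. Each movie keeps its highest level.
    """
    ranked, seen = [], set()
    for level in sorted(results_by_level, reverse=True):
        # Alphabetical order within a level, so the output is deterministic.
        for movie in sorted(results_by_level[level]):
            if movie not in seen:
                seen.add(movie)
                ranked.append((movie, level))
    return ranked


# Hypothetical results for one input movie.
example = {5: {"Sequel"}, 4: {"Sequel", "Spin-off"}, 1: {"Unrelated", "Spin-off"}}
print(rank_recommendations(example))
# [('Sequel', 5), ('Spin-off', 4), ('Unrelated', 1)]
```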


### Part 3: Recommendation System Based on Preferences - User Ratings

At this point you will work with the ratings files which contain ratings (from 1 to 5) for the above movies. The previous recommendation system suggests movies to the user exclusively based on their similarity. At this point, the system will be upgraded so that better recommendations are produced which will also take into account the user's preferences, which will be extracted from the ratings he has made so far.

The recommender will be trained as follows:

For each movie there will be a score which will initially be equal to 0 and will be formed from the ratings of each user. So for a user based on the ratings in the train_ratings file, we should:

1. For each movie that has been rated, the similar movies at each level of the scale will be found, and a weight will be added to the running score of each similar movie. This weight could be the degree of similarity (one weight per level: the more similar two movies are, the larger the weight we add) multiplied by the rating the user gave the original movie (a rating of 5/5 or 1/5 should count differently from a 3/5).

2. Then, depending on the score that has been formed for each movie, it will be selected whether it could be recommended for the user or not and we will measure how well our system did based on certain metrics.

The logic behind the above process is that similar movies will have a corresponding grade. For example, if a user has rated several sci-fi movies with a 5/5, then a sci-fi movie that he has not seen will logically be liked by him and we should recommend it.
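The training step described above can be sketched as follows. The two weight tables and the `similar_by_level` callable are assumptions chosen for illustration: any mapping from ratings and similarity levels to numbers would fit the description.

```python
from collections import defaultdict

# Assumed weights: low ratings push the score down, high ratings push it up,
# and more similar levels count more.
RATING_WEIGHTS = {1: -1.0, 2: -0.5, 3: 0.0, 4: 0.5, 5: 1.0}
LEVEL_WEIGHTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}


def accumulate_scores(user_ratings, similar_by_level):
    """Build movie -> score from a user's ratings.

    `user_ratings` maps a rated movie to its rating (1 to 5).
    `similar_by_level(movie)` returns {level: set of similar movies}.
    """
    scores = defaultdict(float)
    for movie, rating in user_ratings.items():
        for level, similar in similar_by_level(movie).items():
            for other in similar:
                scores[other] += RATING_WEIGHTS[rating] * LEVEL_WEIGHTS[level]
    return dict(scores)


def toy_similar(movie):
    """Hypothetical stand-in for the graded similarity queries."""
    table = {
        "Liked": {5: {"Candidate X"}, 1: {"Candidate Y"}},
        "Disliked": {5: {"Candidate Y"}},
    }
    return table.get(movie, {})


print(accumulate_scores({"Liked": 5, "Disliked": 1}, toy_similar))
# {'Candidate X': 5.0, 'Candidate Y': -4.0}
```

Here "Candidate Y" ends up negative because it is very similar to a disliked movie and only weakly similar to a liked one.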

After training your system, you are asked to test the recommender in practice. For this purpose, you will load the file test_ratings.csv, which contains the same user's ratings for other movies. Your system must predict whether a movie should be recommended to the user. We consider that a movie should be recommended if its rating in test_ratings is greater than 3. Therefore, to evaluate your system, you must predict for each movie in the test_ratings.csv file whether the user will like it, i.e. whether it should be recommended.

ATTENTION! The "score" that you will calculate for each movie during training is not necessarily a prediction of the score that the user would give.

Then, in combination with the actual user responses, you will evaluate your system using the metrics: precision, recall, f1, which are the most well-known metrics and widely used techniques for monitoring and measuring the performance of similar systems.

1. Precision: It shows how accurate the system is. It measures how many of the instances that we predicted to belong to a class actually belong to it. A low precision indicates that many movies were predicted as recommended when they should not have been.

2. Recall: It measures how many of the instances that belong to a class (e.g. recommended movies) were predicted correctly.

3. F1: It is the harmonic mean of the two metrics above, so that a balance is maintained between them. It is calculated from the following relationship:

$$F_1=2\frac{Precision\times{Recall}}{Precision+Recall}$$

The above functions are provided by the scikit-learn library.
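To make the definitions concrete, the three metrics can also be computed by hand from the lists of true and predicted labels. This is a plain-Python sketch for illustration. The function name `precision_recall_f1` is an assumption, and returning 0.0 when a denominator is zero is a choice made here.

```python
def precision_recall_f1(real, pred):
    """Compute precision, recall and F1 for binary labels (1 = recommend)."""
    tp = sum(1 for r, p in zip(real, pred) if r == 1 and p == 1)  # true positives
    fp = sum(1 for r, p in zip(real, pred) if r == 0 and p == 1)  # false positives
    fn = sum(1 for r, p in zip(real, pred) if r == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


real = [1, 1, 0, 0, 1]  # the user actually liked movies 0, 1 and 4
pred = [1, 0, 1, 0, 1]  # the system recommended movies 0, 2 and 4
print(precision_recall_f1(real, pred))
```

In this example there are 2 true positives, 1 false positive and 1 false negative, so precision, recall and F1 all equal 2/3.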

Finally, to better understand your system, you can train the recommender on subsets of increasing size (e.g. 3, 5, 10, 50, ... movies) and study how much the additional ratings help each time, i.e. how much the above metrics improve on the test set.

For example, one recommender may never reach as high a score as another, yet reach its best score very quickly, e.g. with only 10 movies instead of 100. If an algorithm already does very well with three movies and improves only slightly afterwards, it suits new users, for whom only a few ratings are available. If instead it reaches its best performance only with more movies, e.g. 50, it suits long-standing users with many ratings.

---
## Working Environment Construction
---
### Reading Files in Colab (Only for Colab)
If the implementation is done in google colab then google drive can be used as a file system. To mount google drive, run the following code and click on the link that will appear.

```
from google.colab import drive
drive.mount('/content/drive')
import os
os.listdir('/content/drive/MyDrive')
```
Then on the page that opens, select your email and in the next window that opens, click Allow. Then copy the code that will be given to you and paste it into the Input that has opened in colab. So now if we have uploaded a file to google drive we can find it in the location:

```
movies_filename = '/content/drive/MyDrive/' + 'movies_metadata.csv'
```
We can now work normally by creating folders or files and generally doing anything we would do if we were local.

### Prolog via Python

The package that will be used for Python and Prolog communication is pyswip (https://pypi.org/project/pyswip/). For Pyswip to work, Swi-Prolog needs to be available, which if we are working locally we must install, following the corresponding instructions on the tool's page. To install Swi-Prolog on Google Colab, we need to run the following code:

```
!sudo apt-get install software-properties-common
!sudo apt-add-repository ppa:swi-prolog/stable
!sudo apt-get update
!sudo apt-get install swi-prolog
```
At some point in the execution, a message appears that we must press enter on an input to continue the process. After this, the execution will continue without any problem.

Finally, we need to install pyswip (**wherever we work**) as follows:

```
!pip install pyswip
```

---
# Code to build a workspace

## Google Colab only

Code to Mount to Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os
path="/content/drive/MyDrive/ΧΡΗΣΙΜΑ_ΑΡΧΕΙΑ_1Η_ΕΡΓΑΣΤΗΡΙΑΚΗ_ΤΕΧΝΗΤΗ_ΝΟΗΜΟΣΥΝΗ/" # you can easily change the path where your files are stored

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%%capture
#install swi-prolog
!sudo apt-get install software-properties-common
!sudo apt-add-repository -y ppa:swi-prolog/devel
!sudo apt-get update
!sudo apt-get install swi-prolog
#install pyswip
!pip install 'git+https://github.com/yuce/pyswip@master#egg=pyswip'


# **Part 1: Study of Metadata, World Creation and Basic Queries.**


In [None]:
import pandas as pd
from pyswip import Prolog

In [None]:
# The pandas library is useful for working with such data
import pandas as pd
path = '/content/drive/MyDrive/ΧΡΗΣΙΜΑ_ΑΡΧΕΙΑ_1Η_ΕΡΓΑΣΤΗΡΙΑΚΗ_ΤΕΧΝΗΤΗ_ΝΟΗΜΟΣΥΝΗ/'
# Reading the file 'movies_metadata.csv'
data = pd.read_csv(path + "movies_metadata.csv") # data IS A TABLE (DataFrame) THAT CONTAINS OUR DATA
# In the csv there are cells with NaN values
# In these positions we put 'UNK' with the following call
data.fillna("UNK", inplace=True)
# Preview the first 5 lines of the loaded data
data.head()

Unnamed: 0.1,Unnamed: 0,budget,genres,homepage,id,plot_keywords,language,original_title,overview,popularity,...,tagline,movie_title,vote_average,num_voted_users,title_year,country,director_name,actor_1_name,actor_2_name,actor_3_name
0,0,237000000,Action|Adventure|Fantasy|Science Fiction,http://www.avatarmovie.com/,19995,culture clash|future|space war|space colony|so...,English,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,Enter the World of Pandora.,Avatar,7.2,11800,2009.0,United States of America,James Cameron,Zoe Saldana,Sigourney Weaver,Stephen Lang
1,1,300000000,Adventure|Fantasy|Action,http://disney.go.com/disneypictures/pirates/,285,ocean|drug abuse|exotic island|east india trad...,English,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,2007.0,United States of America,Gore Verbinski,Orlando Bloom,Keira Knightley,Stellan Skarsgård
2,2,245000000,Action|Adventure|Crime,http://www.sonypictures.com/movies/spectre/,206647,spy|based on novel|secret agent|sequel|mi6|bri...,Français,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,A Plan No One Escapes,Spectre,6.3,4466,2015.0,United Kingdom,Sam Mendes,Christoph Waltz,Léa Seydoux,Ralph Fiennes
3,3,250000000,Action|Crime|Drama|Thriller,http://www.thedarkknightrises.com/,49026,dc comics|crime fighter|terrorist|secret ident...,English,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,...,The Legend Ends,The Dark Knight Rises,7.6,9106,2012.0,United States of America,Christopher Nolan,Michael Caine,Gary Oldman,Anne Hathaway
4,4,260000000,Action|Adventure|Science Fiction,http://movies.disney.com/john-carter,49529,based on novel|mars|medallion|space travel|pri...,English,John Carter,"John Carter is a war-weary, former military ca...",43.926995,...,"Lost in our world, found in another.",John Carter,6.1,2124,2012.0,United States of America,Andrew Stanton,Lynn Collins,Samantha Morton,Willem Dafoe


In [None]:
#FUNCTION THAT REMOVES SPECIAL CHARACTERS SUCH AS \xa0 AND ' FROM THE TEXT
def clean_text(text):
  text = text.replace(u'\xa0', u'')
  text = text.replace(u"'", u'')
  return text
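Removing apostrophes changes the titles (e.g. "Dante's Peak" becomes "Dantes Peak"). An alternative, shown here only as a sketch and not used in the rest of this notebook, is to escape the text so that it forms a valid quoted Prolog atom: inside single quotes, an apostrophe is written as two apostrophes and a backslash as two backslashes. The helper name `prolog_atom` is hypothetical.

```python
def prolog_atom(text):
    """Return `text` as a single-quoted Prolog atom with special characters escaped."""
    escaped = text.replace("\\", "\\\\")  # backslash is the escape character
    escaped = escaped.replace("'", "''")  # an apostrophe is doubled inside quotes
    return f"'{escaped}'"


print(prolog_atom("Remo D'Souza"))  # 'Remo D''Souza'
print(f"director({prolog_atom('ABCD')}, {prolog_atom('Remo DSouza')})")
```

With this approach, every query string would have to be built with the same helper, so that titles are encoded consistently on both sides.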

In [None]:
#GENRE


#create World
#We define our world
prolog = Prolog()
#THE prolog ENGINE WILL CONTAIN THE PREDICATES AND RULES OF OUR WORLD

#For each row of the table we create the predicates we want to store
#initially in a list named literals1
literals1 = []
movie_score = {}
for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title')) #WE TAKE THE TITLE OF THE ROW AND APPLY clean_text
  #A MOVIE HAS MANY GENRES, SO WE LOOP OVER THEM
  for genre in getattr(row, 'genres').split("|"): #WE SPLIT THE ITEMS ON |
    #literals1.append("genre('"+ movie_title +"','"+ genre +"')")
    literals1.append(f"genre('{movie_title}', '{genre}')")

#Prolog wants its predicates in order
literals1.sort() #ALPHABETICAL ORDER OF PREDICATES
for literal in literals1:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
  #print (literal +'.')

#We can also consult a ready-made file in the world as below
#prolog.consult(path + "db.pl")

In [None]:
#DIRECTOR

literals2 = []
for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title'))
  director = getattr(row, 'director_name')
  director = director.replace("'", " ") #WE DID THIS BECAUSE OF DIRECTOR "Remo D'Souza" IN THE MOVIE "ABCD" AS WE HAD A PROBLEM WITH THE CHARACTER '
  #literals2.append("director('"+ movie_title +"','"+ director +"')")
  literals2.append(f"director('{movie_title}', '{director}')")

literals2.sort()
for literal in literals2:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
  #print (literal +'.')

In [None]:
#PLOT KEYWORDS

literals3 = []
for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title'))
  for plot_keyword in getattr(row, 'plot_keywords').split("|"):
    plot_keyword = plot_keyword.replace("'", '')
    literals3.append(f"plot_keyword('{movie_title}','{plot_keyword}')")

literals3.sort()
for literal in literals3:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
  #print (literal +'.')





In [None]:
#ACTORS

literals4 = []
for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title')) #WE GET THE TITLE OF THE LINE row AND APPLY THE clean_text
  actor1 = clean_text(getattr(row, 'actor_1_name'))
  actor2 = clean_text(getattr(row, 'actor_2_name'))
  actor3 = clean_text(getattr(row, 'actor_3_name'))

  literals4.append(f"actors('{movie_title}', '{actor1}')")
  literals4.append(f"actors('{movie_title}', '{actor2}')")
  literals4.append(f"actors('{movie_title}', '{actor3}')")

literals4.sort()
for literal in literals4:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
  #print (literal +'.')



In [None]:
#LANGUAGE

literals5 = []
for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title'))
  language = getattr(row, 'language')
  literals5.append(f"language('{movie_title}', '{language}')")

literals5.sort()
for literal in literals5:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
  #print (literal +'.')

In [None]:
#COLORED/BLACK AND WHITE
literals6 = []

for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title'))
  date = getattr(row, 'release_date')
  date = date[:4] # WE HAVE DATES OF THE FORM 2005-01-28 SO WE ONLY KEEP THE YEAR WITH SLICING
  if date.isdigit():  #THERE WERE CELLS WITH UNK, WHICH CAUSES A PROBLEM IN CONVERSION TO INT
        int_date = int(date)
        if int_date <= 1950:
            literals6.append(f"white('{movie_title}', 'black-white')")


literals6.sort()
for literal in literals6:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
  #print (literal +'.')




In [None]:
#PRODUCTION COMPANIES

import ast

literals7 = []
for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title'))
  production_companies = ast.literal_eval(getattr(row, 'production_companies')) #CONVERT getattr(row, 'production_companies') TO A LIST FROM STRING THROUGH ast.literal_eval()
  for company_data in production_companies :
    company_name = company_data['name']
    company_name = company_name.replace("'", " ") #WE DID THIS BECAUSE OF THE CHARACTER ' AT STUDIO Donners' Company
    literals7.append(f"studio('{movie_title}', '{company_name}')")

literals7.sort()
for literal in literals7:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
  #print (literal +'.')

In [None]:
#PRODUCTION COUNTRY

import ast

literals8 = []

for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title'))
  #production_countries IS STORED AS A STRING HOLDING A LIST OF DICTIONARIES, e.g.
  #[{'iso_3166_1': 'US', 'name': 'United States of America'}]
  #SO WE CONVERT IT TO A LIST WITH ast.literal_eval() AND READ THE 'name' OF EACH DICTIONARY
  production_countries = ast.literal_eval(getattr(row, 'production_countries'))
  for country_data in production_countries:
    country_name = country_data['name']
    literals8.append(f"country('{movie_title}', '{country_name}')")

literals8.sort()
for literal in literals8:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
  #print (literal +'.')

In [None]:
#DECADE

literals9 = []
for row in data.itertuples(index=True, name='Pandas'):
  movie_title = clean_text(getattr(row, 'movie_title'))
  date = getattr(row, 'release_date')
  decade = date[:3] # DATES HAVE THE FORM 2005-01-28, SO THE FIRST THREE CHARACTERS (e.g. 200) IDENTIFY THE DECADE
  literals9.append(f"decade('{movie_title}', '{decade}')")

literals9.sort()
for literal in literals9:
  prolog.assertz(literal) #WE ADD THE PREDICATES TO THE WORLD
#print (literal +'.')




In [None]:
#LOAD THE FILE CONTAINING THE RULES INTO PROLOG

prolog.consult("/content/drive/MyDrive/ΧΡΗΣΙΜΑ_ΑΡΧΕΙΑ_1Η_ΕΡΓΑΣΤΗΡΙΑΚΗ_ΤΕΧΝΗΤΗ_ΝΟΗΜΟΣΥΝΗ/predicates.pl")

In [None]:
#QUERY 1

q = prolog.query("common_genre_3('The Terminator', M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)



['9', 'A Sound of Thunder', 'Alien', 'Alien Zone', 'Aliens', 'Aliens vs Predator: Requiem', 'Anacondas: The Hunt for the Blood Orchid', 'Armageddon', 'Babylon A.D.', 'Battleship', 'Blade: Trinity', 'Capricorn One', 'Carriers', 'Chain Reaction', 'Children of Men', 'Cloverfield', 'Congo', 'Dawn of the Planet of the Apes', 'Daybreakers', 'Death Race', 'Deep Blue Sea', 'District B13', 'Doomsday', 'Dragon Wars: D-War', 'Dragonball Evolution', 'Dylan Dog: Dead of Night', 'Echo Dr.', 'Elysium', 'Equilibrium', 'Escape from L.A.', 'Face/Off', 'Final Fantasy: The Spirits Within', 'Firefox', 'Fortress', 'G.I. Joe: Retaliation', 'G.I. Joe: The Rise of Cobra', 'Gamer', 'Green Lantern', 'Hav Plenty', 'Hollow Man', 'I Am Legend', 'I Am Number Four', 'Impostor', 'In Time', 'Inception', 'Jurassic Park III', 'Jurassic World', 'Knowing', 'Lake Placid', 'Left Behind', 'Lockout', 'Looper', 'Mad Max', 'Mad Max 2: The Road Warrior', 'Mad Max: Fury Road', 'Megiddo: The Omega Code 2', 'Meteor', 'Minority Repor

In [None]:
#QUERY 2
q = prolog.query("common_director('The Terminator', M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)


['Aliens', 'Avatar', 'Terminator 2: Judgment Day', 'The Abyss', 'Titanic', 'True Lies']


In [None]:
#QUERY 3


q = prolog.query("common_plot_3('The Terminator', M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)



['Blade Runner', 'Interstellar', 'Terminator 2: Judgment Day', 'Terminator 3: Rise of the Machines', 'Terminator Genisys', 'Terminator Salvation', 'The Matrix', 'The Matrix Reloaded', 'The Matrix Revolutions']


In [None]:
#QUERY 4

q = prolog.query("common_actors_1('The Terminator', M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)

['Aliens', 'Damnation Alley', 'Dantes Peak', 'Jade', 'Megiddo: The Omega Code 2', 'Shadow Conspiracy', 'Terminator 2: Judgment Day', 'The Abyss', 'The Divide', 'Treachery']


In [None]:
#QUERY 5

q = prolog.query("common_language('The Terminator', M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)

['#Horror', '(500) Days of Summer', '10 Cloverfield Lane', '10 Days in a Madhouse', '10 Things I Hate About You', '102 Dalmatians', '10th & Wolf', '11:14', '12 Angry Men', '12 Years a Slave', '127 Hours', '13 Going on 30', '1408', '16 Blocks', '16 to Life', '17 Again', '1776', '1941', '1982', '2 Fast 2 Furious', '20 Dates', '20 Feet from Stardom', '20,000 Leagues Under the Sea', '200 Cigarettes', '2001: A Space Odyssey', '2012', '2016: Obamas America', '21', '21 & Over', '21 Grams', '21 Jump Street', '22 Jump Street', '24 7: Twenty Four Seven', '25th Hour', '27 Dresses', '28 Days', '28 Weeks Later', '2:13', '3 Backyards', '3 Days to Kill', '3 Ninjas Kick Back', '3 Strikes', '30 Days of Night', '30 Minutes or Less', '30 Nights of Paranormal Activity With the Devil Inside the Girl With the Dragon Tattoo', '300', '3000 Miles to Graceland', '300: Rise of an Empire', '3:10 to Yuma', '40 Days and 40 Nights', '42nd Street', '47 Ronin', '5 Days of War', '50 First Dates', '50/50', '54', '55 Day

In [None]:
#QUERY 6

q = prolog.query("black_white(M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)

[]


In [None]:
#QUERY 7

q = prolog.query("common_studio_1('The Terminator', M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)

['8 Heads in a Duffel Bag', 'Bill & Teds Bogus Journey', 'Bill & Teds Excellent Adventure', 'Caddyshack', 'Dantes Peak', 'First Blood', 'Malone', 'Platoon', 'Radio Days', 'Raising Cain', 'Rivers Edge', 'RoboCop 3', 'Salvador', 'Sphinx', 'Switchback', 'Terminator 2: Judgment Day', 'The Abyss', 'The Addams Family', 'The Cotton Club', 'The Hotel New Hampshire', 'The Relic', 'The Return of the Living Dead', 'The Silence of the Lambs', 'UHF']


In [None]:
#QUERY 8

q = prolog.query("common_country('The Terminator', M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)

['#Horror', '(500) Days of Summer', '10 Cloverfield Lane', '10 Days in a Madhouse', '10 Things I Hate About You', '102 Dalmatians', '10th & Wolf', '11:14', '12 Angry Men', '12 Rounds', '12 Years a Slave', '127 Hours', '13 Going on 30', '13 Hours: The Secret Soldiers of Benghazi', '1408', '15 Minutes', '16 Blocks', '17 Again', '1776', '1941', '1982', '2 Fast 2 Furious', '2 Guns', '20 Dates', '20 Feet from Stardom', '20,000 Leagues Under the Sea', '200 Cigarettes', '2001: A Space Odyssey', '2012', '2016: Obamas America', '21', '21 & Over', '21 Grams', '21 Jump Street', '22 Jump Street', '24 7: Twenty Four Seven', '25th Hour', '27 Dresses', '28 Days', '28 Days Later', '28 Weeks Later', '2:13', '3 Backyards', '3 Days to Kill', '3 Ninjas Kick Back', '3 Strikes', '30 Days of Night', '30 Minutes or Less', '30 Nights of Paranormal Activity With the Devil Inside the Girl With the Dragon Tattoo', '300', '3000 Miles to Graceland', '300: Rise of an Empire', '3:10 to Yuma', '40 Days and 40 Nights',

In [None]:
#QUERY 9

q = prolog.query("common_decade('The Terminator', M)")

s = set()
for result in q:
  s.add(result['M'])
q.close()
sorted_list = sorted(s)
print(sorted_list)

['A Christmas Story', 'A Nightmare on Elm Street', 'A Nightmare on Elm Street 3: Dream Warriors', 'A Nightmare on Elm Street 4: The Dream Master', 'A Nightmare on Elm Street 5: The Dream Child', 'A Nightmare on Elm Street Part 2: Freddys Revenge', 'A Passage to India', 'A Room with a View', 'A View to a Kill', 'Action Jackson', 'Airplane!', 'Akira', 'Aliens', 'Amadeus', 'American Ninja 2: The Confrontation', 'Anne of Green Gables', 'April Fools Day', 'Back to the Future', 'Back to the Future Part II', 'Batman', 'Beetlejuice', 'Beverly Hills Cop', 'Beverly Hills Cop II', 'Big', 'Big Trouble in Little China', 'Bill & Teds Excellent Adventure', 'Black Rain', 'Blade Runner', 'Bloodsport', 'Blow Out', 'Body Double', 'Born on the Fourth of July', 'Brazil', 'Bright Lights, Big City', 'Butterfly', 'C.H.U.D.', 'Caddyshack', 'Cant Stop the Music', 'Cat People', 'Chariots of Fire', 'Childs Play', 'Class of 1984', 'Coal Miners Daughter', 'Commando', 'Conan the Barbarian', 'Conan the Destroyer', 'C

In [None]:
# prolog.assertz('(directed_by(X,Y) :- findall(M,director(M,X),Y))')


# **Part 2: Recommendation System based only on movie characteristics.**

At this point, based on the rules built in Part 1, we construct predicates that express the similarity between movies. Each cell below wraps one such rule in a Python function through Pyswip. As mentioned in a comment above, the rules can also be written to a file and consulted directly.

The numeric suffix expresses the degree of similarity: for example, the movies returned by find_similar_5 are more similar to the input movie than those returned by find_similar_4.


In [None]:
#RECOMMENDER_1

def recommender_1(movie):
  s = set()
  q = prolog.query(f"find_similar_1('{movie}', M)")

  for result in q:
    s.add(result['M'])

#if m not in s:
#s.add(soln['M'])
  q.close()
  movies = s
  return movies

In [None]:
#RECOMMENDER_2

def recommender_2(movie):
  s = set()
  q = prolog.query(f"find_similar_2('{movie}', M)")

  for result in q:
    s.add(result['M'])

#if m not in s:
#s.add(soln['M'])
  q.close()
  movies = s
  return movies

In [None]:
#RECOMMENDER_3

def recommender_3(movie):
  s = set()
  q = prolog.query(f"find_similar_3('{movie}', M)")

  for result in q:
    s.add(result['M'])

#if m not in s:
#s.add(soln['M'])
  q.close()
  movies = s
  return movies

In [None]:
#RECOMMENDER_4

def recommender_4(movie):
  s = set()
  q = prolog.query(f"find_similar_4('{movie}', M)")

  for result in q:
    s.add(result['M'])

#if m not in s:
#s.add(soln['M'])
  q.close()
  movies = s
  return movies

In [None]:
#RECOMMENDER_5

def recommender_5(movie):
  s = set()
  q = prolog.query(f"find_similar_5('{movie}', M)")

  for result in q:
    s.add(result['M'])

#if m not in s:
#s.add(soln['M'])
  q.close()
  movies = s
  return movies
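The five functions above differ only in the level number, so they could be folded into one. The following is a sketch under that assumption. The name `recommend` and the `run_query` parameter are hypothetical, and the explicit `q.close()` handling from the cells above is left out for brevity.

```python
def recommend(movie, level, run_query):
    """Collect the distinct movies returned by find_similar_<level> for `movie`.

    `run_query` is any callable that takes a Prolog goal string and yields
    dicts with an 'M' key, e.g. `prolog.query` from the cells above.
    """
    goal = f"find_similar_{level}('{movie}', M)"
    return {solution["M"] for solution in run_query(goal)}


def fake_query(goal):
    """Stand-in for prolog.query so the sketch runs without SWI-Prolog."""
    answers = {"find_similar_4('The Terminator', M)": ["Terminator 2: Judgment Day"]}
    return ({"M": title} for title in answers.get(goal, []))


print(recommend("The Terminator", 4, fake_query))  # {'Terminator 2: Judgment Day'}
print(recommend("The Terminator", 3, fake_query))  # set()
```

In the notebook, `recommend('The Terminator', 4, prolog.query)` would play the role of `recommender_4('The Terminator')`.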

In [None]:
recommendation_list_1 = list( recommender_1('The Terminator') )[:5]
print(recommendation_list_1)


['Sound of My Voice', 'The Pink Panther', 'Haywire', 'The Expendables 3', 'Rotor DR1']


In [None]:
recommendation_list_2 = list( recommender_2('The Terminator') )[:5]
print(recommendation_list_2)

['Dantes Peak', 'Jade', 'The Abyss', 'Shadow Conspiracy', 'Terminator 2: Judgment Day']


In [None]:
recommendation_list_3 = list( recommender_3('The Terminator') )[:5]
print(recommendation_list_3)

[]


In [None]:
recommendation_list_4 = list( recommender_4('The Terminator') )[:5]
print(recommendation_list_4)

['Terminator 2: Judgment Day']


In [None]:
recommendation_list_5 = list( recommender_5('Avatar') )[:5]
print(recommendation_list_5)

[]


# **Part 3: Recommendation System Based on Preferences-User Ratings-Training and Prediction**

First we study each user's ratings to understand the structure and information of each file.




In [None]:
'''
def simpleRecommender(movie):
    s = set()
    for i in range(1, 5):
        q = prolog.query("recommender_" + str(i) + "('" + movie +"', M)")
        for sol in q:
            m = sol['M']
            #print(m)
            s.add(sol['M'])
        q.close()
    return s
'''

In [None]:
from tqdm.notebook import tqdm
from sklearn.metrics import precision_score, recall_score, f1_score
import numpy as np
import random

rating_weights = {0: -1, 1: -0.5, 2:0, 3:0, 4:0.5, 5:1}
score_weights = {i:i + 1 for i in range(1)} # depending on the similarity levels set in simple_recommender

def train_recommender(ratings, rating_weights, score_weights, number_of_movies = 10):
    """
    In this function we can define which subset of the ratings we will use for training along with the similarity weights and scores
    In combination with the number of movies we want to use as a dataset e.g. 10 out of 100 or 3 out of 100 and so on
    If we want to use all the movies as a training set then we define number_of_movies = - 1
    """

    if number_of_movies > len(ratings):
        number_of_movies = len(ratings)


    if number_of_movies != -1:
        indexes = random.sample(range(len(ratings)), number_of_movies)
        ratings = ratings.iloc[indexes]

    movie_score = {}
    for row in tqdm(ratings.itertuples(index=True, name='Pandas')):
        movie = clean_text(getattr(row, 'movie_title'))
        rating = getattr(row, 'rating')

        similar_movies = recommender_1(movie)

        for similar_movie in similar_movies:
            if similar_movie not in movie_score:
                movie_score[similar_movie] = rating_weights[int(rating)] * score_weights[0]
            else:
                movie_score[similar_movie] += rating_weights[int(rating)] * score_weights[0] # you will set the weight based on the level of similarity, very similar movies will have a higher weight
    return movie_score


# This is an example of how predict could be implemented.
# We have defined that a movie should be recommended if it had a score > 0.
def predict_example(ratings, movie_score):
    real, pred = [], []
    for i, row in enumerate(ratings.itertuples(index=True, name='Pandas')):
        movie = clean_text(getattr(row, 'movie_title'))
        rating = getattr(row, 'rating')

        if movie in movie_score: #if we have formed a rating for this movie
            pred.append(int(movie_score[movie] > 0)) #heuristic for whether a movie is recommended
            real.append(int(rating > 3))# this is how we define that a movie should be recommended
            #this condition cannot be changed
        else: #we can't recommend something we haven't formed an image of
            pred.append(0)
            real.append(int(rating > 3))

    return real, pred


def get_metrics(real, pred):
    metrics = {}
    metrics["precision"] = precision_score(real, pred)
    metrics["recall"] = recall_score(real, pred)
    metrics["f1"] = f1_score(real, pred)
    return metrics

The code above trains and tests our recommendation system and measures its performance. To train the system, we can use a random subset of the training set each time. However, the particular subset chosen may affect the results on the test set. For example, with 3 movies out of 10, an experiment may give good results not because the classifier does well with only 3 movies, but because the remaining 7 happen to be uninformative; with a different choice of 3 movies it could have done badly. We therefore suggest running several experiments for each subset size, e.g. 10 experiments with 3 movies, 10 experiments with 20 movies, and so on, and keeping the average over all experiments as the final score. This makes the results more objective.
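The repeated-experiment procedure can be sketched as a small helper that samples a random subset, runs one experiment on it and averages the resulting metrics. The names `average_over_runs` and `run_once` are assumptions. In the notebook, `run_once` would wrap training, prediction and `get_metrics`, and the sampling would be adapted to the ratings table.

```python
import random
from statistics import mean


def average_over_runs(train_items, subset_size, run_once, n_runs=10, seed=0):
    """Average the metrics of `n_runs` experiments on random subsets.

    `run_once(subset)` must return a dict of metric name -> value.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    size = min(subset_size, len(train_items))
    runs = [run_once(rng.sample(train_items, size)) for _ in range(n_runs)]
    return {name: mean(run[name] for run in runs) for name in runs[0]}


def toy_run(subset):
    """Stand-in for: train on `subset`, predict on the test set, compute metrics."""
    return {"f1": len(subset) / 10}


for size in (3, 5, 10):
    print(size, average_over_runs(list(range(100)), size, toy_run))
```

Comparing the averaged metrics across subset sizes shows how quickly a recommender approaches its best performance.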

In [None]:
train_ratings = pd.read_csv(path + "train_ratings.csv")
test_ratings = pd.read_csv(path + "test_ratings.csv")

In [None]:
#10 MOVIES
print ("10 MOVIES:")
metrics = []
for i in range (10):
    movie_score = train_recommender(train_ratings, rating_weights, score_weights, 10)
    real, pred = predict_example(test_ratings, movie_score)
    metrics.append(get_metrics(real, pred))

for metric in metrics[0].keys():
    print (f"{metric}: {np.mean([m[metric] for m in metrics])}")



10 MOVIES:



precision: 0.5299521449211391
recall: 0.736111111111111
f1: 0.584186597031319


In [None]:
#30 MOVIES
print("30 MOVIES:")
metrics = []
for i in range (10):
    movie_score = train_recommender(train_ratings, rating_weights, score_weights, 30)
    real, pred = predict_example(test_ratings, movie_score)
    metrics.append(get_metrics(real, pred))

for metric in metrics[0].keys():
    print (f"{metric}: {np.mean([m[metric] for m in metrics])}")

30 MOVIES:



precision: 0.5146405584268248
recall: 0.9166666666666666
f1: 0.6540749862854532


In [None]:
#ALL MOVIES
print("ALL MOVIES")
metrics = []
for i in range (10):
    movie_score = train_recommender(train_ratings, rating_weights, score_weights, -1)
    real, pred = predict_example(test_ratings, movie_score)
    metrics.append(get_metrics(real, pred))

for metric in metrics[0].keys():
    print (f"{metric}: {np.mean([m[metric] for m in metrics])}")

ALL MOVIES



precision: 0.5106382978723404
recall: 1.0
f1: 0.676056338028169
