### RECOMMENDATION BASED ON SIMILAR USERS

Implementation of https://medium.com/eni-digitalks/a-simple-recommender-system-using-pagerank-4a63071c8cbf

From https://grouplens.org/datasets/movielens/, download [ml-latest-small.zip](https://files.grouplens.org/datasets/movielens/ml-latest-small.zip)

## recommended for education and development

### MovieLens Latest Datasets

These datasets will change over time, and are not appropriate for reporting research results. We will keep the download links stable for automated downloads. We will not archive or make available previously released versions.

_Small_: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018.

    README.html
    ml-latest-small.zip (size: 1 MB)

_Full_: approximately 33,000,000 ratings and 2,000,000 tag applications applied to 86,000 movies by 330,975 users. Includes tag genome data with 14 million relevance scores across 1,100 tags. Last updated 9/2018.

    README.html
    ml-latest.zip (size: 335 MB)

Permalink: https://grouplens.org/datasets/movielens/latest/

In [2]:
import pandas as pd

# Open ratings.csv file
ratings = pd.read_csv("ratings.csv")

In [3]:
# Open movies.csv file
movies = pd.read_csv("movies.csv")

In [4]:
# Merge ratings and movies
user_movie_matrix = pd.merge(ratings, movies, on="movieId")
print(user_movie_matrix.head())

   userId  movieId  rating  timestamp                        title  \
0       1        1     4.0  964982703             Toy Story (1995)   
1       1        3     4.0  964981247      Grumpier Old Men (1995)   
2       1        6     4.0  964982224                  Heat (1995)   
3       1       47     5.0  964983815  Seven (a.k.a. Se7en) (1995)   
4       1       50     5.0  964982931   Usual Suspects, The (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                               Comedy|Romance  
2                        Action|Crime|Thriller  
3                             Mystery|Thriller  
4                       Crime|Mystery|Thriller  


In [None]:
# # Check for common movies rated by multiple users
# common_movies = user_movie_matrix.groupby("title").size().reset_index(name='count')
# print(common_movies[common_movies['count'] > 1].head(10))  # Print movies rated by more than one user

In [5]:
# Map rating to scores

mapping_score = {
    0.5:-1,
    1:-1,
    1.5:-0.5,
    2:0,
    2.5:0,
    3:0,
    3.5:0.5,
    4:1,
    4.5:1.1,
    5:1.2
}

Based on the ratings in the dataset, we project the movie_movie graph, which lets us measure the similarity between movies. The mapping above is chosen arbitrarily and determines how much each rating weighs.

In [6]:
import networkx as nx

# Create an undirected bipartite graph (users on one side, movies on the other)
user_movie_graph = nx.Graph()

# Add nodes and edges
for _, row in user_movie_matrix.iterrows():
    user_movie_graph.add_node(row["userId"], bipartite=0)
    user_movie_graph.add_node(row["title"], bipartite=1, genre=row["genres"], movieId=row["movieId"])
    # user_movie_graph.add_edge(row["userId"], row["title"], weight=row["rating"])
    user_movie_graph.add_edge(row["userId"], row["title"], weight=mapping_score[row["rating"]])

# Debug print to check the graph construction
print(f"Nodes in the graph: {list(user_movie_graph.nodes(data=True))[:10]}")
print(f"Edges in the graph: {list(user_movie_graph.edges(data=True))[:10]}")

Nodes in the graph: [(1, {'bipartite': 0}), ('Toy Story (1995)', {'bipartite': 1, 'genre': 'Adventure|Animation|Children|Comedy|Fantasy', 'movieId': 1}), ('Grumpier Old Men (1995)', {'bipartite': 1, 'genre': 'Comedy|Romance', 'movieId': 3}), ('Heat (1995)', {'bipartite': 1, 'genre': 'Action|Crime|Thriller', 'movieId': 6}), ('Seven (a.k.a. Se7en) (1995)', {'bipartite': 1, 'genre': 'Mystery|Thriller', 'movieId': 47}), ('Usual Suspects, The (1995)', {'bipartite': 1, 'genre': 'Crime|Mystery|Thriller', 'movieId': 50}), ('From Dusk Till Dawn (1996)', {'bipartite': 1, 'genre': 'Action|Comedy|Horror|Thriller', 'movieId': 70}), ('Bottle Rocket (1996)', {'bipartite': 1, 'genre': 'Adventure|Comedy|Crime|Romance', 'movieId': 101}), ('Braveheart (1995)', {'bipartite': 1, 'genre': 'Action|Drama|War', 'movieId': 110}), ('Rob Roy (1995)', {'bipartite': 1, 'genre': 'Action|Drama|Romance|War', 'movieId': 151})]
Edges in the graph: [(1, 'Toy Story (1995)', {'weight': 1}), (1, 'Grumpier Old Men (1995)', {

In [7]:
users = {n for n, d in user_movie_graph.nodes(data=True) if d["bipartite"] == 0}
print(f"Users: {list(users)[:10]}")
print(f"Number of users: {len(users)}")

Users: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Number of users: 610


In [8]:
movies = {n for n, d in user_movie_graph.nodes(data=True) if d["bipartite"] == 1}
print(f"Movies: {list(movies)[:10]}")
print(f"Number of movies: {len(movies)}")

Movies: ["Porky's (1982)", 'Desperately Seeking Susan (1985)', "Things to Do in Denver When You're Dead (1995)", 'Kalifornia (1993)', 'Kin-Dza-Dza! (1986)', 'Shot in the Dark, A (1964)', 'Jesus Christ Vampire Hunter (2001)', 'Non-Stop (2014)', 'Crimson Rivers, The (Rivières pourpres, Les) (2000)', 'Night of the Shooting Stars (Notte di San Lorenzo, La) (1982)']
Number of movies: 9719


In [9]:
print(nx.is_bipartite(user_movie_graph))
print(nx.is_connected(user_movie_graph))

True
True


In [12]:
#Project the graph using weights
user_user_graph = nx.bipartite.weighted_projected_graph(user_movie_graph, users)
print(f"Nodes in the user-user graph: {list(user_user_graph.nodes(data=True))[:10]}")
print(f"Edges in the user-user graph: {list(user_user_graph.edges(data=True))[:10]}")
print(len(user_user_graph.nodes()))
print(len(user_user_graph.edges()))

Nodes in the user-user graph: [(1, {'bipartite': 0}), (2, {'bipartite': 0}), (3, {'bipartite': 0}), (4, {'bipartite': 0}), (5, {'bipartite': 0}), (6, {'bipartite': 0}), (7, {'bipartite': 0}), (8, {'bipartite': 0}), (9, {'bipartite': 0}), (10, {'bipartite': 0})]
Edges in the user-user graph: [(1, 2, {'weight': 2}), (1, 3, {'weight': 7}), (1, 4, {'weight': 45}), (1, 5, {'weight': 13}), (1, 6, {'weight': 33}), (1, 7, {'weight': 26}), (1, 8, {'weight': 15}), (1, 9, {'weight': 5}), (1, 10, {'weight': 6}), (1, 11, {'weight': 16})]
610
164054


In [10]:
# Project the graph using weights
movie_movie_graph = nx.bipartite.weighted_projected_graph(user_movie_graph, movies)
print(f"Nodes in movie_movie_graph: {list(movie_movie_graph.nodes(data=True))[:10]}")
print(f"Edges in movie_movie_graph: {list(movie_movie_graph.edges(data=True))[:10]}")

Nodes in movie_movie_graph: [("Porky's (1982)", {'bipartite': 1, 'genre': 'Comedy', 'movieId': 3688}), ('Desperately Seeking Susan (1985)', {'bipartite': 1, 'genre': 'Comedy|Drama|Romance', 'movieId': 2369}), ("Things to Do in Denver When You're Dead (1995)", {'bipartite': 1, 'genre': 'Crime|Drama|Romance', 'movieId': 81}), ('Kalifornia (1993)', {'bipartite': 1, 'genre': 'Drama|Thriller', 'movieId': 481}), ('Kin-Dza-Dza! (1986)', {'bipartite': 1, 'genre': 'Comedy|Drama|Sci-Fi', 'movieId': 36363}), ('Shot in the Dark, A (1964)', {'bipartite': 1, 'genre': 'Comedy|Crime|Mystery', 'movieId': 7073}), ('Jesus Christ Vampire Hunter (2001)', {'bipartite': 1, 'genre': 'Action|Comedy|Horror|Musical', 'movieId': 27595}), ('Non-Stop (2014)', {'bipartite': 1, 'genre': 'Action|Mystery|Thriller', 'movieId': 109578}), ('Crimson Rivers, The (Rivières pourpres, Les) (2000)', {'bipartite': 1, 'genre': 'Crime|Drama|Mystery|Thriller', 'movieId': 4383}), ('Night of the Shooting Stars (Notte di San Lorenzo, 

In [13]:
print(nx.is_connected(movie_movie_graph))

True


PageRank measures the _importance_ of a web page based on the number of incoming links, while outgoing links _distribute_ that importance to the pages they point to.

If the graph is represented by an _adjacency matrix_, $L_{ij} = 1$ if there is a link from page _j_ to page _i_, $j \rightarrow i$, and $L_{ij} = 0$ otherwise.
If $m_{j} = \sum_{k=1}^{n} L_{kj}$ is the number of pages linked from _j_, a first candidate PageRank value, _BrokenRank_, for page _i_ is:

$p_{i} = \sum_{j \rightarrow i} \frac{p_{j}}{m_{j}} = \sum_{j=1}^{n} \frac{L_{ij}}{m_{j}}p_{j}$

In matrix notation:

$p=\begin{bmatrix}
    p_{1} \\
    p_{2} \\
    \vdots \\
    p_{n} \\
\end{bmatrix},
L=\begin{bmatrix}
    L_{11} && L_{12} && \dots && L_{1n} \\
    L_{21} && L_{22} && \dots && L_{2n} \\
    \vdots && \vdots && \ddots && \vdots \\
    L_{n1} && L_{n2} && \dots && L_{nn} \\
\end{bmatrix},
M=\begin{bmatrix}
    m_{1} && 0 && \dots && 0 \\
    0 && m_{2} && \dots && 0 \\
    \vdots && \vdots && \ddots && \vdots \\
    0 && 0 && \dots && m_{n} \\
\end{bmatrix} \implies p = LM^{-1}p = Ap$

Therefore _p_ is an eigenvector of the matrix _A_ with eigenvalue 1. Standard algorithms can be used to compute the eigenvalues and eigenvectors of _A_, and when the matrix is __sparse__ they also perform better, since the many zero entries can be skipped.
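As a small sanity check of the eigenvector statement, the sketch below builds $A = LM^{-1}$ for a hypothetical three-page web (links 0→1, 0→2, 1→2, 2→0, chosen only for illustration) and verifies that a hand-computed vector satisfies $p = Ap$. It uses plain Python lists rather than a linear-algebra library.

```python
# Hypothetical 3-page web (0-based indices): 0->1, 0->2, 1->2, 2->0.
links = [(0, 1), (0, 2), (1, 2), (2, 0)]
n = 3

# L[i][j] = 1 if page j links to page i.
L = [[0.0] * n for _ in range(n)]
for j, i in links:
    L[i][j] = 1.0

# m[j] = number of pages linked from j (column sums of L).
m = [sum(L[i][j] for i in range(n)) for j in range(n)]

# A = L M^{-1}: divide each column j by m[j].
A = [[L[i][j] / m[j] for j in range(n)] for i in range(n)]

def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

# Candidate BrokenRank vector for this graph, found by solving p = Ap by hand.
p = [0.4, 0.2, 0.4]
Ap = matvec(A, p)
is_fixed_point = all(abs(x - y) < 1e-12 for x, y in zip(Ap, p))
print(A)
print(Ap, is_fixed_point)
```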

Using Markov chains, we can model a user's browsing process. There are _n_ states, which define a state transition matrix _P_ of size $n \times n$, where $P_{ij}$ is the probability of going from page _i_ to page _j_.
If $p^{(0)}$ is the vector of initial probabilities, then $p^{(1)} = P^{T}p^{(0)}$ is the vector of probabilities of being in each state _i_ after a single step.

Let $A^{T}$ be our transition matrix, with $(A^{T})_{ij} = L_{ji}/m_{i}$. Our Markov chain becomes:

$P_{ij} = \begin{cases}
    \frac{1}{m_{i}} & \text{if } i \rightarrow j \\
    0 & \text{otherwise}
\end{cases}$

A _stationary distribution_ of the Markov chain is a probability vector _p_ with $p = Ap$, i.e. the distribution is unchanged after one step.
If the Markov chain is _connected_, i.e. every state is reachable from every other state, the distribution _p_ exists and is _unique_.

The problem is that our graph does not satisfy these assumptions, because:
- Not every user has seen every movie
- Even in the projected graphs, a similarity value is not guaranteed to exist between every pair of movies.

This would lead to ambiguous vectors _p_.
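To make the ambiguity concrete, here is a toy sketch with a hypothetical web made of two disconnected pairs of pages (0↔1 and 2↔3). Both vectors below, and any convex combination of them, satisfy $p = Ap$, so BrokenRank does not single out one ranking.

```python
# Hypothetical disconnected web: 0 <-> 1 and 2 <-> 3 (no links between the pairs).
links = [(0, 1), (1, 0), (2, 3), (3, 2)]
n = 4

# Column-stochastic matrix A = L M^{-1}, with A[i][j] = 1/m_j if j -> i.
out_degree = [sum(1 for src, _ in links if src == j) for j in range(n)]
A = [[0.0] * n for _ in range(n)]
for src, dst in links:
    A[dst][src] = 1.0 / out_degree[src]

def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def is_stationary(p, tol=1e-12):
    """True if p = Ap up to a small tolerance."""
    return all(abs(x - y) < tol for x, y in zip(matvec(A, p), p))

p_left = [0.5, 0.5, 0.0, 0.0]    # all mass on the first pair
p_right = [0.0, 0.0, 0.5, 0.5]   # all mass on the second pair
p_mix = [0.25 * a + 0.75 * b for a, b in zip(p_left, p_right)]

print(is_stationary(p_left), is_stationary(p_right), is_stationary(p_mix))
```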

For this reason PageRank is defined by modifying BrokenRank:

$p_{i} = \frac{1-d}{n} + d(\sum_{j=1}^{n} \frac{L_{ij}}{m_{j}}p_{j})$, with $0 < d < 1$, which in matrix notation becomes:
$p = (\frac{1-d}{n}E + dLM^{-1})p$, s.t. $\sum_{i=1}^{n} p_{i} = 1$, where _E_ is the $n \times n$ matrix of all ones

Let $A = \frac{1-d}{n}E + dLM^{-1}$ and consider the Markov chain with transition matrix $A^{T}$ such that $(A^{T})_{ij} = (1 - d)/n + d L_{ji}/m_{i}$. The chain can be described as:

$P_{ij} = \begin{cases}
    (1 - d)/n + d / m_{i} & \text{if } i \rightarrow j \\
    (1 - d)/n & \text{otherwise}
\end{cases}$, where $(1 - d)/n$ is the probability of jumping to a page that is not linked. This fixes the connectivity problems and still applies to our case, since it is reasonable to assume that a user who usually watches a certain category of movies might try one completely outside their usual taste.

Moreover, thanks to the properties of the Markov chain just defined, we can avoid classical eigendecomposition algorithms, which cost $O(n^{3})$, by starting from any initial distribution $p^{(0)}$ and computing
- $p^{(1)} = Ap^{(0)}$
- $p^{(2)} = Ap^{(1)}$
- $\vdots$
- $p^{(t)} = Ap^{(t-1)}$

As $t \rightarrow \infty$, $p^{(t)} \rightarrow p$, so with a sufficiently large number of iterations _t_ the cost is $O(tn^{2})$, i.e. $O(n^{2})$ for large _n_ when _t_ is treated as fixed.
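The iteration above can be sketched in a few lines. This is a minimal illustration on a hypothetical three-page graph, with an assumed damping factor $d = 0.85$ and an assumed stopping tolerance; it is not the `networkx` implementation, and it assumes every page has at least one outgoing link.

```python
def pagerank_power_iteration(links, n, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate p <- (1-d)/n + d * (L M^{-1}) p until the update is below tol.

    links: list of (source, target) pairs; every page is assumed to have
    at least one outgoing link (no dangling nodes) in this sketch.
    """
    out_degree = [0] * n
    for src, _ in links:
        out_degree[src] += 1

    p = [1.0 / n] * n  # any initial distribution works; uniform is convenient
    for iteration in range(1, max_iter + 1):
        new_p = [(1.0 - d) / n] * n
        for src, dst in links:
            new_p[dst] += d * p[src] / out_degree[src]
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p, iteration

# Hypothetical 3-page web: 0->1, 0->2, 1->2, 2->0.
links = [(0, 1), (0, 2), (1, 2), (2, 0)]
ranks, steps = pagerank_power_iteration(links, n=3)
print(ranks, steps)
```

Because the loop walks over the list of links instead of a dense matrix, each step here costs time proportional to the number of links, which is the practical benefit of sparsity mentioned earlier.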

The last change is a generalization of the matrix A:

$A^{T} = d(L^{T}D_{L}^{-1} + kg_{L}^{T}) + (1-d)ke^{T}$, where L is the adjacency matrix, $D_{L}$ is the diagonal matrix containing the out-degree of each node, $g_{L}^{T}$ identifies the pages that have no outgoing links, and _e_ is the all-ones vector.

_k_ is the __personalization__ vector. If it is close to a uniform distribution the result is similar to classic PageRank, while the more polarized it is, the more the algorithm tends to teleport to the specified nodes.

In our case the personalization vector must take into account the movies the user has seen, so that this neighbourhood of movies gets more importance and PageRank is biased towards giving higher scores to movies similar to those already seen.
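A minimal sketch of the effect of the personalization vector, under simplifying assumptions: a hypothetical undirected, unweighted chain of five movies, no dangling nodes, and a teleport step driven by _k_ instead of the uniform distribution. The function name and the toy graph are illustrative only.

```python
def personalized_pagerank(neighbors, k, d=0.85, iterations=200):
    """Power iteration where the teleport step lands on node i with probability k[i].

    neighbors: dict node -> list of adjacent nodes (undirected, no isolated nodes).
    k: dict node -> teleport probability; values are assumed to sum to 1.
    """
    nodes = list(neighbors)
    p = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_p = {node: (1.0 - d) * k.get(node, 0.0) for node in nodes}
        for node in nodes:
            share = d * p[node] / len(neighbors[node])
            for other in neighbors[node]:
                new_p[other] += share
        p = new_p
    return p

# Hypothetical chain of movies: A - B - C - D - E
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}

uniform = personalized_pagerank(chain, {node: 0.2 for node in chain})
seeded = personalized_pagerank(chain, {"A": 1.0})  # the user has only seen movie A

print(uniform)
print(seeded)
```

With the uniform vector the two ends of the chain score the same, while with all teleport mass on `A` the scores shift towards `A` and its neighbours and away from `E`.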

In [14]:
# 0: User, 1: Movie
def filter_nodes(graph: nx.Graph, node_type: int):
    return [n for n, d in graph.nodes(data=True) if d["bipartite"] == node_type]

def create_preference_vector(debug: bool, user_id: int, user_movie_graph: nx.Graph):
    # Get the edges for the user
    edges = {m: v for _, m, v in user_movie_graph.edges(user_id, data="weight")}

    if debug:
        print(f"Edges for user {user_id}: {list(edges)[:10]}")
        print(f"Number of edges for user {user_id}: {len(edges)}")

        for k, v in edges.items():
            print(k,v)

    # Sum of the ratings done by the user
    tot = sum(edges.values())

    if debug:
        print(f"Total for user {user_id}: {tot}")

    if tot > 0:
        print(f"User {user_id} has rated movies")
        return len(edges), {
            # Assign to each movie a normalized weight. The higher the rating, the higher the weight.
            # Looking up a movie that is not in edges returns 0
            movie: edges.get(movie, 0) / tot
            for movie in filter_nodes(user_movie_graph, 1) # 1 : Movie
        }
    else:
        # User has not rated any movies or the sum of all weighted ratings is zero / negative
        print(f"User {user_id} has not rated any movies or the sum of all weighted ratings is zero / negative. All movies will have a weight of 1")
        return len(edges), {
            movie: 1 for movie in filter_nodes(user_movie_graph, 1) # I think the weight should be zero in this case
        }

The personalization vector assigns 0 to all unseen movies and rating-based normalized values to every movie the user has rated. Given how the rating weight mapping is built, ratings of 2, 2.5 and 3 are currently treated the same as unseen movies.
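A toy illustration of this normalization, using a hypothetical user with four rated movies and the same `mapping_score` values (repeated here so the snippet is self-contained). The titles and ratings are made up. It shows that a neutral rating ends up with weight 0, exactly like an unseen movie, and that a low rating yields a negative entry.

```python
# Same values as the notebook's mapping_score (repeated to keep the snippet self-contained).
mapping_score = {0.5: -1, 1: -1, 1.5: -0.5, 2: 0, 2.5: 0, 3: 0,
                 3.5: 0.5, 4: 1, 4.5: 1.1, 5: 1.2}

# Hypothetical catalogue and ratings for one user (titles are made up).
all_movies = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
user_ratings = {"Movie A": 5.0, "Movie B": 4.0, "Movie C": 3.0, "Movie D": 1.0}

scores = {movie: mapping_score[rating] for movie, rating in user_ratings.items()}
total = sum(scores.values())  # 1.2 + 1 + 0 - 1

# Unseen movies get 0; rated movies get their score divided by the total.
preference = {movie: scores.get(movie, 0) / total for movie in all_movies}

for movie, weight in preference.items():
    print(f"{movie}: {weight:.3f}")
```

The negative entry is worth watching, since a personalization vector is meant to behave like a probability distribution. One option, which is an assumption and not something the notebook does, is to clip negative scores to 0 before normalizing.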

In [None]:
# # Test the function
# debug = True
# num, p_vec = create_preference_vector(debug, 1, user_movie_graph)

Edges for user 1: ['Toy Story (1995)', 'Grumpier Old Men (1995)', 'Heat (1995)', 'Seven (a.k.a. Se7en) (1995)', 'Usual Suspects, The (1995)', 'From Dusk Till Dawn (1996)', 'Bottle Rocket (1996)', 'Braveheart (1995)', 'Rob Roy (1995)', 'Canadian Bacon (1995)']
Number of edges for user 1: 232
Toy Story (1995) 1
Grumpier Old Men (1995) 1
Heat (1995) 1
Seven (a.k.a. Se7en) (1995) 1.2
Usual Suspects, The (1995) 1.2
From Dusk Till Dawn (1996) 0
Bottle Rocket (1996) 1.2
Braveheart (1995) 1
Rob Roy (1995) 1.2
Canadian Bacon (1995) 1.2
Desperado (1995) 1.2
Billy Madison (1995) 1.2
Clerks (1994) 0
Dumb & Dumber (Dumb and Dumber) (1994) 1.2
Ed Wood (1994) 1
Star Wars: Episode IV - A New Hope (1977) 1.2
Pulp Fiction (1994) 0
Stargate (1994) 0
Tommy Boy (1995) 1.2
Clear and Present Danger (1994) 1
Forrest Gump (1994) 1
Jungle Book, The (1994) 1.2
Mask, The (1994) 1
Blown Away (1994) 0
Dazed and Confused (1993) 1
Fugitive, The (1993) 1.2
Jurassic Park (1993) 1
Mrs. Doubtfire (1993) 0
Schindler's Lis

In [None]:
# print(f"Number of movies rated by the user: {num}")
# weight_dict = {}
# for k, v in p_vec.items():
#     if v in weight_dict:
#         weight_dict[v] += 1
#     else:
#         weight_dict[v] = 1

# print(weight_dict)

# total = 0
# for k, v in weight_dict.items():
#     if k != 0.0:
#         total += v

# print("Number of values different from zero", total)
# if total != num:
#     print("Some of the movies seen by the user have weight 0")

Number of movies rated by the user: 232
{0.004468275245755139: 76, 0.005361930294906166: 124, 0.0: 9518, -0.004468275245755139: 1}
Number of values different from zero 201
Some of the movies seen by the user have weight 0


In [None]:
# # print(p_vec)
# # print(type(p_vec))

# # for k, v in p_vec.items():
# #     print(k, v)

# print('Negative values')
# for k, v in p_vec.items():
#     if v < 0.0:
#         print(k, v)

# print('Values greater than 0')
# for k, v in p_vec.items():
#     if v > 0.0:
#         print(k, v)

PAGE RANK

In [None]:
###OLD PREDICT USER AND PAGE RANK

# def predict_user(user_id, user_movie_graph: nx.Graph, movie_movie_graph: nx.Graph):
#     _, p_vec = create_preference_vector(False, user_id, user_movie_graph)
#     print(f"Preference vector for user {user_id}: {list(p_vec)[:10]}")

#     already_seen = [movie for movie, p in p_vec.items() if p > 0]
#     # p > 0 is used here because unseen movies get 0 above and seen movies get non-zero values

#     print(f"Already seen movies for user {user_id}: {list(already_seen)[:10]}")

#     if len(already_seen) < 1:
#         return []
#     item_rank = nx.pagerank(movie_movie_graph, personalization=p_vec, alpha=0.95, weight="weight")
#     print(f"Item rank for user {user_id}: {list(item_rank)[:10]}")
#     s_t = [
#         x for x in sorted(
#             movie_movie_graph.nodes(), key=lambda x: item_rank[x] if x in item_rank else 0, reverse=True
#             )
#         if x not in already_seen
#         ]

#     return s_t

In [30]:
# Use PageRank on the user_user graph to find the 10 users most similar to the user who will receive the recommendation
def find_similar_users_with_graph(user_id, user_user_graph, top_n=10):
    # Compute PageRank on the user-user graph. No personalization vector is passed,
    # so the scores are global and do not depend on user_id.
    pagerank_scores = nx.pagerank(user_user_graph, alpha=0.85, weight="weight")

    # Sort users by PageRank score, excluding the user itself
    similar_users = sorted(
        [(u, score) for u, score in pagerank_scores.items() if u != user_id],
        key=lambda x: x[1],
        reverse=True
    )[:top_n]

    # Return only the user IDs
    return [user for user, _ in similar_users]
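
The `nx.pagerank` call above receives no `personalization` argument, so its ranking depends on `user_id` only through the exclusion filter. A possible variant, sketched here as an assumption rather than as the notebook's method, restarts the walk at the target user so that the scores reflect proximity to that user. The function name `find_similar_users_personalized` and the toy graph are hypothetical.

```python
import networkx as nx

def find_similar_users_personalized(user_id, user_user_graph, top_n=10):
    """Rank users by PageRank with all restart probability placed on user_id."""
    scores = nx.pagerank(
        user_user_graph,
        alpha=0.85,
        personalization={user_id: 1.0},  # nodes missing from the dict get 0
        weight="weight",
    )
    ranked = sorted(
        ((u, s) for u, s in scores.items() if u != user_id),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [u for u, _ in ranked[:top_n]]

# Toy user-user graph: user 1 shares many movies with 2, few with 3; 4 only touches 3.
toy = nx.Graph()
toy.add_weighted_edges_from([(1, 2, 40), (1, 3, 2), (3, 4, 5)])
toy_result = find_similar_users_personalized(1, toy, top_n=2)
print(toy_result)
```

On the real data this would be called as `find_similar_users_personalized(599, user_user_graph)`; the resulting list would generally differ from the one produced by the non-personalized function.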


In [31]:
# POSSIBLE NEW PAGE RANK AND PREDICT that uses the users similar to the target one for the recommendation
def predict_user_with_similars(user_id, user_movie_graph, movie_movie_graph, similar_users, already_seen):

    # Build the preference vector for user_id from that user's own ratings
    # (similar_users is currently not used inside this function)
    _, p_vec = create_preference_vector(False, user_id, user_movie_graph)

    # If there are no movies to analyze, return an empty list
    if len(p_vec) < 1 or len(already_seen) < 1:
        return []

    # Compute the movie ranking using PageRank
    item_rank = nx.pagerank(movie_movie_graph, personalization=p_vec, alpha=0.95, weight="weight")

    # Sort movies by score and drop those already seen
    recommended_movies = [
        x for x in sorted(
            movie_movie_graph.nodes(),
            key=lambda x: item_rank.get(x, 0),
            reverse=True
        )
        if x not in already_seen
    ]

    return recommended_movies


Find the 10 users most similar to the target user

In [38]:
# Test the prediction
user = 599
similar_users = find_similar_users_with_graph(user, user_user_graph)

# Build the preference vector for the target user
_, p_vec = create_preference_vector(False, user, user_movie_graph)

already_seen = [movie for _ , movie, v in user_movie_graph.edges(user, data = "weight")]
print(f"Similar users to user {user}: {similar_users}")
print(f"Already seen movies for user {user}: {already_seen}")

User 599 has rated movies
Similar users to user 599: [414, 68, 474, 274, 448, 380, 480, 288, 608, 590]
Already seen movies for user 599: ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)', 'Heat (1995)', 'Sabrina (1995)', 'Sudden Death (1995)', 'GoldenEye (1995)', 'American President, The (1995)', 'Dracula: Dead and Loving It (1995)', 'Cutthroat Island (1995)', 'Casino (1995)', 'Sense and Sensibility (1995)', 'Four Rooms (1995)', 'Ace Ventura: When Nature Calls (1995)', 'Money Train (1995)', 'Get Shorty (1995)', 'Assassins (1995)', 'Powder (1995)', 'Othello (1995)', 'City of Lost Children, The (Cité des enfants perdus, La) (1995)', 'Dangerous Minds (1995)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', 'Clueless (1995)', 'Richard III (1995)', 'Dead Presidents (1995)', 'Restoration (1995)', 'Mortal Kombat (1995)', 'To Die For (1995)', 'Seven (a.k.a. Se7en) (1995)', 'Usual Suspects, The (1995)', 'Mighty Aphrodite (1995)', 'Home for the Holidays (1995)', 'Indian in the Cupboa

In [41]:
recommendation = []
for u in similar_users:
  s_t = predict_user_with_similars(u, user_movie_graph, movie_movie_graph, similar_users, already_seen)
  print(f"Predicted movies for user {u}: {s_t[:10]}")
  for m in s_t[:10]:
    if m not in recommendation:
      recommendation.append(m)

print(f"Recommended movies for user {user}: {recommendation[:10]}")
print(f"The number of movies recommended is: {len(recommendation)}")



User 414 has rated movies
Predicted movies for user 414: ["Schindler's List (1993)", 'Finding Nemo (2003)', "One Flew Over the Cuckoo's Nest (1975)", 'Toy Story 2 (1999)', 'Dances with Wolves (1990)', 'Shrek 2 (2004)', 'Babe (1995)', 'Ice Age (2002)', 'Bruce Almighty (2003)', 'Mission: Impossible II (2000)']
User 68 has rated movies
Predicted movies for user 68: ["Schindler's List (1993)", 'Finding Nemo (2003)', 'Dances with Wolves (1990)', 'Harry Potter and the Prisoner of Azkaban (2004)', "One Flew Over the Cuckoo's Nest (1975)", 'Harry Potter and the Goblet of Fire (2005)', 'Shrek 2 (2004)', 'School of Rock (2003)', 'Mission: Impossible II (2000)', 'Toy Story 2 (1999)']
User 474 has rated movies
Predicted movies for user 474: ["Schindler's List (1993)", 'Finding Nemo (2003)', "One Flew Over the Cuckoo's Nest (1975)", 'Toy Story 2 (1999)', 'Dances with Wolves (1990)', 'Shrek 2 (2004)', 'Babe (1995)', 'Harry Potter and the Prisoner of Azkaban (2004)', 'Harry Potter and the Goblet of F

User 414 has rated movies
Predicted movies for user 414: ['Silence of the Lambs, The (1991)', 'Toy Story (1995)', 'Men in Black (a.k.a. MIB) (1997)', 'Lord of the Rings: The Two Towers, The (2002)', 'Lord of the Rings: The Return of the King, The (2003)', 'Star Wars: Episode I - The Phantom Menace (1999)', 'Shrek (2001)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', 'Speed (1994)']


User 68 has rated movies
Predicted movies for user 68: ['Silence of the Lambs, The (1991)', 'Men in Black (a.k.a. MIB) (1997)', 'Lord of the Rings: The Return of the King, The (2003)', 'Lord of the Rings: The Two Towers, The (2002)', 'Shrek (2001)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', 'Toy Story (1995)', 'Speed (1994)', 'Fugitive, The (1993)', 'Mission: Impossible (1996)']


User 474 has rated movies
Predicted movies for user 474: ['Silence of the Lambs, The (1991)', 'Toy Story (1995)', 'Men in Black (a.k.a. MIB) (1997)', 'Lord of the Rings: The Two Towers, The (2002)', 'Lord of the Rings: The Return of the King, The (2003)', 'Star Wars: Episode I - The Phantom Menace (1999)', 'Shrek (2001)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', "Schindler's List (1993)"]


User 274 has rated movies
Predicted movies for user 274: ['Silence of the Lambs, The (1991)', 'Toy Story (1995)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', 'Lord of the Rings: The Two Towers, The (2002)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', 'Men in Black (a.k.a. MIB) (1997)', 'Shrek (2001)', 'Mask, The (1994)', "Schindler's List (1993)", 'Lord of the Rings: The Return of the King, The (2003)']


User 448 has rated movies
Predicted movies for user 448: ['Silence of the Lambs, The (1991)', 'Toy Story (1995)', 'Men in Black (a.k.a. MIB) (1997)', 'Lord of the Rings: The Return of the King, The (2003)', 'Lord of the Rings: The Two Towers, The (2002)', 'Speed (1994)', 'Fugitive, The (1993)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', 'Monsters, Inc. (2001)', 'Mrs. Doubtfire (1993)']

User 380 has rated movies
Predicted movies for user 380: ['Silence of the Lambs, The (1991)', 'Toy Story (1995)', 'Men in Black (a.k.a. MIB) (1997)', 'Lord of the Rings: The Two Towers, The (2002)', 'Lord of the Rings: The Return of the King, The (2003)', 'Shrek (2001)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', 'Star Wars: Episode I - The Phantom Menace (1999)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', 'Speed (1994)']

User 480 has rated movies
Predicted movies for user 480: ['Silence of the Lambs, The (1991)', 'Lord of the Rings: The Two Towers, The (2002)', 'Lord of the Rings: The Return of the King, The (2003)', 'Men in Black (a.k.a. MIB) (1997)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', "Schindler's List (1993)", 'Toy Story (1995)', 'Shrek (2001)', 'Mask, The (1994)', 'Aladdin (1992)']

User 288 has rated movies
Predicted movies for user 288: ['Silence of the Lambs, The (1991)', 'Toy Story (1995)', 'Men in Black (a.k.a. MIB) (1997)', 'Lord of the Rings: The Return of the King, The (2003)', 'Star Wars: Episode I - The Phantom Menace (1999)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', 'Shrek (2001)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', "Schindler's List (1993)", 'Lord of the Rings: The Two Towers, The (2002)']

User 608 has rated movies
Predicted movies for user 608: ['Silence of the Lambs, The (1991)', 'Lord of the Rings: The Two Towers, The (2002)', 'Shrek (2001)', 'Lord of the Rings: The Return of the King, The (2003)', 'Star Wars: Episode I - The Phantom Menace (1999)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', 'Men in Black (a.k.a. MIB) (1997)', "Schindler's List (1993)", 'Monsters, Inc. (2001)', 'Toy Story (1995)']

User 590 has rated movies
Predicted movies for user 590: ['Silence of the Lambs, The (1991)', 'Toy Story (1995)', 'Lord of the Rings: The Two Towers, The (2002)', 'Lord of the Rings: The Return of the King, The (2003)', 'Shrek (2001)', 'Men in Black (a.k.a. MIB) (1997)', 'Pirates of the Caribbean: The Curse of the Black Pearl (2003)', "Schindler's List (1993)", 'Fugitive, The (1993)', 'Lion King, The (1994)']
Recommended movies for user 599: []
0



In [40]:
# Print the movies seen by user 599 (in the order stored in the graph; the list is not sorted)

print(f"Movies seen by user {user}: {already_seen}")



Movies seen by user 599: ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)', 'Heat (1995)', 'Sabrina (1995)', 'Sudden Death (1995)', 'GoldenEye (1995)', 'American President, The (1995)', 'Dracula: Dead and Loving It (1995)', 'Cutthroat Island (1995)', 'Casino (1995)', 'Sense and Sensibility (1995)', 'Four Rooms (1995)', 'Ace Ventura: When Nature Calls (1995)', 'Money Train (1995)', 'Get Shorty (1995)', 'Assassins (1995)', 'Powder (1995)', 'Othello (1995)', 'City of Lost Children, The (Cité des enfants perdus, La) (1995)', 'Dangerous Minds (1995)', 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)', 'Clueless (1995)', 'Richard III (1995)', 'Dead Presidents (1995)', 'Restoration (1995)', 'Mortal Kombat (1995)', 'To Die For (1995)', 'Seven (a.k.a. Se7en) (1995)', 'Usual Suspects, The (1995)', 'Mighty Aphrodite (1995)', 'Home for the Holidays (1995)', 'Indian in the Cupboard, The (1995)', 'Eye for an Eye (1996)', 'Bio-Dome (1996)', 'Friday (1995)', 'From Dusk Till Dawn (1996)', 'Mis

In [35]:
# # Make all predictions
# predictions = {}
# for user in filter_nodes(user_movie_graph, 0):
#     predictions[user] = predict_user_with_similars(user, user_movie_graph, movie_movie_graph)[:10]