<h2>Movie recommender with item-based collaborative filtering</h2> 

Contents:
- Introduction
- Basic concept
- Data set
- Implementation
- In addition - building app in Flask

Collaborative filtering is a technique that helps filter out items that a user might like on the basis of reactions by similar users. In this article I show a simple example of using item-based filtering applied to a movies data set.


<h3>Introduction</h3>

The idea is to develop a movie recommender that shows the five movies nearest to a selected movie. Since genre data is available, we can also filter recommendations by genre. First, though, a short introduction to collaborative filtering.
There are two main types of collaborative filtering: user-based and item-based. Item-based means that the rating for an item the user has not rated is estimated from the ratings the same user gave to similar items. User-based means that the rating for an item the user has not rated is estimated from the ratings that similar users gave to this item.

Item-based collaborative filtering was developed by Amazon. Advantages of item-based filtering:
- item-based filtering is faster and more stable when there are more users than items. The reason is that usually, the average rating received by an item doesn’t change as quickly as the average rating given by a user to different items. 
- item-based filtering performs better when the ratings matrix is sparse - it has many null values.




<h3>Basic concept</h3>

A detailed description of item-based collaborative filtering is provided in [this article](https://towardsdatascience.com/item-based-collaborative-filtering-in-python-91f747200fab).
To briefly explain how collaborative filtering works, imagine a small table where rows are movies, columns are users, and the value at each intersection is the rating a user gave to a movie. Since in real life not every user rates every movie, the table contains null values that need to be filled.

The fundamental assumption of this method is that a user gives similar ratings to similar movies. 'Similarity' can be estimated in different ways, for example with Euclidean distance or cosine similarity.

Cosine similarity uses the cosine of the angle between two vectors to measure how close they are. The rating vector of a movie is built from the ratings given by all users, so we measure the angle between the vectors of two movies. As the angle increases, the cosine (that is, the similarity) decreases.
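As a rough illustration of this calculation, the sketch below computes cosine similarity for made-up rating vectors. The function name `cosine_similarity` and the ratings are assumptions for illustration only and are not taken from the data set.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_a * norm_b)

# Hypothetical ratings of three movies by the same four users (0 = not rated)
movie_a = [5.0, 4.0, 0.0, 1.0]
movie_b = [4.0, 5.0, 0.0, 1.0]
movie_c = [1.0, 0.0, 5.0, 4.0]

sim_ab = cosine_similarity(movie_a, movie_b)  # rated alike, small angle
sim_ac = cosine_similarity(movie_a, movie_c)  # rated differently, large angle
print(round(sim_ab, 3), round(sim_ac, 3))  # prints 0.976 0.214
```

Movies A and B were rated alike by the same users, so their similarity is close to 1, while A and C point in quite different directions.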

Finally, to estimate the rating of an unrated movie, we calculate the weighted average of the ratings given to the items we consider most similar. The higher the cosine similarity, the more weight the rating of that item gets in the estimate.
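The weighted average can be sketched as follows. This is a minimal example with invented numbers; `predict_rating` is a hypothetical helper and is not used elsewhere in the article.

```python
def predict_rating(similarities, ratings):
    """Similarity-weighted average of the user's ratings for the neighbour movies."""
    total_weight = sum(similarities)
    if total_weight == 0:
        return None  # no usable neighbours, so no estimate
    weighted_sum = sum(s * r for s, r in zip(similarities, ratings))
    return weighted_sum / total_weight

# Hypothetical: the three most similar movies that the user has already rated
neighbour_sims = [0.9, 0.6, 0.5]      # cosine similarity to the target movie
neighbour_ratings = [5.0, 3.0, 2.0]   # the user's ratings of those movies

predicted = predict_rating(neighbour_sims, neighbour_ratings)
print(round(predicted, 2))  # prints 3.65
```

The estimate is pulled towards 5.0 because the most similar movie carries the largest weight.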

We choose the number of closest items ourselves, using the K-nearest neighbors (KNN) algorithm. KNN assumes that similar things lie close to each other. So we use KNN with the cosine metric and pass the number of neighbors as a parameter.
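To make the neighbor search concrete, here is a small sketch on an invented 4x4 rating matrix, assuming scikit-learn is installed. Note that with `metric='cosine'` scikit-learn returns the cosine distance, which is 1 minus the cosine similarity.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical matrix: 4 movies (rows) rated by 4 users (columns), 0 = not rated
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # movie 0
    [4.0, 5.0, 0.0, 1.0],   # movie 1
    [1.0, 0.0, 5.0, 4.0],   # movie 2
    [0.0, 1.0, 4.0, 5.0],   # movie 3
])

knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(ratings)

# n_neighbors=2: each movie itself (distance ~0) plus its single closest movie
distances, indices = knn.kneighbors(ratings, n_neighbors=2)
print(indices[:, 1])  # closest other movie for each row
```

Because each movie is its own nearest neighbor, asking for k neighbors yields k - 1 actual recommendations.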

<h3>Implementation of the algorithm</h3>
<h4>Data set</h4>

In the example I use [MovieLens 25M Dataset.](https://grouplens.org/datasets/movielens/) 

This data set (ml-25m) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 25000095 ratings and 1093360 tag applications across 62423 movies. These data were created by 162541 users between January 09, 1995 and November 21, 2019. This data set was generated on November 21, 2019.

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

I use only two tables - movies and ratings.
<h4>Code</h4>

I will work with the data in Google Colab. I uploaded the zip file to Google Drive, so I need to mount the drive first.

In [11]:
from google.colab import drive
drive.mount('/content/gdrive') 

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


The next step is to unzip the files, passing the path to the archive and the target directory.

In [12]:
!unzip '/content/gdrive/MyDrive/Collaborative Filtering/ml-25m.zip' -d 'ml-25m'

Archive:  /content/gdrive/MyDrive/Collaborative Filtering/ml-25m.zip
replace ml-25m/ml-25m/tags.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

The folder appears in the Files section of Google Colab. Inside it there is another folder with the same name, which contains 5 files. Let's store the path in a variable for convenience.

In [13]:
path = 'ml-25m/ml-25m/'

Let's read two CSV files using pandas. The ratings dataframe includes user id, movie id, rating and timestamp columns. The movies dataframe includes movie id, title and genres columns. Both dataframes contain the movie id column, so they can be merged (inner join).

In [17]:
#import modules
import pandas as pd
from IPython.display import display

#read data
ratings_df = pd.read_csv(path + 'ratings.csv')
movies_df = pd.read_csv(path + 'movies.csv')

#display shape and a part of the dataframes
print(ratings_df.shape)
display(ratings_df.head(3))
print(movies_df.shape)
display(movies_df.head(3))

(25000095, 4)


index,userId,movieId,rating,timestamp
0,1,296,5.0,1147880044
1,1,306,3.5,1147868817
2,1,307,5.0,1147868828


(62423, 3)


index,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance


In [18]:
#merge dataframes
df = pd.merge(movies_df, ratings_df)
df.head(3)


index,movieId,title,genres,userId,rating,timestamp
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,2,3.5,1141415820
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3,4.0,1439472215
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,4,3.0,1573944252
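A toy version of the merge may help show what the inner join does. The two miniature tables below are invented. With an inner join, a movie that has no ratings drops out of the result, which is consistent with the merged dataframe containing fewer distinct movies than `movies.csv`.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Movie A', 'Movie B', 'Movie C'],
})
ratings = pd.DataFrame({
    'userId': [10, 11, 10],
    'movieId': [1, 1, 2],
    'rating': [4.0, 3.5, 5.0],
})

# pd.merge(movies, ratings) with no extra arguments does the same thing:
# it joins on the shared column(s) with how='inner'
merged = pd.merge(movies, ratings, on='movieId', how='inner')
print(merged)
print(merged['movieId'].nunique())  # Movie C has no ratings and is dropped
```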


The statistics of the dataframe are shown below. The dataframe has no null values and no duplicates (both are checked in the following cells). The column data types are also appropriate (except timestamp, which is not used in the analysis).

In [19]:
#display the shape
print(f'Dataframe shape - {df.shape}')
#display data types of the dataframe columns
print('Dataframe info:')
df.info()
#display how many null values each column has
print('Number of null values:')
df.isna().sum()

Dataframe shape - (25000095, 6)
Dataframe info:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 25000095 entries, 0 to 25000094
Data columns (total 6 columns):
 #   Column     Dtype  
---  ------     -----  
 0   movieId    int64  
 1   title      object 
 2   genres     object 
 3   userId     int64  
 4   rating     float64
 5   timestamp  int64  
dtypes: float64(1), int64(3), object(2)
memory usage: 1.3+ GB
Number of null values:


movieId      0
title        0
genres       0
userId       0
rating       0
timestamp    0
dtype: int64

In [None]:
#count duplicate values. The dataframe has no duplicates.
bool_series = df.duplicated()
bool_series.value_counts()

False    25000095
dtype: int64

We can also see that some movies have titles with a trailing article, e.g. 'Rock, The (1996)' and 'Aristocrats, The (2005)'. Let's process the title column and remove ', The'.

In [None]:
#remove the trailing ', The' from movie titles
df.title = df.title.map(lambda x: x.replace(', The', ''))
df.sample(3)

index,movieId,title,genres,userId,rating,timestamp
7743253,1359,Jingle All the Way (1996),Children|Comedy,76770,1.0,1028581731
9862939,2028,Saving Private Ryan (1998),Action|Drama|War,122300,0.5,1237340274
6342659,1198,Raiders of the Lost Ark (Indiana Jones and the...,Action|Adventure,162023,0.5,1450643936
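As an alternative that is not used in the rest of this article, the article could be moved to the front of the title instead of being dropped. The sketch below assumes titles follow the `Name, The (YYYY)` pattern. The regular expression and the helper name `move_article_to_front` are illustrative assumptions.

```python
import re

# Assumed pattern: "<name>, The (<4-digit year>)" covering the whole title
_TRAILING_THE = re.compile(r'^(?P<name>.+), The (?P<year>\(\d{4}\))$')

def move_article_to_front(title):
    """Turn 'Rock, The (1996)' into 'The Rock (1996)'; leave other titles unchanged."""
    match = _TRAILING_THE.match(title)
    if match is None:
        return title
    return f"The {match.group('name')} {match.group('year')}"

for raw in ['Rock, The (1996)', 'Aristocrats, The (2005)', 'Toy Story (1995)']:
    print(move_article_to_front(raw))
```

It could be applied with `df.title.map(move_article_to_front)`, but the titles in the outputs below would then differ from the ones shown.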


Let's count the users and items in the merged dataframe. The number of users exceeds the number of items, which is why item-based filtering is more appropriate here.

In [None]:
#display number of items and users
print(f'Number of items is: {len(df.movieId.unique())}')
print(f'Number of users is: {len(df.userId.unique())}')

Number of items is: 59047
Number of users is: 162541


The next step is to transform the dataframe into the table used for rating estimation, with users as columns and movies as rows. For this purpose we'll use a pivot table. However, during the implementation I realised that my machine does not have enough RAM to process a 59047x162541 table. Dimensionality reduction techniques such as matrix factorization could solve this problem, but for easier illustration I decided to filter the initial dataframe instead. The new dataframe consists of movies rated by at least 1000 users and users who rated at least 500 movies.

In [None]:
#count how many users rated each movie
move_group = df.groupby('title').agg({'userId': 'count'})
#keep only the rows that satisfy the criterion above (at least 1000 users)
move = move_group[move_group.userId >= 1000].reset_index()
#display the filtered dataframe's shape
print(move.shape)

#count how many movies each user rated
user_group = df.groupby('userId').agg({'title': 'count'})
#keep only the rows that satisfy the criterion above (at least 500 movies)
user = user_group[user_group.title >= 500].reset_index()
#display the filtered dataframe's shape
print(user.shape)


(3794, 2)
(9713, 2)


In [None]:
#filter the initial dataframe by movie title and user id.
df_move = df[(df['title'].isin(move.title)) & (df['userId'].isin(user.userId))]
print(df_move.shape)
df_move.head(3)

(7128241, 6)


index,movieId,title,genres,userId,rating,timestamp
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,3,4.0,1439472215
6,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,12,4.0,1167582601
35,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,120,5.0,956264593


Looking at the shape of the new dataframe, the number of rows is reduced from 25M to about 7M, which leaves enough RAM to create the pivot table.

In [None]:
pivot = pd.pivot_table(df_move, values=['rating'], index=['title'], columns=['userId'], fill_value = 0)
pivot.head(3)

rating (rows: title, columns: userId)
title,3,12,72,80,120,166,171,175,181,187,...,162386,162387,162394,162445,162481,162484,162495,162508,162516,162519
'burbs (1989),0.0,0.0,3,0,0,0,0.0,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,3.5,0.0,0.0,0
(500) Days of Summer (2009),0.0,3.0,0,0,0,0,0.0,0,0.0,4.0,...,4.5,0.0,2.0,0.0,0.0,0.0,0.0,2.5,4.0,0
*batteries not included (1987),0.0,0.0,0,0,0,0,0.0,0,0.0,0.0,...,0.0,0.0,0.0,0.0,3.5,0.0,3.5,0.0,3.0,0
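If the dense pivot table does not fit in memory, one possible alternative (not used in the rest of this article) is to store the same movie-by-user matrix in sparse form. The sketch below uses invented data and assumes SciPy is available. scikit-learn's `NearestNeighbors` with `algorithm='brute'` accepts such a matrix.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical long-format ratings, shaped like the filtered dataframe
ratings = pd.DataFrame({
    'title': ['Movie A', 'Movie A', 'Movie B', 'Movie C'],
    'userId': [10, 11, 10, 12],
    'rating': [4.0, 3.5, 5.0, 2.0],
})

# Map titles and user ids to consecutive row and column positions
title_codes, titles = pd.factorize(ratings['title'], sort=True)
user_codes, users = pd.factorize(ratings['userId'], sort=True)

# Only the ratings that exist are stored; missing cells are implicit zeros.
# Duplicate (title, user) pairs would be summed, so this assumes there are none.
matrix = csr_matrix(
    (ratings['rating'].to_numpy(), (title_codes, user_codes)),
    shape=(len(titles), len(users)),
)
print(matrix.shape, matrix.nnz)
```

Here `titles[i]` plays the role that `pivot.index[i]` plays in the dense version.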


We have created the pivot table with ratings and are ready to use the KNN algorithm. We create a KNN object with cosine as the metric and brute-force search as the algorithm. Brute-force search computes the distance from a movie to every other movie and picks the closest ones. With this metric scikit-learn reports the cosine distance (1 minus the cosine similarity), so smaller values mean more similar movies. I set the number of nearest neighbors to 6. This number includes the analyzed movie itself, so the 5 nearest movies will be shown.

In [None]:
#import module
from sklearn.neighbors import NearestNeighbors
#create KNN class object
knn = NearestNeighbors(metric='cosine', algorithm='brute')
#fit the model to our data
knn.fit(pivot.values)
#get the cosine distances and indices of the 6 nearest movies for every movie, then display the indices
distances, indices = knn.kneighbors(pivot.values, n_neighbors=6)
indices
        

array([[   0, 2238,  661, 2926, 3793,  705],
       [   1, 1662, 1798, 3086, 1677,  819],
       [   2, 3003, 1915, 1190, 1681, 3004],
       ...,
       [3791, 1134, 1902,  418, 2168, 2136],
       [3792, 3791, 3489,   21,  421, 1135],
       [3793, 1187, 3113,  374, 1148, 1317]])

After running, the algorithm returns, for each of the 3794 movies in the pivot table, the index of the movie itself followed by the indices of its five closest movies. For example, the movie "'burbs (1989)" has index 0, so we look at the first row of the array: its closest movies have indices 2238, 661, 2926, 3793 and 705. The other rows are read the same way. Now that we have the indices of the closest movies, we can display their titles to present the result in a more convenient form. For this purpose I define a function that looks up movie titles by index and displays them.

In [None]:
def show_closest():
  movie = input('Please enter a movie name: ') 
  #find index of movie in the data set 
  index_for_movie = pivot.index.tolist().index(movie)
  # find the indices for the similar movies
  sim_movies = indices[index_for_movie].tolist()
  #find corresponding movies titles by indices
  names = [pivot.index[item] for item in sim_movies]
  # find distances between the chosen movie and the similar movies
  movie_distances = distances[index_for_movie].tolist()
  #round the numbers
  movie_distances = [str(round(item,2)) for item in movie_distances]
  #define a separator
  nl = '\n'
  #display movie titles and distances
  print(f'\nThe nearest movies to {names[0]}:\n{nl.join(names[1:])}\n')
  print(f'The distance from {names[0]}: {", ".join(movie_distances[1:])}')

Let's pick a movie from the dataframe and find out which movies are considered nearest to it.

In [None]:
show_closest()

Please enter a movie name: (500) Days of Summer (2009)

The nearest movies to (500) Days of Summer (2009):
Inception (2010)
Juno (2007)
Social Network (2010)
Inglourious Basterds (2009)
Dark Knight (2008)

The distance from (500) Days of Summer (2009): 0.33, 0.33, 0.34, 0.35, 0.35


The function outputs a list of the nearest movies and their distances. It could be useful to add genre information and filter the proposed movies by genre. To do that, we find the genres of the input movie and filter the dataframe accordingly. I propose two types of genre filtering: strict and soft. Strict means that the nearest movies must have exactly the same genres as the input movie. Soft means that the genres of the nearest movies must contain a specified genre of the input movie. Let's add some code to the existing function.

In [None]:
#define a function with strict filtering
def show_closest_movies_genre_strict(movie):
    #find the input movie genre in the dataframe
    genre = ''.join(df[df.title == movie].genres.unique())
    print(f"Movie genre is: {genre}")
    #create a dataframe, filtered by genre
    filter_genre = df_move[df_move['genres'] == genre]
    #create a pivot table based on filtered dataframe
    pivot = pd.pivot_table(filter_genre, values=['rating'], index=['title'], columns=['userId'], fill_value = 0)
    #make sure the filtered data has at least as many movies as the number of neighbors requested
    if len(filter_genre.title.unique()) >= 6:  
        knn = NearestNeighbors(metric='cosine', algorithm='brute')
        knn.fit(pivot.values)
        distances, indices = knn.kneighbors(pivot.values, n_neighbors=6)
        index_for_movie = pivot.index.tolist().index(movie)
        sim_movies = indices[index_for_movie].tolist()
        names = [pivot.index[item] for item in sim_movies]
        #create a list of the nearest movies genres
        genres_film = [''.join(filter_genre[filter_genre.title == item].genres.unique()) for item in names]
        #create a list of movies names and their genres
        names_genres = [', '.join(list(i)) for i in list(zip(names, genres_film))]
        movie_distances = distances[index_for_movie].tolist()
        movie_distances = [str(round(item,2)) for item in movie_distances]
        id_movie = sim_movies.index(index_for_movie)
        sim_movies.remove(index_for_movie)
        movie_distances.pop(id_movie)
        nl = '\n'
        #display movies names and genres
        print(f'\nThe Nearest Movies to {names[0]}:\n{nl.join(names_genres[1:])}\n')
        print(f'The Distance from {names[0]}: {", ".join(movie_distances)}')
    #if the genre-filtered dataset has fewer movies than the number of neighbors, say so
    else:
        print('Limited number of movies. Try to use soft filtering.')


Let's check the result. We can see that all the movies have the same genre.

In [None]:
show_closest_movies_genre_strict('(500) Days of Summer (2009)')

Movie genre is: Comedy|Drama|Romance

The Nearest Movies to (500) Days of Summer (2009):
Juno (2007), Comedy|Drama|Romance
Crazy, Stupid, Love. (2011), Comedy|Drama|Romance
Moonrise Kingdom (2012), Comedy|Drama|Romance
Knocked Up (2007), Comedy|Drama|Romance
Lost in Translation (2003), Comedy|Drama|Romance

The Distance from (500) Days of Summer (2009): 0.33, 0.41, 0.43, 0.45, 0.49


Function with soft filtering is quite similar. I use input to choose a specific genre.

In [None]:
def show_closest_movies_genre_soft(movie):
    #display a genre of the movie
    print(f"Movie genre is: {''.join(df[df.title == movie].genres.unique())}")
    #use input to type a genre
    genre = input('Choose a specific genre which movies should cointain ')
    #condition - make sure that a genre is a part of the input movie genre
    if genre in ''.join(df[df.title == movie].genres.unique()).split('|'):
      filter_genre = df_move[df_move['genres'].str.contains(genre)]
      pivot = pd.pivot_table(filter_genre, values=['rating'], index=['title'], columns=['userId'], fill_value = 0)
      if len(filter_genre.title.unique()) >= 6:  
        knn = NearestNeighbors(metric='cosine', algorithm='brute')
        knn.fit(pivot.values)
        distances, indices = knn.kneighbors(pivot.values, n_neighbors=6)
        index_for_movie = pivot.index.tolist().index(movie)
        sim_movies = indices[index_for_movie].tolist()
        names = [pivot.index[item] for item in sim_movies]
        genres_film = [''.join(filter_genre[filter_genre.title == item].genres.unique()) for item in names]
        names_genres = [', '.join(list(i)) for i in list(zip(names, genres_film))]
        movie_distances = distances[index_for_movie].tolist()
        movie_distances = [str(round(item,2)) for item in movie_distances]
        id_movie = sim_movies.index(index_for_movie)
        sim_movies.remove(index_for_movie)
        movie_distances.pop(id_movie)
        nl = '\n'
        print(f'\nThe Nearest Movies to {names[0]}:\n{nl.join(names_genres[1:])}\n')
        print(f'The Distance from {names[0]}: {", ".join(movie_distances)}')
      else:
        print('Limited number of movies with this genre. Try another genre.')
    #if the input movie's genres don't contain the genre that was typed in, say so
    else:
      print('Watched movie does not contain this genre')

We can also define a simple helper function that lets the user choose between strict and soft filtering.

In [None]:
def choose_filter(movie):  
  answer = input('Would you like to use strict filtering (y/n)? ')
  if answer == 'y':
    show_closest_movies_genre_strict(movie)
  elif answer == 'n':
    show_closest_movies_genre_soft(movie)
  else:
    print('Incorrect answer')

choose_filter('(500) Days of Summer (2009)')

Would you like to use strict filtering (y/n)? n
Movie genre is: Comedy|Drama|Romance
Choose a specific genre which movies should cointain Drama

The Nearest Movies to (500) Days of Summer (2009):
Inception (2010), Action|Crime|Drama|Mystery|Sci-Fi|Thriller|IMAX
Juno (2007), Comedy|Drama|Romance
Social Network (2010), Drama
Inglourious Basterds (2009), Action|Drama|War
Dark Knight (2008), Action|Crime|Drama|IMAX

The Distance from (500) Days of Summer (2009): 0.33, 0.33, 0.34, 0.35, 0.35


The genres of each recommended movie include Drama, alongside other genres.
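The difference between the two filters can be shown on plain genre strings. This is a sketch with hypothetical helper names. Unlike the functions above, which compare the raw string (strict) or use a substring search (soft), it splits the `A|B|C` string into a set, so the order of genres does not matter and only whole genre names match.

```python
def genre_set(genres):
    """Split a MovieLens-style 'A|B|C' genre string into a set of genre names."""
    return set(genres.split('|'))

def matches_strict(candidate, reference):
    """Strict: the candidate has exactly the same genres as the reference movie."""
    return genre_set(candidate) == genre_set(reference)

def matches_soft(candidate, wanted_genre):
    """Soft: the candidate's genres include the one requested genre."""
    return wanted_genre in genre_set(candidate)

reference = 'Comedy|Drama|Romance'  # genres of (500) Days of Summer (2009)
print(matches_strict('Comedy|Drama|Romance', reference))  # True
print(matches_strict('Action|Drama|War', reference))      # False
print(matches_soft('Action|Drama|War', 'Drama'))          # True
```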

<h3>In addition - building app in Flask</h3>

We can also use Flask to build a simple movie recommender application based on the functions above. The app has three options: a recommender without genre filtering, and recommenders with strict and soft filtering. Since I build the app in Google Colab, I use ngrok, a cross-platform application that lets developers expose a local development server to the Internet with minimal effort. You need to sign up on [the site](https://ngrok.com/) and create a token following the instructions there. First of all, we need to install the modules. The pyngrok module is needed to avoid the ngrok 6022 error.

In [None]:
#install necessary modules
!pip install flask-ngrok &> /dev/null
!pip install flask-bootstrap &> /dev/null
!pip install pyngrok==4.1.1 &> /dev/null
!pip install flask_navigation &> /dev/null

To run the app, HTML files are needed. I created them in a local folder and uploaded them to Google Drive. The .html files must be placed in a folder named templates. We could also use a .css file for styling, but I don't in this example.

In [None]:
#upload files from the local machine to Google Colab
from google.colab import files
uploaded = files.upload()

After signing up and creating the token, we can use it to run the app on Colab. The code of the app is below. The HTML files and the Jupyter Notebook with text and code are on my GitHub.

In [None]:
#register your ngrok auth token
!ngrok authtoken 'your ngrok token'
#import modules
from flask_ngrok import run_with_ngrok
from flask import Flask, request, render_template, session

#specify a folder where .html files are placed
TEMPLATE = '/content/gdrive/MyDrive/Collaborative Filtering/templates'

#app code
app = Flask(__name__, template_folder=TEMPLATE)
run_with_ngrok(app)
app.config["SESSION_PERMANENT"] = False
app.config["SESSION_TYPE"] = "filesystem"

#recommender without filtering home page
@app.route('/')
def home_without_filters():
    movies = list(pivot.index)
    return render_template('home.html', movies=movies)

#recommender without filtering predict page
@app.route('/predict',methods=['POST'])
def predict():
  try:
    movies = list(pivot.index)
    movie = ''.join([x for x in request.form.values()])
    index_for_movie = pivot.index.tolist().index(movie)
    # find the indices for the similar movies
    sim_movies = indices[index_for_movie].tolist()
    #find corresponding movies titles by indices
    names = [pivot.index[item] for item in sim_movies]
    genres_film = [''.join(df_move[df_move.title == item].genres.unique()) for item in names]
    names_genres = [', '.join(list(i)) for i in list(zip(names, genres_film))]
    id_movie = sim_movies.index(index_for_movie)
    selected = 'The nearest movies for ' + names_genres[0] + ':'
    output = names_genres[1:]
    return render_template('home.html', movies=movies, prediction_text=output, selected = selected)
  except Exception:
    error_text = 'Some error occurred, please try again'
    return render_template('home.html', movies=movies, selected=error_text)

#recommender with strict filtering home page
@app.route('/strict')
def home_strict_filters():
    movies = list(pivot.index)
    return render_template('strict.html', movies=movies)

#recommender with strict filtering predict page
@app.route('/predict_strict',methods=['POST'])
def predict_strict():
  try:
    movies = list(pivot.index)
    movie = ''.join([x for x in request.form.values()])
    print(movie)
    genre = ''.join(df[df.title == movie].genres.unique())
    filter_genre = df_move[df_move['genres'] == genre]
    pivot_genre = pd.pivot_table(filter_genre, values=['rating'], index=['title'], columns=['userId'], fill_value = 0)
    if len(filter_genre.title.unique()) >= 6:  
        knn = NearestNeighbors(metric='cosine', algorithm='brute')
        knn.fit(pivot_genre.values)
        distances, indices = knn.kneighbors(pivot_genre.values, n_neighbors=6)
        index_for_movie = pivot_genre.index.tolist().index(movie)
        sim_movies = indices[index_for_movie].tolist()
        names = [pivot_genre.index[item] for item in sim_movies]
        genres_film = [''.join(filter_genre[filter_genre.title == item].genres.unique()) for item in names]
        names_genres = [', '.join(list(i)) for i in list(zip(names, genres_film))]
        id_movie = sim_movies.index(index_for_movie)
        selected = 'The nearest movies for ' + names_genres[0] + ':'
        output = names_genres[1:]
        return render_template('strict.html', movies=movies, prediction_text=output, selected = selected)
    else:
        not_enough = 'There are not enough movies with the same genre, please try another one'
        return render_template('strict.html', movies=movies, selected=not_enough)
  except Exception:
        error_text = 'Some error occurred, please try again'
        return render_template('strict.html', movies=movies, selected=error_text)

#recommender with soft filtering home page
@app.route('/soft')
def home_soft_filters():
    movies = list(pivot.index)
    return render_template('soft.html', movies=movies)

#recommender with soft filtering predict page
@app.route('/predict_soft', methods=['GET', 'POST'])
def predict_soft():
   try:
      movies = list(pivot.index)
      select = request.form.get('movies')
      movie = select
      if request.form['action'] == 'Select_movie':
        genre = ''.join(df[df.title == select].genres.unique())
        session["movie_name"] = select
        return render_template('soft.html', selected_genre = f'{movie} genre is {genre}', movies=movies)
      else:
        selected_movie = session.get('movie_name', None)
        resp = [x for x in request.form.values()]
        genre = resp[0]
        filter_genre = df_move[df_move['genres'].str.contains(genre)]
        pivot_genre = pd.pivot_table(filter_genre, values=['rating'], index=['title'], columns=['userId'], fill_value = 0)
        if len(filter_genre.title.unique()) >= 6:  
          knn = NearestNeighbors(metric='cosine', algorithm='brute')
          knn.fit(pivot_genre.values)
          distances, indices = knn.kneighbors(pivot_genre.values, n_neighbors=6)
          index_for_movie = pivot_genre.index.tolist().index(selected_movie)
          sim_movies = indices[index_for_movie].tolist()
          names = [pivot_genre.index[item] for item in sim_movies]
          genres_film = [''.join(filter_genre[filter_genre.title == item].genres.unique()) for item in names]
          names_genres = [', '.join(list(i)) for i in list(zip(names, genres_film))]
          id_movie = sim_movies.index(index_for_movie)
          selected = 'The nearest movies for ' + names_genres[0] + ':'
          output = names_genres[1:]
          return render_template('soft.html', selected_genre = f'Movie genre should contain {genre}', movies=movies, prediction_text=output, selected = selected)
        else:
          not_enough = 'There are not enough movies with this genre, please try another one'
          return render_template('soft.html', movies=movies, selected=not_enough)
   except Exception:
       error_text = 'You should enter one of the genres of the selected movie.'
       return render_template('soft.html', movies=movies, selected=error_text)

if __name__ == "__main__":
  app.secret_key = 'super secret key'
  app.run()

Now the app runs on a local server and is exposed to the Internet via ngrok.
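The three predict routes and the notebook functions repeat the same neighbor search. One possible refactoring, sketched here under the assumption that the pivot table has unique titles as its index, is a single helper. The name `nearest_titles` and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def nearest_titles(pivot, movie, n_neighbors=6):
    """Return (title, distance) pairs for the movies closest to `movie`, excluding itself.

    Returns None if the movie is unknown or the table has too few movies.
    """
    if movie not in pivot.index or len(pivot) < n_neighbors:
        return None
    knn = NearestNeighbors(metric='cosine', algorithm='brute')
    knn.fit(pivot.values)
    row = pivot.index.get_loc(movie)
    # Query only the selected movie instead of every row of the table
    distances, indices = knn.kneighbors(pivot.values[row:row + 1], n_neighbors=n_neighbors)
    pairs = [
        (pivot.index[i], round(float(d), 2))
        for i, d in zip(indices[0], distances[0])
        if i != row
    ]
    return pairs[:n_neighbors - 1]

# Toy pivot table: 4 movies rated by 4 users (0 = not rated)
toy_pivot = pd.DataFrame(
    [[5.0, 4.0, 0.0, 1.0],
     [4.0, 5.0, 0.0, 1.0],
     [1.0, 0.0, 5.0, 4.0],
     [0.0, 1.0, 4.0, 5.0]],
    index=['Movie A', 'Movie B', 'Movie C', 'Movie D'],
)
result = nearest_titles(toy_pivot, 'Movie A', n_neighbors=2)
print(result)
```

Each route could then build its own filtered pivot table and pass it to the helper, handling the `None` case with its own message.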

That's all about developing the movie recommender using item-based collaborative filtering. Thank you for your attention!

<h3>References:</h3>
1. Yohan Jeong. Item-Based Collaborative Filtering in Python: The practice of making the item-based collaborative filtering in python. Towards Data Science. URL: https://towardsdatascience.com/item-based-collaborative-filtering-in-python-91f747200fab

2. Abhinav Ajitsaria. Build a Recommendation Engine With Collaborative Filtering. Real Python. URL: https://realpython.com/build-recommendation-engine-collaborative-filtering/#user-based-vs-item-based-collaborative-filtering

3. MovieLens 25M Dataset. Grouplens. URL: https://files.grouplens.org/datasets/movielens/ml-25m-README.html

4. MovieLens 25M Dataset summary. Grouplens. URL: https://files.grouplens.org/datasets/movielens/ml-25m-README.html

5. Onel Harrison. Machine Learning Basics with the K-Nearest Neighbors Algorithm. Towards Data Science. URL: https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761