**Importing all the required libraries**

In [2]:
import pandas as pd
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import TfidfVectorizer

**Reading the dataset**

In [3]:
url="https://raw.githubusercontent.com/keshav211/MovieRecommender/master/movies_metadata.csv"
metadata=pd.read_csv(url)



**Checking the shape and size of the movies metadata**

In [4]:
print(metadata.shape)
print(metadata.size)

(45466, 24)
1091184


**Checking the column information of the movies metadata**

In [5]:
metadata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45466 entries, 0 to 45465
Data columns (total 24 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   adult                  45466 non-null  object 
 1   belongs_to_collection  4494 non-null   object 
 2   budget                 45466 non-null  object 
 3   genres                 45466 non-null  object 
 4   homepage               7782 non-null   object 
 5   id                     45466 non-null  object 
 6   imdb_id                45449 non-null  object 
 7   original_language      45455 non-null  object 
 8   original_title         45466 non-null  object 
 9   overview               44512 non-null  object 
 10  popularity             45461 non-null  object 
 11  poster_path            45080 non-null  object 
 12  production_companies   45463 non-null  object 
 13  production_countries   45463 non-null  object 
 14  release_date           45379 non-null  object 
 15  re

#  **Building a Simple Movie Recommender**
One of the most basic metrics you can think of is the average rating: rank the movies by their respective ratings and take the top 250.

However, using a rating as a metric has a few caveats:



*   For one, it does not take into consideration the popularity of a movie. Therefore, a movie with a rating of 9 from 10 voters will be considered 'better' than a movie with a rating of 8.9 from 10,000 voters.

> For example, imagine you want to order Chinese food and you have a couple of options: one restaurant has a 5-star rating from only 5 people, while the other has a 4.5-star rating from 1000 people. Which restaurant would you prefer? The second one, right?

> Of course, there could be an exception: the first restaurant may have opened just a few days ago, so fewer people have voted for it, whereas the second restaurant has been operating for a year.
 
*  On a related note, this metric will also tend to favor movies with a small number of voters and skewed and/or extremely high ratings. As the number of voters increases, the rating of a movie regularizes and approaches a value that reflects the movie's quality, which gives the user a much better idea of which movie to choose. Because it is difficult to discern the quality of a movie with extremely few voters, you might have to consult external sources to reach a conclusion.


Taking these shortcomings into consideration, you must come up with a weighted rating that takes into account the average rating and the number of votes it has accumulated. Such a system will make sure that a movie with a 9 rating from 100,000 voters gets a (far) higher score than a movie with the same rating but a mere few hundred voters.

Since we are trying to build a clone of IMDB's Top 250, let's use its weighted rating formula as a metric/score. Mathematically, it is represented as follows:

\begin{equation}
\text{Weighted Rating } (\mathbf{WR}) = \left(\frac{\mathbf{v}}{\mathbf{v} + \mathbf{m}} \cdot R\right) + \left(\frac{\mathbf{m}}{\mathbf{v} + \mathbf{m}} \cdot C\right)
\end{equation}

In the above equation,

* **v** is the **number of votes** for the movie;
* **m** is the **minimum votes** required to be listed in the chart;
* **R** is the **average rating** of the movie;
* **C** is the **mean vote** across the whole report (here, the mean of `vote_average` over all movies in the dataset).
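To make the formula concrete, here is a minimal sketch that applies it to the two movies from the earlier example (a 9 from 10 voters versus an 8.9 from 10,000 voters). The values chosen for `C` and `m` are hypothetical and only serve the illustration; they are not computed from the dataset.

```python
def weighted_rating(v, R, m, C):
    """Weighted rating as defined by the formula above."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical values, chosen only for illustration (not taken from the dataset)
C = 5.6   # assumed mean vote across all movies
m = 160   # assumed minimum number of votes required

obscure = weighted_rating(v=10, R=9.0, m=m, C=C)       # few voters: pulled toward C
popular = weighted_rating(v=10_000, R=8.9, m=m, C=C)   # many voters: stays close to R

print(f"{obscure:.2f} {popular:.2f}")  # prints: 5.80 8.85
```

With few votes, the second term dominates and the score stays near the global mean `C`; with many votes, the first term dominates and the score stays near the movie's own rating `R`.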

In [6]:
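# C: mean rating across all movies in the dataset
C = metadata['vote_average'].mean()
# m: minimum votes required to qualify (90th percentile of vote counts)
m = metadata['vote_count'].quantile(0.90)
# Keep only the movies with at least m votes, as an independent copy
q_movies = metadata.loc[metadata['vote_count'] >= m].copy()

def weighted_rating(x, m=m, C=C):
    """Compute the weighted rating for one movie (row) x."""
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(v+m) * C)

# Score every qualifying movie and sort from highest to lowest score
q_movies['score'] = q_movies.apply(weighted_rating, axis=1)
q_movies = q_movies.sort_values('score', ascending=False)
# Number of top movies to display
n = int(input("Please enter the number:"))
# Renumber the ranking so that it starts from 1
q_movies.reset_index(inplace=True)
q_movies.index += 1
q_movies = q_movies.rename(columns={'title': 'Movie Name', 'score': 'Movie Score'})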


Please enter the number:3


**The movies sorted by their respective scores:**

In [7]:
print(q_movies[['Movie Name', 'Movie Score']].head(n))

                    Movie Name  Movie Score
1     The Shawshank Redemption     8.445869
2                The Godfather     8.425439
3  Dilwale Dulhania Le Jayenge     8.421453
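
The row-wise `apply` used above can also be written as plain column arithmetic, which pandas evaluates for all rows at once. The sketch below shows this on a small made-up table; the movie names and numbers are hypothetical, and the cutoff uses the 50th percentile instead of the 90th only because the toy table has four rows.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the real dataset
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "vote_count": [10, 500, 10000, 2000],
    "vote_average": [9.0, 7.5, 8.9, 6.0],
})

C = movies["vote_average"].mean()         # mean rating over all movies
m = movies["vote_count"].quantile(0.50)   # assumed cutoff for this toy table

# Keep the movies with at least m votes
qualified = movies.loc[movies["vote_count"] >= m].copy()

# Same weighted rating formula, computed on whole columns instead of row by row
v = qualified["vote_count"]
R = qualified["vote_average"]
qualified["score"] = v / (v + m) * R + m / (v + m) * C

# Sort by score and number the ranking from 1
ranked = qualified.sort_values("score", ascending=False).reset_index(drop=True)
ranked.index += 1
print(ranked[["title", "score"]])
```

Both versions produce the same scores; the column form simply avoids calling a Python function once per row.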
