# Introduction
![](https://miro.medium.com/max/1000/1*WxcROmxvYow2TJrJGXpwWQ.png)<br>
source:[medium.com](https://medium.com/)

## What Is a Recommendation System?
Recommender systems are designed to recommend items to users based on many different factors. They predict which products a user is most likely to buy or be interested in. Companies such as Netflix and Amazon use recommender systems to help their users find the right products or movies.

 

A recommender system handles a large volume of information by filtering out the most relevant items, based on data provided by the user and on other signals that reflect the user's preferences and interests. It matches users to items and computes similarities between users and items to make recommendations.


**Collaborative filtering systems** use the actions of users to recommend other items (here, anime). In general, they can be either **user-based** or **item-based**. The item-based approach is usually preferred: the user-based approach is often harder to scale because user behavior changes constantly, whereas items usually don't change much, so item-based similarities can often be computed offline and served without constant re-training.
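To make the item-based idea concrete, here is a minimal sketch on a made-up rating matrix. The data and the variable names (`ratings`, `item_similarity`) are assumptions for illustration only and are not part of the dataset used below.

```python
import numpy as np

# Hypothetical toy ratings: rows are items, columns are users, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],   # item 0
    [4, 5, 0, 1],   # item 1
    [1, 0, 5, 4],   # item 2
], dtype=float)

# Cosine similarity between every pair of item rows.
norms = np.linalg.norm(ratings, axis=1, keepdims=True)
unit_rows = ratings / norms
item_similarity = unit_rows @ unit_rows.T

# For item 0, rank the other items from most to least similar.
order = np.argsort(-item_similarity[0])
most_similar_to_item0 = [int(i) for i in order if i != 0]
print(most_similar_to_item0)
```

Items 0 and 1 were rated alike by the same users, so item 1 ranks ahead of item 2. Because this similarity table depends only on the items' rating columns, it can be precomputed once and reused.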

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

#### Because of my PC's limited resources, I load only the first 50 million records. If you have access to a more powerful workstation, you can use all 109 million records.

In [2]:
rating_data = pd.read_csv("../input/anime-recommendation-database-2020/animelist.csv",nrows=50000000)
anima_data = pd.read_csv("../input/anime-recommendation-database-2020/anime.csv")

# To save time, I build a smaller frame with just anime_id (originally MAL_ID) and Name so that I can merge() it with the ratings.
anima_data = anima_data.rename(columns={"MAL_ID": "anime_id"})
anima_contact_data = anima_data[["anime_id", "Name"]]

# Preprocessing and applying all the methods for Recommendation System

#### I will now merge the rating data with anima_contact_data (extracted from anima_data) on anime_id

In [3]:
rating_data = rating_data.merge(anima_contact_data, on='anime_id', how='left')
rating_data = rating_data[["user_id", "Name", "anime_id","rating", "watching_status", "watched_episodes"]]
rating_data.head()

Unnamed: 0,user_id,Name,anime_id,rating,watching_status,watched_episodes
0,0,Basilisk: Kouga Ninpou Chou,67,9,1,1
1,0,Fairy Tail,6702,7,1,4
2,0,Gokusen,242,10,1,4
3,0,Kuroshitsuji,4898,0,1,1
4,0,One Piece,21,10,1,0


In [4]:
rating_data.shape

(50000000, 6)

### Now I will keep only users who have cast at least 500 votes in total and anime that have at least 200 votes. Both counts are taken on the full 50 million rows, before any filtering.

In [5]:
count = rating_data['user_id'].value_counts()
count1 = rating_data['anime_id'].value_counts()
rating_data = rating_data[rating_data['user_id'].isin(count[count >= 500].index)].copy()
rating_data = rating_data[rating_data['anime_id'].isin(count1[count1 >= 200].index)].copy()

In [6]:
rating_data.shape

(27117919, 6)
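The filtering step can be tried on a small scale. The sketch below uses a made-up frame and a threshold of 2 instead of 500 and 200; the names `toy`, `user_counts` and `anime_counts` are assumptions for illustration. It also shows a side effect of counting before filtering: an anime can pass the threshold on the original counts and still end up with fewer rows once some users are removed, which is one possible reason a per-anime count printed later can fall below 200.

```python
import pandas as pd

# Hypothetical toy interactions; thresholds are shrunk to 2 so the effect is visible.
toy = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, 3],
    "anime_id": [10, 11, 12, 10, 11, 12],
})

# Counts are taken once, on the unfiltered data.
user_counts = toy["user_id"].value_counts()
anime_counts = toy["anime_id"].value_counts()

# Keep users with at least 2 rows, then anime with at least 2 rows.
kept = toy[toy["user_id"].isin(user_counts[user_counts >= 2].index)]
kept = kept[kept["anime_id"].isin(anime_counts[anime_counts >= 2].index)]

# Anime 12 had 2 rows originally, but only one of them survives the user filter.
remaining_per_anime = kept["anime_id"].value_counts().to_dict()
print(remaining_per_anime)
```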

### We now group the data by Name and count the ratings for each anime, storing the result in movie_ratingCount

In [7]:
combine_movie_rating = rating_data.dropna(axis = 0, subset = ['Name'])
movie_ratingCount = (
    combine_movie_rating
    .groupby(by=['Name'])['rating']
    .count()
    .reset_index()
    [['Name', 'rating']]
)
movie_ratingCount.head()

Unnamed: 0,Name,rating
0,"""0""",553
1,"""Bungaku Shoujo"" Kyou no Oyatsu: Hatsukoi",2687
2,"""Bungaku Shoujo"" Memoire",3312
3,"""Bungaku Shoujo"" Movie",6087
4,"""Eiji""",298


### We will merge movie_ratingCount back into combine_movie_rating so that every row carries the total number of ratings its anime received. Note that after this step the `rating` column holds that count instead of the user's own score.

In [8]:
rating_data = combine_movie_rating.merge(movie_ratingCount, on='Name', how='left')
# merge() suffixes the clashing columns: rating_x is the user's score, rating_y is the per-anime count.
rating_data = rating_data.drop(columns="rating_x")
rating_data = rating_data.rename(columns={"rating_y": "rating"})
rating_data

Unnamed: 0,user_id,Name,anime_id,watching_status,watched_episodes,rating
0,6,Angel Beats! Specials,9062,1,1,12469
1,6,Ao no Exorcist,9919,1,2,24296
2,6,Blood+,150,1,15,12899
3,6,Casshern Sins,4981,1,12,8343
4,6,Guilty Crown,10793,1,2,23741
...,...,...,...,...,...,...
27117914,162163,Zetman,11837,2,13,7148
27117915,162163,Zettai Bouei Leviathan,17681,2,13,2729
27117916,162163,Zipang,29,2,26,1551
27117917,162163,_Summer,1692,2,2,1250
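The `rating_x` / `rating_y` names come from how `merge()` handles two frames that share a column name. The sketch below reproduces this on a made-up frame. The second half, which renames the count to `rating_count` so both values survive, is an assumed alternative and not what this notebook does.

```python
import pandas as pd

# Hypothetical toy scores: two users rated "A", one user rated "B".
scores = pd.DataFrame({
    "user_id": [1, 2, 1],
    "Name": ["A", "A", "B"],
    "rating": [9, 7, 10],
})

# Number of ratings per title, built the same way as movie_ratingCount.
counts = scores.groupby("Name")["rating"].count().reset_index()

# Both frames have a "rating" column, so merge() adds the _x / _y suffixes.
merged = scores.merge(counts, on="Name", how="left")
print(merged.columns.tolist())

# Assumed alternative: rename the count first so the user's score is not lost.
named_counts = counts.rename(columns={"rating": "rating_count"})
both = scores.merge(named_counts, on="Name", how="left")
print(both[["rating", "rating_count"]].values.tolist())
```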


### Getting some details of the data

In [9]:
# Encoding categorical data
user_ids = rating_data["user_id"].unique().tolist()
user2user_encoded = {x: i for i, x in enumerate(user_ids)}
user_encoded2user = {i: x for i, x in enumerate(user_ids)}
rating_data["user"] = rating_data["user_id"].map(user2user_encoded)
n_users = len(user2user_encoded)

anime_ids = rating_data["anime_id"].unique().tolist()
anime2anime_encoded = {x: i for i, x in enumerate(anime_ids)}
anime_encoded2anime = {i: x for i, x in enumerate(anime_ids)}
rating_data["anime"] = rating_data["anime_id"].map(anime2anime_encoded)
n_animes = len(anime2anime_encoded)

print("Num of users: {}, Num of animes: {}".format(n_users, n_animes))
print("Min total rating: {}, Max total rating: {}".format(rating_data['rating'].min(), rating_data['rating'].max()))

Num of users: 30555, Num of animes: 9864
Min total rating: 136, Max total rating: 27651
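The encoding cell maps each raw ID to a consecutive integer in order of first appearance and keeps a reverse mapping for decoding. Here is a small sketch of the same idea on made-up IDs. The comparison with `pd.factorize` is an addition for illustration, and the names `id2index` and `index2id` are hypothetical.

```python
import pandas as pd

# Hypothetical raw IDs with repeats.
raw_ids = pd.Series([67, 6702, 67, 242, 6702])

# Same pattern as above: enumerate the unique values in order of appearance.
unique_ids = raw_ids.unique().tolist()
id2index = {x: i for i, x in enumerate(unique_ids)}
index2id = {i: x for i, x in enumerate(unique_ids)}

encoded = raw_ids.map(id2index)    # raw ID -> dense index
decoded = encoded.map(index2id)    # dense index -> raw ID

# pandas offers the same first-appearance encoding in a single call.
codes, uniques = pd.factorize(raw_ids)
print(encoded.tolist())
```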


In [10]:
# Top 20 users by number of rows
g = rating_data.groupby('user_id')['rating'].count()
top_users = g.dropna().sort_values(ascending=False)[:20]
top_r = rating_data.join(top_users, rsuffix='_r', how='inner', on='user_id')

# Top 20 anime by number of rows
g = rating_data.groupby('anime_id')['rating'].count()
top_animes = g.dropna().sort_values(ascending=False)[:20]
top_r = top_r.join(top_animes, rsuffix='_r', how='inner', on='anime_id')

# Cross-tabulate the top users against the top anime
pivot = pd.crosstab(top_r.user_id, top_r.anime_id, top_r.rating, aggfunc="sum")

In [11]:
pivot.fillna(0, inplace=True)
pivot

user_id \ anime_id,1535,1575,2001,2167,4224,5081,5114,6547,6746,9253,9989,10620,11111,11757,14813,15809,16498,19815,20507,30276
11100,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
20807,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
22022,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
34501,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
50485,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
61445,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
63900,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
68042,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
82327,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507
83196,27110,26043,24552,24646,27298,26124,26062,27397,25271,27019,24821,25771,24347,27550,24380,25165,27651,26084,25458,25507


### We will now create a pivot table with Name as rows and user_id as columns, and save it in a variable named piviot_table

In [12]:
piviot_table = rating_data.pivot_table(index="Name",columns="user_id", values="rating").fillna(0)
piviot_table

Name \ user_id,6,12,17,19,21,42,44,47,53,60,...,162119,162124,162135,162136,162138,162141,162148,162157,162160,162163
"""0""",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"""Bungaku Shoujo"" Kyou no Oyatsu: Hatsukoi",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2687.0,0.0,...,0.0,2687.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"""Bungaku Shoujo"" Memoire",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3312.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"""Bungaku Shoujo"" Movie",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6087.0,6087.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"""Eiji""",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
xxxHOLiC Movie: Manatsu no Yoru no Yume,0.0,0.0,4666.0,0.0,0.0,0.0,0.0,0.0,4666.0,0.0,...,0.0,4666.0,0.0,0.0,0.0,4666.0,0.0,0.0,0.0,0.0
xxxHOLiC Rou,0.0,0.0,4571.0,0.0,0.0,0.0,0.0,0.0,4571.0,0.0,...,0.0,4571.0,0.0,0.0,0.0,4571.0,0.0,0.0,0.0,0.0
xxxHOLiC Shunmuki,0.0,0.0,4977.0,0.0,0.0,0.0,0.0,0.0,4977.0,0.0,...,0.0,4977.0,0.0,0.0,0.0,4977.0,0.0,0.0,0.0,0.0
ēlDLIVE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# Model Creation

![](https://miro.medium.com/max/650/1*OyYyr9qY-w8RkaRh2TKo0w.png)<br>
Source:[towardsdatascience](https://towardsdatascience.com)

### Implementing kNN
The pivot table is already a 2D matrix whose missing values have been filled with zeros (needed because we will calculate distances between its row vectors). We then convert the values of this dataframe into a SciPy sparse matrix for more efficient computation.
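As a small illustration of what the sparse conversion does, the sketch below builds a mostly empty matrix and converts it with `csr_matrix`. The array is made up; only the non-zero entries are stored explicitly.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical mostly-empty table: 4 items x 5 users, zeros mean "no rating".
dense = np.array([
    [0, 0, 3, 0, 0],
    [0, 5, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [4, 0, 0, 0, 1],
], dtype=float)

sparse = csr_matrix(dense)

print(sparse.shape)   # same logical shape as the dense array
print(sparse.nnz)     # number of explicitly stored (non-zero) entries
```

On a table with thousands of titles and tens of thousands of users, most cells are zero, so storing only the non-zero entries can reduce memory use considerably.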

### Finding the Nearest Neighbors
We use the unsupervised nearest-neighbors learner from sklearn.neighbors. The algorithm used to compute the nearest neighbors is "brute", and we specify metric="cosine" so that it measures the cosine distance (1 minus the cosine similarity) between row vectors. Finally, we fit the model.
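To check what the cosine metric returns, the sketch below compares a hand-written cosine distance with the distances reported by `NearestNeighbors` on three made-up vectors. The helper `cosine_distance` is hypothetical and written only for this comparison.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])   # same direction as a
c = np.array([0.0, 0.0, 3.0])   # orthogonal to a

def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Fit on b and c, then query with a.
nn = NearestNeighbors(metric="cosine", algorithm="brute").fit(np.vstack([b, c]))
distances, indices = nn.kneighbors(a.reshape(1, -1), n_neighbors=2)

print(distances[0], indices[0])
```

A vector pointing in the same direction has distance close to 0 regardless of its length, and an orthogonal vector has distance 1, so smaller values mean more similar rows.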

# Test our model and make some recommendations
The kNN algorithm measures distance to determine the "closeness" of instances. In its classification form, it labels an instance with the most common class among its nearest neighbors. Here no classes are involved, so we simply return the nearest anime themselves as recommendations.

In [13]:
from scipy.sparse import csr_matrix
piviot_table_matrix = csr_matrix(piviot_table.values)

In [14]:
from sklearn.neighbors import NearestNeighbors
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(piviot_table_matrix)

NearestNeighbors(algorithm='brute', metric='cosine')

### I will create a predict() function so that every call picks a random anime and prints its 5 closest neighbors as recommendations

In [15]:
def predict():
    # Pick a random row (anime) of the pivot table to use as the query.
    random_anime = np.random.choice(piviot_table.shape[0])

    query = piviot_table.iloc[random_anime, :].values.reshape(1, -1)
    # Ask for 6 neighbors: the nearest one is normally the query anime itself (distance 0).
    distance, suggestions = model.kneighbors(query, n_neighbors=6)
    distance, suggestions = distance.flatten(), suggestions.flatten()

    print('Recommendations for {0}:\n'.format(piviot_table.index[random_anime]))
    # Skip position 0 and print the remaining 5 neighbors.
    for i in range(1, len(distance)):
        print('{0}: {1}, with distance of {2}:'.format(i, piviot_table.index[suggestions[i]], distance[i]))
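A natural variation is to look up recommendations for a chosen title instead of a random one. The sketch below is a hypothetical helper, `recommend_for`, demonstrated on a small made-up table. It is not part of the notebook, and it assumes the table index holds unique titles.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy table: rows are titles, columns are users.
toy_table = pd.DataFrame(
    [[5, 4, 0, 0],
     [4, 5, 0, 1],
     [0, 0, 5, 4],
     [0, 1, 4, 5]],
    index=["A", "B", "C", "D"],
    dtype=float,
)
toy_model = NearestNeighbors(metric="cosine", algorithm="brute")
toy_model.fit(csr_matrix(toy_table.values))

def recommend_for(name, table, fitted_model, k=2):
    """Return up to k (title, distance) pairs closest to `name`, excluding `name` itself."""
    row = table.index.get_loc(name)            # assumes unique titles in the index
    query = table.iloc[row, :].values.reshape(1, -1)
    k_query = min(k + 1, table.shape[0])       # one extra neighbor for the query row itself
    distances, indices = fitted_model.kneighbors(query, n_neighbors=k_query)
    pairs = [(table.index[i], float(d))
             for d, i in zip(distances[0], indices[0]) if i != row]
    return pairs[:k]

result = recommend_for("A", toy_table, toy_model, k=2)
print(result)
```

With the real data, the same function could be called with `piviot_table` and `model` in place of the toy objects. Filtering by row position rather than always dropping the first neighbor avoids returning the query title when another row happens to tie at distance 0.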

# Result

In [16]:
predict()

Recommendations for Precure All Stars DX the Dance Live♥: Miracle Dance Stage e Youkoso:

1: Precure kara Minna e no Ouen Movie, with distance of 0.2759942079304938:
2: Precure All Stars GoGo Dream Live!, with distance of 0.27877387469205195:
3: Precure All Stars Movie New Stage 3: Eien no Tomodachi, with distance of 0.367459445791819:
4: Precure All Stars Movie New Stage: Mirai no Tomodachi, with distance of 0.36893352883159103:
5: Precure All Stars Movie New Stage 2: Kokoro no Tomodachi, with distance of 0.3709758705961854:


## Screenshots of some more results
1) ![](https://raw.githubusercontent.com/everydaycodings/files-for-multiplethings/master/recommandation_results/Screenshot%202021-12-05%2013%3A18%3A39.png)<br>
2) ![](https://raw.githubusercontent.com/everydaycodings/files-for-multiplethings/master/recommandation_results/Screenshot%202021-12-05%2013%3A19%3A14.png)<br>
3) ![](https://raw.githubusercontent.com/everydaycodings/files-for-multiplethings/master/recommandation_results/Screenshot%202021-12-05%2013%3A19%3A44.png)<br>