## Recommendation System

### Objective:
The objective of this assignment is to implement a recommendation system using cosine similarity on an anime dataset.
Dataset:
Use the Anime Dataset which contains information about various anime, including their titles, genres,No.of episodes and user ratings etc.


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

In [4]:
# Setting up the plot aesthetics
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 300
warnings.filterwarnings('ignore')
sns.set_theme(style='darkgrid', palette='magma')
%matplotlib inline

In [5]:
# Loading the dataset
df_anime = pd.read_csv(r"https://raw.githubusercontent.com/rohitmaind/ExcelR_Assignments/main/Datasets/anime.csv")

In [6]:
# Display the first few rows of the dataframe
df_anime.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [7]:
# Data Preprocessing: Handling missing values
print("Missing values before cleaning:")
print(df_anime.isnull().sum())

Missing values before cleaning:
anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64


In [8]:
# Dropping rows with missing values
df_anime_cleaned = df_anime.dropna()
print("\nMissing values after cleaning:")
print(df_anime_cleaned.isnull().sum())


Missing values after cleaning:
anime_id    0
name        0
genre       0
type        0
episodes    0
rating      0
members     0
dtype: int64


In [9]:
# Checking the uniqueness of some important columns
print("\nUnique anime IDs:", df_anime_cleaned['anime_id'].nunique())
print("Unique anime names:", df_anime_cleaned['name'].nunique())
print("Unique genres:", df_anime_cleaned['genre'].nunique())
print("Unique broadcast types:", df_anime_cleaned['type'].nunique())


Unique anime IDs: 12017
Unique anime names: 12015
Unique genres: 3229
Unique broadcast types: 6


In [10]:
# Splitting the genre column into individual genres
def split_genre(txt):
    return [x.strip() for x in txt.split(',')]

genres_split = df_anime_cleaned['genre'].apply(split_genre)


In [11]:
# Creating a pivot table with anime_id as rows and genres as columns
pivot_table = pd.pivot_table(data=df_anime_cleaned, index='anime_id', columns='genre', values='rating', fill_value=0)

In [12]:
# Fill any missing values with zero
pivot_table.fillna(0, inplace=True)


In [13]:
# Verify the pivot table structure
print("\nPivot Table:")
print(pivot_table.head())


Pivot Table:
genre     Action  Action, Adventure  \
anime_id                              
1            0.0                0.0   
5            0.0                0.0   
6            0.0                0.0   
7            0.0                0.0   
8            0.0                0.0   

genre     Action, Adventure, Cars, Comedy, Sci-Fi, Shounen  \
anime_id                                                     
1                                                      0.0   
5                                                      0.0   
6                                                      0.0   
7                                                      0.0   
8                                                      0.0   

genre     Action, Adventure, Cars, Mecha, Sci-Fi, Shounen, Sports  \
anime_id                                                            
1                                                       0.0         
5                                                       0.0         
6

In [15]:
# Computing cosine similarity matrix
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim_matrix = cosine_similarity(pivot_table)
cosine_sim_df = pd.DataFrame(cosine_sim_matrix)

In [16]:
# Adjust the diagonal of the similarity matrix to 0, as we don't want to consider the anime itself in recommendations
np.fill_diagonal(cosine_sim_matrix, 0)

In [17]:
# Updating the dataframe with proper index and column names
cosine_sim_df.index = df_anime_cleaned['anime_id'].unique()
cosine_sim_df.columns = df_anime_cleaned['anime_id'].unique()

In [18]:
# Display the similarity dataframe
pd.set_option('display.max_columns', None)
print("\nCosine Similarity DataFrame:")
print(cosine_sim_df.head())


[1;30;43mStreaming output truncated to the last 5000 lines.[0m
9253     0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
9969     0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   

       18549  15095  15015  1960   34009  33208  20987  5849   20655  810    \
32281    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
5114     0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
28977    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
9253     0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
9969     0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   

       999    9881   14317  15325  7578   9291   23787  30237  16101  3280   \
32281    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
5114     0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   
28977    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0   

In [19]:
# Finding the most similar anime for each anime based on cosine similarity
most_similar_anime = cosine_sim_df.idxmax(axis=1)

In [20]:
# Display the results for the top few animes
for anime_id in most_similar_anime.head().index:
    print("\nFor Anime ID:", anime_id)
    print("Most similar Anime ID:", most_similar_anime[anime_id])
    similar_animes = df_anime_cleaned[(df_anime_cleaned['anime_id'] == anime_id) | (df_anime_cleaned['anime_id'] == most_similar_anime[anime_id])]
    print(similar_animes[['anime_id', 'name', 'genre', 'type', 'episodes', 'rating', 'members']])



For Anime ID: 32281
Most similar Anime ID: 8800
      anime_id                           name  \
0        32281                 Kimi no Na wa.   
3456      8800  Senkou no Night Raid Specials   

                                          genre     type episodes  rating  \
0          Drama, Romance, School, Supernatural    Movie        1    9.37   
3456  Action, Historical, Military, Super Power  Special        3    6.99   

      members  
0      200630  
3456     3667  

For Anime ID: 5114
Most similar Anime ID: 32281
   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   

   members  
0   200630  
1   793665  

For Anime ID: 28977
Most similar Anime ID: 86

In [21]:
# Example queries
query_anime_ids = [32281, 868, 33032, 9316, 5114]

for q_id in query_anime_ids:
    print(f"\nQuery for Anime ID: {q_id}")
    print(df_anime_cleaned[df_anime_cleaned['anime_id'] == q_id][['anime_id', 'name', 'genre', 'type', 'episodes', 'rating', 'members']])



Query for Anime ID: 32281
   anime_id            name                                 genre   type  \
0     32281  Kimi no Na wa.  Drama, Romance, School, Supernatural  Movie   

  episodes  rating  members  
0        1    9.37   200630  

Query for Anime ID: 868
      anime_id           name                                       genre  \
1917       868  Slayers Great  Adventure, Comedy, Fantasy, Magic, Shounen   

       type episodes  rating  members  
1917  Movie        1    7.41    12760  

Query for Anime ID: 33032
     anime_id                       name  \
455     33032  Drifters: Special Edition   

                                                 genre type episodes  rating  \
455  Action, Adventure, Comedy, Fantasy, Historical...  OVA        1    8.06   

     members  
455     9807  

Query for Anime ID: 9316
       anime_id                                          name   genre type  \
12289      9316  Toushindai My Lover: Minami tai Mecha-Minami  Hentai  OVA   

      epis

## Interview Questions:

### 1. Difference between User-Based and Item-Based Collaborative Filtering:
- **User-Based**: Recommends items based on the preferences of similar users.
- **Item-Based**: Recommends items similar to those the user has liked.

### 2. What is Collaborative Filtering?
- **Definition**: A technique that recommends items by leveraging the preferences of many users.
- **Types**: User-Based (similar users) and Item-Based (similar items).
- **Challenges**: Sparsity, scalability, and cold start issues.