# **Recommendation System**

The objective of this assignment is to implement a recommendation system using cosine similarity on an anime dataset.

**Dataset:** Use the Anime Dataset, which contains information about various anime, including their titles, genres, number of episodes, and user ratings.


# Data Preprocessing:

In [1]:
# Import the required libraries
import pandas as pd
import numpy as np
from scipy.sparse import hstack  # combines sparse matrices column-wise (the cells below use np.hstack on dense arrays instead)
from sklearn.metrics.pairwise import cosine_similarity  # measures how similar two vectors are in direction, regardless of magnitude
from sklearn.preprocessing import MultiLabelBinarizer, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

In [3]:
df = pd.read_csv('/content/drive/MyDrive/Python excelr/anime.csv') #load the data set
df

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 12289 | 9316 | Toushindai My Lover: Minami tai Mecha-Minami | Hentai | OVA | 1 | 4.15 | 211 |
| 12290 | 5543 | Under World | Hentai | OVA | 1 | 4.28 | 183 |
| 12291 | 5621 | Violence Gekiga David no Hoshi | Hentai | OVA | 4 | 4.88 | 219 |
| 12292 | 6133 | Violence Gekiga Shin David no Hoshi: Inma Dens... | Hentai | OVA | 1 | 4.98 | 175 |
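Before the missing-value cell drops rows, it can help to check how many values each column is missing and which rows would be removed. A minimal sketch on a made-up four-row frame (the column names mirror anime.csv; the values do not); on the real data the same calls would run on `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with anime.csv-style columns (values made up for illustration)
sample = pd.DataFrame({
    "anime_id": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"],
    "genre": ["Drama, Romance", None, "Action", "Comedy"],
    "rating": [8.1, 7.5, np.nan, 6.9],
})

# Count missing values per column
missing = sample.isna().sum()
print(missing)

# Rows that dropna(subset=['genre', 'rating']) would remove
to_drop = sample[sample[["genre", "rating"]].isna().any(axis=1)]
print(to_drop["name"].tolist())  # ['B', 'C']
```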


In [7]:
# Handle missing values (drop rows with missing genre or rating)
df = df.dropna(subset=['genre', 'rating'])
df

| | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 12289 | 9316 | Toushindai My Lover: Minami tai Mecha-Minami | Hentai | OVA | 1 | 4.15 | 211 |
| 12290 | 5543 | Under World | Hentai | OVA | 1 | 4.28 | 183 |
| 12291 | 5621 | Violence Gekiga David no Hoshi | Hentai | OVA | 4 | 4.88 | 219 |
| 12292 | 6133 | Violence Gekiga Shin David no Hoshi: Inma Dens... | Hentai | OVA | 1 | 4.98 | 175 |
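`dropna` keeps the original index labels, so after this cell the labels have gaps and no longer match row positions. The feature and similarity matrices built later are indexed by position, so a label used as a row number can point at the wrong anime. A toy illustration on made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; row 1 has a missing genre
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "genre": ["Drama", None, "Action", "Comedy"],
    "rating": [8.1, 7.5, 7.0, 6.9],
})
clean = toy.dropna(subset=["genre", "rating"])

# Index labels keep the gap left by the dropped row
labels = clean.index.tolist()
# Label of "D" versus its row position in the filtered frame
label_of_d = clean.index[(clean["name"] == "D").to_numpy()][0]
pos_of_d = int(np.flatnonzero(clean["name"].to_numpy() == "D")[0])
print(labels, label_of_d, pos_of_d)  # [0, 2, 3] 3 2

# Resetting the index makes labels and positions agree again
reset = clean.reset_index(drop=True)
print(reset.index.tolist())  # [0, 1, 2]
```

Either resetting the index right after `dropna` or looking rows up by position avoids the mismatch.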


# Feature Extraction:

In [9]:
# Split each comma-separated genre string into a list of trimmed genre names, stored in a new genre_list column
df['genre_list'] = df['genre'].apply(lambda x: [i.strip() for i in x.split(',')])
df['genre_list']

| | genre_list |
|---|---|
| 0 | [Drama, Romance, School, Supernatural] |
| 1 | [Action, Adventure, Drama, Fantasy, Magic, Mil... |
| 2 | [Action, Comedy, Historical, Parody, Samurai, ... |
| 3 | [Sci-Fi, Thriller] |
| 4 | [Action, Comedy, Historical, Parody, Samurai, ... |
| ... | ... |
| 12289 | [Hentai] |
| 12290 | [Hentai] |
| 12291 | [Hentai] |
| 12292 | [Hentai] |
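The `strip()` in the cell above matters: genre strings are separated by a comma plus a space, so splitting on `','` alone leaves a leading space on every genre after the first, and the same genre would then be counted as two different labels. A small sketch on two made-up genre strings:

```python
genres = ["Drama, Romance", "Romance, School"]

# Splitting on ',' alone keeps the space that follows each comma
raw_labels = sorted({x for g in genres for x in g.split(",")})
# Stripping each piece yields one label per genre
clean_labels = sorted({x.strip() for g in genres for x in g.split(",")})

print(raw_labels)    # [' Romance', ' School', 'Drama', 'Romance']
print(clean_labels)  # ['Drama', 'Romance', 'School']
```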


In [11]:
# Convert categorical features into numerical representations:
# MultiLabelBinarizer turns each list of genres into a row of 0/1 indicators, one column per genre
mlb = MultiLabelBinarizer()  # create a MultiLabelBinarizer instance
genre_matrix = mlb.fit_transform(df['genre_list'])


In [12]:
mlb

In [13]:
genre_matrix

array([[0, 0, 0, ..., 0, 0, 0],
       [1, 1, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])
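To see what each row of `genre_matrix` encodes, here is the same `MultiLabelBinarizer` call on three made-up genre lists. The learned `classes_` (sorted genre names) give the column order, so in the notebook `mlb.classes_` tells which genre each column of `genre_matrix` stands for:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up genre lists standing in for df['genre_list']
genre_lists = [["Drama", "Romance"], ["Action", "Drama"], ["Sci-Fi"]]

mlb = MultiLabelBinarizer()
matrix = mlb.fit_transform(genre_lists)

print(mlb.classes_.tolist())  # ['Action', 'Drama', 'Romance', 'Sci-Fi']
print(matrix.tolist())        # [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
```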

In [15]:
# Normalize numerical features: scale rating to the [0, 1] range
scaler = MinMaxScaler()
df['rating_norm'] = scaler.fit_transform(df[['rating']])
scaler

In [16]:
df['rating_norm']

| | rating_norm |
|---|---|
| 0 | 0.924370 |
| 1 | 0.911164 |
| 2 | 0.909964 |
| 3 | 0.900360 |
| 4 | 0.899160 |
| ... | ... |
| 12289 | 0.297719 |
| 12290 | 0.313325 |
| 12291 | 0.385354 |
| 12292 | 0.397359 |
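`MinMaxScaler` maps each value with x' = (x - min) / (max - min). The values printed above are consistent with a rating range of 1.67 to 10.0; note that this range is inferred from the outputs, not printed by the notebook. A quick arithmetic check under that assumption:

```python
import numpy as np

# Assumed range, inferred from the displayed rating_norm values
r_min, r_max = 1.67, 10.0
ratings = np.array([9.37, 9.26, 4.15, 4.98])  # ratings of rows 0, 1, 12289, 12292

# Min-max scaling: x' = (x - min) / (max - min)
scaled = (ratings - r_min) / (r_max - r_min)
print(np.round(scaled, 6))  # matches rating_norm for rows 0, 1, 12289, 12292
```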


In [17]:
# Concatenate genre matrix and normalized rating to create the feature matrix
features = np.hstack([genre_matrix, df[['rating_norm']].values])
print(type(features))

<class 'numpy.ndarray'>
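In this feature matrix each anime has one indicator column per genre but only a single rating column, so the genre columns dominate the cosine score. One possible adjustment is to scale the rating column before concatenating. A minimal sketch on made-up rows, assuming a hypothetical `rating_weight` parameter that is not part of the notebook:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up data: three genre indicator columns and one normalized rating column
genre_matrix = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
rating_norm = np.array([[0.9], [0.2], [0.9]])

def build_features(rating_weight=1.0):
    """Concatenate genres with a scaled rating column (rating_weight is hypothetical)."""
    return np.hstack([genre_matrix, rating_weight * rating_norm])

# Similarity between rows 0 and 1 (same genres, different ratings) for two weights
pair_sim = {w: cosine_similarity(build_features(w))[0, 1] for w in (1.0, 3.0)}
for w, s in pair_sim.items():
    print(f"rating_weight={w}: similarity(row 0, row 1) = {s:.3f}")
```

A larger weight makes the rating gap count for more, so two same-genre titles with very different ratings end up less similar. The right weight would have to be tuned against some evaluation.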


# Recommendation System:

In [18]:
# Compute the pairwise cosine similarity between all rows (anime) of the feature matrix;
# the recommendation function below looks up rows of this matrix by position
similarity_matrix = cosine_similarity(features)
similarity_matrix

array([[1.        , 0.29880771, 0.13644987, ..., 0.15085865, 0.15492584,
        0.1737458 ],
       [0.29880771, 1.        , 0.36135915, ..., 0.11708593, 0.12024259,
        0.13484933],
       [0.13644987, 0.36135915, 1.        , ..., 0.116948  , 0.12010094,
        0.13469047],
       ...,
       [0.15085865, 0.11708593, 0.116948  , ..., 1.        , 0.99994581,
        0.99824985],
       [0.15492584, 0.12024259, 0.12010094, ..., 0.99994581, 1.        ,
        0.99881138],
       [0.1737458 , 0.13484933, 0.13469047, ..., 0.99824985, 0.99881138,
        1.        ]])
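Each entry of `similarity_matrix` is cos(a, b) = (a · b) / (‖a‖ ‖b‖) for two feature rows a and b. The matrix is n × n, so its memory grows with the square of the number of anime. A small check that the formula and `cosine_similarity` agree on two made-up feature vectors:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up feature rows: three genre flags plus a normalized rating
a = np.array([1.0, 0.0, 1.0, 0.5])
b = np.array([1.0, 1.0, 0.0, 0.5])

# cos(a, b) = (a . b) / (||a|| * ||b||)
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]
print(manual, library)  # both equal 1.25 / 2.25
```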

Given a target anime, recommend a list of similar anime based on cosine similarity scores.
Experiment with different threshold values for similarity scores to adjust the recommendation list size.
Analyze the performance of the recommendation system and identify areas of improvement.
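The threshold and `top_n` interact: the threshold decides how many candidates qualify, and `top_n` then caps the list. A sketch of a threshold sweep on made-up similarity scores for one target anime (target excluded):

```python
import numpy as np

# Made-up similarity scores of one target anime against six other titles
scores = np.array([0.99, 0.93, 0.81, 0.72, 0.55, 0.30])

# Count how many candidates survive each threshold before the top_n cut is applied
counts = {t: int((scores >= t).sum()) for t in (0.5, 0.7, 0.9)}
for t, n in counts.items():
    print(f"threshold={t}: {n} candidates")
```

With the notebook's function, a similar sweep could be run as `len(recommend_anime('One Piece', top_n=len(df), similarity_threshold=t))` for each threshold `t`.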


In [20]:
def recommend_anime(anime_name, top_n=5, similarity_threshold=0.7):
    # find the row position of the target anime; similarity_matrix is indexed by position,
    # while df keeps gaps in its index labels after dropna, so a label lookup could hit the wrong row
    matches = np.flatnonzero(df['name'].str.lower().to_numpy() == anime_name.lower())
    if matches.size == 0:
        # anime_name not found: print a message and return an empty list
        print(f"Anime '{anime_name}' not found in the dataset.")
        return []
    idx = matches[0]

    # pair each row position with its similarity score to the target anime
    # (enumerate is a Python built-in that adds an index to an iterable)
    sim_scores = list(enumerate(similarity_matrix[idx]))

    # sort the similarity scores in descending order
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # build the recommendation list
    recommendations = [
        # anime name and similarity score for each candidate
        (df.iloc[i]['name'], score)
        for i, score in sim_scores
        # skip the target itself explicitly (other anime can tie with it at similarity 1.0)
        # and keep only anime with similarity >= threshold
        if i != idx and score >= similarity_threshold
    ][:top_n]  # limit the list to top_n recommendations

    # return the final list of recommended anime
    return recommendations

In [21]:
# Example:
print("Recommendations for 'One Piece':")
for anime, score in recommend_anime('One Piece', top_n=5):
    print(f"{anime} (similarity: {score:.2f})")

Recommendations for 'One Piece':
One Piece: Episode of Merry - Mou Hitori no Nakama no Monogatari (similarity: 1.00)
One Piece: Episode of Nami - Koukaishi no Namida to Nakama no Kizuna (similarity: 1.00)
One Piece: Episode of Sabo - 3 Kyoudai no Kizuna Kiseki no Saikai to Uketsugareru Ishi (similarity: 1.00)
One Piece Film: Strong World (similarity: 0.93)
One Piece Film: Z (similarity: 0.93)
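The dataset has no ground-truth "relevant" labels, so evaluating these lists needs a proxy. One simple, assumed proxy is precision@k where a recommendation counts as relevant if it shares at least one genre with the target. It is somewhat circular here, because the recommender already ranks by genre overlap; held-out user ratings would give a stronger test. A sketch with a hypothetical `precision_at_k` helper on made-up genre lists:

```python
def precision_at_k(target_genres, recommended_genres, k):
    """Share of the top-k recommendations that share at least one genre with the target.

    Relevance = 'shares a genre' is an assumed proxy, not a ground-truth label.
    Divides by the number of recommendations actually returned (at most k).
    """
    top_k = recommended_genres[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for genres in top_k if set(genres) & set(target_genres))
    return hits / len(top_k)

# Made-up target and recommendation genres
target = ["Action", "Adventure", "Comedy"]
recs = [["Action", "Comedy"], ["Adventure"], ["Romance"], ["Drama", "Action"], ["Slice of Life"]]
print(precision_at_k(target, recs, k=5))  # 0.6
```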


# Interview Questions:

1. Can you explain the difference between user-based and item-based collaborative filtering?\
a. **User-based CF:**\
Finds users similar to the target user.\
Recommends items liked by those similar users.\
Works well if users have consistent tastes.\
Can be expensive for large user bases because it compares every pair of users.\
**Item-based CF:**\
Finds items similar to the items the user has liked.\
Recommends those similar items.\
More stable over time and less affected by the number of users.\
Needs enough ratings per item to work effectively.\
**Key takeaway:**\
User-based CF = “people like you liked this”\
Item-based CF = “because you liked this, you may like similar items”
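The difference comes down to which axis of the user-item matrix is compared: user-based CF compares rows (users), item-based CF compares columns (items). A sketch on a made-up ratings matrix, treating 0 as "not rated" (a simplification; it counts missing ratings as zeros in the cosine):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up user-item matrix: rows = users A, B, C; columns = items; 0 = not rated
ratings = np.array([
    [5, 4, 0, 1],  # user A
    [4, 5, 1, 0],  # user B
    [1, 0, 5, 4],  # user C
])

user_sim = cosine_similarity(ratings)    # user-based CF: users x users
item_sim = cosine_similarity(ratings.T)  # item-based CF: items x items

# Most similar user to A, excluding A itself
others = user_sim[0].copy()
others[0] = -1.0
most_similar = int(np.argmax(others))
print(user_sim.shape, item_sim.shape, most_similar)  # (3, 3) (4, 4) 1
```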


2. What is collaborative filtering, and how does it work?\
a. **Collaborative filtering** (CF) is a recommendation technique that suggests items (movies, products, songs, etc.) to a user based on either:\
the preferences of similar users, or\
the similarity between items.\
It does not require any information about the items themselves; it works purely from user-item interactions such as ratings, clicks, or purchases.\
**How it works:**\
Create a user-item matrix where rows are users, columns are items, and values represent interactions (ratings, likes, etc.).\
Compute similarity either between users (user-based CF) or between items (item-based CF) using metrics such as cosine similarity or Pearson correlation.\
Predict unknown preferences from these similarities.\
Recommend the top items the user has not interacted with yet.\
*Example:*\
User-based: user A and user B both rated anime1 and anime2 highly. If user A also liked anime3, recommend anime3 to user B.\
Item-based: if anime1 and anime3 are similar and user B liked anime1, recommend anime3 to user B.
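The "predict unknown preferences" step can be sketched as a similarity-weighted average: user B's predicted rating for anime3 is the average of other users' anime3 ratings, weighted by how similar each user is to B. A minimal user-based sketch on a made-up matrix (0 = not rated); real systems often mean-center ratings first (as Pearson correlation does), which this sketch omits:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Made-up ratings: rows = users A, B, C; columns = anime1, anime2, anime3; 0 = not rated
ratings = np.array([
    [5.0, 4.0, 5.0],  # user A liked anime3
    [5.0, 4.0, 0.0],  # user B has not rated anime3
    [1.0, 2.0, 2.0],  # user C
])

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    sims = cosine_similarity(ratings)[user]
    # other users who actually rated the item
    raters = [u for u in range(ratings.shape[0]) if u != user and ratings[u, item] > 0]
    if not raters:
        return float("nan")
    weights = sims[raters]
    values = ratings[raters, item]
    return float(weights @ values / weights.sum())

pred = predict_user_based(ratings, user=1, item=2)
print(round(pred, 2))
```

Because B is more similar to A than to C, the prediction leans toward A's high rating for anime3.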