# 🎬 Anime Recommendation System Using Cosine Similarity

## Background & Objective
Anime fans often look for recommendations that match their preferences in terms of genres, type, ratings, and popularity. A recommendation system helps by finding anime that are similar to the one a user likes.
The objective of this assignment is to implement a content-based recommendation system using cosine similarity on an Anime dataset.
We will:
1.	Preprocess the dataset.
2.	Extract features from genres, episode counts, ratings, and member counts.
3.	Compute cosine similarity between anime titles.
4.	Build a recommendation function.
5.	Test with an example (e.g., Naruto).
6.	Interpret results and provide insights.


## 1. Import Required Libraries & Load Dataset

In [3]:

#  1. Import Libraries
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Load Dataset
file_path=r"D:\Data sciences\Assignments\Assignment files\Assignment files Extracs\Recommendation System\anime.csv"
df = pd.read_csv(file_path)   # Change filename if needed
print(df.info())
print(df.head())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB
None
   anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          Gintama°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adven

### Explanation:
•	The dataset has 12,294 anime records.

•	Missing values exist in genre, type, and rating.

•	The episodes column is stored as text, so we need to convert it into numbers.


## 2. Data Preprocessing

In [5]:

# 2. Data Preprocessing

# Handle missing values
df['genre'] = df['genre'].fillna("Unknown")
df['type'] = df['type'].fillna("Unknown")
df['rating'] = df['rating'].fillna(df['rating'].mean())

# Convert episodes column to numeric
df['episodes'] = pd.to_numeric(df['episodes'], errors='coerce')
df['episodes'] = df['episodes'].fillna(df['episodes'].median())


df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12294 non-null  object 
 3   type      12294 non-null  object 
 4   episodes  12294 non-null  float64
 5   rating    12294 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(2), int64(2), object(3)
memory usage: 672.5+ KB


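### Explanation:
•	Missing values are filled (genre and type with "Unknown", rating with the column mean), so every column now has 12,294 non-null entries.

•	episodes is converted to numeric (float64); entries that cannot be parsed become NaN and are replaced with the median.

•	All features are now ready for modeling.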
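
The coercion step can be illustrated on a tiny frame. This is only a sketch: the `toy` frame is made up, and the string "Unknown" is an assumed placeholder for a non-numeric episode count, not something confirmed by the output above.

```python
import pandas as pd

# Hypothetical toy frame mimicking the 'episodes' and 'rating' columns
toy = pd.DataFrame({
    "episodes": ["1", "64", "Unknown", "24"],  # "Unknown" is an assumed placeholder
    "rating": [9.37, 9.26, None, 9.17],
})

# errors="coerce" turns strings that cannot be parsed into NaN instead of raising
toy["episodes"] = pd.to_numeric(toy["episodes"], errors="coerce")
n_coerced = int(toy["episodes"].isna().sum())

# Fill the gaps the same way as the preprocessing cell: median and mean
toy["episodes"] = toy["episodes"].fillna(toy["episodes"].median())
toy["rating"] = toy["rating"].fillna(toy["rating"].mean())

print(n_coerced)
print(toy)
```

Counting the NaN values right after the coercion is a cheap way to see how many rows will receive the imputed median.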


## 3. Feature Engineering

In [6]:

# 3. Feature Extraction

# Genres as bag-of-words
count = CountVectorizer(tokenizer=lambda x: x.split(", "))
genre_matrix = count.fit_transform(df['genre'])

# Normalize numerical features (episodes, rating, members)
scaler = MinMaxScaler()
num_features = scaler.fit_transform(df[['episodes','rating','members']])

# Combine all features (genres + numerical features)
from scipy.sparse import hstack
features = hstack([genre_matrix, num_features])




### Explanation:
•	CountVectorizer → converts genre text into numbers (bag-of-words).

•	MinMaxScaler → normalizes episodes, ratings, and members (0–1 scale).

•	hstack → combines genre + numerical features into one feature matrix.
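
To see what the combined matrix looks like, here is a minimal sketch on three made-up rows. The `genres` and `nums` values are hypothetical; only the three calls mirror the cell above.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: three titles with genre strings and two numeric columns
genres = ["Action, Comedy", "Action, Sci-Fi", "Drama"]
nums = np.array([[220, 7.8], [24, 9.2], [1, 8.5]])  # e.g. episodes, rating

# Split on ", " so multi-word genres stay whole; text is lowercased by default
vec = CountVectorizer(tokenizer=lambda s: s.split(", "))
g = vec.fit_transform(genres)
print(sorted(vec.vocabulary_))  # ['action', 'comedy', 'drama', 'sci-fi']
print(g.toarray())

# Each numeric column is rescaled independently to the range [0, 1]
scaled = MinMaxScaler().fit_transform(nums)

# One row per title: 4 genre columns followed by 2 scaled numeric columns
combined = hstack([g, scaled]).tocsr()
print(combined.shape)  # (3, 6)
```

Because each genre column is a 0/1 count and each numeric column is squeezed into [0, 1], titles with several shared genres will tend to be driven mostly by genre overlap. Whether that weighting is desirable depends on the use case.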


## 4. Cosine Similarity

In [7]:

# 4. Cosine Similarity
cosine_sim = cosine_similarity(features, features)


### Explanation:
•	Computes similarity scores between every pair of anime, giving a 12,294 × 12,294 matrix.

•	Scores range from 0 (not similar) to 1 (very similar), because every feature here is non-negative.
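
As a sanity check on what the score means, the sketch below computes cosine similarity by hand (dot product divided by the product of the vector lengths) and compares it with scikit-learn. The three vectors are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature vectors: three genre flags plus one scaled numeric value
a = np.array([1.0, 1.0, 0.0, 0.5])
b = np.array([1.0, 0.0, 1.0, 0.5])
c = np.array([0.0, 0.0, 1.0, 0.0])

def cosine(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

manual_ab = cosine(a, b)                        # 1.25 / (1.5 * 1.5)
sk = cosine_similarity(np.vstack([a, b, c]))    # full 3 x 3 matrix

print(round(manual_ab, 4))
print(sk.round(4))
```

Here `a` and `c` share no non-zero position, so their score is 0, while the diagonal of the matrix is 1 because every vector is identical to itself.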


## 5. Recommendation Function

In [8]:

# 5. Recommendation Function
def recommend_anime(title, top_n=5):
    if title not in df['name'].values:
        return f"X '{title}' not found in dataset."

    # Get index of target anime
    idx = df[df['name'] == title].index[0]

    # Get similarity scores
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort by similarity, excluding the target itself by index
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]

    # Get anime indices
    anime_indices = [i[0] for i in sim_scores]

    # Return recommended anime
    return df[['name','genre','type','rating']].iloc[anime_indices]


### Explanation:
•	Finds the chosen anime in the dataset.

•	Gets similarity scores with all others.

•	Sorts by highest similarity and returns the Top N recommendations.
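
A full 12,294 × 12,294 matrix of float64 values takes roughly 1.2 GB. One possible alternative, sketched below on made-up data, is to compute only the row for the requested title. The names `toy`, `toy_features`, and `recommend_row` are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue and feature matrix (one row per title)
toy = pd.DataFrame({"name": ["A", "B", "C", "D"]})
toy_features = np.array([
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.1],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])

def recommend_row(title, top_n=2):
    """Return the top_n most similar rows, computing one similarity row on demand."""
    matches = toy.index[toy["name"] == title]
    if len(matches) == 0:
        return None
    idx = matches[0]  # toy uses a default RangeIndex, so label == position
    # Similarity of the target row against every row: shape (n_titles,)
    scores = cosine_similarity(toy_features[idx:idx + 1], toy_features).ravel()
    scores[idx] = -np.inf  # exclude the target itself by position
    top = np.argsort(scores)[::-1][:top_n]
    return toy.assign(similarity=scores).iloc[top]

result = recommend_row("A")
print(result)
```

Masking the target by position, rather than dropping the first sorted entry, also keeps the result correct when another title has an identical feature vector.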


## 6. Example Test (Naruto)

In [9]:

# 6. Test the System
print(" Example Recommendations for 'Naruto':")
print(recommend_anime("Naruto", top_n=5))


 Example Recommendations for 'Naruto':
                                                   name  \
615                                  Naruto: Shippuuden   
1472        Naruto: Shippuuden Movie 4 - The Lost Tower   
1573  Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...   
486                            Boruto: Naruto the Movie   
1343                                        Naruto x UT   

                                                  genre   type  rating  
615   Action, Comedy, Martial Arts, Shounen, Super P...     TV    7.94  
1472  Action, Comedy, Martial Arts, Shounen, Super P...  Movie    7.53  
1573  Action, Comedy, Martial Arts, Shounen, Super P...  Movie    7.50  
486   Action, Comedy, Martial Arts, Shounen, Super P...  Movie    8.03  
1343  Action, Comedy, Martial Arts, Shounen, Super P...    OVA    7.58  


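### Explanation:
•	For "Naruto", the system recommends Naruto: Shippuuden, two Naruto: Shippuuden movies, Boruto: Naruto the Movie, and Naruto x UT.

•	All five belong to the Naruto franchise and list the same leading genres, which suggests the model groups related anime together.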


## 🔍 7. Conclusion
•	Built a content-based recommendation system using cosine similarity.

•	Used genres, episodes, ratings, and members as features.

•	The system returns closely related titles for the tested example.

•	For “Naruto”, all top recommendations belong to the same franchise/genre.


## 🎤 Interview Questions
Q1. What is the difference between User-based and Item-based Collaborative Filtering?

•	User-based: Finds users with similar interests, then recommends what they liked.

•	Item-based: Finds items similar to the one a user likes and recommends them.
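
The two variants differ only in which axis of the ratings matrix is compared. A minimal sketch, assuming a made-up `ratings` matrix with users as rows and items as columns:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings: rows = 4 users, columns = 3 items, 0 = not rated
ratings = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [0, 1, 5],
    [1, 0, 4],
])

# User-based: compare rows (users) with each other
user_sim = cosine_similarity(ratings)      # shape (4, 4)

# Item-based: compare columns (items) with each other
item_sim = cosine_similarity(ratings.T)    # shape (3, 3)

def most_similar(sim, i):
    """Index of the entry most similar to i, ignoring i itself."""
    row = sim[i].copy()
    row[i] = -np.inf
    return int(np.argmax(row))

print(most_similar(user_sim, 0))  # nearest neighbour of user 0
print(most_similar(item_sim, 0))  # nearest neighbour of item 0
```

In this toy matrix, user 0 is closest to user 1 and item 0 is closest to item 1, since those pairs have nearly the same rating pattern.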

Q2. What is Collaborative Filtering and how does it work?

•	Collaborative Filtering uses past user behavior (ratings, purchases, views) to recommend new items.

•	Example: If users who liked Naruto also liked One Piece, then One Piece is recommended.
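
The "users who liked X also liked Y" idea can be sketched with a simple co-occurrence count. The `likes` matrix below is invented purely for illustration and does not come from the dataset.

```python
import numpy as np

titles = ["Naruto", "One Piece", "Death Note"]

# Hypothetical like matrix: rows = users, columns = titles, 1 = liked
likes = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])

# co_likes[i, j] = number of users who liked both title i and title j
co_likes = likes.T @ likes

naruto = titles.index("Naruto")
counts = co_likes[naruto].copy()
counts[naruto] = -1  # ignore the title itself
best = titles[int(np.argmax(counts))]

print(co_likes)
print(best)
```

With this toy data, two users liked both Naruto and One Piece and only one liked both Naruto and Death Note, so One Piece is the suggestion. Note that no genre or rating features are involved, only user behavior.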

Q3. Why is Cosine Similarity used here?

•	Cosine similarity checks the angle between feature vectors.

•	If the angle is small → the items are similar.

•	It works well for text (like genres) and normalized numeric features.
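
The angle interpretation can be made concrete with a short sketch. The vectors and the helper `angle_degrees` are hypothetical and chosen so the angles are easy to verify by hand.

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between two vectors, derived from their cosine similarity."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against tiny floating-point overshoot outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

u = np.array([1.0, 1.0, 0.0])
same_direction = 3 * u                    # longer vector, same direction
partial = np.array([1.0, 0.0, 0.0])       # shares one of two active features
orthogonal = np.array([0.0, 0.0, 1.0])    # shares no active feature

print(angle_degrees(u, same_direction))
print(angle_degrees(u, partial))
print(angle_degrees(u, orthogonal))
```

Scaling a vector changes its length but not its direction, so `u` and `3 * u` are at an angle of about 0 degrees. The partial overlap gives about 45 degrees, and vectors with no shared feature are at 90 degrees.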
