# ============================================
# Anime Recommendation System using Cosine Similarity
# ============================================

# 1. Import Required Libraries
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# ============================================
# 2. Data Preprocessing
# ============================================

# Load dataset
anime_df = pd.read_csv("/content/anime.csv")

# Display basic information
print("Dataset Shape:", anime_df.shape)
print("\nFirst 5 Rows:\n", anime_df.head())

# Handle missing values for genre, rating, and members
anime_df.fillna({
    'genre': '',
    'rating': anime_df['rating'].mean(),
    'members': anime_df['members'].mean()
}, inplace=True)

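# Convert 'episodes' to numeric, coercing errors to NaN, then fill NaNs with the mean.
# Assign the result back instead of calling fillna(inplace=True) on the column:
# the chained inplace call operates on a copy and triggers a pandas warning.
anime_df['episodes'] = pd.to_numeric(anime_df['episodes'], errors='coerce')
anime_df['episodes'] = anime_df['episodes'].fillna(anime_df['episodes'].mean())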

# ============================================
# 3. Feature Selection & Extraction
# ============================================

# Selecting features
features = anime_df[['name', 'genre', 'rating', 'episodes', 'members']]

# Convert genres to numerical representation using TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
genre_matrix = tfidf.fit_transform(features['genre'])

# Normalize numerical features
scaler = MinMaxScaler()
numeric_features = scaler.fit_transform(features[['rating', 'episodes', 'members']])

# Combine genre and numeric features
from scipy.sparse import hstack
final_features = hstack((genre_matrix, numeric_features))

# ============================================
# 4. Cosine Similarity Computation
# ============================================

cosine_sim = cosine_similarity(final_features, final_features)

# ============================================
# 5. Recommendation Function
# ============================================

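def recommend_anime(anime_name, top_n=5, similarity_threshold=0.2):
    """Return up to top_n titles whose similarity to anime_name meets the threshold."""
    if anime_name not in features['name'].values:
        return "Anime not found in the dataset."

    # Positional row of the first matching title (cosine_sim is indexed by position)
    anime_index = np.flatnonzero(features['name'].values == anime_name)[0]

    similarity_scores = list(enumerate(cosine_sim[anime_index]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    recommendations = []
    for idx, score in similarity_scores:
        if idx == anime_index:
            continue  # skip the query title itself, wherever it lands in the ranking
        if score < similarity_threshold:
            break  # scores are sorted in descending order, so nothing later qualifies
        recommendations.append(features.iloc[idx]['name'])
        if len(recommendations) == top_n:
            break

    return recommendations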

# Example Recommendation
print("\nRecommended Anime:")
print(recommend_anime("Naruto", top_n=5, similarity_threshold=0.25))

# ============================================
# 6. Evaluation of Recommendation System
# ============================================

# Create binary relevance based on rating threshold
anime_df['relevant'] = anime_df['rating'] >= 7

# Train-Test Split
train_idx, test_idx = train_test_split(
    anime_df.index, test_size=0.2, random_state=42
)

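# Predict relevance using similarity (simple baseline).
# Note: train_idx is not used. Each test title is predicted relevant when its
# mean cosine similarity to every title in the dataset is at least 0.3, so this
# is a rough heuristic rather than a trained model.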
y_true = anime_df.loc[test_idx, 'relevant'].astype(int)

y_pred = []
for idx in test_idx:
    similar_scores = cosine_sim[idx]
    avg_score = np.mean(similar_scores)
    y_pred.append(1 if avg_score >= 0.3 else 0)

# Evaluation Metrics
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print("\nEvaluation Metrics:")
print("Precision:", precision)
print("Recall:", recall)
print("F1-Score:", f1)

# ============================================
# 7. Conclusion
# ============================================

print("""
The recommendation system successfully uses cosine similarity
to identify similar anime based on genres and numerical features.
Performance can be improved by incorporating user interaction data
and advanced collaborative filtering techniques.
""")

Dataset Shape: (12294, 7)

First 5 Rows:
    anime_id                              name  \
0     32281                    Kimi no Na wa.   
1      5114  Fullmetal Alchemist: Brotherhood   
2     28977                          Gintama°   
3      9253                       Steins;Gate   
4      9969                     Gintama&#039;   

                                               genre   type episodes  rating  \
0               Drama, Romance, School, Supernatural  Movie        1    9.37   
1  Action, Adventure, Drama, Fantasy, Magic, Mili...     TV       64    9.26   
2  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.25   
3                                   Sci-Fi, Thriller     TV       24    9.17   
4  Action, Comedy, Historical, Parody, Samurai, S...     TV       51    9.16   

   members  
0   200630  
1   793665  
2   114262  
3   673572  
4   151266  





Recommended Anime:
['Naruto: Shippuuden', 'Dragon Ball Z', 'Dragon Ball', 'Naruto: Shippuuden Movie 4 - The Lost Tower', 'Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono']

Evaluation Metrics:
Precision: 0.4432194046306505
Recall: 0.9913686806411838
F1-Score: 0.6125714285714285

The recommendation system successfully uses cosine similarity
to identify similar anime based on genres and numerical features.
Performance can be improved by incorporating user interaction data
and advanced collaborative filtering techniques.



1. Can you explain the difference between user-based and item-based collaborative filtering?  
Answer:  
I. User-Based Collaborative Filtering (User-User CF)  
It recommends items to a user based on the preferences of similar users.  

How it works:  

i) Identify users who have similar tastes or behavior, using a similarity metric such as cosine similarity or Pearson correlation.

ii) Look at items that these similar users liked but the target user hasn’t interacted with yet.  

iii) Recommend the items with the highest predicted preference.  
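
The three user-based steps can be sketched on a toy example. This is a minimal illustration, not a production method: the `ratings` matrix and the helper `predict_user_based` are hypothetical, and treating 0 as "not rated" is a simplifying assumption.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],   # user 0 (target): has not rated item 2
    [5, 5, 4, 1],   # user 1: tastes close to user 0
    [1, 0, 2, 5],   # user 2: tastes unlike user 0
], dtype=float)

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    # Step i: similarity between the target user and every user.
    sims = cosine_similarity(ratings)[user]
    # Step ii: keep only other users who actually rated the item.
    mask = ratings[:, item] > 0
    mask[user] = False
    if not mask.any() or sims[mask].sum() == 0:
        return 0.0
    # Step iii: predicted preference, weighted towards the most similar users.
    return float(sims[mask] @ ratings[mask, item] / sims[mask].sum())

predicted = predict_user_based(ratings, user=0, item=2)
print(round(predicted, 2))
```

Because user 1 is far more similar to the target than user 2, the prediction lands closer to user 1's rating of 4 than to user 2's rating of 2.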


II. Item-Based Collaborative Filtering (Item-Item CF)  
It recommends items that are similar to items the user has already liked.

How it works:

i) Calculate similarity between items (based on user ratings or interactions).

ii) For items a user likes, find other items that are highly similar.

iii) Recommend these similar items to the user.

The key difference is what gets compared: user-based CF measures similarity between users, while item-based CF measures similarity between items.
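
The item-based steps can be sketched the same way. Again this is only an illustration under stated assumptions: `item_names`, `ratings`, and `recommend_item_based` are hypothetical, and a rating of 0 is taken to mean "not rated".

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: rows are users, columns are items, 0 means "not rated".
item_names = ["A", "B", "C", "D"]
ratings = np.array([
    [5, 5, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Step i: item-item similarity, computed between the columns of the matrix.
item_sim = cosine_similarity(ratings.T)

def recommend_item_based(user_ratings, top_n=2):
    """Rank unrated items by their rating-weighted similarity to rated items."""
    # Step ii: score every item by its similarity to what the user rated.
    scores = item_sim @ user_ratings
    scores[user_ratings > 0] = -np.inf   # never re-recommend rated items
    # Step iii: return the highest-scoring unrated items.
    best = np.argsort(scores)[::-1][:top_n]
    return [item_names[i] for i in best if np.isfinite(scores[i])]

new_user = np.array([5, 0, 0, 0], dtype=float)   # liked item A only
recs = recommend_item_based(new_user, top_n=1)
print(recs)
```

Items A and B are rated alike by the same users, so a user who liked only A is pointed to B first.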


2. What is collaborative filtering, and how does it work?  
Answer:  
Definition:  
Collaborative filtering is a recommendation technique that predicts what a user might like based on past interactions and the behavior of other users.  
i) It doesn’t require knowledge about the content itself (like genre, color, or features).  
ii) It relies purely on user-item interactions, such as ratings, clicks, purchases, or likes.  

How Does It Work?  
Step 1: Collect Data. Record user-item interactions, typically as a user-item matrix.  
Step 2: Find Similarity. Measure how alike users (or items) are based on those interactions.  
Step 3: Make Predictions. Estimate the user's preference for unseen items and recommend the top-scoring ones.  

Types of Collaborative Filtering:  
i) User-based CF → “Users like you liked these.”  
ii) Item-based CF → “Because you liked this, you might like that.”  
iii) Model-based CF → Uses machine learning (e.g., matrix factorization, SVD) to predict ratings.  
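
As a rough sketch of the model-based idea, the example below fills in missing ratings with a low-rank SVD approximation. It is a simplification under stated assumptions: the `ratings` matrix and the helper `svd_predict` are hypothetical, and missing entries are replaced with each user's mean before factorizing, whereas practical matrix-factorization models usually fit only the observed ratings.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are items, NaN means "not rated".
ratings = np.array([
    [5, 4, np.nan, 1],
    [5, 5, 1, 1],
    [1, 1, 5, 4],
    [1, np.nan, 4, 5],
], dtype=float)

def svd_predict(ratings, k=1):
    """Rank-k SVD reconstruction of a mean-centered rating matrix."""
    observed = ~np.isnan(ratings)
    # Center each user on their own mean; missing entries become 0 (the mean).
    user_means = np.nanmean(ratings, axis=1, keepdims=True)
    centered = np.where(observed, ratings - user_means, 0.0)
    # Keep only the k strongest latent factors.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k]
    # Add the means back to return to the original rating scale.
    return low_rank + user_means

predicted = svd_predict(ratings, k=1)
print(np.round(predicted, 2))
```

With one latent factor, the model separates the two taste groups in the toy data, so each missing rating is predicted below that user's own average, matching how similar users rated the item.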