
# Anime Recommendation System

- Data preprocessing
- Feature extraction (genres, rating, episodes)
- Cosine similarity-based recommendations
- Evaluation framework
- Interview questions and answers

In [1]:
# Install necessary libraries
!pip install scikit-learn pandas



In [19]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

In [3]:
# Load dataset
df = pd.read_csv('anime.csv')

In [4]:
# Preview dataset
df.head()

|   | anime_id | name | genre | type | episodes | rating | members |
|--:|--:|---|---|---|--:|--:|--:|
| 0 | 32281 | Kimi no Na wa. | Drama, Romance, School, Supernatural | Movie | 1 | 9.37 | 200630 |
| 1 | 5114 | Fullmetal Alchemist: Brotherhood | Action, Adventure, Drama, Fantasy, Magic, Mili... | TV | 64 | 9.26 | 793665 |
| 2 | 28977 | Gintama° | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.25 | 114262 |
| 3 | 9253 | Steins;Gate | Sci-Fi, Thriller | TV | 24 | 9.17 | 673572 |
| 4 | 9969 | Gintama&#039; | Action, Comedy, Historical, Parody, Samurai, S... | TV | 51 | 9.16 | 151266 |


# Data Preprocessing
### Clean nulls, encode genres, normalize rating and episode counts, and build TF-IDF vectors from the combined features.

In [13]:
# Data Preprocessing
# Drop rows with missing values in critical columns
df.dropna(subset=['name', 'genre', 'rating'], inplace=True)

In [14]:
# Reset index after dropping
df.reset_index(drop=True, inplace=True)

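Resetting the index matters because the similarity matrices built later are addressed by row position (0 to N-1), while `anime_index` is built from `df.index` labels. A minimal sketch on a made-up three-row DataFrame showing how labels and positions drift apart after `dropna` and how `reset_index(drop=True)` realigns them:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the middle row has no genre and will be dropped
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "genre": ["Action", np.nan, "Drama"],
    "rating": [8.0, 7.0, 6.5],
})

dropped = toy.dropna(subset=["genre"])
print(dropped.index.tolist())        # labels keep a gap: [0, 2]

realigned = dropped.reset_index(drop=True)
print(realigned.index.tolist())      # labels now equal positions: [0, 1]

# With realigned labels, a label looked up by name is also a valid row position
# in any matrix computed from this DataFrame (e.g. a cosine-similarity matrix)
label_of_c = pd.Series(realigned.index, index=realigned["name"])["C"]
print(realigned.iloc[label_of_c]["name"])
```

Without the reset, the label for "C" would be 2, which points past the end of a two-row similarity matrix.
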
In [16]:
# Fill missing 'episodes' with the median (non-numeric entries are coerced to NaN first)
df['episodes'] = pd.to_numeric(df['episodes'], errors='coerce')
df['episodes'] = df['episodes'].fillna(df['episodes'].median())

# Fill missing rating values with 0 (a safeguard only: rows without a rating were dropped above)
df['rating'] = df['rating'].fillna(0)


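A minimal sketch of the same cleaning steps on made-up data, using plain column assignment instead of `inplace=True` on a column (the pattern the pandas FutureWarning recommends). The rows, including the `"Unknown"` episode value standing in for a non-numeric entry, are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: C has no genre, D has no rating, B has a non-numeric episode count
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "genre": ["Action", "Drama", None, "Comedy", "Sci-Fi"],
    "rating": [8.1, 7.4, 6.9, np.nan, 7.0],
    "episodes": ["12", "Unknown", "24", "26", "30"],
})

# Drop rows missing critical fields, then realign the index with row positions
toy = toy.dropna(subset=["name", "genre", "rating"]).reset_index(drop=True)

# Coerce episode counts to numbers ("Unknown" -> NaN), then fill with the median
toy["episodes"] = pd.to_numeric(toy["episodes"], errors="coerce")
toy["episodes"] = toy["episodes"].fillna(toy["episodes"].median())

print(toy[["name", "episodes"]])
```

Rows C and D are dropped, and B's unknown count is filled with the median of 12 and 30, i.e. 21.
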
In [17]:
# Feature Engineering
# Convert genres from string to list
df['genre'] = df['genre'].apply(lambda x: [i.strip() for i in str(x).split(',')])

In [20]:
# One-hot encode genres using MultiLabelBinarizer
mlb = MultiLabelBinarizer()
genre_encoded = mlb.fit_transform(df['genre'])

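`MultiLabelBinarizer` turns each list of genres into one 0/1 column per distinct genre, so an anime can have several 1s in its row. A small sketch on made-up genre lists (the `toy_` names avoid overwriting the notebook's `mlb`):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical genre lists for three anime
toy_genres = [
    ["Action", "Comedy"],
    ["Drama"],
    ["Action", "Drama", "Sci-Fi"],
]

toy_mlb = MultiLabelBinarizer()
toy_encoded = toy_mlb.fit_transform(toy_genres)

print(toy_mlb.classes_)   # columns are the sorted genre labels
print(toy_encoded)
```
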
In [21]:
# Normalize 'rating' and 'episodes'
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(df[['rating', 'episodes']])

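`MinMaxScaler` rescales each column independently as x' = (x - min) / (max - min), so rating and episodes both land in [0, 1] and sit on the same scale as the 0/1 genre columns. A quick check of that formula against scikit-learn on made-up `[rating, episodes]` rows:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical [rating, episodes] rows
X_toy = np.array([[9.0, 12.0],
                  [7.0, 24.0],
                  [8.0, 112.0]])

scaled_sklearn = MinMaxScaler().fit_transform(X_toy)

# The same transformation by hand, column by column
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
scaled_manual = (X_toy - col_min) / (col_max - col_min)

print(scaled_manual)
```

Because the scale is set by the extremes, a single very long series pulls the other episode counts toward 0 (here 24 episodes maps to only 0.12).
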
In [22]:
# Combine genre and scaled numeric features
feature_matrix = np.hstack((genre_encoded, scaled_features))

In [25]:
# Combine relevant features into a single string for TF-IDF
df['combined_features'] = df['genre'].apply(lambda x: ', '.join(x)) + ' ' + df['type'].fillna('') + ' ' + df['episodes'].fillna('0').astype(str)

In [26]:
# TF-IDF Vectorizer on combined features
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['combined_features'])

# Cosine Similarity
### Used for similarity-based recommendations

In [27]:
# Cosine Similarity Based Recommendation System
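# Compute cosine similarity matrices (each is N x N, so memory grows quadratically with the number of anime)
# cos_sim: from one-hot genres + scaled rating/episodes; used by recommend_anime and evaluate_mock
cos_sim = cosine_similarity(feature_matrix)
# cosine_sim: from the TF-IDF vectors of combined_features; not used by the functions below
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)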

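Cosine similarity is cos(a, b) = (a · b) / (‖a‖ ‖b‖). The sketch below uses made-up vectors laid out like `feature_matrix` (genre one-hot columns followed by scaled rating and episodes), computes the formula by hand, and compares it with `cosine_similarity`:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical rows: [Action, Comedy, Drama, rating_scaled, episodes_scaled]
a = np.array([1.0, 1.0, 0.0, 0.8, 0.1])
b = np.array([1.0, 1.0, 0.0, 0.6, 0.3])
c = np.array([0.0, 0.0, 1.0, 0.9, 0.0])

def cosine(u, v):
    """Dot product divided by the product of the vector norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_matrix = cosine_similarity(np.vstack([a, b, c]))
print(np.round(sim_matrix, 3))
```

Note that `a` and `c` share no genre yet still get a positive similarity (about 0.33) through the rating column, so the numeric features keep unrelated titles from scoring 0.
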
In [28]:
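# Map anime titles to row positions (df was reset above, so index labels equal positions).
# Keep only the first row for each repeated title so every lookup returns a single integer.
anime_index = pd.Series(df.index, index=df['name'])
anime_index = anime_index[~anime_index.index.duplicated(keep='first')]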

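`Series.drop_duplicates()` removes duplicate *values*, not duplicate index labels. Since the values here are row positions (all unique), it removes nothing, and a repeated title makes `.get()` return a Series, which breaks the similarity lookup. A toy sketch of the difference (the titles are invented):

```python
import pandas as pd

names = pd.Series(["Hamlet", "Hamlet", "Macbeth"])  # hypothetical titles, one repeated

# Values are positions 0..n-1, all unique, so drop_duplicates() keeps every row
naive = pd.Series(range(len(names)), index=names).drop_duplicates()
print(type(naive.get("Hamlet")).__name__)   # Series: ambiguous lookup

# Dropping repeated *labels* keeps the first occurrence of each title
deduped = pd.Series(range(len(names)), index=names)
deduped = deduped[~deduped.index.duplicated(keep="first")]
print(deduped.get("Hamlet"))                # 0
```
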
# Recommendation Function
### Recommends anime similar to the input title

In [33]:
# Function to recommend similar anime
def recommend_anime(title, top_n=5, threshold=0.5):
    """Return up to `top_n` titles whose cosine similarity to `title` is >= `threshold`."""
    idx = anime_index.get(title)
    if idx is None:
        return f"Anime '{title}' not found in the dataset."

    # Pair every anime with its similarity to the query, most similar first
    sim_scores = sorted(enumerate(cos_sim[idx]), key=lambda x: x[1], reverse=True)

    # Exclude the query by index (not by position, so an identical twin ranked
    # above it is not dropped by mistake) and keep only scores >= threshold
    filtered_scores = [(i, s) for i, s in sim_scores if i != idx and s >= threshold][:top_n]

    recommended_indices = [i for i, _ in filtered_scores]
    return df['name'].iloc[recommended_indices]

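The same top-N-with-threshold logic can also be written with NumPy, avoiding a Python-level sort of every row. A sketch on a made-up 4×4 similarity matrix; `top_n_similar` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def top_n_similar(sim_matrix, idx, top_n=5, threshold=0.5):
    """Return (indices, scores) of the items most similar to row `idx`, excluding idx."""
    scores = sim_matrix[idx].astype(float).copy()
    scores[idx] = -np.inf                      # never recommend the query itself
    order = np.argsort(scores)[::-1]           # most similar first
    order = order[scores[order] >= threshold]  # apply the similarity cut-off
    order = order[:top_n]
    return order, scores[order]

sim = np.array([
    [1.0, 0.9, 0.2, 0.7],
    [0.9, 1.0, 0.3, 0.6],
    [0.2, 0.3, 1.0, 0.1],
    [0.7, 0.6, 0.1, 1.0],
])

indices, scores = top_n_similar(sim, idx=0, top_n=2, threshold=0.5)
print(indices, scores)   # [1 3] [0.9 0.7]
```

In the notebook this would be called as `top_n_similar(cos_sim, anime_index['Naruto'])`, with `df['name'].iloc[indices]` giving the titles.
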
In [34]:
# Example usage
recommend_anime("Naruto", top_n=5, threshold=0.6)

|   | name |
|--:|---|
| 615 | Naruto: Shippuuden |
| 1103 | Boruto: Naruto the Movie - Naruto ga Hokage ni... |
| 486 | Boruto: Naruto the Movie |
| 1343 | Naruto x UT |
| 1472 | Naruto: Shippuuden Movie 4 - The Lost Tower |


# Evaluation
### Mock evaluation using precision, recall, and F1 (no real user feedback is available, so every top-5 neighbor is assumed relevant)

In [35]:
# Evaluation (Simple Precision/Recall Setup)

# Simulate recommendation output and ground truth
# Let's split for example purposes
train, test = train_test_split(df, test_size=0.2, random_state=42)

In [37]:
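# Define mock evaluation (not collaborative, since we lack real user ratings).
# Every top-k neighbor is *assumed* relevant (y_true = 1), and a recommendation counts
# as a positive prediction when its similarity exceeds sim_cutoff. Precision is therefore
# 1.0 whenever any prediction is positive, and recall is the share of neighbors above
# the cut-off: a sanity check on similarity scores, not a measure of recommendation quality.
def evaluate_mock(n_samples=50, k=5, sim_cutoff=0.5):
    y_true = []
    y_pred = []

    for anime in test['name'][:n_samples]:  # only sample the first 50 for the demo
        idx = anime_index.get(anime)
        if idx is None:
            continue
        # Top-k neighbors, excluding the query anime itself
        sim_scores = sorted(enumerate(cos_sim[idx]), key=lambda x: x[1], reverse=True)
        sim_scores = [(i, s) for i, s in sim_scores if i != idx][:k]

        y_true.extend([1] * len(sim_scores))                       # assume all are relevant
        y_pred.extend([1 if s > sim_cutoff else 0 for _, s in sim_scores])

    print("\nEvaluation Metrics (Mock):")
    print("Precision:", precision_score(y_true, y_pred, zero_division=0))
    print("Recall:", recall_score(y_true, y_pred, zero_division=0))
    print("F1 Score:", f1_score(y_true, y_pred, zero_division=0))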

evaluate_mock()


Evaluation Metrics (Mock):
Precision: 1.0
Recall: 1.0
F1 Score: 1.0

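Because every entry in `y_true` is 1, false positives are impossible: precision is 1.0 whenever at least one prediction is positive, and recall reduces to the fraction of neighbors whose similarity clears the cut-off. The perfect scores above therefore say that all sampled top-5 neighbors scored above 0.5, not that the recommendations are correct. A toy illustration with made-up predictions:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 1]   # every recommendation assumed relevant, as in evaluate_mock
y_pred = [1, 1, 1, 0, 0]   # hypothetical: 3 of 5 neighbors above the cut-off

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
f = f1_score(y_true, y_pred)

print(p)   # 1.0: no false positives are possible
print(r)   # 0.6: 3 / 5
print(f)   # harmonic mean of 1.0 and 0.6
```
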

# Answering Interview Questions

In [39]:
# Interview Questions

print("\nInterview Questions and Answers:")

print("\n1. What is the difference between user-based and item-based collaborative filtering?")
print("- User-based CF finds users whose ratings resemble the target user's and recommends items those similar users liked.")
print("- Item-based CF finds items that the same users rate similarly and recommends items similar to those the target user already liked.")

print("\n2. What is collaborative filtering and how does it work?")
print("- It is a recommendation method based on historical user-item interactions (e.g., ratings).")
print("- It finds users or items with similar interaction patterns and predicts a user's interest in unseen items from those neighbors.")
print("- It does not need item content (genres, descriptions); it relies only on historical user-behavior data.")


Interview Questions and Answers:

1. What is the difference between user-based and item-based collaborative filtering?
- User-based CF finds users whose ratings resemble the target user's and recommends items those similar users liked.
- Item-based CF finds items that the same users rate similarly and recommends items similar to those the target user already liked.

2. What is collaborative filtering and how does it work?
- It is a recommendation method based on historical user-item interactions (e.g., ratings).
- It finds users or items with similar interaction patterns and predicts a user's interest in unseen items from those neighbors.
- It does not need item content (genres, descriptions); it relies only on historical user-behavior data.

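To make the user-based vs item-based distinction concrete, here is a minimal sketch on a made-up user × item rating matrix (0 = unrated). Both variants use cosine similarity and a similarity-weighted average of known ratings, which is one common formulation among several; the matrix and the two helper functions are hypothetical:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = items; 0 means "not rated" (hypothetical data)
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 3, 1],
    [1, 1, 5, 4],
], dtype=float)

def predict_user_based(R, user, item):
    """Weight other users' ratings of `item` by how similar those users are to `user`."""
    user_sim = cosine_similarity(R)[user]            # compare rows (users)
    raters = [u for u in range(R.shape[0]) if u != user and R[u, item] > 0]
    weights = user_sim[raters]
    return float(weights @ R[raters, item] / weights.sum())

def predict_item_based(R, user, item):
    """Weight `user`'s own ratings of other items by how similar those items are to `item`."""
    item_sim = cosine_similarity(R.T)[item]          # compare columns (items)
    rated = [i for i in range(R.shape[1]) if i != item and R[user, i] > 0]
    weights = item_sim[rated]
    return float(weights @ R[user, rated] / weights.sum())

# Predict how user 0 would rate item 2, which they have not rated
user_pred = predict_user_based(R, user=0, item=2)
item_pred = predict_item_based(R, user=0, item=2)
print(round(user_pred, 2), round(item_pred, 2))
```

The two answers differ (about 3.51 vs 2.77): the user-based estimate leans on user 1, who resembles user 0 and rated item 2 a 3, while the item-based estimate leans on item 3, which is rated like item 2 and which user 0 rated only 1.
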