## Tasks:

### Data Preprocessing:

- Load the dataset into a suitable data structure (e.g., pandas DataFrame).
- Handle missing values, if any.
- Explore the dataset to understand its structure and attributes.
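
The steps above can be sketched on a tiny in-memory table. This is only an illustration: the three rows are invented, and the column names (`name`, `genre`, `type`, `episodes`, `rating`) are assumed to match the ones the notebook code uses for `anime.csv`.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for anime.csv (invented values)
csv_text = """name,genre,type,episodes,rating
A,"Action, Comedy",TV,12,8.1
B,,Movie,1,
C,Drama,TV,Unknown,6.5
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Explore structure: size, column types, and missing values per column
print(sample.shape)
print(sample.dtypes)
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Fill the text gap with a placeholder and coerce 'episodes' to numeric
# (non-numeric entries such as "Unknown" become NaN)
cleaned = sample.fillna({"genre": "Unknown"})
cleaned["episodes"] = pd.to_numeric(cleaned["episodes"], errors="coerce")
print(cleaned)
```

Whether a missing rating should be filled with 0, with the column mean, or dropped is a modelling choice, so the sketch leaves it untouched.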

### Feature Extraction:

- Decide on the features that will be used for computing similarity (e.g., genres, user ratings).
- Convert categorical features into numerical representations if necessary.
- Normalize numerical features if required.
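
One possible way to carry out these steps, shown on invented data, is to multi-hot encode the comma-separated genres, one-hot encode the type, and min-max scale the rating. This is a sketch under those assumptions and is not the approach the notebook code below takes (that code uses TF-IDF over a combined text column).

```python
import pandas as pd

# Tiny hypothetical sample with the same column names as anime.csv
sample = pd.DataFrame({
    "name": ["A", "B", "C"],
    "genre": ["Action, Comedy", "Action, Drama", "Comedy"],
    "type": ["TV", "Movie", "TV"],
    "rating": [8.0, 6.0, 7.0],
})

# Multi-hot encode the comma-separated genre string (one 0/1 column per genre)
genre_features = sample["genre"].str.get_dummies(sep=", ")

# One-hot encode the categorical 'type' column
type_features = pd.get_dummies(sample["type"], prefix="type").astype(int)

# Min-max scale rating into [0, 1] so it is comparable to the 0/1 columns
rating_min = sample["rating"].min()
rating_range = sample["rating"].max() - rating_min
rating_scaled = ((sample["rating"] - rating_min) / rating_range).rename("rating_scaled")

feature_matrix = pd.concat([genre_features, type_features, rating_scaled], axis=1)
print(feature_matrix)
```

Keeping the rating as a scaled number lets it contribute to similarity in proportion to how close two ratings are, which a text token cannot do.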

### Recommendation System:

- Design a function to recommend anime based on cosine similarity.
- Given a target anime, recommend a list of similar anime based on cosine similarity scores.
- Experiment with different threshold values for similarity scores to adjust the recommendation list size.
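
The threshold experiment can be sketched as follows. The helper `recommend_above_threshold` and the four-item feature matrix are hypothetical and exist only for illustration; cosine similarity is computed directly with NumPy as the dot product of L2-normalised rows.

```python
import numpy as np

def recommend_above_threshold(feature_matrix, names, target_name, threshold=0.5):
    """Return (name, score) pairs whose cosine similarity to the target is >= threshold."""
    features = np.asarray(feature_matrix, dtype=float)
    target_pos = names.index(target_name)  # raises ValueError if the title is missing

    # Cosine similarity = dot product of L2-normalised rows
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = features / norms
    scores = unit @ unit[target_pos]

    # Walk items from most to least similar, skipping the target itself
    keep = [i for i in np.argsort(-scores) if i != target_pos and scores[i] >= threshold]
    return [(names[i], round(float(scores[i]), 3)) for i in keep]

# Hypothetical 0/1 genre indicators for four titles
names = ["A", "B", "C", "D"]
features = [
    [1, 1, 0],  # A
    [1, 1, 1],  # B
    [0, 0, 1],  # C
    [1, 0, 0],  # D
]

for threshold in (0.5, 0.75, 0.9):
    print(threshold, recommend_above_threshold(features, names, "A", threshold))
```

On this toy data, raising the threshold from 0.5 to 0.75 to 0.9 shrinks the list for "A" from two titles to one to none, which is the trade-off the task asks you to explore.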

### Evaluation:

- Split the dataset into training and testing sets.
- Evaluate the recommendation system using appropriate metrics such as precision, recall, and F1-score.
- Analyze the performance of the recommendation system and identify areas of improvement.
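
Precision, recall, and F1 need a notion of which recommendations are "relevant", and the anime table alone does not provide one. The sketch below assumes a hypothetical set of relevant titles (for example, titles a user rated highly in a held-out test split) and scores one recommendation list against it. The function name and the sample lists are invented for illustration.

```python
def precision_recall_f1(recommended, relevant):
    """Compute precision, recall and F1 for a single recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # recommended titles that are actually relevant

    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical output of the recommender and hypothetical held-out ground truth
recommended = ["B", "D", "E", "F"]
relevant = ["B", "C", "D"]

precision, recall, f1 = precision_recall_f1(recommended, relevant)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Averaging these three values over many users or target titles gives a single summary that can be compared across feature sets or thresholds.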


In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# Load the dataset
file_path = r"D:\Assignments\Recommendation System\anime.csv"
data = pd.read_csv(file_path)

# Step 1: Data Preprocessing
# Handle missing values for relevant columns
data.fillna({'genre': 'Unknown', 'type': 'Unknown', 'rating': 0, 'episodes': 'Unknown'}, inplace=True)

# Convert 'episodes' column to numerical, replacing 'Unknown' with 0
data['episodes'] = data['episodes'].replace('Unknown', 0).astype(int)

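# Ensure all columns used in 'features' are strings to avoid errors
data['genre'] = data['genre'].astype(str)
data['type'] = data['type'].astype(str)
data['rating'] = data['rating'].astype(str)

# Combine relevant features for similarity calculation.
# Caveat: the rating is joined as text, and TfidfVectorizer's default token pattern
# keeps only tokens of two or more word characters, so "4.81" contributes the token
# "81" and "7.5" contributes nothing. The rating does not act as a numeric feature here.
data['features'] = data['genre'] + " " + data['type'] + " " + data['rating']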

# Verify there are no NaN values in the 'features' column
assert not data['features'].isnull().any(), "Features column contains NaN values!"





In [3]:
# Step 2: Feature Extraction
# Transform textual features into numerical using TF-IDF
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(data['features'])


In [2]:
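# Step 3: Recommendation System
def recommend_anime(anime_title, top_n=5):
    """
    Recommend the top_n anime most similar to anime_title, using cosine
    similarity over the TF-IDF feature matrix.
    """
    # Get the index of the target anime (first exact title match).
    # Assumes data keeps its default RangeIndex, so the label is also the row position.
    try:
        target_index = data[data['name'] == anime_title].index[0]
    except IndexError:
        return f"Anime '{anime_title}' not found in the dataset."

    # Calculate cosine similarity between the target row and every row
    similarity_scores = cosine_similarity(tfidf_matrix[target_index], tfidf_matrix).flatten()

    # Rank all rows by descending similarity, drop the target itself, keep the top N.
    # Slicing off the last argsort entry is unsafe when another title ties at 1.0.
    ranked_indices = similarity_scores.argsort()[::-1]
    similar_indices = [i for i in ranked_indices if i != target_index][:top_n]

    # Return recommended anime
    return data.iloc[similar_indices][['name', 'genre', 'rating']]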


In [4]:
# Example: Recommend similar anime to a given title
anime_title = "Naruto"  # Replace with a title from your dataset
print(f"Recommendations for '{anime_title}':")
print(recommend_anime(anime_title))


Recommendations for 'Naruto':
                                                   name  \
7867                                    Iron Virgin Jun   
1573  Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsu...   
1930                                  Dragon Ball Super   
4067                     Ikkitousen: Extravaganza Epoch   
3038                                       Tenjou Tenge   

                                                  genre rating  
7867  Action, Comedy, Fantasy, Martial Arts, Super P...   4.81  
1573  Action, Comedy, Martial Arts, Shounen, Super P...    7.5  
1930  Action, Adventure, Comedy, Fantasy, Martial Ar...    7.4  
4067  Action, Ecchi, Martial Arts, School, Seinen, S...   6.81  
3038  Action, Comedy, Ecchi, Martial Arts, School, S...    7.1  


### **Interview Questions Explained**

1. **Difference Between User-Based and Item-Based Collaborative Filtering:**
   - **User-Based Collaborative Filtering**: Recommends items based on the similarity between users. If two users have a similar rating pattern, the system recommends items liked by one user to the other.
   - **Item-Based Collaborative Filtering**: Focuses on the similarity between items. It recommends items that are similar to the ones the user has already liked or interacted with.

2. **What is Collaborative Filtering, and How Does It Work?**
   - **Definition**: Collaborative filtering is a recommendation technique that predicts a user's preferences from the preferences of similar users, or from similarities between items derived from how users interacted with them.
   - **Working**:
     - Create a matrix of users vs. items, with values representing interactions (e.g., ratings).
     - Calculate similarities (using cosine similarity, Pearson correlation, etc.) between users or items.
     - Predict the missing values in the matrix based on these similarities to make recommendations.
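
The three steps above can be sketched with a small rating matrix. Everything here is illustrative: the ratings are invented, `predict_rating` is a hypothetical helper, and treating 0 as "not rated" when computing cosine similarity is a simplification that real systems often refine (for example by mean-centering ratings).

```python
import numpy as np

# Step 1: user-item matrix (rows = users, columns = items, 0 = not rated)
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def predict_rating(ratings, user, item):
    """Item-based prediction: similarity-weighted average of the user's known ratings."""
    # Step 2: cosine similarity between item columns
    norms = np.linalg.norm(ratings, axis=0, keepdims=True)
    norms[norms == 0] = 1.0  # guard against items nobody rated
    unit = ratings / norms
    item_similarity = unit.T @ unit

    # Step 3: weight the user's existing ratings by similarity to the target item
    rated = ratings[user] > 0
    weights = item_similarity[item, rated]
    if weights.sum() == 0:
        return 0.0  # no usable neighbours
    return float(weights @ ratings[user, rated] / weights.sum())

# User 0 has not rated item 2; fill that gap from the items they did rate
prediction = predict_rating(ratings, user=0, item=2)
print(round(prediction, 2))
```

Because item 2 is most similar to item 3, which user 0 rated low, the prediction lands near the low end of that user's ratings.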

### Answer 1
**User-Based Collaborative Filtering:**

- Focuses on user similarities.
- If User A and User B have rated items similarly, items liked by User A are recommended to User B.
- Example: Netflix recommending movies watched by users with similar viewing patterns.

**Item-Based Collaborative Filtering:**

- Focuses on item similarities.
- Recommends items similar to the ones a user has interacted with, based on other users' preferences.
- Example: Amazon suggesting items frequently bought together with a product.
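
In code, the difference comes down to which axis of the rating matrix is compared. The following sketch uses an invented 3-user by 4-item matrix and a hypothetical helper `cosine_rows`. Comparing rows gives user-user similarity; comparing columns (the transpose) gives item-item similarity.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_rows(matrix):
    """Pairwise cosine similarity between the rows of a matrix."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = matrix / norms
    return unit @ unit.T

# User-based: compare users by the ratings they gave (rows)
user_similarity = cosine_rows(ratings)

# Item-based: compare items by the ratings they received (columns)
item_similarity = cosine_rows(ratings.T)

print(user_similarity.shape, item_similarity.shape)
print(np.round(user_similarity, 2))
print(np.round(item_similarity, 2))
```

Here users 0 and 1 come out as close neighbours, and so do items 0 and 1, so either view would steer user 0 toward what user 1 liked.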
