In [30]:
import pandas as pd
# Load the dataset
anime_df = pd.read_csv("anime.csv")

In [31]:
anime_df.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [33]:
# Fill missing values
anime_df['genre'] = anime_df['genre'].fillna('')
anime_df['type'] = anime_df['type'].fillna('')
anime_df['rating'] = anime_df['rating'].fillna(anime_df['rating'].mean())

In [34]:
# Combine text features
anime_df['combined_features'] = anime_df['genre'] + ' ' + anime_df['type'] + ' ' + anime_df['rating'].astype(str)
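
To see what the vectorizer will actually receive from these combined strings, the sketch below approximates scikit-learn's default tokenization with the standard library only. It assumes the default settings (lowercasing plus the token pattern `(?u)\b\w\w+\b`); the helper name `approximate_tokens` and the sample string are illustrative and not part of the notebook.

```python
import re

# Default token pattern documented for scikit-learn's text vectorizers:
# runs of two or more word characters; punctuation acts as a separator.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

def approximate_tokens(text):
    """Approximate the default analyzer: lowercase, then regex tokenization."""
    return re.findall(DEFAULT_TOKEN_PATTERN, text.lower())

# Illustrative combined string, built the same way as 'combined_features'
# from the first row of the head() output (genre + type + rating as text)
sample = "Drama, Romance, School, Supernatural" + " " + "Movie" + " " + str(9.37)
tokens = approximate_tokens(sample)
print(tokens)
```

Under these assumptions the rating survives only as its fractional digits (here the token `37`), because the decimal point splits the number and the single-character `9` is dropped. Hyphenated genres such as "Sci-Fi" are likewise split into two tokens.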

In [35]:
from sklearn.feature_extraction.text import TfidfVectorizer
# Initialize TF-IDF Vectorizer
tfidf_vectorizer = TfidfVectorizer()

# Fit and transform the combined features
tfidf_matrix = tfidf_vectorizer.fit_transform(anime_df['combined_features'])

In [36]:
tfidf_matrix

<12294x144 sparse matrix of type '<class 'numpy.float64'>'
	with 64043 stored elements in Compressed Sparse Row format>
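
The figures in this repr can be turned into a rough sparsity estimate. The sketch below is plain arithmetic on the numbers printed above; on the real object the same values would come from `tfidf_matrix.shape` and `tfidf_matrix.nnz`.

```python
# Figures copied from the sparse-matrix repr above
n_rows, n_cols = 12294, 144
stored_elements = 64043

# Fraction of cells that hold a non-zero TF-IDF weight
density = stored_elements / (n_rows * n_cols)

# Average number of distinct terms describing one anime
avg_terms_per_anime = stored_elements / n_rows

print(f"density: {density:.2%}")
print(f"average non-zero terms per anime: {avg_terms_per_anime:.2f}")
```

Roughly 3.6% of the cells are non-zero, about five terms per anime, which is why a sparse format is used.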

In [37]:
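from sklearn.metrics.pairwise import linear_kernel

def recommend_anime(anime_title, tfidf_matrix=tfidf_matrix):
    # Find the index of the given anime title (first match if duplicated).
    # Assumes anime_df keeps its default RangeIndex, so labels equal row positions.
    matches = anime_df.index[anime_df['name'] == anime_title]
    if len(matches) == 0:
        raise ValueError(f"Anime title not found: {anime_title!r}")
    anime_index = matches[0]

    # TfidfVectorizer L2-normalizes each row by default, so the dot product
    # computed by linear_kernel equals the cosine similarity
    cosine_similarities = linear_kernel(tfidf_matrix[anime_index], tfidf_matrix).flatten()

    # Rank all anime by similarity (highest first), drop the query itself
    # explicitly, and keep the top 10
    ranked_indices = cosine_similarities.argsort()[::-1]
    similar_anime_indices = ranked_indices[ranked_indices != anime_index][:10]

    # Get top 10 similar anime titles
    similar_anime_titles = anime_df.iloc[similar_anime_indices]['name'].values

    return similar_anime_titles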

In [41]:
# Sample usage
anime_title = 'Naruto'
recommended_anime = recommend_anime(anime_title)
print("Anime similar to", anime_title, ":")
for anime in recommended_anime:
    print(anime)

Anime similar to Naruto :
Iron Virgin Jun
Naruto: Shippuuden Movie 3 - Hi no Ishi wo Tsugu Mono
Dragon Ball Super
Ikkitousen: Extravaganza Epoch
Tenjou Tenge
Naruto: Shippuuden
Gakuen Tokusou Hikaruon
Rekka no Honoo
Naruto x UT
Dragon Ball Z


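* TF-IDF Vectorization: The 'genre', 'type', and 'rating' columns of each anime are concatenated into a single string (the numeric rating is cast to a string first) and converted into a numerical representation with TF-IDF. Each anime becomes a vector whose dimensions correspond to the unique terms in the dataset (144 here, per the matrix shape above). Note that the default tokenizer splits on punctuation and keeps only tokens of two or more characters, so a rating such as 9.37 contributes only the token "37"; in this form the rating adds little meaningful signal.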

* Cosine Similarity Calculation: Cosine similarity is computed between the TF-IDF vector of the given anime and the vectors of all other anime in the dataset. It is the dot product of two vectors divided by the product of their lengths, which equals the cosine of the angle between them. In general it ranges from -1 to 1, but TF-IDF weights are non-negative, so here it ranges from 0 to 1, with values closer to 1 indicating higher similarity. Because TfidfVectorizer L2-normalizes each row by default, the plain dot product returned by linear_kernel is already the cosine similarity.

* Recommendation Generation: The anime are ranked by their cosine similarity to the given anime, and the top N (10 in the code above), excluding the given anime itself, are returned as recommendations.
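
The cosine similarity step can be illustrated with a minimal, standard-library sketch. The three vectors are made-up term weights, not values from the dataset, and the helper names (`dot`, `cosine_similarity`, `l2_normalize`) are illustrative only. The sketch shows the formula itself and why a plain dot product suffices once vectors are L2-normalized.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def l2_normalize(u):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

# Made-up non-negative term weights for three hypothetical items
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]  # same direction as a, different length
c = [0.0, 0.0, 3.0]  # shares no terms with a

print(cosine_similarity(a, b))                 # close to 1: same direction
print(cosine_similarity(a, c))                 # 0: no terms in common
print(dot(l2_normalize(a), l2_normalize(b)))   # dot product of unit vectors
```

With non-negative weights the score never drops below 0, and the last line matches the first up to floating-point rounding, which is the property the notebook relies on when it calls `linear_kernel` on the TF-IDF matrix.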