# Content-based anime recommendation system

## Notebook setup

### Imports

In [None]:
import pandas as pd

### Dataset

Load the anime metadata.

In [2]:
# Load anime information from CSV file
animes = pd.read_csv('../data/anime.csv')
animes.head()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [3]:
animes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


## Content-based filtering

Examine the content features (genre, type) available for each anime to use in content-based filtering.

In [4]:
# Display relevant features for content-based filtering
animes[['anime_id', 'name', 'genre', 'type']].head(10)

Unnamed: 0,anime_id,name,genre,type
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV
5,32935,Haikyuu!!: Karasuno Koukou VS Shiratorizawa Ga...,"Comedy, Drama, School, Shounen, Sports",TV
6,11061,Hunter x Hunter (2011),"Action, Adventure, Shounen, Super Power",TV
7,820,Ginga Eiyuu Densetsu,"Drama, Military, Sci-Fi, Space",OVA
8,15335,Gintama Movie: Kanketsu-hen - Yorozuya yo Eien...,"Action, Comedy, Historical, Parody, Samurai, S...",Movie
9,15417,Gintama&#039;: Enchousen,"Action, Comedy, Historical, Parody, Samurai, S...",TV


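Define a function that scores how similar two animes are by the Jaccard similarity of their genre sets: the number of shared genres divided by the number of distinct genres across both (intersection over union). The score ranges from 0 (no shared genres) to 1 (identical genre sets).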

In [5]:
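# Convert genre strings to sets for easier comparison.
# Missing genres map to an empty set: ''.split(', ') returns [''], i.e. the non-empty set {''}.
animes['genre_set'] = animes['genre'].fillna('').apply(
    lambda x: set(x.split(', ')) if x else set()
)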
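
The empty-string edge case in this conversion is easy to miss, so here is a small pandas-free sketch of the parsing logic. `parse_genres` is a hypothetical helper name used only for illustration; it is not part of the notebook.

```python
def parse_genres(genre_string):
    """Split a comma-separated genre string into a set (hypothetical helper)."""
    # Guard first: ''.split(', ') returns [''], which would become the
    # one-element set {''} rather than an empty set.
    if not genre_string:
        return set()
    return set(genre_string.split(', '))


print(sorted(parse_genres('Sci-Fi, Thriller')))  # ['Sci-Fi', 'Thriller']
print(parse_genres(''))                          # set()
print(set(''.split(', ')))                       # {''}
```

Without the guard, two animes with missing genres would each carry the set `{''}`, which is non-empty and would compare as identical to each other.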

In [6]:
# Save anime dataframe with genre_set added (sets are written to CSV as their string representation)
animes.to_csv('../data/processed_animes.csv', index=False)

In [7]:
def genre_similarity(genres1, genres2):
    """Calculate Jaccard similarity between two genre sets"""

    # Return 0 if either set is empty
    if len(genres1) == 0 or len(genres2) == 0:
        return 0

    # Calculate intersection (common genres) and union (all unique genres)
    intersection = len(genres1.intersection(genres2))
    union = len(genres1.union(genres2))

    # Jaccard similarity = intersection / union
    return intersection / union if union > 0 else 0
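
As a quick sanity check of the formula, the sketch below works one pair by hand, using the genre lists of Steins;Gate and Ginga Eiyuu Densetsu from the table above. It redefines a compact equivalent of `genre_similarity` (named `jaccard` here purely for illustration) so that it runs on its own.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


steins_gate = {'Sci-Fi', 'Thriller'}
ginga_eiyuu_densetsu = {'Drama', 'Military', 'Sci-Fi', 'Space'}

# Shared genres: {'Sci-Fi'} (1); distinct genres across both: 5
print(jaccard(steins_gate, ginga_eiyuu_densetsu))  # 0.2
print(jaccard(steins_gate, steins_gate))           # 1.0
print(jaccard(steins_gate, set()))                 # 0.0
```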

Select a target anime to demonstrate content-based filtering using genre similarity.

In [8]:
# Choose anime to find similar content for
target_anime_id = 1
target_anime = animes[animes['anime_id'] == target_anime_id].iloc[0]
target_genres = target_anime['genre_set']

print(f"Target anime: {target_anime['name']}")
print(f"Genres: {target_anime['genre']}")

Target anime: Cowboy Bebop
Genres: Action, Adventure, Comedy, Drama, Sci-Fi, Space


Demonstrate content-based filtering by finding animes with the most similar genres to the target anime.

In [9]:
# Calculate genre similarity for all animes
animes['similarity'] = animes['genre_set'].apply(
    lambda x: genre_similarity(target_genres, x)
)

# Find top similar animes (excluding the target itself)
similar_animes = animes[animes['anime_id'] != target_anime_id].sort_values(
    'similarity', 
    ascending=False
)[['name', 'genre', 'similarity']].head(5)

print('Top 5 similar animes based on genre:')
similar_animes.head()

Top 5 similar animes based on genre:


Unnamed: 0,name,genre,similarity
1465,Cowboy Bebop: Yose Atsume Blues,"Action, Adventure, Comedy, Drama, Sci-Fi, Space",1.0
6568,Seihou Tenshi Angel Links,"Action, Adventure, Comedy, Drama, Romance, Sci...",0.857143
5721,Kaitei Choutokkyuu: Marine Express,"Action, Adventure, Comedy, Drama, Sci-Fi",0.833333
2333,Ginga Tetsudou Monogatari,"Action, Adventure, Drama, Sci-Fi, Space",0.833333
1073,Waga Seishun no Arcadia,"Action, Adventure, Drama, Sci-Fi, Space",0.833333
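
The ranking step can also be wrapped into a reusable function. The following is a minimal pandas-free sketch, not the notebook's implementation: `top_similar`, `jaccard`, and the toy `catalog` are hypothetical names, and the genre lists are copied from rows shown earlier in this notebook.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_similar(target_name, catalog, k=3):
    """Return the k titles whose genre sets are most similar to the target's."""
    target = catalog[target_name]
    scores = [
        (name, jaccard(target, genres))
        for name, genres in catalog.items()
        if name != target_name  # exclude the target itself
    ]
    # Highest similarity first
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy catalog (assumption: a small hand-picked sample, not the full dataset)
catalog = {
    'Cowboy Bebop': {'Action', 'Adventure', 'Comedy', 'Drama', 'Sci-Fi', 'Space'},
    'Ginga Tetsudou Monogatari': {'Action', 'Adventure', 'Drama', 'Sci-Fi', 'Space'},
    'Ginga Eiyuu Densetsu': {'Drama', 'Military', 'Sci-Fi', 'Space'},
    'Hunter x Hunter (2011)': {'Action', 'Adventure', 'Shounen', 'Super Power'},
    'Steins;Gate': {'Sci-Fi', 'Thriller'},
}

for name, score in top_similar('Cowboy Bebop', catalog):
    print(f'{name}: {score:.3f}')
# Ginga Tetsudou Monogatari: 0.833
# Ginga Eiyuu Densetsu: 0.429
# Hunter x Hunter (2011): 0.250
```

Because the score uses genres alone, titles with identical genre lists tie (three of the five results above score 0.833333), so a secondary sort key such as `rating` or `members` may be worth considering.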
