In this notebook, we explore two anime datasets (anime metadata and user ratings) and perform exploratory data analysis.

## Anime Dataset Description

- **anime_id**: Unique ID for each anime.
- **Name**: The name of the anime in its original language.
- **English name**: The English name of the anime.
- **Other name**: Native name or title of the anime (can be in Japanese, Chinese, or Korean).
- **Score**: The score or rating given to the anime.
- **Genres**: The genres of the anime, separated by commas.
- **Synopsis**: A brief description or summary of the anime's plot.
- **Type**: The type of the anime (e.g., TV series, movie, OVA, etc.).
- **Episodes**: The number of episodes in the anime.
- **Aired**: The dates when the anime was aired.
- **Premiered**: The season and year when the anime premiered.
- **Status**: The status of the anime (e.g., Finished Airing, Currently Airing, etc.).
- **Producers**: The production companies or producers of the anime.
- **Licensors**: The licensors of the anime (e.g., streaming platforms).
- **Studios**: The animation studios that worked on the anime.
- **Source**: The source material of the anime (e.g., manga, light novel, original).
- **Duration**: The duration of each episode.
- **Rating**: The age rating of the anime.
- **Rank**: The rank of the anime based on popularity or other criteria.
- **Popularity**: The popularity rank of the anime.
- **Favorites**: The number of times the anime was marked as a favorite by users.
- **Scored By**: The number of users who scored the anime.
- **Members**: The number of members who have added the anime to their list on the platform.
- **Image URL**: The URL of the anime's image or poster.
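
Because a column such as Genres packs several values into one comma-separated string, it usually needs splitting before genres can be counted or encoded. The following is a minimal sketch on a toy frame, not the notebook's own preprocessing: the two rows reuse genre strings shown in the preview further down, and the helper name `split_multi_value` is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Toy rows standing in for the real dataset (illustrative only).
toy = pd.DataFrame({
    "anime_id": [1, 5],
    "genres": ["Action, Award Winning, Sci-Fi", "Action, Sci-Fi"],
})

def split_multi_value(series: pd.Series) -> pd.Series:
    """Hypothetical helper: split comma-separated strings into lists of stripped tokens."""
    return series.str.split(",").apply(lambda parts: [p.strip() for p in parts])

toy["genre_list"] = split_multi_value(toy["genres"])

# One row per (anime, genre) pair, which makes counting genres straightforward.
genre_counts = toy.explode("genre_list")["genre_list"].value_counts()
print(genre_counts)
```

The exploded form is convenient for frequency plots, while the list form is a natural input for multi-label encoding.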

## User-Anime Ratings Dataset Description

- **user_id**: Unique ID for each user.
- **Username**: The username of the user.
- **anime_id**: Unique ID for each anime.
- **Anime Title**: The title of the anime.
- **rating**: The rating given by the user to the anime.

In [2]:
# import zipfile
# import os

# zip_file_path = '/home/karthikponna/kittu/Anime Recommendation system-MLops/Mlops-Anime-Recommendation-System/datasets/archive.zip'
# destination_folder = '/home/karthikponna/kittu/Anime Recommendation system-MLops/Mlops-Anime-Recommendation-System/datasets'

# with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
#     file_names = zip_ref.namelist()
#     csv_files = [file for file in file_names if file.endswith('.csv')]
#     for file in csv_files:
#         file_name = os.path.basename(file)
#         destination_path = os.path.join(destination_folder, file_name)
#         with open(destination_path, 'wb') as f:
#             f.write(zip_ref.read(file))
# print("CSV files extracted successfully.")

In [4]:
# Data handling
import numpy as np
import pandas as pd
from datetime import datetime

# Visualization
import plotly.express as px
import plotly.graph_objects as go  # for 3D plot visualization
import plotly.figure_factory as ff
from plotly.offline import init_notebook_mode, iplot
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
# from langdetect import detect

# Enable Plotly rendering inside the notebook
init_notebook_mode(connected=True)

In [5]:
pd.set_option('display.max_columns', 50)
df_anime = pd.read_csv('/home/karthikponna/kittu/Anime Recommendation system-MLops/Mlops-Anime-Recommendation-System/datasets/anime-dataset-2023.csv')
print("Shape of the Dataset:", df_anime.shape)
df_anime.head(3).T

Shape of the Dataset: (24905, 18)


| | 0 | 1 | 2 |
|---|---|---|---|
| anime_id | 1 | 5 | 6 |
| name | Cowboy Bebop | Cowboy Bebop: Tengoku no Tobira | Trigun |
| average_rating | 8.75 | 8.38 | 8.22 |
| genres | Action, Award Winning, Sci-Fi | Action, Sci-Fi | Action, Adventure, Sci-Fi |
| overview | Crime is timeless. By the year 2071, humanity ... | Another day, another bounty—such is the life o... | Vash the Stampede is the man with a $$60,000,0... |
| type | TV | Movie | TV |
| episodes | 26 | 1 | 26 |
| producers | Bandai Visual | Sunrise, Bandai Visual | Victor Entertainment |
| licensors | Funimation, Bandai Entertainment | Sony Pictures Entertainment | Funimation, Geneon Entertainment USA |
| studios | Sunrise | Bones | Madhouse |


In [6]:
df_score = pd.read_csv('/home/karthikponna/kittu/Anime Recommendation system-MLops/Mlops-Anime-Recommendation-System/datasets/users-score-2023.csv')
print("Shape of the dataset:", df_score.shape)

In [None]:
# Rename columns to match the lowercase naming used in df_anime
df_score.rename(columns={'Username': 'username', 'Anime Title': 'name'}, inplace=True)
df_score.head()

# Exploratory Data Analysis

## Data Exploration

#### Checking each DataFrame
To understand the data, we examine each DataFrame individually, checking its structure and looking for missing values. We start with the `info()` method, which summarizes each column's name, dtype, and non-null count.
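
As a complement to `info()`, a per-column count of nulls and of placeholder strings can make gaps easier to spot. The sketch below runs on a toy frame with made-up values; it assumes that missing entries may appear as the literal string `'UNKNOWN'`, which is how `average_rating` is handled later in this notebook. Whether other columns use the same placeholder is not confirmed here.

```python
import numpy as np
import pandas as pd

# Toy frame with one real null and one placeholder string (illustrative only).
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "average_rating": ["8.75", "UNKNOWN", "8.22"],
    "episodes": [26.0, np.nan, 26.0],
})

# True nulls (NaN/None) per column.
null_counts = toy.isna().sum()

# Placeholder strings are not nulls, so they are counted separately.
unknown_counts = toy.eq("UNKNOWN").sum()

summary = pd.DataFrame({"nulls": null_counts, "unknown": unknown_counts})
print(summary)
```

Counting the two kinds of gap separately matters because `info()` reports a column full of `'UNKNOWN'` strings as fully non-null.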

In [None]:
df_anime.info()

In [None]:
df_anime['average_rating'].value_counts()

In [None]:
# Compute the mean rating over the rows that have a known score
average_rating = df_anime.loc[df_anime['average_rating'] != 'UNKNOWN', 'average_rating']
average_rating = average_rating.astype('float')
average_rating_mean = round(average_rating.mean(), 2)

In [None]:
# Impute 'UNKNOWN' scores with the mean, then convert the column to float
df_anime['average_rating'] = df_anime['average_rating'].replace('UNKNOWN', average_rating_mean)
df_anime['average_rating'] = df_anime['average_rating'].astype('float64')
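
The same cleanup can also be written with `pd.to_numeric`, which turns any non-numeric entry (not only `'UNKNOWN'`) into NaN before imputing. This is an alternative sketch on a toy frame with made-up values, not the approach used in the cells above.

```python
import pandas as pd

# Toy column mixing numeric strings with the 'UNKNOWN' placeholder (illustrative only).
toy = pd.DataFrame({"average_rating": ["8.75", "UNKNOWN", "8.25"]})

# Coerce non-numeric strings such as 'UNKNOWN' to NaN.
ratings = pd.to_numeric(toy["average_rating"], errors="coerce")

# Mean over the known ratings only (NaN is skipped by default).
mean_rating = round(ratings.mean(), 2)

# Fill the gaps with the mean; the result is already a float64 column.
toy["average_rating"] = ratings.fillna(mean_rating)
print(toy)
```

Mean imputation keeps the column numeric but pulls imputed rows toward the center of the distribution, so whether it is appropriate depends on how the score is used downstream.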