# ![](img/udacity.png)  **Studying MyAnimeList's Data to Identify Favourable Viewership KPIs Anime Studios Should Focus On**  
## by **[Tevin Aduma](https://www.linkedin.com/in/tevin-aduma-926803ab/)**
![](img/MyAnimeList_Logo.png)

>**Before you start**: You must have the README.md file ready. It should include a summary of the main findings that reflects on the steps taken during the data exploration (Part I notebook). The README.md file should also describe the key insights that will be conveyed by the explanatory slide deck (Part II outcome).



## Investigation Overview


> Describe the overall goals of your presentation here. Add a summary of key insights at the start of the notebook, just as you added in the README.md. This will help your notebook to stay aligned to the key insights you want to include in your slide deck.

> **Rubric Tip**: The key insights in the slideshow must match those documented in the README.md summary. 


## Dataset Overview

> Provide a brief overview of the dataset to be presented here.

In [1]:
# import all packages and set plots to be embedded inline
from wordcloud import WordCloud, STOPWORDS

import pandas as pd
import numpy as np
import re

import matplotlib.pyplot as plt
import seaborn as sb

from plotly.offline import iplot, init_notebook_mode
import plotly.graph_objects as go
import plotly.express as px

# initialise Plotly's notebook mode so figures render inline
# (connected=True loads plotly.js from the CDN; use connected=False to render fully offline)
init_notebook_mode(connected=True)

%matplotlib inline

> ### Loading the datasets

In [2]:
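#custom function to read data into Pandas DataFrame
def open_set(csv, sep=',', encoding='utf-8', usecols=None):
    # forward sep and usecols to read_csv so that callers' arguments are honoured
    df = pd.read_csv('data/'+csv, sep=sep, usecols=usecols, low_memory=False, encoding=encoding)
    
    return df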

In [3]:
df_animes = open_set('animes_cleaned.csv')
df_reviews = open_set('reviews_cleaned.csv', encoding='latin')

> Note that the above cells have been set as "Skip"-type slides. That means
that when the notebook is rendered as HTML slides, those cells won't show up.

## Anime Industry Growth Through the Seasons and Years


> There were a few things to note:
> * Overall, yearly anime productions have gone up since the 90s. 
>
> * **Spring** has the most releases while **Winter** has the fewest. Notably, **Winter** led production in 2005, a year in which output in the other seasons dropped. 
>
> * **Fall** and **Spring** numbers have grown significantly over the three decades. Could these be the most popular seasons among viewers?

In [35]:
df_agg = df_animes.groupby(['year_released', 'season_released'])['title'].count().reset_index()
fig = px.line(df_agg, x='year_released', y='title', color='season_released',
                labels={'season_released':'Season Released', 'year_released':'Year','title':'Count'})
fig.update_layout(title='Growth of Anime Production Through Various Seasons (1992-2018)', template='plotly_white')
fig.add_shape( # add a horizontal "target" line
    type="line", line_color="salmon", line_width=3, opacity=1, line_dash="dot",
    x0=0, x1=1, xref="paper", y0=66, y1=66, yref="y"
)
fig.add_annotation( # add a text callout with arrow
    text="Spring titles lead the pack 🍃", x="2016", y=160, arrowhead=1, showarrow=True
)
fig.add_annotation(text="Winter production abnormally improved", x="2005", y=65, arrowhead=1, showarrow=True
)

## Studios Producing the Most Anime Titles (1992-2018)

> * **Toei Animation** and **Sunrise** lead the pack by a distance in the number of titles they have churned out. This could speak to the success of many of their releases: Toei with the world-renowned **One Piece** anime, and Sunrise with the **Cowboy Bebop** series and the **Gundam franchise**, to name a few.
>
> * Relative to their overall production volume, **Studio Bones**, **Madhouse**, **A-1 Pictures**, **Kyoto Animation (KyoAni ♥)** and **J.C. Staff** have had the most top-rated releases over the last three decades.



In [None]:
# I tally the appearance of each studio per record as an account of their involvement in the production of the record's title. 

# rename_axis + reset_index(name=...) gives 'studio' and 'count' columns on both older and newer pandas versions
studio_counts = df_animes['studio'].value_counts().rename_axis('studio').reset_index(name='count').head(10)
fig = px.bar(studio_counts,
        y='studio', x='count', text='studio', orientation='h',
        labels={'studio':'Anime Studios','count':'Number of anime produced'},
        title='Top 10 Studios Producing The Most Anime Between 1992-2018')
# studio names are drawn on the bars, so the y-axis labels are hidden
fig.update_yaxes(visible=False, showticklabels=False)
fig.update_layout(yaxis=dict(autorange="reversed"))

In [47]:
# I tally the appearance of each studio per record as an account of their involvement in the production of the record's title. 
# create a dataframe of the 100 animes with the smallest (best) avg_rank_score

top100_anime = df_animes.nsmallest(100,'avg_rank_score')
# rename_axis + reset_index(name=...) gives 'studio' and 'count' columns on both older and newer pandas versions
top100_counts = top100_anime['studio'].value_counts().rename_axis('studio').reset_index(name='count').head(10)
fig = px.bar(top100_counts,
        x='studio', y='count', text='count', orientation='v',
        labels={'studio':'Anime Studios','count':'Number of top-100 titles'},
        title='Studios That Produced Anime Titles Rated in the Top 100 (1992-2018)')
# counts are drawn on the bars, so the y-axis is hidden
fig.update_yaxes(visible=False, showticklabels=False)
fig.update_layout(template='plotly_dark')
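
> The second bullet above compares each studio's top-100 count against its overall output. The sketch below shows one way that ratio could be computed. It is only an illustration: the helper name `top_share_by_studio` and the toy rows are invented here, and only the `studio` and `avg_rank_score` column names come from the cells above.

```python
import pandas as pd

def top_share_by_studio(df, rank_col="avg_rank_score", studio_col="studio", n=100):
    """Per studio: total titles, titles among the n best-ranked, and their ratio."""
    # total number of titles credited to each studio
    total = df[studio_col].value_counts().rename("total_titles")
    # titles per studio among the n smallest (best) rank scores
    top = df.nsmallest(n, rank_col)[studio_col].value_counts().rename("top_titles")
    out = pd.concat([total, top], axis=1).fillna(0)
    out["top_titles"] = out["top_titles"].astype(int)
    out["top_share"] = out["top_titles"] / out["total_titles"]
    return out.sort_values("top_share", ascending=False)

# Toy rows invented purely for illustration (NOT taken from the MAL dataset)
toy = pd.DataFrame({
    "studio": ["A", "A", "A", "B", "B", "C"],
    "avg_rank_score": [1, 50, 60, 2, 3, 40],
})
result = top_share_by_studio(toy, n=3)
```

> On the real data the call would presumably be `top_share_by_studio(df_animes)`. A minimum-title threshold may be worth adding so that studios with only one or two releases do not dominate the ratio.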


## Insights Generated from a WordCloud of All Review Text in the MAL Dataset



> * Words like **_character_**, **_enjoyed_**, **_felt_**, **_main character_**, and  **_story_** point to a strong link between **Character Development**, **Enjoyment**, and **Story**.
>
> * **_Music_** and **_soundtrack_** attest to the importance of the music used in anime productions.
>
> * **_Art_** and **_character design_** are some of the highlights viewers seek in **Animation**.
>
> * Given how much of the content is TV series, it is also noteworthy that the word **_episode_** is a mainstay in viewers' reviews.

![](img/wordcloud.png)
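
> A word cloud sizes each word by how often it appears in the text. The sketch below reproduces that tally with the standard library only, so the counts behind the image can be inspected directly. It is an assumption-laden illustration: the sample reviews and the small `BASIC_STOPWORDS` set are made up here, and the notebook's own cloud presumably relied on the much larger `STOPWORDS` set imported from `wordcloud`.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (hypothetical); wordcloud's STOPWORDS is far larger
BASIC_STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "was", "in", "this"}

def word_frequencies(texts, stopwords=BASIC_STOPWORDS):
    """Count lower-cased word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

# Made-up review snippets for illustration only (NOT from the MAL dataset)
sample_reviews = [
    "The story was great and the main character felt real.",
    "Great soundtrack, great art, and a story to remember.",
]
freqs = word_frequencies(sample_reviews)
top_words = freqs.most_common(2)
```

> A dictionary of counts like `freqs` can also be passed to `WordCloud.generate_from_frequencies` to draw the image from the same tally.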

## Understanding the Relationship Between the Viewership's Polarity and Sentiment Scores 

> * **Character Development** and **Enjoyment** have a strong correlation with the outcome of the **Story** score. If an anime does not deliver on the two former features, it is highly likely to score poorly on the **Story** factor.
>
> * **Animation** and **Music** go hand in hand to a certain degree. This could explain why action scenes in anime tend to call for fast-paced, upbeat tunes while emotive scenes call for songs on the melancholic to poignant scale.
>
> * The overall sentiment a viewer is left with could be highly influenced by how the anime scores in the **Character Development**, **Enjoyment** and **Story** features. 


In [None]:
# pairwise correlation of the review scores and the sentiment measures
features = ['Story', 'Animation', 'Music', 'Character Development',
            'Enjoyment', 'polarity', 'subjectivity']
fig = px.imshow(df_review_studios[features].corr(), color_continuous_scale='icefire',
                width=1000, height=800, text_auto='.2f',
                title='Relationship between Features of Anime as Ranked by Audiences')
fig.show()
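
> Each cell of the heatmap is a pairwise correlation coefficient (`DataFrame.corr()` computes Pearson correlation by default). To read off the strongest relationships without scanning the grid, the pairs can be ranked by absolute value. The sketch below is a hedged illustration: `strongest_pairs` is a hypothetical helper, and the toy scores are invented, not drawn from the reviews data.

```python
import pandas as pd

def strongest_pairs(df, k=3):
    """Return the k column pairs with the largest absolute Pearson correlation."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # visit each unordered pair once
            pairs.append((a, b, corr.loc[a, b]))
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:k]

# Toy scores invented purely for illustration (NOT from the MAL dataset)
toy_scores = pd.DataFrame({
    "Story": [1, 2, 3, 4, 5],
    "Enjoyment": [2, 4, 6, 8, 10],
    "Music": [5, 3, 4, 1, 2],
})
pairs = strongest_pairs(toy_scores, k=3)
```

> On the real data the same helper could presumably be applied to the feature columns used for the heatmap above.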

<hr>

>**Generate Slideshow**: Once you're ready to generate your slideshow, use the `jupyter nbconvert` command to generate the HTML slideshow. From the terminal or command line, run the following command.

In [None]:
!jupyter nbconvert <Part_II_Filename>.ipynb --to slides --post serve --no-input --no-prompt

> This should open a tab in your web browser where you can scroll through your presentation. Sub-slides can be accessed by pressing 'down' when viewing their parent slide. Make sure you remove all of the quote-formatted guide notes like this one before you finish your presentation! Finally, you can stop the kernel. 