In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests

# Data Sources
Ideally, I would like to get data from:
- Reddit episode discussions
    - upvotes
    - number of comments
    - episode ratings through the surveys in episode discussion threads
- IMDb episode ratings
- MAL episode polls

The Reddit upvotes and comment counts measure popularity more than quality. The episode ratings and polls, however, are likely to be a better measure of quality: most people will rate episodes they enjoy highly and episodes they dislike poorly. Some people will rate every episode highly and some will rate every episode poorly; that is just the nature of the internet.

In terms of audience, these measures are obviously pretty specific. They are limited to the type of people who participate on these sites, which is a very different audience from the total population that has watched DBZ. Ideally, if one wanted a measure of the 'best' DBZ episode with the widest possible sample population, one would include statistics such as TV viewership and media sales. But to keep this project manageable, I'll restrict myself to the listed data sources. As such, this data will only be representative of the interactive, online, English-speaking audience.

---

# Hypotheses
Based off my own internal biases and preconceptions, I reckon we'll get the following:
- Top rated episode is either during Gohan's fight with Perfect Cell, or during the Goku vs Vegeta fight in the Saiyan saga.
- Top episode in each arc is during the final fight, with the exception of the Frieza saga, where it's the episode in which Goku achieves Super Saiyan.
- Lowest rated episode will be a filler episode, maybe in the Garlic Jr arc.


# MyAnimeList (MAL)
MAL has an unofficial API, [Jikan](https://jikan.moe), that we can use to get data from MAL. First, we can get the episode titles and other info; then, hopefully, we can use it to get the voting data from the forums.

The entry for DBZ is [here](https://myanimelist.net/anime/813/Dragon_Ball_Z). All anime in MAL have a corresponding ID, and from the URL we see that the ID for DBZ is 813.
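
As a small illustration of reading the ID off the URL, here is a minimal sketch. The helper name `mal_id_from_url` is hypothetical, and it assumes MAL anime URLs follow the `/anime/<id>/<title>` pattern seen in the link above.

```python
import re


def mal_id_from_url(url: str) -> int:
    """Return the numeric anime ID embedded in a MAL anime URL.

    Hypothetical helper: assumes the URL contains '/anime/<digits>'.
    """
    match = re.search(r"/anime/(\d+)", url)
    if match is None:
        raise ValueError(f"No anime ID found in {url!r}")
    return int(match.group(1))


dbz_id = mal_id_from_url("https://myanimelist.net/anime/813/Dragon_Ball_Z")
print(dbz_id)  # 813
```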

The code below fetches the episode list from Jikan, one page at a time.

In [25]:
MAL_ID = 813 # DBZ's ID in the MAL database
MAX_PAGES = 10 # safety cap: if we reach MAX_PAGES, something has probably gone wrong
TIMEOUT = 1.0 # seconds to wait for a response before timing out

# get json from jikan
request_list = [] # holds request objects
page = 1
print("Obtaining data from Jikan...")
next_page_exists = True
while page <= MAX_PAGES and next_page_exists:
    print("Getting page",page)
    
    try:
        # get data
        r = requests.get(f"https://api.jikan.moe/v4/anime/{MAL_ID}/episodes?page={page}",
                         timeout=TIMEOUT)
    except requests.exceptions.Timeout:
        # exit loop if we time out
        print("Timed out on page",page)
        break
    
    # append request results to list
    if r.status_code == 200:
        print("Successfully got data")
        request_list.append(r)
        next_page_exists = r.json()["pagination"]["has_next_page"]
    else:
        # on an HTTP error, print the status code and the raw response body
        # (the body may not be valid JSON), then stop
        print(f"HTTP ERROR CODE {r.status_code}")
        print(r.text)
        break
    
    page += 1

if not next_page_exists:
    print("Reached last page")
else:
    print("ended prematurely")

Obtaining data from Jikan...
Getting page 1
Successfully got data
Getting page 2
Successfully got data
Getting page 3
Successfully got data
Reached last page
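
The paging logic above can also be expressed as a small function that is easy to test without touching the network. This is only a sketch: `collect_pages` and `fake_fetch` are hypothetical names, and the only thing assumed about the response is the `data` and `pagination.has_next_page` structure already used in the loop above.

```python
from typing import Callable, Dict, List


def collect_pages(fetch_page: Callable[[int], Dict], max_pages: int = 10) -> List[Dict]:
    """Call fetch_page(1), fetch_page(2), ... and gather every 'data' entry.

    Stops when a page reports has_next_page == False, or after max_pages.
    """
    episodes: List[Dict] = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        episodes.extend(payload["data"])
        if not payload["pagination"]["has_next_page"]:
            break
    return episodes


def fake_fetch(page: int) -> Dict:
    """Stand-in for a real request: two tiny pages of made-up placeholder data."""
    pages = {
        1: {"data": [{"mal_id": 1}, {"mal_id": 2}],
            "pagination": {"has_next_page": True}},
        2: {"data": [{"mal_id": 3}],
            "pagination": {"has_next_page": False}},
    }
    return pages[page]


episodes = collect_pages(fake_fetch)
print([e["mal_id"] for e in episodes])  # [1, 2, 3]
```

In the notebook, `fetch_page` would wrap the `requests.get(...).json()` call from the cell above; passing it in as an argument keeps the paging logic separate from the error handling.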


Now that we have our MAL data, we need to compile it into a pandas DataFrame and save it for later analysis.

In [32]:
# combine the episode lists from all pages into one list
mal_data = []
for r in request_list:
    mal_data.extend(r.json()['data'])
mal_df = pd.DataFrame(mal_data)

# save to a csv
MAL_FILENAME = "data/MAL_data.csv"
mal_df.to_csv(MAL_FILENAME)
mal_df

Unnamed: 0,mal_id,url,title,title_japanese,title_romanji,aired,score,filler,recap,forum_url
0,1,https://myanimelist.net/anime/813/Dragon_Ball_...,The New Threat,ミニ悟空はおぼっちゃま！ボク悟飯です。,Mini Gokuu wa Obotchama! Boku Gohan Desu.,1989-04-26T00:00:00+00:00,4.2,False,False,https://myanimelist.net/forum/?topicid=13008
1,2,https://myanimelist.net/anime/813/Dragon_Ball_...,Reunions,史上最強の戦士は悟空の兄だった！,Shijou Saikyou no Senshi wa Gokuu no Ani Datta!,1989-05-03T00:00:00+00:00,4.4,False,False,https://myanimelist.net/forum/?topicid=13009
2,3,https://myanimelist.net/anime/813/Dragon_Ball_...,Unlikely Alliance,やった！これが地上最強のコンビだ！,Yatta! Kore ga Chijou Saikyou no Combo Da!,1989-05-10T00:00:00+00:00,4.5,False,False,https://myanimelist.net/forum/?topicid=13010
3,4,https://myanimelist.net/anime/813/Dragon_Ball_...,Piccolo's Plan,ピッコロの切り札！悟飯は泣きむしクン,Pikkolo no Kirifuda! Gohan wa Nakimushikun,1989-05-17T00:00:00+00:00,4.6,False,False,https://myanimelist.net/forum/?topicid=13011
4,5,https://myanimelist.net/anime/813/Dragon_Ball_...,Gohan's Rage,悟空死す！ラストチャンスは一度だけ,Gokuu Shisu! Last Chance wa Ichido Dake,1989-05-24T00:00:00+00:00,4.5,False,False,https://myanimelist.net/forum/?topicid=13007
...,...,...,...,...,...,...,...,...,...,...
286,287,https://myanimelist.net/anime/813/Dragon_Ball_...,Celebrations with Majin Buu,戻った平和！！正義の味方魔人ブウ！？,Modotta Heiwa!! Seigi no Mikata Majin Buu!?,1995-12-20T00:00:00+00:00,4.8,False,False,https://myanimelist.net/forum/?topicid=696097
287,288,https://myanimelist.net/anime/813/Dragon_Ball_...,He's Always Late,遅いぜ悟空！みんなでパーティ！！,Osoi ze Gokuu! Minna de Party!!,1996-01-10T00:00:00+00:00,4.5,True,False,https://myanimelist.net/forum/?topicid=696103
288,289,https://myanimelist.net/anime/813/Dragon_Ball_...,Granddaughter Pan,悟空おじいちゃん！私がパンよ！！,Gokuu Ojii-chan! Watashi ga Pan yo!!,1996-01-17T00:00:00+00:00,4.6,False,False,https://myanimelist.net/forum/?topicid=696113
289,290,https://myanimelist.net/anime/813/Dragon_Ball_...,Buu's Reincarnation,オイラはウーブ！今１０歳で元魔人！？,Oira wa Oob! Ima Jussai de Moto Majin!?,1996-01-24T00:00:00+00:00,4.6,False,False,https://myanimelist.net/forum/?topicid=696117
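
To give a feel for how these columns can be used later, here is a minimal sketch on a hand-copied sample of five rows from the output above (not the full dataset). The variable names `sample`, `canon`, `best` and `worst` are illustrative only; the column names match the ones shown in the table.

```python
import pandas as pd

# Five rows copied from the output above; the real analysis would use mal_df.
sample = pd.DataFrame(
    {
        "mal_id": [1, 2, 3, 288, 289],
        "title": [
            "The New Threat",
            "Reunions",
            "Unlikely Alliance",
            "He's Always Late",
            "Granddaughter Pan",
        ],
        "aired": [
            "1989-04-26T00:00:00+00:00",
            "1989-05-03T00:00:00+00:00",
            "1989-05-10T00:00:00+00:00",
            "1996-01-10T00:00:00+00:00",
            "1996-01-17T00:00:00+00:00",
        ],
        "score": [4.2, 4.4, 4.5, 4.5, 4.6],
        "filler": [False, False, False, True, False],
    }
)

# Parse the ISO timestamps so episodes can be compared by air date.
sample["aired"] = pd.to_datetime(sample["aired"])

# Drop filler episodes, then pick the highest-scored remaining episode.
canon = sample[~sample["filler"]]
best = canon.loc[canon["score"].idxmax()]

# Lowest-scored episode overall, and whether it is filler.
worst = sample.loc[sample["score"].idxmin()]

print(best["title"], best["score"])
print(worst["title"], worst["score"], worst["filler"])
```

The same two lookups on the full `mal_df` would be one way to check the hypotheses about the top-rated episode and whether the lowest-rated episode is filler.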


There we go, this gives us a bunch of information. Most importantly:
- English and Japanese episode titles
- Each episode's score from the forums
- Whether each episode is a filler or recap episode

This is great, especially the score data. I was worried that we would have to scrape the site for that since it isn't listed in the Jikan documentation, but 