<a href="https://colab.research.google.com/github/exglade/query-anime/blob/main/query_anime_list.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Query Anime List

## Problem Statement

I have a spreadsheet listing anime, with the following columns:

- **Parent**: The anime's parent title.
- **Title**: The full anime title.
- **Genre**: The anime genre (according to MyAnimeList).
- **Season**: The xth season of the anime. Non-season type: OVA, Movie, ONA, Special.
- **Total Episodes**: The total number of episodes that the anime has.
- **Date of Release**: The date when the anime starts airing.
- **Date of Completion**: The date when the anime finishes airing.
- **Status**: The airing status of the anime: Unreleased, Ongoing, Released.
- **Last**: The last episode I've watched.
- **Score**: My personal rating of the anime on a scale of 0-5.

I also have a list of anime tabs saved in OneTab, which OneTab can export as text. An example of the export format:

```text
https://www.microsoft.com/design/fluent/#/ | Microsoft Design
https://medium.com/microsoft-design/designing-for-power-simplicity-9cddec615567 | Designing for Power and Simplicity - Microsoft Design - Medium
...
```

After clean-up, there are 150 anime in the list. Querying each anime one by one and copying the information by hand would take hours.

There has to be a better way, one that makes this easier for me in the future too! 😁

## Solution

1. Data: extract, transform, and load the anime list exported from OneTab.
2. Search for each anime by name on a public anime list.
3. Map each anime to the public anime list's identifier.
4. Fetch the anime details.
5. Export into a spreadsheet. 🎉

### API
- [MyAnimeList](https://myanimelist.net/) - [v2](https://myanimelist.net/apiconfig/references/api/v2) | [Authorisation guide](https://myanimelist.net/blog.php?eid=835707)
- [AniList](https://anilist.co/) - [v2](https://anilist.gitbook.io/anilist-apiv2-docs/)
- [Jikan](https://jikan.moe/) - [v3](https://jikan.docs.apiary.io/)

# Data: Extract, Transform, Load

1. Export the list of tabs using OneTab's export feature.
2. For each line, filter out irrelevant lines and parse the data.
3. Save the parsed records into a CSV file for later use.

**Anime record columns:**
- Anime name
- Episode that I watched until
- Full URL
- Page title
- Website
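
Step 2 starts by splitting each exported line into a URL and a page title. The following is a minimal sketch, assuming every exported line has the form `URL | page title`; `split_onetab_line` is a hypothetical helper name, not part of OneTab or of the cells below.

```python
def split_onetab_line(line):
    """Split one exported OneTab line into (url, page_title).

    Returns None when the line has no ' | ' separator.
    """
    # Split on the first ' | ' only: page titles may themselves contain '|'.
    url, separator, page_title = line.strip().partition(' | ')
    if not separator:
        return None
    return url, page_title


sample = 'https://www.microsoft.com/design/fluent/#/ | Microsoft Design'
print(split_onetab_line(sample))
```

Splitting on the first separator only keeps titles such as `Some Show Online | English Dubbed-Subbed Episodes` intact.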

In [None]:
# https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
from google.colab import drive
drive.mount('/content/drive')

#drive.flush_and_unmount()

In [None]:
# !cat '/content/drive/MyDrive/Colab Notebooks/mal-anime-query/Data/onetab-list.txt'
data_dir = '/content/drive/MyDrive/Colab Notebooks/mal-anime-query/Data/'

## AnimeRecord class

This class provides the parsing mechanism and contains the parsed information for future use.
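
The core of the parsing is a regular expression that looks for `Episode <number>` in the page title. The sketch below shows that idea in isolation, assuming titles look like `Watch <name> Episode <n> ...`; `parse_page_title` and the sample titles are hypothetical and only mirror what the class does.

```python
import re


def parse_page_title(page_title):
    """Return (name, episode) parsed from a streaming-site page title."""
    name_part, episode = page_title, 0
    # Everything before 'Episode <number>' is treated as the anime name.
    match = re.search(r'Episode (?P<episode>\d+(?:\.\d+)?)', page_title)
    if match:
        number = float(match.group('episode'))
        # Keep whole episodes as int, half episodes (e.g. 5.5) as float.
        episode = int(number) if number.is_integer() else number
        name_part = page_title[:match.start()]
    # Drop a leading 'Watch ' prefix if the site adds one.
    name = re.sub(r'^(?:Watch )+', '', name_part).strip()
    return name, episode


print(parse_page_title('Watch Mushishi Episode 12 English Subbed'))
```

When the title has no `Episode` marker, the whole title is returned as the name and the episode stays at 0, which is how dirty records can be spotted later.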

In [None]:
# Declare AnimeRecord class

import re
import uuid
import ast

class AnimeRecord:
  def __init__(self, url, page_title, name, episode, website, id):
    self.id = id
    self.page_title = page_title
    self.url = url
    self.name = name
    self.episode = self.__convert_to_num(episode)
    self.website = website

  @classmethod
  def construct(cls, url, page_title):
    # Store the id as a string so it survives the CSV round trip unchanged
    # and can be used as a JSON object key later on.
    id = str(uuid.uuid4())
    name = ''
    episode = 0
    website = ''

    # Everything before 'Episode <number>' is treated as the title.
    clean_page_title = page_title
    match = re.search(r'Episode (?P<episode>[\d.]+)', page_title)
    if match:
      episode = cls.__convert_to_num(cls, match.group('episode').strip())
      clean_page_title = page_title[:match.start()]

    match = re.search(r'(Watch )*(?P<title>.*)', clean_page_title)
    if match:
      name = match.group('title').strip()

    # Capture the site name, skipping one leading subdomain such as 'www.'
    match = re.search(r'https?://([\w\d]+\.)?(?P<site>[\w\d]+\.[\w]+)/.*', url)
    if match:
      website = match.group('site').strip()

    return cls(url=url, page_title=page_title, id=id, name=name, episode=episode, website=website)

  @classmethod
  def from_dict(cls, data):
    # Rebuild a record from a CSV row; all values arrive as strings
    return cls(url=data['url'], page_title=data['page_title'], id=data['id'], name=data['name'], episode=data['episode'], website=data['website'])

  def __str__(self):
    return "Name='%s', Episode='%s', Website='%s'" % (self.name, self.episode, self.website)

  def __convert_to_num(self, s):
    f = float(s)
    i = int(f)
    return i if i == f else f

## Parsing the lines

I load the file and read it line by line. For each line, I extract the page URL and page title, then feed them to the AnimeRecord class to parse the information.

Some lines don't belong to anime streaming sites. Assuming that every anime streaming site has 'anime' in its domain name, I can filter out most of the noise. However, some sites with 'anime' in their domain name aren't streaming sites, e.g. myanimelist.net, so I add an explicit condition to remove them.
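
The filtering rule can be sketched as a small predicate. This is only an illustration of the heuristic described above: `is_streaming_line`, the `EXCLUDED` tuple, and the `example.com` URL are hypothetical names chosen for the example.

```python
import re

# Keep URLs that contain 'anime' before the top-level domain ...
ANIME_URL = re.compile(r'http[s]?://.*anime.*\.[a-z]+/.* \|')
# ... except sites known not to be streaming sites (hypothetical block list).
EXCLUDED = ('myanimelist',)


def is_streaming_line(line):
    """Return True when the line looks like a tab from an anime streaming site."""
    if not ANIME_URL.match(line):
        return False
    return not any(site in line for site in EXCLUDED)


lines = [
    'https://anime.example.com/show-episode-3 | Show Episode 3',
    'https://myanimelist.net/anime/457/Mushishi | Mushishi',
    'https://www.microsoft.com/design/fluent/#/ | Microsoft Design',
]
kept = [line for line in lines if is_streaming_line(line)]
print(kept)
```

Keeping the exclusions in one tuple makes it easy to add further non-streaming sites when they show up in the output.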

In [None]:
import re

input_file = data_dir + 'onetab-raw2.txt'

animes = []
i = 0
with open(input_file, 'r') as f:
  lines = f.readlines()

  for line in lines:
    # Remove possibly irrelevant lines
    if not re.match(r'http[s]?://.*anime.*\.[a-z]+/.* \|', line): # Assumption: all anime sites have 'anime' in the URL
      continue
    if 'myanimelist' in line: # Ignore myanimelist, which is not a streaming site
      continue
    if not len(line.strip()):
      continue

    separator_index = line.index('|')
    page_url = line[0:separator_index].strip()
    page_title = line[separator_index+1:].strip()

    anime = AnimeRecord.construct(page_url, page_title)
    animes.append(anime)
    
    i += 1
    print('[%d]%s' % (i, anime))

print('----------')
print('number of animes found: %s' % len(animes))
print('anime sites:')
anime_sites = set()
for anime in animes:
  anime_sites.add(anime.website)
for site in anime_sites:
  print('- %s' % site)

## Data cleaning

During the parsing, some irrelevant lines have been removed.

However, some data can still be dirty. A few scenarios:

- **Duplicate records** - caused by saving tabs of the same or different episodes of one anime in OneTab.
- **Unexpected title form** - not all page titles are created equal.
  - No 'Episode' stop word: the last seen episode will be 0 and the title will pick up noise.

In [None]:
# Clean up duplicates

duplicates = set()

animes_dupe_dict = {}
for anime in animes:
  if anime.name in animes_dupe_dict:
    duplicates.add(anime.name)
  if anime.name not in animes_dupe_dict or animes_dupe_dict[anime.name].episode < anime.episode:
    animes_dupe_dict[anime.name] = anime

i = 0
animes = list(animes_dupe_dict.values()) # Copy into a list so it no longer depends on the dict
for anime in animes:
  i += 1
  print('[%d]%s' % (i, anime))

print('----------')
print('Duplicates:')
for duplicate in duplicates:
  print('- %s' % duplicate)

In [None]:
# Print possible dirty records
# Assuming they would have 0 episode because episode is not mentioned in page title.

for anime in animes:
  if anime.episode == 0:
    print(anime)

In [None]:
# Manually clean the data according to the previous cell's output

cleaner_map = {
    'Wonder Egg Priority English Subbed Online Free': 'Wonder Egg Priority',
    'Jujutsu Kaisen (TV) English Subbed Online Free': 'Jujutsu Kaisen (TV)',
    'Sword Art Online: Alicization - War of Underworld Anime English Subbed in HD for Free on Animefreak.TV': 'Sword Art Online: Alicization - War of Underworld',
    'Kekkai Sensen: Ousama no Restaurant no Ousama OVA Online | English Dubbed-Subbed Episodes': 'Kekkai Sensen: Ousama no Restaurant no Ousama OVA',
    'Nanatsu no Taizai OVA 2 Online | English Dubbed-Subbed Episodes': 'Nanatsu no Taizai OVA'
}

for anime in animes:
  if anime.name in cleaner_map:
    print('Changing "%s" to "%s"' % (anime.name, cleaner_map[anime.name]))
    anime.name = cleaner_map[anime.name]

## Cache

The following cells are the caching mechanism for the parsed anime list.

This also lets me clean the data before the next step. The data may be dirty because page titles don't follow any standard format.
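
One thing to keep in mind with a CSV cache is that every value comes back as a string, which is why `AnimeRecord.__init__` converts `episode` to a number again. A minimal sketch of the round trip, using an in-memory buffer and made-up values instead of the real file:

```python
import csv
import io

# Write one record to an in-memory CSV (stand-in for output.csv)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['id', 'name', 'episode'])
writer.writeheader()
writer.writerow({'id': 'abc-123', 'name': 'Mushishi', 'episode': 12})

# Read it back: every value is now a str, including 'episode'
buffer.seek(0)
row = next(csv.DictReader(buffer))
print(row)

# Numbers have to be converted explicitly after loading
restored_episode = float(row['episode'])
```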

In [None]:
import csv

output_file = data_dir + 'output.csv'
with open(output_file, mode='w', newline='') as csv_file: # newline='' as recommended by the csv module
    fieldnames = ['id', 'name', 'episode', 'website', 'page_title', 'url']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    for anime in animes:
      writer.writerow(vars(anime))

In [None]:
import csv

i = 0
animes_file = data_dir + 'output.csv'
with open(animes_file, mode='r') as csv_file:
  reader = csv.DictReader(csv_file)
  animes = []
  for row in reader:
    anime = AnimeRecord.from_dict(row)
    animes.append(anime)

    i += 1
    print('[%d]%s' % (i, anime))

# Search for Anime by Name on MAL

I need to connect to a public anime list to get anime details. After evaluating other public anime lists and trying out MAL's own API, I hit a major blocker: the official API requires me to register as an app and has a complex login mechanism, which isn't workable from PowerShell.

Hence, I moved to an unofficial public API that gets the job done: [Jikan.moe](https://jikan.moe/). Jikan has many wrapper libraries in different languages. After much consideration, I opted for [JikanPy](https://github.com/abhinavk99/jikanpy) because Python is well suited to working with data.

In [None]:
!pip install jikanpy

## API References

This section describes the [JikanPy API](https://jikanpy.readthedocs.io/en/latest/) calls that I will be using. [Jikan's official API documentation](https://jikan.docs.apiary.io/) lists more parameters that can be passed through JikanPy.

[`jikan.anime()`](https://jikanpy.readthedocs.io/en/latest/#jikanpy.Jikan.anime) gets information on an anime.

Response sample:
```json
{"request_hash": "request:anime:047ce4420fa843606934309866d292274c149a83", "request_cached": true, "request_cache_expiry": 86241, "mal_id": 457, "url": "https://myanimelist.net/anime/457/Mushishi", "image_url": "https://cdn.myanimelist.net/images/anime/2/73862.jpg", "trailer_url": "https://www.youtube.com/embed/h371H0KIuPo?enablejsapi=1&wmode=opaque&autoplay=1", "title": "Mushishi", "title_english": "Mushi-Shi", "title_japanese": "\u87f2\u5e2b", "title_synonyms": [], "type": "TV", "source": "Manga", "episodes": 26, "status": "Finished Airing", "airing": false, "aired": {"from": "2005-10-23T00:00:00+00:00", "to": "2006-06-19T00:00:00+00:00", "prop": {"from": {"day": 23, "month": 10, "year": 2005}, "to": {"day": 19, "month": 6, "year": 2006}}, "string": "Oct 23, 2005 to Jun 19, 2006"}, "duration": "25 min per ep", "rating": "PG-13 - Teens 13 or older", "score": 8.69, "scored_by": 209628, "rank": 49, "popularity": 169, "members": 645057, "favorites": 23613, "synopsis": "\"Mushi\": the most basic forms of life in the world. They exist without any goals or purposes aside from simply \"being.\" They are beyond the shackles of the words \"good\" and \"evil.\" Mushi can exist in countless forms and are capable of mimicking things from the natural world such as plants, diseases, and even phenomena like rainbows. This is, however, just a vague definition of these entities that inhabit the vibrant world of Mushishi, as to even call them a form of life would be an oversimplification. Detailed information on Mushi is scarce because the majority of humans are unaware of their existence. So what are Mushi and why do they exist? This is the question that a \"Mushishi,\" Ginko, ponders constantly. Mushishi are those who research Mushi in hopes of understanding their place in the world's hierarchy of life. Ginko chases rumors of occurrences that could be tied to Mushi, all for the sake of finding an answer. It could, after all, lead to the meaning of life itself. 
[Written by MAL Rewrite]", "background": null, "premiered": "Fall 2005", "broadcast": "Sundays at 03:40 (JST)", "related": {"Adaptation": [{"mal_id": 418, "type": "manga", "name": "Mushishi", "url": "https://myanimelist.net/manga/418/Mushishi"}], "Sequel": [{"mal_id": 21329, "type": "anime", "name": "Mushishi: Hihamukage", "url": "https://myanimelist.net/anime/21329/Mushishi__Hihamukage"}, {"mal_id": 21939, "type": "anime", "name": "Mushishi Zoku Shou", "url": "https://myanimelist.net/anime/21939/Mushishi_Zoku_Shou"}], "Summary": [{"mal_id": 39738, "type": "anime", "name": "Mushishi Recap", "url": "https://myanimelist.net/anime/39738/Mushishi_Recap"}]}, "producers": [{"mal_id": 52, "type": "anime", "name": "Avex Entertainment", "url": "https://myanimelist.net/anime/producer/52/Avex_Entertainment"}, {"mal_id": 82, "type": "anime", "name": "Marvelous", "url": "https://myanimelist.net/anime/producer/82/Marvelous"}, {"mal_id": 147, "type": "anime", "name": "SKY Perfect Well Think", "url": "https://myanimelist.net/anime/producer/147/SKY_Perfect_Well_Think"}, {"mal_id": 711, "type": "anime", "name": "Delfi Sound", "url": "https://myanimelist.net/anime/producer/711/Delfi_Sound"}], "licensors": [{"mal_id": 102, "type": "anime", "name": "Funimation", "url": "https://myanimelist.net/anime/producer/102/Funimation"}], "studios": [{"mal_id": 8, "type": "anime", "name": "Artland", "url": "https://myanimelist.net/anime/producer/8/Artland"}], "genres": [{"mal_id": 2, "type": "anime", "name": "Adventure", "url": "https://myanimelist.net/anime/genre/2/Adventure"}, {"mal_id": 36, "type": "anime", "name": "Slice of Life", "url": "https://myanimelist.net/anime/genre/36/Slice_of_Life"}, {"mal_id": 7, "type": "anime", "name": "Mystery", "url": "https://myanimelist.net/anime/genre/7/Mystery"}, {"mal_id": 13, "type": "anime", "name": "Historical", "url": "https://myanimelist.net/anime/genre/13/Historical"}, {"mal_id": 37, "type": "anime", "name": "Supernatural", "url": 
"https://myanimelist.net/anime/genre/37/Supernatural"}, {"mal_id": 10, "type": "anime", "name": "Fantasy", "url": "https://myanimelist.net/anime/genre/10/Fantasy"}, {"mal_id": 42, "type": "anime", "name": "Seinen", "url": "https://myanimelist.net/anime/genre/42/Seinen"}], "opening_themes": ["\"The Sore Feet Song\" by Ally Kerr"], "ending_themes": ["#01: \"Midori no Za\" (\u7dd1\u306e\u5ea7) by Masuda Toshio (ep 1)", "#02: \"Mabuta no Hikari\" (\u77bc\u306e\u5149) by Masuda Toshio (ep 2)", "#03: \"Yawarakai Kaku\" (\u67d4\u3089\u304b\u3044\u89d2) by Masuda Toshio (ep 3)", "#04: \"Makura Kouji\" (\u6795\u5c0f\u8def ) by Masuda Toshio (ep 4)", "#05: \"Tabi wo Suru Numa\" (\u65c5\u3092\u3059\u308b\u6cbc) by Masuda Toshio (ep 5)", "#06: \"Tsuyu wo Suu Mure\" (\u9732\u3092\u5438\u3046\u7fa4\u308c) by Masuda Toshio (ep 6)", "#07: \"Ame ga Kuru Niji ga Tatsu\" (\u96e8\u304c\u304f\u308b\u8679\u304c\u305f\u3064) by Masuda Toshio (ep 7)", "#08: \"Unasaka Yori\" (\u6d77\u5883\u3088\u308a)  by Masuda Toshio (ep 8)", "#09: \"Omoi Mi\" (\u91cd\u3044\u5b9f) by Masuda Toshio (ep 9)", "#10: \"Suzuri ni Sumu Shiro\" (\u786f\u306b\u68f2\u3080\u767d) by Masuda Toshio (ep 10)", "#11: \"Yama Nemuru\" (\u3084\u307e\u306d\u3080\u308b) by Masuda Toshio (ep 11)", "#12: \"Sugame no Sakana\" (\u7707\u306e\u9b5a) by Masuda Toshio (ep 12)", "#13: \"Hitoyobashi\" (\u4e00\u591c\u6a4b) by Masuda Toshio (ep 13)", "#14: \"Kago no Naka\" (\u7c60\u306e\u306a\u304b) by Masuda Toshio (ep 14)", "#15: \"Haru to Usobuko\" (\u6625\u3068\u562f\u304f) by Masuda Toshio (ep 15)", "#16: \"Akatsuki no Hebi\" (\u6681\u306e\u86c7) by Masuda Toshio (ep 16)", "#17: \"Uromayutori\" (\u865a\u7e6d\u53d6\u308a) by Masuda Toshio (ep 17)", "#18: \"Yama Daku Koromo\" (\u5c71\u62b1\u304f\u8863) by Masuda Toshio (ep 18)", "#19: \"Tenpen no Ito\" (\u5929\u8fba\u306e\u7cf8) by Masuda Toshio (ep 19)", "#20: \"Fude no Umi\" (\u7b46\u306e\u6d77) by Masuda Toshio (ep 20)", "#21: \"Wataboshi\" (\u7dbf\u80de\u5b50) by Masuda Toshio 
(ep 21)", "#22: \"Okitsu Miya\" (\u6c96\u3064\u5bae) by Masuda Toshio (ep 22)", "#23: \"Sabi no Naku Koe\" (\u9306\u306e\u9cf4\u304f\u8072) by Masuda Toshio (ep 23)", "#24: \"Kagarinokou\" (\u7bdd\u91ce\u884c) by Masuda Toshio (ep 24)", "#25: \"Ganpuku Ganka\" (\u773c\u798f\u773c\u798d) by Masuda Toshio (ep 25)", "#26: \"Kusa wo Fumu Oto\" (\u8349\u3092\u8e0f\u3080\u97f3) by Masuda Toshio (ep 26)"], "jikan_url": "https://api.jikan.moe/v3/anime/457", "headers": {"Server": "nginx/1.18.0 (Ubuntu)", "Date": "Fri, 23 Apr 2021 17:43:12 GMT", "Content-Type": "application/json", "Content-Length": "2448", "Connection": "keep-alive", "Access-Control-Allow-Origin": "*", "Access-Control-Allow-Methods": "*", "Cache-Control": "private, must-revalidate", "ETag": "\"d5d48ec53db0e79038220955d8204c42\"", "X-Request-Hash": "request:anime:047ce4420fa843606934309866d292274c149a83", "X-Request-Cached": "1", "X-Request-Cache-Ttl": "86241", "Expires": "Sat, 24 Apr 2021 17:40:33 GMT", "Content-Encoding": "gzip", "Vary": "Accept-Encoding", "X-Cache-Status": "MISS"}}
```

[`jikan.search()`](https://jikanpy.readthedocs.io/en/latest/#jikanpy.Jikan.search) searches for a query on MyAnimeList.

Response sample:
```json
{"request_hash": "request:search:7206d4755417ce06cee022674aba4a6f82a413e8", "request_cached": false, "request_cache_expiry": 432000, "results": [{"mal_id": 38000, "url": "https://myanimelist.net/anime/38000/Kimetsu_no_Yaiba", "image_url": "https://cdn.myanimelist.net/images/anime/1286/99889.jpg?s=e497d08bef31ae412e314b90a54acfda", "title": "Kimetsu no Yaiba", "airing": false, "synopsis": "Ever since the death of his father, the burden of supporting the family has fallen upon Tanjirou Kamado's shoulders. Though living impoverished on a remote mountain, the Kamado family are able to enjo...", "type": "TV", "episodes": 26, "score": 8.6, "start_date": "2019-04-06T00:00:00+00:00", "end_date": "2019-09-28T00:00:00+00:00", "members": 1637158, "rated": "R"}, {"mal_id": 47778, "url": "https://myanimelist.net/anime/47778/Kimetsu_no_Yaiba__Yuukaku-hen", "image_url": "https://cdn.myanimelist.net/images/anime/1338/111945.jpg?s=8ad61bdf543abb9291fe2eeb52a2cb26", "title": "Kimetsu no Yaiba: Yuukaku-hen", "airing": false, "synopsis": "Tanjiro, Zenitsu and Inosuke aided by the Sound Hashira Tengen Uzui travel to Yoshiwara red light district to hunt down a demon that has been terrorizing the town.", "type": "TV", "episodes": 0, "score": 0, "start_date": null, "end_date": null, "members": 100044, "rated": "R"}], "last_page": 20, "jikan_url": "https://api.jikan.moe/v3/search/anime?q=Kimetsu no Yaiba&limit=2", "headers": {"Server": "nginx/1.18.0 (Ubuntu)", "Date": "Fri, 23 Apr 2021 17:44:56 GMT", "Content-Type": "application/json", "Content-Length": "681", "Connection": "keep-alive", "Access-Control-Allow-Origin": "*", "Access-Control-Allow-Methods": "*", "Cache-Control": "private, must-revalidate", "ETag": "\"d1c81f48bae6bc753c4825aa15d5acaf\"", "X-Request-Hash": "request:search:7206d4755417ce06cee022674aba4a6f82a413e8", "X-Request-Cached": "", "X-Request-Cache-Ttl": "432000", "Expires": "Wed, 28 Apr 2021 17:44:56 GMT", "Content-Encoding": "gzip", "Vary": "Accept-Encoding", 
"X-Cache-Status": "MISS"}}
```
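
Only a few fields of the search response matter for the mapping step. The sketch below works on a trimmed copy of the sample above (no network call), and pulls out the id and title of each candidate.

```python
# Trimmed copy of the search response sample above
response = {
    'results': [
        {'mal_id': 38000, 'title': 'Kimetsu no Yaiba', 'type': 'TV', 'episodes': 26},
        {'mal_id': 47778, 'title': 'Kimetsu no Yaiba: Yuukaku-hen', 'type': 'TV', 'episodes': 0},
    ],
    'last_page': 20,
}

# Each candidate is identified by its MyAnimeList id and title
candidates = [(result['mal_id'], result['title']) for result in response['results']]
for mal_id, title in candidates:
    print('%d: %s' % (mal_id, title))
```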

In [None]:
# Search anime by name on MAL

import time
from jikanpy import Jikan
from jikanpy.exceptions import APIException
jikan = Jikan()

# https://jikanpy.readthedocs.io/en/latest/
# time.sleep(4) # Remember to sleep 4 seconds between requests

search_results = {}
for anime in animes:
  try:
    result = jikan.search('anime', anime.name, parameters={ 'limit': 3 })
    search_results[anime.id] = result
    print('Searched "%s", found %d result(s)' % (anime.name, len(result['results'])))
  except APIException as e:
    print('APIException: %s' % e)
  
  time.sleep(4)

## Cache

The following cells are the caching mechanism for the search results.

The search API is a costly call, so it's best to save the results for future reuse.
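
The save and reload cells could also be combined into a single load-or-fetch helper. This is a sketch of that pattern rather than what the cells below do; `load_or_fetch` and the `fetch` callback are hypothetical names.

```python
import json
import os


def load_or_fetch(cache_file, fetch):
    """Return cached JSON from cache_file, calling fetch() only on a cache miss."""
    if os.path.exists(cache_file):
        with open(cache_file, mode='r') as f:
            return json.load(f)
    # Cache miss: do the expensive work once, then store the result
    data = fetch()
    with open(cache_file, mode='w') as f:
        json.dump(data, f, indent=2)
    return data
```

A call might look like `load_or_fetch(data_dir + 'search_results.json', run_all_searches)`, where `run_all_searches` would be a function wrapping the search loop above. Note that JSON object keys are always strings, so the dictionary keys must be strings (or other JSON-compatible keys) before saving.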

In [None]:
# Save search results

import json

search_json_file = data_dir + 'search_results.json' # Change file path

search_json = json.dumps(search_results, indent=2)
#print(search_json)

with open(search_json_file, mode='w') as f:
  f.write(search_json)

In [None]:
# Reload search results

import json

search_json_file = data_dir + 'search_results.json' # Change filepath

with open(search_json_file, mode='r') as f:
  search_results = json.load(f)

# Map Anime to MAL Id



Searching by name doesn't guarantee an accurate result. Therefore, this section checks whether the anime name matches a search result using a strict string match.

- If the names match, the result is correct.
- If the names don't match, I manually check whether a subsequent result matches.
- If none match, a manual search is required.
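
The three cases above can be sketched as one function that returns both the chosen result and how it was chosen. `pick_match` and its status labels are hypothetical; the comparison is the same strict, case-sensitive match.

```python
def pick_match(name, results):
    """Return (result, status) where status is 'exact', 'fallback' or 'none'."""
    if not results:
        return None, 'none'          # nothing found: manual search required
    for result in results:
        if result['title'] == name:  # strict, case-sensitive comparison
            return result, 'exact'
    return results[0], 'fallback'    # best guess: needs a manual check


results = [
    {'mal_id': 47778, 'title': 'Kimetsu no Yaiba: Yuukaku-hen'},
    {'mal_id': 38000, 'title': 'Kimetsu no Yaiba'},
]
print(pick_match('Kimetsu no Yaiba', results))
```

Returning the status alongside the result makes it straightforward to list only the records that still need a manual look.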

In [None]:
# Map my anime list to MAL search results

i = 0
anime_dicts = []
for anime in animes:
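  # Skip anime whose search failed or returned nothing; these need a manual search
  search_result = search_results.get(anime.id)
  if not search_result or not search_result['results']:
    print('No search result for "%s"' % anime.name)
    continue

  # Default to the first result, then prefer an exact title match if there is one
  matching_result = search_result['results'][0]
  for result in search_result['results']:
    if result['title'] == anime.name:
      matching_result = result
      break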

  anime_dict = {
      'anime_id': anime.id,
      'anime_name': anime.name,
      'mal_id': matching_result['mal_id'],
      'mal_name': matching_result['title']
  }
  anime_dicts.append(anime_dict)

  # Output mapping to console and highlight possible mismatch

  i += 1
  print_text = '[%d]%s -> %s'
  if anime_dict['anime_name'] != anime_dict['mal_name']:
    print_text = '\033[31;47m' + print_text + '\033[0m' # https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters

  print(print_text % (i, anime_dict['anime_name'], anime_dict['mal_name']))

## Cache

The following cells are the caching mechanism for the anime mapping.

Some anime may not match any search result because titles differ from site to site. The output of the cell above highlights possible mismatches, so I can manually search for them and remap their MAL Id.

❗ Manually clean the anime mapping.

In [None]:
# Save anime Id mapping

import csv

output_file = data_dir + 'anime_mapping.csv'
with open(output_file, mode='w', newline='') as csv_file: # newline='' as recommended by the csv module
    fieldnames = ['anime_id', 'anime_name', 'mal_id', 'mal_name']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    for anime_dict in anime_dicts:
      writer.writerow(anime_dict)

In [None]:
# Reload anime Id mapping

import csv

animes_file = data_dir + 'anime_mapping_cleaned2.csv' # Update this filepath if changed
with open(animes_file, mode='r') as csv_file:
  reader = csv.DictReader(csv_file)
  anime_dicts = []
  i = 0 # Reset the counter; otherwise it continues from an earlier cell
  for row in reader:
    anime_dicts.append(row)

    i += 1
    print('[%d]%s' % (i, row))

# Fetch Anime Details



This section fetches each anime's details from MyAnimeList using the MyAnimeList Id that I mapped in the previous section.

I hope to get the following information from MyAnimeList:

- **Parent**: The anime's parent title for grouping.
- **Title**: The full anime title.
- **Genre**: The anime genre (according to MyAnimeList).
- **Season**: The xth season of the anime. Non-season type: OVA, Movie, ONA, Special.
- **Total Episodes**: The total number of episodes that the anime has.
- **Date of Release**: The date when the anime starts airing.
- **Date of Completion**: The date when the anime finishes airing.
- **Status**: The airing status of the anime: Unreleased, Ongoing, Released.
- **Last**: The last episode I've watched.

However, MyAnimeList has no explicit season number, only a type such as TV, OVA, or Movie.

There are some other minor issues: its status values don't match my labels, though I can compute mine easily in the spreadsheet; total episodes and completion date are empty for movies; and there is no explicit link to a parent title, so I will have to fill that in manually.
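
If the status label were computed in Python instead of the spreadsheet, a lookup table would be enough. In this sketch only `'Finished Airing'` is taken from the response sample earlier in this notebook; the other two status strings, the `'Unknown'` fallback, and the helper name `to_status_label` are assumptions.

```python
# MyAnimeList status text -> my spreadsheet label.
# 'Finished Airing' appears in the sample response; the other two keys are
# assumed from MyAnimeList's usual wording and should be verified.
STATUS_LABELS = {
    'Finished Airing': 'Released',
    'Currently Airing': 'Ongoing',
    'Not yet aired': 'Unreleased',
}


def to_status_label(mal_status):
    """Translate a MyAnimeList status into my label, or 'Unknown' if unmapped."""
    return STATUS_LABELS.get(mal_status, 'Unknown')


print(to_status_label('Finished Airing'))
```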

In [None]:
# Fetch anime details from public anime list

import time
from jikanpy import Jikan
from jikanpy.exceptions import APIException
jikan = Jikan()

# https://jikanpy.readthedocs.io/en/latest/
# time.sleep(4) # Remember to sleep 4 seconds between requests

anime_results = {}
i = 0
for anime_dict in anime_dicts:
  try:
    result = jikan.anime(anime_dict['mal_id'])
    anime_results[anime_dict['anime_id']] = result
    i += 1
    print('[%d]Get anime [%s -> %s] "%s"' % (i, anime_dict['anime_id'], anime_dict['mal_id'], anime_dict['anime_name']))
  except APIException as e:
    print('APIException: %s' % e)
  
  time.sleep(4)

## Cache

The following cells are the caching mechanism for the anime results, so I don't have to query MyAnimeList again.

In [None]:
# Save anime results

import json

anime_json_file = data_dir + 'anime_results.json'

anime_json = json.dumps(anime_results, indent=2)
#print(anime_json)

with open(anime_json_file, mode='w') as f:
  f.write(anime_json)

In [None]:
# Reload anime results

import json

anime_json_file = data_dir + 'anime_results.json'

with open(anime_json_file, mode='r') as f:
  anime_results = json.load(f)

# Export result

At this point, I should have both my own anime list and the anime details from the public anime list.

Now I can export the information into a CSV file. The columns I want are:

- **title** - the anime title from the public anime list
- **genres** - the anime genres according to the public anime list
- **total_episode** - the total number of episodes according to the public anime list
- **release_date** - the date the anime started airing according to the public anime list
- **completion_date** - the date the anime finished airing according to the public anime list
- **last_watched** - the last episode I've watched according to my list
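
Here is a sketch of how one output row is assembled from a detail response, using a trimmed copy of the `jikan.anime()` sample from earlier in this notebook. Shortening the timestamps to `YYYY-MM-DD` is an optional extra that the export cell below does not do; `to_date` and the `last_watched` value are hypothetical.

```python
def to_date(iso_timestamp):
    """Keep only the YYYY-MM-DD part; pass None through for missing dates."""
    return iso_timestamp[:10] if iso_timestamp else None


# Trimmed from the jikan.anime() response sample
detail = {
    'title': 'Mushishi',
    'genres': [{'name': 'Adventure'}, {'name': 'Slice of Life'}],
    'episodes': 26,
    'aired': {'from': '2005-10-23T00:00:00+00:00', 'to': '2006-06-19T00:00:00+00:00'},
}
last_watched = 12  # hypothetical value from my own list

record = {
    'title': detail['title'],
    'genres': ', '.join(genre['name'] for genre in detail['genres']),
    'total_episode': detail['episodes'],
    'release_date': to_date(detail['aired']['from']),
    'completion_date': to_date(detail['aired']['to']),
    'last_watched': last_watched,
}
print(record)
```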

In [None]:
import csv

# Construct the columns

final_records = []
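for anime in animes:
  # Details are missing when the fetch failed or the mapping was removed during clean-up
  result = anime_results.get(str(anime.id))
  if result is None:
    print('No details for "%s", skipping' % anime.name)
    continue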

  final_record = {
      'title': result['title'],
      'genres': ', '.join(genre['name'] for genre in result['genres']),
      'total_episode': result['episodes'],
      'release_date': result['aired']['from'],
      'completion_date': result['aired']['to'],
      'last_watched': anime.episode
  }
  final_records.append(final_record)
  print(final_record)

# Save output into file

final_output_file = data_dir + 'final_output.csv'
with open(final_output_file, mode='w', newline='') as csv_file: # newline='' as recommended by the csv module
  fieldnames = final_records[0].keys()
  writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

  writer.writeheader()
  for final_record in final_records:
    writer.writerow(final_record)