# Webscraping MyAnimeList using Python

I am a big fan of anime and manga, so in this project I scrape the top-ranked anime and manga from [MyAnimeList](https://myanimelist.net/), a site that catalogs information about anime and manga titles. It should be fun.

In [93]:
# Loading required libraries
import requests
import os
import csv
from bs4 import BeautifulSoup
import re
import pandas as pd

from IPython.display import display, Image

## Exploring the HTML tags

In [183]:
URL = "https://myanimelist.net/topanime.php"

In [98]:
response = requests.get(URL)

In [99]:
parser = BeautifulSoup(response.text, "html.parser")
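
The request above is sent without a timeout or a status check, so a slow or failed response would go unnoticed until parsing. A minimal sketch of a more defensive fetch follows; the helper name `fetch_html` and its `session` and `timeout` parameters are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the page body as text, raising if the request failed."""
    # A caller may pass its own session; otherwise a fresh one is created.
    if session is None:
        session = requests.Session()
    # requests applies no timeout by default, so one is set explicitly.
    response = session.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses,
    # so an error page is never handed to the parser by mistake.
    response.raise_for_status()
    return response.text
```

The returned text could then be passed to `BeautifulSoup(..., "html.parser")` exactly as in the cell above.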

In [100]:
# Extract headers
headers = parser.select("tr.table-header")[0]

In [101]:
headers.text

'RankTitleScoreYour ScoreStatus'

In [102]:
anime_tags = parser.select("tr.ranking-list")

In [103]:
type(anime_tags)

bs4.element.ResultSet

In [104]:
anime_tags

[<tr class="ranking-list">
 <td class="rank ac" valign="top">
 <span class="lightLink top-anime-rank-text rank2">51</span>
 </td>
 <td class="title al va-t word-break">
 <a class="hoverinfo_trigger fl-l ml12 mr8" href="https://myanimelist.net/anime/1575/Code_Geass__Hangyaku_no_Lelouch" id="#area1575" rel="#info1575">
 <img alt="Anime: Code Geass: Hangyaku no Lelouch" border="0" class="lazyload" data-src="https://cdn.myanimelist.net/r/50x70/images/anime/1032/135088.jpg?s=b67496ea440a61c0d7ac14173e0bd6e0" data-srcset="https://cdn.myanimelist.net/r/50x70/images/anime/1032/135088.jpg?s=b67496ea440a61c0d7ac14173e0bd6e0 1x, https://cdn.myanimelist.net/r/100x140/images/anime/1032/135088.jpg?s=8b895bc4ead0cfc55ccfc813ba627926 2x" height="70" width="50">
 </img></a>
 <div class="detail"><div id="area1575">
 <div class="hoverinfo" id="info1575" rel="a1575"></div>
 </div>
 <div class="di-ib clearfix"><h3 class="hoverinfo_trigger fl-l fs14 fw-b anime_ranking_h3"><a href="https://myanimelist.net/an

## Extracting Ranks

In [105]:
# Extract rankings
ranks_html = parser.select("td[class='rank ac']")

In [106]:
ranks = [rank.text.strip() for rank in ranks_html]

In [107]:
print(ranks)

['51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100']


In [108]:
parser.select("td[class='title al va-t word-break']")[0]

<td class="title al va-t word-break">
<a class="hoverinfo_trigger fl-l ml12 mr8" href="https://myanimelist.net/anime/1575/Code_Geass__Hangyaku_no_Lelouch" id="#area1575" rel="#info1575">
<img alt="Anime: Code Geass: Hangyaku no Lelouch" border="0" class="lazyload" data-src="https://cdn.myanimelist.net/r/50x70/images/anime/1032/135088.jpg?s=b67496ea440a61c0d7ac14173e0bd6e0" data-srcset="https://cdn.myanimelist.net/r/50x70/images/anime/1032/135088.jpg?s=b67496ea440a61c0d7ac14173e0bd6e0 1x, https://cdn.myanimelist.net/r/100x140/images/anime/1032/135088.jpg?s=8b895bc4ead0cfc55ccfc813ba627926 2x" height="70" width="50">
</img></a>
<div class="detail"><div id="area1575">
<div class="hoverinfo" id="info1575" rel="a1575"></div>
</div>
<div class="di-ib clearfix"><h3 class="hoverinfo_trigger fl-l fs14 fw-b anime_ranking_h3"><a href="https://myanimelist.net/anime/1575/Code_Geass__Hangyaku_no_Lelouch" id="#area1575" rel="#info1575">Code Geass: Hangyaku no Lelouch</a></h3><div class="icon-watch2">

## Extracting Anime page links

In [109]:
anime_info = parser.select("td[class='title al va-t word-break']")

In [110]:
# # Extract url link
# anime_info[0].find('a')['href']

In [111]:
# Extract url links for all anime
def extract_anime_links(a_tags):
    links = []
    for tag in a_tags:
        tag = str(tag)
        pattern = r'href=(\S+)'
        link = re.findall(pattern, tag)[0][1:-1]
        links.append(link)
        
    return links

In [112]:
a_tags = [info.a for info in anime_info]

In [113]:
anime_links = extract_anime_links(a_tags)

In [114]:
anime_links[:10]

['https://myanimelist.net/anime/1575/Code_Geass__Hangyaku_no_Lelouch',
 'https://myanimelist.net/anime/21939/Mushishi_Zoku_Shou',
 'https://myanimelist.net/anime/45576/Mushoku_Tensei__Isekai_Ittara_Honki_Dasu_Part_2',
 'https://myanimelist.net/anime/44/Rurouni_Kenshin__Meiji_Kenkaku_Romantan_-_Tsuioku-hen',
 'https://myanimelist.net/anime/46102/Odd_Taxi',
 'https://myanimelist.net/anime/21/One_Piece',
 'https://myanimelist.net/anime/245/Great_Teacher_Onizuka',
 'https://myanimelist.net/anime/33050/Fate_stay_night_Movie__Heavens_Feel_-_III_Spring_Song',
 'https://myanimelist.net/anime/164/Mononoke_Hime',
 'https://myanimelist.net/anime/45649/The_First_Slam_Dunk']
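
The regex approach above works on the stringified tag, but BeautifulSoup can also read the `href` attribute directly. The sketch below shows that alternative on a hypothetical, trimmed-down sample of the markup shown earlier (the sample string is an assumption made so the example runs without a network call). It also selects on the single class `title` rather than the full class string, which may be more robust because the manga page uses a slightly different class list.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample of the ranking table markup.
SAMPLE_HTML = """
<table>
<tr class="ranking-list">
  <td class="title al va-t word-break">
    <a class="hoverinfo_trigger fl-l ml12 mr8" href="https://myanimelist.net/anime/1575/Code_Geass__Hangyaku_no_Lelouch"></a>
  </td>
</tr>
<tr class="ranking-list">
  <td class="title al va-t word-break">
    <a class="hoverinfo_trigger fl-l ml12 mr8" href="https://myanimelist.net/anime/21/One_Piece"></a>
  </td>
</tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "td.title" matches any <td> whose class list contains "title",
# regardless of which other classes are present.
cells = soup.select("tr.ranking-list td.title")

# cell.a is the first <a> inside the cell; .get("href") returns None
# instead of raising if the attribute is missing.
links = [cell.a.get("href") for cell in cells if cell.a is not None]
print(links)
```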

## Extracting Anime images

In [115]:
# # Extract anime image links
# anime_info[0].find('img')['data-srcset'].split(', ')[1][:-3]

In [116]:
# Extract image links for all anime
def extract_img_links(img_tags):
    links = []
    for tag in img_tags:
        tag = str(tag)
        pattern = r'https\S+'
        link = re.findall(pattern, tag)[2]
        links.append(link)
        
    return links

In [117]:
img_tags = [info.img for info in anime_info]

In [118]:
img_links = extract_img_links(img_tags)

In [119]:
img_links[:10]

['https://cdn.myanimelist.net/r/100x140/images/anime/1032/135088.jpg?s=8b895bc4ead0cfc55ccfc813ba627926',
 'https://cdn.myanimelist.net/r/100x140/images/anime/13/58533.jpg?s=8b4c39ac1694d8b1ad2a1dcc504691f6',
 'https://cdn.myanimelist.net/r/100x140/images/anime/1028/117777.jpg?s=51c70b8d3ded730debc843006effa91c',
 'https://cdn.myanimelist.net/r/100x140/images/anime/1391/120839.jpg?s=bd43f31022c54683198af846655f6b3e',
 'https://cdn.myanimelist.net/r/100x140/images/anime/1981/113348.jpg?s=9d3e92002db0e698516450efb42e4e59',
 'https://cdn.myanimelist.net/r/100x140/images/anime/6/73245.jpg?s=81e2039193c1feb8d0d41bda2a3841a8',
 'https://cdn.myanimelist.net/r/100x140/images/anime/13/11460.jpg?s=8134d33ff056d87cfa65345aaec756f9',
 'https://cdn.myanimelist.net/r/100x140/images/anime/1142/112957.jpg?s=5def2155b13d757c74eeee6845c0a871',
 'https://cdn.myanimelist.net/r/100x140/images/anime/7/75919.jpg?s=ec4a31ebc5b430efe3ed391d96c228a3',
 'https://cdn.myanimelist.net/r/100x140/images/anime/1745/12
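
Picking the third `https` match relies on the exact order of attributes in the `<img>` tag. As a hedged alternative, the `data-srcset` value can be parsed into its candidates and the `2x` image chosen by name. The helper `parse_srcset` below is a hypothetical name, and the sketch assumes the URLs themselves contain no commas, which holds for the image links shown above.

```python
def parse_srcset(srcset):
    """Map each descriptor (for example '1x' or '2x') to its image URL."""
    candidates = {}
    for entry in srcset.split(","):
        parts = entry.split()
        # Each well-formed candidate is "<url> <descriptor>".
        if len(parts) == 2:
            url, descriptor = parts
            candidates[descriptor] = url
    return candidates


# data-srcset value copied from the first <img> tag shown earlier.
srcset = (
    "https://cdn.myanimelist.net/r/50x70/images/anime/1032/135088.jpg"
    "?s=b67496ea440a61c0d7ac14173e0bd6e0 1x, "
    "https://cdn.myanimelist.net/r/100x140/images/anime/1032/135088.jpg"
    "?s=8b895bc4ead0cfc55ccfc813ba627926 2x"
)

print(parse_srcset(srcset)["2x"])
```

With a BeautifulSoup tag, the input would come from `img_tag["data-srcset"]`.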

## Getting Anime info

In [120]:
anime_info[0].text

'\n\n\n\n\n\n\nCode Geass: Hangyaku no Lelouch\n        TV (25 eps)\n        Oct 2006 - Jul 2007\n        2,158,686 members\n      \n'

In [121]:
# Create list of all anime with their listed info
anime_list = list(info.text.strip().split('\n') for info in anime_info)

In [122]:
def clean_list(anime_list):
    for anime in anime_list:
        for i, x in enumerate(anime):
            anime[i] = x.strip()

In [123]:
clean_list(anime_list)
anime_list[:2]

[['Code Geass: Hangyaku no Lelouch',
  'TV (25 eps)',
  'Oct 2006 - Jul 2007',
  '2,158,686 members'],
 ['Mushishi Zoku Shou',
  'TV (10 eps)',
  'Apr 2014 - Jun 2014',
  '289,087 members']]
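
Each cleaned row still holds free-form strings such as `'TV (25 eps)'` and `'Oct 2006 - Jul 2007'`. If typed fields are wanted later, they could be split as in the sketch below. The helper names `parse_format` and `parse_aired` are hypothetical, and the patterns are assumptions based only on the sample rows shown in this notebook.

```python
import re


def parse_format(text):
    """Split 'TV (25 eps)' into ('TV', 25); an unknown count '?' becomes None."""
    match = re.fullmatch(r"(.+?)\s*\((\?|\d+)\s+\w+\)", text)
    if match is None:
        # Unexpected layout: return the text unchanged with no count.
        return text, None
    kind, count = match.groups()
    return kind, None if count == "?" else int(count)


def parse_aired(text):
    """Split 'Oct 2006 - Jul 2007' into (start, end); a missing end becomes None."""
    start, _, end = text.partition("-")
    return start.strip(), end.strip() or None


print(parse_format("TV (25 eps)"))
print(parse_aired("Oct 2006 - Jul 2007"))
```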

## Getting scores or ratings

In [124]:
parser.select("td[class='score ac fs14']")[1]

<td class="score ac fs14"><div class="js-top-ranking-score-col di-ib al"><i class="icon-score-star fa-solid fa-star mr4 on"></i><span class="text on score-label score-8">8.70</span></div>
</td>

In [125]:
score_tags = parser.select("td[class='score ac fs14']")

In [126]:
# Extract scores of individual anime
anime_scores = [score.text.strip() for score in score_tags]

In [127]:
anime_scores[:5]

['8.70', '8.70', '8.70', '8.70', '8.69']

In [128]:
# Get titles of all anime
anime_titles = [l[0] for l in anime_list]

In [129]:
anime_titles[:5]

['Code Geass: Hangyaku no Lelouch',
 'Mushishi Zoku Shou',
 'Mushoku Tensei: Isekai Ittara Honki Dasu Part 2',
 'Rurouni Kenshin: Meiji Kenkaku Romantan - Tsuioku-hen',
 'Odd Taxi']

## Displaying Anime info with images

In [130]:
# Display title, image and score for each scraped anime (numbered by position on the page)
def display_anime(title_list, img_list, score_list):
    for i, (title, img_url, score) in enumerate(zip(title_list, img_list, score_list)):
        print(f"Rank {i+1}")
        print(title)
        display(Image(url=img_url, width=100))
        print(f"Score {score}")
        print('\n'+'='*50)

In [131]:
display_anime(anime_titles, img_links, anime_scores)

Rank 1
Code Geass: Hangyaku no Lelouch


Score 8.70

Rank 2
Mushishi Zoku Shou


Score 8.70

Rank 3
Mushoku Tensei: Isekai Ittara Honki Dasu Part 2


Score 8.70

Rank 4
Rurouni Kenshin: Meiji Kenkaku Romantan - Tsuioku-hen


Score 8.70

Rank 5
Odd Taxi


Score 8.69

Rank 6
One Piece


Score 8.69

Rank 7
Great Teacher Onizuka


Score 8.69

Rank 8
Fate/stay night Movie: Heaven's Feel - III. Spring Song


Score 8.68

Rank 9
Mononoke Hime


Score 8.67

Rank 10
The First Slam Dunk


Score 8.67

Rank 11
Violet Evergarden


Score 8.67

Rank 12
Hajime no Ippo: New Challenger


Score 8.66

Rank 13
Howl no Ugoku Shiro


Score 8.66

Rank 14
Made in Abyss


Score 8.66

Rank 15
Made in Abyss: Retsujitsu no Ougonkyou


Score 8.66

Rank 16
Mushishi


Score 8.66

Rank 17
Shigatsu wa Kimi no Uso


Score 8.65

Rank 18
Natsume Yuujinchou Shi


Score 8.64

Rank 19
Jujutsu Kaisen


Score 8.64

Rank 20
Kaguya-sama wa Kokurasetai? Tensai-tachi no Renai Zunousen


Score 8.64

Rank 21
Haikyuu!! Second Season


Score 8.63

Rank 22
JoJo no Kimyou na Bouken Part 6: Stone Ocean Part 3


Score 8.63

Rank 23
Tengen Toppa Gurren Lagann


Score 8.63

Rank 24
Made in Abyss Movie 3: Fukaki Tamashii no Reimei


Score 8.63

Rank 25
Kimetsu no Yaiba Movie: Mugen Ressha-hen


Score 8.62

Rank 26
Natsume Yuujinchou Roku


Score 8.62

Rank 27
Ping Pong the Animation


Score 8.62

Rank 28
Shingeki no Kyojin Season 3


Score 8.62

Rank 29
Death Note


Score 8.62

Rank 30
Cyberpunk: Edgerunners


Score 8.61

Rank 31
Spy x Family


Score 8.61

Rank 32
Suzumiya Haruhi no Shoushitsu


Score 8.61

Rank 33
Evangelion: 3.0+1.0 Thrice Upon a Time


Score 8.60

Rank 34
Seishun Buta Yarou wa Yumemiru Shoujo no Yume wo Minai


Score 8.60

Rank 35
Mushishi Zoku Shou: Suzu no Shizuku


Score 8.59

Rank 36
Hajime no Ippo: Rising


Score 8.59

Rank 37
Chainsaw Man


Score 8.58

Rank 38
JoJo no Kimyou na Bouken Part 5: Ougon no Kaze


Score 8.58

Rank 39
Ookami Kodomo no Ame to Yuki


Score 8.58

Rank 40
Kenpuu Denki Berserk


Score 8.57

Rank 41
Kizumonogatari II: Nekketsu-hen


Score 8.57

Rank 42
Natsume Yuujinchou Go


Score 8.57

Rank 43
Natsume Yuujinchou San


Score 8.57

Rank 44
Shouwa Genroku Rakugo Shinjuu


Score 8.57

Rank 45
Tengen Toppa Gurren Lagann Movie 2: Lagann-hen


Score 8.57

Rank 46
Yojouhan Shinwa Taikei


Score 8.56

Rank 47
Kimi no Suizou wo Tabetai


Score 8.56

Rank 48
Mo Dao Zu Shi: Wanjie Pian


Score 8.55

Rank 49
Neon Genesis Evangelion: The End of Evangelion


Score 8.55

Rank 50
Shoujo☆Kageki Revue Starlight Movie


Score 8.55



## Creating dictionaries

In [132]:
# Add scores, links and image_links to anime info list
for i, (link, img_link, score) in enumerate(zip(anime_links, img_links, anime_scores)):
    anime_list[i].append(link)
    anime_list[i].append(img_link)
    anime_list[i].append(score)

In [133]:
anime_list[:2]

[['Code Geass: Hangyaku no Lelouch',
  'TV (25 eps)',
  'Oct 2006 - Jul 2007',
  '2,158,686 members',
  'https://myanimelist.net/anime/1575/Code_Geass__Hangyaku_no_Lelouch',
  'https://cdn.myanimelist.net/r/100x140/images/anime/1032/135088.jpg?s=8b895bc4ead0cfc55ccfc813ba627926',
  '8.70'],
 ['Mushishi Zoku Shou',
  'TV (10 eps)',
  'Apr 2014 - Jun 2014',
  '289,087 members',
  'https://myanimelist.net/anime/21939/Mushishi_Zoku_Shou',
  'https://cdn.myanimelist.net/r/100x140/images/anime/13/58533.jpg?s=8b4c39ac1694d8b1ad2a1dcc504691f6',
  '8.70']]

In [134]:
# Making a dictionary from the anime list
def make_dict_from_list(anime):
    anime_dict = {'Title': anime[0],
                  'Num_episodes': anime[1],
                  'Aired_through': anime[2],
                  'Num_members': anime[3],
                  'page_url': anime[4],
                  'image_url': anime[5],
                  'Score': anime[6]}
    return anime_dict

In [135]:
anime_dict = list(map(make_dict_from_list, anime_list))

In [136]:
anime_dict[:2]

[{'Title': 'Code Geass: Hangyaku no Lelouch',
  'Num_episodes': 'TV (25 eps)',
  'Aired_through': 'Oct 2006 - Jul 2007',
  'Num_members': '2,158,686 members',
  'page_url': 'https://myanimelist.net/anime/1575/Code_Geass__Hangyaku_no_Lelouch',
  'image_url': 'https://cdn.myanimelist.net/r/100x140/images/anime/1032/135088.jpg?s=8b895bc4ead0cfc55ccfc813ba627926',
  'Score': '8.70'},
 {'Title': 'Mushishi Zoku Shou',
  'Num_episodes': 'TV (10 eps)',
  'Aired_through': 'Apr 2014 - Jun 2014',
  'Num_members': '289,087 members',
  'page_url': 'https://myanimelist.net/anime/21939/Mushishi_Zoku_Shou',
  'image_url': 'https://cdn.myanimelist.net/r/100x140/images/anime/13/58533.jpg?s=8b4c39ac1694d8b1ad2a1dcc504691f6',
  'Score': '8.70'}]
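
Indexing each position by hand works, but the same mapping can be written with `dict(zip(...))`, which also makes it easy to check that a row has the expected number of fields. This is only a sketch; `FIELDS` and `row_to_dict` are names introduced here for illustration.

```python
# Field names in the same order as the values in each scraped row.
FIELDS = ["Title", "Num_episodes", "Aired_through", "Num_members",
          "page_url", "image_url", "Score"]


def row_to_dict(row, fields=FIELDS):
    """Pair each value in row with its field name."""
    # zip() silently truncates to the shorter input, so check the length first.
    if len(row) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(row)}")
    return dict(zip(fields, row))


row = [
    "Mushishi Zoku Shou",
    "TV (10 eps)",
    "Apr 2014 - Jun 2014",
    "289,087 members",
    "https://myanimelist.net/anime/21939/Mushishi_Zoku_Shou",
    "https://cdn.myanimelist.net/r/100x140/images/anime/13/58533.jpg?s=8b4c39ac1694d8b1ad2a1dcc504691f6",
    "8.70",
]
print(row_to_dict(row)["Title"])
```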

## Creating a dataframe

In [137]:
anime_ranks_df = pd.DataFrame(anime_dict)

In [138]:
anime_ranks_df.head()

| | Title | Num_episodes | Aired_through | Num_members | page_url | image_url | Score |
|---|---|---|---|---|---|---|---|
| 0 | Code Geass: Hangyaku no Lelouch | TV (25 eps) | Oct 2006 - Jul 2007 | 2,158,686 members | https://myanimelist.net/anime/1575/Code_Geass_... | https://cdn.myanimelist.net/r/100x140/images/a... | 8.7 |
| 1 | Mushishi Zoku Shou | TV (10 eps) | Apr 2014 - Jun 2014 | 289,087 members | https://myanimelist.net/anime/21939/Mushishi_Z... | https://cdn.myanimelist.net/r/100x140/images/a... | 8.7 |
| 2 | Mushoku Tensei: Isekai Ittara Honki Dasu Part 2 | TV (12 eps) | Oct 2021 - Dec 2021 | 795,975 members | https://myanimelist.net/anime/45576/Mushoku_Te... | https://cdn.myanimelist.net/r/100x140/images/a... | 8.7 |
| 3 | Rurouni Kenshin: Meiji Kenkaku Romantan - Tsui... | OVA (4 eps) | Feb 1999 - Sep 1999 | 269,227 members | https://myanimelist.net/anime/44/Rurouni_Kensh... | https://cdn.myanimelist.net/r/100x140/images/a... | 8.7 |
| 4 | Odd Taxi | TV (13 eps) | Apr 2021 - Jun 2021 | 392,687 members | https://myanimelist.net/anime/46102/Odd_Taxi | https://cdn.myanimelist.net/r/100x140/images/a... | 8.69 |


In [43]:
# anime_ranks_df.to_csv("MAL-anime-rankings.csv")

## Storing data in csv file

In [182]:
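# Append the scraped rows to a csv file, writing the header only for a new or empty file
def make_csv(anime_dict, filename='anime-rankings.csv'):
    write_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
    # newline='' stops the csv module from emitting blank rows on Windows
    with open(filename, 'a', newline='', encoding='utf-8') as f:
        w = csv.DictWriter(f, anime_dict[0].keys())
        if write_header:
            w.writeheader()
        w.writerows(anime_dict)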

In [92]:
# Displaying df with ranks
# anime_ranks_df.columns = ['Title', 'Num_episodes', 'Aired_through', 'Num_members', 'page_url', 'image_url', 'Score']
# anime_ranks_df.index = ranks
# anime_ranks_df.index.name = 'Rank'

In [45]:
# # Convert links to clickable links
# anime_ranks_df.loc[:, ['Anime_url', 'Image_url']] = anime_ranks_df.loc[:, ['Anime_url', 'Image_url']].style.format(lambda x: f'<a href="{x}">{x}</a>')

## Combining it all together

In [42]:
# Helper functions
def extract_links(a_tags):
    links = []
    for tag in a_tags:
        tag = str(tag)
        pattern = r'href=(\S+)'
        link = re.findall(pattern, tag)[0][1:-1]
        links.append(link)
        
    return links

def extract_img_links(img_tags):
    links = []
    for tag in img_tags:
        tag = str(tag)
        pattern = r'https\S+'
        link = re.findall(pattern, tag)[2]
        links.append(link)
        
    return links

def clean_list(List, type='anime'):
    for l in List:
        for i, x in enumerate(l):
            l[i] = x.strip()
        if type == 'manga':
            del l[1]
            
def make_dict_from_list(l):
    
    return {'Title': l[0],
            'Num_episodes': l[1],
            'Aired_through': l[2],
            'Num_members': l[3],
            'page_url': l[4],
            'image_url': l[5],
            'Score': l[6]}

def display_info(title_list, img_list, score_list, top_n):
    for i, (title, img_url, score) in enumerate(zip(title_list, img_list, score_list)):
        print(f"Rank {i+1}")
        print(title)
        display(Image(url=img_url, width=100))
        print(f"Score {score}")
        print('\n'+'='*50)
        if i+1 == top_n:
            break

In [43]:
# Get the anime on one ranking page and display the first top_n of them
def get_anime_info(top_n=50, anime_url="https://myanimelist.net/topanime.php"):
    response = requests.get(anime_url)
    parser = BeautifulSoup(response.text, "html.parser")
    
    # Get rankings
    ranks_html = parser.select("td[class='rank ac']")
    ranks = [rank.text.strip() for rank in ranks_html]
    
    # Get individual anime links
    anime_info = parser.select("td[class='title al va-t word-break']")
    a_tags = [info.a for info in anime_info]
    anime_links = extract_links(a_tags)

    # Get image links for all anime
    img_tags = [info.img for info in anime_info]
    img_links = extract_img_links(img_tags)

    # Create list of all anime with their listed info
    anime_list = list(info.text.strip().split('\n') for info in anime_info)
    clean_list(anime_list)
    
    # Extract scores of individual anime
    score_tags = parser.select("td[class='score ac fs14']")
    anime_scores = [score.text.strip() for score in score_tags]
    
    # Add scores, links and image_links to anime info list
    for i, (link, img_link, score) in enumerate(zip(anime_links, img_links, anime_scores)):
        anime_list[i].append(link)
        anime_list[i].append(img_link)
        anime_list[i].append(score)
    
    # Get titles of all anime
    anime_titles = [l[0] for l in anime_list]
    
    display_info(anime_titles, img_links, anime_scores, top_n)
    return anime_list, ranks

**The same procedure works for manga, with some minor changes**

In [44]:
# Get top 50 manga
def get_manga_info(top_n=50):
    response = requests.get("https://myanimelist.net/topmanga.php")
    parser = BeautifulSoup(response.text, "html.parser")
    
    # Get rankings
    ranks_html = parser.select("td[class='rank ac']")
    ranks = [rank.text.strip() for rank in ranks_html]
    
    # Get individual manga links
    manga_info = parser.select("td[class='title al va-t clearfix word-break']")
    a_tags = [info.a for info in manga_info]
    manga_links = extract_links(a_tags)

    # Get image links for all manga
    img_tags = [info.img for info in manga_info]
    img_links = extract_img_links(img_tags)

    # Create list of all manga with their listed info
    manga_list = list(info.text.strip().split('\n') for info in manga_info)
    clean_list(manga_list, 'manga')
    
    # Extract scores of individual manga
    score_tags = parser.select("td[class='score ac fs14']")
    manga_scores = [score.text.strip() for score in score_tags]
    
    # Add scores, links and image_links to manga info list
    for i, (link, img_link, score) in enumerate(zip(manga_links, img_links, manga_scores)):
        manga_list[i].append(link)
        manga_list[i].append(img_link)
        manga_list[i].append(score)
    
    # Get titles of all manga
    manga_titles = [l[0] for l in manga_list]
    
    display_info(manga_titles, img_links, manga_scores, top_n)
    return manga_list, ranks

In [45]:
anime_list, ranks = get_anime_info(5)

Rank 1
Fullmetal Alchemist: Brotherhood


Score 9.10

Rank 2
Bleach: Sennen Kessen-hen


Score 9.07

Rank 3
Steins;Gate


Score 9.07

Rank 4
Gintama°


Score 9.06

Rank 5
Kaguya-sama wa Kokurasetai: Ultra Romantic


Score 9.05



In [46]:
manga_list, ranks = get_manga_info(5)

Rank 1
Berserk


Score 9.47

Rank 2
JoJo no Kimyou na Bouken Part 7: Steel Ball Run


Score 9.30

Rank 3
Vagabond


Score 9.23

Rank 4
One Piece


Score 9.21

Rank 5
Monster


Score 9.15



## Scraping multiple pages

In [184]:
num_pages = 2

'https://myanimelist.net/topanime.php'

In [None]:
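page_urls = []  # one URL per ranking page; "limit" is the offset of the first entry
for i in range(num_pages):
    new_url = URL + f"?limit={i*50}"
    page_urls.append(new_url)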
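
The loop above only builds the URL for each page. A small sketch of the URL construction as a reusable function is shown below. The names `page_urls`, `BASE_URL` and `PAGE_SIZE` are illustrative, and the page size of 50 is an assumption taken from the 50 entries per page seen earlier in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://myanimelist.net/topanime.php"
PAGE_SIZE = 50  # assumption: each ranking page lists 50 entries


def page_urls(num_pages, base_url=BASE_URL, page_size=PAGE_SIZE):
    """Return one URL per ranking page, using 'limit' as the starting offset."""
    return [
        f"{base_url}?{urlencode({'limit': page * page_size})}"
        for page in range(num_pages)
    ]


print(page_urls(2))
```

Each URL could then be fetched and parsed in turn, ideally with a short pause such as `time.sleep(1)` between requests so the site is not hit too quickly.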

## Visualize and analyze data

In [47]:
# Making dataframes and csv files to keep the scraped data
def make_df(info_list, ranks, type='anime'):
    ranks_df = pd.DataFrame(info_list)
    ranks_df.index = ranks
    ranks_df.index.name = 'Rank'

    if type == 'anime':
        ranks_df.columns = ['Title', 'Num_episodes', 'Aired_through', 'Num_members', 'page_url', 'image_url', 'Score']
        ranks_df.to_csv("MAL-anime-rankings.csv")
    elif type == 'manga':
        ranks_df.columns = ['Title', 'Num_volumes', 'Published_through', 'Num_members', 'page_url', 'image_url', 'Score']
        ranks_df.to_csv("MAL-manga-rankings.csv")
    else:
        raise ValueError(f"Unknown type {type!r}; expected 'anime' or 'manga'")
        
    display(ranks_df.sample(5))

In [48]:
make_df(anime_list, ranks)

| Rank | Title | Num_episodes | Aired_through | Num_members | page_url | image_url | Score |
|---|---|---|---|---|---|---|---|
| 30 | Kingdom 3rd Season | TV (26 eps) | Apr 2020 - Oct 2021 | 91,218 members | https://myanimelist.net/anime/40682/Kingdom_3r... | https://cdn.myanimelist.net/r/100x140/images/a... | 8.81 |
| 6 | Shingeki no Kyojin Season 3 Part 2 | TV (10 eps) | Apr 2019 - Jul 2019 | 2,106,077 members | https://myanimelist.net/anime/38524/Shingeki_n... | https://cdn.myanimelist.net/r/100x140/images/a... | 9.05 |
| 7 | Shingeki no Kyojin: The Final Season - Kankets... | Special (2 eps) | Mar 2023 - 2023 | 437,824 members | https://myanimelist.net/anime/51535/Shingeki_n... | https://cdn.myanimelist.net/r/100x140/images/a... | 9.05 |
| 5 | Kaguya-sama wa Kokurasetai: Ultra Romantic | TV (13 eps) | Apr 2022 - Jun 2022 | 822,345 members | https://myanimelist.net/anime/43608/Kaguya-sam... | https://cdn.myanimelist.net/r/100x140/images/a... | 9.05 |
| 29 | Vinland Saga Season 2 | TV (24 eps) | Jan 2023 - Jun 2023 | 434,841 members | https://myanimelist.net/anime/49387/Vinland_Sa... | https://cdn.myanimelist.net/r/100x140/images/a... | 8.84 |


In [49]:
make_df(manga_list, ranks, type='manga')

| Rank | Title | Num_volumes | Published_through | Num_members | page_url | image_url | Score |
|---|---|---|---|---|---|---|---|
| 14 | Ashita no Joe | Manga (20 vols) | Jan 1968 - May 1973 | 47,970 members | https://myanimelist.net/manga/1303/Ashita_no_Joe | https://cdn.myanimelist.net/r/100x140/images/m... | 8.94 |
| 26 | Koe no Katachi | Manga (7 vols) | Aug 2013 - Nov 2014 | 259,140 members | https://myanimelist.net/manga/56805/Koe_no_Kat... | https://cdn.myanimelist.net/r/100x140/images/m... | 8.86 |
| 23 | Yotsuba to! | Manga (? vols) | Mar 2003 - | 145,403 members | https://myanimelist.net/manga/104/Yotsuba_to | https://cdn.myanimelist.net/r/100x140/images/m... | 8.88 |
| 36 | Nana | Manga (21 vols) | May 2000 - May 2009 | 127,373 members | https://myanimelist.net/manga/28/Nana | https://cdn.myanimelist.net/r/100x140/images/m... | 8.78 |
| 18 | Kaguya-sama wa Kokurasetai: Tensai-tachi no Re... | Manga (28 vols) | May 2015 - Nov 2022 | 253,462 members | https://myanimelist.net/manga/90125/Kaguya-sam... | https://cdn.myanimelist.net/r/100x140/images/m... | 8.92 |
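
Before any analysis, the `Num_members` and `Score` columns would need to be numeric, since they are stored as strings such as `'2,158,686 members'` and `'8.70'`. A minimal sketch of that conversion with pandas is shown below, using two rows copied from the output earlier in this notebook rather than a live scrape.

```python
import pandas as pd

# Two sample rows taken from the scraped output shown earlier.
df = pd.DataFrame({
    "Title": ["Code Geass: Hangyaku no Lelouch", "Mushishi Zoku Shou"],
    "Num_members": ["2,158,686 members", "289,087 members"],
    "Score": ["8.70", "8.70"],
})

# Drop every non-digit character (commas, the word "members"), then cast to int.
df["Num_members"] = df["Num_members"].str.replace(r"\D", "", regex=True).astype(int)
# Scores are plain decimal strings, so a direct cast is enough.
df["Score"] = df["Score"].astype(float)

print(df.sort_values("Num_members", ascending=False))
```

Once the columns are numeric, sorting, grouping and plotting work as usual.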


## Conclusion and further experiments

- I scraped the top 50 anime and the top 50 manga, but the same approach could be extended much further, for example to the top 1000 anime.
- Packages such as Selenium or Playwright could be used to automate scraping across multiple pages.
- Visualization and exploratory data analysis would be straightforward once multiple pages are extracted, and could be another project in itself.