## Web-Scraping - Game of Thrones - Main & Recurring Characters Of All Seasons

**Quick Note**

This notebook collects the main and recurring characters of the HBO series 'Game of Thrones' from Wikipedia and writes them to a '.csv' file.

Pandas is truly wonderful.

**Sources**

- [Pandas Documentation](https://pandas.pydata.org/docs/)

### Importing the necessary tools

In [1]:
import pandas as pd

### Source

In [2]:
page_url = 'https://en.wikipedia.org/wiki/List_of_Game_of_Thrones_characters'

In [3]:
# Workaround: skip HTTPS certificate verification for this session (this weakens security; only use it if the local certificate store fails)
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
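
To make the effect of the workaround above concrete, the following minimal sketch compares the two kinds of SSL context without opening any network connection. It only inspects attributes from the standard-library `ssl` module. Note that `_create_unverified_context` is a private helper, so relying on its behaviour is an assumption that may not hold across Python versions.

```python
import ssl

# Default context: certificates and host names are checked.
verified = ssl.create_default_context()

# Context installed by the workaround: no certificate or host name checks.
unverified = ssl._create_unverified_context()

print("verified checks certificates:", verified.verify_mode == ssl.CERT_REQUIRED)
print("verified checks host names:", verified.check_hostname)
print("unverified checks certificates:", unverified.verify_mode == ssl.CERT_REQUIRED)
print("unverified checks host names:", unverified.check_hostname)
```

Because the assignment replaces the default for the whole Python process, it may be worth restoring verification once the tables have been fetched.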

### Fetch tables from url

In [4]:
table_MN = pd.read_html(page_url)
len(table_MN)

8

### Identify tables of interest

#### Main characters

In [5]:
main_cast = table_MN[3]
main_cast.head()

Unnamed: 0_level_0,Actor,Character,Appearances,Appearances,Appearances,Appearances,Appearances,Appearances,Appearances,Appearances
Unnamed: 0_level_1,Actor,Character,1,2,3,4,5,6,7,8
0,Sean Bean,"Eddard ""Ned"" Stark",Main,,,,,Recurring[a],Guest[b],
1,Mark Addy,Robert Baratheon,Main,,,,,,,
2,Nikolaj Coster-Waldau,Jaime Lannister,Main,Main,Main,Main,Main,Main,Main,Main
3,Michelle Fairley,Catelyn Stark,Main,Main,Main,,,,,
4,Lena Headey,Cersei Lannister,Main[c],Main[c],Main[c],Main[c],Main[c],Main[c],Main[c],Main[c]


In [6]:
main_cast.shape

(44, 10)
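
The position `3` is hard-coded and may shift if the Wikipedia page is edited. As a hedged alternative, the sketch below looks tables up by column name instead. `tables_with_column` is a hypothetical helper (not part of pandas), and the three small DataFrames are made-up stand-ins for the list returned by `pd.read_html`.

```python
import pandas as pd

def tables_with_column(tables, name):
    """Return the positions of the DataFrames that have a column called `name`."""
    hits = []
    for pos, table in enumerate(tables):
        # For MultiIndex columns, only the top level is compared.
        top_level = list(table.columns.get_level_values(0))
        if name in top_level:
            hits.append(pos)
    return hits

# Made-up stand-ins for the scraped tables.
demo_tables = [
    pd.DataFrame({"Season": [1, 2]}),
    pd.DataFrame({"Actor": ["Sean Bean"], "Character": ["Eddard Stark"]}),
    pd.DataFrame(
        [["Mark Addy", "Robert Baratheon", "Main"]],
        columns=pd.MultiIndex.from_tuples(
            [("Actor", "Actor"), ("Character", "Character"), ("Appearances", "1")]
        ),
    ),
]

candidate_positions = tables_with_column(demo_tables, "Character")
print(candidate_positions)
```

On the real page both the main and the recurring table would match, so the returned positions still need a quick visual check with `.head()`.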

In [7]:
main_char = main_cast['Character']

main_char.head()

Unnamed: 0,Character
0,"Eddard ""Ned"" Stark"
1,Robert Baratheon
2,Jaime Lannister
3,Catelyn Stark
4,Cersei Lannister
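
The scraped table has two header rows, so pandas builds MultiIndex columns, and `main_cast['Character']` returns a one-column DataFrame rather than a Series. The sketch below reproduces this on a small hand-made table (the data rows are copied from the output above, the frame itself is an illustration) and shows one possible way to flatten the header.

```python
import pandas as pd

demo = pd.DataFrame(
    [["Sean Bean", 'Eddard "Ned" Stark', "Main", None],
     ["Mark Addy", "Robert Baratheon", "Main", None]],
    columns=pd.MultiIndex.from_tuples(
        [("Actor", "Actor"), ("Character", "Character"),
         ("Appearances", "1"), ("Appearances", "2")]
    ),
)

# Selecting by the top level keeps a DataFrame with the remaining level as header.
as_frame = demo["Character"]

# Selecting by the full tuple gives a Series.
as_series = demo[("Character", "Character")]

# Optional: flatten the two header rows into single labels.
flat = demo.copy()
flat.columns = [
    top if top == sub else f"{top}_{sub}" for top, sub in demo.columns
]
print(type(as_frame).__name__, type(as_series).__name__, list(flat.columns))
```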


#### Recurring characters

In [8]:
recurr_cast = table_MN[4]
recurr_cast.head()

Unnamed: 0_level_0,Actor,Character,Appearances,Appearances,Appearances,Appearances,Appearances,Appearances,Appearances,Appearances
Unnamed: 0_level_1,Actor,Character,1,2,3,4,5,6,7,8
0,Donald Sumpter,Maester Luwin,Recurring,Recurring,,,,,,
1,Jamie Sives,Jory Cassel,Recurring,,,,,,,
2,Ron Donachie,Rodrik Cassel,Recurring,Recurring,,,,Guest[a],,
3,Joseph Mawle,Benjen Stark,Recurring,,,,,Recurring[b],Guest,
4,Dar Salim,Qotho,Recurring,,,,,,,


In [9]:
recurr_cast.shape

(118, 10)

In [10]:
recurr_char = recurr_cast['Character']

recurr_char.head()

Unnamed: 0,Character
0,Maester Luwin
1,Jory Cassel
2,Rodrik Cassel
3,Benjen Stark
4,Qotho


#### Merge 2 character tables

In [11]:
characters = pd.concat([main_char, recurr_char], ignore_index=True)

characters.shape

characters

Unnamed: 0,Character
0,"Eddard ""Ned"" Stark"
1,Robert Baratheon
2,Jaime Lannister
3,Catelyn Stark
4,Cersei Lannister
...,...
157,Randyll Tarly
158,Dickon Tarly
159,Harrag
160,Night King
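
After the merge, the table no longer records whether a character came from the main or the recurring list. If that information is wanted, one option is to tag each table before concatenating. This is a sketch on made-up miniature tables; the `Role` column is an assumption and does not exist in the notebook.

```python
import pandas as pd

main_demo = pd.DataFrame({"Character": ["Jaime Lannister", "Catelyn Stark"]})
recurr_demo = pd.DataFrame({"Character": ["Maester Luwin", "Jory Cassel", "Qotho"]})

# assign() returns a copy with the extra column, so the inputs stay untouched.
combined = pd.concat(
    [main_demo.assign(Role="main"), recurr_demo.assign(Role="recurring")],
    ignore_index=True,  # renumber rows 0..n-1 instead of repeating 0, 1, ...
)
print(combined)
```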


In [12]:
import re

# Helper: return the first regex match, or the string 'NaN' when there is none
def first_val(matches):
    if len(matches) > 0:
        return matches[0]
    else:
        return 'NaN'

# Extract the nickname (the text between double quotes)
characters['Character_nickname'] = characters['Character'].apply(lambda x: re.findall(r'".*?"', x))
characters['Character_nickname'] = characters['Character_nickname'].apply(lambda x: first_val(x).replace('"', ''))
# Extract the first name (everything before the first space)
characters['Character_firstname'] = characters['Character'].apply(lambda x: x.split(' ', 1)[0])
# Remove the quoted nickname from the character name
characters['Character'] = characters['Character'].apply(lambda x: re.sub(r'".*?"', '', x))
# Collapse the double space left behind by the removal
characters['Character'] = characters['Character'].apply(lambda x: re.sub('  ', ' ', x))

In [13]:
# Check the character list (or open the csv once it has been created)
pd.set_option('display.max_rows', None)
characters

Unnamed: 0,Character,Character_nickname,Character_firstname
0,Eddard Stark,Ned,Eddard
1,Robert Baratheon,,Robert
2,Jaime Lannister,,Jaime
3,Catelyn Stark,,Catelyn
4,Cersei Lannister,,Cersei
5,Daenerys Targaryen,,Daenerys
6,Jorah Mormont,,Jorah
7,Viserys Targaryen,,Viserys
8,Jon Snow,,Jon
9,Robb Stark,,Robb
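
The same three columns can also be derived with the vectorised `.str` accessor instead of `apply` with lambdas. The following is a sketch on two sample names, not the notebook's own method. One difference is that a missing nickname becomes a real missing value rather than the string `'NaN'`.

```python
import pandas as pd

names = pd.Series(['Eddard "Ned" Stark', "Robert Baratheon"], name="Character")

# Capture the text between double quotes; rows without a match become missing.
nickname = names.str.extract(r'"(.*?)"', expand=False)

# Everything before the first space.
firstname = names.str.split(" ", n=1).str[0]

# Drop the quoted nickname together with the whitespace in front of it.
cleaned = names.str.replace(r'\s*".*?"', "", regex=True)

result = pd.DataFrame({
    "Character": cleaned,
    "Character_nickname": nickname,
    "Character_firstname": firstname,
})
print(result)
```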


In [14]:
# Quick fixes (names that could be misinterpreted if left unchanged)
# .loc is used because chained assignment may silently fail to update the DataFrame
characters.loc[17, 'Character_firstname'] = 'Drogo' # Khal Drogo
characters.loc[42, 'Character_firstname'] = 'High Sparrow' # The High Sparrow
characters.loc[42, 'Character'] = 'High Sparrow'
characters.loc[44, 'Character_firstname'] = 'Maester Luwin'
characters.loc[61, 'Character_firstname'] = 'Maester Pycelle'
characters.loc[63, 'Character_firstname'] = 'Maester Aemon'
characters.loc[156, 'Character_firstname'] = 'Maester Wolkan'
characters.loc[114, 'Character_firstname'] = 'Little Sam' # Little Sam
characters.loc[132, 'Character_firstname'] = 'The Waif'
characters.loc[139, 'Character_firstname'] = 'Moro' # Khal Moro
characters.loc[146, 'Character_firstname'] = 'Lady Crane'
characters.loc[160, 'Character_firstname'] = 'Night King'
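
The fixes above depend on row numbers, which change whenever the Wikipedia tables gain or lose a row. A possible alternative is to key the corrections on the character name. The sketch below uses a three-row stand-in table; the `overrides` dictionary is a hypothetical name introduced for illustration.

```python
import pandas as pd

demo = pd.DataFrame({
    "Character": ["Khal Drogo", "Jon Snow", "Khal Moro"],
    "Character_firstname": ["Khal", "Jon", "Khal"],
})

# Hypothetical mapping: full character name -> corrected first name.
overrides = {"Khal Drogo": "Drogo", "Khal Moro": "Moro"}

# map() yields a missing value for names without an override,
# which fillna() then replaces with the existing first name.
demo["Character_firstname"] = (
    demo["Character"].map(overrides).fillna(demo["Character_firstname"])
)
print(demo)
```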


### Create 'characters.csv'

In [17]:
csv_text = characters.to_csv(sep=',')

# The context manager closes the file even if the write fails
with open('characters.csv', 'w') as f:
    n_written = f.write(csv_text)
n_written

4398
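
`DataFrame.to_csv` also accepts a file path directly, in which case pandas opens and closes the file itself. The sketch below writes a small stand-in table to a temporary directory and reads it back as a sanity check. The file name `characters_demo.csv` and the choice of `index=False` are assumptions for illustration; the notebook itself keeps the index column.

```python
import os
import tempfile

import pandas as pd

demo = pd.DataFrame({
    "Character": ["Eddard Stark", "Jon Snow"],
    "Character_firstname": ["Eddard", "Jon"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "characters_demo.csv")
    # Passing a path means no manual open()/close() is needed.
    demo.to_csv(csv_path, index=False)
    restored = pd.read_csv(csv_path)

print(restored)
```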