## Importing the libraries:

In [1]:
# 1. Importing all the libraries necessary for this assignment
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

For this task, we will download a table from a website, clean up the resulting dataframe, and then save it as a tab-delimited .csv file.

After scraping the table, we will clean up the dataframe as follows:
*   Make the character names the index instead of 0, 1, 2, 3. It should be Godzilla, MUTO, King Kong...
*   Change the column names to just the names of the movies (e.g., it should just be Godzilla instead of Films, Godzilla, Monsters). Also, change the index name to just “character” instead of Character, Character, Monster.
*   Remove the rows that consist entirely of duplicates (i.e., “human”, “human”, “human”), since they aren't helpful to us in this dataframe.
*   Save the file as a .csv with tabs as the delimiter.





### Web scraping the data:

Let's go ahead and scrape our data from the Wikipedia page 'MonsterVerse'. We're going to use the table 'Cast and characters'.

In [2]:
# 2. Load the 'Cast and characters' table (position 3 in the list returned by pd.read_html) from the Wikipedia page
print('-------------------------------- 2 --------------------------------')
wiki_url = "https://en.wikipedia.org/wiki/MonsterVerse"
df = pd.read_html(wiki_url, header=0)[3]

df

-------------------------------- 2 --------------------------------


Unnamed: 0,Character,Films,Films.1,Films.2,Films.3
0,Character,Godzilla,Kong:Skull Island,Godzilla:King of the Monsters,Godzilla vs. Kong
1,Monsters,Monsters,Monsters,Monsters,Monsters
2,Godzilla,T.J. StormS,Pictured with archive audio,T.J. StormS,CGI
3,MUTO,Matt CrossSLee RossS,,CGI,Archive footage
4,King Kong,,Terry NotarySToby KebbellS,Archive footage,Eric PeteyS[57]
5,King Ghidorah,,Pictured,Jason LilesSAlan MaxsonSRichard DortonS,Archive footage
6,Rodan,,Pictured,Jason LilesS,Archive footage
7,Humans,Humans,Humans,Humans,Humans
8,Ishiro Serizawa,Ken Watanabe,,Ken Watanabe,
9,Vivienne Graham,Sally Hawkins,,Sally Hawkins,


Alright! Now that we have retrieved the data table, we're going to clean up the data so that the table is easier to understand.
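
Picking the table by position (`[3]`) only works as long as the page layout stays the same. As a hedged alternative, `pd.read_html` accepts a `match` argument that keeps only the tables containing a given piece of text. The sketch below runs on a small inline HTML snippet invented for illustration (it is not the real Wikipedia markup) and assumes an HTML parser such as lxml is installed:

```python
import io

import pandas as pd

# Toy HTML with two tables; only the second one mentions "Ishiro Serizawa".
html = """
<table>
  <tr><th>Film</th><th>Release year</th></tr>
  <tr><td>Godzilla</td><td>2014</td></tr>
</table>
<table>
  <tr><th>Character</th><th>Godzilla</th></tr>
  <tr><td>Ishiro Serizawa</td><td>Ken Watanabe</td></tr>
</table>
"""

# match= keeps only the tables whose text matches the given string or regex
tables = pd.read_html(io.StringIO(html), match="Ishiro Serizawa", header=0)
cast = tables[0]
print(cast)
```

Matching on a value that is known to appear in the wanted table (here a character name) means the code keeps working if tables are added or reordered elsewhere on the page.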

### Cleaning up the data:

First, let's use the column of character names as the index instead of 0, 1, 2, 3, and so on.

In [3]:
# 3. Replace the numeric index column with the characters column
print('-------------------------------- 3 --------------------------------')
data = df.set_index('Character')

data

-------------------------------- 3 --------------------------------


Character,Films,Films.1,Films.2,Films.3
Character,Godzilla,Kong:Skull Island,Godzilla:King of the Monsters,Godzilla vs. Kong
Monsters,Monsters,Monsters,Monsters,Monsters
Godzilla,T.J. StormS,Pictured with archive audio,T.J. StormS,CGI
MUTO,Matt CrossSLee RossS,,CGI,Archive footage
King Kong,,Terry NotarySToby KebbellS,Archive footage,Eric PeteyS[57]
King Ghidorah,,Pictured,Jason LilesSAlan MaxsonSRichard DortonS,Archive footage
Rodan,,Pictured,Jason LilesS,Archive footage
Humans,Humans,Humans,Humans,Humans
Ishiro Serizawa,Ken Watanabe,,Ken Watanabe,
Vivienne Graham,Sally Hawkins,,Sally Hawkins,


Next, we're going to change the column names to be just the names of the movies.

In [4]:
# 4a. Change the 'Films' name columns into the actual movie names
print('-------------------------------- 4a -------------------------------')
data.rename(columns={
    'Films': 'Godzilla',
    'Films.1': 'Kong:Skull Island',
    'Films.2': 'Godzilla:King of the Monsters',
    'Films.3': 'Godzilla vs. Kong',
}, inplace=True)

data

-------------------------------- 4a -------------------------------


Character,Godzilla,Kong:Skull Island,Godzilla:King of the Monsters,Godzilla vs. Kong
Character,Godzilla,Kong:Skull Island,Godzilla:King of the Monsters,Godzilla vs. Kong
Monsters,Monsters,Monsters,Monsters,Monsters
Godzilla,T.J. StormS,Pictured with archive audio,T.J. StormS,CGI
MUTO,Matt CrossSLee RossS,,CGI,Archive footage
King Kong,,Terry NotarySToby KebbellS,Archive footage,Eric PeteyS[57]
King Ghidorah,,Pictured,Jason LilesSAlan MaxsonSRichard DortonS,Archive footage
Rodan,,Pictured,Jason LilesS,Archive footage
Humans,Humans,Humans,Humans,Humans
Ishiro Serizawa,Ken Watanabe,,Ken Watanabe,
Vivienne Graham,Sally Hawkins,,Sally Hawkins,
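
The movie titles already sit in the first data row, so the new column names do not have to be typed by hand. A minimal sketch of that idea, using a made-up miniature of the scraped table (the frame `raw` and its values are illustrative only):

```python
import pandas as pd

# Made-up miniature of the scraped table: row 0 repeats the real headers
raw = pd.DataFrame({
    "Character": ["Character", "Godzilla", "MUTO"],
    "Films": ["Godzilla", "T.J. Storm", "Matt Cross"],
    "Films.1": ["Kong: Skull Island", "Pictured", None],
})

table = raw.set_index("Character")

# Use the values of the first row as the column names...
table.columns = table.iloc[0].tolist()

# ...then drop that row, since it now duplicates the header
table = table.iloc[1:]
print(table)
```

This avoids hard-coding the titles, at the cost of assuming that the first row always holds them.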


Let's change the index name to be lowercase.

In [5]:
# 4b. Change the index name to start with lowercase instead of uppercase
print('-------------------------------- 4b -------------------------------')
data.index.name = "character"

data

-------------------------------- 4b -------------------------------


character,Godzilla,Kong:Skull Island,Godzilla:King of the Monsters,Godzilla vs. Kong
Character,Godzilla,Kong:Skull Island,Godzilla:King of the Monsters,Godzilla vs. Kong
Monsters,Monsters,Monsters,Monsters,Monsters
Godzilla,T.J. StormS,Pictured with archive audio,T.J. StormS,CGI
MUTO,Matt CrossSLee RossS,,CGI,Archive footage
King Kong,,Terry NotarySToby KebbellS,Archive footage,Eric PeteyS[57]
King Ghidorah,,Pictured,Jason LilesSAlan MaxsonSRichard DortonS,Archive footage
Rodan,,Pictured,Jason LilesS,Archive footage
Humans,Humans,Humans,Humans,Humans
Ishiro Serizawa,Ken Watanabe,,Ken Watanabe,
Vivienne Graham,Sally Hawkins,,Sally Hawkins,


Some rows are redundant: the 'Monsters' and 'Humans' rows only repeat a section label in every column, so we'll remove them. We'll also remove the first row, since it just repeats the column and index titles.

In [6]:
# 5. Remove the rows that only contain duplicated labels
print('-------------------------------- 5 --------------------------------')
data.drop(['Character', 'Monsters', 'Humans'], axis=0, inplace=True)

data

-------------------------------- 5 --------------------------------


character,Godzilla,Kong:Skull Island,Godzilla:King of the Monsters,Godzilla vs. Kong
Godzilla,T.J. StormS,Pictured with archive audio,T.J. StormS,CGI
MUTO,Matt CrossSLee RossS,,CGI,Archive footage
King Kong,,Terry NotarySToby KebbellS,Archive footage,Eric PeteyS[57]
King Ghidorah,,Pictured,Jason LilesSAlan MaxsonSRichard DortonS,Archive footage
Rodan,,Pictured,Jason LilesS,Archive footage
Ishiro Serizawa,Ken Watanabe,,Ken Watanabe,
Vivienne Graham,Sally Hawkins,,Sally Hawkins,
William Stentz,David Strathairn,,David Strathairn,
Ford Brody,Aaron Taylor-JohnsonCJ AdamsY,,,
Elle Brody,Elizabeth Olsen,,,


We've removed the data that will be unhelpful or unnecessary. Now the table is clearer to read and understand.
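
Dropping rows by name works when the labels are known in advance. As a hedged alternative, separator rows such as 'Monsters' and 'Humans' can be detected by the fact that every cell repeats the row's own index label. The sketch below uses a small made-up frame rather than the scraped table:

```python
import numpy as np
import pandas as pd

# Made-up frame: two separator rows mixed in with two real rows
table = pd.DataFrame(
    {
        "Godzilla": ["Monsters", "T.J. Storm", "Humans", "Elizabeth Olsen"],
        "Godzilla vs. Kong": ["Monsters", "CGI", "Humans", np.nan],
    },
    index=pd.Index(["Monsters", "Godzilla", "Humans", "Elle Brody"], name="character"),
)

# Compare every column against the index labels, row by row.
# A separator row is one where all cells equal its own label.
labels = table.index.to_series()
is_separator = table.eq(labels, axis=0).all(axis=1)

cleaned = table[~is_separator]
print(cleaned)
```

Comparing against the index label, rather than only checking that all cells are identical, keeps a real character whose actor happens to appear in every film.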

### Saving the new data as a file:

We are going to save our new data as a .csv file with the delimiter being tabs.

In [7]:
# 6. Save as a tab-delimited .csv file, then read it back to check the result
print('-------------------------------- 6 -------------------------------')
data.to_csv('MonsterVerse.csv', sep='\t')

# Read the file back with the same tab delimiter, using the first column as the index
new_data = pd.read_csv('MonsterVerse.csv', sep='\t', index_col=0)
new_data

-------------------------------- 6 -------------------------------


character,Godzilla,Kong:Skull Island,Godzilla:King of the Monsters,Godzilla vs. Kong
Godzilla,T.J. StormS,Pictured with archive audio,T.J. StormS,CGI
MUTO,Matt CrossSLee RossS,,CGI,Archive footage
King Kong,,Terry NotarySToby KebbellS,Archive footage,Eric PeteyS[57]
King Ghidorah,,Pictured,Jason LilesSAlan MaxsonSRichard DortonS,Archive footage
Rodan,,Pictured,Jason LilesS,Archive footage
Ishiro Serizawa,Ken Watanabe,,Ken Watanabe,
Vivienne Graham,Sally Hawkins,,Sally Hawkins,
William Stentz,David Strathairn,,David Strathairn,
Ford Brody,Aaron Taylor-JohnsonCJ AdamsY,,,
Elle Brody,Elizabeth Olsen,,,
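
A tab-delimited file has to be read back with the same delimiter. Otherwise pandas splits on commas, and each line lands in a single column with the `\t` characters left inside the text. A minimal sketch of the difference, using an in-memory string and a made-up two-row frame instead of the file on disk:

```python
import io

import numpy as np
import pandas as pd

# Made-up two-row frame standing in for the cleaned table
table = pd.DataFrame(
    {"Godzilla": ["T.J. Storm", np.nan], "Godzilla vs. Kong": ["CGI", "Eric Petey"]},
    index=pd.Index(["Godzilla", "King Kong"], name="character"),
)

# With no path given, to_csv returns the text instead of writing a file
text = table.to_csv(sep="\t")

# Default delimiter (comma): every line ends up in one column
wrong = pd.read_csv(io.StringIO(text))

# Same delimiter as the writer, with the first column restored as the index
restored = pd.read_csv(io.StringIO(text), sep="\t", index_col=0)

print(wrong.shape)
print(restored.shape)
```

Passing `index_col=0` also brings back the `character` index, so the restored frame has the same layout as the one that was saved.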
