# Wrangling Former Colonies
## By: Scott Kustes

### Objective:
Wrangle a list of former colonies, their colonizers, year of colonization, and year of independence.

#### Dataset:
The dataset was gathered by scraping Wikipedia with BeautifulSoup. The following page was scraped:
- British Colonies: https://en.wikipedia.org/wiki/List_of_countries_that_have_gained_independence_from_the_United_Kingdom

#### Contents
<ul>
    <li><a href='#gather'>Data Gathering</a>
        <ul>
            <li><a href='#british-empire'>British Empire</a></li>
        </ul>
    </li>
    <li><a href='#assess1'>Assess, Part 1</a></li>
    <li><a href='#clean1'>Clean, Part 1</a></li>
    <li><a href='#final'>Finished Dataframes</a></li>
</ul>

In [87]:
# Import necessary packages
from bs4 import BeautifulSoup
import requests
import pandas as pd
import os.path as os_path

# Last Tested On
> September 12, 2019

<a id='gather'></a>
## Gather

<a id='british-empire'></a>
### British Empire

In [88]:
# Import the page
url = 'https://en.wikipedia.org/wiki/List_of_countries_that_have_gained_independence_from_the_United_Kingdom'
page = requests.get( url )
# Stop here if the request failed rather than parsing an error page
page.raise_for_status()
soup = BeautifulSoup( page.text, 'html.parser' )

# Get all of the tables on this page
tables = soup.find_all( class_='wikitable' )

In [90]:
# Get the first table on the page
# Additional tables in indices 1-8
colonies = tables[0].find( 'tbody' ).find_all( 'tr' )
colonies_header = []
colonies_data = []

# Loop through the table rows, collecting the text of each cell into a list
# The first row holds the column names ('th' cells); save it to colonies_header
# Every other row holds data ('td' cells); append it to colonies_data
for index, colony in enumerate( colonies ):
    if index == 0:
        # Header row
        colonies_header = [ element.text.strip() for element in colony.find_all( 'th' ) ]
    else:
        # Data row
        colonies_data.append( [ element.text.strip() for element in colony.find_all( 'td' ) ] )

colonies_df = pd.DataFrame( data=colonies_data, columns=colonies_header )
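
Since the page holds further tables at indices 1-8, the row loop above could be wrapped in a function and reused. The following is a minimal sketch, not part of the original notebook: `parse_wikitable` is a hypothetical helper name, and `sample_html` is made-up markup standing in for one Wikipedia table so the example runs without a network request. Unlike the cell above, it searches the table directly for `tr` elements rather than going through `tbody`.

```python
from bs4 import BeautifulSoup

def parse_wikitable( table ):
    """Return ( header, rows ) for a table element (hypothetical helper).

    The header comes from the 'th' cells of the first row;
    each later row is read from its 'td' cells.
    """
    header, rows = [], []
    for index, row in enumerate( table.find_all( 'tr' ) ):
        if index == 0:
            header = [ cell.text.strip() for cell in row.find_all( 'th' ) ]
        else:
            rows.append( [ cell.text.strip() for cell in row.find_all( 'td' ) ] )
    return header, rows

# Made-up sample markup (an assumption) that mimics the layout of the scraped table
sample_html = """
<table class="wikitable">
  <tr><th>Country</th><th>Date</th><th>Year of Independence</th><th>Notes</th></tr>
  <tr><td>Fiji</td><td>10 October</td><td>1970</td><td></td></tr>
  <tr><td>Uganda</td><td>9 October</td><td>1962</td><td></td></tr>
</table>
"""

sample_table = BeautifulSoup( sample_html, 'html.parser' ).find( class_='wikitable' )
sample_header, sample_rows = parse_wikitable( sample_table )
```

The returned lists can be passed to `pd.DataFrame( data=sample_rows, columns=sample_header )` in the same way as `colonies_data` and `colonies_header`.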

In [91]:
colonies_df.sample(5)

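|    | Country  | Date         | Year of Independence | Notes                                              |
|----|----------|--------------|----------------------|----------------------------------------------------|
| 14 | Fiji     | 10 October   | 1970                 |                                                    |
| 19 | India    | 15 August    | 1947                 | Independence Day (India)                           |
| 55 | Uganda   | 9 October    | 1962                 |                                                    |
| 35 | Myanmar  | 4 January    | 1948                 | Gained independence as Burma. Renamed Myanmar ...  |
| 7  | Botswana | 30 September | 1966                 |                                                    |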
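
The sample shows that the day and month sit in `Date` while the year sits in `Year of Independence`, both as text. One possible way to combine them later is sketched below. It is an illustration only, not a step taken from the notebook: `to_independence_date` is a hypothetical helper, and it assumes English month names and values formatted like those in the sample rows.

```python
from datetime import date, datetime

def to_independence_date( day_month, year ):
    """Combine text such as '10 October' and '1970' into a datetime.date.

    Hypothetical helper; assumes a 'day month-name' layout with English month names.
    """
    return datetime.strptime( f'{day_month} {year}', '%d %B %Y' ).date()

# Values taken from the Fiji row of the sample output
fiji_date = to_independence_date( '10 October', '1970' )
print( fiji_date.isoformat() )
```

Rows whose `Date` or `Year of Independence` does not follow this layout would raise a `ValueError`, so such a step would need to come after the columns have been assessed.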
