## PUBPOL542 Deliverable Draft

### Group Member: Matthew Pon
### Organization: HedgehogsAnonymous
* Github Repo: https://github.com/ponmp/PUBPOLDeliverableDraft 
* Organization Repo: https://github.com/HedgehogsAnonymous/PUBPOLDeliverable 

<a id='home'></a>
_____


## Table of contents
[1: Data](#data)

[2: Cleaning](#cleaning)

[3: Shaping](#shaping)

[4: Exporting](#exporting)
_____

<a id='data'></a>
## Data
The data table comes from Wikipedia's list of countries by suicide rate.
* Source: https://en.wikipedia.org/wiki/List_of_countries_by_suicide_rate

In [None]:
from IPython.display import IFrame  # IFrame embeds a web page in the notebook output
IFrame("https://en.wikipedia.org/wiki/List_of_countries_by_suicide_rate", width=700, height=300) # show the Wikipedia page in a 700px by 300px frame

In [None]:
# Make sure PUBPOL environment is active
# Run Jupyter Notebook Kernel

# Make sure pandas is installed
!pip show pandas
# if not installed uncomment next line and run
#!pip install pandas

# Import pandas as pd
import pandas as pd

# Make sure html5lib is installed (flavor='bs4' in read_html also needs beautifulsoup4)
!pip show html5lib
# if not installed uncomment next line and run
#!pip install html5lib


Read every `wikitable` on the page into a list of DataFrames, then view the data.

In [None]:
MHwiki=pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_suicide_rate", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"

In [None]:
# MHwiki holds all wikitables on the page, but we are only concerned with historic suicide rates for males, females, and both combined
MHwiki
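
Because `pd.read_html` returns a plain Python list of DataFrames, it can help to summarize each table before picking one by position. The following is a minimal sketch: `summarize_tables` is a hypothetical helper, and the two small frames are stand-ins for the real `MHwiki` list.

```python
import pandas as pd

def summarize_tables(tables):
    """Return one row per table: list position, shape, and first column names."""
    rows = []
    for position, table in enumerate(tables):
        rows.append({
            "position": position,
            "n_rows": table.shape[0],
            "n_cols": table.shape[1],
            "first_columns": list(table.columns[:3]),
        })
    return pd.DataFrame(rows)

# Stand-in data; in this notebook the call would be summarize_tables(MHwiki)
demo_tables = [
    pd.DataFrame({"Country": ["A", "B"], "2019": [1.0, 2.0]}),
    pd.DataFrame({"Country": ["A"], "2000": [3.0], "2019": [4.0]}),
]
summary = summarize_tables(demo_tables)
print(summary)
```

The `position` column is the number to use when indexing the list, as in `MHwiki[3]`.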

In [None]:
from IPython.display import IFrame  # IFrame embeds a web page in the notebook output
IFrame("https://en.wikipedia.org/wiki/List_of_countries_by_alcohol_consumption_per_capita", width=700, height=300) # show the Wikipedia page in a 700px by 300px frame

Adding more data for analysis

In [None]:
ARwiki=pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_alcohol_consumption_per_capita", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"

Adding Satisfaction with Life Index data for analysis

In [None]:
SIwiki=pd.read_html("https://en.wikipedia.org/wiki/Satisfaction_with_Life_Index", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"

Adding Wealth Inequality data.

In [None]:
WIwiki=pd.read_html("https://en.wikipedia.org/wiki/List_of_sovereign_states_by_wealth_inequality", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"

Adding Social Progress

In [None]:
SPwiki=pd.read_html("https://en.wikipedia.org/wiki/Social_Progress_Index", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"

Importing .csv from https://www.cia.gov/the-world-factbook/field/tobacco-use/country-comparison

In [None]:
TUtable=pd.read_csv("./CIATobaccoUse.csv")
# Check if imported correctly
# TUtable

Adding data on cocaine use

In [None]:
CUwiki=pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_prevalence_of_cocaine_use", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"

Adding Urbanization Data

In [None]:
UDwiki=pd.read_html("https://en.wikipedia.org/wiki/Urbanization_by_sovereign_state", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"

Adding Population Density Data

In [None]:
PDwiki=pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_real_population_density_based_on_food_growing_capacity", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"

Adding Life Expectancy Data

In [None]:
LEwiki=pd.read_html("https://en.wikipedia.org/wiki/List_of_countries_by_past_life_expectancy", header=0, flavor='bs4', attrs={'class': 'wikitable'})# find name of web element via inspect for "wikitable"
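
The `read_html` calls above all share the same arguments, so they could be driven from one dictionary of sources. This is only a sketch under assumptions: `SOURCES` and `load_wikitables` are hypothetical names, only three of the pages are listed, and the `reader` parameter exists so the function can be tried without a network connection.

```python
import pandas as pd

# Hypothetical mapping from variable name to page; extend with the other URLs used above
SOURCES = {
    "MHwiki": "https://en.wikipedia.org/wiki/List_of_countries_by_suicide_rate",
    "ARwiki": "https://en.wikipedia.org/wiki/List_of_countries_by_alcohol_consumption_per_capita",
    "SIwiki": "https://en.wikipedia.org/wiki/Satisfaction_with_Life_Index",
}

def load_wikitables(sources, reader=pd.read_html):
    """Read every 'wikitable' from each page; returns {name: list of DataFrames}."""
    tables = {}
    for name, url in sources.items():
        # Same arguments as the individual cells above
        tables[name] = reader(url, header=0, flavor="bs4", attrs={"class": "wikitable"})
    return tables

# Usage (downloads the pages, so it is left commented out here):
# tables = load_wikitables(SOURCES)
# MHwiki = tables["MHwiki"]
```

Keeping the sources in one place makes it easier to see which pages the analysis depends on.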

_____


<a id='cleaning'></a>
## Cleaning

In [None]:
# Create a copy and show only the 4th table on the page, all suicide rates
SRwiki=MHwiki[3].copy()
SRwiki

Columns look fine. The table looks fine except for its second row, which shows up as a row of NaN at index 0 of the DataFrame.

In [None]:
# examine columns for errors
SRwiki.columns.to_list()

In [None]:
# Dropping first row of NaN and saving
SRwiki.drop(0, inplace=True)

In [None]:
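# Reset the index in place (reset_index returns a new frame unless the result is saved), then check that the first row was dropped
SRwiki.reset_index(drop=True, inplace=True)
SRwiki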

In [None]:
# replacing all asterisks and narrow no-break spaces (\u202f), and checking the result without saving
SRwiki.replace("[*]", "", regex=True).replace("\u202f", "", regex=True)

In [None]:
# replacing all asterisks and narrow no-break spaces (\u202f), and saving
SRwiki.replace("[*]", "", regex=True, inplace=True)
SRwiki.replace("\u202f", "", regex=True, inplace=True)

In [None]:
# Ensuring no leading or trailing spaces (assign the result back so the change is saved)
SRwiki["Country"] = SRwiki["Country"].str.strip()

In [None]:
# Checking Country Names
SRwiki.Country.to_list()

In [None]:
# replace special characters with standard
SRwiki.replace("São Tomé and Príncipe", "Sao Tome and Principe", inplace=True)

In [None]:
# Checking if correct data types
SRwiki.info()

Column names, rows, and data types now display correctly.
* Countries have been cleaned of special characters and spaces.
* Columns correctly show country and years.
* Data types show country names as objects and Suicide rates as decimals.
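
If `info()` were to report a year column as `object` instead of a decimal type (for example because a stray non-numeric entry survived cleaning), one possible fix is `pd.to_numeric`. This is a sketch on made-up data; the country names, values, and the `"n/a"` entry are illustrative assumptions, not taken from the Wikipedia table.

```python
import pandas as pd

# Made-up frame where the year columns were read as text
demo = pd.DataFrame({
    "Country": ["Country A", "Country B"],
    "2000": ["10.5", "7.2"],
    "2019": ["9.8", "n/a"],
})

year_columns = [col for col in demo.columns if col != "Country"]
for col in year_columns:
    # errors="coerce" turns anything that is not a number into NaN
    demo[col] = pd.to_numeric(demo[col], errors="coerce")

print(demo.dtypes)
```

Coercing silently hides bad entries, so it is worth counting the resulting NaN values with `demo.isna().sum()` afterwards.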

Repeat cleaning and checking for Male Suicide Rate table.
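
Since the same steps are repeated for each table, they could also be bundled into one function. The sketch below is one way to do that; `clean_rate_table` is a hypothetical helper and the demo frame is made-up data that imitates the problems seen in `SRwiki`.

```python
import pandas as pd

def clean_rate_table(table):
    """Return a cleaned copy of one rate table, following the SRwiki steps."""
    cleaned = table.drop(index=0)                        # drop the leading NaN row
    cleaned = cleaned.replace("[*]", "", regex=True)     # remove asterisks
    cleaned = cleaned.replace("\u202f", "", regex=True)  # remove narrow no-break spaces
    cleaned["Country"] = cleaned["Country"].str.strip()  # trim leading/trailing spaces
    cleaned = cleaned.replace("São Tomé and Príncipe", "Sao Tome and Principe")
    return cleaned.reset_index(drop=True)

# Made-up data with a NaN first row, an asterisk, padding, and a \u202f separator
demo = pd.DataFrame({
    "Country": [None, " Exampleland* ", "São Tomé and Príncipe"],
    "2019": [None, "1\u202f234.5", "2.0"],
})
result = clean_rate_table(demo)
print(result)
```

Each step returns a new frame, so the original table is left untouched and the function can be applied to `MHwiki[1]`, `MHwiki[2]`, and `MHwiki[3]` in turn.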

In [None]:
MSRwiki=MHwiki[1].copy() #saving Male Suicide rate table as MSRwiki
MSRwiki

In [None]:
MSRwiki.columns.to_list() # check columns

In [None]:
# Dropping first row of NaN and saving
MSRwiki.drop(0, inplace=True)

In [None]:
MSRwiki.reset_index(drop=True, inplace=True) #reset index in place and check
MSRwiki

In [None]:
# replacing all asterisks and narrow no-break spaces (\u202f), and saving
MSRwiki.replace("[*]", "", regex=True, inplace=True)
MSRwiki.replace("\u202f", "", regex=True, inplace=True)

In [None]:
MSRwiki["Country"] = MSRwiki["Country"].str.strip() #remove leading and trailing spaces and save

In [None]:
# Checking Country Names
MSRwiki.Country.to_list()

In [None]:
# replace special characters with standard
MSRwiki.replace("São Tomé and Príncipe", "Sao Tome and Principe", inplace=True)

In [None]:
# Checking if correct data types
MSRwiki.info()

Repeat for Female Suicide Rates

In [None]:
# Create a copy of Female Suicide Rates as FSRwiki
FSRwiki=MHwiki[2].copy()
FSRwiki

In [None]:
FSRwiki.columns.to_list() # check columns

In [None]:
# Dropping first row of NaN and checking
FSRwiki.drop(0)

In [None]:
# Saving Changes
FSRwiki.drop(0, inplace=True)

In [None]:
FSRwiki.reset_index(drop=True, inplace=True) #reset index in place and check
FSRwiki

In [None]:
# replacing all asterisks and narrow no-break spaces (\u202f), and checking the result without saving
FSRwiki.replace("[*]", "", regex=True).replace("\u202f", "", regex=True)

In [None]:
#Saving changes
FSRwiki.replace("[*]", "", regex=True, inplace=True)
FSRwiki.replace("\u202f", "", regex=True, inplace=True)

In [None]:
FSRwiki["Country"] = FSRwiki["Country"].str.strip() #remove leading and trailing spaces and save

In [None]:
# Checking Country Names
FSRwiki.Country.to_list()

In [None]:
# replace special characters with standard
FSRwiki.replace("São Tomé and Príncipe", "Sao Tome and Principe", inplace=True)

In [None]:
# Checking if correct data types
FSRwiki.info()

In [None]:
# Label each table by sex before combining them
# Series.add_suffix relabels the index rather than the country names, so add a column instead
#FSRwiki=FSRwiki.Country.add_suffix('_Female')

# Insert a new second column (position 1) to indicate Male, Female, or All
FSRwiki.insert(1,'Sex','Female')
MSRwiki.insert(1,'Sex','Male')
SRwiki.insert(1,'Sex','All')

In [None]:
#checking new column
FSRwiki

In [None]:
# Checking new column
MSRwiki

In [None]:
# Checking new column
SRwiki

Adding more data from second source for analysis

In [None]:
# Create a copy and show only the 2nd table on the alcohol consumption page
Awiki=ARwiki[1].copy()
Awiki

In [None]:
# examine columns for errors
Awiki.columns.to_list()

In [None]:
# Take only relevant data of total alcohol consumed by country
Awiki.drop(Awiki.columns[[2,3,4,5,6,7,8,9]], axis=1)

In [None]:
# Take only relevant data of total alcohol consumed by country and save
Awiki.drop(Awiki.columns[[2,3,4,5,6,7,8,9]], axis=1, inplace=True)
Awiki

In [None]:
#sort by Country name
Awiki.sort_values("Country")

In [None]:
#sort by Country name and save
Awiki.sort_values("Country", inplace=True)
Awiki

In [None]:
#replace the old numeric index by using Country and Total as the index (preview)
Awiki.set_index(['Country', 'Total'])

In [None]:
#saving changes
Awiki.set_index(['Country', 'Total'], inplace=True)
Awiki

In [None]:
# Reset index
Awiki.reset_index(inplace=True)
Awiki
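
The drop, sort, and index steps above can also be written as one chain that selects columns by name. This is a sketch on made-up data; it assumes the two columns to keep are named `Country` and `Total`, as in the cells above, and the `Beer` column is only a placeholder for the columns that get dropped.

```python
import pandas as pd

# Made-up stand-in for the alcohol table
demo = pd.DataFrame({
    "Country": ["Zedland", "Aland"],
    "Total": [4.2, 9.1],
    "Beer": [1.0, 5.0],  # placeholder for the extra columns
})

tidy = (
    demo[["Country", "Total"]]  # keep columns by name instead of dropping by position
    .sort_values("Country")     # alphabetical by country
    .reset_index(drop=True)     # fresh 0..n-1 index, old labels discarded
)
print(tidy)
```

Selecting by name keeps working if Wikipedia adds or reorders columns, whereas positional indices such as `[2,3,4,5,6,7,8,9]` would then point at different data.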

_____


<a id='shaping'></a>

## Shaping

Now we put all the data into the same table.

In [None]:
# All data has been cleaned, formatted, and checked. We merge tables
# Check which column names are shared among the tables
set(SRwiki.columns)&set(MSRwiki.columns)&set(FSRwiki.columns)


In [None]:
ASRwiki=pd.concat([SRwiki,MSRwiki,FSRwiki]) #concatenate all tables
ASRwiki #show concatenated table

In [None]:
# Reset index for concatenated table
ASRwiki.reset_index() #reset index and check

In [None]:
# Save changes; the old row labels are kept in a new "index" column
ASRwiki=ASRwiki.reset_index() #reset index and save
ASRwiki
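
An alternative to concatenating and then resetting the index is to pass `ignore_index=True` to `pd.concat`. The sketch below uses three made-up one-row tables in place of `SRwiki`, `MSRwiki`, and `FSRwiki`. Note the difference from the cells above: `reset_index()` keeps the old row labels in an `index` column, while `ignore_index=True` discards them.

```python
import pandas as pd

# Made-up stand-ins for SRwiki, MSRwiki, and FSRwiki
all_rates = pd.DataFrame({"Country": ["Country A"], "Sex": ["All"], "2019": [10.0]})
male_rates = pd.DataFrame({"Country": ["Country A"], "Sex": ["Male"], "2019": [15.0]})
female_rates = pd.DataFrame({"Country": ["Country A"], "Sex": ["Female"], "2019": [5.0]})

# ignore_index=True numbers the combined rows 0..n-1 in one step
combined = pd.concat([all_rates, male_rates, female_rates], ignore_index=True)
print(combined)
```

Which variant is preferable depends on whether the original row labels are needed later in R.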

Preparing to merge with the alcohol data by comparing the country lists

In [None]:
SRwiki.Country.to_list()

In [None]:
#Checking Country lists
Awiki.Country.to_list()
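
Comparing two long country lists by eye is error-prone, so the mismatches could be computed instead. This is a sketch on made-up data (the country names and values are illustrative assumptions); it uses set differences to find names that appear in only one table, then an outer merge with `indicator=True` to show where each row came from.

```python
import pandas as pd

# Made-up stand-ins for SRwiki and Awiki
rates = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C"],
    "2019": [10.0, 7.5, 3.2],
})
alcohol = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country D"],
    "Total": [9.1, 4.2, 6.0],
})

# Names present in one table but not the other
only_in_rates = sorted(set(rates["Country"]) - set(alcohol["Country"]))
only_in_alcohol = sorted(set(alcohol["Country"]) - set(rates["Country"]))
print(only_in_rates, only_in_alcohol)

# Outer merge keeps every country; the _merge column marks both/left_only/right_only
merged = rates.merge(alcohol, on="Country", how="outer", indicator=True)
print(merged)
```

Names that differ only in spelling between the two pages would show up in both difference lists and can then be harmonized before the final merge.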

_____


<a id='exporting'></a>

## Exporting
Now that the data has been imported and cleaned, we export it as .pkl, .csv, and .RDS files so we can use it in RStudio.


In [None]:
# Make sure rpy2 is installed
!pip show rpy2
# !pip install rpy2

In [None]:
# export ASRwiki as ASRwiki.pkl
ASRwiki.to_pickle("ASRwiki.pkl")
print("Exported to pickle.")

In [None]:
# export ASRwiki as ASRwiki.csv
ASRwiki.to_csv("ASRwiki.csv")
print("Exported to .csv")
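
A quick way to confirm that an export worked is to read the file back and compare it with the original. The sketch below does this on made-up data in a temporary folder, so it does not touch `ASRwiki.pkl` or `ASRwiki.csv`. It passes `index=False` to `to_csv`, which is an optional variation on the cell above that avoids writing the row index as an extra unnamed column.

```python
import os
import tempfile
import pandas as pd

# Made-up stand-in for ASRwiki
demo = pd.DataFrame({
    "Country": ["Country A", "Country B"],
    "Sex": ["All", "All"],
    "2019": [10.5, 7.25],
})

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "demo.csv")
    pkl_path = os.path.join(folder, "demo.pkl")

    demo.to_csv(csv_path, index=False)  # index=False: do not write the row index
    demo.to_pickle(pkl_path)

    # Read both files back before the temporary folder is removed
    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

print(from_pkl.equals(demo))
print(list(from_csv.columns))
```

Pickle files are specific to Python, so the .csv and .RDS exports are the ones R can open directly.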

In [None]:
from rpy2.robjects import pandas2ri
pandas2ri.activate()

from rpy2.robjects.packages import importr

base = importr('base')
base.saveRDS(ASRwiki,file="ASRwiki.RDS")
print("Exported to .rds")