# COGS 108 - Final Project 

# Overview

*Fill in your overview here*

# Names

- Seth D'Agostino
- Mikaila Keyes
- Jeffrey Chiu
- J. Cole

# Group Members IDs

- A13651408
- A14066254
- A12950096
- A########

# Research Question

We aim to explore how a country's socio-economic indicators (GDP, GDP per capita, HDI, etc.) relate to its suicide mortality rate. Our starting assumption is that economically underdeveloped nations will have higher rates.

## Background and Prior Work

According to the WHO, there are around 800,000 suicides worldwide each year, which averages out to one death every 40 seconds (1). Among people aged 15 to 29, suicide is the second leading cause of death globally (1). These simple yet alarming statistics not only reveal a grim reality of the world we live in but also highlight the importance of this project.

Death by suicide affects people across every part of society. It is a feature of modern society from which no member of a community is immune, regardless of culture, religion, economic standing, age, sex, or race. This pervasiveness makes suicide difficult to explain, but examining suicide rates against specific variables can help identify correlations. Our datasets cover various countries from 1985 to 2016 and report suicide rates by sex, age group, and generation, along with the GDP, GDP per capita, and HDI for the corresponding years. They may help us understand any correlation between a nation's economic development or growth and its suicide rate.

Group 021 from Winter 2018 looked specifically at suicide rates among veterans. Their study concluded that median threshold income, not unemployment, has a strong correlation with the veteran suicide rate. Instead of looking at suicide at a community level in America, we aim to examine suicide on a global scale.

References (include links):
- 1) Suicide data. (2018, November 05). Retrieved from https://www.who.int/mental_health/prevention/suicide/suicideprevent/en/
- 2)

# Hypothesis


*Fill in your hypotheses here*

# Dataset(s)

*Fill in your dataset information here*

(Copy this information for each dataset)
- Dataset Name:
- Link to the dataset:
- Number of observations:

1-2 sentences describing each dataset. 

If you plan to use multiple datasets, add 1-2 sentences about how you plan to combine these datasets.

# Setup

In [38]:
# Display plots directly in the notebook instead of in a new window
%matplotlib inline

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Data Wrangling

In [39]:
# Load the three source CSV files
rates_85_16 = pd.read_csv("data/suicide-rates-overview-1985-to-2016.csv")
who_rate_by_country = pd.read_csv("data/WHO_suicide_mortality_by_country.csv")
who_country_metadata = pd.read_csv("data/WHO_country_metadata.csv")
# Print each DataFrame for a first look at its columns and values
print(rates_85_16)
print(who_rate_by_country)
print(who_country_metadata)

          country  year     sex          age  suicides_no  population  \
0         Albania  1987    male  15-24 years           21      312900   
1         Albania  1987    male  35-54 years           16      308000   
2         Albania  1987  female  15-24 years           14      289700   
3         Albania  1987    male    75+ years            1       21800   
4         Albania  1987    male  25-34 years            9      274300   
5         Albania  1987  female    75+ years            1       35600   
6         Albania  1987  female  35-54 years            6      278800   
7         Albania  1987  female  25-34 years            4      257200   
8         Albania  1987    male  55-74 years            1      137500   
9         Albania  1987  female   5-14 years            0      311000   
10        Albania  1987  female  55-74 years            0      144600   
11        Albania  1987    male   5-14 years            0      338200   
12        Albania  1988  female    75+ years       

# Data Cleaning

__rates_85_16:__  
- Dropped the country-year column of rates_85_16 because the country and year columns already carry the same information

In [40]:
rates_85_16 = rates_85_16.drop(columns=['country-year'])
rates_85_16

Unnamed: 0,country,year,sex,age,suicides_no,population,suicides/100k pop,HDI for year,gdp_for_year ($),gdp_per_capita ($),generation
0,Albania,1987,male,15-24 years,21,312900,6.71,,2156624900,796,Generation X
1,Albania,1987,male,35-54 years,16,308000,5.19,,2156624900,796,Silent
2,Albania,1987,female,15-24 years,14,289700,4.83,,2156624900,796,Generation X
3,Albania,1987,male,75+ years,1,21800,4.59,,2156624900,796,G.I. Generation
4,Albania,1987,male,25-34 years,9,274300,3.28,,2156624900,796,Boomers
5,Albania,1987,female,75+ years,1,35600,2.81,,2156624900,796,G.I. Generation
6,Albania,1987,female,35-54 years,6,278800,2.15,,2156624900,796,Silent
7,Albania,1987,female,25-34 years,4,257200,1.56,,2156624900,796,Boomers
8,Albania,1987,male,55-74 years,1,137500,0.73,,2156624900,796,G.I. Generation
9,Albania,1987,female,5-14 years,0,311000,0.00,,2156624900,796,Generation X
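
Because rates_85_16 has one row per country, year, sex, and age group, comparing it with country-level indicators such as GDP per capita will likely require collapsing those rows first. The following is a minimal sketch of one way to do that, using the twelve Albania 1987 rows printed during data wrangling. The helper name `country_year_rates` and the output column `suicides_per_100k` are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# The twelve Albania 1987 rows printed earlier (only the columns needed here).
strata = pd.DataFrame({
    "country": ["Albania"] * 12,
    "year": [1987] * 12,
    "suicides_no": [21, 16, 14, 1, 9, 1, 6, 4, 1, 0, 0, 0],
    "population": [312900, 308000, 289700, 21800, 274300, 35600,
                   278800, 257200, 137500, 311000, 144600, 338200],
})


def country_year_rates(df):
    """Sum suicides and population per country-year, then recompute the rate.

    Hypothetical helper: it recomputes the rate from the totals instead of
    averaging the per-group 'suicides/100k pop' values.
    """
    totals = df.groupby(["country", "year"], as_index=False)[
        ["suicides_no", "population"]
    ].sum()
    totals["suicides_per_100k"] = (
        totals["suicides_no"] / totals["population"] * 1e5
    )
    return totals


country_year = country_year_rates(strata)
print(country_year)
```

Recomputing the rate from summed counts weights each sex and age group by its population, whereas a plain mean of the per-group rates would treat a small group (such as 75+ years) the same as a large one.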


__who_rate_by_country:__ 
- Dropped the indicator name and code columns, which hold the same value in every row.
- Kept only the 2000, 2005, 2010, 2015, and 2016 columns, since all the other year columns are blank.

In [41]:
who_rate_by_country = who_rate_by_country[['Country Name','Country Code', '2000', '2005', '2010', '2015', '2016']]
who_rate_by_country

Unnamed: 0,Country Name,Country Code,2000,2005,2010,2015,2016
0,Aruba,ABW,,,,,
1,Afghanistan,AFG,5.700000,6.300000,5.100000,4.800000,4.700000
2,Angola,AGO,7.900000,7.200000,5.700000,5.000000,4.700000
3,Albania,ALB,5.500000,6.700000,7.800000,6.000000,6.300000
4,Andorra,AND,,,,,
5,Arab World,ARB,4.328195,4.312462,4.206429,4.202179,4.266082
6,United Arab Emirates,ARE,3.200000,3.100000,3.000000,2.800000,2.800000
7,Argentina,ARG,9.500000,9.000000,8.700000,8.800000,9.200000
8,Armenia,ARM,3.300000,4.400000,6.000000,7.000000,6.600000
9,American Samoa,ASM,,,,,
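
As an alternative to listing the populated year columns by hand, the blank columns could be detected programmatically. This is only a sketch on a three-row sample taken from the output above. The `1999` column is a hypothetical stand-in for one of the blank year columns, whose actual names are not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Three rows from the output above, plus one made-up blank year column.
who = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan", "Angola"],
    "Country Code": ["ABW", "AFG", "AGO"],
    "1999": [np.nan, np.nan, np.nan],  # hypothetical all-blank year column
    "2000": [np.nan, 5.7, 7.9],
    "2016": [np.nan, 4.7, 4.7],
})

# Drop every column in which all values are missing.
non_empty = who.dropna(axis=1, how="all")
kept_columns = list(non_empty.columns)
print(kept_columns)
```

This only removes columns that are entirely empty, so constant columns such as the indicator name and code would still need to be dropped separately.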


__who_country_metadata:__
- Drop the SpecialNotes column
- Drop the empty Unnamed:5 column by keeping only the first four remaining columns

In [42]:
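# Drop the free-text notes column
who_country_metadata = who_country_metadata.drop(columns=['SpecialNotes'])
# Keep the first four columns, which discards the empty trailing column
who_country_metadata = who_country_metadata.iloc[:,0:4]
who_country_metadata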

Unnamed: 0,Country Code,Region,IncomeGroup,TableName
0,ABW,Latin America & Caribbean,High income,Aruba
1,AFG,South Asia,Low income,Afghanistan
2,AGO,Sub-Saharan Africa,Lower middle income,Angola
3,ALB,Europe & Central Asia,Upper middle income,Albania
4,AND,Europe & Central Asia,High income,Andorra
5,ARB,,,Arab World
6,ARE,Middle East & North Africa,High income,United Arab Emirates
7,ARG,Latin America & Caribbean,High income,Argentina
8,ARM,Europe & Central Asia,Upper middle income,Armenia
9,ASM,East Asia & Pacific,Upper middle income,American Samoa
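
Selecting by position with `iloc` depends on the column order of the CSV staying the same. A name-based variant is sketched below on a two-row sample. The column order, the exact name `Unnamed: 5` (the form pandas usually gives to headerless columns), and the empty SpecialNotes values are assumptions, since the raw file is not shown here.

```python
import numpy as np
import pandas as pd

# Two-row sample; SpecialNotes values and the 'Unnamed: 5' name are assumed.
meta = pd.DataFrame({
    "Country Code": ["ABW", "AFG"],
    "Region": ["Latin America & Caribbean", "South Asia"],
    "IncomeGroup": ["High income", "Low income"],
    "SpecialNotes": ["", ""],
    "TableName": ["Aruba", "Afghanistan"],
    "Unnamed: 5": [np.nan, np.nan],
})

# Boolean mask marking columns whose name starts with 'Unnamed'.
unnamed = meta.columns.str.startswith("Unnamed")

# Keep the named columns, then drop SpecialNotes; errors="ignore" lets the
# cell be re-run without raising once the column is already gone.
cleaned = meta.loc[:, ~unnamed].drop(columns=["SpecialNotes"], errors="ignore")
print(list(cleaned.columns))
```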


- Merge who_rate_by_country and who_country_metadata on Country Code
- Drop the TableName column because it duplicates Country Name
- Drop rows with NaN (this also removes regional aggregates such as Arab World, which have no Region or IncomeGroup)

In [46]:
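# Join the rates with the metadata on the shared country code
who_rates_by_country = pd.merge(who_rate_by_country, who_country_metadata, on="Country Code")
# TableName duplicates Country Name
who_rates_by_country = who_rates_by_country.drop(columns=['TableName'])
# Drop rows with any missing value
who_rates_by_country = who_rates_by_country.dropna()
who_rates_by_country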

Unnamed: 0,Country Name,Country Code,2000,2005,2010,2015,2016,Region,IncomeGroup
1,Afghanistan,AFG,5.7,6.3,5.1,4.8,4.7,South Asia,Low income
2,Angola,AGO,7.9,7.2,5.7,5.0,4.7,Sub-Saharan Africa,Lower middle income
3,Albania,ALB,5.5,6.7,7.8,6.0,6.3,Europe & Central Asia,Upper middle income
6,United Arab Emirates,ARE,3.2,3.1,3.0,2.8,2.8,Middle East & North Africa,High income
7,Argentina,ARG,9.5,9.0,8.7,8.8,9.2,Latin America & Caribbean,High income
8,Armenia,ARM,3.3,4.4,6.0,7.0,6.6,Europe & Central Asia,Upper middle income
10,Antigua and Barbuda,ATG,2.0,1.2,0.3,0.8,0.5,Latin America & Caribbean,High income
11,Australia,AUS,13.2,12.3,12.5,13.8,13.2,East Asia & Pacific,High income
12,Austria,AUT,20.0,17.3,16.0,16.0,15.6,Europe & Central Asia,High income
13,Azerbaijan,AZE,2.2,3.2,3.1,2.7,2.6,Europe & Central Asia,Upper middle income
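
With an income group attached to each country, one possible first look at the research question is to compare suicide rates across income groups. The sketch below uses only the ten rows shown above, so its numbers are illustrative and say nothing about the full dataset. The variable names `merged` and `summary` are hypothetical.

```python
import pandas as pd

# The ten merged rows shown above (2016 rate and income group only).
merged = pd.DataFrame({
    "Country Name": ["Afghanistan", "Angola", "Albania", "United Arab Emirates",
                     "Argentina", "Armenia", "Antigua and Barbuda", "Australia",
                     "Austria", "Azerbaijan"],
    "2016": [4.7, 4.7, 6.3, 2.8, 9.2, 6.6, 0.5, 13.2, 15.6, 2.6],
    "IncomeGroup": ["Low income", "Lower middle income", "Upper middle income",
                    "High income", "High income", "Upper middle income",
                    "High income", "High income", "High income",
                    "Upper middle income"],
})

# Mean 2016 rate and number of countries per income group.
summary = merged.groupby("IncomeGroup")["2016"].agg(["mean", "count"])
print(summary)
```

Reporting the count next to the mean makes it easier to spot groups that rest on only one or two countries.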


# Data Analysis & Results

Include cells that describe the steps in your data analysis.

In [5]:
## YOUR CODE HERE
## FEEL FREE TO ADD MULTIPLE CELLS PER SECTION

# Ethics & Privacy

*Fill in your ethics & privacy discussion here*

# Conclusion & Discussion

*Fill in your discussion information here*