# Exploratory Analysis of the Gender Gap Index
Jaclyn Wilson and Maygan Miguez

[Link to our GitHub webpage](https://datasciencegendergapindex.github.io)

## Project Goals
The goal of this project is to investigate how the gender gap in health, education, economics, and politics has changed by country over the eight years covered by our dataset (2006-2013).


## Project Dataset

The dataset we plan to work with is [The Global Gender Gap Index](https://data.world/hdx/29f2f52f-a9c2-4ff9-a99e-42b894dc18e9). The data comes from [The Humanitarian Data Exchange](https://data.humdata.org/dataset/global-gender-gap-index-world-economic-forum). The Global Gender Gap Index measures the relative gaps between women and men across a large set of countries in four key areas: health, education, economics, and politics. In this dataset, the gender gap index score is on a 0-1 scale, with 0 being complete inequality and 1 being complete equality. We are interested in this dataset because it quantifies an important aspect of gender equality and makes it comparable between countries. We would like to examine how gender inequality differs between countries and how it has changed over time.


By examining each country, we can also see how the four key areas (health, education, economics, and politics) influence the gender gap index. The ranks and scores for each of these areas are organized by country in the file table-3b-detailed-rankings-2013-csv-2.csv. Because the data provides a score for each country individually every year, we can use it to compare trends between countries and dive deeper into how the gender gap index scores are derived. We can also answer questions such as: which issues are most prevalent in the world? Furthermore, we can examine the data alongside other country-level datasets, such as datasets on each country’s predominant religion, government type, legislation, and social norms. Through this examination, we can answer questions such as: which characteristics might contribute to each country’s gender gap score, and how has that score changed over time?


While this dataset only supplies data from 2006-2013, we plan to scrape data for the years 2014-2021 from the annual Gender Gap Index Reports found on the [World Economic Forum’s website](https://www.weforum.org/reports/ab6795a1-960c-42b2-b3d5-587eccda6023). Doing so will enable us to make better-informed judgments about current gender equality measures and to project the direction in which they are heading.


## Collaboration Plan

Our collaboration plan is to meet on Zoom once a week to work on the final project. We have set up a Google Colab notebook where we work on our code together and make any updates. From now until November 16, we plan to scrape three Gender Gap Index Reports per week and create the necessary graphs for Milestone 2. From November 16 to December 9, we plan to develop our insights, visualizations, and presentation for the final project.


## ETL (Extract, Transform, and Load)

We loaded two of the four data files (one Excel workbook and one CSV file) included in [this dataset folder](https://data.world/hdx/29f2f52f-a9c2-4ff9-a99e-42b894dc18e9) from data.world. The first file shows the year, country, each country’s gender equality ranking, and each country’s gender equality score out of 1. The original data listed the ranking and score for each year as separate columns. We tidied the data by consolidating the rankings into a single Rank column and the scores into a single Score column. We also created a Year column and set it as the index.


## Rank & Score by Year Data 
Using the data displayed below, we will be able to group by year, by country, or by both in order to visualize how the gender gap index has changed over time. This could include fitting regressions to compare trends between countries.
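
As a rough sketch of the grouping and trend comparison described above, the example below uses a small made-up table (`toy_df`, with hypothetical countries A and B) rather than the real data. It assumes `Year` is a numeric column and uses `numpy.polyfit` for a straight-line fit, which is only one of several possible regression choices.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same columns as the tidied data in this section,
# with Year kept as a numeric column. The values are made up for illustration.
toy_df = pd.DataFrame({
    "Year": [2006, 2007, 2008, 2006, 2007, 2008],
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Score": [0.60, 0.62, 0.64, 0.70, 0.69, 0.68],
})

# Average score across countries for each year
mean_score_by_year = toy_df.groupby("Year")["Score"].mean()


def score_slope(group):
    """Least-squares slope of Score against Year (score change per year)."""
    slope, _intercept = np.polyfit(
        group["Year"].to_numpy(), group["Score"].to_numpy(), deg=1
    )
    return slope


# One fitted slope per country: a positive slope means the score is rising
slope_by_country = toy_df.groupby("Country")[["Year", "Score"]].apply(score_slope)

print(mean_score_by_year)
print(slope_by_country)
```

Comparing the fitted slopes gives a single number per country, which is convenient for ranking countries by how quickly their score is changing.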

In [None]:
import pandas as pd
gender_gap_df = pd.read_excel('https://query.data.world/s/ac54rh56yuptbuju6gxqpbviyq22jz')

In [None]:
# Melt the yearly rank columns into one Rank column, and the yearly score columns into one Score column
df1 = pd.melt(gender_gap_df, id_vars=['Country', 'ISO3'], value_vars=['2013 Rank', '2012 Rank', '2011 Rank', '2010 Rank', '2009 Rank', '2008 Rank', '2007 Rank', '2006 Rank'], var_name='Year', value_name='Rank')
df2 = pd.melt(gender_gap_df, value_vars=['2013 Score', '2012 Score', '2011 Score', '2010 Score', '2009 Score', '2008 Score', '2007 Score', '2006 Score'], var_name='Year 2', value_name='Score')
# Both melts list the years in the same order, so their rows line up and can be joined side by side
df3 = pd.concat([df1, df2], axis=1)

# Strip the non-digit text (' Rank') so that only the year remains in the Year column
df3['Year'] = df3['Year'].str.replace(r'\D+', '', regex=True)

# Drop unnecessary column
df3 = df3.drop(['Year 2'], axis=1)

# Set year as index and sort
df3 = df3.set_index('Year').sort_index()
display(df3.head())
display(df3.dtypes)


| Year | Country | ISO3 | Rank | Score |
|---|---|---|---|---|
| 2006 | Yemen | YEM | 115.0 | 0.4595 |
| 2006 | Trinidad and Tobago | TTO | 45.0 | 0.6797 |
| 2006 | Panama | PAN | 31.0 | 0.6935 |
| 2006 | Slovenia | SVN | 51.0 | 0.6745 |
| 2006 | Malawi | MWI | 81.0 | 0.6437 |


Country     object
ISO3        object
Rank       float64
Score      float64
dtype: object
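
The two-melt approach above depends on both melts producing their rows in the same order. As a hedged alternative, the sketch below melts every year column at once and then pivots on the measure name, so ranks and scores are matched by key rather than by position. `wide_df` and its values are hypothetical; only the `"<year> Rank"` / `"<year> Score"` column naming is taken from the source data.

```python
import pandas as pd

# Hypothetical wide table mimicking the source layout ("<year> Rank" / "<year> Score").
# The values are made up for illustration.
wide_df = pd.DataFrame({
    "Country": ["A", "B"],
    "ISO3": ["AAA", "BBB"],
    "2006 Rank": [1, 2],
    "2006 Score": [0.80, 0.70],
    "2007 Rank": [2, 1],
    "2007 Score": [0.79, 0.81],
})

# Melt every year column at once, keeping the country identifiers on each row
long_df = wide_df.melt(
    id_vars=["Country", "ISO3"], var_name="Year_Measure", value_name="Value"
)

# Split "2006 Rank" into a numeric year and a measure name ("Rank" or "Score")
long_df[["Year", "Measure"]] = long_df["Year_Measure"].str.split(" ", n=1, expand=True)
long_df["Year"] = long_df["Year"].astype(int)

# Pivot the measure names back into Rank and Score columns, matched by key
tidy_df = (
    long_df.pivot(index=["Year", "Country", "ISO3"], columns="Measure", values="Value")
    .reset_index()
    .set_index("Year")
    .sort_index()
)
tidy_df.columns.name = None  # drop the leftover "Measure" label on the columns

print(tidy_df)
```

This variant also stores `Year` as an integer, which is easier to use in later numeric work such as trend fitting.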

## 2013 Category Ranks & Scores Data
The data displayed below shows the overall rank and score of each country for 2013, broken down into the four categories. We have loaded this data so that we can compare the health, education, economics, and politics ranks and scores of countries alongside other datasets about country characteristics.

In [None]:
gender_gap_bygroups_df = pd.read_csv('https://query.data.world/s/pwtht6bojpu7qxkii2kvykxuejcfdg', encoding = "ISO-8859-1")
display(gender_gap_bygroups_df.head())
display(gender_gap_bygroups_df.dtypes)

| | Country | ISO3 | Overall Rank | Overall Score | Economic Participation and Opportunity Rank | Economic Participation and Opportunity Score | Educational Attainment Rank | Educational Attainment Score | Health and Survival Rank | Health and Survival Score | Political Empowerment Rank | Political Empowerment Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Iceland | ISL | 1 | 0.8731 | 22 | 0.7684 | 1 | 1.0 | 97 | 0.9696 | 1 | 0.7544 |
| 1 | Finland | FIN | 2 | 0.8421 | 19 | 0.7727 | 1 | 1.0 | 1 | 0.9796 | 2 | 0.6162 |
| 2 | Norway | NOR | 3 | 0.8417 | 1 | 0.8357 | 1 | 1.0 | 93 | 0.9697 | 3 | 0.5616 |
| 3 | Sweden | SWE | 4 | 0.8129 | 14 | 0.7829 | 38 | 0.9977 | 69 | 0.9735 | 4 | 0.4976 |
| 4 | Philippines | PHL | 5 | 0.7832 | 16 | 0.7773 | 1 | 1.0 | 1 | 0.9796 | 10 | 0.376 |


Country                                          object
ISO3                                             object
Overall Rank                                      int64
Overall Score                                   float64
Economic Participation and Opportunity Rank       int64
Economic Participation and Opportunity Score    float64
Educational Attainment Rank                       int64
Educational Attainment Score                    float64
Health and Survival Rank                          int64
Health and Survival Score                       float64
Political Empowerment Rank                        int64
Political Empowerment Score                     float64
dtype: object
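
One way this table might be used is sketched below: averaging each category score across countries to see where the gap is widest, then joining a country-characteristics table on `ISO3`. Both `scores_df` and `traits_df` are hypothetical stand-ins with made-up values; the category column names match the 2013 table, while the `Government Type` column is an assumption about what a companion dataset could contain.

```python
import pandas as pd

# Hypothetical two-country excerpt using the column names of the 2013 table.
# The values are made up for illustration.
scores_df = pd.DataFrame({
    "Country": ["A", "B"],
    "ISO3": ["AAA", "BBB"],
    "Economic Participation and Opportunity Score": [0.70, 0.60],
    "Educational Attainment Score": [1.00, 0.90],
    "Health and Survival Score": [0.97, 0.98],
    "Political Empowerment Score": [0.50, 0.10],
})

category_cols = [
    "Economic Participation and Opportunity Score",
    "Educational Attainment Score",
    "Health and Survival Score",
    "Political Empowerment Score",
]

# Average each category across countries; the lowest mean marks the widest gap
mean_by_category = scores_df[category_cols].mean().sort_values()

# Hypothetical country-characteristics table, keyed by the same ISO3 code
traits_df = pd.DataFrame({
    "ISO3": ["AAA", "BBB"],
    "Government Type": ["Republic", "Monarchy"],
})

# A left join keeps every country in the gender gap table, even without a match
merged_df = scores_df.merge(traits_df, on="ISO3", how="left")

print(mean_by_category)
print(merged_df[["Country", "Government Type"]])
```

Joining on `ISO3` rather than `Country` avoids mismatches caused by differing country-name spellings between datasets.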