# Exploratory Data Analysis (EDA)


## 1. Report overview

### 1.1. Project Title: **World Women Empowerment: Demystifying Preconceptions About Women's Roles in Society**
### 1.2. Subtitle: A Global Analysis of Female Representation in Top Positions and Its Socio-Economic Impact (2004-2020)

![Women Empowerment](https://png.pngtree.com/background/20220729/original/pngtree-women-empowerment-female-power-community-picture-image_1870427.jpg)

### 1.3. Context:

Despite progress towards gender equality, women all over the world, even in developed countries, still face stereotypes that directly or indirectly confine them to domestic roles.
These stereotypes can discourage women from entering the workforce, especially in high-tech fields, and from pursuing STEM education, which widens the gender gap.
As a woman in this society and as a woman in tech, I witness firsthand how male dominance and societal stereotypes can negatively influence us.

Addressing these biases is crucial for empowering women all over the world.
    
- As a woman in tech, I want to understand my chances of succeeding in a top role, recognize my value, and understand how impactful we can be in leadership positions.

- As a business leader, I want to understand how gender diversity contributes to and impacts a country's performance. By understanding this impact, I aim to promote and create more opportunities for women to join important decision-making positions.

- As a member of the general public, I want to understand the socio-economic implications of women's representation in leadership and the impacts per country on women with STEM education. I aim to combat stereotypes about the role of women in society and advocate for a more inclusive future.

This project uses data from The World Bank Group's Gender Statistics database (2004-2023) to explore these issues. It aims to empower women by highlighting their potential in leadership roles and in pursuing STEM education, demonstrating the impact of gender diversity on organizational success, and advocating for societal change towards greater gender equality.


### 1.4. Goal:

In this EDA we are going to focus on the following questions:

- 1 - Which countries have the highest and lowest representation of women in top positions?
- 2 - How does this compare with the overall percentage of women in the workforce and in STEM?
- 3 - Is there a correlation between GDP and the percentage of women in top positions?
- 4 - Are countries with higher education levels for women (especially in STEM) showing better gender diversity in leadership?
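
As a rough illustration of how question 3 could be approached, the sketch below computes a Pearson correlation between two indicators. The column names are taken from the World Bank export, but the values are hypothetical and the choice of "Firms with female top manager (% of firms)" as the measure of top positions is only an assumption. Pearson's coefficient captures linear association only, and a correlation on its own does not establish causation.

```python
import numpy as np
import pandas as pd

# Column names as they appear in the World Bank export.
GDP = "GDP per capita (Current US$)"
TOP = "Firms with female top manager (% of firms)"

# Hypothetical values, for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    GDP: [500.0, 2000.0, 8000.0, 30000.0, np.nan],
    TOP: [12.0, 20.0, np.nan, 18.0, 25.0],
})

# Keep only the rows where both indicators are present.
pairs = toy[[GDP, TOP]].dropna()

# Pearson correlation coefficient between the two indicators.
r = pairs[GDP].corr(pairs[TOP])
print(f"n = {len(pairs)}, r = {r:.2f}")
```

Because GDP figures are heavily skewed, a rank-based coefficient or a log transform may be worth comparing against this baseline.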


### 1.5. Dataset used:

World Bank Group's Gender Statistics database (2004-2023): https://databank.worldbank.org/reports.aspx?ReportId=153622&Type=Table

gender_statistics.csv: 105 columns of gender-related data

## 2. Data Overview

### 2.1. Import libraries


In [2]:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import folium
import numpy as np

### 2.2 Load data

In [3]:
dataset = pd.read_csv('./DataSet/gender_statistics.csv')
dataset

Unnamed: 0,Time,Country Name,Ratio of female to male labor force participation rate (%) (modeled ILO estimate),"Cost of business start-up procedures, male (% of GNI per capita)","Cost of business start-up procedures, female (% of GNI per capita)",Cost of business start-up procedures (% of GNI per capita),"Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)","Educational attainment, at least Bachelor's or equivalent, population 25+, total (%) (cumulative)","Educational attainment, at least Master's or equivalent, population 25+, female (%) (cumulative)","Educational attainment, at least Master's or equivalent, population 25+, total (%) (cumulative)",...,"Start-up procedures to register a business, male (number)","Start-up procedures to register a business, female (number)","Tertiary education, academic staff (% female)",Time required to start a business (days),"Time required to start a business, male (days)","Time required to start a business, female (days)","Vulnerable employment, total (% of total employment) (modeled ILO estimate)","Vulnerable employment, female (% of female employment) (modeled ILO estimate)","Vulnerable employment, male (% of male employment) (modeled ILO estimate)",Women Business and the Law Index Score (scale 1-100)
0,2004,Afghanistan,20.176476,72.0,72.0,72.0,,,,,...,4.0,5.0,11.959570,9.5,9.0,10.0,91.435841,98.277575,90.073931,26.250
1,2004,Albania,70.592238,32.3,32.3,32.3,,,,,...,12.0,12.0,,40.0,40.0,40.0,57.190908,59.219696,55.720096,80.000
2,2004,Algeria,18.121593,14.6,14.6,14.6,,,,,...,13.0,13.0,31.616659,24.0,24.0,24.0,28.579172,30.601438,28.228000,40.625
3,2004,American Samoa,,,,,,,,,...,,,,,,,,,,
4,2004,Andorra,,,,,15.42804,16.31159,,,...,,,47.619049,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4340,,,,,,,,,,,...,,,,,,,,,,
4341,,,,,,,,,,,...,,,,,,,,,,
4342,,,,,,,,,,,...,,,,,,,,,,
4343,Data from database: Gender Statistics,,,,,,,,,,...,,,,,,,,,,
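
The last rows of this output are not observations: several are completely blank and one holds the export's footer note. A minimal sketch of one way to separate them from the data, assuming that every genuine observation has a `Country Name`; the small inline CSV below is a hypothetical stand-in for `gender_statistics.csv`.

```python
import io
import pandas as pd

# Tiny stand-in for gender_statistics.csv with hypothetical values; it mimics
# the blank rows and the footer note seen at the end of the real export.
raw_csv = io.StringIO(
    "Time,Country Name,Gini index\n"
    "2004,Albania,30.6\n"
    "2005,Albania,\n"
    ",,\n"
    "Data from database: Gender Statistics,,\n"
)
sample = pd.read_csv(raw_csv)

# Assumption: every real observation names a country, so rows without one
# are blank lines or footer notes rather than data.
is_data = sample["Country Name"].notna()
footer_rows = sample[~is_data]
cleaned = sample[is_data].reset_index(drop=True)

print(f"dropped {len(footer_rows)} non-data rows, kept {len(cleaned)}")
```

Inspecting `footer_rows` before discarding them is a cheap check that no real country was removed by mistake.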


### 2.3. Get to know the data

In [5]:
# Number of rows and columns
dataset.shape

(4345, 105)

In [33]:
# Check the column names and their dtypes
dataset.dtypes

Time                                                                                                                         object
Country Name                                                                                                                 object
Ratio of female to male labor force participation rate (%) (modeled ILO estimate)                                           float64
Cost of business start-up procedures, male (% of GNI per capita)                                                            float64
Cost of business start-up procedures, female (% of GNI per capita)                                                          float64
Cost of business start-up procedures (% of GNI per capita)                                                                  float64
Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)                          float64
Educational attainment, at least Bachelor's or equivalent, population 25+, t
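
`Time` is reported as `object` even though it holds years, which suggests that at least one entry is not a number. The sketch below shows one way to find such entries with `pd.to_numeric`; the Series is a hypothetical stand-in for the real column.

```python
import pandas as pd

# Hypothetical stand-in for the 'Time' column: years plus a footer note.
time = pd.Series(["2004", "2005", None, "Data from database: Gender Statistics"])

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
years = pd.to_numeric(time, errors="coerce")

# Entries that were present but failed to parse explain the object dtype.
not_numeric = time[years.isna() & time.notna()]
print(not_numeric.tolist())
```

Once the offending rows are removed, the column can be converted to an integer year.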

In [24]:
# Summary statistics for the numeric columns (display every column)
pd.set_option('display.max_columns', None)
dataset.describe()

Unnamed: 0,Ratio of female to male labor force participation rate (%) (modeled ILO estimate),"Cost of business start-up procedures, male (% of GNI per capita)","Cost of business start-up procedures, female (% of GNI per capita)",Cost of business start-up procedures (% of GNI per capita),"Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)","Educational attainment, at least Bachelor's or equivalent, population 25+, total (%) (cumulative)","Educational attainment, at least Master's or equivalent, population 25+, female (%) (cumulative)","Educational attainment, at least Master's or equivalent, population 25+, total (%) (cumulative)","Educational attainment, Doctoral or equivalent, population 25+, male (%) (cumulative)","Employers, female (% of female employment) (modeled ILO estimate)","Educational attainment, at least Master's or equivalent, population 25+, male (%) (cumulative)","Educational attainment, Doctoral or equivalent, population 25+, female (%) (cumulative)","Educational attainment, Doctoral or equivalent, population 25+, total (%) (cumulative)","Employers, male (% of male employment) (modeled ILO estimate)","Employers, total (% of total employment) (modeled ILO estimate)","Employment to population ratio, 15+, female (%) (modeled ILO estimate)","Employment to population ratio, 15+, male (%) (modeled ILO estimate)","Employment to population ratio, 15+, total (%) (modeled ILO estimate)",Expected Years of School,"Expected Years of School, Male","Expected years of schooling, female",Female share of employment in senior and middle management (%),Female professional and technical workers (% of total),"Female share of graduates in Arts and Humanities programmes, tertiary (%)","Female share of graduates from Science, Technology, Engineering and Mathematics (STEM) programmes, tertiary (%)","Female share of graduates in Education programmes, tertiary (%)","Female share of graduates in Health and Welfare programmes, 
tertiary (%)","Female share of graduates in Natural Sciences, Mathematics and Statistics programmes, tertiary (%)","Female share of graduates in Services programmes, tertiary (%)","Female share of graduates in unknown or unspecified fields, tertiary (%)","Female share of graduates in Agriculture, Forestry, Fisheries and Veterinary programmes, tertiary (%)","Female share of graduates in Business, Administration and Law programmes, tertiary (%)","Female share of graduates in Engineering, Manufacturing and Construction programmes, tertiary (%)","Female share of graduates in Information and Communication Technologies programmes, tertiary (%)","Female share of graduates in other fields than Science, Technology, Engineering and Mathematics programmes, tertiary (%)","Female share of graduates in Social Sciences, Journalism and Information programmes, tertiary (%)",Firms with female participation in ownership (% of firms),Firms with female top manager (% of firms),GDP growth (annual %),GDP per capita (Current US$),GDP (current US$),GDP per capita (constant 2010 US$),Gini index,"GNI per capita, PPP (current international $)","GNI per capita, Atlas method (current US$)","GNI, Atlas method (current US$)","Government expenditure on education, total (% of GDP)","Gross graduation ratio, tertiary, male (%)","Gross graduation ratio, tertiary, total (%)","Gross graduation ratio, tertiary, female (%)","Human Capital Index (HCI), Female (scale 0-1)",Human Capital Index (HCI) (scale 0-1),"Human Capital Index (HCI), Male (scale 0-1)","Labor force participation rate, female (% of female population ages 15+) (modeled ILO estimate)","Labor force participation rate, male (% of male population ages 15+) (modeled ILO estimate)","Labor force participation rate, total (% of total population ages 15+) (modeled ILO estimate)","Labor force, male","Labor force, female","Labor force, female (% of total labor force)","Labor force, total","Labor force with advanced education, female (% of female 
working-age population with advanced education)",Labor force with advanced education (% of total working-age population with advanced education),"Labor force with advanced education, male (% of male working-age population with advanced education)",Number of male directors,Number of male business owners,Number of male sole proprietors,Number of female sole proprietors,Number of female business owners,Number of female directors,"Population, female (% of total)","Population, total","Population, female","Population, male",Proportion of women in ministerial level positions (%),Ratio of female to male labor force participation rate (%) (national estimate),"Retirement age with full benefits, male","Retirement age with partial benefits, male","Retirement age with full benefits, female","Retirement age with partial benefits, female","School enrollment, tertiary, male (% gross)","School enrollment, tertiary (gross), gender parity index (GPI)","School enrollment, tertiary, female (% gross)","School enrollment, tertiary (% gross)","Self-employed, female (% of female employment) (modeled ILO estimate)","Self-employed, male (% of male employment) (modeled ILO estimate)","Self-employed, total (% of total employment) (modeled ILO estimate)",Share of female business owners (% of total business owners),Share of female sole proprietors (% of sole proprietors),Share of female directors (% of total directors),Share of male business owners (% of total business owners),Share of male sole proprietors (% of sole proprietors),Share of male directors (% of total directors),Start-up procedures to register a business (number),"Start-up procedures to register a business, male (number)","Start-up procedures to register a business, female (number)","Tertiary education, academic staff (% female)",Time required to start a business (days),"Time required to start a business, male (days)","Time required to start a business, female (days)","Vulnerable employment, total (% of total employment) (modeled 
ILO estimate)","Vulnerable employment, female (% of female employment) (modeled ILO estimate)","Vulnerable employment, male (% of male employment) (modeled ILO estimate)",Women Business and the Law Index Score (scale 1-100)
count,3737.0,2825.0,2825.0,2825.0,579.0,583.0,410.0,413.0,329.0,3552.0,410.0,329.0,330.0,3552.0,3552.0,3737.0,3737.0,3737.0,624.0,587.0,1892.0,1267.0,99.0,1103.0,952.0,1116.0,1117.0,896.0,966.0,508.0,1010.0,959.0,1063.0,891.0,943.0,903.0,364.0,266.0,3916.0,3960.0,3956.0,3897.0,1340.0,3629.0,3695.0,3698.0,2733.0,1314.0,1350.0,1316.0,526.0,601.0,525.0,3737.0,3737.0,3737.0,3737.0,3737.0,3737.0,3737.0,1730.0,1735.0,1733.0,463.0,490.0,495.0,495.0,490.0,463.0,4123.0,4123.0,4123.0,4123.0,2059.0,2226.0,179.0,86.0,179.0,86.0,2333.0,2333.0,2335.0,2448.0,3552.0,3552.0,3552.0,490.0,495.0,463.0,490.0,495.0,463.0,2825.0,2825.0,2825.0,1657.0,2825.0,2825.0,2825.0,3552.0,3552.0,3552.0,3780.0
mean,70.814785,45.046265,45.048637,45.047717,17.024065,16.427465,6.138292,6.285894,0.739483,1.910313,6.375441,0.454802,0.601403,4.293341,3.351709,46.123862,65.974464,56.177703,11.263106,11.246926,13.917819,30.84118,47.717172,64.000392,33.28853,72.965277,71.824222,55.934978,47.095502,51.2399,44.118465,56.825115,27.291106,29.771382,63.358228,64.099002,33.033516,18.005639,3.229911,16991.394905,348564100000.0,15762.319436,36.708881,18875.522182,14067.447903,369944600000.0,4.438442,24.091468,29.650315,36.247299,0.592871,0.569469,0.560557,50.241197,70.966804,60.716516,10517070.0,6908281.0,41.088764,17425340.0,73.752157,78.063102,82.03988,31154.12311,28924.979592,65662.94,43732.35,11520.646939,14289.269978,50.001263,33195750.0,16496390.0,16699360.0,19.163314,71.23488,61.827095,58.784302,60.646927,57.793023,36.359029,1.090841,45.90703,40.333432,42.932208,40.910617,41.744225,23.914349,34.056683,23.116707,76.085651,65.94262,76.883293,8.201416,8.151193,8.280396,40.192498,32.230265,32.171604,32.300807,38.392512,41.021927,36.61728,71.355489
std,19.330627,95.978841,95.979222,95.979195,11.097828,9.969252,6.426935,6.18205,0.790935,1.817962,6.05386,0.503124,0.635696,2.82662,2.361435,15.219065,10.367168,11.418442,2.369802,2.284394,3.458119,10.312578,11.917266,14.114406,10.957656,16.41533,13.346609,14.19038,18.124404,21.070342,15.018743,11.27458,10.267713,14.546454,10.44312,11.907855,15.3447,9.856629,5.940514,25731.56334,1531911000000.0,22998.193081,7.939846,20646.639836,19682.027127,1591092000000.0,1.875639,14.552465,17.859997,21.935133,0.151399,0.145568,0.14053,14.940829,8.620659,10.216971,42056420.0,27953840.0,9.17728,68899080.0,11.342376,7.829576,7.210407,60131.398535,52532.487266,310093.5,232040.0,26552.146348,36445.197867,2.928859,131643000.0,64183450.0,67473060.0,13.233832,18.286051,3.764904,5.42593,4.25444,5.545126,24.98896,0.317842,32.968325,28.265939,31.658739,23.937113,27.017495,9.192244,11.873485,9.831016,9.192244,11.873413,9.831016,3.408475,3.373903,3.433592,13.800551,47.889702,47.882774,47.89355,27.582484,31.906246,24.683111,18.490789
min,6.985358,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023486,0.0,0.0,0.0,0.049035,0.038077,3.418,34.359,22.634,4.156989,4.14812,2.67424,1.194,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14.63415,0.0,0.0,15.84646,0.0,2.2,0.1,-54.335876,128.538423,22798270.0,262.184805,23.2,460.0,0.0,11967.81,0.127174,0.0,0.0,0.0,0.27825,0.286075,0.293965,4.828,44.661,31.402,21492.0,14311.0,6.537578,36260.0,18.681,29.631,28.705,2.0,3.0,2.0,6.0,1.0,1.0,23.394217,9791.0,4878.0,4914.0,0.0,9.229758,50.0,45.0,50.0,45.0,0.11494,0.06434,0.1199,0.11737,0.176692,0.367699,0.392793,0.947568,0.656969,0.456621,33.333333,25.0,51.539895,1.0,1.0,1.0,1.22511,0.5,0.5,0.5,0.138086,0.035968,0.152085,23.75
25%,61.173241,4.7,4.7,4.7,8.09284,9.007685,1.046438,1.43531,0.23578,0.818423,1.571588,0.10802,0.19366,1.95537,1.580454,38.64,59.725,49.763,9.70936,9.74159,12.08014,24.5235,41.5,58.35041,28.047045,69.822005,67.09343,48.64982,34.907792,39.034617,35.599555,51.04254,21.125585,19.19742,60.77643,59.8603,22.675,11.525,1.139701,1856.920716,5125616000.0,2087.693417,30.675,4180.0,1695.0,6077316000.0,3.14897,11.842478,13.68414,16.935902,0.46545,0.44349,0.437187,42.927,65.715,54.99,604924.0,393592.0,38.328808,1035268.0,70.12625,74.5805,77.509,2633.0,2459.0,2326.0,936.0,745.75,620.5,49.626645,743620.0,351211.5,372196.5,9.4,63.642806,60.0,55.0,58.0,55.0,14.186393,0.92395,15.80405,14.877598,11.836135,19.866997,16.143666,16.547669,27.920564,15.184591,68.692628,58.786436,69.800976,6.0,6.0,6.0,34.0616,11.0,11.0,11.0,12.844412,9.862947,15.484846,59.375
50%,76.279891,14.6,14.6,14.6,15.75087,15.66409,2.70213,2.8576,0.5414,1.596863,3.053795,0.3073,0.456115,4.131915,3.124397,47.658,65.428,56.585,12.066183,11.9,14.418295,31.601,50.0,66.00596,33.68854,76.388515,74.72527,56.676765,48.711805,53.31026,45.451295,57.57036,27.33678,26.84211,65.14363,67.1349,32.75,17.2,3.452995,6044.474254,21205520000.0,5878.75394,35.2,11440.0,5190.0,23868120000.0,4.25476,24.7336,29.930941,35.68689,0.603049,0.574536,0.559381,52.259,70.771,61.002,1988407.0,1373024.0,44.448261,3471429.0,76.126,79.286,82.61,7337.0,8958.5,8988.0,4769.0,2343.0,2077.0,50.329647,5872624.0,2902421.0,2936844.0,16.7,75.840393,61.5,60.0,60.0,60.0,34.118271,1.18439,43.013344,37.794979,34.838017,35.570931,35.681703,23.239243,35.469108,22.9271,76.760757,64.530892,77.0729,8.0,8.0,8.0,42.201832,19.0,19.0,19.0,31.427071,32.171849,30.062551,75.0
75%,84.519907,44.9,44.9,44.9,24.904175,23.52812,10.707455,11.03294,1.01818,2.49327,11.483185,0.6397,0.81565,5.834815,4.440046,55.644,73.029,62.87,13.123641,13.076027,16.302765,37.454,55.0,72.28018,40.373883,82.576253,79.57887,65.34657,59.208937,63.517798,54.324447,64.60072,33.804375,39.40747,69.124785,71.042645,42.55,23.575,5.900001,22022.5337,153810800000.0,19573.857995,41.525,26390.0,17445.0,171874700000.0,5.407137,34.062907,43.619558,53.029733,0.730662,0.690006,0.678,59.519,76.918,66.77,6304636.0,4164589.0,47.151275,10401330.0,80.87225,83.1715,86.908,27768.5,23946.0,30277.0,15849.5,7334.5,7080.0,51.031809,21484940.0,10834430.0,10655320.0,26.3,83.923812,65.0,62.925,65.0,61.75,54.115719,1.29848,71.481991,62.046467,73.337719,59.889304,65.149694,31.307372,41.213564,30.199024,83.452331,72.079436,84.815409,10.0,10.0,10.0,49.282299,38.0,38.0,38.0,62.563439,71.979356,56.592673,85.0
max,106.522469,1491.6,1491.6,1491.6,60.980579,59.26088,28.71104,28.15081,7.78851,17.575264,27.54751,3.88895,5.47437,20.064412,17.294603,83.24,96.498,88.753,13.96,13.970139,23.5853,63.984,71.0,100.0,75.0,100.0,100.0,100.0,100.0,100.0,100.0,94.23077,100.0,100.0,91.22807,100.0,86.8,64.8,86.826748,240862.182448,25439700000000.0,228667.935283,64.8,152630.0,125210.0,25586010000000.0,15.585125,98.154572,132.30069,169.243622,0.900133,0.887084,0.875118,87.123,96.567,88.87,432739500.0,354034700.0,53.574437,781808300.0,100.0,100.0,100.0,375857.0,354579.0,4791536.0,3623478.0,183226.0,286897.0,55.040714,1417173000.0,691528500.0,731180500.0,66.666667,124.007209,70.0,67.0,70.0,67.0,144.787888,1.90133,156.148895,150.201767,99.312356,92.455785,94.661079,66.666667,75.0,48.460105,99.052432,99.343031,99.543379,21.0,21.0,22.0,84.533607,697.0,697.0,697.0,93.991176,98.400922,91.399964,100.0


In [35]:
# Check NA per column, i.e., the number of missing rows in each column
dataset.isna().sum()

Time                                                                                                                           3
Country Name                                                                                                                   5
Ratio of female to male labor force participation rate (%) (modeled ILO estimate)                                            608
Cost of business start-up procedures, male (% of GNI per capita)                                                            1520
Cost of business start-up procedures, female (% of GNI per capita)                                                          1520
Cost of business start-up procedures (% of GNI per capita)                                                                  1520
Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)                          3766
Educational attainment, at least Bachelor's or equivalent, population 25+, total (%) (cumulative)
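
Raw counts are easier to compare when expressed as a share of all rows. A minimal sketch, using a small hypothetical frame in place of `dataset`; the 50% cut-off is an arbitrary assumption, not a rule from the source.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only (not taken from the dataset).
df = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Algeria", "Andorra"],
    "Gini index": [np.nan, 30.6, np.nan, np.nan],
    "GDP growth (annual %)": [1.4, 5.5, 4.3, np.nan],
})

# isna().mean() gives the fraction of missing values in each column.
missing_pct = (df.isna().mean() * 100).sort_values(ascending=False)
print(missing_pct.round(1))

# Assumed cut-off: keep columns with at most 50% missing values.
max_missing_pct = 50
usable = missing_pct[missing_pct <= max_missing_pct].index.tolist()
print(usable)
```

Sorting the shares makes it easy to see which indicators are too sparse to support cross-country comparisons.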

In [56]:
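# Find rows where every column (other than the excluded ones) is null
def null_rows(df, exclude=None):
    """Return the rows of df whose non-excluded columns are all NaN."""
    if exclude is None:
        exclude = []
    # Check only the columns that are not listed in `exclude`
    columns_to_check = df.columns.difference(exclude)
    return df[df[columns_to_check].isna().all(axis=1)]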


print(null_rows(dataset, ['Country Name']))

     Time Country Name  \
4340  NaN          NaN   
4341  NaN          NaN   
4342  NaN          NaN   

      Ratio of female to male labor force participation rate (%) (modeled ILO estimate)  \
4340                                                NaN                                   
4341                                                NaN                                   
4342                                                NaN                                   

      Cost of business start-up procedures, male (% of GNI per capita)  \
4340                                                NaN                  
4341                                                NaN                  
4342                                                NaN                  

      Cost of business start-up procedures, female (% of GNI per capita)  \
4340                                                NaN                    
4341                                                NaN                    
4342 