# Women, Business & Law
Measuring gender equality globally

WBL 2020 is the sixth in the series of biennial reports measuring gender differences in the law. A study released in 2019 piloted a new index that aggregates 35 data points across 8 scored indicators. Each economy's WBL index score is the average of its scores for the 8 topics included in this year's aggregate score. A higher score indicates more gender-equal laws. The dataset was expanded in 2020 to include historical information dating back to 1970. This file contains Women, Business and the Law (WBL) data for 190 economies for 1970 to 2019 (reporting years 1971 to 2020).
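
As a rough illustration of the scoring rule described above, the index for one economy can be read as the unweighted mean of its eight indicator scores. The scores below are made-up values, not taken from the dataset, and `wbl_index` is a hypothetical helper name used only for this sketch.

```python
from statistics import mean

# Hypothetical indicator scores for one economy (illustrative values only).
indicator_scores = {
    "MOBILITY": 100,
    "WORKPLACE": 100,
    "PAY": 75,
    "MARRIAGE": 80,
    "PARENTHOOD": 60,
    "ENTREPRENEURSHIP": 75,
    "ASSETS": 100,
    "PENSION": 75,
}


def wbl_index(scores):
    """Return the unweighted mean of the indicator scores."""
    return mean(scores.values())


index = wbl_index(indicator_scores)
print(index)
```

The published index is reported to one decimal place, so a value computed this way may differ from the file in the last digit.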

For more information about the methodology for data collection, scoring and analysis, visit http://wbl.worldbank.org.

This data source is rich enough to support multiple analytical approaches, so I decided to reuse it for this week's assignment.

## Setup

In [93]:
# import needed packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
# import geopandas as gpd
import plotly as ply
import plotly.express as px
from textwrap import fill
# from helpers import slug
%matplotlib inline
# %load_ext signature


In [44]:
# load data using pandas
wbl_data = pd.read_excel("wbl_data_50yearpanel_web_27feb2020.xlsx")

In [5]:
#show head to make sure data is formatted correctly (no add'l header lines)
wbl_data.head()

Unnamed: 0,ID,Economy,Code,Region,Income group,WBL Report Year,WBL INDEX,MOBILITY,Fam_CM_Passport,Fam_CM_TravelAbroad,...,Fam_AM_RightsImmovables,Fam_AM_InheritanceChildren,Fam_AM_InheritanceSpouses,Fam_AM_PropertyAdministration_formula,Fam_AM_NonmonetaryContributions,PENSION,Ages full benefits scored,Ages partial benefits scored,Ages mandatory retirement scored,Pension care credit
0,AFG1971,Afghanistan,AFG,South Asia,Low income,1971,26.3,25,No,Yes,...,Yes,No,No,Yes,No,25,No,No,Yes,No
1,AFG1972,Afghanistan,AFG,South Asia,Low income,1972,26.3,25,No,Yes,...,Yes,No,No,Yes,No,25,No,No,Yes,No
2,AFG1973,Afghanistan,AFG,South Asia,Low income,1973,26.3,25,No,Yes,...,Yes,No,No,Yes,No,25,No,No,Yes,No
3,AFG1974,Afghanistan,AFG,South Asia,Low income,1974,26.3,25,No,Yes,...,Yes,No,No,Yes,No,25,No,No,Yes,No
4,AFG1975,Afghanistan,AFG,South Asia,Low income,1975,26.3,25,No,Yes,...,Yes,No,No,Yes,No,25,No,No,Yes,No


In [7]:
#describe quantitative variables to understand distribution of values
wbl_data.describe()

Unnamed: 0,WBL Report Year,WBL INDEX,MOBILITY,WORKPLACE,PAY,MARRIAGE,PARENTHOOD,ENTREPRENEURSHIP,ASSETS,PENSION
count,9500.0,9500.0,9500.0,9500.0,9500.0,9500.0,9500.0,9500.0,9500.0,9500.0
mean,1995.5,59.126326,82.044737,40.944737,46.186842,61.810526,33.932632,72.010526,75.067368,60.889474
std,14.431629,18.058837,25.891629,32.477038,30.896172,29.475642,30.051149,20.924141,27.76675,29.110644
min,1971.0,17.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1983.0,46.9,75.0,25.0,25.0,40.0,0.0,75.0,60.0,25.0
50%,1995.5,58.8,100.0,25.0,50.0,80.0,20.0,75.0,80.0,75.0
75%,2008.0,71.3,100.0,50.0,75.0,80.0,60.0,75.0,100.0,75.0
max,2020.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
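
The count of 9,500 rows is consistent with 190 economies times 50 report years. A minimal sketch of how that could be checked is shown here on a tiny stand-in frame (`panel`, with made-up scores) rather than on the real file. The column names match the dataset, everything else is an assumption.

```python
import pandas as pd

# Tiny stand-in for wbl_data: 2 economies x 3 report years (made-up scores).
panel = pd.DataFrame({
    "Economy": ["A", "A", "A", "B", "B", "B"],
    "WBL Report Year": [1971, 1972, 1973, 1971, 1972, 1973],
    "WBL INDEX": [26.3, 26.3, 26.3, 50.0, 52.5, 55.0],
})

n_economies = panel["Economy"].nunique()
n_years = panel["WBL Report Year"].nunique()

# A balanced panel has exactly one row per economy-year pair.
has_duplicates = panel.duplicated(["Economy", "WBL Report Year"]).any()
is_balanced = (not has_duplicates) and len(panel) == n_economies * n_years
print(n_economies, n_years, is_balanced)
```

Run against `wbl_data`, the same three lines should report 190 economies and 50 years if the file matches its description.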


## Import dataset that maps to ISO code

An ISO alpha-3 code is needed for each economy to plot the data on a map.

In [74]:
#copy Economy into a Country column, then rename entries so they match the names in countryMap.xlsx
wbl_data['Country'] = wbl_data['Economy']

wbl_data.replace({'Country': {
                             'United States' : 'United States of America', 
                             'Syrian Arab Republic' : 'Syrian Arab Republic (Syria)', 
                             'Gambia, The' : 'Gambia',
                             'Egypt, Arab Rep.' : 'Egypt',
                             'Hong Kong SAR, China' : 'Hong Kong, SAR China',
                             'Taiwan, China' : 'Taiwan, Republic of China',
                             'Bahamas, The' : 'Bahamas',
                             'Venezuela, RB' : 'Venezuela (Bolivarian Republic)',
                             'St. Lucia' : 'Saint Lucia',
                             'Korea, Rep.' : 'Korea (South)',
                             'North Macedonia' : 'Macedonia, Republic of',
                             'St. Kitts and Nevis' : 'Saint Kitts and Nevis',
                             'Kyrgyz Republic' : 'Kyrgyzstan',
                             'Iran, Islamic Rep.' : 'Iran, Islamic Republic of', 
                             'Puerto Rico (U.S.)' : 'Puerto Rico',
                             'Congo, Dem. Rep.' : 'Congo, (Kinshasa)',
                             'Congo, Rep.' : 'Congo (Brazzaville)',
                             'Tanzania' : 'Tanzania, United Republic of',
                             'Vietnam' : 'Viet Nam',
                             'Slovak Republic' : 'Slovakia',
                             'São Tomé and Príncipe' : 'Sao Tome and Principe',
                             'Yemen, Rep.' : 'Yemen',
                             'St. Vincent and the Grenadines' : 'Saint Vincent and Grenadines',
                             'Micronesia, Fed. Sts.' : 'Micronesia, Federated States of',
                             'Cabo Verde' : 'Cape Verde',
                             'Eswatini' : 'Swaziland'     
                               }}, inplace=True)

In [53]:
dfcountry = pd.read_excel('countryMap.xlsx')

In [41]:
dfcountry.head()

Unnamed: 0,Country,Alpha 2,Alpha 3,UN Code
0,Afghanistan,AF,AFG,4
1,Aland Islands,AX,ALA,248
2,Albania,AL,ALB,8
3,Algeria,DZ,DZA,12
4,American Samoa,AS,ASM,16


In [75]:
data = pd.merge(wbl_data, dfcountry, on='Country',how='left')
data.head()

Unnamed: 0,ID,Economy,Code,Region,Income group,WBL Report Year,WBL INDEX,MOBILITY,Fam_CM_Passport,Fam_CM_TravelAbroad,...,Fam_AM_NonmonetaryContributions,PENSION,Ages full benefits scored,Ages partial benefits scored,Ages mandatory retirement scored,Pension care credit,Country,Alpha 2,Alpha 3,UN Code
0,AFG1971,Afghanistan,AFG,South Asia,Low income,1971,26.3,25,No,Yes,...,No,25,No,No,Yes,No,Afghanistan,AF,AFG,4.0
1,AFG1972,Afghanistan,AFG,South Asia,Low income,1972,26.3,25,No,Yes,...,No,25,No,No,Yes,No,Afghanistan,AF,AFG,4.0
2,AFG1973,Afghanistan,AFG,South Asia,Low income,1973,26.3,25,No,Yes,...,No,25,No,No,Yes,No,Afghanistan,AF,AFG,4.0
3,AFG1974,Afghanistan,AFG,South Asia,Low income,1974,26.3,25,No,Yes,...,No,25,No,No,Yes,No,Afghanistan,AF,AFG,4.0
4,AFG1975,Afghanistan,AFG,South Asia,Low income,1975,26.3,25,No,Yes,...,No,25,No,No,Yes,No,Afghanistan,AF,AFG,4.0


In [76]:
#check which economies are still missing an ISO code after the merge
blanks = data[['Economy','Country','Alpha 2','Alpha 3']]
blanks = blanks[blanks['Alpha 3'].isnull()]
blanks = blanks.drop_duplicates(subset=['Country'])
blanks

Unnamed: 0,Economy,Country,Alpha 2,Alpha 3
4600,Kosovo,Kosovo,,
9150,West Bank and Gaza,West Bank and Gaza,,
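
An alternative way to find unmatched names is the `indicator` option of `merge`, which labels each row by where it was found. The sketch below uses small stand-in frames (`left`, `codes`), not the real `wbl_data` and `dfcountry`.

```python
import pandas as pd

# Stand-in frames; in the notebook these would be wbl_data and dfcountry.
left = pd.DataFrame({
    "Country": ["Afghanistan", "Kosovo", "Kosovo"],
    "WBL Report Year": [1971, 1971, 1972],
})
codes = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "Alpha 3": ["AFG", "ALB"],
})

merged = left.merge(codes, on="Country", how="left", indicator=True)

# Rows flagged "left_only" found no partner in the lookup table.
unmatched = (
    merged.loc[merged["_merge"] == "left_only", "Country"]
    .drop_duplicates()
    .tolist()
)
print(unmatched)
```

This avoids relying on a null `Alpha 3` value, which would also be null if the lookup table itself had a blank code.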


In [81]:
#fill null Codes

newcountries = {'Kosovo' : 'RKS', 
                'West Bank and Gaza' : 'PSE'  
               }

#use Palestine code as placeholder
#Currently Palestine claims the West Bank and the Gaza Strip, but in fact controls only about 40% of the West Bank.

data['Alpha 3'] = data['Alpha 3'].fillna(data.Country.map(newcountries))

blanks = data[['Economy','Country','Alpha 2','Alpha 3']]
blanks = blanks[blanks['Alpha 3'].isnull()]
blanks = blanks.drop_duplicates(subset=['Country'])
blanks


Unnamed: 0,Economy,Country,Alpha 2,Alpha 3


## Plot data on map

In [155]:
os.makedirs("images", exist_ok=True)

In [161]:
#plot 1971 WBL on map

yrmin = data['WBL Report Year'].min()

dfmin = data[data['WBL Report Year'] == yrmin]
himin = str(dfmin['WBL INDEX'].max())
mdmin = str(dfmin['WBL INDEX'].median())
lomin = str(dfmin['WBL INDEX'].min())

figmin = px.choropleth(dfmin, locations="Alpha 3",
                    color="WBL INDEX", 
                    hover_name="Country", # column to add to hover information
                    color_continuous_scale=px.colors.sequential.Plasma_r,
                    range_color=(0,100),
                    title=str(yrmin) + ' WBL Scores by Country'
                   )

figmin.update_layout(
    title_text=str(yrmin) + ' WBL Scores by Country',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
    annotations = [dict(
        x=0.06,
        y=0.18,
        xref='paper',
        yref='paper',
        text='WBL Max: \t' + himin,
        showarrow = False
        ),
        dict(
        x=0.06,
        y=0.14,
        xref='paper',
        yref='paper',
        text='WBL Mid: \t' + mdmin,
        showarrow = False
        ),
        dict(
        x=0.06,
        y=0.10,
        xref='paper',
        yref='paper',
        text='WBL Min: \t' + lomin,
        showarrow = False
        )]
)


figmin.show()
figmin.write_image("images/figmin.png")

In [158]:
#plot 1995 WBL on map to highlight improvement rate differences between first half and second half of time range 

yrmid = int(data['WBL Report Year'].median())

dfmid = data[data['WBL Report Year'] == yrmid]
himid = str(dfmid['WBL INDEX'].max())
mdmid = str(dfmid['WBL INDEX'].median())
lomid = str(dfmid['WBL INDEX'].min())


figmid = px.choropleth(dfmid, locations="Alpha 3",
                    color="WBL INDEX", 
                    hover_name="Country", # column to add to hover information
                    color_continuous_scale=px.colors.sequential.Plasma_r,
                    range_color=(0,100),
                   )


figmid.update_layout(
    title_text=str(yrmid) + ' WBL Scores by Country',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
    annotations = [dict(
        x=0.06,
        y=0.18,
        xref='paper',
        yref='paper',
        text='WBL Max: \t' + himid,
        showarrow = False
        ),
        dict(
        x=0.06,
        y=0.14,
        xref='paper',
        yref='paper',
        text='WBL Mid: \t' + mdmid,
        showarrow = False
        ),
        dict(
        x=0.06,
        y=0.10,
        xref='paper',
        yref='paper',
        text='WBL Min: \t' + lomid,
        showarrow = False
        )]
)


figmid.show()
figmid.write_image("images/figmid.png")

In [162]:
#plot 2020 WBL on map

yrmax = data['WBL Report Year'].max()

dfmax = data[data['WBL Report Year'] == yrmax]
himax = str(dfmax['WBL INDEX'].max())
mdmax = str(dfmax['WBL INDEX'].median())
lomax = str(dfmax['WBL INDEX'].min())


figmax = px.choropleth(dfmax, locations="Alpha 3",
                    color="WBL INDEX", 
                    hover_name="Country", # column to add to hover information
                    color_continuous_scale=px.colors.sequential.Plasma_r,
                    range_color=(0,100),
                   )

figmax.update_layout(
    title_text=str(yrmax) + ' WBL Scores by Country',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    ),
    annotations = [dict(
        x=0.06,
        y=0.18,
        xref='paper',
        yref='paper',
        text='WBL Max: \t' + himax,
        showarrow = False
        ),
        dict(
        x=0.06,
        y=0.14,
        xref='paper',
        yref='paper',
        text='WBL Mid: \t' + mdmax,
        showarrow = False
        ),
        dict(
        x=0.06,
        y=0.10,
        xref='paper',
        yref='paper',
        text='WBL Min: \t' + lomax,
        showarrow = False
        )]
)



figmax.show()
figmax.write_image("images/figmax.png")
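
The three map cells above repeat the same Max/Mid/Min annotation block. One possible way to factor it out is a small helper that returns the annotation dicts. `summary_annotations` is a hypothetical name, and the sample scores are placeholders rather than values computed from the data.

```python
from statistics import median


def summary_annotations(scores, x=0.06, y_top=0.18, y_step=0.04):
    """Build Max/Mid/Min annotation dicts like those used in the map layouts."""
    stats = [
        ("Max", max(scores)),
        ("Mid", median(scores)),
        ("Min", min(scores)),
    ]
    return [
        dict(
            x=x,
            y=round(y_top - i * y_step, 2),  # 0.18, 0.14, 0.10
            xref="paper",
            yref="paper",
            text=f"WBL {label}: \t{value}",
            showarrow=False,
        )
        for i, (label, value) in enumerate(stats)
    ]


# Placeholder scores, only to show the shape of the result.
annotations = summary_annotations([17.5, 26.3, 46.9])
for a in annotations:
    print(a["y"], a["text"])
```

In the notebook, the result could then be passed as `annotations=summary_annotations(dfmin['WBL INDEX'].tolist())` inside `update_layout`, and likewise for `dfmid` and `dfmax`.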

## Stats for Presentation

In [164]:
mideast = data[data['Region'] == 'Middle East & North Africa']
mideast.head()

Unnamed: 0,ID,Economy,Code,Region,Income group,WBL Report Year,WBL INDEX,MOBILITY,Fam_CM_Passport,Fam_CM_TravelAbroad,...,Fam_AM_NonmonetaryContributions,PENSION,Ages full benefits scored,Ages partial benefits scored,Ages mandatory retirement scored,Pension care credit,Country,Alpha 2,Alpha 3,UN Code
150,ARE1971,United Arab Emirates,ARE,Middle East & North Africa,High income,1971,17.5,0,No,No,...,No,25,No,No,Yes,No,United Arab Emirates,AE,ARE,784.0
151,ARE1972,United Arab Emirates,ARE,Middle East & North Africa,High income,1972,17.5,0,No,No,...,No,25,No,No,Yes,No,United Arab Emirates,AE,ARE,784.0
152,ARE1973,United Arab Emirates,ARE,Middle East & North Africa,High income,1973,17.5,0,No,No,...,No,25,No,No,Yes,No,United Arab Emirates,AE,ARE,784.0
153,ARE1974,United Arab Emirates,ARE,Middle East & North Africa,High income,1974,17.5,0,No,No,...,No,25,No,No,Yes,No,United Arab Emirates,AE,ARE,784.0
154,ARE1975,United Arab Emirates,ARE,Middle East & North Africa,High income,1975,17.5,0,No,No,...,No,25,No,No,Yes,No,United Arab Emirates,AE,ARE,784.0


In [172]:
pivot = pd.pivot_table(mideast,index=['Economy'],columns=['WBL Report Year'],values=['WBL INDEX'],aggfunc='sum')
pivot['1971'] = pivot[('WBL INDEX',1971)]
pivot['1995'] = pivot[('WBL INDEX',1995)]
pivot['2020'] = pivot[('WBL INDEX',2020)]
pivot = pivot[['1971','1995','2020']]
pivot['50yr'] = pivot['2020'] - pivot['1971']
pivot = pivot.sort_values('50yr', ascending=True)
pivot

Unnamed: 0_level_0,1971,1995,2020,50yr
WBL Report Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Economy,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
"Iran, Islamic Rep.",29.4,26.3,31.3,1.9
"Yemen, Rep.",23.8,23.8,26.9,3.1
Qatar,29.4,29.4,32.5,3.1
West Bank and Gaza,23.1,23.1,26.3,3.2
Syrian Arab Republic,29.4,29.4,36.9,7.5
Lebanon,44.4,44.4,52.5,8.1
Kuwait,23.1,29.4,32.5,9.4
Oman,26.3,29.4,38.8,12.5
Libya,36.3,38.8,50.0,13.7
"Egypt, Arab Rep.",29.4,32.5,45.0,15.6


In [186]:
#change in the median WBL score across all economies between 1971 and 2020
med = float(mdmax) - float(mdmin)

print('Number of countries in region: ', len(pivot.index))
print('Number of countries with WBL score <= 50: ', len(pivot.index[pivot['2020'] <= 50]))
print('Number of countries that improved by <= 15 points: ', len(pivot.index[pivot['50yr'] <= 15]))
print('Number of <= 50 countries that improved by <= 15 points: ', len(pivot.index[(pivot['2020'] <= 50) & (pivot['50yr'] <= 15)]))

print('Number of countries that improved by <=', med,' points: ', len(pivot.index[pivot['50yr'] <= med]))

Number of countries in region:  20
Number of countries with WBL score <= 50:  12
Number of countries that improved by <= 15 points:  9
Number of <= 50 countries that improved by <= 15 points:  8
Number of countries that improved by <= 30.699999999999996  points:  16
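
The pivot and the counts could also be written with flat column labels and boolean sums, which avoids the multi-level header seen in the pivot output. This is a sketch on three economies whose 1971, 1995 and 2020 scores are copied from the pivot table above. The frame names `long` and `wide` are assumptions.

```python
import pandas as pd

# Scores for three economies, copied from the pivot output in this notebook.
long = pd.DataFrame({
    "Economy": ["Iran, Islamic Rep."] * 3
               + ["Yemen, Rep."] * 3
               + ["Egypt, Arab Rep."] * 3,
    "WBL Report Year": [1971, 1995, 2020] * 3,
    "WBL INDEX": [29.4, 26.3, 31.3, 23.8, 23.8, 26.9, 29.4, 32.5, 45.0],
})

# pivot() keeps single-level columns because there is one value per cell.
wide = long.pivot(index="Economy", columns="WBL Report Year", values="WBL INDEX")
wide["50yr"] = (wide[2020] - wide[1971]).round(1)
wide = wide.sort_values("50yr")

# Summing a boolean mask counts the rows that satisfy the condition.
n_at_or_below_50 = int((wide[2020] <= 50).sum())
n_small_gain = int((wide["50yr"] <= 15).sum())
print(wide)
print(n_at_or_below_50, n_small_gain)
```

Rounding the difference to one decimal keeps floating-point noise out of the `50yr` column.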
