# <font color=#D35400>WOMEN</font> <font color=#F1C40F>ENTREPRENEURSHIP</font> <font color=#1ABC9C>AND</font> <font color=#8E44AD>LABOUR FORCE</font>

![](https://www.incimages.com/uploaded_files/image/1920x1080/shutterstock_794725570_353727.jpg)

## <font color=#0E6251 >Women Entrepreneurship Index </font>
- The Women Entrepreneurship Index (WEI) seeks to identify the factors that enable high-potential female entrepreneurs to flourish: women who own and operate businesses that are innovative, market-expanding, and export-oriented. Through their entrepreneurial activities, high-potential female entrepreneurs improve their own economic welfare and contribute to the economic and social fabric of society.
- The WEI's systematic approach enables cross-country comparison and benchmarking of the gender-differentiated conditions that often affect the development of high-potential female entrepreneurship.
- As the world's first diagnostic tool for comprehensively identifying and analyzing the conditions that foster high-potential female entrepreneurship, the WEI does not simply measure the number of female entrepreneurs. Instead, it focuses on identifying a country's strengths and weaknesses in providing favorable conditions for high-potential female entrepreneurship.

In [None]:
!pip install seaborn --upgrade

import pandas as pd # dataframe manipulation
import numpy as np # linear algebra

# data visualization
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
%matplotlib inline
import seaborn as sns
print(sns.__version__)
sns.set_style('darkgrid')
import plotly.express as px
from statsmodels.graphics.gofplots import qqplot

import re # text data

import statsmodels.api as sm # regression


# custom pie chart labels
def make_autopct(values):
    def my_autopct(pct):
        total = sum(values)
        val = int(round(pct*total/100.0))
        return '{p:.2f}%\n({v:d})'.format(p=pct,v=val)
    return my_autopct

# function to compute percentile
def get_percentile(data, col, value):
    if value < np.percentile(data[col], 20):
        return 'below 20th percentile'
    elif value >= np.percentile(data[col], 20) and value < np.percentile(data[col], 40):
        return '20th-40th percentile'
    elif value >= np.percentile(data[col], 40) and value < np.percentile(data[col], 60):
        return '40th-60th percentile'
    elif value >= np.percentile(data[col], 60) and value < np.percentile(data[col], 80):
        return '60th-80th percentile'
    else:
        return '80th and above percentile'

# univariate distributions 
def univariate_dist(data, col, color=None, theme='ggplot', figsize=(10, 10), hist_bins='auto'):
    with plt.style.context(theme):
        fig = plt.figure(constrained_layout=True, figsize=figsize)
        spec = gridspec.GridSpec(3, 3, figure=fig)
        ax1 = fig.add_subplot(spec[0, :-1])
        ax1.set_title('Histogram', color='crimson')
        ax2 = fig.add_subplot(spec[1, :-1])
        ax2.set_title('QQ Plot', color='crimson')
        ax3 = fig.add_subplot(spec[:, 2])
        ax3.set_title('Boxplot', color='crimson')
        sns.histplot(data=data, x=col, ax=ax1, color=color, kde=True, bins=hist_bins)
        qqplot(data[col], fit=True, line='45', ax=ax2, color=color)
        sns.boxplot(y=data[col], ax=ax3, color=color)
        plt.suptitle(col.upper())
        return fig.show()

## <a id=toc>Table of Contents</a>

1. [Distributions of Categorical Data](#cat)
2. [Distributions of Numerical Data](#num)
3. [Women Entrepreneurship Index](#wei)
4. [Inflation Rate](#rate)
5. [Female Labor Force Participation Rate](#fpr)
6. [Regression Analysis](#reg)
7. [Insights](#insights)

# <font color='teal'>Let's look at the data</font>

In [None]:
df = pd.read_csv('/kaggle/input/women-entrepreneurship-and-labor-force/Dataset3.csv', sep=';')
df.drop(labels='No', axis=1, inplace=True)
df.head()

## Variable Types

In [None]:
print('Numeric Variables:')
for i, col in enumerate(df.select_dtypes('float64').columns):
    print(f'{i+1}. {col}')
print('\nCategorical Variables:')
for i, col in enumerate(df.select_dtypes('object').columns):
    print(f'{i+1}. {col}')

## Dataset Shape

In [None]:
df.shape

# <a id=cat><font color='maroon'>1. Distribution of Categorical Features</font></a>
[Back to Index](#toc)

- The sample data includes 27 Developed and 24 Developing countries, out of which 20 countries are European Union Members.

In [None]:
categories = ['Level of development', 'European Union Membership','Currency']
palettes = [['navy', 'crimson'], ['gold', 'mediumaquamarine'], ['hotpink', 'black']]
fig, ax = plt.subplots(1, 3, figsize=(12,6))
plt.subplots_adjust(hspace=0.5)
# one donut chart per categorical variable, each with its own colour pair
for i, (palette, name) in enumerate(zip(palettes, categories)):
    count = df[name].value_counts()
    center_circle = plt.Circle((0,0), radius=0.8, color='white')
    ax[i].pie(count, labels=count.index, autopct=make_autopct(count), startangle=0,
       pctdistance=0.5, colors=palette)
    ax[i].add_artist(center_circle)
    ax[i].set_title(name, fontweight='bold')
fig.show()

In [None]:
fig = px.choropleth(df, locations='Country', locationmode='country names', 
                    color='Level of development', projection='kavrayskiy7', 
                    title='Developed and Developing countries')
fig.show()

In [None]:
fig = px.choropleth(df, locations='Country', locationmode='country names', 
                    color='European Union Membership', projection='kavrayskiy7', 
                    title='European Union Members', scope='europe', 
                    hover_name='Currency', hover_data=['Level of development'],
                    color_discrete_map={'Member': 'gold', 'Non Member': 'mediumaquamarine'})
fig.show()

- All European Union members in the sample are developed countries, as shown in the pivot table below.

In [None]:
pd.pivot_table(df, values='Country', index=['European Union Membership'],
               columns=['Level of development'],
               aggfunc='count', dropna=False, fill_value=0, margins=True)

- Not every European Union member uses the Euro as its national currency. Exceptions include Sweden, Hungary, Poland, Croatia and Denmark, as shown below.

In [None]:
pd.pivot_table(df, values='Country', index=['European Union Membership'],
               columns=['Currency'],
               aggfunc='count', dropna=False, fill_value=0, margins=True)
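
The pivot table gives counts only. To list the member countries that do not use the Euro, a boolean filter is one option. The sketch below runs on a few toy rows rather than the real dataset. The `'Member'` and `'Non Member'` labels follow the map cell above, while the `'Euro'` and `'National Currency'` labels and the helper name `eu_members_without_euro` are assumptions made for illustration.

```python
import pandas as pd

# Toy rows for illustration only; the real data comes from Dataset3.csv.
# The currency labels below are assumed, not read from the dataset.
toy = pd.DataFrame({
    'Country': ['Germany', 'Sweden', 'Poland', 'Norway'],
    'European Union Membership': ['Member', 'Member', 'Member', 'Non Member'],
    'Currency': ['Euro', 'National Currency', 'National Currency', 'National Currency'],
})

def eu_members_without_euro(data, euro_label='Euro'):
    """Return the countries that are EU members but do not use the euro."""
    is_member = data['European Union Membership'] == 'Member'
    no_euro = data['Currency'] != euro_label
    return data.loc[is_member & no_euro, 'Country'].tolist()

print(eu_members_without_euro(toy))
```

Applied to `df`, the same function would need `euro_label` set to whatever value the `Currency` column actually uses for the Euro.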

# <a id=num><font color='dodgerblue'>2. Distribution of Numerical Features</font></a>
[Back to Index](#toc)

In [None]:
univariate_dist(df, 'Women Entrepreneurship Index', 'teal', hist_bins=10)

In [None]:
univariate_dist(df, 'Inflation rate', 'mediumvioletred')

In [None]:
univariate_dist(df, 'Entrepreneurship Index', 'c')

In [None]:
univariate_dist(df, 'Female Labor Force Participation Rate', 'darkorange')

# <a id=wei><font color=#E74C3C>3. Women Entrepreneurship Index</font></a>
[Back to Index](#toc)

## WEI Rankings
- The figure below shows the Women Entrepreneurship Index scores for all the countries included in the sample data (excluding white regions).

- All countries with a WEI score below the 40th percentile are developing, while all countries above the 60th percentile are developed. Countries with scores around the median include both developed and developing countries.

In [None]:
# apply percentile function    
df['WEI percentile'] = df['Women Entrepreneurship Index'].apply(
    lambda x: get_percentile(df, 'Women Entrepreneurship Index', x)
)

colors = ['darkgreen', 'lightgreen', 'yellow', 'orange', 'red']
sorted_df = df.sort_values(by='Women Entrepreneurship Index', ascending=False)

# plot map
fig = px.choropleth(sorted_df, locations='Country', locationmode='country names', 
                    color='WEI percentile', projection='kavrayskiy7', 
                    title='Women Entrepreneurship Index Rankings', 
                    hover_data=['Level of development'], hover_name='Country',
                    color_discrete_map=dict(zip(sorted_df['WEI percentile'].unique(), colors)))
fig.update_geos(
    resolution=50,
    showland=True, landcolor="white",
    showocean=True, oceancolor="LightBlue"
)
fig.show()
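
The percentile bands used for the map can also be assigned in one vectorized pass, and a cross-tabulation against the level of development is one way to check the claim about which bands contain which countries. This is a minimal sketch on made-up scores, not the notebook's data. The names `percentile_bands`, `BAND_LABELS`, `score` and `level` are hypothetical, and the band boundaries are meant to mirror `get_percentile`.

```python
import numpy as np
import pandas as pd

BAND_LABELS = [
    'below 20th percentile', '20th-40th percentile', '40th-60th percentile',
    '60th-80th percentile', '80th and above percentile',
]

def percentile_bands(values):
    """Assign each value to one of five percentile bands in a single pass."""
    values = np.asarray(values, dtype=float)
    edges = np.percentile(values, [20, 40, 60, 80])
    # np.digitize returns i such that edges[i-1] <= value < edges[i]
    idx = np.digitize(values, edges)
    return [BAND_LABELS[i] for i in idx]

# Hypothetical scores, not taken from the dataset
toy = pd.DataFrame({
    'score': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    'level': ['Developing'] * 5 + ['Developed'] * 5,
})
toy['band'] = percentile_bands(toy['score'])

# Rows are bands, columns are development levels, cells are country counts
print(pd.crosstab(toy['band'], toy['level']))
```

Computing the four percentile edges once avoids recomputing them for every row, which is what the `apply` with `get_percentile` does.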

In [None]:
with plt.style.context('ggplot'):
    plt.figure(figsize=(12, 8))
    sns.barplot(data=sorted_df, x='WEI percentile', 
                y='Women Entrepreneurship Index', 
                palette=dict(zip(sorted_df['WEI percentile'].unique(), colors)))
    plt.suptitle('Percentile average scores', fontsize=15)
    plt.show()

## Top 10 countries for Women Entrepreneurs

In [None]:
top_wei = pd.DataFrame(df[['Country', 'Women Entrepreneurship Index']].set_index('Country')['Women Entrepreneurship Index'].nlargest(10))
top_wei.style.background_gradient(cmap='PuBuGn')

## Top 10 countries for women entrepreneurs in developed and developing countries

In [None]:
developed = df[df['Level of development'] == 'Developed'][['Country', 'Women Entrepreneurship Index']]
developed = developed.nlargest(10, 'Women Entrepreneurship Index')
developed.set_index('Country', inplace=True)

developing = df[df['Level of development'] == 'Developing'][['Country', 'Women Entrepreneurship Index']]
developing = developing.nlargest(10, 'Women Entrepreneurship Index')
developing.set_index('Country', inplace=True)

with plt.style.context('fivethirtyeight'):
    fig, ax = plt.subplots(1, 2, figsize=(8,8))
    
    ax[0].barh(y=developed.index, width=developed['Women Entrepreneurship Index'],
              color='navy')
    for val, x in zip(developed['Women Entrepreneurship Index'], developed.index):
        ax[0].text(val/2, x, val, color='white', fontweight='bold')
    ax[0].invert_xaxis()
    ax[0].yaxis.tick_left()
    ax[0].set_xticks(np.arange(0, 101, 20))
    ax[0].set_yticklabels(developed.index, fontsize=12, fontweight='semibold')
    ax[0].set_xlabel('WEI score')
    ax[0].set_title('Developed Countries')
    
    ax[1].barh(y=developing.index, width=developing['Women Entrepreneurship Index'],
              color='crimson')
    for val, x in zip(developing['Women Entrepreneurship Index'], developing.index):
        ax[1].text(val/2, x, val, color='white', fontweight='bold')
    ax[1].yaxis.tick_right()
    ax[1].set_xticks(np.arange(0, 101, 20))
    ax[1].set_yticklabels(developing.index, fontsize=12, fontweight='semibold')
    ax[1].set_title('Developing Countries')
    ax[1].set_xlabel('WEI score')
    
    fig.legend(['Developed', 'Developing'], fontsize=12)
    fig.show()

## Worst 10 countries for women entrepreneurs

In [None]:
pd.DataFrame(sorted_df[['Country', 'Women Entrepreneurship Index']].set_index('Country')['Women Entrepreneurship Index'][-10:]).style.highlight_min(color='orange')

- The figure below shows the Women Entrepreneurship Index scores across developed and developing countries. The bubble size indicates the score: the larger the bubble, the higher the score.
- Developing countries tend to have smaller bubbles than developed countries.

In [None]:
fig = px.scatter_geo(df, locations='Country', locationmode='country names', 
                    color='Level of development', projection='kavrayskiy7', 
                    title='Women Entrepreneurship Index across Level of Development', 
                   size='Women Entrepreneurship Index', )
fig.show()

- The boxplot and the cumulative distribution function plot of WEI suggest a marked difference between the distributions for developed and developing countries.

In [None]:
with plt.style.context('bmh'):
    fig, ax = plt.subplots(1, 2, figsize=(12,4))
    sns.boxplot(data=df, x='Women Entrepreneurship Index', y='Level of development', 
                palette=dict(zip(df['Level of development'].unique(), ['crimson', 'navy'])),
                ax=ax[0])
    sns.ecdfplot(data=df, x='Women Entrepreneurship Index', hue='Level of development', 
                palette=dict(zip(df['Level of development'].unique(), ['crimson', 'navy'])),
                ax=ax[1])
    ax[0].set_xticks(np.arange(0, 101, 20))
    ax[1].set_xticks(np.arange(0, 101, 20))
    fig.show()
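
The plots give a visual impression only. One way to check whether the two groups differ is a nonparametric test such as Mann-Whitney U, which does not assume normally distributed scores. The sketch below uses synthetic scores with group sizes matching the sample (27 and 24); the means and spreads are invented for illustration and are not estimates from the dataset.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Synthetic scores for illustration only; not the real WEI values
developed = rng.normal(loc=60, scale=8, size=27)
developing = rng.normal(loc=35, scale=8, size=24)

# Two-sided test of whether one group tends to have larger values than the other
stat, p_value = mannwhitneyu(developed, developing, alternative='two-sided')
print(f'U statistic = {stat:.1f}, p-value = {p_value:.4g}')
```

With the notebook's data, the two inputs would be the `Women Entrepreneurship Index` column filtered by each value of `Level of development`.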

## Countries focusing more on women entrepreneurs
Countries below and to the right of the dashed red line have a Women Entrepreneurship Index score higher than their Entrepreneurship Index score, which suggests a relatively stronger focus on women entrepreneurs.

In [None]:
with plt.style.context('fivethirtyeight'):
    plt.figure(figsize=(10,10))
    plt.plot(np.arange(20, 81, 10), np.arange(20, 81, 10), '--r')
    plt.scatter(df['Women Entrepreneurship Index'], df['Entrepreneurship Index'])
    for x, y, name in zip(df['Women Entrepreneurship Index'], 
                          df['Entrepreneurship Index'], 
                          df['Country']):
        plt.text(x+1, y, str(name)[:3].upper(), fontsize=12)
    plt.text(60, 30, 'More focus on\nfemale entrepreneurs')
    plt.xlabel('Women Entrepreneurship score')
    plt.ylabel('Entrepreneurship score')
    plt.show()
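
The same comparison can be expressed as a number: the gap between the two scores. The sketch below is a rough illustration on placeholder rows (countries `A`, `B`, `C` and their scores are made up), and the `WEI gap` column and `wei_gap` helper are hypothetical names.

```python
import pandas as pd

# Hypothetical scores for illustration; country names are placeholders
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Women Entrepreneurship Index': [60.0, 40.0, 55.0],
    'Entrepreneurship Index': [50.0, 45.0, 55.0],
})

def wei_gap(data):
    """Add WEI minus Entrepreneurship Index and sort by that gap, largest first."""
    out = data.copy()
    out['WEI gap'] = out['Women Entrepreneurship Index'] - out['Entrepreneurship Index']
    return out.sort_values('WEI gap', ascending=False)

ranked = wei_gap(toy)
# A positive gap corresponds to a point to the right of the dashed line
print(ranked[ranked['WEI gap'] > 0][['Country', 'WEI gap']])
```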

# <a id=rate><font color=#7D3C98>4. Inflation Rate</font></a>
[Back to Index](#toc)

- Inflation is a general rise in the price level in an economy over a period of time, resulting in a sustained drop in the purchasing power of money. 
- When the general price level rises, each unit of currency buys fewer goods and services; consequently, inflation reflects a reduction in the purchasing power per unit of money – a loss of real value in the medium of exchange and unit of account within the economy.
- The common measure of inflation is the inflation rate, the annualized percentage change in a general price index, usually the consumer price index, over time.
- A negative inflation rate, or deflation, is the opposite of inflation: a decrease in the general price level of goods and services.
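
As a worked example of the definition, the inflation rate between two periods is the percentage change in the price index. The index values below are hypothetical and chosen only to show the sign convention; `inflation_rate` is an illustrative helper, not part of the notebook.

```python
def inflation_rate(cpi_previous, cpi_current):
    """Percentage change in a price index between two periods."""
    return (cpi_current - cpi_previous) / cpi_previous * 100

# Hypothetical index values, chosen only to show the sign convention
print(inflation_rate(100.0, 102.5))  # prices rose: positive rate
print(inflation_rate(100.0, 99.0))   # prices fell: negative rate (deflation)
```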

## Countries with a Negative Inflation Rate (or Deflation) and Positive Inflation Rate
- In total, 18 countries have a negative inflation rate and 30 countries have a positive inflation rate.

In [None]:
sns.set_style('darkgrid')
fig, ax = plt.subplots(1, 2, figsize=(10,10))
plt.subplots_adjust(wspace=0.5)
sns.barplot(data=df[df['Inflation rate'] < 0], x='Inflation rate', y='Country', 
                orient='h', ax=ax[0], hue='Level of development')
sns.barplot(data=df[df['Inflation rate'] > 0], x='Inflation rate', y='Country', 
                orient='h', ax=ax[1], hue='Level of development')
ax[1].yaxis.tick_right()
ax[0].set_ylabel(None)
ax[1].set_ylabel(None)
ax[0].set_title('Negative Inflation Rate')
ax[1].set_title('Positive Inflation Rate')
fig.show()

In [None]:
print('Number of countries with a negative inflation rate - ', 
      len(df[df['Inflation rate'] < 0]))
print('Number of countries with a positive inflation rate - ', 
      len(df[df['Inflation rate'] > 0]))

- The boxplot below shows the distribution of the inflation rate by level of development and European Union membership.

In [None]:
px.box(df, x='Inflation rate', y='Level of development', 
       color='European Union Membership', hover_name='Country')

## Countries with an inflation rate of zero

In [None]:
df[df['Inflation rate'] == 0]

# <a id=fpr><font color=#52BE80>5. Female Labor Force Participation Rate</font></a>
[Back to Index](#toc)
- The figure below shows the Female Labor Force Participation Rate rankings for all the countries included in the sample data (excluding white regions).

In [None]:
# apply percentile function    
df['FLPR percentile'] = df['Female Labor Force Participation Rate'].apply(
    lambda x: get_percentile(df, 'Female Labor Force Participation Rate', x)
)

colors = ['darkgreen', 'lightgreen', 'yellow', 'orange', 'red']
sorted_df = df.sort_values(by='Female Labor Force Participation Rate', ascending=False)

# plot map
fig = px.choropleth(sorted_df, locations='Country', locationmode='country names', 
                    color='FLPR percentile', projection='kavrayskiy7', 
                    title='Female Labor Force Participation Rate Rankings', 
                    hover_data=['WEI percentile', 'Level of development'], hover_name='Country',
                    color_discrete_map=dict(zip(sorted_df['FLPR percentile'].unique(), colors)))
fig.update_geos(
    resolution=50,
    showland=True, landcolor="white",
    showocean=True, oceancolor="LightBlue"
)
fig.show()

In [None]:
with plt.style.context('ggplot'):
    plt.figure(figsize=(12, 8))
    sns.barplot(data=sorted_df, x='FLPR percentile', 
                y='Female Labor Force Participation Rate', 
                palette=dict(zip(sorted_df['FLPR percentile'].unique(), colors)))
    plt.suptitle('Percentile average scores', fontsize=15)
    plt.show()

## Top 10 countries with the highest female labour participation
- The top 10 countries here differ from the top 10 based on the Women Entrepreneurship Index score.

In [None]:
top_flpr = pd.DataFrame(df[['Country', 'Female Labor Force Participation Rate']].set_index('Country')['Female Labor Force Participation Rate'].nlargest(10))
top_flpr.style.background_gradient(cmap='PuBuGn')

# <a id=reg><font color=#D35400>6. Regression Analysis</font></a>
[Back to Index](#toc)

## Inflation Rate
- The plot below indicates that Inflation rate and WEI are inversely related. 
- However, the relationship is weak: the R-squared value suggests that inflation rate (the independent variable) explains only 20.75 percent of the variation in the target variable (WEI). Hover over the red trendline to see the R^2 value.

In [None]:
px.scatter(df, x='Inflation rate', y='Women Entrepreneurship Index', 
           trendline='ols', trendline_color_override='red', 
           hover_name='Country')

### Model Summary

In [None]:
X = df[['Inflation rate']].values
y = df['Women Entrepreneurship Index'].values
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())

## Female Labor Force Participation Rate
- The scatter plot below indicates a positive correlation between female labour force participation rate and Women Entrepreneurship Index. 
- However, the relationship is weak: the R-squared value suggests that Female Labor Force Participation Rate (the independent variable) explains only 19.5 percent of the variation in the target variable (WEI). Hover over the red trendline to see the R^2 value.

In [None]:
px.scatter(df, x='Female Labor Force Participation Rate', 
           y='Women Entrepreneurship Index', 
           trendline='ols', trendline_color_override='red', 
           hover_name='Country')

### Model Summary
- Overall, the model is statistically significant, as its p-value is less than 0.05.

In [None]:
X = df[['Female Labor Force Participation Rate']].values
y = df['Women Entrepreneurship Index'].values
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())

## Entrepreneurship Index
- The scatter plot below indicates a positive correlation between Entrepreneurship Index and Women Entrepreneurship Index. 
- The relationship is strong: the R-squared value suggests that the independent variable (Entrepreneurship Index) explains 83.6 percent of the variation in the target variable (WEI). Hover over the red trendline to see the R^2 value.

In [None]:
px.scatter(df, x='Entrepreneurship Index', 
           y='Women Entrepreneurship Index', 
           trendline='ols', trendline_color_override='red', 
           hover_name='Country')

### Model Summary

In [None]:
X = df[['Entrepreneurship Index']].values
y = df['Women Entrepreneurship Index'].values
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())

## Multiple Linear Regression

### Model summary with independent variables Inflation rate and Female Labor Force Participation Rate
- Inflation rate and Female Labor Force Participation Rate together explain 35.3 percent of the variation in the target variable.
- Overall, the model is statistically significant, and so are both independent variables.
- The AIC and BIC values are lower than those of the single-predictor models for these two variables, which indicates a better balance between goodness of fit and model complexity.
- However, the remaining variation needs to be studied by adding further significant independent variables.

In [None]:
X = df[['Inflation rate', 'Female Labor Force Participation Rate']].values
y = df['Women Entrepreneurship Index'].values
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())

### Model summary with independent variables Inflation rate, Female Labor Force Participation Rate and Entrepreneurship Index
- Inflation rate is statistically insignificant in this model, as its p-value is greater than 0.05.

In [None]:
X = df[['Inflation rate', 'Female Labor Force Participation Rate', 'Entrepreneurship Index']].values
y = df['Women Entrepreneurship Index'].values
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())

### Model summary with independent variables Female Labor Force Participation Rate and Entrepreneurship Index
- Both independent variables are statistically significant, with p-values less than 0.05, and together they explain 85.7 percent of the variation in the target variable.

In [None]:
X = df[['Female Labor Force Participation Rate', 'Entrepreneurship Index']].values
y = df['Women Entrepreneurship Index'].values
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())

# <a id=insights><font color='mediumvioletred'>7. Insights</font></a>
[Back to Index](#toc)

- The sample data includes 27 developed and 24 developing countries, of which 20 are European Union members.
- All European Union members in the sample are developed nations.
- Not every European Union member uses the Euro as its national currency. Exceptions include Sweden, Hungary, Poland, Croatia and Denmark.
- All countries with a Women Entrepreneurship Index score below the 40th percentile are developing, while all countries above the 60th percentile are developed. Countries with scores around the median include both developed and developing countries.
- In total, 18 countries have a negative inflation rate, 30 have a positive inflation rate and 3 have an inflation rate of zero.
- Female Labor Force Participation Rate and Entrepreneurship Index can be used to model WEI scores, as together they explain 85.7 percent of the variation in the target variable (WEI).