
**Exploring Female Labor Force Participation and Leadership**

This analysis investigates the relationship between female labor force participation and two factors: the share of women in national parliaments and female leadership in private firms. Using regression models, it explores how these factors relate to female labor force participation and identifies countries whose rankings on these metrics diverge sharply.

The study aims to:

* Understand the relationship between female labor force participation (dependent variable) and women in national parliaments (independent variable).

* Examine how the presence of female top managers in firms mediates or strengthens this relationship.

* Identify countries with extreme rankings to analyze societal inequalities.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import scipy.stats as stats
url = 'https://www.qogdata.pol.gu.se/data/qog_bas_cs_jan24.xlsx'
df = pd.read_excel(url)

In [None]:
df.head()

Unnamed: 0,ccode,cname,ccode_qog,cname_qog,ccodealp,ccodecow,version,ajr_settmort,atop_ally,atop_number,...,wvs_imprel,wvs_pmi12,wvs_psarmy,wvs_psdem,wvs_psexp,wvs_pssl,wvs_relacc,wvs_satfin,wvs_subh,wvs_trust
0,4,Afghanistan,4,Afghanistan,AFG,700.0,QoGBasCSjan24,4.540098,1.0,1.0,...,,,,,,,,,,
1,8,Albania,8,Albania,ALB,339.0,QoGBasCSjan24,,1.0,8.0,...,2.869328,,1.596485,3.849031,3.475513,1.744196,,,3.488758,0.027857
2,12,Algeria,12,Algeria,DZA,615.0,QoGBasCSjan24,4.35927,1.0,9.0,...,,,,,,,,,,
3,20,Andorra,20,Andorra,AND,232.0,QoGBasCSjan24,,1.0,2.0,...,2.03493,2.710393,1.336049,3.681363,2.635721,1.830491,1.751004,6.561316,4.089642,0.255744
4,24,Angola,24,Angola,AGO,540.0,QoGBasCSjan24,5.634789,1.0,8.0,...,,,,,,,,,,


**Bivariate Regression Analysis**

A simple Ordinary Least Squares (OLS) regression is performed to evaluate the relationship between female labor force participation (wdi_lfpf) and the proportion of women in national parliaments (wdi_wip).

**Dependent variable**: Labor force, female (% of total labor force). QoG Code: wdi_lfpf
Female labor force as a percentage of the total shows the extent to which women are active in the labor force. The labor force comprises people ages 15 and older who meet the International Labour Organization’s definition of the economically active population.

**Independent variable**: Proportion of seats held by women in national parliaments (%). QoG Code: wdi_wip
Women in parliaments is the percentage of parliamentary seats in a single or lower chamber held by women.

In [None]:
# Distribution of dependent variable (labour force, female)
df[['wdi_lfpf']].describe()

Unnamed: 0,wdi_lfpf
count,178.0
mean,41.278666
std,8.984418
min,7.817223
25%,38.75056
50%,44.583534
75%,47.129507
max,52.735699


In [None]:
# Distribution of independent variable (Proportion of seats held by women in national parliaments)
df[['wdi_wip']].describe()

Unnamed: 0,wdi_wip
count,193.0
mean,23.762523
std,12.305465
min,0.0
25%,14.893617
50%,22.535212
75%,31.125828
max,61.25


I expect that as the proportion of seats held by women in national parliaments increases, female labor force participation will also increase. This could be due to policies that benefit working mothers or reduce workplace discrimination, for example by addressing the gender pay gap or the promotion gap. I assume that women in politics are more responsive to women's needs.

H0: there is no relationship between the two variables

H1: there is a positive relationship between the two variables




In [None]:
# Bivariate OLS; `subset` keeps only the rows where vdem_corr is not missing
FLF_Fparliament = smf.ols(formula='wdi_lfpf ~ wdi_wip', data=df, subset=df['vdem_corr'].notna()).fit()
print(FLF_Fparliament.summary())

                            OLS Regression Results                            
Dep. Variable:               wdi_lfpf   R-squared:                       0.063
Model:                            OLS   Adj. R-squared:                  0.057
Method:                 Least Squares   F-statistic:                     11.35
Date:                Thu, 24 Oct 2024   Prob (F-statistic):           0.000936
Time:                        13:59:04   Log-Likelihood:                -614.83
No. Observations:                 171   AIC:                             1234.
Df Residuals:                     169   BIC:                             1240.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     36.5523      1.551     23.574      0.0

The model is statistically significant at the 1% level: the p-value of the F-test (0.000936) is below 0.01. We can therefore reject the null hypothesis (H0) and conclude that there is a statistically significant association between the proportion of seats held by women in parliaments and female labor force participation. The large F-statistic (11.35) points the same way.

Additionally, the 95% confidence interval for the coefficient is (0.079, 0.303). This range does not include zero, which supports the conclusion drawn from the p-value, confirming that the relationship is statistically significant.
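As a consistency check on the figures quoted here, the slope's standard error and t-statistic can be backed out of the confidence interval. The sketch below uses only the rounded values reported in the summary (coefficient 0.1913, interval 0.079 to 0.303, 169 residual degrees of freedom); the variable names are illustrative. In a regression with a single predictor, the squared t-statistic should roughly match the reported F-statistic of 11.35.

```python
from scipy import stats

coef = 0.1913                    # slope reported for wdi_wip
ci_low, ci_high = 0.079, 0.303   # reported 95% confidence interval
df_resid = 169                   # "Df Residuals" in the summary

# A 95% interval is coef +/- t_crit * se, so the half-width gives the se.
t_crit = stats.t.ppf(0.975, df_resid)
se = (ci_high - ci_low) / (2 * t_crit)
t_stat = coef / se

print(f"se = {se:.4f}, t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```

Because the inputs are rounded, the match with the F-statistic is approximate rather than exact.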

The R-squared value is approximately 0.063, indicating that only about 6.3% of the variance in female labor force participation is explained by the proportion of seats held by women in national parliaments. This value is low, so the independent variable explains little of the variation in the dependent variable.
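To make the meaning of R-squared concrete, here is a minimal sketch on synthetic data (not the QoG dataset) that computes it from the residual and total sums of squares. With a single predictor, it equals the squared Pearson correlation between the two variables.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 60, size=200)               # synthetic predictor
y = 36 + 0.2 * x + rng.normal(0, 9, size=200)  # synthetic, noisy outcome

slope, intercept = np.polyfit(x, y, 1)         # least-squares line
resid = y - (intercept + slope * x)

ss_res = np.sum(resid ** 2)                    # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)           # total variation
r_squared = 1 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]
print(round(r_squared, 4), round(r ** 2, 4))   # the two values coincide
```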

The intercept is 36.55, meaning that when the independent variable is 0 (no parliamentary seats held by women), the predicted female labor force participation is 36.55%.

The coefficient of the independent variable is 0.1913, meaning that for each one-percentage-point increase in the proportion of women in parliaments, female labor force participation is expected to increase by about 0.19 percentage points. The positive sign indicates a positive relationship between the two variables.
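The intercept and slope can be combined into a prediction for any given share of women in parliament. The sketch below plugs the reported estimates into the fitted line; `predict_lfpf` is a hypothetical helper name, and the values 20 and 30 are chosen only for illustration.

```python
intercept = 36.5523   # reported intercept
slope = 0.1913        # reported coefficient on wdi_wip

def predict_lfpf(wip_percent):
    """Predicted wdi_lfpf for a given share of women in parliament (in %)."""
    return intercept + slope * wip_percent

low, high = predict_lfpf(20), predict_lfpf(30)
print(f"{low:.2f} -> {high:.2f} (difference {high - low:.2f} points)")
```

A ten-point rise in the share of women in parliament therefore corresponds to a predicted rise of about 1.9 percentage points in the dependent variable.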

Furthermore, the Durbin-Watson statistic detects autocorrelation in the residuals of a regression. It ranges from 0 to 4, and values near 2 indicate no first-order autocorrelation. Here the value is 1.85, which is close to 2, so there is little evidence of autocorrelation in the residuals.
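The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below computes it by hand on made-up residuals; `durbin_watson_stat` is an illustrative helper, not the function statsmodels uses to produce the summary.

```python
import numpy as np

def durbin_watson_stat(residuals):
    """Sum of squared successive differences over the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)
white_noise = rng.normal(size=5000)             # no autocorrelation
alternating = np.array([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation

print(durbin_watson_stat(white_noise))   # close to 2
print(durbin_watson_stat(alternating))   # 3.0
```

Values well below 2 point to positive autocorrelation, and values well above 2 to negative autocorrelation.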

Overall, it is possible to reject the null hypothesis (H0) in favor of the alternative hypothesis (H1). The regression results suggest that there is a positive and statistically significant relationship between the two variables. However, the independent variable (the proportion of seats held by women in national parliaments) explains only a small portion of the variance in the dependent variable (female labour force participation).


**Including a Mediating Variable**

I will add the variable: Firms with female top manager (% of firms). QoG Code: wdi_firftopm
Firms with female top manager refers to the percentage of firms in the private sector that have a woman as top manager. Top manager refers to the highest-ranking manager or CEO of the establishment. This person may be the owner if he/she works as the manager of the firm. The results are based on surveys of more than 100,000 private firms.

This variable could capture how women in leadership positions influence broader female labor force participation. The presence of female top managers might create more inclusive, family-friendly work environments, potentially making it easier for other women to enter and remain in the labor force. Indeed, past literature suggests that women often prefer to work for a company with a female CEO, especially in certain circumstances, for example after having a child.






In [None]:
df[['wdi_firftopm']].describe()

Unnamed: 0,wdi_firftopm
count,78.0
mean,17.478205
std,9.158938
min,1.6
25%,11.45
50%,17.200001
75%,21.674999
max,43.099998


Testing how the two independent variables correlate:

In [None]:
df_filtered = df[['wdi_wip', 'wdi_firftopm']].dropna()
stats.pearsonr(df_filtered['wdi_wip'], df_filtered['wdi_firftopm'])

PearsonRResult(statistic=0.18015355897069918, pvalue=0.11449013700918519)

The Pearson correlation coefficient is 0.180. Since the coefficient can take values from -1 to 1, this indicates a weak positive correlation. Additionally, the p-value (0.114) is too high to call the observed correlation statistically significant.
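For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations, and its sign carries the direction of the relationship. The sketch below implements the definition on toy data; `pearson_r` is an illustrative helper, separate from the `stats.pearsonr` call used above.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation computed from centered values."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c, b_c = a - a.mean(), b - b.mean()
    return np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(pearson_r(x, 2 * x + 1))     # perfectly increasing line: r is 1
print(pearson_r(x, -3 * x + 10))   # perfectly decreasing line: r is -1
```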

Given that the aim is to use these two variables in a regression, the low correlation is a good sign: highly correlated independent variables lead to multicollinearity, which makes it hard to isolate the effect of each independent variable on the dependent variable.
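One common way to quantify multicollinearity is the variance inflation factor (VIF). With exactly two predictors it follows directly from their correlation, so the reported coefficient is enough to compute it. This is a sketch of that arithmetic; the VIF is not part of the original analysis.

```python
r = 0.18015355897069918   # Pearson correlation reported above

# With two predictors, the R-squared from regressing one on the other
# is r squared, so both predictors share the same VIF.
vif = 1 / (1 - r ** 2)
print(f"VIF = {vif:.3f}")
```

A VIF this close to 1 means the two predictors barely inflate each other's standard errors. Common rules of thumb only flag values above roughly 5 or 10.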

Next, a new regression is run with two independent variables: wdi_wip (proportion of seats held by women in national parliaments) and wdi_firftopm (firms with female top managers).

In [None]:
FLF_Fparliament_CEO = smf.ols(formula = 'wdi_lfpf ~ wdi_wip + wdi_firftopm', data = df).fit()
print (FLF_Fparliament_CEO.summary())

                            OLS Regression Results                            
Dep. Variable:               wdi_lfpf   R-squared:                       0.333
Model:                            OLS   Adj. R-squared:                  0.315
Method:                 Least Squares   F-statistic:                     18.69
Date:                Thu, 24 Oct 2024   Prob (F-statistic):           2.60e-07
Time:                        13:59:15   Log-Likelihood:                -261.84
No. Observations:                  78   AIC:                             529.7
Df Residuals:                      75   BIC:                             536.7
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept       29.7069      2.446     12.143   

The model is statistically significant at the 5% level, as the p-values for both independent variables are below 0.05. Moreover, the F-statistic is large (18.69) and its p-value is very small (2.60e-07), which supports the joint significance of the two variables.

Additionally, the 95% confidence intervals do not include zero for either of the two variables, further supporting the conclusion from the p-values that both relationships are statistically significant.

R-squared is approximately 0.333, meaning that about 33.3% of the variance in female labor force participation is explained by the proportion of seats held by women in national parliaments and the share of firms with a female top manager. This is much higher than in the first regression: the independent variables now explain about one third of the variance. This suggests that the new variable (firms with female top manager) considerably improves the explanatory power of the model. Note, however, that this model is estimated on 78 countries rather than 171, because wdi_firftopm is missing for many countries, so the two R-squared values are not computed on the same sample.
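Because adding a predictor can never lower R-squared, the adjusted R-squared is the fairer figure when models have different numbers of predictors. The sketch below applies the standard formula to the rounded values from the two summaries; `adjusted_r2` is an illustrative helper.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(round(adjusted_r2(0.063, 171, 1), 3))   # first model  -> 0.057
print(round(adjusted_r2(0.333, 78, 2), 3))    # second model -> 0.315
```

Both results match the "Adj. R-squared" lines in the regression outputs.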

Additionally, the coefficient of the new variable shows a stronger and highly significant relationship (the reported p-value rounds to 0) with female labor force participation. For each one-percentage-point increase in the share of firms with female top managers, female labor force participation increases by 0.47 percentage points.

Furthermore, the Durbin-Watson statistic detects autocorrelation in the residuals of a regression. Here the value is 2.1, which is close to 2, so there is little to no evidence of autocorrelation in the residuals.

Overall, the additional independent variable (firms with female top managers) considerably improves the ability of the model to explain variation in female labour force participation. The R-squared has increased and the p-value is highly significant. This suggests that the newly added variable is a stronger predictor of female labour force participation than the proportion of women in parliaments.
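The two regressions use different numbers of observations (171 and 78), since rows with a missing value in any model variable are dropped. A like-for-like comparison would refit the one-predictor model on the rows available to both. The sketch below shows the idea on synthetic data; the frame `toy`, its columns and the helper `r2_simple` are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 170
toy = pd.DataFrame({
    "x1": rng.uniform(0, 60, n),
    "x2": rng.uniform(0, 45, n),
})
toy["y"] = 30 + 0.1 * toy["x1"] + 0.4 * toy["x2"] + rng.normal(0, 7, n)
toy.loc[toy.index[80:], "x2"] = np.nan   # x2 is missing for many rows

def r2_simple(frame, x_col, y_col):
    """R-squared of a one-predictor least-squares fit on complete rows."""
    d = frame[[x_col, y_col]].dropna()
    return np.corrcoef(d[x_col], d[y_col])[0, 1] ** 2

# Rows that a model using x1 and x2 together can actually use.
common = toy.dropna(subset=["x1", "x2", "y"])
print(len(toy), len(common))
print(r2_simple(toy, "x1", "y"), r2_simple(common, "x1", "y"))
```

Comparing the two printed R-squared values shows how much of any change comes from the smaller sample rather than from the added variable.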


**Extreme Combinations: Identifying Outliers**

Identify countries with high female top manager representation (wdi_firftopm) but low female labor force participation (wdi_lfpf).


In [None]:
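# Rank 1 = largest share of firms with a female top manager
df['female_CEO_rank'] = df['wdi_firftopm'].rank(ascending=False)
# Rank 1 = smallest female share of the labor force (ascending=True)
df['female_labourforce_rank'] = df['wdi_lfpf'].rank(ascending=True)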

extreme_comb = df[['cname', 'female_CEO_rank', 'female_labourforce_rank']].copy()
extreme_comb['rank_difference'] = extreme_comb['female_labourforce_rank'] - extreme_comb['female_CEO_rank']
extreme_sorted = extreme_comb.sort_values(by='rank_difference', ascending=False).head(5)

extreme_sorted

Unnamed: 0,cname,female_CEO_rank,female_labourforce_rank,rank_difference
96,Latvia,7.0,168.0,161.0
100,Lithuania,9.0,164.0,155.0
102,Madagascar,4.0,159.0,155.0
13,Armenia,27.0,178.0,151.0
27,Belarus,22.0,172.0,150.0


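The gaps shown in the table above may suggest that in these five countries it is relatively easy for highly educated women to reach high-level positions, but not for women in general. It would be interesting to look at the average level of female education in these countries. More broadly, the result could point to strong inequalities within each of the five countries, perhaps due to high barriers to entering the workforce. One caveat: female_labourforce_rank is computed with ascending=True, so a rank close to 178 marks one of the highest values of wdi_lfpf in the sample; this reading should therefore be checked against the raw wdi_lfpf values for these countries.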
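The direction of each rank determines what a large rank difference means: with `rank(ascending=True)`, rank 1 goes to the smallest value. The sketch below uses made-up values and hypothetical country labels. It ranks both metrics in descending order, so that a large positive gap picks out a high share of female top managers combined with a low female labor-force share. The column names `manager_rank`, `lfpf_rank` and `rank_gap` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "cname": ["A", "B", "C", "D"],            # hypothetical countries
    "wdi_firftopm": [40.0, 35.0, 10.0, 5.0],  # made-up values
    "wdi_lfpf": [50.0, 15.0, 48.0, 20.0],     # made-up values
})

# Rank 1 = highest value on both metrics.
toy["manager_rank"] = toy["wdi_firftopm"].rank(ascending=False)
toy["lfpf_rank"] = toy["wdi_lfpf"].rank(ascending=False)

# Large positive gap: near the top on managers, near the bottom on labor force.
toy["rank_gap"] = toy["lfpf_rank"] - toy["manager_rank"]
ordered = toy.sort_values("rank_gap", ascending=False)
print(ordered[["cname", "manager_rank", "lfpf_rank", "rank_gap"]])
```

In this toy frame, country B (second-highest manager share, lowest labor-force share) comes out on top, which is the high/low combination the section sets out to find.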

**Conclusions**

This analysis highlights:

Positive Relationships: The shares of women in national parliaments and of firms with female top managers are both positively associated with female labor force participation.

Explanatory Power: Including wdi_firftopm significantly improves the model's ability to explain changes in labor force participation.

Structural Inequalities: Extreme combinations reveal inequalities where elite positions are accessible to women, but systemic barriers persist for broader participation.