
Test for an education/gender interaction in wages
==================================================

Wages depend mostly on education. Here we investigate how this dependence
is related to gender: not only does gender create an offset in wages, it
also seems that wages increase more with education for males than for
females.

Do our data support this last hypothesis? We will test it using
statsmodels' formulas
(http://statsmodels.sourceforge.net/stable/example_formulas.html).




In [None]:
import pandas as pd
import numpy as np
import seaborn
import statsmodels.formula.api as sm

### Load and massage the data



In [None]:
# EDUCATION: Number of years of education
# SEX: 1=Female, 0=Male
# WAGE: Wage (dollars per hour)
data = pd.read_csv('wages85.csv')

# Log-transform the wages, because wage differences are typically
# multiplicative (percentage changes) rather than additive
data['Wage'] = np.log10(data['Wage'])
data.head()
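
To see why the log transform suits multiplicative effects, here is a minimal sketch using made-up wage figures (illustrative assumptions, not values from the dataset). A constant percentage raise becomes a constant additive step on the log10 scale, which is the kind of effect a linear model can capture.

```python
import math

# Hypothetical wages: each one is a 10% raise over the previous one.
wages = [10.0 * 1.10 ** k for k in range(5)]

# On the raw scale, successive differences grow with the wage level...
raw_steps = [b - a for a, b in zip(wages, wages[1:])]

# ...but on the log10 scale every step has the same size, log10(1.10).
log_wages = [math.log10(w) for w in wages]
log_steps = [b - a for a, b in zip(log_wages, log_wages[1:])]

print("raw steps:  ", raw_steps)
print("log10 steps:", log_steps)
```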

### Simple plotting



In [None]:
# Plot two linear fits, one for males and one for females.
seaborn.lmplot(y='Wage', x='Education', hue='Sex', data=data)

### Statistical analysis



In [None]:
# Note that this model is not the one plotted above: it is a single
# joint model for males and females, not a separate model for each.
# A single model is what makes statistical testing possible.
result = sm.ols(formula='Wage ~ Education + Sex', data=data).fit()
print(result.summary())

In [None]:
result.params
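
Because `Wage` is on a log10 scale, each coefficient is an additive shift in log10(wage), which corresponds to a multiplicative factor on the wage itself. The sketch below shows the conversion; the coefficient values are placeholders chosen for illustration, not the fitted results, and `percent_change` is a hypothetical helper.

```python
# Placeholder coefficients on the log10(wage) scale. These are NOT the
# fitted values; substitute the entries of result.params.
coef_education = 0.03   # hypothetical: effect of one extra year of education
coef_sex = -0.10        # hypothetical: Sex=1 (female) relative to Sex=0 (male)


def percent_change(coef):
    """Convert a log10-scale coefficient into a percent change in wage."""
    return (10 ** coef - 1) * 100


print(f"One more year of education: {percent_change(coef_education):+.1f}%")
print(f"Female relative to male:    {percent_change(coef_sex):+.1f}%")
```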

The plot above suggests that there is not only a different offset in
wage but also a different slope.

We need to model this using an interaction.



In [None]:
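# In the formula syntax, 'Education * Sex' already expands to
# 'Education + Sex + Education:Sex', so listing the main effects again
# is redundant but harmless.
result = sm.ols(formula='Wage ~ Education + Sex + Education * Sex',
                data=data).fit()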
print(result.summary())

Looking at the p-value of the interaction of gender and education, the
data do not support the hypothesis that education benefits males
more than females (p-value > 0.05).



In [None]:
result.params
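
When reading these parameters, note that with a binary `Sex` variable coded 0/1, the coefficient on the `Education:Sex` term is the difference between the education slopes of the two groups. The minimal sketch below checks this on synthetic, noise-free data (all numbers are made up for illustration, and `fit_line` and `model` are hypothetical helpers) by fitting one line per group.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up coefficients of an interaction model on the log-wage scale:
# y = b0 + b_edu * edu + b_sex * sex + b_inter * edu * sex
b0, b_edu, b_sex, b_inter = 0.40, 0.030, -0.10, -0.005
education = [8, 10, 12, 14, 16, 18]


def model(edu, sex):
    return b0 + b_edu * edu + b_sex * sex + b_inter * edu * sex


male = [model(e, 0) for e in education]     # Sex = 0
female = [model(e, 1) for e in education]   # Sex = 1

intercept_m, slope_m = fit_line(education, male)
intercept_f, slope_f = fit_line(education, female)

# Differences between the two group fits recover the Sex and interaction terms.
print("interaction (slope difference):   ", slope_f - slope_m)
print("Sex offset (intercept difference):", intercept_f - intercept_m)
```

Separate per-group fits give the same point estimates as the joint model, but only the joint model provides a standard error and p-value for the slope difference.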