Gender Biases in Student Evaluations of Teachers - A Randomized, Online Experiment
====================================================


In [1]:
# boilerplate
# __future__ imports must come before any other statement in the cell
from __future__ import division
%matplotlib inline
import math
import numpy as np
import pandas as pd
from numpy.random import random
import scipy as sp
from scipy import special
import matplotlib.pyplot as plt

# initialize PRNG
rs = np.random.RandomState(seed=1)

Permutation test code
============
You must install the _permute_ package to use this code. Installation instructions are available at https://github.com/statlab/permute.

In [2]:
from permute.core import corr, two_sample, permute_within_groups
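
For readers who want to see the idea behind `two_sample` without installing _permute_, here is a minimal NumPy sketch of a two-sample permutation test on the pooled t statistic. It is not the _permute_ implementation: the function name `permutation_two_sample`, the `reps` and `seed` parameters, and the absolute-value convention for the two-sided p-value are all assumptions made for illustration, and the toy data are made up.

```python
import numpy as np

def permutation_two_sample(x, y, reps=10000, seed=1):
    """Illustrative two-sided permutation test; returns (p_value, t_observed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.RandomState(seed)
    pooled = np.concatenate([x, y])
    nx = len(x)

    def t_stat(a, b):
        # pooled-variance two-sample t statistic
        na, nb = len(a), len(b)
        sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / float(na + nb - 2)
        return (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))

    observed = t_stat(x, y)
    hits = 0
    for _ in range(reps):
        # under the null hypothesis the group labels are exchangeable,
        # so reshuffle the pooled values and split them at the same sizes
        shuffled = rng.permutation(pooled)
        if abs(t_stat(shuffled[:nx], shuffled[nx:])) >= abs(observed):
            hits += 1
    return hits / float(reps), observed

# made-up ratings, not study data
p, t = permutation_two_sample([5, 5, 4, 5, 4, 5], [3, 4, 3, 2, 4, 3], reps=2000)
print('t = {0:.3f}, p = {1:.4f}'.format(t, p))
```

The p-value is the fraction of random relabelings whose statistic is at least as extreme as the observed one, so it needs no distributional assumption about the ratings.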

Read data
=================

Some notes on the variables:
* **group** identifies the section the student was placed in.
* **gender** refers to the student's gender: 1 = male, 2 = female.
* **tagender** is the instructor's true gender: 1 = male, 0 = female.
* **taidgender** is the instructor's reported gender: 1 = male, 0 = female.
* **grade** is the course grade, on a scale from 0 to 100.

The IRB did not allow grades to be linked to ratings, so the two data sets are analyzed separately. Four students did not submit evaluations, but we do not know which ones. As a result there are 43 ratings and 47 grades.
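
Student gender and instructor gender use different codes (1/2 versus 1/0), so comparing them requires recoding one of them first. The sketch below shows the recoding on a few made-up rows: taking the student code modulo 2 maps male to 1 and female to 0, which matches the instructor coding. The `toy` data frame and the `student_male` column are hypothetical names used only for this illustration.

```python
import pandas as pd

# Made-up rows that only demonstrate the coding; these are not study data.
toy = pd.DataFrame({
    'gender':     [1, 1, 2, 2],   # student: 1 = male, 2 = female
    'taidgender': [1, 0, 1, 0],   # reported instructor gender: 1 = male, 0 = female
})

# gender % 2 maps 1 -> 1 (male) and 2 -> 0 (female), matching the instructor coding
toy['student_male'] = toy['gender'] % 2

# True when the student's gender matches the instructor's reported gender
toy['gender_concordance'] = toy['student_male'] == toy['taidgender']
print(toy)
```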

In [3]:
ratings = pd.read_csv("Macnell-RatingsData.csv")
categories = ratings.columns.values.tolist()[1:15]
ratings.head()

Unnamed: 0,group,professional,respect,caring,enthusiastic,communicate,helpful,feedback,prompt,consistent,fair,responsive,praised,knowledgeable,clear,overall,gender,age,tagender,taidgender
0,3,5,5,4,4,4,3,4,4,4,4,4,4,3,5,4,2,1990,0,1
1,3,4,4,4,4,5,5,5,5,3,4,5,5,5,5,4,1,1992,0,1
2,3,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,2,1991,0,1
3,3,5,5,5,5,5,3,5,5,5,5,3,5,5,5,5,2,1991,0,1
4,3,5,5,5,5,5,5,5,3,4,5,5,5,5,5,5,2,1992,0,1


In [4]:
grades = pd.read_csv("Macnell-GradeData.csv")
grades.head()

Unnamed: 0,group,grade,tagender,taidgender
0,3,77.4,0,1
1,3,89.02,0,1
2,3,53.5,0,1
3,3,88.32,0,1
4,3,90.02,0,1


# Analysis

## Evidence of gender bias

### Ratings vs reported instructor gender

In [5]:
(p, t) = two_sample(ratings['overall'][ratings.taidgender==1], ratings['overall'][ratings.taidgender==0], \
                              stat = 't', alternative = "two-sided", keep_dist = False)
print 'Overall rating:'
print 't statistic:', np.round(t, 5)
print 'P-value (two-sided):', np.round(p, 5)
print 'Number of evaluations for male-identified instructors:', np.sum(ratings.taidgender==1)
print 'Number of evaluations for female-identified instructors:', np.sum(ratings.taidgender==0)

print ('\n\n{0:24} {1:8} {2:8}'.format('Category', 't', 'p-value'))
for col in categories:
    (p, t) = two_sample(ratings[col][ratings.taidgender==1], ratings[col][ratings.taidgender==0], \
                              stat = 't', alternative = "two-sided", keep_dist = False)
    print ('{0:20} {1:8.2f} {2:8.2f}'.format(col, t, p))

Overall rating:
t statistic: 1.82159
P-value (two-sided): 0.0951
Number of evaluations for male-identified instructors: 23
Number of evaluations for female-identified instructors: 20


Category                 t        p-value 
professional             1.93     0.07
respect                  1.93     0.07
caring                   2.24     0.04
enthusiastic             2.14     0.05
communicate              2.06     0.06
helpful                  1.79     0.09
feedback                 1.63     0.14
prompt                   2.21     0.04
consistent               1.65     0.09
fair                     2.96     0.00
responsive               1.12     0.29
praised                  2.73     0.01
knowledgeable            1.64     0.13
clear                    1.37     0.19


### Ratings vs concordance of student and REPORTED instructor genders

In [6]:
ratings['gender_concordance'] = ( (ratings['gender']% 2)==ratings['taidgender'] )
stu_male = ratings[ratings['gender']==1]
stu_female = ratings[ratings['gender']==2]

(t, plow, pupper, pboth, sims) = corr(x = stu_male['overall'], \
                                      y = stu_male['gender_concordance'], seed = rs)
print 'Male students\n'
print 'Number of male students:', stu_male.shape[0], '\n'
print 'Correlation for overall rating:', t
print 'Upper p-value:', pupper
print 'Two-sided p-value:', pboth
print ('\n{0:15} {1:8} {2:8} {3:8}'.format('Category', 'Correlation',\
                                           'Upper p-value', 'Two-sided p-value'))
(t, plow, pupper, pboth, sims) = corr(x = stu_male['overall'], \
                                      y = stu_male['gender_concordance'], seed = rs)
print ('{0:15} {1:8.3f} {2:10.2f} {3:10.2f}'.format('Overall', t, pupper, pboth))

for col in categories:
    (t, plow, pupper, pboth, sims) = corr(x = stu_male[col], \
                                          y = stu_male['gender_concordance'], seed = rs)
    print ('{0:15} {1:8.3f} {2:10.2f} {3:10.2f}'.format(col, t, pupper, pboth))


(t, plow, pupper, pboth, sims) = corr(x = stu_female['overall'], \
                                      y = stu_female['gender_concordance'], seed = rs)
print '\nFemale students\n'
print 'Number of female students:', stu_female.shape[0], '\n'
print 'Correlation for overall rating:', t
print 'Upper p-value:', pupper
print 'Two-sided p-value:', pboth
print ('\n{0:15} {1:8} {2:8} {3:8}'.format('Category', 'Correlation', \
                                           'Upper p-value', 'Two-sided p-value'))
(t, plow, pupper, pboth, sims) = corr(x = stu_female['overall'], \
                                      y = stu_female['gender_concordance'], seed = rs)
print ('{0:15} {1:8.3f} {2:10.2f} {3:10.2f}'.format('Overall', t, pupper, pboth))

for col in categories:
    (t, plow, pupper, pboth, sims) = corr(x = stu_female[col], \
                                          y = stu_female['gender_concordance'], seed = rs)
    print ('{0:15} {1:8.3f} {2:10.2f} {3:10.2f}'.format(col, t, pupper, pboth))

Male students

Number of male students: 20 

Correlation for overall rating: 0.089757421773
Upper p-value: 0.4227
Two-sided p-value: 0.8122

Category        Correlation Upper p-value Two-sided p-value
Overall            0.090       0.42       0.81
professional       0.217       0.27       0.42
respect            0.217       0.23       0.35
caring             0.020       0.53       0.99
enthusiastic       0.090       0.42       0.81
communicate        0.123       0.35       0.65
helpful            0.211       0.19       0.36
feedback           0.040       0.44       0.93
prompt             0.377       0.08       0.14
consistent         0.073       0.43       0.83
fair               0.408       0.06       0.08
responsive         0.181       0.28       0.53
praised            0.287       0.13       0.24
knowledgeable      0.078       0.41       0.80
clear              0.056       0.41       0.78

Female students

Number of female students: 23 

Correlation for overall rating: -0.363538598

### As a sanity check -- Ratings vs concordance of student and ACTUAL instructor genders

Because the students did not know the instructors' actual genders, we expect no correlation between ratings and the concordance of student and actual instructor gender.
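
The concordance analyses rest on a permutation test for correlation. The following is a minimal NumPy sketch of that idea, not the _permute_ implementation of `corr`: the function name `permutation_corr`, its `reps` and `seed` parameters, and the way the two-sided p-value is formed are assumptions for illustration, and the example data are made up.

```python
import numpy as np

def permutation_corr(x, y, reps=10000, seed=1):
    """Illustrative permutation test for Pearson correlation.

    Returns (observed_r, upper_p, two_sided_p).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)   # booleans become 0.0 / 1.0
    rng = np.random.RandomState(seed)
    observed = np.corrcoef(x, y)[0, 1]
    # shuffling x breaks any pairing with y, which simulates the null hypothesis
    sims = np.array([np.corrcoef(rng.permutation(x), y)[0, 1]
                     for _ in range(reps)])
    p_upper = np.mean(sims >= observed)
    p_two_sided = np.mean(np.abs(sims) >= abs(observed))
    return observed, p_upper, p_two_sided

# made-up ratings and concordance flags, not study data
r, p_up, p_two = permutation_corr(
    [1, 2, 3, 4, 5, 6, 7, 8],
    [False, False, False, False, True, True, True, True],
    reps=2000)
print('r = {0:.3f}, upper p = {1:.4f}, two-sided p = {2:.4f}'.format(r, p_up, p_two))
```

If ratings are unrelated to concordance, the observed correlation should look like a typical draw from the shuffled values, giving a large p-value.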

In [7]:
ratings['gender_concordance_actual'] = ( (ratings['gender']% 2)==ratings['tagender'] )
stu_male = ratings[ratings['gender']==1]
stu_female = ratings[ratings['gender']==2]

(t, plow, pupper, pboth, sims) = corr(x = stu_male['overall'], \
                                      y = stu_male['gender_concordance_actual'], seed = rs)
print 'Male students\n'
print 'Number of male students:', stu_male.shape[0]
print 'Correlation:', t
print 'Upper p-value:', pupper
print 'Two-sided p-value:', pboth
print ('\n{0:15} {1:8} {2:8} {3:8}'.format('Category', 'Correlation',\
                                           'Upper p-value', 'Two-sided p-value'))
(t, plow, pupper, pboth, sims) = corr(x = stu_male['overall'], \
                                      y = stu_male['gender_concordance_actual'], seed = rs)
print ('{0:15} {1:8.3f} {2:10.2f} {3:10.2f}'.format('Overall', t, pupper, pboth))

for col in categories:
    (t, plow, pupper, pboth, sims) = corr(x = stu_male[col], \
                                          y = stu_male['gender_concordance_actual'], seed = rs)
    print ('{0:15} {1:8.3f} {2:10.2f} {3:10.2f}'.format(col, t, pupper, pboth))


(t, plow, pupper, pboth, sims) = corr(x = stu_female['overall'], \
                                      y = stu_female['gender_concordance_actual'], seed = rs)
print '\nFemale students\n'
print 'Number of female students:', stu_female.shape[0]
print 'Correlation:', t
print 'Upper p-value:', pupper
print 'Two-sided p-value:', pboth
print ('\n{0:15} {1:8} {2:8} {3:8}'.format('Category', 'Correlation',\
                                           'Upper p-value', 'Two-sided p-value'))
(t, plow, pupper, pboth, sims) = corr(x = stu_female['overall'], \
                                      y = stu_female['gender_concordance_actual'], seed = rs)
print ('{0:15} {1:8.3f} {2:10.2f} {3:10.2f}'.format('Overall', t, pupper, pboth))

for col in categories:
    (t, plow, pupper, pboth, sims) = corr(x = stu_female[col], \
                                          y = stu_female['gender_concordance_actual'], seed = rs)
    print ('{0:15} {1:8.3f} {2:10.2f} {3:10.2f}'.format(col, t, pupper, pboth))

Male students

Number of male students: 20
Correlation: -0.0718144366713
Upper p-value: 0.6896
Two-sided p-value: 0.7214

Category        Correlation Upper p-value Two-sided p-value
Overall           -0.072       0.69       0.72
professional       0.080       0.37       0.74
respect            0.080       0.45       0.84
caring            -0.106       0.73       0.59
enthusiastic      -0.072       0.57       0.82
communicate       -0.010       0.61       0.84
helpful            0.014       0.52       0.96
feedback          -0.117       0.70       0.69
prompt            -0.049       0.64       0.89
consistent         0.054       0.49       0.85
fair              -0.034       0.63       0.88
responsive        -0.064       0.56       0.84
praised            0.010       0.56       1.00
knowledgeable      0.106       0.42       0.70
clear             -0.119       0.72       0.65

Female students

Number of female students: 23
Correlation: 0.132379460479
Upper p-value: 0.33
Two-sided p-value

## Grades and instructor gender

### Course grade and reported instructor gender
Do students of male- and female-identified instructors perform equally, as measured by course grade? We do a two-sample permutation t-test.

In [11]:
(p, t) = two_sample(grades['grade'][grades.taidgender==1], grades['grade'][grades.taidgender==0], \
                              stat = 't', alternative = "two-sided")
print 'Course grade:'
print 't statistic:', np.round(t, 5)
print 'P-value (two-sided):', np.round(p, 5)
print 'Number of students of male-identified instructors:', np.sum(grades.taidgender==1)
print 'Number of students of female-identified instructors:', np.sum(grades.taidgender==0)

Course grade:
t statistic: 0.21442
P-value (two-sided): 0.8322
Number of students of male-identified instructors: 23
Number of students of female-identified instructors: 24


### Course grade and actual instructor gender
Do students of male and female instructors (by actual gender) perform equally, as measured by course grade? We do a two-sample permutation t-test.

In [12]:
(p, t) = two_sample(grades['grade'][grades.tagender==1], grades['grade'][grades.tagender==0], \
                              stat = 't', alternative = "two-sided")
print 'Course grade:'
print 't statistic:', np.round(t, 5)
print 'P-value (two-sided):', np.round(p, 5)
# count by actual instructor gender (tagender), the grouping used in the test above
print 'Number of students of male instructors:', np.sum(grades.tagender==1)
print 'Number of students of female instructors:', np.sum(grades.tagender==0)

Course grade:
t statistic: 2.65325
P-value (two-sided): 0.00991
Number of students of male instructors: 23
Number of students of female instructors: 24


## References

MacNell, L., Driscoll, A., and Hunt, A.N. (2014), "What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching," _Innovative Higher Education_, 1-13.