In [2]:
import pandas as pd
import numpy as np
import researchpy as rp
import math
import seaborn as sns
import matplotlib.pyplot as plt

# Table of Contents
- [Overview](#overview)
- [Names](#names)
- [Research Question](#research_question)
- [Background](#background)
- [Hypothesis](#hypothesis)
- [Datasets](#datasets)
- [Data Cleaning](#data_cleaning)
- [Data Analysis](#data_analysis)
- [Conclusion](#conclusion)

<a id='overview'></a>
# Overview 
In the book Invisible Women, the author Caroline Criado Perez writes that “decades of research [...] show that teaching evaluation forms are worse than useless at actually evaluating teaching and are in fact ‘biased against female instructors by an amount that is large and statistically significant.’” Based on this claim, we plan to study student evaluations on CAPEs at UC San Diego and see whether we can identify any biases in student expectations of women and men professors. We focus on three variables: the average grade students expected, the average grade students received, and whether students would recommend the professor. We conducted our analysis using [insert stats tests here] in order to determine whether there is a correlation between the average grade expected and received and whether students would recommend the professor, for men and for women. Upon analysis,...

<a id='names'></a>
# Names
* Niharika Bhaskar
* Nicole Martindale

<a id='research_question'></a>
# Research Question
### <span style="color:DarkCyan">To what extent does a professor being a woman rather than a man change how often students recommend them and the grade students expect to receive?</span>

<a id='background'></a>
# Background

#### Why is it of interest to us?

As women computer scientists who want to pursue higher education (Master's degree, PhD, etc.), this is of interest to us because very few computer science professors are women, and learning about the explicit and implicit bias they face is important as we continue in academia. We have personally seen the benefits of women role models, especially in roles like professor and teacher, and we hope to bring these inequities to light.

Imposter syndrome is something many people face, especially women in STEM. Negative evaluations can feed harmful beliefs and perpetuate self-doubt. We want to raise awareness of these biases so that ratings are not treated as the most important measure of the efficacy of a class or, especially, of a professor.


#### What background information led to our hypothesis?

In the book Invisible Women, the author Caroline Criado Perez discusses how student expectations of women professors can create hurdles that the same individual might not face as a man. Perez explains that “students are also more likely to request extensions, grade boosts, and rule-bending of female academics” (Perez 97). This expectation that women professors will be more lenient may translate into the grade students expect in the class. If this expectation is not met, students may be more upset. Furthermore, Perez cites that “an analysis of 14 million reviews on the website RateMyProfessors.com found that female professors are more likely to be ‘mean’, ‘harsh’, ‘unfair’, ‘strict’, and ‘annoying’” (Perez 97). There seems to be evidence that students do not hold men professors and women professors to the same standard. This is consistent with the claim Perez reports that teaching evaluations show a statistically significant bias against women. In addition, there is evidence that women are significantly underrepresented among computer science professors and in STEM departments as a whole.

#### Why is this important?
While the project may not be able to fix any explicit biases that students hold, identifying implicit biases and making individuals aware of them may be beneficial. This research may suggest that biases against women professors exist across different departments, including various STEM departments and, ultimately, the Computer Science department. We hope this will motivate students to be more aware of their own actions when evaluating professors in the future.

We also think the project will suggest that students, and anyone else who refers to CAPEs and other evaluation tools, should not jump to conclusions from the results, as explicit and implicit biases against women may be baked into them. Ultimately, it is essential to examine possible inequities in evaluations in order to support an equitable work environment for women in academia.

<a id='hypothesis'></a>
# Hypothesis
Based on the various studies cited in the book Invisible Women, and extrapolating the results of those studies to the categories reported on CAPEs, we hypothesize that students will on average expect higher grades from women professors but will recommend them at lower rates than men professors.

<a id='datasets'></a>
# Dataset(s)
1. **Jacobs School of Engineering CAPE Evaluations**
    - **Name**:  jacobs_eng_capes.csv
    - [Link to Data](https://docs.google.com/spreadsheets/d/1MjLio0p3HbPYfjGPjIj8YpI0lgx_qnKyWSPae-fgIBU/edit?usp=sharing)
    - **Description**: This dataset contains data on the ratings of women and men professors from the various departments within the Jacobs School of Engineering from the years 2007 to 2021
    - **Source**: Course and Professor Evaluation (CAPE) 
    - **Observations**:

| Variable | Description |
|:---|:---|
| Rcmnd Instr | % of students who recommend the instructor |
| Avg Grade Expected | average grade students expected |
| Avg Grade Received | average grade students received |

2. **Biological Sciences CAPE Evaluations**
    - **Name**: bio_sci_capes.csv
    - [Link to Data](https://docs.google.com/spreadsheets/d/1ejbpa4pENkSDyB56eFo5WGleFBTcSHw_OTwTKmw4H20/edit?usp=sharing)
    - **Description**: This dataset contains data on the ratings of women and men professors from the biological sciences department from the years 2007 to 2021
    - **Source**: Course and Professor Evaluation (CAPE) 
    - **Observations**:

| Variable | Description |
|:---|:---|
| Rcmnd Instr | % of students who recommend the instructor |
| Avg Grade Expected | average grade students expected |
| Avg Grade Received | average grade students received |

3. **Humanities CAPE Evaluations**
    - **Name**: humanities_capes.csv
    - [Link to Data](https://docs.google.com/spreadsheets/d/1RXHr4ROl9AboFbWTegWW0iDBkq5jpqzijdehjE3RSOA/edit?usp=sharing)
    - **Description**: This dataset contains data on the ratings of women and men professors from the humanities department from the years 2007 to 2021
    - **Source**: Course and Professor Evaluation (CAPE) 
    - **Observations**:

| Variable | Description |
|:---|:---|
| Rcmnd Instr | % of students who recommend the instructor |
| Avg Grade Expected | average grade students expected |
| Avg Grade Received | average grade students received |


<a id='data_cleaning'></a>
# Data Cleaning

In [14]:
# Read in data for the CSE department (the same file, split into two dataframes)
df_cse_rec = pd.read_csv("jacobs_capes/cse_capes.csv", usecols=['Instructor', 'Rcmnd Instr'])
df_cse_grade = pd.read_csv("jacobs_capes/cse_capes.csv", usecols=['Instructor', 'Avg Grade Expected', 'Avg Grade Received'])

# Convert the recommend percentage (e.g. "98.3%") to a decimal fraction (0.983)
df_cse_rec['Rcmnd Instr'] = df_cse_rec['Rcmnd Instr'].str.rstrip('%').astype('float') / 100.0

# List of women professors (duplicate entry removed; membership checks are unaffected)
women_profs = ['Heninger, Nadia A', 'Rosing, Tajana Simunic', 'Chaudhuri, Kamalika', 'Alvarado, Christine J.', 'Minnes Kemp, Mor Mia', 'Polikarpova, Nadezhda', 'Esmaeilzadeh, Hadi', 'Riek, Laurel D', 'Gymrek, Melissa Ann', 'Nakashole, Ndapandula', 'Zaitsev, Anna L', 'Altintas De Callaf, Ilkay', 'Zhang, Yiying', 'Zhao, Jishen', 'Zhou, Yuan-Yuan']

# Drop null values
df_cse_rec = df_cse_rec.dropna()
df_cse_grade = df_cse_grade.dropna()

# Rename columns for convenience
df_cse_rec = df_cse_rec.rename(columns={"Instructor":"prof", "Rcmnd Instr":"rec_percent"})
df_cse_grade = df_cse_grade.rename(columns={"Instructor":"prof","Avg Grade Expected": "expected", "Avg Grade Received": "received"})
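
The percent conversion above assumes every `Rcmnd Instr` entry is a well-formed string such as `"98.3%"`. A minimal sketch of a more forgiving variant is shown here, assuming some rows might be missing or malformed. The helper name `percent_to_fraction` and the sample values are hypothetical and do not come from the CAPE data.

```python
import pandas as pd

def percent_to_fraction(col: pd.Series) -> pd.Series:
    """Convert strings such as '98.3%' to 0.983; unparseable entries become NaN."""
    stripped = col.str.strip().str.rstrip("%")
    # errors="coerce" turns anything that is not a number into NaN instead of raising
    return pd.to_numeric(stripped, errors="coerce") / 100.0

# Hypothetical sample values, including a missing and a malformed entry
raw = pd.Series(["98.3%", "53.0%", None, "n/a"])
fractions = percent_to_fraction(raw)
print(fractions)
```

Rows that become NaN this way would then be removed by the later `dropna()` call rather than stopping the notebook with a conversion error.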

In [15]:
df_cse_rec.head()

Unnamed: 0,prof,rec_percent
0,"Moshiri, Alexander Niema",0.983
1,"Kane, Daniel Mertz",0.946
2,"Jones, Miles E",0.961
3,"Meyer, Kyle Phillip",0.53
4,"Meyer, Kyle Phillip",0.63


In [16]:
df_cse_grade.head()

Unnamed: 0,prof,expected,received
0,"Moshiri, Alexander Niema",A- (3.70),B+ (3.61)
1,"Kane, Daniel Mertz",B+ (3.69),B+ (3.35)
2,"Jones, Miles E",B+ (3.57),B+ (3.45)
3,"Meyer, Kyle Phillip",B+ (3.53),B+ (3.46)
4,"Meyer, Kyle Phillip",B+ (3.61),B+ (3.54)


In [17]:
# Extract the numeric grade value from a string such as "B+ (3.61)"
def split_func(x):
    return float(x.split('(')[1][:-1])
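
`split_func` relies on every grade string ending in a parenthesised number. As a sketch only, a regex-based variant could return NaN when that pattern is absent instead of raising an error. The name `parse_grade` is hypothetical and is not part of the notebook.

```python
import math
import re

# Matches a number inside parentheses, e.g. the "3.61" in "B+ (3.61)"
_GRADE_VALUE = re.compile(r"\(\s*(\d+(?:\.\d+)?)\s*\)")

def parse_grade(text):
    """Return the number inside the parentheses, or NaN if none is found."""
    if not isinstance(text, str):
        return math.nan
    match = _GRADE_VALUE.search(text)
    return float(match.group(1)) if match else math.nan

print(parse_grade("B+ (3.61)"))  # 3.61
print(parse_grade("B+"))         # nan
```

It could be applied in the same way as `split_func`, for example `df_cse_grade['expected'].apply(parse_grade)`.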

In [18]:
# Split expected and received in order to extract grade value as decimal
df_cse_grade['value_e'] = df_cse_grade['expected'].apply(split_func)
df_cse_grade['value_r'] = df_cse_grade['received'].apply(split_func)
df_cse_grade = df_cse_grade.drop(columns=['expected', 'received'])
df_cse_grade = df_cse_grade.rename(columns={'value_e':'expected', 'value_r':'received'})

df_cse_grade

Unnamed: 0,prof,expected,received
0,"Moshiri, Alexander Niema",3.70,3.61
1,"Kane, Daniel Mertz",3.69,3.35
2,"Jones, Miles E",3.57,3.45
3,"Meyer, Kyle Phillip",3.53,3.46
4,"Meyer, Kyle Phillip",3.61,3.54
...,...,...,...
2379,"Kube, Paul Richard",3.41,3.01
2380,"Kube, Paul Richard",3.50,2.98
2381,"Dasgupta, Sanjoy",3.50,3.13
2382,"Jhala, Ranjit",3.13,2.64


In [19]:
# Average out recommend percent for all classes professor has taught
df_cse_rec_avg = df_cse_rec.groupby('prof', as_index=False).mean()

# Average out expected and received grades for all classes professor has taught
df_cse_grade = df_cse_grade.groupby('prof', as_index=False).mean()

# Add column for gender for professor in both dataframes
df_cse_rec_avg['gender'] = np.where(df_cse_rec_avg['prof'].isin(women_profs), "W", "M")
df_cse_grade['gender'] = np.where(df_cse_grade['prof'].isin(women_profs), "W", "M")
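
Because gender is assigned by exact string match, a name in `women_profs` that is spelled differently from the CAPE data would silently be labelled "M". The sketch below repeats the averaging and labelling steps on toy data and lists any names that match no row. All names and values here are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; these names are placeholders, not real CAPE rows
toy = pd.DataFrame({
    "prof": ["Doe, Jane", "Doe, Jane", "Roe, Richard"],
    "rec_percent": [0.90, 0.80, 0.70],
})
toy_women = ["Doe, Jane", "Poe, Ann"]  # "Poe, Ann" matches no row on purpose

# One row per professor, averaged over all of their classes
toy_avg = toy.groupby("prof", as_index=False).mean()

# Label gender by membership in the list
toy_avg["gender"] = np.where(toy_avg["prof"].isin(toy_women), "W", "M")

# Names in the list that never appear in the data (possible typos or format mismatches)
unmatched = sorted(set(toy_women) - set(toy_avg["prof"]))
print(toy_avg)
print(unmatched)
```

Running the same set difference against `women_profs` and `df_cse_rec_avg['prof']` would show whether any listed professor is missing from the averaged data.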

In [20]:
df_cse_rec_avg.head()

Unnamed: 0,prof,rec_percent,gender
0,"Aksanli, Baris",1.0,M
1,"Allos, Haytham Issa",0.815333,M
2,"Altintas De Callaf, Ilkay",0.794484,W
3,"Alvarado, Christine J.",0.975818,W
4,"Anderson, James W.",0.6,M


In [21]:
df_cse_grade.head()

Unnamed: 0,prof,expected,received,gender
0,"Aksanli, Baris",3.18,2.84,M
1,"Allos, Haytham Issa",3.543333,3.268333,M
2,"Altintas De Callaf, Ilkay",3.511667,3.248333,W
3,"Alvarado, Christine J.",3.531429,3.124524,W
4,"Arsanjani, Ali Paul",3.255,3.055,M


<a id='data_analysis'></a>
# Data Analysis
* TODO: run a statistical test comparing men and women professors on each variable, then report the results with tables and graphs.
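
The test for this section has not been chosen yet. As one possible option, and only as an assumption on our part, Welch's t-test compares the means of two groups without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` and the sample rates are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b  # squared standard errors of each mean
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = diff / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof

# Hypothetical recommendation rates, not taken from the CAPE data
women_rates = [0.9, 0.8, 1.0]
men_rates = [0.6, 0.7, 0.5]
t_stat, dof = welch_t(women_rates, men_rates)
print(t_stat, dof)
```

On the real data, the two samples would be the `rec_percent` values for `gender == "W"` and `gender == "M"`. A p-value requires the t distribution, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides if SciPy is available.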

<a id='conclusion'></a>
# Conclusion