Case Study in Python
Use the Jupyter notebook to analyze `admission_data.csv` and find the following values, which you will need for the quizzes below. Indexing, `query`, and `groupby` may come in handy!

- Proportion and admission rate for each gender
- Proportion and admission rate for physics majors of each gender
- Proportion and admission rate for chemistry majors of each gender
- Admission rate for each major

# Simpson's Paradox
Use `admission_data.csv` for this exercise.

In [1]:
# Load and view first few lines of dataset
import pandas as pd
import numpy as np
data=pd.read_csv('admission_data.csv')
data.head()

Unnamed: 0,student_id,gender,major,admitted
0,35377,female,Chemistry,False
1,56105,male,Physics,True
2,31441,female,Chemistry,False
3,51765,male,Physics,True
4,53714,female,Physics,True


### Proportion and admission rate for each gender

In [2]:
# Total number of students
val=len(data['gender'])

In [10]:
# Percentage of students of each gender
data['gender'].value_counts()/val*100

female    51.4
male      48.6
Name: gender, dtype: float64

In [16]:
# Proportion of students that are female:
len(data[data['gender']=='female'])/len(data['gender'])

0.514

In [17]:
# Proportion of students that are male:
1-len(data[data['gender']=='female'])/len(data['gender'])

0.486

In [18]:
# Overall percentage of students admitted (True) and not admitted (False):
data['admitted'].value_counts()*100/len(data)

False    61.6
True     38.4
Name: admitted, dtype: float64

In [31]:
# Admission rate for female students:
len(data[(data['gender']=='female') & (data['admitted'])])/len(data[data['gender']=='female'])
# or:
data[data['gender'] == 'female']['admitted'].mean()

0.28793774319066145

In [37]:
# Admission rate for male students:
len(data[(data['gender']=='male') & (data['admitted'])])/len(data[data['gender']=='male'])
# or
data[data['gender']=='male']['admitted'].mean()

0.48559670781893005
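
The introduction suggests `groupby`, which the cells above do not use. As a minimal sketch, the proportion and admission rate for each gender could be computed together as shown here. The `toy` DataFrame is a hypothetical stand-in with the same columns as `admission_data.csv` (its rows are made up), so the printed numbers are not the ones from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for admission_data.csv: same column names, made-up rows.
toy = pd.DataFrame({
    "gender":   ["female", "female", "female", "male", "male"],
    "major":    ["Chemistry", "Chemistry", "Physics", "Physics", "Physics"],
    "admitted": [False, True, True, True, False],
})

# Share of each gender among all applicants.
proportion = toy["gender"].value_counts(normalize=True)

# The mean of a boolean column is the fraction of True values,
# so the per-group mean of `admitted` is the admission rate.
admission_rate = toy.groupby("gender")["admitted"].mean()

# Both Series are indexed by gender, so they align row by row.
summary = pd.DataFrame({"proportion": proportion, "admission_rate": admission_rate})
print(summary)
```

Replacing `toy` with the `data` DataFrame loaded earlier should give the same values as the cells above.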

### Proportion and admission rate for physics majors of each gender

In [57]:
# What proportion of female students are majoring in physics?
len(data[(data['gender']=='female') & (data['major']=='Physics')])/len(data[data['gender']=='female'])
# or:
len(data.query('gender=="female" and major=="Physics"'))/len(data[data['gender'] == 'female'])



0.12062256809338522

In [None]:
# Admission rate for male students (same calculation as in the previous section)
len(data[(data['gender']=='male') & (data['admitted'])])/(len(data[data['gender']=='male']))

In [62]:
# What proportion of male students are majoring in physics?

len(data[(data['gender']=='male') & (data['major']=='Physics')])/len(data[data['gender']=='male'])
# or:
len(data.query('gender=="male" and major=="Physics"'))/len(data[data['gender'] == 'male'])


0.92592592592592593

In [66]:
# Admission rate for female physics majors
# Of the female students who applied to physics, the fraction who were admitted:
fem_admission_phy=len(data.query('gender=="female" and admitted and major=="Physics"'))
fem_phy=len(data.query('gender=="female" and major=="Physics"'))
fem_admission_phy/fem_phy



0.74193548387096775

In [70]:
# Admission rate for male physics majors
mal_admission_phy=len(data.query('gender=="male" and admitted and major=="Physics"'))
mal_phy=len(data.query('gender=="male" and major=="Physics"'))
# Divide by the number of male physics applicants, not female ones
mal_admission_phy/mal_phy

0.5155555555555555

### Proportion and admission rate for chemistry majors of each gender

In [68]:
# What proportion of female students are majoring in chemistry?
len(data[(data['gender']=='female') & (data['major']=='Chemistry')])/len(data[data['gender']=='female'])


0.8793774319066148

In [69]:
# What proportion of male students are majoring in chemistry?
# The denominator must be the number of male students, not female students
len(data[(data['gender']=='male') & (data['major']=='Chemistry')])/len(data[data['gender']=='male'])


0.07407407407407407

In [74]:
# Admission rate for female chemistry majors
fem_admission_chem=len(data.query('gender=="female" and admitted and major=="Chemistry"'))
fem_chem=len(data.query('gender=="female" and major=="Chemistry"'))
fem_admission_chem/fem_chem

0.22566371681415928

In [73]:
# Admission rate for male chemistry majors
mal_admission_chem=len(data.query('gender=="male" and admitted and major=="Chemistry"'))
mal_chem=len(data.query('gender=="male" and major=="Chemistry"'))
mal_admission_chem/mal_chem

0.1111111111111111

### Admission rate for each major

In [81]:
# Admission rate for physics majors
len(data[(data['major']=='Physics') & (data['admitted']==True)])/len(data[data['major']=='Physics'])
# or:
data[data['major'] == "Physics"]['admitted'].mean()

0.54296875

In [82]:
# Admission rate for Chemistry majors
len(data[(data['major']=='Chemistry') & (data['admitted']==True)])/len(data[data['major']=='Chemistry'])
# or:
data[data['major'] == "Chemistry"]['admitted'].mean()


0.21721311475409835
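
The per-gender, per-major figures computed one cell at a time above can also be produced in a single pass. The sketch below is one possible approach, using `groupby` and `pd.crosstab`. The counts in `cells` are hypothetical and chosen only so that the toy data shows the same kind of reversal as the case study; they are not taken from `admission_data.csv`.

```python
import pandas as pd

# Hypothetical applicant counts, chosen only to illustrate the pattern:
# (gender, major, number of applicants, number admitted)
cells = [
    ("female", "Physics",   2, 2),
    ("female", "Chemistry", 8, 2),
    ("male",   "Physics",   8, 6),
    ("male",   "Chemistry", 2, 0),
]

# Expand the counts into one row per applicant, like admission_data.csv.
rows = []
for gender, major, applied, admitted in cells:
    rows += [(gender, major, i < admitted) for i in range(applied)]
toy = pd.DataFrame(rows, columns=["gender", "major", "admitted"])

# Admission rate within each (gender, major) cell, with majors as columns.
rate_by_major = toy.groupby(["gender", "major"])["admitted"].mean().unstack("major")

# Share of each gender's applicants going to each major (each row sums to 1).
major_mix = pd.crosstab(toy["gender"], toy["major"], normalize="index")

# Overall admission rate per gender, ignoring major.
overall = toy.groupby("gender")["admitted"].mean()

print(rate_by_major, major_mix, overall, sep="\n\n")
```

In this toy data, female applicants have the higher admission rate within each major (1.0 versus 0.75 in physics, 0.25 versus 0.0 in chemistry) but the lower overall rate (0.4 versus 0.6), because 80% of them apply to the major with the lower admission rate.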

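Far more female applicants chose chemistry (about 88% of them), which had a much lower admission rate than physics (about 22% versus 54%). As a result, female applicants had a lower overall admission rate than male applicants (about 29% versus 49%), even though they had a higher admission rate than male applicants within both physics and chemistry. This reversal between the aggregated comparison and the per-major comparison is known as Simpson's Paradox.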
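
To make the mechanism concrete, a group's overall admission rate is the average of its per-major rates, weighted by the share of the group applying to each major. The sketch below checks this for female applicants, using only values copied from the notebook outputs above; the variable names are introduced here for illustration.

```python
# Values copied from the notebook outputs above (female applicants only).
share_physics = 0.12062256809338522    # proportion of female students in physics
share_chemistry = 0.8793774319066148   # proportion of female students in chemistry
rate_physics = 0.74193548387096775     # admission rate, female physics majors
rate_chemistry = 0.22566371681415928   # admission rate, female chemistry majors

# Weighted average of the per-major rates, weighted by major share.
overall_female = share_physics * rate_physics + share_chemistry * rate_chemistry

print(round(overall_female, 4))  # 0.2879, matching the overall female admission rate
```

Because roughly 88% of the weight falls on the chemistry rate, the overall female rate ends up close to the chemistry figure despite the high admission rate in physics.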