# Simpson's Paradox
Use `admission_data.csv` for this exercise.

In [1]:
# Load the dataset and view its first few rows
import pandas as pd

df = pd.read_csv("admission_data.csv")

print(df.head())

   student_id  gender      major  admitted
0       35377  female  Chemistry     False
1       56105    male    Physics      True
2       31441  female  Chemistry     False
3       51765    male    Physics      True
4       53714  female    Physics      True
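
The later boolean masks (`df.admitted == True`), and any rate computed as a mean, assume that `admitted` was parsed as a boolean column. The minimal check below uses an inline CSV built from the header and first two rows shown above. The `io.StringIO` stand-in is an assumption so the snippet runs without the file.

```python
import io

import pandas as pd

# Stand-in for admission_data.csv: the header plus the first two rows shown above
csv_text = """student_id,gender,major,admitted
35377,female,Chemistry,False
56105,male,Physics,True
"""

df = pd.read_csv(io.StringIO(csv_text))

# By default, read_csv turns the literal strings True/False into a bool column
print(df.dtypes)
```

If `df.dtypes` reports `object` rather than `bool` for `admitted` on the real file, the column would need converting before the masks below behave as intended.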


### Proportion and admission rate for each gender

In [2]:
# Total number of students (one row per applicant, admitted or not)
admission_count = df.shape[0]
print(admission_count)

500


In [3]:
# Number of female students
female_count = df.loc[df.gender == 'female'].shape[0]

# Proportion of students that are female
female_proportion = female_count / admission_count
print(female_proportion)

0.514


In [4]:
# Number of male students
male_count = df.loc[df.gender == 'male'].shape[0]

# Proportion of students that are male
male_proportion = male_count / admission_count
print(male_proportion)

0.486
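
Both proportions can also come from a single call. `value_counts(normalize=True)` divides each category's count by the number of rows. Here is a sketch on a small hypothetical column, not the real dataset:

```python
import pandas as pd

# Hypothetical toy data, not admission_data.csv: 6 female and 4 male students
df = pd.DataFrame({"gender": ["female"] * 6 + ["male"] * 4})

# normalize=True returns shares of the total instead of raw counts
gender_proportions = df["gender"].value_counts(normalize=True)
print(gender_proportions["female"])  # 0.6
print(gender_proportions["male"])    # 0.4
```

Applied to `admission_data.csv`, the same call would return the 0.514 / 0.486 split computed above, because it performs the same count-then-divide.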


In [5]:
# total number of admitted females
female_admitted = df.loc[(df.gender == 'female') & (df.admitted == True)].shape[0]

# Admission rate for females (rate of females that have been admitted)
female_admission_rate = female_admitted / female_count
female_admission_rate = round(female_admission_rate, 6)
print(female_admission_rate)

0.287938


In [6]:
# Total number of admitted males (all majors)
male_admitted = df.loc[(df.gender == 'male') & (df.admitted == True)].shape[0]

# Admission rate for males (rate of males that have been admitted)
male_admission_rate = male_admitted / male_count
male_admission_rate = round(male_admission_rate, 6)
print(male_admission_rate)

0.485597
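
Because `admitted` is boolean, the mean of that column within a group equals the share of `True` values, which is the admission rate. `groupby` therefore yields both gender rates at once. The toy data below is hypothetical and was constructed only to mimic the pattern in this notebook:

```python
import pandas as pd

# Hypothetical toy data (not admission_data.csv), one row per applicant
rows = (
    [("female", "Physics", True)] * 2
    + [("female", "Chemistry", True)] * 2
    + [("female", "Chemistry", False)] * 6
    + [("male", "Physics", True)] * 6
    + [("male", "Physics", False)] * 2
    + [("male", "Chemistry", False)] * 2
)
df = pd.DataFrame(rows, columns=["gender", "major", "admitted"])

# Mean of a boolean column = fraction of True values = admission rate
gender_rates = df.groupby("gender")["admitted"].mean()
print(gender_rates)
```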


### By only looking at gender and admission rates, who appears to be favored in the admissions process?
Males, who were admitted at a rate of 48.6%, compared with 28.8% for females.

### Proportion and admission rate for physics majors of each gender

In [7]:
# Total number of females majoring in physics
female_physics_count = df.loc[(df.major == 'Physics') & (df.gender == 'female')].shape[0]

# What proportion of female students are majoring in physics?
female_physics_proportion = female_physics_count / female_count
female_physics_proportion = round(female_physics_proportion, 3)
print(female_physics_proportion)

0.121


In [8]:
# Total number of males majoring in physics
male_physics_count = df.loc[(df.major == 'Physics') & (df.gender == 'male')].shape[0]

# What proportion of male students are majoring in physics?
male_physics_proportion = male_physics_count / male_count
male_physics_proportion = round(male_physics_proportion, 3)
print(male_physics_proportion)

0.926


### Who tends to have more physics majors than chemistry majors?

Males: 92.6% of male students are physics majors, compared with 12.1% of female students.
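
`pd.crosstab` with `normalize="index"` gives, for each gender, the share of students in each major. Every row sums to 1, so one table holds the physics and chemistry proportions for both genders. The sketch below uses hypothetical toy data, not the real file:

```python
import pandas as pd

# Hypothetical toy data (not admission_data.csv), one row per applicant
rows = (
    [("female", "Physics", True)] * 2
    + [("female", "Chemistry", True)] * 2
    + [("female", "Chemistry", False)] * 6
    + [("male", "Physics", True)] * 6
    + [("male", "Physics", False)] * 2
    + [("male", "Chemistry", False)] * 2
)
df = pd.DataFrame(rows, columns=["gender", "major", "admitted"])

# Rows: gender; columns: major; each row is normalized to sum to 1
major_shares = pd.crosstab(df["gender"], df["major"], normalize="index")
print(major_shares)
```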

In [9]:
# Total number of females majoring in physics who have been admitted
female_physics_admitted = df.loc[(df.major == 'Physics') & 
                                 (df.gender == 'female') & 
                                 (df.admitted == True)].shape[0]

# Admission rate for female physics majors
female_physics_admitted_rate = female_physics_admitted / female_physics_count
female_physics_admitted_rate = round(female_physics_admitted_rate, 3)
print(female_physics_admitted_rate)

0.742


In [10]:
# Total number of males majoring in physics who have been admitted
male_physics_admitted = df.loc[(df.major == 'Physics') & 
                               (df.gender == 'male') & 
                               (df.admitted == True)].shape[0]

# Admission rate for male physics majors
male_physics_admitted_rate = male_physics_admitted / male_physics_count
male_physics_admitted_rate = round(male_physics_admitted_rate, 3)
print(male_physics_admitted_rate)

0.516


### Of the students applying as physics majors, who appears to be favored in the admissions process?

Females: female physics majors were admitted at a rate of 74.2%, while male physics majors were admitted at a rate of 51.6%.
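
Grouping on two columns gives the admission rate for every (major, gender) pair in one step. `unstack` then moves gender into the columns, which makes the within-major comparison easy to read. The sketch below uses hypothetical toy data, not the real file:

```python
import pandas as pd

# Hypothetical toy data (not admission_data.csv), one row per applicant
rows = (
    [("female", "Physics", True)] * 2
    + [("female", "Chemistry", True)] * 2
    + [("female", "Chemistry", False)] * 6
    + [("male", "Physics", True)] * 6
    + [("male", "Physics", False)] * 2
    + [("male", "Chemistry", False)] * 2
)
df = pd.DataFrame(rows, columns=["gender", "major", "admitted"])

# Admission rate per (major, gender); unstack puts gender in the columns
rates = df.groupby(["major", "gender"])["admitted"].mean().unstack("gender")
print(rates)

# True for each major where the female rate exceeds the male rate
print(rates["female"] > rates["male"])
```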

### Proportion and admission rate for chemistry majors of each gender

In [11]:
# Total number of females majoring in chemistry
female_chemistry_count = df.loc[(df.major == 'Chemistry') & (df.gender == 'female')].shape[0]

# What proportion of female students are majoring in chemistry?
female_chemistry_proportion = female_chemistry_count / female_count
female_chemistry_proportion = round(female_chemistry_proportion, 3)
print(female_chemistry_proportion)

0.879


In [12]:
# Total number of males majoring in chemistry
male_chemistry_count = df.loc[(df.major == 'Chemistry') & (df.gender == 'male')].shape[0]

# What proportion of male students are majoring in chemistry?
male_chemistry_proportion = male_chemistry_count / male_count
male_chemistry_proportion = round(male_chemistry_proportion, 3)
print(male_chemistry_proportion)

0.074


### Who tends to have more chemistry majors than physics majors?
Females: 87.9% of female students are chemistry majors, compared with 7.4% of male students.

In [13]:
# Total number of females majoring in chemistry who have been admitted
female_chemistry_admitted = df.loc[(df.major == 'Chemistry') & 
                                   (df.gender == 'female') & 
                                   (df.admitted == True)].shape[0]

# Admission rate for female chemistry majors
female_chemistry_admitted_rate = female_chemistry_admitted / female_chemistry_count
female_chemistry_admitted_rate = round(female_chemistry_admitted_rate, 3)
print(female_chemistry_admitted_rate)

0.226


In [14]:
# Total number of males majoring in chemistry who have been admitted
male_chemistry_admitted = df.loc[(df.major == 'Chemistry') & 
                                 (df.gender == 'male') & 
                                 (df.admitted == True)].shape[0]

# Admission rate for male chemistry majors
male_chemistry_admitted_rate = male_chemistry_admitted / male_chemistry_count
male_chemistry_admitted_rate = round(male_chemistry_admitted_rate, 3)
print(male_chemistry_admitted_rate)

0.111


### Of the students applying as chemistry majors, who appears to be favored in the admissions process?
Females: female chemistry majors were admitted at a rate of 22.6%, while male chemistry majors were admitted at a rate of 11.1%.
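
The cells above repeat the same filter, count, and divide pattern. One way to factor it out is a small helper. `admission_rate` and its keyword-argument interface are hypothetical names for this sketch and are not part of pandas. The toy data is also hypothetical.

```python
import pandas as pd


def admission_rate(frame: pd.DataFrame, **conditions: object) -> float:
    """Hypothetical helper: admission rate among rows matching every condition.

    Each keyword is a column name and its value the required entry,
    e.g. admission_rate(df, major="Chemistry", gender="female").
    Returns NaN if no rows match.
    """
    mask = pd.Series(True, index=frame.index)
    for column, value in conditions.items():
        mask &= frame[column] == value
    # Mean of the boolean 'admitted' column over the matching rows
    return float(frame.loc[mask, "admitted"].mean())


# Hypothetical toy data (not admission_data.csv), one row per applicant
rows = (
    [("female", "Physics", True)] * 2
    + [("female", "Chemistry", True)] * 2
    + [("female", "Chemistry", False)] * 6
    + [("male", "Physics", True)] * 6
    + [("male", "Physics", False)] * 2
    + [("male", "Chemistry", False)] * 2
)
df = pd.DataFrame(rows, columns=["gender", "major", "admitted"])

print(admission_rate(df, major="Chemistry", gender="female"))  # 0.25
print(admission_rate(df, major="Chemistry", gender="male"))    # 0.0
print(admission_rate(df, gender="female"))                     # 0.4
```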

### Admission rate for each major

In [15]:
# Total number of physics majors (both genders)
physics_count = df.loc[df.major == 'Physics'].shape[0]

# Total number of physics majors (both genders) who have been admitted
physics_admitted = df.loc[(df.major == 'Physics') & (df.admitted == True)].shape[0]

# Admission rate for physics majors
physics_admission_rate = physics_admitted / physics_count
physics_admission_rate = round(physics_admission_rate, 3)
print(physics_admission_rate)

0.543


In [16]:
# Total number of chemistry majors (both genders)
chemistry_count = df.loc[df.major == 'Chemistry'].shape[0]

# Total number of chemistry majors (both genders) who have been admitted
chemistry_admitted = df.loc[(df.major == 'Chemistry') & (df.admitted == True)].shape[0]

# Admission rate for chemistry majors
chemistry_admission_rate = chemistry_admitted / chemistry_count
chemistry_admission_rate = round(chemistry_admission_rate, 3)
print(chemistry_admission_rate)

0.217


### Which major has a lower admission rate?
Chemistry: its admission rate is 21.7%, compared with 54.3% for physics.
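
Grouping on `major` alone gives both per-major rates at once. `idxmin` then returns the label of the major with the lower rate. The sketch below uses hypothetical toy data, not the real file:

```python
import pandas as pd

# Hypothetical toy data (not admission_data.csv), one row per applicant
rows = (
    [("female", "Physics", True)] * 2
    + [("female", "Chemistry", True)] * 2
    + [("female", "Chemistry", False)] * 6
    + [("male", "Physics", True)] * 6
    + [("male", "Physics", False)] * 2
    + [("male", "Chemistry", False)] * 2
)
df = pd.DataFrame(rows, columns=["gender", "major", "admitted"])

# Admission rate per major, pooling both genders
major_rates = df.groupby("major")["admitted"].mean()
print(major_rates)
print(major_rates.idxmin())  # Chemistry
```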

## Reflect

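Most female applicants (87.9%) applied to chemistry, the major with the much lower admission rate (21.7%, versus 54.3% for physics), while most male applicants (92.6%) applied to physics. As a result, females had a lower overall admission rate, even though they were admitted at a higher rate than males within both physics and chemistry. This reversal is known as **Simpson's Paradox**.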

- Overall, males were admitted at a rate of 48.6%, while females were admitted at a rate of 28.8%.
- However, among physics majors, females were admitted at a rate of 74.2%, while males were admitted at a rate of 51.6%.
- Likewise, among chemistry majors, females were admitted at a rate of 22.6%, while males were admitted at a rate of 11.1%.
- In Simpson's paradox, a comparison that holds within every subgroup reverses when the subgroups are combined, so the same data appear to support two opposite conclusions.
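
The reversal follows from the way each gender's overall rate is built. It is a weighted average of that gender's per-major rates, weighted by how its applicants are split across majors (the law of total probability). Plugging in the rounded figures printed in the cells above reproduces the overall rates up to rounding. The dictionary layout and `overall_rate` name are choices made only for this sketch.

```python
# (share of that gender in the major, admission rate within the major),
# using the rounded values printed in the cells above
female = {"Physics": (0.121, 0.742), "Chemistry": (0.879, 0.226)}
male = {"Physics": (0.926, 0.516), "Chemistry": (0.074, 0.111)}


def overall_rate(by_major: dict[str, tuple[float, float]]) -> float:
    """Weighted average of per-major rates (law of total probability)."""
    return sum(share * rate for share, rate in by_major.values())


print(round(overall_rate(female), 3))  # 0.288, vs. 0.287938 computed above
print(round(overall_rate(male), 3))    # 0.486, vs. 0.485597 computed above
```

Females put 87.9% of their weight on chemistry's low rate, while males put 92.6% of theirs on physics's high rate. This pulls the female average down even though it is higher within each major.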