# Simpson's Paradox

Simpson's paradox is a phenomenon in probability and statistics in which a trend that appears in several groups of data disappears or reverses when the groups are combined. It is often encountered in social-science and medical-science statistics and is particularly problematic when frequency data are unduly given causal interpretations. The paradox can be resolved when causal relations are appropriately addressed in the statistical modeling. It is also referred to as Simpson's reversal, the Yule–Simpson effect, the amalgamation paradox, or the reversal paradox.
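Stated in symbols, as a sketch only: the notation below is introduced here for illustration and is not part of the original text. Let $A$ be the outcome, $g_1$ and $g_2$ the two groups being compared, and $Z$ the variable that splits the data into subgroups. The paradox is the situation where

$$
P(A \mid g_1, Z = z) > P(A \mid g_2, Z = z) \quad \text{for every subgroup } z,
\qquad \text{yet} \qquad
P(A \mid g_1) < P(A \mid g_2).
$$

Both statements can hold at once because each combined rate is a weighted average of the subgroup rates, and the weights differ between the two groups:

$$
P(A \mid g) = \sum_{z} P(A \mid g, Z = z)\, P(Z = z \mid g).
$$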

The way you choose to look at the data can lead to completely different results. Often, you can influence what people believe by how you choose to communicate your findings. With a better understanding of Simpson's paradox, you can recognize how people, intentionally or unintentionally, reach false conclusions through these choices.

It's prudent to be skeptical of your own results and the results of others. Moving forward, even when you feel confident about the statistics you use for your analysis, take a moment to reconsider how you are looking at the data and whether you chose wisely.

In [1]:
# Load and view first few lines of dataset
import pandas as pd
admits = pd.read_csv('admission_data.csv')
admits.head()

Unnamed: 0,student_id,gender,major,admitted
0,35377,female,Chemistry,False
1,56105,male,Physics,True
2,31441,female,Chemistry,False
3,51765,male,Physics,True
4,53714,female,Physics,True


### Proportion and admission rate for each gender

In [12]:
# Proportion of students that are female
print(len(admits[admits['gender'] == 'female']))
print(admits.shape[0])

# Select by label: a positional [0] on value_counts() depends on the sort order
prop_female = admits['gender'].value_counts()['female'] / admits.shape[0]
prop_female

257
500


0.514

In [13]:
# Proportion of students that are male
prop_male = admits['gender'].value_counts()['male'] / admits.shape[0]
prop_male

0.486

In [15]:
# Admission rate for females
# The mean of a boolean column is the share of True values
admits[admits['gender'] == 'female']['admitted'].mean()

0.28793774319066145

In [17]:
# Admission rate for males
# The mean of a boolean column is the share of True values
admits[admits['gender'] == 'male']['admitted'].mean()

0.48559670781893005

### Proportion and admission rate for physics majors of each gender

In [23]:
# What proportion of female students are majoring in physics?
# The denominator is all female students, not all physics majors
len(admits.query('gender == "female" and major == "Physics"')) / len(admits.query('gender == "female"'))

0.12062256809338522

In [31]:
# What proportion of male students are majoring in physics?
len(admits.query('gender == "male" and major == "Physics"')) / len(admits[admits['gender'] == 'male'])

0.9259259259259259

In [36]:
# Admission rate for female physics majors
# len() counts rows directly; .count()[0] relied on positional indexing of a label-indexed Series
fem_adm_phy = len(admits.query('major == "Physics" and gender == "female" and admitted == True'))
fem_phy = len(admits.query('major == "Physics" and gender == "female"'))

fem_adm_phy/fem_phy


0.7419354838709677

In [38]:
# Admission rate for male physics majors
male_adm_phy = len(admits.query('major == "Physics" and gender == "male" and admitted == True'))
male_phy = len(admits.query('major == "Physics" and gender == "male"'))

male_adm_phy/male_phy

0.5155555555555555

### Proportion and admission rate for chemistry majors of each gender

In [43]:
# What proportion of female students are majoring in chemistry?
len(admits.query('gender == "female" and major == "Chemistry"')) / len(admits.query('gender == "female"'))

0.8793774319066148

In [44]:
# What proportion of male students are majoring in chemistry?
len(admits.query('gender == "male" and major == "Chemistry"')) / len(admits.query('gender == "male"'))

0.07407407407407407

In [47]:
# Admission rate for female chemistry majors
# Number of female chemistry applicants who were admitted
fem_che_adm = len(admits.query('gender == "female" and major == "Chemistry" and admitted == True'))
# Total number of female chemistry applicants
fem_che = len(admits.query('gender == "female" and major == "Chemistry"'))

fem_che_adm/fem_che

0.22566371681415928

In [48]:
# Admission rate for male chemistry majors
# Number of male chemistry applicants who were admitted
male_che_adm = len(admits.query('gender == "male" and major == "Chemistry" and admitted == True'))
# Total number of male chemistry applicants
male_che = len(admits.query('gender == "male" and major == "Chemistry"'))

male_che_adm/male_che

0.1111111111111111
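The cell-by-cell calculations above can also be done in one pass with `groupby`. The sketch below is an illustration, not part of the original analysis. Because `admission_data.csv` is not available here, it rebuilds a stand-in frame, `admits_demo` (a hypothetical name), from applicant counts recovered from the rates printed above: for example, 23/31 gives 0.7419... and 116/225 gives 0.5155...

```python
import pandas as pd

# (major, gender, applicants, admitted) recovered from the printed rates;
# this rebuilt frame is an assumption standing in for admission_data.csv.
cells = [
    ("Physics", "female", 31, 23),
    ("Physics", "male", 225, 116),
    ("Chemistry", "female", 226, 51),
    ("Chemistry", "male", 18, 2),
]
rows = []
for major, gender, applied, admitted in cells:
    rows += [(gender, major, True)] * admitted
    rows += [(gender, major, False)] * (applied - admitted)
admits_demo = pd.DataFrame(rows, columns=["gender", "major", "admitted"])

# The mean of a boolean column is the share of True values
rate_by_gender = admits_demo.groupby("gender")["admitted"].mean()

# One row per major, one column per gender
rate_by_major_gender = (
    admits_demo.groupby(["major", "gender"])["admitted"].mean().unstack("gender")
)

print(rate_by_gender.round(3))
print(rate_by_major_gender.round(3))
```

On the real data, the same two `groupby` calls applied to `admits` should reproduce the rates computed one cell at a time.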

### Admission rate for each major

In [55]:
# Admission rate for physics majors
admits[admits['major']=='Physics']['admitted'].mean()

0.54296875

In [56]:
# Admission rate for chemistry majors
admits[admits['major']=='Chemistry']['admitted'].mean()

0.21721311475409835

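Far more female applicants applied to chemistry (about 88% of them, versus about 7% of male applicants), and chemistry had a much lower admission rate than physics (about 22% versus 54%). As a result, female applicants had a lower overall admission rate (about 29% versus 49% for males), even though they had a higher admission rate than males within both physics (74% versus 52%) and chemistry (23% versus 11%). This reversal is known as **Simpson's Paradox**.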
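To make the mechanism concrete, the sketch below recomputes each gender's overall rate as a weighted average of its per-major rates, where the weights are the shares of that gender's applicants in each major. The counts are assumptions recovered from the rates printed in the cells above, and `overall_rate` is a helper name introduced only for this illustration.

```python
# Counts recovered from the outputs above: {gender: {major: (applicants, admitted)}}
counts = {
    "female": {"Physics": (31, 23), "Chemistry": (226, 51)},
    "male": {"Physics": (225, 116), "Chemistry": (18, 2)},
}


def overall_rate(by_major):
    """Overall admission rate as a share-weighted average of per-major rates."""
    total = sum(applied for applied, _ in by_major.values())
    return sum(
        (applied / total) * (admitted / applied)  # applicant share * rate in major
        for applied, admitted in by_major.values()
    )


for gender, by_major in counts.items():
    total = sum(applied for applied, _ in by_major.values())
    for major, (applied, admitted) in by_major.items():
        print(f"{gender:6} {major:9} share={applied / total:.3f} rate={admitted / applied:.3f}")
    print(f"{gender:6} overall   rate={overall_rate(by_major):.3f}")
```

The female rate is higher in each major, but most of the female weight falls on the low-rate major, which pulls the female overall rate below the male one.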