# Simpson's Paradox
Use this Jupyter notebook to analyze `admission_data.csv` and compute the values listed below; you will need them for the quizzes that follow. Indexing, `query`, and `groupby` may come in handy!

- Proportion and admission rate for each gender
- Proportion and admission rate for physics majors of each gender
- Proportion and admission rate for chemistry majors of each gender
- Admission rate for each major

In [1]:
import pandas as pd

In [5]:
# Load and view first few lines of dataset
data = pd.read_csv('admission_data.csv')
data.head()

| Unnamed: 0 | student_id | gender | major | admitted |
|---|---|---|---|---|
| 0 | 35377 | female | Chemistry | False |
| 1 | 56105 | male | Physics | True |
| 2 | 31441 | female | Chemistry | False |
| 3 | 51765 | male | Physics | True |
| 4 | 53714 | female | Physics | True |


### Proportion and admission rate for each gender

In [32]:
# Total number of students
len(data)

500

In [26]:
# Number of male and female students
data["gender"].value_counts()

female    257
male      243
Name: gender, dtype: int64

In [27]:
# Proportion of students that are male and female
data["gender"].value_counts() / len(data)

female    0.514
male      0.486
Name: gender, dtype: float64
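
The division by `len(data)` above can also be left to pandas. The sketch below assumes a stand-in `gender` Series rebuilt from the counts printed above (257 female, 243 male) rather than the real CSV, and uses the `normalize=True` option of `value_counts`.

```python
import pandas as pd

# Stand-in for data["gender"], rebuilt from the counts printed above
# (an assumption for illustration; the real notebook reads admission_data.csv)
gender = pd.Series(["female"] * 257 + ["male"] * 243, name="gender")

# normalize=True returns each value's share of the total instead of raw counts
proportions = gender.value_counts(normalize=True)
print(proportions)
```

On the real data, `data["gender"].value_counts(normalize=True)` should give the same proportions as the cell above.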

In [24]:
# Admissions for males and females
data.groupby(["gender","admitted"]).size()

gender  admitted
female  False       183
        True         74
male    False       125
        True        118
dtype: int64

In [30]:
# Admission rate for males
len(data[(data['gender']=='male') & (data['admitted'])])/(
    len(data[data['gender']=='male']))

0.48559670781893005

In [29]:
# Admission rate for females
len(data[(data['gender']=='female') & (data['admitted'])])/(
    len(data[data['gender']=='female']))

0.28793774319066145
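
Because `admitted` is a boolean column, its mean is the fraction of `True` values, so both admission rates above can be obtained with one `groupby`. This is a minimal sketch on a stand-in frame rebuilt from the gender/admitted counts printed earlier, not on the real CSV.

```python
import pandas as pd

# Stand-in frame rebuilt from the groupby(["gender", "admitted"]).size() output above
# (an assumption for illustration; the real notebook reads admission_data.csv)
counts = {
    ("female", True): 74,
    ("female", False): 183,
    ("male", True): 118,
    ("male", False): 125,
}
rows = [(g, a) for (g, a), n in counts.items() for _ in range(n)]
data = pd.DataFrame(rows, columns=["gender", "admitted"])

# The mean of a boolean column is the share of True values, i.e. the admission rate
rate_by_gender = data.groupby("gender")["admitted"].mean()
print(rate_by_gender)
```

The values should match the two rates computed cell by cell above.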

### Proportion and admission rate for physics majors of each gender

In [38]:
# What proportion of female students are majoring in physics?
fem_physics = len(data.query("gender == 'female' & major == 'Physics'")) / len(data.query("gender == 'female'"))
fem_physics

0.12062256809338522

In [40]:
# What proportion of male students are majoring in physics?
male_physics = len(data.query("gender == 'male' & major == 'Physics'")) / len(data.query("gender == 'male'"))
male_physics

0.92592592592592593

In [43]:
# Admission rate for female physics majors
len(data[(data["gender"]=='female') & (data["major"] == 'Physics') & data["admitted"]]) / len(data[(data["gender"]=='female') & (data["major"] == 'Physics')])

0.7419354838709677

In [44]:
# Admission rate for male physics majors
len(data[(data["gender"]=='male') & (data["major"] == 'Physics') & data["admitted"]]) / len(data[(data["gender"]=='male') & (data["major"] == 'Physics')])

0.5155555555555555

### Proportion and admission rate for chemistry majors of each gender

In [39]:
# What proportion of female students are majoring in chemistry?
fem_chemistry = len(data.query("gender == 'female' & major == 'Chemistry'")) / len(data.query("gender == 'female'"))
fem_chemistry

0.87937743190661477

In [42]:
# What proportion of male students are majoring in chemistry?
male_chemistry = len(data.query("gender == 'male' & major == 'Chemistry'")) / len(data.query("gender == 'male'"))
male_chemistry

0.07407407407407407

In [45]:
# Admission rate for female chemistry majors
len(data[(data["gender"]=='female') & (data["major"] == 'Chemistry') & data["admitted"]]) / len(data[(data["gender"]=='female') & (data["major"] == 'Chemistry')])

0.22566371681415928

In [46]:
# Admission rate for male chemistry majors
len(data[(data["gender"]=='male') & (data["major"] == 'Chemistry') & data["admitted"]]) / len(data[(data["gender"]=='male') & (data["major"] == 'Chemistry')])

0.1111111111111111
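
The eight per-gender, per-major values in the two sections above can also be produced as two small tables. The sketch below is one possible approach, run on a stand-in frame whose counts are reconstructed from the outputs printed in this notebook (an assumption, since the CSV itself is not shown); `major_share` and `rate` are illustrative names.

```python
import pandas as pd

# (gender, major, admitted, number of students), reconstructed from the printed outputs
counts = [
    ("female", "Physics", True, 23),
    ("female", "Physics", False, 8),
    ("female", "Chemistry", True, 51),
    ("female", "Chemistry", False, 175),
    ("male", "Physics", True, 116),
    ("male", "Physics", False, 109),
    ("male", "Chemistry", True, 2),
    ("male", "Chemistry", False, 16),
]
rows = [(g, m, a) for g, m, a, n in counts for _ in range(n)]
data = pd.DataFrame(rows, columns=["gender", "major", "admitted"])

# Share of each gender's students in each major (each row sums to 1)
major_share = pd.crosstab(data["gender"], data["major"], normalize="index")

# Admission rate for every gender/major combination
rate = data.groupby(["gender", "major"])["admitted"].mean().unstack("major")

print(major_share)
print(rate)
```

Reading the two tables side by side shows the pattern the quizzes ask about: which major each gender mostly applies to, and how the admission rates compare within each major.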

### Admission rate for each major

In [47]:
# Admission rate for physics majors
len(data[(data['major'] == 'Physics') & data['admitted']]) / len(data[(data['major'] == 'Physics')])

0.54296875

In [48]:
# Admission rate for chemistry majors
len(data[(data['major'] == 'Chemistry') & data['admitted']]) / len(data[(data['major'] == 'Chemistry')])

0.21721311475409835
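
Taken together, the results above show the paradox: women have the higher admission rate within each major, yet the lower rate overall. One way to see why is that each gender's overall rate is a weighted average of its per-major rates, with weights equal to the share of that gender's students in each major. The sketch below checks this with counts reconstructed from the outputs above (an assumption, not values read from the CSV); `overall_rate` is a hypothetical helper.

```python
# (students, admitted) per gender and major, reconstructed from the outputs above
counts = {
    "female": {"Physics": (31, 23), "Chemistry": (226, 51)},
    "male": {"Physics": (225, 116), "Chemistry": (18, 2)},
}


def overall_rate(by_major):
    """Weighted average of per-major admission rates.

    Each major's weight is its share of this gender's students.
    """
    total = sum(n for n, _ in by_major.values())
    return sum((n / total) * (k / n) for n, k in by_major.values())


for gender, by_major in counts.items():
    per_major = {major: round(k / n, 3) for major, (n, k) in by_major.items()}
    print(gender, per_major, "overall:", round(overall_rate(by_major), 3))
```

Most female students are in chemistry, the major with the lower admission rate, while most male students are in physics, so the weights pull the female overall rate down even though it is higher in both majors.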