# List Comprehension Lab

### Introduction

In this lesson, we'll work with [this dataset](https://www.kaggle.com/datasets/rkiattisak/student-performance-in-mathematics/) on student performance in math, reading, and writing at a US high school. 

Let's get started.

### Loading our Data

In [7]:
import pandas as pd

students_df = pd.read_csv('./exams.csv')
students = students_df.to_dict('records')

students[:1]

[{'gender': 'female',
  'race/ethnicity': 'group D',
  'parental level of education': 'some college',
  'lunch': 'standard',
  'test preparation course': 'completed',
  'math score': 59,
  'reading score': 70,
  'writing score': 78}]

Begin by selecting the first student, and displaying the keys.

In [8]:
students[0].keys()

dict_keys(['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course', 'math score', 'reading score', 'writing score'])

Ok, so we identify the grain of the data, by seeing that this indicates information *per student*.

### Selecting attributes

Ok, so now let's use list comprehension to explore certain attributes of our data.  For example, let's create a list of math scores.

In [9]:
math_scores = [student['math score'] for student in students]

math_scores[:3]

[59, 96, 57]

And then find the maximum and minimum math score of our students.

In [10]:
min_score = min(math_scores)
min_score

# 15

15

> Well, maybe he had a bad day.

In [12]:
max_score = max(math_scores)
max_score

# 100

100

> That's the spirit.

Ok, let's explore some additional data.  Use list comprehension to select each parent's education.

In [13]:
parent_educations = [student['parental level of education'] for student in students]
parent_educations[:3]

['some college', "associate's degree", 'some college']

And then let's see all of the different kinds of education listed.

In [16]:
print(set(parent_educations))

# {'high school', 'some high school', "master's degree", "bachelor's degree",
# "associate's degree", 'some college'}

{'high school', 'some high school', "master's degree", "bachelor's degree", "associate's degree", 'some college'}


### Filtering with list comprehension

Ok, now let's get a sense of how strongly parental education is associated with student test scores.  

Use list comprehension to select the students whose parents had `high school` or `some high school`.

In [17]:
parental_hs_ed_students = [student for student in students if student['parental level of education'] == 'high school' or student['parental level of education'] == 'some high school']

In [23]:
print(parental_hs_ed_students[:1])

# [{'gender': 'male', 'race/ethnicity': 'group C',
# 'parental level of education': 'some high school',
# 'lunch': 'standard', 'test preparation course': 'none',
# 'math score': 68, 'reading score': 57, 'writing score': 54}]

[{'gender': 'male', 'race/ethnicity': 'group C', 'parental level of education': 'some high school', 'lunch': 'standard', 'test preparation course': 'none', 'math score': 68, 'reading score': 57, 'writing score': 54}]


In [29]:
len(parental_hs_ed_students)

392

And then select those student whose parents either have a bachelors, associates, or master's degree.

In [24]:
parental_col_grad_students = [student for student in students if student['parental level of education'] == 'associates' \
                              or student['parental level of education'] == 'bachelors' or student['parental level of education'] == "master's degree"]

In [27]:
parental_col_grad_students[:1]

# [{'gender': 'male', 'race/ethnicity': 'group B',
#   'parental level of education': "master's degree",
#   'lunch': 'standard', 'test preparation course': 'none', 'math score': 53, 'reading score': 50, 'writing score': 49}]


[{'gender': 'male', 'race/ethnicity': 'group B', 'parental level of education': "master's degree", 'lunch': 'standard', 'test preparation course': 'none', 'math score': 53, 'reading score': 50, 'writing score': 49}]


In [28]:
len(parental_col_grad_students)

# 75

75

Ok, so right off of the bat we can see that only 16 percent of students' parents had a college education.

In [32]:
round(75/(392 + 75), 2)

0.16

Ok, so now we've selected two groups of students.

In [33]:
# parental_col_grad_students

# parental_hs_ed_students

Now select the math scores of the our college grad students.

In [37]:
par_col_grad_math_scores = [student['math score'] for student in parental_col_grad_students]
par_col_grad_math_scores[:3]

[53, 55, 56]

And from there we can find the average score by adding up all of the scores and dividing by their length.

In [38]:
sum(par_col_grad_math_scores)/len(par_col_grad_math_scores)

71.02666666666667

Ok, so an average score of 71.

Now, it's your turn.  Select the math scores from the `parental_hs_ed_students`, and then find the average score.

In [39]:
parental_hs_ed_math_scores = [student['math score'] for student in parental_hs_ed_students]
parental_hs_ed_math_scores[:3]

[68, 46, 80]

In [40]:
sum(parental_hs_ed_math_scores)/len(parental_hs_ed_math_scores)

# 64.84693877551021

64.84693877551021

Ok, so we see a decrease of around 7 points on the math scores. 

### Summary

In this lesson, we practiced working with list comprehensions both with selecting data, and filtering our data with if statements. 