# DTSC 580: Data Manipulation

## Assignment: High School Students Merging Practice

### Name: 

## Overview

In this assignment, you will work with multiple CSV files, with the goal of merging their information into a single DataFrame. The data is made up and describes four imaginary high schools. The files you will work with are:

- <u>central.csv</u>: list of students that attend Central High School along with their class scores
- <u>columbia.csv</u>: list of students that attend Columbia High School along with their class scores
- <u>eastside.csv</u>: list of students that attend Eastside High School along with their class scores
- <u>greenwich.csv</u>: list of students that attend Greenwich High School along with their class scores
- <u>school_info.csv</u>: information about the four local schools
- <u>activities.csv</u>: list of students that participate in after school activities
- <u>principal.csv</u>: information about the principals for all the schools in the district, not just the 4 high schools that we're analyzing

## Assignment

Your job is to load and merge the data so that you end up with a final DataFrame that you must call `students_final`. The `students_final` DataFrame:
- must be sorted by `Student_ID`.
- must have an index that runs in order from `0` through `n - 1`, where `n` is the total number of students.
- must include a sixth column (position 5 in the list below) called `Grade_Average`, holding the average of each student's Math, Science, English, and History scores.
- must include a seventh column (position 6) called `Letter_Grade`: a categorical column with the letter grade earned, based on `Grade_Average`. Averages of 0-59.99 earn an `F`, 60-69.99 a `D`, 70-79.99 a `C`, 80-89.99 a `B`, and 90 and above an `A`. The categories must be ordered with the unknown category, `None`, first:
   - `Index(['None', 'F', 'D', 'C', 'B', 'A'], dtype='object')`
- must have every missing value, across the entire data set, filled with the string `None`.
- must not contain any duplicated student IDs; checking this is an extra way to confirm that the DataFrames were merged correctly.
- must have column names and data types that match the list below, in exactly this order.
```
 #   Column              Dtype
 0   Student_ID          int64
 1   Math                int64
 2   Science             int64
 3   English             int64
 4   History             int64
 5   Grade_Average       float64
 6   Letter_Grade        category
 7   Activity            object
 8   School_Name         object
 9   Address             object
 10  Principal_Name      object
 11  Mascot              object
 12  Student_Population  int64
```
- When you are finished, save your notebook as `students.ipynb` and submit it to CodeGrade for grading.
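The requirements above can be turned into a self-check to run before submitting. The sketch below is a hypothetical helper (`check_students_final` is not part of the assignment, CodeGrade, or pandas) that assumes the column order and dtypes listed above and returns a list of the problems it finds; CodeGrade's actual checks may differ. The dtype comparison assumes a pandas version where text columns are `object`, as in the listing.

```python
import pandas as pd

# Expected column order and dtypes, copied from the requirements list
EXPECTED_DTYPES = {
    'Student_ID': 'int64',
    'Math': 'int64',
    'Science': 'int64',
    'English': 'int64',
    'History': 'int64',
    'Grade_Average': 'float64',
    'Letter_Grade': 'category',
    'Activity': 'object',
    'School_Name': 'object',
    'Address': 'object',
    'Principal_Name': 'object',
    'Mascot': 'object',
    'Student_Population': 'int64',
}
GRADE_ORDER = ['None', 'F', 'D', 'C', 'B', 'A']


def check_students_final(df):
    """Hypothetical helper: return a list of problems; an empty list means every check passed."""
    if list(df.columns) != list(EXPECTED_DTYPES):
        return ['column names or order differ']
    problems = []
    for col, dtype in EXPECTED_DTYPES.items():
        if str(df[col].dtype) != dtype:
            problems.append(f'{col} has dtype {df[col].dtype}, expected {dtype}')
    if not df['Student_ID'].is_monotonic_increasing:
        problems.append('not sorted by Student_ID')
    if not df.index.equals(pd.RangeIndex(len(df))):
        problems.append('index is not 0 through n - 1')
    if not df['Student_ID'].is_unique:
        problems.append('duplicate Student_ID values')
    if df.isna().any().any():
        problems.append('missing values remain')
    if (isinstance(df['Letter_Grade'].dtype, pd.CategoricalDtype)
            and list(df['Letter_Grade'].cat.categories) != GRADE_ORDER):
        problems.append('Letter_Grade categories are not in the required order')
    return problems
```

Calling `check_students_final(students_final)` at the end of the notebook should return an empty list if every listed requirement is met.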


In [81]:
# standard imports
import pandas as pd
import numpy as np

# Do not change this option; it allows the CodeGrade autograder to function correctly
pd.set_option('display.max_columns', 20)

In [82]:
activities = pd.read_csv('activities.csv')
central = pd.read_csv('central.csv')
columbia = pd.read_csv('columbia.csv')
eastside = pd.read_csv('eastside.csv')
greenwich = pd.read_csv('greenwich.csv')
principal = pd.read_csv('principal.csv')
school_info = pd.read_csv('school_info.csv')
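The seven `pd.read_csv` calls above follow one pattern, so they could also be written as a loop over file names. A minimal sketch, assuming every file sits in the working directory under the names listed in the Overview; `load_tables` is a hypothetical helper name. The demo reads from an in-memory `io.StringIO` buffer instead of a file so that it runs without the assignment data.

```python
import io

import pandas as pd


def load_tables(sources):
    """Read each CSV source (a path or file-like object) into a DataFrame keyed by name."""
    return {name: pd.read_csv(src) for name, src in sources.items()}


# With the assignment files this might look like:
# names = ['activities', 'central', 'columbia', 'eastside', 'greenwich', 'principal', 'school_info']
# tables = load_tables({name: f'{name}.csv' for name in names})

# In-memory demo using two rows shown in activities.head()
demo = load_tables({'activities': io.StringIO('ID,Activity\n222949,Basketball\n340051,Drama\n')})
print(demo['activities'])
```

Keeping the tables in one dictionary is a matter of taste; separate variables, as in the cell above, read more naturally in the later merge cells.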

In [83]:
activities.head()

|  | ID | Activity |
|---|---|---|
| 0 | 222949 | Basketball |
| 1 | 340051 | Drama |
| 2 | 100570 | Basketball |
| 3 | 118245 | Soccer |
| 4 | 128108 | Volleyball |


In [84]:
central.head()

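|  | Student_ID | School_Name | Math | Science | English | History |
|---|---|---|---|---|---|---|
| 0 | 145581 | Central | 70 | 74 | 87 | 63 |
| 1 | 321209 | Central | 70 | 62 | 70 | 84 |
| 2 | 221982 | Central | 62 | 61 | 79 | 63 |
| 3 | 204249 | Central | 89 | 65 | 73 | 67 |
| 4 | 319950 | Central | 61 | 99 | 86 | 86 |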


In [85]:
columbia.head()

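|  | Student_ID | School_Name | Math | Science | English | History |
|---|---|---|---|---|---|---|
| 0 | 116664 | Columbia | 83 | 70 | 92 | 69 |
| 1 | 149124 | Columbia | 99 | 99 | 72 | 66 |
| 2 | 385707 | Columbia | 97 | 71 | 90 | 91 |
| 3 | 215575 | Columbia | 80 | 95 | 76 | 96 |
| 4 | 103408 | Columbia | 81 | 77 | 96 | 71 |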


In [86]:
eastside.head()

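|  | Student_ID | School_Name | Math | Science | English | History |
|---|---|---|---|---|---|---|
| 0 | 158370 | Eastside | 98 | 81 | 66 | 60 |
| 1 | 228394 | Eastside | 67 | 73 | 92 | 89 |
| 2 | 173159 | Eastside | 98 | 78 | 65 | 63 |
| 3 | 153737 | Eastside | 78 | 78 | 77 | 73 |
| 4 | 214630 | Eastside | 83 | 77 | 96 | 72 |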


In [87]:
greenwich.head()

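|  | Student_ID | School_Name | Math | Science | English | History |
|---|---|---|---|---|---|---|
| 0 | 113113 | Greenwich | 88 | 75 | 68 | 73 |
| 1 | 244195 | Greenwich | 74 | 66 | 84 | 68 |
| 2 | 217027 | Greenwich | 80 | 88 | 60 | 83 |
| 3 | 385930 | Greenwich | 82 | 61 | 69 | 84 |
| 4 | 115623 | Greenwich | 95 | 89 | 60 | 94 |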


In [88]:
principal.head()

|  | School | School_Address | Principal_Name |
|---|---|---|---|
| 0 | Westside | 23 Westside Road | Brian Clancy |
| 1 | Central | 100 Central High Lane | Ray Smith |
| 2 | Clinton | 5678 Clinton Hwy | Sally Smith |
| 3 | Bright Hill | 957 Central Blvd | Maggie Hughe |
| 4 | Rogers | 1 High School Lane | Sam Brown |


In [89]:
school_info.head()

|  | School | Address | Mascot | Student_Population |
|---|---|---|---|---|
| 0 | Central | 100 Central High Lane | Eagle | 300 |
| 1 | Eastside | 9755 Hwy 60 | Raptors | 1000 |
| 2 | Columbia | 19 East Avenue | Tigers | 700 |
| 3 | Greenwich | 1 Greenwich Blvd | Bears | 1200 |


In [90]:
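# inner join on School keeps only the principals of the four high schools in school_info
school = school_info.merge(principal, on='School', how='inner').drop(columns='School_Address')
school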

|  | School | Address | Mascot | Student_Population | Principal_Name |
|---|---|---|---|---|---|
| 0 | Central | 100 Central High Lane | Eagle | 300 | Ray Smith |
| 1 | Eastside | 9755 Hwy 60 | Raptors | 1000 | Dwayne Anderson |
| 2 | Columbia | 19 East Avenue | Tigers | 700 | Patricia Rogers |
| 3 | Greenwich | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |
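Because `principal.csv` covers every school in the district, the inner join on `School` keeps only the principals whose school also appears in `school_info`. The toy frames below (trimmed to a few columns, with values taken from the outputs above) sketch that behavior and add two optional safety checks that the assignment does not require: `validate='one_to_one'`, which makes pandas raise a `MergeError` if a school name is repeated on either side, and a left join with `indicator=True` to confirm that every high school found a principal.

```python
import pandas as pd

school_info = pd.DataFrame({'School': ['Central', 'Eastside'],
                            'Mascot': ['Eagle', 'Raptors']})
principal = pd.DataFrame({'School': ['Westside', 'Central', 'Eastside'],
                          'Principal_Name': ['Brian Clancy', 'Ray Smith', 'Dwayne Anderson']})

# Inner join: Westside has no row in school_info, so its principal is dropped
school = school_info.merge(principal, on='School', how='inner', validate='one_to_one')

# Left join with an indicator column: '_merge' is 'both' when a principal was found
check = school_info.merge(principal, on='School', how='left', indicator=True)
all_matched = (check['_merge'] == 'both').all()

print(school)
print('every school has a principal:', all_matched)
```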


In [91]:
students = pd.concat([central,columbia,eastside,greenwich])
students

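|  | Student_ID | School_Name | Math | Science | English | History |
|---|---|---|---|---|---|---|
| 0 | 145581 | Central | 70 | 74 | 87 | 63 |
| 1 | 321209 | Central | 70 | 62 | 70 | 84 |
| 2 | 221982 | Central | 62 | 61 | 79 | 63 |
| 3 | 204249 | Central | 89 | 65 | 73 | 67 |
| 4 | 319950 | Central | 61 | 99 | 86 | 86 |
| ... | ... | ... | ... | ... | ... | ... |
| 1022 | 213951 | Greenwich | 98 | 65 | 60 | 89 |
| 1023 | 205324 | Greenwich | 60 | 85 | 76 | 67 |
| 1024 | 209950 | Greenwich | 76 | 66 | 68 | 99 |
| 1025 | 364186 | Greenwich | 91 | 98 | 68 | 84 |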
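`pd.concat` stacks the four school tables but keeps each file's original row labels, which is why the output above ends at label `1025` even though the merged table that follows has 4076 rows: the labels `0`, `1`, ... repeat once per school. That is harmless here, since the merge in the next cell builds a fresh index, but passing `ignore_index=True` renumbers the rows immediately. A small sketch with two toy frames (IDs taken from the outputs above):

```python
import pandas as pd

central = pd.DataFrame({'Student_ID': [145581, 321209], 'School_Name': 'Central'})
greenwich = pd.DataFrame({'Student_ID': [113113, 244195], 'School_Name': 'Greenwich'})

stacked = pd.concat([central, greenwich])                        # index labels 0, 1, 0, 1
renumbered = pd.concat([central, greenwich], ignore_index=True)  # index labels 0, 1, 2, 3
print(stacked.index.tolist(), renumbered.index.tolist())
```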


In [92]:
students = students.merge(school, left_on='School_Name', right_on='School').drop(columns='School')
students

|  | Student_ID | School_Name | Math | Science | English | History | Address | Mascot | Student_Population | Principal_Name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 145581 | Central | 70 | 74 | 87 | 63 | 100 Central High Lane | Eagle | 300 | Ray Smith |
| 1 | 321209 | Central | 70 | 62 | 70 | 84 | 100 Central High Lane | Eagle | 300 | Ray Smith |
| 2 | 221982 | Central | 62 | 61 | 79 | 63 | 100 Central High Lane | Eagle | 300 | Ray Smith |
| 3 | 204249 | Central | 89 | 65 | 73 | 67 | 100 Central High Lane | Eagle | 300 | Ray Smith |
| 4 | 319950 | Central | 61 | 99 | 86 | 86 | 100 Central High Lane | Eagle | 300 | Ray Smith |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4072 | 213951 | Greenwich | 98 | 65 | 60 | 89 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |
| 4073 | 205324 | Greenwich | 60 | 85 | 76 | 67 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |
| 4074 | 209950 | Greenwich | 76 | 66 | 68 | 99 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |
| 4075 | 364186 | Greenwich | 91 | 98 | 68 | 84 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |
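Each student should match exactly one school, so this is a many-to-one join. The toy sketch below adds `validate='many_to_one'`, which makes pandas raise `pandas.errors.MergeError` if a school name were ever duplicated in the school table (a duplicate would otherwise silently repeat students). It also uses `how='left'` instead of the default inner join, so a student whose `School_Name` had no match would keep their row with missing school columns rather than disappear. Both are optional safeguards, not something the assignment asks for.

```python
import pandas as pd

students = pd.DataFrame({'Student_ID': [145581, 113113, 221982],
                         'School_Name': ['Central', 'Greenwich', 'Central']})
school = pd.DataFrame({'School': ['Central', 'Greenwich'],
                       'Mascot': ['Eagle', 'Bears']})

merged = (students.merge(school, left_on='School_Name', right_on='School',
                         how='left', validate='many_to_one')
                  .drop(columns='School'))
print(merged)

# A duplicated key on the "one" side is rejected instead of multiplying rows
dup_school = pd.concat([school, school.iloc[[0]]], ignore_index=True)
raised = False
try:
    students.merge(dup_school, left_on='School_Name', right_on='School',
                   validate='many_to_one')
except pd.errors.MergeError as err:
    raised = True
    print('MergeError:', err)
```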


In [93]:
# left join keeps every student; students without an after-school activity get a missing Activity
students = students.merge(activities, how='left', left_on='Student_ID', right_on='ID').drop(columns='ID')
students

|  | Student_ID | School_Name | Math | Science | English | History | Address | Mascot | Student_Population | Principal_Name | Activity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 145581 | Central | 70 | 74 | 87 | 63 | 100 Central High Lane | Eagle | 300 | Ray Smith |  |
| 1 | 321209 | Central | 70 | 62 | 70 | 84 | 100 Central High Lane | Eagle | 300 | Ray Smith |  |
| 2 | 221982 | Central | 62 | 61 | 79 | 63 | 100 Central High Lane | Eagle | 300 | Ray Smith |  |
| 3 | 204249 | Central | 89 | 65 | 73 | 67 | 100 Central High Lane | Eagle | 300 | Ray Smith |  |
| 4 | 319950 | Central | 61 | 99 | 86 | 86 | 100 Central High Lane | Eagle | 300 | Ray Smith | Cheer |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4072 | 213951 | Greenwich | 98 | 65 | 60 | 89 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |  |
| 4073 | 205324 | Greenwich | 60 | 85 | 76 | 67 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |  |
| 4074 | 209950 | Greenwich | 76 | 66 | 68 | 99 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker | Volleyball |
| 4075 | 364186 | Greenwich | 91 | 98 | 68 | 84 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |  |
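The left join keeps every student, and students who do not appear in `activities.csv` get a missing `Activity`, shown as the blank cells above. If a student ever had two rows in `activities.csv`, the left join would repeat that student; adding `validate='one_to_one'` (an optional extra, not in the cell above) makes pandas raise a `MergeError` at merge time instead of leaving the problem for the duplicate check later. A toy sketch with IDs taken from the outputs above:

```python
import pandas as pd

students = pd.DataFrame({'Student_ID': [145581, 319950, 340051]})
activities = pd.DataFrame({'ID': [319950, 340051], 'Activity': ['Cheer', 'Drama']})

out = (students.merge(activities, how='left', left_on='Student_ID', right_on='ID',
                      validate='one_to_one')
               .drop(columns='ID'))
print(out)  # 145581 has no activity, so its Activity is NaN
```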


In [94]:
students = students.sort_values(by='Student_ID').reset_index(drop=True)
students

|  | Student_ID | School_Name | Math | Science | English | History | Address | Mascot | Student_Population | Principal_Name | Activity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100089 | Central | 91 | 96 | 88 | 62 | 100 Central High Lane | Eagle | 300 | Ray Smith | Other_Club |
| 1 | 100213 | Eastside | 85 | 72 | 70 | 76 | 9755 Hwy 60 | Raptors | 1000 | Dwayne Anderson |  |
| 2 | 100300 | Eastside | 65 | 99 | 77 | 76 | 9755 Hwy 60 | Raptors | 1000 | Dwayne Anderson | Football |
| 3 | 100355 | Greenwich | 83 | 75 | 64 | 99 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |  |
| 4 | 100359 | Greenwich | 61 | 77 | 73 | 83 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4072 | 399853 | Columbia | 93 | 69 | 97 | 63 | 19 East Avenue | Tigers | 700 | Patricia Rogers |  |
| 4073 | 399872 | Columbia | 83 | 95 | 76 | 81 | 19 East Avenue | Tigers | 700 | Patricia Rogers | Baseball |
| 4074 | 399905 | Columbia | 88 | 96 | 66 | 98 | 19 East Avenue | Tigers | 700 | Patricia Rogers |  |
| 4075 | 399915 | Central | 76 | 97 | 96 | 61 | 100 Central High Lane | Eagle | 300 | Ray Smith |  |
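`sort_values` reorders the rows but keeps their old index labels; `reset_index(drop=True)` then renumbers them `0` through `n - 1` and discards the old labels instead of inserting them as a new `index` column. Since pandas 1.0, `sort_values(..., ignore_index=True)` does both in one call. A sketch on a three-row toy frame:

```python
import pandas as pd

df = pd.DataFrame({'Student_ID': [321209, 100089, 204249]})

sorted_only = df.sort_values(by='Student_ID')                 # index labels 1, 2, 0
renumbered = sorted_only.reset_index(drop=True)               # index labels 0, 1, 2
kept_old = sorted_only.reset_index()                          # old labels kept as an 'index' column
one_step = df.sort_values(by='Student_ID', ignore_index=True)  # same result as renumbered

print(renumbered.equals(one_step))
```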


In [95]:
students['Grade_Average'] = (students['Math'] + students['Science'] + students['English'] + students['History'])/4
students

|  | Student_ID | School_Name | Math | Science | English | History | Address | Mascot | Student_Population | Principal_Name | Activity | Grade_Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100089 | Central | 91 | 96 | 88 | 62 | 100 Central High Lane | Eagle | 300 | Ray Smith | Other_Club | 84.25 |
| 1 | 100213 | Eastside | 85 | 72 | 70 | 76 | 9755 Hwy 60 | Raptors | 1000 | Dwayne Anderson |  | 75.75 |
| 2 | 100300 | Eastside | 65 | 99 | 77 | 76 | 9755 Hwy 60 | Raptors | 1000 | Dwayne Anderson | Football | 79.25 |
| 3 | 100355 | Greenwich | 83 | 75 | 64 | 99 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |  | 80.25 |
| 4 | 100359 | Greenwich | 61 | 77 | 73 | 83 | 1 Greenwich Blvd | Bears | 1200 | Shannon Baker |  | 73.50 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4072 | 399853 | Columbia | 93 | 69 | 97 | 63 | 19 East Avenue | Tigers | 700 | Patricia Rogers |  | 80.50 |
| 4073 | 399872 | Columbia | 83 | 95 | 76 | 81 | 19 East Avenue | Tigers | 700 | Patricia Rogers | Baseball | 83.75 |
| 4074 | 399905 | Columbia | 88 | 96 | 66 | 98 | 19 East Avenue | Tigers | 700 | Patricia Rogers |  | 87.00 |
| 4075 | 399915 | Central | 76 | 97 | 96 | 61 | 100 Central High Lane | Eagle | 300 | Ray Smith |  | 82.50 |
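The same average can be computed by selecting the four score columns and calling `DataFrame.mean(axis=1)`, which averages across each row. One difference worth knowing: `mean` skips missing values by default (`skipna=True`), while the explicit sum in the cell above produces `NaN` for any student with a missing score. The score columns here are `int64` with no gaps, so both approaches give the same result. A sketch using the first two students from the output above:

```python
import pandas as pd

SCORE_COLUMNS = ['Math', 'Science', 'English', 'History']

scores = pd.DataFrame({'Math': [91, 85], 'Science': [96, 72],
                       'English': [88, 70], 'History': [62, 76]})

by_sum = (scores['Math'] + scores['Science'] + scores['English'] + scores['History']) / 4
by_mean = scores[SCORE_COLUMNS].mean(axis=1)
print(by_mean.tolist())  # [84.25, 75.75]
```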


In [96]:
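# right=False makes every bin closed on the left: [0, 60) -> F, [60, 70) -> D, [70, 80) -> C, [80, 90) -> B, [90, inf) -> A
students['Letter_Grade'] = pd.cut(students['Grade_Average'], bins=[0, 60, 70, 80, 90, np.inf], right=False, labels=['F', 'D', 'C', 'B', 'A'])
# add the 'None' category and move it to the front of the ordering
students['Letter_Grade'] = students['Letter_Grade'].cat.add_categories('None').cat.reorder_categories(['None', 'F', 'D', 'C', 'B', 'A'])
# fill every remaining missing value (e.g. students with no activity) with the string 'None'
students = students.fillna('None')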
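`pd.cut` closes bins on the right by default, so edges like `[0, 59.99, 69.99, 79.99, 89.99, max]` build intervals such as `(0, 59.99]`: an average of exactly `0` falls outside every bin and becomes `NaN`, and using `Grade_Average.max()` as the last edge raises a `ValueError` (bins must increase monotonically) if the highest average is not above 89.99. The sketch below compares those edges with left-closed bins (`right=False`) and an open-ended top bin (`np.inf`), which line up directly with the 0-59.99 / 60-69.99 / ... / 90-and-above cutoffs. The sample averages are illustrative values chosen to sit on the boundaries.

```python
import numpy as np
import pandas as pd

LABELS = ['F', 'D', 'C', 'B', 'A']
averages = pd.Series([0.0, 59.75, 60.0, 69.75, 70.0, 89.75, 90.0, 100.0])

# Right-closed edges: (0, 59.99], (59.99, 69.99], ..., (89.99, max]
original = pd.cut(averages, bins=[0, 59.99, 69.99, 79.99, 89.99, averages.max()], labels=LABELS)

# Left-closed edges: [0, 60), [60, 70), [70, 80), [80, 90), [90, inf)
grades = pd.cut(averages, bins=[0, 60, 70, 80, 90, np.inf], right=False, labels=LABELS)
grades = grades.cat.add_categories('None').cat.reorder_categories(['None'] + LABELS)

print(original.tolist())  # first value is nan: 0.0 is outside (0, 59.99]
print(grades.tolist())
```

On the assignment data the two versions should agree, because four integer scores always average to a multiple of 0.25; the left-closed form simply states the cutoffs exactly and cannot fail on the top edge.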

In [97]:
students['Student_ID'].duplicated().sum()

0
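`duplicated().sum()` counts IDs that repeat an earlier occurrence, so `0` confirms that the merges did not multiply any student. When the count is not zero, `duplicated(keep=False)` marks every copy, which makes it easy to pull up the offending rows, and `Series.is_unique` gives a simple yes/no answer. A sketch on an illustrative toy Series:

```python
import pandas as pd

ids = pd.Series([145581, 321209, 145581, 221982], name='Student_ID')

extra_copies = ids.duplicated().sum()         # 1: only the second 145581 is flagged
all_copies = ids[ids.duplicated(keep=False)]  # both rows holding 145581
print(ids.is_unique, extra_copies, all_copies.tolist())
```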

In [100]:
students_final = students[['Student_ID','Math','Science','English','History','Grade_Average','Letter_Grade','Activity','School_Name','Address','Principal_Name','Mascot','Student_Population']]