# MSDS 631 - Lecture 7 (March 6, 2019)

## Pandas Aggregations and Analytical Methods and Combining Data

### Aggregations

A great deal of analyzing raw data is trying to summarize it for further analysis. So far, we've been writing for-loops and storing data into dictionaries to then run other analyses (think percentage of students on probation). To do this, you defined the attribute you wanted to "group by" (majors, in this case). Pandas allows you to do this automatically and perform certain functions on all of the data associated with each particular value.

If we wanted to use base Python to find the average GPA amongst students in each major, we would do the following:

In [2]:
#Open data
import json
with open('students.json', 'r') as f:
    students_list_of_dicts = json.load(f)

#Create an empty list for each major so we can add the students' GPAs
major_gpas = {}
possible_majors = set([i['major'] for i in students_list_of_dicts])
for major in possible_majors:
    major_gpas[major] = []

#Get all of the students GPAs for their major
for student in students_list_of_dicts:
    student_major = student['major']
    major_gpas[student_major].append(student['gpa'])

#Compute the average
average_gpas = {}
for major in major_gpas:
    avg_gpa = sum(major_gpas[major]) / len(major_gpas[major])
    rounded_gpa = round(avg_gpa, 3)
    average_gpas[major] = rounded_gpa
average_gpas

{'Chemistry': 3.359,
 'Economics': 3.488,
 'Engineering': 3.106,
 'Finance': 3.615,
 'Math': 3.3,
 'Physics': 3.296}

That's **three** separate for-loops with two separate dictionaries that we had to use in order to move data into their appropriate locations so that we could make computations. That's a lot! Imagine what we'd have to do if we wanted to add gender, or worse yet, gender AND class.

With Pandas aggregations we can tell the DataFrame what we want to do with a LOT less code.

Let's start by loading the data into a DataFrame.

In [22]:
import pandas as pd
#Loading from a csv
#Could have also used original json data to create our DataFrame but without any specific column order
students_df = pd.read_csv('students.csv')
students_df.head()

,student_id,first,last,gender,class,major,gpa
0,5a397209-3782-4764-a285-10fae807ee71,Janis,Brown,Female,Junior,Economics,3.12
1,e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop,Male,Sophomore,Economics,3.48
2,975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens,Female,Freshman,Finance,3.4
3,6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson,Male,Freshman,Math,3.84
4,84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez,Female,Junior,Finance,4.0


In a DataFrame we access rows via either the `.loc` or `.iloc` methods.
- `.loc` uses the "name" of the index
- `.iloc` uses the positional index of the DataFrame or Series

The default index for DataFrames and Series in Pandas is a sequence of integers starting at 0, so each row's label matches its position and either `.loc` or `.iloc` would work.

In [23]:
#Access the first student's student ID
student_1 = students_df.loc[0]
student_1['student_id']

'5a397209-3782-4764-a285-10fae807ee71'
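The difference between the two only shows up once the index is no longer the default integers. Here is a minimal sketch using a tiny made-up DataFrame (the `toy_df` name and its string labels are assumptions for illustration, not part of the class data):

```python
import pandas as pd

# Toy data (hypothetical): the index holds string labels instead of integers
toy_df = pd.DataFrame(
    {"first": ["Janis", "Timothy", "Elizabeth"], "gpa": [3.12, 3.48, 3.40]},
    index=["a", "b", "c"],
)

by_label = toy_df.loc["b"]     # the row whose index label is "b"
by_position = toy_df.iloc[1]   # the second row, whatever its label is

print(by_label["first"], by_position["first"])

# .loc looks for a label, so an integer that is not a label fails
try:
    toy_df.loc[1]
    loc_with_int_worked = True
except KeyError:
    loc_with_int_worked = False
```

Both lookups return the same row here, but only `.iloc` accepts a position once the labels are strings.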

#### Grouping Data

One of the great things about DataFrames is the ability to "group" data by as many columns' values as you wish. If you think about the way we've been doing things so far, consider each group to be a key in a series of nested dictionaries. In the example below, we would have created a dictionary with the keys being each of the possible majors from our list of students. If we group by a second attribute then we'd have nested dictionaries with the outer keys being the majors and the inner keys being another attribute (e.g. gender).

The `.groupby` method creates a new object on which we can perform several operations. You can apply an operation to every column of the DataFrame at once, but the data types must be compatible with what you are requesting. Below we are seeking the arithmetic mean, so none of the text columns would work. In this case, we are specifically asking for the mean of the GPA. Note that we are putting `'gpa'` in double brackets so that the result is returned as a DataFrame. If we had used single brackets, we could only perform the operation on a single column and we would get a Series instead of a DataFrame.

In [24]:
#Now let's compute the mean GPA by major
gpa_by_major = students_df.groupby('major')[['gpa']].mean()
gpa_by_major

major,gpa
Chemistry,3.359151
Economics,3.487805
Engineering,3.106283
Finance,3.614967
Math,3.300452
Physics,3.295663
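To make the single-bracket versus double-bracket distinction concrete, here is a small sketch on made-up data (the four rows in `toy_df` are an assumption, chosen only so the means are easy to check by hand):

```python
import pandas as pd

# Toy data (hypothetical), not the class data set
toy_df = pd.DataFrame({
    "major": ["Math", "Math", "Finance", "Finance"],
    "gpa": [3.0, 4.0, 3.5, 3.7],
})

# Single brackets select one column, so the result is a Series
as_series = toy_df.groupby("major")["gpa"].mean()

# Double brackets pass a list of columns, so the result is a DataFrame
as_frame = toy_df.groupby("major")[["gpa"]].mean()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

The numbers are identical in both cases. Only the container changes, which matters later when you want to add more columns or merge the result with another DataFrame.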


In [25]:
#Groupby results come with the values of the attribute as the new index (sorted by default).
#You can access individual data points using the .loc method
gpa_by_major.loc['Economics', 'gpa']

3.4878053725291438

In [26]:
#You can also access the data via the iloc method.
gpa_by_major.iloc[1]

gpa    3.487805
Name: Economics, dtype: float64

In [27]:
#Now let's compute the mean GPA by major AND gender
gpa_by_major_gender = students_df.groupby(['major','gender'])[['gpa']].mean()
gpa_by_major_gender

major,gender,gpa
Chemistry,Female,3.395772
Chemistry,Male,3.292617
Economics,Female,3.495707
Economics,Male,3.462748
Engineering,Female,2.995098
Engineering,Male,3.143491
Finance,Female,3.654529
Finance,Male,3.568535
Math,Female,3.268426
Math,Male,3.315657


There are several ways of accessing data that have multi-level indices. You can chain together multiple .loc methods as we're doing below.

In [28]:
#"Chaining" together multiple .loc methods
gpa_by_major_gender.loc['Finance'].loc['Female', 'gpa']

3.6545289256198346

We can also access multi-level indices with a single .loc command. First, in order to do this, we need to understand how the data is being stored. Let's look at the actual values of the index.

In [29]:
gpa_by_major_gender.index.tolist()

[('Chemistry', 'Female'),
 ('Chemistry', 'Male'),
 ('Economics', 'Female'),
 ('Economics', 'Male'),
 ('Engineering', 'Female'),
 ('Engineering', 'Male'),
 ('Finance', 'Female'),
 ('Finance', 'Male'),
 ('Math', 'Female'),
 ('Math', 'Male'),
 ('Physics', 'Female'),
 ('Physics', 'Male')]

As the output above shows, each entry in the index is actually a tuple of the two values. If we now pass the .loc method a single tuple with the values we want, then we don't have to chain together our data accessing methods.

In [30]:
#Pass the .loc method a tuple of two values
gpa_by_major_gender.loc[('Chemistry', 'Female'), 'gpa']

3.3957716049382709
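Besides the full tuple, you can also select on just one level of a multi-level index. The sketch below uses a made-up four-row DataFrame (an assumption for illustration) and shows three options: the full tuple, the outer level only, and the inner level only via `.xs`:

```python
import pandas as pd

# Toy data (hypothetical): one student per major/gender pair
toy_df = pd.DataFrame({
    "major": ["Math", "Math", "Finance", "Finance"],
    "gender": ["Female", "Male", "Female", "Male"],
    "gpa": [3.2, 3.4, 3.6, 3.5],
})
grouped = toy_df.groupby(["major", "gender"])[["gpa"]].mean()

# Full tuple: a single value
one_value = grouped.loc[("Finance", "Female"), "gpa"]

# Outer level only: every row for that major, now indexed by gender
finance_rows = grouped.loc["Finance"]

# Inner level only: .xs selects on a named level of the index
female_rows = grouped.xs("Female", level="gender")

print(one_value)
print(finance_rows)
print(female_rows)
```

The last form is handy when the value you care about is not the first level, since chained `.loc` calls always have to start from the outermost level.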

In [31]:
#Now let's compute the mean GPA by major, class, and gender
gpa_by_major_class_gender = students_df.groupby(['major', 'class', 'gender'])[['gpa']].mean()
gpa_by_major_class_gender.head(10)

major,class,gender,gpa
Chemistry,Freshman,Female,3.38749
Chemistry,Freshman,Male,3.2868
Chemistry,Junior,Female,3.407347
Chemistry,Junior,Male,3.327231
Chemistry,Senior,Female,3.401786
Chemistry,Senior,Male,3.29728
Chemistry,Sophomore,Female,3.387423
Chemistry,Sophomore,Male,3.264516
Economics,Freshman,Female,3.489215
Economics,Freshman,Male,3.503711


In [32]:
#What is the first index?
gpa_by_major_class_gender.index.tolist()[0]

('Chemistry', 'Freshman', 'Female')

In [33]:
#Access a specific index via a single tuple
gpa_by_major_class_gender.loc[('Physics', 'Senior', 'Female'), 'gpa']

3.2343055555555558

In [34]:
#Access the second index via iloc; Note iloc returns a Series that you access with brackets afterwards
gpa_by_major_class_gender.iloc[1]['gpa']

3.2868000000000013

In [35]:
gpa_by_major_class_gender.loc['Physics'].loc['Senior'].loc['Female', 'gpa']

3.2343055555555558

I've shown a lot about accessing data, but ultimately, the point of doing analytics is to perform analyses on large sets of data with multiple attributes and multiple values, all at once. This allows you to perform better comparisons rather than obtain individual specific answers (as you may be accustomed to). When we get into merges and plotting in the next class you will see how to effectively use all of these capabilities. In the meantime, it's enough to simply understand how aggregations work.

#### Groupby using variables

While it may not seem useful yet, instead of passing the literal column name into a groupby, we can pass a variable that holds the name. This allows us to run loops or to create functions that allow us to choose which column to group by. Below is an example of running multiple analyses at once.

In [44]:
#Getting minimum GPAs by several columns and assigning the results into a dictionary.
mins = {}
stuff = ['major', 'gender', 'class']
for i in stuff:
    mins[i] = students_df.groupby(i)[['gpa']].min()
for i in stuff:
    print(mins[i])

              gpa
major            
Chemistry    2.41
Economics    2.39
Engineering  1.72
Finance      2.31
Math         2.23
Physics      2.17
         gpa
gender      
Female  1.72
Male    1.97
            gpa
class          
Freshman   2.19
Junior     2.10
Senior     1.72
Sophomore  2.13
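The same idea works inside a function. The sketch below wraps the groupby in a hypothetical helper called `summarize_gpa` (the name, its parameters, and the toy data are all assumptions for illustration); the column to group by and the aggregation are both passed in as arguments:

```python
import pandas as pd

def summarize_gpa(df, group_col, how="mean"):
    """Group df by group_col (a column name or list of names) and
    aggregate the 'gpa' column using the method named in `how`."""
    return df.groupby(group_col)[["gpa"]].agg(how)

# Toy data (hypothetical), not the class data set
toy_df = pd.DataFrame({
    "major": ["Math", "Math", "Finance", "Finance"],
    "gender": ["Female", "Male", "Female", "Female"],
    "gpa": [3.0, 4.0, 3.5, 2.5],
})

min_by_major = summarize_gpa(toy_df, "major", how="min")
mean_by_both = summarize_gpa(toy_df, ["major", "gender"])

print(min_by_major)
print(mean_by_both)
```

Passing the aggregation as a string to `.agg` is one way to make it configurable; calling `.min()` or `.mean()` directly, as in the loop above, gives the same numbers.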


#### More groupby methods

There are many types of computations you can do with aggregations (too many to list here). The most common methods you will call include:
- .mean()
- .max()
- .min()
- .median()
- .size()
 - Counts how many times you see the value of the attribute(s) you are grouping by
- .count()
 - Counts how many non-null values you have in a column
- .rank()
 - Ranks a particular value within a group
 
Let's use the methods above to understand what they do.

In [36]:
#Max GPA by major
max_gpa_by_major = students_df.groupby('major')[['gpa']].max()
max_gpa_by_major

major,gpa
Chemistry,4.0
Economics,4.0
Engineering,4.0
Finance,4.0
Math,4.0
Physics,4.0


In [40]:
#Median GPA by major
students_df.groupby('major')[['gpa']].median()

major,gpa
Chemistry,3.36
Economics,3.51
Engineering,3.11
Finance,3.65
Math,3.31
Physics,3.3


In [183]:
#How many students are in each major
num_students_by_major = students_df.groupby('major').size()
num_students_by_major

major
Chemistry      1507
Economics      1973
Engineering    2034
Finance        2241
Math            730
Physics        1515
dtype: int64

In [190]:
#How many non-null values are there for each column grouped by major
students_df.groupby('major').count()

major,student_id,first,last,gender,class,gpa
Chemistry,1507,1507,1507,1507,1507,1507
Economics,1973,1973,1973,1972,1973,1973
Engineering,2034,2034,2034,2034,2034,2034
Finance,2241,2240,2241,2241,2241,2241
Math,730,730,730,730,730,730
Physics,1515,1515,1515,1515,1515,1515
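Notice that Economics shows 1972 for `gender` and Finance shows 2240 for `first`, even though `.size()` reported 1973 and 2241 students. That gap is exactly the difference between the two methods. A small sketch on made-up data with deliberately missing values (the `toy_df` rows are an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical) with one missing name and one missing GPA
toy_df = pd.DataFrame({
    "major": ["Math", "Math", "Finance"],
    "first": ["Ann", None, "Cy"],
    "gpa": [3.0, 3.5, np.nan],
})

sizes = toy_df.groupby("major").size()     # rows per group, nulls included
counts = toy_df.groupby("major").count()   # non-null values per column

print(sizes)
print(counts)
```

`.size()` returns one number per group, while `.count()` returns one number per group per column, so comparing the two is a quick way to spot which columns have missing data.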


In [210]:
#Compute the rank of the students' GPAs, by major
#Ties are assigned the "best" rank
gpa_ranks = students_df.groupby('major')['gpa'].rank(method='min', ascending=False)

In [211]:
gpa_ranks.head()

student_id
5a397209-3782-4764-a285-10fae807ee71    1675.0
e26c3d69-3c74-49b6-81d7-47232787fad9    1026.0
975c1581-5ba2-430c-a3d1-01ce03bd83f9    1657.0
6081f91d-365c-46ce-ad1b-38af120781d9      32.0
84cec8f4-0b64-44ce-a628-c0eb73f6ca6f       1.0
Name: gpa, dtype: float64
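The `method` argument controls what happens with ties. As a sketch on made-up data (the five rows below are an assumption for illustration), here is how `'min'` compares with `'dense'` when two students in the same major share the top GPA:

```python
import pandas as pd

# Toy data (hypothetical): two Math students are tied at 4.0
toy_df = pd.DataFrame({
    "major": ["Math", "Math", "Math", "Finance", "Finance"],
    "gpa": [4.0, 4.0, 3.9, 3.5, 3.8],
})
by_major = toy_df.groupby("major")["gpa"]

# 'min': tied values share the best rank and the next rank is skipped
min_ranks = by_major.rank(method="min", ascending=False)

# 'dense': tied values share a rank and the next rank is NOT skipped
dense_ranks = by_major.rank(method="dense", ascending=False)

print(min_ranks.tolist())    # [1.0, 1.0, 3.0, 2.0, 1.0]
print(dense_ranks.tolist())  # [1.0, 1.0, 2.0, 2.0, 1.0]
```

With `'min'`, the rank tells you how many students are strictly ahead of you (plus one), which is why a rank can jump from 1 straight to a much larger number when many students are tied at the top.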

In [212]:
#We can assign a new column the same way we add data to a dictionary. 
#In this case we are adding a new column (would be key in a dictionary) and assigning it a value of the gpa_ranks Series
students_df['gpa_rank'] = gpa_ranks.astype(int) #Ranks returned as floats, so we converted them to integers using .astype()
students_df.head()

student_id,first,last,gender,class,major,gpa,gpa_rank
5a397209-3782-4764-a285-10fae807ee71,Janis,Brown,Female,Junior,Economics,3.12,1675
e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop,Male,Sophomore,Economics,3.48,1026
975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens,Female,Freshman,Finance,3.4,1657
6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson,Male,Freshman,Math,3.84,32
84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez,Female,Junior,Finance,4.0,1


In [221]:
students_df = students_df.sort_values(['major', 'gpa_rank'])

In [222]:
students_df.head(20)

student_id,first,last,gender,class,major,gpa,gpa_rank
c52f24b1-34fc-48ca-a4fb-627a4cda0365,Cinthia,Gonzalez,Female,Freshman,Chemistry,4.0,1
baa2901d-a51c-4a3f-b7a5-8d21b8ef1962,Donna,Thompson,Female,Freshman,Chemistry,4.0,1
13a5fabf-787b-4f40-8609-08366ebae350,Roger,Daise,Male,Freshman,Chemistry,4.0,1
3063e328-9e0d-4ac7-b576-af47b6c453fa,Margie,Levin,Female,Freshman,Chemistry,4.0,1
693cd228-ac6b-4d38-8a0c-eea00454ad5f,Marion,Martinez,Female,Freshman,Chemistry,4.0,1
e9934d1a-d1aa-42ba-8633-8bb46c7dd8c1,Emily,Capozzoli,Female,Freshman,Chemistry,4.0,1
6a54ada8-b1fc-481d-aad4-db3aaf5f0d8a,Jana,Zumwalt,Female,Freshman,Chemistry,4.0,1
c66bca9c-8f6e-4a20-8463-01f0caba2a35,Maxine,Baker,Female,Freshman,Chemistry,4.0,1
c5b5f9de-f746-4399-90ba-5cb91519e344,Emilee,Alexander,Female,Freshman,Chemistry,4.0,1
b9aa2367-d178-418c-8c34-72e532d8cb88,Raul,Getz,Male,Freshman,Chemistry,3.99,47


### Modifying DataFrames

Sometimes you need to modify a DataFrame to help you with your analyses. This might entail reordering columns, renaming columns, or setting the index (there are MANY more things to modify, but these are a good start).

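Keep in mind that most DataFrame methods (such as `set_index`, `rename`, and `sort_values`) do not change the DataFrame they are called on; they return a new one. To keep the change, the result must be saved into a variable, though it can be saved into the exact same variable that we started with. Directly assigning to an attribute such as `.columns` is the exception and takes effect immediately.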
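Here is a quick sketch of what "saving the change" means in practice, using `sort_values` on a tiny made-up DataFrame (the data and variable names are assumptions for illustration):

```python
import pandas as pd

# Toy data (hypothetical)
toy_df = pd.DataFrame({"gpa": [3.1, 4.0, 3.5]})

# Calling the method without saving the result leaves toy_df untouched
toy_df.sort_values("gpa")
order_after_discard = toy_df["gpa"].tolist()

# Saving the result (here into a new variable) keeps the change
sorted_df = toy_df.sort_values("gpa")
order_after_save = sorted_df["gpa"].tolist()

print(order_after_discard)  # [3.1, 4.0, 3.5]
print(order_after_save)     # [3.1, 3.5, 4.0]
```

In a notebook the first call would still display a sorted table, which is what makes this easy to miss: the display is of the returned copy, not of `toy_df` itself.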

Let's start by setting the index.

#### Setting the Index

Indices are important because they make it easy for us to access data we want. In our *`student_directory`* dictionary from previous lectures, we were able to access the information of a specific student by passing the dictionary the student ID (rather than having to search in a list until we found a particular student). Using the .loc method allows us to do that. We can set the index using the code in the cell below.

In [46]:
students_df = students_df.set_index('student_id')
students_df.head()

student_id,first,last,gender,class,major,gpa
5a397209-3782-4764-a285-10fae807ee71,Janis,Brown,Female,Junior,Economics,3.12
e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop,Male,Sophomore,Economics,3.48
975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens,Female,Freshman,Finance,3.4
6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson,Male,Freshman,Math,3.84
84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez,Female,Junior,Finance,4.0


You should be able to see that the previous index of integers is now gone. We can now access a single student's information like this:

In [47]:
students_df.loc['84cec8f4-0b64-44ce-a628-c0eb73f6ca6f']

first         Lisa
last      Gonzalez
gender      Female
class       Junior
major      Finance
gpa              4
Name: 84cec8f4-0b64-44ce-a628-c0eb73f6ca6f, dtype: object

It is critical to understand that once you set the index like this, the column `student_id` no longer exists. If I try to get the student IDs like I used to, I will get an error. I can still access the IDs, but now I have to access it via the index as follows:

In [51]:
students_df.index

Index(['5a397209-3782-4764-a285-10fae807ee71',
       'e26c3d69-3c74-49b6-81d7-47232787fad9',
       '975c1581-5ba2-430c-a3d1-01ce03bd83f9',
       '6081f91d-365c-46ce-ad1b-38af120781d9',
       '84cec8f4-0b64-44ce-a628-c0eb73f6ca6f',
       '6c849c3e-e640-4bba-a86a-4323fd513b90',
       'a5c87c39-447c-4c29-92af-fa702a8d5595',
       'f6b177e8-e00a-480e-b62e-906c2ad80f85',
       '8387594f-c9b2-4daa-ae93-c3e40f58cb26',
       '156aefe7-73b4-4777-929c-7aa9c0cd35c5',
       ...
       '3927810e-476d-45db-b88c-660e00385ae5',
       '6b8c4f7a-546d-41d5-978f-d461c287fba1',
       'c32e6f19-817d-4e60-bf20-ed7ca573a7f7',
       '32606dc2-862b-45cc-b0ac-f2b24253abdf',
       '8dc612f4-8150-4045-9e2d-cf160fb71da4',
       '3f1f6525-3ec0-4184-b435-c829419bf582',
       'bc551659-ba48-447e-aa6a-0c2f49aaa9c1',
       '4884e643-4a94-4362-a422-604763401487',
       '034754f5-50dd-42e5-a916-cc6c9d9d0131',
       '75c02f31-566f-439e-875e-5af9fe412977'],
      dtype='object', name='student_id', length=

If we want to re-establish the integers as the index and move the student_id data back to a column, we can simply reset the index as we do below:

In [52]:
students_df = students_df.reset_index()
students_df.head()

,student_id,first,last,gender,class,major,gpa
0,5a397209-3782-4764-a285-10fae807ee71,Janis,Brown,Female,Junior,Economics,3.12
1,e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop,Male,Sophomore,Economics,3.48
2,975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens,Female,Freshman,Finance,3.4
3,6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson,Male,Freshman,Math,3.84
4,84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez,Female,Junior,Finance,4.0
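The round trip between a column and the index can be checked on a small example. The sketch below uses a made-up two-row DataFrame (an assumption for illustration) and also shows the `drop=False` option of `set_index`, which keeps the column in place while still using it as the index:

```python
import pandas as pd

# Toy data (hypothetical) with short stand-in IDs
toy_df = pd.DataFrame({"student_id": ["s1", "s2"], "gpa": [3.1, 3.9]})

# Default behaviour: the column moves into the index and leaves the columns
indexed = toy_df.set_index("student_id")
print("student_id" in indexed.columns)   # False
print(indexed.index.tolist())            # ['s1', 's2']

# drop=False keeps the column as well as using it for the index
kept = toy_df.set_index("student_id", drop=False)
print("student_id" in kept.columns)      # True

# reset_index moves the index back into a regular column
restored = indexed.reset_index()
print(restored.equals(toy_df))           # True
```

Whether to keep the column is a judgment call: dropping it avoids storing the same values twice, while keeping it lets older code that refers to the column keep working.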


#### Renaming Columns

You will often run into a situation where you need to rename columns. Perhaps you're doing an aggregation and you want a better description for the resulting computation. Other times you're going to do a merge of two DataFrames (covered next week) and have repeated names in the two tables. You're going to need/want to change names. There are two ways to change columns in a DataFrame. The number of columns you need to change will determine which way makes the most sense.

Imagine we want to format the column names of *`students_df`* so they look prettier for presenting. We'd want to capitalize the names and replace underscores with spaces. Here we could just replace the `columns` attribute of the DataFrame object directly.

In [54]:
students_df.columns = ['Student ID', 'First', 'Last', 'Gender', 'Class', 'Major', 'GPA']
students_df.head()

,Student ID,First,Last,Gender,Class,Major,GPA
0,5a397209-3782-4764-a285-10fae807ee71,Janis,Brown,Female,Junior,Economics,3.12
1,e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop,Male,Sophomore,Economics,3.48
2,975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens,Female,Freshman,Finance,3.4
3,6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson,Male,Freshman,Math,3.84
4,84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez,Female,Junior,Finance,4.0
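Typing out every name works for seven columns, but the "capitalize and replace underscores" rule can also be applied programmatically. The following is only a sketch of one way to do it (the `overrides` dictionary is an assumption I'm introducing to handle acronyms, which `str.title()` does not capitalize fully):

```python
import pandas as pd

# Empty toy DataFrame (hypothetical) with a few of the original column names
toy_df = pd.DataFrame(columns=["student_id", "first", "last", "gpa"])

# Replace underscores with spaces and capitalize each word
pretty = [name.replace("_", " ").title() for name in toy_df.columns]
print(pretty)   # ['Student Id', 'First', 'Last', 'Gpa']

# Acronyms need a manual fix, since .title() only capitalizes first letters
overrides = {"Student Id": "Student ID", "Gpa": "GPA"}
pretty = [overrides.get(name, name) for name in pretty]

toy_df.columns = pretty
print(list(toy_df.columns))   # ['Student ID', 'First', 'Last', 'GPA']
```

Note that assigning to `.columns` replaces the names by position, so the new list must have exactly as many entries as there are columns and be in the same order.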


We can also rename specific columns.

Let's say we think that "First" and "Last" aren't descriptive enough and we want to add "Name" to the end of those columns. We could replicate the code from above and rename all of the columns, but that's somewhat repetitive, and when your DataFrames get bigger, this will be infeasible.

The code for renaming specific columns is quite simple. We are essentially creating a mapping of "previous name" to "desired name."

In [55]:
students_df = students_df.rename(columns={'First': 'First Name', 'Last': 'Last Name'})
students_df.head()

,Student ID,First Name,Last Name,Gender,Class,Major,GPA
0,5a397209-3782-4764-a285-10fae807ee71,Janis,Brown,Female,Junior,Economics,3.12
1,e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop,Male,Sophomore,Economics,3.48
2,975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens,Female,Freshman,Finance,3.4
3,6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson,Male,Freshman,Math,3.84
4,84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez,Female,Junior,Finance,4.0
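One thing to watch with the mapping approach: by default, `rename` silently ignores keys that don't match any column, so a typo produces no error and no change. A short sketch on made-up data (the misspelled key `'Frist'` is deliberate):

```python
import pandas as pd

# Toy data (hypothetical)
toy_df = pd.DataFrame({"First": ["Janis"], "Last": ["Brown"]})

# Correct key: the column is renamed
renamed = toy_df.rename(columns={"First": "First Name"})
print(list(renamed.columns))      # ['First Name', 'Last']

# Misspelled key: nothing happens and no error is raised
unchanged = toy_df.rename(columns={"Frist": "First Name"})
print(list(unchanged.columns))    # ['First', 'Last']

# errors='raise' asks pandas to complain about keys it cannot find
try:
    toy_df.rename(columns={"Frist": "First Name"}, errors="raise")
    typo_was_caught = False
except KeyError:
    typo_was_caught = True
print(typo_was_caught)            # True
```

Checking `.head()` or `.columns` after a rename, as we do above, is a cheap way to confirm the mapping did what you expected.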


Lastly, if we want to re-order the columns, we just need to access the data in the order we want and save it.

If you recall, we can get a subset of a DataFrame by passing it a list of columns. For example, if we just wanted the names for each ID we could do this:

In [56]:
names = students_df[['Student ID', 'First Name', 'Last Name']]
names.head()

,Student ID,First Name,Last Name
0,5a397209-3782-4764-a285-10fae807ee71,Janis,Brown
1,e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop
2,975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens
3,6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson
4,84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez


In [57]:
#We could also have done this:
name_cols = ['Student ID', 'First Name', 'Last Name']
names = students_df[name_cols]
names.head()

,Student ID,First Name,Last Name
0,5a397209-3782-4764-a285-10fae807ee71,Janis,Brown
1,e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop
2,975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens
3,6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson
4,84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez


Using the same principle, we can simply create the order we want and pass it to our DataFrame.

In [58]:
new_order = ['Student ID', 'Major', 'Class', 'GPA', 'Gender', 'First Name', 'Last Name']
reordered_df = students_df[new_order]
reordered_df.head()

,Student ID,Major,Class,GPA,Gender,First Name,Last Name
0,5a397209-3782-4764-a285-10fae807ee71,Economics,Junior,3.12,Female,Janis,Brown
1,e26c3d69-3c74-49b6-81d7-47232787fad9,Economics,Sophomore,3.48,Male,Timothy,Bishop
2,975c1581-5ba2-430c-a3d1-01ce03bd83f9,Finance,Freshman,3.4,Female,Elizabeth,Owens
3,6081f91d-365c-46ce-ad1b-38af120781d9,Math,Freshman,3.84,Male,Edward,Pearson
4,84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Finance,Junior,4.0,Female,Lisa,Gonzalez
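When a DataFrame has many columns, writing out the full order by hand gets tedious. If you only want to pull a few columns to the front, you can build the list instead. This is a sketch on made-up data (the `front` list and `toy_df` are assumptions for illustration):

```python
import pandas as pd

# Toy data (hypothetical)
toy_df = pd.DataFrame({
    "Student ID": ["s1"],
    "First Name": ["Janis"],
    "Last Name": ["Brown"],
    "GPA": [3.12],
})

# Columns we want first, followed by everything else in its current order
front = ["GPA"]
new_order = front + [col for col in toy_df.columns if col not in front]

reordered_df = toy_df[new_order]
print(list(reordered_df.columns))
# ['GPA', 'Student ID', 'First Name', 'Last Name']
```

Because the rest of the list is built from `toy_df.columns`, no column can be accidentally left out, which is an easy mistake when typing the whole order manually.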


#### Inserting Columns

Lastly, just to remind you, adding new columns is done exactly like in dictionaries. We simply need to add a new column header (in the same way we add a new key) and provide the data (though remember that the data inserted must be the same length as the DataFrame).

In [59]:
students_df['On Probation'] = students_df['GPA'] < 3.0
students_df.head()

,Student ID,First Name,Last Name,Gender,Class,Major,GPA,On Probation
0,5a397209-3782-4764-a285-10fae807ee71,Janis,Brown,Female,Junior,Economics,3.12,False
1,e26c3d69-3c74-49b6-81d7-47232787fad9,Timothy,Bishop,Male,Sophomore,Economics,3.48,False
2,975c1581-5ba2-430c-a3d1-01ce03bd83f9,Elizabeth,Owens,Female,Freshman,Finance,3.4,False
3,6081f91d-365c-46ce-ad1b-38af120781d9,Edward,Pearson,Male,Freshman,Math,3.84,False
4,84cec8f4-0b64-44ce-a628-c0eb73f6ca6f,Lisa,Gonzalez,Female,Junior,Finance,4.0,False
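To illustrate the length requirement, here is a sketch on a made-up three-row DataFrame (the data and the `'Reviewed'` and `'Bad'` column names are assumptions for illustration). A Series of matching length works, a single value is repeated for every row, and a list of the wrong length is rejected:

```python
import pandas as pd

# Toy data (hypothetical)
toy_df = pd.DataFrame({"GPA": [2.8, 3.4, 4.0]})

# A Series of the same length: one value per row
toy_df["On Probation"] = toy_df["GPA"] < 3.0

# A single value is broadcast to every row
toy_df["Reviewed"] = False

# A list of the wrong length raises a ValueError
try:
    toy_df["Bad"] = [1, 2]
    length_error = False
except ValueError:
    length_error = True

print(toy_df)
print(length_error)   # True
```

The comparison `toy_df['GPA'] < 3.0` always has the right length because it is computed from an existing column, which is why derived columns like this rarely run into the length problem.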
