# Spring 2021 CS 3654 Final Project
**Group 15**
- Dominic Berry (dberry101010)
- Mohamed Naji (mohamedn)
- Zaid Al Nouman (zaida)
- Daniel Schoenbach (danielschoenbach)

## Introduction
Virginia Tech is nearly 150 years old.
Certainly, its academics have changed over time, evolving with economic demands
and equipping students with the necessary skill sets to impact their communities.

Where are Virginia Tech academics headed now?
Are we becoming more STEM focused?
Or are the programs becoming more balanced?
What role do job market demands play in these shifts in student enrollment and department staffing?
And finally, what other interesting trends can we find in VT academics?

To answer these questions, we will explore:
- How has enrollment in particular classes changed over time?
- Do professors assign higher (or lower) grades than they used to?
- Is there a correlation between VT academics and the job market?
- Is Virginia Tech becoming more diverse?

We will use multiple datasets from the [Virginia Tech University DataCommons](https://udc.vt.edu/),
which provides information on courses, undergraduate enrollment, and time to graduation from 2001 to 2020.

Additionally, we'll use portions of the [First Destination Report](https://career.vt.edu/about/postgrad-survey/report.html),
a survey of recent Virginia Tech graduates.

In [None]:
# Initialize the notebook

# Import libraries for future use
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set default font size for graphs
plt.rcParams.update({'font.size': 14})

# Efficiently removes commas from a Series of formatted numbers
# in preparation for conversion to a numerical data type
# Ex: Series(["1,500"]) -> Series(["1500"])
def strip_commas(series: pd.Series) -> pd.Series:
    return series.str.translate(str.maketrans({',': None}))

## Data

### Virginia Tech University DataCommons
Our primary data source is the [University DataCommons](https://udc.vt.edu/about) web application.
This web application contains years of information about students, classes, and faculty at VT.

#### Grade Distributions
This dataset contains over 150,000 records of class sessions at Virginia Tech.
Fortunately, it was easily exportable from the UDC and requires minimal cleaning.
We filter out course numbers 5000 and above because we are focused on undergraduate courses.
From this data, we hope to explore trends in subject enrollment and grades.
Which subject areas are rising in demand? Which ones are falling?
Have grades become inflated? Do instructors' average grades change over time?


In [None]:
grade_data = pd.read_csv('data/vt_udc_grades_fall2001_spring2021.zip')
grade_data = grade_data[grade_data["Course No."] < 5000]
grade_data.reset_index(drop=True, inplace=True)
grade_data

#### Undergrad Enrollment

This dataset from the UDC contains student enrollment numbers for each department and year.
Although the data requires little cleaning, we still need to convert formatted numbers (e.g. "1,500") to integers.
After that, we filter out departments with fewer than 100 enrollments to remove defunct, administrative,
and graduate-level organizations.
Doing so leaves us with numbers for 65 different departments to study trends in enrollment.

In [None]:
undergrad_enrollment = pd.read_csv('data/vt_udc_fall_undergrad_enrollment_2011_2020.csv')
undergrad_enrollment.index = undergrad_enrollment["Departments"]
undergrad_enrollment.drop("Departments", axis=1, inplace=True)
undergrad_enrollment = undergrad_enrollment.apply(lambda x: strip_commas(x).astype(int))
undergrad_enrollment = undergrad_enrollment[undergrad_enrollment.sum(axis=1) >= 100]
undergrad_enrollment

#### Freshmen Time to Degree

This dataset from the UDC contains the number of academic years for entering freshmen to graduate.
Like the other UDC datasets, it needs little cleaning.

In [None]:
freshmen_time_to_degree = pd.read_csv('data/vt_udc_freshman_timetodegree_fall2010_spring2021.csv')
freshmen_time_to_degree.index = freshmen_time_to_degree["Departments"]
freshmen_time_to_degree.drop("Departments", axis=1, inplace=True)
freshmen_time_to_degree

### First Destination After Undergraduate Degree

Virginia Tech Career and Professional Development conducts a survey
of new graduates, known as the [First Destination Report](https://career.vt.edu/about/postgrad-survey/report.html).

#### Salary

This dataset, compiled from several years of reports, contains new graduate employment rates and salaries.
The data was scraped using the accompanying `Salary_Scraping.ipynb` notebook.
We combine the yearly reports into a single dataset, converting percentages to decimals as we do.

In [None]:
# Load each yearly report and merge them into a single dataframe,
# one median-salary column per report ('1' = 2014-15, ..., '6' = 2019-20)
salary_data = pd.DataFrame(columns=['College/Major'])
for n, i in enumerate(range(2014, 2020), start=1):
    current = pd.read_csv(f'data/s_{i}-{i+1}.csv')
    current['College/Major'] = current['College/Major'].str.replace('&', 'and', regex=False)
    current = current[['College/Major', 'Median']].copy()
    # Fill missing medians with that year's mean across majors
    current['Median'] = current['Median'].fillna(current['Median'].mean())
    # Name the column up front so repeated merges never collide on 'Median'
    current = current.rename(columns={'Median': str(n)})
    salary_data = pd.merge(salary_data, current, on='College/Major', how='outer')

salary_data

## Analysis

### How have GPAs changed over time?
#### University GPAs by semester

To answer this, we collect each term's classes and make a weighted average based on enrollment.
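
As a minimal sketch of why the weighting matters, here is the same calculation on two made-up sections. The numbers are hypothetical and are not taken from the UDC data.

```python
import pandas as pd

# Toy data (hypothetical values, not from the UDC export)
sections = pd.DataFrame({
    "GPA": [2.0, 4.0],
    "Enrollment": [300, 100],
})

# A plain mean treats both sections equally, regardless of size
unweighted = sections["GPA"].mean()

# Weighting by enrollment counts each student once
weighted = (sections["GPA"] * sections["Enrollment"]).sum() / sections["Enrollment"].sum()

print("Unweighted:", unweighted)
print("Weighted:  ", weighted)
```

The large low-GPA section pulls the weighted average below the plain mean, which is the behavior we want when describing the grade a typical student receives.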

In [None]:
avg_gpa = grade_data[grade_data['Term'].isin(['Spring', 'Fall'])] \
    .groupby(['Academic Year', 'Term'], as_index=False) \
    .apply(lambda x: (x['GPA'] * x['Enrollment']).sum() / x['Enrollment'].sum())
avg_gpa.rename(columns={None:'GPA'}, inplace=True)
spring_gpa = avg_gpa[avg_gpa['Term'] == 'Spring']
fall_gpa = avg_gpa[avg_gpa['Term'] == 'Fall']
plt.figure(figsize=(15,10))
plt.xlabel("Academic Year")
plt.ylabel("GPA")
plt.title("University GPA from 2001-2021")
plt.xticks(rotation = 90)
plt.plot(spring_gpa['Academic Year'],spring_gpa['GPA'],marker='o',label='Spring')
plt.plot(fall_gpa['Academic Year'],fall_gpa['GPA'],marker='o',label='Fall')
plt.legend()

In [None]:
print("GPA during 2001-02: ",round(avg_gpa['GPA'].iloc[0:2].mean(),3))
print("GPA during 2020-21: ",round(avg_gpa['GPA'].iloc[-1],3))
print("Increase: ", round(avg_gpa['GPA'].iloc[-1:].mean() - avg_gpa['GPA'].iloc[0:2].mean(), 3))
print("Max: ", round(avg_gpa['GPA'].max(), 3), "in", *avg_gpa.loc[avg_gpa['GPA'].idxmax(), ['Term', 'Academic Year']])
print("Min: ", round(avg_gpa['GPA'].min(), 3), "in", *avg_gpa.loc[avg_gpa['GPA'].idxmin(), ['Term', 'Academic Year']])

As shown in the plot, the average GPA has increased by 0.38 points over the period,
from 2.98 in 2001-02 to 3.36 in 2020-21.
The average assigned grade has clearly risen over time.

Further, three semesters stand out as having elevated GPAs: Spring 2007, Spring 2020, and Fall 2020.
We suspect these GPA bumps are a result of instructors granting extra leniency
during the notable events corresponding with these terms:
- Spring 2007 was the semester of the Virginia Tech shooting
- Spring 2020 was the start of the COVID-19 pandemic and the transition from in-person to online classes
- Fall 2020 was held largely online during the COVID-19 pandemic

#### Subject GPAs

To keep the number of subjects manageable, we will only consider subjects
with at least 10,000 enrollments across our dataset.
Each subject is assigned a color based on its position in the 2001-02 ranking,
so that changes in rank over time are easy to follow.

In [None]:
subject_enrollment = grade_data.groupby('Subject')['Enrollment'].sum()
subject_gpas = grade_data[grade_data['Subject'].isin(subject_enrollment.index[subject_enrollment >= 10000])] \
    .groupby(['Subject','Academic Year'], as_index=False) \
    .apply(lambda x: (x['GPA'] * x['Enrollment']).sum() / x['Enrollment'].sum())
subject_gpas.rename(columns={None:'GPA'}, inplace=True)

subjects = subject_gpas[subject_gpas['Academic Year']=="2001-02"].sort_values('GPA')['Subject']
color_map = plt.get_cmap('viridis')
subject_color_map = {}

def plot_subject_gpas(year):
    plt.figure(figsize=(20,10))
    plt.xlabel("Subject", size=14)
    plt.ylabel("Average GPA", size=14)
    plt.title(f"Subject GPA ({year}-{year+1})")
    plt.xticks(rotation=90)
    plt.ylim(2.5,4)
    df = subject_gpas[subject_gpas['Academic Year']==f"{year}-{str(year+1)[-2:]}"]
    df = df.sort_values("GPA")
    subject_color_map.update({subj:color_map(idx/(len(df))) for idx, subj in enumerate(df['Subject']) if subj not in subject_color_map})
    plt.bar(df['Subject'],df['GPA'],color=[subject_color_map[x] for x in df['Subject']])

plot_subject_gpas(2001)
plot_subject_gpas(2005)
plot_subject_gpas(2010)
plot_subject_gpas(2015)
plot_subject_gpas(2020)

From these graphs we can see a few interesting trends:
1. The overall increase in GPAs we saw earlier can also be seen here.
1. Most subject GPAs shift around, drifting higher and lower year to year.
1. However, a few core subjects stay fixed. For example, MATH consistently has some of the lowest GPAs.

### Which classes are hardest?

One straightforward measure is how many students, on average, fail a particular class.
We'll focus on classes that had more than 100 students enrolled over the past 20 years,
filtering out experimental, special study, and other unusual classes.

In [None]:
# Total enrollment per course title, keeping courses with more than 100 students over the past 20 years
enrol_over_100 = grade_data.groupby('Course Title', as_index=False).agg({"Enrollment": "sum"})
enrol_over_100 = enrol_over_100[enrol_over_100['Enrollment'] > 100]

# Mean fail percentage per course (an unweighted mean across sections), restricted to those courses
fail = grade_data.groupby(['Course Title'])['F (%)'].mean().reset_index()
fail = fail.merge(enrol_over_100, on=['Course Title'], how='inner')

# Keep the 15 highest fail percentages
fail = fail.sort_values(['F (%)'])
fail = fail.tail(15)

# Plot the courses with the highest fail percentages
plt.figure(figsize=(20,10))
plt.title("Classes with Highest Fail Percentage")
plt.xlabel("Class")
plt.ylabel("Fail Percentage")
plt.xticks(rotation = 90)
plt.bar(fail['Course Title'], fail['F (%)'])

In [None]:
max_fail = max(fail['F (%)'])
max_class= fail[fail['F (%)']==max_fail]['Course Title'].values[0]

print("The hardest class at Virginia Tech is: ", max_class)
print("The fail percentage is: ", max_fail)

As shown by the bar graph above, most of the harder classes offered at Virginia Tech are STEM-related.
Among classes with more than 100 students enrolled over the past 20 years,
the hardest class at Virginia Tech by fail percentage is Rock Mech & Grnd Cntl,
which has a 14.21% failure rate.
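
The fail percentages above are plain means over sections, so a tiny section counts as much as a large lecture. As a hedged alternative, the sketch below weights each section by its enrollment. The toy rows and course names are made up, and only the column names mirror `grade_data`.

```python
import pandas as pd

# Toy sections (hypothetical values) using the same column names as grade_data
toy = pd.DataFrame({
    "Course Title": ["Course A", "Course A", "Course B"],
    "Enrollment": [10, 90, 50],
    "F (%)": [50.0, 0.0, 8.0],
})

# Unweighted: mean of the per-section percentages
unweighted = toy.groupby("Course Title")["F (%)"].mean()

# Weighted: estimated failing students divided by total students
failed = (toy["F (%)"] / 100 * toy["Enrollment"]).groupby(toy["Course Title"]).sum()
total = toy.groupby("Course Title")["Enrollment"].sum()
weighted = failed / total * 100

print(pd.DataFrame({"unweighted": unweighted, "weighted": weighted}))
```

In this toy case a single ten-person section drives the unweighted rate for Course A far above the share of students who actually failed. Whether that changes the ranking in the real data would need to be checked.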

### Do students choose majors based on money?

Using the salary information from the First Destination Report,
we can check for a correlation between program enrollment and salary after graduation.

In [None]:
average_enrollment = pd.DataFrame({
    'College/Major': undergrad_enrollment.index,
    'Average Enrollment': undergrad_enrollment.loc[:, "2014-15":"2019-20"].mean(axis=1)
})

#find mean salary from 2014-2020
col = salary_data.loc[: , "1":"6"]
salary_data['Average Salary'] = col.mean(axis=1)
average_salary = salary_data.filter(['College/Major', "Average Salary"])
#merge dataframes based on intersecting majors
salary_vs_enrollment = average_enrollment.merge(average_salary, how = 'inner', on = ['College/Major'])

#create scatter plot
plt.figure(figsize=(15,10))
plt.scatter(salary_vs_enrollment['Average Salary'],salary_vs_enrollment['Average Enrollment'],s=(salary_vs_enrollment["Average Salary"]/100)+5, alpha=.4)
plt.xlabel("Average Salary")
plt.ylabel("Average Enrollment")
plt.title("Average Income vs Enrollment in Major (2014-20)")

In [None]:
#find correlation coefficient
xbar = np.mean(salary_vs_enrollment['Average Salary'])
ybar = np.mean(salary_vs_enrollment['Average Enrollment'])
xi = salary_vs_enrollment['Average Salary']
yi = salary_vs_enrollment['Average Enrollment']
r = np.sum((xi - xbar)*(yi-ybar))/(np.sqrt(np.sum(np.square(xi-xbar)))*np.sqrt(np.sum(np.square(yi-ybar))))
print("The correlation coefficient is: ", r)

Based on the scatter plot and the very small correlation coefficient, we cannot conclude that average salary and enrollment within a major are correlated. These results are also uncertain because the data is limited: only majors that appear in both the salary data and the enrollment data were used, and there were only thirty of them. The average of salaries and enrollments was used for values that were not reported.
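
The coefficient computed by hand in the cell above is the Pearson correlation, so it can be cross-checked against `np.corrcoef`. The sketch below does this on hypothetical salary and enrollment values, not on our merged data.

```python
import numpy as np

# Hypothetical values, for illustration only
salary = np.array([40000.0, 50000.0, 60000.0, 70000.0])
enrollment = np.array([300.0, 150.0, 400.0, 250.0])

# Manual Pearson r, the same formula used in the cell above
dx = salary - salary.mean()
dy = enrollment - enrollment.mean()
r_manual = np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))

# Library equivalent: off-diagonal entry of the 2x2 correlation matrix
r_numpy = np.corrcoef(salary, enrollment)[0, 1]

print(r_manual, r_numpy)
```

Both values should agree up to floating-point error.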

### Does time spent teaching at Tech affect assigned grades?

First, we'll try associating courses with how many years the instructor has been teaching at Tech.

In [None]:
# Group by year and build a dictionary mapping each academic year
# to the set of instructors who taught in it
teachersByYear = {
    year: set(instructors)
    for year, instructors in grade_data.groupby("Academic Year")["Instructor"]
}

In [None]:
# Now we take that and add a column for the total number of years each instructor has taught
modified_grade_data = grade_data.copy().iloc[:, 5:14]

# Count the academic years in which each teacher appears
teacherYears = {}
for teachersInYear in teachersByYear.values():
    for teacher in teachersInYear:
        teacherYears[teacher] = teacherYears.get(teacher, 0) + 1

# Look up each row's instructor in the counts;
# a vectorized map avoids row-by-row chained assignment
modified_grade_data["Total Years Taught"] = modified_grade_data["Instructor"].map(teacherYears)

In [None]:
plt.figure(figsize=(15,7))
plt.scatter(modified_grade_data["Total Years Taught"], modified_grade_data["GPA"], alpha=0.005)
plt.title('Course Grades by Instructors\' Years Teaching')
plt.xlabel('Years Teaching at Tech')
plt.ylabel('Course GPA')

There does not seem to be much correlation.
This suggests, at least, that teachers are not being selected based on a history
of consistently assigning higher or lower grades.
If that were the case, we would see an upper or lower bias on the right side of the graph.

Our first analysis considered each teacher's early classes equivalent to their later ones.
Of course, at the time of each class, the teacher had not been at Tech for their present duration.

We now associate course grades with the number of years
the instructor had been teaching at Tech, *at the time of the course*.

In [None]:
# Years at Tech at the time of each course = course year - instructor's first year in the data.
# This does not account for teachers who leave for a while and come back,
# but we will consider that to be negligible.

# Starting calendar year of each academic year, e.g. "2001-02" -> 2001
modified_grade_data["Academic Year"] = grade_data["Academic Year"].str[:-3].astype(int)

# First year in which each instructor appears in the dataset
modified_grade_data["Started Teaching"] = (
    modified_grade_data.groupby("Instructor")["Academic Year"].transform("min")
)

modified_grade_data["Taught So Far"] = modified_grade_data["Academic Year"] - modified_grade_data["Started Teaching"]

plt.figure(figsize=(15,7))
plt.scatter(modified_grade_data["Taught So Far"], modified_grade_data["GPA"], alpha=0.005)
plt.title('Course Grades by Instructors\' Years Teaching (at time of course)')
plt.xlabel('Years at Tech at Time of Course')
plt.ylabel('Course GPA')

In [None]:
print("Correlation:", modified_grade_data["Taught So Far"].corr(modified_grade_data["GPA"]))

On the whole, the graph still appears to be uncorrelated, but we can clearly see a trend in the
minimum GPAs based on how long a teacher has been at the school.
Over time, instructors seem to assign fewer low grades than they did at the start of their careers.
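
One way to put a number on the rising floor, rather than reading it off the scatter plot, is to take the lowest course GPA (or a low percentile) at each tenure. This is a minimal sketch on made-up rows. Only the column names mirror `modified_grade_data`, and we have not run it on the real data.

```python
import pandas as pd

# Toy rows (hypothetical values) with the same column names as modified_grade_data
toy = pd.DataFrame({
    "Taught So Far": [0, 0, 0, 5, 5, 5],
    "GPA": [1.8, 3.0, 3.6, 2.6, 3.1, 3.7],
})

by_tenure = toy.groupby("Taught So Far")["GPA"]

# Lowest course GPA at each tenure
lowest = by_tenure.min()

# The 5th percentile is less sensitive to a single unusual course
p05 = by_tenure.quantile(0.05)

print(pd.DataFrame({"min": lowest, "p05": p05}))
```

If the trend seen in the plot is real, both columns should climb as tenure increases.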

### Enrollments and Recessions

#### How do recessions affect enrollment?
Recessions lead to decreased job growth, and typically job loss, as the labor market's demand for certain professions decreases during economic decline. Let's take a look at how enrollment is impacted by periods of recession, specifically the beginning of the COVID-19 pandemic, to see how a recession might affect Virginia Tech academics.

In [None]:
# Enrollments and recessions

# Fit a quadratic to total enrollment for every year except 2020,
# then use it to predict what 2020 enrollment would have been on trend
recent = undergrad_enrollment.iloc[:, :-1]
year = [int(y[:-3]) for y in recent.columns]
equation = np.poly1d(np.polyfit(year, recent.sum().values, 2))
prediction = equation(2020)

xmesh = np.linspace(min(year), 2020, 100)
plt.plot(xmesh, equation(xmesh), '-b', label='Without 2020')
plt.plot(2020, prediction, 'o', label="Predicted", c="blue")

# Refit with 2020 included and compare against the actual totals
year2 = [int(y[:-3]) for y in undergrad_enrollment.columns]
equation = np.poly1d(np.polyfit(year2, undergrad_enrollment.sum().values, 2))

xmesh = np.linspace(min(year2), max(year2), 100)
plt.plot(xmesh, equation(xmesh), '-', label='With 2020', c="green")
plt.plot(year2, undergrad_enrollment.sum().values, 'o', label="Actual", c="red")

plt.legend(fontsize=10)
plt.xlabel('Year', fontsize=10)
plt.ylabel('Undergrad Enrollment', fontsize=10)
plt.show()

As we can see, enrollment is on the rise, but enrollment in 2020
was noticeably lower than the fit predicted, though still greater than the previous year.

#### Is GPA affected by recessions?
Next, let's take a look at how recessions might affect student grades, if at all. The hypothesis is that during a recession, students may not perform as well in their classes because of the added stress and because they may be working to help support themselves.

In [None]:
# GPA and recessions
# Select the GPA column before averaging so non-numeric columns are never aggregated
y = grade_data.groupby("Academic Year")["GPA"].mean()
x = [int(year[:-3]) for year in y.index]
fit = np.polyfit(x, y, 2)
equation = np.poly1d(fit)
xmesh = np.linspace(min(x), max(x), 100)

plt.plot(x, y, 'o', label="data", c="red")
plt.plot(xmesh, equation(xmesh), '-b', label='fit')

# Highlight the academic years that began during a recession
plt.plot([2008, 2020], [y["2008-09"], y["2020-21"]], "o", c="black", label="Years of Recession")

plt.legend(fontsize=10)
plt.xlabel('Year', fontsize=10)
plt.ylabel('Average GPA', fontsize=10)
plt.show()

Here it seems that GPAs held steady through 2008 and 2020, which does not support the hypothesis that a recession hurts students' performance. However, there are multiple factors to consider. For example, the pandemic that triggered the 2020 recession also moved courses online. This might have resulted in easier tasks and assessments, as well as more cheating incidents. It's possible this offset any recession-related drop in grades, though more data would be needed to come to any conclusion.

This plot also reinforces the previous plot displaying how students' grades on average are rising. 

### How has course enrollment changed?
Let's take a look at which courses have become more popular to gain insight on whether or not Virginia Tech is becoming more or less STEM focused.

In [None]:
# How has course-specific enrollment changed?

group = ["Academic Year", "Course Title"]
course_enrollment = grade_data.drop(labels=["Term", "Subject", "Course No.", "Instructor", "GPA", "A (%)",
                        "B (%)", "C (%)", "D (%)", "F (%)", "CRN", "Credits"], axis=1)

# Total enrollment per course per year, then the change from 2001-02 to 2019-20.
# Courses missing from either year come out as NaN and sort to the bottom.
course_enrollment_over_time = course_enrollment.groupby(group).sum()
(course_enrollment_over_time.loc["2019-20"] - course_enrollment_over_time.loc["2001-02"]).sort_values(by="Enrollment", ascending=False).head(20)

We can see from this table that the majority of these courses are STEM-focused, with many being hard sciences such as physics and chemistry. The rise in popularity of these STEM courses likely reflects increased enrollment of STEM majors, supporting the argument that Virginia Tech is becoming more STEM-focused. The only notable exception is Design Appreciation, which gained popularity because it was available asynchronously online to a large number of students even before the pandemic.

### Is Virginia Tech becoming more diverse in terms of gender?
As we've seen, Virginia Tech enrollment is rising every year, and courses in STEM have had their enrollments grow the most since 2001. As more STEM majors enroll at Virginia Tech, diversity becomes an increasingly important issue, as the field has historically been dominated by white males. Let's look at whether enrollment reflects Virginia Tech's efforts to increase diversity.

In [None]:
# How has enrollment by gender changed? Data is from "https://udc.vt.edu/irdata/data/students/enrollment/index",
# with filters for academic year, undergraduates only, for the start of the fall term, male vs female
enrollment_by_gender = pd.DataFrame({
    "Male": [13788, 13957, 14091, 14069, 14577, 14740, 15428, 15779, 16683, 17037],
    "Female": [9903, 9894, 9933, 10159, 10793, 11024, 11710, 11966, 12534, 12902]
}, index=["2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21"])

# Compare absolute and percentage growth between the first (2011-12) and last (2020-21) years
male_enrollment_growth = enrollment_by_gender["Male"].iloc[9] - enrollment_by_gender["Male"].iloc[0]
male_growth_percentage = ((enrollment_by_gender["Male"].iloc[9] / enrollment_by_gender["Male"].iloc[0]) - 1) * 100
female_enrollment_growth = enrollment_by_gender["Female"].iloc[9] - enrollment_by_gender["Female"].iloc[0]
female_growth_percentage = ((enrollment_by_gender["Female"].iloc[9] / enrollment_by_gender["Female"].iloc[0]) - 1) * 100
print(f"Percent growth in enrollments of males between 2011 to 2020: {round(male_growth_percentage, 2)}")
print(f"Growth in enrollments of males between 2011 to 2020: {male_enrollment_growth}")
print(f"Percent growth in enrollments of females between 2011 to 2020: {round(female_growth_percentage, 2)}")
print(f"Growth in enrollments of females between 2011 to 2020: {female_enrollment_growth}")
enrollment_by_gender

We can see that, although female enrollment grew by a greater percentage than male enrollment, men still greatly outnumber women.
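
Another way to read the same table is the share of undergraduates who are women in each year. The sketch below uses only the first and last rows of `enrollment_by_gender` from the cell above. It treats the two columns as the whole student body, which is an assumption of this sketch.

```python
import pandas as pd

# First and last rows of the enrollment_by_gender table above
endpoints = pd.DataFrame(
    {"Male": [13788, 17037], "Female": [9903, 12902]},
    index=["2011-12", "2020-21"],
)

# Female share of the Male + Female total, as a percentage
female_share = endpoints["Female"] / endpoints.sum(axis=1) * 100

print(female_share.round(1))
```

On these figures the female share moves from roughly 41.8% to roughly 43.1%, a modest shift despite the faster percentage growth.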

### Is Virginia Tech becoming more diverse in terms of race / ethnicity?

Now let's take a look at Virginia Tech's enrollments by race / ethnicity.

In [None]:
# How has enrollment by race / ethnicity changed? Data is from "https://udc.vt.edu/irdata/data/students/enrollment/index",
# with filters for academic year, undergraduates only, for the start of the fall term, by Race / Ethnicity

enrollment_by_ethnicity = pd.DataFrame({
    "American Indian or Alaska Native": [57, 48, 38, 37, 37, 43, 39, 34, 36, 36],
    "Asian": [1921, 2000, 2030, 2225, 2367, 2573, 2670, 2740, 2997, 3325],
    "Black or African American": [879, 824, 834, 873, 952, 1022, 1087, 1173, 1244, 1458],
    "Hispanics of any race": [1035, 1146, 1226, 1282, 1389, 1533, 1635, 1792, 1985, 2318],
    "Native Hawaiian or Other Pacific Islander": [15, 23, 31, 32, 26, 32, 32, 31, 30, 35],
    "White": [17741, 17445, 17214, 16872, 17167, 17137, 17925, 18119, 18856, 18834],
    "Two or more races": [612, 844, 966, 1041, 1131, 1156, 1211, 1337, 1418, 1516],
    "Not Reported": [824, 723, 706, 764, 814, 755, 797, 705, 726, 759],
    "Nonresident Alien": [616, 806, 989, 1121, 1501, 1540, 1797, 1880, 2008, 1739]
}, index=["2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21"])

# Show the yearly counts for each group
enrollment_by_ethnicity

Based on the table above, we can see that although enrollment of white students hasn't changed much within the last decade, the enrollment of Hispanics of any race has more than doubled, and the enrollment of Black or African American and Asian students has also increased dramatically.
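
To make these comparisons concrete, the percentage change for each group can be computed from the first and last rows of the table. The sketch below copies the 2011-12 and 2020-21 counts for four of the groups from `enrollment_by_ethnicity` above.

```python
import pandas as pd

# 2011-12 and 2020-21 counts copied from the enrollment_by_ethnicity table above
first = pd.Series({
    "Asian": 1921,
    "Black or African American": 879,
    "Hispanics of any race": 1035,
    "White": 17741,
})
last = pd.Series({
    "Asian": 3325,
    "Black or African American": 1458,
    "Hispanics of any race": 2318,
    "White": 18834,
})

# Percentage change from 2011-12 to 2020-21
pct_change = (last / first - 1) * 100

print(pct_change.round(1).sort_values(ascending=False))
```

A change above 100% means the group more than doubled, which here holds only for Hispanics of any race.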

## Conclusion

Overall, undergraduate GPAs have risen steadily over time, and as professors continue teaching, they assign fewer low grades. We also saw spikes in undergraduate GPAs that coincide with the 2007 Virginia Tech shooting and the COVID-19 pandemic. Though overall GPAs increased, some subjects held their relative position; MATH, for example, has consistently had some of the lowest grades for the last 20 years. We couldn't determine whether students choose their major based on average salary after graduation, though we did observe increased enrollment in STEM-related intro courses such as Chemistry, Physics, and Biology. Finally, we saw that Virginia Tech has become more diverse: enrollment of Hispanics of any race has more than doubled, and enrollment of Asian and Black or African American students has increased by a large amount.

There has been a shift towards STEM in the world and at Virginia Tech, and as its student body continues to grow, so will the number of skilled STEM professionals entering the workforce.

## Credits

#### Part 1 Contributions
- Dominic Berry assisted with guidance and data cleaning.
- Mohamed Naji scraped and compiled the salary data.
- Zaid Al Nouman helped guide the research questions and did data cleaning.
- Daniel Schoenbach drafted additional document pieces and did data cleaning.

#### Part 2 Contributions
- Dominic Berry worked on teacher and recession analysis.
- Mohamed Naji worked on GPA and salary analysis.
- Zaid Al Nouman scraped graduate plans and worked on enrollment questions.
- Daniel Schoenbach edited and compiled the pieces.

#### Part 3 Contributions
- Dominic Berry and Mohamed Naji fleshed out the analysis.
- Zaid Al Nouman and Daniel Schoenbach edited and polished the final report. 