# Data Analysis (part 5)

## Which are the "best" comm classes?
* One way to analyze this is to find the top-rated courses in each semester, and then aggregate those results into a list of overall winners across semesters

In [1]:
%matplotlib inline

import json
import pandas as pd
import matplotlib.pyplot as plt
from pandas import json_normalize  # top-level import (pandas >= 1.0); the pandas.io.json path is deprecated
import numpy as np
import requests
import warnings
warnings.filterwarnings('ignore')
import seaborn as sns
import plotly_express as px
import math
import plotly

In [2]:
# loading the cleaned data back in 
comm_revs_df2 = pd.read_csv('../data/clean_data/comm_revs_df2')

In [3]:
#casting the level column to a string so it is treated as a course label, not a number to add up
comm_revs_df2['level'] = comm_revs_df2['level'].astype(int).astype(str)

In [4]:
# in each semester what was the "best" comm class along multiple dimensions

#creating a df to store the best courses 
best = pd.DataFrame() 

#including a column for the semester values
best['term'] = comm_revs_df2['Semester'].unique() 

#establishing which variables to include
variables = ['CourseQuality', 'InstructorQuality', 'AmountLearned', 'Difficulty', 'StimulateInterest', 'WorkRequired']

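#for loop: for each variable, find each semester's top-scoring row and record that course's level
#sort=False keeps the groups in order of first appearance, the same order unique() gives for 'term'
for var in variables:
    best[var] = (comm_revs_df2.loc[comm_revs_df2.groupby('Semester', sort=False)[var].idxmax(), :].reset_index()['level']).values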

#reporting results
best


#note: 'best' means the highest score on each variable, so for Difficulty and WorkRequired the "winner" is the most difficult / highest-workload course


|  | term | CourseQuality | InstructorQuality | AmountLearned | Difficulty | StimulateInterest | WorkRequired |
|---|---|---|---|---|---|---|---|
| 0 | 2002A | 395 | 398 | 398.0 | 322 | 322 | 496 |
| 1 | 2003A | 226 | 398 | 226.0 | 395 | 262 | 299 |
| 2 | 2003B | 395 | 323 | 395.0 | 275 | 395 | 275 |
| 3 | 2002C | 225 | 225 | 225.0 | 125 | 225 | 225 |
| 4 | 2003C | 395 | 395 | 395.0 | 322 | 395 | 395 |
| 5 | 2004C | 395 | 397 | 395.0 | 322 | 262 | 413 |
| 6 | 2004A | 226 | 226 | 226.0 | 398 | 226 | 454 |
| 7 | 2006A | 360 | 326 | 360.0 | 316 | 326 | 413 |
| 8 | 2006C | 340 | 130 | 125.0 | 430 | 130 | 430 |
| 9 | 2006B | 376 | 376 | 376.0 | 376 | 376 | 454 |
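
The loop above boils down to `groupby(...).idxmax()` followed by a lookup of the winning row's `level`. Here is a minimal sketch on made-up data (the `toy` frame, its values, and the names `toy_variables` and `toy_best` are hypothetical, not taken from the review data). It keeps each winner indexed by its own semester rather than relying on row position:

```python
import pandas as pd

# hypothetical toy data standing in for comm_revs_df2
toy = pd.DataFrame({
    'Semester': ['2003A', '2003A', '2002C', '2002C'],
    'level': ['226', '395', '225', '125'],
    'CourseQuality': [3.1, 3.8, 3.5, 2.9],
    'Difficulty': [2.7, 2.0, 1.9, 3.2],
})

toy_variables = ['CourseQuality', 'Difficulty']

winners = {}
for var in toy_variables:
    # Series indexed by Semester; values are the row labels of each semester's max
    top_rows = toy.groupby('Semester')[var].idxmax()
    # look up the course level of each winning row, keeping the semester as the index
    winners[var] = pd.Series(toy.loc[top_rows, 'level'].values, index=top_rows.index)

# one row per semester, one column per variable
toy_best = pd.DataFrame(winners).rename_axis('term').reset_index()
print(toy_best)
```

Because every column is built on the semester index, the term labels and the winners cannot drift out of step, whatever order the semesters appear in the raw data.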


#### This dataframe is useful for two reasons
* We will aggregate it to get all-time winners
* Students like to take "famous" classes, ones that are particularly memorable and highly regarded; this analysis makes that data readily available (e.g. in spring 2019 the highest amount learned was in COMM-322, and the best course quality was in COMM-395)

In [5]:
#what about just the most recent terms (2016 onward):
best.loc[38:]

|  | term | CourseQuality | InstructorQuality | AmountLearned | Difficulty | StimulateInterest | WorkRequired |
|---|---|---|---|---|---|---|---|
| 38 | 2016C | 428 | 428 | 395.0 | 226 | 428 | 398 |
| 39 | 2017C | 395 | 395 | 217.0 | 290 | 217 | 290 |
| 40 | 2017A | 494 | 310 | 322.0 | 322 | 310 | 494 |
| 41 | 2018A | 395 | 395 | 395.0 | 130 | 395 | 313 |
| 42 | 2018B | 262 | 218 |  | 491 | 218 | 491 |
| 43 | 2018C | 384 | 310 | 226.0 | 226 | 561 | 494 |
| 44 | 2019A | 395 | 353 | 322.0 | 322 | 353 | 322 |
| 45 | 2019B | 290 | 290 |  | 339 | 130 | 339 |
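
The slice `best.loc[38:]` depends on where the recent terms happen to sit in the row order. A minimal sketch of an alternative filters on the year encoded in the term label. It assumes, as the table suggests, that each term is a string made of a four-digit year plus a letter. The toy frame and the names `year` and `recent` are hypothetical:

```python
import pandas as pd

# hypothetical stand-in for the 'best' dataframe
toy_best = pd.DataFrame({
    'term': ['2002A', '2016C', '2017A', '2019B'],
    'CourseQuality': ['395', '428', '494', '290'],
})

# the first four characters of the term label are the year
year = toy_best['term'].str[:4].astype(int)

# keep only terms from 2016 onward, regardless of row position
recent = toy_best[year >= 2016]
print(recent)
```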


In [6]:
# which courses regularly are at the top of the lists

#making a dataframe for the all-stars
all_stars = pd.DataFrame()

#for loop:
for var in variables:
    all_stars[var] = [best[var].value_counts().idxmax()]

#flipping for user-friendly viewing
all_stars = all_stars.transpose()

#naming the winners column
all_stars.columns = ['all-star']

all_stars

|  | all-star |
|---|---|
| CourseQuality | 395 |
| InstructorQuality | 395 |
| AmountLearned | 395 |
| Difficulty | 226 |
| StimulateInterest | 262 |
| WorkRequired | 495 |
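
`value_counts().idxmax()` returns a single label per variable, so it shows neither how many semesters the all-star won nor whether another course tied with it. A minimal sketch on hypothetical toy data (the frame and the name `summary` are made up for illustration) keeps the win counts and any ties:

```python
import pandas as pd

# hypothetical stand-in for the 'best' dataframe
toy_best = pd.DataFrame({
    'term': ['2017A', '2018A', '2018C', '2019A'],
    'CourseQuality': ['395', '395', '384', '395'],
    'Difficulty': ['322', '130', '130', '322'],
})

summary = {}
for var in ['CourseQuality', 'Difficulty']:
    # number of semesters each course came out on top for this variable
    counts = toy_best[var].value_counts()
    # keep every course tied for the most wins, along with its count
    summary[var] = counts[counts == counts.max()].to_dict()

print(summary)
```

In this toy data `CourseQuality` has one clear winner, while `Difficulty` has two courses tied at two semesters each, which a single `idxmax()` would hide.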


#### COMM-395 (Communication and the Presidency with Prof. David Eisenhower) most often earns top marks for course quality, instructor quality, and amount learned.
* In general, students looking to take 'legendary' COMM classes could use the all_stars table as a starting point