# Community Data Science - Mobility Project
University of Illinois Urbana-Champaign

**Objective**
* Work with data from local companies to help improve transportation services provided to the local community 

**Scope**
* Bus delays - MTD ridership/schedule data
* Optimal allocation - buses/bikes
* Capacity planning and safety - F&S

**Community Impact**
* Safer, more efficient, and better coordinated transportation

**Student Impact**
* Data curation, analysis, and storytelling

**Data**
* MTD - Ridership, schedule, boarding/alighting
* F&S - Building types, bike capacity/census, pavement condition index
* VeoRide - Ride start/stop/duration/GPS
* Tech Services - Wireless activity data in buildings
* Open - Building, class schedule, grades

### Collaborators
* Varshini Ramanathan
* Jose Luis Rodriguez
* Vishal Sachdev
* Jinran Shi
* Jasneet Thukral
* Yanbing Yi

### Packages 

In [2]:
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import re
import time

**Here we create a dataset based on the University of Illinois GPA dataset**

https://github.com/wadefagen/datasets/tree/master/gpa

In [2]:
gpa = pd.read_csv("data/uiuc-gpa.csv")  # load the GPA dataset before previewing it
gpa.head()

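| | Year | Term | YearTerm | Subject | Number | Course Title | A+ | A | A- | B+ | ... | B- | C+ | C | C- | D+ | D | D- | F | W | Primary Instructor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018 | Fall | 2018-fa | AAS | 100 | Intro Asian American Studies | 2 | 8 | 8 | 7 | ... | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Thomas, Merin A |
| 1 | 2018 | Fall | 2018-fa | AAS | 100 | Intro Asian American Studies | 0 | 11 | 7 | 4 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | Lee, Sang S |
| 2 | 2018 | Fall | 2018-fa | AAS | 100 | Intro Asian American Studies | 1 | 6 | 7 | 4 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | Kang, Yoonjung |
| 3 | 2018 | Fall | 2018-fa | AAS | 100 | Intro Asian American Studies | 3 | 5 | 8 | 8 | ... | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | Thomas, Merin A |
| 4 | 2018 | Fall | 2018-fa | AAS | 100 | Intro Asian American Studies | 0 | 6 | 12 | 2 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Lee, Sang S |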


## Data Preparation

* First, let's create another dataset from the GPA data to identify course codes and sections
* We can group by year, term, and subject to identify the unique course numbers
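
As a small illustration of the grouping step, the sketch below runs the same `groupby(...).Number.unique()` call on a few made-up rows. The `toy` frame and its values are assumptions for demonstration only; they are not taken from the GPA file.

```python
import pandas as pd

# Made-up rows that mimic the identifying columns of the GPA data
toy = pd.DataFrame({
    "Year": [2018, 2018, 2018, 2018],
    "Term": ["Fall", "Fall", "Fall", "Fall"],
    "Subject": ["AAS", "AAS", "AAS", "ABE"],
    "Number": [100, 100, 120, 100],
})

# One entry per (Year, Term, Subject); each value is an array of distinct course numbers
toy_group = toy.groupby(["Year", "Term", "Subject"]).Number.unique()

for (year, term, subject), numbers in toy_group.items():
    print(year, term, subject, list(numbers))
```

Repeated rows for the same course (for example, several sections of one course) collapse to a single course number per group, which is what keeps the URL list free of duplicates.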

In [None]:
# Base URL of the Course Explorer schedule pages (matches the URLs in courses-details.csv)
website = "https://courses.illinois.edu/schedule"

# Build a course URL and keep the fields it was built from
def create_url(website, year, term, course, section):
    url = "/".join([website, year, term, course, section])
    return {"url": url, "year": year, "term": term, "course": course, "section": section}

# Read gpa dataset
gpa = pd.read_csv("data/uiuc-gpa.csv")

# Unique course numbers for each (year, term, subject)
gpa_group = gpa.groupby(["Year", "Term", "Subject"]).Number.unique()

grp_courses = []
for (year, term, subject), numbers in gpa_group.items():
    for number in numbers:
        # "section" holds the course number here, matching the column in courses-details.csv
        grp_courses.append(create_url(website, str(year), term, subject, str(number)))

df = pd.DataFrame(grp_courses)
#df.to_csv("data/courses-details.csv", index=None)

In [7]:
courses = pd.read_csv("data/courses-details.csv")
courses.head()

| | course | section | term | url | year |
|---|---|---|---|---|---|
| 0 | AAS | 100 | Fall | https://courses.illinois.edu/schedule/2010/Fal... | 2010 |
| 1 | AAS | 120 | Fall | https://courses.illinois.edu/schedule/2010/Fal... | 2010 |
| 2 | AAS | 215 | Fall | https://courses.illinois.edu/schedule/2010/Fal... | 2010 |
| 3 | ABE | 100 | Fall | https://courses.illinois.edu/schedule/2010/Fal... | 2010 |
| 4 | ABE | 221 | Fall | https://courses.illinois.edu/schedule/2010/Fal... | 2010 |


### Data Preparation - Web Scraping 
Here we use the courses-details dataset to scrape course times and other information from the University of Illinois Course Explorer website - https://courses.illinois.edu/schedule
* This takes some time, so the code is not included in this notebook
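
The original scraping code is not reproduced here. Purely as a rough sketch of the shape such a loop might take, the example below visits each URL, pauses between requests, and skips pages that fail. The names `scrape_all`, `fetch_page`, and `fake_fetch`, as well as the `example.org` URLs, are hypothetical; a real `fetch_page` would presumably drive Selenium and could raise `NoSuchElementException` when a page has no schedule table.

```python
import time
from typing import Callable, Dict, Iterable, List


def scrape_all(
    urls: Iterable[str],
    fetch_page: Callable[[str], Dict[str, str]],
    delay_seconds: float = 1.0,
) -> List[Dict[str, str]]:
    """Call fetch_page on each URL, pausing between requests and skipping failures."""
    rows = []
    for url in urls:
        try:
            row = fetch_page(url)
        except Exception as exc:  # e.g. selenium's NoSuchElementException
            print(f"skipped {url}: {exc}")
            continue
        row["course_schedule"] = url  # remember which page the row came from
        rows.append(row)
        time.sleep(delay_seconds)  # be polite to the server
    return rows


# Stand-in fetcher so the sketch runs without a browser (hypothetical behaviour)
def fake_fetch(url: str) -> Dict[str, str]:
    if url.endswith("/missing"):
        raise LookupError("no schedule table found")
    return {"course_subject": url.rsplit("/", 2)[-2]}


demo_urls = [
    "https://example.org/schedule/2010/Fall/AAS/100",
    "https://example.org/schedule/2010/Fall/AAS/missing",
]
rows = scrape_all(demo_urls, fake_fetch, delay_seconds=0)
```

Passing the fetcher in as a function keeps the loop independent of the browser driver, so it can be tried on a handful of URLs before committing to the full, slow run.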

In [3]:
courses = pd.read_csv("data/uiuc-courses.csv")
courses.head()

| | crn | course_subject | course_number | course_section | course_type | course_day | time_start | time_end | location | course_term | course_year | instructor | course_tile | course_info | course_schedule |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41758 | AAS | 100 | AD1 | Discussion/ Recitation | F | 10:00:00 | 10:50:00 | 429 Armory | Fall | 2010 | Rana, JWinkelmann, M | intro_asian_american_studies | This course satisfies the General Education Cr... | https://courses.illinois.edu/schedule/2010/Fal... |
| 1 | 47100 | AAS | 100 | AD2 | Discussion/ Recitation | F | 11:00:00 | 11:50:00 | 431 Armory | Fall | 2010 | Rana, JWinkelmann, M | intro_asian_american_studies | This course satisfies the General Education Cr... | https://courses.illinois.edu/schedule/2010/Fal... |
| 2 | 47102 | AAS | 100 | AD3 | Discussion/ Recitation | F | 12:00:00 | 12:50:00 | 431 Armory | Fall | 2010 | Kwon, YRana, J | intro_asian_american_studies | This course satisfies the General Education Cr... | https://courses.illinois.edu/schedule/2010/Fal... |
| 3 | 51248 | AAS | 100 | AD4 | Discussion/ Recitation | F | 13:00:00 | 13:50:00 | 431 Armory | Fall | 2010 | Kwon, YRana, J | intro_asian_american_studies | This course satisfies the General Education Cr... | https://courses.illinois.edu/schedule/2010/Fal... |
| 4 | 51249 | AAS | 100 | AD5 | Discussion/ Recitation | M | 13:00:00 | 13:50:00 | 431 Armory | Fall | 2010 | Arnaldo, CRana, J | intro_asian_american_studies | This course satisfies the General Education Cr... | https://courses.illinois.edu/schedule/2010/Fal... |


### Data Preparation - Curation and Merge
**uiuc-gpa + uiuc-courses**

* TODO: Calculate the number of students graded per course from the gpa dataset
* TODO: Link the two datasets using course instructor, title, others
* TODO: Create a main dataset with *course_attendance* for later analysis
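
One possible starting point for the first two items, sketched on made-up data: sum the letter-grade columns to get students graded per course, then left-join that count onto the scraped sections. The frames `gpa_toy` and `courses_toy`, all counts and CRNs in them, and the `students_graded` column name are assumptions for illustration. Excluding `W` (withdrawals) from the total and joining on year, term, subject, and number are also assumptions, not decisions recorded in this notebook.

```python
import re

import pandas as pd

# Toy stand-in for uiuc-gpa.csv; the counts are made up
gpa_toy = pd.DataFrame({
    "Year": [2018, 2018, 2018],
    "Term": ["Fall", "Fall", "Fall"],
    "Subject": ["AAS", "AAS", "ABE"],
    "Number": [100, 100, 100],
    "A": [8, 11, 5],
    "A-": [8, 7, 3],
    "F": [0, 2, 1],
    "W": [0, 0, 4],
})

# Letter-grade columns only (A+ through D-, plus F); W is deliberately left out
grade_cols = [c for c in gpa_toy.columns if re.fullmatch(r"[A-D][+-]?|F", c)]

# Students graded per GPA row, then summed per course
gpa_toy["students_graded"] = gpa_toy[grade_cols].sum(axis=1)
keys = ["Year", "Term", "Subject", "Number"]
per_course = gpa_toy.groupby(keys, as_index=False)["students_graded"].sum()

# Toy stand-in for uiuc-courses.csv; the CRNs are made up
courses_toy = pd.DataFrame({
    "crn": [10001, 10002],
    "course_subject": ["AAS", "AAS"],
    "course_number": [100, 100],
    "course_section": ["AD1", "AD2"],
    "course_term": ["Fall", "Fall"],
    "course_year": [2018, 2018],
})

# Attach the course-level count to every scraped section of that course
merged = courses_toy.merge(
    per_course,
    left_on=["course_year", "course_term", "course_subject", "course_number"],
    right_on=keys,
    how="left",
)
print(merged[["crn", "course_section", "students_graded"]])
```

Note that this join repeats the course-level total on every section, so it overstates attendance per section. Narrowing the match with instructor or title, as the TODO list suggests, would be needed before deriving *course_attendance*.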

In [27]:
courses.columns

Index(['crn', 'course_subject', 'course_number', 'course_section',
       'course_type', 'course_day', 'time_start', 'time_end', 'location',
       'course_term', 'course_year', 'instructor', 'course_tile',
       'course_info', 'course_schedule'],
      dtype='object')