# File Processing for Schools
----

#### Data cleaning and formatting are done to create Pandas dataframes for mapping and visualizing the data. The primary data set comes from an API generated by CBS Sports News and hosted on Amazon Web Services. The data set lists college and university sports events that are scheduled to be streamed by video or audio. CBS Sports News has asked us to take the API output, which is generated weekly, and visualize the scheduled broadcasts to help anticipate daily staffing needs. A heat map and/or other graphic visualization of the games to be broadcast by specific pub points will deliver this information. Further analysis of the events data will use location data gathered from a listing of universities and colleges to create maps and visualizations of the events held at specific locations.




In [49]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import time
import json as js
from scipy.stats import linregress

# Import API keys - using CBS keys - not used yet
#from api_keys import sport_key
from config import gkey
from config import scorecard_key

# Incorporated citipy to determine city based on latitude and longitude
from citipy import citipy

# Input and output files (CSV)
input_data_file01="NewPubpoints_in_events.csv"
input_data_file02="Resources/MERGED2018_19_PP.csv"
output_data_file = "eventsMaster.csv"


# Steps for analyzing / cleaning data

Step 2 - Get the location data for known schools and merge it back into the processing dataframe saved to NewPubpoints_in_events.csv
----

## Step 2 - Get location data for schools

In [50]:
#Open file to get working_events copy to use 
working_set_df = pd.read_csv('NewPubPoints_in_events.csv') 
print(working_set_df.columns)
working_set_df.head()
#now has the index column


Index(['ID', 'Scheduled', 'PassThru', 'Start Time', 'End Time', 'Event',
       'School Name', 'School Code', 'Sport', 'PubPoint'],
      dtype='object')


| | ID | Scheduled | PassThru | Start Time | End Time | Event | School Name | School Code | Sport | PubPoint |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80454be6-1828-499d-b398-6c3b38f30a28 | scheduled | False | 1618008600 | 1618020000 | SB: Middle Tennessee vs Marshall | Conference USA | c-usa | Softball | mtsu_softball |
| 1 | f9547332-f5d5-49e0-bc8f-63ec97466837 | scheduled | False | 1618073400 | 1618084800 | vs. Ashland | Davenport University | dave | Baseball | davenport_1 |
| 2 | cc6a38ff-0b50-4d79-b47f-320b77188954 | scheduled | False | 1618076700 | 1618110000 | UWG Baseball vs. Union | University of West Georgia | wega | Baseball | westgeorgia_audio2 |
| 3 | bc057b3e-4358-4089-aea2-05ac0004396c | scheduled | False | 1617822000 | 1617840000 | Bethany at W&J | Presidents Athletic Conference | pac | Softball | washjeff_1 |
| 4 | 1281a5f2-414e-4349-ab70-63900958ec47 | scheduled | False | 1618008600 | 1618020000 | BSB AUDIO: Charlotte at FIU | Conference USA | c-usa | Baseball | charlotte_audio2 |
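
The staffing view described in the introduction needs events counted per pub point per day. The sketch below shows one possible way to build that table from the columns shown above. It assumes that `Start Time` holds Unix epoch seconds in UTC (the notebook does not state the unit or time zone), and it uses only the five sample rows from the `head()` output. The names `events` and `daily_counts` are illustrative.

```python
import pandas as pd

# Sample rows copied from the head() output (only the columns used here).
events = pd.DataFrame({
    "Start Time": [1618008600, 1618073400, 1618076700, 1617822000, 1618008600],
    "PubPoint": ["mtsu_softball", "davenport_1", "westgeorgia_audio2",
                 "washjeff_1", "charlotte_audio2"],
})

# Assumption: Start Time is Unix epoch seconds in UTC.
events["Start"] = pd.to_datetime(events["Start Time"], unit="s", utc=True)
events["Date"] = events["Start"].dt.date

# Rows = pub points, columns = calendar days (UTC), values = number of events.
daily_counts = pd.crosstab(events["PubPoint"], events["Date"])
print(daily_counts)
```

A table shaped like `daily_counts` can be passed to a heat map plot. If staffing is planned in local time, the timestamps would need converting to that time zone before the date is extracted.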


In [51]:
#Step 2
#getting filters for data that has to be extracted separate from the main dataframe

school_filters=working_set_df.groupby('School Name')
conference_usa=school_filters.get_group('Conference USA')
print(conference_usa.count())
patriot_league=school_filters.get_group('Patriot League')
print(patriot_league.count())
president=school_filters.get_group('Presidents Athletic Conference')
print(president.count())

ID             31
Scheduled      31
PassThru       31
Start Time     31
End Time       31
Event          31
School Name    31
School Code    31
Sport          31
PubPoint       28
dtype: int64
ID             37
Scheduled      37
PassThru       37
Start Time     37
End Time       37
Event          37
School Name    37
School Code    37
Sport          37
PubPoint       37
dtype: int64
ID             12
Scheduled      12
PassThru       12
Start Time     12
End Time       12
Event          12
School Name    12
School Code    12
Sport          12
PubPoint       12
dtype: int64
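
In the output above, Conference USA shows 31 for most columns but 28 for `PubPoint`. This is because `count()` reports non-null values per column, so three of its rows have no pub point. The sketch below shows the difference between `size()` (rows per group) and `count()` (non-null values) on a small hypothetical frame. The pub point `example_1` and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of working_set_df; one PubPoint is missing on purpose.
sample = pd.DataFrame({
    "School Name": ["Conference USA", "Conference USA", "Patriot League"],
    "PubPoint": ["mtsu_softball", np.nan, "example_1"],
})

grouped = sample.groupby("School Name")
rows_per_group = grouped.size()            # counts every row in the group
non_null = grouped["PubPoint"].count()     # counts only non-null PubPoint values
missing = rows_per_group - non_null        # rows without a pub point

print(pd.DataFrame({"rows": rows_per_group,
                    "with_pubpoint": non_null,
                    "missing": missing}))
```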


In [57]:
# Conference-level entries do not name a specific school, so they are set aside
conferences=['Patriot League','Conference USA','Presidents Athletic Conference','Atlantic 10 Conference','America East Conference']

# Boolean mask: True where the row belongs to a conference rather than a school
is_conference=working_set_df['School Name'].isin(conferences)
blank_counter=int(is_conference.sum())

# Keep the original index so the locations can be matched back to the events later
school_locations=working_set_df.loc[~is_conference,['School Name']].copy()
print(f'no school set {blank_counter}')
print(school_locations)

no school set 103
                                School Name
1                      Davenport University
2                University of West Georgia
5         Rochester Institute of Technology
6    Fairleigh Dickinson-College at Florham
7                           Troy University
..                                      ...
552                      Providence College
553            Mississippi State University
554                East Carolina University
555           University of the Cumberlands
556                    VILLANOVA UNIVERSITY

[454 rows x 1 columns]


In [58]:
#process for location lat/lng by School Name
#using school_locations
#school_df = pd.read_csv(input_data_file02) 
#school_filter_df=school_df[['INSTNM','CITY','ZIP','LATITUDE','LONGITUDE']]
#school_filter_df.columns


In [59]:
#school_filter_df.head()
#school_filter_df.count()

In [60]:
#output dataframe to CSV file - passthroughs accounted for - audio shown in pubPoint
#school_filter_df.to_csv('Schools_Info.csv')
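
The commented-out cells above select `INSTNM`, `LATITUDE` and `LONGITUDE` from the schools file. If that route were used instead of an API lookup, names would have to match despite differences in case and spacing (the events data contains entries such as `VILLANOVA UNIVERSITY`). The following is a minimal sketch of such a match, not the notebook's method. The coordinate values are placeholders, and `normalize`, `key`, `matched` and `unmatched` are hypothetical names.

```python
import pandas as pd

# Placeholder rows shaped like the schools extract (values are illustrative only).
school_filter_df = pd.DataFrame({
    "INSTNM": ["Villanova University", "Troy University"],
    "LATITUDE": [40.0, 31.8],
    "LONGITUDE": [-75.3, -85.9],
})
school_locations = pd.DataFrame({
    "School Name": ["VILLANOVA UNIVERSITY", "Troy University", "Unknown College"],
})

def normalize(name):
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(str(name).lower().split())

school_locations["key"] = school_locations["School Name"].map(normalize)
school_filter_df["key"] = school_filter_df["INSTNM"].map(normalize)

# A left merge keeps every event school; the indicator column flags misses.
matched = school_locations.merge(school_filter_df, on="key", how="left", indicator=True)
unmatched = matched.loc[matched["_merge"] == "left_only", "School Name"].tolist()
print(unmatched)
```

Schools listed in `unmatched` would still need another source for their coordinates.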

### Perform API Calls
* Geocode each school name using a series of successive Google Geocoding API calls.
* Record the latitude and longitude returned for each school as it is processed.


### Convert Raw Data to DataFrame
* Export the school location data to a .csv file.
* Display the DataFrame.
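
A Geocoding response can come back with an empty `results` list (for example, when its `status` is `ZERO_RESULTS`), so it can help to isolate the parsing step in a small helper. The sketch below works on plain dictionaries and makes no network calls. `extract_lat_lng` is a hypothetical helper name, and the sample payloads are hand-written to mirror the fields the notebook reads.

```python
def extract_lat_lng(payload):
    """Return (lat, lng) from a Geocoding-style response, or (None, None)."""
    results = payload.get("results") or []
    if payload.get("status") != "OK" or not results:
        return None, None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Hand-written payloads; the coordinates match the Davenport University row.
ok_payload = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 42.8495, "lng": -85.5307}}}],
}
empty_payload = {"status": "ZERO_RESULTS", "results": []}

print(extract_lat_lng(ok_payload))
print(extract_lat_lng(empty_payload))
```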

In [65]:
# Collect one row of coordinates per school
Geo_df=pd.DataFrame(columns=["School Name","Lat","Long"])
radii=50000
target_city=""
target_type="university"
keys=""

# radius, types and keyword look like Places API search parameters;
# a Geocoding request needs only "address" and "key"
params={"address":target_city,
        "radius":radii,
        "types":target_type,
        "keyword":keys,
        "key":gkey}
base_url="https://maps.googleapis.com/maps/api/geocode/json"
for index,row in school_locations.iterrows():
    target_city=row['School Name']
    #print(target_city)
    params["address"]=target_city
    params["keyword"]=target_city
    response=requests.get(base_url, params=params)
    #print(response)
    new_geo=response.json()
    Geo_df.loc[index,"School Name"]=row['School Name']
    # Leave Lat/Long empty when no match is returned, rather than raising an IndexError
    if new_geo.get("results"):
        location=new_geo["results"][0]["geometry"]["location"]
        Geo_df.loc[index,"Lat"]=location["lat"]
        Geo_df.loc[index,"Long"]=location["lng"]

Geo_df


| | School Name | Lat | Long |
|---|---|---|---|
| 1 | Davenport University | 42.8495 | -85.5307 |
| 2 | University of West Georgia | 33.5718 | -85.1032 |
| 5 | Rochester Institute of Technology | 43.0845 | -77.6749 |
| 6 | Fairleigh Dickinson-College at Florham | 40.7762 | -74.4321 |
| 7 | Troy University | 31.8011 | -85.9573 |
| ... | ... | ... | ... |
| 552 | Providence College | 41.8439 | -71.4349 |
| 553 | Mississippi State University | 33.4552 | -88.7944 |
| 554 | East Carolina University | 35.6069 | -77.3665 |
| 555 | University of the Cumberlands | 36.7371 | -84.1638 |
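
The events list has one row per event, so a school with several events is likely to be geocoded several times. One possible refinement is to look up each distinct name once and reuse the result. The sketch below illustrates the idea with a stand-in lookup function instead of a real API call. `geocode_unique` and `fake_geocode` are hypothetical names, and the coordinates are taken from the Troy University row above.

```python
import pandas as pd

def geocode_unique(names, geocode):
    """Call geocode(name) once per distinct name and return a lookup DataFrame."""
    unique_names = pd.Series(names).dropna().drop_duplicates()
    rows = []
    for name in unique_names:
        lat, lng = geocode(name)
        rows.append({"School Name": name, "Lat": lat, "Long": lng})
    return pd.DataFrame(rows, columns=["School Name", "Lat", "Long"])

calls = []  # records every lookup so the number of calls can be inspected

def fake_geocode(name):
    """Stand-in for a real API request; returns (None, None) for unknown names."""
    calls.append(name)
    known = {"Troy University": (31.8011, -85.9573)}
    return known.get(name, (None, None))

names = ["Troy University", "Troy University", "Providence College"]
lookup = geocode_unique(names, fake_geocode)
print(lookup)
print(f"lookups made: {len(calls)}")
```

In a real run, `fake_geocode` would be replaced by a function that sends the request and parses the response.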


In [66]:
#output dataframe to CSV file 
Geo_df.to_csv('Geo_df.csv')
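
Step 2 also calls for merging the locations back into the events dataframe. Because `Geo_df` is filled using the same index labels as `working_set_df`, one possible approach is an index-based left join, sketched below on a few sample rows. The frames `working` and `geo` are small stand-ins, and `events_with_geo` is a hypothetical name. The notebook does not show this step, so treat this as an assumption about how it could be done.

```python
import pandas as pd

# Stand-in for working_set_df (row 0 is a conference entry with no location).
working = pd.DataFrame({
    "School Name": ["Conference USA", "Davenport University",
                    "University of West Georgia"],
    "Sport": ["Softball", "Baseball", "Baseball"],
})

# Stand-in for Geo_df, indexed by the same labels as the events rows.
geo = pd.DataFrame({"Lat": [42.8495, 33.5718], "Long": [-85.5307, -85.1032]},
                   index=[1, 2])

# A left join keeps every event; rows without a geocoded school get NaN.
events_with_geo = working.join(geo[["Lat", "Long"]], how="left")
print(events_with_geo)
```

If `Geo_df.csv` is read back from disk first, passing `index_col=0` to `pd.read_csv` restores the saved index so that the join still lines up.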