# Functions for Processing 
----

#### Data cleaning and formatting is done to create Pandas dataframes to be used for mapping and visualization of the data.  The primary data set comes from an api generated by CBS Sports News and hosted by Amazon Web Services.  The data set is a listing of College/University Sports Events that are scheduled to be streamed by video or audio.  The ask by CBS Sports News is to take the API that is generated weekly and create a visualization of the scheduled events to be broadcast to help anticipate staffing needs on a daily basis.  For CBS Sports News, a heat map/and or graphic visualization of the games to be broadcast by specific pub points will be used to deliver this information.   Further analysis of the events data, will be done using information gathered from a listing of Universities and Colleges to get location data to create maps and visualizations of the events held at specific locations.




In [1]:
# Dependencies and Setup
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import time
import json as js
from scipy.stats import linregress




# Import API key - usng CBS keys - not used yet
#from api_keys import sport_key
from config import gkey
from config import scorecard_key

# Incorporated citipy to determine city based on latitude and longitude
from citipy import citipy

# Input test file (JSON)
input_data_file='events.json'
# Output File (CSV)
output_data_file = "eventsMaster.csv"


# Steps for analyzing / cleaning data

Step1 - Capture data from the CBS Sports News API, filter, clean and assemble a dataframe that has school location data and pub points data 
----

Need: 
'count'  - gives the number of events in the list
nested in 'count'
'events' - nested
 by 'contenttype'
     'eventstate' 'scheduled'
     'eventstatus' 'live'
     'eventtype' 'game'
     'is_passthrough' 'False'
     'prismid': '27a62c10-4a15-42ae-a81b-9b31c346ffb9',  [unique id]
     'schedule': {'endtimestamp': 1618181100,
                          'starttimebuffer': 0,
                          'starttimestamp': 1618163100},
     'school': 'nwst',
     'school_name': 'Northwestern State University',
     'sport': 'm-basebl',
     'sport_name': 'Baseball',



In [2]:
#Step 1
with open(input_data_file) as f:
  data = js.load(f)
#gives a dictionary
#data.values()


In [3]:
import pprint
#pp = pprint.PrettyPrinter(depth=4)
#pp.pprint(data)


In [4]:
#another way to print
#pp.pprint(f'Dictionary comprehension: {data}')

In [5]:
data.keys()

dict_keys(['events', 'count'])

In [6]:
#Step 1
#checking to remove passthroughs
newDict={}
print(data['count'])
newCount=0
#filter out the passthrough records
Counts=data['count']
#first look for passthrough = True
for index in range(1,Counts):
    if data['events'][index]['is_passthrough']==False:
        newDict=data
        newCount= newCount+1
    
#print(newDict)    
print(f'filtered data counts {newCount}')

706
filtered data counts 558


In [7]:
#Step 1
#create the master filtered data frame
ID=[]
Type=[]
Scheduled=[]
Status=[]
CType=[]
Pass=[]
Start=[]
End=[]
Event_Title=[]
School_Name=[]
School_Code=[]
Game=[]
PubPoint=[]


for index in range(1,newCount):
    try:
        ID.append(newDict['events'][index]['prismid'])
        Scheduled.append(newDict['events'][index]['eventstate'])
        Pass.append(newDict['events'][index]['is_passthrough'])
        Start.append(newDict['events'][index]['starttime'])
        End.append(newDict['events'][index]['endtime'])
        Event_Title.append(newDict['events'][index]['eventtitle'])
        School_Name.append(newDict['events'][index]['school_name'])
        School_Code.append(newDict['events'][index]['school'])
        Game.append(newDict['events'][index]['sport_name'])
        
    except ValueError:
        continue
    except KeyError:
        print(index)
        continue
             
        
  
    
event_df=pd.DataFrame(ID)
event_df['Scheduled']=Scheduled
event_df['PassThru']=Pass
event_df['Start Time']=Start
event_df['End Time']=End
event_df['Event']=Event_Title
event_df['School Name']=School_Name
event_df['School Code']=School_Code
event_df['Sport']=Game
event_df.rename(columns={0:'ID'},inplace=True)
event_df.set_index('ID',inplace=True)



In [8]:
#event_df.head()

In [9]:
event_df.columns

Index(['Scheduled', 'PassThru', 'Start Time', 'End Time', 'Event',
       'School Name', 'School Code', 'Sport'],
      dtype='object')

In [10]:
#Step 1
#get the pass through record values for each prismid
with open(input_data_file) as f:
  filter_set = js.load(f)
#gives a dictionary
filter_set.values()

prismid=[]
Pass=[]
contenttype=[]
pubPoint=[]
#filter out the passthrough records
Counters=filter_set['count']

pr_count=0
content_count=0
pass_count=0
pub_count=0
#first look for passthrough = True
for index in range(1,Counts):
   try:     
    
    if filter_set['events'][index]['is_passthrough']==True:
        Pass.append(filter_set['events'][index]['is_passthrough'])
        prismid.append(filter_set['events'][index]['prismid'])
        contenttype.append(filter_set['events'][index]['contenttype'])
        pubPoint.append(filter_set['events'][index]['ingest']['primary']['pub_point'])
   except KeyError:
        continue

count_pid=len(prismid)
print(count_pid)        
#count_contenttype=len(contenttype)  
#count_pass=len(Pass)
#print(count_pass)
#print(f'pass_throughs are :{Pass}')  
count_pubs=len(pubPoint)
print(count_pubs)
#print(f'pubs are: {pubPoint}')
  
#print(f'prismids to exclude are :{prismid}')

147
0


In [11]:
#Step 1
#create the pub poing list to merge back to the master filtered data frame
counter=0
xcount=0
prID=[]
PP=[]
# the pub points
for index in range(1,newCount):
    if newDict['events'][index]['prismid'] in prismid:
        xcount=+1
    elif newDict['events'][index]['prismid']not in prismid:
        prID.append(newDict['events'][index]['prismid'])
        PP.append(newDict['events'][index]['ingest']['primary']['pub_point'])
        
print(xcount)
print(len(prID))  
print(len(PP))

AddPP_df=pd.DataFrame(prID)
AddPP_df['PubPoint']=PP
AddPP_df.rename(columns={0:'ID'},inplace=True)
AddPP_df.set_index('ID',inplace=True)

1
442
442


In [12]:
print(AddPP_df)

                                                  PubPoint
ID                                                        
80454be6-1828-499d-b398-6c3b38f30a28         mtsu_softball
f9547332-f5d5-49e0-bc8f-63ec97466837           davenport_1
cc6a38ff-0b50-4d79-b47f-320b77188954    westgeorgia_audio2
bc057b3e-4358-4089-aea2-05ac0004396c            washjeff_1
1281a5f2-414e-4349-ab70-63900958ec47      charlotte_audio2
...                                                    ...
584cb036-66a8-4a6f-b725-067ffe6c09e0  southalabama_3_audio
a65b3462-a775-445e-b7d2-c330c4476fbe          providence_1
d986f7b6-fea3-4a03-ac1e-8dbfaef364af                 ecu_2
c5127d37-1fa6-418a-aefd-1d1c6a61adcd         cumberlands_2
823a3c4b-c307-4cc0-ad0c-985461829089           villanova_1

[442 rows x 1 columns]


In [13]:
#Step 1
#this join gets the 558 with 442 pub points
working_events=event_df.merge(AddPP_df,how='left', left_on='ID',right_on='ID')
#record checks

working_events.count()

Scheduled      557
PassThru       557
Start Time     557
End Time       557
Event          557
School Name    557
School Code    557
Sport          557
PubPoint       442
dtype: int64

In [14]:
#working_events.tail(25)

In [15]:
#Step 1 - a copy of the dataframe is save to a file 
#output dataframe to CSV file - passthroughs accunted for - audio shown in pubPoint
working_events.to_csv('NewPubPoints_in_events.csv')

## Step2 - get location data for schools 

In [16]:
#Step 2
#getting filters for data that has to be extracted separate from the main dataframe

school_filters=working_events.groupby('School Name')
conference_usa=school_filters.get_group('Conference USA')
print(conference_usa.count())
patriot_league=school_filters.get_group('Patriot League')
print(patriot_league.count())

Scheduled      31
PassThru       31
Start Time     31
End Time       31
Event          31
School Name    31
School Code    31
Sport          31
PubPoint       28
dtype: int64
Scheduled      37
PassThru       37
Start Time     37
End Time       37
Event          37
School Name    37
School Code    37
Sport          37
PubPoint       37
dtype: int64


In [17]:
conferences=['Patriot League','Conference USA']
school_locations=[]
school_locations=pd.DataFrame()
blank_counter=0
for index, item in working_events.iterrows():
    if item['School Name']in conferences:
        blank_counter=blank_counter+1
    elif item['School Name']not in conferences:
        school_locations.loc[index,'School Name']=item['School Name']
print(f'no school set {blank_counter}')       
school_locations.count()
#school_locations

no school set 68


School Name    489
dtype: int64

In [20]:
#process for location lat/lng by School Name
#using school_locations
newlist=working_events
newlist['lat']=''
newlist['lng']=''
newlist['city location']=''
no_info=0
url="https://api.data.gov/ed/collegescorecard/v1/schools?school.name="
for index, item in newlist.iterrows():
        search_school=item['School Name']    
        query_url=f'{url}+{search_school}+&api_key={scorecard_key}'
        search_requested=requests.get(query_url)
        #print(search_requested.json())
        try:
                newlist.loc[index,'lat']=search_request[][0]['lat']
                newlist.loc[index,'lng']=search_request[][0]['lon']
                newlist.loc[index,'city location']=search_request[][0]['city']
            
        except(KeyError,IndexError):
            print('Skipping')
        
    
print(no_info)

{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact

{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact

{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact

{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact

{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact

{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact

{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact/ for assistance'}}
{'error': {'code': 'OVER_RATE_LIMIT', 'message': 'You have exceeded your rate limit. Try again later or contact us at https://api.data.gov:443/contact

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

### Perform API Calls
* Perform a weather check on each city using a series of successive API calls.
* Include a print log of each city as it'sbeing processed (with the city number and city name).


### Convert Raw Data to DataFrame
* Export the city data into a .csv.
* Display the DataFrame