## Introduction

In this project I analyse punting and field goal data from the Canadian Football League (CFL) to draw insights about the kicking game. One particular interest is the effect of kicking indoors versus outdoors. The CFL has one indoor stadium, located in Vancouver, British Columbia (BC). With no adverse elements such as wind, rain or, late in the year, snow, an indoor stadium could have a positive effect on the performance of kickers and punters. Using four seasons of kicking statistics gathered from the CFL website, I will investigate whether there is a significant difference in the productivity of the kicking game indoors versus outdoors.

There are three types of kicking in the CFL for which stats are available: kickoffs, punts, and field goals. The two I am most interested in are field goals and punts, as they would be most affected by adverse conditions. A field goal is an attempt to kick the ball from the ground through the uprights to score points; a punt is kicked out of the punter's hand to give the ball back to the opposing team.

I will first look at general performance in each stadium to see if there are any differences. Because different players have different abilities, however, it makes more sense to analyse each individual kicker's performance both indoors and outdoors. A kicker's ability may also change from year to year, so each data point will be one kicker's performance in one season. One possible extension of this analysis is to look at which stadiums have the best and worst performance results. For punting, a very windy stadium can actually work in the punter's favour, as the wind can make it difficult for the returner to catch the ball in the air.

As a current punter and kicker in the league, I find this data particularly interesting. I have played in all of the stadiums over the years and have formed opinions as to which are easier or more difficult to perform in. I would like to know whether the data backs up what I believe or whether these are merely perceived advantages and disadvantages. There does not appear to be any literature or in-depth statistical analysis of punting and kicking performance in the CFL, and it is extremely unlikely that any such analysis has been done by a player who knows the limitations of the available data and the nuances of the CFL kicking game.

## Data Details

The data for this analysis was obtained directly from the cfl.ca website, where statistics are available for the public to view. It was retrieved by web scraping the site, which returns the results as a JSON document. The website lets the user set parameters for the statistics they want, including which years, where the games were played, and who the opponent was, and lets them group the data by game, season, or player. Knowing how the URL is constructed from these parameters and filters, I built a function to download the datasets I needed and saved them immediately so that they would not have to be downloaded again. While the data is public, it is ethically important not to download it more often than necessary, to reduce costs for the hosts: these stats are intended for general browsing by fans of the league, not for high-volume downloading. The data obtained covers the four seasons from 2016 to 2019.

One challenge in acquiring the data was that the downloaded results did not contain the stadium location. However, the website allows the stats to be filtered by the location of the games. My approach was to write a function that downloads a dataset for each stadium location and stores the results in a list. I could then add the location to each downloaded dataset and combine the per-location datasets into one complete dataset that includes the location.
The dataset contained many categories that were not needed, so these were removed. The punt average and field goal percentage also had to be converted from the object type to float.
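
The tagging and combining steps described above can be sketched roughly as follows. This is a minimal illustration and not the code used for the analysis: it assumes each per-location download can be read as a list of record dictionaries, and the field names (`player`, `punts`, `punt_average`) and the helper `combine_locations` are hypothetical stand-ins for whatever the real JSON contains.

```python
import pandas as pd

def combine_locations(per_location_records, team_codes):
    """Tag each per-location dataset with its team code and stack them."""
    frames = []
    for code, records in zip(team_codes, per_location_records):
        frame = pd.DataFrame(records)
        frame['location'] = code  # the download itself carries no location
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

# Toy data with hypothetical field names; averages arrive as strings (object dtype)
sample = [
    [{'player': 'A', 'punts': 10, 'punt_average': '45.2'}],
    [{'player': 'A', 'punts': 8, 'punt_average': '43.0'},
     {'player': 'B', 'punts': 5, 'punt_average': '41.5'}],
]

combined = combine_locations(sample, ['bc', 'edm'])

# Convert the average from object to float; unparseable values become NaN
combined['punt_average'] = pd.to_numeric(combined['punt_average'], errors='coerce')

# Keep only the columns needed for the analysis
combined = combined[['player', 'location', 'punts', 'punt_average']]
```

Using `errors='coerce'` is a design choice for the sketch: a malformed value shows up as a missing value that can be inspected later, rather than stopping the conversion.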

The punting dataset contains the year, the punter's name, the location, the number of punts, total punt yards, and punt average. The field goal dataset contains the year, the kicker's name, field goals attempted, field goals made, accuracy, a breakdown of made field goals by distance, and converts made and attempted. A convert is a special field goal attempted from a set distance after a touchdown. Converts are worth only one point, so they are kept as a separate statistic.

While the names of the players are included in the dataset, this does not pose any ethical dilemmas, as player data is readily available for the public to view. The nature of the profession is to perform in public, and performance statistics are recorded to be analysed by commentators and fans alike.

## Data Retrieval

In [4]:
import requests
import json
import pandas as pd 
import numpy as np
from bs4 import BeautifulSoup

In [5]:
# list of all the team location acronyms for data retrieval
team_codes = ['bc', 'edm', 'cgy', 'ssk', 'wpg', 'ham', 'tor', 'ott', 'mtl']


For each team code, create the URL to retrieve the appropriate dataset.  
The URL is the base URL followed by the location filter set to the team code.  
If the `include_home` parameter is set to `True`, results will include the home kicker for each location as well.  
The `kick_category` parameter sets whether to retrieve punting or field goal data. 

In [6]:
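def get_site_json(url):
    # download the page and fail early on an HTTP error instead of parsing an error page
    data_page = requests.get(url, timeout=30)
    data_page.raise_for_status()
    # the endpoint returns JSON as text; BeautifulSoup strips any markup before parsing
    soup = BeautifulSoup(data_page.content, 'html.parser')
    site_json = json.loads(soup.text)
    return site_json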

def get_game_data(team_codes, kick_category, include_home):
    # if the category is punting, add a filter so that only players with a minimum of 1 punt are returned
    # otherwise the CFL site returns every player on the roster for the game, not just the players who punted
    # the same issue does not exist for field goals, so the filter is not needed there
    if kick_category == 'punting':
        base_url = 'https://www.cfl.ca/wp-content/themes/cfl.ca/inc/admin-ajax.php?action=get_league_stats&stat_category=punting&filter[season][ge]=2016&filter[punts][ge]=1&filter[location][eq]='
    else:
        base_url = 'https://www.cfl.ca/wp-content/themes/cfl.ca/inc/admin-ajax.php?action=get_league_stats&stat_category=field_goals&filter[season][ge]=2016&filter[location][eq]='
    json_data = []
    for team in team_codes:
        if include_home: 
            url = base_url + team + '&group_by=player'
        else:
            # to include only players' away games (no home advantage),
            # filter so that the opposing team = the team at this location, which excludes home player data
            url = base_url + team + '&filter[opponent_team_abbreviation][eq]=' + team + '&group_by=player'
        site_json = get_site_json(url)
        json_data.append(site_json)
    return json_data

## Download all the datasets

In [8]:
home_game_data_punt = get_game_data(team_codes, 'punting', True)

In [9]:
home_game_data_fg = get_game_data(team_codes, 'field_goals', True)

In [10]:
away_game_data_punt = get_game_data(team_codes, 'punting', False)

In [7]:
away_game_data_fg = get_game_data(team_codes, 'field_goals', False)

## Save Retrieved Data to file

In [11]:
def save_json_to_file(data, name):
    with open(name, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=3)

save_json_to_file(home_game_data_punt, 'home_game_data_punt.json')
save_json_to_file(away_game_data_punt, 'away_game_data_punt.json')
save_json_to_file(home_game_data_fg, 'home_game_data_fg.json')
save_json_to_file(away_game_data_fg, 'away_game_data_fg.json')
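
To avoid re-downloading on later runs, the saved files can be read back before any request is made. The helper below is a minimal sketch of that idea and not part of the original notebook: `load_or_download` is a hypothetical name, and `fake_download` is a stand-in for a real call such as `get_game_data(team_codes, 'punting', True)`.

```python
import json
import os
import tempfile

def load_or_download(path, download):
    """Return the data saved at `path`; call `download()` only if the file is missing."""
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            return json.load(f)
    data = download()
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=3)
    return data

# Demonstration with a stand-in downloader that counts how often it is called
calls = []

def fake_download():
    calls.append(1)
    return [{'player': 'A', 'punts': 10}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.json')
    first = load_or_download(path, fake_download)   # file missing: downloads and saves
    second = load_or_download(path, fake_download)  # file present: reads from disk
```

In the notebook this might be used as `load_or_download('home_game_data_punt.json', lambda: get_game_data(team_codes, 'punting', True))`, so the site is contacted only the first time.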