Try downloading some web pages with a Python program and extracting information from them. Open the page in your web browser and use the inspector to locate the elements of interest.

Refer to the documentation of a library such as beautifulsoup4 or pyquery to find out how to search the HTML for more specific things, for example tags with particular class attributes.
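As a small offline illustration of searching by class attribute, the sketch below runs beautifulsoup4 against an inline HTML string instead of a live page. The HTML fragment is made up for this example and is not taken from any real site; only the class name `cell-th` matches the one used later in this notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, made up for illustration only.
html = """
<table>
  <tr>
    <th class="cell-th">NAME</th>
    <th class="cell-th">TEAM</th>
    <th class="other">IGNORED</th>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ (trailing underscore) avoids clashing with Python's `class` keyword.
by_keyword = [th.get_text(strip=True) for th in soup.find_all("th", class_="cell-th")]

# The same search written as a CSS selector.
by_selector = [th.get_text(strip=True) for th in soup.select("th.cell-th")]

print(by_keyword)
print(by_selector)
```

Both forms should select the same elements; `attrs={'class': 'cell-th'}` is a third equivalent spelling.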

Look at the CFL punting and kickoff stats for 2019:

[Punt and Kick Stats](https://www.cfl.ca/stats/?stat_category=punting&season=2019)


In [4]:
from bs4 import BeautifulSoup
import requests

season = '2019'
url = 'https://www.cfl.ca/stats/?stat_category=punting&season=' + season

page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')

table_head = soup.find_all('th', attrs={'class': 'cell-th'})

col_headers = []

for th in table_head:
    col_headers.append(th.text)

#insert extra column heading that appears in data
col_headers.insert(2, 'URL')

print(col_headers)

['Date', 'NAME', 'URL', 'Team', 'GP', 'PUNTS', 'YDS', 'AVG', 'LG', 'S', 'KICKOFFS', 'YDS', 'AVG', 'LG', 'S']


#### After trying to import all the data at once with a single request, I realized that the data is loaded separately: the page is not static. I located the data in the Network tab of the browser dev tools and imported it as JSON. The column headers had to be imported separately, since they can be read straight from the webpage URL. However, the header list was one entry shorter than each row of the main table, because the JSON data has an extra field holding the URL that links each player to their player page. I added that entry to `col_headers` and then dropped the column after creating the dataframe, as this seemed easier than filtering the data before dataframe creation. The end result is that the table is scraped from the website into a dataframe.
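For comparison, the alternative mentioned above (filtering the extra field out before building the dataframe) might look roughly like the sketch below. It assumes each row in `site_json['data']` is a list and that the link sits at index 2, matching where `'URL'` was inserted into `col_headers`. The helper name `drop_field`, the truncated sample rows, and the placeholder link values are hypothetical.

```python
# Index of the extra link field in each JSON row (an assumption based on
# where 'URL' was inserted into col_headers).
URL_INDEX = 2


def drop_field(rows, index):
    """Return copies of the rows with the element at `index` removed."""
    return [list(row[:index]) + list(row[index + 1:]) for row in rows]


# Hypothetical rows shaped like site_json['data'], truncated to a few
# fields; the link values are placeholders, not real URLs.
sample_rows = [
    ["2019", "LEONE, Richie", "<player-page-url>", "OTT", 18],
    ["2019", "RYAN, Jonathan", "<player-page-url>", "SSK", 18],
]
sample_headers = ["Date", "NAME", "Team", "GP"]

clean_rows = drop_field(sample_rows, URL_INDEX)

# Every cleaned row now lines up with the header list scraped from the page.
records = [dict(zip(sample_headers, row)) for row in clean_rows]
print(records[0])
```

With rows cleaned this way, `pd.DataFrame(clean_rows, columns=...)` could take the scraped headers directly, without the `insert` and `drop` steps.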

In [11]:
import json
import pandas as pd

data_page = requests.get('https://www.cfl.ca/wp-content/themes/cfl.ca/inc/admin-ajax.php?action=get_league_stats&stat_category=punting&season=2019')

# Parse the JSON response (data_page), not the HTML page fetched earlier
soup = BeautifulSoup(data_page.content, 'html.parser')

site_json=json.loads(soup.text)

site_json['data'][2]

player_data = []

for row in site_json['data']:
    player_data.append(row)

df = pd.DataFrame(player_data, columns=col_headers)
df = df.drop(['URL'], axis=1)
df

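| | Date | NAME | Team | GP | PUNTS | YDS | AVG | LG | S | KICKOFFS | YDS.1 | AVG.1 | LG.1 | S.1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019 | LEONE, Richie | OTT | 18 | 132 | 6383 | 48.4 | 77 | 6 | 50 | 3091 | 61.8 | 72 | 1 |
| 1 | 2019 | RYAN, Jonathan | SSK | 18 | 107 | 5222 | 48.8 | 77 | 12 | 0 | 0 | 0.0 | 0 | 0 |
| 2 | 2019 | BEDE, Boris | MTL | 18 | 109 | 4862 | 44.6 | 61 | 2 | 83 | 5772 | 69.5 | 95 | 5 |
| 3 | 2019 | MEDLOCK, Justin | WPG | 18 | 106 | 4716 | 44.5 | 71 | 1 | 79 | 5310 | 67.2 | 85 | 2 |
| 4 | 2019 | HAJRULLAHU, Lirim | HAM | 18 | 106 | 4566 | 43.1 | 62 | 1 | 91 | 5701 | 62.6 | 80 | 1 |
| 5 | 2019 | MAVER, Rob | CGY | 18 | 105 | 4549 | 43.3 | 64 | 3 | 0 | 0 | 0.0 | 0 | 0 |
| 6 | 2019 | O'NEILL, Hugh | EDM | 12 | 80 | 3609 | 45.1 | 70 | 1 | 46 | 2881 | 62.6 | 73 | 0 |
| 7 | 2019 | PFEFFER, Ronald | TOR | 12 | 76 | 3412 | 44.9 | 61 | 1 | 0 | 0 | 0.0 | 0 | 0 |
| 8 | 2019 | BARTEL, Josh | BC | 14 | 73 | 3108 | 42.6 | 53 | 1 | 0 | 0 | 0.0 | 0 | 0 |
| 9 | 2019 | MEDEIROS, Zackary | TOR | 8 | 49 | 2029 | 41.4 | 54 | 1 | 17 | 1082 | 63.6 | 73 | 0 |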
