Welcome to my first post! 
I'll primarily use this site to host and share little projects. It's mainly to help myself keep these things organized and accessible, but if anyone happens upon these and finds them helpful, that'd be awesome!

This notebook explores the steps required to download volleyball stats from a URL and convert them into a Python dictionary.

NOTE: Originally, I printed out the HTML at a few steps of the process. However, a quirky bug occurred when I uploaded the Jupyter Notebook with Pelican: the printed HTML was rendered as part of the page. I didn't want a table of stats to be shown; I just wanted to show how the raw HTML was structured. For now, I've commented out those print statements.


In [2]:
from urllib.request import urlopen

url = "http://www.illinoistechathletics.com/sports/mvball/2016-17/bios/drews_michael_ank9?view=gamelog"

# urlopen() returns an HTTPResponse object; calling read() on it returns the response body as bytes
webpage = urlopen(url).read()

# print(webpage)


Success!... kind of. The HTTP request worked, but read() returns the page's entire HTML source (as bytes), and I only want the stats.
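Since read() hands back raw bytes and the cell above never closes the response, a small helper can take care of both. This is a minimal sketch, not the notebook's code: the name fetch_html, the 10-second timeout, and the UTF-8 fallback are my own assumptions. BeautifulSoup also accepts raw bytes and detects the encoding itself, so decoding first mainly helps if you want to inspect or save the HTML as text.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download url and return the body as a str (hypothetical helper).

    read() returns bytes, so the body is decoded using the charset from
    the Content-Type header, falling back to UTF-8 when none is declared.
    """
    # The with-block closes the connection even if read() raises.
    with urlopen(url, timeout=timeout) as response:
        body = response.read()
        charset = response.headers.get_content_charset() or "utf-8"
    return body.decode(charset)


# Offline usage example: urllib also understands data: URLs.
print(fetch_html("data:text/html,<table><tr><td>7</td></tr></table>"))
```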

While looking into ways to parse HTML, I came across the BeautifulSoup library (installed as beautifulsoup4 and imported as bs4).

In [3]:
from bs4 import BeautifulSoup as bs

soup = bs(webpage, "html.parser")

# print(soup.prettify())

The parsed HTML is still a bit overwhelming, but the stats are inside a <table> tag, and the BeautifulSoup object makes it easy to find every table in the source code. Let's see how many tables there are:
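To make find_all's behavior concrete, here is a sketch on a tiny made-up page (the sample HTML is an assumption, not the athletics site). find_all returns every matching tag in document order, so storing the result once avoids searching the tree again for each lookup.

```python
from bs4 import BeautifulSoup

# A tiny, made-up page with two tables (not the real athletics site).
sample_html = """
<html><body>
  <table><tr><th>Roster</th></tr></table>
  <table>
    <tr><th>Date</th><th>k</th></tr>
    <tr><td>Jan 1</td><td>7</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns a list-like ResultSet of every matching Tag,
# in the order the tags appear in the document.
tables = soup.find_all("table")
print(len(tables))          # 2
print(tables[1].th.string)  # Date  (first <th> inside the second table)
```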

In [4]:
print(len(soup.find_all("table")))

5


Since there are only 5 tables, the quickest way to determine which one holds the Game Log stats is to inspect each one. It turns out the 4th table is the one we want (index 3, since Python lists are zero-indexed).
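Picking the table by position works, but a hard-coded index would silently grab the wrong table if the page layout changed. One alternative, sketched here with a hypothetical helper name and made-up HTML, is to pick the first table that contains a known header such as "Opponent":

```python
from bs4 import BeautifulSoup


def find_table_with_header(soup, header_text):
    """Return the first <table> containing a <th> whose text is header_text.

    Hypothetical helper; returns None if no table matches.
    """
    for table in soup.find_all("table"):
        # string= matches tags whose .string equals header_text exactly.
        if table.find("th", string=header_text) is not None:
            return table
    return None


# Made-up HTML for illustration only.
html = """
<table><tr><th>Season</th></tr></table>
<table><tr><th>Date</th><th>Opponent</th><th>k</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
game_log = find_table_with_header(soup, "Opponent")
print([th.string for th in game_log.find_all("th")])  # ['Date', 'Opponent', 'k']
```

The trade-off is that this depends on the header text staying the same, whereas the index depends on the table order staying the same.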

In [5]:
stat_table = soup.find_all("table")[3]
# print(stat_table)

Now we just need to iterate through all of the tags in that table to retrieve the particular values.

The results will be stored in a Python dictionary. Since I know I'll eventually want stats for every player on the team, my plan is to create a dictionary of dictionaries: each player's name is a key that retrieves that player's stats dictionary. The keys of the stats dictionary are the stat names, and each value is a list holding that stat for every game. Ultimately, this gives me a data structure where I can specify both a player name and a particular stat.

Example: {'Michael Drews': {'k': [1, 2, 3], 'b': [0, 0, 2]}}

The simplified dictionary shows that I played in three games. To get the number of kills I had in the first game, I'd use something like stats_dict['Michael Drews']['k'][0].
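A nested defaultdict builds this structure without having to check whether a player or stat already exists. A minimal sketch that rebuilds the simplified example from the made-up numbers above:

```python
from collections import defaultdict

# Outer keys: player names; inner keys: stat names; values: per-game lists.
# The outer defaultdict creates an inner dict for a new player, and
# defaultdict(list) creates an empty list the first time a stat is used.
stats = defaultdict(lambda: defaultdict(list))

# Made-up numbers matching the simplified example.
for kills, blocks in [(1, 0), (2, 0), (3, 2)]:
    stats['Michael Drews']['k'].append(kills)
    stats['Michael Drews']['b'].append(blocks)

print(stats['Michael Drews']['k'][0])    # 1  (kills in the first game)
print(len(stats['Michael Drews']['k']))  # 3  (number of games)

# Convert to plain dicts for printing or saving (e.g. to JSON).
plain = {player: dict(player_stats) for player, player_stats in stats.items()}
print(plain)  # {'Michael Drews': {'k': [1, 2, 3], 'b': [0, 0, 2]}}
```

One caveat: merely reading a missing key on a defaultdict, such as stats['Some Player'], inserts an empty entry. Checking with 'Some Player' in stats first avoids that.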

In [6]:
from collections import defaultdict

stats_dict = defaultdict(lambda: defaultdict(list))
stat_names = []

# The stat names are within <th> tags, so first create a list of all of the
# stat names. This will be needed to save the stats to the dictionary.
for th in stat_table.find_all("th"):
    stat_names.append(th.string)

# Each row in the game log is wrapped in <tr> tags.
# Use a for loop to access each of the rows. Rows without <td> cells
# (like a header row made of <th> tags) add nothing.
for tr in stat_table.find_all("tr"):
    # Each column in a row is represented using a <td> tag
    # Iterate across all of the <td> tags
    for i, elt in enumerate(tr.find_all("td")):
        stat = 0
        # It gets kind of tricky here because the Score stat's <td> tag
        # contains a child <a> tag. Looping over the cell's children keeps
        # the text of the last child that isn't exactly a newline: the <a>
        # tag's text for the Score cell, the plain text for other cells.
        # If a child's .string is None (a child tag with several children
        # of its own), .strip() raises AttributeError and the cell's own
        # .string is used instead.
        try:
            for c in elt.children:
                if c.string != '\n':
                    stat = c.string.strip('\n')
        except AttributeError:
            stat = elt.string.strip('\n')
        # Similarly, I want the stats saved as a float, but not all of the
        # stats are numbers. The try block converts numbers to float, and
        # if the conversion raises ValueError (e.g. for '-' or 'W, 3-0'),
        # the string is saved to the dictionary instead.
        try:
            stats_dict['Michael Drews'][stat_names[i]].append(float(stat))
        except ValueError:
            stats_dict['Michael Drews'][stat_names[i]].append(stat)
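The loop above cleans up each value as it goes. A possible alternative, sketched under the assumption that you first pull each cell's text out as a plain string (for example with BeautifulSoup's get_text()), is a single helper that normalizes it. The name parse_cell and the choice to map '-' and blank cells to None are my own, not part of the notebook:

```python
def parse_cell(text):
    """Normalize the text of one game-log cell (hypothetical helper)."""
    # str.strip() removes surrounding whitespace, including the
    # non-breaking space (U+00A0) that shows up in the blank Score cell.
    text = text.strip()
    # Treat '-' and empty cells as "no value" rather than as 0.
    if text in ("", "-"):
        return None
    # Numbers (including values like '.250') become floats;
    # dates, opponents and scores stay as strings.
    try:
        return float(text)
    except ValueError:
        return text


print(parse_cell("7"))              # 7.0
print(parse_cell("-"))              # None
print(parse_cell("     \xa0     "))  # None
print(parse_cell("W, 3-0"))         # W, 3-0
```

Note that this would change the stored values: the '-' kill entry and the blank score would both become None, so later code would test for None instead of comparing against '-'.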

In [7]:
print(stat_names)

['Date', 'Opponent', 'Score', 'ms', 's', 'k', 'e', 'ta', 'pct', 'a', 'sa', 'se', 're', 'digs', 'bs', 'ba', 'be', 'tot', 'bhe', 'pts']
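The stat names are the site's abbreviations. Assuming pct is hitting percentage (its usual meaning in volleyball box scores), it can be recomputed from the k (kills), e (errors) and ta (total attempts) lists using the conventional formula (kills - errors) / total attempts. A sketch with a hypothetical helper and made-up numbers:

```python
def hitting_percentage(kills, errors, attempts):
    """Season hitting percentage = (kills - errors) / total attempts.

    Hypothetical helper taking per-game lists such as
    stats_dict['Michael Drews']['k'], ['e'] and ['ta'].
    Games where any of the three values isn't a number (e.g. '-') are skipped.
    """
    total_k = total_e = total_ta = 0.0
    for k, e, ta in zip(kills, errors, attempts):
        if all(isinstance(v, (int, float)) for v in (k, e, ta)):
            total_k += k
            total_e += e
            total_ta += ta
    if total_ta == 0:
        return None  # undefined with no attempts
    return (total_k - total_e) / total_ta


# Made-up numbers, not the real game log.
print(round(hitting_percentage([7.0, 6.0, '-'], [2.0, 1.0, '-'], [20.0, 15.0, '-']), 3))  # 0.286
```

The season figure is computed from the totals rather than by averaging each game's pct, since games with more attempts should carry more weight.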


In [8]:
stats_dict['Michael Drews']['k']

[7.0,
 6.0,
 4.0,
 2.0,
 5.0,
 6.0,
 5.0,
 13.0,
 3.0,
 6.0,
 5.0,
 3.0,
 8.0,
 3.0,
 1.0,
 8.0,
 9.0,
 4.0,
 9.0,
 9.0,
 6.0,
 8.0,
 4.0,
 5.0,
 6.0,
 7.0,
 10.0,
 '-',
 5.0,
 10.0]

In [9]:
stats_dict['Michael Drews']['Score']

['W, 3-0',
 'L, 3-0',
 'L, 3-0',
 'L, 3-0',
 'L, 3-0',
 'L, 3-0',
 'W, 3-0',
 'L, 3-2',
 'L, 3-0',
 'L, 3-1',
 'L, 3-1',
 'L, 3-1',
 'L, 3-2',
 'L, 3-0',
 'W, 3-0',
 'W, 3-2',
 'L, 3-0',
 'L, 3-0',
 'L, 3-1',
 'L, 3-0',
 'W, 3-0',
 'W, 3-0',
 'L, 3-0',
 'L, 3-0',
 'L, 3-0',
 'L, 3-0',
 'L, 3-1',
 '     \xa0     ',
 'L, 3-0',
 'L, 3-0']
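Each Score string packs the match result and the set score together, which makes it easy to tally a win-loss record. A sketch using a regular expression; the pattern is my assumption based on the values shown above, and the blank cell simply doesn't match:

```python
import re
from collections import Counter

# Matches strings like 'W, 3-0' or 'L, 3-2': result letter, then the set score.
SCORE_RE = re.compile(r"([WL]),\s*(\d+)-(\d+)")


def parse_score(score):
    """Return (result, first_sets, second_sets), or None if the cell doesn't match."""
    match = SCORE_RE.search(score)
    if match is None:
        return None
    result, first, second = match.groups()
    return result, int(first), int(second)


# A few values copied from the output above, plus the blank cell.
scores = ['W, 3-0', 'L, 3-2', 'L, 3-1', '     \xa0     ']
parsed = [parse_score(s) for s in scores]
record = Counter(p[0] for p in parsed if p is not None)
print(record['W'], record['L'])  # 1 2
```

Passing stats_dict['Michael Drews']['Score'] instead of the sample list would tally the whole season the same way.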

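Woohoo! Now that the stats are in a dictionary, additional stats can easily be computed. The one thing to watch out for is the 28th game, which has '-' instead of a kill count (and a blank score), so non-numeric entries need to be skipped.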

In [10]:
total_kills = 0
total_games_played = 0
for num_kills in stats_dict['Michael Drews']['k']:
    if num_kills != '-':
        total_kills += num_kills
        total_games_played += 1
print("Total kills:", total_kills)
print("Total games played:", total_games_played)
print("Average kills/games:", '%.3f'%(total_kills/total_games_played))


Total kills: 177.0
Total games played: 29
Average kills/games: 6.103
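The same total/average pattern works for any numeric stat, so it can be wrapped in a function. A minimal sketch with a hypothetical helper name; like the loop above, it skips entries that aren't numbers, so those games don't count as played:

```python
def season_summary(values):
    """Return (total, games counted, per-game average) for one stat's list.

    Hypothetical helper: non-numeric entries such as '-' are skipped.
    """
    numbers = [v for v in values if isinstance(v, (int, float))]
    if not numbers:
        return 0.0, 0, None
    total = sum(numbers)
    return total, len(numbers), total / len(numbers)


# Made-up values. With the real data you could loop over every stat, e.g.:
# for name in stat_names[3:]:
#     print(name, season_summary(stats_dict['Michael Drews'][name]))
total, games, avg = season_summary([7.0, 6.0, '-', 5.0])
print(total, games, round(avg, 3))  # 18.0 3 6.0
```

Averaging a ratio column such as pct this way gives the mean of the per-game values, which is not the same as the season ratio computed from totals.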


Next time, I'll work on creating the dictionary that stores the stats of every player on the team.