See part one [here](./individual-volleyball-stats-part-one.html).

In part one, I downloaded the stats for a single player and stored them in a dictionary. The next step is retrieving the stats for every player on the team. When I first worked on this a couple of months ago, about halfway through our season, I went to each player's webpage, copied and pasted their stats into a file on my computer, and manually added commas. I did this for each player, and then the Python script that created the dictionary iterated through all 11 files to get the stats.
# YUCK!!

What a terrible approach. Granted, I was just trying to quickly put a plot together for my coaches, but I should've put a little bit more thought into the script. Once we played more games, if I wanted an updated plot, I would have to repeat the entire process of copying and pasting the stats. By retrieving the data directly through the site's HTML, no additional work is required if more games are added.

Anyway, on to the Python. I'll repeat the vital code from the previous post. In the future, I'll likely create modules with functions in them to keep the notebooks clean.

In [26]:
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
from collections import defaultdict

url = "http://www.illinoistechathletics.com/sports/mvball/2016-17/bios/drews_michael_ank9?view=gamelog"

webpage = urlopen(url).read()
soup = bs(webpage, "html.parser")

stats_dict = defaultdict(lambda: defaultdict(lambda: []))
stat_names = []

stat_table = soup.find_all("table")[3]

# The stat names are within <th> tags, so first create a list of all of the
# stat names. This will be needed to save the stats to the dictionary.
for th in stat_table.find_all("th"):
    stat_names.append(th.string)

# Each row in the GameLog is encapsulated within <tr> tags.
# Use a for loop to access each of the rows
for tr in stat_table.find_all("tr"):
    # Each column in a row is represented using a <td> tag
    # Iterate across all of the <td> tags
    for i, elt in enumerate(tr.find_all("td")):
        stat = 0
        # It gets kind of tricky here because the Score stat's <td> tag
        # contains a child <a> tag, so this try block checks to see if
        # the current tag has a child. If it does, it retrieves the score
        # from the child. If it doesn't, an exception is thrown and code
        # execution continues from the except block.
        try:
            for c in elt.children:
                if c.string != '\n':
                    stat = c.string.strip('\n')
        except AttributeError:
            stat = elt.string.strip('\n')
        # Similarly, I want the stats saved as a float, but not all of the 
        # stats are numbers. The try block converts all of the numbers to
        # float and if there's an exception from the conversion, the string
        # is saved to the dictionary instead.
        try:
            stats_dict['Michael Drews'][stat_names[i]].append(float(stat))
        except ValueError:
            # Split on the full 'vs.' / ' at ' markers so that an opponent
            # name containing the letters "at" is not cut short.
            if 'vs.' in stat:
                stat = stat.split('vs.', 1)[1].strip()
                stats_dict['Michael Drews'][stat_names[i]].append(stat)
            elif ' at ' in stat:
                stat = stat.split(' at ', 1)[1].strip()
                stats_dict['Michael Drews'][stat_names[i]].append(stat)
            else:
                stat = stat.strip().rstrip('\n')
                stats_dict['Michael Drews'][stat_names[i]].append(stat)

Unfortunately, a unique ID is appended to the URL for each player, so it won't be as easy as replacing the name in the URL. First, I have to find the unique player ID (pid) for each player. This likely requires downloading the Roster page, since that page contains the links to each player's stats page.

In [38]:
roster_url = "http://www.illinoistechathletics.com/sports/mvball/2016-17/roster"

webpage = urlopen(roster_url).read()
soup = bs(webpage, "html.parser")

roster_table = soup.find_all("table")[0]
player_urls = []

for entry in roster_table.tbody.find_all("tr"):
    player_url = entry.td.a['href']
    print(player_url)
    player_urls.append(player_url)
    

/sports/mvball/2016-17/bios/bostick_derek_mb6q
/sports/mvball/2016-17/bios/bumpass_kyle_12ra
/sports/mvball/2016-17/bios/allen_david_vrb7
/sports/mvball/2016-17/bios/drews_michael_ank9
/sports/mvball/2016-17/bios/kupiec_lukasz_6h2s
/sports/mvball/2016-17/bios/hussain_irshad_bzfc
/sports/mvball/2016-17/bios/huang_allan_7r9o
/sports/mvball/2016-17/bios/bahrami_arvin_ba5l
/sports/mvball/2016-17/bios/robeck_evan_ld87
/sports/mvball/2016-17/bios/letkiewicz_filip_kmmv
/sports/mvball/2016-17/bios/sassmannschausen_paulo_a272
/sports/mvball/2016-17/bios/shepta_yuriy_sq0p
/sports/mvball/2016-17/bios/woltman_andrew_6mbr


Awesome!

Now I just need to prepend the base URL to each of the player URL paths. player_urls is a list that contains all of the paths, so I'll update each entry in place by prepending the base URL.

In [39]:
base_url = "http://www.illinoistechathletics.com"

for index, p_url in enumerate(player_urls):
    player_urls[index] = base_url + p_url
    
print(player_urls[0])



http://www.illinoistechathletics.com/sports/mvball/2016-17/bios/bostick_derek_mb6q
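
String concatenation works here because every roster link starts with a slash. As an alternative sketch (not what the cell above does), the standard library's `urllib.parse.urljoin` builds the same absolute URL and also tolerates a base that ends in a slash. The `sample_href` value is copied from the roster output above.

```python
from urllib.parse import urljoin

base_url = "http://www.illinoistechathletics.com"
# One relative link copied from the roster output above.
sample_href = "/sports/mvball/2016-17/bios/bostick_derek_mb6q"

full_url = urljoin(base_url, sample_href)
print(full_url)

# A trailing slash on the base does not produce a double slash.
assert urljoin(base_url + "/", sample_href) == full_url
```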


Additionally, I want each player's first name to use as the key for the stats dictionary, so I'll iterate through the player_urls list once more to extract each player's first name. This works fine for Illinois Tech's roster because no two players share a first name; if any two did, then these would not suffice as dictionary keys. The first and last name would be an improved key, but I'll stick to using just the first name for now.

In [40]:
player_names = []

for p_url in player_urls:
    player_names.append(p_url.split("_")[1])
    
print(player_names)

['derek', 'kyle', 'david', 'michael', 'lukasz', 'irshad', 'allan', 'arvin', 'evan', 'filip', 'paulo', 'yuriy', 'andrew']
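
If two teammates ever shared a first name, a first-plus-last key would avoid the collision. Here is a minimal sketch, assuming every bio link ends in `lastname_firstname_id` as the roster output above suggests. `full_name_from_url` is a hypothetical helper, not part of the notebook.

```python
def full_name_from_url(url):
    """Return 'first last' from a bio URL ending in lastname_firstname_id."""
    # Keep only the last path segment, e.g. "drews_michael_ank9".
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    # Assumption: the slug has exactly three underscore-separated parts.
    last, first, _pid = slug.split("_")
    return first + " " + last


# Two links copied from the roster output above.
sample_urls = [
    "/sports/mvball/2016-17/bios/drews_michael_ank9",
    "/sports/mvball/2016-17/bios/allen_david_vrb7",
]
print([full_name_from_url(u) for u in sample_urls])
```

The three-part assumption would break for a name that itself contains an underscore in the link, so treat this as a starting point rather than a general parser.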


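Now we just need to repeat the single-player process, changing the URL and dictionary key on each iteration. To clean this up a little bit, I'll create a new function that takes a key (the player's name), a URL, and a dictionary as parameters and updates the dictionary with the downloaded stats. Unlike the earlier version, it only keeps a non-numeric value as a string if it is an opponent or contains a W or L (a match result); anything else is stored as 0.0.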

In [60]:
def singlePlayerStatDownload(name, url, dictionary):

    webpage = urlopen(url).read()
    soup = bs(webpage, "html.parser")

    stat_names = []

    stat_table = soup.find_all("table")[3]

    # The stat names are within <th> tags, so first create a list of all of the
    # stat names. This will be needed to save the stats to the dictionary.
    for th in stat_table.find_all("th"):
        stat_names.append(th.string)

    # Each row in the GameLog is encapsulated within <tr> tags.
    # Use a for loop to access each of the rows
    for tr in stat_table.find_all("tr"):
        # Each column in a row is represented using a <td> tag
        # Iterate across all of the <td> tags
        for i, elt in enumerate(tr.find_all("td")):
            stat = 0
            # It gets kind of tricky here because the Score stat's <td> tag
            # contains a child <a> tag, so this try block checks to see if
            # the current tag has a child. If it does, it retrieves the score
            # from the child. If it doesn't, an exception is thrown and code
            # execution continues from the except block.
            try:
                for c in elt.children:
                    if c.string != '\n':
                        stat = c.string.strip('\n')
            except AttributeError:
                stat = elt.string.strip('\n')
            # Similarly, I want the stats saved as a float, but not all of the 
            # stats are numbers. The try block converts all of the numbers to
            # float and if there's an exception from the conversion, the string
            # is saved to the dictionary instead.
            try:
                dictionary[name][stat_names[i]].append(float(stat))
            except ValueError:
                # Split on the full 'vs.' / ' at ' markers so that an opponent
                # name containing the letters "at" is not cut short.
                if 'vs.' in stat:
                    stat = stat.split('vs.', 1)[1].strip()
                    dictionary[name][stat_names[i]].append(stat)
                elif ' at ' in stat:
                    stat = stat.split(' at ', 1)[1].strip()
                    dictionary[name][stat_names[i]].append(stat)
                elif 'W' in stat or 'L' in stat:
                    stat = stat.strip().rstrip('\n')
                    dictionary[name][stat_names[i]].append(stat)
                else:
                    dictionary[name][stat_names[i]].append(0.0)
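
The conversion rules in the function are easier to check in isolation. The sketch below restates them in a hypothetical helper, `convert_stat`, that needs no network access. The sample strings are made up for illustration and may not match the site's exact cell text.

```python
def convert_stat(raw):
    """Sketch of the fallback chain: number, opponent, result, else 0.0."""
    try:
        return float(raw)
    except ValueError:
        if "vs." in raw:
            # Home match: keep only the opponent's name.
            return raw.split("vs.", 1)[1].strip()
        if " at " in raw:
            # Away match: split on the full " at " marker, keep the opponent.
            return raw.split(" at ", 1)[1].strip()
        if "W" in raw or "L" in raw:
            # Match result, kept as text.
            return raw.strip()
        # Anything else (for example a placeholder dash) is stored as 0.0.
        return 0.0


# Hypothetical cell contents, not copied from the site.
samples = ["13", " vs. Example College\n", " at Example University", "W, 3-1", "-"]
for sample in samples:
    print(repr(sample), "->", repr(convert_stat(sample)))
```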

In [61]:
stats_dict = defaultdict(lambda: defaultdict(lambda: []))
    
for p_name, p_url in zip(player_names, player_urls):
    singlePlayerStatDownload(p_name, p_url, stats_dict)
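
The nested `defaultdict` is what lets the function append to `dictionary[name][stat]` without creating either level first. A small standalone sketch of that behavior, with made-up names and values:

```python
from collections import defaultdict

# Outer level: player name. Inner level: stat name -> list of per-match values.
# defaultdict(list) behaves the same as defaultdict(lambda: []).
demo = defaultdict(lambda: defaultdict(list))

# Neither "player_a" nor "k" exists yet; both are created on first access.
demo["player_a"]["k"].append(7.0)
demo["player_a"]["k"].append(6.0)
demo["player_b"]["k"].append(4.0)

print(sorted(demo.keys()))
print(demo["player_a"]["k"])
```

One side effect worth knowing: merely reading a missing key also creates it, so a typo in a player name silently adds an empty entry instead of raising a `KeyError`.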

In [62]:
print(stats_dict.keys())

dict_keys(['michael', 'kyle', 'filip', 'irshad', 'david', 'arvin', 'derek', 'yuriy', 'andrew', 'evan', 'paulo', 'allan', 'lukasz'])


In [63]:
stats_dict['michael']['k']

[7.0,
 6.0,
 4.0,
 2.0,
 5.0,
 6.0,
 5.0,
 13.0,
 3.0,
 6.0,
 5.0,
 3.0,
 8.0,
 3.0,
 1.0,
 8.0,
 9.0,
 4.0,
 9.0,
 9.0,
 6.0,
 8.0,
 4.0,
 5.0,
 6.0,
 7.0,
 10.0,
 0.0,
 5.0,
 10.0]

In [64]:
sum(stats_dict['david']['k'])

175.0

In [65]:
sum(stats_dict['michael']['k'])

177.0

Looks like I just barely beat David for total kills in the season =D Both sums match the "Season Totals" stats on the website!

So now I have a Python dictionary that allows me to access any player's stat for any game or compute entirely new stats. Sweet!!
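
As one example of a derived stat, the sketch below computes average kills per match and the single-match high from the `stats_dict['michael']['k']` list printed above. The values are copied from that output so the snippet runs on its own. The 0.0 entry is included in the average, although it may be a placeholder rather than a match with zero kills.

```python
# Kills per match, copied from the stats_dict['michael']['k'] output above.
kills = [7.0, 6.0, 4.0, 2.0, 5.0, 6.0, 5.0, 13.0, 3.0, 6.0,
         5.0, 3.0, 8.0, 3.0, 1.0, 8.0, 9.0, 4.0, 9.0, 9.0,
         6.0, 8.0, 4.0, 5.0, 6.0, 7.0, 10.0, 0.0, 5.0, 10.0]

total_kills = sum(kills)                    # season total
kills_per_match = total_kills / len(kills)  # average over all listed matches
best_match = max(kills)                     # single-match high

print(total_kills, round(kills_per_match, 2), best_match)
```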

While the raw stats are awesome, next time I'll explore downloading Play-by-Play logs of each game. This data allows us to identify patterns and strategy (for example, who gets set the most when the score is at least 20-20?). 

Stay tuned!