```
title: How to Win Fantasy Hockey
date: 2019-09-28
tags: python
slug: scraping_fantasy_hockey
```

#### How to Win Fantasy Hockey

*A recipe for using Python to scrape player projection data and build a value-over-replacement ranking model*

The Jupyter notebook for this blog post is available [here](https://github.com/maxhumber/maxhumber.com/blob/master/blog/2019-09-28_scraping_fantasy_hockey.ipynb).

№1 Identify a data source ([CBS Sports](https://www.cbssports.com/fantasy/hockey/stats/F/2019/season/projections/) is a good option)

№2 `get` the HTML with [gazpacho](https://maxhumber.github.io/gazpacho/)

In [1]:
from gazpacho import get

position = 'F'

base = 'https://www.cbssports.com/fantasy/hockey/stats'
url = f'{base}/{position}/2019/season/projections/'

html = get(url)
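
The URL pattern above can be wrapped in a small helper so that the position and season are easy to swap. This is only a sketch: `projections_url` is a hypothetical name, and it assumes every position page and season follows the same path layout as the forwards page.

```python
BASE = 'https://www.cbssports.com/fantasy/hockey/stats'

def projections_url(position, season=2019):
    """Build a projections URL (hypothetical helper, assumes one path layout for all positions)."""
    return f'{BASE}/{position}/{season}/season/projections/'

print(projections_url('F'))
```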

№3 Pass the captured HTML to a `Soup` parser

In [2]:
from gazpacho import Soup

soup = Soup(html)

№4 `find` the HTML tags that contain player projection data

In [3]:
# HTML: <tr class="TableBase-bodyTr ">
rows = soup.find('tr', {'class': 'TableBase-bodyTr '})

№5 Capture a single row (and inspect it for good measure)

In [4]:
row = rows[0]

№6 Use `find` to grab the tags that hold the player's name, position, and projected fantasy points

In [5]:
# name
row.find('span', {'class': 'CellPlayerName--long'}).find('a').text

# position
(row.find('span', {'class': 'CellPlayerName--long'})
     .find('span', {'class': 'CellPlayerName-position'}).text
)

# points
float(row.find('td', {'class': 'TableBase-bodyTd'})[1].text)

366.5

№7 Wrap these `find` operations into a function

In [6]:
row = rows[0]

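def parse_row(row):
    """Return (name, position, points) for one projection table row."""
    meta = row.find('span', {'class': 'CellPlayerName--long'})
    try:
        name = meta.find('a').text
    except AttributeError:
        # no link inside the name cell: fall back to the cell's own text
        name = meta.text
    position = meta.find('span', {'class': 'CellPlayerName-position'}).text
    # the td at index 1 holds the projected fantasy points
    points = float(row.find('td', {'class': 'TableBase-bodyTd'})[1].text)
    return name, position, points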

parse_row(row)

('Nikita Kucherov', 'RW', 366.5)

№8 Make sure that the function works for all the captured rows

In [7]:
players = []
for row in rows:
    try: 
        players.append(parse_row(row))
    except AttributeError:
        pass
players[-2:]

[('David Perron', 'RW', 143.5), ('Dylan Strome', 'C', 143.0)]
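
The loop above silently drops any row that raises `AttributeError`. One possible refinement, sketched here with a stand-in parser so that it runs without the network, is to count the skipped rows and to also catch `ValueError`, which `float` raises on non-numeric text. `parse_all` and `fake_parse` are hypothetical names, not part of gazpacho or the original notebook, and the sample rows are invented.

```python
def parse_all(rows, parse):
    """Apply parse to every row and return (parsed_rows, skipped_count)."""
    parsed, skipped = [], 0
    for row in rows:
        try:
            parsed.append(parse(row))
        except (AttributeError, ValueError):
            # AttributeError: an expected tag was missing
            # ValueError: the points cell was not a number
            skipped += 1
    return parsed, skipped

def fake_parse(row):
    # Stand-in for parse_row: expects a (name, position, points_text) tuple
    name, position, points_text = row
    return name, position, float(points_text)

sample_rows = [('David Perron', 'RW', '143.5'), ('Header Row', '', 'FPTS')]
players, skipped = parse_all(sample_rows, fake_parse)
print(players, skipped)
```

A skipped count that is unexpectedly high is a hint that the page layout changed and the selectors need another look.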

№9 Bundle up the logic so that it can be applied to multiple pages

In [8]:
def scrape_position(position):
    base = 'https://www.cbssports.com/fantasy/hockey/stats'
    url = f'{base}/{position}/2019/season/projections/'
    html = get(url)
    soup = Soup(html)
    rows = soup.find('tr', {'class': 'TableBase-bodyTr '})
    data = []
    for row in rows:
        try: 
            data.append(parse_row(row))
        except AttributeError:
            pass
    return data

№10 Scrape each page that contains player projection data

In [9]:
# F for Forwards
# D for Defence
# G for Goalies

import time

data = []
for position in ['F', 'D', 'G']:
    d = scrape_position(position)
    data.extend(d)
    time.sleep(1)
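
A multi-page scrape can fail partway through on a transient network error. A minimal sketch of a retry wrapper follows; `with_retries` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions. In the loop above it could wrap the call as `with_retries(lambda: scrape_position(position))`.

```python
import time

def with_retries(fn, attempts=3, delay=1.0):
    """Call fn(), retrying on any exception up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay * attempt)  # wait a little longer each time

# Demo with a function that fails twice before succeeding
calls = {'n': 0}

def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('temporary failure')
    return ['row']

result = with_retries(flaky, attempts=3, delay=0)
print(result, calls['n'])
```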

№11 Stuff the captured data into a pandas `DataFrame`

In [10]:
import pandas as pd

df = pd.DataFrame(data, columns=['name', 'position', 'points'])
df.sample(5)

|     | name             | position | points |
|-----|------------------|----------|--------|
| 122 | Jake Muzzin      | D        | 163.8  |
| 239 | Anders Nilsson   | G        | 176.4  |
| 11  | Johnny Gaudreau  | LW       | 270.0  |
| 220 | Henrik Lundqvist | G        | 246.2  |
| 82  | Kyle Palmieri    | RW       | 150.5  |


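№12 Calculate the [value over replacement player](https://en.wikipedia.org/wiki/Value_over_replacement_player) (VORP) score for each player, using the mean projected points of the top `slots * pool_size` players at each position as the baseline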

In [11]:
pool_size = 8
starters = {'C': 1, 'LW': 1, 'RW': 1, 'D': 2, 'G': 1}

for position, slots in starters.items():
    replacement = (
        df[df['position'] == position]
        .sort_values('points', ascending=False)
        .head(slots * pool_size)
        ['points']
        .mean()
    )
    df.loc[df['position'] == position, 'vorp'] = df['points'] - replacement
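
To make the arithmetic concrete, here is a plain-Python sketch of the same calculation on made-up numbers. The players, point totals, and the smaller `pool_size` are all invented for illustration; as in the cell above, the baseline is the mean of the top `slots * pool_size` players at a position.

```python
from statistics import mean

pool_size = 2                      # toy value, smaller than the 8 used above
starters = {'C': 1, 'D': 2}
players = [                        # (name, position, points), all invented
    ('A', 'C', 300.0), ('B', 'C', 280.0), ('C', 'C', 200.0),
    ('D', 'D', 250.0), ('E', 'D', 230.0), ('F', 'D', 210.0),
    ('G', 'D', 190.0), ('H', 'D', 100.0),
]

vorp = {}
for position, slots in starters.items():
    # points at this position, best first
    pts = sorted((p for _, pos, p in players if pos == position), reverse=True)
    # baseline: mean of the players expected to start across the pool
    baseline = mean(pts[:slots * pool_size])
    for name, pos, p in players:
        if pos == position:
            vorp[name] = p - baseline

print(vorp)
```

With these numbers the centre baseline is 290 and the defence baseline is 220, so a 250-point defender (+30) outranks a 300-point centre (+10). That is the point of the exercise: raw points alone would rank them the other way round.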

№13 Re-rank and draft players according to their VORP rank

In [12]:
df['rank'] = df['vorp'].rank(method='average', ascending=False)
df.sort_values('rank').set_index('rank').head(10)

| rank | name               | position | points | vorp     |
|------|--------------------|----------|--------|----------|
| 1.0  | Nikita Kucherov    | RW       | 366.5  | 103.9375 |
| 2.0  | Mark Giordano      | D        | 322.3  | 93.15    |
| 3.0  | Brent Burns        | D        | 317.5  | 88.35    |
| 4.0  | Morgan Rielly      | D        | 289.5  | 60.35    |
| 5.0  | Leon Draisaitl     | LW       | 310.0  | 47.6125  |
| 6.0  | John Carlson       | D        | 271.5  | 42.35    |
| 7.0  | Brad Marchand      | LW       | 298.0  | 35.6125  |
| 8.0  | Andrei Vasilevskiy | G        | 409.5  | 35.125   |
| 9.0  | Patrick Kane       | RW       | 289.5  | 26.9375  |
| 10.0 | Connor McDavid     | C        | 300.0  | 19.9     |
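
During a draft, the ranking is only useful once already-taken players are filtered out. Here is a minimal sketch of that lookup, using the first five rows of the table above as plain tuples. `best_available` is a hypothetical helper, not part of the original notebook, and the set of drafted players is invented.

```python
ranked = [  # (name, position, vorp), already sorted by VORP
    ('Nikita Kucherov', 'RW', 103.9375),
    ('Mark Giordano', 'D', 93.15),
    ('Brent Burns', 'D', 88.35),
    ('Morgan Rielly', 'D', 60.35),
    ('Leon Draisaitl', 'LW', 47.6125),
]

def best_available(ranked, drafted, position=None):
    """Return the highest-VORP player not yet drafted, optionally for one position."""
    for name, pos, vorp in ranked:
        if name in drafted:
            continue
        if position is None or pos == position:
            return name, pos, vorp
    return None  # nobody left that matches

drafted = {'Nikita Kucherov', 'Mark Giordano'}  # invented draft state
print(best_available(ranked, drafted))
print(best_available(ranked, drafted, position='LW'))
```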


№14 Profit