```
title: How to Win Fantasy Hockey
date: 2019-09-28
tags: python
slug: scraping_fantasy_hockey
```

#### How to Win Fantasy Hockey

*A recipe for using Python to scrape player projection data and build a value-over-replacement ranking model*

The Jupyter notebook for this blog post is available [here](https://github.com/maxhumber/maxhumber.com/blob/master/blog/2019-09-28_scraping_fantasy_hockey.ipynb).
<br/>
<br/>

№1 Identify a data source ([CBS Sports](https://www.cbssports.com/fantasy/hockey/stats/F/2019/season/projections/) is a good option)

№2 `get` the html data with [gazpacho](https://maxhumber.github.io/gazpacho/)

In [1]:
from gazpacho import get

position = 'F'

base = 'https://www.cbssports.com/fantasy/hockey/stats'
url = f'{base}/{position}/2019/season/projections/'

html = get(url)

№3 Pass the captured html to a `Soup` parser

In [2]:
from gazpacho import Soup

soup = Soup(html)

№4 `find` the html tags that contain player projection data

In [3]:
# HTML: <tr class="TableBase-bodyTr ">
rows = soup.find('tr', {'class': 'TableBase-bodyTr '})

№5 Capture a single row (and inspect for good measure)

In [4]:
row = rows[0]

№6 Use `find` to grab those tags that map to player name, position, and projected Fantasy Points

In [5]:
# name
row.find('span', {'class': 'CellPlayerName--long'}).find('a').text

# position
(row.find('span', {'class': 'CellPlayerName--long'})
     .find('span', {'class': 'CellPlayerName-position'}).text
)

# points
float(row.find('td', {'class': 'TableBase-bodyTd'})[1].text)

367.3

№7 Wrap these `find` operations into a function

In [6]:
row = rows[0]

def parse_row(row):
    meta = row.find('span', {'class': 'CellPlayerName--long'})
    name = meta.find('a').text
    position = meta.find('span', {'class': 'CellPlayerName-position'}).text
    points = float(row.find('td', {'class': 'TableBase-bodyTd'})[1].text)
    return name, position, points

parse_row(row)

('Nikita Kucherov', 'RW', 367.3)
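
`parse_row` calls `float` directly, so a single non-numeric cell would stop the whole scrape. A minimal sketch of a more forgiving conversion, assuming some cells might hold a placeholder instead of a number (not confirmed for this page); `to_float` is a hypothetical helper, not part of gazpacho:

```python
def to_float(text, default=None):
    """Convert cell text to a float; return `default` when it is not numeric."""
    try:
        # Drop thousands separators and surrounding whitespace before converting
        return float(text.replace(',', '').strip())
    except (AttributeError, ValueError):
        # AttributeError covers a missing (None) value, ValueError a non-numeric string
        return default

print(to_float('367.3'))    # 367.3
print(to_float(' 1,024 '))  # 1024.0
print(to_float('-'))        # None
```

Swapping `float(...)` for `to_float(...)` inside `parse_row` would leave a `None` in the points column rather than raising.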

№8 Make sure that the function works for all the captured rows

In [7]:
forwards = [parse_row(row) for row in rows]
print(forwards[-2:])

[('Nick Shore', 'C', 166.6), ('Mats Zuccarello', 'RW', 165.8)]
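
Printing the last two rows is a quick eyeball test. A rough sketch of a stricter check follows, using the two rows printed above as sample input; `check_rows` and `KNOWN_POSITIONS` are hypothetical names, and the set of position codes is an assumption based on the codes that appear in this post:

```python
# Sample rows copied from the output above; any list of
# (name, position, points) tuples has the same shape
sample = [('Nick Shore', 'C', 166.6), ('Mats Zuccarello', 'RW', 165.8)]

# Position codes that appear in this post (assumed to be the full set)
KNOWN_POSITIONS = {'C', 'LW', 'RW', 'D', 'G'}

def check_rows(parsed):
    """Return the rows that look malformed."""
    bad = []
    for name, position, points in parsed:
        ok = (
            isinstance(name, str) and name.strip() != ''
            and position in KNOWN_POSITIONS
            and isinstance(points, float)
        )
        if not ok:
            bad.append((name, position, points))
    return bad

print(check_rows(sample))  # []
```

An empty list means every row has a name, a recognized position, and a numeric projection.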


№9 Bundle up the logic so that it can be applied to multiple pages

In [8]:
def scrape_position(position):
    base = 'https://www.cbssports.com/fantasy/hockey/stats'
    url = f'{base}/{position}/2019/season/projections/'
    html = get(url)
    soup = Soup(html)
    rows = soup.find('tr', {'class': 'TableBase-bodyTr '})
    data = [parse_row(row) for row in rows]
    return data

№10 Scrape each page that contains player projection data

In [9]:
# F for Forwards
# D for Defence
# G for Goalies

import time

data = []
for position in ['F', 'D', 'G']:
    d = scrape_position(position)
    data.extend(d)
    time.sleep(1)

№11 Stuff the captured data into a pandas `DataFrame`

In [10]:
import pandas as pd

df = pd.DataFrame(data, columns=['name', 'position', 'points'])
df.sample(5)

| | name | position | points |
|---|---|---|---|
| 119 | Alex Pietrangelo | D | 145.3 |
| 136 | Justin Faulk | D | 123.5 |
| 177 | Danny DeKeyser | D | 92.4 |
| 140 | Drew Doughty | D | 119.8 |
| 176 | Esa Lindell | D | 92.4 |


№12 Calculate the <a href="https://en.wikipedia.org/wiki/Value_over_replacement_player">value over replacement player</a> (VORP) score for each player

In [11]:
pool_size = 8
starters = {'C': 1, 'LW': 1, 'RW': 1, 'D': 2, 'G': 1}

for position, slots in starters.items():
    replacement = (
        df[df['position'] == position]
        .sort_values('points', ascending=False)
        .head(slots * pool_size)
        ['points']
        .mean()
    )
    df.loc[df['position'] == position, 'vorp'] = df['points'] - replacement
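
The loop above sets each position's baseline to the mean projection of its top `slots * pool_size` players, then subtracts that baseline from every player at the position. A plain-Python sketch of the same arithmetic for one position, with made-up players and a tiny league (all names and numbers are hypothetical):

```python
from statistics import mean

# Hypothetical projections for a single position (not real data)
points = {'P1': 300.0, 'P2': 280.0, 'P3': 260.0, 'P4': 200.0}

slots, pool_size = 1, 3  # toy values: 1 starter per team, 3 teams

# Baseline: mean of the top slots * pool_size projections
top = sorted(points.values(), reverse=True)[:slots * pool_size]
replacement = mean(top)  # (300 + 280 + 260) / 3 = 280.0

# VORP: each player's projection minus the positional baseline
vorp = {name: pts - replacement for name, pts in points.items()}
print(vorp)  # {'P1': 20.0, 'P2': 0.0, 'P3': -20.0, 'P4': -80.0}
```

Because the baseline is an average of the starters rather than the projection of one replacement-level player, roughly half of the starters at each position end up with a negative score.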

№13 Re-rank and draft players according to their VORP rank

In [12]:
df['vorp_rank'] = df['vorp'].rank(method='average', ascending=False)
df.sort_values('vorp_rank').head(10)

| | name | position | points | vorp | vorp_rank |
|---|---|---|---|---|---|
| 0 | Nikita Kucherov | RW | 367.3 | 77.75 | 1.0 |
| 2 | Leon Draisaitl | LW | 329.7 | 51.875 | 2.0 |
| 1 | Connor McDavid | C | 346.2 | 49.6125 | 3.0 |
| 200 | Andrei Vasilevskiy | G | 437.1 | 38.2 | 4.0 |
| 100 | Brent Burns | D | 235.8 | 37.33125 | 5.0 |
| 101 | Dustin Byfuglien | D | 233.5 | 35.03125 | 6.0 |
| 102 | Morgan Rielly | D | 225.6 | 27.13125 | 7.0 |
| 6 | Alex Ovechkin | LW | 304.2 | 26.375 | 8.0 |
| 3 | Patrick Kane | RW | 313.2 | 23.65 | 9.0 |
| 103 | John Carlson | D | 221.1 | 22.63125 | 10.0 |
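
On draft day the ranking only helps if already-drafted players are crossed off. A minimal sketch of a "best available" lookup, using the top five rows from the table above; `best_available` is a hypothetical helper and the `needs` filter is an illustrative extra, not part of the original recipe:

```python
# (name, position, vorp) rows copied from the table above
board = [
    ('Nikita Kucherov', 'RW', 77.75),
    ('Leon Draisaitl', 'LW', 51.875),
    ('Connor McDavid', 'C', 49.6125),
    ('Andrei Vasilevskiy', 'G', 38.2),
    ('Brent Burns', 'D', 37.33125),
]

def best_available(board, taken, needs=None):
    """Highest-VORP player not yet taken, optionally limited to needed positions."""
    for name, position, vorp in sorted(board, key=lambda r: r[2], reverse=True):
        if name in taken:
            continue
        if needs is not None and position not in needs:
            continue
        return name, position, vorp
    return None  # nobody left who fits

taken = {'Nikita Kucherov', 'Connor McDavid'}
print(best_available(board, taken))                    # ('Leon Draisaitl', 'LW', 51.875)
print(best_available(board, taken, needs={'D', 'G'}))  # ('Andrei Vasilevskiy', 'G', 38.2)
```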


№14 Profit