Dump and Load
=============

OK, we're getting pretty good at pulling data and doing some processing on it (*NOTE: you probably want to take accuracy of the players into account as well, but worry about that later*), but how about using it?

To avoid having to pull and repull the data every time I want to do something, let's see if I can dump the data that I create from a single Tournament into a file, then load it back up. I'm guessing that JSON will be fine for this, since I'm already creating `dict`s anyway, so let's use that.

In [1]:
import json

from bs4 import BeautifulSoup
import requests

import wcf

In [2]:
tourney = wcf.Tournament(555)  # our old standby
tourney.load_all_games()

print(len(tourney.games))

pulling data from results.worldcurling.org...
71


OK, our games are loaded. This next bit is copy-pasted from `parsing_trial_wcf.py`, since it's a good start and does everything I need. We'll run it over every game, and our aggregate game data will be all set!

In [3]:
def determine_end_types(game):
    hammer = game.lsfe
    end_types = []
    for end in zip(*game.ends):
        types, hammer = determine_single_end(end, hammer)
        end_types.append(types)
    return list(zip(*end_types))


def determine_single_end(end, hammer):
    types = determine_types(end, hammer)
    new_hammer = update_hammer(end)
    new_hammer = new_hammer if new_hammer is not None else hammer
    return types, new_hammer


def determine_types(end, hammer):
    ''' Find curling outcome from a single end.'''
    if end[0] > 0 and hammer == 0:
        return 'score-with-hammer', 'blank'
    elif end[1] > 0 and hammer == 1:
        return 'blank', 'score-with-hammer'
    elif end[0] > 0 and hammer == 1:
        return 'steal', 'blank'
    elif end[1] > 0 and hammer == 0:
        return 'blank', 'steal'
    elif end[0] == end[1] == 0 and hammer == 0:
        return 'blank-with-hammer', 'blank'
    elif end[0] == end[1] == 0 and hammer == 1:
        return 'blank', 'blank-with-hammer'


def update_hammer(end):
    if end[0] > 0:
        return 1
    elif end[1] > 0:
        return 0


def get_aggregate(game, types):
    aggregate = []
    for team_ends, team_types, team in zip(game.ends, types, game.teams):
        data = get_team_aggregate(team_ends, team_types)
        data['team-name'] = team
        data['game-type'] = game.draw
        aggregate.append(data)
    aggregate[game.winner]['won'] = True
    return aggregate


def get_team_aggregate(ends, types):
    ''' Builds a dict with "meta-data" about the game for a single team.'''
    assert len(ends) == len(types)
    data = {'blank': 0, 'blank-with-hammer': 0, 'steal': 0,
            'score-with-hammer': 0, 'score-2+-with-hammer': 0,
            'team-name': '', 'total-ends': len(ends),
            'total-score': 0, 'stolen-points': 0, 'won': False}
    for points, end_type in zip(ends, types):
        data['total-score'] += points
        data[end_type] += 1
        if end_type == 'steal':
            data['stolen-points'] += points
        elif end_type == 'score-with-hammer' and points >= 2:
            data['score-2+-with-hammer'] += 1
    return data
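
To sanity-check the hammer logic in the functions above, here is a condensed, self-contained sketch of the same classification on a made-up four-end game. The name `classify_ends` and the scores are hypothetical, and it assumes (as the code above does) that the hammer is tracked as a team index, 0 or 1, starting from whoever has last stone in the first end.

```python
def classify_ends(ends, hammer):
    """Label each end for both teams while tracking who holds the hammer.

    ends   -- two equal-length sequences of points per end, one per team
    hammer -- index (0 or 1) of the team with last stone in the first end
    """
    labels = ([], [])
    for scores in zip(*ends):
        other = 1 - hammer
        if scores[hammer] > 0:
            # The team with the hammer scored, so the hammer changes hands.
            labels[hammer].append('score-with-hammer')
            labels[other].append('blank')
            hammer = other
        elif scores[other] > 0:
            # A steal: the team that was stolen from keeps the hammer.
            labels[other].append('steal')
            labels[hammer].append('blank')
        else:
            # Nobody scored, so the hammer stays where it is.
            labels[hammer].append('blank-with-hammer')
            labels[other].append('blank')
    return labels


# Hypothetical game: team 0 starts with the hammer.
ends = [(2, 0, 0, 0),
        (0, 0, 1, 1)]
team0, team1 = classify_ends(ends, hammer=0)
print(team0)  # ['score-with-hammer', 'blank', 'blank', 'blank']
print(team1)  # ['blank', 'blank-with-hammer', 'score-with-hammer', 'steal']
```

Note that 'blank' here means "nothing notable for this team in this end", which is why each team's labels always add up to the number of ends played.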

In [4]:
t_data = []
for game in tourney:
    types = determine_end_types(game)
    meta = get_aggregate(game, types)
    t_data.append(meta)

In [5]:
t_data

[[{'blank': 4,
   'blank-with-hammer': 1,
   'game-type': 'Draw #1',
   'score-2+-with-hammer': 2,
   'score-with-hammer': 3,
   'steal': 2,
   'stolen-points': 3,
   'team-name': 'Sweden',
   'total-ends': 10,
   'total-score': 8,
   'won': True},
  {'blank': 6,
   'blank-with-hammer': 1,
   'game-type': 'Draw #1',
   'score-2+-with-hammer': 2,
   'score-with-hammer': 3,
   'steal': 0,
   'stolen-points': 0,
   'team-name': 'Japan',
   'total-ends': 10,
   'total-score': 5,
   'won': False}],
 [{'blank': 6,
   'blank-with-hammer': 0,
   'game-type': 'Draw #1',
   'score-2+-with-hammer': 1,
   'score-with-hammer': 2,
   'steal': 0,
   'stolen-points': 0,
   'team-name': 'Korea',
   'total-ends': 8,
   'total-score': 3,
   'won': False},
  {'blank': 2,
   'blank-with-hammer': 2,
   'game-type': 'Draw #1',
   'score-2+-with-hammer': 3,
   'score-with-hammer': 3,
   'steal': 1,
   'stolen-points': 3,
   'team-name': 'Scotland',
   'total-ends': 8,
   'total-score': 9,
   'won': True}],
 [

OK, let's see if this dump works...

In [7]:
with open('../data/555.json', 'w') as f:
    json.dump(t_data, f, indent=4)

In [8]:
with open('../data/555.json', 'r') as f:
    data = json.load(f)

print(len(data))
print(data[-1])

71
[{'team-name': 'Denmark', 'total-ends': 9, 'won': False, 'score-2+-with-hammer': 1, 'steal': 0, 'blank': 6, 'game-type': 'Final', 'blank-with-hammer': 1, 'total-score': 3, 'score-with-hammer': 2, 'stolen-points': 0}, {'team-name': 'Canada', 'total-ends': 9, 'won': True, 'score-2+-with-hammer': 2, 'steal': 0, 'blank': 3, 'game-type': 'Final', 'blank-with-hammer': 3, 'total-score': 5, 'score-with-hammer': 3, 'stolen-points': 0}]


It works!
---------

OK, I'll build this into my API: dump each tournament to a JSON file, and check whether that file already exists before pulling from the internet.
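
A minimal sketch of what that check-then-pull step might look like, using only the standard library. The names `load_or_build` and `build` are hypothetical and not part of `wcf`; `build` stands in for whatever callable actually pulls and processes a tournament.

```python
import json
import os


def load_or_build(path, build):
    """Return the JSON cached at `path`, or call `build()` and cache its result."""
    if os.path.exists(path):
        with open(path, 'r') as f:
            return json.load(f)
    data = build()
    # Create the target directory if it doesn't exist yet.
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, 'w') as f:
        json.dump(data, f, indent=4)
    return data
```

With something like this, `load_or_build('../data/555.json', build_555)` would only hit the network the first time, where `build_555` is a hypothetical function wrapping the pull-and-aggregate cells above. One thing to keep in mind: anything that JSON can't represent directly (tuples, custom objects) won't come back in the same form, so sticking to `dict`s, lists, strings, numbers, and booleans keeps the round trip lossless.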

Accuracy
--------

Since that was pretty easy, let's work on pulling in the players' accuracy statistics. One thing to note is that in some games the skip will not throw the hammer (or will intentionally miss), either to avoid giving the other team points or because it doesn't matter (last end), so assuming every player throws the same number of stones isn't 100% correct.

In addition, teams can sub in a player, so I need to worry about that too. Looking at the raw page, I can tell that the alternate played when the 'Alternate' section has a number for their accuracy. What I don't know is how many stones they threw, so it might just be easier to ignore the subs. Every sub in `tournamentId=555` had either 100 or 75 for their accuracy, so they must have thrown very few stones.

Let's just ignore them then. Easier for me!

In [9]:
raw_data = tourney._load_tourney_data()
soup = BeautifulSoup(raw_data.text, 'html.parser')

pulling data from results.worldcurling.org...


In [10]:
teams = soup.find_all('div', class_='col-md-6')
print(len(teams))

142


OK, good. We have 142 team data groups (71 games times 2 teams), so I can be pretty confident that we just pulled the team data and nothing else. To work through the next part, let's just focus on the first team (Sweden/Edin).

In [11]:
edin = teams[0]
print(edin)

<div class="col-md-6">
<div class="col-md-12">
<h5>Sweden</h5>
</div>
<div class="col-md-3">
            Skip
        </div>
<div class="col-md-8">
<a href="/Person/Details/3400">Niklas Edin</a>
</div>
<div class="col-md-1 col-md-pull-1 text-right">
83        </div>
<div class="col-md-3">
            Third
        </div>
<div class="col-md-8">
<a href="/Person/Details/5237">Oskar Eriksson</a>
</div>
<div class="col-md-1 col-md-pull-1 text-right">
85        </div>
<div class="col-md-3">
            Second
        </div>
<div class="col-md-8">
<a href="/Person/Details/4944">Kristian Lindström</a>
</div>
<div class="col-md-1 col-md-pull-1 text-right">
85        </div>
<div class="col-md-3">
            Lead
        </div>
<div class="col-md-8">
<a href="/Person/Details/5657">Christoffer Sundgren</a>
</div>
<div class="col-md-1 col-md-pull-1 text-right">
85        </div>
<div class="col-md-3">
            Alternate
        </div>
<div class="col-md-8">
<a href="/Person/Detail

In [12]:
positions = edin.find_all('div', class_='col-md-3')
names = edin.find_all('div', class_='col-md-8')
accuracies = edin.find_all('div', class_='col-md-pull-1')

for p, n, a in zip(positions, names, accuracies):
    print(p.text.strip(), n.text.strip(), a.text.strip())

Skip Niklas Edin 83
Third Oskar Eriksson 85
Second Kristian Lindström 85
Lead Christoffer Sundgren 85
Alternate Henrik Leek -


In [13]:
team = []
for p, n, a in zip(positions, names, accuracies):
    player = {'name': n.text.strip(), 'position': p.text.strip()}
    # The page shows '-' when no accuracy was recorded
    acc = a.text.strip().replace('-', '')
    acc = None if acc == '' else int(acc)
    player['accuracy'] = acc
    team.append(player)

print(team)

[{'position': 'Skip', 'name': 'Niklas Edin', 'accuracy': 83}, {'position': 'Third', 'name': 'Oskar Eriksson', 'accuracy': 85}, {'position': 'Second', 'name': 'Kristian Lindström', 'accuracy': 85}, {'position': 'Lead', 'name': 'Christoffer Sundgren', 'accuracy': 85}, {'position': 'Alternate', 'name': 'Henrik Leek', 'accuracy': None}]
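
Following the decision above to ignore the subs, here is a rough sketch of how the parsed `team` list could be reduced to a single number. The helper names `regular_players` and `mean_accuracy` are hypothetical, and the unweighted mean assumes every player threw the same number of stones, which, as noted earlier, isn't 100% correct.

```python
def regular_players(team):
    """Drop alternates from a list of player dicts."""
    return [p for p in team if p['position'] != 'Alternate']


def mean_accuracy(team):
    """Unweighted mean accuracy of the non-alternate players.

    Players with no recorded accuracy (None) are skipped; returns None
    if no player has a recorded accuracy.
    """
    scores = [p['accuracy'] for p in regular_players(team)
              if p['accuracy'] is not None]
    return sum(scores) / len(scores) if scores else None


# The team parsed in the cell above.
team = [
    {'position': 'Skip', 'name': 'Niklas Edin', 'accuracy': 83},
    {'position': 'Third', 'name': 'Oskar Eriksson', 'accuracy': 85},
    {'position': 'Second', 'name': 'Kristian Lindström', 'accuracy': 85},
    {'position': 'Lead', 'name': 'Christoffer Sundgren', 'accuracy': 85},
    {'position': 'Alternate', 'name': 'Henrik Leek', 'accuracy': None},
]
print(mean_accuracy(team))  # 84.5
```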
