### Euro 2020 Prediction Game

For the Euro 2020 prediction game, my idea is to do all the scorekeeping in Python: read the Excel data into a Python class that stores the predictions in a well-organised data structure.


Idea:

- class Bracket:
    - load from Excel; phase I and phase II are two separate sheets
    - group_stage dict {mid:score}
    - knockout_phase:
        - phase I:
            - l16: list of tuples ('Team', 'rank')
            - sf: list
            - f: list
            - bonus: list
        - phase II:
            - qf: list
            - sf: list
            - f: list
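
As a rough illustration of the layout in the list above, the nested structure could look like the plain dictionary below. The key names follow the list; the team names and scores are made-up placeholders, not real predictions.

```python
# Illustrative only: keys mirror the idea list, values are placeholders.
bracket = {
    'group_stage': {1: '1-0', 2: '2-2'},           # mid (match id) -> predicted score
    'knockout_phase': {
        'phase I': {
            'l16': [('Italy', 1), ('Wales', 2)],   # (team, group rank) tuples
            'sf': ['Italy', 'France'],
            'f': ['Italy', 'France'],
            'bonus': [],
        },
        'phase II': {'qf': [], 'sf': [], 'f': []},
    },
}

# Look up the first predicted round-of-16 qualifier and its group rank.
team, rank = bracket['knockout_phase']['phase I']['l16'][0]
print(team, rank)
```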

Checklist to get done:
- make sure that score can be computed correctly from comparison of livescore bracket and player bracket
- finalise classes
- pretty print html format for scoring table
- pretty print html format for player scores
- figure out flask and how to host
- make use of metadata.yml for scoring system points
- parse asterisks in knockout phase
- prepare phase II Excel sheet
- fix round of 16 scoring in livescore scraper; it somehow needs to know where each team finished in their group. Might be easier to just 
- rewrite of Excel parser to parse directly from first sheet
- fix asterisks scoring for knockout phase in Excel and Python
- set then unordered tuple then ordered for teams
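
For the metadata.yml item, one possible shape (an assumption, since the file's layout is not shown here) is a per-stage mapping of point values. The sketch below uses a plain dict standing in for what `yaml.safe_load` would return. The helper name `points_for`, the key names, and every value other than the 5/15/10 defaults used in the scoring code are hypothetical.

```python
# Hypothetical stand-in for config['scoring'] as loaded from metadata.yml.
scoring = {
    'Group Stage': {'outcome': 5, 'result': 15},
    'Round of 16': {'qualified': 10, 'ordering': 5},
    'Semi-finals': {'qualified': 20},
}

def points_for(stage_name, kind, scoring=scoring, default=0):
    """Look up the points for `kind` in `stage_name`, falling back to `default`."""
    return scoring.get(stage_name, {}).get(kind, default)

print(points_for('Group Stage', 'result'))
print(points_for('Final', 'qualified'))
```

Keeping the values in one mapping would let each stage receive its own points instead of relying on function defaults.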


In [128]:
from collections import Counter
import pandas as pd
import os
import yaml

class Score():
    def __init__(self, mid, score, teams=None):
        self.mid = mid
        self.home = None
        self.away = None
        self.score = None
        self.teams = None
        # default for matches without a score yet, so `winner` does not fail
        self.outcome = None
        
        if teams:
            self.teams = tuple(teams)
        if isinstance(score, str):
            if '?' in score:
                # handling livescore future game score
                print(f'No score yet for {self.teams}')
                return
            else:
                score = score.replace(' ','').split('-')
                
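        if isinstance(score, (list, tuple)) and len(score) == 2:
            # keep both goal counts as stripped strings so '-'.join() and comparisons work
            self.score = tuple(str(s).strip() for s in score)
            self.home = int(score[0])
            self.away = int(score[1])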
        else:
            raise TypeError('unknown score format')
            
        # 1 - home_win; 0 - draw; 2 - away_win
        self.outcome = (self.home != self.away) + (self.away>self.home)
        return 
    
    def __str__(self):
        if self.score:
            if self.teams:
                return f"{self.teams[0]} {'-'.join(self.score)} {self.teams[1]}"
            else:
                return f"{'-'.join(self.score)}"
        else:
            if self.teams:
                return f'{self.teams[0]} ? - ? {self.teams[1]}'
            else:
                return f'{self.mid}: ? - ?'
    
    @property
    def matchup(self):
        if self.teams:
            return f'{self.teams[0]} vs {self.teams[1]}'
    
    @property
    def winner(self):
        if self.outcome and self.teams:
            return self.teams[self.outcome-1]
        else:
            return None
    
    @property
    def goal_count(self):
        if self.teams and self.score:
            return {team:int(goals) for team, goals in zip(self.teams, self.score)}
        else:
            return None
        
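    def compute(self, other, outcome=5, result=15):
        # TEMPORARY LINE DONT FORGET TO REMOVE
        # (strips stray whitespace; skipped when the other match has no score yet)
        if other.score:
            other.score = tuple(s.strip() for s in other.score)
        if self.teams and other.teams and (self.teams != other.teams):
            return 0
        if self.score is None or other.score is None:
            # an unplayed or unpredicted match earns nothing
            return 0
        if self.score == other.score:
            return result
        elif self.outcome == other.outcome:
            return outcome
        else:
            return 0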


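def score_compare(a, b, outcome=5, result=15):
    '''
    compare scores a & b
    '''
    # TEMPORARY LINE DONT FORGET TO REMOVE
    # (strips stray whitespace; skipped when b has no score yet)
    if b.score:
        b.score = tuple(s.strip() for s in b.score)
    if a.teams and b.teams and (a.teams != b.teams):
        return 0
    if a.score is None or b.score is None:
        # an unplayed or unpredicted match earns nothing
        return 0
    if a.score == b.score:
        return result
    elif a.outcome == b.outcome:
        return outcome
    else:
        return 0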

def team_compare(a, b, qualified=10, ordering=0):
    '''
    compare teams in a to teams in b and score points accordingly
    
    a and b must be sets of teams, or sets of (team, rank) tuples
    '''
    pts = 0
    if not a or not b:
        return pts
    
    if isinstance(next(iter(a)), tuple):
        # exact (team, rank) matches earn the ordering bonus
        pts += len(a.intersection(b)) * ordering
        # compare on team names only for qualification
        a = set(t[0] for t in a)
        b = set(t[0] if isinstance(t, tuple) else t for t in b)
        
    pts += len(a.intersection(b)) * qualified
    
    return pts
        
        
class Stage():
    def __init__(self, name, matches=None, teams=None, outcome=None, result=None, qualified=None, ordering=None):
        '''
        matches - a dict mapping match ids to scores; scores can be strings or Score objects
        teams - a list of teams who qualified for this stage
        '''
        self.name = name
        self.matches = None
        self.teams = None
        if matches:
            test_match = list(matches.values())[0]
            if isinstance(test_match, str):
                self.matches = {k:Score(k, v) for k,v in matches.items()}
            elif isinstance(test_match, Score):
                self.matches = matches
            else:
                raise TypeError('Unrecognised matches format')
            self.outcome = outcome or 0
            self.result = result or 0
            if not teams:
                match_teams = []
                for match in self.matches.values():
                    if match.teams:
                        match_teams += list(match.teams)
                self.teams = set(match_teams) or None
        if teams:
            self.teams = set(teams)
            self.qualified = qualified or 0
            self.ordering = ordering or 0
        
    @property
    def winners(self):
        if self.matches:
            return set([match.winner for match in self.matches.values()])
        else:
            return None
    
    @property
    def highest_scoring_team(self):
        count = Counter()
        if self.matches:
            for match in self.matches.values():
                count.update(match.goal_count)
        if len(count):
            return count.most_common()[0][0]
        else:
            return None
            
            
    def compute(self, other):
        points = 0
        if self.matches:
            other_matches = other.matches or {}
            missing_matches = set(self.matches.keys()) - set(other_matches.keys())
            if missing_matches:
                print(f'Warning missing matches! {missing_matches}')
            for mid, match in self.matches.items():
                # skip matches the other stage does not know about
                if mid in other_matches:
                    points += match.compute(other_matches[mid])
        
        if self.teams and other.teams:
            points += team_compare(self.teams, other.teams)
            
        return points
    
    def get_scores(self, other):
        matches = {}
        if self.matches:
            other_matches = other.matches or {}
            missing_matches = set(self.matches.keys()) - set(other_matches.keys())
            if missing_matches:
                print(f'Warning missing matches! {missing_matches}')
            for mid, match in self.matches.items():
                # skip matches the other stage does not know about
                if mid in other_matches:
                    matches[other_matches[mid].matchup] = str(match)
        
        return matches

            
class Bracket():
    
    def __init__(self, name, workdir, phase=1):
        '''
        load bracket from excel or pkl
        
        maybe specify name and dir or something along those lines
        '''
        self.name = name
        self.dat = {}
        if phase == 1:
            self.phase = 'phase I'
            pkl_file_1 = os.path.join(workdir,'phase_I', name + '.pkl')
            xlsx_file_1 = os.path.join(workdir,'phase_I','CxFPoolsEuro2020_PhaseI_'+ name + '.xlsx')
            if os.path.exists(pkl_file_1):
                self.dat = pkl_load(pkl_file_1)
            elif os.path.exists(xlsx_file_1):
                dat = pd.read_excel(xlsx_file_1, sheet_name='INTERNAL_USE_ONLY').iloc[:,0].values
                self.dat['Group Stage'] = Stage(name='Group Stage',matches={i+1:m for i,m in enumerate(dat[1:37])})
                self.dat['Round of 16'] = Stage(name='Round of 16', teams=list(zip(dat[38:54], [1,2]*4 + [3]*4)))
                self.dat['Semi-finals']= Stage(name='Semi-final', teams=list(dat[55:59]))
                self.dat['Final'] = Stage(name='Final', teams=list(dat[60:62]))
                self.dat['Winner'] = Stage(name='Winner', teams=list(dat[63:64]))
                self.dat['Bonus'] = Stage(name='Bonus', teams=list(dat[65:68]))
            else:
                print(f'No valid Phase 1 file found for {name}')
        
        if phase == 2:
            self.phase = 'phase II'
            pkl_file_2 = os.path.join(workdir,'phase_II', name + '.pkl')
            xlsx_file_2 = os.path.join(workdir,'phase_II', 'CxFPoolsEuro2020_PhaseII_'+ name + '.xlsx')
            if os.path.exists(pkl_file_2):
                self.dat['phase II'] = pkl_load(pkl_file_2)
            elif os.path.exists(xlsx_file_2):
                dat = pd.read_excel(xlsx_file_2, sheet_name='INTERNAL_USE_ONLY').iloc[:,0].values
                # TODO add phase 2 once sheet is complete
            else:
                #print(f'No valid Phase 2 file found for {name}')
                return None
    
    
    @classmethod
    def load_dual_phase(cls, participant, workdir):
        phase1 = cls(participant, workdir, phase=1)
        phase2 = cls(participant, workdir, phase=2)
        return phase1, phase2
        
    
    def compute(self, other):
        points = {}
        for key, stage in self.dat.items():
            points[(self.phase, key)] = stage.compute(other.dat[key])
            
        return points
    
    def get_scores(self, other):
        if 'Group Stage' in self.dat:
            return self.dat['Group Stage'].get_scores(other.dat['Group Stage'])
    
class ActualBracket(Bracket):
    def __init__(self, comp_url):
        self.name = 'actual'
        self.phase = 0
        self.dat = scrape_competition_from_livescore(comp_url)
        self.dat['Winner'] = Stage(name='Winner', teams = self.dat['Final'].winners)
        bonus_1 = self.dat['Group Stage'].highest_scoring_team
        bonus_2 = None
        bonus_3 = None
        #load bonus 2 and 3 from metadata.yml
        self.dat['Bonus'] = Stage(name='Bonus', teams=[bonus_1, bonus_2, bonus_3])
        
            
class Tournament():
    def __init__(self, workdir, comp_url):
        with open(os.path.join(workdir,'metadata.yml')) as f:
            config = yaml.safe_load(f)
        self.participants = [p.replace(' ','') for p in config['participants']]
        self.scoring = config['scoring']
        self.workdir = workdir
        self.brackets = {}
        for participant in self.participants:
            self.brackets[participant] = Bracket.load_dual_phase(participant, self.workdir)
        self.actual = ActualBracket(comp_url)
        self.comp_url = comp_url
    
    def reload(self):
        self.actual = ActualBracket(self.comp_url)
        
    @property
    def standings(self):
        points = {}
        for name, (phase1, phase2) in self.brackets.items():
            points[name] = {}
            points[name].update(phase1.compute(self.actual))
            points[name].update(phase2.compute(self.actual))
        # one row per participant, one column per (phase, stage)
        return pd.DataFrame.from_dict(points, orient='index')
    
    @property
    def predicted_scores(self):
        scores = {}
        for name, (phase1, phase2) in self.brackets.items():
            scores[name] = phase1.get_scores(self.actual)
        return scores
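
The `outcome` encoding in `Score` relies on Python treating booleans as integers. A standalone sketch of the same arithmetic follows; the function name `outcome_code` and the sample scores are only for illustration.

```python
def outcome_code(home, away):
    """Encode a result as 1 (home win), 0 (draw) or 2 (away win)."""
    # (home != away) is 1 for any decisive result; (away > home) adds 1 for an away win.
    return (home != away) + (away > home)

teams = ('Turkey', 'Italy')
for home, away in [(2, 1), (1, 1), (0, 3)]:
    code = outcome_code(home, away)
    # code - 1 indexes the winning side; a draw (0) has no winner.
    winner = teams[code - 1] if code else None
    print(f'{home}-{away}: code={code}, winner={winner}')
```

Since a draw encodes to 0, which is falsy, the `winner` property returns `None` for drawn matches.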

In [142]:
tournament = Tournament('/Users/lukeaarohi/pyfiles/EURO2020/', 'https://www.livescores.com/soccer/euro-2020/')

fetching markup from https://www.livescores.com/soccer/euro-2020/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-a/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-a/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-b/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-b/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-c/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-c/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-d/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-d/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-e/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-e/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/grou

In [144]:
res = tournament.predicted_scores
pd.DataFrame.from_dict(res).T.sort_index()

Unnamed: 0,Turkey vs Italy,Wales vs Switzerland,Denmark vs Finland,Belgium vs Russia,England vs Croatia,Austria vs North Macedonia,Netherlands vs Ukraine,Scotland vs Czech Republic,Poland vs Slovakia,Spain vs Sweden,...,North Macedonia vs Netherlands,Ukraine vs Austria,Russia vs Denmark,Finland vs Belgium,Croatia vs Scotland,Czech Republic vs England,Sweden vs Poland,Slovakia vs Spain,Portugal vs France,Germany vs Hungary
AdamAlGededi,1-2,1-1,1-0,1-0,2-0,1-0,1-0,1-1,2-0,2-0,...,1-3,2-1,2-1,0-5,2-0,0-2,1-1,0-3,0-2,2-0
AdamMicallef,1-2,0-0,3-0,3-1,1-0,4-2,1-1,1-1,1-0,1-0,...,1-5,0-0,0-2,0-5,2-0,0-2,0-0,0-2,0-0,3-1
BenCassarGalea,0-2,1-1,2-0,2-0,2-1,1-0,3-0,1-1,1-0,2-0,...,0-3,1-1,1-2,0-2,2-0,1-3,1-1,1-3,1-2,2-0
BenjiFlynn,1-1,1-3,2-0,3-1,2-2,2-1,1-1,2-1,2-0,2-1,...,0-3,2-1,1-1,0-4,2-1,0-3,1-2,0-3,1-2,3-0
BeppeDarmanin,1-1,1-2,2-0,3-1,1-1,2-0,2-2,1-2,2-1,3-0,...,0-3,2-1,2-2,0-4,2-1,1-3,1-2,0-3,1-2,3-1
DanielCutajar,0-1,2-0,1-1,3-1,1-1,1-1,3-0,2-2,3-1,1-1,...,1-4,1-1,2-2,0-3,1-1,1-3,0-0,0-3,1-2,4-0
DavidMifsud,1-2,0-1,2-0,3-1,2-1,1-0,2-0,1-0,2-0,3-1,...,0-2,2-0,1-2,0-3,2-0,0-2,1-1,0-2,2-2,3-0
EdFenechAdami,2-1,1-1,2-0,2-0,2-1,1-0,2-0,1-1,1-0,2-0,...,1-2,1-1,1-1,1-2,3-0,1-2,0-0,0-2,1-2,2-0
JakeCamilleri,0-1,1-1,2-0,4-1,2-1,1-0,3-1,1-2,0-0,2-0,...,0-2,0-3,1-3,0-2,1-0,1-3,1-1,0-4,1-3,3-0
JeremySpiteriBailey,0-1,0-2,1-1,1-0,2-1,0-1,2-0,2-2,1-0,2-1,...,0-2,1-0,1-1,1-4,0-0,1-1,0-2,1-3,2-3,1-0


In [108]:
df = pd.DataFrame.from_dict(res, orient='index')

In [126]:
tournament.brackets['LukeAarohi'][0].dat['Group Stage']

<__main__.Stage at 0x10bd334c0>
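
The default repr above says little about the stage. A minimal sketch of a `__repr__` that could be added to `Stage`, shown here on a stripped-down stand-in class rather than the full one:

```python
class Stage:
    """Stripped-down stand-in for the notebook's Stage, just to show __repr__."""

    def __init__(self, name, matches=None, teams=None):
        self.name = name
        self.matches = matches
        self.teams = set(teams) if teams else None

    def __repr__(self):
        n_matches = len(self.matches) if self.matches else 0
        n_teams = len(self.teams) if self.teams else 0
        return f'Stage({self.name!r}, matches={n_matches}, teams={n_teams})'

print(Stage('Group Stage', matches={1: '1-0', 2: '0-0'}))
```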

In [82]:
from bs4 import BeautifulSoup
import requests, os, re
from urllib import parse
import random

gen_score = lambda : f'{random.randint(0,3)} - {random.randint(0,3)}'

def from_livescore(x):
    x = x.replace('-',' ').title()
    if x.startswith('Group'):
        return 'Group Stage'
    return x

euro_url = 'https://www.livescores.com/soccer/euro-2020/'

ls_id_map = {80596: 1,
             80595: 2,
             80737: 3,
             80736: 4,
             80742: 5,
             81035: 6,
             81036: 7,
             80743: 8,
             80748: 9,
             80749: 10,
             80612: 11,
             80611: 12,
             80738: 13,
             80598: 14,
             80597: 15,
             81037: 16,
             80739: 17,
             81038: 18,
             80750: 19,
             80745: 20,
             80744: 21,
             80613: 22,
             80614: 23,
             80751: 24,
             80599: 25,
             80600: 26,
             81039: 27,
             81040: 28,
             80741: 29,
             80740: 30,
             80747: 31,
             80746: 32,
             80753: 33,
             80752: 34,
             80615: 35,
             80616: 36}

ls_order_map = {
             80042: (2, 2),
             80043: (1, 2),
             80046: (1, 3),
             80044: (1, 3),
             80045: (2, 2),
             80047: (1, 3),
             80048: (1, 2),
             80049: (1, 3)
             }


def fetch_beautiful_markup(url):
    print('fetching markup from ' + url)
    
    # requests.get raises on network failures (connection, timeout, ...),
    # not on HTTP error statuses
    try:
        livescore_html = requests.get(url)
    except requests.RequestException as e:
        print('An error occurred: ', e)
        return None

    parsed_markup = BeautifulSoup(livescore_html.text, 'html.parser')
    
    return parsed_markup

def extract_scores(parsed_markup, stage=None):
    # dictionary to contain score
    scores = {}

    # scrape needed data from the parsed markup
    for element in parsed_markup.find_all("div", "row-gray") :
        
        match_name_element = element.find(attrs={"class": "scorelink"})
        ls_id = int(element.get('data-eid'))
        mid = ls_id_map.get(ls_id, ls_id)
        order = ls_order_map.get(ls_id, None)

        if match_name_element is not None :
            # this means the match is about to be played
            match_stage, matchup = match_name_element.get('href').split('/')[3:5]
            match_stage = from_livescore(match_stage)
            if match_stage not in scores: scores[match_stage] = {}
            home_team = from_livescore(matchup.split('-vs-')[0].strip())
            away_team = from_livescore(matchup.split('-vs-')[1].strip())
            teams = (home_team, away_team)
            if order:
                teams = tuple(zip(teams, order))
            score = element.find("div", "sco").get_text().strip()
            # TESTING ONLY: overwrites the scraped score with a random one; remove for real scoring
            score = gen_score()

            # add our data to our dictionary
            scores[match_stage][mid] = Score(mid, score, teams)
        elif stage:
            if stage not in scores: scores[stage] = {}
            # we need to use a different method to get our data
            home_team = '-'.join(element.find("div", "tright").get_text().strip().split(" "))
            away_team = '-'.join(element.find(attrs={"class": "ply name"}).get_text().strip().split(" "))

            score = element.find("div", "sco").get_text().strip()
            # TESTING ONLY: overwrites the scraped score with a random one; remove for real scoring
            score = gen_score()

            teams = (home_team, away_team)
            if order:
                teams = tuple(zip(teams, order))

            # add our data to our dictionary
            scores[stage][mid] = Score(mid, score, teams)

    return scores

def extract_competition_stages(markup, comp):
    stages = {}
    selected_cat = markup.find('aside', 'left-bar').find('ul','buttons btn-light').find('a',{'class':'selected cat'})
    stage_refs = selected_cat.parent.find_all('a', attrs={'href':re.compile(comp+'.*/')})
    for g in stage_refs:
        g_url = g.get('href')
        g_name = g.get('title')
        stages[g_name] = g_url
    
    return stages
    

def scrape_scores_from_livescore(url, stage) :
    
    parsed_markup = fetch_beautiful_markup(url)
    scores = extract_scores(parsed_markup, stage)
    return scores


def scrape_competition_from_livescore(comp_url):
    res = {}
    comp = parse.urlparse(comp_url).path
    comp_markup = fetch_beautiful_markup(comp_url)
    comp_scores = extract_scores(comp_markup)
    comp_stages = extract_competition_stages(comp_markup, comp)
    
    for g_name, g_url in comp_stages.items():
        g_path = parse.urljoin(comp_url, g_url)
        for what in ['results/all/', 'fixtures/all/']:
            what_path = parse.urljoin(g_path, what)
            g_what = scrape_scores_from_livescore(what_path, g_name)
            for stage, stage_scores in g_what.items():
                if stage not in comp_scores:
                    comp_scores[stage] = stage_scores
                else:
                    for mid, score in stage_scores.items():
                        # look the match id up within its stage, not among the stage names
                        if mid not in comp_scores[stage]:
                            comp_scores[stage][mid] = score
    
    for stage, stage_scores in comp_scores.items():
        res[stage] = Stage(stage, stage_scores)
        
    return res
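
The nested loop in `scrape_competition_from_livescore` merges per-stage scores while keeping entries that are already present. The same idea in isolation, as a sketch with plain strings standing in for `Score` objects (`merge_stage_scores` and the sample ids are hypothetical):

```python
def merge_stage_scores(base, extra):
    """Merge {stage: {mid: score}} dicts in place; existing entries in `base` win."""
    for stage, stage_scores in extra.items():
        target = base.setdefault(stage, {})
        for mid, score in stage_scores.items():
            # only fill in match ids that are not there yet
            target.setdefault(mid, score)
    return base

overview = {'Group Stage': {1: '1-0'}}
results = {'Group Stage': {1: '9-9', 2: '0-1'}, 'Final': {51: '? - ?'}}
merged = merge_stage_scores(overview, results)
print(merged)
```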

In [3]:
my_bracket = Bracket('LukeAarohi','./')

No valid Phase 2 file found for LukeAarohi


In [6]:
euro_url = 'https://www.livescores.com/soccer/euro-2020/'
scoreboard = scrape_competition_from_livescore(euro_url)

fetching markup from https://www.livescores.com/soccer/euro-2020/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-a/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-a/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-b/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-b/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-c/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-c/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-d/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-d/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-e/results/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/group-e/fixtures/all/
fetching markup from https://www.livescores.com/soccer/euro-2020/grou

In [529]:
scoreboard['Winner'] = Stage(name='Winner', teams = ['France'])

In [522]:
list(scoreboard['Final'].matches.values())[0].__str__()

'Winner-SF-1 1 - 0 Winner-SF-2'

In [500]:
for k, m1 in my_bracket.dat['phase I']['Group Stage'].matches.items():
    m2 = scoreboard['Group Stage'].matches[k]
    print(m1 , m2)

1: 1-0 Turkey 1-0 Italy
2: 0-1 Wales 0-1 Switzerland
3: 2-0 Denmark 2-2 Finland
4: 1-0 Belgium 1-2 Russia
5: 3-0 England 3-2 Croatia
6: 1-1 Austria 0-0 North Macedonia
7: 1-1 Netherlands 3-1 Ukraine
8: 1-2 Scotland 2-1 Czech Republic
9: 2-0 Poland 1-2 Slovakia
10: 3-0 Spain 0-2 Sweden
11: 0-1 Hungary 2-1 Portugal
12: 1-1 France 2-0 Germany
13: 0-2 Finland 3-3 Russia
14: 2-1 Turkey 1-3 Wales
15: 0-0 Italy 2-1 Switzerland
16: 2-0 Ukraine 0-2 North Macedonia
17: 1-2 Denmark 3-3 Belgium
18: 1-0 Netherlands 2-1 Austria
19: 1-0 Sweden 3-1 Slovakia
20: 1-0 Croatia 0-0 Czech Republic
21: 3-1 England 2-0 Scotland
22: 0-4 Hungary 2-0 France
23: 1-2 Portugal 3-0 Germany
24: 2-1 Spain 2-0 Poland
25: 1-1 Switzerland 2-1 Turkey
26: 1-0 Italy 0-3 Wales
27: 0-2 North Macedonia 1-2 Netherlands
28: 2-1 Ukraine 1-1 Austria
29: 1-2 Russia 2-3 Denmark
30: 0-3 Finland 3-2 Belgium
31: 1-1 Croatia 2-1 Scotland
32: 0-2 Czech Republic 1-0 England
33: 0-1 Sweden 2-2 Poland
34: 0-3 Slovakia 1-0 Spain
35: 0-2 Port

livescores.com structuring:


example url: https://www.livescores.com/soccer/euro-2020/group-a/results/all/
url structure:
    tournament base: https://www.livescores.com/soccer/euro-2020


In [7]:
scoreboard

{'Group Stage': <__main__.Stage at 0x10a44dd30>,
 'Round of 16': <__main__.Stage at 0x10aec1c40>,
 'Quarter-finals': <__main__.Stage at 0x10aec1f40>,
 'Semi-finals': <__main__.Stage at 0x10aec1d90>,
 'Final': <__main__.Stage at 0x10af6cd00>}

In [65]:
class Tester():
    def __init__(self, a):
        self.a = a
        
    @classmethod
    def two_test(cls, a, b):
        at = cls(a)
        bt = cls(b)
        return at, bt
    
    @classmethod
    def load(self,a):
        self.a = a
        return self
    

In [13]:
t1, t2 =Tester.two_test(1,2)

In [14]:
t0 = Tester(1)

In [68]:
a = Tester.load(5)

In [70]:
a.a

5
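
Because `load` is a classmethod, its first argument is the class, so `Tester.load(5)` stores `a` on the class and returns the class itself rather than an instance. If an alternative constructor was the goal (an assumption), a sketch would be:

```python
class Tester:
    def __init__(self, a):
        self.a = a

    @classmethod
    def load(cls, a):
        # Build and return a new instance instead of mutating the class.
        return cls(a)

t = Tester.load(5)
print(isinstance(t, Tester), t.a)
```

This is the same pattern `Bracket.load_dual_phase` uses when it calls `cls(...)` to build each phase.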