# Bayesian Bivariate Model

In [1]:
import sys

# Make the local penaltyblog checkout (two directories up) importable
sys.path.append("../../")

import penaltyblog as pb



## Get data from football-data.co.uk

In [2]:
fb = pb.scrapers.FootballData("ENG Premier League", "2019-2020")
df = fb.get_fixtures()

df.head()

id,date,datetime,season,competition,div,time,team_home,team_away,fthg,ftag,...,b365_cahh,b365_caha,pcahh,pcaha,max_cahh,max_caha,avg_cahh,avg_caha,goals_home,goals_away
1565308800---liverpool---norwich,2019-08-09,2019-08-09 20:00:00,2019-2020,ENG Premier League,E0,20:00,Liverpool,Norwich,4,1,...,1.91,1.99,1.94,1.98,1.99,2.07,1.9,1.99,4,1
1565395200---bournemouth---sheffield_united,2019-08-10,2019-08-10 15:00:00,2019-2020,ENG Premier League,E0,15:00,Bournemouth,Sheffield United,1,1,...,1.95,1.95,1.98,1.95,2.0,1.96,1.96,1.92,1,1
1565395200---burnley---southampton,2019-08-10,2019-08-10 15:00:00,2019-2020,ENG Premier League,E0,15:00,Burnley,Southampton,3,0,...,1.87,2.03,1.89,2.03,1.9,2.07,1.86,2.02,3,0
1565395200---crystal_palace---everton,2019-08-10,2019-08-10 15:00:00,2019-2020,ENG Premier League,E0,15:00,Crystal Palace,Everton,0,0,...,1.82,2.08,1.97,1.96,2.03,2.08,1.96,1.93,0,0
1565395200---tottenham---aston_villa,2019-08-10,2019-08-10 17:30:00,2019-2020,ENG Premier League,E0,17:30,Tottenham,Aston Villa,3,1,...,2.1,1.7,2.18,1.77,2.21,1.87,2.08,1.8,3,1


## Train the Model

In [3]:
clf = pb.models.BayesianBivariateGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"]
)
clf.fit()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [tau_att, atts_star, tau_def, def_star, tau_rho, rho, mu, eta]


Sampling 2 chains for 2_000 tune and 2_500 draw iterations (4_000 + 5_000 draws total) took 114 seconds.


## The Model's Parameters

In [5]:
clf

Module: Penaltyblog

Model: Bayesian Random Intercept

Number of parameters: 62
Team                 Attack               Defence              rho                 
--------------------------------------------------------------------------------
Arsenal              0.365                -0.093               -0.149              
Aston Villa          -0.174               0.539                -0.22               
Bournemouth          -0.835               0.172                0.05                
Brighton             -0.357               0.243                -0.261              
Burnley              0.009                0.274                -0.39               
Chelsea              0.368                -0.081               0.112               
Crystal Palace       -0.61                0.219                -0.417              
Everton              -0.469               -0.027               -0.007              
Leicester            0.738                -0.221               -0.225              

## Predict Match Outcomes

In [6]:
probs = clf.predict("Liverpool", "Wolves")
probs

Module: Penaltyblog

Class: FootballProbabilityGrid

Home Goal Expectation: 1.5174424821924677
Away Goal Expectation: 0.8928718395765348

Home Win: 0.5189164181292896
Draw: 0.25940159825895515
Away Win: 0.2216819835152744

### 1x2 Probabilities

In [7]:
probs.home_draw_away

[0.5189164181292896, 0.25940159825895515, 0.2216819835152744]

In [8]:
probs.home_win

0.5189164181292896

In [9]:
probs.draw

0.25940159825895515

In [10]:
probs.away_win

0.2216819835152744
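To make the three numbers above less of a black box, here is a minimal sketch of how home/draw/away probabilities can be read off a scoreline grid. It assumes the two goal counts are independent Poisson variables with the goal expectations printed above, which is a simplification and not necessarily how penaltyblog builds its grid. The helper names `poisson_pmf`, `score_grid` and `home_draw_away`, and the cap of 15 goals per side, are hypothetical choices made for this illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expectation of lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def score_grid(home_exp, away_exp, max_goals=15):
    """grid[h][a] = P(home scores h and away scores a), assuming independence."""
    return [
        [poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]


def home_draw_away(grid):
    """Sum the cells below, on, and above the diagonal of the grid."""
    home = draw = away = 0.0
    for h, row in enumerate(grid):
        for a, p in enumerate(row):
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return [home, draw, away]


# Goal expectations copied from the prediction output above
grid = score_grid(1.5174424821924677, 0.8928718395765348)
hda = home_draw_away(grid)
print(hda)
```

Under this assumption the three values should land close to the ones returned by `probs.home_draw_away`; any remaining gap would come from the grid size or from correlation terms in the fitted model.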

### Probability of Total Goals Over 1.5

In [11]:
probs.total_goals("over", 1.5)

0.6937978756309453
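As a rough cross-check on the over/under figure, the sketch below computes the same market by hand. It assumes independent Poisson goal counts, in which case the total number of goals is itself Poisson with the two expectations added together. The function `prob_total_over` is a hypothetical helper, not part of penaltyblog, and it only handles half-goal lines such as 1.5 or 2.5, where a push is impossible.

```python
import math


def prob_total_over(home_exp, away_exp, line):
    """P(total goals > line) for a half-goal line, assuming independent Poisson goals."""
    lam = home_exp + away_exp      # the sum of independent Poissons is Poisson
    max_under = math.floor(line)   # largest total that still finishes under the line
    p_under = sum(
        math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(max_under + 1)
    )
    return 1.0 - p_under


# Goal expectations copied from the prediction output above
p_over = prob_total_over(1.5174424821924677, 0.8928718395765348, 1.5)
print(p_over)
```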

### Probability of Asian Handicap 1.5

In [12]:
probs.asian_handicap("home", 1.5)

0.2670081276116055
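One way to interpret this figure is as the probability that the home side wins by more than 1.5 goals, that is, by two or more. The sketch below works under that reading, which is an assumption about the method's semantics, together with the simplifying assumption of independent Poisson goal counts. `prob_home_margin_over` and the cap of 15 goals per side are hypothetical choices for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expectation of lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_home_margin_over(home_exp, away_exp, line, max_goals=15):
    """P(home goals - away goals > line), assuming independent Poisson goals."""
    return sum(
        poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
        if h - a > line
    )


# Goal expectations copied from the prediction output above
p_cover = prob_home_margin_over(1.5174424821924677, 0.8928718395765348, 1.5)
print(p_cover)
```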

### Probability of Both Teams Scoring

In [13]:
probs.both_teams_to_score

0.46103699853202423
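If the two goal counts are treated as independent Poisson variables (a simplification; the fitted model may not be exactly this), both teams scoring is just the product of each side's chance of scoring at least once. The hypothetical helper `prob_both_teams_score` below sketches that arithmetic with the goal expectations printed earlier.

```python
import math


def prob_both_teams_score(home_exp, away_exp):
    """P(home >= 1 and away >= 1), assuming independent Poisson goals."""
    p_home_scores = 1.0 - math.exp(-home_exp)  # 1 - P(home scores 0)
    p_away_scores = 1.0 - math.exp(-away_exp)  # 1 - P(away scores 0)
    return p_home_scores * p_away_scores


# Goal expectations copied from the prediction output above
p_btts = prob_both_teams_score(1.5174424821924677, 0.8928718395765348)
print(p_btts)
```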

## Train the Model with Recent Matches Weighted More Heavily

In [14]:
# Down-weight older matches; the second argument sets how quickly a match's weight decays with age
weights = pb.models.dixon_coles_weights(df["date"], 0.001)

clf = pb.models.BayesianBivariateGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weights
)
clf.fit()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [tau_att, atts_star, tau_def, def_star, tau_rho, rho, mu, eta]


Sampling 2 chains for 2_000 tune and 2_500 draw iterations (4_000 + 5_000 draws total) took 103 seconds.
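For intuition about the weighting used in this fit, the sketch below shows the exponential time decay usually associated with Dixon and Coles: a match played `t` days before the most recent one gets weight `exp(-xi * t)`. This is an assumption about what `dixon_coles_weights` does, and the library's implementation may differ in detail (for example in the reference date or time unit). `time_decay_weights` and the three sample dates are made up for illustration.

```python
import math
from datetime import date


def time_decay_weights(match_dates, xi):
    """Weight each match by exp(-xi * days before the most recent match)."""
    latest = max(match_dates)
    return [math.exp(-xi * (latest - d).days) for d in match_dates]


# Hypothetical sample of match dates spanning one season
dates = [date(2019, 8, 9), date(2020, 1, 1), date(2020, 7, 26)]
weights = time_decay_weights(dates, 0.001)

for d, w in zip(dates, weights):
    print(d, round(w, 3))
```

A larger `xi` discounts old matches faster, while `xi = 0` gives every match a weight of 1 and recovers the unweighted fit.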


In [15]:
clf

Module: Penaltyblog

Model: Bayesian Random Intercept

Number of parameters: 62
Team                 Attack               Defence              rho                 
--------------------------------------------------------------------------------
Arsenal              0.379                -0.115               -0.147              
Aston Villa          -0.198               0.509                -0.218              
Bournemouth          -0.81                0.148                0.061               
Brighton             -0.401               0.242                -0.251              
Burnley              -0.067               0.209                -0.345              
Chelsea              0.355                -0.063               0.111               
Crystal Palace       -0.623               0.234                -0.396              
Everton              -0.435               -0.022               -0.028              
Leicester            0.617                -0.22                -0.161              