# Online Sports Betting: A Rigged Game

Lisandro Kaunitz, Shenjun Zhong & Javier Kreiner, the authors of **Beating the bookies with their own numbers - and how the online sports betting market is rigged**, take a novel and brilliant approach to sports betting. Rather than compete with the bookmakers' predictions, Kaunitz et al. try to beat the bookmakers by using their own predictions against them. In the paper, they show that it is possible to take advantage of mispriced odds using the information implicit in the bookmakers' aggregate odds. Bookmakers countered the authors' success by limiting the size and type of bets they were allowed to place, leading to the second conclusion of the paper: even if a bettor has a consistently profitable strategy, the bookies are under no obligation to continue taking his or her bets. Bookmakers discriminate against successful gamblers, and online sports betting remains a long-term losing proposition.

Github: https://github.com/Lisandro79/BeatTheBookie/tree/master/src

Paper: https://www.researchgate.net/publication/320296375_Beating_the_bookies_with_their_own_numbers_-_and_how_the_online_sports_betting_market_is_rigged

Blog: https://www.lisandrokaunitz.com/index.php/en/category/beatthebookies-en/

In [2]:
import numpy as np
import pandas as pd
import gzip
import shutil
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['figure.figsize'] = (10, 5)
plt.style.use('fivethirtyeight')

In [3]:
#Unzip data into Kaggle notebook's working directory
PATH = "../input/beat-the-bookie-worldwide-football-dataset/"

for fname in ['closing_odds', 'odds_series', 'odds_series_b',
             'odds_series_b_matches', 'odds_series_matches']:
    with gzip.open(PATH + f'{fname}.csv.gz', 'rb') as f_in:
        with open(f'./{fname}.csv', 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

In [4]:
#Matches with odds and bookie data.
close = pd.read_csv('./closing_odds.csv', index_col=0)

#utf-8 encoder won't work here
series_b_match = pd.read_csv('./odds_series_b_matches.csv', index_col=0, encoding='latin-1')
series_match = pd.read_csv('./odds_series_matches.csv', index_col=0, encoding='latin-1')

#Big files! Only read the first 1,000 rows of each.
series = pd.read_csv('./odds_series.csv', index_col=0, nrows=1000)
series_b = pd.read_csv('./odds_series_b.csv', index_col=0, nrows=1000)

## Exploration

Kaunitz et al. collected the historical closing odds of 479,440 soccer matches played between 2005 and 2015 from 32 online bookmakers. The hard part, scraping the data off the bookies' websites and cleaning it, has been done for me. Thank goodness.

In [25]:
close.info()

In [64]:
close.head()

These are the closing odds for each soccer match, set right before game time. Let's convert the odds of a home win, away win, or draw into a probability for each match. For fractional odds quoted as odds:1, the implied probability is

$P = \frac{1}{1 + \text{odds}}$

For example, 3:1 odds are equivalent to a probability of 1/(3+1) = 0.25.

I find that the probabilities of the three possible outcomes sum to ~0.8 for an average match. I expected the sum to be closer to 1.
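A sum well below 1 is worth a sanity check, because the conversion depends on how the odds are quoted. The sketch below is only an illustration, not the paper's method: it compares the fractional-odds formula above with the decimal-odds convention, where the implied probability is 1/odds. The three quotes are hypothetical values I made up, and the format of the dataset's `avg_odds_*` columns is not established here.

```python
def implied_prob_fractional(odds):
    """Implied probability for fractional odds quoted as odds:1 (3 means 3:1)."""
    return 1.0 / (1.0 + odds)


def implied_prob_decimal(odds):
    """Implied probability for decimal odds (total payout per unit staked)."""
    return 1.0 / odds


# 3:1 fractional and 4.0 decimal describe the same bet.
example_fractional = implied_prob_fractional(3.0)
example_decimal = implied_prob_decimal(4.0)

# Hypothetical home / draw / away quotes for one match (not from the dataset).
quotes = [2.0, 3.5, 3.8]

# Read as decimal odds, the implied probabilities sum to slightly more than 1
# (the excess is the bookmaker's margin).
total_as_decimal = sum(implied_prob_decimal(o) for o in quotes)

# The same numbers pushed through the fractional formula sum to well below 1.
total_as_fractional = sum(implied_prob_fractional(o) for o in quotes)

print(round(total_as_decimal, 3), round(total_as_fractional, 3))
```

If the same pattern shows up in the real data, that would point to a format mismatch rather than to anything about the matches themselves.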

In [69]:
#prob. floored to 1/100th, so each match falls into a 0.01-wide bin
bin_denom = 100
eps = 1 #the "+1" in P = 1/(1+odds)

close['avg_prob_home_win'] = np.floor((1/(eps+close['avg_odds_home_win']))*bin_denom)/bin_denom
close['avg_prob_away_win'] = np.floor((1/(eps+close['avg_odds_away_win']))*bin_denom)/bin_denom
close['avg_prob_draw'] = np.floor((1/(eps+close['avg_odds_draw']))*bin_denom)/bin_denom

#the probs. of all possible outcomes for a match add up to ~.8.
close['total_prob'] = close[['avg_prob_home_win', 'avg_prob_away_win', 'avg_prob_draw']].sum(axis=1)
round(close['total_prob'].mean(), 2)

The odds for all three outcomes are right-skewed. Home-win odds center around 2:1, with higher variance than away-win odds. Away-win odds are tightly grouped around 3:1. Draw odds are also centered around 3:1, but with much higher variance.

In [27]:
for odds in ['avg_odds_home_win', 'avg_odds_away_win', 'avg_odds_draw']:
    plt.figure(figsize=(9,1))
    sns.histplot(close[odds].clip(0,10)) #there are some really high odds
    plt.ylabel('')
plt.show()

For each probability bin, the mean accuracy of the prediction is the proportion of games in that bin that actually ended in the predicted outcome (home team victory, draw, or away team victory). Comparing this proportion with the bin's consensus probability shows that the consensus probability is a very good estimate of the underlying probability of an outcome.
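To make the per-bin accuracy concrete, here is a minimal sketch on a toy table. The column name `prob_bin` and all of the scores are hypothetical values chosen for illustration, not rows from the dataset.

```python
import pandas as pd

# Toy matches: four in the 0.25 bin and four in the 0.50 bin (made-up scores).
toy = pd.DataFrame({
    "prob_bin":   [0.25, 0.25, 0.25, 0.25, 0.50, 0.50, 0.50, 0.50],
    "home_score": [1, 0, 2, 0, 2, 1, 0, 3],
    "away_score": [0, 1, 2, 3, 1, 0, 0, 3],
})

# Flag the matches the home team actually won.
toy["home_win"] = toy["home_score"] > toy["away_score"]

# The mean of a boolean column is the proportion of True values, i.e. the
# observed home-win frequency within each probability bin.
observed = toy.groupby("prob_bin")["home_win"].mean()

# Gap between observed frequency and the bin's probability; values near zero
# mean the probabilities are well calibrated.
calibration_gap = observed - observed.index.to_numpy()

print(observed)
print(calibration_gap)
```

In this toy table the observed frequencies match the bins exactly, which is the idealized version of what a well-calibrated consensus probability looks like.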

A strategy intended to beat the bookmakers at predicting the game outcome (value betting, the most common form of algorithmic betting) requires a more accurate model than the ones the bookmakers have developed.
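The reason accuracy matters can be shown with a little arithmetic. The sketch below assumes decimal odds (total payout per unit staked) and a known true win probability. The function name `expected_profit` and the numbers are my own illustration, not taken from the paper.

```python
def expected_profit(p_win, decimal_odds, stake=1.0):
    """Expected profit of one bet.

    Win with probability p_win:      profit = stake * (decimal_odds - 1)
    Lose with probability 1 - p_win: profit = -stake
    The two terms simplify to stake * (p_win * decimal_odds - 1).
    """
    return stake * (p_win * decimal_odds - 1.0)


# A coin-flip outcome priced fairly, below fair value, and above fair value.
fair = expected_profit(0.5, 2.0)      # break-even
short = expected_profit(0.5, 1.9)     # negative: the price includes a margin
generous = expected_profit(0.5, 2.1)  # positive: the price is too high

print(fair, round(short, 3), round(generous, 3))
```

A bet only has positive expected value when `p_win * decimal_odds > 1`, so a bettor needs a probability estimate good enough to spot the rare prices that clear that bar.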

In [94]:
#Observed frequency of each outcome within each consensus-probability bin
#(the mean of a boolean Series is the proportion of True values)
home_wins = (close.home_score > close.away_score).groupby(close['avg_prob_home_win']).mean()
away_wins = (close.home_score < close.away_score).groupby(close['avg_prob_away_win']).mean()
draws = (close.home_score == close.away_score).groupby(close['avg_prob_draw']).mean()

In [103]:
#x-axis: probability bin (the index), y-axis: the aggregated values
sns.scatterplot(data=home_wins)


In [21]:
for prob in ['avg_prob_home_win', 'avg_prob_away_win', 'avg_prob_draw']:
    plt.figure(figsize=(9,3))
    sns.histplot(close[prob]) #probabilities already lie in [0, 1], so no clipping is needed
    plt.ylabel('')
plt.show()