<div style="width:90%; padding-left:5%">
<h1> Designing and testing sport betting strategies </h1>
<hr>

<div style="padding:5px; margin-top:10px; border:1px solid grey;">
<h5> IMPORTANT NOTE FOR THE GRADER: </h5>
<p>
The word count of this file goes a little above 2000 because of the HTML used. However, the text itself has an acceptable length.
</p>
</div>

<br>
<p>
Welcome! The purpose of this tutorial is to provide a skeleton which we can use to design and test sport betting strategies.
</p>
<p>
We will discuss sport betting strategies in general, but the code will be based on tennis. <br>
We will restrict ourselves to low-frequency betting (i.e. we do not bet <i>during</i> a game).
</p>
<p>
The tutorial is divided into 3 parts:
<ul>
    <li> Downloading and formatting the data </li>
    <li> Designing a generic way to write and test our betters </li>
    <li> A few attempts at writing betters, and a discussion of the results </li>
</ul>
</p>
<p>
These tasks will be described in detail through this tutorial.
</p>
<div style="padding:5px; margin-top:10px; border:1px dashed grey;">
<h5> Warning: </h5>
<p>
Designing a good or useful automatic better goes far beyond the scope of this tutorial. The odds are designed so that it is not possible to beat the bank with simple linear models. The main goal of this tutorial is to provide some modular code that could be used to play around or design betting strategies.
</p>
</div>

<p>
We will use pandas in this tutorial. <br>
<i> >&nbsp; pip install pandas </i>
</p>
<p>
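We will analyse Excel files. pandas needs another library to read them (xlrd here; note that xlrd 2.0 and later only reads .xls files, so recent pandas versions rely on openpyxl for .xlsx files): <br>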
 <i> >&nbsp; pip install xlrd </i>
</p>
<p>    
<strong>INFO </strong> If you do not have pandas or xlrd installed, 
you can uncomment the line(s) and run the cell below to install what you need:
</p>
<hr>
</div>


In [1]:
# imports

import os, random, zipfile
import urllib.request  # urlretrieve lives in the urllib.request submodule

# UNCOMMENT BELOW to install pandas
#os.system("pip install pandas")

# UNCOMMENT BELOW to install xlrd (necessary to use pd.read_excel())
#os.system("pip install xlrd")

import pandas as pd

<div style="width:90%; padding-left:5%">
    <hr>
    <h3> 1) Downloading and formatting the data </h3>

    <p>
    The data used in this tutorial can be found at url http://www.tennis-data.co.uk/alldata.php .
    </p>
    <p>
    We will use the results of 5 entire years: 2013 to 2017.
    The individual files for one specific year can be found at url http://www.tennis-data.co.uk/201X/201X.zip
    </p>
    
    <p>
    We will use the code snippet below to download and extract the data, using the following standard libraries:
    <ul>
        <li> urllib to download the data </li>
        <li> zipfile to extract it </li>
    </ul>
    </p>
<hr>

</div>


In [2]:
# helper to download and unzip the data. (If the data already exists, replace it) 
def download_data():
    years = range(2013, 2018)
    urls = ["http://www.tennis-data.co.uk/" + str(year) + "/" + str(year) + ".zip" for year in years]
    local_zip_names = [str(year) + ".zip" for year in years]
    local_names = [ "./" + str(year) + ".xlsx" for year in years]

    # download and unzip the data 
    for url, zip_name, name in zip(urls, local_zip_names, local_names):
    
        # download
        urllib.request.urlretrieve(url, zip_name)
    
        # unzip (the context manager closes the archive for us)
        with zipfile.ZipFile(zip_name, 'r') as zip_file:
            zip_file.extractall()

print("downloading and extracting...")
download_data()
print("done")
    

downloading and extracting...
done


<div style="width:90%; padding-left:5%">
<hr>
<p>
Now, it is time to load the dataframes into memory with pandas. <br>
This can easily be done with the pd.read_excel(path, **options) function. In our case, we can leave all the optional parameters at their default values.
</p>
<p>
We also need to separate our data into two disjoint groups: the training set and the validation set. 
To make this tutorial realistic, we will suppose that we try to guess the results of the year 2017 given the 4 previous years.
</p>

<hr>
</div>



In [3]:
df_2013 = pd.read_excel("./2013.xlsx")
df_2014 = pd.read_excel("./2014.xlsx")
df_2015 = pd.read_excel("./2015.xlsx")
df_2016 = pd.read_excel("./2016.xlsx")
df_2017 = pd.read_excel("./2017.xlsx")

# we will alias our datasets 'training' and 'validation'

raw_training_df = pd.concat((df_2013, df_2014, df_2015, df_2016))
raw_validation_df = df_2017


<div style="width:90%; padding-left:5%">

<hr>
We can now print the data and the name of the columns, to take a look at what is provided:
<hr>

</div>

In [4]:
print(raw_training_df.head())
print( 3 * "\n==================================================================================================================" + "\n")
print(raw_validation_df.head())
print( 3 * "\n==================================================================================================================" + "\n")
print(raw_validation_df.columns)


   ATP  AvgL  AvgW  B365L  B365W  Best of    Comment    Court       Date  \
0    1  2.78  1.42   3.00   1.36        3  Completed  Outdoor 2012-12-31   
1    1  2.05  1.73   2.20   1.61        3  Completed  Outdoor 2012-12-31   
2    1  3.58  1.28   3.75   1.25        3  Completed  Outdoor 2012-12-31   
3    1  7.76  1.08   9.00   1.07        3  Completed  Outdoor 2012-12-31   
4    1  1.85  1.88   1.80   1.90        3  Completed  Outdoor 2013-01-01   

    EXL  ...                Tournament   W1   W2   W3  W4  W5    WPts  WRank  \
0  2.65  ...    Brisbane International  6.0  6.0  NaN NaN NaN  1215.0   28.0   
1  2.00  ...    Brisbane International  6.0  2.0  6.0 NaN NaN   927.0   41.0   
2  3.75  ...    Brisbane International  7.0  6.0  NaN NaN NaN  1830.0   19.0   
3  8.00  ...    Brisbane International  6.0  6.0  NaN NaN NaN  1070.0   36.0   
4  1.87  ...    Brisbane International  6.0  6.0  NaN NaN NaN   897.0   43.0   

         Winner  Wsets  
0      Mayer F.    2.0  
1   Nieminen

<div style="width:90%; padding-left:5%">
<hr>
<h2> General explanation </h2>
<p>
Before going any further, it is time to provide explanations about what we are going to do.
</p>

<h5> What is a tennis better? </h5>


<p>
It is a machine which decides whether or not to bet, given the information about the game.
</p>
<p>
Formally, in this tutorial, a better is any class implementing a method <strong><i>decide(...)</i></strong> which takes as input the information about one future game:
</p>
<ul>
    <li> The players </li>
    <li> The odds </li>
    <li> The game metadata (tournament, date, ...) <i>&nbsp;(optional)</i> </li>
</ul>
<p>
And outputs two values: 
</p>
<ul>
    <li> The player on whom to bet </li>
    <li> The amount of the bet (0 when we do not want to bet) </li>
</ul>
<div style="padding:5px; border:1px dashed grey;">
<i> Note about the bet amount: </i> <br>
 Betters typically have a fixed maximum bet amount. We will set this value to 1 in our simulations: the bet amount will always be between 0 and 1. 
</div>

<p>
This machine might be trained with some data. We will put this training phase in the constructor of our bot. The training data must be different from the testing data: it makes no sense to call decide(...) with some training data.
</p>


<h5>
Once implemented, how can we test our better?
</h5>

<p>
A very good question! Testing our betters is so important that we will implement the test before the betters themselves. 
</p>
<p>
    We will write a test which is generic to any decide(...) function. <br>
    We need some testing data: as mentioned, we will use the 2017 records. <br>
</p>


<h5> OK, and how can we implement a good better? </h5>
<p>
 This is a question that many people have addressed!
 More explanations and details will be provided later.
</p>

<hr>
</div>
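<div style="width:90%; padding-left:5%">
<p>
To make the decide(...) contract described above concrete, here is a minimal sketch. The class name <i>FavouriteBetter</i> and its four-argument signature are assumptions made for this illustration only; the betters written later in this tutorial receive a whole dataframe row instead.
</p>
</div>

```python
# Hypothetical toy better, used only to illustrate the decide(...) contract.
class FavouriteBetter:
    """Always bets the maximum amount on the player with the lowest odd."""

    def decide(self, player_1, player_2, odd_1, odd_2):
        # The favourite is the player whose odd is the smallest.
        side = player_1 if odd_1 <= odd_2 else player_2
        # The bet amount is always between 0 and 1; here we always bet the maximum.
        return side, 1.0


better = FavouriteBetter()
side, amount = better.decide("Mayer F.", "Giraldo S.", 1.45, 2.65)
print(side, amount)  # Mayer F. 1.0
```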


<div style="width:90%; padding-left:5%">
<hr>
    <h4> Formatting the data </h4>
    
    <p>
    Now that we have a brief idea of the project, it is time to reformat the data to make it easier to use.
    </p>
    <p>
    First, we will filter the useful rows: <br>
    In this dataset, a row is relevant if and only if its 'Comment' value equals 'Completed'.
    </p> 
    <p>
    Now we need to decide which information we are going to use.
    We will ignore some of it:
    </p>
        <ul>
            <li><i>The location and name of the tournament</i>: 
            <p> Tournaments can be identified with their ATP number, which is sufficient for our purpose.</p> </li>
            <li><i>The odds provided by some platforms</i>: 
            <p> There are several betting platforms (bet365, Pinnacle Sports, ...). For this project, we will use the odds from the <i><strong>'Betfair Exchange' (EX)</strong></i> platform. We will compare the results with the average odd (<strong>*Avg*</strong>) and the max odd (<strong>*Max*</strong>) across the most important platforms. </p>
            
<div style="padding:5px; margin-top:10px; border:1px dashed grey;">
    <h5> Why do we choose Exchange ? </h5>
    <p>
    We can observe that the odds from the 'Exchange' platform are notably lower than the other ones. The reason is that Exchange provides a betting API, available in many languages (including python). Since this platform is used mostly by robots, the bank had to adjust the odds to get a positive average income. 
    </p>
</div>
</li>
</ul>

<p>
    Now, we will add a few columns that will be useful:
</p>

<ul>
    <li> <i> The player names </i>
        <p> It might be useful to group both player names in a single column, in order to identify a game by its pair of players. Hence, we create a new column "Names" containing values "name1/name2", where "name1" and "name2" are sorted alphabetically.</p>
    </li>
    <li> <i> The result of the game </i>
        <p> We already have the result of the game, but we will add an indicator which equals '1' if "name1" (the first sorted name) wins. </p>
    </li>   
</ul>
<hr>
</div>

In [5]:
# now, we will format our data to make it easier to read and use.

# We will restrict to one betting platform for this tutorial (EXchange). 
# We choose EXchange since it provides a good betting API.
# We want to see if our 'better' could be used in the real world.


# helper: given a row from our dataframes, combine and sort the player names "player1/player2"
def combine_names(row):
    sorted_names = sorted([row["Winner"], row["Loser"]])
    return sorted_names[0] + "/" + sorted_names[1]

def format_df(raw_df):
    
    result = raw_df.copy()
    
    # first, we create a new column "player1/player2" with sorted names.  
    # We will use this value to select the games played by the same pair of players.
    result["Names"] = result.apply(combine_names, axis=1)
    
    # then, we create a column with the value '1' if 'player1' (the first sorted name) has won.
    # we will use this value to compute winning rates between players.
    result["Name1Wins"] = result.apply(lambda row: 1 if row["Names"].split("/")[0] == row["Winner"] else 0, axis=1)
    
    
    # we keep only the rows whose status is 'Completed',
    # then we drop the columns that we will not use
    return result.loc[result['Comment'] == "Completed"].drop(
            columns=["Location", "Tournament", "Series", "Best of", "Comment", "B365W", "B365L", "LBW", "LBL", "PSW", "PSL"])

    
(training_df, validation_df) = [format_df(df) for df in (raw_training_df, raw_validation_df)]

# We store the value of the odds that we use
odd_types = ("EX", "Max", "Avg")

        

<div style="width:90%; padding-left:5%">
<hr>
We can print our dataframe to see the modifications:
<hr>
</div>

In [6]:
print(training_df.head())
print( 3 * "\n==================================================================================================================" + "\n")
print(validation_df.head())

   ATP  AvgL  AvgW    Court       Date   EXL   EXW   L1   L2   L3    ...      \
0    1  2.78  1.42  Outdoor 2012-12-31  2.65  1.45  4.0  4.0  NaN    ...       
1    1  2.05  1.73  Outdoor 2012-12-31  2.00  1.75  3.0  6.0  1.0    ...       
2    1  3.58  1.28  Outdoor 2012-12-31  3.75  1.25  5.0  2.0  NaN    ...       
3    1  7.76  1.08  Outdoor 2012-12-31  8.00  1.06  4.0  4.0  NaN    ...       
4    1  1.85  1.88  Outdoor 2013-01-01  1.87  1.87  1.0  2.0  NaN    ...       

    W2   W3  W4  W5    WPts  WRank        Winner  Wsets  \
0  6.0  NaN NaN NaN  1215.0   28.0      Mayer F.    2.0   
1  2.0  6.0 NaN NaN   927.0   41.0   Nieminen J.    2.0   
2  6.0  NaN NaN NaN  1830.0   19.0  Nishikori K.    2.0   
3  6.0  NaN NaN NaN  1070.0   36.0  Baghdatis M.    2.0   
4  6.0  NaN NaN NaN   897.0   43.0    Istomin D.    2.0   

                       Names  Name1Wins  
0        Giraldo S./Mayer F.          0  
1   Benneteau J./Nieminen J.          0  
2  Matosevic M./Nishikori K.          

<div style="width:90%; padding-left:5%">
    <hr>
    <h3> 2) Designing a generic way to write and test our betters  </h3>
    <p>
    As explained above, a Better is a class implementing some decide(...) function. At this point, we will define the first abstraction of a Better. 
    </p>
    <p>
    The constructor of a better is very simple: for this first abstraction, a better only needs one mandatory information: the odd type (in our case, one of 'EX', 'Avg', or 'Max')
    </p>
    <p>
    It is also the moment to implement the test(...) method, which computes the results of our Better on some testing dataset. The implementation goes as follows:
    </p>
    
    <div style="border:1px solid grey; margin:10px;">
    <p>
    Given a decide(...) function, for each row of the testing dataframe:
    <ul>
        <li>compute (bet_side, bet_amount) = decide(row). </li>
        <li>see the result of the game:
            <ul> <li> if we were wrong, we simply lose (bet_amount). </li>
                <li> Otherwise, we win ((odd - 1) * bet_amount) </li>
            </ul>
        </li>
    </ul>
    </p>
    </div>
</p>

<div style="padding:5px; border:1px dashed grey; margin:10px;">
<p>
    <i> Note about the odds: </i> <br> An odd is a value > 1, describing how much the bank will pay you if you win. In one game, both players have their own odd, and the odd of one player does not suffice to infer the odd of the other one. A definition can be the following: <br>
    " If I bet 1 on a player with odd O, and if I win, the bank will give me back O (thus my benefit will be O - 1) "
</p>
</div>

<p>
Once the simulation is done for the whole testing data, we can return the total income, and other measurements which are described in the code snippet.
</p>

<p>
We also provide an option (verbose: default True) to print the results of the simulation. Thus we only implement one main print for all kinds of better.
</p>

<hr>
</div>
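<div style="width:90%; padding-left:5%">
<p>
The settlement rule described above can be sketched on its own, before it is embedded in the test(...) method. The function name <i>settle</i> and the hand-written list of bets are assumptions made for this example; the odds are the first four 'EXW' values of the training data.
</p>
</div>

```python
def settle(bet_side, bet_amount, winning_side, winning_odd):
    """Income of a single bet: lose the stake if wrong, win (odd - 1) * stake if right."""
    if bet_amount == 0:
        return 0.0
    if bet_side != winning_side:
        return -bet_amount
    return (winning_odd - 1) * bet_amount


# (side we bet on, amount, odd of the winner); in the dataset the winner is always 'W'
bets = [("W", 1.0, 1.45), ("L", 1.0, 1.75), ("W", 0.0, 1.25), ("W", 0.5, 1.06)]
total = sum(settle(side, amount, "W", odd) for side, amount, odd in bets)
print(round(total, 2))  # 0.45 - 1 + 0 + 0.03 = -0.52
```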

In [7]:
# In the second part of this tutorial, we will define the abstract class of our bot

# We create the abstract class Better
class Better:

    def __init__(self, odd_type="EX"):
        """ Initialize the better with the type of odds it will use.
        
        odd_type must be one of ("EX", "Max", "Avg"). It selects the odds used for the computation.

        """               
        self.odd_type = odd_type

    def null_odd(self, row):
        """
        Given a row, 
        Return True if one of the odds that our better needs is null or NaN
        """
        b1, b2 = [pd.isnull(row[self.odd_type + char]) for char in ("L", "W")]
        return b1 or b2
    
    def decide(self, row):
        """ 
        Given a row (with all its data except the results of the game),
        using the odds selected by self.odd_type (in our tutorial one of "EX", "Max", "Avg")
        
        Return a tuple (side, ponderation) with 
    
        side = 'W' or 'L' the side where to bet, and 
        0 <= ponderation <= 1 the amount to bet (the certitude)
        
        In our dataset, the 'W' side is always the winner side.
        Of course, we will not use this information in any implementation of the decide() method.
            
        """
        # abstract method: every concrete better must override it
        raise NotImplementedError("subclasses must implement decide()")
    
    def test(self, validation_df=validation_df, verbose=True):
        """ Test the better with the validation dataframe 'validation_df' 
    
        Returns a tuple with a few useful values:
        (
        n_bet_count     :    the number of non-zero bets during the test,
        total_row_count :    the number of rows analyzed by the test,
        play_percentage :    the percentage of rows with a bet > 0
        avg_bet_amount  :    the average amount of a bet (bets of 0 are ignored),
        
        total_income    :    the total income, in $, of the whole simulation (max unit bet value is 1$)
        avg_income      :    the average income of one bet
        )
        
        
        (In the computation, the max amount of a bet is $1. We may change this value later to get more realistic values)
        
        If verbose = True, also prints some information
        """
        max_bet_amount = 1
     
        rows = [row for _, row in validation_df.iterrows() if not self.null_odd(row)]
        n_bet_count = 0
        total_row_count = len(rows)
        total_bet_amount = 0
        total_income = 0
        
        better_type = type(self).__name__
        
        if(verbose):
            print("\n=====================================================================")
            print("Testing better of type " + better_type )
            print("Odd type : " + self.odd_type )
            print("Total number of entries: " + str(total_row_count) + "\n")
        
        for row in rows:
            side, ponderation = self.decide(row)
            
            bet_amount = max_bet_amount * ponderation
            
            if bet_amount > 0:
                n_bet_count += 1
                total_bet_amount += bet_amount
            
            unit_income = ( -1 + (0 if side == 'L' else row[self.odd_type + "W"]) )
            income = bet_amount * unit_income
            total_income += income
            
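        # guard against a division by zero when the better never bets (or the dataset is empty)
        avg_bet_amount = total_bet_amount / n_bet_count if n_bet_count else 0
        avg_income = total_income / n_bet_count if n_bet_count else 0
        play_percentage = n_bet_count / total_row_count * 100 if total_row_count else 0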
        
        if(verbose):
            
            print("Number of games with non-zero bet: " + str(n_bet_count) 
                  + " (total: " + str(total_row_count) + ")")
            print("Percentage of games played: " + str(play_percentage))
            print("Average bet amount: " + str(avg_bet_amount))
            print("Total bet amount: " + str(total_bet_amount))
            print()
            print("Total income with this strategy: " + str(total_income))
            print("Average income: " + str(avg_income))
            print() 
            print("End test")
            print("=====================================================================\n")
            
            
        
        return n_bet_count, total_row_count, play_percentage, avg_bet_amount, total_income, avg_income

    

    
    

<div style="width:90%; padding-left:5%">
<hr>
    <h3> 3) Implementing the betters  </h3>
    <p>
        As a first attempt, we will implement a random better, betting 0 or B (the bet_amount provided in the constructor). <br>
        We will make it slightly configurable: we will provide an argument *play_percentage* (default: 100). <br>
        This value is the percentage of time that our better bets B (rather than 0). <br>
        When this better decides to bet B, it chooses the player on whom to bet randomly (p = 0.5).
    </p>
<hr>

</div>

In [8]:
# Implementation of a random better. 
# For each entry, this bot first randomly decides whether or not to bet. 
# If it bets, it picks one of the two players randomly, and bets bet_amount (1 by default).
class SimpleRandomBetter(Better):
    """
    Construct a random better betting 0 or bet_amount.
    play_percentage is the percentage of games (between 0 and 100) for which the bot should bet.
    """
    def __init__(self, play_percentage=100, odd_type="EX", bet_amount=1):
        super().__init__(odd_type)
        self.play_percentage = play_percentage
        self.bet_amount = bet_amount
        
    def decide(self, entry):
        
        # first, decide whether or not to play: 
        #   pick a random value in [0, 100[
        #   if it is higher than play_percentage do not play
        if (random.random() * 100 > self.play_percentage):
            return ("W", 0)
        
        # otherwise, bet bet_amount on L or W at random
        return (random.choice(("W", "L")), self.bet_amount)
        
  

<div style="width:90%; padding-left:5%">
<hr>  <p>
        We can call test() on our better to simulate a random betting strategy:
    </p>
<hr>

</div>

In [9]:
# test our random better for each odd type
simple_random_betters = [SimpleRandomBetter(odd_type=o) for o in odd_types]
test_results = [better.test() for better in simple_random_betters]



Testing better of type SimpleRandomBetter
Odd type : EX
Total number of entries: 2519

Number of games with non-zero bet: 2519 (total: 2519)
Percentage of games played: 100.0
Average bet amount: 1.0
Total bet amount: 2519

Total income with this strategy: -269.03000000000003
Average income: -0.1068003175863438

End test


Testing better of type SimpleRandomBetter
Odd type : Max
Total number of entries: 2527

Number of games with non-zero bet: 2527 (total: 2527)
Percentage of games played: 100.0
Average bet amount: 1.0
Total bet amount: 2527

Total income with this strategy: -28.590000000000014
Average income: -0.011313810842896722

End test


Testing better of type SimpleRandomBetter
Odd type : Avg
Total number of entries: 2527

Number of games with non-zero bet: 2527 (total: 2527)
Percentage of games played: 100.0
Average bet amount: 1.0
Total bet amount: 2527

Total income with this strategy: -124.27999999999997
Average income: -0.049180846853977035

End test



<div style="width:90%; padding-left:5%">
<hr><p>
        We can run the simulation above a bunch of times to estimate the variance of this experiment.
    </p>
    <p>
        Interestingly, random betting loses much less with the 'Max' odds than with the others (about -0.01 per bet in the run above, against about -0.11 with 'EX'), and the result can even be positive on some runs. However, this is hard to exploit: the odds vary a lot, and it is not easy to find the max odd in practice.
    </p>
    
<hr>

</div>
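<div style="width:90%; padding-left:5%">
<p>
As a rough illustration of how the spread of a random strategy could be estimated, the sketch below repeats a simplified simulation many times and reports the mean and standard deviation of the total income. It does not use the Better class: the helper <i>random_betting_income</i>, the fixed seed and the toy list of odds (the first five 'EXW' values, repeated) are assumptions made for this example.
</p>
</div>

```python
import random
import statistics


def random_betting_income(winner_odds, rng):
    """Total income of betting 1 on a random side of every game."""
    income = 0.0
    for odd_w in winner_odds:
        # With probability 0.5 we pick the winner and earn (odd - 1); otherwise we lose 1.
        income += (odd_w - 1) if rng.random() < 0.5 else -1
    return income


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
toy_winner_odds = [1.45, 1.75, 1.25, 1.06, 1.87] * 100  # toy sample of 500 games
incomes = [random_betting_income(toy_winner_odds, rng) for _ in range(200)]

print("mean income:", statistics.mean(incomes))
print("standard deviation:", statistics.stdev(incomes))
```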

<div style="width:90%; padding-left:5%">
<hr><p>
        Now, it is time to write another important abstraction for our better:
    </p>
    <p>
    We can reduce the problem of betting to another one: estimating the probability that one player wins.
    </p>
    <p>
    If we can make a good estimator, then it is easy to decide whether or not to bet: <br>
    <div style="border:1px dotted grey; padding:5px; margin:10px;">
    Let us assume a game P1 vs P2 , with odds O1 and O2 respectively, and an estimator:
    <ul>
        <li>First, estimate the probability that player 1 wins. Denote it p. </li>
        <li>Calculate the expected income of a bet on each player:
            <ul>
                <li> A bet of 1 on player 1 has expected income -1 x (1 - p) + (O1 - 1) x p </li>
                <li> A bet of 1 on player 2 has expected income -1 x p + (O2 - 1) x (1 - p) </li>
            </ul>
        </li>
        <li>  If one of these values is positive, bet on the associated player. </li>
    
    </ul>
    </div>
    <p style="border:1px dotted grey; padding:5px; margin:10px;">
    Note: the odds are designed so that the expected incomes cannot both be positive. Thus, we never bet on both sides.
    </p>
    </p>
 
    <p>
    Thus, we will create a new class RateBetter, which extends Better. In it, we will define a new abstract method rate(...) and implement the method decide(...). In this implementation, the value 'rate' is between 0 and 1 and indicates the estimated chance that the player in the 'W' column wins.
    </p>
    
    <p>
    For the sake of modularity, we will also define a new method amount(expected_income) which takes an expected income (for a $1 bet) as parameter, and returns the amount (between 0 and 1) that we should bet (the bigger the expected income, the more we should bet). This method will allow us to weight our bets differently without modifying any existing code.
    </p>
<hr>

</div>
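<div style="width:90%; padding-left:5%">
<p>
The decision rule above can be checked by hand with a few numbers. In the sketch below, the function names (<i>expected_incomes</i>, <i>choose_side</i>, <i>bank_margin</i>) and the probabilities are assumptions made for the illustration; the odds 1.45 and 2.65 are the 'EXW' and 'EXL' values of the first row of the data. The last function also suggests why both expected incomes cannot be positive: this would require 1/O1 + 1/O2 &lt; 1.
</p>
</div>

```python
def expected_incomes(p, odd_1, odd_2):
    """Expected income of a bet of 1 on player 1 and on player 2, given P(player 1 wins) = p."""
    income_1 = -1 * (1 - p) + (odd_1 - 1) * p
    income_2 = -1 * p + (odd_2 - 1) * (1 - p)
    return income_1, income_2


def choose_side(p, odd_1, odd_2):
    """Return 1 or 2 (the player to bet on), or None when no bet has a positive expectation."""
    income_1, income_2 = expected_incomes(p, odd_1, odd_2)
    if income_1 > 0:
        return 1
    if income_2 > 0:
        return 2
    return None


def bank_margin(odd_1, odd_2):
    """Sum of the probabilities implied by the odds, minus 1 (positive when the bank has an edge)."""
    return 1 / odd_1 + 1 / odd_2 - 1


print(choose_side(0.75, 1.45, 2.65))  # 1: player 1 is underrated by the odds
print(choose_side(0.60, 1.45, 2.65))  # 2: player 2 is underrated by the odds
print(choose_side(0.65, 1.45, 2.65))  # None: no bet is worth it
print(bank_margin(1.45, 2.65) > 0)    # True
```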

In [10]:
"""
    Another abstraction of a better
    Given a function rate(self, row) to compute the probability that 'W' wins, 
    we can implement the decide(self, row) function
"""
class RateBetter(Better):

    def rate(self, row):
        """
        Given a row, estimate the probability that the 'W' player wins.
        Return a negative value when no estimate is available.
        """
        # abstract method: every concrete RateBetter must override it
        raise NotImplementedError("subclasses must implement rate()")
    
    def amount(self, expected_income):
        """
        Given the expected income of a bet of 1$, 
        compute the amount (between 0 and 1) that we should bet. 
        This method is called by decide(...) to compute the amount of the bet.
        We provide a simple default implementation.
        """
        return 1 if expected_income > 0 else 0
    
    def decide(self, row):
              
        # if the odds are not defined, we do not bet
        if self.null_odd(row):
            return ("W", 0)
        
        # estimate the probability that the 'W' side wins (the 'L' side gets the complement)
        p = self.rate(row)
        
        # if p is negative, we do not bet. 
        # In case of problem, rate(...) will return negative values to indicate that it has failed
        if (p < 0):
            return ("W", 0)
        
        (rateW, rateL) = (p, 1-p)
         
        # fetch the odds
        oddW, oddL = [row[self.odd_type + char] for char in ("W", "L")]
        
        # compute the expected win (with a 1$ bet) respectively by betting L or W (using the rates)
        (exp_win_W, exp_win_L) = [ rate * (odd-1) - 1 * (1-rate) for odd, rate in zip((oddW, oddL), (rateW, rateL)) ]
        
        return ("W", self.amount(exp_win_W)) if exp_win_W > 0 else\
               ("L", self.amount(exp_win_L)) if exp_win_L > 0 else\
               ("W", 0)
     

<div style="width:90%; padding-left:5%">
<hr><p>
        We are done! All the important structure has been implemented. We are ready to start designing some betters!
    </p>
    <p>
        As an example, we will keep things simple and implement one dummy rate estimator as follows:
    </p> 
    <div style="border:1px dotted grey; padding:5px; margin:10px;">
        Rate (p1, p2, n):
    <ul>
        <li> in the training dataset, look for all games between p1 and p2 </li>
        <li> return -1 if fewer than n games are found during the previous step  </li>
        <li> compute the rate of p1:  r1 = (number of games won by p1) / (total number of games found) </li>
        <li> return r1. </li>
    </ul>
    <i> Note </i>: n is a constant which we will provide in the constructor.
    </div>
    
<hr>
</div>
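<div style="width:90%; padding-left:5%">
<p>
Before writing the pandas version, the steps above can be sketched in plain Python on a toy list of (winner, loser) pairs. The helper names (<i>build_head_to_head</i>, <i>rate</i>) and the players "A", "B" and "C" are assumptions made for this illustration.
</p>
</div>

```python
from collections import defaultdict


def build_head_to_head(games):
    """Map each sorted pair of players to [wins of the first sorted name, games played]."""
    records = defaultdict(lambda: [0, 0])
    for winner, loser in games:
        first, second = sorted((winner, loser))
        records[(first, second)][0] += 1 if winner == first else 0
        records[(first, second)][1] += 1
    return records


def rate(records, p1, p2, n):
    """Estimated probability that p1 beats p2, or -1 with fewer than n recorded games."""
    first, second = sorted((p1, p2))
    wins_first, played = records.get((first, second), (0, 0))
    if played < n:
        return -1
    rate_first = wins_first / played
    return rate_first if p1 == first else 1 - rate_first


toy_games = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "B"), ("A", "C")]
records = build_head_to_head(toy_games)
print(rate(records, "A", "B", n=3))  # 0.75
print(rate(records, "B", "A", n=3))  # 0.25
print(rate(records, "A", "C", n=3))  # -1 (only one game recorded)
```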

In [11]:
# now, we can establish a first implementation. This better will only bet the amount '1' or '0'
class SimpleRateBetter(RateBetter):
    
    def __init__(self, odd_type="EX", training_df=training_df, n=5):
        
        super().__init__(odd_type)
        pair_df = training_df.groupby("Names")["Name1Wins"].agg(["sum", "count"])
               
        # We keep the result only if the pair of players have played at least 'n' games
        self.rate_df = pair_df.loc[pair_df["count"] >= n]

    
    def rate(self, row):
        
        # fetch the names
        names = combine_names(row)  
        
        # if we do not have enough records for this pair of players,
        # we cannot estimate a rate: return -1 so that decide(...) does not bet
        if (names not in self.rate_df.index):
            return -1
        
        rate_row = self.rate_df.loc[names]
        leftRate = rate_row["sum"] / rate_row["count"]
        
        rateW = leftRate if names.split("/")[0] == row["Winner"] else 1-leftRate
        return rateW
                                  

 <div style="width:90%; padding-left:5%">
 <hr>
 Now, let's create a better and test it!
 <hr>
 </div>

In [12]:
# test our simple better
simple_rate_betters = [SimpleRateBetter(odd_type=odd_type) for odd_type in odd_types]
simple_betters_test_results = [better.test() for better in simple_rate_betters]


Testing better of type SimpleRateBetter
Odd type : EX
Total number of entries: 2519

Number of games with non-zero bet: 63 (total: 2519)
Percentage of games played: 2.500992457324335
Average bet amount: 1.0
Total bet amount: 63

Total income with this strategy: -14.820000000000002
Average income: -0.23523809523809527

End test


Testing better of type SimpleRateBetter
Odd type : Max
Total number of entries: 2527

Number of games with non-zero bet: 74 (total: 2527)
Percentage of games played: 2.928373565492679
Average bet amount: 1.0
Total bet amount: 74

Total income with this strategy: -3.5500000000000003
Average income: -0.047972972972972976

End test


Testing better of type SimpleRateBetter
Odd type : Avg
Total number of entries: 2527

Number of games with non-zero bet: 66 (total: 2527)
Percentage of games played: 2.6117926394934705
Average bet amount: 1.0
Total bet amount: 66

Total income with this strategy: -9.920000000000002
Average income: -0.15030303030303033

End test



 <div style="width:90%; padding-left:5%">
 <hr>
 We can see that we lose some money. Given the dummy rate(...) function used, this was to be expected!
 However, it is also interesting to compare the results with random betting: 
 <br>
 As a last example of a test, we will create a SimpleRandomBetter with the same play_percentage as recorded by the test() above. We will take the average of 20 simulations to decrease the variance.
 <hr>
 </div>

In [13]:
# Now, we will see the results of a random better. 
# To make the comparison relevant, we will use the same play percentage and bet amount as before

# the percentage of play of the simpleRateBetters above
play_percentages = [p for _, _, p, _, _, _ in simple_betters_test_results]
simple_random_betters = [SimpleRandomBetter(play_percentage=p, odd_type=o)\
                         for p, o in zip(play_percentages, odd_types)]


# first, we create a helper function to simulate multiple times.
def average_test(better, n):
    """
    Given (better, n):
    Call better.test() n times and return the average of each returned value.
    """
    test_results = [better.test(verbose=False) for _ in range(n)]
    sums = [sum(values) for values in zip(*test_results)]
    return [val/n for val in sums]

# we can now test our better (we take an average on 20 simulations)
n = 20 
for b in simple_random_betters:    
    print(average_test(b, n))


[58.15, 2519.0, 2.3084557364033342, 1.0, -4.355999999999999, -0.07419401045653376]
[73.1, 2527.0, 2.8927582113177683, 1.0, 1.1615, 0.016859952092346746]
[67.9, 2527.0, 2.686980609418282, 1.0, -4.518500000000001, -0.06558197286469633]


 <div style="width:90%; padding-left:5%">
 <hr>
 
 <div style="padding:5px; border:1px dashed grey;">
 <i> Note </i>: as documented in the test() function, these lists contain the average values of: <br>
 ( N non-zero bets  ;  N rows in the test  ;  % of non-zero bets  ;  average bet amount  ;  total income ($)  ;  average income per bet ) 
 </div>
 
 We can observe that the random better gets better results than our SimpleRateBetter!
 
 <h5> What can we conclude? </h5>
 <p>
     Calculating a rate by only counting the games between two players makes little sense. <br>
     If A and B have played 4 times with score 3-1, does A have about a 75% chance of winning the next game? <br>
     Obviously, the answer is no: a handful of games is far too few to estimate a probability, and other factors (such as the ranking) also matter.
     Our better just ran into a bad case of randomness.
 </p>
 
 <h5> And how could we create a Rate() function capable of beating the bank? </h5>
 
 <p>
     The real way to do so is to use a classifier capable of estimating its chances of being wrong. <br>
     This code is a good skeleton for implementing a Better based on a classifier. 
     However, this task goes way beyond the scope of this tutorial and remains wide open!
 </p>

 <hr>
 </div> 
