# Assignment 1.1

## Decide to gamble or not

Simple Poisson model:
- all previous matches of 2 teams
- disregarding whether home or away
- disregarding conceded goals of the teams
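
Written as a formula, and assuming the goal counts of the two teams are independent (an assumption of this simple model, not something checked against the data), the model looks roughly like this, where $\lambda_1$ and $\lambda_2$ are the average goals scored per match by team1 and team2:

$$
P(X = k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad
\text{odds}(g_h, g_a) = \frac{1}{P(X = g_h \mid \lambda_1)\, P(X = g_a \mid \lambda_2)}
$$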

Write Functions:
* exact_score_odds(csv_file, team1, team2, gh, ga) -> returns the odds of the exact score gh-ga in the match team1 against team2
* pred_1X2(csv_file, team1, team2) -> returns the tuple (odds that team1 wins, odds of a draw, odds that team2 wins)
* exact_score_odds_alt(team1, team2, gh, ga) and pred_1X2_alt(team1, team2) <br /> implementing your own take on predicting the odds of an exact score of a game, and the odds of the home win, draw, and away win events.

In [86]:
import pandas as pd
import numpy as np
import readcsv

In [87]:
def goalStatsDf(csv):
    # Builds one row of goal statistics per team from the given results file
    rcsv = readcsv.ReadCSV(csv)  # use the file passed in instead of a hard-coded "I1.csv"
    entire_df = pd.read_csv(csv)
    all_teams = np.sort(entire_df.AwayTeam.unique())
    team_stats = []

    for team in all_teams:
        team_stats.append((team,) + rcsv.goals_statistics(team))
    columns = ["team", "scoredGoals", "concededGoals", "numMatches",
               "scoredHome", "concededHome", "numMatchesHome",
               "scoredAway", "concededAway", "numMatchesAway"]
    goalstat_df = pd.DataFrame(team_stats, columns=columns)
    return goalstat_df


In [88]:
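import math

def poissonProb(avgNumEvents, observedEvents):
    # Poisson probability of exactly observedEvents events when avgNumEvents are expected;
    # uses the standard-library math module because np.math is no longer available in recent NumPy versions
    numerator = math.pow(avgNumEvents, observedEvents) * math.exp(-avgNumEvents)
    denominator = math.factorial(observedEvents)
    probObserved = numerator / denominator
    return probObserved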
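
As a quick sanity check of the formula, the sketch below re-implements the same Poisson probability with the standard-library `math` module and checks two things: the probabilities over all goal counts add up to 1, and the value for a small example matches a hand calculation. The name `poisson_pmf` and the rate of 1.5 goals are illustrative assumptions, not part of the assignment.

```python
import math

def poisson_pmf(rate, k):
    """Probability of observing exactly k events when `rate` events are expected."""
    return rate ** k * math.exp(-rate) / math.factorial(k)

# Hypothetical scoring rate of 1.5 goals per match (illustrative value only).
rate = 1.5
probs = [poisson_pmf(rate, k) for k in range(0, 20)]

# Summing over 0..19 goals covers essentially all of the probability mass.
total = sum(probs)
print(round(total, 6))

# Hand check for exactly 2 goals: 1.5**2 * exp(-1.5) / 2!
print(math.isclose(probs[2], 2.25 * math.exp(-1.5) / 2))
```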

# 1) Exact score odds

In [89]:
def exact_score_odds(csv_file, team1, team2, goalsHome, goalsAway):
    # Returns the odds (1 / probability) of the exact score goalsHome-goalsAway in the match team1 against team2
    summary_df = goalStatsDf(csv_file)
    avgT1 = np.around(summary_df.loc[summary_df["team"] == team1, "scoredGoals"].values[0] / summary_df.loc[summary_df["team"] == team1, "numMatches"].values[0], 3)
    avgT2 = np.around(summary_df.loc[summary_df["team"] == team2, "scoredGoals"].values[0] / summary_df.loc[summary_df["team"] == team2, "numMatches"].values[0], 3)
    odds = 1 / (poissonProb(avgT1, goalsHome) * poissonProb(avgT2, goalsAway))
    return odds

### Bookmaker odds
- Spezia - Salernitana, exact score 2 - 1, odds 12.00
- Lazio - Inter, exact score 1 - 2, odds 9.50
- AC Milan - Verona, exact score 1 - 0, odds 12.50
- AC Milan - Verona, exact score 1 - 1, odds 10.00

In [90]:
exact_score_odds("I1.csv", "Venezia", "Fiorentina", 1, 2)

12.65073978542058

### Odds calculated by method 1)
- Spezia - Salernitana, exact score 2 - 1, odds 12.208
- Lazio - Inter, exact score 1 - 2, odds 12.41
- AC Milan - Verona, exact score 1 - 0, odds 13.26
- AC Milan - Verona, exact score 1 - 1, odds 10.288

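In other words, with [lambda = scored goals / numMatches] the model's odds are higher than the bookmaker's odds in all four cases: slightly for three of them, and clearly for Lazio - Inter (12.41 versus 9.50). The bookmaker therefore pays less than the model's fair price, so according to this model none of these bets is worth taking.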
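
To make the gamble-or-not decision explicit, the model odds can be turned into an expected value per unit staked. The sketch below assumes the bookmaker quotes decimal odds (a winning stake of 1 returns `odds` in total) and that the model's probability is simply 1 / model odds; the helper name `expected_value` is made up for this illustration. The numbers are copied from the two lists above.

```python
# (match and score, bookmaker odds, odds from method 1)
bets = [
    ("Spezia - Salernitana 2-1", 12.00, 12.208),
    ("Lazio - Inter 1-2", 9.50, 12.41),
    ("AC Milan - Verona 1-0", 12.50, 13.26),
    ("AC Milan - Verona 1-1", 10.00, 10.288),
]

def expected_value(bookmaker_odds, model_odds):
    """Expected profit per unit staked if the model's probability is right."""
    model_prob = 1.0 / model_odds
    return bookmaker_odds * model_prob - 1.0

for name, bookmaker_odds, model_odds in bets:
    ev = expected_value(bookmaker_odds, model_odds)
    print(f"{name}: EV per unit stake = {ev:+.3f}")
```

A bet only looks attractive when this value is positive, which happens exactly when the bookmaker's odds are higher than the model's odds. For the four bets above every value is negative.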

# 2) Prediction of Win, Draw, Loss

In [100]:
def oddsMatrix(csv, homeTeam, awayTeam):
    # Returns a 6x6 matrix of exact-score odds: row = goals of homeTeam, column = goals of awayTeam
    # The data contains scores with up to 7 goals, but probabilities that small are close to 0% and cause division errors, so the grid stops at 5 goals per team
    matrix = np.zeros((6,6))
    for h in range(0,6):
        for a in range(0,6):
            matrix[h,a] = np.around(exact_score_odds(csv, homeTeam, awayTeam, h, a),3)
    return matrix

oddsMatrix("I1.csv", "Venezia", "Fiorentina")

array([[7.25000000e+00, 5.72200000e+00, 9.03300000e+00, 2.13870000e+01,
        6.75220000e+01, 2.66462000e+02],
       [1.01540000e+01, 8.01400000e+00, 1.26510000e+01, 2.99540000e+01,
        9.45680000e+01, 3.73196000e+02],
       [2.84430000e+01, 2.24490000e+01, 3.54360000e+01, 8.39060000e+01,
        2.64896000e+02, 1.04536800e+03],
       [1.19507000e+02, 9.43230000e+01, 1.48892000e+02, 3.52546000e+02,
        1.11300900e+03, 4.39230200e+03],
       [6.69508000e+02, 5.28420000e+02, 8.34128000e+02, 1.97504600e+03,
        6.23534600e+03, 2.46067320e+04],
       [4.68843100e+03, 3.70041900e+03, 5.84123000e+03, 1.38308530e+04,
        4.36648870e+04, 1.72316049e+05]])

In [92]:
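def pred_1X2(csv,homeTeam, awayTeam):
    # Convert the exact-score odds into probabilities in percent (100 / odds)
    matrix = 100/oddsMatrix(csv,homeTeam, awayTeam)
    # Diagonal = draw, above the diagonal = away team scores more, below = home team scores more
    probDraw = np.around(np.diagonal(matrix).sum(),2)
    probLoss = np.around(np.triu(matrix, k=1).sum(),2)
    probWin = np.around(np.tril(matrix, k=-1).sum(),2)
    # Back to odds; scores above 5 goals are ignored, so the three probabilities sum to slightly less than 100%
    return ((100/probWin),(100/probDraw),(100/probLoss))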

pred_1X2("I1.csv","Venezia", "Fiorentina")

(4.768717215069147, 3.402517863218782, 2.0230629172567265)

# Own solution

Using the average number of goals scored per match works, but you can get better results if you also take into account the goals conceded and whether the goals were scored/conceded in home games or away games. Essentially, we take into account the attack and defense strength of both teams, at home and away.
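
One way to write this down, with symbols introduced here only for illustration: let $s^{H}_{t}$ and $c^{H}_{t}$ be the average goals team $t$ scores and concedes per home match, $s^{A}_{t}$ and $c^{A}_{t}$ the same per away match, and $\bar{s}^{H}, \bar{c}^{H}, \bar{s}^{A}, \bar{c}^{A}$ the corresponding averages over all teams. For home team $h$ against away team $a$, the expected goals are then:

$$
\lambda_{\text{home}} = \frac{s^{H}_{h}}{\bar{s}^{H}} \cdot \frac{c^{A}_{a}}{\bar{c}^{A}} \cdot \bar{s}^{H}, \qquad
\lambda_{\text{away}} = \frac{s^{A}_{a}}{\bar{s}^{A}} \cdot \frac{c^{H}_{h}}{\bar{c}^{H}} \cdot \bar{c}^{H}
$$

Each fraction is a strength relative to the league: above 1 means the team scores (or concedes) more than average. Since every goal scored at home is a goal conceded away, $\bar{s}^{H} = \bar{c}^{A}$ and $\bar{c}^{H} = \bar{s}^{A}$.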

In [93]:
def all_matchesStats(csv_file):
    # Returns the league-wide average goals scored and conceded per home match, followed by the same averages per away match
    summary_df = goalStatsDf(csv_file)
    totalscoredHomeAvg = summary_df["scoredHome"].sum() / summary_df["numMatchesHome"].sum()
    totalconcededHomeAvg = summary_df["concededHome"].sum() / summary_df["numMatchesHome"].sum()
    totalscoredAwayAvg = summary_df["scoredAway"].sum() / summary_df["numMatchesAway"].sum()
    totalconcededAwayAvg = summary_df["concededAway"].sum() / summary_df["numMatchesAway"].sum()
    # sanity check: totalscoredHomeAvg should equal totalconcededAwayAvg, since every home goal scored is an away goal conceded
    return (totalscoredHomeAvg, totalconcededHomeAvg, totalscoredAwayAvg, totalconcededAwayAvg)

In [94]:
all_matchesStats("I1.csv")

(1.6377777777777778,
 1.4377777777777778,
 1.4377777777777778,
 1.6377777777777778)

In [124]:
def exact_score_odds_alt(csv_file, homeTeam, awayTeam, homeGoals, awayGoals):
    summary_df = goalStatsDf(csv_file)
    totalAverages = all_matchesStats(csv_file)

    # calculating the Attack strength and defensive strengths in comparison to the overall averages
    avgHomeScore = np.around(summary_df.loc[summary_df["team"] == homeTeam, "scoredHome"].values[0] / summary_df.loc[summary_df["team"] == homeTeam, "numMatchesHome"].values[0], 3)
    homeAttackAvg = avgHomeScore / totalAverages[0]
    avgHomeConcede = np.around(summary_df.loc[summary_df["team"] == homeTeam, "concededHome"].values[0] / summary_df.loc[summary_df["team"] == homeTeam, "numMatchesHome"].values[0], 3)
    homeDefenseAvg = avgHomeConcede / totalAverages[1]
    avgAwayScore = np.around(summary_df.loc[summary_df["team"] == awayTeam, "scoredAway"].values[0] / summary_df.loc[summary_df["team"] == awayTeam, "numMatchesAway"].values[0], 3)
    awayAttackAvg = avgAwayScore / totalAverages[2]
    avgAwayConcede = np.around(summary_df.loc[summary_df["team"] == awayTeam, "concededAway"].values[0] / summary_df.loc[summary_df["team"] == awayTeam, "numMatchesAway"].values[0], 3)
    awayDefenseAvg = avgAwayConcede / totalAverages[3]
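    # expected goals = own relative attack strength * opponent's relative defensive weakness * league average
    lambdaHome = homeAttackAvg * awayDefenseAvg * totalAverages[0]
    # totalAverages[1] (conceded at home) equals the league average of goals scored away
    lambdaAway = homeDefenseAvg * awayAttackAvg * totalAverages[1]
    # uncomment to check the expected home and away goals
    #print(lambdaHome, lambdaAway)
    # note: both probabilities are rounded to 4 decimals, so very unlikely scores round to 0 and lead to a division by zero
    oddsOutcome = 1 / (np.around(poissonProb(lambdaHome, homeGoals),4) * np.around(poissonProb(lambdaAway, awayGoals),4))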
    return np.around(oddsOutcome,3)

In [125]:
exact_score_odds_alt("I1.csv", "Lazio", "Inter", 0, 0)

20.472

### Odds calculated by alternative method
- Spezia - Salernitana, exact score 2 - 1, odds 10.426
- Lazio - Inter, exact score 1 - 2, odds 10.461
- AC Milan - Verona, exact score 1 - 0, odds 8.241
- AC Milan - Verona, exact score 1 - 1, odds 8.192

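With this method the odds for the Spezia bet and both AC Milan bets come out lower than the bookmaker's odds (10.426 versus 12.00, 8.241 versus 12.50, and 8.192 versus 10.00). The model thus considers these scores more likely than the bookmaker's price implies, which suggests these bets would pay off in the long run if the model is right. For Lazio - Inter the model's odds (10.461) are still above the bookmaker's 9.50, so that bet is not attractive.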

In [133]:
def oddsMatrixAlt(csv, homeTeam, awayTeam):
    # Returns a 6x6 matrix of exact-score odds from the alternative method: row = goals of homeTeam, column = goals of awayTeam
    # The data contains scores with up to 7 goals, but probabilities that small are close to 0% and cause division errors, so the grid stops at 5 goals per team
    matrix = np.zeros((6,6))
    for h in range(0,6):
        for a in range(0,6):
            matrix[h,a] = np.around(exact_score_odds_alt(csv, homeTeam, awayTeam, h, a),3)
    return matrix

oddsMatrixAlt("I1.csv", "Venezia", "Fiorentina")

array([[1.0028000e+01, 7.8970000e+00, 1.2442000e+01, 2.9404000e+01,
        9.2661000e+01, 3.6583100e+02],
       [9.6820000e+00, 7.6240000e+00, 1.2012000e+01, 2.8388000e+01,
        8.9461000e+01, 3.5319600e+02],
       [1.8697000e+01, 1.4724000e+01, 2.3198000e+01, 5.4824000e+01,
        1.7276600e+02, 6.8209100e+02],
       [5.4185000e+01, 4.2671000e+01, 6.7229000e+01, 1.5888000e+02,
        5.0068100e+02, 1.9767140e+03],
       [2.0941100e+02, 1.6491000e+02, 2.5982100e+02, 6.1402400e+02,
        1.9349850e+03, 7.6394190e+03],
       [1.0171390e+03, 8.0099300e+02, 1.2619890e+03, 2.9824040e+03,
        9.3984960e+03, 3.7105751e+04]])

In [109]:
def pred_1X2_alt(csv,homeTeam, awayTeam):
    matrix = 100/oddsMatrixAlt(csv,homeTeam, awayTeam)
    oddsDraw = 100/np.around(np.diagonal(matrix).sum(),2)
    oddsLoss = 100/np.around(np.triu(matrix, k=1).sum(),2)
    oddsWin = 100/np.around(np.tril(matrix, k=-1).sum(),2)
    return (oddsWin,oddsDraw,oddsLoss)

pred_1X2_alt("I1.csv","Venezia", "Fiorentina")

(3.3200531208499333, 3.5612535612535616, 2.4084778420038533)

In [130]:
entire_df = pd.read_csv("I1.csv")
all_teams = np.sort(entire_df.AwayTeam.unique())
print(all_teams)

['Atalanta' 'Benevento' 'Bologna' 'Cagliari' 'Crotone' 'Empoli'
 'Fiorentina' 'Genoa' 'Inter' 'Juventus' 'Lazio' 'Milan' 'Napoli' 'Parma'
 'Roma' 'Salernitana' 'Sampdoria' 'Sassuolo' 'Spezia' 'Torino' 'Udinese'
 'Venezia' 'Verona']


In [134]:
# Computing the odds for every pairing takes quite a while, so the quantitative comparison uses a sample of four matches
print("Atalanta - Benevento ", pred_1X2_alt("I1.csv","Atalanta", "Benevento"))
print("Bologna - Napoli ", pred_1X2_alt("I1.csv","Bologna", "Napoli"))
print("Parma - Roma ", pred_1X2_alt("I1.csv","Parma", "Roma"))
print("Genoa - Juventus ", pred_1X2_alt("I1.csv","Genoa", "Juventus"))

Atalanta - Benevento  (1.4779781259237363, 6.605019815059445, 8.8261253309797)
Bologna - Napoli  (6.002400960384153, 5.361930294906166, 1.6289297931259163)
Parma - Roma  (6.253908692933083, 5.173305742369375, 1.6100466913540492)
Genoa - Juventus  (5.9523809523809526, 5.186721991701244, 1.6350555918901244)


In [132]:
print("Odds are given as (home win, draw, away win)")
print("Atalanta - Benevento ", pred_1X2("I1.csv", "Atalanta", "Benevento"))
print("Bologna - Napoli ", pred_1X2("I1.csv", "Bologna", "Napoli"))
print("Parma - Roma ", pred_1X2("I1.csv", "Parma", "Roma"))
print("Genoa - Juventus ", pred_1X2("I1.csv", "Genoa", "Juventus"))

Atalanta - Benevento  (1.6331863465621428, 5.162622612287041, 5.977286312014345)
Bologna - Napoli  (4.504504504504505, 5.107252298263535, 1.8231540565177757)
Parma - Roma  (4.894762604013706, 4.452359750667854, 1.7914725904693658)
Genoa - Juventus  (4.095004095004095, 4.557885141294439, 1.9252984212552946)
