<a href="https://colab.research.google.com/github/kyi95/Sports-Betting-Odds-Analysis/blob/main/NCAA_Data_Odds_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Objective


The objective of this notebook is to perform exploratory data analysis (EDA) on a small dataset of NCAA men's basketball odds from two betting websites, BetUS and BetOnline.

In [1]:
#importing the libraries we need in order to analyze the data
import numpy as np
import pandas as pd
import plotly.express as px

# Pathlib to navigate file system
from pathlib import Path

In [2]:
#Allows us to use data saved in our google drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Load the data (if working on a different device, put the local OneDrive path here instead)
data_folder = Path('/content/drive/MyDrive/SideProjects/Data')

file = data_folder / 'NCAA_Odds_Data.csv'

In [4]:
odds_df = pd.read_csv(file)

In [5]:
# The dataframe is named odds_df; here we look at its first rows.
odds_df.head()

Unnamed: 0,TeamName,PointSpread,Vigor,MoneyLine,Total,OverUnderOdds,OverUnder,Website
0,North Carolina,4.5,-115,180,152,-110,Over,Betus
1,Kansas,-4.5,-105,-220,152,-110,Under,Betus
2,North Carolina,4.0,100,190,152,-105,Over,BetOnline
3,Kansas,-4.0,-120,-220,152,-115,Under,BetOnline


#Variable Definitions
* TeamName: The name of the team playing
* PointSpread: The projected point difference between the two opponents in the matchup (negative for the favorite)
* Vigor: The American odds offered on the point spread bet
* MoneyLine: The American odds on which team wins the matchup outright
* Total: The projected combined score of both teams
* OverUnderOdds: The odds of the combined score being over or under the Total (152 points here)
* OverUnder: Whether the row's OverUnderOdds apply to the over or the under
* Website: Which website the odds originated from

#Statistics of the odds

In [6]:
# Here we are creating a new column that concatenates the team name and the website that the odds originated from
odds_df['Team_Site'] = odds_df['TeamName'] + ' ' + odds_df['Website']

In [7]:
odds_df.groupby('Team_Site')['PointSpread'].value_counts()

Team_Site                 PointSpread
Kansas BetOnline          -4.0           1
Kansas Betus              -4.5           1
North Carolina BetOnline   4.0           1
North Carolina Betus       4.5           1
Name: PointSpread, dtype: int64

We can see from the point spreads that Kansas is the clear favorite on both betting websites, with spreads of -4.5 and -4. Additionally, the spread differs by 0.5 points between the two sites for both Kansas and UNC.

In [8]:
odds_df.groupby('Team_Site')['Vigor'].value_counts()

Team_Site                 Vigor
Kansas BetOnline          -120     1
Kansas Betus              -105     1
North Carolina BetOnline   100     1
North Carolina Betus      -115     1
Name: Vigor, dtype: int64

Looking at the vigor attached to the point spread, we can see four different prices. On BetOnline, Kansas is priced at -120 versus -105 on Betus, so we would have to bet 120 to win 100 on BetOnline but only 105 on Betus. This tells us that BetOnline gives Kansas a better chance of beating the spread than Betus does. The opposite is true for UNC: at +100 on BetOnline versus -115 on Betus, it is Betus that gives North Carolina the better chance of covering. Note that the spreads themselves differ by half a point between the sites, so the two prices are not for exactly the same bet.
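The risk-versus-return reading above can be sketched as a small helper. This is only an illustration: the function name `american_to_implied_prob` is hypothetical and not part of the original notebook, and it assumes the prices are American odds (negative means "risk this much to win 100", positive means "risk 100 to win this much").

```python
def american_to_implied_prob(odds: float) -> float:
    """Return the implied probability, in percent, of an American odds price.

    Hypothetical helper for illustration; assumes American odds.
    """
    if odds < 0:
        risk = -odds      # stake needed to win 100
        payout = 100.0
    else:
        risk = 100.0      # stake that wins `odds`
        payout = odds
    # Implied probability = risk / total return
    return risk / (risk + payout) * 100


# The four point spread prices from the table above
for price in (-120, -105, 100, -115):
    print(price, round(american_to_implied_prob(price), 2))
```

The cells that follow do the same arithmetic by hand for each price.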

## Point Spread Odds for BetOnline.
* For Kansas on BetOnline, the bet is 120 to win 100 with a return of 220

In [9]:
# Now Let's Look at Implied probability of Vigor = Risk/Return for Betonline and Kansas
kan_betonline_ip = (120/ (120+100))*100
print(kan_betonline_ip)

54.54545454545454


* For UNC on BetOnline, the bet is 100 to win 100 with a return of 200

In [10]:
# Now Let's Look at Implied probability of Vigor = Risk/Return for Betonline and UNC
unc_betonline_ip = (100/ (100+100))*100
print(unc_betonline_ip)

50.0


In [11]:
#Total Implied Probability for BetOnline
total_ip_betonline = kan_betonline_ip+unc_betonline_ip
print(total_ip_betonline)

104.54545454545453


* The bookmaker expects to take in about 104.55 for every 100 it pays out, a margin of roughly 4.55%

## Point Spread Odds for BetUs

* For Kansas on Betus, the bet is 105 to win 100 with a return of 205

In [12]:
# Now Let's Look at Implied probability of Vigor = Risk/Return for BetUs and Kansas
kan_betus_ip = (105/(105+100))*100
print(kan_betus_ip)

51.21951219512195


* For UNC on BetUs, the bet is 115 to win 100 with a return of 215

In [13]:
# Now Let's Look at Implied probability of Vigor = Risk/Return for BetUs and UNC
unc_betus_ip = (115/ (115+100))*100
print(unc_betus_ip)

53.48837209302325


In [14]:
#Total Implied Probability for BetUs
total_ip_betus = kan_betus_ip+unc_betus_ip
print(total_ip_betus)

104.70788428814521


* The bookmaker expects to take in about 104.71 for every 100 it pays out, a margin of roughly 4.71%

##Calculating Actual Covering Odds for BetOnline

In [15]:
#Actual odds of Kansas covering via BetOnline = Team Implied Prob/ Total Implied Probability
actual_kansas_beton = kan_betonline_ip/total_ip_betonline
print(actual_kansas_beton)

0.5217391304347826


In [16]:
#Actual odds of UNC covering via BetOnline = Team Implied Prob/ Total Implied Probability
actual_unc_beton = unc_betonline_ip/total_ip_betonline
print(actual_unc_beton)

0.47826086956521746


These values represent the implied probabilities of covering on BetOnline with the vigor removed. BetOnline's odds imply that Kansas has a 52.17% chance of beating the spread and UNC has a 47.83% chance.
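The normalization used here (dividing each implied probability by the site's total) can be sketched in a few lines. The helper name `remove_vig` is an assumption made for this example. Proportional scaling is the approach this notebook takes, though it is not the only way to strip out a bookmaker's margin.

```python
def remove_vig(implied_probs):
    """Scale implied probabilities (in percent) so that they sum to 1.

    Hypothetical helper; uses the same proportional scaling as the cells above.
    """
    total = sum(implied_probs)
    return [p / total for p in implied_probs]


# BetOnline point spread prices: Kansas -120, UNC +100
kansas_ip = 120 / (120 + 100) * 100
unc_ip = 100 / (100 + 100) * 100

fair_kansas, fair_unc = remove_vig([kansas_ip, unc_ip])
margin = kansas_ip + unc_ip - 100  # bookmaker margin in percentage points

print(fair_kansas, fair_unc, margin)
```

Because both values are divided by the same total, they always sum to 1, which is what the validation cell checks.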

In [17]:
actual_kansas_beton+actual_unc_beton
# This value must equal 1 in order to validate our calculations

1.0

##Calculating Actual Covering Odds for BetUs

In [18]:
#Actual odds of Kansas covering via BetUs = Team Implied Prob/ Total Implied Probability
actual_kansas_betus = kan_betus_ip/total_ip_betus
print(actual_kansas_betus)

0.4891657638136511


In [19]:
#Actual odds of UNC covering via BetUs = Team Implied Prob/ Total Implied Probability
actual_unc_betus = unc_betus_ip/total_ip_betus
print(actual_unc_betus)

0.5108342361863488


These values represent the implied probabilities of covering on BetUs with the vigor removed. BetUs's odds imply that Kansas has a 48.92% chance of beating the spread and UNC has a 51.08% chance.

In [20]:
actual_kansas_betus+actual_unc_betus
# This value must equal 1 in order to validate our calculations

1.0

##Calculating Variance of Actual Covering Odds across Sites

In [59]:
#Let's first calculate the difference in odds between the two sites for kansas
kansas_point_spread_diff = abs(actual_kansas_beton -actual_kansas_betus)
print(kansas_point_spread_diff)

0.03257336662113147


There is a difference of about 3.26 percentage points between the point spread odds of Kansas on the two sites

In [58]:
from numpy import var  # public NumPy function; population variance (ddof=0)
kansas_point_spread_var = var((actual_kansas_beton, actual_kansas_betus))
print(kansas_point_spread_var)

0.00026525605325866046


In [60]:
#Let's first calculate the difference in odds between the two sites for UNC
unc_point_spread_diff = abs(actual_unc_beton -actual_unc_betus)
print(unc_point_spread_diff)

0.03257336662113136


There is a difference of about 3.26 percentage points between the point spread odds of UNC on the two sites

In [61]:
unc_point_spread_var = var((actual_unc_beton, actual_unc_betus))
print(unc_point_spread_var)

0.00026525605325865867


##Overall Findings within Point Spread

When looking at the point spread, the de-vigged probability of beating the spread on BetOnline is 52 to 48 in favor of Kansas, while on BetUs it is 49 to 51 in favor of UNC. Here I would choose to bet on the spread with Kansas on BetOnline, as that gives us the highest implied probability of success. Additionally, the vigor margin is lower on BetOnline, at 4.55% compared to 4.71%. We also notice that the difference in point spread odds between the two sites is the same for both teams (about 3.26 percentage points), and so is the variance, apart from floating-point noise.
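The matching variances are not a coincidence. With only two observations, the population variance is fully determined by their difference: it equals `(diff / 2) ** 2`. A quick check, using the de-vigged Kansas values printed earlier and `np.var` with its default `ddof=0`:

```python
import numpy as np

# De-vigged Kansas cover probabilities, copied from the outputs above
beton = 0.5217391304347826
betus = 0.4891657638136511

diff = abs(beton - betus)
pop_var = np.var([beton, betus])  # population variance (ddof=0)

# With exactly two values, each sits diff/2 away from the mean,
# so the population variance is (diff / 2) ** 2.
print(diff, pop_var, (diff / 2) ** 2)
```

Since Kansas and UNC have the same cross-site difference in a two-outcome market, the variance adds no information beyond the difference itself.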

##MoneyLine Odds for BetUs With Vigor

In [23]:
odds_df[['Team_Site','MoneyLine']]

Unnamed: 0,Team_Site,MoneyLine
0,North Carolina Betus,180
1,Kansas Betus,-220
2,North Carolina BetOnline,190
3,Kansas BetOnline,-220


On BetUs, UNC's moneyline is +180, which means that a bet of 100 wins 180, for a total return of 280

In [24]:
# Now Let's Look at Implied probability of MoneyLine = Risk/Return for BetUs and UNC
unc_betus_ip_ml = (100/(100+180))*100
print(unc_betus_ip_ml)

35.714285714285715


On Betus, Kansas' moneyline is -220, which means that you need to bet 220 in order to win 100, for a total return of 320

In [25]:
# Now Let's Look at Implied probability of MoneyLine = Risk/Return for BetUs and Kansas
kansas_betus_ip_ml = (220/(100+220))*100
print(kansas_betus_ip_ml)

68.75


In [26]:
total_ip_betus_ml = unc_betus_ip_ml + kansas_betus_ip_ml
print(total_ip_betus_ml)

104.46428571428572


On BetUs, the total implied probability is 104.46, which means that the bookmaker's margin is 4.46%

##MoneyLine odds for BetOnline With Vigor

In [27]:
odds_df[['Team_Site','MoneyLine']]

Unnamed: 0,Team_Site,MoneyLine
0,North Carolina Betus,180
1,Kansas Betus,-220
2,North Carolina BetOnline,190
3,Kansas BetOnline,-220


On BetOnline, UNC's moneyline is +190, which means that a bet of 100 wins 190, for a total return of 290

In [28]:
# Now Let's Look at Implied probability of MoneyLine = Risk/Return for BetOnline and UNC
unc_beton_ip_ml = (100/(190+100))*100
print(unc_beton_ip_ml)

34.48275862068966


On BetOnline, Kansas' moneyline is -220, which means that you need to bet 220 in order to win 100, for a total return of 320

In [29]:
# Now Let's Look at Implied probability of MoneyLine = Risk/Return for BetOnline and Kansas
kansas_beton_ip_ml = (220/(100+220))*100
print(kansas_beton_ip_ml)

68.75


In [30]:
total_ip_beton_ml = unc_beton_ip_ml + kansas_beton_ip_ml
print(total_ip_beton_ml)

103.23275862068965


On BetOnline, the total implied probability is 103.23, which means that the bookmaker's margin is 3.23%

##Actual Moneyline Odds without vigor on Betus

In [36]:
#Actual odds of UNC winning via BetUs = Team Implied Prob/ Total Implied Probability
actual_unc_betus_ml = unc_betus_ip_ml/total_ip_betus_ml
print(actual_unc_betus_ml)

0.3418803418803419


In [37]:
#Actual odds of Kansas winning via BetUs = Team Implied Prob/ Total Implied Probability
actual_kansas_betus_ml = kansas_betus_ip_ml/total_ip_betus_ml
print(actual_kansas_betus_ml)

0.6581196581196581


In [38]:
actual_unc_betus_ml+actual_kansas_betus_ml
#Validation

1.0

On the Betus site, the odds of UNC winning the match are 34% against 66% for Kansas

##Actual Moneyline Odds without vigor on BetOnline

In [34]:
#Actual odds of UNC winning via BetOnline = Team Implied Prob/ Total Implied Probability
actual_unc_beton_ml = unc_beton_ip_ml/total_ip_beton_ml
print(actual_unc_beton_ml)

0.3340292275574113


In [35]:
#Actual odds of Kansas winning via BetOnline = Team Implied Prob/ Total Implied Probability
actual_kansas_beton_ml = kansas_beton_ip_ml/total_ip_beton_ml
print(actual_kansas_beton_ml)

0.6659707724425887


In [39]:
actual_unc_beton_ml + actual_kansas_beton_ml
#Validation

1.0

On BetOnline, the odds of UNC winning the match are 33% against 67% for Kansas

##Calculating Variance of Actual MoneyLine Odds across Sites

In [62]:
#Let's first calculate the difference in odds between the two sites for kansas
kansas_ml_diff = abs(actual_kansas_betus_ml -actual_kansas_beton_ml)
print(kansas_ml_diff)

0.007851114322930619


There is a difference of about 0.79 percentage points between the moneyline odds of Kansas on the two sites

In [65]:
from numpy import var  # public NumPy function; population variance (ddof=0)
kansas_ml_var = var((actual_kansas_betus_ml, actual_kansas_beton_ml))
print(kansas_ml_var)

1.5409999027931576e-05


In [66]:
#Let's first calculate the difference in odds between the two sites for UNC
unc_ml_diff = abs(actual_unc_betus_ml -actual_unc_beton_ml)
print(unc_ml_diff)

0.007851114322930564


There is a difference of about 0.79 percentage points between the moneyline odds of UNC on the two sites

In [69]:
unc_ml_var = var((actual_unc_betus_ml, actual_unc_beton_ml))
print(unc_ml_var)

1.540999902793136e-05


##Overall Findings within MoneyLine

Both sites have Kansas as a pretty heavy favorite and the odds are very similar, so at this point I would definitely put money on Kansas. As for the site to choose, I would pick BetOnline, since its vigor margin is 3.23% instead of the 4.46% on Betus. Additionally, the difference in odds across sites is nearly identical for both teams, with matching variance.
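The site comparison can also be done in one pass with pandas rather than cell by cell. The sketch below rebuilds the four moneyline rows by hand so that it is self-contained. The helper `implied_prob` and the column name `ImpliedProb` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Moneyline rows copied from the table above
ml = pd.DataFrame({
    "TeamName": ["North Carolina", "Kansas", "North Carolina", "Kansas"],
    "Website": ["Betus", "Betus", "BetOnline", "BetOnline"],
    "MoneyLine": [180, -220, 190, -220],
})


def implied_prob(odds):
    """Implied probability in percent from an American odds price (hypothetical helper)."""
    if odds < 0:
        return -odds / (-odds + 100) * 100  # risk |odds| to win 100
    return 100 / (odds + 100) * 100         # risk 100 to win odds


ml["ImpliedProb"] = ml["MoneyLine"].apply(implied_prob)

# Margin per site = sum of both teams' implied probabilities minus 100
margin_by_site = ml.groupby("Website")["ImpliedProb"].sum() - 100
print(margin_by_site.round(2))
print("Lowest margin:", margin_by_site.idxmin())
```

Grouping by `Website` keeps the calculation the same if more games or more sites are added to the CSV later.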

## OverUnder Odds for BetUS with Vigor

In [42]:
odds_df[['Team_Site', 'OverUnderOdds', 'OverUnder']]

Unnamed: 0,Team_Site,OverUnderOdds,OverUnder
0,North Carolina Betus,-110,Over
1,Kansas Betus,-110,Under
2,North Carolina BetOnline,-105,Over
3,Kansas BetOnline,-115,Under


On BetUs, the odds of the total going over 152 points are -110, meaning you would need to bet 110 to win 100, for a total return of 210

In [43]:
# Now Let's Look at Implied probability of over odds = Risk/Return for BetUs 
over_betus_ip = (110/(100+110))*100
print(over_betus_ip)

52.38095238095239


On BetUs, the odds of the total staying under 152 points are also -110, meaning you would need to bet 110 to win 100, for a total return of 210

In [44]:
# Now Let's Look at Implied probability of under odds = Risk/Return for BetUs 
under_betus_ip = (110/(100+110))*100
print(under_betus_ip)

52.38095238095239


In [45]:
total_ou_ip_betus = over_betus_ip + under_betus_ip
print(total_ou_ip_betus)

104.76190476190477


On BetUs, the total implied probability is 104.76, which means that the bookmaker's margin is 4.76%

##OverUnder Odds for BetOnline with Vigor

In [None]:
odds_df[['Team_Site', 'OverUnderOdds', 'OverUnder']]

Unnamed: 0,Team_Site,OverUnderOdds,OverUnder
0,North Carolina Betus,-110,Over
1,Kansas Betus,-110,Under
2,North Carolina BetOnline,-105,Over
3,Kansas BetOnline,-115,Under


On BetOnline, the odds to score over 152 pts is -105, meaning you would need to bet 105 to get 100 with a return of 205 total

In [46]:
# Now Let's Look at Implied probability of over odds = Risk/Return for BetOnline
over_beton_ip = (105/(100+105))*100
print(over_beton_ip)

51.21951219512195


On BetOnline, the odds to score under 152 pts is -115, meaning you would need to bet 115 to get 100 with a return of 215 total

In [47]:
# Now Let's Look at Implied probability of under odds = Risk/Return for BetOnline
under_beton_ip = (115/(100+115))*100
print(under_beton_ip)

53.48837209302325


In [48]:
total_ou_ip_beton = over_beton_ip + under_beton_ip
print(total_ou_ip_beton)

104.70788428814521


On BetOnline, the total implied probability is 104.71, which means that the bookmaker's margin is 4.71%

##Actual OverUnder Odds without Vigor on BetUS

In [49]:
actual_over_betus_ip = over_betus_ip/total_ou_ip_betus
print(actual_over_betus_ip)

0.5


In [50]:
actual_under_betus_ip = under_betus_ip/total_ou_ip_betus
print(actual_under_betus_ip)

0.5


In [51]:
actual_over_betus_ip + actual_under_betus_ip
#Validation

1.0

We can see that with BetUS, the actual over odds and under odds are the exact same. 

##Actual OverUnder Odds without Vigor on BetOnline

In [52]:
actual_over_beton_ip = over_beton_ip/total_ou_ip_beton
print(actual_over_beton_ip)

0.4891657638136511


In [53]:
actual_under_beton_ip = under_beton_ip/total_ou_ip_beton
print(actual_under_beton_ip)

0.5108342361863488


In [54]:
actual_over_beton_ip + actual_under_beton_ip
#Validation

1.0

We can see that with BetOnline, the over probability is lower than the under probability, so BetOnline slightly favors the game finishing under 152 total points.

##Calculating Variance of Actual OverUnder Odds across Sites

In [71]:
#Let's first calculate the difference in odds between the two sites for Over
over_diff = abs(actual_over_betus_ip -actual_over_beton_ip)
print(over_diff)

0.010834236186348878


There is a difference of 1.08 percentage points between the over odds on the two sites

In [72]:
over_var = var((actual_over_betus_ip, actual_over_beton_ip))
print(over_var)

2.934516843539787e-05


In [73]:
#Let's first calculate the difference in odds between the two sites for Under
under_diff = abs(actual_under_betus_ip -actual_under_beton_ip)
print(under_diff)

0.010834236186348822


There is a difference of 1.08 percentage points between the under odds on the two sites

In [74]:
under_var = var((actual_under_betus_ip, actual_under_beton_ip))
print(under_var)

2.934516843539757e-05


##Overall Findings within Total

When examining what betting move to make in the over/under category, BetUs is pretty much a coin flip as far as the true odds of the game going over or under 152. On BetOnline, however, I would choose to bet under 152 because the odds are a little better. Once again, the differences in odds for the over and under categories are nearly identical, as are their variances.
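For a side-by-side view of the two sites, the de-vigged over/under probabilities can be computed with a grouped normalization. This is a sketch under stated assumptions: the rows are retyped from the table above, and `ImpliedProb` and `FairProb` are illustrative column names that do not exist in the original dataframe.

```python
import pandas as pd

# Over/under rows copied from the table above
ou = pd.DataFrame({
    "Website": ["Betus", "Betus", "BetOnline", "BetOnline"],
    "OverUnder": ["Over", "Under", "Over", "Under"],
    "OverUnderOdds": [-110, -110, -105, -115],
})

# All four prices are negative, so implied probability = risk / (risk + 100)
risk = ou["OverUnderOdds"].abs()
ou["ImpliedProb"] = risk / (risk + 100)

# Divide each price by its own site's total to remove the vig
site_total = ou.groupby("Website")["ImpliedProb"].transform("sum")
ou["FairProb"] = ou["ImpliedProb"] / site_total

# One row per outcome, one column per site
fair = ou.pivot(index="OverUnder", columns="Website", values="FairProb")
print(fair.round(4))
```

Each column of `fair` sums to 1, mirroring the validation cells above.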

# Conclusions

* Variances and differences across sites are very small
* Kansas is the overall favorite across both sites
* In the point spread conversation, BetOnline favors Kansas while Betus favors UNC
* In the over/under conversation, Betus doesn't favor either side, while BetOnline favors the under.
