# Our betting day begins here
**This is the first notebook that runs when we use our model**<br>

Before we bet on games, we need to wrap up our previous day of betting. This notebook concatenates the daily model results and shows how the model is performing in live games.<br>

In [36]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
import datetime as dt
from datetime import timedelta
import time
import papermill as pm

In [37]:
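def clean_unnamed(df):
    """
    Deletes the 'Unnamed: 0' column we get after reading in our CSVs. It is most
    likely the index that to_csv writes by default. Always returns the dataframe.
    """
    if 'Unnamed: 0' in df.columns:
        return df.drop(columns = ['Unnamed: 0'])
    print("Dataframe does not have 'Unnamed: 0' column.")
    # Return the dataframe unchanged instead of None
    return df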
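
Where does 'Unnamed: 0' come from? A likely source is that `DataFrame.to_csv` writes the index as an unlabeled first column by default, and `pd.read_csv` reads that column back as 'Unnamed: 0'. The round trip below uses an in-memory buffer and made-up rows to show this, along with two standard ways to avoid it: `index=False` when writing, or `index_col=0` when reading.

```python
import io

import pandas as pd

# Toy data standing in for one of our CSVs (illustrative values only)
df = pd.DataFrame({'Away_Team': ['BAL', 'MIL'], 'Home_Team': ['BOS', 'CHC']})

# Default to_csv writes the index as an unnamed first column...
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(list(with_index.columns))  # ['Unnamed: 0', 'Away_Team', 'Home_Team']

# ...so either skip the index when writing...
buf = io.StringIO()
df.to_csv(buf, index = False)
buf.seek(0)
no_index = pd.read_csv(buf)

# ...or tell read_csv that the first column is the index
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
index_as_index = pd.read_csv(buf, index_col = 0)

print(list(no_index.columns), list(index_as_index.columns))
```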

In [38]:
today = dt.date.today()

In [39]:
today_str = str(today)

# 3-letter team codes
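The three-letter codes for each team, in the same order as the full team names in teams_list. 'ARI' is listed twice to match the two spellings of the Arizona Diamondbacks.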

In [40]:
team_codes = [
    'ARI', 'ARI', 'ATL', 'BAL', 'BOS', 'CHW', 'CHC', 'CIN', 'CLE', 'COL', 'DET', 'HOU',
    'KCR', 'LAA', 'LAD', 'MIA', 'MIL', 'MIN', 'NYY', 'NYM', 'OAK', 'PHI', 'PIT',
    'SDP', 'SFG', 'SEA', 'STL', 'TBR', 'TEX', 'TOR', 'WSN'
]

# record_bets function
This function runs if we have bets to record.<br>

Our answers to the prompts will update yesterday_df.<br>

**When entering the three-letter code for the team, please make sure it matches a code in the team_codes list in the previous cell. Otherwise, you'll get an error and will need to go back to the cell where we read in yesterday_df.**
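One way to catch a mistyped code before it reaches yesterday_df is to validate it at the prompt. The sketch below is a hypothetical helper, `ask_team_code`, that keeps asking until the entry is a known code; its `prompt` parameter stands in for `input` so it can be exercised with scripted answers.

```python
from typing import Callable, Iterable

def ask_team_code(valid_codes: Iterable[str], prompt: Callable[[str], str] = input) -> str:
    """Hypothetical helper: re-prompts until the entry is a known 3-letter code."""
    valid = set(valid_codes)
    while True:
        code = prompt("Enter the team you bet on. ").strip().upper()
        if code in valid:
            return code
        print(f"'{code}' is not in team_codes; please try again.")

# Usage with scripted answers instead of the keyboard
answers = iter(['nyy ', 'xyz', 'NYM'])
first = ask_team_code(['NYY', 'NYM'], prompt = lambda msg: next(answers))
second = ask_team_code(['NYY', 'NYM'], prompt = lambda msg: next(answers))
print(first, second)  # NYY NYM
```

In the notebook this would be called as `ask_team_code(team_codes)`. It only checks that the code exists, not that the team actually played on the betting day.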

In [41]:
def record_bets(num, yesterday_df):
    """
    Prompts for each of the `num` bets and records the amount bet and the amount
    won in yesterday_df (updated in place and also returned). A bet is only
    written after you confirm it, so a typo can simply be re-entered.
    """
    for i in range(num):
        while True:
            what_team = input("Enter the team you bet on. ").upper()
            amount = float(input("Enter the amount you bet. "))
            won = float(input("How much did you win, including the stake? "))
            correct = input("Is the information you entered correct? ").upper()
            if correct == 'Y':
                break
        if what_team in yesterday_df['Away'].values:
            mask, bet_col = yesterday_df['Away'] == what_team, 'A_Amt_Bet'
        elif what_team in yesterday_df['Home'].values:
            mask, bet_col = yesterday_df['Home'] == what_team, 'H_Amt_Bet'
        else:
            raise ValueError(f"{what_team} is not in yesterday_df; check the team code.")
        yesterday_df.loc[mask, bet_col] = amount
        yesterday_df.loc[mask, 'Money_Won'] = won
    return yesterday_df

Since we don't bet every day, we enter the number of days since our last bet and use it to derive the date of the CSV that needs to be filled in. So the "yesterday" variable isn't necessarily the previous day.
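The date arithmetic is today's date minus a `timedelta`, and the file name follows the 'Predictions_Results_YYYY-MM-DD.csv' pattern used in the next cell. The sketch below wraps that in a hypothetical helper, `results_filename`, which takes the reference date as a parameter so the result can be checked against a fixed day.

```python
import datetime as dt
from datetime import timedelta
from typing import Optional

def results_filename(days_since_last_bet: int, today: Optional[dt.date] = None) -> str:
    """Hypothetical helper: builds the daily CSV name for the last betting day."""
    if today is None:
        today = dt.date.today()
    last_bet_day = today - timedelta(days = days_since_last_bet)
    # str(date) gives ISO format, e.g. '2023-08-21'
    return 'Predictions_Results_' + str(last_bet_day) + '.csv'

print(results_filename(1, dt.date(2023, 8, 22)))  # Predictions_Results_2023-08-21.csv
print(results_filename(3, dt.date(2023, 3, 2)))   # Predictions_Results_2023-02-27.csv
```

The second call shows that `timedelta` handles month boundaries for us.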

In [42]:
bets_to_record = input("Do you have bets to record? ").upper()
if bets_to_record == 'Y':
    since_last_bet = int(input("How many days since your last bet? "))
    yesterday = dt.date.today() - timedelta(days = since_last_bet)
    yesterday_str = str(yesterday)
    yesterday_df = pd.read_csv('Predictions_Results_' + yesterday_str + '.csv')
    num_games = int(input("How many games did you bet on during your most recent betting day? "))
    record_bets(num_games, yesterday_df)
    yesterday_df = yesterday_df.rename(columns = {'Away' : 'Away_Team', 'Home': 'Home_Team'})

Do you have bets to record? N


# yesterday_df
In these next cells, we'll show yesterday_df if it exists, and then update it with the betting information we just input. If we didn't bet and yesterday_df was not created, these cells will handle the error.
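The try/except pattern works, but an explicit check can read more directly. As an alternative sketch, `'yesterday_df' in globals()` tests whether the name was ever assigned in the notebook. The hypothetical helper below, `get_if_defined`, takes the namespace as a parameter so it can be tried with a plain dict.

```python
import pandas as pd

def get_if_defined(name: str, namespace: dict):
    """Hypothetical helper: returns namespace[name] if it exists, else None."""
    if name in namespace:
        return namespace[name]
    print(f"{name} is not defined")
    return None

# In the notebook this would be get_if_defined('yesterday_df', globals())
ns = {'yesterday_df': pd.DataFrame({'Money_Won': [0.0, 19.1]})}  # made-up values
found = get_if_defined('yesterday_df', ns)
missing = get_if_defined('yesterday_df', {})
```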

In [43]:
try:
    # display() is available in Jupyter and shows the dataframe if it exists
    display(yesterday_df)
except NameError:
    print("The variable is not defined")

The variable is not defined


In [45]:
try:
    # Treat a missing bet amount as 0 so a bet on only one side still gets a profit
    yesterday_df['Profit'] = yesterday_df['Money_Won'] - (yesterday_df['A_Amt_Bet'].fillna(0) + yesterday_df['H_Amt_Bet'].fillna(0))
except NameError:
    print("The variable is not defined")

The variable is not defined


In [46]:
try:
    display(yesterday_df)
except NameError:
    print("The variable is not defined")

The variable is not defined


# Creating CSV of game scores for season
Using the Papermill library, we run the notebook that scrapes all of this season's MLB scores from <a href = 'https://www.baseball-reference.com/leagues/majors/2023-schedule.shtml'>Baseball Reference</a>. Papermill's execute_notebook function runs input_notebook ('Game_Scores.ipynb') and saves the executed copy, including its outputs, to output_notebook ('stored_scores.ipynb').<br>

Then we can read in the 2023_Game_Scores csv and use it to merge with the Predictions_Results dataframe so we can evaluate how our model did with yesterday's games (or our most recent betting day). In the Live_Data_Wrangling notebook, this data will be used to create and derive the various win variables for our model.
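The merge with the Predictions_Results rows is a left join on the two team codes and the date. The toy sketch below uses made-up rows to show how `indicator=True`, a standard `pd.merge` option, flags predictions that found no matching score (for example because of a mistyped team code). Those rows end up with a missing Home_Win.

```python
import pandas as pd

# Made-up rows for illustration only
preds = pd.DataFrame({
    'Away_Team': ['BAL', 'MIL'],
    'Home_Team': ['BOS', 'CHC'],
    'Date': pd.to_datetime(['2023-03-30', '2023-03-30']),
    'Pred': [0, 1],
})
scores = pd.DataFrame({
    'Away_Team': ['BAL'],
    'Home_Team': ['BOS'],
    'Date': pd.to_datetime(['2023-03-30']),
    'Home_Win': [0],
})

merged = pd.merge(preds, scores, on = ['Away_Team', 'Home_Team', 'Date'],
                  how = 'left', indicator = True)
# '_merge' is 'both' when a score was found and 'left_only' when it was not
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['Away_Team', 'Home_Team']])
```

Both Date columns are datetimes here; pandas raises an error when one side's merge key is a datetime and the other is a string, which is why the notebook converts both with `pd.to_datetime` first.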

In [47]:
input_notebook = 'Game_Scores.ipynb'

In [48]:
output_notebook = 'stored_scores.ipynb'

In [49]:
pm.execute_notebook(input_notebook, output_notebook)


In [50]:
scores_df = pd.read_csv('2023_Game_Scores.csv')

In [51]:
scores_df = clean_unnamed(scores_df)

In [52]:
scores_df.head()

| | Away_Team | Away_Score | Home_Team | Home_Score | Date |
|---|---|---|---|---|---|
| 0 | Baltimore Orioles | 10 | Boston Red Sox | 9 | 2023-03-30 |
| 1 | Milwaukee Brewers | 0 | Chicago Cubs | 4 | 2023-03-30 |
| 2 | Pittsburgh Pirates | 5 | Cincinnati Reds | 4 | 2023-03-30 |
| 3 | Chicago White Sox | 3 | Houston Astros | 2 | 2023-03-30 |
| 4 | Minnesota Twins | 2 | Kansas City Royals | 0 | 2023-03-30 |


In [53]:
scores_df.tail()

| | Away_Team | Away_Score | Home_Team | Home_Score | Date |
|---|---|---|---|---|---|
| 1871 | Boston Red Sox | 4 | Houston Astros | 9 | 2023-08-21 |
| 1872 | Kansas City Royals | 4 | Oakland Athletics | 6 | 2023-08-21 |
| 1873 | San Francisco Giants | 4 | Philadelphia Phillies | 10 | 2023-08-21 |
| 1874 | St. Louis Cardinals | 1 | Pittsburgh Pirates | 11 | 2023-08-21 |
| 1875 | Miami Marlins | 2 | San Diego Padres | 6 | 2023-08-21 |


In [54]:
scores_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1876 entries, 0 to 1875
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Away_Team   1876 non-null   object
 1   Away_Score  1876 non-null   int64 
 2   Home_Team   1876 non-null   object
 3   Home_Score  1876 non-null   int64 
 4   Date        1876 non-null   object
dtypes: int64(2), object(3)
memory usage: 73.4+ KB


In [55]:
scores_df['Date'] = pd.to_datetime(scores_df['Date'])

try:
    # Both Date columns must be datetimes for the merge later on
    yesterday_df['Date'] = pd.to_datetime(yesterday_df['Date'])
except NameError:
    print("The variable is not defined")

The variable is not defined


Columns indicating whether the home or road team won each game
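An equivalent way to build the two flags is to cast the boolean comparison straight to integers. The sketch below uses made-up scores (matching the first three rows shown above) and adds a check that each game has exactly one winner, which would catch a tied or malformed score row.

```python
import pandas as pd

# Made-up scores for illustration
scores_df = pd.DataFrame({'Away_Score': [10, 0, 5], 'Home_Score': [9, 4, 4]})

# Same result as np.where(condition, 1, 0): cast the boolean comparison to int
scores_df['Home_Win'] = (scores_df['Home_Score'] > scores_df['Away_Score']).astype(int)
scores_df['Away_Win'] = (scores_df['Away_Score'] > scores_df['Home_Score']).astype(int)

# Every game should have exactly one winner; a tie or bad row would fail this
assert (scores_df['Home_Win'] + scores_df['Away_Win'] == 1).all()
print(scores_df)
```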

In [56]:
scores_df['Home_Win'] = np.where(scores_df['Home_Score'] > scores_df['Away_Score'], 1, 0)
scores_df['Away_Win'] = np.where(scores_df['Away_Score'] > scores_df['Home_Score'], 1, 0)

In [57]:
scores_df.head()

| | Away_Team | Away_Score | Home_Team | Home_Score | Date | Home_Win | Away_Win |
|---|---|---|---|---|---|---|---|
| 0 | Baltimore Orioles | 10 | Boston Red Sox | 9 | 2023-03-30 | 0 | 1 |
| 1 | Milwaukee Brewers | 0 | Chicago Cubs | 4 | 2023-03-30 | 1 | 0 |
| 2 | Pittsburgh Pirates | 5 | Cincinnati Reds | 4 | 2023-03-30 | 0 | 1 |
| 3 | Chicago White Sox | 3 | Houston Astros | 2 | 2023-03-30 | 0 | 1 |
| 4 | Minnesota Twins | 2 | Kansas City Royals | 0 | 2023-03-30 | 0 | 1 |


In [58]:
scores_df.tail()

| | Away_Team | Away_Score | Home_Team | Home_Score | Date | Home_Win | Away_Win |
|---|---|---|---|---|---|---|---|
| 1871 | Boston Red Sox | 4 | Houston Astros | 9 | 2023-08-21 | 1 | 0 |
| 1872 | Kansas City Royals | 4 | Oakland Athletics | 6 | 2023-08-21 | 1 | 0 |
| 1873 | San Francisco Giants | 4 | Philadelphia Phillies | 10 | 2023-08-21 | 1 | 0 |
| 1874 | St. Louis Cardinals | 1 | Pittsburgh Pirates | 11 | 2023-08-21 | 1 | 0 |
| 1875 | Miami Marlins | 2 | San Diego Padres | 6 | 2023-08-21 | 1 | 0 |


In [59]:
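# Inspect the team names in the scraped scores; the ordered list in the
# next cell must include every one of them
scores_df['Away_Team'].unique()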

In [60]:
teams_list = [
    'Arizona Diamondbacks',
    "Arizona D'Backs",
    'Atlanta Braves',
    'Baltimore Orioles',
    'Boston Red Sox',
    'Chicago White Sox',
    'Chicago Cubs',
    'Cincinnati Reds',
    'Cleveland Guardians',
    'Colorado Rockies',
    'Detroit Tigers',
    'Houston Astros',
    'Kansas City Royals',
    'Los Angeles Angels',
    'Los Angeles Dodgers',
    'Miami Marlins',
    'Milwaukee Brewers',
    'Minnesota Twins',
    'New York Yankees',
    'New York Mets',
    'Oakland Athletics',
    'Philadelphia Phillies',
    'Pittsburgh Pirates',
    'San Diego Padres',
    'San Francisco Giants',
    'Seattle Mariners',
    'St. Louis Cardinals',
    'Tampa Bay Rays',
    'Texas Rangers',
    'Toronto Blue Jays',
    'Washington Nationals'
]

# Converting team names to 3-letter codes
This is where we convert each full team name to its 3-letter code. The Diamondbacks are sometimes listed as "Arizona D'Backs", so that team needs two entries (which is why 'ARI' appears twice in team_codes).
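`Series.replace` leaves any name that isn't in team_dict unchanged, so a new spelling would pass through silently. As an alternative sketch, `Series.map` returns NaN for unmapped names, which makes them easy to catch. The short lists below are a subset for illustration, and 'Arizona Dbacks' is a made-up spelling.

```python
import pandas as pd

teams_list = ['Arizona Diamondbacks', "Arizona D'Backs", 'Atlanta Braves']
team_codes = ['ARI', 'ARI', 'ATL']
team_dict = dict(zip(teams_list, team_codes))

names = pd.Series(['Atlanta Braves', "Arizona D'Backs", 'Arizona Dbacks'])
codes = names.map(team_dict)

# Names with no entry in team_dict come back as NaN
unmapped = names[codes.isna()].unique()
print(list(unmapped))  # ['Arizona Dbacks']
```

Note that `zip` stops at the shorter list, so teams_list and team_codes must stay the same length and in the same order.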

In [61]:
team_dict = dict(zip(teams_list, team_codes))

In [62]:
team_dict

{'Arizona Diamondbacks': 'ARI',
 "Arizona D'Backs": 'ARI',
 'Atlanta Braves': 'ATL',
 'Baltimore Orioles': 'BAL',
 'Boston Red Sox': 'BOS',
 'Chicago White Sox': 'CHW',
 'Chicago Cubs': 'CHC',
 'Cincinnati Reds': 'CIN',
 'Cleveland Guardians': 'CLE',
 'Colorado Rockies': 'COL',
 'Detroit Tigers': 'DET',
 'Houston Astros': 'HOU',
 'Kansas City Royals': 'KCR',
 'Los Angeles Angels': 'LAA',
 'Los Angeles Dodgers': 'LAD',
 'Miami Marlins': 'MIA',
 'Milwaukee Brewers': 'MIL',
 'Minnesota Twins': 'MIN',
 'New York Yankees': 'NYY',
 'New York Mets': 'NYM',
 'Oakland Athletics': 'OAK',
 'Philadelphia Phillies': 'PHI',
 'Pittsburgh Pirates': 'PIT',
 'San Diego Padres': 'SDP',
 'San Francisco Giants': 'SFG',
 'Seattle Mariners': 'SEA',
 'St. Louis Cardinals': 'STL',
 'Tampa Bay Rays': 'TBR',
 'Texas Rangers': 'TEX',
 'Toronto Blue Jays': 'TOR',
 'Washington Nationals': 'WSN'}

In [63]:
scores_df['Home_Team'] = scores_df['Home_Team'].replace(team_dict)

In [64]:
scores_df['Away_Team'] = scores_df['Away_Team'].replace(team_dict)

In [65]:
filepath = r'C:\Users\Owner\Sports Betting\MLB_Game_Outcome\2023_Game_Scores.csv'
scores_df.to_csv(filepath)

# All game scores stored
We've just stored all of the season's game scores, including yesterday's, in a CSV. We need that to derive the win variables for our model data. Now we're going to slice yesterday's scores so that we can fill in the results and track our model's accuracy.<br>

If the model made no predictions on our most recent betting day, there are no results to update, so we raise a custom exception to stop the notebook here.
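Raising an exception works as a stop sign because Jupyter's "Run All" halts at the first cell that raises. The minimal sketch below wraps the check in a hypothetical function, `stop_unless_yes`, and shows that the exception can also be caught, for example by code that runs these steps outside the notebook.

```python
class NotebookStopException(Exception):
    """Raised to halt the notebook when there is nothing left to update."""
    pass

def stop_unless_yes(answer: str) -> None:
    """Hypothetical helper: raises NotebookStopException unless the answer is 'Y'."""
    if answer.strip().upper() != 'Y':
        raise NotebookStopException("No predictions made yesterday.")

stop_unless_yes('y')  # passes silently

try:
    stop_unless_yes('N')
    stopped = False
except NotebookStopException as e:
    stopped = True
    print(f"Stopped: {e}")  # Stopped: No predictions made yesterday.
```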

In [66]:
class NotebookStopException(Exception):
    pass

In [67]:
pred_yesterday = input("Did the model make predictions on yesterday's games? ").upper()

Did the model make predictions on yesterday's games? N


In [68]:
if pred_yesterday != 'Y':
    print("No predictions made yesterday. Stopping notebook.")
    raise NotebookStopException()

No predictions made yesterday. Stopping notebook.


NotebookStopException: 

In [None]:
scores_yesterday = scores_df[scores_df['Date'] == yesterday_str]

In [None]:
scores_yesterday

# What about doubleheaders?
As we'll see in the Live_Data_Wrangling notebook, we scrape RotoGrinders to identify our starting pitchers each day. Most of the time when we have doubleheaders, we'll only get data on one of the games from RotoGrinders, as that's a daily fantasy sports site and most of the time only one game can be played in DFS when there's a doubleheader. So we'll include the names of the starting pitchers in the index, and when there's a doubleheader we'll just have to look up which of the games our model predicted based on the starting pitchers.<br>

This code block checks for doubleheaders in yesterday's scores. If there are doubleheaders, we'll get a prompt asking us which row(s) to drop.
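Counting Away_Team values flags any team listed twice as the away side. A slightly more targeted sketch marks every row whose away/home pair appears more than once using `DataFrame.duplicated(keep=False)`, so all candidate rows can be seen at once before deciding which to drop. The rows below are made up.

```python
import pandas as pd

# Made-up games for one day; NYM at ATL is a doubleheader
scores_yesterday = pd.DataFrame({
    'Away_Team': ['NYM', 'NYM', 'BAL'],
    'Home_Team': ['ATL', 'ATL', 'BOS'],
    'Home_Win': [1, 0, 0],
})

# keep=False marks every copy of a repeated matchup, not just the later ones
is_dh = scores_yesterday.duplicated(subset = ['Away_Team', 'Home_Team'], keep = False)
doubleheaders = scores_yesterday[is_dh]
print(doubleheaders)
```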

In [None]:
# A team listed twice as the away team played a doubleheader
if scores_yesterday['Away_Team'].value_counts().max() > 1:
    print("There was at least one doubleheader yesterday.")
    num_dh = int(input("How many doubleheaders were played yesterday? "))
    # Renumber from 0 and show the rows so the index to drop is visible
    scores_yesterday_copy = scores_yesterday.reset_index(drop = True)
    print(scores_yesterday_copy)
    for i in range(num_dh):
        # Drop the game the model did not predict (match on starting pitchers)
        idx_to_drop = int(input("Which index should be dropped from the scores_yesterday dataframe? "))
        scores_yesterday_copy = scores_yesterday_copy.drop(idx_to_drop).reset_index(drop = True)
        print(scores_yesterday_copy)
    scores_yesterday = scores_yesterday_copy

In [None]:
scores_yesterday = scores_yesterday[['Away_Team', 'Home_Team', 'Date', 'Home_Win']]

In [None]:
yesterday_df.info()

In [None]:
yesterday_df = pd.merge(yesterday_df, scores_yesterday, on = ['Away_Team', 'Home_Team', 'Date'], how = 'left')

In [None]:
yesterday_df['Result'] = np.where(yesterday_df['Pred'] == yesterday_df['Home_Win'], 1, 0)

In [None]:
yesterday_df = yesterday_df.rename(columns = {'Result' : 'Model_Acc'})

In [None]:
yesterday_df

# Re-ordering columns
Now that yesterday_df is filled in, we'll re-order its columns, read in the CSV with all the previous model and betting results, concatenate the two, and save the result back to that CSV.
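The append step can be sketched end to end on toy data. We select columns in a fixed order, stack the new rows under the running scoresheet with `ignore_index=True` so the row labels stay unique, and write with `index=False` so no 'Unnamed: 0' column appears when the file is read back. The column subset and the Pred/Model_Acc values are illustrative, and an in-memory buffer stands in for Predictions_Results.csv.

```python
import io

import pandas as pd

new_order = ['Date', 'Away_Team', 'Home_Team', 'Pred', 'Home_Win', 'Model_Acc']

# Illustrative stand-ins for the running scoresheet and yesterday's rows
scoresheet = pd.DataFrame([['2023-03-30', 'BAL', 'BOS', 1, 0, 0]], columns = new_order)
yesterday_df = pd.DataFrame({
    'Model_Acc': [1], 'Home_Win': [1], 'Pred': [1],
    'Home_Team': ['HOU'], 'Away_Team': ['BOS'], 'Date': ['2023-08-21'],
})

# Re-order, then append with a fresh 0..n-1 index
scoresheet = pd.concat([scoresheet, yesterday_df[new_order]], ignore_index = True)

buf = io.StringIO()
scoresheet.to_csv(buf, index = False)
buf.seek(0)
reloaded = pd.read_csv(buf)
print(list(reloaded.columns) == new_order)  # True
```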

In [None]:
new_order = ['Date', 'Away_Team', 'Home_Team', 'Away_Prob', 'Home_Prob', 'Away_ML', 'Home_ML', 'A_Amt_Bet', 'H_Amt_Bet', 'Money_Won', 'Profit', 'Pred', 'Home_Win', 'Model_Acc']

In [None]:
yesterday_df = yesterday_df[new_order]

In [None]:
yesterday_df

In [None]:
scoresheet = pd.read_csv('Predictions_Results.csv')

In [None]:
scoresheet = clean_unnamed(scoresheet)

In [None]:
scoresheet = pd.concat([scoresheet, yesterday_df], ignore_index = True)

In [None]:
scoresheet.info()

In [None]:
# Drop games with no matching score (Home_Win is NaN after the left merge)
scoresheet = scoresheet.dropna(subset = ['Home_Win'])

In [None]:
filepath = r'C:\Users\Owner\Sports Betting\MLB_Game_Outcome\Predictions_Results.csv'
scoresheet.to_csv(filepath)

In [None]:
total_bet = scoresheet['A_Amt_Bet'].sum() + scoresheet['H_Amt_Bet'].sum()
total_profit = np.round(scoresheet['Profit'].sum(), 2)
games_pred = len(scoresheet)
acc = scoresheet['Model_Acc'].sum()
# Keep full precision here; rounding to 2 decimals before multiplying by 100
# would throw away everything after the whole percent
pct_profit = total_profit / total_bet
live_accuracy = acc / games_pred

In [None]:
print(f"You have bet a total of ${scoresheet['A_Amt_Bet'].sum() + scoresheet['H_Amt_Bet'].sum():.2f} using this model")
print(f"You have won a total of ${scoresheet['Money_Won'].sum():.2f} using this model")
print(f"Your total profit is ${total_profit:.2f}")
print(f"Your profit percentage is {pct_profit*100:.2f} percent")
print(f"Your model has made {games_pred} predictions on live games.")
print(f"Your model has made {acc} correct predictions on live games.")
print(f"The model's accuracy on live games is {live_accuracy*100:.2f} percent")

To keep the folder tidy, we delete the individual day's Predictions_Results CSV, since its rows are now stored in Predictions_Results.csv.
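An equivalent sketch uses `pathlib`. `Path.unlink(missing_ok=True)` (Python 3.8+) deletes the file and does nothing if it is already gone, which replaces the FileNotFoundError branch. The temporary directory and file contents are for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / 'Predictions_Results_2023-08-21.csv'
    csv_path.write_text('Away_Team,Home_Team\nBOS,HOU\n')

    existed_before = csv_path.exists()
    csv_path.unlink(missing_ok = True)   # deletes the file
    csv_path.unlink(missing_ok = True)   # second call is a no-op, no FileNotFoundError
    exists_after = csv_path.exists()

print(existed_before, exists_after)  # True False
```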

In [None]:
import os

csv_file_path = 'Predictions_Results_' + yesterday_str + '.csv'

try:
    os.remove(csv_file_path)
    print(f"CSV file '{csv_file_path}' deleted successfully.")
except FileNotFoundError:
    print(f"CSV file '{csv_file_path}' not found.")
except Exception as e:
    print(f"An error occurred: {e}")