Tunetyme patch 1 #369

Open · wants to merge 7 commits into master
1 change: 1 addition & 0 deletions Models/XGBoost_Models/XGBoost_69.4%_ML-3.json

Large diffs are not rendered by default.

93 changes: 93 additions & 0 deletions Program_Overview
@@ -0,0 +1,93 @@
This overview gives a brief description of each file and how the pieces interconnect to form the complete program.

### 1. Primary Scripts:

- **main.py**:
  - **Description**: The primary entry point for the application. It parses command-line arguments; initiates data processing, model training, and predictions; and outputs the predictions along with the expected value for betting.
- **Connection**: Calls functions or scripts from the `src` directory to fetch data, process data, and run models.
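
A minimal sketch of what such an entry point might look like. The flag names here are assumptions for illustration only, not the repository's actual CLI:

```python
# Hypothetical sketch of the main.py flow; the real flag names and runner
# interfaces in the repository may differ.
import argparse

def main():
    parser = argparse.ArgumentParser(description="NBA betting predictions")
    parser.add_argument("-xgb", action="store_true", help="run the XGBoost models")
    parser.add_argument("-nn", action="store_true", help="run the neural network models")
    parser.add_argument("-kc", action="store_true", help="also print Kelly Criterion bet sizing")
    args = parser.parse_args()

    # In the real program these steps are delegated to the src/ modules:
    # fetch today's games and odds, build the feature frame, then hand it to
    # the selected runner(s), which print predictions and expected value.
    if args.xgb:
        print("Would run XGBoost_Runner on today's feature frame")
    if args.nn:
        print("Would run NN_Runner on today's feature frame")

if __name__ == "__main__":
    main()
```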

### 2. Data Processing (`src/Process-Data` directory):

- **Get_Data.py**:
- **Description**: Fetches game data, including teams, scores, and dates.
- **Connection**: Outputs data that can be used by other scripts for feature creation and model training.

- **Get_Odds_Data.py**:
- **Description**: Fetches betting odds data.
- **Connection**: Outputs odds data that can be used alongside game data to create a holistic dataset for training and predictions.

- **Add_Days_Rest.py**:
- **Description**: Adds a feature indicating the number of rest days for teams between games.
- **Connection**: Modifies the dataset fetched by `Get_Data.py` to include the days of rest feature.

- **Create_Games.py**:
- **Description**: Processes the raw data to generate game-related features and prepares the dataset for training and predictions.
- **Connection**: Uses data fetched by `Get_Data.py` and `Get_Odds_Data.py`, and outputs a dataset ready for model training.
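
A toy illustration of the kind of join `Create_Games.py` performs between game records and odds records; the column names and values below are simplified assumptions, not the actual schema:

```python
# Toy example: one row per game, combining box-score and betting-market data.
import pandas as pd

games = pd.DataFrame({
    "Date": ["2023-10-24", "2023-10-24"],
    "Home": ["DEN", "GSW"],
    "Away": ["LAL", "PHX"],
    "Home_PTS": [119, 104],
    "Away_PTS": [107, 108],
})
odds = pd.DataFrame({
    "Date": ["2023-10-24", "2023-10-24"],
    "Home": ["DEN", "GSW"],
    "Away": ["LAL", "PHX"],
    "OU": [229.5, 232.5],
    "Home_ML": [-190, -130],
})

# Merge on date and team names so each game carries its market features.
dataset = games.merge(odds, on=["Date", "Home", "Away"])
print(dataset)
```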

### 3. Model Training (`src/Train-Models` directory):

- **XGBoost_Model_ML.py** and **XGBoost_Model_UO.py**:
  - **Description**: Train XGBoost models for predicting game outcomes and under/over totals, respectively.
- **Connection**: Uses the dataset generated by `Create_Games.py` and outputs trained models.

- **NN_Model_ML.py** and **NN_Model_UO.py**:
  - **Description**: Train neural network models for predicting game outcomes and under/over totals, respectively.
- **Connection**: Uses the dataset generated by `Create_Games.py` and outputs trained models.

- **Logistic_Regression_ML.py** and **Logistic_Regression_UO.py**:
  - **Description**: Train logistic regression models for predicting game outcomes and under/over totals, respectively.
- **Connection**: Uses the dataset generated by `Create_Games.py` and outputs trained models.
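
These training scripts share the same basic pattern: read the prepared dataset, split features from labels, fit a model, and report accuracy. A minimal sketch of that pattern using logistic regression; the table name and dropped columns are assumptions and may not match the repository's exact schema:

```python
# Sketch of the shared training pattern, not the repository's exact code.
import sqlite3
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

con = sqlite3.connect("Data/dataset.sqlite")
data = pd.read_sql_query('select * from "dataset_2012-24"', con, index_col="index")
con.close()

y = data["Home-Team-Win"]  # label: did the home team win?
# Drop the label, leakage-prone outcome columns, and non-numeric columns.
X = data.drop(columns=["Home-Team-Win", "Score", "OU-Cover",
                       "TEAM_NAME", "TEAM_NAME.1", "Date", "Date.1"],
              errors="ignore")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```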

### 4. Utility Scripts (`src` directory):

- **Expected_Value.py**:
- **Description**: Calculates the expected value of bets based on predictions and odds.
  - **Connection**: Used by `main.py` to provide insight into the potential value of bets (see the sketch after this list).

- **Kelly_Criterion.py**:
- **Description**: Calculates the recommended fraction of the bankroll to bet based on the model's edge.
- **Connection**: Used by `main.py` to provide betting recommendations.

- **tools.py**:
- **Description**: Contains utility functions, like fetching current date games and formatting outputs.
- **Connection**: Used by multiple scripts for various utility purposes.

- **SbrOddsProvider.py**:
- **Description**: Handles fetching of odds data from external sources.
- **Connection**: Used by `Get_Odds_Data.py` to fetch betting odds.
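
As a concrete illustration of what `Expected_Value.py` computes, here is a minimal sketch of expected profit per $100 staked, given a model win probability and American odds; the real function names and payout convention in the repository may differ:

```python
# Hedged sketch of an expected-value calculation for a $100 stake.
def payout_per_100(american_odds: float) -> float:
    """Profit on a winning $100 stake at the given American odds."""
    if american_odds >= 100:
        return american_odds
    return 100 * (100 / abs(american_odds))

def expected_value(model_prob: float, american_odds: float) -> float:
    """Expected profit per $100 staked."""
    win_profit = payout_per_100(american_odds)
    return round(model_prob * win_profit - (1 - model_prob) * 100, 2)

# Example: the model gives the home team a 58% chance at -110 odds.
print(expected_value(0.58, -110))  # ≈ 10.73
```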

### 5. Model Runners (`src` directory):

- **NN_Runner.py** and **XGBoost_Runner.py**:
  - **Description**: Contain methods to run predictions using the respective models.
- **Connection**: Called by `main.py` to get model predictions based on user input.

### 6. Model Files:

- **XGBoost_68.9%_ML-3.json** and **XGBoost_54.8%_UO-8.json**:
- **Description**: These are saved trained XGBoost models.
- **Connection**: These models can be loaded by the runners to make predictions without retraining.
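
A minimal sketch of how a runner can load one of these saved models and score a feature row; the random features below are a placeholder for the real, much wider feature frame:

```python
# Sketch of loading a saved XGBoost model for inference.
import numpy as np
import xgboost as xgb

booster = xgb.Booster()
# Path should point at whichever saved model is current (69.4% in this PR).
booster.load_model("Models/XGBoost_Models/XGBoost_69.4%_ML-3.json")

features = np.random.rand(1, booster.num_features())  # placeholder feature row
prediction = booster.predict(xgb.DMatrix(features))
print(prediction)  # class probabilities; exact shape depends on how the model was trained
```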

### 7. Neural Network Models:

- **keras_metadata.pb**, **saved_model.pb**, **variables.data-00000-of-00001**, and **variables.index**:
- **Description**: These are components of saved neural network models trained using TensorFlow/Keras.
- **Connection**: These can be loaded by the runners to make predictions using neural network models.
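
These files together form a TensorFlow SavedModel directory, which a runner can reload without retraining. A minimal sketch, with a placeholder directory name rather than the repository's actual path:

```python
# Sketch of reloading a saved Keras model; the directory name is a placeholder.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("Models/NN_Models/example_saved_model")
dummy_row = np.random.rand(1, model.input_shape[-1])  # placeholder feature row
print(model.predict(dummy_row))
```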

### 8. Databases:

- **dataset.sqlite** and **odds.sqlite**:
- **Description**: Databases containing game data and betting odds respectively.
- **Connection**: Used by data processing scripts to fetch required data.
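
A quick sketch of inspecting one of these SQLite files; the per-season table naming follows the `odds_<season>` pattern used in the diffs below:

```python
# List the tables stored in the odds database (one per season, e.g. odds_2022-23).
import sqlite3

con = sqlite3.connect("Data/odds.sqlite")
tables = con.execute("select name from sqlite_master where type='table'").fetchall()
con.close()
print(tables)
```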

### 9. Miscellaneous:

- **requirements.txt**:
- **Description**: Lists the Python packages required to run the application.
- **Connection**: Helps users set up the necessary environment to run the program.

### Holistic Overview:

The program begins with `main.py`, which calls the various utilities and scripts based on user input. Data is fetched and processed in the `Process-Data` directory, then used to train models in the `Train-Models` directory. Once the models are trained, predictions can be made and presented to the user, along with betting insights computed by the utility scripts.

Together, these components cover the full pipeline from raw data to predictions and betting recommendations.
2 changes: 1 addition & 1 deletion src/Predict/XGBoost_Runner.py
@@ -12,7 +12,7 @@
# from src.Utils.tools import get_json_data, to_data_frame, get_todays_games_json, create_todays_games
init()
xgb_ml = xgb.Booster()
xgb_ml.load_model('Models/XGBoost_Models/XGBoost_68.9%_ML-3.json')
xgb_ml.load_model('Models/XGBoost_Models/XGBoost_69.4%_ML-3.json')
xgb_uo = xgb.Booster()
xgb_uo.load_model('Models/XGBoost_Models/XGBoost_54.8%_UO-8.json')

16 changes: 11 additions & 5 deletions src/Process-Data/Create_Games.py
@@ -7,12 +7,12 @@
import pandas as pd
from tqdm import tqdm

sys.path.insert(1, os.path.join(sys.path[0], '..'))
sys.path.insert(1, os.path.join(sys.path[0], '../..'))
from src.Utils.Dictionaries import team_index_07, team_index_08, team_index_12, team_index_13, team_index_14, team_index_current

# season_array = ["2007-08", "2008-09", "2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16",
# "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]
season_array = ["2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]
season_array = ["2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23","2023-24"]

df = pd.DataFrame
scores = []
@@ -85,7 +85,7 @@
elif season == '2013-14':
home_team_series = team_df.iloc[team_index_13.get(home_team)]
away_team_series = team_df.iloc[team_index_13.get(away_team)]
elif season == '2022-23':
elif season == '2022-23' or season == '2023-24':
home_team_series = team_df.iloc[team_index_current.get(home_team)]
away_team_series = team_df.iloc[team_index_current.get(away_team)]
else:
@@ -110,11 +110,17 @@
frame['OU-Cover'] = np.asarray(OU_Cover)
frame['Days-Rest-Home'] = np.asarray(days_rest_home)
frame['Days-Rest-Away'] = np.asarray(days_rest_away)

# Calculate the Simple Moving Average (SMA) for Points_Scored
window_size = 15 # or any desired window size
frame['SMA_Points_Scored_Home'] = frame.groupby('TEAM_NAME')['PTS'].rolling(window=window_size).mean().reset_index(0, drop=True)
frame['SMA_Points_Scored_Away'] = frame.groupby('TEAM_NAME.1')['PTS.1'].rolling(window=window_size).mean().reset_index(0, drop=True)

# fix types
for field in frame.columns.values:
if 'TEAM_' in field or 'Date' in field or field not in frame:
continue
frame[field] = frame[field].astype(float)
con = sqlite3.connect("../../Data/dataset.sqlite")
frame.to_sql("dataset_2012-23", con, if_exists="replace")
con.close()
frame.to_sql("dataset_2012-24", con, if_exists="replace")
con.close()
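
The new simple-moving-average feature above groups rows by team and takes a rolling mean of points scored. A toy demonstration of that groupby/rolling pattern, with a small window and made-up data for illustration only:

```python
# Toy demonstration of the groupby + rolling-mean pattern used above.
import pandas as pd

df = pd.DataFrame({
    "TEAM_NAME": ["DEN", "DEN", "DEN", "DEN", "LAL", "LAL", "LAL", "LAL"],
    "PTS":       [110,   120,   100,   130,   95,    105,   115,   90],
})

df["SMA_PTS"] = (
    df.groupby("TEAM_NAME")["PTS"]
      .rolling(window=3)
      .mean()
      .reset_index(0, drop=True)  # drop the group level so the index lines up with df
)
print(df)
# The first window-1 rows of each team are NaN; DEN's third row is (110+120+100)/3 = 110.0
```
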
152 changes: 101 additions & 51 deletions src/Process-Data/Get_Data.py
@@ -1,70 +1,120 @@
import os
import random
import sqlite3
import sys
import time
from datetime import date, datetime, timedelta
from datetime import datetime

import numpy as np
import pandas as pd
from tqdm import tqdm

from src.Utils.tools import get_json_data, to_data_frame

sys.path.insert(1, os.path.join(sys.path[0], '..'))
from src.Utils.Dictionaries import team_index_07, team_index_08, team_index_12, team_index_13, team_index_14, team_index_current

url = 'https://stats.nba.com/stats/' \
'leaguedashteamstats?Conference=&' \
'DateFrom=10%2F01%2F{2}&DateTo={0}%2F{1}%2F{3}' \
'&Division=&GameScope=&GameSegment=&LastNGames=0&' \
'LeagueID=00&Location=&MeasureType=Base&Month=0&' \
'OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&' \
'PerMode=PerGame&Period=0&PlayerExperience=&' \
'PlayerPosition=&PlusMinus=N&Rank=N&' \
'Season={4}' \
'&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&' \
'StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision='
# season_array = ["2007-08", "2008-09", "2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16",
# "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]
season_array = ["2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]

# year = [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
year = [2022, 2023]
season = ["2022-23"]
# season = ["2007-08", "2008-09", "2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17",
# "2017-18", "2018-19", "2019-20", "2020-2021", "2021-2022"]
df = pd.DataFrame
scores = []
win_margin = []
OU = []
OU_Cover = []
games = []
days_rest_away = []
days_rest_home = []
teams_con = sqlite3.connect("../../Data/teams.sqlite")
odds_con = sqlite3.connect("../../Data/odds.sqlite")

month = [10, 11, 12, 1, 2, 3, 4, 5, 6]
days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31]
for season in tqdm(season_array):
odds_df = pd.read_sql_query(f"select * from \"odds_{season}\"", odds_con, index_col="index")
team_table_str = "teams_{}-{}-" + season
year_count = 0

begin_year_pointer = year[0]
end_year_pointer = year[0]
count = 0
for row in odds_df.itertuples():
home_team = row[3]
away_team = row[4]

con = sqlite3.connect("../../Data/teams.sqlite")
date = row[2]
date_array = date.split('-')
if not date_array or len(date_array) < 2:
continue
year = date_array[0] + '-' + date_array[1]
month = date_array[2][:2]
day = date_array[2][2:]

for season1 in tqdm(season):
for month1 in tqdm(month):
if month1 == 1:
count += 1
end_year_pointer = year[count]
for day1 in tqdm(days):
if month1 == 10 and day1 < 19:
continue
if month1 in [4, 6, 9, 11] and day1 > 30:
if month[0] == '0':
month = month[1:]
if day[0] == '0':
day = day[1:]
if int(month) == 1:
year_count = 1
end_year_pointer = int(date_array[0]) + year_count
if end_year_pointer == datetime.now().year:
if int(month) == datetime.now().month and int(day) >= datetime.now().day:
continue
if month1 == 2 and day1 > 28:
if int(month) > datetime.now().month:
continue
if end_year_pointer == datetime.now().year:
if month1 == datetime.now().month and day1 > datetime.now().day:
continue
if month1 > datetime.now().month:
continue
general_data = get_json_data(url.format(month1, day1, begin_year_pointer, end_year_pointer, season1))
general_df = to_data_frame(general_data)
real_date = date(year=end_year_pointer, month=month1, day=day1) + timedelta(days=1)
general_df['Date'] = str(real_date)

x = str(real_date).split('-')
general_df.to_sql(f"teams_{season1}-{str(int(x[1]))}-{str(int(x[2]))}", con, if_exists="replace")
team_df = pd.read_sql_query(f"select * from \"teams_{year}-{month}-{day}\"", teams_con, index_col="index")
if len(team_df.index) == 30:
scores.append(row[9])
OU.append(row[5])
days_rest_home.append(row[11])
days_rest_away.append(row[12])
if row[10] > 0:
win_margin.append(1)
else:
win_margin.append(0)

time.sleep(random.randint(1, 3))
begin_year_pointer = year[count]
if row[9] < row[5]:
OU_Cover.append(0)
elif row[9] > row[5]:
OU_Cover.append(1)
elif row[9] == row[5]:
OU_Cover.append(2)

if season == '2007-08':
home_team_series = team_df.iloc[team_index_07.get(home_team)]
away_team_series = team_df.iloc[team_index_07.get(away_team)]
elif season == '2008-09' or season == "2009-10" or season == "2010-11" or season == "2011-12":
home_team_series = team_df.iloc[team_index_08.get(home_team)]
away_team_series = team_df.iloc[team_index_08.get(away_team)]
elif season == "2012-13":
home_team_series = team_df.iloc[team_index_12.get(home_team)]
away_team_series = team_df.iloc[team_index_12.get(away_team)]
elif season == '2013-14':
home_team_series = team_df.iloc[team_index_13.get(home_team)]
away_team_series = team_df.iloc[team_index_13.get(away_team)]
elif season == '2022-23':
home_team_series = team_df.iloc[team_index_current.get(home_team)]
away_team_series = team_df.iloc[team_index_current.get(away_team)]
else:
try:
home_team_series = team_df.iloc[team_index_14.get(home_team)]
away_team_series = team_df.iloc[team_index_14.get(away_team)]
except Exception as e:
print(home_team)
raise e
game = pd.concat([home_team_series, away_team_series.rename(
index={col:f"{col}.1" for col in team_df.columns.values}
)])
games.append(game)
odds_con.close()
teams_con.close()
season = pd.concat(games, ignore_index=True, axis=1)
season = season.T
frame = season.drop(columns=['TEAM_ID', 'CFID', 'CFPARAMS', 'Unnamed: 0', 'Unnamed: 0.1', 'CFPARAMS.1', 'TEAM_ID.1', 'CFID.1'])
frame['Score'] = np.asarray(scores)
frame['Home-Team-Win'] = np.asarray(win_margin)
frame['OU'] = np.asarray(OU)
frame['OU-Cover'] = np.asarray(OU_Cover)
frame['Days-Rest-Home'] = np.asarray(days_rest_home)
frame['Days-Rest-Away'] = np.asarray(days_rest_away)
# fix types
for field in frame.columns.values:
if 'TEAM_' in field or 'Date' in field or field not in frame:
continue
frame[field] = frame[field].astype(float)
con = sqlite3.connect("../../Data/dataset.sqlite")
frame.to_sql("dataset_2012-23", con, if_exists="replace")
con.close()
9 changes: 6 additions & 3 deletions src/Utils/Kelly_Criterion.py
@@ -2,16 +2,19 @@ def american_to_decimal(american_odds):
"""
Converts American odds to decimal odds (European odds).
"""
if american_odds is None:
raise ValueError("American odds value cannot be None.")
if american_odds >= 100:
decimal_odds = (american_odds / 100)
decimal_odds = (american_odds / 100) + 1
else:
decimal_odds = (100 / abs(american_odds))
decimal_odds = (100 / abs(american_odds)) + 1
return round(decimal_odds, 2)


def calculate_kelly_criterion(american_odds, model_prob):
"""
Calculates the fraction of the bankroll to be wagered on each bet
"""
decimal_odds = american_to_decimal(american_odds)
bankroll_fraction = round((100 * (decimal_odds * model_prob - (1 - model_prob))) / decimal_odds, 2)
return bankroll_fraction if bankroll_fraction > 0 else 0
return bankroll_fraction if bankroll_fraction > 0 else 0
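
With the corrected conversion (decimal odds now include the returned stake), a quick self-contained check that traces the fixed functions shown in the diff above:

```python
# Self-contained check of the corrected conversion; the function bodies below
# mirror the fixed code in the diff above.
def american_to_decimal(american_odds):
    if american_odds is None:
        raise ValueError("American odds value cannot be None.")
    if american_odds >= 100:
        return round((american_odds / 100) + 1, 2)
    return round((100 / abs(american_odds)) + 1, 2)

def calculate_kelly_criterion(american_odds, model_prob):
    decimal_odds = american_to_decimal(american_odds)
    bankroll_fraction = round((100 * (decimal_odds * model_prob - (1 - model_prob))) / decimal_odds, 2)
    return bankroll_fraction if bankroll_fraction > 0 else 0

print(american_to_decimal(-110))              # 1.91 (was 0.91 before the +1 fix)
print(american_to_decimal(150))               # 2.5  (was 1.5 before the +1 fix)
print(calculate_kelly_criterion(-110, 0.58))  # 36.01, i.e. the suggested % of bankroll
```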