Tunetyme patch 1 #369

Open · wants to merge 7 commits into master
1 change: 1 addition & 0 deletions Models/XGBoost_Models/XGBoost_69.4%_ML-3.json

Large diffs are not rendered by default.

93 changes: 93 additions & 0 deletions Program_Overview
@@ -0,0 +1,93 @@
This overview gives a brief description of each file and how the pieces interconnect to form the complete program.

### 1. Primary Scripts:

- **main.py**:
  - **Description**: The primary entry point for the application. It parses command-line arguments; initiates data processing, model training, and predictions; and outputs the predictions along with the expected value for betting.
- **Connection**: Calls functions or scripts from the `src` directory to fetch data, process data, and run models.
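
A minimal sketch of what such an entry point might look like. The flag names here are assumptions for illustration only, not the repository's actual CLI:

```python
# Hypothetical sketch of the main.py flow; the real flag names and runner
# interfaces in the repository may differ.
import argparse

def main():
    parser = argparse.ArgumentParser(description="NBA betting predictions")
    parser.add_argument("-xgb", action="store_true", help="run the XGBoost models")
    parser.add_argument("-nn", action="store_true", help="run the neural network models")
    parser.add_argument("-kc", action="store_true", help="also print Kelly Criterion bet sizing")
    args = parser.parse_args()

    # In the real program these steps are delegated to the src/ modules:
    # fetch today's games and odds, build the feature frame, then hand it to
    # the selected runner(s), which print predictions and expected value.
    if args.xgb:
        print("Would run XGBoost_Runner on today's feature frame")
    if args.nn:
        print("Would run NN_Runner on today's feature frame")

if __name__ == "__main__":
    main()
```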

### 2. Data Processing (`src/Process-Data` directory):

- **Get_Data.py**:
- **Description**: Fetches game data, including teams, scores, and dates.
- **Connection**: Outputs data that can be used by other scripts for feature creation and model training.

- **Get_Odds_Data.py**:
- **Description**: Fetches betting odds data.
- **Connection**: Outputs odds data that can be used alongside game data to create a holistic dataset for training and predictions.

- **Add_Days_Rest.py**:
- **Description**: Adds a feature indicating the number of rest days for teams between games.
- **Connection**: Modifies the dataset fetched by `Get_Data.py` to include the days of rest feature.

- **Create_Games.py**:
- **Description**: Processes the raw data to generate game-related features and prepares the dataset for training and predictions.
- **Connection**: Uses data fetched by `Get_Data.py` and `Get_Odds_Data.py`, and outputs a dataset ready for model training.
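
A toy illustration of the kind of join `Create_Games.py` performs between game records and odds records; the column names and values below are simplified assumptions, not the actual schema:

```python
# Toy example: one row per game, combining box-score and betting-market data.
import pandas as pd

games = pd.DataFrame({
    "Date": ["2023-10-24", "2023-10-24"],
    "Home": ["DEN", "GSW"],
    "Away": ["LAL", "PHX"],
    "Home_PTS": [119, 104],
    "Away_PTS": [107, 108],
})
odds = pd.DataFrame({
    "Date": ["2023-10-24", "2023-10-24"],
    "Home": ["DEN", "GSW"],
    "Away": ["LAL", "PHX"],
    "OU": [229.5, 232.5],
    "Home_ML": [-190, -130],
})

# Merge on date and team names so each game carries its market features.
dataset = games.merge(odds, on=["Date", "Home", "Away"])
print(dataset)
```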

### 3. Model Training (`src/Train-Models` directory):

- **XGBoost_Model_ML.py** and **XGBoost_Model_UO.py**:
  - **Description**: Train XGBoost models for predicting game outcomes and under/over totals, respectively.
- **Connection**: Uses the dataset generated by `Create_Games.py` and outputs trained models.

- **NN_Model_ML.py** and **NN_Model_UO.py**:
  - **Description**: Train neural network models for predicting game outcomes and under/over totals, respectively.
- **Connection**: Uses the dataset generated by `Create_Games.py` and outputs trained models.

- **Logistic_Regression_ML.py** and **Logistic_Regression_UO.py**:
  - **Description**: Train logistic regression models for predicting game outcomes and under/over totals, respectively.
- **Connection**: Uses the dataset generated by `Create_Games.py` and outputs trained models.
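
These training scripts share the same basic pattern: read the prepared dataset, split features from labels, fit a model, and report accuracy. A minimal sketch of that pattern using logistic regression; the table name and dropped columns are assumptions and may not match the repository's exact schema:

```python
# Sketch of the shared training pattern, not the repository's exact code.
import sqlite3
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

con = sqlite3.connect("Data/dataset.sqlite")
data = pd.read_sql_query('select * from "dataset_2012-24"', con, index_col="index")
con.close()

y = data["Home-Team-Win"]  # label: did the home team win?
# Drop the label, leakage-prone outcome columns, and non-numeric columns.
X = data.drop(columns=["Home-Team-Win", "Score", "OU-Cover",
                       "TEAM_NAME", "TEAM_NAME.1", "Date", "Date.1"],
              errors="ignore")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```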

### 4. Utility Scripts (`src` directory):

- **Expected_Value.py**:
- **Description**: Calculates the expected value of bets based on predictions and odds.
  - **Connection**: Used by `main.py` to provide insight into the potential value of bets (see the sketch after this list).

- **Kelly_Criterion.py**:
- **Description**: Calculates the recommended fraction of the bankroll to bet based on the model's edge.
- **Connection**: Used by `main.py` to provide betting recommendations.

- **tools.py**:
- **Description**: Contains utility functions, like fetching current date games and formatting outputs.
- **Connection**: Used by multiple scripts for various utility purposes.

- **SbrOddsProvider.py**:
- **Description**: Handles fetching of odds data from external sources.
- **Connection**: Used by `Get_Odds_Data.py` to fetch betting odds.
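
As a concrete illustration of what `Expected_Value.py` computes, here is a minimal sketch of expected profit per $100 staked, given a model win probability and American odds; the real function names and payout convention in the repository may differ:

```python
# Hedged sketch of an expected-value calculation for a $100 stake.
def payout_per_100(american_odds: float) -> float:
    """Profit on a winning $100 stake at the given American odds."""
    if american_odds >= 100:
        return american_odds
    return 100 * (100 / abs(american_odds))

def expected_value(model_prob: float, american_odds: float) -> float:
    """Expected profit per $100 staked."""
    win_profit = payout_per_100(american_odds)
    return round(model_prob * win_profit - (1 - model_prob) * 100, 2)

# Example: the model gives the home team a 58% chance at -110 odds.
print(expected_value(0.58, -110))  # ≈ 10.73
```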

### 5. Model Runners (`src` directory):

- **NN_Runner.py** and **XGBoost_Runner.py**:
  - **Description**: Contain methods to run predictions using the respective models.
- **Connection**: Called by `main.py` to get model predictions based on user input.

### 6. Model Files:

- **XGBoost_68.9%_ML-3.json** and **XGBoost_54.8%_UO-8.json**:
- **Description**: These are saved trained XGBoost models.
- **Connection**: These models can be loaded by the runners to make predictions without retraining.
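
A minimal sketch of how a runner can load one of these saved models and score a feature row; the random features below are a placeholder for the real, much wider feature frame:

```python
# Sketch of loading a saved XGBoost model for inference.
import numpy as np
import xgboost as xgb

booster = xgb.Booster()
# Path should point at whichever saved model is current (69.4% in this PR).
booster.load_model("Models/XGBoost_Models/XGBoost_69.4%_ML-3.json")

features = np.random.rand(1, booster.num_features())  # placeholder feature row
prediction = booster.predict(xgb.DMatrix(features))
print(prediction)  # class probabilities; exact shape depends on how the model was trained
```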

### 7. Neural Network Models:

- **keras_metadata.pb**, **saved_model.pb**, **variables.data-00000-of-00001**, and **variables.index**:
- **Description**: These are components of saved neural network models trained using TensorFlow/Keras.
- **Connection**: These can be loaded by the runners to make predictions using neural network models.
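
These files together form a TensorFlow SavedModel directory, which a runner can reload without retraining. A minimal sketch, with a placeholder directory name rather than the repository's actual path:

```python
# Sketch of reloading a saved Keras model; the directory name is a placeholder.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("Models/NN_Models/example_saved_model")
dummy_row = np.random.rand(1, model.input_shape[-1])  # placeholder feature row
print(model.predict(dummy_row))
```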

### 8. Databases:

- **dataset.sqlite** and **odds.sqlite**:
- **Description**: Databases containing game data and betting odds respectively.
- **Connection**: Used by data processing scripts to fetch required data.
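
A quick sketch of inspecting one of these SQLite files; the per-season table naming follows the `odds_<season>` pattern used in the diffs below:

```python
# List the tables stored in the odds database (one per season, e.g. odds_2022-23).
import sqlite3

con = sqlite3.connect("Data/odds.sqlite")
tables = con.execute("select name from sqlite_master where type='table'").fetchall()
con.close()
print(tables)
```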

### 9. Miscellaneous:

- **requirements.txt**:
- **Description**: Lists the Python packages required to run the application.
- **Connection**: Helps users set up the necessary environment to run the program.

### Holistic Overview:

The program begins with `main.py`, which calls the various utilities and scripts based on user input. Data is fetched and processed in the `Process-Data` directory, then used to train models in the `Train-Models` directory. Once the models are trained, predictions can be made and presented to the user, along with betting insights computed by the utility scripts.

Together, these components cover the full pipeline from raw data to predictions and betting recommendations.
2 changes: 1 addition & 1 deletion src/Predict/XGBoost_Runner.py
@@ -12,7 +12,7 @@
# from src.Utils.tools import get_json_data, to_data_frame, get_todays_games_json, create_todays_games
init()
xgb_ml = xgb.Booster()
xgb_ml.load_model('Models/XGBoost_Models/XGBoost_68.9%_ML-3.json')
xgb_ml.load_model('Models/XGBoost_Models/XGBoost_69.4%_ML-3.json')
xgb_uo = xgb.Booster()
xgb_uo.load_model('Models/XGBoost_Models/XGBoost_54.8%_UO-8.json')

16 changes: 11 additions & 5 deletions src/Process-Data/Create_Games.py
@@ -7,12 +7,12 @@
import pandas as pd
from tqdm import tqdm

sys.path.insert(1, os.path.join(sys.path[0], '..'))
sys.path.insert(1, os.path.join(sys.path[0], '../..'))
from src.Utils.Dictionaries import team_index_07, team_index_08, team_index_12, team_index_13, team_index_14, team_index_current

# season_array = ["2007-08", "2008-09", "2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16",
# "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]
season_array = ["2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]
season_array = ["2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23","2023-24"]

df = pd.DataFrame
scores = []
@@ -85,7 +85,7 @@
elif season == '2013-14':
home_team_series = team_df.iloc[team_index_13.get(home_team)]
away_team_series = team_df.iloc[team_index_13.get(away_team)]
elif season == '2022-23':
elif season == '2022-23' or season == '2023-24':
home_team_series = team_df.iloc[team_index_current.get(home_team)]
away_team_series = team_df.iloc[team_index_current.get(away_team)]
else:
@@ -110,11 +110,17 @@
frame['OU-Cover'] = np.asarray(OU_Cover)
frame['Days-Rest-Home'] = np.asarray(days_rest_home)
frame['Days-Rest-Away'] = np.asarray(days_rest_away)

# Calculate the Simple Moving Average (SMA) for Points_Scored
window_size = 15 # or any desired window size
frame['SMA_Points_Scored_Home'] = frame.groupby('TEAM_NAME')['PTS'].rolling(window=window_size).mean().reset_index(0, drop=True)
frame['SMA_Points_Scored_Away'] = frame.groupby('TEAM_NAME.1')['PTS.1'].rolling(window=window_size).mean().reset_index(0, drop=True)

# fix types
for field in frame.columns.values:
if 'TEAM_' in field or 'Date' in field or field not in frame:
continue
frame[field] = frame[field].astype(float)
con = sqlite3.connect("../../Data/dataset.sqlite")
frame.to_sql("dataset_2012-23", con, if_exists="replace")
con.close()
frame.to_sql("dataset_2012-24", con, if_exists="replace")
con.close()
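
The new simple-moving-average feature above groups rows by team and takes a rolling mean of points scored. A toy demonstration of that groupby/rolling pattern, with a small window and made-up data for illustration only:

```python
# Toy demonstration of the groupby + rolling-mean pattern used above.
import pandas as pd

df = pd.DataFrame({
    "TEAM_NAME": ["DEN", "DEN", "DEN", "DEN", "LAL", "LAL", "LAL", "LAL"],
    "PTS":       [110,   120,   100,   130,   95,    105,   115,   90],
})

df["SMA_PTS"] = (
    df.groupby("TEAM_NAME")["PTS"]
      .rolling(window=3)
      .mean()
      .reset_index(0, drop=True)  # drop the group level so the index lines up with df
)
print(df)
# The first window-1 rows of each team are NaN; DEN's third row is (110+120+100)/3 = 110.0
```
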
152 changes: 101 additions & 51 deletions src/Process-Data/Get_Data.py
@@ -1,70 +1,120 @@
import os
import random
import sqlite3
import sys
import time
from datetime import date, datetime, timedelta
from datetime import datetime

import numpy as np
import pandas as pd
from tqdm import tqdm

from src.Utils.tools import get_json_data, to_data_frame

sys.path.insert(1, os.path.join(sys.path[0], '..'))
from src.Utils.Dictionaries import team_index_07, team_index_08, team_index_12, team_index_13, team_index_14, team_index_current

url = 'https://stats.nba.com/stats/' \
'leaguedashteamstats?Conference=&' \
'DateFrom=10%2F01%2F{2}&DateTo={0}%2F{1}%2F{3}' \
'&Division=&GameScope=&GameSegment=&LastNGames=0&' \
'LeagueID=00&Location=&MeasureType=Base&Month=0&' \
'OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&' \
'PerMode=PerGame&Period=0&PlayerExperience=&' \
'PlayerPosition=&PlusMinus=N&Rank=N&' \
'Season={4}' \
'&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&' \
'StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision='
# season_array = ["2007-08", "2008-09", "2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16",
# "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]
season_array = ["2012-13", "2013-14", "2014-15", "2015-16", "2016-17", "2017-18", "2018-19", "2019-20", "2020-21", "2021-22", "2022-23"]

# year = [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
year = [2022, 2023]
season = ["2022-23"]
# season = ["2007-08", "2008-09", "2009-10", "2010-11", "2011-12", "2012-13", "2013-14", "2014-15", "2015-16", "2016-17",
# "2017-18", "2018-19", "2019-20", "2020-2021", "2021-2022"]
df = pd.DataFrame
scores = []
win_margin = []
OU = []
OU_Cover = []
games = []
days_rest_away = []
days_rest_home = []
teams_con = sqlite3.connect("../../Data/teams.sqlite")
odds_con = sqlite3.connect("../../Data/odds.sqlite")

month = [10, 11, 12, 1, 2, 3, 4, 5, 6]
days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31]
for season in tqdm(season_array):
odds_df = pd.read_sql_query(f"select * from \"odds_{season}\"", odds_con, index_col="index")
team_table_str = "teams_{}-{}-" + season
year_count = 0

begin_year_pointer = year[0]
end_year_pointer = year[0]
count = 0
for row in odds_df.itertuples():
home_team = row[3]
away_team = row[4]

con = sqlite3.connect("../../Data/teams.sqlite")
date = row[2]
date_array = date.split('-')
if not date_array or len(date_array) < 2:
continue
year = date_array[0] + '-' + date_array[1]
month = date_array[2][:2]
day = date_array[2][2:]

for season1 in tqdm(season):
for month1 in tqdm(month):
if month1 == 1:
count += 1
end_year_pointer = year[count]
for day1 in tqdm(days):
if month1 == 10 and day1 < 19:
continue
if month1 in [4, 6, 9, 11] and day1 > 30:
if month[0] == '0':
month = month[1:]
if day[0] == '0':
day = day[1:]
if int(month) == 1:
year_count = 1
end_year_pointer = int(date_array[0]) + year_count
if end_year_pointer == datetime.now().year:
if int(month) == datetime.now().month and int(day) >= datetime.now().day:
continue
if month1 == 2 and day1 > 28:
if int(month) > datetime.now().month:
continue
if end_year_pointer == datetime.now().year:
if month1 == datetime.now().month and day1 > datetime.now().day:
continue
if month1 > datetime.now().month:
continue
general_data = get_json_data(url.format(month1, day1, begin_year_pointer, end_year_pointer, season1))
general_df = to_data_frame(general_data)
real_date = date(year=end_year_pointer, month=month1, day=day1) + timedelta(days=1)
general_df['Date'] = str(real_date)

x = str(real_date).split('-')
general_df.to_sql(f"teams_{season1}-{str(int(x[1]))}-{str(int(x[2]))}", con, if_exists="replace")
team_df = pd.read_sql_query(f"select * from \"teams_{year}-{month}-{day}\"", teams_con, index_col="index")
if len(team_df.index) == 30:
scores.append(row[9])
OU.append(row[5])
days_rest_home.append(row[11])
days_rest_away.append(row[12])
if row[10] > 0:
win_margin.append(1)
else:
win_margin.append(0)

time.sleep(random.randint(1, 3))
begin_year_pointer = year[count]
if row[9] < row[5]:
OU_Cover.append(0)
elif row[9] > row[5]:
OU_Cover.append(1)
elif row[9] == row[5]:
OU_Cover.append(2)

if season == '2007-08':
home_team_series = team_df.iloc[team_index_07.get(home_team)]
away_team_series = team_df.iloc[team_index_07.get(away_team)]
elif season == '2008-09' or season == "2009-10" or season == "2010-11" or season == "2011-12":
home_team_series = team_df.iloc[team_index_08.get(home_team)]
away_team_series = team_df.iloc[team_index_08.get(away_team)]
elif season == "2012-13":
home_team_series = team_df.iloc[team_index_12.get(home_team)]
away_team_series = team_df.iloc[team_index_12.get(away_team)]
elif season == '2013-14':
home_team_series = team_df.iloc[team_index_13.get(home_team)]
away_team_series = team_df.iloc[team_index_13.get(away_team)]
elif season == '2022-23':
home_team_series = team_df.iloc[team_index_current.get(home_team)]
away_team_series = team_df.iloc[team_index_current.get(away_team)]
else:
try:
home_team_series = team_df.iloc[team_index_14.get(home_team)]
away_team_series = team_df.iloc[team_index_14.get(away_team)]
except Exception as e:
print(home_team)
raise e
game = pd.concat([home_team_series, away_team_series.rename(
index={col:f"{col}.1" for col in team_df.columns.values}
)])
games.append(game)
odds_con.close()
teams_con.close()
season = pd.concat(games, ignore_index=True, axis=1)
season = season.T
frame = season.drop(columns=['TEAM_ID', 'CFID', 'CFPARAMS', 'Unnamed: 0', 'Unnamed: 0.1', 'CFPARAMS.1', 'TEAM_ID.1', 'CFID.1'])
frame['Score'] = np.asarray(scores)
frame['Home-Team-Win'] = np.asarray(win_margin)
frame['OU'] = np.asarray(OU)
frame['OU-Cover'] = np.asarray(OU_Cover)
frame['Days-Rest-Home'] = np.asarray(days_rest_home)
frame['Days-Rest-Away'] = np.asarray(days_rest_away)
# fix types
for field in frame.columns.values:
if 'TEAM_' in field or 'Date' in field or field not in frame:
continue
frame[field] = frame[field].astype(float)
con = sqlite3.connect("../../Data/dataset.sqlite")
frame.to_sql("dataset_2012-23", con, if_exists="replace")
con.close()
9 changes: 6 additions & 3 deletions src/Utils/Kelly_Criterion.py
@@ -2,16 +2,19 @@ def american_to_decimal(american_odds):
"""
Converts American odds to decimal odds (European odds).
"""
if american_odds is None:
raise ValueError("American odds value cannot be None.")
if american_odds >= 100:
decimal_odds = (american_odds / 100)
decimal_odds = (american_odds / 100) + 1
else:
decimal_odds = (100 / abs(american_odds))
decimal_odds = (100 / abs(american_odds)) + 1
return round(decimal_odds, 2)


def calculate_kelly_criterion(american_odds, model_prob):
"""
Calculates the fraction of the bankroll to be wagered on each bet
"""
decimal_odds = american_to_decimal(american_odds)
bankroll_fraction = round((100 * (decimal_odds * model_prob - (1 - model_prob))) / decimal_odds, 2)
return bankroll_fraction if bankroll_fraction > 0 else 0
return bankroll_fraction if bankroll_fraction > 0 else 0
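
With the corrected conversion (decimal odds now include the returned stake), a quick self-contained check that traces the fixed functions shown in the diff above:

```python
# Self-contained check of the corrected conversion; the function bodies below
# mirror the fixed code in the diff above.
def american_to_decimal(american_odds):
    if american_odds is None:
        raise ValueError("American odds value cannot be None.")
    if american_odds >= 100:
        return round((american_odds / 100) + 1, 2)
    return round((100 / abs(american_odds)) + 1, 2)

def calculate_kelly_criterion(american_odds, model_prob):
    decimal_odds = american_to_decimal(american_odds)
    bankroll_fraction = round((100 * (decimal_odds * model_prob - (1 - model_prob))) / decimal_odds, 2)
    return bankroll_fraction if bankroll_fraction > 0 else 0

print(american_to_decimal(-110))              # 1.91 (was 0.91 before the +1 fix)
print(american_to_decimal(150))               # 2.5  (was 1.5 before the +1 fix)
print(calculate_kelly_criterion(-110, 0.58))  # 36.01, i.e. the suggested % of bankroll
```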