# Props Data Organizer

## Definitions of Formulas Used in This Notebook

### Expected Value (EV)
EV is the average amount you can expect to win or lose per bet if you placed the same bet many times. It helps identify profitable betting opportunities by comparing the expected return to the risk involved.

**Formula:**
$$
\text{EV} = (\text{Probability of Winning} \times \text{Profit if Win}) - (\text{Probability of Losing} \times \text{Loss if Lose})
$$
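
As a minimal sketch of the formula above, the snippet below computes EV for one bet from a win probability and American odds. The helper names (`american_to_profit`, `expected_value`) and the 55% win probability are illustrative assumptions, not functions or values from this notebook's modules.

```python
def american_to_profit(odds: int, stake: float = 100.0) -> float:
    """Net profit on a winning bet at American odds (hypothetical helper)."""
    if odds > 0:
        return stake * odds / 100.0
    return stake * 100.0 / abs(odds)


def expected_value(p_win: float, odds: int, stake: float = 100.0) -> float:
    """EV = P(win) * profit if win - P(lose) * loss if lose."""
    profit = american_to_profit(odds, stake)
    return p_win * profit - (1.0 - p_win) * stake


# Assumed example: a 55% win probability at -110 odds with a 100-unit stake.
ev = expected_value(0.55, -110)
print(round(ev, 2))  # 5.0
```

A positive result means the bet is expected to gain money on average at that probability; the estimate is only as good as the probability fed into it.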

### Kelly Criterion
The Kelly Criterion is a formula for sizing each bet in a series of bets. It maximizes the expected logarithm of wealth, balancing risk against reward. The formula uses both the probability of winning and the odds offered to tell you what fraction of your bankroll to wager on each bet. Here, Odds means the net payout per unit staked (decimal odds minus 1), not American odds such as -110.

**Formula:**
$$
\text{Kelly Fraction} = \frac{(\text{Probability of Winning} \times (\text{Odds} + 1)) - 1}{\text{Odds}}
$$
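
The formula can be sketched in a few lines of Python. This assumes that Odds is the net payout per unit staked (for example, -110 American odds correspond to 100/110); the function name `kelly_fraction` and the 55% probability are illustrative only.

```python
def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Kelly fraction for win probability p_win and net odds b (profit per unit staked)."""
    if net_odds <= 0:
        raise ValueError("net_odds must be positive")
    return (p_win * (net_odds + 1.0) - 1.0) / net_odds


# Assumed example: 55% win probability at -110 American odds (b = 100/110).
f = kelly_fraction(0.55, 100 / 110)
print(round(f, 4))  # 0.055
```

A negative fraction means the bet has negative expected value at those odds, so in practice the result is usually clipped at zero (no bet).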

### Variance
Variance in sports betting represents the spread or dispersion of actual outcomes around the expected value. It's a crucial metric for understanding the risk and volatility associated with betting predictions. Higher variance indicates more volatile and unpredictable outcomes, while lower variance suggests more consistent results.

**Formula:**
$$
\text{Variance} = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}
$$

Where:
- $x_i$ represents each individual outcome
- $\mu$ is the mean or expected value
- $n$ is the total number of observations

In the context of prop betting:
- High-variance props (e.g., 3-pointers made) tend to be riskier but potentially more profitable
- Low-variance props (e.g., minutes played) typically offer more consistent but lower returns
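
To make the variance formula concrete, here is a small sketch that applies it to a made-up list of game point totals (the numbers are illustrative, not taken from the notebook's data) and checks the result against NumPy.

```python
import numpy as np

# Made-up point totals for five games (illustrative data only).
points = np.array([18, 25, 31, 12, 24], dtype=float)

mu = points.mean()                                    # mean of the observations
variance = np.sum((points - mu) ** 2) / points.size   # divide by n, as in the formula

# np.var uses the same population formula by default (ddof=0).
assert np.isclose(variance, np.var(points))

print(mu, variance)  # 22.0 42.0
```

The formula divides by $n$ (population variance). When the games are treated as a sample of a player's performance, dividing by $n - 1$ (`np.var(points, ddof=1)`) is the usual alternative.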


In [1]:
import pandas as pd 
import numpy as np
import time
import requests
from NBAData.gambling import *
from datetime import datetime
from Models.xgboost_prediction import *

today = datetime.now()
formatted_date = today.strftime("%m_%d_%y")



### Grab player odds for the day (US: all bookmakers; DFS: PrizePicks and Underdog)

In [None]:
from NBAPropFinder.NBAPropFinder import NBAPropFinder

nba_props = NBAPropFinder(region='us_dfs')
prizePicks = nba_props.dataframe
prizePicks.head(10)

In [2]:
from WNBAPropFinder.WNBAPropFinder import WNBAPropFinder
wnba_props = WNBAPropFinder(region='us')
prizePicks = wnba_props.dataframe
prizePicks

Scraping Odds API...
Organizing Data...


| | BOOKMAKER | CATEGORY | NAME | OVER/UNDER | LINE | ODDS |
|---|---|---|---|---|---|---|
| 0 | FanDuel | player_points | Tina Charles | Over | 16.5 | -128 |
| 1 | FanDuel | player_points | Tina Charles | Under | 16.5 | -102 |
| 2 | FanDuel | player_points | Chelsea Gray | Over | 12.5 | -114 |
| 3 | FanDuel | player_points | Chelsea Gray | Under | 12.5 | -114 |
| 4 | FanDuel | player_points | Jewell Loyd | Over | 11.5 | -106 |
| ... | ... | ... | ... | ... | ... | ... |
| 427 | FanDuel | player_rebounds_assists | Caitlin Clark | Under | 15.5 | -136 |
| 428 | FanDuel | player_rebounds_assists | Napheesa Collier | Over | 13.5 | 106 |
| 429 | FanDuel | player_rebounds_assists | Napheesa Collier | Under | 13.5 | -140 |
| 430 | FanDuel | player_rebounds_assists | Caitlin Clark | Over | 14.5 | -130 |
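
The ODDS column holds American odds, which have to be turned into probabilities before they can be compared with a model's win probability in the EV formula. The sketch below does this for the Tina Charles 16.5 points line shown above (Over -128, Under -102). The helper `implied_probability` is a hypothetical name, and rescaling the two sides so they sum to 1 is one common way to strip the bookmaker's margin, assumed here for illustration rather than taken from the notebook's code.

```python
def implied_probability(odds: int) -> float:
    """Break-even win probability implied by American odds (hypothetical helper)."""
    if odds < 0:
        return -odds / (-odds + 100.0)
    return 100.0 / (odds + 100.0)


# Over/Under prices for the same line, copied from the table above.
over_odds, under_odds = -128, -102

p_over = implied_probability(over_odds)
p_under = implied_probability(under_odds)

# The two implied probabilities sum to more than 1; the excess is the bookmaker's margin.
overround = p_over + p_under

# Proportional rescaling so the two sides sum to exactly 1 (an assumed de-vig method).
fair_over = p_over / overround
fair_under = p_under / overround

print(round(p_over, 4), round(p_under, 4), round(overround, 4))
print(round(fair_over, 4), round(fair_under, 4))
```

A model probability above the rescaled figure for a side suggests a possible edge on that side, subject to how reliable the model is.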


### Single bets from bookmakers other than PrizePicks or Underdog

In [2]:
model = loadXGBModel('PTS')
bookmakers = pd.read_csv('CSV_FILES/PROPS_DATA/Playoffs_US(06_22_25).csv')
data = pd.read_csv('CSV_FILES/PLAYOFF_DATA/PLAYOFFS_25_PTS_FEATURES.csv')
games = get_espn_games(date_str='20250622')



In [None]:
# Dictionary mapping prop categories to their stat columns
propDict = {
    'player_points': 'PTS',
    'player_rebounds': 'REB',
    'player_assists': 'AST',
    # 'player_threes': 'FG3M',
    # 'player_blocks': 'BLK',
    # 'player_steals': 'STL',
    # 'player_field_goals': 'FGM',
    # 'player_frees_made': 'FTM',
    # 'player_frees_attempts': 'FTA',
    # 'player_turnovers': 'TOV',
    # 'player_points_rebounds_assists': 'PTS+REB+AST',
    # 'player_points_rebounds': 'PTS+REB',
    # 'player_points_assists': 'PTS+AST',
    # 'player_rebounds_assists': 'REB+AST',
    # 'player_blocks_steals': 'BLK+STL'
}
models = {
    'PTS': loadXGBModel('PTS'),
    'REB': loadXGBModel('REB'),
    'AST': loadXGBModel('AST'),
}
all_results = []

for category, stat in propDict.items():
    print(f"Processing {category}...")
    data = pd.read_csv(f'CSV_FILES/PLAYOFF_DATA/PLAYOFFS_25_{stat}_FEATURES.csv')
    results = single_bet(data, bookmakers, models, games, category=category, stat_line=stat)
    all_results.append(results)

combined_results = pd.concat(all_results, ignore_index=True)

final_results = combined_results.sort_values(by='EV', ascending=False).reset_index(drop=True)

print("\nTop 15 highest EV bets across all prop types:")
final_results.head(15)

In [1]:
from Models.xgboost_prediction import *
from Models.xgboost_model import *
from NBAData.gambling import *


player = "Anthony Davis"
modelPTS = loadXGBModel(stat_line='PTS')
modelAST = loadXGBModel(stat_line='AST')
modelREB = loadXGBModel(stat_line='REB')

bookmakers = pd.read_csv('CSV_FILES/HISTORICAL_ODDS/10_22_2024.csv')
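# .copy() makes PrizePicks an independent frame, so renaming its columns does not raise SettingWithCopyWarning
PrizePicks = bookmakers[bookmakers['bookmaker'] == 'PrizePicks'].copy()
PrizePicks = PrizePicks.rename(columns={'bookmaker': 'BOOKMAKER', 'player': 'NAME', 'line': 'LINE', 'price': 'ODDS', 'market': 'CATEGORY'})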

backTestData = pd.read_csv('CSV_FILES/REGULAR_DATA/SEASON_25_PTS_FEATURES.csv')
player_df = backTestData[backTestData['PLAYER_NAME'] == player].sort_values('GAME_DATE')
backTestData = backTestData[backTestData['GAME_DATE'] == '2024-10-22']

games = get_espn_games(date_str='20241022')
gamesv2 = get_espn_games(date_str='20241023')
total_games = games + gamesv2
opponent = findOPP(player, backTestData, total_games)
endOfSeasonPTS = pd.read_csv('CSV_FILES/REGULAR_DATA/SEASON_24_PTS_FEATURES.csv')
endOfSeasonAST = pd.read_csv('CSV_FILES/REGULAR_DATA/SEASON_24_AST_FEATURES.csv')
endOfSeasonREB = pd.read_csv('CSV_FILES/REGULAR_DATA/SEASON_24_REB_FEATURES.csv')

# predPTS = make_prediction(
#     player_name=player,
#     bookmakers=PrizePicks,
#     opponent=opponent, 
#     model=modelPTS, 
#     data=endOfSeasonPTS, 
#     games=games, 
#     is_playoff=0,
#     stat_line='PTS')
# monte_carlo_prop_simulation(
#     player_df=player_df,
#     modelPred=predPTS['predicted_stat'],
#     prop_line=predPTS['prop_line'],
#     stat_line='PTS'
# )



In [3]:
propDict = {
    'player_points': 'PTS',
    'player_rebounds': 'REB',
    'player_assists': 'AST',
}
models = {
    'PTS': loadXGBModel('PTS'),
    'REB': loadXGBModel('REB'),
    'AST': loadXGBModel('AST'),
}
# pairs = prizePicksPairsEV(PrizePicks, propDict, models, total_games, simulations=10000, stake=100, payout=300)
# pairs.sort_values('EV', ascending=False).head(5).reset_index(drop=True)

In [4]:
trios = prizePicksTriosEV(PrizePicks, propDict, models, total_games, simulations=10000, stake=100, payout=600)
trios.sort_values('EV', ascending=False).head(5).reset_index(drop=True)

Loading datasets and generating valid combinations...
Loaded dataset for PTS
Loaded dataset for REB
Loaded dataset for AST
Processing 569769 combinations with 8 threads...
Completed 100/569769 combinations
Completed 200/569769 combinations
Completed 300/569769 combinations
...

| | PLAYER 1 | CATEGORY 1 | STAT TYPE 1 | PLAYER 1 LINE | PLAYER 1 PREDICTION | PLAYER 2 | CATEGORY 2 | STAT TYPE 2 | PLAYER 2 LINE | PLAYER 2 PREDICTION | PLAYER 3 | CATEGORY 3 | STAT TYPE 3 | PLAYER 3 LINE | PLAYER 3 PREDICTION | TYPE | EV | PROBABILITY | KELLY CRITERION |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Anthony Edwards | player_points | PTS | 0.5 | 23 | James Harden | player_points | PTS | 22.5 | 15 | Karl-Anthony Towns | player_rebounds | REB | 11.0 | 8 | OVER/UNDER/UNDER | 423.63 | 0.8727 | 0.8725 |
| 1 | Anthony Edwards | player_points | PTS | 0.5 | 23 | James Harden | player_points | PTS | 22.5 | 15 | Norman Powell | player_points | PTS | 18.5 | 13 | OVER/UNDER/UNDER | 418.74 | 0.8646 | 0.8643 |
| 2 | Anthony Edwards | player_points | PTS | 0.5 | 23 | Norman Powell | player_points | PTS | 18.5 | 13 | Karl-Anthony Towns | player_rebounds | REB | 11.0 | 8 | OVER/UNDER/UNDER | 416.22 | 0.8604 | 0.8601 |
| 3 | Anthony Edwards | player_points | PTS | 0.5 | 23 | James Harden | player_points | PTS | 22.5 | 15 | Tyrese Haliburton | player_assists | AST | 9.5 | 13 | OVER/UNDER/OVER | 406.45 | 0.8441 | 0.8438 |
| 4 | Anthony Edwards | player_points | PTS | 0.5 | 23 | James Harden | player_points | PTS | 22.5 | 15 | Jalen Suggs | player_assists | AST | 3.5 | 2 | OVER/UNDER/UNDER | 402.07 | 0.8368 | 0.8365 |
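
The EV column above can be roughly reproduced from the PROBABILITY column and the `stake=100, payout=600` arguments. The sketch below assumes that `payout` is the total amount returned when all three legs hit, so the net odds are `(payout - stake) / stake`. The function names are hypothetical and are not the internals of `prizePicksTriosEV`, which are not shown in this notebook.

```python
def parlay_ev(p_hit: float, stake: float, payout: float) -> float:
    """EV when `payout` is the total return on a hit (assumption) and the stake is lost otherwise."""
    return p_hit * payout - stake


def parlay_kelly(p_hit: float, stake: float, payout: float) -> float:
    """Kelly fraction using net odds b = (payout - stake) / stake."""
    net_odds = (payout - stake) / stake
    return (p_hit * (net_odds + 1.0) - 1.0) / net_odds


# Probability taken from the first row of the table above.
p = 0.8727
print(round(parlay_ev(p, 100, 600), 2))     # 423.62
print(round(parlay_kelly(p, 100, 600), 4))  # 0.8472
```

Under this assumption the EV agrees with the table's 423.63 to within the rounding of the probability. The Kelly value does not: the table's 0.8725 is closer to what the formula gives with Odds set to 600 rather than 5, so it may be worth checking which odds convention the helper uses before sizing bets from that column.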


In [91]:
from nba_api.stats.endpoints import leaguegamefinder

df = pd.read_csv('CSV_FILES/REGULAR_DATA/SEASON_25_PTS_FEATURES.csv')
starters_per_game = (
    df[df['STARTING'] == 1]
    .groupby(['GAME_ID', 'TEAM_ID'])
    .agg({'PLAYER_NAME': list})
    .reset_index()
)
starters_per_game

| | GAME_ID | TEAM_ID | PLAYER_NAME |
|---|---|---|---|
| 0 | 22400001 | 1610612737 | [Jalen Johnson, Keaton Wallace, Zaccharie Risa... |
| 1 | 22400001 | 1610612738 | [Derrick White, Al Horford, Jrue Holiday, Jayl... |
| 2 | 22400002 | 1610612748 | [Kevin Love, Bam Adebayo, Haywood Highsmith, T... |
| 3 | 22400002 | 1610612765 | [Jaden Ivey, Tim Hardaway Jr., Cade Cunningham... |
| 4 | 22400003 | 1610612753 | [Goga Bitadze, Kentavious Caldwell-Pope, Jalen... |
| ... | ... | ... | ... |
| 2455 | 22401228 | 1610612744 | [Kevon Looney, Jonathan Kuminga, Andrew Wiggin... |
| 2456 | 22401229 | 1610612737 | [Jalen Johnson, Zaccharie Risacher, Dyson Dani... |
| 2457 | 22401229 | 1610612749 | [Taurean Prince, Giannis Antetokounmpo, Brook ... |
| 2458 | 22401230 | 1610612745 | [Alperen Sengun, Jabari Smith Jr., Dillon Broo... |


In [None]:
'''
Lineup Composition Features
These leverage who your player is starting with:

🔹 STARTS_WITH_STAR_PLAYER_X (binary)

1 if your player started alongside a specific star (e.g., Jokic), 0 otherwise → helps capture synergy.

🔹 TEAM_STARTER_BIGS_COUNT

Number of players 6’9” or taller among starters → affects rebounding and pace.

🔹 TEAM_STARTER_SPACING_METRIC

Average 3-point percentage of the starting five → good spacing → better lanes for drives.

🔹 PACE_EXPECTATION

Estimated game pace based on historical average pace of your team’s starters + opponent’s starters → affects possessions → affects counting stats.
'''
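
As a rough sketch of the first idea in the notes above, the snippet below builds a `STARTS_WITH_STAR_X` flag from rows that carry the `GAME_ID`, `TEAM_ID`, `PLAYER_NAME`, and `STARTING` columns used earlier. The toy rows, the player names, and the exact column name of the new flag are made up for illustration; the real feature files may need different handling.

```python
import pandas as pd

# Toy box-score rows (made-up data): one team, two games.
df = pd.DataFrame({
    "GAME_ID": [1, 1, 1, 2, 2, 2],
    "TEAM_ID": [10, 10, 10, 10, 10, 10],
    "PLAYER_NAME": ["Player A", "Star X", "Player B", "Player A", "Player B", "Star X"],
    "STARTING": [1, 1, 0, 1, 1, 0],
})
star = "Star X"  # hypothetical star player

# True on every row of a team-game in which the star was in the starting lineup.
star_started = (
    (df["PLAYER_NAME"].eq(star) & df["STARTING"].eq(1))
    .groupby([df["GAME_ID"], df["TEAM_ID"]])
    .transform("any")
)

# The flag is 1 only for other starters who shared the lineup with the star.
df["STARTS_WITH_STAR_X"] = (
    star_started & df["STARTING"].eq(1) & df["PLAYER_NAME"].ne(star)
).astype(int)

print(df)
```

In this toy data only Player A in game 1 gets a 1, because the star comes off the bench in game 2. The other lineup features in the notes (bigs count, spacing, pace) could follow the same group-then-broadcast pattern, given height, 3-point percentage, and pace columns.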