# Props Data Organizer

## Definitions of Formulas Used in This Notebook

### Expected Value (EV)
EV is the average amount you can expect to win or lose per bet if you placed the same bet many times. A positive EV means the bet is profitable on average, while a negative EV means it loses money over the long run.

**Formula:**
$$
\text{EV} = (\text{Probability of Winning} \times \text{Profit if Win}) - (\text{Probability of Losing} \times \text{Loss if Lose})
$$
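
As a rough illustration of the formula above, the sketch below evaluates EV in plain Python. The function name `expected_value` and the sample numbers are assumptions for illustration: a 0.7594 win probability, a 100 stake, and a 300 payout treated as gross, so the profit on a win is 200.

```python
def expected_value(p_win: float, profit_if_win: float, loss_if_lose: float) -> float:
    """Average profit per bet: p * profit - (1 - p) * loss."""
    p_lose = 1.0 - p_win
    return p_win * profit_if_win - p_lose * loss_if_lose


# Hypothetical pair bet: stake 100, gross payout 300, so net profit 200 on a win.
ev = expected_value(p_win=0.7594, profit_if_win=200.0, loss_if_lose=100.0)
print(round(ev, 2))  # about 127.82

# A coin flip at even money has zero EV.
print(expected_value(0.5, 100.0, 100.0))
```

Under these assumptions the result lands close to the EV reported for the top pair in the results table later in this notebook.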

### Kelly Criterion
The Kelly Criterion is a formula for sizing each bet in a series of bets. It maximizes the expected logarithm of wealth, balancing the trade-off between risk and reward. The formula takes the probability of winning and the odds offered and returns the fraction of your bankroll to wager. Here, Odds means the net odds: the profit per unit staked, so a bet that returns three times the stake has Odds of 2.

**Formula:**
$$
\text{Kelly Fraction} = \frac{(\text{Probability of Winning} \times (\text{Odds} + 1)) - 1}{\text{Odds}}
$$
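
The Kelly formula can be sketched in a few lines of Python. The names `kelly_fraction` and `kelly_stake` are hypothetical helpers, not functions from this notebook's modules, and the 0.25 multiplier is only an example of fractional Kelly sizing.

```python
def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Kelly fraction (p * (b + 1) - 1) / b, with b the profit per unit staked."""
    if net_odds <= 0:
        raise ValueError("net_odds must be positive")
    return (p_win * (net_odds + 1.0) - 1.0) / net_odds


def kelly_stake(bankroll: float, p_win: float, net_odds: float, multiplier: float = 0.25) -> float:
    """Fractional Kelly stake; a negative edge yields a stake of zero."""
    return bankroll * multiplier * max(0.0, kelly_fraction(p_win, net_odds))


print(kelly_fraction(0.6, 1.0))      # even-money bet with a 60% win probability
print(kelly_fraction(0.7594, 2.0))   # bet paying 2 units of profit per unit staked
print(kelly_stake(1000.0, 0.6, 1.0)) # quarter-Kelly stake on a 1000 bankroll
```

The odds must be expressed per unit staked. Passing a dollar profit such as 200 in place of per-unit odds of 2 makes the fraction collapse toward the win probability itself. The KELLY CRITERION column in the pairs output later in this notebook sits very close to the PROBABILITY column, so that column may deserve a unit check.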

### Variance
Variance in sports betting represents the spread or dispersion of actual outcomes around the expected value. It's a crucial metric for understanding the risk and volatility associated with betting predictions. Higher variance indicates more volatile and unpredictable outcomes, while lower variance suggests more consistent results.

**Formula:**
$$
\text{Variance} = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}
$$

Where:
- $x_i$ represents each individual outcome
- $\mu$ is the mean or expected value
- $n$ is the total number of observations

In the context of prop betting:
- High variance props (e.g., 3-pointers made) tend to be more risky but potentially more profitable
- Low variance props (e.g., minutes played) typically offer more consistent but lower returns
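
To make the variance formula concrete, here is a small sketch using only the standard library. The game logs are made-up numbers chosen for illustration, and `population_variance` is a hypothetical helper that mirrors the formula term by term.

```python
from statistics import mean, pvariance


def population_variance(outcomes):
    """Sum of squared deviations from the mean, divided by n."""
    mu = mean(outcomes)
    return sum((x - mu) ** 2 for x in outcomes) / len(outcomes)


# Hypothetical five-game logs for one player.
threes_made = [1, 5, 0, 6, 3]
minutes_played = [34, 35, 33, 36, 32]

print(population_variance(threes_made))     # spread of 3-pointers made
print(population_variance(minutes_played))  # spread of minutes played

# The standard library's pvariance implements the same population formula.
print(pvariance(threes_made))
```

Raw variances depend on the scale of each stat, so comparing props of very different magnitudes may call for a normalized measure such as the coefficient of variation.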


In [8]:
import pandas as pd 
import numpy as np
import time
import requests
from NBAData.gambling import *
from datetime import datetime
from Models.xgboost_prediction import *

today = datetime.now()
formatted_date = today.strftime("%m_%d_%y")

### Grab player odds for the day (US covers all bookmakers; DFS covers PrizePicks and Underdog)

In [None]:
from NBAPropFinder.NBAPropFinder import NBAPropFinder

nba_props = NBAPropFinder(region='us_dfs')
prizePicks = nba_props.dataframe
prizePicks.head(10)

In [2]:
from WNBAPropFinder.WNBAPropFinder import WNBAPropFinder
wnba_props = WNBAPropFinder(region='us')
prizePicks = wnba_props.dataframe
prizePicks

Scraping Odds API...
Organizing Data...


Unnamed: 0,BOOKMAKER,CATEGORY,NAME,OVER/UNDER,LINE,ODDS
0,FanDuel,player_points,Tina Charles,Over,16.5,-128
1,FanDuel,player_points,Tina Charles,Under,16.5,-102
2,FanDuel,player_points,Chelsea Gray,Over,12.5,-114
3,FanDuel,player_points,Chelsea Gray,Under,12.5,-114
4,FanDuel,player_points,Jewell Loyd,Over,11.5,-106
...,...,...,...,...,...,...
427,FanDuel,player_rebounds_assists,Caitlin Clark,Under,15.5,-136
428,FanDuel,player_rebounds_assists,Napheesa Collier,Over,13.5,106
429,FanDuel,player_rebounds_assists,Napheesa Collier,Under,13.5,-140
430,FanDuel,player_rebounds_assists,Caitlin Clark,Over,14.5,-130


### Single bets from bookmakers other than PrizePicks and Underdog

In [2]:
model = loadXGBModel('PTS')
bookmakers = pd.read_csv('CSV_FILES/PROPS_DATA/Playoffs_US(06_22_25).csv')
data = pd.read_csv('CSV_FILES/PLAYOFF_DATA/PLAYOFFS_25_PTS_FEATURES.csv')
games = get_espn_games(date_str='20250622')



In [None]:
# Dictionary mapping prop categories to their stat columns
propDict = {
    'player_points': 'PTS',
    'player_rebounds': 'REB',
    'player_assists': 'AST',
    # 'player_threes': 'FG3M',
    # 'player_blocks': 'BLK',
    # 'player_steals': 'STL',
    # 'player_field_goals': 'FGM',
    # 'player_frees_made': 'FTM',
    # 'player_frees_attempts': 'FTA',
    # 'player_turnovers': 'TOV',
    # 'player_points_rebounds_assists': 'PTS+REB+AST',
    # 'player_points_rebounds': 'PTS+REB',
    # 'player_points_assists': 'PTS+AST',
    # 'player_rebounds_assists': 'REB+AST',
    # 'player_blocks_steals': 'BLK+STL'
}
models = {
    'PTS': loadXGBModel('PTS'),
    'REB': loadXGBModel('REB'),
    'AST': loadXGBModel('AST'),
}
all_results = []

for category, stat in propDict.items():
    print(f"Processing {category}...")
    data = pd.read_csv(f'CSV_FILES/PLAYOFF_DATA/PLAYOFFS_25_{stat}_FEATURES.csv')
    results = single_bet(data, bookmakers, models, games, category=category, stat_line=stat)
    all_results.append(results)

combined_results = pd.concat(all_results, ignore_index=True)

final_results = combined_results.sort_values(by='EV', ascending=False).reset_index(drop=True)

print("\nTop 15 highest EV bets across all prop types:")
final_results.head(15)

In [1]:
import pandas as pd
from NBAData.gambling import prizePicksPairsEV
from Models.xgboost_model import loadXGBModel
from Models.xgboost_prediction import get_espn_games

# 1. Load your models
models = {
    'PTS': loadXGBModel('PTS'),
    'AST': loadXGBModel('AST'),
    'REB': loadXGBModel('REB')
}

# 2. Load current datasets (updated daily)
current_datasets = {
    'PTS': pd.read_csv('CSV_FILES/REGULAR_DATA/historical_24_PTS_features.csv'),
    'AST': pd.read_csv('CSV_FILES/REGULAR_DATA/historical_24_AST_features.csv'),
    'REB': pd.read_csv('CSV_FILES/REGULAR_DATA/historical_24_REB_features.csv')
}

# 3. Load PrizePicks data
prizePicks = pd.read_csv('CSV_FILES/HISTORICAL_ODDS/ALL_HISTORICAL_ODDS.csv')
prizePicks = prizePicks[(prizePicks['BOOKMAKER'] == 'PrizePicks') & (prizePicks['GAME_DATE'] == '2024-10-22')]

# Expected columns: ['NAME', 'CATEGORY', 'LINE']
# Example:
# NAME          CATEGORY           LINE
# LeBron James  player_points      25.5
# Luka Doncic   player_assists     8.5
# Nikola Jokic  player_rebounds    12.5

# 4. Define category to stat line mapping
prop_dict = {
    'player_points': 'PTS',
    'player_assists': 'AST', 
    'player_rebounds': 'REB'
}

# 5. Define today's games
games = get_espn_games(date_str='20241022')

# 6. Run the analysis
results = prizePicksPairsEV(
    prizePicks=prizePicks,
    propDict=prop_dict,
    models=models,
    games=games,
    current_datasets=current_datasets,  # Your daily updated datasets
    simulations=10000,
    stake=100,
    payout=300
)

# 7. Display results
print(f"Found {len(results)} profitable pairs")
print("\nTop 10 pairs by EV:")
top_pairs = results.nlargest(10, 'EV')
print(top_pairs[['PLAYER 1', 'PLAYER 1 LINE', 'PLAYER 2', 'PLAYER 2 LINE', 'TYPE', 'EV', 'PROBABILITY', 'KELLY CRITERION']])

# 8. Save results
results.to_csv('CSV_FILES/PROPS_EV/PrizePicksPairs_today.csv', index=False)

Loading datasets and generating valid combinations...
Using provided current dataset for PTS
Using provided current dataset for AST
Using provided current dataset for REB
Processing 827 combinations with 8 threads...
Completed 100/827 combinations
Completed 200/827 combinations
Completed 300/827 combinations
Completed 400/827 combinations
Completed 500/827 combinations
Completed 600/827 combinations
Completed 700/827 combinations
Completed 800/827 combinations
Successfully processed 827 combinations
Building final results...
Found 827 profitable pairs

Top 10 pairs by EV:
            PLAYER 1  PLAYER 1 LINE            PLAYER 2  PLAYER 2 LINE  \
343  Anthony Edwards            0.5        LeBron James            8.0   
345  Anthony Edwards            0.5          Sam Hauser            2.5   
245       Sam Hauser            6.5     Anthony Edwards            0.5   
336  Anthony Edwards            0.5        Gabe Vincent            3.5   
344  Anthony Edwards            0.5       Anthony D

In [3]:
results.sort_values(by='EV', ascending=False).head(10)

Unnamed: 0,PLAYER 1,CATEGORY 1,STAT TYPE 1,PLAYER 1 LINE,PLAYER 1 PREDICTION,PLAYER 2,CATEGORY 2,STAT TYPE 2,PLAYER 2 LINE,PLAYER 2 PREDICTION,TYPE,EV,PROBABILITY,KELLY CRITERION
343,Anthony Edwards,player_points,PTS,0.5,23,LeBron James,player_assists,AST,8.0,10,OVER/OVER,127.81,0.7594,0.7582
345,Anthony Edwards,player_points,PTS,0.5,23,Sam Hauser,player_rebounds,REB,2.5,4,OVER/OVER,123.19,0.744,0.7427
245,Sam Hauser,player_points,PTS,6.5,10,Anthony Edwards,player_points,PTS,0.5,23,OVER/OVER,114.89,0.7163,0.7149
336,Anthony Edwards,player_points,PTS,0.5,23,Gabe Vincent,player_points,PTS,3.5,2,OVER/UNDER,100.03,0.6668,0.6651
344,Anthony Edwards,player_points,PTS,0.5,23,Anthony Davis,player_assists,AST,3.0,4,OVER/OVER,96.3,0.6543,0.6526
714,LeBron James,player_assists,AST,8.0,10,Karl-Anthony Towns,player_rebounds,REB,11.0,8,OVER/UNDER,95.06,0.6502,0.6485
346,Anthony Edwards,player_points,PTS,0.5,23,Al Horford,player_rebounds,REB,6.0,7,OVER/OVER,92.22,0.6407,0.6389
339,Anthony Edwards,player_points,PTS,0.5,23,Jaxson Hayes,player_points,PTS,4.5,3,OVER/UNDER,91.64,0.6388,0.637
741,Sam Hauser,player_rebounds,REB,2.5,4,Karl-Anthony Towns,player_rebounds,REB,11.0,8,OVER/UNDER,91.19,0.6373,0.6355
342,Anthony Edwards,player_points,PTS,0.5,23,D'Angelo Russell,player_assists,AST,5.0,6,OVER/OVER,86.81,0.6227,0.6208


In [None]:
# trios = prizePicksTriosEV(PrizePicks, propDict, models, total_games, simulations=10000, stake=100, payout=600)
# trios.sort_values('EV', ascending=False).head(5).reset_index(drop=True)

In [6]:
import pandas as pd
import numpy as np
from datetime import datetime
from pathlib import Path
from typing import Dict, List
from dataclasses import dataclass

@dataclass
class BetResult:
    date: str
    player1: str
    category1: str
    line1: float
    actual1: float
    player2: str
    category2: str
    line2: float
    actual2: float
    bet_type: str
    ev: float
    probability: float
    kelly: float
    won: bool
    profit: float

class PrizePicksBacktest:
    def __init__(self, 
                 props_ev_dir: str = "CSV_FILES/HISTORICAL_PROP_PAIRS",
                 regular_data_dir: str = "CSV_FILES/REGULAR_DATA",
                 min_ev: float = 60.0,
                 stake: float = 100,
                 max_bets_per_day: int = 3,
                 kelly_fraction: float = 0.25):
        """
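        Initialize backtester for PrizePicks pairs using actual results
        
        Args:
            props_ev_dir: Directory containing PrizePicks EV CSV files
            regular_data_dir: Directory containing actual game results
            min_ev: Minimum EV threshold for taking a bet
            stake: Base stake; each bet risks stake * KELLY CRITERION * kelly_fraction
            max_bets_per_day: Maximum number of qualifying bets taken per day (highest EV first)
            kelly_fraction: Fractional-Kelly multiplier applied to each bet's Kelly value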
        """
        self.props_ev_dir = Path(props_ev_dir)
        self.regular_data_dir = Path(regular_data_dir)
        self.min_ev = min_ev
        self.max_bets_per_day = max_bets_per_day
        self.stake = stake
        self.kelly_fraction = kelly_fraction
        self.results: List[BetResult] = []
        self.bet_selection_log = []  # Track bet selection process
        
        # Load actual results data
        self.actual_results = self._load_actual_results()
        
    def _load_actual_results(self) -> Dict[str, pd.DataFrame]:
        """Load actual results for each stat category"""
        results = {}
        stat_types = {
            'player_points': 'PTS',
            'player_rebounds': 'REB',
            'player_assists': 'AST'
        }
        
        for category, stat in stat_types.items():
            file_path = self.regular_data_dir / f'season_25_{stat}_features.csv'
            if file_path.exists():
                df = pd.read_csv(file_path)
                # Convert date to YYYYMMDD format
                df['GAME_DATE'] = pd.to_datetime(df['GAME_DATE']).dt.strftime('%Y%m%d')
                results[category] = df
                
        return results
    
    def _get_actual_stat(self, date: str, player: str, category: str) -> float:
        """Get actual stat value for a player on a given date"""
        if category not in self.actual_results:
            return None
            
        df = self.actual_results[category]
        result = df[(df['GAME_DATE'] == date) & (df['PLAYER_NAME'] == player)]
        
        if result.empty:
            return None
            
        stat_map = {
            'player_points': 'PTS',
            'player_rebounds': 'REB',
            'player_assists': 'AST'
        }
        
        return result[stat_map[category]].iloc[0]

    def _check_bet_result(self, bet_type: str, actual1: float, line1: float, 
                         actual2: float, line2: float) -> bool:
        """Check if a bet won based on actual results"""
        if actual1 is None or actual2 is None:
            return None
            
        bet_parts = bet_type.split('/')
        result1 = actual1 > line1 if bet_parts[0] == 'OVER' else actual1 < line1
        result2 = actual2 > line2 if bet_parts[1] == 'OVER' else actual2 < line2
        
        return result1 and result2

    def load_daily_ev_data(self, date_str: str) -> pd.DataFrame:
        """Load PrizePicks pairs EV data for a specific date"""
        file_path = self.props_ev_dir / f"{date_str}_PAIRS.csv"
        if not file_path.exists():
            return pd.DataFrame()
        
        df = pd.read_csv(file_path)
        # First filter by minimum EV
        qualified_bets = df[df['EV'] >= self.min_ev].sort_values('EV', ascending=False)
        # Then take only top N bets
        return qualified_bets.head(self.max_bets_per_day)

    def simulate_bets(self) -> None:
        """Run backtest simulation using actual results"""
        ev_files = list(self.props_ev_dir.glob("*_PAIRS.csv"))
        
        for file in sorted(ev_files):
            date_str = file.stem.split('_')[0]  # Get YYYYMMDD from filename
            
            # Load all bets for the day
            all_bets = pd.read_csv(file)
            daily_bets = self.load_daily_ev_data(date_str)
            
            # Log bet selection process
            self.bet_selection_log.append({
                'date': date_str,
                'total_available_bets': len(all_bets),
                'bets_above_min_ev': len(all_bets[all_bets['EV'] >= self.min_ev]),
                'bets_selected': len(daily_bets),
                'min_ev_selected': daily_bets['EV'].min() if not daily_bets.empty else None,
                'max_ev_selected': daily_bets['EV'].max() if not daily_bets.empty else None
            })
            
            if daily_bets.empty:
                continue
                
            for _, bet in daily_bets.iterrows():
                # Get actual results
                actual1 = self._get_actual_stat(date_str, bet['PLAYER 1'], bet['CATEGORY 1'])
                actual2 = self._get_actual_stat(date_str, bet['PLAYER 2'], bet['CATEGORY 2'])
                
                # Skip if we don't have actual results
                if actual1 is None or actual2 is None:
                    continue
                
                # Check if bet won
                won = self._check_bet_result(
                    bet['TYPE'], 
                    actual1, bet['PLAYER 1 LINE'],
                    actual2, bet['PLAYER 2 LINE']
                )
                
                if won is None:
                    continue
                    
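                # Scale the base stake by the bet's Kelly value and the fractional-Kelly multiplier
                kelly_stake = self.stake * bet['KELLY CRITERION'] * self.kelly_fraction
                # Settles at even money (win = +stake, loss = -stake); no payout multiple is applied
                profit = kelly_stake if won else -kelly_stake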
                
                result = BetResult(
                    date=date_str,
                    player1=bet['PLAYER 1'],
                    category1=bet['CATEGORY 1'],
                    line1=bet['PLAYER 1 LINE'],
                    actual1=actual1,
                    player2=bet['PLAYER 2'],
                    category2=bet['CATEGORY 2'],
                    line2=bet['PLAYER 2 LINE'],
                    actual2=actual2,
                    bet_type=bet['TYPE'],
                    ev=bet['EV'],
                    probability=bet['PROBABILITY'],
                    kelly=bet['KELLY CRITERION'],
                    won=won,
                    profit=profit
                )
                self.results.append(result)

    def calculate_metrics(self) -> Dict[str, float]:
        """Calculate performance metrics"""
        if not self.results:
            return {}
            
        profits = [r.profit for r in self.results]
        cumulative_profits = np.cumsum(profits)
        
        total_bets = len(self.results)
        winning_bets = sum(1 for r in self.results if r.won)
        total_risked = sum(abs(r.profit) for r in self.results)  # Sum of absolute profits since each profit represents the Kelly stake

        metrics = {
            'total_bets': total_bets,
            'hit_rate': winning_bets / total_bets if total_bets > 0 else 0,
            'total_profit': sum(profits),
            'roi': (sum(profits) / total_risked) if total_bets > 0 else 0,
            'max_drawdown': self._calculate_max_drawdown(cumulative_profits),
            'sharpe_ratio': self._calculate_sharpe_ratio(profits)
        }
        
        return metrics
    
    def _calculate_max_drawdown(self, cumulative_profits: np.ndarray) -> float:
        """Calculate maximum drawdown"""
        rolling_max = np.maximum.accumulate(cumulative_profits)
        drawdowns = rolling_max - cumulative_profits
        return np.max(drawdowns) if len(drawdowns) > 0 else 0
    
    def _calculate_sharpe_ratio(self, profits: List[float], risk_free_rate: float = 0.0) -> float:
        """Calculate Sharpe ratio"""
        if not profits:
            return 0.0
        
        returns = np.array(profits) / self.stake
        excess_returns = returns - risk_free_rate
        if len(excess_returns) < 2:
            return 0.0
            
        return np.mean(excess_returns) / np.std(excess_returns, ddof=1) if np.std(excess_returns, ddof=1) != 0 else 0.0

    def print_summary(self) -> None:
        """Print backtest summary with actual results"""
        metrics = self.calculate_metrics()
        if not metrics:
            print("No results to display")
            return
            
        print("\n=== PrizePicks Pairs Backtest Summary ===")
        print(f"Total Bets: {metrics['total_bets']}")
        print(f"Total Days: {len(self.bet_selection_log)}")
        print(f"Average Bets Per Day: {metrics['total_bets']/len(self.bet_selection_log):.1f}")
        print(f"Hit Rate: {metrics['hit_rate']:.2%}")
        
        # Calculate and display Kelly sizing statistics
        total_risked = sum(abs(r.profit) for r in self.results)
        avg_stake = total_risked / metrics['total_bets'] if metrics['total_bets'] > 0 else 0
        print(f"Total Amount Risked: ${total_risked:.2f}")
        print(f"Average Stake Size: ${avg_stake:.2f}")
        
        print(f"Total Profit: ${metrics['total_profit']:.2f}")
        print(f"ROI: {metrics['roi']:.2%}")
        print(f"Max Drawdown: ${metrics['max_drawdown']:.2f}")
        print(f"Sharpe Ratio: {metrics['sharpe_ratio']:.2f}")
        
        # Print bet selection statistics
        print("\nBet Selection Statistics:")
        total_available = sum(log['total_available_bets'] for log in self.bet_selection_log)
        total_above_min_ev = sum(log['bets_above_min_ev'] for log in self.bet_selection_log)
        print(f"Average Available Bets Per Day: {total_available/len(self.bet_selection_log):.1f}")
        print(f"Average Bets Above Min EV Per Day: {total_above_min_ev/len(self.bet_selection_log):.1f}")
        
        # Print top 5 highest EV bets and their actual results
        print("\nTop 5 Highest EV Bets:")
        top_ev_bets = sorted(self.results, key=lambda x: x.ev, reverse=True)[:5]
        for bet in top_ev_bets:
            print(f"{bet.date}: {bet.player1} {bet.category1} {bet.line1} (Actual: {bet.actual1:.1f}) & "
                  f"{bet.player2} {bet.category2} {bet.line2} (Actual: {bet.actual2:.1f})")
            print(f"Type: {bet.bet_type}, EV: {bet.ev:.2f}, Won: {bet.won}")

    def plot_performance(self) -> None:
        """Plot cumulative performance over time"""
        try:
            import matplotlib.pyplot as plt
            
            if not self.results:
                print("No results to plot")
                return
                
            profits = [r.profit for r in self.results]
            cumulative_profits = np.cumsum(profits)
            dates = [datetime.strptime(r.date, '%Y%m%d') for r in self.results]
            
            plt.figure(figsize=(12, 6))
            plt.plot(dates, cumulative_profits, label='Cumulative Profit')
            plt.axhline(y=0, color='r', linestyle='--', alpha=0.3)
            plt.title('PrizePicks Pairs Performance')
            plt.xlabel('Date')
            plt.ylabel('Profit ($)')
            plt.legend()
            plt.grid(True, alpha=0.3)
            plt.xticks(rotation=45)
            plt.tight_layout()
            plt.show()
            
        except ImportError:
            print("matplotlib is required for plotting")
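
The drawdown and Sharpe calculations used by the class can be checked by hand on a short profit series. The sketch below re-implements both in plain Python under the same conventions as the class methods; the function names and the profit list are illustrative assumptions.

```python
from itertools import accumulate
from statistics import mean, stdev


def max_drawdown(profits):
    """Largest drop from a running peak of the cumulative profit curve."""
    if not profits:
        return 0.0
    cumulative = list(accumulate(profits))
    peak = cumulative[0]
    worst = 0.0
    for value in cumulative:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def sharpe_ratio(profits, stake):
    """Mean per-bet return divided by its sample standard deviation."""
    returns = [p / stake for p in profits]
    if len(returns) < 2:
        return 0.0
    spread = stdev(returns)
    return mean(returns) / spread if spread != 0 else 0.0


# Hypothetical per-bet profits; the cumulative curve is 10, 5, -3, 9, 6.
profits = [10.0, -5.0, -8.0, 12.0, -3.0]
print(max_drawdown(profits))  # 13.0, the fall from the peak of 10 to the trough of -3
print(sharpe_ratio(profits, stake=100.0))
```

As in the class method, the running peak starts at the first cumulative value rather than at zero, so a loss on the very first bet is not counted as drawdown.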

In [11]:
# Initialize backtester
backtester = PrizePicksBacktest(min_ev=60.0, stake=100, max_bets_per_day=5, kelly_fraction=0.25)

# Run the backtest against actual results
backtester.simulate_bets()

# Print results
backtester.print_summary()



=== PrizePicks Pairs Backtest Summary ===
Total Bets: 362
Total Days: 77
Average Bets Per Day: 4.7
Hit Rate: 40.06%
Total Amount Risked: $7269.00
Average Stake Size: $20.08
Total Profit: $-1361.75
ROI: -18.73%
Max Drawdown: $1703.91
Sharpe Ratio: -0.19

Bet Selection Statistics:
Average Available Bets Per Day: 5.0
Average Bets Above Min EV Per Day: 5.0

Top 5 Highest EV Bets:
20241118: Trey Lyles player_points 13.5 (Actual: 12.0) & Kevin Huerter player_rebounds 5.0 (Actual: 5.0)
Type: UNDER/UNDER, EV: 189.53, Won: False
20241231: Mason Plumlee player_rebounds 10.5 (Actual: 8.0) & John Konchar player_assists 2.5 (Actual: 4.0)
Type: UNDER/UNDER, EV: 186.41, Won: False
20241231: John Konchar player_points 6.5 (Actual: 7.0) & Mason Plumlee player_rebounds 10.5 (Actual: 8.0)
Type: UNDER/UNDER, EV: 184.05, Won: False
20241118: Trey Lyles player_points 13.5 (Actual: 12.0) & Kevin Huerter player_points 16.0 (Actual: 9.0)
Type: UNDER/UNDER, EV: 183.43, Won: True
20241025: Andre Drummond player

In [None]:
import pandas as pd
from Models.xgboost_model import loadXGBModel
from Models.xgboost_prediction import make_prediction

def add_predictions_to_historical(model_type='PTS'):
    """
    Add predicted values to historical features dataset.
    """
    print(f"Loading {model_type} model and data...")
    
    # Load the model
    model = loadXGBModel(model_type)
    
    # Load historical data
    historical_data = pd.read_csv(f'CSV_FILES/REGULAR_DATA/historical_24_{model_type}_features.csv')
    
    # Create a float column for predictions (NaN until a prediction is stored)
    historical_data['MODEL_PREDICTION'] = float('nan')
    
    # Group by game date to process each day's games together
    for date, group in historical_data.groupby('GAME_DATE'):
        print(f"Processing {date}...")
        
        # Process each player in this date's games
        for idx, row in group.iterrows():
            player = row['PLAYER_NAME']
            opponent = row['OPP_ABBREVIATION']  # Get opponent directly from the data
            
            # Create a mock games structure that make_prediction expects
            games = [{
                'home_team': row['TEAM_ABBREVIATION'] if row['HOME_GAME'] == 1 else opponent,
                'away_team': opponent if row['HOME_GAME'] == 1 else row['TEAM_ABBREVIATION']
            }]
            
            # Create temporary props DataFrame for prediction
            temp_props = pd.DataFrame({
                'NAME': [player],
                'LINE': [row[model_type]],  # Use actual stat as the line
                'CATEGORY': [f'player_{model_type.lower()}']
            })
            
            try:
                # Make prediction
                pred = make_prediction(
                    player_name=player,
                    bookmakers=temp_props,
                    opponent=opponent,
                    model=model,
                    data=historical_data,
                    games=games,
                    is_playoff=0,
                    stat_line=model_type
                )
                
                # Store prediction
                historical_data.loc[idx, 'MODEL_PREDICTION'] = pred['predicted_stat']
                
            except Exception as e:
                print(f"Error predicting for {player} on {date}: {e}")
                continue
    
    # Save the updated dataset
    output_file = f'CSV_FILES/REGULAR_DATA/historical_24_{model_type}_features_with_predictions.csv'
    historical_data.to_csv(output_file, index=False)
    print(f"Saved predictions to {output_file}")
    
    # Print some statistics
    prediction_stats = historical_data[['MODEL_PREDICTION', model_type]].describe()
    print("\nPrediction Statistics:")
    print(prediction_stats)
    
    # Calculate accuracy metrics only on rows that received a prediction
    scored = historical_data.dropna(subset=['MODEL_PREDICTION'])
    actual = scored[model_type]
    predicted = scored['MODEL_PREDICTION'].astype(float)
    mae = (actual - predicted).abs().mean()
    mse = ((actual - predicted) ** 2).mean()
    rmse = mse ** 0.5
    
    print("\nAccuracy Metrics:")
    print(f"Mean Absolute Error: {mae:.2f}")
    print(f"Root Mean Square Error: {rmse:.2f}")

# Usage:
add_predictions_to_historical('AST')
add_predictions_to_historical('REB')

Loading AST model and data...
Processing 2023-10-24...
Processing 2023-10-25...
Processing 2023-10-26...
Processing 2023-10-27...
Processing 2023-10-28...
Processing 2023-10-29...
Processing 2023-10-30...
Processing 2023-10-31...
Processing 2023-11-01...
Processing 2023-11-02...
Processing 2023-11-03...
Processing 2023-11-04...
Processing 2023-11-05...
Processing 2023-11-06...
Processing 2023-11-08...
Processing 2023-11-09...
Processing 2023-11-10...
Processing 2023-11-11...
Processing 2023-11-12...
Processing 2023-11-13...
Processing 2023-11-14...
Processing 2023-11-15...
Processing 2023-11-16...
Processing 2023-11-17...
Processing 2023-11-18...
Processing 2023-11-19...
Processing 2023-11-20...
Processing 2023-11-21...
Processing 2023-11-22...
Processing 2023-11-24...
Processing 2023-11-25...
Processing 2023-11-26...
Processing 2023-11-27...
Processing 2023-11-28...
Processing 2023-11-29...
Processing 2023-11-30...
Processing 2023-12-01...
Processing 2023-12-02...
Processing 2023-12-0