<a href="https://www.kaggle.com/code/stemosamaghandour/hull-tactical-egypt-stock-market-predictor?scriptVersionId=266408248" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/hull-tactical-market-prediction/train.csv
/kaggle/input/hull-tactical-market-prediction/test.csv
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/default_inference_server.py
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/default_gateway.py
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/__init__.py
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/core/templates.py
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/core/base_gateway.py
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/core/relay.py
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/core/kaggle_evaluation.proto
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/core/__init__.py
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/core/generated/kaggle_evaluation_pb2.py
/kaggle/input/hull-tactical-market-prediction/kaggle_evaluation/core/generated/kaggle_evaluation_pb2_grpc.py
/kaggl

The flow chart below outlines the process for this stock market prediction and volatility management challenge. It breaks the problem into key stages, from data preparation to generating the submission file.

```mermaid
flowchart TD
    subgraph A [Phase 1: Data Preparation & Feature Engineering]
        direction LR
        A1[Load Provided Datasets<br>Market Data & Proprietary] --> A2[Handle Missing Values & Outliers];
        A2 --> A3[Feature Engineering<br>Create Lagged, Rolling, &<br>Derived Features];
        A3 --> A4[Calculate Target Variable:<br>S&P 500 Excess Returns];
    end

    subgraph B [Phase 2: Model Building & Training]
        B1[Define Prediction Goal<br>e.g., Next Day's Excess Return] --> B2[Select & Train<br>Machine Learning Model<br>e.g., XGBoost, LSTM, Ensemble];
        B2 --> B3[Tune Hyperparameters<br>with TimeSeriesSplit];
    end

    subgraph C [Phase 3: Volatility-Constrained Strategy]
        C1[Raw Model Prediction<br>e.g., Continuous Value] --> C2{Apply Volatility Constraint<br>& Betting Logic};
        C2 --> C3[Allocation = 0<br>If prediction weak or volatile];
        C2 --> C4[Allocation = 1<br>Market Weight];
        C2 --> C5[Allocation = 2<br>Max Leverage<br>If prediction is strong & confident];
        C3 & C4 & C5 --> C6[Final Allocation Signal<br>0 to 2];
    end

    subgraph D [Phase 4: Validation & Backtesting]
        D1[Walk-Forward Validation<br>Simulate live trading] --> D2[Calculate Evaluation Metric<br>Sharpe Ratio Variant];
        D2 --> D3{Performance Robust?};
        D3 -- No --> B2;
        D3 -- Yes --> D4[Final Model Ready];
    end

    subgraph E [Phase 5: Submission & Inference]
        E1[Load Latest Data] --> E2[Generate Features];
        E2 --> E3[Run Trained Model];
        E3 --> E4[Apply Betting Strategy];
        E4 --> E5[Output Daily Allocation<br>to submission.csv];
    end

    Start --> A
    A --> B
    B --> C
    C --> D
    D --> E
```

### Step-by-Step Explanation of the Flow Chart

**Phase 1: Data Preparation & Feature Engineering**
*   **Input:** You start with the provided daily data (public market info + proprietary dataset).
*   **Key Tasks:**
    *   Clean the data (handle missing values, outliers).
    *   Create predictive features. This is crucial. Examples include:
        *   **Lagged Features:** Returns from previous days (1-day, 5-day, 21-day lag).
        *   **Rolling Statistics:** Moving averages, rolling standard deviation (volatility), min/max over a window (e.g., 20 days).
        *   **Derived Features:** Ratios (e.g., P/E ratios), technical indicators (e.g., RSI, MACD), or macroeconomic trends.
    *   Calculate the target variable: the **excess return** of the S&P 500 (presumably over the risk-free rate).
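
The lagged and rolling features listed above could be sketched in pandas roughly as follows. The `returns` series is synthetic and the helper name `make_basic_features` is hypothetical; the point is only that each feature at row *t* uses data up to and including row *t*.

```python
import numpy as np
import pandas as pd

def make_basic_features(returns: pd.Series) -> pd.DataFrame:
    """Build lagged and rolling features from a daily return series (illustrative)."""
    feats = pd.DataFrame(index=returns.index)
    # Lagged returns: the value observed `lag` rows earlier.
    for lag in (1, 5, 21):
        feats[f"return_lag_{lag}"] = returns.shift(lag)
    # Rolling statistics over a 20-row window ending at the current row.
    window = returns.rolling(window=20)
    feats["rolling_mean_20"] = window.mean()
    feats["rolling_std_20"] = window.std()
    feats["rolling_min_20"] = window.min()
    feats["rolling_max_20"] = window.max()
    return feats

# Synthetic returns, used only to show the shapes involved.
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, size=100))
features = make_basic_features(returns)
```

The first rows of each column are `NaN` until enough history is available, so they need to be dropped or filled before training.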

**Phase 2: Model Building & Training**
*   **Goal:** Predict the future excess return (e.g., for the next trading day).
*   **Process:**
    *   Select a machine learning model capable of capturing complex, non-linear patterns (e.g., XGBoost, LightGBM, or Neural Networks).
    *   Train the model on historical data, using time-series aware cross-validation (e.g., `TimeSeriesSplit`) to avoid look-ahead bias.
    *   Tune the model's hyperparameters to optimize predictive performance.
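
To illustrate why `TimeSeriesSplit` avoids look-ahead bias, the short sketch below prints the index ranges of each fold on a synthetic, time-ordered array. The array size and `n_splits=4` are arbitrary choices for the demonstration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)  # 20 time-ordered samples (synthetic)
tscv = TimeSeriesSplit(n_splits=4)

folds = []
for train_idx, test_idx in tscv.split(X):
    # Every training index precedes every validation index in the same fold.
    folds.append((train_idx, test_idx))
    print(f"train 0..{train_idx[-1]}, validate {test_idx[0]}..{test_idx[-1]}")
```

The training window grows with each fold while the validation block always lies strictly after it, which is the property a shuffled `KFold` would not give.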

**Phase 3: Volatility-Constrained Betting Strategy**
*   This is where you convert a raw prediction into a trade.
*   The model's continuous prediction needs to be mapped to an allocation between 0 and 2.
*   **Logic Example:**
    *   **Allocation = 0 (Cash):** If the predicted return is negative or below a certain confidence threshold.
    *   **Allocation = 1 (Market Weight):** If the prediction is mildly positive or uncertain.
    *   **Allocation = 2 (Max Leverage):** If the prediction is strongly positive and confident.
*   This logic must be designed to keep the overall strategy's volatility within the 120% constraint, that is, at or below 1.2 times the market's volatility. A simpler version could just be `allocation = clip(prediction * scaling_factor, 0, 2)`.
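
The simpler clipping rule mentioned above can be written in a few lines of NumPy. The function name `prediction_to_allocation` and the default `scaling_factor=100.0` are assumptions made for illustration; in practice the factor would be tuned on validation data.

```python
import numpy as np

def prediction_to_allocation(predictions, scaling_factor=100.0):
    """Map predicted excess returns to allocations in [0, 2] (illustrative)."""
    predictions = np.asarray(predictions, dtype=float)
    # Scale the raw prediction, then clamp it to the allowed allocation range.
    return np.clip(predictions * scaling_factor, 0.0, 2.0)

# Negative predictions map to cash (0), large positive ones saturate at 2.
allocations = prediction_to_allocation([-0.01, 0.0, 0.005, 0.01, 0.05])
```

With this factor, a predicted excess return of 1% maps to market weight (1.0) and anything at or above 2% maps to maximum leverage (2.0).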

**Phase 4: Validation & Backtesting**
*   **Critical Step:** Test your entire pipeline (model + strategy) on historical data in a realistic way.
    *   Use **Walk-Forward Validation (WFV)**: Train on a period (e.g., 2000-2015), simulate trades on the next period (e.g., 2016-2018), then re-train and advance.
*   Calculate the competition's **Sharpe ratio variant** on your backtested results.
*   If performance is not robust or satisfactory, iterate back to model training or strategy design.
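
As a rough sketch of the evaluation step, the snippet below computes a plain annualized Sharpe ratio and the ratio of strategy volatility to market volatility. This is not the competition's official metric, whose exact definition is not given here; the 252-day annualization and both helper names are assumptions.

```python
import numpy as np

TRADING_DAYS = 252  # assumed number of trading days per year

def annualized_sharpe(daily_excess_returns):
    """Mean over standard deviation of daily excess returns, annualized."""
    r = np.asarray(daily_excess_returns, dtype=float)
    std = r.std(ddof=1)
    if std == 0:
        return 0.0
    return float(r.mean() / std * np.sqrt(TRADING_DAYS))

def volatility_ratio(strategy_returns, market_returns):
    """Strategy volatility divided by market volatility."""
    s = np.std(strategy_returns, ddof=1)
    m = np.std(market_returns, ddof=1)
    return float(s / m) if m > 0 else float("nan")

rng = np.random.default_rng(42)
market = rng.normal(0.0003, 0.01, size=500)  # synthetic market excess returns
allocations = np.full(500, 1.5)              # constant 1.5x exposure
strategy = allocations * market

print(f"Sharpe: {annualized_sharpe(strategy):.3f}")
print(f"Volatility ratio: {volatility_ratio(strategy, market):.2f}")
```

A constant 1.5x allocation leaves the Sharpe ratio unchanged but scales volatility by 1.5, which would exceed a 1.2x volatility limit. This is why the allocation rule has to account for volatility and not only for the predicted return.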

**Phase 5: Submission & Inference**
*   Once a final model and strategy are selected, this is the production pipeline.
*   For each day in the test set, the process is:
    1.  Load the latest available data.
    2.  Generate the same features used in training.
    3.  Get the model's prediction.
    4.  Apply your volatility-constrained betting strategy to get the final allocation (0 to 2).
    5.  Output this value for the submission file (`submission.csv`).

In [2]:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings('ignore')

In [3]:


class HullTacticalPredictor:
    def __init__(self):
        self.model = None
        self.scaler = StandardScaler()
        self.feature_columns = None
        self.volatility_lookback = 20
        self.allocation_threshold = 0.001
        
    def load_data(self, train_path, test_path):
        """
        Load and merge training and test datasets
        """
        train_df = pd.read_csv(train_path)
        test_df = pd.read_csv(test_path)
        
        print(f"Training data shape: {train_df.shape}")
        print(f"Test data shape: {test_df.shape}")
        
        # Display column names to understand the data structure
        print("\nTraining columns:", train_df.columns.tolist())
        print("Test columns:", test_df.columns.tolist())
        
        # Check for common price columns
        price_columns = ['Close', 'close', 'Adj Close', 'Price', 'price', 'SP500', 'sp500']
        available_price_cols = [col for col in train_df.columns if col in price_columns]
        print(f"Available price columns: {available_price_cols}")
        
        # Combine for consistent feature engineering
        train_df['is_train'] = True
        test_df['is_train'] = False
        full_df = pd.concat([train_df, test_df], ignore_index=True)
        
        return full_df, train_df, test_df
    
    def find_price_column(self, df):
        """
        Automatically find the price column in the dataset
        """
        price_columns = ['Close', 'close', 'Adj Close', 'Price', 'price', 'SP500', 'sp500', 'value']
        
        for col in price_columns:
            if col in df.columns:
                print(f"Using '{col}' as price column")
                return col
        
        # If no standard price column found, look for numeric columns that could be prices
        numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
        if len(numeric_cols) > 0:
            print(f"No standard price column found. Using first numeric column: '{numeric_cols[0]}'")
            return numeric_cols[0]
        else:
            raise KeyError("No suitable price column found in the dataset")
    
    def calculate_target(self, df):
        """
        Calculate excess returns target variable
        """
        # Find the price column
        price_col = self.find_price_column(df)
        
        # Calculate daily returns
        df['sp500_return'] = df[price_col].pct_change()
        
        # For simplicity, assuming risk-free rate is 0 or very small
        df['excess_return'] = df['sp500_return']  # - risk_free_rate
        
        # Create forward-looking target (next day's excess return)
        df['target'] = df['excess_return'].shift(-1)
        
        return df, price_col
    
    def create_features(self, df, price_col):
        """
        Create technical features for prediction using the identified price column
        """
        print(f"Creating features using price column: {price_col}")
        
        # Price-based features
        if 'Open' in df.columns:
            # Ratio of the price column to the open price
            df['close_open_ratio'] = df[price_col] / df['Open']
        if 'High' in df.columns and 'Low' in df.columns:
            df['high_low_ratio'] = df['High'] / df['Low']
        
        # Moving averages
        for window in [5, 10, 20, 50]:
            df[f'sma_{window}'] = df[price_col].rolling(window=window).mean()
            df[f'ema_{window}'] = df[price_col].ewm(span=window).mean()
            df[f'price_vs_sma_{window}'] = df[price_col] / df[f'sma_{window}']  
        
        # Volatility features
        df['volatility_5'] = df['sp500_return'].rolling(window=5).std()
        df['volatility_20'] = df['sp500_return'].rolling(window=20).std()
        df['volatility_50'] = df['sp500_return'].rolling(window=50).std()
        
        # Momentum indicators
        df['momentum_5'] = df[price_col] / df[price_col].shift(5) - 1
        df['momentum_10'] = df[price_col] / df[price_col].shift(10) - 1
        df['momentum_20'] = df[price_col] / df[price_col].shift(20) - 1
        
        # RSI-like features
        for window in [5, 14]:
            delta = df[price_col].diff()
            gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
            loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
            rs = gain / loss
            df[f'rsi_{window}'] = 100 - (100 / (1 + rs))
        
        # Volume features
        if 'Volume' in df.columns:
            df['volume_sma'] = df['Volume'].rolling(window=20).mean()
            df['volume_ratio'] = df['Volume'] / df['volume_sma']
        elif 'volume' in df.columns:
            df['volume_sma'] = df['volume'].rolling(window=20).mean()
            df['volume_ratio'] = df['volume'] / df['volume_sma']
        
        # Lagged returns
        for lag in [1, 2, 3, 5, 10]:
            df[f'return_lag_{lag}'] = df['sp500_return'].shift(lag)
        
        # Rolling return statistics
        df['rolling_return_5'] = df['sp500_return'].rolling(window=5).mean()
        df['rolling_return_20'] = df['sp500_return'].rolling(window=20).mean()
        df['rolling_std_5'] = df['sp500_return'].rolling(window=5).std()
        df['rolling_std_20'] = df['sp500_return'].rolling(window=20).std()
        
        # Date-based features (if Date column exists)
        if 'Date' in df.columns:
            df['Date'] = pd.to_datetime(df['Date'])
            df['day_of_week'] = df['Date'].dt.dayofweek
            df['month'] = df['Date'].dt.month
            df['quarter'] = df['Date'].dt.quarter
        
        return df
    
    def prepare_features(self, df):
        """
        Prepare final feature set and handle missing values
        """
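        # Select feature columns (exclude non-feature columns).
        # 'forward_returns' and 'market_forward_excess_returns' describe future outcomes,
        # so using them as features would leak the answer into the model.
        exclude_cols = ['Date', 'date', 'target', 'sp500_return', 'excess_return', 'is_train',
                        'forward_returns', 'market_forward_excess_returns']
        feature_columns = [col for col in df.columns if col not in exclude_cols and df[col].dtype in ['float64', 'int64']]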
        
        print(f"Number of features before handling missing values: {len(feature_columns)}")
        
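        # Handle missing values: forward-fill, back-fill any leading gaps, then zero-fill.
        # (fillna(method=...) is deprecated in recent pandas; use ffill()/bfill() instead.)
        df[feature_columns] = df[feature_columns].ffill().bfill().fillna(0)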
        
        # Remove columns with too many zeros or constant values
        valid_features = []
        for col in feature_columns:
            if df[col].nunique() > 1 and df[col].std() > 0:
                valid_features.append(col)
        
        self.feature_columns = valid_features
        print(f"Number of features after validation: {len(self.feature_columns)}")
        
        return df, self.feature_columns
    
    def train_model(self, train_df, feature_columns):
        """
        Train the prediction model with time-series cross validation
        """
        # Prepare training data
        train_mask = train_df['is_train'] & train_df['target'].notna()
        X_train = train_df.loc[train_mask, feature_columns]
        y_train = train_df.loc[train_mask, 'target']
        
        # Remove any remaining NaN values
        valid_mask = X_train.notna().all(axis=1) & y_train.notna()
        X_train = X_train[valid_mask]
        y_train = y_train[valid_mask]
        
        print(f"Training on {len(X_train)} samples with {len(feature_columns)} features")
        
        if len(X_train) == 0:
            raise ValueError("No valid training data available after preprocessing")
        
        # Scale features
        X_train_scaled = self.scaler.fit_transform(X_train)
        
        # Train model (using Gradient Boosting for non-linear relationships)
        self.model = GradientBoostingRegressor(
            n_estimators=100,
            learning_rate=0.1,
            max_depth=5,
            min_samples_split=100,
            random_state=42
        )
        
        self.model.fit(X_train_scaled, y_train)
        
        # Feature importance
        feature_importance = pd.DataFrame({
            'feature': feature_columns,
            'importance': self.model.feature_importances_
        }).sort_values('importance', ascending=False)
        
        print("\nTop 10 most important features:")
        print(feature_importance.head(10))
        
        return self.model
    
    def volatility_constrained_allocation(self, predictions, recent_volatility, market_volatility):
        """
        Apply volatility constraints to convert predictions to allocations (0-2)
        """
        allocations = []
        
        for i, pred in enumerate(predictions):
            # Base allocation based on prediction strength
            if abs(pred) < self.allocation_threshold:
                # Weak signal - stay in cash (0)
                allocation = 0.0
            elif pred > 0.02:  # Strong positive prediction
                allocation = min(2.0, 1.0 + pred * 10)  # Scale with prediction strength
            elif pred > 0.005:  # Moderate positive prediction
                allocation = 1.0 + (pred - 0.005) * 20  # Some leverage
            elif pred > 0:  # Weak positive prediction
                allocation = 1.0  # Market weight
            elif pred > -0.005:  # Weak negative prediction
                allocation = 0.5  # Reduced exposure
            else:  # Strong negative prediction
                allocation = 0.0  # Cash
            
            # Apply volatility adjustment
            if i < len(recent_volatility):
                vol_ratio = recent_volatility[i] / market_volatility if market_volatility > 0 else 1
                if vol_ratio > 1.2:  # If volatility is too high relative to market
                    allocation = max(0, allocation * 0.5)  # Reduce position
            
            # Ensure allocation is within [0, 2] range
            allocation = max(0.0, min(2.0, allocation))
            allocations.append(allocation)
        
        return np.array(allocations)
    
    def predict_allocations(self, test_df, feature_columns):
        """
        Generate predictions and convert to volatility-constrained allocations
        """
        # Prepare test features
        X_test = test_df[feature_columns].fillna(0)
        
        if len(X_test) == 0:
            # Fallback: return market weight if no features available
            print("Warning: No features available for prediction. Using market weight (1.0)")
            return np.ones(len(test_df)), np.zeros(len(test_df))
        
        X_test_scaled = self.scaler.transform(X_test)
        
        # Get raw predictions
        raw_predictions = self.model.predict(X_test_scaled)
        
        # Calculate recent volatility for constraint application
        if 'sp500_return' in test_df.columns:
            recent_volatility = test_df['sp500_return'].rolling(window=5, min_periods=1).std().fillna(0.02).values
            market_volatility = 0.02  # Approximate long-term market volatility
        else:
            # Fallback if return data not available
            recent_volatility = np.full(len(test_df), 0.02)
            market_volatility = 0.02
        
        # Apply volatility constraints to get final allocations
        allocations = self.volatility_constrained_allocation(
            raw_predictions, recent_volatility, market_volatility
        )
        
        return allocations, raw_predictions
    
    def backtest_strategy(self, train_df, feature_columns):
        """
        Perform walk-forward validation to test strategy performance
        """
        print("Running backtest...")
        
        # Use time series split for validation
        tscv = TimeSeriesSplit(n_splits=3)  # Reduced for speed
        X = train_df[feature_columns].fillna(0)
        y = train_df['target']
        
        # Only use training period with valid targets
        train_mask = train_df['is_train'] & y.notna()
        X = X[train_mask]
        y = y[train_mask]
        
        if len(X) == 0:
            print("No data available for backtesting")
            return 0
        
        performances = []
        
        for train_idx, test_idx in tscv.split(X):
            # Split data
            X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
            y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
            
            if len(X_train) == 0 or len(X_test) == 0:
                continue
            
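            # Scale features with a fold-local scaler so that the scaler fitted in
            # train_model (used later for test predictions) is not overwritten
            fold_scaler = StandardScaler()
            X_train_scaled = fold_scaler.fit_transform(X_train)
            X_test_scaled = fold_scaler.transform(X_test)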
            
            # Train model
            model = GradientBoostingRegressor(
                n_estimators=50,  # Smaller for faster backtesting
                learning_rate=0.1,
                max_depth=4,
                random_state=42
            )
            model.fit(X_train_scaled, y_train)
            
            # Predict
            predictions = model.predict(X_test_scaled)
            
            # Simple allocation strategy for backtest
            allocations = np.where(predictions > 0.005, 1.2, 
                                 np.where(predictions < -0.005, 0.8, 1.0))
            
            # Calculate strategy returns
            strategy_returns = allocations * y_test.values
            
            # Calculate Sharpe ratio (simplified)
            sharpe = np.mean(strategy_returns) / np.std(strategy_returns) if np.std(strategy_returns) > 0 else 0
            
            performances.append(sharpe)
            print(f"Fold Sharpe ratio: {sharpe:.4f}")
        
        if performances:
            avg_performance = np.mean(performances)
            print(f"Average backtest Sharpe ratio: {avg_performance:.4f}")
            return avg_performance
        else:
            print("No valid backtest results")
            return 0
    
    def create_submission(self, test_df, allocations):
        """
        Create submission file in the required format
        """
        # Find the date column
        date_col = 'Date' if 'Date' in test_df.columns else 'date' if 'date' in test_df.columns else test_df.columns[0]
        
        submission = pd.DataFrame({
            'Date': test_df[date_col],
            'Allocation': allocations
        })
        
        # Ensure allocations are within [0, 2] range
        submission['Allocation'] = submission['Allocation'].clip(0, 2)
        
        return submission

def main():
    """
    Main execution function
    """
    # Initialize predictor
    predictor = HullTacticalPredictor()
    
    # File paths (update these based on your Kaggle environment)
    train_path = "/kaggle/input/hull-tactical-market-prediction/train.csv"
    test_path = "/kaggle/input/hull-tactical-market-prediction/test.csv"
    
    try:
        # Step 1: Load data
        print("Step 1: Loading data...")
        full_df, train_df, test_df = predictor.load_data(train_path, test_path)
        
        # Step 2: Calculate target and create features
        print("Step 2: Creating features...")
        full_df, price_col = predictor.calculate_target(full_df)
        full_df = predictor.create_features(full_df, price_col)
        full_df, feature_columns = predictor.prepare_features(full_df)
        
        # Step 3: Train model
        print("Step 3: Training model...")
        predictor.train_model(full_df, feature_columns)
        
        # Step 4: Backtest strategy
        print("Step 4: Backtesting strategy...")
        backtest_sharpe = predictor.backtest_strategy(full_df, feature_columns)
        
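        # Step 5: Generate predictions for test set
        print("Step 5: Generating test predictions...")
        # Use the test rows of full_df: the raw test_df lacks the engineered feature columns
        test_features = full_df.loc[~full_df['is_train']]
        allocations, raw_predictions = predictor.predict_allocations(test_features, feature_columns)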
        
        # Step 6: Create submission
        print("Step 6: Creating submission file...")
        submission = predictor.create_submission(test_df, allocations)
        
        # Display results
        print("\n" + "="*50)
        print("PREDICTION SUMMARY")
        print("="*50)
        print(f"Backtest Sharpe Ratio: {backtest_sharpe:.4f}")
        print(f"Allocation Statistics:")
        print(f"  Min: {allocations.min():.4f}")
        print(f"  Max: {allocations.max():.4f}")
        print(f"  Mean: {allocations.mean():.4f}")
        print(f"  Std: {allocations.std():.4f}")
        
        # Show allocation distribution
        allocation_bins = {
            "Cash (0-0.2)": ((allocations >= 0) & (allocations < 0.2)).sum(),
            "Reduced (0.2-0.8)": ((allocations >= 0.2) & (allocations < 0.8)).sum(),
            "Market (0.8-1.2)": ((allocations >= 0.8) & (allocations < 1.2)).sum(),
            "Leveraged (1.2-2.0)": ((allocations >= 1.2) & (allocations <= 2.0)).sum()
        }
        
        print("\nAllocation Distribution:")
        for category, count in allocation_bins.items():
            percentage = (count / len(allocations)) * 100
            print(f"  {category}: {count} days ({percentage:.1f}%)")
        
        # Save submission (keep the path in a local variable, not as a DataFrame attribute)
        submission_path = "submission.parquet.csv"
        submission.to_csv(submission_path, index=False)
        print(f"\nSubmission saved to: {submission_path}")
        # Display first few rows of submission
        print("\nFirst 10 rows of submission:")
        print(submission.head(10))
        
        return submission, allocations, raw_predictions
        
    except Exception as e:
        print(f"Error in main execution: {str(e)}")
        print("Attempting to create a fallback submission...")
        
        # Fallback: create a simple market-weight submission
        test_df = pd.read_csv(test_path)
        date_col = test_df.columns[0]
        fallback_submission = pd.DataFrame({
            'Date': test_df[date_col],
            'Allocation': 1.0  # Market weight
        })
        fallback_submission.to_csv("submission.parquet.csv", index=False)
        print("Fallback submission created with market weight (1.0)")
        return fallback_submission, np.ones(len(test_df)), np.zeros(len(test_df))

if __name__ == "__main__":
    submission, allocations, raw_predictions = main()

Step 1: Loading data...
Training data shape: (8990, 98)
Test data shape: (10, 99)

Training columns: ['date_id', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 'D9', 'E1', 'E10', 'E11', 'E12', 'E13', 'E14', 'E15', 'E16', 'E17', 'E18', 'E19', 'E2', 'E20', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 'E9', 'I1', 'I2', 'I3', 'I4', 'I5', 'I6', 'I7', 'I8', 'I9', 'M1', 'M10', 'M11', 'M12', 'M13', 'M14', 'M15', 'M16', 'M17', 'M18', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8', 'M9', 'P1', 'P10', 'P11', 'P12', 'P13', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8', 'P9', 'S1', 'S10', 'S11', 'S12', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'V1', 'V10', 'V11', 'V12', 'V13', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'forward_returns', 'risk_free_rate', 'market_forward_excess_returns']
Test columns: ['date_id', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 'D9', 'E1', 'E10', 'E11', 'E12', 'E13', 'E14', 'E15', 'E16', 'E17', 'E18', 'E19', 'E2', 'E20', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 'E9', 'I1', 'I2',

Key fixes made:

*   **Automatic Price Column Detection:** The code now finds the price column by checking common names such as 'Close', 'close', and 'Price'. If none is present, it falls back to the first numeric column.
*   **Better Error Handling:** Added error handling around the full pipeline in `main()`, with fallback mechanisms.
*   **Data Validation:** Added checks for empty datasets and invalid features.
*   **Flexible Column Handling:** The code adapts to whatever column names are actually in your dataset.
*   **Fallback Submission:** If anything goes wrong, it creates a simple market-weight submission as a fallback.

The code will now:

*   Print the actual column names in your dataset.
*   Automatically identify which column contains price data.
*   Adapt the feature engineering to use the available columns.
*   Provide detailed debugging information.
*   Create a valid submission even if there are issues.
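
One caveat follows from the printed column list: the training data contains no `Close` or `Price` column, so the price search falls back to the first numeric column, which here is `date_id`. The data does, however, already include `forward_returns`, `risk_free_rate`, and `market_forward_excess_returns`. A minimal sketch of using the provided target directly is shown below. It assumes `market_forward_excess_returns` is the intended prediction target, which the column name suggests but the output above does not confirm; the helper name `split_features_and_target` is hypothetical.

```python
import pandas as pd

# Label-like columns taken from the printed training column list.
LABEL_COLUMNS = ["forward_returns", "risk_free_rate", "market_forward_excess_returns"]
TARGET_COLUMN = "market_forward_excess_returns"  # assumed target

def split_features_and_target(train_df: pd.DataFrame):
    """Return (X, y) using the dataset's own target column (illustrative)."""
    feature_columns = [
        col for col in train_df.columns
        if col not in LABEL_COLUMNS and col != "date_id"
    ]
    return train_df[feature_columns], train_df[TARGET_COLUMN]

# Tiny synthetic frame that mimics a few of the printed column names.
demo = pd.DataFrame({
    "date_id": [0, 1, 2],
    "D1": [0, 1, 0],
    "M1": [0.1, 0.2, 0.3],
    "forward_returns": [0.01, -0.02, 0.005],
    "risk_free_rate": [0.0001, 0.0001, 0.0001],
    "market_forward_excess_returns": [0.009, -0.021, 0.004],
})
X, y = split_features_and_target(demo)
```

Keeping the label-like columns out of `X` matters here, because leaving forward-looking columns among the features would leak the outcome into training.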

