In [33]:
"""
This script performs a complete weather and rain forecasting workflow:
1.  **GLDASFetcher**: Fetches historical weather data from the NASA POWER API
    for a user-specified location.
2.  **WeatherForecaster**:
    a.  Cleans and prepares the fetched data.
    b.  Trains a separate Facebook Prophet model for each weather variable
        (e.g., temp, humidity) to learn its seasonal patterns.
    c.  Saves these Prophet models to disk.
    d.  Builds a complex scikit-learn pipeline that:
        i.   Uses the trained Prophet models as feature generators.
        ii.  Uses the raw (now forecasted) weather data as another set of features.
        iii. Feeds both sets of features into an XGBoostClassifier to predict
             the *probability of rain*.
    e.  Evaluates all models (Prophet regression models and the final XGBoost
        classification model).
    f.  Provides a `model()` method that takes a future date and interval,
        generates a full weather forecast, and predicts the likelihood of rain
        for each day.
"""

import warnings
import requests
import time
import pickle as pk
import joblib  

from datetime import datetime, timedelta
from geopy.geocoders import Nominatim
import pandas as pd
import numpy as np

from prophet import Prophet

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    confusion_matrix,
    precision_recall_curve,
    auc,
    mean_absolute_error,
    mean_squared_error,
    r2_score
)


# Suppress a specific warning from sklearn about unfitted pipelines
warnings.filterwarnings("ignore", message="This Pipeline instance is not fitted yet")


# === PART 1: DATA FETCHING ===

class GLDASFetcher:
    """
    Handles fetching of daily weather data from the NASA POWER API.
    """
    def __init__(self):
        """
        Initialize the NASA POWER API fetcher.
        """
        # NASA POWER API is free and doesn't require credentials
        
        # Map our desired simple names to the NASA POWER API parameter codes
        self.variables_map = {
            'temp': ['T2M_MAX', 'T2M_MIN'],       # Daily max/min temp at 2m
            'humidity': ['QV2M'],                 # Specific humidity at 2m
            'pressure': ['PS'],                   # Surface pressure
            'precipitation': ['PRECTOTCORR'],     # Corrected daily precipitation
            'solar_rad': ['ALLSKY_SFC_SW_DWN'],   # All-sky surface shortwave downward radiation
            'wind_speed': ['WS2M']                # Wind speed at 2m
        }
        
        # Create a reverse map to rename API columns to friendly names
        self.api_to_friendly_map = {
            'T2M_MAX': 'temp_max',
            'T2M_MIN': 'temp_min',
            'QV2M': 'humidity_specific',
            'PS': 'pressure',
            'PRECTOTCORR': 'precipitation_total',
            'ALLSKY_SFC_SW_DWN': 'solar_radiation',
            'WS2M': 'wind_speed',
            'latitude': 'lat',      # Will be added manually
            'longitude': 'lon'      # Will be added manually
        }

    def get_data(self, lat, lon, start_date, end_date, variables=None):
        """
        Fetch daily NASA POWER data for a single location and date range.

        Args:
            lat (float): Latitude
            lon (float): Longitude
            start_date (str): Start date "YYYY-MM-DD"
            end_date (str): End date "YYYY-MM-DD"
            variables (list, optional): List of keys from `self.variables_map`.
                                        Defaults to all.

        Returns:
            pd.DataFrame: A DataFrame containing the time-series weather data,
                          or an empty DataFrame if the API call fails.
        """
        if variables is None:
            variables = list(self.variables_map.keys())
    
        base_url = "https://power.larc.nasa.gov/api/temporal/daily/point"
    
        # Collect all required NASA POWER parameter codes
        power_params = []
        for var in variables:
            power_params.extend(self.variables_map.get(var, []))
    
        params = {
            'parameters': ','.join(power_params),
            'community': 'RE',  # Renewable Energy community
            'longitude': lon,
            'latitude': lat,
            'start': start_date.replace("-", ""),  # API needs YYYYMMDD
            'end': end_date.replace("-", ""),
            'format': 'JSON'
        }
    
        print(f"üåç Fetching NASA POWER data for ({lat}, {lon}) from {start_date} to {end_date} ...")
        
        try:
            response = requests.get(base_url, params=params, timeout=60)
            
            # Check for HTTP errors
            response.raise_for_status() 
    
        except requests.exceptions.HTTPError as e:
            print(f"‚ùå HTTP Error: {e}")
            print(f"   Response content: {response.text}")
            return pd.DataFrame()
        except requests.exceptions.RequestException as e:
            print(f"‚ùå API error (e.g., timeout, connection issue): {e}")
            return pd.DataFrame()

        # Process the JSON response
        try:
            data = response.json()['properties']['parameter']
        except KeyError:
            print("‚ùå API Error: Unexpected JSON structure. 'properties' or 'parameter' key not found.")
            print(f"   Response: {response.json()}")
            return pd.DataFrame()

        # Build dataframe manually from the nested JSON
        records = {}
        for var_code, timeseries in data.items():
            for date_str, value in timeseries.items():
                # NASA POWER uses -999 as a fill value for missing data
                value = np.nan if value == -999 else value
                
                if date_str not in records:
                    records[date_str] = {}
                records[date_str][var_code] = value
    
        if not records:
            print("‚ùå No data records found in response.")
            return pd.DataFrame()

        # Convert the dictionary of records into a DataFrame
        df = pd.DataFrame.from_dict(records, orient='index')
        df.index = pd.to_datetime(df.index, format="%Y%m%d")
        df.index.name = "date"
        df.reset_index(inplace=True)
        
        # Add metadata
        df['latitude'] = lat
        df['longitude'] = lon

        # Rename columns from API codes to friendly names
        df.rename(columns=self.api_to_friendly_map, inplace=True)
        
        # Ensure all expected columns exist, even if data was missing
        expected_cols = ['date', 'lat', 'lon'] + [self.api_to_friendly_map[p] for p in power_params]
        for col in expected_cols:
            if col not in df.columns:
                df[col] = np.nan

        print(f"‚úÖ Retrieved {len(df)} daily records")
        return df

    def get_location_by_address(self, address):
        """
        Get latitude and longitude from a string address using Nominatim.
        Retries recursively on failure.

        Args:
            address (str): The address to geocode (e.g., "Paris, France").

        Returns:
            dict: The raw location data from geopy, or None if it fails.
        """
        time.sleep(1) 
        geolocator = Nominatim(user_agent="gldas_fetcher")
        try:
            return geolocator.geocode(address).raw
        except Exception as e:
            print(f"Warning: Geocoding failed ({e}). Retrying...")
            return self.get_location_by_address(address)  # Recursive retry


# === PART 2: WEATHER & RAIN FORECASTING ===

class WeatherForecaster:
    """
    Handles training and prediction for weather variables (Prophet) and
    rain probability (XGBoost).
    """
    def __init__(self, data_path, pipeline_path="pipeline_ensemble.pkl"):
        """
        Initialize the forecaster.

        Args:
            data_path (str): Path to the CSV file with weather data.
            pipeline_path (str, optional): Path to save/load the final
                                           XGBoost pipeline.
        """
        self.data_path = data_path
        self.pipeline_path = pipeline_path
        
        # Load and clean the main dataset
        self.df = self._load_and_clean_data(self.data_path)
        
        # Define the target variables for Prophet (all columns except metadata)
        self.targets = self.df.columns.drop(["date", "day_of_year", "lat", "lon", "did_rain"])
        
        # Placeholders for models
        self.models = None  # Will hold the loaded Prophet models
        self.pipeline = None  # Will hold the loaded XGBoost pipeline
        self.best_threshold = None # Optimal threshold for rain classification

    def _load_and_clean_data(self, path):
        """
        Loads the CSV, performs cleaning, and engineers features.
        This is the base dataset for all models.
        """
        print(f"\nüßπ Loading and cleaning data from {path}...")
        df = pd.read_csv(path)
        df['date'] = pd.to_datetime(df['date'])
        
        # Feature Engineering
        df['day_of_year'] = df['date'].dt.dayofyear
        
        # Create the binary target variable 'did_rain'
        # (0.2 mm is a common threshold for "trace" precipitation)
        df['did_rain'] = (df['precipitation_total'] >= 0.2).astype(int)
        
        # Replace NASA's fill value (-999) with NaN, just in case
        df.replace(-999.0000, np.nan, inplace=True)
        
        # We drop 'precipitation_total' as it's the basis for our target
        # 'wind_speed' is dropped here as it was likely found to be 'noise'
        # during original model development.
        df.drop(columns=["wind_speed", "precipitation_total"], inplace=True)
        
        # Drop any rows with missing data to ensure model stability
        df.dropna(inplace=True)
        print("‚úÖ Data cleaning complete.")
        return df

    def _prepare_prophet_df(self, target_column):
        """
        Extracts and renames the 'date' and a target column
        from the main DataFrame, preparing it for Prophet.
        """
        # Select the date and the specific target variable
        df_prophet = self.df[['date', target_column]].copy()
        
        # Prophet requires columns to be named 'ds' (date) and 'y' (target)
        df_prophet.rename(columns={'date': 'ds', target_column: 'y'}, inplace=True)
        
        return df_prophet

    def train_and_save_prophet(self):
        """
        Trains one Prophet model for each target variable in `self.targets`
        and saves it to a .pkl file.
        """
        print("\nüöÇ Training Prophet models for each weather variable...")
        for target in self.targets:
            # Get the formatted data for this target
            df_prophet = self._prepare_prophet_df(target)
            
            # Initialize Prophet model
            # We only care about the yearly pattern
            model = Prophet(
                yearly_seasonality=True,
                weekly_seasonality=False,
                daily_seasonality=False
            )
            
            # Fit the model
            model.fit(df_prophet)
            
            # Save the model
            filename = f'prophet_model_{target}.pkl'
            with open(filename, 'wb') as file:
                pk.dump(model, file)
            print(f"   -> Model for '{target}' trained and saved to {filename}")

    def gathering_models(self):
        """
        Loads all the pickled Prophet models from disk into `self.models`.
        """
        print("\nüì• Loading all trained Prophet models...")
        all_models = []
        for target in self.targets:
            filename = f'prophet_model_{target}.pkl'
            try:
                with open(filename, 'rb') as file:
                    all_models.append(pk.load(file))
                print(f"   -> Loaded {filename}")
            except FileNotFoundError:
                print(f"   -> ‚ùå ERROR: Model file not found: {filename}")
                print("   -> Please run `train_and_save_prophet()` first.")
                return None
        
        self.models = all_models
        print("‚úÖ All Prophet models loaded.")
        return all_models

    def predict_func(self, start_date, interval):
        """
        Uses the loaded Prophet models to forecast all weather variables.

        This function creates the *input features* that will be fed into
        the final rain prediction (XGBoost) pipeline.

        Args:
            start_date (str): The starting date in 'YYYY-MM-DD' format.
            interval (int): The number of future days to forecast.

        Returns:
            pd.DataFrame: A DataFrame containing the forecasted weather variables.
        """
        # 1. Create a DataFrame with the future dates for Prophet
        future_dates_df = pd.DataFrame({
            'ds': pd.date_range(start=start_date, periods=interval, freq='D')
        })

        # 2. Start building the final DataFrame, beginning with the date column
        future_weather_df = pd.DataFrame({'date': future_dates_df['ds']})

        # 3. Loop through each Prophet model to forecast its specific weather variable
        print("üîÆ Forecasting future weather conditions with Prophet...")
        
        if not self.models:
            print("‚ùå No Prophet models loaded. Running `gathering_models()`...")
            self.gathering_models()

        # This assumes self.targets and self.models are in the same order
        for target_variable, model in zip(self.targets, self.models):
            # Use the model to predict values for the future dates
            forecast = model.predict(future_dates_df)

            # Extract the forecasted values ('yhat') and rename
            future_values = forecast[['ds', 'yhat']].rename(
                columns={'ds': 'date', 'yhat': target_variable}
            )

            # Merge this forecast into our main weather DataFrame
            future_weather_df = pd.merge(future_weather_df, future_values, on='date')
        
        # 4. Ensure the column order matches exactly what the pipeline was trained on
        #    This is the *input* format for the `save_pipeline` pipeline.
        required_columns = ['date', 'temp_max', 'temp_min', 'humidity_specific', 'pressure', 'solar_radiation']
        future_weather_df = future_weather_df[required_columns]

        print("‚úÖ Weather forecast complete.")
        return future_weather_df

    class ProphetWrapper(BaseEstimator, TransformerMixin):
        """
        A custom wrapper to make a fitted Prophet model act like a
        scikit-learn transformer, allowing it to be used in a Pipeline.
        """
        def __init__(self, model):
            self.model = model

        def fit(self, X, y=None):
            # The model is already fitted, so fit does nothing
            return self

        def transform(self, X):
            """
            `X` is expected to be a column of dates.
            `transform` will return the 'yhat' (forecast) for those dates.
            """
            # 1. Convert input (which is just a date series) into
            #    the DataFrame format Prophet expects.
            future = pd.DataFrame({'ds': X.flatten()})
            
            # 2. Make predictions
            forecast = self.model.predict(future)
            
            # 3. Return *only* the 'yhat' value as a 2D array
            return forecast[['yhat']].values

    def save_pipeline(self):
        """
        Builds, trains, and saves the final "ensemble" pipeline that
        predicts rain using XGBoost.
        """
        print("\nüõ†Ô∏è Building and training final rain prediction pipeline...")

        # --- 1. Define Feature Sets ---
        # The 'date' column will be used by the Prophet transformers
        date_feature = ['date'] 
        # These columns will be used by the standard scaling pipeline
        weather_features = ['temp_max', 'temp_min', 'humidity_specific', 'pressure', "solar_radiation"] 
        
        # --- 2. Create the Prophet Forecasting Pipeline Branch ---
        # This branch takes the 'date' column and, for each date,
        # generates a *new* set of weather forecasts using Prophet.
        
        if not self.models:
            print("‚ùå No Prophet models found. Running `gathering_models()`...")
            self.gathering_models()
            
        p_models = self.models
        wrapped_prophets = [
            (f'prophet_{target}', self.ProphetWrapper(model))
            for target, model in zip(self.targets, p_models)
        ]
        
        # Use FeatureUnion to run all Prophet models in parallel
        prophet_forecasters = FeatureUnion(wrapped_prophets)
        
        # This pipeline selects *only* the 'date' column and passes it
        # to the bank of Prophet forecasters.
        prophet_pipeline = Pipeline([
            ('selector', ColumnTransformer(
                [('date_selector', 'passthrough', [0])],  # Assumes 'date' is at index 0
                remainder='drop'
            )),
            ('prophet_features', prophet_forecasters)
        ])
        
        # --- 3. Create the Standard Weather Pipeline Branch ---
        # This branch takes the *actual* weather features, scales them,
        # and passes them through.
        weather_pipeline = Pipeline([
            ('selector', ColumnTransformer(
                # Select all columns that are in our 'weather_features' list
                [('weather_selector', 'passthrough', [i for i, col in enumerate(self.df.drop(columns='did_rain').columns) if col in weather_features])],
                remainder='drop'
            )),
            ('scaler', StandardScaler())
        ])
        
        # --- 4. Combine Both Branches ---
        # The final feature set for XGBoost will be:
        # [prophet_temp, prophet_hum, ..., scaled_actual_temp, scaled_actual_hum, ...]
        combined_features = FeatureUnion([
            ('prophet_pipeline', prophet_pipeline),
            ('weather_pipeline', weather_pipeline)
        ])
        
        # --- 5. Split Data for Training ---
        X = self.df.drop(columns='did_rain')
        y = self.df['did_rain'].values
        
        # We MUST set shuffle=False for time-series data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, shuffle=False
        )
        
        # Calculate imbalance ratio for `scale_pos_weight`
        # This helps the model pay more attention to the rare 'rain' class
        neg, pos = np.bincount(y)
        imbalance_ratio = neg / pos
        
        # --- 6. Create and Train the Final Pipeline ---
        final_pipeline = Pipeline([
            ('features', combined_features),
            ('xgb_classifier', XGBClassifier(
                n_estimators=550,
                max_depth=8,
                learning_rate=0.05,
                subsample=0.7,
                scale_pos_weight=imbalance_ratio,
                random_state=42,
                n_jobs=-1
            ))
        ])
        
        print("   -> Fitting XGBoost pipeline... (This may take a moment)")
        final_pipeline.fit(X_train, y_train)
        
        # Save the trained pipeline
        joblib.dump(final_pipeline, self.pipeline_path)
        print(f"‚úÖ Pipeline trained and saved to {self.pipeline_path}")

        # --- 7. Evaluate the Pipeline ---
        print("\nüìä Evaluating rain prediction (XGBoost) model on Test Set...")
        y_pred_proba = final_pipeline.predict_proba(X_test)[:, 1] # Prob of '1' (rain)
        
        # Find the best threshold for F1-score
        precision, recall, thresholds = precision_recall_curve(y_test, y_pred_proba)
        
        # Add a small epsilon (1e-9) to avoid division by zero
        f1_scores = 2 * recall * precision / (recall + precision + 1e-9)
        
        # Find the threshold that gives the best F1 score
        self.best_threshold = thresholds[np.argmax(f1_scores)]
        
        print(f"   -> Best Threshold (for F1 Score): {self.best_threshold:.4f}")
        
        # Apply the best threshold to get binary predictions
        y_pred = (y_pred_proba >= self.best_threshold).astype(int)
        
        # Print classification metrics
        print(f"   -> Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
        print(f"   -> Precision: {precision_score(y_test, y_pred):.4f}")
        print(f"   -> Recall:    {recall_score(y_test, y_pred):.4f}")
        print(f"   -> F1 Score:  {f1_score(y_test, y_pred):.4f}")
        print(f"   -> ROC AUC:   {roc_auc_score(y_test, y_pred_proba):.4f}")
        print("   -> Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

    def calculate_metrics(self, test, train, model):
        """
        Helper function to calculate regression metrics for Prophet models.
        """
        # Get predictions
        train_pred = model.predict(train)
        test_pred = model.predict(test)
    
        y_true = test['y'].values
        y_pred = test_pred['yhat'].values
    
        y_train = train['y'].values
        yhat_train = train_pred['yhat'].values
    
        # Symmetric Mean Absolute Percentage Error
        smape = 100 * np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred) + 1e-8))
    
        metrics = {
            'MAE': mean_absolute_error(y_true, y_pred),
            'RMSE': np.sqrt(mean_squared_error(y_true, y_pred)),
            'R2': r2_score(y_true, y_pred),
            "SMAPE": smape,
            'Mean_Error': np.mean(y_pred - y_true),  # Bias
            'Training MAE': mean_absolute_error(y_train, yhat_train)
        }
        
        for metric, value in metrics.items():
            print(f"   -> {metric}: {value:.2f}")

        return metrics

    def model(self):
        """
        Main user-facing function to generate a complete forecast.
        
        Prompts the user for a start date and interval, then returns
        the rain probability and the detailed weather forecast.

        Returns:
            (pd.DataFrame, pd.DataFrame):
                - proba_df: DataFrame with rain probabilities and recommendations.
                - weather_preds_df: DataFrame with detailed weather forecasts.
        """
        # 1. Load the main rain prediction pipeline
        try:
            self.pipeline = joblib.load(self.pipeline_path)
        except FileNotFoundError:
            print(f"‚ùå Pipeline file not found at {self.pipeline_path}.")
            print("   -> Please run `save_pipeline()` first.")
            return None, None
        
        # 2. Get the date and interval from the user
        start_date = input("\nEnter the start date (YYYY-MM-DD): ")
        interval = int(input("Enter the number of days to forecast: "))
        print("=" * 50)

        # 3. Call the helper function to get the weather forecast for the full interval.
        #    This `weather_preds_df` contains all the features needed for the next step.
        weather_preds_df = self.predict_func(start_date, interval)

        # 4. Use the complete weather forecast to predict the probability of rain.
        #    The pipeline receives a DataFrame with multiple rows and 6 columns, just as it expects.
        rain_probabilities = self.pipeline.predict_proba(weather_preds_df)

        # 5. Format the rain probability results into a clean DataFrame
        proba_df = pd.DataFrame(
            (rain_probabilities * 100).round(2),
            columns=["Prob. of No Rain (%)", "Prob. of Rain (%)"],
            index=weather_preds_df['date']  # Use the future dates as the index
        )
        
        # 6. Add a human-readable recommendation
        #    (Use a default threshold if `save_pipeline` hasn't been run)
        best_threshold_percent = (self.best_threshold or 0.5) * 100
        
        proba_df["Recommendation"] = proba_df['Prob. of Rain (%)'].apply(
            lambda x: "Take an umbrella! ‚òî" if x >= best_threshold_percent else "Enjoy the clear skies! ‚òÄÔ∏è")
        

        # 7. Return both the rain probabilities and the detailed weather predictions
        return proba_df, weather_preds_df.set_index('date')
    
    def prophet_metrics(self):
        """
        Calculates and prints regression metrics for all the
        individual Prophet models.
        """
        print("\nüìà Evaluating individual Prophet (weather) models...")
        
        models = self.gathering_models()
        if not models:
            return

        for i, target in enumerate(self.targets):
            print("-" * 70)
            print(f"Evaluation Results for: {target}")
            
            # Get the data for this target
            df_prophet = self._prepare_prophet_df(target)
            
            # Split the data (must be same split as XGBoost)
            train_df, test_df = train_test_split(df_prophet, test_size=0.2, shuffle=False)
            
            # Calculate and print metrics
            self.calculate_metrics(test_df, train_df, models[i])
            print("-" * 70)


# === PART 3: SCRIPT EXECUTION ===

if __name__ == "__main__":
    
    # --- Phase 1: Fetch Data ---
    print("üöÄ NASA GLDAS Data Fetcher & Rain Forecaster")
    print("=" * 50)
    
    # Initialize the fetcher
    fetcher = GLDASFetcher()
    city = input("Enter the city name (e.g., 'Cairo, Egypt'): ")
    
    location = fetcher.get_location_by_address(city)
    lat = location["lat"]
    lon = location["lon"]
    city_name = location['display_name']

    print(f"\nGeocoded {city} to: {city_name} ({lat}, {lon})")
    
    # Define a long historical period for robust model training
    start_date = "1984-01-01"
    # Fetch data up to yesterday
    end_date = (datetime.now() - timedelta(days=6)).strftime("%Y-%m-%d")

    
    data = fetcher.get_data(
        lat=lat,
        lon=lon,
        start_date=start_date,
        end_date=end_date,
        variables=['temp', 'humidity', 'pressure', 'precipitation', 'solar_rad', 'wind_speed']
    )
    
    if data.empty:
        print("‚ùå No data retrieved. Exiting script.")
    else:
        # Save the data to a CSV file
        data_path = "nasa_daily_weather_data.csv"
        data.to_csv(data_path, index=False)
        print(f"\n‚úÖ Data successfully saved to {data_path}")
        
        # --- Phase 2: Train and Run Forecaster ---
        
        # Initialize the forecaster with the data we just saved
        wf = WeatherForecaster(data_path=data_path)
        
        # 1. Train and save all the Prophet models (temp, humidity, etc.)
        wf.train_and_save_prophet()
        
        # 2. Evaluate the Prophet models
        wf.prophet_metrics()
        
        # 3. Train and save the main XGBoost rain prediction pipeline
        wf.save_pipeline()
        
        # --- Phase 3: Get a Forecast ---
        print("\n\n" + "=" * 50)
        print("üéâ All models trained! Let's get a forecast.")
        print("=" * 50)
        
        # 4. Run the main model function
        rain_forecast, weather_forecast = wf.model()
        
        if rain_forecast is not None:
            print("\n--- ‚òî Rain Forecast ---")
            print(rain_forecast)
            
            print("\n--- üå°Ô∏è Detailed Weather Forecast ---")
            print(weather_forecast.round(2))

üöÄ NASA GLDAS Data Fetcher & Rain Forecaster


Enter the city name (e.g., 'Cairo, Egypt'):  giza



Geocoded giza to: ÿßŸÑÿ¨Ÿäÿ≤ÿ©, 12524, ŸÖÿµÿ± (29.9870753, 31.2118063)
üåç Fetching NASA POWER data for (29.9870753, 31.2118063) from 1984-01-01 to 2025-10-31 ...
‚úÖ Retrieved 15280 daily records

‚úÖ Data successfully saved to nasa_daily_weather_data.csv

üßπ Loading and cleaning data from nasa_daily_weather_data.csv...
‚úÖ Data cleaning complete.

üöÇ Training Prophet models for each weather variable...


16:57:50 - cmdstanpy - INFO - Chain [1] start processing
16:57:55 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'temp_max' trained and saved to prophet_model_temp_max.pkl


16:57:57 - cmdstanpy - INFO - Chain [1] start processing
16:58:02 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'temp_min' trained and saved to prophet_model_temp_min.pkl


16:58:05 - cmdstanpy - INFO - Chain [1] start processing
16:58:09 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'humidity_specific' trained and saved to prophet_model_humidity_specific.pkl


16:58:13 - cmdstanpy - INFO - Chain [1] start processing
16:58:18 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'pressure' trained and saved to prophet_model_pressure.pkl


16:58:21 - cmdstanpy - INFO - Chain [1] start processing
16:58:24 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'solar_radiation' trained and saved to prophet_model_solar_radiation.pkl

üìà Evaluating individual Prophet (weather) models...

üì• Loading all trained Prophet models...
   -> Loaded prophet_model_temp_max.pkl
   -> Loaded prophet_model_temp_min.pkl
   -> Loaded prophet_model_humidity_specific.pkl
   -> Loaded prophet_model_pressure.pkl
   -> Loaded prophet_model_solar_radiation.pkl
‚úÖ All Prophet models loaded.
----------------------------------------------------------------------
Evaluation Results for: temp_max
   -> MAE: 2.14
   -> RMSE: 2.83
   -> R2: 0.87
   -> SMAPE: 7.75
   -> Mean_Error: -0.00
   -> Training MAE: 2.27
----------------------------------------------------------------------
----------------------------------------------------------------------
Evaluation Results for: temp_min
   -> MAE: 1.57
   -> RMSE: 2.04
   -> R2: 0.88
   -> SMAPE: 12.69
   -> Mean_Error: 0.00
   -> Training MAE: 1.60
-------------------------------------------------------


Enter the start date (YYYY-MM-DD):  2025-12-10
Enter the number of days to forecast:  10


üîÆ Forecasting future weather conditions with Prophet...
‚úÖ Weather forecast complete.

--- ‚òî Rain Forecast ---
            Prob. of No Rain (%)  Prob. of Rain (%)             Recommendation
date                                                                          
2025-12-10             83.540001          16.459999  Enjoy the clear skies! ‚òÄÔ∏è
2025-12-11             80.570000          19.430000  Enjoy the clear skies! ‚òÄÔ∏è
2025-12-12             81.220001          18.780001  Enjoy the clear skies! ‚òÄÔ∏è
2025-12-13             68.879997          31.120001  Enjoy the clear skies! ‚òÄÔ∏è
2025-12-14             46.040001          53.959999        Take an umbrella! ‚òî
2025-12-15             38.669998          61.330002        Take an umbrella! ‚òî
2025-12-16             36.849998          63.150002        Take an umbrella! ‚òî
2025-12-17             54.880001          45.119999        Take an umbrella! ‚òî
2025-12-18             71.550003          28.450001  Enjoy the clear 

In [36]:
weather_forecast

Unnamed: 0_level_0,temp_max,temp_min,humidity_specific,pressure,solar_radiation
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
2025-12-10,21.992522,10.70317,6.919742,100.248645,3.540198
2025-12-11,21.841458,10.563173,6.887751,100.253842,3.529355
2025-12-12,21.694178,10.424236,6.857209,100.258735,3.519805
2025-12-13,21.55077,10.28672,6.827947,100.263317,3.511488
2025-12-14,21.411326,10.150999,6.799797,100.267586,3.504342
2025-12-15,21.275946,10.017456,6.772589,100.271545,3.498294
2025-12-16,21.144749,9.886478,6.746162,100.275201,3.493271
2025-12-17,21.01787,9.758452,6.720363,100.278565,3.489202
2025-12-18,20.895467,9.633759,6.695048,100.281654,3.486015
2025-12-19,20.777723,9.512769,6.67009,100.284487,3.483645


In [37]:
rain_forecast

Unnamed: 0_level_0,Prob. of No Rain (%),Prob. of Rain (%),Recommendation
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2025-12-10,83.540001,16.459999,Enjoy the clear skies! ‚òÄÔ∏è
2025-12-11,80.57,19.43,Enjoy the clear skies! ‚òÄÔ∏è
2025-12-12,81.220001,18.780001,Enjoy the clear skies! ‚òÄÔ∏è
2025-12-13,68.879997,31.120001,Enjoy the clear skies! ‚òÄÔ∏è
2025-12-14,46.040001,53.959999,Take an umbrella! ‚òî
2025-12-15,38.669998,61.330002,Take an umbrella! ‚òî
2025-12-16,36.849998,63.150002,Take an umbrella! ‚òî
2025-12-17,54.880001,45.119999,Take an umbrella! ‚òî
2025-12-18,71.550003,28.450001,Enjoy the clear skies! ‚òÄÔ∏è
2025-12-19,84.25,15.75,Enjoy the clear skies! ‚òÄÔ∏è


In [None]:
# === PART 2: WEATHER & RAIN FORECASTING ===

class WeatherForecaster:
    """
    Handles training and prediction for weather variables (Prophet) and
    rain probability (XGBoost).
    """
    def __init__(self, data_path, pipeline_path="pipeline_ensemble.pkl"):
        """
        Initialize the forecaster.

        Args:
            data_path (str): Path to the CSV file with weather data.
            pipeline_path (str, optional): Path to save/load the final
                                           XGBoost pipeline.
        """
        self.data_path = data_path
        self.pipeline_path = pipeline_path
        
        # Load and clean the main dataset
        self.df = self._load_and_clean_data(self.data_path)
        
        # Define the target variables for Prophet (all columns except metadata)
        self.targets = self.df.columns.drop(["date", "day_of_year", "lat", "lon", "did_rain"])
        
        # Placeholders for models
        self.models = None  # Will hold the loaded Prophet models
        self.pipeline = None  # Will hold the loaded XGBoost pipeline
        self.best_threshold = None # Optimal threshold for rain classification

    def _load_and_clean_data(self, path):
        """
        Loads the CSV, performs cleaning, and engineers features.
        This is the base dataset for all models.
        """
        print(f"\nüßπ Loading and cleaning data from {path}...")
        df = pd.read_csv(path)
        df['date'] = pd.to_datetime(df['date'])
        
        # Feature Engineering
        df['day_of_year'] = df['date'].dt.dayofyear
        
        # Create the binary target variable 'did_rain'
        # (0.2 mm is a common threshold for "trace" precipitation)
        df['did_rain'] = (df['precipitation_total'] >= 0.2).astype(int)
        
        # Replace NASA's fill value (-999) with NaN, just in case
        df.replace(-999.0000, np.nan, inplace=True)
        
        # We drop 'precipitation_total' as it's the basis for our target
        # 'wind_speed' is dropped here as it was likely found to be 'noise'
        # during original model development.
        df.drop(columns=["precipitation_total"], inplace=True)
        
        # Drop any rows with missing data to ensure model stability
        df.dropna(inplace=True)
        print("‚úÖ Data cleaning complete.")
        return df

    def _prepare_prophet_df(self, target_column):
        """
        Extracts and renames the 'date' and a target column
        from the main DataFrame, preparing it for Prophet.
        """
        # Select the date and the specific target variable
        df_prophet = self.df[['date', target_column]].copy()
        
        # Prophet requires columns to be named 'ds' (date) and 'y' (target)
        df_prophet.rename(columns={'date': 'ds', target_column: 'y'}, inplace=True)
        
        return df_prophet

    def train_and_save_prophet(self):
        """
        Trains one Prophet model for each target variable in `self.targets`
        and saves it to a .pkl file.
        """
        print("\nüöÇ Training Prophet models for each weather variable...")
        for target in self.targets:
            # Get the formatted data for this target
            df_prophet = self._prepare_prophet_df(target)
            
            # Initialize Prophet model
            # We only care about the yearly pattern
            model = Prophet(
                yearly_seasonality=True,
                weekly_seasonality=False,
                daily_seasonality=False
            )
            
            # Fit the model
            model.fit(df_prophet)
            
            # Save the model
            filename = f'prophet_model_{target}.pkl'
            with open(filename, 'wb') as file:
                pk.dump(model, file)
            print(f"   -> Model for '{target}' trained and saved to {filename}")

    def gathering_models(self):
        """
        Loads all the pickled Prophet models from disk into `self.models`.
        """
        print("\nüì• Loading all trained Prophet models...")
        all_models = []
        for target in self.targets:
            filename = f'prophet_model_{target}.pkl'
            try:
                with open(filename, 'rb') as file:
                    all_models.append(pk.load(file))
                print(f"   -> Loaded {filename}")
            except FileNotFoundError:
                print(f"   -> ‚ùå ERROR: Model file not found: {filename}")
                print("   -> Please run `train_and_save_prophet()` first.")
                return None
        
        self.models = all_models
        print("‚úÖ All Prophet models loaded.")
        return all_models

    def predict_func(self, start_date, interval):
        """
        Uses the loaded Prophet models to forecast all weather variables.

        This function creates the *input features* that will be fed into
        the final rain prediction (XGBoost) pipeline.

        Args:
            start_date (str): The starting date in 'YYYY-MM-DD' format.
            interval (int): The number of future days to forecast.

        Returns:
            pd.DataFrame: A DataFrame containing the forecasted weather variables.
        """
        # 1. Create a DataFrame with the future dates for Prophet
        future_dates_df = pd.DataFrame({
            'ds': pd.date_range(start=start_date, periods=interval, freq='D')
        })

        # 2. Start building the final DataFrame, beginning with the date column
        future_weather_df = pd.DataFrame({'date': future_dates_df['ds']})

        # 3. Loop through each Prophet model to forecast its specific weather variable
        print("üîÆ Forecasting future weather conditions with Prophet...")
        
        if not self.models:
            print("‚ùå No Prophet models loaded. Running `gathering_models()`...")
            self.gathering_models()

        # This assumes self.targets and self.models are in the same order
        for target_variable, model in zip(self.targets, self.models):
            # Use the model to predict values for the future dates
            forecast = model.predict(future_dates_df)

            # Extract the forecasted values ('yhat') and rename
            future_values = forecast[['ds', 'yhat']].rename(
                columns={'ds': 'date', 'yhat': target_variable}
            )

            # Merge this forecast into our main weather DataFrame
            future_weather_df = pd.merge(future_weather_df, future_values, on='date')
        
        # 4. Ensure the column order matches exactly what the pipeline was trained on
        #    This is the *input* format for the `save_pipeline` pipeline.
        required_columns = ['date', 'temp_max', 'temp_min', 'humidity_specific', 'pressure', 'solar_radiation',"evapotranspiration","soil_moisture_surface","dew_point_temp"]
        future_weather_df = future_weather_df[required_columns]

        print("‚úÖ Weather forecast complete.")
        return future_weather_df

    class ProphetWrapper(BaseEstimator, TransformerMixin):
        """
        A custom wrapper to make a fitted Prophet model act like a
        scikit-learn transformer, allowing it to be used in a Pipeline.
        """
        def __init__(self, model):
            self.model = model

        def fit(self, X, y=None):
            # The model is already fitted, so fit does nothing
            return self

        def transform(self, X):
            """
            `X` is expected to be a column of dates.
            `transform` will return the 'yhat' (forecast) for those dates.
            """
            # 1. Convert input (which is just a date series) into
            #    the DataFrame format Prophet expects.
            future = pd.DataFrame({'ds': X.flatten()})
            
            # 2. Make predictions
            forecast = self.model.predict(future)
            
            # 3. Return *only* the 'yhat' value as a 2D array
            return forecast[['yhat']].values

    def save_pipeline(self):
        """
        Builds, trains, and saves the final "ensemble" pipeline that
        predicts rain using XGBoost.
        """
        print("\nüõ†Ô∏è Building and training final rain prediction pipeline...")

        # --- 1. Define Feature Sets ---
        # The 'date' column will be used by the Prophet transformers
        date_feature = ['date'] 
        # These columns will be used by the standard scaling pipeline
        weather_features = ['temp_max', 'temp_min', 'humidity_specific', 'pressure', "solar_radiation"] 
        
        # --- 2. Create the Prophet Forecasting Pipeline Branch ---
        # This branch takes the 'date' column and, for each date,
        # generates a *new* set of weather forecasts using Prophet.
        
        if not self.models:
            print("‚ùå No Prophet models found. Running `gathering_models()`...")
            self.gathering_models()
            
        p_models = self.models
        wrapped_prophets = [
            (f'prophet_{target}', self.ProphetWrapper(model))
            for target, model in zip(self.targets, p_models)
        ]
        
        # Use FeatureUnion to run all Prophet models in parallel
        prophet_forecasters = FeatureUnion(wrapped_prophets)
        
        # This pipeline selects *only* the 'date' column and passes it
        # to the bank of Prophet forecasters.
        prophet_pipeline = Pipeline([
            ('selector', ColumnTransformer(
                [('date_selector', 'passthrough', [0])],  # Assumes 'date' is at index 0
                remainder='drop'
            )),
            ('prophet_features', prophet_forecasters)
        ])
        
        # --- 3. Create the Standard Weather Pipeline Branch ---
        # This branch takes the *actual* weather features, scales them,
        # and passes them through.
        weather_pipeline = Pipeline([
            ('selector', ColumnTransformer(
                # Select all columns that are in our 'weather_features' list
                [('weather_selector', 'passthrough', [i for i, col in enumerate(self.df.drop(columns='did_rain').columns) if col in weather_features])],
                remainder='drop'
            )),
            ('scaler', StandardScaler())
        ])
        
        # --- 4. Combine Both Branches ---
        # The final feature set for XGBoost will be:
        # [prophet_temp, prophet_hum, ..., scaled_actual_temp, scaled_actual_hum, ...]
        combined_features = FeatureUnion([
            ('prophet_pipeline', prophet_pipeline),
            ('weather_pipeline', weather_pipeline)
        ])
        
        # --- 5. Split Data for Training ---
        X = self.df.drop(columns='did_rain')
        y = self.df['did_rain'].values
        
        # We MUST set shuffle=False for time-series data
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, shuffle=False
        )
        
        # Calculate imbalance ratio for `scale_pos_weight`
        # This helps the model pay more attention to the rare 'rain' class
        neg, pos = np.bincount(y)
        imbalance_ratio = neg / pos
        
        # --- 6. Create and Train the Final Pipeline ---
        final_pipeline = Pipeline([
            ('features', combined_features),
            ('xgb_classifier', XGBClassifier(
                n_estimators=550,
                max_depth=8,
                learning_rate=0.05,
                subsample=0.7,
                scale_pos_weight=imbalance_ratio,
                random_state=42,
                n_jobs=-1
            ))
        ])
        
        print("   -> Fitting XGBoost pipeline... (This may take a moment)")
        final_pipeline.fit(X_train, y_train)
        
        # Save the trained pipeline
        joblib.dump(final_pipeline, self.pipeline_path)
        print(f"‚úÖ Pipeline trained and saved to {self.pipeline_path}")

        # --- 7. Evaluate the Pipeline ---
        print("\nüìä Evaluating rain prediction (XGBoost) model on Test Set...")
        y_pred_proba = final_pipeline.predict_proba(X_test)[:, 1] # Prob of '1' (rain)
        
        # Find the best threshold for F1-score
        precision, recall, thresholds = precision_recall_curve(y_test, y_pred_proba)
        
        # Add a small epsilon (1e-9) to avoid division by zero
        f1_scores = 2 * recall * precision / (recall + precision + 1e-9)
        
        # Find the threshold that gives the best F1 score
        self.best_threshold = thresholds[np.argmax(f1_scores)]
        
        print(f"   -> Best Threshold (for F1 Score): {self.best_threshold:.4f}")
        
        # Apply the best threshold to get binary predictions
        y_pred = (y_pred_proba >= self.best_threshold).astype(int)
        
        # Print classification metrics
        print(f"   -> Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
        print(f"   -> Precision: {precision_score(y_test, y_pred):.4f}")
        print(f"   -> Recall:    {recall_score(y_test, y_pred):.4f}")
        print(f"   -> F1 Score:  {f1_score(y_test, y_pred):.4f}")
        print(f"   -> ROC AUC:   {roc_auc_score(y_test, y_pred_proba):.4f}")
        print("   -> Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

    def calculate_metrics(self, test, train, model):
        """
        Helper function to calculate regression metrics for Prophet models.
        """
        # Get predictions
        train_pred = model.predict(train)
        test_pred = model.predict(test)
    
        y_true = test['y'].values
        y_pred = test_pred['yhat'].values
    
        y_train = train['y'].values
        yhat_train = train_pred['yhat'].values
    
        # Symmetric Mean Absolute Percentage Error
        smape = 100 * np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred) + 1e-8))
    
        metrics = {
            'MAE': mean_absolute_error(y_true, y_pred),
            'RMSE': np.sqrt(mean_squared_error(y_true, y_pred)),
            'R2': r2_score(y_true, y_pred),
            "SMAPE": smape,
            'Mean_Error': np.mean(y_pred - y_true),  # Bias
            'Training MAE': mean_absolute_error(y_train, yhat_train)
        }
        
        for metric, value in metrics.items():
            print(f"   -> {metric}: {value:.2f}")

        return metrics

    def model(self):
        """
        Main user-facing function to generate a complete forecast.
        
        Prompts the user for a start date and interval, then returns
        the rain probability and the detailed weather forecast.

        Returns:
            (pd.DataFrame, pd.DataFrame):
                - proba_df: DataFrame with rain probabilities and recommendations.
                - weather_preds_df: DataFrame with detailed weather forecasts.
        """
        # 1. Load the main rain prediction pipeline
        try:
            self.pipeline = joblib.load(self.pipeline_path)
        except FileNotFoundError:
            print(f"‚ùå Pipeline file not found at {self.pipeline_path}.")
            print("   -> Please run `save_pipeline()` first.")
            return None, None
        
        # 2. Get the date and interval from the user
        start_date = input("\nEnter the start date (YYYY-MM-DD): ")
        interval = int(input("Enter the number of days to forecast: "))
        print("=" * 50)

        # 3. Call the helper function to get the weather forecast for the full interval.
        #    This `weather_preds_df` contains all the features needed for the next step.
        weather_preds_df = self.predict_func(start_date, interval)

        # 4. Use the complete weather forecast to predict the probability of rain.
        #    The pipeline receives a DataFrame with multiple rows and 6 columns, just as it expects.
        rain_probabilities = self.pipeline.predict_proba(weather_preds_df)

        # 5. Format the rain probability results into a clean DataFrame
        proba_df = pd.DataFrame(
            (rain_probabilities * 100).round(2),
            columns=["Prob. of No Rain (%)", "Prob. of Rain (%)"],
            index=weather_preds_df['date']  # Use the future dates as the index
        )
        
        # 6. Add a human-readable recommendation
        #    (Use a default threshold if `save_pipeline` hasn't been run)
        best_threshold_percent = (self.best_threshold or 0.5) * 100
        
        proba_df["Recommendation"] = proba_df['Prob. of Rain (%)'].apply(
            lambda x: "Take an umbrella! ‚òî" if x >= best_threshold_percent else "Enjoy the clear skies! ‚òÄÔ∏è")
        

        # 7. Return both the rain probabilities and the detailed weather predictions
        return proba_df, weather_preds_df.set_index('date')
    
    def prophet_metrics(self):
        """
        Calculates and prints regression metrics for all the
        individual Prophet models.
        """
        print("\nüìà Evaluating individual Prophet (weather) models...")
        
        models = self.gathering_models()
        if not models:
            return

        for i, target in enumerate(self.targets):
            print("-" * 70)
            print(f"Evaluation Results for: {target}")
            
            # Get the data for this target
            df_prophet = self._prepare_prophet_df(target)
            
            # Split the data (must be same split as XGBoost)
            train_df, test_df = train_test_split(df_prophet, test_size=0.2, shuffle=False)
            
            # Calculate and print metrics
            self.calculate_metrics(test_df, train_df, models[i])
            print("-" * 70)


# === PART 3: SCRIPT EXECUTION ===

if __name__ == "__main__":
    
    # # --- Phase 1: Fetch Data ---
    # print("üöÄ NASA GLDAS Data Fetcher & Rain Forecaster")
    # print("=" * 50)
    
    # # Initialize the fetcher
    # fetcher = GLDASFetcher()
    # city = input("Enter the city name (e.g., 'Cairo, Egypt'): ")
    
    # location = fetcher.get_location_by_address(city)
    # lat = location["lat"]
    # lon = location["lon"]
    # city_name = location['display_name']

    # print(f"\nGeocoded {city} to: {city_name} ({lat}, {lon})")
    
    # Define a long historical period for robust model training
    start_date = "1984-01-01"
    # Fetch data up to yesterday
    end_date = (datetime.now() - timedelta(days=6)).strftime("%Y-%m-%d")

    
    # data = fetcher.get_data(
    #     lat=lat,
    #     lon=lon,
    #     start_date=start_date,
    #     end_date=end_date,
    #     variables=['temp', 'humidity', 'pressure', 'precipitation', 'solar_rad', 'wind_speed','evapotranspiration', 'soil_moisture', 'dew_point']

    # )
    
    if data.empty:
        print("‚ùå No data retrieved. Exiting script.")
    else:
        # Save the data to a CSV file
        data_path = "nasa_daily_weather_data.csv"
        data.to_csv(data_path, index=False)
        print(f"\n‚úÖ Data successfully saved to {data_path}")
        
        # --- Phase 2: Train and Run Forecaster ---
        
        # Initialize the forecaster with the data we just saved
        wf = WeatherForecaster(data_path=data_path)
        
        # 1. Train and save all the Prophet models (temp, humidity, etc.)
        wf.train_and_save_prophet()
        
        # 2. Evaluate the Prophet models
        wf.prophet_metrics()
        
        # 3. Train and save the main XGBoost rain prediction pipeline
        wf.save_pipeline()
        
        # --- Phase 3: Get a Forecast ---
        print("\n\n" + "=" * 50)
        print("üéâ All models trained! Let's get a forecast.")
        print("=" * 50)
        
        # 4. Run the main model function
        rain_forecast, weather_forecast = wf.model()
        
        if rain_forecast is not None:
            print("\n--- ‚òî Rain Forecast ---")
            print(rain_forecast)
            
            print("\n--- üå°Ô∏è Detailed Weather Forecast ---")
            print(weather_forecast.round(2))


‚úÖ Data successfully saved to nasa_daily_weather_data.csv

üßπ Loading and cleaning data from nasa_daily_weather_data.csv...
‚úÖ Data cleaning complete.

üöÇ Training Prophet models for each weather variable...


19:13:32 - cmdstanpy - INFO - Chain [1] start processing
19:13:37 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'temp_max' trained and saved to prophet_model_temp_max.pkl


19:13:41 - cmdstanpy - INFO - Chain [1] start processing
19:13:49 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'temp_min' trained and saved to prophet_model_temp_min.pkl


19:13:53 - cmdstanpy - INFO - Chain [1] start processing
19:13:57 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'humidity_specific' trained and saved to prophet_model_humidity_specific.pkl


19:14:00 - cmdstanpy - INFO - Chain [1] start processing
19:14:07 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'pressure' trained and saved to prophet_model_pressure.pkl


19:14:09 - cmdstanpy - INFO - Chain [1] start processing
19:14:12 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'solar_radiation' trained and saved to prophet_model_solar_radiation.pkl


19:14:14 - cmdstanpy - INFO - Chain [1] start processing
19:14:19 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'evapotranspiration' trained and saved to prophet_model_evapotranspiration.pkl


19:14:21 - cmdstanpy - INFO - Chain [1] start processing
19:14:27 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'soil_moisture_surface' trained and saved to prophet_model_soil_moisture_surface.pkl


19:14:29 - cmdstanpy - INFO - Chain [1] start processing
19:14:32 - cmdstanpy - INFO - Chain [1] done processing


   -> Model for 'dew_point_temp' trained and saved to prophet_model_dew_point_temp.pkl

üìà Evaluating individual Prophet (weather) models...

üì• Loading all trained Prophet models...
   -> Loaded prophet_model_temp_max.pkl
   -> Loaded prophet_model_temp_min.pkl
   -> Loaded prophet_model_humidity_specific.pkl
   -> Loaded prophet_model_pressure.pkl
   -> Loaded prophet_model_solar_radiation.pkl
   -> Loaded prophet_model_evapotranspiration.pkl
   -> Loaded prophet_model_soil_moisture_surface.pkl
   -> Loaded prophet_model_dew_point_temp.pkl
‚úÖ All Prophet models loaded.
----------------------------------------------------------------------
Evaluation Results for: temp_max
   -> MAE: 2.14
   -> RMSE: 2.83
   -> R2: 0.87
   -> SMAPE: 7.75
   -> Mean_Error: -0.00
   -> Training MAE: 2.27
----------------------------------------------------------------------
----------------------------------------------------------------------
Evaluation Results for: temp_min
   -> MAE: 1.57
   -> R