# CMIP6 Vapor Pressure (VP) Calculation - SILO Method

## Overview

This notebook calculates vapor pressure (VP) from CMIP6 climate data using the SILO method. The output is in hPa to match SILO, and is computed from daily mean relative humidity and daily mean temperature (the average of daily maximum and minimum temperature).

## Input Variables

- **hurs**: Mean relative humidity (%) - required
- **tasmax**: Daily maximum temperature (°C) - required
- **tasmin**: Daily minimum temperature (°C) - required
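
The formulas in this notebook expect temperatures in °C. CMIP6 files normally publish `tasmax` and `tasmin` in kelvin, so data that has not been pre-converted needs a unit check first. The following is a minimal sketch, assuming the units string is read from the NetCDF variable's `units` attribute; `to_celsius` is a hypothetical helper and not part of the notebook.

```python
def to_celsius(values, units):
    """Return temperatures in °C, converting from kelvin when needed.

    `units` is assumed to be the variable's `units` attribute, e.g. "K" or "degC".
    """
    units_norm = units.strip().lower()
    if units_norm in {"k", "kelvin"}:
        return [v - 273.15 for v in values]
    if units_norm in {"c", "degc", "deg_c", "celsius", "degrees_celsius", "°c"}:
        return list(values)
    # Refuse to guess: a wrong unit would silently corrupt the VP result
    raise ValueError(f"Unrecognized temperature units: {units!r}")


tasmax_c = to_celsius([293.15, 303.15], "K")
```

Raising on unknown units is a deliberate choice here: passing kelvin values into the saturation vapor pressure formula would produce plausible-looking but wrong numbers.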

## Output

- **vp**: Vapor pressure (hPa) matching SILO units

## Calculation Method (SILO)

### Saturation Vapor Pressure (kPa)

For mean temperature T_mean in °C:

```
e_s(T_mean) = 0.611 × exp(17.27 × T_mean / (T_mean + 237.3))
```

Where: T_mean = (tasmax + tasmin) / 2

### Actual Vapor Pressure (hPa)

Using mean relative humidity (hurs) and saturation vapor pressure:

```
VP(hPa) = 10 × (hurs/100) × e_s(T_mean)
```

Or directly in hPa:

```
VP(hPa) = (hurs/100) × 0.611 × exp(17.27 × T_mean / (T_mean + 237.3)) × 10
```
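
As a worked example of the formulas above, the sketch below evaluates them for a single day using only the standard library. The function names and the sample inputs (26 °C maximum, 14 °C minimum, 50 % humidity) are illustrative assumptions, not values from the dataset.

```python
import math


def saturation_vapor_pressure_kpa(t_mean_c):
    """e_s(T_mean) in kPa for a mean temperature in °C."""
    return 0.611 * math.exp(17.27 * t_mean_c / (t_mean_c + 237.3))


def vapor_pressure_hpa(hurs_percent, tasmax_c, tasmin_c):
    """VP in hPa from mean relative humidity (%) and daily max/min temperature (°C)."""
    t_mean_c = (tasmax_c + tasmin_c) / 2.0
    return 10.0 * (hurs_percent / 100.0) * saturation_vapor_pressure_kpa(t_mean_c)


# T_mean = 20 °C, so e_s is about 2.34 kPa and VP is about 11.7 hPa
vp_example = vapor_pressure_hpa(hurs_percent=50.0, tasmax_c=26.0, tasmin_c=14.0)
print(f"VP = {vp_example:.1f} hPa")
```

At 0 °C the exponent vanishes, so e_s reduces to the 0.611 kPa constant, which is a quick check that the formula has been typed correctly.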

The calculation yields a daily VP proxy. SILO VP is "9am-like", so to replicate SILO closely you would bias-correct this proxy against SILO over a historical overlap period.
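
One simple way to do such a bias correction is monthly mean scaling: for each calendar month, multiply the proxy by the ratio of the SILO mean to the proxy mean over the overlap period. The sketch below is an assumption-laden illustration, not the notebook's method. It assumes both series are DataFrames with `date` and `value` columns (the convention used elsewhere in this notebook), and `monthly_scaling_factors` and `apply_scaling` are hypothetical names.

```python
import pandas as pd


def monthly_scaling_factors(model_hist, silo_hist):
    """Ratio of SILO mean to model mean for each calendar month of the overlap."""
    both = model_hist.merge(silo_hist, on="date", suffixes=("_model", "_silo"))
    by_month = both.groupby(both["date"].dt.month)[["value_model", "value_silo"]].mean()
    return by_month["value_silo"] / by_month["value_model"]


def apply_scaling(vp_df, factors):
    """Scale each day's VP by its month's factor (1.0 when a month has no factor)."""
    out = vp_df.copy()
    out["value"] = out["value"] * out["date"].dt.month.map(factors).fillna(1.0)
    return out


# Tiny synthetic overlap period: the model is 20 % too dry in January only
model_hist = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-01"]),
    "value": [10.0, 10.0, 5.0],
})
silo_hist = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-01"]),
    "value": [12.0, 12.0, 5.0],
})
factors = monthly_scaling_factors(model_hist, silo_hist)

future = pd.DataFrame({
    "date": pd.to_datetime(["2040-01-15", "2040-03-01"]),
    "value": [20.0, 8.0],
})
corrected = apply_scaling(future, factors)
```

Mean scaling only adjusts the monthly average. Matching the full distribution (for example with quantile mapping) would need a different technique.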

## Usage

1. Set configuration parameters (Model, Scenario, coordinates)
2. Extract hurs, tasmax, and tasmin data from NetCDF files
3. Calculate vapor pressure using the SILO method with mean humidity
4. Save results to CSV file

## Section 1: Imports and Configuration

In [8]:
import pandas as pd
import numpy as np
import xarray as xr
import glob
import os
import time
import re
from pathlib import Path
from datetime import datetime
from tqdm import tqdm

print("Libraries imported successfully")

Libraries imported successfully


In [None]:
# CONFIGURATION - CHANGE VALUES BELOW AS NEEDED
# ============================================================================
# All other settings will automatically adjust based on these values

# Output Directory - OPTIONAL: Set to None to auto-generate, or specify a custom path
OUTPUT_DIR_MANUAL = r"C:\Users\ibian\Desktop\ClimAdapt\Anameka\Anameka_South_16_226042"  # Set to None for auto-generation

# Model (usually doesn't need to change)
MODEL = "ACCESS CM2"  # e.g., "ACCESS CM2"

# Scenario - CHANGE THIS
SCENARIO = "SSP245"   # Options: "SSP245", "SSP585", etc.

# Coordinates - CHANGE THESE
LATITUDE = -31.75   # Target latitude in decimal degrees (-90 to 90)
LONGITUDE = 117.5999984741211  # Target longitude in decimal degrees (-180 to 180)

# ============================================================================
# AUTOMATIC SETTINGS (derived from above - no need to change)
# ============================================================================

# Base directories
CMIP6_BASE_DIR = r"C:\Users\ibian\Desktop\ClimAdapt\CMIP6"
base_output_dir = r"C:\Users\ibian\Desktop\ClimAdapt\Anameka"
COORD_TOLERANCE = 0.01  # degrees (approximately 1.1 km)

# Auto-generate output directory and filename components based on scenario and coordinates
# For directory names, use underscore format (filesystem-friendly)
lat_str_dir = f"{LATITUDE:.2f}".replace('.', '_').replace('-', 'neg')
lon_str_dir = f"{LONGITUDE:.2f}".replace('.', '_').replace('-', 'neg')
# For output filenames, use decimal format (keep dots and minus signs)
lat_str = f"{LATITUDE:.2f}"
lon_str = f"{LONGITUDE:.2f}"
model_scenario = f"{MODEL.replace(' ', '_')}_{SCENARIO}"
model_scenario_dir = f"{model_scenario}_{lat_str_dir}_{lon_str_dir}"

# Use manual output directory if specified, otherwise auto-generate
if OUTPUT_DIR_MANUAL:  # None or "" falls through to auto-generation
    OUTPUT_DIR = OUTPUT_DIR_MANUAL
    print(f"  [INFO] Using manual output directory: {OUTPUT_DIR}")
else:
    OUTPUT_DIR = os.path.join(base_output_dir, model_scenario_dir)
    print(f"  [INFO] Auto-generated output directory: {OUTPUT_DIR}")

# Variables required for VP calculation (SILO method with mean humidity)
REQUIRED_VARIABLES = ['hurs', 'tasmax', 'tasmin']

# Ensure output directory exists
os.makedirs(OUTPUT_DIR, exist_ok=True)

print("="*70)
print("CONFIGURATION")
print("="*70)
print(f"  Model: {MODEL}")
print(f"  Scenario: {SCENARIO}")
print(f"  Coordinates: ({LATITUDE:.6f}, {LONGITUDE:.6f})")
print(f"  CMIP6 Base Directory: {CMIP6_BASE_DIR}")
print(f"  Output Directory: {OUTPUT_DIR}")
print(f"  Required Variables: {', '.join(REQUIRED_VARIABLES)}")
print("="*70)
print("\nAll paths and filenames will automatically use the above settings.\n")

  [INFO] Using manual output directory: C:\Users\ibian\Desktop\ClimAdapt\Anameka\Anameka_South_16_226042
CONFIGURATION
  Model: ACCESS CM2
  Scenario: SSP245
  Coordinates: (-31.750000, 117.599998)
  CMIP6 Base Directory: C:\Users\ibian\Desktop\ClimAdapt\CMIP6
  Output Directory: C:\Users\ibian\Desktop\ClimAdapt\Anameka\Anameka_South_16_226042
  Required Variables: hurs, tasmax, tasmin

All paths and filenames will automatically use the above settings.



## Section 2.5: Caching Functions

To optimize performance, extracted NetCDF data is cached to CSV files. On subsequent runs, 
cached data is loaded automatically instead of re-extracting from NetCDF files.

**Cache Location:** Cached files are saved in the output directory with the naming convention:
`{model_scenario}_{lat_str}_{lon_str}_{variable}.csv`

**Cache Behavior:**
- If cache exists and is valid → Load from cache (fast)
- If cache doesn't exist or is invalid → Extract from NetCDF files and save to cache
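
The cache behavior above amounts to a load-or-extract pattern. The sketch below shows that control flow in isolation. `load_or_extract` is a hypothetical wrapper, and the loader, extractor, and saver are passed in as callables so the example can run with an in-memory dict standing in for CSV files on disk.

```python
def load_or_extract(cache_path, load_fn, extract_fn, save_fn):
    """Return (data, source), where source is "cache" or "netcdf"."""
    cached = load_fn(cache_path)
    if cached is not None:
        return cached, "cache"        # fast path: valid cache found
    fresh = extract_fn()              # slow path: read the NetCDF files
    if fresh is not None:
        save_fn(fresh, cache_path)    # populate the cache for the next run
    return fresh, "netcdf"


# Demonstration with a dict in place of the filesystem
fake_disk = {}
extract_calls = []


def fake_extract():
    extract_calls.append("hurs")
    return [55.0, 60.5]


def fake_save(data, path):
    fake_disk[path] = data


first, first_source = load_or_extract("hurs.csv", fake_disk.get, fake_extract, fake_save)
second, second_source = load_or_extract("hurs.csv", fake_disk.get, fake_extract, fake_save)
```

The first call extracts and saves; the second finds the cached entry and never calls the extractor.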


In [11]:
def get_cached_variable_path(output_dir, model_scenario, lat_str, lon_str, variable):
    """
    Generate the path for a cached variable CSV file.
    
    Parameters:
    -----------
    output_dir : str
        Output directory
    model_scenario : str
        Model and scenario string (e.g., "ACCESS_CM2_SSP245")
    lat_str : str
        Latitude formatted as string (e.g., "-31.75")
    lon_str : str
        Longitude formatted as string (e.g., "117.60")
    variable : str
        Variable name (e.g., "tasmax", "hurs")
    
    Returns:
    --------
    str
        Path to cached CSV file
    """
    cache_filename = f"{model_scenario}_{lat_str}_{lon_str}_{variable}.csv"
    cache_path = os.path.join(output_dir, cache_filename)
    return cache_path


def load_cached_variable(cache_path):
    """
    Load cached variable data from CSV file.
    
    Parameters:
    -----------
    cache_path : str
        Path to cached CSV file
    
    Returns:
    --------
    pd.DataFrame or None
        DataFrame with date and value columns if file exists and is valid, None otherwise
    """
    print(f"  [INFO] Checking cache: {os.path.basename(cache_path)}")
    
    if not os.path.exists(cache_path):
        print(f"  [INFO] Cache file not found, will extract from NetCDF")
        return None
    
    try:
        df = pd.read_csv(cache_path)
        if 'date' not in df.columns or 'value' not in df.columns:
            print(f"  [WARNING] Cached file missing required columns, will re-extract")
            return None
        
        df['date'] = pd.to_datetime(df['date'])
        
        # Basic validation - check if file has data
        if len(df) == 0:
            print(f"  [WARNING] Cached file is empty, will re-extract")
            return None
        
        print(f"  [INFO] Cache file found and valid")
        return df
    
    except Exception as e:
        print(f"  [WARNING] Error loading cached file: {e}, will re-extract")
        return None


def save_cached_variable(df, cache_path):
    """
    Save extracted variable data to CSV cache file.
    
    Parameters:
    -----------
    df : pd.DataFrame
        DataFrame with date and value columns
    cache_path : str
        Path to save cached CSV file
    """
    try:
        df[['date', 'value']].to_csv(
            cache_path,
            index=False,
            encoding='utf-8',
            float_format='%.6f'
        )
        print(f"  [INFO] Saved to cache: {os.path.basename(cache_path)}")
    except Exception as e:
        print(f"  [WARNING] Failed to save cache: {e}")



## Section 2: NetCDF Data Extraction Function

In [12]:
def extract_daily_data_from_netcdf(netcdf_dir, variable, target_lat, target_lon, tolerance=0.01):
    """
    Extract daily time series data for a specific coordinate from NetCDF files.
    
    Parameters:
    -----------
    netcdf_dir : str
        Directory containing NetCDF files for the variable
    variable : str
        Variable name (hurs, tasmax, tasmin)
    target_lat : float
        Target latitude
    target_lon : float
        Target longitude
    tolerance : float
        Coordinate matching tolerance in degrees
    
    Returns:
    --------
    pd.DataFrame
        DataFrame with columns: date, value
    """
    start_time = time.time()
    
    # Pattern 1: NetCDF files directly in the directory whose names contain the variable
    nc_files = sorted(glob.glob(os.path.join(netcdf_dir, f"*{variable}*.nc")))
    
    # Pattern 2: Files in subdirectories named {variable}_*
    if len(nc_files) == 0:
        var_subdirs = glob.glob(os.path.join(netcdf_dir, f"{variable}_*"))
        for var_subdir in var_subdirs:
            if os.path.isdir(var_subdir):
                found_files = sorted(glob.glob(os.path.join(var_subdir, "*.nc")))
                if found_files:
                    nc_files.extend(found_files)
                    print(f"  Found files in subdirectory: {os.path.basename(var_subdir)}/")
                    break
    
    if len(nc_files) == 0:
        print(f"  ERROR: No NetCDF files found in {netcdf_dir}")
        return None
    
    print(f"  Found {len(nc_files)} NetCDF files")
    
    # Cache coordinate information from first file
    lat_name = None
    lon_name = None
    time_name = None
    lat_idx = None
    lon_idx = None
    var_name = None
    
    # List to store daily data
    all_data = []
    
    # Process first file to get coordinate structure
    if len(nc_files) > 0:
        try:
            ds_sample = xr.open_dataset(nc_files[0], decode_times=False)
            
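            # Get variable name: prefer an exact match so that a similarly
            # named variable cannot be picked up by the substring test
            possible_names = [variable, variable.upper(), f'{variable}_day']
            for name in possible_names:
                if name in ds_sample.data_vars:
                    var_name = name
                    break
            
            # Fall back to a case-insensitive substring match
            if var_name is None:
                for v in ds_sample.data_vars:
                    if variable in v.lower() or v.lower() in variable.lower():
                        var_name = v
                        break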
            
            # Get coordinate names
            for coord in ds_sample.coords:
                coord_lower = coord.lower()
                if 'lat' in coord_lower:
                    lat_name = coord
                elif 'lon' in coord_lower:
                    lon_name = coord
                elif 'time' in coord_lower:
                    time_name = coord
            
            if lat_name and lon_name:
                # Find nearest grid point
                lat_idx = np.abs(ds_sample[lat_name].values - target_lat).argmin()
                lon_idx = np.abs(ds_sample[lon_name].values - target_lon).argmin()
                
                actual_lat = float(ds_sample[lat_name].values[lat_idx])
                actual_lon = float(ds_sample[lon_name].values[lon_idx])
                
                # Check if within tolerance
                if abs(actual_lat - target_lat) > tolerance or abs(actual_lon - target_lon) > tolerance:
                    print(f"  Warning: Nearest point ({actual_lat:.4f}, {actual_lon:.4f}) is outside tolerance")
                else:
                    print(f"  Using grid point: ({actual_lat:.4f}, {actual_lon:.4f})")
            
            ds_sample.close()
            
        except Exception as e:
            print(f"  Warning: Could not read sample file: {e}")
    
    if var_name is None or lat_idx is None or lon_idx is None:
        print(f"  ERROR: Could not determine coordinate structure")
        return None
    
    # Process all files with progress bar
    print(f"  Processing files...")
    for nc_file in tqdm(nc_files, desc=f"  {variable}", unit="file"):
        try:
            ds = xr.open_dataset(nc_file, decode_times=False)
            
            # Extract data using cached indices
            data = ds[var_name].isel({lat_name: lat_idx, lon_name: lon_idx})
            
            # Convert to numpy array
            values = data.values
            if values.ndim > 1:
                values = values.flatten()
            
            # Get time values
            time_values = None
            
            # Method 1: Try to use time coordinate from NetCDF file
            if time_name and time_name in ds.coords:
                try:
                    time_coord = ds[time_name]
                    if len(time_coord) == len(values):
                        try:
                            time_decoded = xr.decode_cf(ds[[time_name]])[time_name]
                            time_values = pd.to_datetime(time_decoded.values)
                        except Exception:
                            # Decoding failed: parse "days since <date>" units by hand
                            if hasattr(time_coord, 'units') and 'days since' in time_coord.units.lower():
                                base_date_str = time_coord.units.split('since')[1].strip().split()[0]
                                base_date = pd.to_datetime(base_date_str)
                                time_values = base_date + pd.to_timedelta(time_coord.values, unit='D')
                except Exception:
                    # Leave time_values as None so Method 2 (filename year) is used
                    pass
            
            # Method 2: Extract year from filename
            if time_values is None:
                year = None
                filename = os.path.basename(nc_file)
                all_years = re.findall(r'\d{4}', filename)
                for year_str in all_years:
                    year_candidate = int(year_str)
                    if 2000 <= year_candidate <= 2100:
                        year = year_candidate
                        break
                
                if year:
                    time_values = pd.date_range(start=f'{year}-01-01', periods=len(values), freq='D')
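                else:
                    # No plausible year in the filename: the dates below are a placeholder
                    tqdm.write(f"    Warning: no year found in {filename}; assuming dates start at 2035-01-01")
                    time_values = pd.date_range(start='2035-01-01', periods=len(values), freq='D')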
            
            # Ensure correct number of dates
            if len(time_values) != len(values):
                if len(time_values) > len(values):
                    time_values = time_values[:len(values)]
            
            # Create DataFrame for this file
            if len(values) > 0:
                df_file = pd.DataFrame({
                    'date': time_values[:len(values)],
                    'value': values
                })
                all_data.append(df_file)
            
            ds.close()
            
        except Exception as e:
            tqdm.write(f"    Error processing {os.path.basename(nc_file)}: {e}")
            continue
    
    if len(all_data) == 0:
        print(f"  ERROR: No data extracted")
        return None
    
    # Combine all data
    print(f"  Combining data from {len(all_data)} files...")
    combined_df = pd.concat(all_data, ignore_index=True)
    
    # Sort by date and remove duplicate dates (keep first occurrence),
    # then rebuild the index so it stays contiguous after rows are dropped
    combined_df = (
        combined_df.sort_values('date')
        .drop_duplicates(subset='date', keep='first')
        .reset_index(drop=True)
    )
    
    elapsed_time = time.time() - start_time
    print(f"  ✓ Extracted {len(combined_df):,} daily records in {elapsed_time:.1f} seconds")
    print(f"  Date range: {combined_df['date'].min()} to {combined_df['date'].max()}")
    
    return combined_df
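
The nearest-point search in the function above compares the target longitude directly with the longitude values in the file. Some global grids store longitude as 0 to 360 rather than -180 to 180, in which case a negative target would snap to the wrong edge of the grid. Whether that applies to the files used here is an assumption to verify. The sketch below shows one way to align the target with the grid's convention before searching; `match_longitude_convention` and `nearest_index` are hypothetical helpers, and the grid is synthetic.

```python
import numpy as np


def match_longitude_convention(target_lon, grid_lons):
    """Shift target_lon by 360 degrees if the grid uses the other convention."""
    grid_lons = np.asarray(grid_lons, dtype=float)
    if grid_lons.max() > 180.0 and target_lon < 0.0:
        return target_lon + 360.0     # grid is 0..360, target is -180..180
    if grid_lons.min() < 0.0 and target_lon > 180.0:
        return target_lon - 360.0     # grid is -180..180, target is 0..360
    return target_lon


def nearest_index(grid_values, target):
    """Index of the grid value closest to target (same idea as in the extractor)."""
    return int(np.abs(np.asarray(grid_values, dtype=float) - target).argmin())


# Synthetic 0..360 grid with 1.875 degree spacing
grid_lons = np.arange(0.0, 360.0, 1.875)
target = match_longitude_convention(-62.0, grid_lons)
idx = nearest_index(grid_lons, target)
```

A target already expressed in the grid's convention, such as the 117.6 used in this notebook's configuration, passes through unchanged.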

## Section 3: Vapor Pressure Calculation Function

In [13]:
def calculate_saturation_vapor_pressure(temperature):
    """
    Calculate saturation vapor pressure (kPa) at a given temperature.
    
    Parameters:
    -----------
    temperature : float or array
        Temperature in °C
    
    Returns:
    --------
    float or array
        Saturation vapor pressure in kPa
    """
    # SILO formula: e_s(T) = 0.611 × exp(17.27 × T / (T + 237.3))
    return 0.611 * np.exp(17.27 * temperature / (temperature + 237.3))


def calculate_vapor_pressure(hurs_df, tasmax_df, tasmin_df):
    """
    Calculate vapor pressure (hPa) from mean relative humidity and temperature using SILO method.
    
    Parameters:
    -----------
    hurs_df : pd.DataFrame
        DataFrame with date and value (mean relative humidity %) columns
    tasmax_df : pd.DataFrame
        DataFrame with date and value (maximum temperature °C) columns
    tasmin_df : pd.DataFrame
        DataFrame with date and value (minimum temperature °C) columns
    
    Returns:
    --------
    pd.DataFrame
        DataFrame with date and value (vapor pressure hPa) columns
    """
    # Pair daily maximum and minimum temperature by date, then take the mean
    temp_df = tasmax_df.merge(tasmin_df, on='date', suffixes=('_max', '_min'))
    temp_df['tmean'] = (temp_df['value_max'] + temp_df['value_min']) / 2.0
    
    # Attach mean temperature to mean relative humidity (inner join on date,
    # so only dates present in all three inputs are kept)
    merged = hurs_df.merge(temp_df[['date', 'tmean']], on='date')
    
    # Calculate saturation vapor pressure at mean temperature (in kPa)
    # SILO formula: e_s(T) = 0.611 × exp(17.27 × T / (T + 237.3))
    merged['es_kpa'] = calculate_saturation_vapor_pressure(merged['tmean'])
    
    # Calculate actual vapor pressure using mean relative humidity (in kPa)
    # e_a = (hurs/100) × e_s(T_mean)
    merged['ea_kpa'] = (merged['value'] / 100.0) * merged['es_kpa']
    
    # Convert to SILO VP units (hPa): VP(hPa) = 10 × e_a(kPa)
    merged['vp'] = 10.0 * merged['ea_kpa']
    
    # Return DataFrame with date and value (vapor pressure, hPa) columns
    vp_df = merged[['date', 'vp']].copy()
    vp_df = vp_df.rename(columns={'vp': 'value'})
    
    return vp_df

## Section 4: Main Processing

In [14]:
# Construct data directory path
data_dir = os.path.join(CMIP6_BASE_DIR, f"{MODEL} {SCENARIO}")

if not os.path.isdir(data_dir):
    raise FileNotFoundError(f"Data directory not found: {data_dir}")

print("="*70)
print(f"Processing Coordinate: ({LATITUDE:.6f}, {LONGITUDE:.6f})")
print(f"Model: {MODEL}, Scenario: {SCENARIO}")
print("="*70)
print(f"\nData directory: {data_dir}\n")

# Extract data for all required variables
extracted_data = {}

for variable in REQUIRED_VARIABLES:
    print(f"\n{'='*70}")
    print(f"Processing variable: {variable}")
    print(f"{'='*70}")
    
    # Check for cached data first
    cache_path = get_cached_variable_path(OUTPUT_DIR, model_scenario, lat_str, lon_str, variable)
    df = load_cached_variable(cache_path)
    
    if df is not None:
        # Use cached data
        extracted_data[variable] = df
        print(f"  [OK] Loaded from cache: {len(df):,} records for {variable}")
        print(f"  [INFO] Date range: {df['date'].min()} to {df['date'].max()}")
    else:
        # Extract from NetCDF files
        df = extract_daily_data_from_netcdf(
            data_dir, 
            variable, 
            LATITUDE, 
            LONGITUDE, 
            tolerance=COORD_TOLERANCE
        )
        
        if df is not None and len(df) > 0:
            extracted_data[variable] = df
            print(f"  [OK] Extracted {len(df):,} records for {variable}")
            # Save to cache for future runs
            save_cached_variable(df, cache_path)
        else:
            print(f"  [ERROR] Failed to extract data for {variable}")

# Check if all required variables are available
missing_vars = [v for v in REQUIRED_VARIABLES if v not in extracted_data]

if missing_vars:
    raise ValueError(f"Missing required variables: {missing_vars}")

print(f"\n{'='*70}")
print("Calculating Vapor Pressure...")
print(f"{'='*70}")

print(f"[DEBUG] About to call calculate_vapor_pressure")
print(f"[DEBUG] extracted_data keys: {list(extracted_data.keys())}")


# Calculate vapor pressure using SILO method
print(f"[DEBUG] Calling calculate_vapor_pressure...")
try:
    vp_df = calculate_vapor_pressure(
        extracted_data['hurs'],
        extracted_data['tasmax'],
        extracted_data['tasmin']
    )
    print(f"[DEBUG] calculate_vapor_pressure returned, shape: {vp_df.shape if vp_df is not None else None}")
except Exception as e:
    print(f"[DEBUG] EXCEPTION CAUGHT: {type(e).__name__}: {str(e)}")
    import traceback
    print(f"[DEBUG] Traceback:\n{traceback.format_exc()}")
    raise

print(f"  [OK] Calculated vapor pressure for {len(vp_df):,} days")
print(f"  Date range: {vp_df['date'].min()} to {vp_df['date'].max()}")
print(f"  VP range: {vp_df['value'].min():.2f} to {vp_df['value'].max():.2f} hPa")
print(f"  VP mean: {vp_df['value'].mean():.2f} hPa")

# Save to CSV (using auto-generated filename components from configuration)
# Verify lat_str and lon_str are in decimal format
print(f"\n  [INFO] Filename components: lat_str='{lat_str}', lon_str='{lon_str}'")
output_filename = f"{model_scenario}_{lat_str}_{lon_str}_vp.csv"
output_path = os.path.join(OUTPUT_DIR, output_filename)

vp_df.to_csv(output_path, index=False, encoding='utf-8', float_format='%.2f')
print(f"\n  [OK] Saved vapor pressure data to: {output_filename}")

print(f"\n{'='*70}")
print("[SUCCESS] VAPOR PRESSURE CALCULATION COMPLETED!")
print(f"{'='*70}")

Processing Coordinate: (-31.750000, 117.599998)
Model: ACCESS CM2, Scenario: SSP245

Data directory: C:\Users\ibian\Desktop\ClimAdapt\CMIP6\ACCESS CM2 SSP245


Processing variable: hurs
  [INFO] Checking cache: ACCESS_CM2_SSP245_-31.75_117.60_hurs.csv
  [INFO] Cache file found and valid
  [OK] Loaded from cache: 10,957 records for hurs
  [INFO] Date range: 2035-01-01 00:00:00 to 2064-12-30 00:00:00

Processing variable: tasmax
  [INFO] Checking cache: ACCESS_CM2_SSP245_-31.75_117.60_tasmax.csv
  [INFO] Cache file found and valid
  [OK] Loaded from cache: 10,957 records for tasmax
  [INFO] Date range: 2035-01-01 00:00:00 to 2064-12-30 00:00:00

Processing variable: tasmin
  [INFO] Checking cache: ACCESS_CM2_SSP245_-31.75_117.60_tasmin.csv
  [INFO] Cache file found and valid
  [OK] Loaded from cache: 10,957 records for tasmin
  [INFO] Date range: 2035-01-01 00:00:00 to 2064-12-30 00:00:00

Calculating Vapor Pressure...
[DEBUG] About to call calculate_vapor_pressure
[DEBUG] extracted_data