# Polytope Data Availability Check

## Overview
This notebook checks which meteorological parameters of the ECMWF Extremes-DT dataset can be retrieved through the Polytope API. It tests availability date by date over a chosen period and writes the results to CSV availability reports.

## Purpose
- Check availability of weather parameters from the Extremes-DT dataset
- Generate availability reports for data procurement planning
- Test different parameter configurations and time ranges
- Validate data access for the DE374 project

## Data Sources
- **Extremes-DT**: High-resolution weather forecasts (0.04° resolution)
- **Polytope API**: Destination Earth platform data access interface
- **Geographic Coverage**: Europe (bounding box 70.5/-23.5/29.5/62.5, given as North/West/South/East)
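
The bounding box string follows the MARS `area` convention of North/West/South/East. As a rough sketch (the names `BoundingBox`, `parse_area` and `grid_shape` are illustrative and not part of this notebook), the box can be unpacked and the approximate size of a regular 0.04° grid estimated like this:

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    north: float
    west: float
    south: float
    east: float


def parse_area(area: str) -> BoundingBox:
    """Split a 'N/W/S/E' area string into named float fields."""
    north, west, south, east = (float(v) for v in area.split("/"))
    return BoundingBox(north, west, south, east)


def grid_shape(box: BoundingBox, step: float) -> tuple[int, int]:
    """Approximate (n_lat, n_lon) of a regular grid that includes both edges."""
    n_lat = round((box.north - box.south) / step) + 1
    n_lon = round((box.east - box.west) / step) + 1
    return n_lat, n_lon


box = parse_area("70.5/-23.5/29.5/62.5")
print(box)
print(grid_shape(box, 0.04))
```

For the Europe box used here this comes to roughly 1026 x 2151 grid points per field, which is worth keeping in mind when each availability test downloads a full field.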

---

In [None]:
"""
Polytope Data Availability Check

This notebook checks the availability of weather parameters from the 
Extremes-DT dataset via the Polytope API. It systematically tests
different meteorological parameters and generates availability reports.
"""

from c_directories import c_directories

# Initialize directory structure for Leonardo HPC environment
# Note: c_directories now requires both nwp and run_where parameters
dirs = c_directories("edt", "leonardo")

## Setup and Configuration

First, we initialize the directory structure for the Leonardo HPC environment and import necessary modules.

In [None]:
def get_req(date: str, param: str) -> dict:
    """
    Generate Polytope request dictionary for Extremes-DT data.
    
    Creates a request configuration for downloading specific meteorological
    parameters from the Extremes-DT dataset via Polytope API.
    
    Args:
        date: Date in YYYYMMDD format
        param: ECMWF parameter code as string (e.g., '167', '228')
        
    Returns:
        dict: Complete Polytope request dictionary
    """
    # Wind parameters at height levels require special configuration
    height_params = ["131", "132", "228246", "228247", "43"]
    is_height_param = param in height_params
    
    # Precipitation requires different step configuration
    step_config = "0-1" if param == "228" else "0"
    
    return {
        "class": "d1",                    # Destination Earth class 1
        "expver": "0001",                 # Experiment version
        "stream": "oper",                 # Operational stream
        "dataset": "extremes-dt",         # Extremes Digital Twin dataset
        "date": date,                     # Forecast base date
        "time": "0000",                   # 00 UTC base time
        "type": "fc",                     # Forecast type
        "levtype": "hl" if is_height_param else "sfc",  # Level type
        "levelist": "100" if is_height_param else "",    # Height level
        "step": step_config,              # Forecast step
        "param": param,                   # Parameter code
        "area": "70.5/-23.5/29.5/62.5",  # Europe bounding box
        "grid": "0.04/0.04",             # ~4km resolution
    }

## Request Configuration Functions

We define functions to create Polytope API requests with the appropriate parameters for different data types and configurations.
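
One possible refinement, sketched here under the assumption that an empty `levelist` is better left out of the request than sent as `""` (this has not been verified against the Polytope service), is to add the level keys only when they apply. `build_request` and `HEIGHT_PARAMS` are illustrative names, not existing functions in this project.

```python
# Same height-level codes as in get_req
HEIGHT_PARAMS = {"131", "132", "228246", "228247", "43"}


def build_request(date: str, param: str) -> dict:
    """Build an Extremes-DT request, adding 'levelist' only for height-level params."""
    request = {
        "class": "d1",
        "expver": "0001",
        "stream": "oper",
        "dataset": "extremes-dt",
        "date": date,
        "time": "0000",
        "type": "fc",
        "step": "0-1" if param == "228" else "0",  # accumulated precipitation
        "param": param,
        "area": "70.5/-23.5/29.5/62.5",
        "grid": "0.04/0.04",
    }
    if param in HEIGHT_PARAMS:
        request.update(levtype="hl", levelist="100")
    else:
        request["levtype"] = "sfc"  # no 'levelist' key at all for surface fields
    return request


print(build_request("20240404", "167"))
print(build_request("20240404", "131"))
```

Building the dictionary in two steps keeps the common keys in one place and makes the difference between surface and height-level requests explicit.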

In [None]:
# Display path to availability check results
# This shows where the availability CSV file will be stored
print(f"Availability results will be saved to: {dirs.nwp_check / 'extremes-dt_availability.csv'}")

/leonardo_scratch/large/userexternal/lmonaco0/DE374_lot2/extremes-dt/check/extremes-dt_availability.csv


## File Path Verification

Let's check where our availability results will be stored to ensure the directory structure is correctly configured.
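
A small sketch of such a check, assuming `dirs.nwp_check` behaves like a `pathlib.Path` (as the `/` operator used with it suggests). The helper name `ensure_output_dir` is hypothetical, and a temporary directory stands in for the real scratch path so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def ensure_output_dir(csv_path: Path) -> Path:
    """Create the parent directory of csv_path if it is missing, then return csv_path."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    return csv_path


# Temporary directory used as a stand-in for dirs.nwp_check
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_output_dir(Path(tmp) / "check" / "extremes-dt_availability.csv")
    print(target.parent.is_dir())
```

Creating the directory up front avoids losing a long availability run to a missing folder when the first CSV write happens.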

In [None]:
"""
Historical Data Availability Check

Systematically checks availability of meteorological parameters
from Extremes-DT dataset over a specified date range.
"""

import datetime
import pandas as pd
import earthkit.data as ek

# Define parameter codes and corresponding short names (ECMWF parameter database):
# 167=2t, 168=2d, 228=tp, 165=10u, 166=10v,
# 228246/228247=100u/100v, 131/132=u/v (requested with levelist 100 by get_req)
fields_param = ['167', '168', '228', '165', '166', '228246', '228247', '131', '132']
fields_name = ['2t', '2d', 'tp', '10u', '10v', '100u', '100v', 'u', 'v']

# Initialize results dataframe
df = pd.DataFrame(columns=['date'] + fields_name)

# Define date range for availability check
start = datetime.datetime.strptime("20240404", "%Y%m%d")
end = datetime.datetime.strptime("20251122", "%Y%m%d")

# Iterate through each date in the range
d = start
while d <= end:
    ds = d.strftime("%Y%m%d")
    
    # Initialize row with all parameters set to None
    new_row = {
        'date': ds, 
        '2t': None, '2d': None, 'tp': None, '10u': None, '10v': None,
        '100u': None, '100v': None, 'u': None, 'v': None
    }
    
    # Test availability for each parameter
    for i, param in enumerate(fields_param):
        try:
            # Attempt to request data via Polytope
            data = ek.from_source(
                "polytope",
                "ecmwf-destination-earth",
                request=get_req(ds, param),
                address="polytope.lumi.apps.dte.destination-earth.eu",
                stream=False
            )
            # If successful, mark as available
            new_row[fields_name[i]] = "Yes"
            
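        except Exception as exc:
            # Any failure is recorded as unavailable. This includes network and
            # authentication errors, so print the cause to tell them apart.
            print(f"Request failed for date {ds} and param {param}: {exc}")
            new_row[fields_name[i]] = "No"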
    
    # Add row to dataframe and save incrementally
    df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
    df.to_csv(dirs.nwp_check / 'extremes-dt_availability.csv', index=False)
    
    # Move to next day
    d += datetime.timedelta(days=1)
    print(f"Processed date: {ds}")

2025-11-24 13:23:04 - INFO - Sending request...
{'request': 'area: 70.5/-23.5/29.5/62.5\n'
            'class: d1\n'
            'dataset: extremes-dt\n'
            "date: '20240404'\n"
            "expver: '0001'\n"
            'grid: 0.04/0.04\n'
            "levelist: ''\n"
            'levtype: hl\n'
            "param: '43'\n"
            "step: '0'\n"
            'stream: oper\n'
            "time: '0000'\n"
            'type: fc\n',
 'verb': 'retrieve'}
2025-11-24 13:23:06 - INFO - Request accepted. Please poll ./04508ac3-de4d-4d1f-8077-e8e6faa8f07e for status
2025-11-24 13:23:06 - INFO - Checking request status (04508ac3-de4d-4d1f-8077-e8e6faa8f07e)...
2025-11-24 13:23:06 - INFO - The current status of the request is 'queued'
2025-11-24 13:23:07 - INFO - The current status of the request is 'processing'
2025-11-24 13:23:09 - INFO - Sending request...
{'request': 'area: 70.5/-23.5/29.5/62.5\n'
            'class: d1\n'
            'dataset: extremes-dt\n'
            "date: '20

## Historical Data Availability Analysis

Now we'll perform a comprehensive check of data availability for key meteorological parameters over a specified historical period. This analysis helps us understand:

- Which parameters are consistently available
- Temporal gaps in the dataset  
- Parameter-specific availability patterns

**Parameters being tested:**
- **167**: 2-meter temperature (2t)
- **168**: 2-meter dewpoint temperature (2d)
- **228**: Total precipitation (tp)
- **165/166**: 10-meter U/V wind components (10u, 10v)
- **228246/228247**: 100-meter U/V wind components (100u, 100v)
- **131/132**: U/V wind components on height levels (u, v)
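
Once the availability CSV has been written, the three questions above can be answered with a short summary. The sketch below uses only the standard library and a tiny made-up table with the same Yes/No layout as the availability file. The rows are placeholders, not real results, and `summarize` is an illustrative name.

```python
import csv
import io

# Placeholder rows in the same layout as the availability CSV (not real results)
SAMPLE_CSV = """date,2t,tp
20240404,Yes,No
20240405,Yes,Yes
20240406,No,Yes
"""


def summarize(csv_text: str) -> dict:
    """Return, per parameter, the share of available dates and the missing dates."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    params = [name for name in rows[0] if name != "date"]
    summary = {}
    for name in params:
        missing = [row["date"] for row in rows if row[name] != "Yes"]
        summary[name] = {
            "available_fraction": 1 - len(missing) / len(rows),
            "missing_dates": missing,
        }
    return summary


for name, stats in summarize(SAMPLE_CSV).items():
    print(name, stats)
```

The list of missing dates per parameter shows temporal gaps directly, while the fraction gives a quick measure of how consistently a parameter is available. The sketch assumes the file contains at least one data row.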

In [None]:
def get_req_new(date: str, param: int) -> dict:
    """
    Generate simplified Polytope request for surface parameters.
    
    Creates a basic surface-level request configuration for testing
    additional meteorological parameters from Extremes-DT.
    
    Args:
        date: Date in YYYYMMDD format
        param: ECMWF parameter code as integer
        
    Returns:
        dict: Polytope request dictionary for surface data
    """
    return {
        "class": "d1",                    # Destination Earth class 1
        "expver": "0001",                 # Experiment version
        "stream": "oper",                 # Operational stream
        "dataset": "extremes-dt",         # Extremes Digital Twin dataset
        "date": date,                     # Forecast base date
        "time": "0000",                   # 00 UTC base time
        "type": "fc",                     # Forecast type
        "levtype": "sfc",                 # Surface level only
        "step": "0",                      # Analysis step
        "param": param,                   # Parameter code
        "area": "70.5/-23.5/29.5/62.5",  # Europe bounding box
        "grid": "0.04/0.04",             # ~4km resolution
    }

## Extended Parameter Testing

For additional validation, we define a simplified request function to test extended surface parameters including geophysical and soil variables.
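
Keeping codes and names in two parallel lists makes it easy for them to drift apart. A minimal sketch of an alternative is a single mapping from parameter code to short name. The short names below follow the ECMWF parameter database for these codes, while `SURFACE_PARAMS` and `empty_row` are illustrative names.

```python
# Parameter code -> ECMWF short name
SURFACE_PARAMS = {
    129: "z",      # geopotential
    28: "cvh",     # high vegetation cover
    172: "lsm",    # land-sea mask
    43: "slt",     # soil type
    39: "swvl1",   # volumetric soil water, layer 1
    40: "swvl2",   # volumetric soil water, layer 2
    41: "swvl3",   # volumetric soil water, layer 3
    42: "swvl4",   # volumetric soil water, layer 4
}


def empty_row(date: str) -> dict:
    """Start a result row with every parameter marked as not yet tested (None)."""
    return {"date": date, **dict.fromkeys(SURFACE_PARAMS.values())}


row = empty_row("20251210")
print(list(row))
```

With a single mapping, the result columns and the request loop are both derived from the same source, so a code can never end up under the wrong column name.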

In [None]:
"""
Extended Parameter Availability Test

Tests availability of additional surface parameters including
geophysical and soil moisture variables.
"""

import datetime
import pandas as pd
import earthkit.data as ek

# Define additional surface parameters to test (ECMWF parameter database):
# 129=z (geopotential), 28=cvh (high vegetation cover), 172=lsm (land-sea mask),
# 43=slt (soil type), 39-42=swvl1-swvl4 (volumetric soil water, layers 1-4)
fields_param = [129, 28, 172, 43, 39, 40, 41, 42]
fields_name = ['z', 'cvh', 'lsm', 'slt', 'swvl1', 'swvl2', 'swvl3', 'swvl4']

# Initialize results dataframe
df = pd.DataFrame(columns=['date'] + fields_name)

# Test single date for parameter availability
start = datetime.datetime.strptime("20251210", "%Y%m%d")
end = datetime.datetime.strptime("20251210", "%Y%m%d")

d = start
while d <= end:
    ds = d.strftime("%Y%m%d")
    
    # Initialize row for current date with all parameters set to None
    new_row = {'date': ds, **dict.fromkeys(fields_name)}
    
    # Test only first parameter for now (limited by [:1])
    for i, param in enumerate(fields_param[:1]):
        try:
            # Request data via Polytope
            data = ek.from_source(
                "polytope",
                "ecmwf-destination-earth",
                request=get_req_new(ds, param),
                address="polytope.lumi.apps.dte.destination-earth.eu",
                stream=False
            )
            
            # Save sample data to file for verification
            data.to_target("file", dirs.nwp_check / f"extremes-dt_{ds}_{param}.grib")
            new_row[fields_name[i]] = "Yes"
            
        except Exception:
            print(f"Exception for date {ds} and param {param}")
            new_row[fields_name[i]] = "No"
    
    # Display result for current parameter
    print(f"Parameter result: {new_row[fields_name[0]]}")
    
    # Commented out for testing - uncomment to save results
    # df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
    # df.to_csv(dirs.nwp_check / 'extremes-dt_availability_new.csv', index=False)
    
    # Move to next day
    d += datetime.timedelta(days=1)
    print(f"Processed date: {ds}")

2025-12-10 13:31:00 - INFO - Sending request...
{'request': 'area: 70.5/-23.5/29.5/62.5\n'
            'class: d1\n'
            'dataset: extremes-dt\n'
            "date: '20251210'\n"
            "expver: '0001'\n"
            'grid: 0.04/0.04\n'
            'levtype: sfc\n'
            'param: 129\n'
            "step: '0'\n"
            'stream: oper\n'
            "time: '0000'\n"
            'type: fc\n',
 'verb': 'retrieve'}


2025-12-10 13:31:00 - INFO - Request accepted. Please poll ./49e5ab05-e972-4447-ac1b-3cf06b23455a for status
2025-12-10 13:31:00 - INFO - Checking request status (49e5ab05-e972-4447-ac1b-3cf06b23455a)...
2025-12-10 13:31:01 - INFO - The current status of the request is 'queued'
2025-12-10 13:31:01 - INFO - The current status of the request is 'processing'


Exception for date 20251210 and param 129
No
Processed date: 20251210
