# NetCDF File Processing for Sea Surface Currents (SSC) Data
This notebook merges the hourly NetCDF files that fall between a specified start date and end date into a single file. The merged file contains the sea surface currents data concatenated along the time dimension.

## Importing Libraries

In [6]:
import xarray as xr
import pandas as pd
from datetime import datetime, timedelta
from pathlib import Path

## Specify Start and End Dates
Set `start_date` and `end_date` to the first and last day of the range you want to merge. Both days are included in full, so setting the same value for both merges a single day of hourly files.

In [7]:
# Define the start and end date of the simulation or observation period
# Format: 'YYYY-MM-DD' (Year-Month-Day)
start_date = '2023-08-08'
end_date = '2023-08-08'

# Define the output file path
output_file_path = "Data/1_day_SSC_Data.nc"
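
Because both dates are inclusive, the number of hourly files to expect can be worked out before any file is touched. The sketch below is one way to do that and to catch a reversed date range early. The helper name `expected_hourly_steps` is hypothetical and not part of the original notebook, and it assumes one file per hour.

```python
from datetime import datetime

def expected_hourly_steps(start_date: str, end_date: str) -> int:
    """Return the number of hourly time steps between two inclusive dates."""
    start_dt = datetime.strptime(start_date, "%Y-%m-%d")
    end_dt = datetime.strptime(end_date, "%Y-%m-%d")
    if end_dt < start_dt:
        raise ValueError(f"end_date {end_date} is earlier than start_date {start_date}")
    # Both days are inclusive, so a single-day range still spans 24 hours
    n_days = (end_dt - start_dt).days + 1
    return n_days * 24

print(expected_hourly_steps("2023-08-08", "2023-08-08"))  # 24
```

Comparing this number with the length of the generated file list, or with the size of the merged time dimension, gives a quick indication of whether any hourly files were missing.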

## Generate File Paths
The function `generate_file_paths` creates a list of file paths for the NetCDF files that fall within the specified date range.

In [8]:
def generate_file_paths(start_date, end_date, base_directory):
    # Convert start and end dates from string to datetime objects
    start_dt = datetime.strptime(start_date, '%Y-%m-%d')
    end_dt = datetime.strptime(end_date, '%Y-%m-%d')

    # Extend the end date by one day so the range covers every hour of the last day
    end_dt += timedelta(days=1)

    # Build an hourly range from the start date up to midnight of the day after end_date,
    # then drop the final element (00:00 of the following day), which lies outside the range
    dates = pd.date_range(start_dt, end_dt, freq='h')[:-1]
    
    # Initialize a list to store the generated file paths
    file_paths = []
    # Loop through each date in the range to construct file paths
    for date in dates:
        # Format the file name based on the date and time
        file_name = f"CODAR_MALT_{date.strftime('%Y_%m_%d_%H%M')}-{int(date.timestamp())}.nc"
        # Construct the full file path using the base directory and date components
        file_path = Path(base_directory) / f"SSC_MaltaSicily_{date.year}" / str(date.year) / date.strftime('%m') / date.strftime('%d') / file_name
        # Check if the file path exists before adding it to the list
        if file_path.exists():
            file_paths.append(str(file_path))
    
    return file_paths

# Base path to NetCDF files
base_directory = "Data/sea_surface_currents"

# Generate the file paths for the specified date range
file_paths = generate_file_paths(start_date, end_date, base_directory)
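
Since `generate_file_paths` silently skips files that do not exist, it can be useful to know which hours are absent. The following is a minimal standard-library sketch that reuses the naming convention from the function above. The helper names `expected_file_path` and `find_missing_hours` are hypothetical, and the epoch suffix is assumed to be in UTC, which is how pandas interprets the timezone-naive timestamps used in the notebook.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

def expected_file_path(hour: datetime, base_directory: str) -> Path:
    """Build the path the notebook's naming convention implies for one hour."""
    # Assumption: `hour` is timezone-aware UTC, so timestamp() yields a UTC epoch
    file_name = f"CODAR_MALT_{hour.strftime('%Y_%m_%d_%H%M')}-{int(hour.timestamp())}.nc"
    return (Path(base_directory) / f"SSC_MaltaSicily_{hour.year}" / str(hour.year)
            / hour.strftime("%m") / hour.strftime("%d") / file_name)

def find_missing_hours(start_date: str, end_date: str, base_directory: str) -> list:
    """Return the hours in the inclusive date range that have no file on disk."""
    start_dt = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end_dt = datetime.strptime(end_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    n_hours = ((end_dt - start_dt).days + 1) * 24
    hours = [start_dt + timedelta(hours=i) for i in range(n_hours)]
    return [h for h in hours if not expected_file_path(h, base_directory).exists()]

missing = find_missing_hours("2023-08-08", "2023-08-08", "Data/sea_surface_currents")
print(f"{len(missing)} of 24 hourly files are missing")
```

An empty `missing` list means every hour in the range has a file; otherwise the list shows exactly where the merged time axis will have gaps.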

## Merge NetCDF Files
The following code merges all NetCDF files from the generated file paths into a single file.

In [9]:
def merge_netcdf_files(file_paths, output_file_path):
    # Stop early if no input files were found; xr.concat cannot combine an empty list
    if not file_paths:
        raise ValueError("No NetCDF files found for the specified date range")

    # Load all NetCDF files into a list of xarray datasets
    datasets = [xr.open_dataset(fp) for fp in file_paths]

    try:
        # Concatenate all datasets along the time dimension
        combined_dataset = xr.concat(datasets, dim='time')

        # Save the combined dataset to a new NetCDF file
        combined_dataset.to_netcdf(output_file_path)
    finally:
        # Close all datasets to free up resources, even if merging fails
        for ds in datasets:
            ds.close()

    print(f"All NetCDF files have been merged into {output_file_path}")
    print("="*175)

# Merge the NetCDF files
merge_netcdf_files(file_paths, output_file_path)

All NetCDF files have been merged into Data/1_day_SSC_Data.nc
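
To see what concatenation along `time` does without reading any files, the sketch below builds three tiny in-memory datasets and combines them. The grid values and the helper `make_hourly_dataset` are synthetic stand-ins, not data from the notebook. `xr.concat` keeps the order of its input list, so the example also sorts by time as a cheap safeguard in case the inputs are not chronological.

```python
import numpy as np
import xarray as xr

def make_hourly_dataset(timestamp: str) -> xr.Dataset:
    """Create a tiny synthetic dataset with one time step (stand-in for one hourly file)."""
    time = np.array([timestamp], dtype="datetime64[ns]")
    lat = np.array([35.74, 35.77], dtype="float32")        # illustrative values only
    lon = np.array([13.68, 13.72, 13.76], dtype="float32")  # illustrative values only
    shape = (time.size, lat.size, lon.size)
    return xr.Dataset(
        {"u": (("time", "lat", "lon"), np.zeros(shape)),
         "v": (("time", "lat", "lon"), np.ones(shape))},
        coords={"time": time, "lat": lat, "lon": lon},
    )

# Deliberately out of order to show that concat keeps the input order
hourly = [make_hourly_dataset(t) for t in
          ("2023-08-08T02:00", "2023-08-08T00:00", "2023-08-08T01:00")]

# Concatenate along time, then sort so the time axis is chronological
combined = xr.concat(hourly, dim="time").sortby("time")
print(combined.sizes["time"])
print(combined["time"].values)
```

In the notebook itself the file list is already generated in chronological order, so the sort would only matter if the list were assembled another way, for example from a directory listing.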


## Verify Merged Dataset

This code opens the merged NetCDF file, prints its dimensions, coordinates, data variables, and attributes, reports the time and latitude/longitude extent, and asserts that the `u` and `v` variables are present.

In [10]:
# Open the merged dataset
ds = xr.open_dataset(output_file_path)

# Print dataset information
print("=" * 125)
print("Sea Surface Currents Dataset Information")
print("=" * 125)
print("\nDataset Dimensions:")
print(ds.dims)
print("\nDataset Coordinates:")
print(ds.coords)
print("\nData Variables in the Dataset:")
print(ds.data_vars)
print("\nAttributes (Metadata) in the Dataset:")
print(ds.attrs)

# Report the number of time steps (one per hourly file that was merged)
time_points = ds.sizes['time']
print("Time Points:", time_points)

# Report the latitude and longitude extent of the grid
lat_min, lat_max = ds['lat'].min().values, ds['lat'].max().values
lon_min, lon_max = ds['lon'].min().values, ds['lon'].max().values
print("\nLatitude Range in the Dataset:", lat_min, "to", lat_max)
print("Longitude Range in the Dataset:", lon_min, "to", lon_max)

# Assert the presence of expected variables
assert 'u' in ds.variables, "u variable is missing from the dataset"
assert 'v' in ds.variables, "v variable is missing from the dataset"

print("=" * 125)

# Close the dataset after inspection
ds.close()
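
The cell above prints the number of time points but leaves the comparison to the reader. A programmatic check of the time axis could look like the sketch below, which counts duplicate timestamps and locates gaps longer than one hour. The function name `check_hourly_time_axis` is hypothetical, and the example runs on a synthetic time axis with one hour removed rather than on the merged file.

```python
import numpy as np

def check_hourly_time_axis(times) -> dict:
    """Summarise duplicates and gaps in a 1-D array of datetime64 values."""
    times = np.sort(np.asarray(times, dtype="datetime64[ns]"))
    steps = np.diff(times)
    one_hour = np.timedelta64(1, "h")
    return {
        "n_steps": int(times.size),
        "n_duplicates": int((steps == np.timedelta64(0, "ns")).sum()),
        # Each entry is the timestamp immediately before a gap longer than one hour
        "gaps_after": times[:-1][steps > one_hour],
    }

# Synthetic example: one full day of hourly steps with the 05:00 step removed
times = np.arange("2023-08-08T00", "2023-08-09T00", dtype="datetime64[h]")
times = np.delete(times, 5)

report = check_hourly_time_axis(times)
print(report["n_steps"], report["n_duplicates"], report["gaps_after"])
```

With the merged file, the same function could be called with `ds["time"].values` before the dataset is closed.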

Sea Surface Currents Dataset Information

Dataset Dimensions:

Dataset Coordinates:
Coordinates:
  * time     (time) datetime64[ns] 2023-08-08 ... 2023-08-08T23:00:00
  * lat      (lat) float32 35.74 35.77 35.79 35.81 ... 36.81 36.84 36.86 36.88
  * lon      (lon) float32 13.68 13.72 13.76 13.8 ... 15.26 15.3 15.34 15.38

Data Variables in the Dataset:
Data variables:
    u        (time, lat, lon) float64 ...
    v        (time, lat, lon) float64 ...
    stdu     (time, lat, lon) float64 ...
    stdv     (time, lat, lon) float64 ...
    cov      (time, lat, lon) float64 ...
    velo     (time, lat, lon) float64 ...
    head     (time, lat, lon) float64 ...

Attributes (Metadata) in the Dataset:
{'NC_GLOBAL.Title': 'Near-Real time Surface Ocean Velocity', 'NC_GLOBAL.origin': 'BARK (measured);POZZ (measured);MRAG (measured);LICA (measured);SOPU (measured);', 'NC_GLOBAL.source': 'HF Radar Derived Surface Currents obtained from CODAR combine method', 'NC_GLOBAL.history': '08-Aug-2023 00:50