# NetCDF File Processing

This notebook preprocesses a NetCDF wind file: it interpolates the data to hourly time steps between a specified start and end date and to a fixed set of longitude and latitude points, then saves the result to a new file.

### Importing Libraries

In [1]:
# Import necessary libraries
import numpy as np
import xarray as xr
import pandas as pd

### Data Selection

Select the date range and the geographical points used for processing. The dataset is later interpolated to exactly these points and times.

In [2]:
# Define the start and end dates
start_date = '2023-01-01'
end_date = '2023-01-03'

# Open the file
ds = xr.open_dataset('Data/Wind/data.nc')

# Define the hourly time range for interpolation; the end date is extended to 23:59:59 so its last hour (23:00) is included
end_date_extended = pd.Timestamp(end_date) + pd.Timedelta(days=1) - pd.Timedelta(seconds=1)
times_hourly = pd.date_range(start=start_date, end=end_date_extended, freq='h')

# Define the specific longitude and latitude points for interpolation (to match the sea surface current data)
longitude_points = np.array([13.6768, 13.7174, 13.7579, 13.7985, 13.839, 13.8796, 13.9202, 13.9607, 14.0013,
                             14.0419, 14.0824, 14.123, 14.1635, 14.2041, 14.2447, 14.2852, 14.3258, 14.3664,
                             14.4069, 14.4475, 14.488, 14.5286, 14.5692, 14.6097, 14.6503, 14.6908, 14.7314,
                             14.772, 14.8125, 14.8531, 14.8937, 14.9342, 14.9748, 15.0153, 15.0559, 15.0965,
                             15.137, 15.1776, 15.2182, 15.2587, 15.2993, 15.3398, 15.3804])
latitude_points = np.array([35.7447, 35.767, 35.7892, 35.8115, 35.8338, 35.856, 35.8783, 35.9006, 35.9228,
                            35.9451, 35.9673, 35.9896, 36.0119, 36.0341, 36.0564, 36.0787, 36.1009, 36.1232,
                            36.1455, 36.1677, 36.19, 36.2123, 36.2345, 36.2568, 36.2791, 36.3013, 36.3236,
                            36.3458, 36.3681, 36.3904, 36.4126, 36.4349, 36.4572, 36.4794, 36.5017, 36.524,
                            36.5462, 36.5685, 36.5908, 36.613, 36.6353, 36.6576, 36.6798, 36.7021, 36.7243,
                            36.7466, 36.7689, 36.7911, 36.8134, 36.8357, 36.8579, 36.8802])
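The one-day-minus-one-second extension in the cell above is what makes the hourly range cover the whole end date. A minimal check, reusing the same dates and variable names as this notebook (`n_steps`, `first_step`, and `last_step` are names introduced here only for illustration), might look like this:

```python
import pandas as pd

start_date = '2023-01-01'
end_date = '2023-01-03'

# Extend the end date to 23:59:59 so the final day is fully covered
end_date_extended = pd.Timestamp(end_date) + pd.Timedelta(days=1) - pd.Timedelta(seconds=1)
times_hourly = pd.date_range(start=start_date, end=end_date_extended, freq='h')

# Illustrative summary values: number of steps and the first/last timestamp
n_steps = len(times_hourly)
first_step, last_step = times_hourly[0], times_hourly[-1]
print(n_steps, first_step, last_step)
```

Without the extension, `pd.date_range(start_date, end_date, freq='h')` would stop at midnight on the end date and miss the remaining hours of that day.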

### Interpolation

Linearly interpolate the dataset to the defined hourly time points and geographical coordinates. This step ensures the data matches the desired spatial and temporal resolution.

In [3]:
# Perform interpolation
ds_interpolated = ds.interp(
    time=times_hourly,
    longitude=longitude_points,
    latitude=latitude_points,
    method='linear'
)

# Specify the output file path
output_file_path = "Data/Processed_Wind_Data.nc"

# Save the interpolated dataset to a new file
ds_interpolated.to_netcdf(output_file_path)
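To see what `interp` does on a small scale, the sketch below builds a tiny synthetic dataset and interpolates it the same way. Everything in it (the `demo` dataset, its 3-hourly coarse grid, and its values) is made up for illustration and is not taken from `data.nc`; it assumes SciPy is installed, which xarray needs for interpolation. It also illustrates a caveat worth checking on real data: target points outside the source grid come back as NaN rather than being extrapolated.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical coarse source grid: 3-hourly steps, 0.25-degree spacing
times = pd.date_range("2023-01-01", periods=3, freq="3h")
lats = np.array([36.0, 36.25])
lons = np.array([14.0, 14.25])

# Synthetic u10: uniform in space, increasing by 1 every 3 hours
u10 = np.broadcast_to(np.arange(3.0)[:, None, None], (3, 2, 2)).copy()
demo = xr.Dataset(
    {"u10": (("time", "latitude", "longitude"), u10)},
    coords={"time": times, "latitude": lats, "longitude": lons},
)

# Interpolate to hourly steps; latitude 36.5 lies outside the source grid
target_times = pd.date_range("2023-01-01", "2023-01-01 06:00", freq="h")
demo_interp = demo.interp(
    time=target_times,
    latitude=[36.1, 36.5],
    longitude=[14.1],
    method="linear",
)

inside = demo_interp["u10"].isel(latitude=0).values.ravel()   # in-bounds point
outside = demo_interp["u10"].isel(latitude=1).values.ravel()  # out-of-bounds point
print(inside)
print(outside)
```

If the real output contains unexpected NaN values, comparing the requested longitude and latitude points against the coordinate ranges of the source file is a reasonable first check.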

### Verify Processed Dataset

This code opens the processed NetCDF file and prints its contents so that the dimensions, coordinates, and variables can be checked against what was requested.

In [4]:
# Open the processed wind dataset
ds = xr.open_dataset(output_file_path)

# Print dataset information
print("=" * 125)
print("Processed Wind Data Information")
print("=" * 125)
print("\nDataset Dimensions:")
print(ds.dims)
print("\nDataset Coordinates:")
print(ds.coords)
print("\nData Variables in the Dataset:")
print(ds.data_vars)
print("\nAttributes (Metadata) in the Dataset:")
print(ds.attrs)

# Report the number of time steps for comparison with the expected hourly count
time_points = ds.sizes['time']
print("Time Points:", time_points)

# Report the latitude/longitude ranges for comparison with the requested points
lat_min, lat_max = ds['latitude'].min().item(), ds['latitude'].max().item()
lon_min, lon_max = ds['longitude'].min().item(), ds['longitude'].max().item()
print("\nLatitude Range in the Dataset:", lat_min, "to", lat_max)
print("Longitude Range in the Dataset:", lon_min, "to", lon_max)

# Assert the presence of expected variables
assert 'u10' in ds.variables, "u10 variable is missing from the dataset"
assert 'v10' in ds.variables, "v10 variable is missing from the dataset"

print("=" * 125)

# Close the dataset after inspection
ds.close()

Processed Wind Data Information

Dataset Dimensions:

Dataset Coordinates:
Coordinates:
  * time       (time) datetime64[ns] 2023-01-01 ... 2023-01-03T23:00:00
  * longitude  (longitude) float64 13.68 13.72 13.76 13.8 ... 15.3 15.34 15.38
  * latitude   (latitude) float64 35.74 35.77 35.79 35.81 ... 36.84 36.86 36.88

Data Variables in the Dataset:
Data variables:
    v10      (time, latitude, longitude) float64 ...
    u10      (time, latitude, longitude) float64 ...

Attributes (Metadata) in the Dataset:
{'Conventions': 'CF-1.6', 'history': '2023-12-17 08:43:38 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin -S param -o /cache/tmp/77d2d03f-e095-4d90-b83e-999c43b3595c-adaptor.mars_constrained.external-1702802610.7212312-29905-15-tmp.nc /cache/tmp/77d2d03f-e095-4d90-b83e-999c43b3595c-adaptor.mars_constrained.external-1702802608.214588-29905-14-tmp.grib'}
Time Points: 72

Latitude Range in the Dataset: 35.7447 to 36.8802
Longitude Range in the Dataset: 13.676
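The printed summary has to be compared with the requested grid by eye. A minimal sketch of the same checks as assertions is shown below; `check_processed` is a hypothetical helper, not part of xarray or of this notebook, and the small `demo` dataset exists only to make the example self-contained.

```python
import numpy as np
import pandas as pd
import xarray as xr

def check_processed(ds, n_times, lat_points, lon_points, required=("u10", "v10")):
    """Hypothetical helper: raise AssertionError if ds does not match the requested grid."""
    assert ds.sizes["time"] == n_times, (
        f"expected {n_times} time steps, got {ds.sizes['time']}"
    )
    lat = ds["latitude"].values
    lon = ds["longitude"].values
    assert lat.shape == np.shape(lat_points) and np.allclose(lat, lat_points), \
        "latitude points differ from the requested ones"
    assert lon.shape == np.shape(lon_points) and np.allclose(lon, lon_points), \
        "longitude points differ from the requested ones"
    for name in required:
        assert name in ds.data_vars, f"{name} variable is missing from the dataset"
    return True

# Synthetic stand-in for the processed file (illustrative values only)
demo_times = pd.date_range("2023-01-01", periods=4, freq="h")
demo_lat = np.array([35.7447, 35.767])
demo_lon = np.array([13.6768, 13.7174, 13.7579])
zeros = np.zeros((4, 2, 3))
demo = xr.Dataset(
    {
        "u10": (("time", "latitude", "longitude"), zeros),
        "v10": (("time", "latitude", "longitude"), zeros),
    },
    coords={"time": demo_times, "latitude": demo_lat, "longitude": demo_lon},
)

ok = check_processed(demo, n_times=4, lat_points=demo_lat, lon_points=demo_lon)
print(ok)
```

In this notebook, the equivalent call would pass `len(times_hourly)`, `latitude_points`, and `longitude_points` together with the dataset opened from `output_file_path`.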