<a name="top"></a>
<div style="width:1000px">

<div style="float:right; width:98px; height:98px;">
<img src="https://cdn.miami.edu/_assets-common/images/system/um-logo-gray-bg.png" alt="Miami Logo" style="height: 98px;">
</div>

<div style="float:right; width:98px; height:98px;">
<img src="https://media.licdn.com/dms/image/C4E0BAQFlOZSAJABP4w/company-logo_200_200/0/1548285168598?e=2147483647&v=beta&t=g4jl8rEhB7HLJuNZhU6OkJWHW4cul_y9Kj_aoD7p0_Y" alt="STI Logo" style="height: 98px;">
</div>


<h1>Calculate Surface-Based Hot Dry Windy for Each Model and Timestep</h1>
By: Kayla Besong, PhD
    <br>
Last Edited: 11/29/23
<br>
<br>    
<br>
This notebook takes the previously downloaded models and variables and calculates surface-based hot-dry-windy (HDW) for each one. The calculation multiplies vapor pressure deficit (VPD) by wind speed, so it reuses variables that were computed in earlier steps. The function that computes the 24-hour average, minimum, and maximum outputs is in File_concat_mod_functions.ipynb.
<br>
<br>
NOTE: The operational, 'true' hot-dry-windy index (HDWI) is not computed at the surface. It analyzes VPD and wind speed in the lowest 500 m of the atmosphere, which is more computationally intensive, and it takes the maximum value of the day. Here, VPD is simply multiplied by wind speed at the surface, so the resulting product is a 'surface-based hot-dry-windy' (HDW). The difference between surface-based HDW and the HDWI can be stark depending on the region being analyzed. Please see Kramer et al. (2024) and Watts et al. (2020).
<br>
<div style="clear:both"></div>
</div>

<hr style="height:2px;">
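As a minimal sketch of the surface-based calculation described in the note above, the snippet below multiplies hourly VPD by hourly wind speed and then summarizes one day with the average, minimum, and maximum. The values, the units (hPa and m/s), and the function names are illustrative assumptions, and plain Python lists stand in for the gridded xarray data used in this notebook.

```python
# Hypothetical hourly surface values for a single grid cell over one day.
# Units (hPa for VPD, m/s for wind speed) are assumed for illustration only.
vpd_hpa = [5.0 + 0.5 * hour for hour in range(24)]
wspeed_ms = [2.0 + 0.25 * hour for hour in range(24)]


def surface_hdw(vpd, wspeed):
    """Multiply VPD by wind speed element by element."""
    if len(vpd) != len(wspeed):
        raise ValueError("vpd and wspeed must have the same length")
    return [v * w for v, w in zip(vpd, wspeed)]


def daily_summary(values):
    """Return the 24-hour average, minimum, and maximum of hourly values."""
    return {
        "avg": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


hdw = surface_hdw(vpd_hpa, wspeed_ms)
summary = daily_summary(hdw)
print(summary)
```

Note that the operational HDWI keeps only the daily maximum, whereas this surface-based product is summarized three ways.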

## Import needed libraries, etc.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import pandas as pd
from dask.distributed import Client, LocalCluster
import dask.array as da
import os
import glob
from metpy.units import units
import math

In [None]:
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
pd.options.mode.chained_assignment = None

## Establish a dask client. This is a lot of data.

In [None]:
Cluster = LocalCluster(n_workers = 8, threads_per_worker=4, memory_limit='30GB',  processes=True)
#Cluster = LocalCluster()

In [None]:
client = Client(Cluster)
client
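
Because `memory_limit` in `LocalCluster` applies to each worker rather than to the cluster as a whole, the settings above request far more than 30 GB in total. The helper below is a hypothetical convenience (it is not part of Dask) that does the arithmetic, which may help when scaling the configuration down for a smaller machine.

```python
def cluster_footprint(n_workers, threads_per_worker, memory_limit_gb):
    """Total threads and memory (GB) implied by per-worker settings."""
    return {
        "total_threads": n_workers * threads_per_worker,
        "total_memory_gb": n_workers * memory_limit_gb,
    }


# Same values as the LocalCluster call in this notebook.
footprint = cluster_footprint(n_workers=8, threads_per_worker=4, memory_limit_gb=30)
print(footprint)  # {'total_threads': 32, 'total_memory_gb': 240}
```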

### Run the companion notebook that defines the required helper functions

In [None]:
%run File_concat_mod_functions.ipynb

## The HDWI function, variables, models, etc. 

In [None]:
def hdwi(model, main_dir):

    '''

    This function generates the surface-based hot-dry-windy by multiplying preexisting vpd and windspeed. The path naming convention may need to be altered. 

    Inputs:
    
    model: (str) model name, used as file path 
    main_dir: (str) the directory that contains the model data 

    Outputs:
    
    Nothing; files are saved to the target directory.  

    '''   
                               
    model_list = []                                                                                                     # Initialize an empty list to store file lists for each variable
    parent_dir = f'{main_dir}/{model}'                                                                                  # Define the parent directory path for the model
    variable_options = ['vpd', 'wspeed']                                                                                # List of variable options to process
                               
    for v in variable_options:                                                                                          # Iterate over each variable option
        model_list.append(sorted(glob.glob(os.path.join(parent_dir, f'{v}_{get_filename(model)}_Abs_*.nc'))))           # Append sorted list of file paths for each variable
                                       
    if len(model_list[0]) != len(model_list[1]):                                                                        # Check if the number of files for each variable is the same
        print('the number of years for each variable are not the same')                                                 # Print a message if the number of files is not the same
                               
    else:                                                                                                               # If the number of files is the same
        for v, ws in zip(model_list[0], model_list[1]):                                                                 # Iterate over pairs of files for each variable
            if int(v[-7:-3]) != int(ws[-7:-3]):                                                                         # Check if the years in the file names are aligned
                print('the years for each variable are not aligned')                                                    # Print a message if the years are not aligned
            else:                                                                                                       # If the years are aligned
                print(v, ws)                                                                                            # Print the file names
                           
                if model == 'NAM':                                                                                      # Special handling for NAM model
                    vt = xr.open_dataset(v)                                                                             # Open the dataset for vpd
                    wt = xr.open_dataset(ws)                                                                            # Open the dataset for wspeed
                           
                    v_times = vt.time.values                                                                            # Get the time values from the vpd dataset
                    matching_indices_1 = [i for i, t in enumerate(wt.time.values) if t in v_times]                      # Find matching time indices in wspeed dataset
                           
                    wt = wt.isel(time=matching_indices_1)                                                               # Select matching time indices in wspeed dataset
                           
                else:                                                                                                   # For other models
                    vt = xr.open_dataset(v).chunk(get_chunk_database(model))                                            # Open and chunk the dataset for vpd
                    wt = xr.open_dataset(ws).chunk(get_chunk_database(model))                                           # Open and chunk the dataset for wspeed
                                               
                hdwi_ds = (vt['vpd'] * wt['wspeed']).to_dataset(name='hdwi')                                            # Calculate surface-based HDW (vpd x wspeed) and create a new dataset
                                           
                resampler_regular_vars('hdwi', hdwi_ds, main_dir, model)                                                # Resample and save the HDW dataset
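
The pairing logic in `hdwi` depends on each file name ending in a four-digit year followed by `.nc`, which is what the slice `[-7:-3]` extracts. The sketch below shows that check in isolation. The file names are hypothetical examples of the `{variable}_{name}_Abs_{year}.nc` pattern, and `pair_by_year` is an illustrative helper that is not part of the notebook.

```python
def year_from_path(path):
    """Extract the year from a path that ends in '<YYYY>.nc'."""
    return int(path[-7:-3])


def pair_by_year(vpd_files, wspeed_files):
    """Pair sorted VPD and wind speed files, requiring matching years."""
    if len(vpd_files) != len(wspeed_files):
        raise ValueError("the number of years for each variable are not the same")
    pairs = []
    for v, ws in zip(sorted(vpd_files), sorted(wspeed_files)):
        if year_from_path(v) != year_from_path(ws):
            raise ValueError(f"the years are not aligned: {v} vs {ws}")
        pairs.append((v, ws))
    return pairs


# Hypothetical file names following the pattern used by the glob in hdwi().
vpd_files = ["vpd_ERA5_Abs_2020.nc", "vpd_ERA5_Abs_2019.nc"]
wspeed_files = ["wspeed_ERA5_Abs_2019.nc", "wspeed_ERA5_Abs_2020.nc"]

pairs = pair_by_year(vpd_files, wspeed_files)
for v, ws in pairs:
    print(year_from_path(v), v, ws)
```

Raising an error on a mismatch is a choice made for this sketch. The notebook's function instead prints a message and skips the affected files.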

In [None]:
model_options = ['CONUS404', 'ERA5', 'HRRR', 'NAM', 'NARR', 'NCEP', 'UFS_S2S']


In [None]:
main_dir = 'database_files'

### Split the models into easy-to-process and hard-to-process groups so the kernel can be restarted in between

In [None]:
model_list1 = ['NARR', 'NCEP']
# ERA5 was already processed during code development

In [None]:
%%time

for m in model_list1:
    hdwi(m, main_dir)

In [None]:
%%time

hdwi('HRRR', main_dir)

In [None]:
%%time

hdwi('CONUS404', main_dir)

In [None]:
%%time

hdwi('NAM', main_dir)
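
For NAM, `hdwi` keeps only the wind speed time steps that also appear in the VPD file before multiplying. A minimal sketch of that index matching follows. ISO-format strings stand in for the `numpy.datetime64` values in the datasets, a set is used for fast membership tests, and the timestamps are made up for illustration.

```python
def matching_indices(reference_times, candidate_times):
    """Indices of candidate_times whose values also occur in reference_times."""
    reference = set(reference_times)  # set lookup avoids rescanning the list
    return [i for i, t in enumerate(candidate_times) if t in reference]


# Hypothetical time axes: wind speed has extra steps that VPD lacks.
vpd_times = ["2020-01-01T00", "2020-01-01T06", "2020-01-01T12"]
wspeed_times = [
    "2020-01-01T00",
    "2020-01-01T03",
    "2020-01-01T06",
    "2020-01-01T09",
    "2020-01-01T12",
]

idx = matching_indices(vpd_times, wspeed_times)
print(idx)  # [0, 2, 4]
```

In the notebook, the resulting list is passed to `isel(time=...)` on the wind speed dataset.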