<a name="top"></a>
<div style="width:1000px">

<div style="float:right; width:98px; height:98px;">
<img src="https://cdn.miami.edu/_assets-common/images/system/um-logo-gray-bg.png" alt="Miami Logo" style="height: 98px;">
</div>

<div style="float:right; width:98px; height:98px;">
<img src="https://media.licdn.com/dms/image/C4E0BAQFlOZSAJABP4w/company-logo_200_200/0/1548285168598?e=2147483647&v=beta&t=g4jl8rEhB7HLJuNZhU6OkJWHW4cul_y9Kj_aoD7p0_Y" alt="STI Logo" style="height: 98px;">
</div>


<h1>Point Selection for sFWRD</h1>
By: Kayla Besong, PhD
    <br>
Last Edited: 05/21/24
<br>
<br>    
<br>
Select the grid point closest to a given latitude/longitude from an sFWRD file. The GACC stations are used here as example extraction sites. <div style="clear:both"></div>
</div>

<hr style="height:2px;">

## Import needed libraries, etc.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import pandas as pd
from dask.distributed import Client, LocalCluster
import dask.array as da
import os
import glob


In [2]:
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
pd.options.mode.chained_assignment = None

## Load the shared function notebook and define helper functions

In [4]:
%run ../../../../Universal_Functions/File_concat_mod_functions.ipynb

In [5]:
def euclid(input_lat, input_lon, new_lat, new_lon):

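    ''' Compute the Euclidean distance, in degrees, between an observation point and a model grid point.
    Latitude and longitude are treated as planar coordinates, so the result is not a great-circle distance.

    Inputs:

    input_lat: (float) the observational latitude
    input_lon: (float) the observational longitude
    new_lat: (float) the found model latitude
    new_lon: (float) the found model longitude
    
    Outputs:

    The Euclidean distance in degrees: (float)

    '''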
    
    return np.sqrt((new_lat - input_lat)**2 + (new_lon - input_lon)**2)
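
The function above measures separation in degrees, treating latitude and longitude as planar coordinates. As a rough illustration of how that differs from a distance on the sphere, the sketch below compares it with a haversine great-circle distance for a high-latitude site (PAFA). The `haversine_km` helper and the mean Earth radius of 6371 km are assumptions introduced here for illustration; they are not part of the notebook's function library, and `euclid` is re-declared with `math` only so the block runs on its own.

```python
import math

def euclid(input_lat, input_lon, new_lat, new_lon):
    """Planar distance in degrees, mirroring the function defined above."""
    return math.sqrt((new_lat - input_lat) ** 2 + (new_lon - input_lon) ** 2)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km (hypothetical helper, not from the notebook)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# PAFA coordinates from the site list
obs_lat, obs_lon = 64.80389, -147.87611

# Two candidate points, each 1 degree away from the site
north = (obs_lat + 1.0, obs_lon)   # offset in latitude only
east = (obs_lat, obs_lon + 1.0)    # offset in longitude only

deg_north = euclid(obs_lat, obs_lon, *north)
deg_east = euclid(obs_lat, obs_lon, *east)
km_north = haversine_km(obs_lat, obs_lon, *north)
km_east = haversine_km(obs_lat, obs_lon, *east)

print(f'north: {deg_north:.3f} deg, {km_north:.1f} km')
print(f'east:  {deg_east:.3f} deg, {km_east:.1f} km')
```

Both candidates are the same distance away in degrees, yet the eastward point is less than half as far in kilometers at this latitude. For mid-latitude sites and fine grids the difference is smaller, so the degree-based measure may be adequate for picking a nearest grid point.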

In [5]:
def latlon_sel(df, obs_lat, obs_lon):

    ''' Select the point in the input sFWRD dataset closest to an observational latitude/longitude.
    Two methods are compared: the average of the grid points inside a box extending 1 degree in each direction
    from the observational point, and the single nearest grid point. Whichever lies closer to the observational
    point, by Euclidean distance in degrees, is returned.
    This may seem redundant, but for models with coarse resolution, such as NCEP, the single nearest grid point
    can be up to 2.5 degrees away from an observed location.

    Inputs:

    df: (xarray dataset) the input dataset 
    obs_lat: (float) the observational latitude
    obs_lon: (float) the observational longitude
    
    Outputs:

    df_avg or df_euc: (xarray dataset) the returned xarray dataset either the box averaged or euclidean depending on which is closest to the point

    '''    
    
    northern_most = obs_lat + 1                                                                                          # Define the northernmost latitude boundary
    southern_most = obs_lat - 1                                                                                          # Define the southernmost latitude boundary

    eastern_most = obs_lon + 1                                                                                           # Define the easternmost longitude boundary
    western_most = obs_lon - 1                                                                                           # Define the westernmost longitude boundary

    df_avg = df.sel(latitude=slice(southern_most, northern_most), longitude=slice(western_most, eastern_most))           # Select the subset of the dataset within the defined latitude and longitude boundaries

    distances = xr.apply_ufunc(euclid, obs_lat, obs_lon, df['latitude'], df['longitude'])                                # Compute the Euclidean distances between the observation point and every grid point in the dataset
    y_sel_euc, x_sel_euc = np.unravel_index(distances.argmin(), distances.shape)                                         # Convert the flat index of the minimum distance into (latitude, longitude) indices
    df_euc = df.isel(latitude=y_sel_euc, longitude=x_sel_euc)                                                            # Select the grid point in the dataset with the minimum distance

    dis_avg = euclid(obs_lat, obs_lon, float(df_avg.latitude.mean().values), float(df_avg.longitude.mean().values))      # Compute the Euclidean distance between the observation point and the mean position of the subset
    dis_euc = euclid(obs_lat, obs_lon, float(df_euc.latitude.values), float(df_euc.longitude.values))                    # Compute the Euclidean distance between the observation point and the closest grid point

    if dis_avg < dis_euc:                                                                                                # Compare the distances

        return df_avg.mean('latitude').mean('longitude')                                                                 # Return the mean value of the subset if its distance is smaller

    else:

        return df_euc                                                                                                    # Return the closest point if its distance is smaller
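
The nearest-point step in `latlon_sel` can be sketched with plain NumPy: broadcast the 1-D latitude and longitude axes into a 2-D grid of distances, take the flat index of the minimum, and convert it back to row and column indices. The 2.5-degree grid below is synthetic and assumed purely for illustration; it does not come from any of the model files.

```python
import numpy as np

# Synthetic regular 2.5-degree grid (illustrative assumption, not a real model grid)
lats = np.arange(30.0, 50.1, 2.5)       # 30.0, 32.5, ..., 50.0
lons = np.arange(-130.0, -99.9, 2.5)    # -130.0, -127.5, ..., -100.0

# KPDX coordinates from the site list
obs_lat, obs_lon = 45.59578, -122.60917

# Broadcast the two 1-D axes into a (latitude, longitude) grid of distances in degrees
dist = np.sqrt((lats[:, None] - obs_lat) ** 2 + (lons[None, :] - obs_lon) ** 2)

# argmin() returns an index into the flattened array; unravel_index maps it to (row, col)
iy, ix = np.unravel_index(dist.argmin(), dist.shape)
nearest_lat, nearest_lon = lats[iy], lons[ix]

print(f'nearest grid point: ({nearest_lat}, {nearest_lon}), {dist[iy, ix]:.3f} degrees away')
```

On this synthetic grid the nearest point to KPDX is (45.0, -122.5). The same pattern applies to any regular grid whose latitude and longitude are 1-D coordinate arrays.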



## Get observation files by variable and station

In [6]:
sites = pd.read_csv('obs/SiteList_LatLon.csv', sep = ',')

In [7]:
site_list = sites['Site'].values
site_list

array(['KPDX', 'KRDD', 'PHKO', 'KRAL', 'KSLC', 'PAFA', 'KABQ', 'KMSO',
       'KDEN', 'KATL', 'KMKE'], dtype=object)

In [8]:
sites

| Unnamed: 0 | Site | Latitude | Longitude | Elevation-Ft | Elevation-Meter |
|---|---|---|---|---|---|
| 0 | KPDX | 45.59578 | -122.60917 | 20 | 6.1 |
| 1 | KRDD | 40.51461 | -122.2977 | 499 | 152.1 |
| 2 | PHKO | 19.74083 | -156.05056 | 43 | 13.1 |
| 3 | KRAL | 33.95189 | -117.44511 | 819 | 249.6 |
| 4 | KSLC | 40.77069 | -111.96503 | 4226 | 1288.0 |
| 5 | PAFA | 64.80389 | -147.87611 | 433 | 132.0 |
| 6 | KABQ | 35.04167 | -106.61472 | 5351 | 1630.9 |
| 7 | KMSO | 46.92083 | -114.0925 | 3199 | 975.0 |
| 8 | KDEN | 39.84658 | -104.65622 | 5404 | 1647.1 |
| 9 | KATL | 33.64028 | -84.42694 | 1027 | 313.0 |


In [9]:
# vars = ['temp',
#          'rh',
#          'windspeed',
#          'winddir',
#          'windgust',
#          'precip_1hr',
#          'precip_24hr', 
#          'vpd', 
#          'hdwi']

In [10]:
obs_to_mod_var = {'temp': 't2',
                  'rh': 'rh',
                  'windspeed': 'ws',
                  'winddir': 'wd',
                  'windgust': 'gust',
                  'precip_1hr': 'tp',
                  'vpd': 'vpd',
                  'hdwi': 'hdw'  
}

In [11]:
models =  ['ERA5', 'NCEP', 'CONUS404', 'NAM', 'NARR', 'HRRR']
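
The site list, the variable mapping, and the model list together determine which files the extraction loop in the next cell attempts to write. As a hedged sketch, the block below enumerates those paths in the same site, variable, model order, using the output path template from that loop. Only two sites are included for brevity, and the `expected` list is a name introduced here for illustration.

```python
from itertools import product

# Two sites from the list above, kept short for illustration
site_list = ['KPDX', 'KRDD']

models = ['ERA5', 'NCEP', 'CONUS404', 'NAM', 'NARR', 'HRRR']

obs_to_mod_var = {'temp': 't2',
                  'rh': 'rh',
                  'windspeed': 'ws',
                  'winddir': 'wd',
                  'windgust': 'gust',
                  'precip_1hr': 'tp',
                  'vpd': 'vpd',
                  'hdwi': 'hdw'}

# product() varies the last argument fastest, matching the loop nesting:
# site (outer), variable (middle), model (inner)
expected = [
    f'obs/model_by_site/{site}/{m}/{mod_v}_{m}_{site}.nc'
    for site, mod_v, m in product(site_list, obs_to_mod_var.values(), models)
]

print(len(expected))
print(expected[0])
```

A list like this could be compared against the files actually present on disk (for example with `os.path.exists`) to spot model and variable combinations that were not available.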

In [None]:
%%time                                                                                                                                
                          
t = 'Abs'                                                                                                                             # Set the type of data to 'Abs'
                            
for sitename in site_list:                                                                                                            # Loop through each site name in site_list
                            
    dir_maker(f'obs/model_by_site/{sitename}')                                                                                        # Create a directory for the current site
                            
    print(f'{sitename} IS STARTING')                                                                                                  # Print a message indicating the start of processing for the current site
                                 
    for obs_v, mod_v in obs_to_mod_var.items():                                                                                       # Loop through each observation variable and its corresponding model variable
        
        obs = xr.open_mfdataset(glob.glob(os.path.join('obs/database_like_files', f'{obs_v}*{sitename}*{t}*.nc'))[0])                 # Open the observation dataset for the current variable and site
        
        obs_lat = float(obs.latitude.values)                                                                                          # Get the latitude of the observation site
        obs_lon = float(obs.longitude.values)                                                                                         # Get the longitude of the observation site
                                            
        for m in models:                                                                                                              # Loop through each model
                            
            dir_maker(f'obs/model_by_site/{sitename}/{m}')                                                                            # Create a directory for the current model
                            
            print(f'{m}: {mod_v}')                                                                                                    # Print a message indicating the current model and variable

            try:
            
                mod = xr.open_mfdataset(f'database_files_final/{m}/{mod_v}_{get_filename(m)}_{t}*.nc', chunks=get_chunk_sFWD(m))       # Open the model dataset for the current variable and model
                mod_sel = latlon_sel(mod, obs_lat, obs_lon)                                                                            # Select the data for the observation site's latitude and longitude
                                 
                mod_sel.to_netcdf(f'obs/model_by_site/{sitename}/{m}/{mod_v}_{m}_{sitename}.nc')                                       # Save the selected model data to a NetCDF file
                print(f'obs/model_by_site/{sitename}/{m}/{mod_v}_{m}_{sitename}.nc')                                                   # Print the path of the saved NetCDF file
                                             
                del mod_sel                                                                                                            # Delete the selected model data to free up memory

            except Exception as e:                                                                                                     # Handle errors (e.g. missing files) without swallowing KeyboardInterrupt

                print(f'{m}: {mod_v} is not available ({e})')                                                                          # Print a message indicating the model variable is not available, with the reason

KPDX IS STARTING
ERA5: t2
obs/model_by_site/KPDX/ERA5/t2_ERA5_KPDX.nc
NCEP: t2
obs/model_by_site/KPDX/NCEP/t2_NCEP_KPDX.nc
CONUS404: t2
obs/model_by_site/KPDX/CONUS404/t2_CONUS404_KPDX.nc
NAM: t2
obs/model_by_site/KPDX/NAM/t2_NAM_KPDX.nc
NARR: t2
obs/model_by_site/KPDX/NARR/t2_NARR_KPDX.nc
HRRR: t2
obs/model_by_site/KPDX/HRRR/t2_HRRR_KPDX.nc
ERA5: rh
obs/model_by_site/KPDX/ERA5/rh_ERA5_KPDX.nc
NCEP: rh
obs/model_by_site/KPDX/NCEP/rh_NCEP_KPDX.nc
CONUS404: rh
obs/model_by_site/KPDX/CONUS404/rh_CONUS404_KPDX.nc
NAM: rh
obs/model_by_site/KPDX/NAM/rh_NAM_KPDX.nc
NARR: rh
obs/model_by_site/KPDX/NARR/rh_NARR_KPDX.nc
HRRR: rh
obs/model_by_site/KPDX/HRRR/rh_HRRR_KPDX.nc
ERA5: ws
obs/model_by_site/KPDX/ERA5/ws_ERA5_KPDX.nc
NCEP: ws
obs/model_by_site/KPDX/NCEP/ws_NCEP_KPDX.nc
CONUS404: ws
obs/model_by_site/KPDX/CONUS404/ws_CONUS404_KPDX.nc
NAM: ws
obs/model_by_site/KPDX/NAM/ws_NAM_KPDX.nc
NARR: ws
obs/model_by_site/KPDX/NARR/ws_NARR_KPDX.nc
HRRR: ws
obs/model_by_site/KPDX/HRRR/ws_HRRR_KPDX.nc
ER