# **Function Notebook for Insights on the Spatiotemporal Variability of Downslope Winds in Coastal Santa Barbara: A Case Study From the Sundowner Winds Experiment (SWEX)**
## This notebook contains the following:
> - #### Functions that are used in other Jupyter Notebooks for all analysis and figures in the article Insights on the Spatiotemporal Variability of Downslope Winds in Coastal Santa Barbara: A Case Study From the Sundowner Winds Experiment (SWEX).

## **Function: "cartopy_basemap_subplots"**

- #### Description: A function to draw one or more cartopy plots with lat/lon labels and a coastline, along with numerous other options. This function can be adapted for many areas, but has the most options for the Santa Barbara, CA domain. Here is a [link](https://github.com/dlnash/AR_types/blob/master/modules/plotter.py) to a similar function created by [Deanna Nash](https://dlnash.github.io/). Lower resolution (10m) ocean mask, coastline, and island features are courtesy of [NaturalEarth.com](https://www.naturalearthdata.com/). Note also that all of the [Natural Earth data](https://www.naturalearthdata.com/features/) uses the WGS84 datum, which is the default globe used when initializing a Cartopy projection. Thus, we do not need to define a separate [globe](https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.crs.Globe.html) instance. Higher resolution ocean mask and coastline features are courtesy of [FOSSGIS](https://osmdata.openstreetmap.de/). They are processed OpenStreetMap shapefiles that also reference the WGS84 datum.

> - #### **Input Variables**: 
>> - #### **plot_crs**: a cartopy coordinate reference system (CRS) that describes the CRS we want the map to be plotted in
>> - #### **data_crs**: a cartopy coordinate reference system (CRS) that describes the CRS that our data comes in
>> - #### **fig_size**: a tuple that defines the figure size. First value is the width and second value is the height of the figure (in inches).
>> - #### **nrows**: An integer which defines how many subplot rows the user wants.
>> - #### **ncols**: An integer which defines how many subplot cols the user wants.
>> - #### **wspace_float**: A number which defines the width spacing between subplots.
>> - #### **hspace_float**: A number which defines the height spacing between subplots.
>> - #### **lon_lat_extent**: an array-like variable that will set the extent of our map and has the order: [lower_lon, upper_lon, lower_lat, upper_lat]
>> - #### **lon_lat_ticks**:  an array-like variable that will be used to draw the ticklabels on our map. Has the order [lower_lon_tick, upper_lon_tick, lower_lat_tick, upper_lat_tick] 
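>> - #### **lon_lat_tick_num**: a list of two integers that tells us how many evenly spaced lon/lat ticklabels we want. Has the order [lon_tick_num, lat_tick_num]
>> - #### **lon_lat_ticks_on**: A boolean which indicates if lon/lat ticks should be computed and drawn at all. Defaults to "True".
>> - #### **xtick_ytick_set_list**: A list of booleans, indexed by subplot, which indicates if ticks should be drawn on that subplot. It must contain one entry per subplot; the default ([True]) only covers a single plot.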
>> - #### **high_res_coastline**: A boolean which indicates if we should use the high resolution OSM coastline outline, or the 10m natural earth coastline outline.
>> - #### **high_res_wrf_topo_sb_bool**: A boolean which indicates if we should draw 50m resolution WRF topography over Santa Barbara. Only draws topography if "True" is provided.
>> - #### **low_res_wrf_topo_sb_bool**: A boolean which indicates if we should draw 1km resolution WRF topography over Santa Barbara. Only draws topography if "True" is provided.
>> - #### **low_res_wrf_topo_ca_bool**: A boolean which indicates if we should draw a 9km resolution WRF topography layer (the resolution stated in the code comments) that covers the state of California. Only draws topography if "True" is provided.
>> - #### **wrf_topo_colorbar_each_plot_bool**: A boolean which indicates if we should draw a colorbar for our WRF topography for each plot or subplot created.
>> - #### **wrf_topo_colorbar_entire_figure_bool**: A boolean which indicates if we should draw a single colorbar for our WRF topography for the entire figure (subplots or single plot).
>> - #### **scale_bar_bool**: A boolean which indicates if we should draw a scale bar. Only draws scale bar if "True" is provided.
>> - #### **scale_bar_position**: A tuple that defines the (x,y) coordinate to begin drawing the scale bar.
>> - #### **scale_bar_length**: A value that takes an unusual format to define how long to make the scalebar. Examples: 1_0 is 10km, 10_0 is 100km, and 0o1 is 1km.
>> - #### **inset_ca_bool**: A boolean which indicates if we should draw an inset figure of California. Only draws inset if "True" is provided.
>> - #### **inset_bbox_position**: A tuple that defines the bounding box for the inset axis. This usually needs to be reset if you change the extent of your map. Format of tuple is (start_x_position, start_y_position, end_x_position, end_y_position).
>> - #### **ocean_color**: A string that defines the color to use for the Natural Earth ocean mask.
> - #### **Output Variables**:
>> - #### **fig**: matplotlib figure 
>> - #### **ax or axs**: cartopy axis or axes with basemap data plotted on one or multiple subplots.
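
As a minimal sketch of how two of the inputs above are interpreted, the snippet below reproduces the tick computation with `np.linspace` and evaluates the `scale_bar_length` examples. The tick bounds and counts are hypothetical values chosen for illustration. The scale bar examples are ordinary Python integer literals (underscores are digit separators and `0o` is the octal prefix); their interpretation as kilometers comes from the description above.

```python
import numpy as np

# Hypothetical inputs (illustrative values only, not taken from the article)
lon_lat_ticks    = [-120.5, -119.5, 34.25, 34.75]  # [lower_lon, upper_lon, lower_lat, upper_lat]
lon_lat_tick_num = [5, 3]                          # [lon_tick_num, lat_tick_num]

# Evenly spaced tick locations, rounded to two decimals as in the function body
map_xticks = np.linspace(lon_lat_ticks[0], lon_lat_ticks[1],
                         num=lon_lat_tick_num[0], endpoint=True).round(decimals=2)
map_yticks = np.linspace(lon_lat_ticks[2], lon_lat_ticks[3],
                         num=lon_lat_tick_num[1], endpoint=True).round(decimals=2)

# The scale_bar_length examples are plain integer literals
scale_bar_examples = {'1_0': 1_0, '10_0': 10_0, '0o1': 0o1}

print(map_xticks)
print(map_yticks)
print(scale_bar_examples)
```

Because both tick bounds are included (`endpoint=True`), the spacing between ticks is `(upper - lower) / (num - 1)`.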

In [1]:
#--------------------------------------------------------------------------------------------------
#Entire package imports
import numpy as np
import xarray as xr

#cartopy imports
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shapereader
from cartopy.mpl.geoaxes import GeoAxes
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER

#matplotlib imports
import matplotlib.font_manager
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.patches as mpatches
from mpl_toolkits.axes_grid1 import make_axes_locatable
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

#PIL imports
import PIL.Image

#scalebar imports
import scalebar
#--------------------------------------------------------------------------------------------------
def cartopy_basemap_subplots(plot_crs, data_crs, fig_size, 
                             nrows, ncols, wspace_float, hspace_float,
                             lon_lat_extent, lon_lat_ticks, lon_lat_tick_num, 
                             lon_lat_ticks_on=True, xtick_ytick_set_list=[True],
                             high_res_coastline=False, 
                             high_res_wrf_topo_sb_bool=False, 
                             low_res_wrf_topo_sb_bool=False, 
                             low_res_wrf_topo_ca_bool=False, 
                             wrf_topo_colorbar_each_plot_bool=True, 
                             wrf_topo_colorbar_entire_figure_bool=False,
                             scale_bar_bool=False, scale_bar_position=None, scale_bar_length=1_0, 
                             inset_ca_bool=False, inset_bbox_position=None, 
                             ocean_color=None):
#--------------------------------------------------------------------------------------------------
    #Define font items
    fontdict_text_color_bar  = {'fontsize': 24, 'fontweight': 'normal', 'fontname': 'Nimbus Roman'}
    fontdict_text_annotation = {'fontsize': 24, 'fontweight': 'normal', 'fontname': 'Nimbus Roman'}
    fontdict_tick_labels     = {'fontsize': 24, 'fontweight': 'normal', 'fontname': 'Nimbus Roman'}
    
    #Define both map and data coordinate systems  
    plot_crs = plot_crs
    data_crs = data_crs
    
    #Define map boundaries
    lon_min_extent = lon_lat_extent[0]
    lon_max_extent = lon_lat_extent[1]
    lat_min_extent = lon_lat_extent[2]
    lat_max_extent = lon_lat_extent[3]
    
    #Only define lat/lon tick variables if the user requested it
    if lon_lat_ticks_on == True:
    
        #Define map tick labels
        lon_min_tick = lon_lat_ticks[0]
        lon_max_tick = lon_lat_ticks[1]
        lat_min_tick = lon_lat_ticks[2]
        lat_max_tick = lon_lat_ticks[3]

        #Number of ticks we want. Serves as input to numpy "linspace" function below
        lon_num_tick  = lon_lat_tick_num[0]
        lat_num_tick  = lon_lat_tick_num[1]

        #Define longitude and latitude ticks using "linspace"
        map_xticks = np.linspace(lon_min_tick, lon_max_tick, num=lon_num_tick, endpoint=True).round(decimals=2) 
        map_yticks = np.linspace(lat_min_tick, lat_max_tick, num=lat_num_tick, endpoint=True).round(decimals=2)
    
    #Define cartopy axis
    #Use the compressed layout for better spacing subplots
    #https://stackoverflow.com/questions/73826115/cartopy-too-much-vertical-space-between-subplots-even-with-tight-layout
    fig, axs = plt.subplots(nrows=nrows, ncols=ncols, 
                            subplot_kw={'projection': plot_crs}, 
                            gridspec_kw={'wspace':wspace_float, 'hspace':hspace_float}, 
                            figsize=fig_size, layout='compressed')
#--------------------------------------------------------------------------------------------------
    #If we want to make only a single map, put the single "axs" into a list so we can iterate over it
    #Thanks ChatGPT: https://chatgpt.com/share/6702e471-d918-800f-ae87-9a83f77a2939
    if nrows*ncols == 1: 
        axs = [axs]
    else:
        axs = axs.flatten()
    
    #For each defined axis we have, create the following basemap
    for ax_index, ax in enumerate(axs):

        #Set extent of map
        ax.set_extent((lon_min_extent, lon_max_extent, lat_min_extent, lat_max_extent), crs=plot_crs)

        #Only set x-ticks and y-ticks if the user requested it
        if (lon_lat_ticks_on == True) & (xtick_ytick_set_list[ax_index] == True):

            #Setting tick locations and labels for lon/lat (degrees; projection = plot_crs)
            ax.set_xticks(map_xticks, crs=plot_crs)
            ax.set_yticks(map_yticks, crs=plot_crs)
            ax.set_xticklabels(map_xticks, **fontdict_tick_labels)
            ax.set_yticklabels(map_yticks, **fontdict_tick_labels)

            #Add formatter to lon/lat ticks (degree symbol & direction label)
            #https://scitools.org.uk/cartopy/docs/latest/gallery/tick_labels.html#sphx-glr-gallery-tick-labels-py
            ax.xaxis.set_major_formatter(LONGITUDE_FORMATTER)
            ax.yaxis.set_major_formatter(LATITUDE_FORMATTER)
            
            #Make ticks longer
            ax.tick_params(axis='both', which='major', length=15) 
#--------------------------------------------------------------------------------------------------              
        #If the user requested to add WRF topography over southern California to the map, do it!
        if (high_res_wrf_topo_sb_bool == True) | (low_res_wrf_topo_sb_bool == True) | (low_res_wrf_topo_ca_bool == True):

            #If the user wants the 50m resolution topography, use this file path
            if high_res_wrf_topo_sb_bool == True:
                wrf_file_str = './SWEX2022_datasets/geographic_files/sb_wrf_topo_50m.nc'
                wrf_hgt_str  = 'HGT_M' 
                wrf_lon_str  = 'XLONG_M'
                wrf_lat_str  = 'XLAT_M'
                elevation_colorbar_start = 0
                elevation_colorbar_end   = 2200
                elevation_colorbar_step  = 100
                elevation_colorbar_label_step = 1

            #Else, if the user wants the 1km resolution topography, use this file path
            elif low_res_wrf_topo_sb_bool == True:
                wrf_file_str = './SWEX2022_datasets/geographic_files/sb_wrf_topo_1km.nc'
                wrf_hgt_str  = 'HGT' 
                wrf_lon_str  = 'XLONG'
                wrf_lat_str  = 'XLAT'
                elevation_colorbar_start = 0
                elevation_colorbar_end   = 2200
                elevation_colorbar_step  = 100
                elevation_colorbar_label_step = 1

            #Else, if the user wants 9km resolution topography over the entire state of CA, use this path
            elif low_res_wrf_topo_ca_bool == True:
                wrf_file_str = '/home/voyager-sbarc/wrf/wrf451/sundowners/swex2022/iop10/run_545_ERA5_1km/wrfout_d01_2022-05-11_18:00:00'
                wrf_hgt_str  = 'HGT' 
                wrf_lon_str  = 'XLONG'
                wrf_lat_str  = 'XLAT'
                elevation_colorbar_start = 0
                elevation_colorbar_end   = 3600
                elevation_colorbar_step  = 100
                elevation_colorbar_label_step = 2

            #Read in NetCDF file
            nc_file = xr.open_dataset(wrf_file_str)
            
            # Check if "Time" exists in the dataset dimensions and select the first time step if it does
            if "Time" in nc_file.dims:
                nc_file = nc_file.isel(Time=0)
            
            #Read in NetCDF variables
            nc_ele  = nc_file[wrf_hgt_str].squeeze()  #Units = meters
            nc_lon  = nc_file[wrf_lon_str].squeeze()  #Units = degrees 
            nc_lat  = nc_file[wrf_lat_str].squeeze()  #Units = degrees 

            #Make a colormap for topography. Thanks Deanna!
            #https://github.com/dlnash/AR_types/blob/master/analysis/fig1_elevation.ipynb
            colors_land = plt.cm.terrain(np.linspace(0.35, 1, 256))
            terrain_map = mcolors.LinearSegmentedColormap.from_list('terrain_map', colors_land)

            #Make the norm
            norm_boundaries = np.arange(elevation_colorbar_start, elevation_colorbar_end, elevation_colorbar_step)
            norm            = mcolors.BoundaryNorm(boundaries=norm_boundaries, ncolors=256)

            #Plot topography
            topo_plot = ax.pcolormesh(nc_lon, nc_lat, nc_ele, cmap=terrain_map, shading='auto', norm=norm, transform=data_crs)

            #If the user requested a colorbar for elevation on the current plot, add one
            if wrf_topo_colorbar_each_plot_bool == True:

                #Colorbar for topography
                #https://matplotlib.org/3.1.1/gallery/axes_grid1/demo_colorbar_with_axes_divider.html
                #Second Answer: https://stackoverflow.com/questions/30030328/correct-placement-of-colorbar-relative-to-geo-axes-cartopy
                divider = make_axes_locatable(ax)
                cax     = divider.append_axes('right', size='2%', pad=0.25, axes_class=plt.Axes)
                cbar    = plt.colorbar(topo_plot, cax=cax, orientation='vertical', spacing='uniform', drawedges=True, ticks=norm_boundaries[::elevation_colorbar_label_step])
                cbar.set_label('Elevation (m)', color='black',  labelpad=20, **fontdict_text_color_bar)

                #Set font for colorbar tick labels
                #https://stackoverflow.com/questions/7257372/set-font-properties-to-tick-labels-with-matplot-lib/7280803
                ticks_font = matplotlib.font_manager.FontProperties(family='Nimbus Roman', style='normal', size=24, weight='normal', stretch='normal')
                for label in cbar.ax.get_yticklabels():
                    label.set_fontproperties(ticks_font)
#--------------------------------------------------------------------------------------------------   
        #If the user requests a high resolution coastline, draw it.
        if high_res_coastline == True:

            #Add in high resolution coastline from OSM
            reader_ocean     = shapereader.Reader('./data_swex/geographic_plotting_files/osm_ocean/water_polygons_Clip.shp')
            reader_coastline = shapereader.Reader('./data_swex/geographic_plotting_files/osm_coastline/west_coast_lines.shp')
            ocean            = ax.add_geometries(geoms=reader_ocean.geometries(),     crs=ccrs.PlateCarree(), facecolor=ocean_color, edgecolor='None')
            coastline        = ax.add_geometries(geoms=reader_coastline.geometries(), crs=ccrs.PlateCarree(), facecolor='None',      edgecolor='black')

        #Else, use natural earth 10m data.
        else:
            #Add in ocean mask and coastline from Natural Earth (https://www.naturalearthdata.com/)
            #https://stackoverflow.com/questions/20990381/how-to-add-custom-shapefile-to-map-using-cartopy
            #Add in ocean mask, high resolution coastline, islands, state lines, and country lines all from Natural Earth
            #https://www.naturalearthdata.com/
            ax.add_feature(cfeature.OCEAN, facecolor=ocean_color, zorder=1)
            #ax.add_feature(cfeature.NaturalEarthFeature(category='physical', name='coastline',         scale='10m', facecolor='None', linewidth=2))
            ax.add_feature(cfeature.NaturalEarthFeature(category='physical', name='minor_islands',     scale='10m', facecolor='None'))
            #ax.add_feature(cfeature.NaturalEarthFeature(category='cultural', name='admin_0_countries', scale='10m', facecolor='None'))
            ax.add_feature(cfeature.NaturalEarthFeature(category='cultural', name='admin_1_states_provinces', scale='10m', facecolor='None'))
#--------------------------------------------------------------------------------------------------
        #If the user requested a scale bar, add one:
        if scale_bar_bool == True: 

            #Add in scalebar
            #Scale bar module copied from: 
            #https://stackoverflow.com/questions/32333870/how-can-i-show-a-km-ruler-on-a-cartopy-matplotlib-plot
            scale_bar_text_kwargs = {'fontsize': 24, 'fontweight': 'normal', 'fontname': 'Nimbus Roman'}
            scalebar.scale_bar(ax, scale_bar_position, scale_bar_length, color='black', text_kwargs=scale_bar_text_kwargs, linewidth=4, zorder=10)
#--------------------------------------------------------------------------------------------------           
    #If the user requested one colorbar for elevation for the entire figure, add one
    if wrf_topo_colorbar_entire_figure_bool == True:

        #Colorbar for topography
        #https://matplotlib.org/3.1.1/gallery/axes_grid1/demo_colorbar_with_axes_divider.html
        #Second Answer: https://stackoverflow.com/questions/30030328/correct-placement-of-colorbar-relative-to-geo-axes-cartopy
        #Note: "axs" is a plain list for a single plot and a numpy array otherwise, so "list()" handles both cases
        cbar = fig.colorbar(topo_plot, ax=list(axs), orientation='vertical', spacing='uniform', pad=0.01, drawedges=True, ticks=norm_boundaries[::elevation_colorbar_label_step])
        cbar.set_label('Elevation (m)', color='black', **fontdict_text_color_bar)  

        #Set font for colorbar tick labels
        #https://stackoverflow.com/questions/7257372/set-font-properties-to-tick-labels-with-matplot-lib/7280803
        ticks_font = matplotlib.font_manager.FontProperties(family='Nimbus Roman', style='normal', size=24, weight='normal', stretch='normal')
        for label in cbar.ax.get_yticklabels():
            label.set_fontproperties(ticks_font)
#--------------------------------------------------------------------------------------------------       
    #If the user requested an inset California figure, add one:
    if inset_ca_bool == True: 
        
        #Add inset figure of CA
        #https://stackoverflow.com/questions/55385515/embed-small-map-cartopy-on-matplotlib-figure
        ax2 = inset_axes(ax, width='100%',height='100%',loc='upper left', borderpad=0,
                         axes_class=GeoAxes, axes_kwargs={'projection':plot_crs},
                         bbox_to_anchor=inset_bbox_position, bbox_transform=ax.transAxes)

        #Set limits of figure (California state)
        ax2.set_extent((-126,-114,31,43), crs=plot_crs)

        #Add in states and land (Natural Earth Raster)
        #Note that we change the maximum amount of pixels that the PIL package can display because if we do not, we get a security error
        #See stackoverflow thread linked below for more information
        #https://gis.stackexchange.com/questions/313490/increasing-resolution-of-cartopy-stock-background
        #https://stackoverflow.com/questions/51152059/pillow-in-python-wont-let-me-open-image-exceeds-limit
        # PIL.Image.MAX_IMAGE_PIXELS = 243280000
        # ax2.imshow(plt.imread('./data_swex/geographic_plotting_files/NE1_HR_LC_SR_W_DR/NE1_HR_LC_SR_W_DR.tif'), origin='upper', transform=ccrs.PlateCarree(), extent=[-180, 180, -90, 90])
        # ax2.add_feature(cfeature.STATES.with_scale('10m'), facecolor='None', edgecolor='black')
        ax2.add_feature(cfeature.OCEAN, facecolor='lightskyblue')
        ax2.add_feature(cfeature.NaturalEarthFeature(category='cultural', name='admin_1_states_provinces', scale='10m', facecolor='Grey', edgecolor='black'))

        #Remove ticks from axis
        ax2.tick_params(labelleft=False, labelbottom=False,left=False,bottom=False)

        #Add rectangle around zoomed region
        #https://scitools.org.uk/cartopy/docs/v0.5/matplotlib/introductory_examples/02.polygon.html
        ax2.add_patch(mpatches.Rectangle(xy=[lon_min_extent, lat_min_extent], 
                                         width=abs((lon_min_extent - lon_max_extent)), 
                                         height=abs((lat_min_extent-lat_max_extent)), 
                                         edgecolor='red', fill=False, alpha=1, zorder=4, transform=plot_crs))
#--------------------------------------------------------------------------------------------------    
    #Return single axis if we only want 1 plot
    if nrows*ncols == 1: 
        return (fig, axs[0])
    #Else return a list of axes
    else:
        return (fig, axs)
#---------------------------------------------------------------------------------------------------   
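
The elevation colorbar settings above can be checked in isolation. The sketch below only reproduces the `np.arange` calls with the start, end, and step values from the function body; it does not touch matplotlib. It illustrates that `np.arange` excludes the end value, so the last boundary is one step below `elevation_colorbar_end`, and that `elevation_colorbar_label_step` thins the labeled ticks.

```python
import numpy as np

# Santa Barbara topography settings from the function body: start=0, end=2200, step=100
sb_boundaries = np.arange(0, 2200, 100)
sb_ticks      = sb_boundaries[::1]   # elevation_colorbar_label_step = 1

# California topography settings from the function body: start=0, end=3600, step=100
ca_boundaries = np.arange(0, 3600, 100)
ca_ticks      = ca_boundaries[::2]   # elevation_colorbar_label_step = 2

print(sb_boundaries[-1], len(sb_boundaries))
print(ca_ticks[-1], len(ca_ticks))
```

If the top boundary should equal the stated end value, one option would be to pass `end + step` to `np.arange`; whether that is desirable here is a design choice left to the author.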



## **Function: "xy_lidar_wind_profiles_ds"**
- #### Description: This function reads in horizontal wind LiDAR data collected during the SWEX campaign, saves each variable into its own list, and then formats the data so it is "stacked". To clarify, each timestep of LiDAR data represents a sample of the vertical profile of the atmosphere. Because we eventually want to create a time-height plot of the data, we stack the profiles so that each vertical level holds a complete time series for that level.  
> - #### **Input Parameters:** 
>> - #### **"glob_paths"**: A glob of paths to our lidar files (type: list)
>> - #### **"institution_str"**: A string that determines which institution's LiDAR data we are reading in. Options are: 
>>> - #### **San Jose State University LiDAR = "sjsu" (Processed Wind Profile Files)**
>>> - #### **National Center for Atmospheric Research LiDARs = "ncar" (NetCDF files)**
>>> - #### **University of Notre Dame LiDARs = "und" (Processed Wind Profile Files)** 
>>> - #### **University of Virginia LiDAR = "uwow" (Processed Wind Profile Files) or "uwow_vad" (Processed Velocity-Azimuth Display Profiles)** 
> - #### **Output Parameters:** 
>> - #### **lidar_xy_profiles_ds**: An xarray dataset that contains horizontal velocity data and signal-to-noise ratio (NCAR and UWOW VAD only) as data variables for a single LiDAR instrument.
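
The "stacking" described above can be sketched with synthetic data. In the snippet below, all values are made up, and `wind_components_np` is a hypothetical NumPy stand-in for `metpy.calc.wind_components` that assumes the usual meteorological convention (direction is where the wind blows from). Each list entry is one vertical profile (one timestep); `np.column_stack` turns the list into a 2D array with dimensions (height, time).

```python
import numpy as np

def wind_components_np(speed, direction_deg):
    """Hypothetical NumPy stand-in for metpy.calc.wind_components (no units handling)."""
    direction_rad = np.deg2rad(direction_deg)
    u = -speed * np.sin(direction_rad)
    v = -speed * np.cos(direction_rad)
    return u, v

# Synthetic data: three timesteps, each a profile at four heights
heights = np.array([50.0, 100.0, 150.0, 200.0])
speed_profiles = [np.array([2.0, 4.0, 6.0, 8.0]),
                  np.array([3.0, 5.0, 7.0, 9.0]),
                  np.array([1.0, 2.0, 3.0, 4.0])]
direction_profiles = [np.full(4, 270.0), np.full(4, 0.0), np.full(4, 180.0)]

# Compute u and v for each profile, one timestep at a time
u_profiles, v_profiles = [], []
for speed, direction in zip(speed_profiles, direction_profiles):
    u, v = wind_components_np(speed, direction)
    u_profiles.append(u)
    v_profiles.append(v)

# Stack the 1D profiles as columns: rows are heights, columns are timesteps
u_2d     = np.column_stack(u_profiles)
v_2d     = np.column_stack(v_profiles)
speed_2d = np.column_stack(speed_profiles)

print(u_2d.shape)    # (number of heights, number of timesteps)
print(speed_2d[0])   # time series of wind speed at the lowest level
```

Each row of the stacked arrays is a complete time series at one height, which is the layout a time-height plot expects.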

In [None]:
#----------------------------------------------------------------------------------------------------------------------
#Import entire packages
import numpy as np
import pandas as pd
import xarray as xr
#----------------------------------------------------------------------------------------------------------------------
#metpy imports
from metpy.units import units
from metpy.calc import wind_components
#----------------------------------------------------------------------------------------------------------------------
def xy_lidar_wind_profiles_ds(glob_paths, institution_str):
#----------------------------------------------------------------------------------------------------------------------
    #Ensure glob paths are strings in a sorted list
    glob_paths_sorted = [str(path) for path in sorted(glob_paths)]
#----------------------------------------------------------------------------------------------------------------------
    #NCAR LIDAR PROCESSING
    if institution_str == 'ncar':
                
        #Important Note
        #From quick inspection, NCAR LIDARs feature slightly different scan heights for the same instrument in different daily files.
        #This makes things a tad more tedious to work with.
        #In order to overcome this limitation, we will do some processing on the "height" coordinate among daily LiDAR files
        
        #Define empty list to store scan heights from each file
        height_list = []
        
        #Define empty lists to store variables we grab from each NetCDF file
        xy_wind_u_list         = []
        xy_wind_v_list         = []
        xy_wind_speed_list     = []
        xy_wind_direction_list = []
        xy_wind_snr_list       = []
        
        #For each file that we found, do the following
        for file_index, file_path in enumerate(glob_paths_sorted):
            
            #Open the netcdf ncar lidar file
            nc_file = xr.open_dataset(file_path)
            
            #Save the floored height coordinate for further processing
            #We floor the height coordinate because this minimizes the number of conflicting values between daily LiDAR files, while still maintaining accuracy
            height_list.append(np.floor(nc_file.height))
            
            #If we are on the second file or greater, do the following:
            if (file_index > 0): 
            
                #Check to see if the current "height" coordinate array is close to equal to the previous file's "height" coordinate array
                if np.allclose(height_list[file_index], height_list[file_index-1], atol=1) == True:
                
                    #Grab the variables we want (horizontal wind components, speed, direction, and SNR)
                    #Note that we are transposing each variable to match the dimensions of the date and height 2D variables
                    xy_wind_u_list.append(nc_file['u'].transpose())
                    xy_wind_v_list.append(nc_file['v'].transpose())
                    xy_wind_speed_list.append(nc_file['wind_speed'].transpose())
                    xy_wind_direction_list.append(nc_file['wind_direction'].transpose())
                    xy_wind_snr_list.append(nc_file['mean_snr'].transpose())
            
            #Otherwise, if this is the first file, just grab the variables as usual
            else:
                #Grab the variables we want (horizontal wind components, speed, direction, and SNR)
                #Note that we are transposing each variable to match the dimensions of the date and height 2D variables
                xy_wind_u_list.append(nc_file['u'].transpose())
                xy_wind_v_list.append(nc_file['v'].transpose())
                xy_wind_speed_list.append(nc_file['wind_speed'].transpose())
                xy_wind_direction_list.append(nc_file['wind_direction'].transpose())
                xy_wind_snr_list.append(nc_file['mean_snr'].transpose())

        #Finally, concatenate the variables along the time dimension and simply use the first (left) "height" coordinate as the final coordinate
        xy_wind_u_concat         = xr.concat(xy_wind_u_list, dim='time', join='override')
        xy_wind_v_concat         = xr.concat(xy_wind_v_list, dim='time', join='override')
        xy_wind_speed_concat     = xr.concat(xy_wind_speed_list, dim='time', join='override')
        xy_wind_direction_concat = xr.concat(xy_wind_direction_list, dim='time', join='override')
        xy_wind_snr_concat = xr.concat(xy_wind_snr_list, dim='time', join='override')

        
        #Create an xarray Dataset from the DataArrays
        lidar_xy_profiles_ds = xr.Dataset({'xy_wind_u':xy_wind_u_concat, 'xy_wind_v':xy_wind_v_concat,
                                          'xy_wind_speed':xy_wind_speed_concat, 'xy_wind_direction':xy_wind_direction_concat,
                                          'xy_wind_snr': xy_wind_snr_concat})
#----------------------------------------------------------------------------------------------------------------------
    #SJSU, UND, and UWOW (processed wind profiles or VAD files) LIDAR PROCESSING 
    #Each of these LiDAR instruments have similar data formats
    #So we can process them more or less the same
    if (institution_str == 'sjsu') | (institution_str == 'und') | (institution_str == 'uwow') | (institution_str == 'uwow_vad'):
        
        #If we have data from the SJSU, UND, or the non-VAD UWOW instruments, do the following:
        if (institution_str == 'sjsu') | (institution_str == 'und') | (institution_str == 'uwow'):
            
            #Define input parameters for these instruments when reading files using pandas
            whitespace_bool = True
            skip_rows       = 1
            column_names    = ['height_m', 'wind_direction_deg', 'wind_speed_ms']
        
        #If we have data from UWOW VAD analysis, do the following:
        elif institution_str == 'uwow_vad':
            
            #Define input parameters for these instruments when reading files using pandas
            whitespace_bool = False
            skip_rows       = 3
            column_names    = ['height_m', 'wind_speed_ms', 'wind_direction_deg', 'snr']
            xy_snr_single_timestep = []

        #Set up storage locations for each timestep of data
        xy_date_single_timestep           = []
        xy_height_single_timestep         = []
        xy_wind_u_single_timestep         = []
        xy_wind_v_single_timestep         = []
        xy_wind_speed_single_timestep     = []
        xy_wind_direction_single_timestep = []
        
        #For each file that we found, do the following
        for file_index, file_path in enumerate(glob_paths_sorted):
            
            #Read in csv LiDAR file depending on which data we have
            df = pd.read_csv(file_path, delim_whitespace=whitespace_bool, skiprows=skip_rows, names=column_names)

            #Compute U and V components of wind using metpy and store values in appropriate lists
            xy_wind_u, xy_wind_v = wind_components(df['wind_speed_ms'].values*units('m/s'), df['wind_direction_deg'].values*units.deg)
            xy_wind_u_single_timestep.append(xy_wind_u)
            xy_wind_v_single_timestep.append(xy_wind_v)

            #Store each column of data (i.e. 1-dimensional array) in each of our storage locations
            #Later on we will reshape our storage locations to plot using the pcolormesh function in matplotlib
            xy_date_single_timestep.append(pd.to_datetime(file_path[-19:-11]+file_path[-10:-4]))
            xy_height_single_timestep.append(df['height_m'].values)
            xy_wind_speed_single_timestep.append(df['wind_speed_ms'].values)
            xy_wind_direction_single_timestep.append(df['wind_direction_deg'].values)
            
            #Do some additional processing for uwow data since it has SNR values
            if institution_str == 'uwow_vad':
                xy_snr_single_timestep.append(df['snr'].values)
            
        #Turn our lists of 1D arrays (1 array per timestep per variable) into 2 dimensional arrays
        xy_wind_u_2d         = np.column_stack(xy_wind_u_single_timestep)
        xy_wind_v_2d         = np.column_stack(xy_wind_v_single_timestep)
        xy_wind_speed_2d     = np.column_stack(xy_wind_speed_single_timestep)
        xy_wind_direction_2d = np.column_stack(xy_wind_direction_single_timestep)
        
        #If we have the UWOW VAD do the following final steps
        if institution_str == 'uwow_vad':
            
            #Convert SNR data from 1D to 2D
            xy_snr_2d = np.column_stack(xy_snr_single_timestep)
            
            #Create a xarray Dataset using our variables
            lidar_xy_profiles_ds = xr.Dataset({'xy_wind_u':(['height', 'time'], xy_wind_u_2d),
                                               'xy_wind_v':(['height', 'time'], xy_wind_v_2d),
                                               'xy_wind_speed':(['height', 'time'], xy_wind_speed_2d),
                                               'xy_wind_direction':(['height', 'time'], xy_wind_direction_2d),
                                               'xy_wind_snr':(['height', 'time'], xy_snr_2d)}, 
                                             coords={'height':np.unique(np.concatenate(xy_height_single_timestep)), 'time':xy_date_single_timestep})
        else:
        
            #Create an xarray Dataset using our variables (no SNR)
            lidar_xy_profiles_ds = xr.Dataset({'xy_wind_u':(['height', 'time'], xy_wind_u_2d),
                                               'xy_wind_v':(['height', 'time'], xy_wind_v_2d),
                                               'xy_wind_speed':(['height', 'time'], xy_wind_speed_2d),
                                               'xy_wind_direction':(['height', 'time'], xy_wind_direction_2d)}, 
                                             coords={'height':np.unique(np.concatenate(xy_height_single_timestep)), 'time':xy_date_single_timestep})
#----------------------------------------------------------------------------------------------------------------------        
    return (lidar_xy_profiles_ds)
#----------------------------------------------------------------------------------------------------------------------
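The final step above stacks one 1-D profile per timestep into a 2-D (height, time) array and takes the height coordinate from the unique values of all stored height arrays. The sketch below uses made-up profiles and hypothetical variable names to illustrate that step with NumPy only. It also checks the assumption the step relies on: every profile must share the same set of heights, otherwise the coordinate length no longer matches the number of rows in the array.

```python
import numpy as np

# Made-up example: three timesteps, each with a 4-level profile (values are illustrative only)
height_profiles = [np.array([40.0, 80.0, 120.0, 160.0]) for _ in range(3)]
speed_profiles = [
    np.array([2.0, 3.5, 5.0, 6.5]),
    np.array([2.5, 4.0, 5.5, 7.0]),
    np.array([3.0, 4.5, 6.0, 7.5]),
]

# Each 1-D profile becomes one column, so the result has shape (height, time)
speed_2d = np.column_stack(speed_profiles)

# The unique heights across all timesteps form the height coordinate
height_coord = np.unique(np.concatenate(height_profiles))

# The coordinate only lines up with the rows if all profiles share the same heights
heights_are_consistent = height_coord.size == speed_2d.shape[0]

print(speed_2d.shape)          # (4, 3)
print(heights_are_consistent)  # True
```

If a profile contained an extra or shifted height, `height_coord` would grow and `heights_are_consistent` would be False, which is a quick way to catch a mismatch before building the Dataset.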

## **Function: "z_lidar_wind_profiles_ds"**
- #### Description: This function reads LiDAR stare files, which contain vertical velocity data, collected from the SWEX campaign. The function reads stare files from an individual LiDAR and creates an xarray Dataset with the vertical velocity magnitude and signal-to-noise ratio as data variables. 
> - #### **Input Parameters**: 
>> - #### **"glob_paths"**: A list of file paths to our LiDAR files
>> - #### **"institution_str"**: A string that defines which LiDAR instrument we are reading data files from. Options are: **"sjsu", "ncar", "und", "uwow"**. 
> - #### **Output Parameters**: 
>> - #### **lidar_z_profiles_ds**: An xarray Dataset that contains vertical velocity magnitude and signal-to-noise ratio as data variables for a single LiDAR instrument.
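As a rough illustration of how the SJSU, UND, and UWOW stare files are parsed, the sketch below reads a small made-up file body in chunks. Each chunk is one observation: a 5-column row with the decimal hour and scan information, followed by one 4-column row per range gate. The sample values, the date, the gate length, and the three-gate layout are all assumptions chosen to keep the example short; real files have many more range gates and a metadata header that is skipped first.

```python
import io

import numpy as np
import pandas as pd

# Made-up stare-file body (the metadata header is assumed to be already skipped).
# Each observation is one 5-column row followed by three 4-column range-gate rows.
synthetic_stare_body = io.StringIO(
    "15.5 0.0 90.0 0.1 -0.2\n"
    "0 0.25 1.10 1.0e-6\n"
    "1 -0.50 1.05 9.0e-7\n"
    "2 0.75 1.02 8.0e-7\n"
    "15.75 0.0 90.0 0.1 -0.2\n"
    "0 0.30 1.12 1.0e-6\n"
    "1 -0.40 1.06 9.0e-7\n"
    "2 0.60 1.01 8.0e-7\n"
)

column_names = ["time_or_range_gate", "azimuth_or_vertical_velocity",
                "elevation_or_snr_plus_1", "pitch_or_beta", "roll_or_nan"]
rows_per_observation = 4   # 1 header row + 3 range gates (toy value)
gate_length_m = 30.0       # assumed gate length for this toy example
file_date = pd.to_datetime("20220513", format="%Y%m%d")  # assumed file-name date

times, heights, velocities, snrs = [], [], [], []
reader = pd.read_csv(synthetic_stare_body, names=column_names, sep=r"\s+",
                     chunksize=rows_per_observation)
for chunk in reader:
    # First row: decimal hour of the observation, converted to a Timedelta
    decimal_hour = float(chunk["time_or_range_gate"].iloc[0])
    times.append(file_date + pd.Timedelta(decimal_hour, unit="h"))
    # Remaining rows: one row per range gate
    gates = chunk["time_or_range_gate"].iloc[1:].to_numpy()
    heights.append((gates + 0.5) * gate_length_m)
    velocities.append(chunk["azimuth_or_vertical_velocity"].iloc[1:].to_numpy())
    # The intensity column stores SNR + 1, so subtract 1 to recover SNR
    snrs.append(chunk["elevation_or_snr_plus_1"].iloc[1:].to_numpy() - 1)

# One column per observation time, so the array has shape (height, time)
vertical_velocity_2d = np.column_stack(velocities)
print(times[0], heights[0], vertical_velocity_2d.shape)
```

The 4-column rows are padded with NaN in the fifth column, which is why the first row of a well-formed chunk is the only one with a non-NaN `roll_or_nan` value.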

In [None]:
#----------------------------------------------------------------------------------------------------------------------
#Import entire packages
import numpy as np
import pandas as pd
import xarray as xr
#----------------------------------------------------------------------------------------------------------------------
#metpy imports
from metpy.units import units
from metpy.calc import wind_components
#----------------------------------------------------------------------------------------------------------------------
def z_lidar_wind_profiles_ds(glob_paths, institution_str):

    #Ensure glob paths are strings in a sorted list
    glob_paths_sorted = [str(path) for path in sorted(glob_paths)]
#----------------------------------------------------------------------------------------------------------------------
    #NCAR LIDAR PROCESSING
    if institution_str == 'ncar':
        
        #Important Note
        #From a quick inspection, the NCAR LiDARs report slightly different scan heights for the same instrument in different daily files.
        #This makes the files a bit more tedious to work with.
        #To work around this, we do some processing on the "height" coordinate of each daily LiDAR file
        
        #Define empty list to store scan heights from each file
        height_list = []
        
        #Define empty lists to store variables we grab from each NetCDF file
        z_wind_speed_list = []
        z_wind_snr_list   = []
        
        #For each file that we found, do the following
        for file_index, file_path in enumerate(glob_paths_sorted):
            
            #Open the netcdf ncar lidar file
            nc_file = xr.open_dataset(file_path)
            
            #Save the floored height coordinate for further processing
            #We floor the height coordinate because this minimizes the number of conflicting values between daily LiDAR files, while still maintaining accuracy
            height_list.append(np.floor(nc_file.height))
            
            #If we are on the second file or greater, do the following:
            if (file_index > 0): 
            
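                #Check whether the current (floored) "height" coordinate array is close to equal to the previous file's array (absolute tolerance of 1)
                #Note that files that fail this check are skipped (their data is not appended)
                if np.allclose(height_list[file_index], height_list[file_index-1], atol=1):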
                
                    #Grab the variables we want (i.e. vertical velocity)
                    #Note that we are transposing this variable to match the dimensions of the date and height 2D variables
                    z_wind_speed_list.append(nc_file['w'].transpose())
                    z_wind_snr_list.append(nc_file['mean_snr'].transpose())

            #Otherwise, if we are on the first file, just grab the variables as usual
            else:

                #Grab the variables we want (i.e. vertical velocity)
                #Note that we are transposing this variable to match the dimensions of the date and height 2D variables
                z_wind_speed_list.append(nc_file['w'].transpose())
                z_wind_snr_list.append(nc_file['mean_snr'].transpose())

        #Finally, concatenate the variables along the time dimension and simply use the first (left) "height" coordinate as the final coordinate
        z_wind_speed_concat = xr.concat(z_wind_speed_list, dim='time', join='override')
        z_wind_snr_concat   = xr.concat(z_wind_snr_list, dim='time', join='override')
        
        #Create an xarray Dataset from the DataArrays
        lidar_z_profiles_ds = xr.Dataset({'wind_speed_z':z_wind_speed_concat, 'wind_speed_z_snr':z_wind_snr_concat})
#----------------------------------------------------------------------------------------------------------------------
    #SJSU, UND, and UWOW LIDAR PROCESSING
    if institution_str in ('sjsu', 'und', 'uwow'):
        
        #We need to define a few processing parameters depending on which institution's LiDAR instrument we are using
        if institution_str == 'sjsu':
            skip_rows  = 17 
            chunk_size = 1345
            range_gate_size = 18
        elif institution_str == 'und':
            skip_rows  = 17 
            chunk_size = 151
            range_gate_size = 48
        elif institution_str == 'uwow':
            skip_rows  = 17 
            chunk_size = 201
            range_gate_size = 30

        #Define storage locations for each timestep of observations for each variable
        z_snr_chunk_list        = []
        z_date_chunk_list       = []
        z_height_chunk_list     = []
        z_wind_speed_chunk_list = []

        #For each stare file we have from the lidar, do the following
        for file_index, file in enumerate(glob_paths_sorted):
            
            #Read in the hourly stare file
            #We skip the first 17 rows of the file, since these only contain metadata related to the lidar and stare file
            #Each hourly stare file holds an entire hour of observations (sampled roughly every second to every few seconds), and we read it in chunks
            #Reading in chunks makes the file structure easier to handle, because each observation consists of:
            #   1) one row of 5 columns, which specifies the lidar scanning information and provides an observation time
            #   2) (chunk_size - 1) rows of 4 columns of actual measured data (e.g. vertical velocity); this is 200 rows for the UWOW lidar
            #This pattern repeats for every observation in the stare file
            #Check out the raw files to get a better sense of the structure if these comments are not clear enough.
            #Note: sep=r'\s+' splits on whitespace and replaces delim_whitespace=True, which is deprecated in recent pandas versions
            chunky_df = pd.read_csv(file, names=['time_or_range_gate', 'azimuth_or_vertical_velocity', 'elevation_or_snr_plus_1', 'pitch_or_beta', 'roll_or_nan'], skiprows=skip_rows, chunksize=chunk_size, sep=r'\s+', on_bad_lines='warn')
            
            #Because we read each individual file in chunks, we must iterate through the chunks
            #For each chunk, do the following
            for chunk_index, chunk in enumerate(chunky_df):
                
                #Some LiDAR stare files contain incomplete data or stray symbols
                #These rows throw off the way we read in our data, because the number of lines per observation no longer matches chunk_size
                #One way to deal with this is to skip the rest of the file once we encounter a single sample that is off
                #If the "roll_or_nan" column has a NaN value in the first row of our chunk, we know we have hit the part of the file where the data is affected
                #If this happens, skip this iteration. The remaining chunks in the file are skipped in the same way, since the chunking pattern stays misaligned
                if np.isnan(chunk['roll_or_nan'].iloc[0]):
                    continue
                else:
                    #Make a timestamp that represents the time of each observation

                    #Year, month, day of observation come from file name
                    #Format of date in file name is "YYYYMMDD"
                    z_year_month_day_chunk = pd.to_datetime(file[-15:-7], format='%Y%m%d')

                    #The rest of the timestamp (hours, minutes, seconds, etc...) comes from the chunk we are on in the stare file
                    #This time is provided as a decimal time (e.g. 15.02572123; format is HH.HHHHHHHH)
                    #We convert this into a pandas timedelta object, which will convert the decimal into the appropriate hours, minutes, seconds we need
                    #We then can easily combine this with the year, month, day datetime object we created before
                    z_decimal_hour_chunk   = pd.Timedelta(chunk['time_or_range_gate'].iloc[0], unit='hours')

                    #Combine the year, month, day datetime object from the file name with the Timedelta object from the stare file to get the actual time of a single observation
                    z_date_single_chunk = z_year_month_day_chunk + z_decimal_hour_chunk

                    #Convert column of range gate numbers into altitudes
                    #From the stare file metadata, we see that the altitude of each measurement is defined as:
                    #Altitude of Measurement = (range_gate_number + 0.5) * gate_length, where gate_length is a constant value for each instrument's stare files
                    #Also notice how we index this series of values, starting with the second value (remember Python uses zero indexing) and going to the end of the chunk
                    #This is because we do not want the first row of data, which contains 5 columns of values that correspond to lidar scanning and time information
                    z_height_single_chunk = (chunk['time_or_range_gate'].iloc[1::] + 0.5) * range_gate_size

                    #Grab the vertical velocity column, be sure to skip the first row
                    z_wind_speed_single_chunk = chunk['azimuth_or_vertical_velocity'].iloc[1::]

                    #Grab signal to noise ratio (SNR) values
                    #Remember that in the stare files we have a column of intensity values which are defined as SNR+1
                    #To get the actual SNR values, we subtract 1 from this column
                    z_snr_single_chunk = chunk['elevation_or_snr_plus_1'].iloc[1::] - 1

                    #Append all variable values for a single observation time to the proper storage locations
                    z_date_chunk_list.append(z_date_single_chunk)
                    z_height_chunk_list.append(z_height_single_chunk.values)
                    z_wind_speed_chunk_list.append(z_wind_speed_single_chunk.values)
                    z_snr_chunk_list.append(z_snr_single_chunk.values)
        
        #Turn lists of 1-D arrays into 2-D arrays
        z_wind_speed_2d = np.column_stack(z_wind_speed_chunk_list)
        z_wind_snr_2d   = np.column_stack(z_snr_chunk_list)
        
        #Create an xarray Dataset using our variables
        lidar_z_profiles_ds = xr.Dataset({'wind_speed_z':(['height', 'time'], z_wind_speed_2d), 
                                          'wind_speed_z_snr':(['height', 'time'], z_wind_snr_2d)}, 
                                         coords={'height':np.unique(np.concatenate(z_height_chunk_list)), 'time':z_date_chunk_list})
#----------------------------------------------------------------------------------------------------------------------
    #Return Dataset
    return(lidar_z_profiles_ds)
#----------------------------------------------------------------------------------------------------------------------
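The NCAR branch above compares floored height coordinates from consecutive daily files with `np.allclose`. A minimal sketch of that comparison on made-up height arrays follows. The helper name `heights_match` is hypothetical, and the explicit shape check is an addition that is not part of the function above; without it, `np.allclose` raises an error when the two arrays cannot be broadcast together.

```python
import numpy as np


def heights_match(previous_heights, current_heights, atol=1.0):
    """Hypothetical helper: compare floored height arrays from two daily files."""
    previous_floored = np.floor(np.asarray(previous_heights, dtype=float))
    current_floored = np.floor(np.asarray(current_heights, dtype=float))
    # Added safeguard (not in the original function): arrays of different shape never match
    if previous_floored.shape != current_floored.shape:
        return False
    return bool(np.allclose(current_floored, previous_floored, atol=atol))


# Made-up scan heights (in meters) for four daily files
day_1 = [30.2, 60.7, 90.4]
day_2 = [30.9, 60.1, 90.8]    # same floored heights as day_1
day_3 = [30.2, 60.7, 120.4]   # top gate differs by about 30 m
day_4 = [30.2, 60.7]          # one gate missing

print(heights_match(day_1, day_2))  # True
print(heights_match(day_1, day_3))  # False
print(heights_match(day_1, day_4))  # False
```

In the function above, a file whose heights do not match the previous file is left out of the concatenated result, so it can be worth logging which files fail this check.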

## **Define a helper function "vertical_resolution_averaging_radiosonde"**
- #### Description: This function takes a few user input parameters, as well as two array-like inputs that should come from a radiosonde, and averages the input variable from the radiosonde onto an evenly spaced vertical resolution. Note that each vertically averaged value is placed at the halfway point of its vertical resolution bin ("radiosonde_altitude_array_evenly_spaced_midpoint" output parameter). 
- #### For example, if the user requests that the radiosonde temperature variable be vertically averaged every 25 m starting from 0 m above ground, the first vertically averaged point would consist of all temperature values that fall between 0 m and 25 m, and that point would be placed halfway between 0 m and 25 m (i.e. 12.5 m). As a result, each vertically averaged point is composed of values above and below its location. 
> - #### **Input Parameters**: 
>> - #### **"starting_altitude"** | **Type: float** | The starting altitude (in any unit) we want in the beginning of our evenly spaced altitude array (Should always be zero)
>> - #### **"ending_altitude"** | **Type: float** | The ending altitude (in any unit) we want at the end of our evenly spaced altitude array (10000 m seems to be a good value for SWEX data so far, but we could try others)
>> - #### **"desired_vertical_resolution"** | **Type: float** | The desired vertical resolution we want to average to (in any units, as long as units are consistent with the variable "radiosonde_altitude_array")
>> - #### **"radiosonde_altitude_array"** | **Type: array-like** | The altitude array (1-dimensional) from the radiosonde data
>> - #### **"radiosonde_metvariable_array"** | **Type: array-like** | The meteorological variable array (1-dimensional) from the radiosonde that we want to modify to match a specific vertical resolution
>> - #### **"radiosonde_altitude_array_evenly_spaced_midpoint_bool"** | **Type: Boolean** | A boolean that determines if the function returns an altitude array that represents the midpoints of each vertical averaging window. 
> - #### **Output Parameters**:
>> - #### **"radiosonde_altitude_array_evenly_spaced_midpoint"** | **Type: numpy array** | An array (1-dimensional) which contains the midpoint altitude of each averaging window, i.e. each evenly spaced altitude value plus the vertical resolution divided by 2 | Returned ONLY if input parameter "radiosonde_altitude_array_evenly_spaced_midpoint_bool" is set to True
>> - #### **"radiosonde_metvariable_array_averaged"** | **Type: numpy array** | An array (1-dimensional) which contains the values of the meteorological variable that has been averaged | Returned regardless of status of input parameter: "radiosonde_altitude_array_evenly_spaced_midpoint_bool"
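To make the averaging and midpoint placement concrete, here is a compact sketch of the same idea on made-up data. The function name `bin_average_profile` and the sample values are hypothetical. The sketch follows the bin convention described above (the first bin includes its lower edge, later bins do not), but it is a simplified variant rather than the function defined below.

```python
import numpy as np


def bin_average_profile(altitude, values, start, stop, resolution):
    """Hypothetical compact variant: average values in evenly spaced altitude bins."""
    altitude = np.asarray(altitude, dtype=float)
    values = np.asarray(values, dtype=float)
    edges = np.arange(start, stop + resolution, resolution)
    midpoints = edges[:-1] + resolution / 2
    averaged = np.full(midpoints.shape, np.nan)
    for i, (lower, upper) in enumerate(zip(edges[:-1], edges[1:])):
        # The first bin includes its lower edge; all other bins are (lower, upper]
        above_lower = altitude >= lower if i == 0 else altitude > lower
        in_bin = above_lower & (altitude <= upper)
        # Leave the bin as NaN when it holds no valid samples
        if in_bin.any() and not np.all(np.isnan(values[in_bin])):
            averaged[i] = np.nanmean(values[in_bin])
    return midpoints, averaged


# Made-up radiosonde samples: altitude in meters, temperature in degrees Celsius
altitude_m = np.array([0.0, 10.0, 25.0, 30.0, 50.0, 60.0])
temperature_c = np.array([20.0, 19.0, 18.0, 17.0, np.nan, 15.0])

midpoints, averaged = bin_average_profile(altitude_m, temperature_c, 0.0, 75.0, 25.0)
print(midpoints)  # [12.5 37.5 62.5]
print(averaged)   # [19. 17. 15.]
```

The first bin averages the samples at 0 m, 10 m, and 25 m and places the result at 12.5 m. Unlike the function below, this sketch always returns the midpoints and never calls `np.nanmean` on an empty selection.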

In [None]:
#----------------------------------------------------------------------------------------------------------------------
import numpy as np
#----------------------------------------------------------------------------------------------------------------------
#Create a function for averaging radiosonde data at a given vertical resolution
def vertical_resolution_averaging_radiosonde(starting_altitude, ending_altitude, desired_vertical_resolution, radiosonde_altitude_array, radiosonde_metvariable_array, radiosonde_altitude_array_evenly_spaced_midpoint_bool):
#----------------------------------------------------------------------------------------------------------------------
    #Define storage location for averaged met variable values
    radiosonde_metvariable_array_averaged = []

    #Define storage location for altitude values (spaced at midpoint increments)
    radiosonde_altitude_array_evenly_spaced_midpoint = []
    
    #Create the array which represents the altitudes with the desired evenly spaced vertical resolution
    #This array will be used to create the altitude array we will use for plotting, which is simply this array but with the altitude values offset by (desired_vertical_resolution/2)
    #Example: The requested vertical resolution for our data is 25m, thus for the first point we average all values between 0m and 25m and then plot that averaged value at an altitude of 12.5m
    radiosonde_altitude_array_evenly_spaced = np.arange(starting_altitude, ending_altitude+desired_vertical_resolution, desired_vertical_resolution)
#----------------------------------------------------------------------------------------------------------------------
    #For each value in the evenly spaced altitude array (1-dimensional), do the following
    for altitude_index, altitude_value in enumerate(radiosonde_altitude_array_evenly_spaced):
        
        #For the first value, take the mean of the variable values between the first altitude in our created array and the one directly above it
        #If we want the midpoint_altitude_array, perform that computation as well
        if altitude_index == 0:
            
            #Average met variable array between two altitudes via indexing
            radiosonde_metvariable_array_averaged.append(np.nanmean(radiosonde_metvariable_array[(radiosonde_altitude_array >= radiosonde_altitude_array_evenly_spaced[altitude_index]) & 
                                                                                              (radiosonde_altitude_array <= radiosonde_altitude_array_evenly_spaced[altitude_index+1])])) 
        
        #For all other evenly spaced altitude values, besides the last altitude, take the mean between the current altitude and the next altitude
        elif altitude_index < len(radiosonde_altitude_array_evenly_spaced)-1: 
            
            #Average met variable array between two altitudes via indexing
            radiosonde_metvariable_array_averaged.append(np.nanmean(radiosonde_metvariable_array[(radiosonde_altitude_array > radiosonde_altitude_array_evenly_spaced[altitude_index]) & 
                                                                                              (radiosonde_altitude_array <= radiosonde_altitude_array_evenly_spaced[altitude_index+1])]))
        
        #If the user wants an array of evenly spaced midpoint altitudes, compute it
        #If we are on the last evenly spaced altitude value, do not compute that one.
        if (radiosonde_altitude_array_evenly_spaced_midpoint_bool == True) & (altitude_index < len(radiosonde_altitude_array_evenly_spaced)-1):
                
            #Store the altitude value plus the vertical resolution divided by 2 (i.e. the midpoint) if the user wants it
            radiosonde_altitude_array_evenly_spaced_midpoint.append(altitude_value+(desired_vertical_resolution/2))
#----------------------------------------------------------------------------------------------------------------------
    #Return the evenly spaced midpoint altitude array AND the averaged met variable array if that is what the user requested
    if radiosonde_altitude_array_evenly_spaced_midpoint_bool == True:
        return(np.asarray(radiosonde_altitude_array_evenly_spaced_midpoint), np.asarray(radiosonde_metvariable_array_averaged))
   
    #Else return ONLY the averaged met variable array
    else:
        return(np.asarray(radiosonde_metvariable_array_averaged))
#----------------------------------------------------------------------------------------------------------------------