<img src='./img/EU-Copernicus-EUM_3Logos.png' alt='Logo EU Copernicus EUMETSAT' align='right' width='50%'></img>

<br>

# LTPy functions

This notebook lists all `functions` that are defined and used throughout the `LTPy course`.
The following functions are listed:

**[Data loading and re-shaping functions](#load_reshape)**
* [generate_xr_from_1D_vec](#generate_xr_from_1D_vec)
* [load_l2_data_xr](#load_l2_data_xr)
* [generate_geographical_subset](#generate_geographical_subset)
* [generate_masked_array](#generate_masked_array)
* [load_masked_l2_da](#load_masked_l2_da)
* [select_channels_for_rgb](#rgb_channels)
* [normalize](#normalize)

**[Data visualization functions](#visualization)**
* [visualize_l2](#visualize_l2)
* [visualize_gome_mollweide](#visualize_gome_mollweide)
* [visualize_imshow](#visualize_imshow)
* [visualize_s5p_pcolormesh](#visualize_s5p_pcolormesh)
* [visualize_s3_pcolormesh](#visualize_s3_pcolormesh)

<hr>

#### Load required libraries

In [2]:
import os

import xarray as xr
from netCDF4 import Dataset
import numpy as np

from matplotlib import pyplot as plt
import matplotlib.colors
from matplotlib.colors import LogNorm
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER

import warnings
warnings.simplefilter(action = "ignore", category = RuntimeWarning)
warnings.simplefilter(action = "ignore", category = FutureWarning)

<hr>

## <a id="load_reshape"></a>Data loading and re-shaping functions

### <a id='generate_xr_from_1D_vec'></a>`generate_xr_from_1D_vec`

In [1]:
def generate_xr_from_1D_vec(file,lat_path, lon_path, variable, parameter_name, longname, no_of_dims, unit):
    """ 
    Takes a netCDF4.Dataset or xarray.DataArray object and returns a xarray DataArray object with latitude / longitude
    information as coordinate information
    
    Parameters:
        file (netCDF4 data file): AC SAF Level 2 data file, loaded as a netCDF4.Dataset or xarray.DataArray
        lat_path (str): internal path of the data file to the latitude information, e.g. 'GEOLOCATION/LatitudeCentre'
        lon_path (str): internal path of the data file to the longitude information, e.g. 'GEOLOCATION/LongitudeCentre'
        variable (array): extracted variable of interest
        parameter_name (str): parameter name, preferably extracted from the data file
        longname (str): Long name of the parameter, preferably extracted from the data file
        no_of_dims (int): Define the number of dimensions of your input array
        unit (str): Unit of the parameter, preferably extracted from the data file
    
    Returns:
        1 or 2-dimensional (depending on the given number of dimensions) xarray DataArray  with latitude / longitude information as coordinate information
    """
    latitude = file[lat_path]
    longitude = file[lon_path]
    param = variable 
    
    if (no_of_dims==1):
        param_da = xr.DataArray(
            param[:],
            dims=('ground_pixel',),
            coords={
                'latitude': ('ground_pixel', latitude[:]),
                'longitude': ('ground_pixel', longitude[:])
            },
            attrs={'long_name': longname, 'units': unit},
            name=parameter_name
        )
    else:
        param_da = xr.DataArray(
            param[:],
            dims=["x","y"],
            coords={
                'latitude':(['x','y'],latitude[:]),
                'longitude':(['x','y'],longitude[:])
            },
            attrs={'long_name': longname, 'units': unit},
            name=parameter_name
        )
        
    return param_da

### <a id='load_l2_data_xr'></a>`load_l2_data_xr`

In [6]:
def load_l2_data_xr(directory, internal_filepath, parameter, latName, lonName, no_of_dims, unit, longname):
    """ 
    Loads a Metop-A/B Level 2 dataset in HDF format and returns a xarray DataArray with all the ground pixels of all directory 
    files. Uses function 'generate_xr_from_1D_vec' to generate the xarray DataArray.
    
    Parameters:
        directory (str): directory where the HDF files are stored
        internal_filepath (str): internal path of the data file that is of interest, e.g. TOTAL_COLUMNS
        parameter (str): parameter that is of interest, e.g. NO2
        latName (str): name of latitude variable
        lonName (str): name of longitude variable
        no_of_dims (int): number of dimensions of input array
        unit (str): unit of the parameter, preferably taken from the data file
        longname (str): longname of the parameter, preferably taken from the data file
    
    Returns:
        1 or 2-dimensional xarray DataArray with latitude / longitude information as coordinate information
    """
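    # Sort the listing so that the files are concatenated in a reproducible order
    fileList = [os.path.join(directory, f) for f in sorted(os.listdir(directory))]
    datasets = []

    for i in fileList:
        tmp=Dataset(i)
        param=tmp[internal_filepath+'/'+parameter]
        da_tmp= generate_xr_from_1D_vec(tmp,'GEOLOCATION/'+latName, 'GEOLOCATION/'+lonName,
                                param, param.name, longname, no_of_dims, unit)
        tmp.close() # the values are already in memory, so the file handle can be released
        if(no_of_dims==1):
            datasets.append(da_tmp)
        else:
            # flatten the 2D grid into one 'ground_pixel' dimension
            da_tmp_st = da_tmp.stack(ground_pixel=('x','y'))
            datasets.append(da_tmp_st)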

    return xr.concat(datasets, dim='ground_pixel')

### <a id='generate_geographical_subset'></a>`generate_geographical_subset`

In [1]:
def generate_geographical_subset(xarray, latmin, latmax, lonmin, lonmax):
    """ 
    Generates a geographical subset of a xarray DataArray and shifts the longitude grid from a 0-360 to a -180 to 180 deg grid.
    
    Parameters:
        xarray (xarray DataArray): a xarray DataArray with latitude and longitude coordinates
        latmin, latmax, lonmin, lonmax (int): boundaries of the geographical subset
        
    Returns:
        Geographical subset of a xarray DataArray.
    """   
    xarray = xarray.assign_coords(longitude=(((xarray.longitude + 180) % 360) - 180))
    return xarray.where((xarray.latitude < latmax) & (xarray.latitude > latmin) & (xarray.longitude < lonmax) & (xarray.longitude > lonmin),drop=True)
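
The longitude shift used in `generate_geographical_subset` can be checked on plain numbers. The following is a minimal sketch that uses only NumPy; the sample coordinates and the bounding box are made up for illustration.

```python
import numpy as np

# Made-up sample coordinates on a 0-360 degree longitude grid
lon_0_360 = np.array([0.0, 90.0, 180.0, 270.0, 359.5])
lat = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Same expression as in generate_geographical_subset
lon_180 = ((lon_0_360 + 180) % 360) - 180
print(lon_180)

# Bounding-box test with strict inequalities, as in the function
latmin, latmax, lonmin, lonmax = 15, 45, -100, 100
inside = (lat < latmax) & (lat > latmin) & (lon_180 < lonmax) & (lon_180 > lonmin)
print(inside)
```

Note that a longitude of 180 is mapped to -180, and that points lying exactly on the boundary of the box are excluded because the comparisons are strict.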

### <a id='generate_masked_array'></a>`generate_masked_array`

In [10]:
def generate_masked_array(xarray, mask, threshold, operator):
    """ 
    Applies a cloud mask (e.g. cloud fraction values) onto a given data array, based on a given threshold.
    
    Parameters:
        xarray (xarray DataArray): a three-dimensional xarray DataArray object
        mask (xarray DataArray): 1-dimensional xarray DataArray, e.g. cloud fraction values
        threshold (float): any number between 0 and 1, specifying the degree of cloudiness which is acceptable
        operator (str): operator how to mask the array, e.g. '<', '>' or '='
        
    Returns:
        Masked xarray DataArray that keeps only values greater than 0 (masked-out and negative values are dropped)
    """
    if(operator=='<'):
        cloud_mask = xr.where(mask < threshold, 1, 0) #Generate cloud mask with value 1 for the pixels we want to keep
    elif(operator=='>'):
        cloud_mask = xr.where(mask > threshold, 1, 0)
    else:
        cloud_mask = xr.where(mask == threshold, 1, 0)
    xarray_masked = xr.where(cloud_mask ==1, xarray, 0) #Apply mask onto the DataArray
    xarray_masked.attrs = xarray.attrs #Set DataArray attributes 
    return xarray_masked[xarray_masked > 0] #Return only positive values, which drops masked-out (0) and negative values
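
The masking logic can be illustrated without satellite data. This sketch mirrors the steps of `generate_masked_array` with NumPy instead of xarray; the sample values and the threshold of 0.5 are invented for the example.

```python
import numpy as np

# Invented sample data: a trace-gas column and the matching cloud fraction
values = np.array([2.0, 5.0, -1.0, 3.0])
cloud_fraction = np.array([0.1, 0.8, 0.2, 0.3])
threshold = 0.5

# Step 1: 1 for pixels to keep (cloud fraction below the threshold), 0 otherwise
cloud_mask = np.where(cloud_fraction < threshold, 1, 0)

# Step 2: set rejected pixels to 0
masked = np.where(cloud_mask == 1, values, 0)

# Step 3: keep only strictly positive values, which drops the zeros
# from step 2 as well as negative values in the input
result = masked[masked > 0]
print(result)
```

A side effect of step 3 is that valid values equal to 0 cannot be distinguished from masked pixels and are dropped as well.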

### <a id='load_masked_l2_da'></a>`load_masked_l2_da`

In [6]:
def load_masked_l2_da(directory, internal_filepath, parameter, latName, lonName, longname, no_of_dims,  unit, threshold, operator):
    """ 
    Loads Metop-A/B GOME-2 Level 2 data and cloud fraction information and returns a masked data array.
    
    Parameters:
        directory(str): Path to directory with Level 2 data files.
        internal_filepath(str): Internal file path under which the parameters are stored, e.g. TOTAL_COLUMNS
        parameter(str): atmospheric parameter, e.g. NO2
        latName (str): name of the latitude variable within the file
        lonName (str): name of the longitude variable within the file
        longname(str): long name of the parameter that shall be used
        no_of_dims (int): number of dimensions of input array
        unit(str): unit of the parameter
        threshold (float): any number between 0 and 1, specifying the degree of cloudiness which is acceptable
        operator (str): operator how to mask the array, e.g. '<', '>' or '='
        
    Returns:
        Masked xarray DataArray that keeps only values greater than 0 (masked-out and negative values are dropped)
    """  
    da = load_l2_data_xr(directory, internal_filepath, parameter, latName, lonName, no_of_dims, unit, longname)
    cloud_fraction = load_l2_data_xr(directory, 'CLOUD_PROPERTIES', 'CloudFraction', 'LatitudeCentre', 'LongitudeCentre', no_of_dims, '-', 'Cloud Fraction')
    
    return generate_masked_array(da, cloud_fraction, threshold, operator)

### <a id='rgb_channels'></a> `select_channels_for_rgb`

In [None]:
def select_channels_for_rgb(xarray, red_channel, green_channel, blue_channel):
    """ 
    Selects the channels / bands of a multi-dimensional xarray for red, green and blue composites.
    
    Parameters:
        xarray(xarray Dataset): xarray Dataset object that stores the different channels / bands.
        red_channel(str): Name of red channel to be selected
        green_channel(str): Name of green channel to be selected
        blue_channel(str): Name of blue channel to be selected

    Returns:
        Three xarray DataArray objects with selected channels / bands
    """  
    return xarray[red_channel], xarray[green_channel], xarray[blue_channel]

### <a id='normalize'></a> `normalize`

In [None]:
def normalize(array):
    """ 
    Normalizes the values of a numpy array / xarray DataArray object to the range between 0 and 1.
    
    Parameters:
        array (numpy array or xarray DataArray): xarray DataArray or numpy array object.

    Returns:
        Normalized array
    """ 
    array_min, array_max = array.min(), array.max()
    return ((array - array_min)/(array_max - array_min))
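
As a usage sketch, the min-max scaling can be applied to three channels before they are stacked into an RGB array. The channel values below are invented, and `np.dstack` is one possible way to combine the channels, not necessarily the one used in the course notebooks.

```python
import numpy as np

def normalize(array):
    """Min-max scaling to the range 0 to 1 (same formula as above)."""
    array_min, array_max = array.min(), array.max()
    return (array - array_min) / (array_max - array_min)

# Invented 2x2 channels standing in for red, green and blue bands
red = np.array([[0.0, 50.0], [100.0, 200.0]])
green = np.array([[10.0, 20.0], [30.0, 40.0]])
blue = np.array([[1.0, 2.0], [3.0, 5.0]])

# Scale each channel separately, then stack along a third axis
rgb = np.dstack((normalize(red), normalize(green), normalize(blue)))
print(rgb.shape)
print(rgb[..., 0])
```

If all values of a channel are identical, `array_max - array_min` is zero and the division yields NaN, so such inputs need to be handled separately.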

<hr>

## <a id="visualization"></a>Data visualization functions

### <a id='visualize_l2'></a>`visualize_l2`

In [1]:
def visualize_l2(xr_dataarray, conversion_factor, projection, vmin, vmax, point_size,color_scale, unit, title, set_global=False):
    """ 
    Visualizes a xarray DataArray in a given projection using matplotlib's scatter function.
    
    Parameters:
        xr_dataarray(xarray DataArray): a one-dimensional xarray DataArray object with latitude and longitude information as coordinates
        conversion_factor (float): any number to convert the DataArray values
        projection (str): choose one of cartopy's projection, e.g. ccrs.PlateCarree()
        vmin (int): minimum number on visualisation legend
        vmax (int): maximum number on visualisation legend
        point_size (int): size of marker, e.g. 5
        color_scale (str): string taken from matplotlib's color ramp reference
        unit (str): define the unit to be added to the color bar
        title (str): define title of the plot
        set_global (logical): set True, if the plot shall have a global coverage
    """
    fig = plt.figure(figsize=(40, 10)) # the map axes are created in the next line
    ax = plt.axes(projection=projection)

    ax.coastlines()
    if set_global:
        ax.set_global()
    
    if (projection==ccrs.PlateCarree()):
        gl = ax.gridlines(draw_labels=True, linestyle='--')
        gl.xlabels_top=False
        gl.ylabels_right=False
        gl.xformatter=LONGITUDE_FORMATTER
        gl.yformatter=LATITUDE_FORMATTER
        gl.xlabel_style={'size':14}
        gl.ylabel_style={'size':14}

    # plot pixel positions
    img = ax.scatter(
        xr_dataarray.longitude.data,
        xr_dataarray.latitude.data,
        c=xr_dataarray.data*conversion_factor,
        cmap=plt.get_cmap(color_scale),
        marker='o',
        s=point_size,
        transform=ccrs.PlateCarree(),
        vmin=vmin,
        vmax=vmax
    )

    plt.xticks(fontsize=16)
    plt.yticks(fontsize=16)
    plt.xlabel("Longitude", fontsize=16)
    plt.ylabel("Latitude", fontsize=16)
    cbar = fig.colorbar(img, ax=ax, orientation='horizontal', fraction=0.04, pad=0.1)
    cbar.set_label(str(conversion_factor) + ' ' + unit, fontsize=16)
    cbar.ax.tick_params(labelsize=14)
    ax.set_title(title, fontsize=20, pad=20.0)
    plt.show()

### <a id='visualize_gome_mollweide'></a>`visualize_gome_mollweide`

In [2]:
def visualize_gome_mollweide(xr_dataarray, conversion_factor, color_scale, marker_size, vmin, vmax):
    """ 
    Visualizes a xarray dataarray in a mollweide projection using matplotlib's scatter function.
    
    Parameters:
        xr_dataarray (xarray DataArray): a three-dimensional xarray DataArray object
        conversion_factor (float): any number to convert the DataArray values
        color_scale (str): string taken from matplotlib's color ramp reference 
        marker_size (int): size of the marker, e.g. 5
        vmin (int): minimum number on visualisation legend
        vmax (int): maximum number on visualisation legend
    """
    fig = plt.figure(figsize=(40, 10)) # the map axes are created in the next line
    ax = plt.axes(projection=ccrs.Mollweide())

    ax.coastlines()
    ax.set_global()

    ax.gridlines(linestyle='--')
    img = ax.scatter(
        xr_dataarray.longitude.data,
        xr_dataarray.latitude.data,
        c=xr_dataarray.data*conversion_factor,
        cmap=plt.get_cmap(color_scale),
        marker='o',
        s=marker_size,
        transform=ccrs.PlateCarree(),
        vmin=vmin,
        vmax=vmax
    )

    cbar = fig.colorbar(img, ax=ax, orientation='horizontal', fraction=0.04, pad=0.1)
    cbar.set_label(str(conversion_factor) + ' ' + xr_dataarray.units, fontsize=16)
    cbar.ax.tick_params(labelsize=14)
    ax.set_title(xr_dataarray.long_name, fontsize=20, pad=20.0)
    plt.show()

### <a id='visualize_imshow'></a>`visualize_imshow`

In [6]:
def visualize_imshow(data_array, projection, conversion_factor, color_scale, vmin, vmax, lonmin, lonmax, latmin, latmax, unit, set_global=False, log_scale=False):
    """ 
    Visualizes a numpy MaskedArray with matplotlib's 'imshow' function.
    
    Parameters:
        data_array (numpy MaskedArray): any numpy MaskedArray, e.g. loaded with the NetCDF library and the Dataset function
        projection (str): a projection provided by the cartopy library, e.g. ccrs.PlateCarree()
        conversion_factor (float): any number to convert the DataArray values
        color_scale(str): string taken from matplotlib's color ramp reference  
        vmin (int): minimum number on visualisation legend
        vmax (int): maximum number on visualisation legend
        lonmin, lonmax, latmin, latmax (float): geographic boundary values 
        unit (str): define unit of the plot to be added to the colorbar
        set_global (logical): set True, if the plot shall have a global coverage
        log_scale (logical): set True, if the color_scale shall have a logarithmic scaling
    """
    fig=plt.figure(figsize=(20, 12))

    ax=plt.axes(projection=projection)
    ax.coastlines()

    if(set_global):
        ax.set_global()
        ax.gridlines()
    
    if (projection==ccrs.PlateCarree()):
        ax.set_extent([lonmin, lonmax, latmin, latmax], projection)
        gl = ax.gridlines(draw_labels=True, linestyle='--')
        gl.xlabels_top=False
        gl.ylabels_right=False
        gl.xformatter=LONGITUDE_FORMATTER
        gl.yformatter=LATITUDE_FORMATTER
        gl.xlabel_style={'size':14}
        gl.ylabel_style={'size':14}

    if(log_scale):
        img1 = plt.imshow(data_array[:]*conversion_factor,
                          cmap=color_scale,
                          aspect='auto',
                          norm=matplotlib.colors.LogNorm(vmin=vmin, vmax=vmax))
    else:
        img1 = plt.imshow(data_array[:]*conversion_factor,
                          cmap=color_scale,
                          vmin=vmin,
                          vmax=vmax,
                          aspect='auto')

    cbar = fig.colorbar(img1, ax=ax, orientation='horizontal', fraction=0.04, pad=0.1)
    cbar.set_label(str(conversion_factor) + ' ' + unit, fontsize=16)
    cbar.ax.tick_params(labelsize=14)
    
    plt.show()

### <a id='visualize_s5p_pcolormesh'></a>`visualize_s5p_pcolormesh`

In [2]:
def visualize_s5p_pcolormesh(data_array, longitude, latitude, projection, color_scale, unit, long_name, vmin, vmax, lonmin, lonmax, latmin, latmax, log=True, set_global=True):
    """ 
    Visualizes a numpy array with matplotlib's 'pcolormesh' function.
    
    Parameters:
        data_array: any numpy MaskedArray, e.g. loaded with the NetCDF library and the Dataset function
        longitude: numpy Array holding longitude information
        latitude: numpy Array holding latitude information
        projection: a projection provided by the cartopy library, e.g. ccrs.PlateCarree()
        color_scale (str): string taken from matplotlib's color ramp reference
        unit (str): the unit of the parameter, taken from the NetCDF file if possible
        long_name (str): long name of the parameter, taken from the NetCDF file if possible
        vmin (int): minimum number on visualisation legend
        vmax (int): maximum number on visualisation legend
        lonmin,lonmax,latmin,latmax: geographic extent of the plot
        log (logical): set True, if the values shall be represented in a logarithmic scale
        set_global (logical): set True, if the plot shall have a global coverage
    """
    fig=plt.figure(figsize=(20, 10))

    ax = plt.axes(projection=projection)

    # define the coordinate system that the grid lons and grid lats are on
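    if(log):
        # vmin / vmax are passed to the norm itself, as recent matplotlib versions
        # do not accept a norm together with separate vmin / vmax arguments
        img = plt.pcolormesh(longitude, latitude, np.squeeze(data_array), norm=LogNorm(vmin=vmin, vmax=vmax), 
                             cmap=plt.get_cmap(color_scale), transform=ccrs.PlateCarree())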
    else:
        img = plt.pcolormesh(longitude, latitude, data_array, 
                        cmap=plt.get_cmap(color_scale), transform=ccrs.PlateCarree(),
                        vmin=vmin,
                        vmax=vmax)

    ax.coastlines()

    if (projection==ccrs.PlateCarree()):
        ax.set_extent([lonmin, lonmax, latmin, latmax], projection)
        gl = ax.gridlines(draw_labels=True, linestyle='--')
        gl.xlabels_top=False
        gl.ylabels_right=False
        gl.xformatter=LONGITUDE_FORMATTER
        gl.yformatter=LATITUDE_FORMATTER
        gl.xlabel_style={'size':14}
        gl.ylabel_style={'size':14}

    if(set_global):
        ax.set_global()
        ax.gridlines()

    cbar = fig.colorbar(img, ax=ax, orientation='horizontal', fraction=0.04, pad=0.1)
    cbar.set_label(unit, fontsize=16)
    cbar.ax.tick_params(labelsize=14)
    ax.set_title(long_name, fontsize=20, pad=20.0)

 #   plt.show()
    return fig, ax
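
To see what the logarithmic option does to the colour mapping, the normalization can be inspected on its own. This is a small sketch with made-up limits; it only uses `matplotlib.colors.LogNorm` and does not draw a map.

```python
import numpy as np
from matplotlib.colors import LogNorm

# Made-up colour limits spanning two orders of magnitude
norm = LogNorm(vmin=1, vmax=100)

# Values are mapped to the 0-1 colour range on a logarithmic scale
scaled = norm(np.array([1.0, 10.0, 100.0]))
print(scaled)
```

The value 10 lands in the middle of the colour range, since it lies halfway between 1 and 100 on a logarithmic scale. With a logarithmic norm, `vmin` should be greater than 0.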

### <a id='visualize_s3_pcolormesh'></a>`visualize_s3_pcolormesh`

In [1]:
def visualize_s3_pcolormesh(color_array, array, latitude, longitude, title):
    """ 
    Visualizes a numpy array (Sentinel-3 data) with matplotlib's 'pcolormesh' function as RGB image.
    
    Parameters:
        color_array (numpy array): colour values of the pixels, passed to pcolormesh as `color`
        array (numpy array): array on the same grid as latitude / longitude; its values are replaced by NaN, so only its shape matters
        latitude (numpy Array) : array with latitude values
        longitude (numpy Array): array with longitude values
        title (str): title of the resulting plot
    """
    fig=plt.figure(figsize=(20, 12))

    ax=plt.axes(projection=ccrs.Mercator())
    ax.coastlines()

    gl = ax.gridlines(draw_labels=True, linestyle='--')
    gl.xlabels_top=False
    gl.ylabels_right=False
    gl.xformatter=LONGITUDE_FORMATTER
    gl.yformatter=LATITUDE_FORMATTER
    gl.xlabel_style={'size':14}
    gl.ylabel_style={'size':14}

    img1 = plt.pcolormesh(longitude, latitude, array*np.nan, color=color_array,
                          clip_on = True,
                          edgecolors=None,
                          zorder=0,
                          transform=ccrs.PlateCarree())
    ax.set_title(title, fontsize=20, pad=20.0)
    plt.show()

<hr>

<p style="text-align:left;">This project is licensed under the <a href="./LICENSE">MIT License</a> <span style="float:right;"><a href="https://gitlab.eumetsat.int/eumetlab/atmosphere/atmosphere">View on GitLab</a> | <a href="https://training.eumetsat.int/">EUMETSAT Training</a> | <a href=mailto:training@eumetsat.int>Contact</a></span></p>