This notebook processes the data measured by the CORAL LiDAR, namely the water vapour mixing ratio (MR). Because the raw data are very heavy to store, we first condense them into a simpler form and then save the result in the folder data/processed_data.

In [1]:
import xarray as xr
import numpy as np
import intake
import pandas as pd
import matplotlib.pyplot as plt
%load_ext memory_profiler
import matplotlib.gridspec as gridspec
from scipy.stats import ttest_ind
import seaborn as sns
import matplotlib as mpl
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import eurec4a


## Get the water vapour mixing ratio (MR) data and the cloud data processed in process_LW_BCO_data.ipynb

In [2]:
MR = xr.open_dataset('./../data/MR.nc')['water vapour mixing ratio']
windspeed = xr.open_dataset('./../data/windspeed_eurec4a.nc')['wind_speed']
windspeed_avg = windspeed.sel(range = 1000, method = 'nearest').mean(dim = 'time')
start_time = np.datetime64('2020-01-12T10:00:00')
end_time   = np.datetime64('2020-02-28T23:59:59')

In [3]:
#We load the cloud borders array that has been processed in the code "process_LW_BCO_data.ipynb"
clouds = np.loadtxt('./../data/processed_data/cloud_borders_2020.txt', delimiter=',', dtype='object') 
clouds[:, 0] = np.array(clouds[:, 0], dtype='datetime64[s]')
clouds[:, 1] = np.array(clouds[:, 1], dtype='datetime64[s]')
clouds[:, 2] = np.array(clouds[:, 2], dtype='float64')
clouds[:, 3] = np.array(clouds[:, 3], dtype='float64')
clouds[:, 4] = np.array(clouds[:, 4], dtype='float64')
clouds[:, 5] = np.array(clouds[:, 5], dtype='float64')

# Since MR data is only available during January and February (EUREC4A period), we restrict our clouds to that time
clouds_time = clouds[(clouds[:, 0] >= start_time) & (clouds[:, 0] <= end_time)]


In [51]:
"""
As in the LW_down code, we want to avoid cross-contamination. Therefore, we set the MR data to NaN whenever a cloud is measured.
This prevents very high MR values that come from clouds from entering our vertical profiles. The masked data will only be used when we look at vertical profiles as a function of
distance to the cloud (other clouds might fall inside these distance bins, and we do not want to include their values). When looking at different gap lengths, we study regions
between clouds, so this does not matter.

As a reminder, we focus on the surroundings of clouds and are therefore not interested in the values inside clouds.
"""
def mask_clouds(MR, clouds):
    for cloud in clouds:
        start_time = np.datetime64(cloud[0])
        end_time = np.datetime64(cloud[1])
        

        # Mask the values in MR for the given time range
        MR = MR.where(~((MR.time >= start_time) & (MR.time <= end_time)), np.nan)
        
    return MR
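
The loop above calls `where` once per cloud. As a minimal sketch of an alternative (not the method used in this notebook), the same masking can be expressed by building one boolean mask over the time axis and applying it in a single `where` call. The helper name `mask_intervals` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np
import xarray as xr

def mask_intervals(da, intervals):
    """Set every time step that falls inside any [start, end] interval to NaN.

    Hypothetical helper: `da` needs a 'time' coordinate, `intervals` is an
    iterable of (start, end) pairs convertible to np.datetime64.
    """
    times = da["time"].values
    inside = np.zeros(times.shape, dtype=bool)
    for start, end in intervals:
        inside |= (times >= np.datetime64(start)) & (times <= np.datetime64(end))
    # Wrap the 1-D mask in a DataArray so it broadcasts over the other dimensions
    keep = xr.DataArray(~inside, coords={"time": da["time"]}, dims="time")
    return da.where(keep)

# Synthetic data: 6 time steps (10 s apart) and 2 altitude levels
time = np.datetime64("2020-01-12T10:00:00") + np.arange(6) * np.timedelta64(10, "s")
demo = xr.DataArray(
    np.ones((6, 2)),
    coords={"time": time, "alt": [100.0, 200.0]},
    dims=("time", "alt"),
)

# One made-up cloud covering the second and third time steps
masked_demo = mask_intervals(demo, [(time[1], time[2])])
```

Here the two time steps inside the cloud interval become NaN at every altitude, and the other four are left unchanged. Whether this is noticeably faster than the loop depends on the number of clouds and the size of the array.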

In [39]:
masked_MR = mask_clouds(MR, clouds_time)

In [40]:
masked_MR.to_netcdf('./../data/processed_data/masked_MR.nc')

In [6]:
masked_MR = xr.open_dataset('./../data/processed_data/masked_MR.nc')['water vapour mixing ratio']

## Process data as a function of gap length

In this part of the code we process the data as a function of gap length. This is a preparation for plot ???
We separate the data into MR profiles with information on the gap length for each cloud gap. Then we store the data in the folder data/processed_data/vertical_profiles
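
To make the notion of a gap concrete, here is a small sketch on made-up cloud intervals: after sorting by start time, gap i runs from the end of cloud i to the start of cloud i + 1. The array `clouds_demo` and its values are hypothetical. The sketch measures the gap duration in seconds, whereas the function in this notebook measures the gap length as the number of MR time steps in the gap.

```python
import numpy as np

# Hypothetical cloud intervals (start, end), deliberately not in time order
clouds_demo = np.array(
    [
        ["2020-01-12T10:05:00", "2020-01-12T10:06:00"],
        ["2020-01-12T10:00:00", "2020-01-12T10:01:00"],
        ["2020-01-12T10:11:00", "2020-01-12T10:12:00"],
    ],
    dtype="datetime64[s]",
)

# Sort by start time so that consecutive rows are neighbouring clouds
clouds_demo = clouds_demo[np.argsort(clouds_demo[:, 0])]

# Gap i runs from the end of cloud i to the start of cloud i + 1
gap_starts = clouds_demo[:-1, 1]
gap_ends = clouds_demo[1:, 0]

# Duration of each gap in seconds (float array)
gap_durations_s = (gap_ends - gap_starts) / np.timedelta64(1, "s")
```

With three clouds there are two gaps, lasting 240 s and 300 s in this example.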

In [8]:
def mr_profile_gaps(MR, clouds):
    """
    Compute the time-averaged vertical profile of MR in each gap between two consecutive clouds.
    Each averaged profile is stored together with the length of its gap, so that the MR profiles
    can later be plotted as a function of gap length.

    Inputs:
    - MR: MR data from the LiDAR
    - clouds: array of clouds with their start and end times

    Outputs:
    - combined_gaps: the averaged vertical profiles in the gaps between clouds. Each profile carries
    its gap length (in number of time steps) as a coordinate.
    """
    gaps = []
    gap_lengths = []

    MR = MR.chunk({"time": 1000, "alt": 201})
    
    # Ensure clouds are sorted by start time
    clouds = clouds[np.argsort(clouds[:, 0])]

    for i in range(len(clouds) - 1):
        # Get the end time of the current cloud and the start time of the next cloud
        end_time_current, start_time_next = clouds[i, 1], clouds[i + 1, 0]
        
        # Select the MR values between the end of the current cloud and the start of the next cloud
        gap = MR.sel(time=slice(end_time_current, start_time_next)).compute()
        
        # Determine the length of the gap in number of time points
        gap_length = len(gap.time)
        
        # Filter and average the gap values
        gap = gap.where((gap <= 0.03) & (gap >= 0.0001), drop=True)
        gap = gap.mean(dim='time')
        
        # Add the gap MR values and their length to the respective lists
        gaps.append(gap)
        gap_lengths.append(gap_length)

    # Combine all gaps into a single DataArray with a new 'gaps' dimension
    combined_gaps = xr.concat(gaps, dim='gaps')
    
    # Add the gap lengths as a coordinate to the combined DataArray
    combined_gaps = combined_gaps.assign_coords(gap_length=('gaps', gap_lengths))

    return combined_gaps

In [10]:
%%time
combined_gaps = mr_profile_gaps(MR, clouds_time)

CPU times: user 1min 8s, sys: 5.9 s, total: 1min 13s
Wall time: 1min 14s


In [11]:
combined_gaps.to_netcdf('./../data/processed_data/combined_gaps.nc')

## Processing of data up/downwind and for shallow/deep clouds as a function of distance

As before, but this time we process the data into bins of distance to the clouds: we average the vertical profiles of MR at different distances from all observed clouds. We split the analysis in two ways: by cloud type (deep and shallow clouds) and by position relative to the cloud (upwind and downwind).
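
Since the LiDAR measures at a fixed location, a distance from the cloud has to be translated into a time offset using the mean wind speed. The sketch below shows that conversion in isolation. It assumes the distances are in metres and the wind speed in m/s, which the notebook does not state explicitly. The wind speed of 8.0 is a made-up value, and `distance_bins_to_time_bins` is a hypothetical helper name.

```python
import numpy as np

def distance_bins_to_time_bins(distance_bins_m, wind_speed_ms):
    """Convert (lower, upper) distances to time offsets in whole seconds.

    Assumes distances in metres and wind speed in m/s; fractional seconds
    are truncated by int().
    """
    return [
        (
            np.timedelta64(int(lower / wind_speed_ms), "s"),
            np.timedelta64(int(upper / wind_speed_ms), "s"),
        )
        for lower, upper in distance_bins_m
    ]

# Two example bins and an assumed (not measured) wind speed of 8 m/s
demo_bins = [(500, 2000), (2000, 4000)]
demo_time_bins = distance_bins_to_time_bins(demo_bins, wind_speed_ms=8.0)
```

With these numbers the first bin spans 62 s to 250 s from the cloud edge and the second spans 250 s to 500 s. Under the same unit assumptions, a distance of 20000 fits inside a 60-minute window only if the wind speed is at least 20000 / 3600 ≈ 5.6 m/s.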

In [13]:
"""
This function creates two sets of vertical profiles, upwind and downwind, and sorts them into the bins defined by distance_bins (each bin corresponds to a range of distances from the cloud).
Changing this parameter changes the number of bins in the output.
"""
def mr_profile_distances_with_bins_up_down(MR, clouds, distance_bins, windspeed_avg):
    profiles_before = []
    profiles_after = []

    MR = MR.chunk({"time": 1000, "alt": 201})
    delta_time = np.timedelta64(60, 'm')

    # Convert distance bins to time bins based on windspeed_avg
    time_bins = [(np.timedelta64(int(lower / windspeed_avg), 's'), np.timedelta64(int(upper / windspeed_avg), 's')) 
                 for lower, upper in distance_bins]
    

    #create a profile for each cloud by taking the wvmr before and after the cloud
    for cloud_index, cloud_times in enumerate(clouds):
        start_time_cloud = np.datetime64(cloud_times[0])
        end_time_cloud = np.datetime64(cloud_times[1])
        
        end_time = end_time_cloud + delta_time
        start_time = start_time_cloud - delta_time
        
        # Select the MR values between the specified indices after the cloud
        MR_after_cloud = MR.sel(time=slice(end_time_cloud, end_time)).compute()
        MR_after_cloud = MR_after_cloud.where((MR_after_cloud <= 0.03) & (MR_after_cloud >= 0.0001), drop=True)

        # Select the MR values between the specified indices before the cloud
        MR_before_cloud = MR.sel(time=slice(start_time, start_time_cloud)).compute()
        MR_before_cloud = MR_before_cloud.where((MR_before_cloud <= 0.03) & (MR_before_cloud >= 0.0001), drop=True)

        bin_profiles_after = []
        bin_profiles_before = []

        # Process before cloud, taking wvmr in the distances (/time) corresponding to each bin
        for bin_index, (start_bin, end_bin) in enumerate(time_bins):
            bin_start_time = start_time_cloud - start_bin
            bin_end_time = start_time_cloud - end_bin
            MR_in_bin = MR_before_cloud.sel(time=slice(bin_end_time, bin_start_time)).mean(dim='time')
            MR_in_bin = MR_in_bin.assign_coords(profile=f'bin_{bin_index}')
            bin_profiles_before.append(MR_in_bin)
            
        # Process after cloud, same as before but upwind
        for bin_index, (start_bin, end_bin) in enumerate(time_bins):
            bin_start_time = end_time_cloud + start_bin
            bin_end_time = end_time_cloud + end_bin
            MR_in_bin = MR_after_cloud.sel(time=slice(bin_start_time, bin_end_time)).mean(dim='time')
            MR_in_bin = MR_in_bin.assign_coords(profile=f'bin_{bin_index}')
            bin_profiles_after.append(MR_in_bin)
        
        
        # Concatenate profiles along a new dimension for this cloud
        cloud_profile_before = xr.concat(bin_profiles_before, dim='bin')
        cloud_profile_after = xr.concat(bin_profiles_after, dim='bin')
        
        profiles_before.append(cloud_profile_before)
        profiles_after.append(cloud_profile_after)

    # Concatenate all cloud profiles along a new dimension
    combined_profiles_before = xr.concat(profiles_before, dim='cloud')
    combined_profiles_after = xr.concat(profiles_after, dim='cloud')

    return combined_profiles_before, combined_profiles_after
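
The time windows used inside the two inner loops can be checked on a single made-up cloud. The sketch below mirrors the arithmetic of the function: before the cloud, the window is counted backwards from the cloud start; after the cloud, it is counted forwards from the cloud end. The helper name `bin_windows` and all values are hypothetical.

```python
import numpy as np

def bin_windows(cloud_start, cloud_end, time_bin):
    """Return the (begin, end) time windows before and after a cloud for one bin."""
    lower, upper = time_bin
    # Before the cloud: the far edge of the bin comes first in time
    before = (cloud_start - upper, cloud_start - lower)
    # After the cloud: the near edge of the bin comes first in time
    after = (cloud_end + lower, cloud_end + upper)
    return before, after

# Made-up cloud lasting five minutes, and a bin from 60 s to 250 s
cloud_start = np.datetime64("2020-01-12T12:00:00")
cloud_end = np.datetime64("2020-01-12T12:05:00")
time_bin = (np.timedelta64(60, "s"), np.timedelta64(250, "s"))

before, after = bin_windows(cloud_start, cloud_end, time_bin)
```

For this cloud the window before runs from 11:55:50 to 11:59:00 and the window after runs from 12:06:00 to 12:09:10. Both windows are in increasing time order, which is what a `slice` on a sorted time axis expects.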


In [17]:
# Criteria shared by both classes (columns 2 and 3 of the clouds array)
common = (clouds_time[:, 2] >= 40) & (clouds_time[:, 2] <= 10000) & (clouds_time[:, 3] <= 1500)
# Difference between columns 4 and 3, used to separate deep from shallow clouds
extent = clouds_time[:, 4] - clouds_time[:, 3]

clouds_deep = clouds_time[common & (extent >= 600) & (extent <= 2500)]
clouds_shallow = clouds_time[common & (extent >= 100) & (extent <= 700)]

In [18]:
%%time
distance_bins = [(500, 2000), (2000, 4000), (6000, 10000), (10000, 20000)]

combined_profiles_before_shallow, combined_profiles_after_shallow = mr_profile_distances_with_bins_up_down(masked_MR, clouds_shallow, distance_bins, windspeed_avg.item())
combined_profiles_before_deep, combined_profiles_after_deep = mr_profile_distances_with_bins_up_down(masked_MR, clouds_deep, distance_bins, windspeed_avg.item())


CPU times: user 2min 23s, sys: 9.58 s, total: 2min 33s
Wall time: 2min 33s


In [22]:
combined_profiles_before_shallow.to_netcdf('./../data/processed_data/vertical_profiles/combined_profiles_before_shallow.nc')
combined_profiles_after_shallow.to_netcdf('./../data/processed_data/vertical_profiles/combined_profiles_after_shallow.nc')
combined_profiles_before_deep.to_netcdf('./../data/processed_data/vertical_profiles/combined_profiles_before_deep.nc')
combined_profiles_after_deep.to_netcdf('./../data/processed_data/vertical_profiles/combined_profiles_after_deep.nc')