# Exercise 08: Hydrometeor retrieval from satellite CPR observations

Manfred Brath

<img src="earthcare.png" alt="EarthCARE" width="25%">

Image credit: JAXA/NICT/ESA.

In [None]:
%matplotlib widget

import os
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
from pyarts import xml
from radar_module import RadarSimulator, radar_reflectivity_to_apriori

os.makedirs("plots", exist_ok=True)


In this exercise we retrieve hydrometeor water content profiles from (simulated) 
satellite cloud profiling radar observations using the Optimal Estimation Method (OEM).

The cloud profiling radar (CPR) is a nadir-looking radar similar to the radar on board the CloudSat satellite.
The CPR operates at a frequency of 90 GHz and has a vertical resolution of 250 m between 2 km and 20 km altitude, resulting in 72 range bins.
The measurement data consist of a short cloudy segment (about 200 km) over the tropical Pacific with roughly 900 profiles, i.e. one profile roughly every 200 m. The observations are radar reflectivities in dBZe at the CPR range bins.
For the background atmosphere, we use a model profile for that time and location.


## 1. Load Observations and Background Data

Load the simulated CPR observations and the background atmospheric state and visualize them.

1. Plot the radar reflectivity factor in dBZe along the satellite track as a function of latitude and range bin.
2. Plot the temperature profile of the background atmosphere.
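
As a starting point for the first plot, here is a minimal sketch of a latitude/range-bin curtain plot. It uses synthetic stand-in data instead of the real dataset, so the array shapes, the latitude values and the bin-centre altitudes are assumptions for illustration only; replace them with the corresponding variables from `observation`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): 900 profiles with 72 range bins each
n_profiles, n_bins = 900, 72
lat = np.linspace(0.0, 1.8, n_profiles)                # made-up latitudes in degrees
range_bins = 2e3 + 250.0 * (np.arange(n_bins) + 0.5)   # assumed bin centres in m
rng = np.random.default_rng(0)
Z = -30.0 + 40.0 * rng.random((n_bins, n_profiles))    # fake reflectivity in dBZe

# Curtain plot: latitude on the x-axis, altitude of the range bins on the y-axis
fig, ax = plt.subplots(figsize=(8, 4))
mesh = ax.pcolormesh(lat, range_bins / 1e3, Z, shading="auto", cmap="viridis")
ax.set_xlabel("Latitude / °")
ax.set_ylabel("Range bin altitude / km")
fig.colorbar(mesh, ax=ax, label="Reflectivity / dBZe")
```

Note that `pcolormesh` expects the data array with shape (number of y values, number of x values), so the real data may need to be transposed first.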

In [None]:
# Load observations
observation = xr.load_dataset("observation/simulated_CPR_observation.nc")

# Load background atmosphere
background_atm = xml.load("observation/background_state.xml")



In [None]:
# Plot CPR observations and background atmosphere

## 2. Prepare A Priori Estimates

Prepare a priori hydrometeor profiles for the full flight track using the function *radar_reflectivity_to_apriori*.
Since the CPR only measures radar reflectivity (Z), you need to decide how to distribute the hydrometeor content between frozen and liquid hydrometeors.
You can use one or both types of hydrometeors. If you use only one type, you can still choose where to place it (e.g. only liquid hydrometeors below a certain altitude).
If you use both types, the hydrometeor content has to be split between them.
The function *radar_reflectivity_to_apriori* separates the hydrometeor content by altitude. Below the minimum separation altitude, only liquid hydrometeors are used; above the maximum separation altitude, only frozen hydrometeors are used. Between the two altitudes, the content is split between liquid and frozen hydrometeors with a linear transition from liquid to frozen.
Decide on suitable values for the minimum and maximum separation altitudes.

As the retrieval will be performed on the background atmosphere grid, the function *radar_reflectivity_to_apriori* also performs the interpolation of the a priori profiles onto the background atmosphere grid.

```python
radar_reflectivity_to_apriori(
    observation["Z_raw"].values, # radar reflectivity factor in dBZe
    observation["range_bins"].values, # range bins of the radar
    background_atm.get("z", keep_dims=False), # altitude grid of the background atmosphere
    sep_min_altitude, # minimum separation altitude in meters
    sep_max_altitude, # maximum separation altitude in meters
    hydrometeors=["liquid", "frozen"], # types of hydrometeors to use
    min_val=1e-12, # minimum hydrometeor mass density in kg/m^3 to avoid log(0) issues
)
``` 

One important note: the retrieval will later work with the logarithm of the hydrometeor mass densities, so the a priori profiles must not contain zero values. Use the *min_val* argument to set a minimum hydrometeor mass density (e.g. 1e-12 kg/m³) to avoid log(0) issues.
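
To illustrate the altitude-based split and the role of *min_val*, here is a minimal sketch. It is not the implementation of *radar_reflectivity_to_apriori*; the helper `liquid_fraction`, the separation altitudes and the total mass density are hypothetical values chosen only to show the idea.

```python
import numpy as np

def liquid_fraction(z, sep_min_altitude, sep_max_altitude):
    """Hypothetical helper: 1 below sep_min, 0 above sep_max, linear in between."""
    frac = (sep_max_altitude - z) / (sep_max_altitude - sep_min_altitude)
    return np.clip(frac, 0.0, 1.0)

z = np.array([2000.0, 4000.0, 5000.0, 6000.0, 10000.0])  # altitudes in m
w_liquid = liquid_fraction(z, 4000.0, 6000.0)             # arbitrary example altitudes
w_frozen = 1.0 - w_liquid

total = np.full_like(z, 1e-4)   # made-up total hydrometeor mass density in kg/m^3
min_val = 1e-12                 # floor that keeps log10 finite

RWC = np.maximum(total * w_liquid, min_val)
FWC = np.maximum(total * w_frozen, min_val)

print(w_liquid)  # [1.  1.  0.5 0.  0. ]
```

Without the floor, `FWC` would be exactly zero below the minimum separation altitude and its logarithm would be undefined.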

In [None]:
# RWC_a_priori = ... # Define a priori profile 

## 3. Visualize A Priori Hydrometeor Profiles

Plot the a priori FWC and RWC profiles along the satellite track as a function of latitude and altitude.

## 4. Set Up Retrieval Framework

In [None]:
# Initialize retrieval object
retrieval = RadarSimulator()
retrieval.set_frequency_grid(observation["frequencies"].values)  # CPR frequency
retrieval.prepare_sensor(unit="dBZe")

Set up your retrieval quantities:

- Possible retrieval quantities: 
    - Frozen Water Content (FWC) and  
    - Rain Water Content (RWC)

- Possible PSD models and scatterers:

    | Parameter | Value/Description | Additional Information |
    |-----------|-------------------|--------------------------|
    | **Retrieval Quantity** | FWC | *mass_density* |
    | **Retrieval Quantity** | RWC | *mass_density* |
    | | | |
    | **FWC PSD Model** | McFarquaharHeymsfield97 | |
    | **FWC PSD Model** | FieldEtAl07ML | mid-latitude ice clouds |
    | **FWC PSD Model** | FieldEtAl07TR | tropical ice clouds |
    | **RWC PSD Model** | AbelBoutle12 | |
    | | | |
    | **FWC Scatterer** | GemSnow_Id32.scat_data | |
    | **FWC Scatterer** | H2O_ice_full_spectrum | spheres |
    | **FWC Scatterer** | EvansSnowAggregate_Id1.scat_data | |
    | **FWC Scatterer** | LargeColumnAggregate_Id18.scat_data | |
    | **FWC Scatterer** | LargePlateAggregate_Id20.scat_data | |
    | **RWC Scatterer** | H2O_liquid_full_spectrum | spheres |

Use the following code snippet as a starting point to set up the retrieval quantities:
```python
# Set up hydrometeor properties
retrieval.set_retrieval_quantity("FWC", "mass_density", "McFarquaharHeymsfield97", "H2O_ice_full_spectrum")

# if you want to retrieve both hydrometeors just repeat the above line for RWC
```

*...and don't worry, there is no perfect choice here. None of the choices is fully correct, but some may be better than others.*

***IMPORTANT***: 
1. The retrieval will be performed on the background atmosphere grid. 
2. The retrieval quantities are defined in $\log_{10}$ space. This means that the retrieval will retrieve $\log_{10}$(FWC) and/or $\log_{10}$(RWC).

In [None]:
# retrieval.set_retrieval_quantity(... )  # Define retrieval quantities here

## 5. Prepare Covariance Matrices

Prepare the measurement error covariance matrix (S_y) and the a priori error covariance matrix (S_a).
- For the measurement error covariance matrix (S_y), assume a standard deviation of 1.0 dBZe for the radar reflectivity measurements and assume uncorrelated errors between the different range bins (diagonal matrix).
- For the a priori error covariance matrix (S_a), assume a standard deviation of 2.0 in $\log_{10}$ space for both hydrometeor mass densities and no vertical correlations (diagonal matrix). 
    - What does this standard deviation mean in terms of mass density?
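
A minimal sketch of how such diagonal covariance matrices can be built with NumPy is given below. The number of vertical levels `n_z` is a placeholder; take the real value from the background atmosphere grid. The point to note is that a covariance matrix holds variances, i.e. squared standard deviations, on its diagonal.

```python
import numpy as np

n_bins = 72   # number of CPR range bins (from the instrument description)
n_z = 50      # placeholder for the number of background atmosphere levels

sigma_y = 1.0  # measurement standard deviation in dBZe
sigma_a = 2.0  # a priori standard deviation in log10 space

# Diagonal matrices: variances on the diagonal, no correlations
S_y = np.diag(np.full(n_bins, sigma_y**2))
S_a = [np.diag(np.full(n_z, sigma_a**2))]  # one matrix per retrieved quantity

print(S_y.shape, S_a[0].shape)  # (72, 72) (50, 50)
```

If two quantities are retrieved, the list `S_a` needs one such matrix per quantity.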

In [None]:

# Set up hydrometeor properties
    
# Define error covariance matrices
# S_y: measurement error covariance 
# S_y = np.diag(...)

# S_a: List of a priori error covariance matrices with each entry corresponding to one retrieved variable.
# Each matrix has shape (n_z, n_z) where n_z is the number of vertical levels (background atmosphere). 
# S_a = [np.diag(...),]



## 6. Perform Retrievals Along the Track

Loop through all observation points and perform hydrometeor retrievals.
The background atmosphere also serves as the initial guess (a priori) for the retrieval. Therefore, make sure to set the hydrometeor profiles in the background atmosphere to the a priori profiles created in step 2 before starting the retrieval.

To set the hydrometeor profiles in the background atmosphere, you can use the following code snippet:

```python
# Set a priori hydrometeor profiles in the background atmosphere
background_atm.set("scat_species-FWC-mass_density", FWC_a_priori[:, profile_index].reshape((1, len(FWC_a_priori), 1, 1)))
```
***IMPORTANT***: Though the retrieval will be performed in $\log_{10}$ space, the hydrometeor profiles in the atmosphere object need to be set in linear space (mass density in kg/m³).
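
The reshaping and the relation between linear and $\log_{10}$ space can be checked on a small stand-in array. In this sketch the array sizes and the constant mass density are made-up values; only the shape handling mirrors the snippet above.

```python
import numpy as np

n_z, n_profiles = 50, 900                        # placeholder dimensions
FWC_a_priori = np.full((n_z, n_profiles), 1e-6)  # linear space, kg/m^3 (made up)

profile_index = 10
profile = FWC_a_priori[:, profile_index]                 # shape (n_z,)
field = profile.reshape((1, len(FWC_a_priori), 1, 1))    # shape (1, n_z, 1, 1)
print(field.shape)  # (1, 50, 1, 1)

# Retrieved values come back in log10 space; convert to linear space for plotting
log_profile = np.log10(profile)
restored = 10.0 ** log_profile
```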


For each observation point along the satellite track, extract the observation vector (y_cpr) and run the retrieval with the object method *hydrometeor_retrieval*:

```python
hyd_ret, DeltaHyd, y_fit, full_result = retrieval.hydrometeor_retrieval(
        y_cpr, # observation vector
        S_y, # measurement error covariance matrix
        S_a, # a priori error covariance matrix
        background_atm, # background and a priori atmosphere object
        max_iter=50, # maximum number of iterations
        N_range_bins_edges=N_range_bins_edges, # number of range bin edges, see observation data
        min_range_bin_altitude=min_range_bin_altitude, # minimum range bin altitude, see observation data
        max_range_bin_altitude=max_range_bin_altitude, # maximum range bin altitude, see observation data
        Verbosity=True,
        lm_ga_settings=[1e2, 2, 3, 1e2, 1, 99], # Levenberg-Marquardt settings
        stop_dx=0.01 # stop criterion based on relative change in state vector
    )
```
For the Levenberg-Marquardt settings, you can use the provided default values, see also https://atmtools.github.io/arts-docs-2.6/stubs/pyarts.workspace.Workspace.OEM.html

The method returns:

* hyd_ret : dict
    * Dictionary containing retrieved hydrometeor values ($\log_{10}$-space) for each quantity.
    * Keys are the retrieval quantity names, values are 1D arrays of retrieved values.
* DeltaHyd : dict
    * Dictionary containing total retrieval uncertainties ($\log_{10}$-space) for each quantity.
    * Keys are the retrieval quantity names, values are 1D arrays of uncertainties computed from observation and smoothing errors.
* y_fit : array_like
    * Fitted observation vector (simulated measurements based on retrieved state).
* full_result : dict
    * Complete result dictionary from the retrieval containing all diagnostic information including state vector, averaging kernel, etc.

Store the retrieval results for each observation point in a suitable way for later visualization.
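
One possible way to store the results is to preallocate 2D arrays and fill one column per profile. The sketch below uses a hypothetical stand-in function `dummy_retrieval` in place of the real retrieval call, and it assumes that the dictionary key is `"FWC"`; both are illustrations and not part of the provided module.

```python
import numpy as np

def dummy_retrieval(y_cpr, n_z):
    """Hypothetical stand-in that mimics the first three return values."""
    hyd_ret = {"FWC": np.full(n_z, -6.0)}    # log10 of the mass density
    delta_hyd = {"FWC": np.full(n_z, 0.5)}   # uncertainty in log10 space
    y_fit = y_cpr.copy()                     # pretend the fit is perfect
    return hyd_ret, delta_hyd, y_fit

n_bins, n_z, n_profiles = 72, 50, 5          # placeholder dimensions
Z = np.zeros((n_bins, n_profiles))           # fake observations in dBZe

# Preallocate with NaN so that skipped or failed profiles stay visible
FWC_ret = np.full((n_z, n_profiles), np.nan)
FWC_err = np.full((n_z, n_profiles), np.nan)
Y_fit = np.full((n_bins, n_profiles), np.nan)

for i in range(n_profiles):
    hyd_ret, delta_hyd, y_fit = dummy_retrieval(Z[:, i], n_z)
    FWC_ret[:, i] = hyd_ret["FWC"]
    FWC_err[:, i] = delta_hyd["FWC"]
    Y_fit[:, i] = y_fit
```

Arrays with the layout (levels, profiles) can be passed directly to the same kind of curtain plot used for the observations.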

In [None]:
# Loop over all profiles along the satellite track and perform retrievals...

## 7. Residuals

Compare the observed reflectivity with the fitted values and examine the residuals. Plot the residuals (observation - fit) along the satellite track as a function of latitude and range bin.
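
Before plotting, it can help to compute the residuals and a simple summary statistic per profile. The following sketch uses synthetic arrays in place of the real observations and fitted reflectivities, so all numbers are made up; the noise level of 1 dBZe is chosen only to match the assumed measurement error.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bins, n_profiles = 72, 5                                # placeholder dimensions
Z_obs = rng.normal(0.0, 10.0, (n_bins, n_profiles))       # fake observations in dBZe
Z_fit = Z_obs - rng.normal(0.0, 1.0, (n_bins, n_profiles))  # fake fitted values

residual = Z_obs - Z_fit                                  # observation - fit
rms = np.sqrt(np.nanmean(residual**2, axis=0))            # one RMS value per profile

# Symmetric colour limits are convenient for a diverging colormap
vmax = np.nanmax(np.abs(residual))
```

As a rough guide, residuals of about the size of the assumed measurement standard deviation suggest that the fit is consistent with the noise, while much larger values point to profiles that deserve a closer look.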

In [None]:
# Plot the residual between observations and the radar reflectivities from the retrieved profiles (y_cpr - y_fit) 

## 8. Visualize Retrieved Hydrometeor Profiles and Uncertainties

Plot the retrieved hydrometeor profile(s) along with their uncertainties and the difference to the a priori profiles as a function of latitude and altitude.

In [None]:
# Plot the retrieved hydrometeor profiles along with their uncertainties and the difference to the a priori profiles

## 9. Compare with True Atmospheric State

Load the true atmospheric state from the provided XML file and calculate the total liquid and frozen water content along the satellite track.

In [None]:
# true_atmosphere = xml.load('atmosphere/atmospheres_true.xml')

# # Calculate total ice and total liquid from true atmosphere
# TLWC = np.zeros_like(RWC_a_priori)
# TFWC = np.zeros_like(FWC_a_priori)

# for i in range(len(true_atmosphere)):
    
#     TLWC[:, i] = (true_atmosphere[i].get('scat_species-LWC-mass_density', keep_dims=False) 
#                   + true_atmosphere[i].get('scat_species-RWC-mass_density', keep_dims=False))
                
#     TFWC[:, i] = (true_atmosphere[i].get('scat_species-IWC-mass_density', keep_dims=False) 
#                   + true_atmosphere[i].get('scat_species-SWC-mass_density', keep_dims=False)
#                   + true_atmosphere[i].get('scat_species-GWC-mass_density', keep_dims=False)
#                   + true_atmosphere[i].get('scat_species-HWC-mass_density', keep_dims=False))

# # lower limit for log space
# threshold = 1e-18

# TLWC[TLWC < threshold] = threshold
# TFWC[TFWC < threshold] = threshold




Plot the true hydrometeor profiles and the difference between the retrieved and true profiles (retrieved - true) along the satellite track as a function of latitude and altitude. Where does the retrieval perform well and where does it not?

In [None]:
# Plot the true and the difference between retrieved and true hydrometeor (retrieved - true)