# Data quality filtering

In this tutorial you will learn how to apply quality filtering to the TROPOMI NO$_2$ data product.

### The qa_value

All TROPOMI retrieval files (other products include CO, CH$_4$, HCHO, O$_3$ and the Aerosol Index) contain a quantity called "*qa_value*". This quantity must be used as a filter to remove low-quality observations and observations that are not of interest for a given application. The *qa_value* lies between 0 and 1:
- *qa_value* = 0: There was an error during the retrieval and this pixel can never be used in a meaningful way.
- *qa_value* = 1: There are no warnings or errors, and this pixel can always be used.
- 0 < *qa_value* < 1: One or more warnings or cautions were raised during the retrieval.

The retrieval groups also provide *qa_value* thresholds (normally 0.5), with the recommendation to use only retrievals whose *qa_value* is greater than the threshold. For the NO$_2$ product there are two thresholds:

- *qa_value* > 0.75. 
    This is the recommended pixel filter. It removes cloud-covered scenes (cloud radiance fraction > 0.5), partially snow/ice covered scenes, errors, and problematic retrievals.
- *qa_value* > 0.50.
    Compared to the stricter filter, this adds the good quality retrievals over clouds and over scenes covered by snow/ice. Errors and problematic retrievals are still filtered out. In particular, this filter may be useful for assimilation and model comparison studies.
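
As a minimal sketch of how the two thresholds behave, the snippet below applies them to a small made-up array of *qa_value* entries. The numbers are illustrative only and are not taken from a TROPOMI file.

```python
import numpy as np

# Hypothetical qa_value samples, for illustration only
qa = np.array([0.0, 0.4, 0.6, 0.74, 0.9, 1.0])

strict = qa > 0.75    # recommended pixel filter
relaxed = qa > 0.50   # also keeps good retrievals over clouds and snow/ice

print("kept by qa_value > 0.75:", int(strict.sum()))
print("kept by qa_value > 0.50:", int(relaxed.sum()))
```

Every pixel that passes the strict filter also passes the relaxed one, so the relaxed selection is always a superset of the strict selection.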


The *qa_value* should not be confused with the estimated error of the product, which is stored in a separate field. For example, the main product in the NO$_2$ file is **nitrogendioxide_tropospheric_column**, and its estimated error is **nitrogendioxide_tropospheric_column_precision**.

In the retrieval this uncertainty is estimated by taking into account several contributions, such as the instrument noise of TROPOMI, uncertainties in the characterisation of clouds, aerosols and surface reflectivity (albedo), and errors in the estimation of the stratospheric NO$_2$ column.
___

We start the tutorial by loading the required Python packages. In this case we only need "numpy" (for mathematical calculations with arrays) and "netCDF4" which is the package to read the TROPOMI data. We will use "os" for making a backup copy of the file.

In [1]:
import numpy as np

from netCDF4 import Dataset

import os


The approach we will follow here is adding a few fields to the TROPOMI datafile. It is good practice to make a copy of the original file first: 

In [2]:
s5p_filename_original = "data/S5P_PAL__L2__NO2____20180625T112113_20180625T130243_03619_01_020301_20211108T154829.nc"

s5p_filename_qa_filtered = "data/S5P_PAL__L2__NO2____20180625T112113_20180625T130243_03619_01_020301_20211108T154829_qafiltered.nc"

# Now make a copy of the original file
os.system("cp "+s5p_filename_original+" "+s5p_filename_qa_filtered)

print("Created a copy named: ",s5p_filename_qa_filtered)


Created a copy named:  data/S5P_PAL__L2__NO2____20180625T112113_20180625T130243_03619_01_020301_20211108T154829_qafiltered.nc
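
The copy above shells out to `cp`, which exists only on Unix-like systems. A portable alternative is `shutil.copyfile` from the Python standard library. The sketch below demonstrates it on a throwaway placeholder file; the helper name `copy_file` and the temporary file names are made up for this illustration.

```python
import os
import shutil
import tempfile

def copy_file(src, dst):
    """Copy src to dst using only the Python standard library."""
    shutil.copyfile(src, dst)
    return dst

# Demonstration on a throwaway file; in the tutorial you would pass
# s5p_filename_original and s5p_filename_qa_filtered instead.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_src = os.path.join(tmpdir, "original.nc")
    demo_dst = os.path.join(tmpdir, "original_qafiltered.nc")
    with open(demo_src, "wb") as f:
        f.write(b"placeholder bytes, not a real netCDF file")
    copy_file(demo_src, demo_dst)
    copy_ok = os.path.exists(demo_dst)

print("copy created:", copy_ok)
```

Because this does not depend on a shell command, the same code should work unchanged on Windows, macOS and Linux.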


"cp" is a Unix/Linux command; on Windows you may have to replace it with the equivalent command (for example "copy").

If this does not work, you can also manually copy the netCDF file in the folder to a new netCDF file with the name 
"S5P_PAL__L2__NO2____20180625T112113_20180625T130243_03619_01_020301_20211108T154829_qafiltered.nc"

### Reading from, writing to the S5P_NO2 file with Python

Now we will open the new file in append mode (option "a"), so that we can add fields to it.

In [3]:
# Use "Dataset" from the netCDF4 package to read the file
#    with option 'a' to allow adding new fields

ncf = Dataset(s5p_filename_qa_filtered,'a')



Let us check what is in this file:

In [None]:
ncf


You should have obtained a list with attributes like "institution", "source", "time_coverage_start" etc. If not: the file was not correctly imported.

Now, these were only the main "global" attributes of the file. 
The data product itself is stored in the group "PRODUCT", so let us have a look at that. 

In [None]:
ncf["PRODUCT"]


This includes the two fields we are after:
- nitrogendioxide_tropospheric_column
- qa_value


In [None]:
ncf["PRODUCT/qa_value"]

Note that this is stored as an integer to save space. The "scale_factor" attribute specifies how to convert the stored integer to a real value between 0.0 and 1.0. <br>
The netCDF4 library applies this conversion automatically, so we do not have to worry about it. 
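
To make the unpacking concrete, here is a minimal sketch of what the scaling amounts to. The packed values and the `scale_factor` of 0.01 are assumptions made for this illustration; check the attribute printed for your own file.

```python
import numpy as np

# Hypothetical packed values as they might be stored on disk (unsigned bytes)
packed = np.array([0, 50, 75, 100], dtype=np.uint8)
scale_factor = 0.01  # assumed value; read the real one from the variable's attributes

# Unpacking is a plain multiplication by the scale factor
unpacked = packed * scale_factor
print(unpacked)
```

Storing one byte per pixel instead of a four-byte float is what saves the space.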

The main data product is the "nitrogendioxide_tropospheric_column":


In [None]:
ncf["PRODUCT/nitrogendioxide_tropospheric_column"]

This is stored as a float (real number). <br>
Note the unit: it is a column density of NO$_2$, that is, the amount of NO$_2$ above a square meter of surface, expressed in moles (mol m$^{-2}$). A typical range is between 1e-6 and 1e-3. <br> 
Earlier satellite products commonly use molecules per square centimeter (a typical range is between 1e14 and 1e17); this unit is easily obtained by multiplying by the factor provided as an attribute.
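
The conversion factor follows directly from Avogadro's number and the number of square centimeters in a square meter. The sketch below derives it from first principles; the function name `mol_m2_to_molec_cm2` is made up for this example, and the attribute stored in the file should hold essentially the same number.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
CM2_PER_M2 = 1.0e4        # square centimeters in one square meter

def mol_m2_to_molec_cm2(column_mol_m2):
    """Convert a column density from mol m-2 to molecules cm-2."""
    return column_mol_m2 * AVOGADRO / CM2_PER_M2

factor = mol_m2_to_molec_cm2(1.0)
example = mol_m2_to_molec_cm2(1.0e-4)  # an illustrative column of 1e-4 mol m-2

print(f"conversion factor: {factor:.5e}")
print(f"1e-4 mol m-2 = {example:.3e} molecules cm-2")
```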

The array is three-dimensional:
- The first index is a time index, and would allow multiple orbits to be stored in one file. In our case this dimension is 1, there is only one orbit.
- The second dimension is the number of measurements (scanlines) along the orbit of the satellite.
- The third dimension is the number of across-track observations measured at the same time. This is always 450 for TROPOMI. On average a pixel is about 6 km wide.

In [None]:
print("TROPOMI swath_width is roughly ", 6 * 450, " km")


With such a wide swath, and with 14-15 orbits per day, TROPOMI can observe the entire Earth in 1 day!

----

### Apply the qa_value filtering to the NO2 observations

Now, let us apply the qa_value to filter out pixels we do not want to use, and at the same time introduce two new fields which we will write to the PRODUCT group of the file:
- nitrogendioxide_tropospheric_column_qavalue075_filtered
- nitrogendioxide_tropospheric_column_qavalue050_filtered

Before executing the code, first read it and see if you understand it.

In [None]:
# copy the two relevant fields to arrays in python
no2 = ncf["/PRODUCT/nitrogendioxide_tropospheric_column"][0,:,:]
qavalue = ncf["/PRODUCT/qa_value"][0,:,:]
# crf = ncf["/PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/cloud_radiance_fraction_nitrogendioxide_window"][0,:,:]
# amf = ncf["/PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/formaldehyde_tropospheric_air_mass_factor"][0,:,:]

print ( "qavalue[1600,225] = ", qavalue[1600, 225] )
print ( "no2[1600,225] = ", no2[1600, 225] )

# We need also the dimensions to define the new arrays in order to add them to the NO2 datafile
ncproduct = ncf['/PRODUCT']
time = ncproduct.dimensions['time']
scanline = ncproduct.dimensions['scanline']
ground_pixel = ncproduct.dimensions['ground_pixel']

# When the qa_value is too low, the NO2 tropospheric column will be set to "FillValue"
FillValue = 9.96921E36

# We define two extra NO2 fields, filtered with qa_value > 0.75 (a) and qa_value > 0.50 (b)

print ( "creating variable /PRODUCT/nitrogendioxide_tropospheric_column_qavalue075_filtered" )
var_no2a = ncf.createVariable( '/PRODUCT/nitrogendioxide_tropospheric_column_qavalue075_filtered', np.float32,('time','scanline','ground_pixel',), fill_value=FillValue )
var_no2a.long_name = "Tropospheric column of NO2, filtered for qa_value >= 0.75"
var_no2a.coordinates = "longitude latitude"
var_no2a.units = "mol m-2"
# var_no2._FillValue = "9.96921E36"
var_no2a[0, :, :] = np.where(qavalue < 0.75, FillValue, no2)

print ( "creating variable /PRODUCT/nitrogendioxide_tropospheric_column_qavalue050_filtered" )
var_no2b = ncf.createVariable( '/PRODUCT/nitrogendioxide_tropospheric_column_qavalue050_filtered', 'f4',('time','scanline','ground_pixel',), fill_value=FillValue )
var_no2b.long_name = "Tropospheric column of NO2, filtered for qa_value >= 0.5"
var_no2b.coordinates = "longitude latitude"
var_no2b.units = "mol m-2"
# var_no2._FillValue = "9.96921E36"
var_no2b[0, :, :] = np.where(qavalue < 0.5, FillValue, no2)

print ( "qa_value filtered no2 values written/added to the file" )
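
To see what the filtering step does on a scale small enough to check by hand, here is a sketch on made-up numbers. The arrays and the names `no2_demo` and `qa_demo` are illustrative and do not come from the file. It uses the same `np.where` pattern as the cell above and then masks the fill values, so that statistics ignore the rejected pixels.

```python
import numpy as np

FILL = 9.96921e36  # same fill value as used in the tutorial

no2_demo = np.array([2.0e-5, 8.0e-5, 5.0e-5, 1.0e-4])  # mol m-2, made up
qa_demo = np.array([0.9, 0.3, 0.8, 0.6])               # made up

# Replace rejected pixels by the fill value, as in the tutorial cell
filtered_demo = np.where(qa_demo < 0.75, FILL, no2_demo)

# Mask the fill values so they do not contaminate statistics
masked_demo = np.ma.masked_equal(filtered_demo, FILL)

n_kept = int(masked_demo.count())
mean_kept = float(masked_demo.mean())
print("pixels kept:", n_kept)
print("mean of kept pixels:", mean_kept)
```

Taking a plain mean of `filtered_demo` without masking would be dominated by the very large fill value, which is why the masking step matters whenever the filtered field is used in a calculation.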


Let's check whether these fields have been added to the PRODUCT group:

In [None]:
ncf["PRODUCT"]

In [None]:
# Let's now finish the writing and close the file

ncf.close()

Please do not forget this last command; otherwise the new fields may not be fully written and the file may be corrupted.