![lop](../../images/logo_diive1_128px.png)

<span style='background:#FFF8E1; font-size:40px; display:block; border-left:40px solid #FFECB3; padding-left:40px;'>
<b>
    Flux Processing Chain
</b>
</span>

---
**Notebook version**: `3` (18 Aug 2023)  
Example notebook for the flux processing chain using the time series processing library `diive`.  
**Author**: Lukas Hörtnagl (holukas@ethz.ch)  


</br>

---

# **Background**

- This notebook demonstrates part of the flux post-processing used for fluxes from Swiss FluxNet research stations
- For a description of the different flux levels, see [Flux Processing Chain](https://www.swissfluxnet.ethz.ch/index.php/data/ecosystem-fluxes/flux-processing-chain/)
- Flux calculations (Level-1) were done in a previous step
- This notebook uses the calculated fluxes (Level-1) and applies several post-processing steps:
    - Quality flag extension (Level-2)
    - Storage correction (Level-3.1)
    - Outlier removal (Level-3.2)
- Other flux levels are currently not produced in this example:
    - Gap-filling (Level-4.1)
    - NEE Partitioning (Level-4.2)

</br>

---

# **Setup**

## User settings
- `FLUXVAR` is the name of the flux variable in the data files. In the EddyPro `_fluxnet_` output files, the flux variables are:
  - `FC` ... CO2 flux, becomes `NEE` after storage correction (Level-3.1)
  - `FH2O` ... H2O flux
  - `LE` ... Latent heat flux (water)
  - `H` ... Sensible heat flux

In [2]:
FLUXVAR = "LE"  # Name of the flux variable
FOLDERS = [r"Z:\CH-FRU_Fruebuel\20_ec_fluxes\2023\Level-0\CH-FRU_FR-20230730-090003\2-0_eddypro_flux_calculations\results"]  # Folders where data files are located
SITE_LAT = 46.815333  # Latitude of site
SITE_LON = 9.855972  # Longitude of site

## Imports
- This notebook uses `diive` to check eddy covariance fluxes for quality

In [3]:
import os
from datetime import datetime
from importlib.metadata import version  # Standard-library replacement for the deprecated pkg_resources
from pathlib import Path
from diive.core.io.filereader import MultiDataFileReader, search_files
from diive.core.io.files import save_parquet
from diive.pkgs.fluxprocessingchain.level2_qualityflags import FluxQualityFlagsLevel2EddyPro
from diive.pkgs.fluxprocessingchain.level31_storagecorrection import FluxStorageCorrectionSinglePointEddyPro
from diive.pkgs.outlierdetection.stepwiseoutlierdetection import StepwiseOutlierDetection
from diive.pkgs.qaqc.qcf import FlagQCF

dt_string = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(f"This page was last modified on: {dt_string}")
version_diive = version("diive")  # Installed version of the diive package
print(f"diive version: v{version_diive}")



This page was last modified on: 2023-08-18 16:28:01
diive version: v0.55.0


### Show docstring for `FluxQualityFlagsLevel2EddyPro`

In [4]:
print(FluxQualityFlagsLevel2EddyPro.__name__)
print(FluxQualityFlagsLevel2EddyPro.__doc__)

FluxQualityFlagsLevel2EddyPro

    Create QCF (quality-control flag) for selected flags, calculated
    from EddyPro's _fluxnet_ output files
    


### Show docstring for `FluxStorageCorrectionSinglePointEddyPro`

In [5]:
print(FluxStorageCorrectionSinglePointEddyPro.__name__)
print(FluxStorageCorrectionSinglePointEddyPro.__doc__)

FluxStorageCorrectionSinglePointEddyPro

    Estimation of storage fluxes (gases, sensible heat, latent heat) from concentrations
    (1-point profile) as calculated by EddyPro
    


### Show docstring for `StepwiseOutlierDetection`

In [6]:
print(StepwiseOutlierDetection.__name__)
print(StepwiseOutlierDetection.__doc__)

StepwiseOutlierDetection

    Step-wise outlier detection in time series data

    The class is optimized to work in Jupyter notebooks.

    Quality flags that can be directly created via this class:
    - `.flag_missingvals_test()`: Generate flag that indicates missing records in data
    - `.flag_outliers_abslim_test()`: Generate flag that indicates if values in data are outside the specified range
    - `.flag_outliers_abslim_dtnt_test()`: Generate flag that indicates if daytime and nighttime values in data are
        outside their respectively specified ranges
    - `.flag_outliers_increments_zcore_test()`: Identify outliers based on the z-score of increments
    - `.flag_outliers_localsd_test()`: Identify outliers based on the local standard deviation
    - `.flag_manualremoval_test()`: Remove data points for range, time or point-by-point
    - `.flag_outliers_stl_riqrz_test()`: Identify outliers based on seasonal-trend decomposition and z-score
        calculations, taking the i

## Load data

### Search files

In [7]:
filepaths = search_files(FOLDERS, "*.csv")
filepaths = [fp for fp in filepaths if "eddypro_" in fp.stem and "_fluxnet_" in fp.stem and fp.stem.endswith("_adv")]
filepaths

[WindowsPath('Z:/CH-DAS_Dav-Subcanopy/20_ec_fluxes/2023/Level-0/CH-DAS_FR-20230119-182509/2-0_eddypro_flux_calculations/results/eddypro_CH-DAS_FR-20230119-182509_fluxnet_2023-01-19T183058_adv.csv'),
 WindowsPath('Z:/CH-DAS_Dav-Subcanopy/20_ec_fluxes/2023/Level-0/CH-DAS_FR-20230123-165518/2-0_eddypro_flux_calculations/results/eddypro_CH-DAS_FR-20230123-165518_fluxnet_2023-01-23T165718_adv.csv'),
 WindowsPath('Z:/CH-DAS_Dav-Subcanopy/20_ec_fluxes/2023/Level-0/CH-DAS_FR-20230201-103101/2-0_eddypro_flux_calculations/results/eddypro_CH-DAS_FR-20230201-103101_fluxnet_2023-02-01T103639_adv.csv'),
 WindowsPath('Z:/CH-DAS_Dav-Subcanopy/20_ec_fluxes/2023/Level-0/CH-DAS_FR-20230208-191222/2-0_eddypro_flux_calculations/results/eddypro_CH-DAS_FR-20230208-191222_fluxnet_2023-02-08T191719_adv.csv'),
 WindowsPath('Z:/CH-DAS_Dav-Subcanopy/20_ec_fluxes/2023/Level-0/CH-DAS_FR-20230215-083624/2-0_eddypro_flux_calculations/results/eddypro_CH-DAS_FR-20230215-083624_fluxnet_2023-02-15T083957_adv.csv'),
 Wind
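
The file name filter used above can be tried out without access to the data folders. The following is a minimal sketch using only `pathlib`; the helper name `is_eddypro_fluxnet_adv` and the example file names are hypothetical and only serve to illustrate the three conditions.

```python
from pathlib import PurePath


def is_eddypro_fluxnet_adv(path: PurePath) -> bool:
    """Return True for EddyPro _fluxnet_ output files whose name ends with '_adv'."""
    stem = path.stem  # File name without the '.csv' extension
    return "eddypro_" in stem and "_fluxnet_" in stem and stem.endswith("_adv")


# Hypothetical file names, for illustration only
candidates = [
    PurePath("eddypro_SITE_fluxnet_2023-01-19T183058_adv.csv"),
    PurePath("eddypro_SITE_full_output_2023-01-19T183058_adv.csv"),
    PurePath("eddypro_SITE_fluxnet_2023-01-19T183058_exp.csv"),
]
selected = [p.name for p in candidates if is_eddypro_fluxnet_adv(p)]
print(selected)  # Only the first file passes all three conditions
```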

### Read files

In [8]:
data = MultiDataFileReader(filetype='EDDYPRO_FLUXNET_30MIN', filepaths=filepaths)

Reading file eddypro_CH-DAS_FR-20230119-182509_fluxnet_2023-01-19T183058_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230123-165518_fluxnet_2023-01-23T165718_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230201-103101_fluxnet_2023-02-01T103639_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230208-191222_fluxnet_2023-02-08T191719_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230215-083624_fluxnet_2023-02-15T083957_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230221-170839_fluxnet_2023-02-21T171152_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230301-084304_fluxnet_2023-03-01T084630_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230307-103124_fluxnet_2023-03-07T103436_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230314-171811_fluxnet_2023-03-14T172105_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230321-101312_fluxnet_2023-03-21T101610_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230324-083003_fluxnet_2023-03-24T085658_adv.csv ...
Reading file eddypro_CH-DAS_FR-20230325-083003_fluxnet_2023-03-25

Exception: Frequency detection failed.
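
The exception message suggests that no regular time step could be inferred from the merged timestamps. One way to look into this is to count the time differences between consecutive records. The sketch below is a diagnostic idea in plain Python, not the way `diive` detects the frequency; the function name `timestep_counts` and the example timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def timestep_counts(timestamps):
    """Count the time differences between consecutive (sorted) timestamps."""
    ordered = sorted(timestamps)
    return Counter(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical half-hourly timestamps with one missing record (creates a 60-minute step)
start = datetime(2023, 1, 1, 0, 30)
timestamps = [start + timedelta(minutes=30 * i) for i in range(6) if i != 3]

counts = timestep_counts(timestamps)
dominant_step, n_steps = counts.most_common(1)[0]
print(f"Dominant time step: {dominant_step} (found {n_steps} times)")
print(f"All time steps: {dict(counts)}")
```

If the dominant step is not 30 minutes, or if many different steps occur, the individual files are worth checking for overlapping or irregular timestamps.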

### Check data

In [None]:
df_orig = data.data_df.copy()
df_orig

In [None]:
df_orig.describe()

</br>

</br>

---

# **Level-2: Quality flag extension**

> Extract additional quality information from the EddyPro `_fluxnet_` output file and store it in newly added quality flags.



Note that the USTAR filtering is not part of the Level-2 calculations.

## Settings

In [None]:
SIGNAL_STRENGTH_COL = 'CUSTOM_AGC_MEAN'  # Name of the column storing the signal strength, typically 'CUSTOM_AGC_MEAN' for LI-7500, 'XXX?' for LI-7200
SIGNAL_STRENGTH_THRESHOLD = 90  # Signal strength threshold, flux values where threshold is exceeded are flagged as rejected
SIGNAL_STRENGTH_METHOD = 'discard above'  # 'discard above' flags fluxes where signal strength > threshold, 'discard below' where signal strength < threshold

</br>

## Initiate calculations

In [None]:
fluxqc = FluxQualityFlagsLevel2EddyPro(fluxcol=FLUXVAR, df=df_orig, levelid='L2')

</br>

## Flag: **Missing values** test
- Flag calculated here from missing flux values in the EddyPro output file

In [None]:
fluxqc.missing_vals_test()

</br>

## Flag: **SSITC** tests
- Flag calculated in EddyPro
- Combination of the two partial tests *steady state test* and *developed turbulent conditions test*
- This notebook expects the SSITC flag to follow the flagging policy of Mauder and Foken (2004): `0` for best-quality fluxes, `1` for fluxes suitable for general analysis such as annual budgets (although this is debatable) and `2` for fluxes that should be discarded from the dataset

In [None]:
fluxqc.ssitc_test()

</br>

## Flag: **Completeness** test
- Flag calculated here from the percentage of available gas records in the EddyPro output file
- Checks the number of gas records available for each averaging interval

In [None]:
fluxqc.gas_completeness_test()

</br>

## Flag: **Spectral correction factor** test
- Flag calculated here from the gas `scf` variable in EddyPro output file

In [None]:
fluxqc.spectral_correction_factor_test()

</br>

## Flag: **Signal strength / AGC / window dirtiness** test (if available)
- Flag calculated here from the signal strength / AGC variable in EddyPro output file

In [None]:
fluxqc.signal_strength_test(signal_strength_col=SIGNAL_STRENGTH_COL,
                            method=SIGNAL_STRENGTH_METHOD, 
                            threshold=SIGNAL_STRENGTH_THRESHOLD)
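
The two threshold methods described in the settings can be illustrated with a small sketch. It assumes that accepted records get flag `0` and rejected records get flag `2`, and that missing signal strength values are left unflagged; these conventions, the function name `signal_strength_flag` and the example values are assumptions for illustration and may differ from what `diive` does internally.

```python
def signal_strength_flag(values, threshold, method):
    """Return 0 (accepted) or 2 (rejected) per record; None where the value is missing."""
    if method not in ("discard above", "discard below"):
        raise ValueError(f"Unknown method: {method!r}")
    flags = []
    for value in values:
        if value is None:
            flags.append(None)  # Assumption: missing values are not flagged here
        elif method == "discard above":
            flags.append(2 if value > threshold else 0)
        else:
            flags.append(2 if value < threshold else 0)
    return flags


agc = [62.0, 88.5, 90.0, 95.3, None]  # Hypothetical signal strength values
flags_above = signal_strength_flag(agc, threshold=90, method="discard above")
flags_below = signal_strength_flag(agc, threshold=90, method="discard below")
print(flags_above)
print(flags_below)
```

Note that a value exactly equal to the threshold is accepted by both methods in this sketch.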

</br>

## Flags: **Raw data screening** tests
- Flags were calculated in EddyPro

In [None]:
fluxqc.raw_data_screening_vm97_tests(spikes=True,
                                     amplitude=True,
                                     dropout=True,
                                     abslim=False,
                                     skewkurt_hf=False,
                                     skewkurt_sf=False,
                                     discont_hf=False,
                                     discont_sf=False)

</br>

## Flag: **Angle-of-attack** test (optional)
- Flag was calculated in EddyPro

In [None]:
fluxqc.angle_of_attack_test()

</br>

## Add flags to dataset

In [None]:
# print(fluxqc.fluxflags)
df_level2 = fluxqc.get()

</br>

## Available `Level-2` variables

In [None]:
[x for x in df_level2.columns if 'L2' in x]

</br>

</br>

---

# **Level-3.1: Storage correction**

> Estimate the storage flux from concentrations (1-point profile) as calculated by EddyPro and add it to the flux.

## Initiate calculations

In [None]:
s = FluxStorageCorrectionSinglePointEddyPro(df=df_level2, fluxcol=FLUXVAR)

</br>

## Calculate storage correction
- The flux storage term (single point) is added to the flux
- For some records, the storage term can be missing. In such cases, missing terms are gap-filled using random forest
- If the storage term were not gap-filled, additional flux data would be lost (e.g. 2-3%)

In [None]:
s.storage_correction()
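
The arithmetic behind the storage correction can be sketched in a few lines. Note that this sketch fills missing storage terms with the median of the available terms, which is a deliberately simple stand-in for the random forest gap-filling used in the notebook. The function name `storage_corrected_flux` and all values are hypothetical.

```python
from statistics import median


def storage_corrected_flux(flux, storage):
    """Add the storage term to the flux, filling missing storage terms with the median."""
    available = [s for s in storage if s is not None]
    fill_value = median(available)  # Simple stand-in, NOT the random forest model
    filled = [fill_value if s is None else s for s in storage]
    corrected = [None if f is None else f + s for f, s in zip(flux, filled)]
    return corrected, filled


flux = [10.0, 12.0, None, 8.0]    # Hypothetical fluxes, one record missing
storage = [0.5, None, 0.2, -0.3]  # Hypothetical storage terms, one record missing
corrected, filled = storage_corrected_flux(flux, storage)
print(filled)
print(corrected)
```

The second record shows why gap-filling the storage term matters: without a filled storage term, the corrected flux for that record would be missing even though the flux itself is available.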

In [None]:
s.showplot(maxflux=20)

In [None]:
s.report()

</br>

## Add results to dataset
- Adds the storage-corrected flux and the gap-filled storage term time series to the dataset

In [None]:
# print(s.storage)
df_level31 = s.get()

In [None]:
df_level31

</br>

## Available `Level-3.1` variables before preliminary QCF

In [None]:
for x in df_level31.columns:  # Print all Level-3.1 variables
    if 'L3.1' in x:
        print(x)
FLUXVAR31 = f"{FLUXVAR}_L3.1"  # Storage-corrected flux after Level-3.1
print(f"\nName of the storage-corrected flux variable after Level-3.1:  {FLUXVAR31}")

</br>

## Generate `QCF`, needed for Level-3.2
- `QCF` is the overall quality control flag: it combines the results of all quality tests so far into a single flag with the values `0`, `1` or `2`

In [None]:
qcf = FlagQCF(series=df_level31[FLUXVAR31], df=df_level31, levelid='L3.1', swinpot=df_level31['SW_IN_POT'], nighttime_threshold=50)
qcf.calculate(daytime_accept_qcf_below=2, nighttimetime_accept_qcf_below=2)
df_level31 = qcf.get()
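
To make the idea of an overall flag more concrete, here is a minimal sketch in plain Python. It assumes the simplest possible combination rule (the overall flag is the worst individual flag) and treats records with potential radiation below the threshold as nighttime. Both are assumptions for illustration; the rule implemented in `FlagQCF` may differ. The names `overall_qcf` and `apply_qcf` and all values are hypothetical.

```python
def overall_qcf(flags_per_test):
    """Combine per-test flags (0, 1, 2) into one flag per record by taking the worst flag."""
    return [max(record_flags) for record_flags in zip(*flags_per_test)]


def apply_qcf(flux, qcf, sw_in_pot, nighttime_threshold=50,
              daytime_accept_below=2, nighttime_accept_below=2):
    """Keep flux values whose overall flag is below the daytime or nighttime limit."""
    filtered = []
    for value, flag, radiation in zip(flux, qcf, sw_in_pot):
        is_night = radiation < nighttime_threshold  # Assumption: low potential radiation = night
        limit = nighttime_accept_below if is_night else daytime_accept_below
        filtered.append(value if flag < limit else None)
    return filtered


# Hypothetical flags from three tests for four records
flags_per_test = [
    [0, 0, 0, 0],  # Missing values test
    [1, 1, 2, 0],  # SSITC test
    [0, 0, 0, 2],  # Spikes test
]
flux = [5.0, 6.0, 7.0, 8.0]
sw_in_pot = [0.0, 300.0, 600.0, 10.0]  # First and last record are nighttime

qcf_values = overall_qcf(flags_per_test)
default = apply_qcf(flux, qcf_values, sw_in_pot)
strict_night = apply_qcf(flux, qcf_values, sw_in_pot, nighttime_accept_below=1)
print(qcf_values)
print(default)
print(strict_night)
```

With both limits set to `2`, records with flag `0` or `1` are kept. Lowering the nighttime limit to `1` keeps only the best-quality nighttime records, so the first record is removed as well.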

In [None]:
qcf.report_qcf_flags()

In [None]:
qcf.report_qcf_evolution()

In [None]:
qcf.report_qcf_series()

In [None]:
qcf.showplot_qcf_heatmaps()

In [None]:
qcf.showplot_qcf_timeseries()

## Available `Level-3.1` variables after preliminary QCF

In [None]:
for x in df_level31.columns:  # Print all Level-3.1 variables
    if 'L3.1' in x:
        print(x)
FLUXVAR31QCF = f"{FLUXVAR31}_L3.1_QCF"  # Quality-controlled flux after Level-3.1
print(f"\nName of the storage-corrected and quality-controlled flux variable after Level-3.1:  {FLUXVAR31QCF}  (this variable will be used in the following outlier detection)")

</br>

</br>

---

# **Level-3.2: Outlier detection**
- Outlier detection is applied to the quality-controlled flux, i.e. the flux that passed all quality tests so far (Level-3.1 `QCF`)

</br>

## Plot time series

In [None]:
df_level31[FLUXVAR31QCF].plot(title=f"{FLUXVAR31QCF} \n(quality-controlled Level-3.1 version of {FLUXVAR31}) \n(originally based on {FLUXVAR})");

</br>

## Initiate calculations

In [None]:
sod = StepwiseOutlierDetection(dataframe=df_level31,
                               col=FLUXVAR31QCF,
                               site_lat=SITE_LAT,
                               site_lon=SITE_LON,
                               timezone_of_timestamp='UTC+01:00')

</br>

</br>

## Flag, outlier detection: **Absolute limits, separate for daytime and nighttime data**

In [None]:
print(sod.flag_outliers_abslim_dtnt_test.__doc__)
sod.flag_outliers_abslim_dtnt_test(daytime_minmax=[-50, 50], nighttime_minmax=[-5, 20], showplot=True, verbose=True)

In [None]:
sod.addflag()
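
The logic of separate daytime and nighttime limits can be sketched as follows. In this sketch the daytime/nighttime information is passed in directly, and outliers get flag `2`; the function name `abslim_dtnt_flag`, the flag values and the example data are assumptions for illustration.

```python
def abslim_dtnt_flag(values, is_daytime, daytime_minmax, nighttime_minmax):
    """Flag values outside the daytime or nighttime range: 0 = OK, 2 = outlier."""
    flags = []
    for value, daytime in zip(values, is_daytime):
        lower, upper = daytime_minmax if daytime else nighttime_minmax
        flags.append(0 if lower <= value <= upper else 2)
    return flags


values = [30.0, 30.0, -10.0, 55.0]       # Hypothetical flux values
is_daytime = [True, False, False, True]  # Hypothetical daytime/nighttime information
flags = abslim_dtnt_flag(values, is_daytime,
                         daytime_minmax=[-50, 50], nighttime_minmax=[-5, 20])
print(flags)  # → [0, 2, 2, 2]
```

The same value (`30.0`) is accepted during daytime but flagged during nighttime, which is the reason for using two separate ranges.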

</br>

</br>

## Flag, outlier detection: **Absolute limits**

In [None]:
print(sod.flag_outliers_abslim_test.__doc__)
sod.flag_outliers_abslim_test(minval=-40, maxval=40, showplot=True, verbose=True)

In [None]:
sod.addflag()

</br>

</br>

## Flag, outlier detection: **z-score over all data, separate for daytime and nighttime**

In [None]:
print(sod.flag_outliers_zscore_dtnt_test.__doc__)
sod.flag_outliers_zscore_dtnt_test(threshold=3, showplot=True, verbose=True)

In [None]:
sod.addflag()

</br>

</br>

## Flag, outlier detection: **Manual flagging of datapoints**

In [None]:
print(sod.flag_manualremoval_test.__doc__)
sod.flag_manualremoval_test(remove_dates=[['2019-12-31 19:45:00', '2020-01-31 19:45:00']],
                            showplot=True, verbose=True)

In [None]:
sod.addflag()
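
Flagging a date range by hand boils down to a simple comparison of timestamps. The sketch below handles ranges only and assumes that both the start and the end of a range are included; the function name `manual_removal_flag`, the flag values and the example timestamps are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def manual_removal_flag(timestamps, remove_dates):
    """Return 2 for records inside any [start, end] range (bounds included), else 0."""
    ranges = [(datetime.strptime(start, FMT), datetime.strptime(end, FMT))
              for start, end in remove_dates]
    return [2 if any(start <= ts <= end for start, end in ranges) else 0
            for ts in timestamps]


timestamps = [
    datetime(2019, 12, 31, 19, 15),  # Before the range
    datetime(2019, 12, 31, 19, 45),  # Start of the range
    datetime(2020, 1, 15, 12, 15),   # Inside the range
    datetime(2020, 1, 31, 20, 15),   # After the range
]
flags = manual_removal_flag(
    timestamps, remove_dates=[["2019-12-31 19:45:00", "2020-01-31 19:45:00"]])
print(flags)  # → [0, 2, 2, 0]
```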

</br>

</br>

## Flag, outlier detection: **Increments z-score**

In [None]:
print(sod.flag_outliers_increments_zcore_test.__doc__)
sod.flag_outliers_increments_zcore_test(threshold=10, showplot=True, verbose=True)

In [None]:
sod.addflag()
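
The idea behind this test is that a sudden jump between two consecutive records is suspicious even if the value itself is within a plausible range. The sketch below computes the z-score of the differences between consecutive values; it is a simplified illustration and not necessarily how `diive` implements the test. The function name `increments_zscore_flag` and the data are hypothetical. Because the example series is very short, a lower threshold than in the notebook is used.

```python
from statistics import mean, pstdev


def increments_zscore_flag(values, threshold):
    """Flag a value (2) when the z-score of its increment from the previous value is too large."""
    increments = [b - a for a, b in zip(values, values[1:])]
    mu, sd = mean(increments), pstdev(increments)
    flags = [0]  # The first value has no preceding record
    for inc in increments:
        z = abs(inc - mu) / sd if sd > 0 else 0.0
        flags.append(2 if z > threshold else 0)
    return flags


values = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 1.2, 0.8, 1.1, 1.0]  # Hypothetical data with one spike
flags = increments_zscore_flag(values, threshold=2)
print(flags)
```

In this simple version, both the spike and the record right after it are flagged, because the jump up and the jump back down are both unusually large increments.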

</br>

</br>

## Flag, outlier detection: **z-score over all data with IQR**

In [None]:
print(sod.flag_outliers_zscoreiqr_test.__doc__)
sod.flag_outliers_zscoreiqr_test(factor=2, showplot=True, verbose=True)

In [None]:
sod.addflag()

</br>

</br>

## Flag, outlier detection: **z-score over all data**

In [None]:
print(sod.flag_outliers_zscore_test.__doc__)
sod.flag_outliers_zscore_test(threshold=4, showplot=True, verbose=True)

In [None]:
sod.addflag()
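
A z-score expresses how many standard deviations a value lies from the mean of all data. The following minimal sketch flags values whose absolute z-score exceeds a threshold; the function name `zscore_flag`, the flag values and the data are hypothetical. A lower threshold than in the notebook is used because the z-score cannot become very large in such a short example series.

```python
from statistics import mean, pstdev


def zscore_flag(values, threshold):
    """Flag values (2) whose absolute z-score over all data exceeds the threshold."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0] * len(values)  # No variation, nothing to flag
    return [2 if abs(v - mu) / sd > threshold else 0 for v in values]


values = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 1.2, 0.8, 1.1, 1.0]  # Hypothetical data with one spike
flags = zscore_flag(values, threshold=2.5)
print(flags)
```

The daytime/nighttime variant of this test follows the same principle, but calculates the z-scores separately for the two groups of records.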

</br>

</br>

## Flag, outlier detection: **Local standard deviation**

In [None]:
print(sod.flag_outliers_localsd_test.__doc__)
sod.flag_outliers_localsd_test(n_sd=4, winsize=480, showplot=True, verbose=True)

In [None]:
sod.addflag()
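
A local test compares each value only with its neighbours in a moving window instead of with the whole time series. The sketch below assumes a centered window and flags values that are further than `n_sd` local standard deviations away from the local median. These details are assumptions for illustration and may differ from the `diive` implementation; the function name `localsd_flag` and the data are hypothetical, and the parameters are scaled down to fit the short example series.

```python
from statistics import median, pstdev


def localsd_flag(values, n_sd, winsize):
    """Flag values (2) outside local median +/- n_sd * local SD in a centered window."""
    half = winsize // 2
    flags = []
    for i, value in enumerate(values):
        window = values[max(0, i - half): i + half + 1]  # Window is shorter at the edges
        center, spread = median(window), pstdev(window)
        flags.append(2 if abs(value - center) > n_sd * spread else 0)
    return flags


values = [1.0, 1.1, 0.9, 1.0, 1.2, 6.0, 1.0, 0.9, 1.1, 1.0, 1.0]  # Hypothetical data
flags = localsd_flag(values, n_sd=3, winsize=10)
print(flags)
```

Note that the spike itself inflates the standard deviation of its window, which is why an outlier can escape detection when `n_sd` is set too high.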

</br>

</br>

## Flag, outlier detection: **Seasonal trend decomposition with z-score on residuals**

In [None]:
print(sod.flag_outliers_stl_rz_test.__doc__)
sod.flag_outliers_stl_rz_test(zfactor=3, decompose_downsampling_freq='6H', repeat=False, showplot=True)

In [None]:
sod.showplot_orig()
sod.showplot_cleaned()

In [None]:
sod.addflag()

</br>

</br>

## Flag, outlier detection: **Seasonal trend decomposition with z-score on residuals (IQR)**

In [None]:
print(sod.flag_outliers_stl_riqrz_test.__doc__)
sod.flag_outliers_stl_riqrz_test(zfactor=2, decompose_downsampling_freq='6H', repeat=False, showplot=True)

In [None]:
sod.addflag()

</br>

</br>

## Flag, outlier detection: **ThymeBoost**
- More info about [ThymeBoost](https://github.com/tblume1992/ThymeBoost)

In [None]:
print(sod.flag_outliers_thymeboost_test.__doc__)
sod.flag_outliers_thymeboost_test(showplot=True, verbose=True)

In [None]:
sod.addflag()

</br>

</br>

## Flag, outlier detection: **Local outlier factor**

In [None]:
print(sod.flag_outliers_lof_test.__doc__)
sod.flag_outliers_lof_test(n_neighbors=None, contamination=0.005, showplot=True, verbose=True)

In [None]:
sod.addflag()

</br>

</br>

## Flag, outlier detection: **Local outlier factor, daytime/nighttime**

In [None]:
print(sod.flag_outliers_lof_dtnt_test.__doc__)
sod.flag_outliers_lof_dtnt_test(n_neighbors=None, contamination=0.0005, showplot=True, verbose=True)

In [None]:
sod.addflag()

</br>

</br>

## Finalize outlier detection: **Collect all flags**

In [None]:
df32 = sod.get()

## Generate `QCF`, needed for Level-3.3

In [None]:
# Use the collected Level-3.2 results (df32) and the storage-corrected flux name stored in FLUXVAR31
qcf = FlagQCF(series=df32[FLUXVAR31], df=df32, levelid='L3.2', swinpot=df32['SW_IN_POT'], nighttime_threshold=50)
qcf.calculate(daytime_accept_qcf_below=2, nighttimetime_accept_qcf_below=2)
df32qcf = qcf.get()

In [None]:
qcf.report_qcf_flags()

In [None]:
qcf.report_qcf_evolution()

In [None]:
qcf.report_qcf_series()

In [None]:
qcf.showplot_qcf_heatmaps()

In [None]:
df32.describe()

In [None]:
df32.to_csv(f"{FLUXVAR}_level32.csv")  # Save Level-3.2 results to a CSV file named after the flux variable

# **FINISH**
Congratulations, you reached the end of this notebook! Before you go, let's store your finish time.

In [None]:
dt_string = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(f"Finished. {dt_string}")