![logo](../../images/logo_diive1_128px.png)

<span style='font-size:32px; display:block;'>
<b>
    Format meteo data for EddyPro flux processing
</b>
</span>

---
**Notebook version**: `1` (24 Feb 2025)  
**Author**: Lukas Hörtnagl (holukas@ethz.ch)  

</br>

# **Background**

- Formats meteo data to be used in EddyPro flux processing

More info:
- [EddyPro help: Supported biomet file formats](https://www.licor.com/support/EddyPro/topics/biomet-data-format.html)

</br>

# **Settings**

## Variables

In [1]:
# Name of the variables in the original data file
SW_IN = 'SW_IN_T1_1_1'
RH = 'RH_T1_2_1'
PPFD_IN = 'PPFD_IN_T1_2_1'
LW_IN = 'LW_IN_T1_1_1'
TA = 'TA_T1_2_1'
# PA = None  # Not available for this site

# Rename original variables for EddyPro, and add units
rename_dict = {
    TA: ('Ta_1_1_1', 'C'),
    SW_IN: ('Rg_1_1_1', 'W+1m-2'),
    RH: ('RH_1_1_1', '%'),
    LW_IN: ('Lwin_1_1_1', 'W+1m-2'),
    # PA: ('Pa_1_1_1', 'kPa'),
    PPFD_IN: ('PPFD_1_1_1', 'umol+1m-2s-1'),
}
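
Before passing such a mapping to the formatter, it can help to check that it is internally consistent. The following is a minimal sketch, not part of diive: `check_rename_dict` is a hypothetical helper that only verifies that each entry is a `(name, units)` tuple with non-empty strings and that no target name is used twice. It does not validate names or units against EddyPro's own rules.

```python
# Hypothetical helper (not part of diive): basic sanity checks for a rename mapping.
def check_rename_dict(mapping: dict) -> list[str]:
    """Return a list of problems found in an {original: (new_name, units)} mapping."""
    problems = []
    seen = set()
    for original, target in mapping.items():
        # Each value must be a 2-tuple: (new variable name, units string)
        if not (isinstance(target, tuple) and len(target) == 2):
            problems.append(f"{original}: expected a (name, units) tuple")
            continue
        new_name, units = target
        if not new_name or not units:
            problems.append(f"{original}: name and units must be non-empty")
        # Two original variables must not be renamed to the same target name
        if new_name in seen:
            problems.append(f"{original}: duplicate target name {new_name}")
        seen.add(new_name)
    return problems


# Example mapping using variable names from this notebook
example = {
    'TA_T1_2_1': ('Ta_1_1_1', 'C'),
    'SW_IN_T1_1_1': ('Rg_1_1_1', 'W+1m-2'),
    'RH_T1_2_1': ('RH_1_1_1', '%'),
}
problems = check_rename_dict(example)
print(problems or "Mapping looks consistent.")
```

An empty list means that no problems were found by these checks.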

## Database settings (used for example data)

In [2]:
# Settings for database download
SITE = 'ch-fru'  # Site name
START = '2024-01-01 00:01:00'  # Download data starting with this date
STOP = '2024-02-01 00:01:00'  # Download data before this date (the stop date itself is not included)
MEASUREMENTS = ['TA', 'RH', 'SW', 'PPFD', 'LW']  # No PA in this example
FIELDS = [TA, RH, SW_IN, LW_IN, PPFD_IN]  # No PA in this example
TIMEZONE_OFFSET_TO_UTC_HOURS = 1  # Timezone, e.g. "1" is translated to timezone "UTC+01:00" (CET, winter time)
data_version = "meteoscreening_diive"
DIRCONF = r'F:\Sync\luhk_work\20 - CODING\22 - POET\configs'
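
To illustrate what the timezone offset means for the download window, here is a small sketch using only the Python standard library. It is an assumption about how an offset in hours can be turned into a fixed-offset timezone, not a description of what `dbc_influxdb` does internally.

```python
from datetime import datetime, timedelta, timezone

TIMEZONE_OFFSET_TO_UTC_HOURS = 1

# Build a fixed-offset timezone from the offset in hours
tz = timezone(timedelta(hours=TIMEZONE_OFFSET_TO_UTC_HOURS))

# Attach the timezone to the (naive) start date used in the settings
start = datetime.strptime('2024-01-01 00:01:00', '%Y-%m-%d %H:%M:%S').replace(tzinfo=tz)
start_iso = start.isoformat()

print(tz)         # UTC+01:00
print(start_iso)  # 2024-01-01T00:01:00+01:00
```

The ISO string has the same form as the `range(start: ...)` timestamps that appear in the query string printed during the download.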

</br>

# **Imports**

In [3]:
import importlib.metadata
import warnings
from datetime import datetime

import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
import numpy as np

from dbc_influxdb import dbcInflux

from diive.core.plotting.heatmap_datetime import HeatmapDateTime  # For plotting heatmaps
from diive.core.plotting.timeseries import TimeSeries  # For simple (interactive) time series plotting
from diive.pkgs.formats.meteo import FormatMeteoForEddyProProcessing  # Class to format output files for upload

warnings.filterwarnings(action='ignore', category=FutureWarning)
warnings.filterwarnings(action='ignore', category=UserWarning)

version_diive = importlib.metadata.version("diive")
print(f"diive version: v{version_diive}")

diive version: v0.85.6


</br>

# **Docstring**

In [4]:
# help(FormatMeteoForEddyProProcessing)

</br>

# **Load example data**
- This example uses data from a database.

In [5]:
dbc = dbcInflux(dirconf=DIRCONF)
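# Download data; only the dataframe is needed here, the other two return values are ignored
df, _, _ = dbc.download(
    bucket=f'{SITE}_processed',
    measurements=MEASUREMENTS,
    fields=FIELDS,
    start=START,
    stop=STOP,
    timezone_offset_to_utc_hours=TIMEZONE_OFFSET_TO_UTC_HOURS,
    data_version=data_version,  # Defined in the database settings
)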

Reading configuration files was successful.
Connection to database works.

DOWNLOADING
    from bucket ch-fru_processed
    variables ['TA_T1_2_1', 'RH_T1_2_1', 'SW_IN_T1_1_1', 'LW_IN_T1_1_1', 'PPFD_IN_T1_2_1']
    from measurements ['TA', 'RH', 'SW', 'PPFD', 'LW']
    from data version meteoscreening_diive
    between 2024-01-01 00:01:00 and 2024-02-01 00:01:00
    with timezone offset to UTC of 1
Used querystring: from(bucket: "ch-fru_processed") |> range(start: 2024-01-01T00:01:00+01:00, stop: 2024-02-01T00:01:00+01:00) |> filter(fn: (r) => r["_measurement"] == "TA" or r["_measurement"] == "RH" or r["_measurement"] == "SW" or r["_measurement"] == "PPFD" or r["_measurement"] == "LW") |> filter(fn: (r) => r["data_version"] == "meteoscreening_diive") |> filter(fn: (r) => r["_field"] == "TA_T1_2_1" or r["_field"] == "RH_T1_2_1" or r["_field"] == "SW_IN_T1_1_1" or r["_field"] == "LW_IN_T1_1_1" or r["_field"] == "PPFD_IN_T1_2_1") |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColu

The dataframe with original data looks like this:

In [6]:
df

| TIMESTAMP_END | LW_IN_T1_1_1 | PPFD_IN_T1_2_1 | RH_T1_2_1 | SW_IN_T1_1_1 | TA_T1_2_1 |
|---|---|---|---|---|---|
| 2024-01-01 00:30:00 | 298.680643 | 0.0 | 99.997990 | 0.0 | 0.063089 |
| 2024-01-01 01:00:00 | 269.906123 | 0.0 | 98.689082 | 0.0 | 0.365761 |
| 2024-01-01 01:30:00 | 241.274010 | 0.0 | 95.548050 | 0.0 | 0.326444 |
| 2024-01-01 02:00:00 | 228.099020 | 0.0 | 89.946188 | 0.0 | 0.767750 |
| 2024-01-01 02:30:00 | 251.443823 | 0.0 | 85.651175 | 0.0 | 1.136206 |
| ... | ... | ... | ... | ... | ... |
| 2024-01-31 22:00:00 | 243.446680 | 0.0 | 68.076216 | 0.0 | 2.878339 |
| 2024-01-31 22:30:00 | 243.958657 | 0.0 | 68.221826 | 0.0 | 2.901994 |
| 2024-01-31 23:00:00 | 245.087567 | 0.0 | 69.436327 | 0.0 | 2.463166 |
| 2024-01-31 23:30:00 | 247.150223 | 0.0 | 65.885408 | 0.0 | 3.031844 |


</br>

# **Format data**

In [7]:
f = FormatMeteoForEddyProProcessing(
    df=df,
    cols=rename_dict
)
f.run()


Sanitizing timestamp ...
>>> Validating timestamp naming of timestamp column TIMESTAMP_END ... Timestamp name OK.
>>> Converting timestamp TIMESTAMP_END to datetime ... OK
>>> All rows have timestamp TIMESTAMP_END, no rows removed.
>>> Sorting timestamp TIMESTAMP_END ascending ...
>>> Removing data records with duplicate indexes ... OK (no duplicates found in timestamp index)
Detecting time resolution from timestamp TIMESTAMP_END ... OK
   Detected 30min time resolution with HIGH confidence.
   Resolution detected from most frequent timestep (timedelta):
       from full data = None / -failed- (not used)
       from timedelta = 30min / 99% occurrence (OK)
       from progressive = 30min / data 312+312 (not used)

>>> Creating continuous 30min timestamp index for timestamp TIMESTAMP_END between 2024-01-01 00:30:00 and 2024-02-01 00:00:00 ...
Splitting timestamp into two separate columns ('TIMESTAMP_1', 'yyyy-mm-dd') and ('TIMSTAMP_2', 'HH:MM')
Filling missing values with -9999 ...
Rena
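
The log above mentions three steps: creating a continuous 30-minute timestamp index, splitting the timestamp into a date and a time column, and filling missing values with -9999. The following standard-library sketch illustrates these ideas on a toy example. It is not the diive implementation; `format_records` and `MISSING` are hypothetical names introduced for illustration only.

```python
from datetime import datetime, timedelta

MISSING = -9999  # Missing value code, as mentioned in the log output


def format_records(records, start, end, step=timedelta(minutes=30)):
    """Return (date, time, value) rows on a continuous time grid from start to end."""
    rows = []
    current = start
    while current <= end:
        value = records.get(current)  # None if this timestamp has no record
        rows.append((
            current.strftime('%Y-%m-%d'),  # Date column, 'yyyy-mm-dd'
            current.strftime('%H:%M'),     # Time column, 'HH:MM'
            MISSING if value is None else value,
        ))
        current += step
    return rows


# Toy data: the record for 01:00 is missing on purpose
records = {
    datetime(2024, 1, 1, 0, 30): 0.063089,
    datetime(2024, 1, 1, 1, 30): 0.326444,
}
rows = format_records(records,
                      start=datetime(2024, 1, 1, 0, 30),
                      end=datetime(2024, 1, 1, 1, 30))
for row in rows:
    print(row)
```

In this toy example, the gap at 01:00 is filled with a row that carries the missing value code, so that the output has one row per 30-minute step.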

In [8]:
reformatted_df = f.get_results()
reformatted_df

| TIMESTAMP_END | TIMESTAMP_1 | TIMSTAMP_2 | Lwin_1_1_1 | PPFD_1_1_1 | RH_1_1_1 | Rg_1_1_1 | Ta_1_1_1 |
|---|---|---|---|---|---|---|---|
|  | yyyy-mm-dd | HH:MM | W+1m-2 | umol+1m-2s-1 | % | W+1m-2 | C |
| 2024-01-01 00:30:00 | 2024-01-01 | 00:30 | 298.680643 | 0.0 | 99.997990 | 0.0 | 0.063089 |
| 2024-01-01 01:00:00 | 2024-01-01 | 01:00 | 269.906123 | 0.0 | 98.689082 | 0.0 | 0.365761 |
| 2024-01-01 01:30:00 | 2024-01-01 | 01:30 | 241.274010 | 0.0 | 95.548050 | 0.0 | 0.326444 |
| 2024-01-01 02:00:00 | 2024-01-01 | 02:00 | 228.099020 | 0.0 | 89.946188 | 0.0 | 0.767750 |
| 2024-01-01 02:30:00 | 2024-01-01 | 02:30 | 251.443823 | 0.0 | 85.651175 | 0.0 | 1.136206 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-01-31 22:00:00 | 2024-01-31 | 22:00 | 243.446680 | 0.0 | 68.076216 | 0.0 | 2.878339 |
| 2024-01-31 22:30:00 | 2024-01-31 | 22:30 | 243.958657 | 0.0 | 68.221826 | 0.0 | 2.901994 |
| 2024-01-31 23:00:00 | 2024-01-31 | 23:00 | 245.087567 | 0.0 | 69.436327 | 0.0 | 2.463166 |
| 2024-01-31 23:30:00 | 2024-01-31 | 23:30 | 247.150223 | 0.0 | 65.885408 | 0.0 | 3.031844 |


</br>

# **Save reformatted data to CSV**

In [9]:
reformatted_df.to_csv("meteo.csv", index=False)
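
Because the reformatted dataframe has two column levels (variable names and units), the saved file is expected to start with two header rows, followed by the data rows. The sketch below reproduces this layout with the standard library `csv` module on a toy example. It assumes one header row per column level and no index column (`index=False`); the values are taken from the tables above, except for the `-9999` row, which is made up to show a gap.

```python
import csv
import io

# Two header rows: variable names and units (names as shown in the output above)
names = ['TIMESTAMP_1', 'TIMSTAMP_2', 'Ta_1_1_1']
units = ['yyyy-mm-dd', 'HH:MM', 'C']

# Toy data rows; -9999 marks a missing value
rows = [
    ['2024-01-01', '00:30', 0.063089],
    ['2024-01-01', '01:00', -9999],
]

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(names)
writer.writerow(units)
writer.writerows(rows)

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

Opening the saved `meteo.csv` in a text editor and comparing its first lines with this layout is a quick way to check the file before using it in EddyPro.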

</br>

# **End of notebook**

In [10]:
dt_string = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(f"Finished {dt_string}")

Finished 2025-02-24 15:14:24
