<font color='IndianRed'>QC and Gap Filling procedures derived from MetObs Toolkit and compiled into a notebook by Patricia San Nicolás Vargas. Please check: https://github.com/mnunezpeiro/HEAT/tree/main/cws_qc%2Bgf </font>

# <font color='IndianRed'>1.0 Install MetObs Toolkit
***
Install the MetObs Toolkit if it is not already installed in your virtual environment.</font>

In [None]:
!pip3 install git+https://github.com/vergauwenthomas/MetObs_toolkit
%config InlineBackend.print_figure_kwargs = {'bbox_inches':None}

<font color='IndianRed'>Import the MetObs Toolkit together with other auxiliary libraries.</font>

In [2]:
import sys
import os
import metobs_toolkit
import pandas as pd 
import datetime as dt

## <font color='IndianRed'>1.1 Import raw CWS data and create a single dataframe
***
</font>

<font color='IndianRed'>Define a set of functions for combining the raw CWS CSV files into a single CSV file.</font>

In [35]:
def clean_name_file(datafile_name):
    # Replace '.' with ':' and limit the file name to the first 17 characters
    clean_name = datafile_name.replace(".", ":")[:17]
    return clean_name


def combine_csv(data_folder_path, output_datafile_name):
    # List for storing DataFrames of each file
    dfs = []
    # List for storing clean file names
    datafiles_names = []

    # Iterate over all files in the folder
    for datafile in os.listdir(data_folder_path):
        if datafile.endswith(".csv"):
            # Read CSV file and store as DataFrame
            datafile_path = os.path.join(data_folder_path, datafile)
            df = pd.read_csv(datafile_path)

            # Add the DataFrame to the list
            dfs.append(df)

            # Get file name without extension and clean it
            clean_filename = clean_name_file(
                os.path.splitext(datafile)[0])
            datafiles_names.append(clean_filename)

    # Merge all DataFrames into one using the first column as index
    df_combined = pd.concat([df.set_index(df.columns[0])
                             for df in dfs], axis=1)
    
    # Rename the columns with the clean file names
    # (the first column of each file was already used as the index)
    df_combined.columns = datafiles_names

    # Replace ':' with '.' in the column names
    df_combined.columns = [col.replace(":", ".")
                           for col in df_combined.columns]

    # Write the combined DataFrame to a new CSV file
    output_datafile_path = os.path.join(data_folder_path, output_datafile_name)
    df_combined.to_csv(output_datafile_path)

<font color='IndianRed'>Define the path to the folder containing the raw CWS CSV files and the name of the output CSV file.</font>

In [None]:
# Path to the folder containing the CSV files
data_folder_path = r'C:/your_path/your_folder' # Change this

# Name of the output CSV file
output_datafile_name = r'C:/your_new_path/your_CSV_file.csv' # Change this

<font color='IndianRed'>Combine the Netatmo CSV files.</font>

In [None]:
combine_csv(data_folder_path, output_datafile_name)
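
<font color='IndianRed'>To see what the combined file is expected to look like, the following self-contained sketch builds two small CSV files in a temporary folder and merges them into one wide table. The station names and values are hypothetical, and the sketch assumes each raw file holds a timestamp column followed by a single temperature column.</font>

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: one CSV file per station.
samples = {
    "70.ee.50.00.47.f2.csv": "time,temp\n2023-01-01 00:00,4.1\n2023-01-01 01:00,3.9\n",
    "70.ee.50.00.47.f3.csv": "time,temp\n2023-01-01 00:00,5.0\n2023-01-01 01:00,4.7\n",
}

with tempfile.TemporaryDirectory() as folder:
    # Write the sample files to disk
    for name, content in samples.items():
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(content)

    # Read every CSV file and keep its first data column under the station name
    columns = {}
    for name in sorted(os.listdir(folder)):
        station = os.path.splitext(name)[0]  # file name without ".csv"
        df = pd.read_csv(os.path.join(folder, name), index_col=0)
        columns[station] = df.iloc[:, 0]

# One column per station, aligned on the shared timestamp index
wide = pd.concat(columns, axis=1)
print(wide)
```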

<font color='IndianRed'>Set the directory that contains the data to which QC and Gap Filling is to be applied.</font>

In [None]:
BASE_DIR = r'C:/your_new_path' # Change this (folder that contains the combined CSV file)
print('BASE_DIR: ',BASE_DIR)

<font color='IndianRed'>Specify the path to the combined CSV file, load and print.</font>

In [None]:
# Path to the data file
data_path = os.path.join(BASE_DIR, 'your_data_file_name.csv') # Change this

# Load and print the data
data = pd.read_csv(data_path, delimiter=',')
data

## <font color='IndianRed'>1.2 Creating a standardized dataset
***
To use your dataset with the MetObs Toolkit, you first need to standardize it. This is done by specifying which column or row of your dataset represents which type of observations, which column or row indicates the locations, etc. By doing so you create the template.
How is your dataset structured?
1. Long format: when your data has the station observations stacked as rows. The column headers contain the name of the variable.
2. Wide format: when each column contains the data of a different station. Every row contains the data of a particular variable.
3. Single station format: when the file contains observations of only one station.</font>
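
<font color='IndianRed'>As an illustration of the difference between the wide and the long format, the sketch below reshapes a small wide table into a long one with pandas. It assumes a single variable (temperature) with one row per timestamp; the station names and column names are hypothetical.</font>

```python
import pandas as pd

# Wide format: one column per station, one row per timestamp
wide = pd.DataFrame(
    {"station_A": [4.1, 3.9], "station_B": [5.0, 4.7]},
    index=pd.to_datetime(["2023-01-01 00:00", "2023-01-01 01:00"]),
)
wide.index.name = "datetime"

# Long format: observations of all stations stacked as rows
long = wide.reset_index().melt(
    id_vars="datetime", var_name="name", value_name="temp"
)
print(long)
```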

<font color='IndianRed'>Run the following code cell to start the prompt that will guide you through all the steps that are needed to build your template file.</font>

In [None]:
metobs_toolkit.build_template_prompt()

<font color='IndianRed'>Copy here the code obtained from the prompt</font>

In [41]:
# Here the code obtained from the prompt

#The following commented code is an example of what should be pasted here.
# #1. Define the paths to your files: 

# data_file = "C:\\your_path\\your_data_file.csv"
# meta_data_file = "C:\\your_path\\your_metadata_file.csv"
# template = "C:\\your_path\\template.csv"

# #2. initiate a dataset: 

# your_dataset = metobs_toolkit.Dataset()

# #3. Update the paths to your files: 

# your_dataset.update_settings(
#     input_data_file = data_file,
#     input_metadata_file = meta_data_file,
#     template_file = template,
#     output_folder = "C:\\your_path\\your_folder",
#     )

# #3B. Update specific settings (optional): 

# your_dataset.update_qc_settings(gapsize_in_records = 3)

# #4. Import your data : 

# your_dataset.import_data_from_file()

<font color='IndianRed'>An initial QC is performed when the data set is read in. This initial QC will determine the time resolution for each station and, based on this time resolution, missing observations are determined. The assumed time resolution can be checked by printing the metadf dataframe:</font>

In [None]:
print("Metadf dataframe:")
print(your_dataset.metadf.assumed_import_frequency)

<font color='IndianRed'>Save the work you have done to a .pkl (pickle) file.</font>

In [None]:
your_dataset.save_dataset(outputfolder='C:\\your_path', # Change this
                     filename='your_file_name.pkl') # Change this

## <font color='IndianRed'>1.3 Automatic QC actions
***
Some QC actions are automatically executed when you load or import the data into your dataset:
 1. Looking for duplicated timestamps.
 2. Looking for observation values that are not valid (e.g. some text instead of a number).
 3. Estimating a time resolution for each station and, based on this time resolution, looking for missing observations.
 4. Labelling a series of consecutive missing observations as a gap when it is longer than a certain threshold.
 </font>
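
<font color='IndianRed'>The first three automatic actions can be sketched with plain pandas as shown below. This is only an illustration of the idea, not the toolkit's implementation: the sample record is hypothetical, and the time resolution is assumed to be the most frequent time step.</font>

```python
import pandas as pd

# Hypothetical raw record: one duplicated timestamp, one non-numeric value
raw = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 00:10", "2023-01-01 00:10",
        "2023-01-01 00:20", "2023-01-01 00:30", "2023-01-01 01:10",
    ]),
    "temp": ["4.1", "4.0", "4.0", "error", "3.8", "3.5"],
})

# 1. Duplicated timestamps
duplicated = raw[raw["datetime"].duplicated(keep="first")]
clean = raw.drop_duplicates("datetime").set_index("datetime")

# 2. Invalid values: anything that cannot be parsed as a number becomes NaN
values = pd.to_numeric(clean["temp"], errors="coerce")
invalid = values[values.isna()].index

# 3. Estimate the time resolution and list the timestamps that are missing
steps = clean.index.to_series().diff().dropna()
resolution = steps.mode().iloc[0]
expected = pd.date_range(clean.index.min(), clean.index.max(), freq=resolution)
missing = expected.difference(clean.index)

print("Resolution:", resolution)
print("Missing timestamps:", list(missing))
```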

### <font color='IndianRed'>1.3.1 Optional data overview
The next code cell provides an overview of:
1. The observations
2. The outliers in the observations
3. The number of missing observations
4. The number of gaps</font>

In [None]:
# ------OPTIONAL------
your_dataset.show()

### <font color='IndianRed'>1.3.2 Optional data plots
The next code cell plots the temperature timeseries, including all stations.</font>

In [None]:
# ------OPTIONAL------
your_dataset.make_plot(obstype='temp', colorby='name')

<font color='IndianRed'>And next the sites with their corresponding temperatures are plotted:</font>

In [None]:
# ------OPTIONAL------
your_dataset.make_geo_plot(variable='temp', title=None, timeinstance=None, vmin=None, vmax=None)

# <font color='IndianRed'>2. Introduction to Quality Control checks
***
</font>

<font color='IndianRed'>This is an introduction to QC procedures in the toolkit and their default configuration.

The QC procedures are divided in two parts:
- Individual QC tests
- Group QC tests

Next, these two parts are explained:</font>

## <font color='IndianRed'>2.1 Part 1 - Individual QC tests
***
Individual QC settings are stored in a dictionary that contains multiple levels. This configuration can be inspected by running the following cell.</font>

In [None]:
qc_settings = your_dataset.settings.qc["qc_check_settings"]

<font color='IndianRed'>The keys of the dictionary are printed next to get an idea of the different available checks:</font>

In [None]:
print(qc_settings.keys())

### <font color='IndianRed'>2.1.1 Gross Value check
***
The Gross Value check tests your dataset to see if the observations are between certain thresholds.

The settings for the gross value check can be found in the qc_settings dictionary by using the key "gross_value":</font>

In [None]:
print(qc_settings['gross_value'])

<font color='IndianRed'>This dictionary only has one key as default: "temp". This is because default values are currently only given for temperature.

The pre-defined thresholds for the temperature gross value check can be viewed by executing the following code:</font>

In [None]:
print(qc_settings['gross_value']['temp'])
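
<font color='IndianRed'>A minimal sketch of the idea behind the gross value check, written with pandas. The thresholds and observations are illustrative values, not the toolkit defaults.</font>

```python
import pandas as pd

temps = pd.Series([12.3, 55.0, -7.5, 18.0], name="temp")

# Illustrative thresholds (assumption, adjust to your location)
min_value, max_value = -5.0, 50.0

# An observation is an outlier when it falls outside [min_value, max_value]
outlier = (temps < min_value) | (temps > max_value)
print(outlier)
```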

### <font color='IndianRed'>2.1.2 Persistence check
***
The persistence check looks for repeated, consecutive observation values in your dataset. A time window is defined, and the check tests whether the observations are constant within this window. The window should reflect a time interval during which you expect some variation in the observed variable.

For this check to be executed, the time window should contain a minimum number of observations, which is determined by "min_num_obs". If all observations in the time window are identical, they are all labeled as persistence outliers.

The pre-defined configuration for the persistence check can be viewed by executing the following code:</font>


In [None]:
print(qc_settings["persistance"])
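
<font color='IndianRed'>The logic of the persistence check can be sketched with a time-based rolling window in pandas. This is a simplified illustration under assumed settings (a 3-hour window and at least 3 observations), not the toolkit's own code.</font>

```python
import pandas as pd

index = pd.date_range("2023-01-01", periods=8, freq="60min")
temps = pd.Series([5.0, 5.2, 6.1, 6.1, 6.1, 6.1, 5.8, 5.5], index=index)

window = "180min"  # illustrative time window
min_num_obs = 3    # illustrative minimum number of observations

# A window is "persistent" when it holds enough observations and all are equal
roll = temps.rolling(window)
constant = (roll.max() == roll.min()) & (roll.count() >= min_num_obs)

# Every observation inside a persistent window is flagged
flagged = pd.Series(False, index=temps.index)
for end in constant[constant].index:
    start = end - pd.Timedelta(window)
    flagged.loc[(temps.index > start) & (temps.index <= end)] = True

print(flagged)
```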

### <font color='IndianRed'>2.1.3 Repetitions check
***
This check is similar to the persistence check. With the persistence check you define a time window during which you expect some variation in the observed variable. The repetitions check has no such time window: it simply scans the series of observations for runs of consecutive constant values. Such a run could indicate a connection error. In many cases the persistence check and the repetitions check give the same results. However, in some cases one of the checks is more suitable, for example when the time resolution of your data is very coarse.

max_valid_repetitions = the maximum number of consecutive records with an unchanged value that is still accepted as valid.

The pre-defined number of repetitions for the repetitions check can be viewed by executing the following code:</font>

In [None]:
print(qc_settings["repetitions"])
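
<font color='IndianRed'>A sketch of the repetitions check in pandas. It assumes that a run of identical values is flagged once it is longer than max_valid_repetitions; the exact counting convention of the toolkit may differ.</font>

```python
import pandas as pd

temps = pd.Series([5.0, 6.1, 6.1, 6.1, 6.1, 5.8, 5.8, 5.5])
max_valid_repetitions = 3  # illustrative value

# Give every run of identical consecutive values its own id
run_id = (temps != temps.shift()).cumsum()

# Length of the run each observation belongs to
run_length = temps.groupby(run_id).transform("count")

flagged = run_length > max_valid_repetitions
print(flagged)
```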

### <font color='IndianRed'>2.1.4 Step check
The step check inspects the dataset for abrupt changes in the observations between consecutive timestamps. If an observation varies too much from the previous observation, it is labeled as an outlier.

In general, for temperatures, the decrease threshold is set less stringently than the increase threshold. This reflects the fact that a sudden temperature drop is meteorologically more common than a sudden increase, which is often the result of a radiation error.

The pre-defined values for temperature can be viewed by executing the following code (please note that the variation is given in degrees per second):</font>

In [None]:
print(qc_settings["step"])
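
<font color='IndianRed'>The step check can be sketched as a comparison of the rate of change between consecutive records with two thresholds. The thresholds below are illustrative assumptions (8 degrees per hour upwards, 10 degrees per hour downwards, the latter written as a negative number).</font>

```python
import pandas as pd

index = pd.date_range("2023-01-01", periods=5, freq="60min")
temps = pd.Series([10.0, 11.0, 24.5, 12.0, 11.5], index=index)

max_increase_per_sec = 8.0 / 3600.0    # illustrative threshold
max_decrease_per_sec = -10.0 / 3600.0  # illustrative threshold (negative)

# Rate of change in degrees per second between consecutive records
seconds = temps.index.to_series().diff().dt.total_seconds()
rate = temps.diff() / seconds

flagged = (rate > max_increase_per_sec) | (rate < max_decrease_per_sec)
print(flagged)
```

<font color='IndianRed'>Note that this naive version flags both the spike and the return to normal values that follows it.</font>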

### <font color='IndianRed'>2.1.5 Window variation check

This check is similar to the step check. It analyses the variation of the data within a certain time window. This variation needs to be between a certain minimum and maximum threshold.

The pre-defined values for temperature can be viewed by executing the following code (please note that the variation is given in degrees per second):</font>

In [None]:
print(qc_settings["window_variation"])

## <font color='IndianRed'>2.2 Part 2 - Group QC test
***
The group QC consists of two procedures:

2.2.1 The neighbour test, also known as the buddy check, which is automated.

2.2.2 The manual visual check, which is done by hand.</font>

### <font color='IndianRed'>2.2.1 Neighbour test

The buddy check compares an observation against its neighbours (i.e. buddies). The check looks for buddies in a neighbourhood specified by a certain radius. The buddy check flags observations if the (absolute value of the) difference between the observations and the average of the neighbours normalized by the standard deviation in the circle is greater than a predefined threshold.
This check is based on the buddy check from titanlib. Documentation on the titanlib buddy check can be found <a href='https://github.com/metno/titanlib/wiki/Buddy-check'>here.</a></font>
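
<font color='IndianRed'>The principle of the buddy check can be sketched with NumPy as follows. This is a simplified illustration and not the titanlib or toolkit implementation: the coordinates, temperatures and settings are hypothetical, no elevation correction is applied, and the station under test is excluded from its own neighbourhood statistics.</font>

```python
import numpy as np


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    earth_radius_m = 6371000.0
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlambda = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * np.arcsin(np.sqrt(a))


# Hypothetical stations observed at the same timestamp
lat = np.array([40.4160, 40.4165, 40.4170, 40.4175, 40.4180])
lon = np.array([-3.7030, -3.7035, -3.7040, -3.7030, -3.7025])
temp = np.array([15.1, 14.9, 15.3, 15.0, 22.0])

radius_m = 1000.0     # neighbourhood radius
min_sample_size = 3   # minimum number of buddies
min_std = 0.5         # lower bound for the standard deviation
threshold = 4.0       # threshold in standard deviation units

flagged = np.zeros(temp.size, dtype=bool)
for i in range(temp.size):
    dist = haversine_m(lat[i], lon[i], lat, lon)
    buddies = (dist <= radius_m) & (np.arange(temp.size) != i)
    if buddies.sum() < min_sample_size:
        continue  # not enough buddies: the observation is not checked
    std = max(temp[buddies].std(), min_std)
    flagged[i] = abs(temp[i] - temp[buddies].mean()) / std > threshold

print(flagged)
```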

### <font color='IndianRed'>2.2.2 Manual visual check

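The manual visual check consists of inspecting the QC statistics (pie charts) and the time series plots by eye. It is carried out in section 3.2.2.</font>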

# <font color='IndianRed'>3. Applying quality control
***
</font>

## <font color='IndianRed'>3.1 Individual QC procedures
***
The five individual QC checks can be applied to the dataset with the function apply_quality_control.

--OPTIONAL-- If the data is given with a time resolution different from 1 hour, the following code should be executed:</font>

In [None]:
your_dataset.coarsen_time_resolution(freq='1H')

<font color='IndianRed'>Update the settings of the QC</font>

In [51]:
your_dataset.update_qc_settings(obstype="temp", #(str, optional) – The observation type to update the quality control settings for. The default is ‘temp’.
                                gapsize_in_records=3, #(int (> 0), optional) – A gap is defined as a sequence of missing observations with a length greater or equal to this number, on the input frequencies. The default is None.
                                gross_value_max_value=50.0, # (numeric, optional) – Maximal value for gross value check. The default is None. [Adjust to location]
                                gross_value_min_value=-5, # (numeric, optional) – Minimal value for gross value check. The default is None. [Adjust to location]
                                persis_time_win_to_check=6, # (Timedelta or str, optional) – Time window for persistance check. The default is None.
                                persis_min_num_obs=5, #(int (> 0), optional) – Minimal window members for persistance check. The default is None.
                                rep_max_valid_repetitions=5, #(int (> 0), optional) – Maximal valid repetitions for repetitions check. The default is None.
                                step_max_decrease_per_sec=0.0033333333333333333, # (numeric (< 0), optional) – Maximal decrease per second for step check. The default is None. [This is 12 degrees per hour, change if needed according to the location. The documented type is a negative number, so check the sign expected by your toolkit version]
                                step_max_increase_per_sec=0.0033333333333333333, # (numeric, optional) – Maximal increase per second for step check. The default is None. [This is 12 degrees per hour, change if needed according to the location]
                                win_var_max_decrease_per_sec=None, #(numeric (> 0), optional) – Maximal decrease per second for window variation check. The default is None.
                                win_var_max_increase_per_sec=None, #(numeric (> 0), optional) – Maximal increase per second for window variation check. The default is None.
                                win_var_min_num_obs=None, #(int (> 0), optional) – Minimal window members for window variation check. The default is None.
                                win_var_time_win_to_check=None #(Timedelta or str, optional) – Time window for window variation check. The default is None.
                                )
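
<font color='IndianRed'>The step thresholds are expressed per second. The short calculation below shows how a threshold given in degrees per hour is converted; the variable names are only illustrative.</font>

```python
# Convert a threshold in degrees per hour to degrees per second
degrees_per_hour = 12.0
per_second = degrees_per_hour / 3600.0  # 3600 seconds in one hour
print(per_second)
```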

<font color='IndianRed'>Apply QC</font>

In [None]:
your_dataset.apply_quality_control(obstype="temp",
                                   gross_value=True,
                                   persistance=True,
                                   repetitions=True,
                                   step=True,
                                   window_variation=False)

<font color='IndianRed'>--OPTIONAL--

QC can be run on a single station rather than on the full dataset.

The following code gives an example of how this can be done.</font>

In [None]:
specific_station = 'your_station_name' # change this

station = your_dataset.get_station(specific_station)

# Set the QC thresholds for this station
# (example values taken from the dataset-level settings; change if needed)
station.update_qc_settings(obstype="temp",
                           gapsize_in_records=3,
                           gross_value_max_value=50.0,
                           gross_value_min_value=-5,
                           persis_time_win_to_check=6,
                           persis_min_num_obs=5,
                           rep_max_valid_repetitions=5,
                           step_max_decrease_per_sec=0.0033333333333333333,
                           step_max_increase_per_sec=0.0033333333333333333)

# Run the individual checks on this station only
station.apply_quality_control(obstype="temp",
                              gross_value=True,
                              persistance=True,
                              repetitions=True,
                              step=True,
                              window_variation=False)

<font color='IndianRed'>Next, the QC statistics are printed.</font>

In [None]:
qc_statistics = station.get_qc_stats(obstype='temp',
                                     make_plot=True)

<font color='IndianRed'>The next cell displays the outliers dataframe (outliersdf). The label indicates the reason why the value was flagged as an outlier.</font>

In [None]:
your_dataset.outliersdf.xs('temp', level='obstype')

## <font color='IndianRed'>3.2 Group quality control procedures
***
The two group QC procedures are applied next.
    </font>


### <font color='IndianRed'>3.2.1 Buddy check</font>

In [None]:
your_dataset.update_qc_settings(obstype='temp',
                                buddy_radius=1000, #defined in meters, >0
                                buddy_min_sample_size=3, #(int (> 2), optional) – The minimum sample size to calculate statistics on. The default is None.
                                buddy_max_elev_diff=None, #(numeric (> 0), optional) – The maximum altitude difference allowed for buddies. The default is None. [Not considered because a correction by altitude is carried out]
                                buddy_min_std=None, #(numeric (> 0), optional) – The minimum standard deviation for sample statistics. This should represent the accuracy of the observations.
                                buddy_threshold=4, #(numeric (> 0), optional) – The threshold (std units) for flagging observations as buddy outliers. The default is None [In previous works we used 4, but this should be tested]
                                buddy_elev_gradient=-0.01 #(numeric, optional) – Describes how the obstype changes with altitude (in meters). The default is None. [The WMO recommends -1 K/100 m]
                                )

<font color='IndianRed'>Apply the buddy check</font>

In [None]:
your_dataset.apply_buddy_check(obstype='temp', use_constant_altitude=False, haversine_approx=True)

### <font color='IndianRed'>3.2.2 Manual visual check
This function generates pie charts to display the quality control statistics. There is a general pie chart with the overall label of the observations: ok, outlier or missing. Next, there is also a general pie chart, specifying how the different types of outliers are distributed. Finally, each QC check also has its own chart, denoting how many observations pass this check by labelling them as ok, outlier or not checked. Observations which are already labeled as an outlier are not checked again by the following checks, which results in the "not checked" label.</font>

In [None]:
qc_statistics = your_dataset.get_qc_stats(obstype='temp',
                                          stationname='your_station_name',    #None means all stations are plotted
                                          make_plot=True)

<font color='IndianRed'>When plotting a time series, the QC outliers are also shown as scatter points on the time series. To visualise this, use the colorby='label' argument of the plotting function:</font>

In [None]:
your_dataset.make_plot(colorby='label')

<font color='IndianRed'>You can also plot just the observations of one or more stations of your choice. You can specify which stations by using the stationnames argument of the plotting function:</font>

In [None]:
your_dataset.make_plot(colorby='label', stationnames=["your_station_name"]) # change this

# <font color='IndianRed'>4. Save the QC dataset
***
It is important to save your dataset, so that you do not have to repeat all the previous steps when you continue working.</font>

In [None]:
save_directory = 'C:\\your_directory' # Provide a directory where this dataset needs to be saved
filename = 'filename_qc_controlled_dataset.pkl' # name of the file in which the dataset is saved
your_dataset.save_dataset(outputfolder= save_directory, filename=filename)

# <font color='IndianRed'>5. GAP-FILLING
Definition of a 'gap' and the difference with a 'missing observation':

An observation can be missing because of two reasons:
1. The timestamp is not present in the dataset.
2. There is no value known, and it is represented with a NaN-value.

Every time one of these two occurs, it is labeled as a 'missing observation'. A sequence of consecutive missing observations is labeled as a 'gap'. The minimum number of missing observations needed to define a gap is stored in the settings with the parameter "gapsize_n" (timestamps). The default value is 40. Note that a gap is not defined by the length of the missing time period, but only by the number of missing observations. When performing one of the gap-filling techniques, only the gaps are filled, while the missing observations not defined as a gap are left blank. The missing observations can be filled, but this has to be performed separately.</font>
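
<font color='IndianRed'>The distinction between a 'missing observation' and a 'gap' can be sketched with pandas as shown below. The series and the gap size are hypothetical, and only NaN values are considered; absent timestamps could be turned into NaN values first by reindexing on the expected timestamps.</font>

```python
import numpy as np
import pandas as pd

index = pd.date_range("2023-01-01", periods=10, freq="60min")
temps = pd.Series(
    [5.0, np.nan, 5.4, 5.6, np.nan, np.nan, np.nan, 6.0, 6.1, 6.0],
    index=index,
)
gapsize = 3  # illustrative minimum number of missing records that forms a gap

# Give every run of consecutive missing (or present) records its own id
is_missing = temps.isna()
run_id = (is_missing != is_missing.shift()).cumsum()

# Number of missing records in the run each record belongs to
run_length = is_missing.astype(int).groupby(run_id).transform("sum")

label = pd.Series("ok", index=temps.index)
label.loc[is_missing & (run_length < gapsize)] = "missing observation"
label.loc[is_missing & (run_length >= gapsize)] = "gap"
print(label)
```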

## <font color='IndianRed'>5.1 Read in the data
***
</font>

<font color='IndianRed'>Next, specify the path to the folder and the file with the QC dataset</font>

In [None]:
# Specify the path to the folder
folder_with_qc_dataset = os.path.join('C:\\your_path', 'your_folder_name') # change this


<font color='IndianRed'>Create an empty dataset, import the QC Dataset object, and print the 5 first lines.</font>

In [None]:
# Create an empty dataset
empty_dataset = metobs_toolkit.Dataset()

# Import the QC dataset
dataset_incomplete = empty_dataset.import_dataset(folder_path= folder_with_qc_dataset,
                                                  filename='filename_qc_controlled_dataset.pkl') # change this

# Print the first 5 lines
print(dataset_incomplete.df.head(5))

<font color='IndianRed'> Next, the missing observations and gaps are updated. For this, the outliers are interpreted as missing observations. If there is a sequence of these outliers for a station, larger than n_gapsize, then this will be interpreted as a gap. The outliers are not removed.
</font>

In [61]:
dataset_incomplete.update_gaps_and_missing_from_outliers(obstype='temp', n_gapsize=3)

## <font color='IndianRed'>5.2 Get information about the gaps
***
</font>

<font color='IndianRed'>Get general information about all the gaps in the dataset.</font>

In [None]:
print("Information about all gaps in the data set:")
print(dataset_incomplete.get_gaps_df())
print(dataset_incomplete.get_gaps_info())

<font color='IndianRed'>Get detailed information about one of the gaps of the data set (indexing starts at 0, so index 0 is the first gap).</font>

In [None]:
print("Information on the first gap:")
print(dataset_incomplete.gaps[0].to_df())
print(dataset_incomplete.gaps[0].get_info())

<font color='IndianRed'>Get info about missing observations.</font>

In [None]:
print("Information about the missing observations:")
print(dataset_incomplete.get_missing_obs_info())

<font color='IndianRed'>Get detailed information of the gaps of one station.</font>

In [None]:
your_chosen_station = dataset_incomplete.get_station('your_station_name') # change this
print(your_chosen_station.get_gaps_df())

<font color='IndianRed'>The gaps can be visualised by making a timeplot of the incomplete dataset. In the code below a timeplot is made of the temperature for a certain time period and for one of the stations.</font>

In [None]:
# Select a station and time period
stationname = "your_station_name" # change this
begin_plot = dt.datetime.strptime("2023-01-01 00:00:00", '%Y-%m-%d %H:%M:%S') # Set your own dates
end_plot = dt.datetime.strptime("2023-12-31 23:00:00", '%Y-%m-%d %H:%M:%S') # Set your own dates

# Make timeplot
dataset_incomplete.make_plot(stationnames=[stationname],
                             obstype='temp',
                             colorby='name',
                             starttime=begin_plot,
                             endtime=end_plot,
                             legend=False,
                             show_outliers=False
)

## <font color='IndianRed'>5.3 Prepare the data set for the gap-filling
Before we start with the gap-filling, determine which data you would like to fill.

Make changes to the code below if you want to select a certain station.

If you would like to fill your complete data set, and not just one station, you can simply ignore the station selection cell below and not run it.</font>

In [None]:
# ------OPTIONAL------
# Change the time resolution (if the data has already been coarsened, this step is not needed)
dataset_incomplete.coarsen_time_resolution(freq = "1H")

In [None]:
# ------OPTIONAL------
# Select one station
station_selected = dataset_incomplete.get_station('name_station') # change this

# Print the first 5 lines of the result
print(station_selected.df.head(5))

<font color='IndianRed'>Remember that besides the gaps, there are also missing observations which are not labeled as gap.

These missing observations can be filled through linear interpolation using the following code.</font>

In [67]:
dataset_incomplete.fill_missing_obs_linear() # change in case you selected only one station
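
<font color='IndianRed'>Linear interpolation of isolated missing observations can be illustrated with pandas as follows. This is a sketch of the principle on a hypothetical series, not the toolkit's own routine.</font>

```python
import numpy as np
import pandas as pd

index = pd.date_range("2023-01-01", periods=5, freq="60min")
temps = pd.Series([10.0, np.nan, 12.0, np.nan, 11.0], index=index)

# Interpolate linearly in time between the neighbouring valid observations
filled = temps.interpolate(method="time")
print(filled)
```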

## <font color='IndianRed'>5.4 Perform gap-filling
To perform the gap-filling, you need access to the ERA5 data that corresponds to your data set. 

<b>Before you run the code cell below, please read the instructions <a href='https://vergauwenthomas.github.io/MetObs_toolkit/topics/gee_authentication.html'>here</a>.</b></font>

In [None]:
# Get the ERA5 data
metobs_toolkit.connect_to_gee(force=True, #create new credentials
                              auth_mode='notebook', # 'notebook', 'localhost', 'gcloud' (requires gcloud installed) or 'colab' (works only in colab)
                              )
era_model = dataset_incomplete.get_modeldata(
                    modelname='ERA5_hourly',  # Name of the model data to get from GEE
                    stations=None,     # CHANGE THIS: If only for one station: specify its name,
                                              # if you want to download the data for all stations: put None
                    startdt=None,             # None means the starttime of the observations is taken
                    enddt=None)               # None means the endtime of the observations is taken

# Print the first five lines
print(era_model.df.head())

<font color='IndianRed'>If the ERA5 data is too large, you should get a notification.

In this case, the ERA5 data will be uploaded automatically to your Google Drive, and you will need to retrieve it from there. Please, download that ERA5 data to your computer and specify the path to the folder down below.

After doing so, please run the code cell below to read in the ERA5 data.</font>

In [73]:
# DO NOT RUN THIS CODE CELL, except for the case when your ERA5 data set is too large

# Make an empty Modeldata object
era_model = metobs_toolkit.Modeldata('ERA5_hourly')

# Read in the ERA5 data
era_model.set_model_from_csv(csvpath='C:\\your_path\\your_ERA5_file.csv') # change this

<font color='IndianRed'>Now update the gap-filling settings and fill in the gaps.</font>

In [None]:
# Update the settings (definition of the period to calculate biases for)
dataset_incomplete.update_gap_and_missing_fill_settings(gap_debias_prefered_leading_period_hours=24, # CHANGE THIS
                                                        gap_debias_prefered_trailing_period_hours=24,  # CHANGE THIS
                                                        gap_debias_minimum_leading_period_hours=6,  # CHANGE THIS
                                                        gap_debias_minimum_trailing_period_hours=6) # CHANGE THIS
# Fill gaps with the hybrid method
dataset_incomplete.fill_gaps_automatic(modeldata=era_model,
                            obstype="temp",     # CHANGE THIS!
                            max_interpolate_duration_str='3H', # CHANGE THIS, if you want!
                            overwrite_fill=True)

# Fill gaps using linear interpolation
dataset_incomplete.fill_gaps_linear(obstype='temp', overwrite_fill=False)
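
<font color='IndianRed'>The role of the leading and trailing periods can be illustrated with a simplified debiasing sketch: the bias between the model and the observations is estimated just before and just after the gap, and the bias-corrected model values are used to fill the gap. The series below are hypothetical and a single mean bias is assumed; the toolkit's method may compute the bias differently.</font>

```python
import numpy as np
import pandas as pd

index = pd.date_range("2023-01-01", periods=12, freq="60min")
model = pd.Series(np.linspace(8.0, 13.5, 12), index=index)  # hypothetical model series
obs = model + 1.5                                            # station reads 1.5 degrees warmer
obs.iloc[5:8] = np.nan                                       # a gap of three records

gap = obs.isna()
gap_start, gap_end = obs[gap].index[0], obs[gap].index[-1]

# Observations in the 4 hours before and after the gap (illustrative periods)
leading = obs.loc[gap_start - pd.Timedelta(hours=4):gap_start].dropna()
trailing = obs.loc[gap_end:gap_end + pd.Timedelta(hours=4)].dropna()
reference = pd.concat([leading, trailing])

# Mean difference between model and observations around the gap
bias = (model.loc[reference.index] - reference).mean()

# Fill the gap with the bias-corrected model values
filled = obs.copy()
filled.loc[gap] = model.loc[gap] - bias
print(filled)
```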

## <font color='IndianRed'>5.5 Visualise the filled gaps
Make changes in the code cell below to visualise one of your filled gaps.

- Change the name of the station
- Change the begin and end date of the plot
- Change the observation type you want to visualise, which is the one you gap-filled earlier.</font>

In [None]:
# Select a station and time period
stationname = "70.ee.50.00.47.f2" # CHANGE THIS
begin_plot = dt.datetime.strptime("2023-02-01 00:00:00", '%Y-%m-%d %H:%M:%S') # CHANGE THIS
end_plot = dt.datetime.strptime("2023-03-30 23:00:00", '%Y-%m-%d %H:%M:%S')   # CHANGE THIS

# Make timeplot
dataset_incomplete.make_plot(
                  stationnames=[stationname],
                  obstype='temp', # CHANGE THIS
                  colorby='label',
                  starttime=begin_plot,
                  endtime=end_plot,
                  legend=False,
                  show_outliers=False
                  )

<font color='IndianRed'>You can also get detailed information about the gap-filling of the gaps.</font>

In [None]:
# Get information for every gap
print(dataset_incomplete.get_gaps_info())

<font color='IndianRed'>As the last step, save the gap-filled dataset.</font>

In [None]:
# Specify the path to the folder
save_directory = os.path.join(BASE_DIR, 'Your_folder') # Change this to the correct folder

# Save the gap-filled data set
dataset_incomplete.save_dataset(outputfolder = save_directory,
                     filename='yourfilename_gap_filled_dataset.pkl') # Change this

# Save the gap-filled data set in a csv file.
dataset_incomplete.write_to_csv(filename='C:\\your_path\\yourfilename_gap_filled_dataset.csv')