![lop](../../images/logo_diive1_128px.png)
# **Find gaps in time series**

**Last updated**: 18 Apr 2023  
**Author**: Lukas Hörtnagl (holukas@ethz.ch)

---
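Example for the `GapFinder` class (`diive.pkgs.analyses.gapfinder.GapFinder`) of the time series processing library `diive`. `GapFinder` identifies gaps (runs of consecutive missing values) in a time series and reports the start, end and length of each gap.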

# Imports

```python
from diive.pkgs.analyses.gapfinder import GapFinder
from diive.configs.exampledata import load_exampledata_DIIVE_CSV_30MIN
```



# Load example data

```python
data_df, metadata_df = load_exampledata_DIIVE_CSV_30MIN()
series = data_df['NEE_CUT_REF_orig']
series
```

```
Reading file exampledata_CH-DAV_FP2022.5_2022.07_ID20230206154316_30MIN.diive.csv ...

TIMESTAMP_MIDDLE
2022-07-01 00:15:00         NaN
2022-07-01 00:45:00         NaN
2022-07-01 01:15:00    1.304188
2022-07-01 01:45:00         NaN
2022-07-01 02:15:00         NaN
                         ...   
2022-07-31 21:45:00         NaN
2022-07-31 22:15:00         NaN
2022-07-31 22:45:00         NaN
2022-07-31 23:15:00         NaN
2022-07-31 23:45:00         NaN
Freq: 30T, Name: NEE_CUT_REF_orig, Length: 1488, dtype: float64
```

# Find gaps in time series

```python
# With sort_results=True, the gaps are returned sorted by GAP_LENGTH, longest first
gf = GapFinder(series=series, limit=None, sort_results=True)
gapfinder_df = gf.get_results()
```
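The index name `IS_NUMERIC_CUMSUM` in the results hints at how gaps can be grouped: a running count of numeric (non-missing) values stays constant across each run of missing values, so that count can serve as a gap ID. Below is a minimal pure-pandas sketch of that idea, assuming a regularly spaced series. The function `find_gaps` and the sample values are hypothetical, and this is not necessarily how `GapFinder` is implemented internally.

```python
import numpy as np
import pandas as pd


def find_gaps(series: pd.Series, sort_results: bool = True) -> pd.DataFrame:
    """Hypothetical sketch: list runs of consecutive missing values in a regular series."""
    is_numeric = series.notna()
    # The running count of numeric values only increases at non-missing records,
    # so all missing records within one gap share the same count (used as gap ID)
    gap_id = is_numeric.astype(int).cumsum()[~is_numeric]
    # Timestamps of the missing records, grouped by their gap ID
    timestamps = series.index.to_series()[~is_numeric]
    gaps = timestamps.groupby(gap_id).agg(["min", "max", "count"])
    gaps.columns = ["GAP_START", "GAP_END", "GAP_LENGTH"]
    gaps.index.name = "IS_NUMERIC_CUMSUM"
    if sort_results:
        # Longest gaps first
        gaps = gaps.sort_values("GAP_LENGTH", ascending=False)
    return gaps


# Hypothetical 30-minute series with two gaps (values are made up for illustration)
index = pd.date_range("2022-07-01 00:15", periods=8, freq="30min")
series = pd.Series([np.nan, np.nan, 1.3, 2.1, np.nan, np.nan, np.nan, 0.8], index=index)

gaps = find_gaps(series)
print(gaps)
```

For this made-up series the sketch reports a three-record gap with ID 2 (02:15 to 03:15) ahead of a two-record gap with ID 0 (00:15 to 00:45). A gap at the very start of the series gets ID 0 because no numeric value has been counted yet.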

# Results

```python
gapfinder_df
```

| IS_NUMERIC_CUMSUM | GAP_START | GAP_END | GAP_LENGTH |
|---|---|---|---|
| 698 | 2022-07-30 17:45:00 | 2022-07-31 08:15:00 | 30 |
| 556 | 2022-07-23 18:45:00 | 2022-07-24 08:45:00 | 29 |
| 684 | 2022-07-29 19:15:00 | 2022-07-30 08:15:00 | 27 |
| 421 | 2022-07-17 19:45:00 | 2022-07-18 07:45:00 | 25 |
| 327 | 2022-07-13 20:15:00 | 2022-07-14 07:45:00 | 24 |
| ... | ... | ... | ... |
| 516 | 2022-07-22 10:15:00 | 2022-07-22 10:15:00 | 1 |
| 534 | 2022-07-22 19:45:00 | 2022-07-22 19:45:00 | 1 |
| 85 | 2022-07-04 16:45:00 | 2022-07-04 16:45:00 | 1 |
| 548 | 2022-07-23 14:15:00 | 2022-07-23 14:15:00 | 1 |
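`GAP_LENGTH` counts missing records, and both `GAP_START` and `GAP_END` are themselves missing records (single-record gaps have identical start and end). For this 30-minute series, the length can therefore be checked from the timestamps as (end - start) / 30 min + 1. The short calculation below uses the first row of the table; the 30-minute step is taken from the series frequency (`Freq: 30T`).

```python
import pandas as pd

# Values from the first row of the results table
gap_start = pd.Timestamp("2022-07-30 17:45:00")
gap_end = pd.Timestamp("2022-07-31 08:15:00")
time_step = pd.Timedelta(minutes=30)  # the series has a 30-minute frequency

# 14.5 hours between start and end are 29 time steps; +1 because both endpoints are missing
gap_length = int((gap_end - gap_start) / time_step) + 1
print(gap_length)  # 30
```

The same formula gives 1 for the single-record gaps at the bottom of the table, where start and end coincide.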


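```python
# The results are sorted by GAP_LENGTH (sort_results=True), so the first row is the longest gap
longestgap = gapfinder_df.iloc[0]
print(f"The longest gap had a length of {longestgap['GAP_LENGTH']} missing records "
      f"and was found between {longestgap['GAP_START']} and {longestgap['GAP_END']}.")
```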

```
The longest gap had a length of 30 missing records and was found between 2022-07-30 17:45:00 and 2022-07-31 08:15:00.
```


```python
print(f"Here are the three longest gaps:\n{gapfinder_df.head(3)}")
```

```
Here are the three longest gaps:
                            GAP_START             GAP_END  GAP_LENGTH
IS_NUMERIC_CUMSUM                                                    
698               2022-07-30 17:45:00 2022-07-31 08:15:00          30
556               2022-07-23 18:45:00 2022-07-24 08:45:00          29
684               2022-07-29 19:15:00 2022-07-30 08:15:00          27
```
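`head(3)` returns the three longest gaps here because `sort_results=True` sorted the table by `GAP_LENGTH`. If the results were left unsorted, pandas' `nlargest` would select the longest gaps regardless of row order, and a boolean mask would filter gaps above a chosen length. The sketch below uses a small, hand-built and deliberately unsorted copy of a few rows from the results table; the name `results` and the 24-record (12-hour) threshold are illustrative assumptions.

```python
import pandas as pd

# A few rows copied from the results table, deliberately left unsorted
results = pd.DataFrame(
    {
        "GAP_START": pd.to_datetime(
            ["2022-07-22 10:15", "2022-07-30 17:45", "2022-07-17 19:45", "2022-07-23 18:45"]
        ),
        "GAP_END": pd.to_datetime(
            ["2022-07-22 10:15", "2022-07-31 08:15", "2022-07-18 07:45", "2022-07-24 08:45"]
        ),
        "GAP_LENGTH": [1, 30, 25, 29],
    },
    index=pd.Index([516, 698, 421, 556], name="IS_NUMERIC_CUMSUM"),
)

# nlargest does not rely on the rows already being sorted
top3 = results.nlargest(3, "GAP_LENGTH")
print(top3)

# Gaps longer than 24 records, i.e. longer than 12 hours of 30-minute data
long_gaps = results[results["GAP_LENGTH"] > 24]
print(long_gaps)
```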
