![logo](../../images/logo_diive1_128px.png)

<span style='font-size:40px; display:block;'>
<b>
    Find gaps in time series
</b>
</span>

---
**Notebook version**: `2` (24 Oct 2023)  
**Author**: Lukas Hörtnagl (holukas@ethz.ch)  

</br>

# **Description**

- Get an overview of existing data gaps in a time series: the start, the end and the length (number of missing records) of each gap

</br>

# **Imports**

In [1]:
import importlib.metadata
import warnings
from datetime import datetime

from diive.configs.exampledata import load_exampledata_parquet
from diive.pkgs.analyses.gapfinder import GapFinder

warnings.filterwarnings("ignore")
version_diive = importlib.metadata.version("diive")
print(f"diive version: v{version_diive}")

diive version: v0.85.0


</br>

# **Docstring**

In [2]:
# help(GapFinder)

</br>

# **Load example data**

In [3]:
data_df = load_exampledata_parquet()
series = data_df['NEE_CUT_REF_orig']
series

Loaded .parquet file L:\Sync\luhk_work\20 - CODING\21 - DIIVE\diive\diive\configs\exampledata\exampledata_PARQUET_CH-DAV_FP2022.5_2013-2022_ID20230206154316_30MIN.parquet (0.042 seconds).
    --> Detected time resolution of <30 * Minutes> / 30min 


TIMESTAMP_MIDDLE
2013-01-01 00:15:00      NaN
2013-01-01 00:45:00      NaN
2013-01-01 01:15:00      NaN
2013-01-01 01:45:00    0.538
2013-01-01 02:15:00      NaN
                       ...  
2022-12-31 21:45:00      NaN
2022-12-31 22:15:00    3.518
2022-12-31 22:45:00      NaN
2022-12-31 23:15:00      NaN
2022-12-31 23:45:00      NaN
Freq: 30min, Name: NEE_CUT_REF_orig, Length: 175296, dtype: float64

</br>

# **Find gaps in time series**

In [4]:
gf = GapFinder(series=series, limit=None, sort_results=True)
gapfinder_df = gf.get_results()
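
To illustrate what a gap search involves, here is a minimal pandas-only sketch. It is not taken from the `diive` source: the function `find_gaps` and the toy series are hypothetical, and the cumulative-sum grouping is an assumption inspired by the index name `IS_NUMERIC_CUMSUM` that appears in the results.

```python
import numpy as np
import pandas as pd


def find_gaps(series: pd.Series) -> pd.DataFrame:
    """Return start, end and length of each run of consecutive NaNs.

    Hypothetical helper for illustration, not the GapFinder implementation.
    """
    is_gap = series.isna()
    # The running count of valid values stays constant within a run of NaNs,
    # so it can serve as a group label for each gap.
    group_id = series.notna().cumsum()
    gap_rows = pd.DataFrame({
        "TIMESTAMP": series.index[is_gap],
        "GROUP": group_id[is_gap].to_numpy(),
    })
    gaps = gap_rows.groupby("GROUP")["TIMESTAMP"].agg(
        GAP_START="min", GAP_END="max", GAP_LENGTH="count"
    )
    # Longest gaps first
    return gaps.sort_values("GAP_LENGTH", ascending=False)


# Toy half-hourly series with three gaps of length 1, 3 and 1
index = pd.date_range("2013-01-01 00:15", periods=8, freq="30min")
toy = pd.Series([np.nan, 0.5, np.nan, np.nan, np.nan, 1.2, np.nan, 0.7], index=index)
toy_gaps = find_gaps(toy)
print(toy_gaps)
```

The sketch assumes a regular timestamp index in which missing records are already present as `NaN`, as is the case for the example data loaded above.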

</br>

# **Results**

In [5]:
gapfinder_df

| IS_NUMERIC_CUMSUM | GAP_START | GAP_END | GAP_LENGTH |
|---:|---|---|---:|
| 30027 | 2018-03-06 19:15:00 | 2018-04-30 15:15:00 | 2633 |
| 21932 | 2016-09-12 16:15:00 | 2016-09-22 09:45:00 | 468 |
| 48475 | 2021-05-18 16:15:00 | 2021-05-25 13:45:00 | 332 |
| 31041 | 2018-06-19 22:45:00 | 2018-06-26 10:45:00 | 313 |
| 5204 | 2013-11-22 13:15:00 | 2013-11-28 09:45:00 | 282 |
| ... | ... | ... | ... |
| 14639 | 2015-07-23 21:45:00 | 2015-07-23 21:45:00 | 1 |
| 14613 | 2015-07-22 17:45:00 | 2015-07-22 17:45:00 | 1 |
| 58130 | 2022-12-25 12:45:00 | 2022-12-25 12:45:00 | 1 |
| 100 | 2013-01-14 10:15:00 | 2013-01-14 10:15:00 | 1 |


In [6]:
# Results are sorted by gap length (sort_results=True), so the first row is the longest gap
longestgap = gapfinder_df.iloc[0]
print(
    f"The longest gap had a length of {longestgap['GAP_LENGTH']} missing records "
    f"and was found between {longestgap['GAP_START']} and {longestgap['GAP_END']}."
)

The longest gap had a length of 2633 missing records and was found between 2018-03-06 19:15:00 and 2018-04-30 15:15:00.
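
The reported length can be cross-checked from the two timestamps alone. The short sketch below assumes that `GAP_START` and `GAP_END` are the first and last missing timestamps of the gap (both inclusive) and that records are spaced 30 minutes apart, as detected when loading the data. The variable names are hypothetical.

```python
import pandas as pd

# Values copied from the output above
gap_start = pd.Timestamp("2018-03-06 19:15:00")
gap_end = pd.Timestamp("2018-04-30 15:15:00")
freq = pd.Timedelta(minutes=30)

# Both endpoints are assumed to be missing records, hence the + 1
n_records = (gap_end - gap_start) // freq + 1
print(n_records)  # 2633
```

The result matches the `GAP_LENGTH` of 2633 shown above, which is consistent with the inclusive-endpoint assumption.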


In [7]:
print(f"Here are the three longest gaps:\n{gapfinder_df.head(3)}")

Here are the three longest gaps:
                            GAP_START             GAP_END  GAP_LENGTH
IS_NUMERIC_CUMSUM                                                    
30027             2018-03-06 19:15:00 2018-04-30 15:15:00        2633
21932             2016-09-12 16:15:00 2016-09-22 09:45:00         468
48475             2021-05-18 16:15:00 2021-05-25 13:45:00         332
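
Because the results are an ordinary pandas DataFrame, they can be filtered and summarized with standard pandas operations. The sketch below rebuilds the three rows printed above so that it runs on its own. The threshold `min_records` is a hypothetical choice for illustration, not a `diive` setting.

```python
import pandas as pd

# The three longest gaps, copied from the output above
gaps = pd.DataFrame(
    {
        "GAP_START": pd.to_datetime(
            ["2018-03-06 19:15:00", "2016-09-12 16:15:00", "2021-05-18 16:15:00"]
        ),
        "GAP_END": pd.to_datetime(
            ["2018-04-30 15:15:00", "2016-09-22 09:45:00", "2021-05-25 13:45:00"]
        ),
        "GAP_LENGTH": [2633, 468, 332],
    },
    index=pd.Index([30027, 21932, 48475], name="IS_NUMERIC_CUMSUM"),
)

# Hypothetical threshold: one week of 30-minute records (48 per day)
min_records = 48 * 7
long_gaps = gaps.loc[gaps["GAP_LENGTH"] >= min_records]

# Number of missing records covered by the three gaps shown
missing_in_shown_gaps = int(gaps["GAP_LENGTH"].sum())

print(long_gaps)
print(f"Missing records in the three longest gaps: {missing_in_shown_gaps}")
```

With the full results, the same filter could be applied directly to `gapfinder_df`.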


</br>

# **End of notebook**

In [8]:
dt_string = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(f"Finished {dt_string}")

Finished 2025-01-23 12:40:55
