# Timeseries Feature Engineering Notebook

This notebook derives additional features for the train/test time series datasets, with the goal of improving the model's performance.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm

In [2]:
# let's start by separating the location from the timestamp
# each location is geographically different, so flood conditions may vary across locations
# so the location will essentially serve as a categorical feature
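
A minimal sketch of this split, assuming the raw data has a combined identifier column (hypothetically named `event_id`) of the form `<location>_<date>`. The column name, the separator, and the demo values are assumptions for illustration, not the notebook's actual schema.

```python
import pandas as pd

def split_location(df, id_col="event_id"):
    """Split a combined '<location>_<date>' id into two columns.

    Assumes the LAST underscore separates the location from the date.
    """
    parts = df[id_col].str.rsplit("_", n=1, expand=True)
    out = df.copy()
    # the location is treated as a categorical feature
    out["location"] = parts[0].astype("category")
    out["timestamp"] = pd.to_datetime(parts[1])
    return out

# hypothetical demo data
demo = pd.DataFrame({
    "event_id": ["loc_a_2021-01-01", "loc_a_2021-01-02", "loc_b_2021-01-01"],
    "precipitation": [0.0, 12.5, 3.1],
})
demo = split_location(demo)
```

Splitting on the last underscore only keeps location names that themselves contain underscores intact.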

In [3]:
# let's derive the month from the timestamp
# certain months may correspond to a wet/dry season for a particular location,
# which can help explain increased/decreased rainfall that could relate to flood events
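
One way to derive the month, assuming a datetime column named `timestamp` (a hypothetical name) already exists:

```python
import pandas as pd

# hypothetical demo data
demo = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-15", "2021-06-30", "2021-12-01"]),
})
# .dt.month returns the calendar month as an integer from 1 to 12
demo["month"] = demo["timestamp"].dt.month
```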

In [4]:
# let's identify the season from the month, according to https://southafrica-info.com/land/south-africa-weather-climate/
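
A sketch of the month-to-season mapping. The boundaries below are the conventional Southern Hemisphere meteorological seasons and are an assumption; the linked page may draw the season boundaries differently, in which case the dictionary should be adjusted.

```python
import pandas as pd

# assumed Southern Hemisphere meteorological seasons (adjust to match the source)
SEASON_BY_MONTH = {
    12: "summer", 1: "summer", 2: "summer",
    3: "autumn", 4: "autumn", 5: "autumn",
    6: "winter", 7: "winter", 8: "winter",
    9: "spring", 10: "spring", 11: "spring",
}

# hypothetical demo data
demo = pd.DataFrame({"month": [1, 4, 7, 10, 12]})
demo["season"] = demo["month"].map(SEASON_BY_MONTH)
```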

In [5]:
# the American Meteorological Society (AMS) defines intensities for rainfall periods (https://glossary.ametsoc.org/wiki/Rain)
# let's track the rainfall intensity for each day using the daily precipitation (24 hours)
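
A hedged sketch of the intensity labelling. The AMS definitions are hourly rates (as understood here: light up to 2.5 mm/h, moderate up to 7.6 mm/h, heavy above that), so this example converts the 24-hour total to a mean hourly rate before binning. That conversion, the millimetre unit, and the column name `precipitation` are all assumptions; check the thresholds against the glossary entry before relying on them.

```python
import numpy as np
import pandas as pd

# assumed AMS hourly-rate thresholds in mm/h (verify against the glossary)
LIGHT_MAX = 2.5
MODERATE_MAX = 7.6

# hypothetical demo data: daily precipitation totals in mm
demo = pd.DataFrame({"precipitation": [0.0, 24.0, 120.0, 240.0]})

# assumption: approximate the hourly rate as the daily total spread over 24 hours
hourly_rate = demo["precipitation"] / 24.0

# right-closed bins: (-inf, 0], (0, 2.5], (2.5, 7.6], (7.6, inf)
demo["rain_intensity"] = pd.cut(
    hourly_rate,
    bins=[-np.inf, 0.0, LIGHT_MAX, MODERATE_MAX, np.inf],
    labels=["none", "light", "moderate", "heavy"],
)
```

Averaging over 24 hours understates short, intense bursts, so most days will land in the lower categories under this assumption.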

In [6]:
# let's keep a running total of the rainfall for each month
# intuitively, the more water there is (e.g. from rainfall), the more likely a flood is to occur
# so the rainier a month is, the more likely it is to have a flood event
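
A sketch of the running monthly total, assuming columns named `location`, `timestamp`, and `precipitation` (hypothetical names). The total is computed per location and restarts at the start of each calendar month.

```python
import pandas as pd

# hypothetical demo data spanning a month boundary
demo = pd.DataFrame({
    "location": ["loc_a"] * 4,
    "timestamp": pd.to_datetime(
        ["2021-01-30", "2021-01-31", "2021-02-01", "2021-02-02"]
    ),
    "precipitation": [5.0, 10.0, 1.0, 2.0],
})

# the cumulative sum only makes sense in chronological order
demo = demo.sort_values(["location", "timestamp"]).reset_index(drop=True)

# group by location, year, and month so each month restarts at zero
keys = [
    demo["location"],
    demo["timestamp"].dt.year.rename("year"),
    demo["timestamp"].dt.month.rename("month"),
]
demo["monthly_rain_total"] = demo.groupby(keys)["precipitation"].cumsum()
```

Because the sum covers only the current and earlier days, the feature does not use information from later in the month.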

In [7]:
# let's keep a running average of the rainfall for each month as well
# intuitively, months with a higher average rainfall would be more likely to have a flood event
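
A sketch of the running monthly average under the same assumed column names (`location`, `timestamp`, `precipitation`). It divides the within-month cumulative sum by the number of days seen so far in that month.

```python
import pandas as pd

# hypothetical demo data spanning a month boundary
demo = pd.DataFrame({
    "location": ["loc_a"] * 5,
    "timestamp": pd.to_datetime(
        ["2021-01-29", "2021-01-30", "2021-01-31", "2021-02-01", "2021-02-02"]
    ),
    "precipitation": [2.0, 4.0, 6.0, 10.0, 0.0],
})
demo = demo.sort_values(["location", "timestamp"]).reset_index(drop=True)

keys = [
    demo["location"],
    demo["timestamp"].dt.year.rename("year"),
    demo["timestamp"].dt.month.rename("month"),
]
grouped = demo.groupby(keys)["precipitation"]

# cumcount() is zero-based, so add 1 to get the number of rows seen so far
demo["monthly_rain_avg"] = grouped.cumsum() / (grouped.cumcount() + 1)
```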