# Time series in Pastas

Time series are at the heart of Pastas and of modeling hydraulic head fluctuations. This section provides background information on important characteristics of time series and how these may influence your modeling results.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pastas as ps

## Different types of time series

### Regular and irregular time series

Time series data may be defined as a set of data values measured at certain times, ordered such that the time indices are increasing. Many time series analysis and modeling methods require that the time step between the measurements is always the same, or in other words, equidistant. Such regular time series may have missing data, but it is still possible to lay the values on a time grid with constant time steps. Hydraulic heads are often measured at irregular time intervals, in which case the time series is irregular. This is especially true for historic time series that were measured by hand. As a result, the measurements cannot be laid on a regular time grid. The figure below shows the difference between the three types of time series: regular, regular with missing data, and irregular.

In [None]:
regular = pd.Series(
    index=pd.date_range("2000-01-01", "2000-01-10", freq="D"), data=np.ones(10)
)
missing_data = regular.copy()
missing_data.loc[["2000-01-03", "2000-01-08"]] = np.nan

# Shift each timestamp by a random number of hours (lowercase "h"; "H" is deprecated)
index = [t + pd.Timedelta(np.random.rand() * 24, unit="h") for t in missing_data.index]
irregular = missing_data.copy()
irregular.index = index

fig, axes = plt.subplots(3, 1, figsize=(6, 3), sharex=True, sharey=True)

regular.plot(ax=axes[0], linestyle=" ", marker="o", x_compat=True)
missing_data.plot(ax=axes[1], linestyle=" ", marker="o", x_compat=True)
irregular.plot(ax=axes[2], linestyle=" ", marker="o", x_compat=True)

for i, name in enumerate(
    ["(a) Regular time steps", "(b) Missing Data", "(c) Irregular time steps"]
):
    axes[i].grid()
    axes[i].set_title(name)
plt.tight_layout()

### Independent and dependent time series 

We can differentiate between two types of input time series for Pastas models: the dependent and independent time series. The dependent time series are those that we want to explain (e.g., the groundwater levels), and are referred to as the `oseries` in Pastas. The independent time series are those that we use to model the dependent time series (e.g., precipitation or evaporation), and are referred to as `stresses` in Pastas. The requirements for these time series are different:

- The dependent time series may be of any kind: regular, missing data or irregular.
- The stress time series needs to have regular time steps.
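
One quick way to see which category a series falls into is to look at the differences between consecutive timestamps. The sketch below uses only pandas; the helper name `has_regular_time_steps` is hypothetical and not part of Pastas.

```python
import numpy as np
import pandas as pd


def has_regular_time_steps(series: pd.Series) -> bool:
    """Return True if all consecutive timestamps are the same distance apart.

    Hypothetical helper for illustration, not a Pastas function.
    """
    steps = series.index.to_series().diff().dropna()
    return steps.nunique() == 1


# Ten daily values: regular time steps.
regular = pd.Series(
    data=np.ones(10), index=pd.date_range("2000-01-01", periods=10, freq="D")
)

# Removing two rows leaves gaps in the index, so the remaining steps differ.
with_gaps = regular.drop(regular.index[[2, 7]])

print(has_regular_time_steps(regular))
print(has_regular_time_steps(with_gaps))
```

Note that a series with `NaN` values on an otherwise complete daily index still passes this check, because only the timestamps are inspected.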

### A word on timestamps and measurements

Often the measurements represent a flux measured over a certain time period. For example, precipitation is often provided in mm/day, and the value represents the cumulative precipitation amount for a day. This is recorded at the end of the day, e.g., 2000-01-01 24:00:00 (equivalent to 2000-01-02 00:00:00) represents the total precipitation that fell on the first of January.
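
As a minimal sketch of this convention, assume a data source that labels each daily total with the start of the day. Shifting the index by one day then labels each value with the end of the period it describes. The values below are made up for illustration.

```python
import pandas as pd

# Daily precipitation totals (mm/day), labeled with the START of each day.
# The values are made up for illustration.
start_labeled = pd.Series(
    data=[1.2, 0.0, 3.4],
    index=pd.date_range("2000-01-01", periods=3, freq="D"),
)

# Shift every timestamp by one day so each value is labeled with the END of
# the period it describes: the total for 1 January moves to 2000-01-02 00:00.
end_labeled = start_labeled.copy()
end_labeled.index = start_labeled.index + pd.Timedelta(1, unit="D")

print(end_labeled)
```

Whether such a shift is needed depends on how the data provider defines its timestamps, so it is worth checking the metadata before applying it.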

## The Python package for time series data: Pandas

Pandas provides many methods to deal with time series data, for example resampling, gap-filling, and computing descriptive statistics. Another important piece of Pandas functionality is `pandas.read_csv` and its related functions, which allow you to easily load data from csv-files and other popular data storage formats. For more information and user guidance on Pandas, please see the documentation website (https://pandas.pydata.org).

All time series should be provided to Pastas as `pandas.Series` with a `pandas.DatetimeIndex`. Internally, these time series are stored in a `pastas.TimeSeries` object, which adds functionality for resampling and extending the time series in time. More on that later.

<div class="alert alert-info">

<b>Note</b>
    
* The dtype for a date should be the `pandas.Timestamp`.
* The dtype for a sequence of dates should be the `pandas.DatetimeIndex` with `pandas.Timestamp`s.
* The dtype for a time series should be a `pandas.Series` with a `pandas.DatetimeIndex`.
* The dtype for the values of a `pandas.Series` should be `float`.    
    
</div>
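
As a minimal sketch of how these requirements can be met, the example below reads a small csv-formatted text into a `pandas.Series` with a `pandas.DatetimeIndex` and float values. The column names and numbers are made up, and an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Stand-in for a csv-file on disk; the column names and values are made up.
csv_text = """date,head
2000-01-01,1
2000-01-02,2
2000-01-03,3
"""

# Parse the first column as dates and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Select the single column as a Series and make sure the values are floats.
head = df["head"].astype(float)

print(type(head.index).__name__)
print(head.dtype)
```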

## Validating user-provided time series

As is clear from the descriptions above, the user is required to provide time series in a certain format and with certain characteristics, depending on the type of time series. To prevent issues later in the modeling chain, all user-provided time series are checked internally. This is done using the `pastas.validate_stress` and `pastas.validate_oseries` methods, which can also be called directly by the user. Let's look at the docstring of these methods to see what is checked:

In [None]:
?ps.validate_stress

The last check (equidistant time steps) is not required for `oseries`. If any of these checks fail, the `pastas.validate_stress` and `pastas.validate_oseries` methods will raise an error with pointers on how to solve the problem and fix the time series. We refer to the Examples section for more examples of how to pre-process the user-provided time series.
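
Some typical pre-processing steps can be sketched with pandas alone. This is an illustration of common fixes under assumed problems (unsorted timestamps, a duplicate timestamp, a missing value), not a reproduction of the checks Pastas performs; the docstrings remain the authoritative list. The example data are made up.

```python
import numpy as np
import pandas as pd

# A messy series: unsorted, one duplicated timestamp, and one NaN.
raw = pd.Series(
    data=[3, 1, 1, np.nan, 2],
    index=pd.to_datetime(
        ["2000-01-03", "2000-01-01", "2000-01-01", "2000-01-04", "2000-01-02"]
    ),
)

clean = (
    raw.astype(float)  # make sure the values are floats
    .sort_index()  # put the timestamps in increasing order
    .dropna()  # drop missing values
)
# Keep only the first value for each duplicated timestamp.
clean = clean[~clean.index.duplicated(keep="first")]

print(clean)
```

Dropping missing values is usually acceptable for a dependent series; for a stress it would break the regular time step, so gaps there are better filled than removed.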

## Settings for user-provided time series

As mentioned before the user-provided time series are internally stored as `pastas.TimeSeries` objects. The TimeSeries class is used internally to:

1. extend the time series forward and backward in time,
2. resample the time series to a different frequency.

It should be noted that if these operations are not required (e.g., the time series is long enough and/or has the right frequency), Pastas will use the user-provided time series directly. How these two operations are performed (only if required) depends on the `settings` that are provided. These settings are provided when creating a stress model object. By default, all the stress models have sensible predefined values for this. These predefined options can be accessed through `ps.rcParams["timeseries"]`:

In [None]:
pd.DataFrame.from_dict(ps.rcParams["timeseries"])

Each column name is a valid option for the `settings` argument. For example, the default setting for the precipitation stress provided to the `ps.RechargeModel` object is "prec" (see the docstring of `ps.RechargeModel`). This means that the following settings are used:

In [None]:
ps.rcParams["timeseries"]["prec"]

# sm = ps.StressModel(stress, settings="prec")

Alternatively, one may provide the settings to a stress model object as a dictionary.

In [None]:
settings = {
    "fill_before": 0.0,
    "fill_after": 0.0,
    # Etcetera
}
# sm = ps.StressModel(stress, settings=settings)
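
To make the idea behind options such as `fill_before` and `fill_after` more concrete, the sketch below extends a daily series in time with a constant value using pandas only. It assumes that a numeric setting means "fill with this constant" and is not Pastas' internal implementation; the variable names and values are made up.

```python
import pandas as pd

# A short daily stress series (values made up for illustration).
stress = pd.Series(
    data=[1.0, 2.0, 3.0],
    index=pd.date_range("2000-01-03", periods=3, freq="D"),
)

# Target period that starts two days earlier and ends two days later.
new_index = pd.date_range("2000-01-01", "2000-01-07", freq="D")
extended = stress.reindex(new_index)

# Fill the part before the first and after the last measurement with 0.0,
# mirroring the idea of "fill_before": 0.0 and "fill_after": 0.0.
fill_before, fill_after = 0.0, 0.0
extended[extended.index < stress.index[0]] = fill_before
extended[extended.index > stress.index[-1]] = fill_after

print(extended)
```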

If Pastas performs an operation on the original time series, it will **always** output an INFO message describing what is done. To see these messages, make sure the log level of Pastas is set to "INFO" by running `ps.set_log_level("INFO")` beforehand.