# A Basic Model
This example shows how a simple time series model can be developed to simulate groundwater levels. The recharge (calculated as precipitation minus evaporation) is used as the explanatory time series.

In [1]:
# First perform the necessary imports
import pandas as pd
import matplotlib.dates as md
import matplotlib.pyplot as plt
from pasta import *
%matplotlib notebook

### 1. Importing the dependent time series data
In this code block the groundwater time series is imported using the pandas `read_csv` function. In this example the series contains the groundwater levels observed in a well in The Netherlands (ID B58C0698001). As PASTA expects a pandas Series object, a single column is selected from the pandas DataFrame (`gwdata`). To check that you have the correct data type, use `type(oseries)`, as is done in this example.
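
To show why selecting a single column matters, the sketch below reads a small, hypothetical CSV snippet (made up here, not the actual contents of `B58C0698001_0.csv`) with the same `read_csv` options. Indexing with a single label, `gwdata['h']`, returns a pandas Series, whereas a list of labels, `gwdata[['h']]`, would return a one-column DataFrame.

```python
import io

import pandas as pd

# Hypothetical file content: two header lines to skip, then a date column
# and a level column (made-up values, same layout as the real file)
raw = """header line 1
header line 2
PEIL DATUM TIJD,STAND (MV)
2000-01-14, 250
2000-01-28, 262
2000-02-14, 241
"""

gwdata = pd.read_csv(io.StringIO(raw), skiprows=2,
                     parse_dates=['PEIL DATUM TIJD'],
                     index_col='PEIL DATUM TIJD', skipinitialspace=True)
gwdata = gwdata.rename(columns={'STAND (MV)': 'h'})

# A single label returns a Series; a list of labels would return a DataFrame
oseries = gwdata['h']
print(type(oseries))  # <class 'pandas.core.series.Series'>
```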

The following characteristics are important when importing and preparing the observed time series:
- The observed time series is stored as a pandas Series with a DatetimeIndex.
- Every row must contain a value; NaN values must be dropped.
- The time step can be irregular; this is no problem for PASTA.
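
A minimal sketch of preparing an observed series along these lines, using made-up head values and dates: missing values are removed with `dropna()`, while the irregular spacing between the remaining observations is left as it is.

```python
import numpy as np
import pandas as pd

# Hypothetical, irregularly spaced head observations with one missing value
index = pd.to_datetime(['2000-01-14', '2000-01-28', '2000-02-14',
                        '2000-03-14', '2000-03-28'])
oseries = pd.Series([27.67, 27.55, np.nan, 27.80, 27.71], index=index)

# Drop rows without a value; the irregular time steps are not changed
oseries = oseries.dropna()

# Time steps between consecutive observations, in days
steps = oseries.index.to_series().diff().dropna().dt.days
print(steps.tolist())  # [14, 46, 14]
```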

In [2]:
# Import and check the observed groundwater time series
gwdata = pd.read_csv('../data/B58C0698001_0.csv', skiprows=11, 
                     parse_dates=['PEIL DATUM TIJD'], index_col='PEIL DATUM TIJD', 
                     skipinitialspace=True)
gwdata.rename(columns={'STAND (MV)': 'h'}, inplace=True)
gwdata.index.names = ['date']
gwdata.h *= 0.01  # Convert from cm to m
oseries = 30.17 - gwdata.h  # Change the reference level to NAP
print('The data type of the oseries is: %s' % type(oseries))

# Plot the observed groundwater levels
plt.figure()
oseries.plot()
plt.ylabel('Head [m]');
plt.xlabel('Time [years]');
plt.show()

The data type of the oseries is: <class 'pandas.core.series.Series'>



### 2. Import the independent time series
Two time series that explain the groundwater levels are imported: the precipitation and the potential evaporation. As with the observed groundwater levels, PASTA expects a pandas Series object for each of these time series.

Important characteristics of these time series are:
- The same length unit as the observed time series (meters in this example)
- A value on every time step
- Stored as a pandas Series with a DatetimeIndex
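
When the recharge is computed as `precip - evap`, pandas aligns the two series on their dates. Any date present in only one of the series yields NaN, so a value on every time step is not guaranteed. The sketch below uses hypothetical daily values in meters to show this and to count the missing time steps.

```python
import pandas as pd

# Hypothetical daily series (m/d); the evaporation series lacks the last day
precip = pd.Series([0.002, 0.000, 0.005, 0.001],
                   index=pd.date_range('2000-01-01', periods=4, freq='D'))
evap = pd.Series([0.0005, 0.0006, 0.0004],
                 index=pd.date_range('2000-01-01', periods=3, freq='D'))

# Subtraction aligns on the index; dates missing from either series become NaN
recharge = precip - evap
print(recharge)

# Count daily time steps between start and end that have no value
full_index = pd.date_range(recharge.index[0], recharge.index[-1], freq='D')
missing = recharge.reindex(full_index).isna().sum()
print('missing time steps:', missing)
```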

In [3]:
# Import observed precipitation series
precip = pd.read_csv('../data/Heibloem_rain_data.dat', skiprows=4,
                     sep=r'\s+', parse_dates=['date'],
                     index_col='date')
precip = precip.precip
precip /= 1000.0  # Convert from mm to m
print('The data type of the precip series is: %s' % type(precip))

# Import observed evaporation series
evap = pd.read_csv('../data/Maastricht_E_June2015.csv', skiprows=4, 
                   sep=';', parse_dates=['DATE'], index_col='DATE')
evap.rename(columns={'VALUE (m-ref)': 'evap'}, inplace=True)
evap = evap.evap
print('The data type of the evap series is: %s' % type(evap))

# Calculate the recharge to the groundwater (precipitation minus evaporation)
recharge = precip - evap
print('The data type of the recharge series is: %s' % type(recharge))

# Plot the recharge time series
plt.figure()
recharge.plot(label='Recharge')
plt.legend()
plt.xlabel('Time [years]')
plt.show()

The data type of the precip series is: <class 'pandas.core.series.Series'>
The data type of the evap series is: <class 'pandas.core.series.Series'>
The data type of the recharge series is: <class 'pandas.core.series.Series'>



### 3. Creating the time series model
In this code block the actual time series model is created. First, an instance of the `Model` class is created (named `ml` here). Second, the different components of the time series model are created and added to the model. The imported time series are automatically checked for missing values and other inconsistencies. The keyword argument `fillnan` determines how missing values are handled. Any NaN values that are found are reported by PASTA.
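
How PASTA fills missing values internally is not shown in this notebook. The pandas sketch below only mimics the two options that appear here, on hypothetical data: dropping NaN values (as reported for the oseries) and filling them with the series mean (`fillnan='mean'`). Filling keeps a value on every time step, which the explanatory series require, whereas dropping leaves gaps. It also uses `pd.infer_freq` to infer a frequency such as the daily `freq=D` that PASTA reports; whether PASTA relies on this function is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical daily recharge series (m/d) with one missing value
idx = pd.date_range('2000-01-01', periods=5, freq='D')
recharge = pd.Series([0.001, np.nan, 0.003, -0.001, 0.002], index=idx)

# 'drop'-like handling: remove time steps without a value (leaves a gap)
dropped = recharge.dropna()

# 'mean'-like handling: replace missing values by the mean of the series
filled = recharge.fillna(recharge.mean())

# A regular index has an inferable frequency; the gapped one does not
print(pd.infer_freq(filled.index))   # D
print(pd.infer_freq(dropped.index))  # None
```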

In [8]:
# Initiate the base model
ml = Model(oseries)

# Add the recharge data as explanatory variable
ts1 = Tseries(recharge, Gamma, name='recharge', fillnan='mean')
ml.addtseries(ts1)

# Add a constant
d = Constant(value=oseries.mean())
ml.addtseries(d)

# Add a noisemodel
n = NoiseModel()
ml.addnoisemodel(n)

6 nan-value(s) in the oseries was/were found and handled/filled with: drop
Inferred frequency from time series: freq=D


### 4. Solving and plotting the model
The next and final step is to optimize the model parameters. By default a non-linear least squares method is used for the optimization, provided by the Python package LMFIT (https://github.com/lmfit/lmfit-py). Some standard optimization statistics are reported along with the optimized parameter values.
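
The sketch below illustrates non-linear least squares on a hypothetical exponential-rise model with synthetic data. It uses SciPy's `scipy.optimize.least_squares` as a stand-in for LMFIT, so it is not PASTA's optimization code. It also computes the chi-square (the sum of squared residuals) and the reduced chi-square (the chi-square divided by the number of data points minus the number of variables), the two statistics LMFIT reports.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical synthetic heads: exponential rise towards a new level
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1000.0, 200)
true = np.array([27.5, 0.7, 120.0])  # d, A, a (made-up values)


def model(p, t):
    d, A, a = p
    return d + A * (1.0 - np.exp(-t / a))


h_obs = model(true, t) + rng.normal(0.0, 0.01, t.size)


def residuals(p):
    # least_squares minimises the sum of squares of this vector
    return h_obs - model(p, t)


# Keep the time scale a positive during the search
result = least_squares(residuals, x0=[27.0, 0.5, 100.0],
                       bounds=([-np.inf, -np.inf, 1.0],
                               [np.inf, np.inf, np.inf]))

chi_square = np.sum(result.fun ** 2)  # sum of squared residuals
reduced_chi_square = chi_square / (t.size - len(result.x))
print(result.x, chi_square, reduced_chi_square)
```

Applied to the fit report of this model, the same definition gives 4.238 / (644 - 5) = 4.238 / 639 ≈ 0.0066, which LMFIT rounds to the reported 0.007.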

In [9]:
# Solve the time series model
ml.solve()

[[Fit Statistics]]
    # function evals   = 39
    # data points      = 644
    # variables        = 5
    chi-square         = 4.238
    reduced chi-square = 0.007
[[Variables]]
    recharge_A:    717.715554 +/- 32.23438 (4.49%) (init= 500)
    recharge_n:    1.06560588 +/- 0.016268 (1.53%) (init= 1)
    recharge_a:    126.354801 +/- 7.497620 (5.93%) (init= 100)
    constant_d:    27.5732426 +/- 0.018537 (0.07%) (init= 27.90008)
    noise_alpha:   57.0049674 +/- 7.330319 (12.86%) (init= 14)
[[Correlations]] (unreported correlations are <  0.100)
    C(recharge_A, recharge_a)    =  0.833 
    C(recharge_A, constant_d)    = -0.769 
    C(recharge_n, recharge_a)    = -0.651 
    C(recharge_a, constant_d)    = -0.645 
    C(recharge_A, recharge_n)    = -0.261 
    C(recharge_n, constant_d)    =  0.212 
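
The three `recharge_*` parameters define the gamma response function used for the recharge. The sketch below assumes that the step response is `A * P(n, t/a)`, with `P` the regularized lower incomplete gamma function (`scipy.special.gammainc`); this is the form used by the later Pastas package and may differ in detail from this PASTA version. With recharge in m/d, the step response tends to `A` (in days), so a sustained recharge of 1 mm/d would eventually raise the head by roughly 0.001 * 717.7 ≈ 0.72 m.

```python
import numpy as np
from scipy.special import gammainc

# Fitted values copied from the solve() report above
A, n, a = 717.715554, 1.06560588, 126.354801


def step_response(t, A, n, a):
    # Assumed form: A * P(n, t/a), P = regularised lower incomplete gamma
    return A * gammainc(n, t / a)


t = np.arange(0.0, 3001.0)  # days
step = step_response(t, A, n, a)
block = np.diff(step)  # response to one day of unit recharge

# Head change under a hypothetical constant recharge of 1 mm/d (0.001 m/d)
# for 10 years: convolve the recharge series with the block response
recharge = np.full(3650, 0.001)
head_change = np.convolve(recharge, block)[:recharge.size]
print(step[-1], head_change[-1])
```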


In [10]:
ml.plot()
plt.show()


### 5. Advanced plotting
Often a simple plot of the simulated and observed series does not give enough information. The method `plot_results` produces a plot with more information on the calibrated model.
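
The exact panels produced by `plot_results` are not reproduced here. As a rough illustration of the kind of extra information a diagnostic plot can add, the sketch below uses plain matplotlib and made-up heads to show the residuals (observed minus simulated) in a separate panel below the heads, with the root-mean-square error in the title.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical simulated and observed heads (m), for illustration only
idx = pd.date_range('2000-01-01', periods=365, freq='D')
sim = pd.Series(27.6 + 0.3 * np.sin(2 * np.pi * np.arange(365) / 365),
                index=idx)
obs = sim + np.random.default_rng(1).normal(0.0, 0.05, sim.size)
res = obs - sim

# Two stacked panels sharing the time axis: heads on top, residuals below
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 5))
ax1.plot(obs.index, obs.values, '.', label='observed')
ax1.plot(sim.index, sim.values, '-', label='simulated')
ax1.set_ylabel('Head [m]')
ax1.legend()
ax2.plot(res.index, res.values, '-')
ax2.axhline(0.0, color='k', linewidth=0.5)
ax2.set_ylabel('Residual [m]')
ax2.set_xlabel('Time')
rmse = np.sqrt(np.mean(res ** 2))
ax2.set_title('RMSE = %.3f m' % rmse)
plt.show()
```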

In [8]:
ml.plot_results()
