#### Exercise 05: Timeseries tutorial / APL

The Automated Predictive Library (APL) originates from SAP's acquisition of KXEN and its automated machine learning technology. In many ways, the two native machine learning libraries in HANA, PAL and APL, are comparable in terms of the use cases they cover. The main difference is that using PAL components is a manual process that requires what we normally call data science skills, but in return gives you the freedom to select, adjust, and fine-tune each individual algorithm. APL, on the other hand, is built on automation and best-practice processes, which limits the freedom to adjust: that is taken care of by the APL engine. This also means that machine learning based on APL is not only for data scientists, but also for business analysts.

The figure below illustrates the generic machine learning process and shows how APL automates many of its steps, especially the more complex ones.

<table>
    <img src="./images/ML_Process.png" width="60" height="30">
</table>

Let's quickly go through the process, explaining each step and how APL supports it:
- Data Connection: We need access to the relevant data. Besides tables and views, APL can also use HANA DataFrames.
- Feature Engineering: Feature engineering is about preparing the data: deriving new features, handling missing values, reducing the influence of outliers, and grouping “similar” values (binning). APL automatically handles missing values, outliers, and binning.
- Variable Reduction & Sampling: Not all variables influence the target, so we sometimes eliminate some of them before training to speed it up. As part of the training process, APL automatically eliminates variables that have no influence or whose information is already covered by other variables (correlated variables). Sampling is a very important part of the machine learning process: it splits the data into a training part and a test part, and sometimes also a validation part. The test and validation datasets are used to measure the accuracy of the model and how well it generalizes.
- Machine Learning Training: This is the core part, where we train a selected algorithm on the training data to see whether we can identify a reusable pattern. Normally you have multiple algorithms to choose from, and once one is chosen, its parameters have to be adjusted. Depending on the scenario, APL uses a predefined algorithm with preset parameters.
- Model Validation: Once the model is trained, the test and validation datasets are used to measure its accuracy and how well it generalizes. APL does this automatically.
- Model Interpretation: Sometimes we would like to understand the pattern the trained model has learned. For some algorithms this is possible and for others it is not, so keep this in mind when selecting an algorithm. APL uses an algorithm that explains the pattern in an easily understandable way.
- Apply Model: This last part is where all the prior work materializes into value. Applying the model normally means embedding a trained model to enhance an existing business process. For most scenarios, APL can export the trained model (the pattern) as SQL code, which is easy to consume and embed into a business process.
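
To make the sampling and validation steps above more concrete, here is a minimal sketch in plain Python. It is not how APL performs these steps internally; it only illustrates the idea of holding back the most recent part of a time-ordered series and scoring a forecast on it. The function names `chronological_split` and `mape`, the naive baseline, and the sample values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    values: Sequence[float], test_fraction: float = 0.25
) -> Tuple[List[float], List[float]]:
    """Split a time-ordered series: oldest rows for training, newest for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    cut = int(len(values) * (1.0 - test_fraction))
    return list(values[:cut]), list(values[cut:])


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error; rows with actual == 0 are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Hypothetical daily cash values, oldest first (illustrative numbers only).
cash = [100.0, 110.0, 105.0, 120.0, 125.0, 130.0, 128.0, 140.0]
train, test = chronological_split(cash, test_fraction=0.25)

# Naive baseline: repeat the last training value for every test point.
naive_forecast = [train[-1]] * len(test)
error = mape(test, naive_forecast)
print(f"train={len(train)} rows, test={len(test)} rows, MAPE={error:.3f}")
```

For time series the split is chronological rather than random, so the model is always evaluated on data that lies after everything it was trained on.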


Let's start by creating the connection to SAP HANA Cloud.


In [None]:
%run ./02-setup.ipynb

### Create a HANA DataFrame for the actual series

In [None]:
sql_cmd = 'SELECT "Date", "Cash", "MondayMonthInd", "FridayMonthInd" FROM "CASHFLOW"'
series_in = hdf.DataFrame(conn, sql_cmd)

series_in.head(5).collect()

### Fit & Predict with APL

In [None]:
from hana_ml.algorithms.apl.time_series import AutoTimeSeries
# horizon=21 asks for a forecast of the next 21 time steps
apl_model = AutoTimeSeries(time_column_name='Date', target='Cash', horizon=21)
series_out = apl_model.fit_predict(data=series_in, build_report=True)

In [None]:
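# Keep the first five output columns and give them readable names
df = series_out.select(series_out.columns[0:5]).collect()
# Named column_labels to avoid shadowing the built-in dict type
column_labels = {'ACTUAL': 'Actual',
                 'PREDICTED_1': 'Forecast',
                 'LOWER_INT_95PCT': 'Lower Limit',
                 'UPPER_INT_95PCT': 'Upper Limit'}
df.rename(columns=column_labels, inplace=True)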
df
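
The renamed output mixes historical rows and forecast rows in one table. The sketch below shows one way to separate them, using plain Python dictionaries in place of the collected DataFrame. It assumes that forecast-horizon rows carry no actual value and that history rows carry no interval limits; check this against your own output. The helper names and the sample rows are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def split_history_and_forecast(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Rows with an actual value are history; rows without one are the horizon."""
    history = [r for r in rows if r["Actual"] is not None]
    forecast = [r for r in rows if r["Actual"] is None]
    return history, forecast


def interval_width(row: Row) -> Optional[float]:
    """Width of the prediction interval, or None when no limits are present."""
    lower, upper = row["Lower Limit"], row["Upper Limit"]
    if lower is None or upper is None:
        return None
    return upper - lower


# Hypothetical rows using the renamed columns (illustrative numbers only).
rows: List[Row] = [
    {"Actual": 5200.0, "Forecast": 5150.0, "Lower Limit": None, "Upper Limit": None},
    {"Actual": 4800.0, "Forecast": 4900.0, "Lower Limit": None, "Upper Limit": None},
    {"Actual": None, "Forecast": 5000.0, "Lower Limit": 4500.0, "Upper Limit": 5500.0},
    {"Actual": None, "Forecast": 5100.0, "Lower Limit": 4400.0, "Upper Limit": 5800.0},
]

history, forecast = split_history_and_forecast(rows)
widths = [interval_width(r) for r in forecast]
print(f"{len(history)} history rows, {len(forecast)} forecast rows, widths={widths}")
```

Under the same assumption, the equivalent selection on the pandas frame `df` would be `df[df['Actual'].isna()]` for the forecast rows and `df[df['Actual'].notna()]` for the history rows.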

##### Reports

In [None]:
apl_model.generate_html_report('Cash_Report')
apl_model.generate_notebook_iframe_report()

##### Target Statistics

In [None]:
df = apl_model.get_debrief_report('ContinuousTarget_Statistics').collect()
df.drop('Oid', axis=1, inplace=True)
df.style.hide(axis='index')