---
title: "Use-case"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Use-case}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, echo=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  # fig.path = "Readme_files/"
)
library(compboost)
```
## Data: Titanic Passenger Survival Data Set
We use the [titanic dataset](https://www.kaggle.com/c/titanic/data) for binary
classification on `Survived`. First, we store the training data in a data frame
and remove all rows that contain `NA`s:
```{r}
# Store the training data and remove rows with missing values:
df_train = na.omit(titanic::titanic_train)
str(df_train)
```
In the next step we transform the response to a factor with more intuitive levels:
```{r}
df_train$Survived = factor(df_train$Survived, labels = c("no", "yes"))
```
## Initializing Model
Due to the `R6` API, a model is initialized by creating a new `Compboost` object which receives the data and the target as a character string; a loss can also be supplied. Note that the loss must be passed as an initialized loss object:
```{r}
cboost = Compboost$new(data = df_train, target = "Survived", oob_fraction = 0.3)
```
Using an initialized loss object makes it possible, for example, to use a loss initialized with a custom offset.
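As a sketch (not evaluated), this could look as follows; the `loss` constructor argument and the offset argument of `LossBinomial$new()` are assumptions, not taken from the chunk above:

```{r, eval=FALSE}
# Sketch (assumptions: the constructor takes a `loss` argument and the loss
# constructor accepts a custom offset, e.g. LossBinomial$new(0.7)):
cboost_custom = Compboost$new(data = df_train, target = "Survived",
  loss = LossBinomial$new(), oob_fraction = 0.3)
```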
## Adding Base-Learner
New base-learners are added by passing a character string that indicates the feature. The second argument is an identifier for the factory, which is important because we can define multiple base-learners on the same feature.
### Numerical Features
For instance, we can define a spline and a linear base-learner of the same feature:
```{r}
# Spline base-learner of age:
cboost$addBaselearner("Age", "spline", BaselearnerPSpline)
# Linear base-learner of age (degree = 1 with intercept is default):
cboost$addBaselearner("Age", "linear", BaselearnerPolynomial)
```
Additional arguments can be specified after naming the base-learner:
```{r}
# Spline base-learner of fare:
cboost$addBaselearner("Fare", "spline", BaselearnerPSpline, degree = 2,
  n_knots = 14, penalty = 10, differences = 2)
```
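The same works for polynomial base-learners; as a sketch (not evaluated here, to avoid registering another factory), a quadratic effect of age could look like this:

```{r, eval=FALSE}
# Sketch: quadratic base-learner of age using the degree argument:
cboost$addBaselearner("Age", "quadratic", BaselearnerPolynomial, degree = 2)
```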
For a reference of the available base-learners, see the [functionality](https://danielschalk.com/compboost/articles/fct-baselearner.html) article on the project page.
### Categorical Features
When adding categorical features, we use a dummy-coded representation with a ridge penalty:
```{r}
cboost$addBaselearner("Sex", "categorical", BaselearnerCategoricalRidge)
```
Finally, we can check what factories are registered:
```{r}
cboost$getBaselearnerNames()
```
## Define Logger
### Time logger
This logger tracks the elapsed time. The time unit can be one of `microseconds`, `seconds`, or `minutes`. When used as a stopper, the logger stops the training as soon as `max_time` is reached. Here, we do not use it as a stopper:
```{r}
cboost$addLogger(logger = LoggerTime, use_as_stopper = FALSE, logger_id = "time",
  max_time = 0, time_unit = "microseconds")
```
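As a sketch (not evaluated), the same logger could act as a stopper by setting `use_as_stopper = TRUE` and a positive `max_time`:

```{r, eval=FALSE}
# Sketch: time logger as stopper; training stops once 30 seconds are exceeded:
cboost$addLogger(logger = LoggerTime, use_as_stopper = TRUE, logger_id = "time_stop",
  max_time = 30, time_unit = "seconds")
```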
## Train Model and Access Elements
```{r, warning=FALSE}
cboost$train(2000, trace = 250)
cboost
```
Objects of the `Compboost` class have member functions such as `getCoef()`, `getInbagRisk()`, or `predict()` to access the results:
```{r}
str(cboost$getCoef())
str(cboost$getInbagRisk())
str(cboost$predict())
```
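A sketch (not evaluated) of scoring new observations with `predict()`, assuming it accepts the data as `newdata`; here we simply reuse the training data:

```{r, eval=FALSE}
# Sketch (assumption: predict() accepts a newdata argument):
str(cboost$predict(newdata = df_train))
```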
To obtain a vector of the selected base-learners, use `getSelectedBaselearner()`:
```{r}
table(cboost$getSelectedBaselearner())
```
We can also access predictions directly from the response objects `cboost$response` and `cboost$response_oob`. Note that `$response_oob` is created automatically when an `oob_fraction` is defined in the constructor:
```{r}
oob_label = cboost$response_oob$getResponse()
oob_pred = cboost$response_oob$getPredictionResponse()
table(true_label = oob_label, predicted = oob_pred)
```
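From this confusion table we can also compute the out-of-bag accuracy, assuming true labels and predictions share the same label encoding:

```{r}
# Out-of-bag accuracy from the confusion table (assumes matching label encodings):
conf_oob = table(true_label = oob_label, predicted = oob_pred)
sum(diag(conf_oob)) / sum(conf_oob)
```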
## Retrain the Model
To continue the training, or to set the whole model to another iteration, simply call `train()` again:
```{r, warning=FALSE}
cboost$train(3000)
str(cboost$getCoef())
str(cboost$getInbagRisk())
table(cboost$getSelectedBaselearner())
```
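Setting the model back to an earlier iteration works the same way; a sketch (not evaluated):

```{r, eval=FALSE}
# Sketch: set the model back to iteration 1500 instead of continuing the training:
cboost$train(1500)
table(cboost$getSelectedBaselearner())
```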
## Next steps
- Have a look at the [visualization capabilities](https://danielschalk.com/compboost/articles/getting_started/visualizations.html) of the package.
- See how [other loss functions](https://danielschalk.com/compboost/articles/getting_started/robust_regression.html) affect the model training.