Merge pull request #159 from schalkdaniel/general_updates
update vignette
Daniel Schalk committed Apr 1, 2018
2 parents 25706cc + 23254eb commit 6f0bc1d
Showing 4 changed files with 108 additions and 54 deletions.
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -7,9 +7,10 @@ Maintainer: Daniel Schalk <d-schalk@t-online.de>
Description: A C++ implementation of componentwise boosting. The idea is to
    patch all components, which are implemented as classes, together. This
    gives the user maximal flexibility. In addition, the main classes can be
    extended with custom functions from R. This will slow down the whole
    algorithm. Nevertheless, it is possible to do some prototyping within R for
    custom base learners.
    extended with custom functions from R or C++. Using custom R functions will
    slow down the whole algorithm. Nevertheless, it is possible to do some
    prototyping within R for custom base learners and implement them in C++
    afterwards without recompiling the whole package.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
2 changes: 1 addition & 1 deletion vignettes/classes.Rmd
@@ -32,7 +32,7 @@ between the classes see the `C++` documentation:
\url{}
\end{center}

\subsection{Loss Classes}
\subsection{Loss Classes}\label{subsec:loss-classes}

For a theoretical background, see the methodology section.

49 changes: 2 additions & 47 deletions vignettes/intro.Rmd
@@ -9,50 +9,5 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

Vignettes are long form documentation commonly included in packages. Because they are part of the distribution of the package, they need to be as compact as possible. The `html_vignette` output type provides a custom style sheet (and tweaks some options) to ensure that the resulting html is as small as possible. The `html_vignette` format:

- Never uses retina figures
- Has a smaller default figure size
- Uses a custom CSS stylesheet instead of the default Twitter Bootstrap style

## Vignette Info

Note the various macros within the `vignette` section of the metadata block above. These are required in order to instruct R how to build the vignette. Note that you should change the `title` field and the `\VignetteIndexEntry` to match the title of your vignette.

## Styles

The `html_vignette` template includes a basic CSS theme. To override this theme you can specify your own CSS in the document metadata as follows:

    output:
      rmarkdown::html_vignette:
        css: mystyles.css

## Figures

The figure sizes have been customised so that you can easily put two images side-by-side.

```{r, fig.show='hold'}
plot(1:10)
plot(10:1)
```

You can enable figure captions by `fig_caption: yes` in YAML:

    output:
      rmarkdown::html_vignette:
        fig_caption: yes

Then you can use the chunk option `fig.cap = "Your figure caption."` in **knitr**.

## More Examples

You can write math expressions, e.g. $Y = X\beta + \epsilon$, footnotes^[A footnote here.], and tables, e.g. using `knitr::kable()`.

```{r, echo=FALSE, results='asis'}
knitr::kable(head(mtcars, 10))
```

Also a quote using `>`:

> "He who gives up [code] safety for [code] speed deserves neither."
([via](https://twitter.com/hadleywickham/status/504368538874703872))
- What compboost stands for
- Component-wise boosting, a.k.a. model-based boosting
104 changes: 101 additions & 3 deletions vignettes/theory.Rmd
@@ -13,11 +13,109 @@ vignette: >

\section{Methodology}



\subsection{Learning Theory Reminder}

\subsubsection{Loss Function}

The aim of machine learning is to find a model (function) $\hat{f}$ which
approximates the real but unknown $f$ and thus best suits our data. Finding
this model requires a mapping from the training data
$\mathcal{D}_\mathrm{train} = \left\{(x^{(i)}, y^{(i)})\ |\ i \in \{1, \dots, n\}\right\}$
to a model $\hat{f}$. This mapping is called the inducer. \\

To quantify the goodness of a prediction $f(x)$ we need a function that
measures the loss of this prediction. Basically, the loss function can be
seen as a metric between the true value $y$ and its prediction $f(x)$:

\begin{align*}
L : \mathcal{Y} \times \mathcal{X} &\rightarrow\ \mathbb{R}_+ \\
y,x &\mapsto\ L\left(y, f(x)\right)
\end{align*}

The loss function is used within the inducer to fit a function (model)
$\hat{f}$ using training data $\mathcal{D}_\mathrm{train}$ (see section
\ref{sec:grad-boosting}). It is worth mentioning that different loss functions
transfer their properties to the inducer. For instance, measuring the absolute
difference between $y$ and $f(x)$ (absolute loss) is more robust against
outliers than measuring the quadratic difference (quadratic loss). \\
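
For reference, these two losses can be written as follows (standard
definitions, not specific to \texttt{compboost}):
\[
L_\mathrm{quadratic}\left(y, f(x)\right) = \left(y - f(x)\right)^2, \qquad
L_\mathrm{absolute}\left(y, f(x)\right) = \left|y - f(x)\right|
\]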

The properties of the loss function are also used to tackle different tasks.
Classification requires different loss functions than regression. For an
overview of the different losses and their use, see section
\ref{subsec:loss-classes} about the implemented loss classes of
\texttt{compboost}.


\subsubsection{Empirical Risk}

It would be desirable to have the loss for every possible combination of
$x \in \mathcal{X}$ and the corresponding true value $y \in \mathcal{Y}$.
Therefore, the natural thing would be to measure the expectation of the loss
with respect to the joint distribution $\mathbb{P}_{xy}$. This expectation is
defined as the risk $\mathcal{R}(f)$:
\[
\mathcal{R}(f) = \mathbb{E}\left[L(y, f(x))\right] = \int L(y,f(x))\ d\mathbb{P}_{xy}
\]

Since $\mathbb{P}_{xy}$ is unknown, it is not possible to calculate
$\mathcal{R}(f)$ exactly. The most common way to approximate the risk is to use
its empirical analogue, the mean over the observations of the training data
$(x, y) \in \mathcal{D}_\mathrm{train}$. This is called the empirical risk
$\mathcal{R}_\mathrm{emp}(f)$:
\[
\mathcal{R}_\mathrm{emp}(f) = \frac{1}{n}\sum\limits_{i=1}^n\
L\left(y^{(i)}, f(x^{(i)})\right)
\]

It is also common to define the empirical risk as a sum instead of a mean:
\[
\mathcal{R}_\mathrm{emp}(f) = \sum\limits_{i=1}^n\
L\left(y^{(i)}, f(x^{(i)})\right)
\]

In \texttt{compboost} we use the averaged version of the empirical risk.\\
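
As a minimal illustration (the function names below are purely illustrative
and not part of the \texttt{compboost} API), the averaged empirical risk for a
quadratic loss can be computed in plain R as:

```{r}
# Quadratic loss for a vector of true values and predictions:
quadratic_loss <- function(y, pred) (y - pred)^2

# Averaged empirical risk over the training data:
empirical_risk <- function(y, pred, loss = quadratic_loss) {
  mean(loss(y, pred))
}

# Toy example using a constant model f(x) = mean(y):
y <- c(1.2, 0.4, 2.3, 1.7)
empirical_risk(y, pred = rep(mean(y), length(y)))
```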

\subsubsection{Loss Minimization}

An obvious aim is now to minimize the empirical risk, which is also known as
loss minimization, and to use the function $\hat{f}$ that minimizes
$\mathcal{R}_\mathrm{emp}(f)$:
\[
\hat{f} = \underset{f \in H}{\mathrm{arg~min}}\ \mathcal{R}_\mathrm{emp}(f)
\]

In component-wise boosting we assume that $f$ is a function which can be
parameterized by $\theta \in \Theta$, since we want to have interpretable
learners (as we will see later). Hence, we can parameterize the empirical risk:
\[
\mathcal{R}_\mathrm{emp}(\theta) = \frac{1}{n}\sum\limits_{i=1}^n\
L\left(y^{(i)}, f(x^{(i)}|\theta)\right)
\]
Therefore, loss minimization amounts to finding a parameter setting
$\hat{\theta}$ which minimizes $\mathcal{R}_\mathrm{emp}(\theta)$:
\[
\hat{\theta} = \underset{\theta \in \Theta}{\mathrm{arg~min}}\
\mathcal{R}_\mathrm{emp}(\theta)
\]
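
As a small illustration (a sketch only, with made-up data and no connection to
the \texttt{compboost} implementation), this parameterized risk minimization
can be written down directly in R for a simple linear model
$f(x | \theta) = \theta_1 + \theta_2 x$ with quadratic loss:

```{r}
# Empirical risk as a function of the parameters theta:
risk_emp <- function(theta, x, y) {
  pred <- theta[1] + theta[2] * x
  mean((y - pred)^2)
}

# Simulated toy data:
set.seed(314)
x <- runif(20)
y <- 2 + 3 * x + rnorm(20, sd = 0.1)

# Minimize the empirical risk over theta with a generic optimizer:
optim(par = c(0, 0), fn = risk_emp, x = x, y = y)$par
```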


\subsection{Gradient Boosting Reminder}\label{sec:grad-boosting}

\subsubsection{Forward Stagewise Additive Modelling}

- Find the parameters in a greedy fashion (this makes the optimization very simple); see the sketch below.
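
One common way to write this greedy stagewise update (standard notation with
additive base learners $b(x | \theta)$, not yet matched to the
\texttt{compboost} notation) is:
\begin{align*}
\left(\hat{\beta}^{[m]}, \hat{\theta}^{[m]}\right) &= \underset{\beta, \theta}{\mathrm{arg~min}}\
  \sum\limits_{i=1}^n L\left(y^{(i)}, \hat{f}^{[m-1]}(x^{(i)}) + \beta\, b(x^{(i)} | \theta)\right) \\
\hat{f}^{[m]}(x) &= \hat{f}^{[m-1]}(x) + \hat{\beta}^{[m]}\, b(x | \hat{\theta}^{[m]})
\end{align*}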

\subsubsection{Gradient Boosting}

- Very briefly: forward stagewise additive modelling (preferably just cite it)
- Very briefly: boosting (preferably just cite it)
- In a bit more detail: model-based boosting; see the sketch below

- Properties of the loss are transferred to the algorithm
- For more details on the loss definition, refer to the classes vignette
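
As a placeholder until this section is written out, one standard way to state
the model-based (gradient) boosting step with pseudo residuals, a base learner
$b(x | \theta)$ and a learning rate $\nu$ (generic notation, not yet the
\texttt{compboost} notation) is:
\begin{align*}
r^{[m](i)} &= -\left.\frac{\partial L\left(y^{(i)}, f(x^{(i)})\right)}{\partial f(x^{(i)})}\right|_{f = \hat{f}^{[m-1]}},
  \quad i = 1, \dots, n \\
\hat{\theta}^{[m]} &= \underset{\theta \in \Theta}{\mathrm{arg~min}}\
  \sum\limits_{i=1}^n \left(r^{[m](i)} - b(x^{(i)} | \theta)\right)^2 \\
\hat{f}^{[m]}(x) &= \hat{f}^{[m-1]}(x) + \nu\, b(x | \hat{\theta}^{[m]})
\end{align*}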
\subsection{Component-wise Boosting}


- In a bit more detail: model-based boosting
