Merge pull request #159 from schalkdaniel/general_updates
update vignette
Daniel Schalk committed Apr 1, 2018
2 parents 25706cc + 23254eb commit 6f0bc1d
Showing 4 changed files with 108 additions and 54 deletions.
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -7,9 +7,10 @@ Maintainer: Daniel Schalk <d-schalk@t-online.de>
Description: A C++ implementation of componentwise boosting. The idea is to
    patch all components, which are implemented as classes, together. This
    gives the user maximal flexibility. In addition, the main classes can be
    extended with custom functions from R. This will slow down the whole
    algorithm. Nevertheless, it is possible to do some prototyping within R for
    custom base learners.
    extended with custom functions from R or C++. Using custom R functions will
    slow down the whole algorithm. Nevertheless, it is possible to do some
    prototyping within R for custom base learners and implement them in C++
    afterwards without recompiling the whole package.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
2 changes: 1 addition & 1 deletion vignettes/classes.Rmd
@@ -32,7 +32,7 @@ between the classes see the `C++` documentation:
\url{}
\end{center}

\subsection{Loss Classes}
\subsection{Loss Classes}\label{subsec:loss-classes}

For a theoretical background, see the methodology section.

49 changes: 2 additions & 47 deletions vignettes/intro.Rmd
@@ -9,50 +9,5 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

Vignettes are long form documentation commonly included in packages. Because they are part of the distribution of the package, they need to be as compact as possible. The `html_vignette` output type provides a custom style sheet (and tweaks some options) to ensure that the resulting html is as small as possible. The `html_vignette` format:

- Never uses retina figures
- Has a smaller default figure size
- Uses a custom CSS stylesheet instead of the default Twitter Bootstrap style

## Vignette Info

Note the various macros within the `vignette` section of the metadata block above. These are required in order to instruct R how to build the vignette. Note that you should change the `title` field and the `\VignetteIndexEntry` to match the title of your vignette.

## Styles

The `html_vignette` template includes a basic CSS theme. To override this theme you can specify your own CSS in the document metadata as follows:

    output:
      rmarkdown::html_vignette:
        css: mystyles.css

## Figures

The figure sizes have been customised so that you can easily put two images side-by-side.

```{r, fig.show='hold'}
plot(1:10)
plot(10:1)
```

You can enable figure captions by `fig_caption: yes` in YAML:

    output:
      rmarkdown::html_vignette:
        fig_caption: yes

Then you can use the chunk option `fig.cap = "Your figure caption."` in **knitr**.

## More Examples

You can write math expressions, e.g. $Y = X\beta + \epsilon$, footnotes^[A footnote here.], and tables, e.g. using `knitr::kable()`.

```{r, echo=FALSE, results='asis'}
knitr::kable(head(mtcars, 10))
```

Also a quote using `>`:

> "He who gives up [code] safety for [code] speed deserves neither."
([via](https://twitter.com/hadleywickham/status/504368538874703872))
- What compboost stands for
- Component-wise boosting, a.k.a. model-based boosting
104 changes: 101 additions & 3 deletions vignettes/theory.Rmd
@@ -13,11 +13,109 @@ vignette: >

\section{Methodology}



\subsection{Learning Theory Reminder}

\subsubsection{Loss Function}

The aim of machine learning is to find a model (function) $\hat{f}$ which
approximates the real but unknown $f$ and thus best suits our data. Finding
this model requires a mapping from the training data
$\mathcal{D}_\mathrm{train} = \left\{(x^{(i)}, y^{(i)})\ |\ i \in \{1, \dots, n\}\right\}$
to a model $\hat{f}$. This mapping is called the inducer. \\

To quantify the goodness of a prediction $f(x)$ we need a function that
measures the loss of this prediction. Basically, the loss function can be
seen as a metric between the true value $y$ and its prediction $f(x)$:

\begin{align*}
L : \mathcal{Y} \times \mathcal{X} &\rightarrow\ \mathbb{R}_+ \\
y,x &\mapsto\ L\left(y, f(x)\right)
\end{align*}

The loss function is used within the inducer to fit a function (model)
$\hat{f}$ using training data $\mathcal{D}_\mathrm{train}$ (see section
\ref{sec:grad-boosting}). It is worth mentioning that different loss functions
transfer their properties to the inducer. For instance, measuring the absolute
difference between $y$ and $f(x)$ (absolute loss) is more robust against
outliers than measuring the quadratic difference (quadratic loss). \\
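
For reference, these two losses can be written as follows (standard
definitions, not specific to \texttt{compboost}):
\[
L_\mathrm{quadratic}\left(y, f(x)\right) = \left(y - f(x)\right)^2, \qquad
L_\mathrm{absolute}\left(y, f(x)\right) = \left|y - f(x)\right|
\]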

The properties of the loss function are also used to tackle different tasks.
Classification requires different loss functions than regression. For an
overview of the different losses and their use, see section
\ref{subsec:loss-classes} about the implemented loss classes of
\texttt{compboost}.


\subsubsection{Empirical Risk}

It would be desirable to have the loss for every possible combination of
$x \in \mathcal{X}$ and the corresponding true value $y \in \mathcal{Y}$.
Therefore, the natural thing would be to measure the expectation of the loss
with respect to the joint distribution $\mathbb{P}_{xy}$. This expectation is
defined as the risk $\mathcal{R}(f)$:
\[
\mathcal{R}(f) = \mathbb{E}\left[L(y, f(x))\right] = \int L(y,f(x))\ d\mathbb{P}_{xy}
\]

Since $\mathbb{P}_{xy}$ is unknown, it is not possible to calculate
$\mathcal{R}(f)$ exactly. The most common way to approximate the risk is to use
its empirical analogue, the mean over the observations of the training data
$(x, y) \in \mathcal{D}_\mathrm{train}$. This is called the empirical risk
$\mathcal{R}_\mathrm{emp}(f)$:
\[
\mathcal{R}_\mathrm{emp}(f) = \frac{1}{n}\sum\limits_{i=1}^n\
L\left(y^{(i)}, f(x^{(i)})\right)
\]

It is also common to define the empirical risk as a sum instead of a mean:
\[
\mathcal{R}_\mathrm{emp}(f) = \sum\limits_{i=1}^n\
L\left(y^{(i)}, f(x^{(i)})\right)
\]

In \texttt{compboost} we use the averaged version of the empirical risk.\\
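
As a minimal illustration (the function names below are purely illustrative
and not part of the \texttt{compboost} API), the averaged empirical risk for a
quadratic loss can be computed in plain R as:

```{r}
# Quadratic loss for a vector of true values and predictions:
quadratic_loss <- function(y, pred) (y - pred)^2

# Averaged empirical risk over the training data:
empirical_risk <- function(y, pred, loss = quadratic_loss) {
  mean(loss(y, pred))
}

# Toy example using a constant model f(x) = mean(y):
y <- c(1.2, 0.4, 2.3, 1.7)
empirical_risk(y, pred = rep(mean(y), length(y)))
```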

\subsubsection{Loss Minimization}

An obvious aim is now to minimize the empirical risk, which is also known as
loss minimization, and to use the function $\hat{f}$ that minimizes
$\mathcal{R}_\mathrm{emp}(f)$:
\[
\hat{f} = \underset{f \in H}{\mathrm{arg~min}}\ \mathcal{R}_\mathrm{emp}(f)
\]

In component-wise boosting we assume that $f$ is a function which can be
parameterized by $\theta \in \Theta$, since we want to have interpretable
learners (as we will see later). Hence, we can parameterize the empirical risk:
\[
\mathcal{R}_\mathrm{emp}(\theta) = \frac{1}{n}\sum\limits_{i=1}^n\
L\left(y^{(i)}, f(x^{(i)}|\theta)\right)
\]
Therefore, loss minimization amounts to finding a parameter setting
$\hat{\theta}$ which minimizes $\mathcal{R}_\mathrm{emp}(\theta)$:
\[
\hat{\theta} = \underset{\theta \in \Theta}{\mathrm{arg~min}}\
\mathcal{R}_\mathrm{emp}(\theta)
\]
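
As a small illustration (a sketch only, with made-up data and no connection to
the \texttt{compboost} implementation), this parameterized risk minimization
can be written down directly in R for a simple linear model
$f(x | \theta) = \theta_1 + \theta_2 x$ with quadratic loss:

```{r}
# Empirical risk as a function of the parameters theta:
risk_emp <- function(theta, x, y) {
  pred <- theta[1] + theta[2] * x
  mean((y - pred)^2)
}

# Simulated toy data:
set.seed(314)
x <- runif(20)
y <- 2 + 3 * x + rnorm(20, sd = 0.1)

# Minimize the empirical risk over theta with a generic optimizer:
optim(par = c(0, 0), fn = risk_emp, x = x, y = y)$par
```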


\subsection{Gradient Boosting Reminder}\label{sec:grad-boosting}

\subsubsection{Forward Stagewise Additive Modelling}

- Find the parameters in a greedy fashion (this makes the optimization very simple); see the sketch below.
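
One common way to write this greedy stagewise update (standard notation with
additive base learners $b(x | \theta)$, not yet matched to the
\texttt{compboost} notation) is:
\begin{align*}
\left(\hat{\beta}^{[m]}, \hat{\theta}^{[m]}\right) &= \underset{\beta, \theta}{\mathrm{arg~min}}\
  \sum\limits_{i=1}^n L\left(y^{(i)}, \hat{f}^{[m-1]}(x^{(i)}) + \beta\, b(x^{(i)} | \theta)\right) \\
\hat{f}^{[m]}(x) &= \hat{f}^{[m-1]}(x) + \hat{\beta}^{[m]}\, b(x | \hat{\theta}^{[m]})
\end{align*}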

\subsubsection{Gradient Boosting}

- Very briefly: forward stagewise additive modelling (preferably just cite it)
- Very briefly: boosting (preferably just cite it)
- In a bit more detail: model-based boosting; see the sketch below

- Properties of the loss are transferred to the algorithm
- For more details on the loss definition, refer to the classes vignette
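
As a placeholder until this section is written out, one standard way to state
the model-based (gradient) boosting step with pseudo residuals, a base learner
$b(x | \theta)$ and a learning rate $\nu$ (generic notation, not yet the
\texttt{compboost} notation) is:
\begin{align*}
r^{[m](i)} &= -\left.\frac{\partial L\left(y^{(i)}, f(x^{(i)})\right)}{\partial f(x^{(i)})}\right|_{f = \hat{f}^{[m-1]}},
  \quad i = 1, \dots, n \\
\hat{\theta}^{[m]} &= \underset{\theta \in \Theta}{\mathrm{arg~min}}\
  \sum\limits_{i=1}^n \left(r^{[m](i)} - b(x^{(i)} | \theta)\right)^2 \\
\hat{f}^{[m]}(x) &= \hat{f}^{[m-1]}(x) + \nu\, b(x | \hat{\theta}^{[m]})
\end{align*}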
\subsection{Component-wise Boosting}


- In a bit more detail: model-based boosting
