
Tweaking intro to vignette

1 parent 0faf91a commit 2dc780374cf91f3fcbdac5d8a5eeccf1aa6258c0 @lianos committed Mar 27, 2012
Showing with 34 additions and 24 deletions.
  1. +30 −20 inst/doc/MLplay.Rnw
  2. +4 −4 inst/doc/MLplay.bib
inst/doc/MLplay.Rnw
@@ -190,15 +190,15 @@ corners of your data..
It cannot be stressed enough that proper model assessment and parameter selection
(through cross validation, for instance) are \emph{absolutely essential} when
attempting to apply predictive modeling techniques in ``the real world.''
-
+
For further ML references, especially in the context of \texttt{R} and bioconductor,
the reader might be interested in the following resources:
-
+
\begin{itemize}
\item The \href{http://www.bioconductor.org/help/course-materials/2011/CSAMA/}{CSAMA 2011 workshop, machine learning primer}, by Vincent Carey.
\item The \href{http://cran.r-project.org/web/packages/caret/}{vignette from the caret package}, by Max Kuhn.
\end{itemize}
-
+
\item We will be exploring support vector machines through a new R library
I am authoring called \href{https://github.com/lianos/shikken}{shikken}.
Shikken is a wrapper to the excellent
@@ -229,13 +229,13 @@ that was first introduced by Boser, Guyon and Vapnik~\cite{Boser:1992uo}. Put
simply, in a two-class classification setting, the SVM finds ``the best''
separating hyperplane ${\bf w}$ that separates the data points in each
class from each other, as shown in Figure~\ref{fig:svmdecision}. Once ${\bf w}$
-is found, a point ${\bf x}_i$ is classified by \emph{the sign} of the
+is found, a point ${\bf x}_i$ is classified by \emph{the sign} of the
discriminant function $f(x)$, shown in Equation~\ref{eqn:primaldiscriminant}.
\begin{align}
f({\bf x}_i) = {\bf w} \cdot {\bf x}_i + b
\label{eqn:primaldiscriminant}
-\end{align}
+\end{align}
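To make the decision rule concrete, the following is a minimal sketch (in plain
\texttt{R}, with a hand-picked ${\bf w}$ and $b$ rather than values from any
fitted model) of classifying points by the sign of $f({\bf x})$:
<<signSketch, echo=TRUE, eval=FALSE>>=
## Illustrative weight vector and offset (hand-picked, not fitted)
w <- c(1, -1)
b <- 0.5

## Two example points: one on each side of the hyperplane
x1 <- c(2, 0.5)
x2 <- c(-1, 1)

## f(x) = w . x + b; the predicted class is sign(f(x))
f <- function(x) sum(w * x) + b
sign(f(x1))   ## +1
sign(f(x2))   ## -1
@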
An advantageous property of the SVM is that it finds the separating hyperplane
with the largest margin (subject to constraints set by the user).
@@ -310,10 +310,9 @@ SVM methods available in shikken~\footnote{
packages. \texttt{kernlab} also has an implementation of the spectrum kernel
you can use.
}.
-
+
<<initialize, results=hide, echo=TRUE, eval=TRUE>>=
library(BiocSeqSVM)
-library(shikken)
## Create two class data
set.seed(123)
@@ -348,7 +347,13 @@ lsvm <- SVM(X, y, C=100)
plotDecisionSurface(lsvm, X, y)
## Does it accurately classify the data?
-table(predict(lsvm, X), y)
+preds <- predict(lsvm, X)
+accuracy <- (sum(preds == y) / length(y)) * 100
+
+cat(sprintf("Accuracy: %.2f%%\n", accuracy))
+
+## Also can show accuracy with a confusion matrix:
+table(preds, y)
@
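Because accuracy measured on the training data can be optimistic, here is a
sketch of evaluating on a held-out split instead. The split is illustrative
only, it assumes \texttt{SVM} and \texttt{predict} behave exactly as in the
chunk above, and proper cross validation would still be preferable for real
model assessment:
<<holdoutSketch, echo=TRUE, eval=FALSE>>=
## Hold out roughly a third of the data for testing (illustrative split only)
set.seed(42)
test.idx <- sample(nrow(X), size = floor(nrow(X) / 3))

## Fit on the remaining points, then score the held-out ones
lsvm.tr  <- SVM(X[-test.idx, ], y[-test.idx], C=100)
preds.te <- predict(lsvm.tr, X[test.idx, ])
mean(preds.te == y[test.idx])
@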
The \texttt{plotDecisionSurface} function draws the data points
@@ -394,6 +399,8 @@ closer to our negative data than the positive data.
X.out <- rbind(X, t(c(-1, -0.5)))
y.out <- c(y, -1)
+simplePlot(X.out, y.out)
+
lsvm <- SVM(X.out, y.out, C=100)
plotDecisionSurface(lsvm, X.out, y.out)
@@ -415,7 +422,7 @@ plotDecisionSurface(lsvm, X.out, y.out)
@
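To get a feel for how the soft-margin parameter $C$ trades margin width against
tolerance of points like this outlier, one could refit over a few values of $C$
and compare the resulting decision surfaces. This is a sketch, assuming
\texttt{SVM} accepts the same arguments as above; the particular $C$ values are
illustrative:
<<softMarginSweep, echo=TRUE, eval=FALSE>>=
## Smaller C tolerates more slack; larger C penalizes misclassified
## points more heavily (illustrative values)
for (C.val in c(0.1, 1, 100)) {
  svm.C <- SVM(X.out, y.out, C=C.val)
  plotDecisionSurface(svm.C, X.out, y.out)
}
@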
\begin{figure}[htbp]
- \centering
+ \centering
\mbox{\subfigure{\includegraphics[width=3in]{Rfigs/gen-easyMargin.pdf}}\quad
\subfigure{\includegraphics[width=3in]{Rfigs/gen-easySoftMargin.pdf} }}
\caption{
@@ -501,7 +508,7 @@ X3d <- t(X3d)
@
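For reference, a common explicit degree-2 map for two-dimensional points is
$\phi({\bf x}) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$. The following sketch
embeds a few illustrative 2D points into that 3D space (this is not necessarily
the construction used to build \texttt{X3d} above):
<<quadMapSketch, echo=TRUE, eval=FALSE>>=
## Explicit degree-2 polynomial feature map: (x1^2, sqrt(2)*x1*x2, x2^2)
phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)

## Illustrative 2D points (one per row), embedded one row at a time
X2d <- rbind(c(1, 0), c(0, 1), c(0.5, 0.5))
t(apply(X2d, 1, phi))
@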
\begin{figure}[htbp]
- \centering
+ \centering
\mbox{\subfigure{\includegraphics[width=3in]{Rfigs/gen-circleData}}\quad
\subfigure{\includegraphics[width=3in]{figs/poly-circle-3d.png} }}
\caption{
@@ -523,7 +530,7 @@ Now we have to travel into the weeds a bit ...
There is a \emph{dual} formulation of the SVM objective function that
uses Lagrange multipliers to make the optimization problem of the \emph{primal}
-(Equation~\ref{eqn:prmial}) easier to solve (apparently!).
+(Equation~\ref{eqn:primal}) easier to solve (apparently!).
Its optimal value is the same as the primal one under certain
constraints\cite{BenHur:2008ec}. To help keep
our sanity, the derivation of the dual from the primal is skipped here,
@@ -535,20 +542,23 @@ Equation~\ref{eqn:dual}.
\begin{align}
\max_{\alpha} \sum_{i=1}^n \alpha_i - \frac {1} {2} \sum_{i=1}^n \sum_{j=1}^n y_i y_j \alpha_i \alpha_j \left\langle {\bf x}_i, {\bf x}_j \right\rangle \nonumber \\
+ = \max_{\alpha} \sum_{i=1}^n \alpha_i - \frac {1} {2} \sum_{i=1}^n \sum_{j=1}^n y_i y_j \alpha_i \alpha_j k \left( {\bf x}_i, {\bf x}_j \right) \nonumber \\
\mbox{s.t.: } \sum_{i=1}^n y_i \alpha_i = 0; \mbox{ and } 0 \leq \alpha_i \leq C
\label{eqn:dual}
\end{align}
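As a quick sanity check on the kernel substitution above, the sketch below
shows (on illustrative points, in plain \texttt{R}) that the simple degree-2
polynomial kernel $k({\bf x}, {\bf z}) = ({\bf x} \cdot {\bf z})^2$ returns
exactly the inner product of the explicitly embedded points, which is why the
dual never needs the embedding itself:
<<kernelTrickCheck, echo=TRUE, eval=FALSE>>=
## Degree-2 polynomial kernel and the matching explicit feature map
k   <- function(x, z) sum(x * z)^2
phi <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)

x <- c(1, 2)
z <- c(3, -1)

k(x, z)                ## kernel evaluated in the original 2D space
sum(phi(x) * phi(z))   ## inner product after explicit embedding: same value
@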
It can also be shown that the weight vector ${\bf w}$ can be
-expressed solely as a function over the examples ${\bf x}_i$ and their optimal values of $\alpha_i$ (found in Equation~\ref{eqn:dual}), as shown in Equation~\ref{eqn:wvector}.
+expressed solely as a function over the examples ${\bf x}_i$ and their optimal values of $\alpha_i$, as shown in Equation~\ref{eqn:wvector}.
\begin{align}
{\bf w} = \sum_{i=1}^n y_i \alpha_i {\bf x}_i
\label{eqn:wvector}
\end{align}
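A small sketch (with made-up $\alpha_i$ values, not ones returned by an actual
solver) showing that the ${\bf w}$ assembled via Equation~\ref{eqn:wvector}
gives the same discriminant value as the equivalent expansion over the
examples, for the linear kernel:
<<wFromAlphas, echo=TRUE, eval=FALSE>>=
## Made-up training points, labels, multipliers and offset (illustrative only)
Xd    <- rbind(c(1, 2), c(2, 0), c(-1, -1), c(0, -2))
yd    <- c( 1,  1, -1, -1)
alpha <- c(0.5, 0.0, 0.3, 0.2)   ## alpha_i = 0 for non-support vectors
b.d   <- 0.1

## w = sum_i y_i alpha_i x_i
w.d <- colSums(yd * alpha * Xd)

## Discriminant for a new point, computed two equivalent ways
x.new <- c(1, 1)
sum(w.d * x.new) + b.d
sum(yd * alpha * (Xd %*% x.new)) + b.d
@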
Using the kernel trick we can rewrite our discriminant function from
-$f(x) = {\bf x} \cdot x + b$ Equation~\ref{eqn:wvector} to:
+Equation~\ref{eqn:primaldiscriminant}
+% $f(x) = {\bf w} \cdot {\bf x} + b$
+to the form shown in Equation~\ref{eqn:wkernel}. Note that solving the dual and evaluating the objective function only involve evaluating the kernel function over pairs of examples. With a sufficiently clever implementation of the kernel function, we can therefore avoid having to explicitly embed our data into the higher dimensional feature space.
\begin{align}
f({\bf x}) = \sum_{i=1}^n y_i \alpha_i k({\bf x}_i, {\bf x}) + b
@@ -563,11 +573,11 @@ $\alpha_i > 0$ --- these examples are called the \emph{support vectors} and lie
\paragraph{Important take away from the dual and kernels}
\begin{itemize}
\item We can use kernels to calculate similarities between two objects
- by implicitly mapping them to different feature spaces.
- \item The dual of the SVM can be solved in this implicit mapping (Equation~\ref{eqn:dual}),
- which means you can work in, say, a $50,000$ dimensional space without having
- to explicitly generate feature vectors of $50,000$ dimensions for all of your
- data points
+ by \emph{implicitly} mapping them to different feature spaces.
+ \item The dual of the SVM can be solved in this implicit mapping
+ (Equation~\ref{eqn:dual}), which means you can work in, say, a $50,000$
+ dimensional space without having to explicitly generate feature vectors
+  of $50,000$ dimensions for all of your data points.
\item The decision boundary of the SVM has a sparse representation which only
relies on the $\alpha_i$ values from your support vectors, and the support
vectors themselves, which you keep in their ``native'' (lower dimensional)
@@ -606,7 +616,7 @@ plotDecisionSurface(psvm, Xc, yc, wireframe=TRUE)
@
\begin{figure}[htbp]
- \centering
+ \centering
\mbox{\subfigure{\includegraphics[width=3in]{Rfigs/gen-svmPoly.pdf}}\quad
\subfigure{\includegraphics[width=3in]{Rfigs/gen-svmPoly3D.pdf} }}
\caption{
@@ -640,7 +650,7 @@ plotDecisionSurface(gsvm, Xc, yc, wireframe=TRUE)
@
\begin{figure}[htbp]
- \centering
+ \centering
\mbox{\subfigure{\includegraphics[width=3in]{Rfigs/gen-svmGaus.pdf}}\quad
\subfigure{\includegraphics[width=3in]{Rfigs/gen-svmGaus3D.pdf} }}
\caption{
inst/doc/MLplay.bib
@@ -1,5 +1,5 @@
@article{Ratsch:2006il,
-author = {Rätsch, Gunnar and Sonnenburg, Sören and Schäfer, Christin},
+author = {Ratsch, Gunnar and Sonnenburg, Sören and Schäfer, Christin},
journal = {BMC Bioinformatics},
title = {{Learning interpretable SVMs for biological sequence classification.}},
month = {},
@@ -8,7 +8,7 @@ @article{Ratsch:2006il
}
@article{BenHur:2008ec,
-author = {Ben-Hur, Asa and Ong, Cheng Soon and Sonnenburg, Sören and Schölkopf, Bernhard and Rätsch, Gunnar},
+author = {Ben-Hur, Asa and Ong, Cheng Soon and Sonnenburg, Sören and Schölkopf, Bernhard and Ratsch, Gunnar},
journal = {PLoS Comput Biol},
title = {{Support vector machines and kernels for computational biology.}},
number = {10},
@@ -38,7 +38,7 @@ @article{Noble:2006br
}
@article{Sonnenburg:2007wu,
-author = {Sonnenburg, S. and Rätsch, G and Rieck, K.},
+author = {Sonnenburg, S. and Ratsch, G and Rieck, K.},
journal = {Large Scale Kernel Machines},
title = {{Large scale learning with string kernels}},
month = {},
@@ -77,7 +77,7 @@ @article{Leslie:2002tx
}
@article{Sonnenburg:2008do,
-author = {Sonnenburg, Sören and Zien, Alexander and Philips, Petra and Rätsch, Gunnar},
+author = {Sonnenburg, Soren and Zien, Alexander and Philips, Petra and Ratsch, Gunnar},
journal = {Bioinformatics},
title = {{POIMs: positional oligomer importance matrices--understanding support vector machine-based signal detectors.}},
number = {13},
