Commit eb96d6f (1 parent 0d3c3a2)
Showing 26 changed files with 8,140 additions and 2 deletions.
@@ -1,4 +1,22 @@
Machine learning cheat sheet
============================

This cheat sheet contains many classical equations and diagrams on machine learning, which will help you quickly recall knowledge and ideas about machine learning.

The cheat sheet will also appeal to anyone preparing for a job interview related to machine learning.

## LaTeX template
This open-source book adopts the [Springer LaTeX template](http://www.springer.com/authors/book+authors?SGWID=0-154102-12-970131-0).

## How to compile on Windows
1. Install [TeX Live 2012](http://www.tug.org/texlive/), then add its `bin` path, for example `D:\texlive\2012\bin\win32`, to the PATH environment variable.
2. Install [TeXstudio](http://texstudio.sourceforge.net/).
3. Configure TeXstudio.
   Run TeXstudio, click `Options-->Configure TeXstudio-->Commands`, and set `XeLaTeX` to `xelatex -synctex=1 -interaction=nonstopmode %.tex`.

   Then click `Options-->Configure TeXstudio-->Build` and
   set `Build & View` to `Compile & View`,
   set `Default Compiler` to `XeLaTeX`, and
   set `PDF Viewer` to `Internal PDF Viewer (windowed)`, so that previewing pops up a standalone window, which is convenient.
4. Compile. Open `main.tex` with TeXstudio and click the green arrow on the menu bar to start compiling.
   In the messages window below you can see that the compilation command TeXstudio uses is `xelatex -synctex=1 -interaction=nonstopmode "ACM-cheat-sheet".tex`.
@@ -0,0 +1,11 @@
%%%%%%%%%%%%%%%%%%%%%%acknow.tex%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% sample acknowledgement chapter
%
% Use this file as a template for your own input.
%
%%%%%%%%%%%%%%%%%%%%%%%% Springer %%%%%%%%%%%%%%%%%%%%%%%%%%

\extrachap{Acknowledgements}

Use the template \emph{acknow.tex} together with the Springer document class SVMono (monograph-type books) or SVMult (edited books) if you prefer to set your acknowledgement section as a separate chapter instead of including it as last part of your preface.
@@ -0,0 +1,18 @@
%%%%%%%%%%%%%%%%%%%%%%acronym.tex%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% sample list of acronyms
%
% Use this file as a template for your own input.
%
%%%%%%%%%%%%%%%%%%%%%%%% Springer %%%%%%%%%%%%%%%%%%%%%%%%%%

\extrachap{Acronyms}

Use the template \emph{acronym.tex} together with the Springer document class SVMono (monograph-type books) or SVMult (edited books) to style your list(s) of abbreviations or symbols in the Springer layout.

Lists of abbreviations\index{acronyms, list of}, symbols\index{symbols, list of} and the like are easily formatted with the help of the Springer-enhanced \verb|description| environment.

\begin{description}[CABR]
\item[ABC]{Spelled-out abbreviation and definition}
\item[BABI]{Spelled-out abbreviation and definition}
\item[CABR]{Spelled-out abbreviation and definition}
\end{description}
@@ -0,0 +1,79 @@
%%%%%%%%%%%%%%%%%%%%% appendix.tex %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% sample appendix
%
% Use this file as a template for your own input.
%
%%%%%%%%%%%%%%%%%%%%%%%% Springer-Verlag %%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Chapter Heading}
\label{introA} % Always give a unique label
% use \chaptermark{}
% to alter or adjust the chapter heading in the running head

Use the template \emph{appendix.tex} together with the Springer document class SVMono (monograph-type books) or SVMult (edited books) to style the appendix of your book in the Springer layout.

\section{Section Heading}
\label{sec:A1}
% Always give a unique label
% and use \ref{<label>} for cross-references
% and \cite{<label>} for bibliographic references
% use \sectionmark{}
% to alter or adjust the section heading in the running head
Instead of simply listing headings of different levels we recommend to let every heading be followed by at least a short passage of text. Further on please use the \LaTeX\ automatism for all your cross-references and citations.

\subsection{Subsection Heading}
\label{sec:A2}
Instead of simply listing headings of different levels we recommend to let every heading be followed by at least a short passage of text. Further on please use the \LaTeX\ automatism for all your cross-references and citations as has already been described in Sect.~\ref{sec:A1}.

For multiline equations we recommend to use the \verb|eqnarray| environment.
\begin{eqnarray}
\vec{a}\times\vec{b}=\vec{c} \nonumber\\
\vec{a}\times\vec{b}=\vec{c}
\label{eq:A01}
\end{eqnarray}

\subsubsection{Subsubsection Heading}
Instead of simply listing headings of different levels we recommend to let every heading be followed by at least a short passage of text. Further on please use the \LaTeX\ automatism for all your cross-references and citations as has already been described in Sect.~\ref{sec:A2}.

Please note that the first line of text that follows a heading is not indented, whereas the first lines of all subsequent paragraphs are.

% For figures use
%
\begin{figure}[t]
\sidecaption[t]
% Use the relevant command for your figure-insertion program
% to insert the figure file.
% For example, with the graphicx style use
\includegraphics[scale=.65]{figure}
%
% If no graphics program available, insert a blank space i.e. use
%\picplace{5cm}{2cm} % Give the correct figure height and width in cm
%
\caption{Please write your figure caption here}
\label{fig:A1} % Give a unique label
\end{figure}

% For tables use
%
\begin{table}
\caption{Please write your table caption here}
\label{tab:A1} % Give a unique label
%
% Follow this input for your own table layout
%
\begin{tabular}{p{2cm}p{2.4cm}p{2cm}p{4.9cm}}
\hline\noalign{\smallskip}
Classes & Subclass & Length & Action Mechanism \\
\noalign{\smallskip}\hline\noalign{\smallskip}
Translation & mRNA$^a$ & 22 (19--25) & Translation repression, mRNA cleavage\\
Translation & mRNA cleavage & 21 & mRNA cleavage\\
Translation & mRNA & 21--22 & mRNA cleavage\\
Translation & mRNA & 24--26 & Histone and DNA Modification\\
\noalign{\smallskip}\hline\noalign{\smallskip}
\end{tabular}
$^a$ Table foot note (with superscript)
\end{table}
%
@@ -0,0 +1,16 @@
%%%%%%%%%%%%%%%%%%%%clist.tex %%%%%%%%%%%%%%%%%%%%%%%%
%
% sample list of contributors and their addresses
%
% Use this file as a template for your own input.
%
%%%%%%%%%%%%%%%%%%%%%%%% Springer %%%%%%%%%%%%%%%%%%%%
\contributors

\begin{thecontriblist}
Firstname Surname
\at ABC Institute, 123 Prime Street, Daisy Town, NA 01234, USA, \email{smith@smith.edu}
\and
Firstname Surname
\at XYZ Institute, Technical University, Albert-Schweitzer-Str. 34, 1000 Berlin, Germany, \email{meier@tu.edu}
\end{thecontriblist}
@@ -0,0 +1,127 @@
\chapter{Introduction}

\section{Types of machine learning}
\begin{equation}\nonumber
\text{Machine learning}\begin{cases}
\text{Supervised learning} \begin{cases} \text{Classification} \\ \text{Regression} \end{cases}\\
\text{Unsupervised learning} \begin{cases} \text{Discovering clusters} \\ \text{Discovering latent factors} \\ \text{Discovering graph structure} \\ \text{Matrix completion} \end{cases}\\
\end{cases}
\end{equation}

\section{Three elements of a machine learning method}

\textbf{method = model + strategy + algorithm}

\subsection{Model}
In supervised learning, a model is the decision function or conditional probability distribution to be learned. The model's hypothesis space contains all possible decision functions $f(x)$ or conditional probability distributions $P(y|\vec{x})$.

\subsection{Strategy}
Given a model's hypothesis space, we need a strategy to select the optimal hypothesis.

\subsubsection{Loss function and risk function}

\begin{definition}
In order to measure how well a function fits the training data, a \textbf{loss function} $L:Y \times Y \rightarrow R \geq 0$ is defined. For training example $(x_i,y_i)$, the loss of predicting the value $\widehat{y}$ is $L(y_i,\widehat{y})$.
\end{definition}

The following are some common loss functions:
\begin{enumerate}
\item 0-1 loss function $L(Y,f(X))=I(Y \neq f(X))=\begin{cases} 1, & Y \neq f(X) \\ 0, & Y=f(X) \end{cases}$
\item Quadratic loss function $L(Y,f(X))=\left(Y-f(X)\right)^2$
\item Absolute loss function $L(Y,f(X))=\abs{Y-f(X)}$
\item Logarithmic loss function $L(Y,P(Y|X))=-\log{P(Y|X)}$
\end{enumerate}

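The four loss functions above can be sketched in a few lines of code; this is our own illustrative example (the function names and sample values are not from the book):

```python
import math

def zero_one_loss(y, y_hat):
    # 1 on a misclassification, 0 on a correct prediction
    return 1.0 if y != y_hat else 0.0

def quadratic_loss(y, y_hat):
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    return abs(y - y_hat)

def log_loss(p_y_given_x):
    # -log P(Y|X): small when the model assigns high probability to the true label
    return -math.log(p_y_given_x)

print(zero_one_loss(1, -1))      # 1.0
print(quadratic_loss(3.0, 2.5))  # 0.25
print(absolute_loss(3.0, 2.5))   # 0.5
print(log_loss(0.9))
```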
\begin{definition}
The risk of function $f$ is defined as the expected loss of $f$:
\begin{equation}
R_{exp}(f)=E_p\left[L\left(Y,f(X)\right)\right]=\int _{X \times Y} L\left(y,f(x)\right)P(x,y)dxdy
\end{equation}
which is also called the expected loss or \textbf{risk function}.
\end{definition}

\begin{definition}
The risk function $R_{exp}(f)$ can be estimated from the training data as
\begin{equation}
R_{emp}(f)=\dfrac{1}{N}\sum\limits_{i=1}^{N} L\left(y_i,f(x_i)\right)
\end{equation}
which is also called the empirical loss or \textbf{empirical risk}.
\end{definition}

You can define your own loss function, but if you're a novice, you're probably better off using one from the literature. There are conditions that loss functions should meet\footnote{\url{http://t.cn/zTrDxLO}}:
\begin{enumerate}
\item They should approximate the actual loss you're trying to minimize. As noted above, the standard loss function for classification is the zero-one loss (misclassification rate), and the loss functions used for training classifiers are approximations of it.
\item The loss function should work with your intended optimization algorithm. This is why the zero-one loss is not used directly: it doesn't work with gradient-based optimization methods, since it doesn't have a well-defined gradient (or even a subgradient, as the hinge loss for SVMs has).

The main algorithm that optimizes the zero-one loss directly is the old perceptron algorithm (Chapter~\ref{chap:Perceptron}).
\end{enumerate}

\subsubsection{ERM and SRM}
\begin{definition}
ERM (Empirical risk minimization)
\begin{equation}
\min\limits _{f \in \mathcal{F}} R_{emp}(f)=\min\limits _{f \in \mathcal{F}} \dfrac{1}{N}\sum\limits_{i=1}^{N} L\left(y_i,f(x_i)\right)
\end{equation}
\end{definition}

\begin{definition}
Structural risk
\begin{equation}
R_{srm}(f)=\dfrac{1}{N}\sum\limits_{i=1}^{N} L\left(y_i,f(x_i)\right) +\lambda J(f)
\end{equation}
where $J(f)$ measures the complexity of the model and $\lambda \geq 0$ balances the two terms.
\end{definition}

\begin{definition}
SRM (Structural risk minimization)
\begin{equation}
\min\limits _{f \in \mathcal{F}} R_{srm}(f)=\min\limits _{f \in \mathcal{F}} \dfrac{1}{N}\sum\limits_{i=1}^{N} L\left(y_i,f(x_i)\right) +\lambda J(f)
\end{equation}
\end{definition}

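The difference between empirical and structural risk can be made concrete in code. This is our own sketch (names and data are illustrative), using squared loss and an $L_2$ penalty as one common choice of $J(f)$:

```python
def empirical_risk(loss, ys, preds):
    # R_emp(f) = (1/N) sum_i L(y_i, f(x_i))
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)

def structural_risk(loss, ys, preds, weights, lam):
    # R_srm(f) = R_emp(f) + lambda * J(f), here with J(f) = ||w||^2
    return empirical_risk(loss, ys, preds) + lam * sum(w * w for w in weights)

sq = lambda y, p: (y - p) ** 2
ys, preds, w = [1.0, 2.0], [0.5, 2.5], [3.0, 4.0]
print(empirical_risk(sq, ys, preds))           # 0.25
print(structural_risk(sq, ys, preds, w, 0.1))  # 0.25 + 0.1 * 25 = 2.75
```

Minimizing the first expression over the hypothesis space is ERM; minimizing the second is SRM, which penalizes complex models even when they fit the training data well.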
\subsection{Algorithm}
The algorithm here means the training algorithm (or learning algorithm) used to compute the optimal result according to the strategy. It is a procedural concept.

\section{Cross validation}
\begin{definition}
\textbf{Cross validation}, sometimes called \emph{rotation estimation}, is a \emph{model validation} technique for assessing how the results of a statistical analysis will generalize to an independent data set\footnote{\url{http://en.wikipedia.org/wiki/Cross-validation_(statistics)}}.
\end{definition}

Common types of cross-validation:
\begin{enumerate}
\item K-fold cross-validation. The original sample is randomly partitioned into $k$ equal-sized subsamples. Of the $k$ subsamples, a single subsample is retained as the validation data for testing the model, and the remaining $k-1$ subsamples are used as training data.
\item 2-fold cross-validation. Also called simple cross-validation or the holdout method. This is the simplest variation of k-fold cross-validation, with $k=2$.
\item Leave-one-out cross-validation (\emph{LOOCV}). Here $k=M$, the number of original samples.
\end{enumerate}

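The k-fold splitting step described above can be sketched as follows; this is our own minimal example (a real run would train and evaluate a model on each train/validation pair):

```python
import random

def kfold_indices(n, k, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # randomly partition the original sample
    folds = [idx[i::k] for i in range(k)]  # k (nearly) equal-sized subsamples
    for i in range(k):
        val = folds[i]                     # the held-out subsample
        train = [j for f in folds if f is not folds[i] for j in f]
        yield train, val

for train, val in kfold_indices(10, 5):
    print(len(train), len(val))            # 8 2, printed five times
```

Setting `k=2` gives the holdout method, and `k=n` gives LOOCV.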
\section{Linear Regression}
Given
\begin{equation}
\begin{array}{lcl}
\mathcal{D}=\left\{(\vec{x}_i,y_i) | i=1:M\right\} \\
\mathcal{H}=\left\{f(\vec{x}_i)=\vec{w}^T\vec{x}_i+b | i=1:M\right\}\\
L(\vec{w},b)=\sum\limits_{i=1}^{M} \left(y_i-f(\vec{x}_i)\right)^2\\
\end{array}
\end{equation}

Let $\widehat{\vec{w}}=\left(\vec{w}^T,b\right)^T$, and
\begin{equation}
\widehat{\vec{X}}=\left(\begin{array}{c}
\widehat{\vec{x}}_1^T\\
\widehat{\vec{x}}_2^T\\
\vdots \\
\widehat{\vec{x}}_M^T\\
\end{array}
\right), \text{ where } \widehat{\vec{x}}_i=\left(\vec{x}_i^T,1\right)^T
\end{equation}

We can get
\begin{equation}
\begin{array}{l}
L(\widehat{\vec{w}})=\left(\vec{y}-\widehat{\vec{X}}\widehat{\vec{w}}\right)^T\left(\vec{y}-\widehat{\vec{X}}\widehat{\vec{w}}\right)\\
\dfrac{\partial L}{\partial{\widehat{\vec{w}}}}=-2\widehat{\vec{X}}^T\vec{y}+2\widehat{\vec{X}}^T\widehat{\vec{X}}\widehat{\vec{w}}=0\\
\widehat{\vec{X}}^T\vec{y}=\widehat{\vec{X}}^T\widehat{\vec{X}}\widehat{\vec{w}}\\
\widehat{\vec{w}}=\left(\widehat{\vec{X}}^T\widehat{\vec{X}}\right)^{-1}\widehat{\vec{X}}^T\vec{y}
\end{array}
\end{equation}

If $\widehat{\vec{X}}^T\widehat{\vec{X}}$ is singular, the pseudo-inverse can be used, or else the technique of ridge regression described below can be applied.
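The closed-form solution above can be verified numerically. This is our own pure-Python sketch for the one-feature case, where $\widehat{\vec{X}}^T\widehat{\vec{X}}$ is a $2\times 2$ matrix that we invert directly (the data is an illustrative example):

```python
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]   # generated by y = 2x + 1

# Build Xh^T Xh (2x2) and Xh^T y (2-vector), with x_hat_i = (x_i, 1)^T
s_xx = sum(x * x for x in xs)
s_x = sum(xs)
n = len(xs)
s_xy = sum(x * y for x, y in zip(xs, ys))
s_y = sum(ys)

# Solve [[s_xx, s_x], [s_x, n]] @ (w, b) = (s_xy, s_y) by inverting the 2x2 matrix
det = s_xx * n - s_x * s_x
w = (n * s_xy - s_x * s_y) / det
b = (s_xx * s_y - s_x * s_xy) / det
print(w, b)  # 2.0 1.0
```

In higher dimensions the same normal equations are typically solved with a numerical linear-algebra routine (e.g. a least-squares or pseudo-inverse solver), which also handles the singular case mentioned above.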
@@ -0,0 +1,98 @@
\chapter{Perceptron}
\label{chap:Perceptron}

\section{Model}
\begin{equation}
\mathcal{H}:f(\vec{x})=\text{sign}(\vec{w} \cdot \vec{x}+b)
\end{equation}
where $\text{sign}(x)=\begin{cases}+1, & x \geq 0\\-1, & x<0\\\end{cases}$, see Fig.~\ref{fig:perceptron}\footnote{\url{https://en.wikipedia.org/wiki/Perceptron}}.
\begin{figure}[hbtp]
\centering
\includegraphics[scale=.50]{figures/perceptron.png}
\caption{Perceptron}
\label{fig:perceptron}
\end{figure}

The perceptron is a binary linear classifier, i.e. a discriminative model.

\section{Strategy}
The loss on a misclassified example, and the resulting empirical risk, are
\begin{eqnarray}
L(\vec{w},b)&=&-y_i(\vec{w} \cdot \vec{x}_i+b)\\
R_{emp}(f)&=&-\sum\limits_i y_i(\vec{w} \cdot \vec{x}_i+b)
\end{eqnarray}
where the sum runs over the misclassified examples.

\section{Learning algorithm}
\subsection{Primal form}
Stochastic gradient descent; the pseudocode is as follows:
\begin{algorithm}[htbp]
%\SetAlgoLined
\SetAlgoNoLine

$\vec{w} \leftarrow 0;\; b \leftarrow 0;\; k \leftarrow 0$\;
\While{mistakes are made within the for loop}{
\For{$i\leftarrow 1$ \KwTo $N$}{
\If{$y_i(\vec{w}^T\vec{x}_i+b) \leq 0$}{
$\vec{w} \leftarrow \vec{w}+\eta y_i \vec{x}_i$\;
$b \leftarrow b+\eta y_i$\;
$k \leftarrow k+1$\;
}
}
}
\caption{Perceptron learning algorithm, primal form}
\end{algorithm}

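The primal algorithm above is short enough to run directly. This is our own sketch; the toy data set (three linearly separable points) is an illustrative example:

```python
def perceptron_train(X, y, eta=1.0, max_epochs=100):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # misclassified when y_i (w . x_i + b) <= 0
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]  # w <- w + eta y_i x_i
                b += eta * yi                                      # b <- b + eta y_i
                mistakes += 1
        if mistakes == 0:  # stop once a full pass makes no mistakes
            break
    return w, b

X = [[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]]
y = [1, 1, -1]
w, b = perceptron_train(X, y)
print(w, b)  # [1.0, 1.0] -3.0
```

The returned hyperplane $x^{(1)}+x^{(2)}-3=0$ separates the two classes.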
\subsection{Convergence}
\begin{theorem}
(\textbf{Novikoff}) If the training data set $\mathcal{D}$ is linearly separable, then
\begin{enumerate}
\item There exists a hyperplane $\vec{w}_{opt} \cdot \vec{x}+b_{opt}=0$, with $\abs{\abs{\widehat{\vec{w}}_{opt}}}=1$, which correctly separates all samples, and $\exists\gamma>0,\forall i, y_i(\vec{w}_{opt} \cdot \vec{x}_i+b_{opt}) \geq \gamma$
\item The number of mistakes satisfies $k \leq \left(\dfrac{R}{\gamma}\right)^2$, where $R=\max\limits_{1 \leq i \leq N} \abs{\abs{\widehat{\vec{x}}_i}}$
\end{enumerate}
\end{theorem}

\begin{proof}
(1) Let $\gamma=\min\limits_{i} y_i(\vec{w}_{opt} \cdot \vec{x}_i+b_{opt})$; then $y_i(\vec{w}_{opt} \cdot \vec{x}_i+b_{opt}) \geq \gamma$ for all $i$.

(2) The algorithm starts from $\widehat{\vec{w}}_0=0$; whenever an instance is misclassified, the weight is updated. Let $\widehat{\vec{w}}_{k-1}$ denote the extended weight vector before the $k$-th misclassified instance; then
\begin{eqnarray}
y_i(\widehat{\vec{w}}_{k-1} \cdot \widehat{\vec{x}}_i)&=&y_i(\vec{w}_{k-1} \cdot \vec{x}_i+b_{k-1}) \leq 0\\
\widehat{\vec{w}}_k&=&\widehat{\vec{w}}_{k-1}+\eta y_i \widehat{\vec{x}}_i
\end{eqnarray}

By induction on $k$ we can infer the following two inequalities (the proofs are omitted):
\begin{enumerate}
\item $\widehat{\vec{w}}_k \cdot \widehat{\vec{w}}_{opt} \geq k\eta\gamma$
\item $\abs{\abs{\widehat{\vec{w}}_k}}^2 \leq k\eta^2R^2$
\end{enumerate}

From the two inequalities above, using $\abs{\abs{\widehat{\vec{w}}_{opt}}}=1$, we get
\begin{eqnarray}
\nonumber k\eta\gamma & \leq & \widehat{\vec{w}}_k \cdot \widehat{\vec{w}}_{opt} \leq \abs{\abs{\widehat{\vec{w}}_k}}\abs{\abs{\widehat{\vec{w}}_{opt}}} \leq \sqrt k \eta R \\
\nonumber k^2\gamma^2 & \leq & kR^2 \\
\nonumber \text{i.e. } k & \leq & \left(\dfrac{R}{\gamma}\right)^2
\end{eqnarray}
\end{proof}

\subsection{Dual form}
In the dual form the weights are expressed as a linear combination of the training instances:
\begin{eqnarray}
\vec{w}&=&\sum\limits_{i=1}^{N} \alpha_iy_i\vec{x}_i \\
b&=&\sum\limits_{i=1}^{N} \alpha_iy_i \\
f(\vec{x})&=&\text{sign}\left(\sum\limits_{j=1}^{N} \alpha_jy_j\vec{x}_j \cdot \vec{x}+b\right)
\end{eqnarray}

\begin{algorithm}[htbp]
%\SetAlgoLined
\SetAlgoNoLine

$\vec{\alpha} \leftarrow 0;\; b \leftarrow 0;\; k \leftarrow 0$\;
\While{mistakes are made within the for loop}{
\For{$i\leftarrow 1$ \KwTo $N$}{
\If{$y_i\left(\sum\limits_{j=1}^{N} \alpha_jy_j\vec{x}_j \cdot \vec{x}_i+b\right) \leq 0$}{
$\alpha_i \leftarrow \alpha_i+\eta$\;
$b \leftarrow b+\eta y_i$\;
$k \leftarrow k+1$\;
}
}
}
\caption{Perceptron learning algorithm, dual form}
\end{algorithm}
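The dual algorithm above can be run on the same toy data as the primal form; this is our own sketch. Note that the inner products $\vec{x}_j \cdot \vec{x}_i$ can be precomputed as a Gram matrix, which is what makes the dual form amenable to kernels:

```python
def dual_perceptron_train(X, y, eta=1.0, max_epochs=100):
    n = len(X)
    # Gram matrix of pairwise inner products x_i . x_j
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    alpha = [0.0] * n
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            s = sum(alpha[j] * y[j] * gram[j][i] for j in range(n)) + b
            if y[i] * s <= 0:
                alpha[i] += eta    # only the misclassified point's alpha grows
                b += eta * y[i]
                mistakes += 1
        if mistakes == 0:
            break
    # recover the primal weights: w = sum_i alpha_i y_i x_i
    w = [sum(alpha[i] * y[i] * X[i][d] for i in range(n))
         for d in range(len(X[0]))]
    return alpha, w, b

X = [[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]]
y = [1, 1, -1]
alpha, w, b = dual_perceptron_train(X, y)
print(alpha, w, b)  # [2.0, 0.0, 5.0] [1.0, 1.0] -3.0
```

The recovered $\vec{w}$ and $b$ match the result of the primal algorithm on the same data.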