First commit: finished writing the perceptron chapter.
soulmachine committed May 15, 2013
1 parent 0d3c3a2 commit eb96d6f
Showing 26 changed files with 8,140 additions and 2 deletions.
22 changes: 20 additions & 2 deletions README.md
@@ -1,4 +1,22 @@
machine-learning-cheat-sheet
Machine learning cheat sheet
============================

classical equations and diagrams of machine learning
This cheat sheet contains many classical equations and diagrams of machine learning, which will help you quickly recall the key knowledge and ideas.

The cheat sheet will also appeal to anyone preparing for a job interview related to machine learning.

##LaTeX template
This open-source book adopts the [Springer LaTeX template](http://www.springer.com/authors/book+authors?SGWID=0-154102-12-970131-0).

##How to compile on Windows
1. Install [TeX Live 2012](http://www.tug.org/texlive/), then add its `bin` path, for example `D:\texlive\2012\bin\win32`, to the PATH environment variable.
2. Install [TeXstudio](http://texstudio.sourceforge.net/).
3. Configure TeXstudio.
Run TeXstudio, click `Options-->Configure Texstudio-->Commands`, and set `XeLaTeX` to `xelatex -synctex=1 -interaction=nonstopmode %.tex`.

Click `Options-->Configure Texstudio-->Build`,
set `Build & View` to `Compile & View`,
set `Default Compiler` to `XeLaTeX`, and
set `PDF Viewer` to `Internal PDF Viewer(windowed)`, so that the preview opens in a standalone window, which is convenient.
4. Compile. Open `main.tex` with TeXstudio, click the green arrow on the menu bar, and compilation will start.
In the messages window below you can see that the compilation command TeXstudio uses is `xelatex -synctex=1 -interaction=nonstopmode "ACM-cheat-sheet".tex`.
11 changes: 11 additions & 0 deletions acknow.tex
@@ -0,0 +1,11 @@
%%%%%%%%%%%%%%%%%%%%%%acknow.tex%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% sample acknowledgement chapter
%
% Use this file as a template for your own input.
%
%%%%%%%%%%%%%%%%%%%%%%%% Springer %%%%%%%%%%%%%%%%%%%%%%%%%%

\extrachap{Acknowledgements}

Use the template \emph{acknow.tex} together with the Springer document class SVMono (monograph-type books) or SVMult (edited books) if you prefer to set your acknowledgement section as a separate chapter instead of including it as last part of your preface.

18 changes: 18 additions & 0 deletions acronym.tex
@@ -0,0 +1,18 @@
%%%%%%%%%%%%%%%%%%%%%%acronym.tex%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% sample list of acronyms
%
% Use this file as a template for your own input.
%
%%%%%%%%%%%%%%%%%%%%%%%% Springer %%%%%%%%%%%%%%%%%%%%%%%%%%

\extrachap{Acronyms}

Use the template \emph{acronym.tex} together with the Springer document class SVMono (monograph-type books) or SVMult (edited books) to style your list(s) of abbreviations or symbols in the Springer layout.

Lists of abbreviations\index{acronyms, list of}, symbols\index{symbols, list of} and the like are easily formatted with the help of the Springer-enhanced \verb|description| environment.

\begin{description}[CABR]
\item[ABC]{Spelled-out abbreviation and definition}
\item[BABI]{Spelled-out abbreviation and definition}
\item[CABR]{Spelled-out abbreviation and definition}
\end{description}
79 changes: 79 additions & 0 deletions appendix.tex
@@ -0,0 +1,79 @@
%%%%%%%%%%%%%%%%%%%%% appendix.tex %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% sample appendix
%
% Use this file as a template for your own input.
%
%%%%%%%%%%%%%%%%%%%%%%%% Springer-Verlag %%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Chapter Heading}
\label{introA} % Always give a unique label
% use \chaptermark{}
% to alter or adjust the chapter heading in the running head

Use the template \emph{appendix.tex} together with the Springer document class SVMono (monograph-type books) or SVMult (edited books) to style the appendix of your book in the Springer layout.


\section{Section Heading}
\label{sec:A1}
% Always give a unique label
% and use \ref{<label>} for cross-references
% and \cite{<label>} for bibliographic references
% use \sectionmark{}
% to alter or adjust the section heading in the running head
Instead of simply listing headings of different levels we recommend to let every heading be followed by at least a short passage of text. Further on please use the \LaTeX\ automatism for all your cross-references and citations.


\subsection{Subsection Heading}
\label{sec:A2}
Instead of simply listing headings of different levels we recommend to let every heading be followed by at least a short passage of text. Further on please use the \LaTeX\ automatism for all your cross-references and citations as has already been described in Sect.~\ref{sec:A1}.

For multiline equations we recommend to use the \verb|eqnarray| environment.
\begin{eqnarray}
\vec{a}\times\vec{b}=\vec{c} \nonumber\\
\vec{a}\times\vec{b}=\vec{c}
\label{eq:A01}
\end{eqnarray}

\subsubsection{Subsubsection Heading}
Instead of simply listing headings of different levels we recommend to let every heading be followed by at least a short passage of text. Further on please use the \LaTeX\ automatism for all your cross-references and citations as has already been described in Sect.~\ref{sec:A2}.

Please note that the first line of text that follows a heading is not indented, whereas the first lines of all subsequent paragraphs are.

% For figures use
%
\begin{figure}[t]
\sidecaption[t]
% Use the relevant command for your figure-insertion program
% to insert the figure file.
% For example, with the graphicx style use
\includegraphics[scale=.65]{figure}
%
% If no graphics program available, insert a blank space i.e. use
%\picplace{5cm}{2cm} % Give the correct figure height and width in cm
%
\caption{Please write your figure caption here}
\label{fig:A1} % Give a unique label
\end{figure}

% For tables use
%
\begin{table}
\caption{Please write your table caption here}
\label{tab:A1} % Give a unique label
%
% Follow this input for your own table layout
%
\begin{tabular}{p{2cm}p{2.4cm}p{2cm}p{4.9cm}}
\hline\noalign{\smallskip}
Classes & Subclass & Length & Action Mechanism \\
\noalign{\smallskip}\hline\noalign{\smallskip}
Translation & mRNA$^a$ & 22 (19--25) & Translation repression, mRNA cleavage\\
Translation & mRNA cleavage & 21 & mRNA cleavage\\
Translation & mRNA & 21--22 & mRNA cleavage\\
Translation & mRNA & 24--26 & Histone and DNA Modification\\
\noalign{\smallskip}\hline\noalign{\smallskip}
\end{tabular}
$^a$ Table foot note (with superscript)
\end{table}
%
16 changes: 16 additions & 0 deletions cblist.tex
@@ -0,0 +1,16 @@
%%%%%%%%%%%%%%%%%%%%clist.tex %%%%%%%%%%%%%%%%%%%%%%%%
%
% sample list of contributors and their addresses
%
% Use this file as a template for your own input.
%
%%%%%%%%%%%%%%%%%%%%%%%% Springer %%%%%%%%%%%%%%%%%%%%
\contributors

\begin{thecontriblist}
Firstname Surname
\at ABC Institute, 123 Prime Street, Daisy Town, NA 01234, USA, \email{smith@smith.edu}
\and
Firstname Surname
\at XYZ Institute, Technical University, Albert-Schweitzer-Str. 34, 1000 Berlin, Germany, \email{meier@tu.edu}
\end{thecontriblist}
127 changes: 127 additions & 0 deletions chapterIntroduction.tex
@@ -0,0 +1,127 @@
\chapter{Introduction}

\section{Types of machine learning}
\begin{equation}\nonumber
\text{Machine learning}\begin{cases}
\text{Supervised learning} \begin{cases} \text{Classification} \\ \text{Regression} \end{cases}\\
\text{Unsupervised learning} \begin{cases} \text{Discovering clusters} \\ \text{Discovering latent factors} \\ \text{Discovering graph structure} \\ \text{Matrix completion} \end{cases}\\
\end{cases}
\end{equation}

\section{Three elements of a machine learning method}

\textbf{method = model + strategy + algorithm}

\subsection{Model}
In supervised learning, a model is the decision function or conditional probability distribution to be learned. The model's hypothesis space contains all candidate decision functions $f(x)$ or conditional probability distributions $P(y|\vec{x})$.

\subsection{Strategy}
Given a model's hypothesis space, we need a strategy to select which hypothesis is optimal.

\subsubsection{Loss function and risk function}

\begin{definition}
In order to measure how well a function fits the training data, a \textbf{loss function} $L:Y \times Y \rightarrow [0,+\infty)$ is defined. For a training example $(x_i,y_i)$, the loss of predicting the value $\widehat{y}$ is $L(y_i,\widehat{y})$.
\end{definition}

The following are some common loss functions:
\begin{enumerate}
\item 0-1 loss function $L(Y,f(X))=I(Y \neq f(X))=\begin{cases} 1, & Y \neq f(X) \\ 0, & Y=f(X) \end{cases}$
\item Quadratic loss function $L(Y,f(X))=\left(Y-f(X)\right)^2$
\item Absolute loss function $L(Y,f(X))=\abs{Y-f(X)}$
\item Logarithmic loss function $L(Y,P(Y|X))=-\log{P(Y|X)}$
\end{enumerate}
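
As a quick worked example (with hypothetical numbers): if $y_i=1$ and a regressor predicts $\widehat{y}=0.8$, the quadratic loss is $(1-0.8)^2=0.04$ and the absolute loss is $0.2$; if a probabilistic classifier outputs $P(y_i=1|\vec{x}_i)=0.8$, the logarithmic loss is $-\log 0.8 \approx 0.22$ (natural logarithm), while the 0-1 loss is $0$ whenever the predicted label equals $y_i$.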

\begin{definition}
The risk of function $f$ is defined as the expected loss of $f$:
\begin{equation}
R_{exp}(f)=E_p\left[L\left(Y,f(X)\right)\right]=\int _{X \times Y} L\left(y,f(x)\right)P(x,y)dxdy
\end{equation}
which is also called expected loss or \textbf{risk function}.
\end{definition}

\begin{definition}
The risk function $R_{exp}(f)$ can be estimated from the training data as
\begin{equation}
R_{emp}(f)=\dfrac{1}{N}\sum\limits_{i=1}^{N} L\left(y_i,f(x_i)\right)
\end{equation}
which is also called empirical loss or \textbf{empirical risk}.
\end{definition}

You can define your own loss function, but if you're a novice, you're probably better off using one from the literature. There are conditions that loss functions should meet\footnote{\url{http://t.cn/zTrDxLO}}:
\begin{enumerate}
\item They should approximate the actual loss you're trying to minimize. The standard loss function for classification is the zero-one loss (misclassification rate), and the losses used for training classifiers are approximations of it.
\item The loss function should work with your intended optimization algorithm. That's why the zero-one loss is not used directly: it doesn't work with gradient-based optimization methods, since it has neither a well-defined gradient nor even a subgradient (unlike, say, the hinge loss used by SVMs).

The main algorithm that optimizes the zero-one loss directly is the old perceptron algorithm (Chapter~\ref{chap:Perceptron}).
\end{enumerate}

\subsubsection{ERM and SRM}
\begin{definition}
ERM (empirical risk minimization)
\begin{equation}
\min\limits _{f \in \mathcal{F}} R_{emp}(f)=\min\limits _{f \in \mathcal{F}} \dfrac{1}{N}\sum\limits_{i=1}^{N} L\left(y_i,f(x_i)\right)
\end{equation}
\end{definition}

\begin{definition}
Structural risk
\begin{equation}
R_{srm}(f)=\dfrac{1}{N}\sum\limits_{i=1}^{N} L\left(y_i,f(x_i)\right) +\lambda J(f)
\end{equation}
where $J(f)$ measures the complexity of the model and $\lambda \geq 0$ is a coefficient that trades off empirical risk against model complexity.
\end{definition}

\begin{definition}
SRM (structural risk minimization)
\begin{equation}
\min\limits _{f \in \mathcal{F}} R_{srm}(f)=\min\limits _{f \in \mathcal{F}} \dfrac{1}{N}\sum\limits_{i=1}^{N} L\left(y_i,f(x_i)\right) +\lambda J(f)
\end{equation}
\end{definition}
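
A familiar instance of SRM is $L_2$-regularized least squares (ridge regression, mentioned at the end of the linear regression section below), which combines the quadratic loss with the regularizer $J(f)=\abs{\abs{\vec{w}}}^2$.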

\subsection{Algorithm}
The algorithm here means the training (or learning) algorithm, i.e., the concrete procedure used to compute the hypothesis that the strategy deems optimal.

\section{Cross validation}
\begin{definition}
\textbf{Cross validation}, sometimes called \emph{rotation estimation}, is a \emph{model validation} technique for assessing how the results of a statistical analysis will generalize to an independent data set\footnote{\url{http://en.wikipedia.org/wiki/Cross-validation_(statistics)}}.
\end{definition}

Common types of cross-validation:
\begin{enumerate}
\item K-fold cross-validation. The original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k-1 subsamples are used as training data; the process is repeated k times so that each subsample is used exactly once for validation, and the k results are averaged.
\item 2-fold cross-validation. Also called simple cross-validation or the holdout method, this is the simplest variation of k-fold cross-validation, with k=2.
\item Leave-one-out cross-validation (\emph{LOOCV}): k=M, the number of original samples.
\end{enumerate}
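
For example (with made-up numbers), with $M=100$ samples and $k=5$, each fold holds 20 samples; in each of the 5 rounds the model is trained on the other 80 samples and validated on the held-out 20, and the 5 validation errors are averaged into a single estimate of generalization performance.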

\section{Linear Regression}
Given
\begin{equation}
\begin{array}{lcl}
\mathcal{D}=\left\{(\vec{x}_i,y_i) | i=1:M\right\} \\
\mathcal{H}=\left\{f(\vec{x}_i)=\vec{w}^T\vec{x}_i+b | i=1:M\right\}\\
L(\vec{w},b)=\sum\limits_{i=1}^{M} \left(y_i-f(\vec{x}_i)\right)^2\\
\end{array}
\end{equation}

Let $\widehat{\vec{w}}=\left(\vec{w}^T,b\right)^T$, and
\begin{equation}
\widehat{\vec{X}}=\left(\begin{array}{lcr}
\widehat{\vec{x}}_1^T\\
\widehat{\vec{x}}_2^T\\
\vdots \\
\widehat{\vec{x}}_M^T\\
\end{array}
\right), \text{ where } \widehat{\vec{x}}_i=\left(\vec{x}_i^T,1\right)^T
\end{equation}

We can get
\begin{equation}
\begin{array}{lcr}
L(\widehat{\vec{w}})=\left(\vec{y}-\widehat{\vec{X}}\widehat{\vec{w}}\right)^T\left(\vec{y}-\widehat{\vec{X}}\widehat{\vec{w}}\right)\\
\dfrac{\partial L}{\partial{\widehat{\vec{w}}}}=-2\widehat{\vec{X}}^T\vec{y}+2\widehat{\vec{X}}^T\widehat{\vec{X}}\widehat{\vec{w}}=0\\
\widehat{\vec{X}}^T\vec{y}=\widehat{\vec{X}}^T\widehat{\vec{X}}\widehat{\vec{w}}\\
\widehat{\vec{w}}=\left(\widehat{\vec{X}}^T\widehat{\vec{X}}\right)^{-1}\widehat{\vec{X}}^T\vec{y}
\end{array}
\end{equation}
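
The gradient step above uses the standard matrix calculus identities $\dfrac{\partial}{\partial \vec{a}}\left(\vec{b}^T\vec{a}\right)=\vec{b}$ and $\dfrac{\partial}{\partial \vec{a}}\left(\vec{a}^T\vec{B}\vec{a}\right)=2\vec{B}\vec{a}$ for symmetric $\vec{B}$, applied with $\vec{b}=\widehat{\vec{X}}^T\vec{y}$ and $\vec{B}=\widehat{\vec{X}}^T\widehat{\vec{X}}$.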

If $\widehat{\vec{X}}^T\widehat{\vec{X}}$ is singular, the pseudo-inverse can be used, or else the technique of ridge regression described below can be applied.
98 changes: 98 additions & 0 deletions chapterPerceptron.tex
@@ -0,0 +1,98 @@
\chapter{Perceptron}
\label{chap:Perceptron}

\section{Model}
\begin{equation}
\mathcal{H}:f(\vec{x})=\text{sign}(\vec{w} \cdot \vec{x}+b)
\end{equation}
where $\text{sign}(x)=\begin{cases}+1, & x \geq 0\\-1, & x<0\end{cases}$, see Fig.~\ref{fig:perceptron}\footnote{\url{https://en.wikipedia.org/wiki/Perceptron}}.
\begin{figure}[hbtp]
\centering
\includegraphics[scale=.50]{figures/perceptron.png}
\caption{Perceptron}
\label{fig:perceptron}
\end{figure}

The perceptron is a binary linear classifier; it is a discriminative model.

\section{Strategy}
\begin{eqnarray}
L(\vec{w},b)&=&-y_i(\vec{w} \cdot \vec{x}_i+b)\\
R_{emp}(f)&=&-\sum\limits_{\vec{x}_i \in \mathcal{M}} y_i(\vec{w} \cdot \vec{x}_i+b)
\end{eqnarray}
where $\mathcal{M}$ is the set of misclassified examples, i.e. those with $y_i(\vec{w} \cdot \vec{x}_i+b) \leq 0$.
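
A geometric remark: for a misclassified example, $-y_i(\vec{w} \cdot \vec{x}_i+b)=\abs{\vec{w} \cdot \vec{x}_i+b}$, which is $\abs{\abs{\vec{w}}}$ times the distance from $\vec{x}_i$ to the hyperplane $\vec{w} \cdot \vec{x}+b=0$. Minimizing $R_{emp}$ therefore pushes misclassified points toward, and eventually across, the decision boundary.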

\section{Learning algorithm}
\subsection{Primal form}
The learning algorithm is stochastic gradient descent; the pseudocode is as follows:
\begin{algorithm}[htbp]
%\SetAlgoLined
\SetAlgoNoLine

$\vec{w} \leftarrow 0;\; b \leftarrow 0;\; k \leftarrow 0$\;
\While{no mistakes made within the for loop}{
\For{$i\leftarrow 1$ \KwTo $N$}{
\If{$y_i(\vec{w}^T\vec{x}_i+b) \leq 0$}{
$\vec{w} \leftarrow \vec{w}+\eta y_i \vec{x}_i$\;
$b \leftarrow b+\eta y_i$\;
$k \leftarrow k+1$\;
}
}
}
\caption{Perceptron learning algorithm, primal form}
\end{algorithm}
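
A brief remark on why a single update helps: for the misclassified example $(\vec{x}_i,y_i)$ that triggered it, the updated parameters satisfy $y_i(\vec{w}' \cdot \vec{x}_i+b')=y_i(\vec{w} \cdot \vec{x}_i+b)+\eta\left(\abs{\abs{\vec{x}_i}}^2+1\right)$, so the functional margin on that example strictly increases. Each such step is exactly a stochastic gradient descent update on the per-example loss above, with learning rate $\eta$.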

\subsection{Convergence}
\begin{theorem}
(\textbf{Novikoff}) If the training data set $\mathcal{D}$ is linearly separable, then
\begin{enumerate}
\item There exists a hyperplane, written in extended form as $\widehat{\vec{w}}_{opt} \cdot \widehat{\vec{x}}=\vec{w}_{opt} \cdot \vec{x}+b_{opt}=0$ with $\abs{\abs{\widehat{\vec{w}}_{opt}}}=1$, which separates all samples correctly, and $\exists\gamma>0$ such that $\forall i,\ y_i(\vec{w}_{opt} \cdot \vec{x}_i+b_{opt}) \geq \gamma$
\item $k \leq \left(\dfrac{R}{\gamma}\right)^2$, where $R=\max\limits_{1 \leq i \leq N} \abs{\abs{\widehat{\vec{x}}_i}}$
\end{enumerate}
\end{theorem}

\begin{proof}
(1) Let $\gamma=\min\limits_{i} y_i(\vec{w}_{opt} \cdot \vec{x}_i+b_{opt})$; since $\mathcal{D}$ is finite and linearly separable, $\gamma>0$ and $y_i(\vec{w}_{opt} \cdot \vec{x}_i+b_{opt}) \geq \gamma$ for all $i$.

(2) The algorithm starts from $\widehat{\vec{w}}_0=0$; whenever an instance is misclassified, the weight is updated. Let $\widehat{\vec{w}}_{k-1}$ denote the extended weight vector before the $k$-th misclassified instance is encountered; then we get
\begin{eqnarray}
y_i(\widehat{\vec{w}}_{k-1} \cdot \widehat{\vec{x}_i})&=&y_i(\vec{w}_{k-1} \cdot \vec{x}_i+b_{k-1}) \leq 0\\
\widehat{\vec{w}}_k&=&\widehat{\vec{w}}_{k-1}+\eta y_i \widehat{\vec{x}_i}
\end{eqnarray}

From these we can infer the following two inequalities; both follow by induction on $k$, as sketched after this list.
\begin{enumerate}
\item $\widehat{\vec{w}}_k \cdot \widehat{\vec{w}}_{opt} \geq k\eta\gamma$
\item $\abs{\abs{\widehat{\vec{w}}_k}}^2 \leq k\eta^2R^2$
\end{enumerate}
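
A brief verification (using the margin condition $y_i(\widehat{\vec{w}}_{opt} \cdot \widehat{\vec{x}}_i) \geq \gamma$, the misclassification condition $y_i(\widehat{\vec{w}}_{k-1} \cdot \widehat{\vec{x}}_i) \leq 0$, and $\abs{\abs{\widehat{\vec{x}}_i}} \leq R$):
\begin{eqnarray}
\nonumber \widehat{\vec{w}}_k \cdot \widehat{\vec{w}}_{opt} &=& \widehat{\vec{w}}_{k-1} \cdot \widehat{\vec{w}}_{opt}+\eta y_i\left(\widehat{\vec{x}}_i \cdot \widehat{\vec{w}}_{opt}\right) \geq \widehat{\vec{w}}_{k-1} \cdot \widehat{\vec{w}}_{opt}+\eta\gamma \geq \cdots \geq k\eta\gamma \\
\nonumber \abs{\abs{\widehat{\vec{w}}_k}}^2 &=& \abs{\abs{\widehat{\vec{w}}_{k-1}}}^2+2\eta y_i\left(\widehat{\vec{w}}_{k-1} \cdot \widehat{\vec{x}}_i\right)+\eta^2\abs{\abs{\widehat{\vec{x}}_i}}^2 \leq \abs{\abs{\widehat{\vec{w}}_{k-1}}}^2+\eta^2R^2 \leq \cdots \leq k\eta^2R^2
\end{eqnarray}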

From the above two inequalities we get
\begin{eqnarray}
\nonumber k\eta\gamma & \leq & \widehat{\vec{w}}_k \cdot \widehat{\vec{w}}_{opt} \leq \abs{\abs{\widehat{\vec{w}}_k}}\abs{\abs{\widehat{\vec{w}}_{opt}}} \leq \sqrt k \eta R \\
\nonumber k^2\gamma^2 & \leq & kR^2 \\
\nonumber \text{i.e. } k & \leq & \left(\dfrac{R}{\gamma}\right)^2
\end{eqnarray}
\end{proof}

\subsection{Dual form}
\begin{eqnarray}
\vec{w}&=&\sum\limits_{i=1}^{N} \alpha_iy_i\vec{x}_i \\
b&=&\sum\limits_{i=1}^{N} \alpha_iy_i \\
f(\vec{x})&=&\text{sign}\left(\sum\limits_{j=1}^{N} \alpha_jy_j\vec{x}_j \cdot \vec{x}+b\right)
\end{eqnarray}

\begin{algorithm}[htbp]
%\SetAlgoLined
\SetAlgoNoLine

$\vec{\alpha} \leftarrow 0;\; b \leftarrow 0;\; k \leftarrow 0$\;
\While{no mistakes made within the for loop}{
\For{$i\leftarrow 1$ \KwTo $N$}{
\If{$y_i\left(\sum\limits_{j=1}^{N} \alpha_jy_j\vec{x}_j \cdot \vec{x}_i+b\right) \leq 0$}{
$\alpha_i \leftarrow \alpha_i+\eta$\;
$b \leftarrow b+\eta y_i$\;
$k \leftarrow k+1$\;
}
}
}
\caption{Perceptron learning algorithm, dual form}
\end{algorithm}
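
A remark on the dual form: $\alpha_i$ ends up equal to $\eta$ times the number of updates triggered by example $i$, and the training data enter the algorithm only through the inner products $\vec{x}_j \cdot \vec{x}_i$, which can be precomputed once and stored as the $N \times N$ Gram matrix $G=\left[\vec{x}_i \cdot \vec{x}_j\right]$.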