## Table of Contents
-[1 Intro](#intro)
<br>
-[2 Cost Function](#cost_function)
<br>
-[3 Gradient Descent](#gd)
<br>
-[4 Math Derivation Aids](#addition)

<a id='intro'></a>
## 1 Intro
The logistic function has the general form:
$$g(z) = \frac{1}{1+e^{-z}}$$
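
As a minimal sketch of $g(z)$ in plain Python (the function name `sigmoid` and the sign-based split are choices made here, not part of the notes): the split only avoids overflow in `math.exp` for large negative $z$, and both branches compute the same $g(z)$.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # so the exponent passed to math.exp is never a large positive number.
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(sigmoid(0.0))                   # 0.5
print(sigmoid(2.0) + sigmoid(-2.0))   # g(z) + g(-z) = 1, up to rounding
```

The output always lies in $[0, 1]$, which is what allows it to be read as a probability in the rest of the notes.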
<br>
For the linearly separable case, the linear decision boundary can be written as (with $x_{0} = 1$):
$$\theta_{0} + \theta_{1}x_{1}+ \dots + \theta_{n}x_{n} = \sum_{i=0}^{n}\theta_{i}x_{i}= \theta^{T}x$$
<br>
The logistic prediction function can then be written as:
$$h_{\theta}(x) = g(\theta^{T}x) = \frac{1}{1+e^{-\theta^{T}x}}$$
<br>
The class probabilities are then given by:
$$P(y=1\mid x;\theta) = h_{\theta}(x)$$
$$P(y=0\mid x;\theta) = 1-h_{\theta}(x)$$
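
The two probabilities above can be sketched as follows. The names `hypothesis`, `theta` and `x`, and the numeric values, are illustrative assumptions; the feature vector is assumed to start with $x_{0} = 1$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is assumed to start with x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [-1.0, 2.0]   # illustrative parameters (theta_0, theta_1)
x = [1.0, 0.5]        # x_0 = 1 (bias term), x_1 = 0.5

p1 = hypothesis(theta, x)   # P(y = 1 | x; theta)
p0 = 1.0 - p1               # P(y = 0 | x; theta)
print(p1, p0)               # theta^T x = 0 here, so both are 0.5
```

This point sits exactly on the decision boundary $\theta^{T}x = 0$, which is why both classes get probability 0.5.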

<a id='cost_function'></a>
## 2 Cost Function
Andrew Ng gives the per-example cost in the form:
$$Cost(h_{\theta}(x),y)= -\log(h_{\theta}(x)), \quad \text{if } y= 1$$
$$Cost(h_{\theta}(x),y)= -\log(1-h_{\theta}(x)), \quad \text{if } y= 0$$
<br>
**The cost function $J(\theta)$ can be written as:**
<font color=blue>
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} Cost(h_{\theta}(x^{(i)}),y^{(i)})\\
        = - \frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_{\theta}(x^{(i)}) + (1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right] $$ </font>
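
A minimal sketch of $J(\theta)$ in plain Python, assuming a small made-up dataset in which each row of `X` starts with $x_{0} = 1$. The name `cost` and the data are illustrative only. Note that `math.log` raises an error if $h_{\theta}(x)$ is exactly 0 or 1, which this sketch does not guard against.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ]."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m

# Toy data (hypothetical): first column is x_0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]

# With theta = 0 every prediction is 0.5, so J = log(2), about 0.6931.
print(cost([0.0, 0.0], X, y))
```

The value $\log 2$ at $\theta = 0$ is a convenient baseline: any useful parameter vector should give a lower cost on the same data.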
<br>
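The derivation is as follows. Because $y \in \{0, 1\}$, the two class probabilities combine into a single expression: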
$$P(y\mid x; \theta) = (h_{\theta}(x))^{y}(1-h_{\theta}(x))^{1-y}$$
<br>
The likelihood function $L(\theta)$ over $m$ independent samples is:
$$L(\theta) = \prod_{i=1}^{m}P(y^{(i)}\mid x^{(i)};\theta)\\
=\prod_{i=1}^{m}(h_{\theta}(x^{(i)}))^{y^{(i)}}(1-h_{\theta}(x^{(i)}))^{1-y^{(i)}}$$
<br>
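**The log-likelihood function $l(\theta)$ is:**
$$l(\theta) = \log(L(\theta))\\
=\sum_{i=1}^{m}\left[y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]$$
Maximizing $l(\theta)$ is therefore equivalent to minimizing $J(\theta) = -\frac{1}{m}l(\theta)$.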
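
To make the step from the product to the sum concrete, the sketch below evaluates both forms on a small made-up dataset and compares them. The function names, the data and `theta` are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def probs(theta, X):
    """h_theta(x^(i)) for every sample."""
    return [sigmoid(sum(t * v for t, v in zip(theta, xi))) for xi in X]

def likelihood(theta, X, y):
    """L(theta) = prod_i h_i^{y_i} * (1 - h_i)^{1 - y_i}."""
    L = 1.0
    for h, yi in zip(probs(theta, X), y):
        L *= h ** yi * (1.0 - h) ** (1 - yi)
    return L

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ]."""
    return sum(yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
               for h, yi in zip(probs(theta, X), y))

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # x_0 = 1 in column 0
y = [0, 0, 1, 1]
theta = [-1.5, 1.0]

# The two values agree up to floating-point rounding.
print(math.log(likelihood(theta, X, y)), log_likelihood(theta, X, y))
```

In practice the sum form is preferred: a product of many probabilities underflows to zero for large $m$, while the sum of logarithms stays representable.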

<a id='gd'></a>
## 3 Gradient Descent
$$\hat{\theta} = \underset{\theta}{\arg\min}\,J(\theta)$$
<br>
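The general form of gradient descent, with all $\theta_{j}$ updated simultaneously, is:
$$\theta_{j}:= \theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}}J(\theta), \quad (j=0,\dots,n)$$
This requires the partial derivative of $J(\theta)$: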
\begin{equation}
\frac{\partial}{\partial \theta_{j}}J(\theta) = - \frac{1}{m}\sum_{i=1}^{m} \left \{ y^{(i)}\frac{1}{h_{\theta}(x^{(i)})} \frac{\partial}{\partial \theta_{j}}h_{\theta}(x^{(i)}) - (1-y^{(i)})\frac{1}{1-h_{\theta}(x^{(i)})}\frac{\partial}{\partial \theta_{j}}h_{\theta}(x^{(i)}) \right \} \\
=- \frac{1}{m}\sum_{i=1}^{m}  \left [ y^{(i)}\frac{1}{g(\theta^{T}x^{(i)})} -(1-y^{(i)})\frac{1}{1-g(\theta^{T}x^{(i)})}\right ] \frac{\partial}{\partial \theta_{j}}g(\theta^{T}x^{(i)})\\
=- \frac{1}{m}\sum_{i=1}^{m}  \left [ y^{(i)}\frac{1}{g(\theta^{T}x^{(i)})} -(1-y^{(i)})\frac{1}{1-g(\theta^{T}x^{(i)})}\right ] g(\theta^{T}x^{(i)})(1-g(\theta^{T}x^{(i)}))\frac{\partial}{\partial \theta_{j}}\theta^{T}x^{(i)} \\
= - \frac{1}{m}\sum_{i=1}^{m}\left [ y^{(i)}(1-g(\theta^{T}x^{(i)}))-(1-y^{(i)})g(\theta^{T}x^{(i)})\right ]x_{j}^{(i)} \\
= - \frac{1}{m}\sum_{i=1}^{m}(y^{(i)} - g(\theta^{T}x^{(i)}))x_{j}^{(i)} \\
=- \frac{1}{m}\sum_{i=1}^{m}(y^{(i)} - h_{\theta}(x^{(i)}))x_{j}^{(i)} \\
= \frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})- y^{(i)})x_{j}^{(i)}
\end{equation}
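
One way to sanity-check the final expression is to compare it with a finite-difference approximation of $J(\theta)$. The sketch below does this on a small made-up dataset. The names `gradient` and `numeric_gradient`, the step `eps` and the data are assumptions for illustration, and the numeric version serves only as a check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def cost(theta, X, y):
    m = len(X)
    return -sum(yi * math.log(h(theta, xi)) + (1 - yi) * math.log(1.0 - h(theta, xi))
                for xi, yi in zip(X, y)) / m

def gradient(theta, X, y):
    """Analytic gradient: (1/m) * sum_i (h_i - y_i) * x_j^(i) for each j."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = h(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij / m
    return grad

def numeric_gradient(theta, X, y, eps=1e-6):
    """Central finite differences of J, one coordinate at a time."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((cost(up, X, y) - cost(down, X, y)) / (2 * eps))
    return grad

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # x_0 = 1 in column 0
y = [0, 0, 1, 1]
theta = [0.3, -0.2]

print(gradient(theta, X, y))
print(numeric_gradient(theta, X, y))   # should match closely
```

Agreement between the two lists supports the sign and the $x_{j}^{(i)}$ factor in the last line of the derivation.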
<br>
Substituting this derivative into the update rule gives:
$$\theta_{j}:= \theta_{j}-\alpha \frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})- y^{(i)})x_{j}^{(i)}$$
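
The update rule can be sketched as a batch gradient descent loop. The function name `fit`, the learning rate `alpha = 0.1`, the iteration count and the toy data are all assumptions chosen for illustration, not values from the notes.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def hypothesis(theta, x):
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def fit(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent; alpha and iterations are illustrative defaults."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        # Simultaneous update: every theta_j uses the gradient at the old theta.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Toy, non-separable data (hypothetical): first column is x_0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]

theta = fit(X, y)
print(theta)
print(hypothesis(theta, [1.0, 0.0]), hypothesis(theta, [1.0, 5.0]))
```

The data is deliberately not linearly separable. On perfectly separable data the unregularized cost has no finite minimizer, so $\theta$ would keep growing with more iterations.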

<a id='addition'></a>
## 4 Math Derivation Aids
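The second step of the partial-derivative calculation for $J(\theta)$ above (differentiating $g(\theta^{T}x^{(i)})$) uses the following identity: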

$$f(x) = \frac{1}{1+e^{g(x)}}\\
\frac{\partial}{\partial x}f(x) = -\frac{1}{(1+e^{g(x)})^{2}}e^{g(x)}\frac{\partial}{\partial x}g(x) \\
= - \frac{1}{1+e^{g(x)}}\frac{e^{g(x)}}{1+e^{g(x)}}\frac{\partial}{\partial x}g(x) \\
= -f(x)(1-f(x))\frac{\partial}{\partial x}g(x)$$
<br>
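If $g(x) = -\theta^{T}x$, then $f$ is the logistic function, and differentiating $g$ with respect to $\theta_{j}$ contributes a factor $-x_{j}$ whose minus sign cancels the leading one, leaving $f(1-f)x_{j}$.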
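
For the logistic function itself the identity reduces to $g'(z) = g(z)(1-g(z))$. A quick numerical check, assuming a central finite difference with a hypothetical step `eps`, can be sketched as:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """Closed form g'(z) = g(z) * (1 - g(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

eps = 1e-6   # finite-difference step (illustrative choice)
for z in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
    print(z, sigmoid_derivative(z), numeric)   # the two values should agree closely
```

At $z = 0$ the closed form gives $0.5 \times 0.5 = 0.25$, the maximum slope of the logistic curve.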