## Linear Models for Binary Classification

linear scoring function: $ s = w^T x $

| algorithm | hypothesis | error measure | solution |
| --- | --- | --- | --- |
| linear classification | sign(s)       | plausible err = 0/1           | NP-hard              |
| linear regression     | s             | friendly err = squared        | closed-form solution |
| logistic regression   | $ \theta(s) $ | plausible err = cross-entropy | gradient descent     |

Minimizing the 0/1 error in linear classification is NP-hard.  
Can linear regression or logistic regression be used to help with linear classification?

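Below, the error measure of each of the three algorithms is rewritten as a function of s and y, err(s,y).

Because the goal is linear classification, y takes only two values: +1 and -1.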

#### Linear Classification

$ h(x) = sign(s) $

$ err(h,x,y) = \Big[ h(x) \ne y \Big]_{bool} $

$ err_{0/1} (s,y) = \Big[ sign(s) \ne y \Big]_{bool} = \Big[ sign(ys) \ne 1 \Big]_{bool} $

#### Linear Regression

$ h(x) = s $

$ err(h,x,y) = \big( h(x) - y \big)^2 $

$ err_{SQR}(s,y) = (s - y)^2 = (ys - 1)^2 $
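
The second equality above relies on $ y^2 = 1 $, which holds because $ y \in \{ -1, +1 \} $. Expanding the right-hand side makes this step explicit:

$
\begin{align}
(ys - 1)^2 & = y^2 s^2 - 2ys + 1 \\
& = s^2 - 2ys + y^2 \\
& = (s - y)^2
\end{align}
$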

#### Logistic Regression

$ h(x) = \theta(s) $

$ err(h,x,y) = - \ln h(y x) $

$ err_{CE}(s,y) = \ln \big( 1 + \exp(-ys) \big) $
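
The step from $ - \ln h(yx) $ to the closed form can be spelled out. This sketch assumes $ \theta $ is the standard logistic function $ \theta(s) = \frac{1}{1 + \exp(-s)} $, which this section does not restate, and uses $ h(yx) = \theta(w^T (yx)) = \theta(ys) $:

$
\begin{align}
err_{CE}(s,y) & = - \ln \theta(ys) \\
& = - \ln \frac{1}{1 + \exp(-ys)} \\
& = \ln \big( 1 + \exp(-ys) \big)
\end{align}
$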

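(ys) : classification correctness score

When (ys) is positive, y and s have the same sign and the classification is correct; the larger (ys), the more confidently correct.  
When (ys) is negative, y and s have opposite signs and the classification is wrong; the more negative (ys), the more confidently wrong.  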
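
Since all three error measures depend only on the product ys, they can be written as one-argument functions. The following is a minimal NumPy sketch; the function names are illustrative, and counting ys = 0 as a 0/1 error (because sign(0) is not +1) is an assumed convention.

```python
import numpy as np

def err_01(ys):
    """0/1 error: 1 where sign(ys) != +1 (ys == 0 is counted as an error)."""
    return (np.asarray(ys, dtype=float) <= 0).astype(float)

def err_sqr(ys):
    """Squared error written in terms of ys: (ys - 1)^2."""
    return (np.asarray(ys, dtype=float) - 1.0) ** 2

def err_ce(ys):
    """Cross-entropy error ln(1 + exp(-ys)), computed stably with logaddexp."""
    return np.logaddexp(0.0, -np.asarray(ys, dtype=float))

def err_sce(ys):
    """Scaled cross-entropy: err_ce divided by ln 2."""
    return err_ce(ys) / np.log(2.0)

# Evaluate each error at a few correctness scores.
ys = np.array([-2.0, 0.0, 1.0, 3.0])
for name, f in [("0/1", err_01), ("SQR", err_sqr), ("CE", err_ce), ("SCE", err_sce)]:
    print(name, np.round(f(ys), 3))
```

Evaluating at ys = 0 is a useful sanity check: the scaled cross-entropy equals exactly 1 there, which is why the factor 1/ln 2 is chosen.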

Scaled cross-entropy error: $ err_{SCE}(s,y) = \frac{1}{\ln 2} err_{CE}(s,y) = \log_2 \big( 1 + \exp(-ys) \big) $

Plotting the three err(s,y) curves with ys on the horizontal axis and err on the vertical axis shows that

$ err_{0/1}(s,y) \le err_{SCE}(s,y) = \frac{1}{\ln 2} err_{CE}(s,y) $

$ \to $

$ E_{in}^{0/1}(w) \le E_{in}^{SCE}(w) = \frac{1}{\ln 2} E_{in}^{CE}(w) $

$ E_{out}^{0/1}(w) \le E_{out}^{SCE}(w) = \frac{1}{\ln 2} E_{out}^{CE}(w) $
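
Because the bound holds pointwise, it carries over to any average over a dataset. The sketch below checks this numerically for $ E_{in} $ on synthetic data; the data, the random weight vector, and all variable names are made up for illustration. It also computes the squared-error average, which upper-bounds the 0/1 error by the same pointwise argument.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 3

# Synthetic inputs with x_0 = 1 for the bias term, random labels, random w.
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
y = rng.choice([-1.0, 1.0], size=N)
w = rng.normal(size=d + 1)

ys = y * (X @ w)                          # correctness score per example

E_01 = np.mean(ys <= 0)                   # E_in^{0/1}(w)
E_ce = np.mean(np.logaddexp(0.0, -ys))    # E_in^{CE}(w)
E_sce = E_ce / np.log(2.0)                # E_in^{SCE}(w)
E_sqr = np.mean((ys - 1.0) ** 2)          # E_in^{SQR}(w)

print("E_01 =", E_01, " E_sce =", E_sce, " E_sqr =", E_sqr)
```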

VC on 0/1:

$
\begin{align}
E_{out}^{0/1}(w) & \le E_{in}^{0/1}(w) + \Omega^{0/1} \\
& \le \frac{1}{\ln 2} E_{in}^{CE}(w) + \Omega^{0/1}
\end{align}
$

VC-Reg on CE:

$
\begin{align}
E_{out}^{0/1}(w) & \le \frac{1}{\ln 2} E_{out}^{CE}(w) \\
& \le \frac{1}{\ln 2} E_{in}^{CE}(w) + \frac{1}{\ln 2} \Omega^{CE}
\end{align}
$

By the argument above, logistic/linear reg. can be used for linear classification:

$ \text{small } E_{in}^{CE}(w) \to \text{ small } E_{out}^{0/1}(w) $

### Regression for Classification

STEP 1. run logistic/linear reg. on D with $ y_n \in \{ -1, +1 \} $ to get $ w_{REG} $

STEP 2. return $ g(x) = sign( w_{REG}^T x ) $
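
The two steps can be sketched as follows. This is a minimal illustration under assumptions not in the notes: linear regression is solved with the pseudo-inverse, logistic regression uses fixed-step batch gradient descent (the values of `eta` and `steps` are arbitrary), and the two-blob dataset is synthetic.

```python
import numpy as np

def linear_regression(X, y):
    """Closed-form least squares via the pseudo-inverse."""
    return np.linalg.pinv(X) @ y

def logistic_regression(X, y, eta=0.1, steps=2000):
    """Batch gradient descent on E_in^CE(w) = mean(ln(1 + exp(-y w^T x)))."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        ys = y * (X @ w)
        # theta(-ys) = 1 / (1 + exp(ys)), written with tanh to avoid overflow
        theta_neg = 0.5 * (1.0 - np.tanh(0.5 * ys))
        grad = -(X * (theta_neg * y)[:, None]).mean(axis=0)
        w -= eta * grad
    return w

def classify(w, X):
    """STEP 2: g(x) = sign(w^T x), mapping a score of 0 to -1."""
    return np.where(X @ w > 0, 1.0, -1.0)

# Synthetic data: two Gaussian blobs labeled +1 and -1, with x_0 = 1.
rng = np.random.default_rng(1)
N = 100
X_pos = rng.normal(loc=2.0, size=(N, 2))
X_neg = rng.normal(loc=-2.0, size=(N, 2))
X = np.hstack([np.ones((2 * N, 1)), np.vstack([X_pos, X_neg])])
y = np.concatenate([np.ones(N), -np.ones(N)])

errors = {}
for name, fit in [("linear", linear_regression), ("logistic", logistic_regression)]:
    w_reg = fit(X, y)                                   # STEP 1
    errors[name] = np.mean(classify(w_reg, X) != y)     # E_in^{0/1} of g
    print(name, "E_in^{0/1} =", errors[name])
```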

Pros and cons of using logistic/linear reg. for classification:

Linear Reg. - Pro: easiest optimization. Con: loose bound of $ err_{0/1} $ for large |ys|.

Logistic Reg. - Pro: easy optimization. Con: loose bound of $ err_{0/1} $ for very negative ys.

### In Practice

- Linear Reg. is often used to obtain an initial $ w_0 $, which is then refined with PLA/pocket/logistic Reg.
- Logistic Reg. often preferred over Pocket.
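
A warm start of this kind might look like the sketch below, which initializes gradient descent for logistic regression at the linear regression solution. The dataset, step size, and iteration count are illustrative assumptions; the same `w0` could equally be handed to PLA or pocket.

```python
import numpy as np

def e_in_ce(w, X, y):
    """E_in^CE(w) = mean(ln(1 + exp(-y w^T x)))."""
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def gd_logistic(w0, X, y, eta=0.1, steps=200):
    """Gradient descent on E_in^CE, starting from a given w0."""
    w = w0.copy()
    for _ in range(steps):
        ys = y * (X @ w)
        theta_neg = 0.5 * (1.0 - np.tanh(0.5 * ys))   # theta(-ys), overflow-safe
        grad = -(X * (theta_neg * y)[:, None]).mean(axis=0)
        w -= eta * grad
    return w

# Synthetic, overlapping two-class data with x_0 = 1.
rng = np.random.default_rng(2)
N = 100
X = np.hstack([
    np.ones((2 * N, 1)),
    np.vstack([rng.normal(1.0, 1.0, (N, 2)), rng.normal(-1.0, 1.0, (N, 2))]),
])
y = np.concatenate([np.ones(N), -np.ones(N)])

w0 = np.linalg.pinv(X) @ y        # initial weights from linear regression
w = gd_logistic(w0, X, y)         # refined by logistic regression

print("E_in^CE at w0:", e_in_ce(w0, X, y))
print("E_in^CE after refinement:", e_in_ce(w, X, y))
```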