## Introduction to Logistic Regression

### Standard Form of a Line

$$0 = ax + by + c$$

### Hypothesis function

$$h(x, y) = \text{a function of } x \text{ and } y$$

### Conversion to Vectors

$$(x, y) \rightarrow (x_{1}, x_{2}) = \textbf{x}$$

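The constants in the linear equation are renamed $w_{i}$. The bias term, or intercept, is $w_{0}$. Vector $\textbf{w}$ is the vector of weights, and $\textbf{x}$ is the vector of inputs. For the bias to be included in $\textbf{w}^{\textbf{T}}\textbf{x}$, $\textbf{x}$ is extended with a constant $x_{0} = 1$.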

$$h(\textbf{x}) = \textbf{w}^{\textbf{T}}\textbf{x}$$
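As a minimal sketch of this notation, the hypothesis can be evaluated with a single dot product. The line coefficients and the point below are made up for illustration; the only assumption is that a constant 1 is placed in front of the inputs so that $w_{0}$ acts as the bias.

```python
import numpy as np

# Hypothetical line 0 = 2x + 3y - 6, i.e. a = 2, b = 3, c = -6.
# The weights are ordered [w0 (bias), w1, w2].
w = np.array([-6.0, 2.0, 3.0])

# Hypothetical point (x, y) = (3, 1), with a leading 1 for the bias term.
x = np.array([1.0, 3.0, 1.0])

# w^T x = -6*1 + 2*3 + 3*1
h = w.dot(x)
print(h)
```

A value of zero would mean the point lies on the line; the sign of `h` indicates which side of the line the point falls on.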

## Logistic Function

### Notation
$N = \text{number of samples}$ <br>
$D = \text{number of dimensions (features)}$ <br>
$\textbf{X} = N \times D \text{ matrix of samples}$ <br>
$\textbf{w} = D \times 1 \text{ matrix of weights}$ ($(D + 1) \times 1$ once the bias $w_{0}$ is included) <br>
$h(\textbf{x}) = \text{hypothesis function}$ <br>
$z = \textbf{w}^{\textbf{T}}\textbf{x}$


The logistic function (https://en.wikipedia.org/wiki/Logistic_function) maps any real number $z$ to a value between 0 and 1. In logistic regression, it is referred to as the sigmoid.

$$\sigma (z) = \frac{1}{1 + e^{-z}}$$
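The formula can be checked numerically. The sketch below uses a handful of arbitrarily chosen inputs to show three properties that follow from the definition: $\sigma(0) = 0.5$, every output lies strictly between 0 and 1, and $\sigma(-z) = 1 - \sigma(z)$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), applied element-wise."""
    return 1 / (1 + np.exp(-z))

# Arbitrary sample inputs, chosen only for illustration.
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
s = sigmoid(z)
print(s)

# Symmetry about z = 0: sigmoid(-z) equals 1 - sigmoid(z).
print(np.allclose(sigmoid(-z), 1 - s))
```

Large negative inputs give values near 0 and large positive inputs give values near 1, which is what allows the output to be read as a probability.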

### Logistic Function in Vector Form

$$P(y = 1 \mid \textbf{x}) = \sigma (\textbf{w}^{\textbf{T}}\textbf{x})$$
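To turn this probability into a class label, one common decision rule (an assumption here; the text does not state one) is to predict class 1 when the probability is at least 0.5. Because $\sigma(0) = 0.5$, this is the same as checking whether $w^{T}x \geq 0$. The weights and samples below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical weights [w0, w1, w2] and two hypothetical samples,
# each with a leading 1 for the bias term.
w = np.array([0.5, -1.0, 2.0])
Xb = np.array([
    [1.0, 2.0, 0.0],   # z = 0.5 - 2.0 + 0.0 = -1.5
    [1.0, 0.0, 1.0],   # z = 0.5 + 0.0 + 2.0 =  2.5
])

p = sigmoid(Xb.dot(w))                # P(y = 1 | x) for each row
predictions = (p >= 0.5).astype(int)  # threshold at 0.5
print(p)
print(predictions)
```

The threshold of 0.5 is a convention rather than a requirement; a different cutoff trades off false positives against false negatives.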

### Basic Example of Logistic Regression

In [41]:
import numpy as np

N = 100
D = 2

# Generate NxD matrix with random values.
# randn pulls random numbers from the normal distribution 
# with mean = 0 and variance = 1
X = np.random.randn(N,D)
print(type(X))
print(X)



<class 'numpy.ndarray'>
[[ -4.86884944e-01  -3.14214678e-01]
 [  2.63962817e-01  -5.62602492e-01]
 [ -1.66048457e-01   5.31665529e-01]
 [ -2.62903617e-01   8.48151940e-01]
 [  8.08819785e-02   1.40970810e-02]
 [ -5.19676460e-01  -4.83824487e-01]
 [  2.77718516e-01   4.64008472e-01]
 [  2.79140139e-01  -7.91252219e-02]
 [  1.04088748e+00  -2.55707216e-02]
 [ -3.45776297e-01  -9.05005394e-02]
 [  2.38974282e-01   2.06140031e-01]
 [  6.72900877e-01   1.32054779e+00]
 [  5.33245730e-01  -1.16622915e+00]
 [  3.28605815e-01   9.95127801e-01]
 [ -1.12365455e+00  -1.01323597e+00]
 [  1.51254002e+00   9.73109586e-01]
 [ -7.50756142e-01   5.34793982e-01]
 [ -4.39575241e-01   1.20493489e+00]
 [ -3.24318149e-01  -9.64784515e-01]
 [ -1.98789997e+00   5.39503403e-01]
 [ -2.65770564e-01  -2.30488491e+00]
 [  2.63749206e-01   3.45128101e-01]
 [  3.50563620e-01  -1.79865878e-01]
 [ -1.03054772e+00  -9.21329514e-01]
 [  1.05681631e-03  -1.51162972e-01]
 [  1.12620242e+00  -4.52884784e-01]
 [ -4.76663802

In [None]:
# Add a bias term by:
# (1) adding a column of 1s to the original data, and
# (2) including the bias in the weights as w[0].

# Transpose a 1xN matrix to get an Nx1 matrix
ones = np.array([[1]*N]).T
print(ones)


In [7]:
# Concatenate the column of 1s to the original dataset to make the N x (D + 1) matrix Xb
Xb = np.concatenate((ones, X), axis = 1)
print(Xb)

[[  1.00000000e+00   5.12499476e-01   8.24985293e-01]
 [  1.00000000e+00  -2.97769740e-01  -1.07955968e+00]
 [  1.00000000e+00   2.08987563e-01   8.35141279e-01]
 [  1.00000000e+00  -7.90140861e-01   9.31091348e-01]
 [  1.00000000e+00  -8.97628214e-01  -1.34534890e+00]
 [  1.00000000e+00  -1.11147109e+00  -9.69534125e-04]
 [  1.00000000e+00  -7.64277282e-01  -4.81471441e-01]
 [  1.00000000e+00  -2.03562482e+00   1.81405027e+00]
 [  1.00000000e+00   9.41461078e-01  -1.27002883e+00]
 [  1.00000000e+00  -9.20321929e-01   1.29929415e+00]
 [  1.00000000e+00  -4.54928217e-01  -1.15874836e-01]
 [  1.00000000e+00  -2.10773759e-01   6.70186275e-01]
 [  1.00000000e+00   4.85384512e-01  -3.46263901e-01]
 [  1.00000000e+00  -8.57399212e-01   1.44707497e-01]
 [  1.00000000e+00  -1.98975042e-01   5.47269332e-01]
 [  1.00000000e+00   5.11858314e-01   1.61527159e+00]
 [  1.00000000e+00   4.00003108e-02  -1.83812365e+00]
 [  1.00000000e+00   3.10198939e-01   3.78186039e-01]
 [  1.00000000e+00   3.05680

In [39]:
# Randomly initialize a weight vector with D + 1 entries (bias w[0] plus D weights)
w = np.random.randn(D + 1)
print(w)
#w2 = np.random.randn(D + 1, 1)
#print(w2)
#print(w2.T)

[ 0.62035588 -0.10889877 -0.36982741]


In [40]:
# Calculate the dot product between each row of Xb and w
z = Xb.dot(w)
print(z)
#z2 = Xb.dot(w2.T)
#z2 = (w2.T) @ (Xb)
#z2 = np.matmul(w2.T, Xb)
#w2.T.dot(Xb)
#print(z2)

[ 0.25944314  1.0520334   0.28873925  0.36205814  1.21565339  0.74175227
  0.88164607  0.1711474   0.9875234   0.24006321  0.71275069  0.39545563
  0.69555598  0.6602088   0.43962881 -0.03275657  1.2957884   0.44671203
  0.33941432  1.34539062  1.32350012  0.26889576  0.93225188  0.68172379
  0.5546289   1.33700966  0.599562    1.15719981  0.31630476  0.14729528
  1.1682143  -0.00899921  0.70842772  1.141682    0.40105507  0.81006234
  0.22748726  0.46509104  0.75670891  0.00998484  0.77829888  0.51268756
  1.01033202 -0.07497864  1.01287527  1.02107941  0.24678176  0.90483159
  0.14614636  0.40580592  0.39557268  0.64559416  0.70444571  0.66215799
  0.8052053   1.35521275 -0.08878565  1.15581855  0.9215529   0.56268281
  0.53731761  0.91413798  1.01229239  0.97505186  0.54329876  0.45434536
  0.56202444  0.78585269  0.4929313  -0.29918146  1.25469325  0.18262743
  1.10422123  0.76768438  0.64474827  0.42952164  0.86478716  1.02570781
  0.30850861  0.3617365   0.44241626  0.83540468  0

In [16]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

In [17]:
# The result is a 1-D array of length N (shape (N,)), one probability per sample
print(sigmoid(z))

[ 0.89837892  0.32602634  0.88651374  0.8496258   0.20556824  0.58509855
  0.46509805  0.91257845  0.39659096  0.89693972  0.61951187  0.83833064
  0.64693442  0.6576545   0.81567642  0.96204447  0.17012293  0.81640595
  0.86556333  0.13871823  0.13869862  0.90560853  0.44250901  0.68614663
  0.73515348  0.1435416   0.73436048  0.2490743   0.87364664  0.92357254
  0.21824455  0.95527195  0.62806428  0.2549739   0.8280719   0.52158996
  0.91203285  0.7942026   0.57091707  0.95175007  0.56269618  0.77762058
  0.35190278  0.96664914  0.35589845  0.35796599  0.90402085  0.45537285
  0.93513625  0.83400833  0.84730593  0.70225481  0.60750667  0.67682619
  0.56375786  0.13105233  0.97184128  0.25467604  0.41987009  0.72912767
  0.75036794  0.44884901  0.38710188  0.3914553   0.76353724  0.82154894
  0.72846023  0.56768828  0.81245582  0.98495063  0.18355692  0.91948303
  0.28343223  0.58988801  0.67972119  0.81122135  0.50174363  0.33731717
  0.88217649  0.86126237  0.81335414  0.51305224  0

## Cross-entropy cost function for binary classification
The cross-entropy cost is also the negative log-likelihood of the targets given the model outputs.

$J = \text{cost function (also called error function or objective function)}$<br>
$N = \text{number of samples}$<br>
$t_{i} = \text{target (0 or 1) for sample } i$<br>
$y_{i} = \text{model output for sample } i \text{, short form of } P(Y = 1 \mid X)$

$$J = - \sum_{i = 1}^{N} \left[ t_{i}\log(y_{i}) + (1 - t_{i})\log(1 - y_{i}) \right]$$
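The cost can be sketched directly in NumPy. This assumes, following the formula, that $t_{i}$ is the 0/1 target and $y_{i}$ is the predicted probability of class 1. The targets and predictions below are made up, and the function name `cross_entropy` is only illustrative.

```python
import numpy as np

def cross_entropy(t, y):
    """Sum of binary cross-entropy terms over all samples."""
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Hypothetical targets and two sets of predicted probabilities.
t = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])  # mostly agrees with the targets
bad = np.array([0.1, 0.9, 0.2, 0.8])   # mostly disagrees with the targets

print(cross_entropy(t, good))
print(cross_entropy(t, bad))
```

Predictions that agree with the targets give a lower cost. Note that a prediction of exactly 0 or 1 makes one of the logarithms infinite, so practical code often clips `y` slightly away from those values.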

## Naive Bayes


http://scikit-learn.org/stable/modules/naive_bayes.html

$$P(y \mid x_{1}, \ldots, x_{n}) \propto P(y) \prod_{i=1}^{n} P(x_{i} \mid y)$$

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_{i} \mid y)$$
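The decision rule can be illustrated with a small hand-built model. This is a sketch using only NumPy, not the scikit-learn implementation: the priors and per-feature likelihoods are hypothetical numbers for two classes and two binary features, and `predict` is an illustrative helper name.

```python
import numpy as np

# Hypothetical model with two classes and two binary features.
prior = np.array([0.6, 0.4])  # P(y) for y = 0, 1

# likelihood[y, i] = P(x_i = 1 | y)
likelihood = np.array([
    [0.2, 0.7],
    [0.9, 0.3],
])

def predict(x):
    """Return argmax_y P(y) * prod_i P(x_i | y) and the unnormalized scores."""
    # P(x_i | y) is likelihood[y, i] if x_i = 1, otherwise 1 - likelihood[y, i].
    per_feature = np.where(x == 1, likelihood, 1 - likelihood)
    scores = prior * per_feature.prod(axis=1)
    return int(np.argmax(scores)), scores

label, scores = predict(np.array([1, 0]))
print(label, scores)
```

The scores are not normalized probabilities, which is fine for choosing the most likely class because the normalizing constant is the same for every $y$. With many features, summing log-probabilities instead of multiplying probabilities avoids numerical underflow.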
