## Introduction to Logistic Regression

### Standard Form of a Line

# $$0 = ax + by + c$$

### Hypothesis Function

# $$h(x,y) = ax + by + c$$
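
A minimal sketch of what the hypothesis measures. The coefficients `a`, `b`, `c` are hypothetical and chosen only for illustration. The value of $ax + by + c$ is zero for points on the line, and its sign shows which side of the line a point falls on. Logistic regression relies on this idea when it separates two classes.

```python
import numpy as np

# Hypothetical coefficients for the line 0 = ax + by + c (here: x + y - 1 = 0)
a, b, c = 1.0, 1.0, -1.0

def h(x, y):
    """Evaluate ax + by + c at the point (x, y)."""
    return a * x + b * y + c

points = np.array([[0.5, 0.5],   # on the line
                   [2.0, 2.0],   # on the positive side
                   [0.0, 0.0]])  # on the negative side
values = np.array([h(x, y) for x, y in points])
print(values)           # [ 0.  3. -1.]
print(np.sign(values))  # [ 0.  1. -1.]
```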

### Conversion to Vectors

# $$(x, y) \rightarrow (x_{1}, x_{2}) = \textbf{x}$$

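The constants in the linear equation are renamed $w_{i}$, and the bias term, or intercept, is $w_{0}$. $\textbf{w}$ is the vector of weights. $\textbf{x}$ is the input vector, with a constant $x_{0} = 1$ prepended so that the bias is included in the product.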

# $$h(\textbf{x}) = \textbf{w}^{\textbf{T}}\textbf{x}$$
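
The renaming can be checked numerically. This is a small sketch with hypothetical coefficients `a`, `b`, `c`. Once the bias is stored as `w[0]` and a constant 1 is prepended to the input, $\textbf{w}^{\textbf{T}}\textbf{x}$ gives the same value as the scalar form $ax + by + c$.

```python
import numpy as np

# Hypothetical line coefficients: 0 = ax + by + c
a, b, c = 2.0, -3.0, 0.5

# Rename the constants as weights: w0 = c (bias/intercept), w1 = a, w2 = b
w_line = np.array([c, a, b])

def h_vec(x1, x2):
    """h(x) = w^T x, where x = (1, x1, x2) carries a constant 1 for the bias."""
    x = np.array([1.0, x1, x2])
    return w_line @ x

# The vector form agrees with the scalar form ax + by + c
x1, x2 = 1.5, -0.25
print(h_vec(x1, x2), a * x1 + b * x2 + c)  # 4.25 4.25
```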

## Logistic Function

### Notation
$N = \text{number of samples}$ <br>
$D = \text{number of dimensions (features)}$ <br>
$\textbf{X} = N \times D \text{ matrix of inputs}$ <br>
$\textbf{w} = D \times 1 \text{ vector of weights}$ (or $(D+1) \times 1$ when the bias $w_{0}$ is included) <br>
$h(\textbf{x}) = \text{hypothesis function}$ <br>
$z = \textbf{w}^{\textbf{T}}\textbf{x}$ <br>

https://en.wikipedia.org/wiki/Logistic_function

In logistic regression, the logistic function is called the sigmoid, $\sigma$.

# $$\sigma (z) = \frac{1}{1 + e^{-z}}$$
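
A short sketch of the sigmoid's key properties, using the formula above directly. It maps every real $z$ into the open interval $(0, 1)$, it is increasing, $\sigma(0) = 0.5$, and it is symmetric: $\sigma(-z) = 1 - \sigma(z)$. The input values are arbitrary examples.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
s = sigmoid(z)
print(s)
# sigma(0) = 0.5, and the curve is symmetric: sigma(-z) = 1 - sigma(z)
print(sigmoid(0.0))                     # 0.5
print(np.allclose(sigmoid(-z), 1 - s))  # True
```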

### Logistic Function in Vector Form

# $$P(y = 1 \mid \textbf{x}) = \sigma (\textbf{w}^{\textbf{T}}\textbf{x})$$
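
A common way to turn $P(y = 1 \mid \textbf{x})$ into a class label is to threshold it at 0.5. The threshold is a convention assumed here, not something fixed by the model. Because $\sigma(z) \ge 0.5$ exactly when $z \ge 0$, this threshold is the same as checking which side of the line $\textbf{w}^{\textbf{T}}\textbf{x} = 0$ the point lies on. The weights `w_demo` and inputs `Xb_demo` below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical weights [w0 (bias), w1, w2] and three inputs with a leading 1
w_demo = np.array([-1.0, 2.0, 0.5])
Xb_demo = np.array([[1.0, 0.0, 0.0],   # z = -1.0
                    [1.0, 1.0, 0.0],   # z =  1.0
                    [1.0, 0.5, 0.0]])  # z =  0.0 (exactly on the boundary)

p = sigmoid(Xb_demo @ w_demo)         # P(y = 1 | x) for each row
y_hat = (p >= 0.5).astype(int)        # threshold the probability at 0.5
print(p)
print(y_hat)                          # [0 1 1]

# Same labels from the sign of z = w^T x, without computing the sigmoid
print((Xb_demo @ w_demo >= 0).astype(int))  # [0 1 1]
```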

### Basic Example of Logistic Regression

```python
import numpy as np

N = 100
D = 2

# Generate an N x D matrix of random values.
# randn draws samples from the standard normal distribution
# (mean = 0, variance = 1).
X = np.random.randn(N, D)
print(type(X))
print(X)
```

```
<class 'numpy.ndarray'>
[[ 0.21449327 -0.80526627]
 [-1.38864417  0.12571619]
 [-0.47090518 -0.65369704]
 [ 1.41116231  1.24481777]
 [-1.90108482 -0.47810569]
 ...
```

```python
# Add a bias term by
# (1) adding a column of 1s to the original data, and
# (2) storing the bias in the first weight, w[0].

# Transpose a 1xN matrix to get an Nx1 matrix
ones = np.array([[1]*N]).T
print(ones)
```

```
[[1]
 [1]
 [1]
 ...
 [1]]
```


```python
# Concatenate the column of 1s to the left of the original data
# to make the N x (D+1) matrix Xb
Xb = np.concatenate((ones, X), axis=1)
print(Xb)
```

```
[[ 1.          0.21449327 -0.80526627]
 [ 1.         -1.38864417  0.12571619]
 [ 1.         -0.47090518 -0.65369704]
 [ 1.          1.41116231  1.24481777]
 [ 1.         -1.90108482 -0.47810569]
 ...
```
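
NumPy can also build the same design matrix in one line. A minimal sketch comparing the concatenation approach with `np.column_stack` and `np.hstack`. The `_demo` names are illustrative and chosen so they do not overwrite the notebook's `X`, `Xb` or `ones`.

```python
import numpy as np

N_demo, D_demo = 5, 2
X_demo = np.random.randn(N_demo, D_demo)

# Approach used above: an N x 1 column of 1s concatenated along axis 1
ones_demo = np.array([[1] * N_demo]).T
Xb_concat = np.concatenate((ones_demo, X_demo), axis=1)

# Equivalent one-liners; np.ones produces float 1.0 values directly
Xb_column_stack = np.column_stack((np.ones(N_demo), X_demo))
Xb_hstack = np.hstack((np.ones((N_demo, 1)), X_demo))

print(Xb_column_stack.shape)                       # (5, 3)
print(np.array_equal(Xb_concat, Xb_column_stack))  # True
print(np.array_equal(Xb_concat, Xb_hstack))        # True
```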

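```python
# Randomly initialize a weight vector: one bias weight plus one weight per feature
w = np.random.randn(D + 1)
print(w)
# w is a one-dimensional array of shape (D + 1,), not a 2-D row or column vector.
```

```
[ 0.33344013  1.91628473 -0.28317519]
```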


```python
# ASIDE:
# How to convert the 1-D array w into a 2-D 1 x (D+1) array
# and transpose it into a 2-D (D+1) x 1 column array.
w2 = w[np.newaxis]
print(w2.T)
```

```
[[ 0.33344013]
 [ 1.91628473]
 [-0.28317519]]
```
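
There are other ways to get a column vector from a 1-D array. The sketch below compares three of them and shows why `.T` alone does not work: transposing a 1-D array leaves its shape unchanged. The array `w_demo` is a hypothetical stand-in for `w`.

```python
import numpy as np

w_demo = np.array([0.5, -1.0, 2.0])  # 1-D array, shape (3,)

col_newaxis = w_demo[np.newaxis].T   # the approach above: (1, 3), then transpose
col_index = w_demo[:, np.newaxis]    # insert the new axis second: (3, 1) directly
col_reshape = w_demo.reshape(-1, 1)  # -1 lets NumPy infer the number of rows

print(w_demo.shape, w_demo.T.shape)  # (3,) (3,)  -> .T does nothing on a 1-D array
print(col_newaxis.shape, col_index.shape, col_reshape.shape)  # (3, 1) (3, 1) (3, 1)
```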


```python
# ASIDE:
# Multiply a 100x3 array by a 3x1 array with the "@" operator,
# introduced in Python 3.5; for NumPy arrays it calls np.matmul.
print(w2.shape)
print(Xb.shape)
z2 = Xb @ w2.T

# Transpose the 100x1 array to a 1x100 array.
print(z2.T)
```

```
(1, 3)
(100, 3)
[[ 0.97250174 -2.3631972  -0.3838375   2.6851274  -3.17419201  1.39794758
   0.61773131 -1.76124982  0.91082507 -0.87177168 -1.84048391 -1.54302414
   ...
```

```python
# Calculate the dot product between each row of Xb and w.
# Xb is N x (D+1) and w has shape (D+1,), so z is a 1-D array of length N.
z = Xb.dot(w)
print(z)
```

```
[ 0.97250174 -2.3631972  -0.3838375   2.6851274  -3.17419201  1.39794758
  0.61773131 -1.76124982  0.91082507 -0.87177168 -1.84048391 -1.54302414
  ...
```

```python
def sigmoid(z):
    # Logistic function, applied element-wise to a NumPy array
    return 1 / (1 + np.exp(-z))
```

```python
# sigmoid is applied element-wise, so the result is a 1-D array of N probabilities
print(sigmoid(z))
```

```
[0.72561787 0.08602249 0.40520167 0.93614332 0.04014856 0.801858
 0.6497024  0.14663388 0.71316897 0.29488579 0.13699407 0.17609608
 ...
```

## Matrix Multiplication

### Option 1: Declare Objects as np.matrix

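```python
# Note: NumPy no longer recommends np.matrix, even for linear algebra;
# regular arrays (Options 2 and 3) are preferred.
A = np.matrix([[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5]])
print(A)
print(A.shape)
B = np.matrix([[2],[3],[4]])
print(B)
print(B.shape)
# For np.matrix objects, * means matrix multiplication
print(A*B)
```

```
[[1 1 1]
 [2 2 2]
 [3 3 3]
 [4 4 4]
 [5 5 5]]
(5, 3)
[[2]
 [3]
 [4]]
(3, 1)
[[ 9]
 [18]
 [27]
 [36]
 [45]]
```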


### Option 2: Use the @ Operator

```python
E = np.array([[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5]])
print(E)
print(E.shape)
F = np.array([[2],[3],[4]])
print(F)
print(F.shape)
print((E@F).T)
```

```
[[1 1 1]
 [2 2 2]
 [3 3 3]
 [4 4 4]
 [5 5 5]]
(5, 3)
[[2]
 [3]
 [4]]
(3, 1)
[[ 9 18 27 36 45]]
```


### Option 3: Use the Dot Product

```python
print(E.dot(F))
```

```
[[ 9]
 [18]
 [27]
 [36]
 [45]]
```
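
A quick check that the array-based options agree. It also shows a common pitfall: on regular `np.array` objects, `*` is element-wise multiplication with broadcasting, not matrix multiplication. The arrays are the same `E` and `F` as above, redefined so the snippet runs on its own.

```python
import numpy as np

E = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5]])
F = np.array([[2], [3], [4]])

# @, np.matmul and .dot agree for 2-D arrays
print(np.array_equal(E @ F, np.matmul(E, F)))  # True
print(np.array_equal(E @ F, E.dot(F)))         # True

# On ndarrays, * is element-wise. Shapes (5, 3) and (3, 1) cannot broadcast.
try:
    E * F
except ValueError as err:
    print("E * F fails:", err)

# F.T has shape (1, 3), so each row of E is multiplied element-wise by [2, 3, 4]
elementwise = E * F.T
print(elementwise[0])           # [2 3 4]
# Summing those products across each row recovers the matrix product
print(elementwise.sum(axis=1))  # [ 9 18 27 36 45]
```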


## Cross-Entropy Cost Function for Binary Classification
The cross-entropy cost is also the negative log-likelihood of the targets under the model's predicted probabilities.

$J = \text{cost function (error function or objective function)}$<br>
$N = \text{number of samples}$<br>
$t_{i} = \text{target (0 or 1) for sample } i$<br>
$y_{i} = \text{model output for sample } i\text{, short form of } P(Y=1 \mid X)$

# $$J = - \sum_{i = 1}^{N} \left[ t_{i}\log(y_{i}) + (1 - t_{i})\log(1 - y_{i}) \right]$$
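
A minimal sketch of the cost above, using hypothetical targets `t` and outputs `y`. It also checks the negative log-likelihood claim. Each sample contributes $y_i$ to the likelihood when $t_i = 1$ and $1 - y_i$ when $t_i = 0$, so $J = -\log$ of the product of those terms. The `eps` clipping is an added safeguard against $\log(0)$ and is not part of the formula.

```python
import numpy as np

def cross_entropy(t, y, eps=1e-12):
    """Binary cross-entropy J = -sum[t*log(y) + (1 - t)*log(1 - y)].

    t: targets in {0, 1}; y: predicted P(Y = 1 | X).
    eps clipping is an added safeguard against log(0), not part of the formula.
    """
    y = np.clip(y, eps, 1 - eps)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Hypothetical targets and model outputs
t = np.array([1, 0, 1, 0])
y = np.array([0.9, 0.2, 0.6, 0.5])

J = cross_entropy(t, y)

# Likelihood of the targets: y_i where t_i = 1, and 1 - y_i where t_i = 0
likelihood = np.prod(np.where(t == 1, y, 1 - y))  # 0.9 * 0.8 * 0.6 * 0.5
print(J, -np.log(likelihood))  # the two values agree (about 1.5325)
```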

## Naive Bayes


http://scikit-learn.org/stable/modules/naive_bayes.html

# $$P(y \mid x_{1},\dots,x_{n}) \propto P(y) \prod_{i=1}^{n} P(x_{i} \mid y)$$

# $$\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_{i} \mid y)$$
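
A toy sketch of the decision rule above, assuming binary features (a Bernoulli-style naive Bayes, chosen here only for simplicity). $P(y)$ and $P(x_{i} \mid y)$ are estimated by counting, with Laplace smoothing `alpha` added as an assumption. The prediction is the class that maximizes the sum of the log-probabilities, which has the same argmax as the product but avoids underflow. The data and the helpers `fit_naive_bayes` and `predict_naive_bayes` are hypothetical and are not the scikit-learn API.

```python
import numpy as np

# Hypothetical toy data: 6 samples, 2 binary features, classes 0 and 1
X_nb = np.array([[1, 1],
                 [1, 0],
                 [1, 1],
                 [0, 0],
                 [0, 1],
                 [0, 0]])
y_nb = np.array([1, 1, 1, 0, 0, 0])

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate P(y) and P(x_i = 1 | y) by counting (alpha = Laplace smoothing, an assumption)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    # Row c, column i holds P(x_i = 1 | y = c)
    likelihoods = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                            for c in classes])
    return classes, priors, likelihoods

def predict_naive_bayes(x, classes, priors, likelihoods):
    """Return argmax_y P(y) * prod_i P(x_i | y), computed with logs to avoid underflow."""
    # P(x_i | y) is the stored likelihood where x_i = 1, and its complement where x_i = 0
    p_x_given_y = np.where(x == 1, likelihoods, 1 - likelihoods)
    log_scores = np.log(priors) + np.log(p_x_given_y).sum(axis=1)
    return classes[np.argmax(log_scores)]

classes, priors, likelihoods = fit_naive_bayes(X_nb, y_nb)
for x in np.array([[1, 1], [0, 0], [1, 0], [0, 1]]):
    print(x, predict_naive_bayes(x, classes, priors, likelihoods))
```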
