---

# Logistic Regression

By: Laura Moses 

--- 

## Definition
**Logistic Regression** is a supervised learning classification algorithm used to predict the probability of a target variable $y$. The target (dependent) variable is dichotomous, meaning there are only two possible classes: `1` for a success/yes/win, and `0` for a failure/no/loss.

Mathematically, a logistic regression model predicts $P(Y=1)$ as a function of $X$. It is one of the simplest machine learning algorithms and is commonly used for binary classification problems, including detection tasks.
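
To make "a function of $X$" concrete, the standard form of the model can be written as follows. This is a sketch of the usual textbook formulation; the symbols $w$ (weights) and $b$ (bias) are introduced here for illustration and are estimated from data.

$$
\log\frac{P(Y=1 \mid X=x)}{1-P(Y=1 \mid X=x)} = w^T x + b
\quad\Longleftrightarrow\quad
P(Y=1 \mid X=x) = \frac{1}{1+e^{-(w^T x + b)}}
$$

The log-odds are linear in the features, and the right-hand form squashes that linear score into a probability between 0 and 1.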

---

### Assumptions

* The target variable is binary, with the desired outcome represented by the factor level 1
* There is no multicollinearity among the independent variables
* Model variables are meaningful
* Large sample size 
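
As a rough screen for the multicollinearity assumption, the pairwise correlation between predictors can be inspected. The sketch below assumes the first ten rows of the admissions data shown in the Implementation section and uses `cor` from Julia's `Statistics` standard library; pairwise correlation is only a first check, not a full diagnostic.

```julia
using Statistics

# First ten rows of the example data (GMAT score and GPA)
gmat = [780, 750, 690, 710, 680, 730, 690, 720, 740, 690]
gpa  = [4.0, 3.9, 3.3, 3.7, 3.9, 3.7, 2.3, 3.3, 3.3, 1.7]

# Pearson correlation between the two predictors
r = cor(gmat, gpa)
println("Correlation between GMAT and GPA: ", round(r, digits = 2))
```

A value close to 1 or -1 would suggest that the two predictors carry largely the same information.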

---

### Advantages 

* Simple algorithm
* Useful for predicting the probability of a binary outcome

---

### Disadvantages

* Adding independent variables to the model can result in overfitting
* $R^2$-type goodness-of-fit measures can be artificially high or low because of computational issues, so interpret goodness of fit with care

---

### Implementation

We want to predict whether or not a student will be admitted to University A based on their GMAT score and their GPA.

<ins>Feed Forward</ins> (with a single neuron)
$$
(x^1, y^1),\cdots,(x^N, y^N); \quad
x^i =
\left[\begin{array}{c}
\text{student } i \text{ GMAT score} \\
\text{student } i \text{ GPA}
\end{array}\right] \\
y^i \in \{0,1\} \\
\text{where } 1 \text{ indicates student } i \text{ was accepted, } 0 \text{ otherwise}
$$
 
$$z^i = w^T x^i + b, \qquad \hat{y}^i = \sigma(z^i) = \frac{1}{1 + e^{-z^i}}$$
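
The forward pass can be sketched in a few lines of Julia: compute the linear score and pass it through the sigmoid `σ`. The student values, weights, and bias below are made up for illustration, and `forward` is a hypothetical helper name, not part of the notebook.

```julia
# Sigmoid activation: maps any real number to the interval (0, 1)
σ(z) = 1 / (1 + exp(-z))

# Forward pass of a single neuron: linear score followed by the sigmoid
forward(x, w, b) = σ(w'x + b)

# Hypothetical student (GMAT 700, GPA 3.5) and made-up parameters
x = [700.0, 3.5]
w = [0.01, 0.5]
b = -9.0

ŷ = forward(x, w, b)   # score is 0.01*700 + 0.5*3.5 - 9.0 = -0.25
println(ŷ)
```

Because the score is slightly negative, the predicted probability falls just below 0.5.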
 
---

<ins>Loss Function</ins>

We want a loss $L(\hat{y}^i, y^i)$ that measures how close the prediction $\hat{y}^i$ is to the label $y^i$.

First consider maximizing $P(y^i|x^i)$, the probability the model assigns to the observed label $y^i$. Since there are only two possible outcomes, $y^i$ follows a Bernoulli distribution with parameter $\hat{y}^i$:

Maximize $\rightarrow P(y^i|x^i) = (\hat{y}^i)^{y^i} (1-\hat{y}^i)^{1-y^i}$
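
Taking the logarithm of this likelihood and flipping the sign turns the maximization into a minimization. The steps below sketch how this leads to the cross-entropy loss and the average cost used in the code; the symbol $C(w, b)$ for the average cost is introduced here for illustration.

$$
\log P(y^i|x^i) = y^i \log \hat{y}^i + (1-y^i)\log(1-\hat{y}^i)
$$

$$
L(\hat{y}^i, y^i) = -\left[\, y^i \log \hat{y}^i + (1-y^i)\log(1-\hat{y}^i) \,\right]
$$

$$
C(w, b) = \frac{1}{N}\sum_{i=1}^{N} L(\hat{y}^i, y^i)
$$

Maximizing the probability of the observed labels is therefore the same as minimizing $C(w, b)$.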

Prediction is 1 if $\hat{y}^i \ge 0.5$, 0 otherwise.

---

The following packages will be needed to run the code below: 

* CSV [documentation](https://csv.juliadata.org/stable/)
* DataFrames [documentation](https://dataframes.juliadata.org/stable/)

---

In [7]:
using CSV
using DataFrames

data = CSV.read("candidates_data.csv", DataFrame)

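| Row | gmat | gpa | work_experience | admitted |
|---|---|---|---|---|
| | Int64 | Float64 | Int64 | Int64 |
| 1 | 780 | 4.0 | 3 | 1 |
| 2 | 750 | 3.9 | 4 | 1 |
| 3 | 690 | 3.3 | 3 | 0 |
| 4 | 710 | 3.7 | 5 | 1 |
| 5 | 680 | 3.9 | 4 | 0 |
| 6 | 730 | 3.7 | 6 | 1 |
| 7 | 690 | 2.3 | 1 | 0 |
| 8 | 720 | 3.3 | 4 | 1 |
| 9 | 740 | 3.3 | 5 | 1 |
| 10 | 690 | 1.7 | 1 | 0 |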


In [44]:
# Two features per student: GMAT score and GPA
x_data = [[gmat, gpa] for (gmat, gpa) in zip(data.gmat, data.gpa)]
y_data = collect(data.admitted);

In [14]:
σ(x) = 1/(1+exp(-x))

function cross_entropy_loss(x, y, w, b)
    return -y*log(σ(w'x + b)) - (1-y)*log(1 - σ(w'x+b))
end

function average_cost(features, labels, w, b)
    N = length(features)
    return (1/N)*sum([cross_entropy_loss(features[i], labels[i], w, b) for i = 1:N])
end

average_cost (generic function with 1 method)

In [17]:
function batch_gradient_descent(features, labels, w, b, α)
    # Accumulators for the gradient of the summed loss
    del_w = zeros(length(w))
    del_b = 0.0

    N = length(features)

    for i = 1:N
        # Prediction error for example i: σ(z) - y
        err = σ(w'features[i] + b) - labels[i]
        del_w += err * features[i]
        del_b += err
    end

    # One gradient step with learning rate α
    w = w - α*del_w
    b = b - α*del_b

    return w, b
end;


batch_gradient_descent (generic function with 1 method)
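
The update in `batch_gradient_descent` follows from differentiating the cross-entropy loss. A sketch of the derivation, with $z^i = w^T x^i + b$ and $\hat{y}^i = \sigma(z^i)$, and using $\sigma'(z) = \sigma(z)(1-\sigma(z))$:

$$
\frac{\partial L}{\partial z^i} = \hat{y}^i - y^i, \qquad
\frac{\partial L}{\partial w} = (\hat{y}^i - y^i)\, x^i, \qquad
\frac{\partial L}{\partial b} = \hat{y}^i - y^i
$$

$$
w \leftarrow w - \alpha \sum_{i=1}^{N} (\hat{y}^i - y^i)\, x^i, \qquad
b \leftarrow b - \alpha \sum_{i=1}^{N} (\hat{y}^i - y^i)
$$

The function sums the per-example gradients rather than averaging them, so the factor $1/N$ from the average cost is effectively absorbed into the learning rate $\alpha$.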

In [25]:
w = [0.0, 0.0]
b = 0.0
alpha = 0.0000001

println("The initial cost is: ", average_cost(x_data, y_data, w, b))

w, b = batch_gradient_descent(x_data, y_data, w, b, alpha)
println("The new cost is: ", average_cost(x_data, y_data, w, b))

w, b = batch_gradient_descent(x_data, y_data, w, b, alpha)
println("The new cost is: ", average_cost(x_data, y_data, w, b))

w, b = batch_gradient_descent(x_data, y_data, w, b, alpha)
println("The new cost is: ", average_cost(x_data, y_data, w, b))

The initial cost is: 0.6931471805599451
The new cost is: 0.6931188566349795
The new cost is: 0.6931096471365109
The new cost is: 0.693106617309021
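
The initial cost can be checked by hand. With $w = 0$ and $b = 0$, every score is $z^i = 0$, so every prediction is $\hat{y}^i = \sigma(0) = 0.5$ and each example contributes the same loss regardless of its label:

$$
L(0.5, y^i) = -\log 0.5 = \log 2 \approx 0.6931
$$

The average over all examples is therefore also $\log 2$, which agrees with the initial cost printed above.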


In [27]:
function train_batch_gradient_descent(features, labels, w, b, α, epochs)
    for i = 1:epochs
        w, b = batch_gradient_descent(features, labels, w, b, α)

        # Report the cost at a few milestone epochs
        if i in (1, 100, 1000, 10000, 100000)
            println("Epoch ", i, " with cost: ", average_cost(features, labels, w, b))
        end
    end
    return w, b
end

train_batch_gradient_descent (generic function with 1 method)

In [29]:
w = [0.0, 0.0]
b = 0.0

w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 10000)

Epoch 1 with cost: 0.6931188566349795
Epoch 100 with cost: 0.6930977152288289
Epoch 1000 with cost: 0.6930282266294219
Epoch 10000 with cost: 0.6923351299473173


([3.899469673485949e-6, 0.005423710876459817], -0.0011817085723866323)

In [33]:
w = randn(2)
b = randn()
alpha = 0.000001

w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 1000000)

Epoch 1 with cost: Inf
Epoch 100 with cost: 2.049575499076534
Epoch 1000 with cost: 2.0488317476761444
Epoch 10000 with cost: 2.041321750747137
Epoch 100000 with cost: 1.9669951685345766


([-0.014927450289209103, 4.522843677287361], -1.1120956929443366)
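
The `Inf` reported at epoch 1 above is consistent with the sigmoid underflowing to exactly 0 for an accepted student: with random initial weights and GMAT scores in the hundreds, the score can be a large negative number, and the loss as written then evaluates `log(0)`. One common workaround, sketched below under that assumption, is to write the loss in terms of the score `z` directly. The names `stable_cross_entropy` and `naive_cross_entropy` are hypothetical and not part of the notebook.

```julia
# Numerically stable cross-entropy in terms of the score z = w'x + b.
# Algebraically equal to -y*log(σ(z)) - (1-y)*log(1 - σ(z)),
# but it never evaluates log(0).
stable_cross_entropy(z, y) = max(z, 0) - z * y + log1p(exp(-abs(z)))

# The formulation used in the notebook, for comparison
σ(z) = 1 / (1 + exp(-z))
naive_cross_entropy(z, y) = -y * log(σ(z)) - (1 - y) * log(1 - σ(z))

println(naive_cross_entropy(-800.0, 1))   # Inf: σ(-800.0) underflows to 0.0
println(stable_cross_entropy(-800.0, 1))  # 800.0
```

For moderate scores the two functions agree up to floating-point rounding; they differ only where the naive form overflows or underflows.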

In [34]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 1000000)

Epoch 1 with cost: 0.9767539152256929
Epoch 100 with cost: 1.2030512616375892
Epoch 1000 with cost: 1.2025427350571167
Epoch 10000 with cost: 1.1974981352024467
Epoch 100000 with cost: 1.1509047775073882


([-0.01821390732497044, 5.268830266014666], -2.1382436494762125)

In [35]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 1000000)

Epoch 1 with cost: 0.79456687830405
Epoch 100 with cost: 0.9118123026442021
Epoch 1000 with cost: 0.9116885972291986
Epoch 10000 with cost: 0.9104587498696306
Epoch 100000 with cost: 0.8988453650000553


([-0.017734470275478375, 5.393564488314007], -3.093365672835176)

In [36]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 1000000)

Epoch 1 with cost: 0.7344737655381757
Epoch 100 with cost: 0.8265318803665865
Epoch 1000 with cost: 0.8264841842458494
Epoch 10000 with cost: 0.8260088881315215
Epoch 100000 with cost: 0.8214148478034495


([-0.016376960653775086, 5.372869356803333], -4.003565651816795)

In [39]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, 0.0000001, 1000000)

Epoch 1 with cost: 0.8346582861099829
Epoch 100 with cost: 0.5063542465461425
Epoch 1000 with cost: 0.5063512218106138
Epoch 10000 with cost: 0.5063209803876851
Epoch 100000 with cost: 0.5060191582848257


([-0.0178370043205895, 4.0924608391655655], -1.2542241993656118)

In [40]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, 0.0000001, 1000000)

Epoch 1 with cost: 0.5030590407003086
Epoch 100 with cost: 0.503058720789801
Epoch 1000 with cost: 0.5030558125689913
Epoch 10000 with cost: 0.5030267359622006
Epoch 100000 with cost: 0.5027365289314798


([-0.01738561307976791, 4.028965621714494], -1.347210089838961)

In [41]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, 0.0000001, 1000000)

Epoch 1 with cost: 0.4998892712370567
Epoch 100 with cost: 0.4998889634139637
Epoch 1000 with cost: 0.4998861650754635
Epoch 10000 with cost: 0.4998581869666292
Epoch 100000 with cost: 0.4995789324255806


([-0.016950296337876602, 3.9686024422233883], -1.439732863156997)

In [42]:
function predict(x, y, w, b)
    # Print the model's prediction, then the actual outcome
    println(σ(w'x + b) >= 0.5 ? "Predict Accepted" : "Predict Not Accepted")
    println(y == 1 ? "Was Accepted" : "Was Not Accepted")
end

predict (generic function with 1 method)

In [43]:
for i = 1:length(x_data)
    predict(x_data[i], y_data[i], w, b)
    println()
end

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Accepted

Predict Not Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Not Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was

In [47]:
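# Use all three features: GMAT score, GPA, and work experience
x_data = [[gmat, gpa, we] for (gmat, gpa, we) in zip(data.gmat, data.gpa, data.work_experience)]

w = [0.0, 0.0, 0.0]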
b = 0.0
alpha = 0.0000001
epochs = 100000

w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, epochs)

Epoch 1 with cost: 0.6931177157407156
Epoch 100 with cost: 0.6929888339906559
Epoch 1000 with cost: 0.6919428619010852
Epoch 10000 with cost: 0.6817529632034032
Epoch 100000 with cost: 0.6024310614307002


([-0.0011579213136820733, 0.04975692225071529, 0.18290667154994167], -0.011472919398336808)

In [48]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, epochs)

Epoch 1 with cost: 0.6024303687427798
Epoch 100 with cost: 0.6023618099205645
Epoch 1000 with cost: 0.60174011433922
Epoch 10000 with cost: 0.5956753895704099
Epoch 100000 with cost: 0.5476543722140943


([-0.00208861549084507, 0.09213882840745298, 0.32393091174023153], -0.022369659983565725)

In [49]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, epochs)

Epoch 1 with cost: 0.547653943841169
Epoch 100 with cost: 0.5476115445813797
Epoch 1000 with cost: 0.5472269723939766
Epoch 10000 with cost: 0.5434664068011568
Epoch 100000 with cost: 0.5129873679408394


([-0.00283734417848026, 0.12929822452823045, 0.4349305669981876], -0.03283740622246671)

In [50]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 10000)

Epoch 1 with cost: 0.5129870893669561
Epoch 100 with cost: 0.5129595161436339
Epoch 1000 with cost: 0.5127093573663242
Epoch 10000 with cost: 0.5102571376930498


([-0.002904328515929091, 0.13278965745622207, 0.4447262316047905], -0.0338649504871565)

In [51]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 10000)

Epoch 1 with cost: 0.510256870126297
Epoch 100 with cost: 0.5102303863247294
Epoch 1000 with cost: 0.5099901056279379
Epoch 10000 with cost: 0.5076341658645985


([-0.002970054410219168, 0.13624544373329164, 0.45431367965209923], -0.03488936899911179)

In [52]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 10000)

Epoch 1 with cost: 0.5076339087480123
Epoch 100 with cost: 0.5076084592456328
Epoch 1000 with cost: 0.5073775570405491
Epoch 10000 with cost: 0.5051130491343722


([-0.0030345603137766682, 0.1396667081732651, 0.4636992515653897], -0.035910744006958914)

In [53]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 10000)

Epoch 1 with cost: 0.505112801943887
Epoch 100 with cost: 0.5050883348739134
Epoch 1000 with cost: 0.5048663410705962
Epoch 10000 with cost: 0.5026887023500276


([-0.0030978831752307705, 0.1430545295753102, 0.4728890433094475], -0.03692915487873566)

In [54]:
w, b = train_batch_gradient_descent(x_data, y_data, w, b, alpha, 10000)

Epoch 1 with cost: 0.5026884645923376
Epoch 100 with cost: 0.5026649311322305
Epoch 1000 with cost: 0.5024514032230817
Epoch 10000 with cost: 0.5003563378724779


([-0.0031600585028829383, 0.1464099426822, 0.48188891645607423], -0.03794467821113925)

In [55]:
for i = 1:length(x_data)
    predict(x_data[i], y_data[i], w, b)
    println()
end

Predict Not Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Not Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted

In [56]:
# Return the predicted class (1 or 0); y is kept in the signature but unused
function new_predict(x, y, w, b)
    return σ(w'x + b) >= 0.5 ? 1 : 0
end

new_predict (generic function with 1 method)

In [57]:
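# Fraction of misclassified students: for 0/1 values the squared difference
# is 1 exactly when the prediction is wrong
mean_error = sum((new_predict(x_data[i], y_data[i], w, b) - y_data[i])^2 for i in eachindex(x_data)) / length(x_data)

println(mean_error)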

0.2
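
Because predictions and labels are both 0 or 1, the squared difference is 1 exactly when a prediction is wrong, so the value printed above is the misclassification rate (0.2 corresponds to 80% accuracy). To see which kinds of errors occur, the counts can be broken down as in the sketch below. `confusion_counts` is a hypothetical helper, and the predictions and labels are made up for illustration.

```julia
# Tally predictions against labels as true/false positives and negatives
function confusion_counts(predictions, labels)
    tp = sum((predictions .== 1) .& (labels .== 1))
    fp = sum((predictions .== 1) .& (labels .== 0))
    tn = sum((predictions .== 0) .& (labels .== 0))
    fn = sum((predictions .== 0) .& (labels .== 1))
    return (tp = tp, fp = fp, tn = tn, fn = fn)
end

# Made-up predictions and labels, for illustration only
predictions = [1, 1, 0, 0, 1]
labels      = [1, 0, 0, 1, 1]

c = confusion_counts(predictions, labels)
accuracy = (c.tp + c.tn) / length(labels)
println(c)
println("Accuracy: ", accuracy)
```

With the notebook's variables, the predictions could be built as `[new_predict(x_data[i], y_data[i], w, b) for i in eachindex(x_data)]` and passed in together with `y_data`.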
