## Logistic Regression

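Logistic regression is a statistical model that, in its basic form, uses the logistic (sigmoid) function to model a binary dependent variable, that is, a categorical variable with two outcomes such as pass/fail or paid/unpaid. The model outputs a probability between 0 and 1, which is then thresholded (at 0.5 in this notebook) to give a predicted label of 0 or 1.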

<img src="sigmoid.png" width=600 />
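To make the shape of the curve concrete, here is a minimal sketch that evaluates the logistic function at a few points and turns each probability into a label. The names `sigmoid` and `classify` are chosen for this illustration only; the 0.5 threshold is the one used later in this notebook.

```julia
# Logistic (sigmoid) function: maps any real number into the open interval (0, 1).
sigmoid(z) = 1 / (1 + exp(-z))

# Illustrative helper: turn a probability into a 0/1 label.
classify(p; threshold = 0.5) = p >= threshold ? 1 : 0

for z in (-4.0, 0.0, 4.0)
    p = sigmoid(z)
    println("z = ", z, "  sigmoid(z) = ", round(p, digits = 3), "  label = ", classify(p))
end
```

Large negative inputs give probabilities near 0, large positive inputs give probabilities near 1, and an input of exactly 0 gives 0.5.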

___

In this notebook we will use the following packages:
- [CSV](https://csv.juliadata.org/stable/)
- [DataFrames](https://dataframes.juliadata.org/stable/)
___

In [1]:
using CSV
using DataFrames

In [2]:
data = DataFrame(CSV.File("candidates_data.csv"))

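| Row | gmat (Int64) | gpa (Float64) | work_experience (Int64) | admitted (Int64) |
|---|---|---|---|---|
| 1 | 780 | 4.0 | 3 | 1 |
| 2 | 750 | 3.9 | 4 | 1 |
| 3 | 690 | 3.3 | 3 | 0 |
| 4 | 710 | 3.7 | 5 | 1 |
| 5 | 680 | 3.9 | 4 | 0 |
| 6 | 730 | 3.7 | 6 | 1 |
| 7 | 690 | 2.3 | 1 | 0 |
| 8 | 720 | 3.3 | 4 | 1 |
| 9 | 740 | 3.3 | 5 | 1 |
| 10 | 690 | 1.7 | 1 | 0 |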


In [14]:
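# One feature vector [gmat, gpa, work_experience] per candidate.
x_data = [[g, p, e] for (g, p, e) in zip(data.gmat, data.gpa, data.work_experience)]
# Labels: 1 if the candidate was admitted, 0 otherwise.
y_data = collect(data.admitted);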

In [15]:
σ(x) = 1 / (1 + exp(-x))

function cross_entropy_loss(x, y, w, b)
    return -y * log(σ(w'x + b)) - (1 - y) * log(1 - σ(w'x + b))
end

function average_cost(features, labels, w, b)
    N = length(features)
    return (1 / N) * sum([cross_entropy_loss(features[i], labels[i], w, b) for i = 1:N])
end


average_cost (generic function with 1 method)
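Written as formulas, the two functions above compute the following. Here $x_i$ is a feature vector, $y_i \in \{0, 1\}$ is its label, and $N$ is the number of examples. The symbols $\ell$ and $J$ are names chosen for this note and do not appear in the code.

$$
\ell(x, y; w, b) = -y \log \sigma(w^\top x + b) - (1 - y) \log\bigl(1 - \sigma(w^\top x + b)\bigr),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

$$
J(w, b) = \frac{1}{N} \sum_{i=1}^{N} \ell(x_i, y_i; w, b)
$$

With $w = 0$ and $b = 0$, every prediction is $\sigma(0) = 0.5$, so each example contributes $-\log 0.5 = \log 2 \approx 0.6931$ regardless of its label. This is the value to expect for the initial cost.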

In [16]:
function batch_gradient_descent(features, labels, w, b, α)
    del_w = [0.0 for i = 1:length(w)]
    del_b = 0.0
    
    N = length(features)
    
    for i = 1:N
        del_w += (σ(w'features[i] + b) - labels[i]) * features[i]
        del_b += (σ(w'features[i] + b) - labels[i])
    end
    
    w = w - α * del_w
    b = b - α * del_b
    
    return w, b
end


batch_gradient_descent (generic function with 1 method)
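The update in `batch_gradient_descent` follows from differentiating the per-example cross-entropy loss. Writing $\ell$ for that loss (a symbol chosen for this note), the derivatives for a single example $(x, y)$ are:

$$
\frac{\partial \ell}{\partial w} = \bigl(\sigma(w^\top x + b) - y\bigr)\, x,
\qquad
\frac{\partial \ell}{\partial b} = \sigma(w^\top x + b) - y
$$

The function accumulates these over all $N$ examples and takes one step of size $\alpha$:

$$
w \leftarrow w - \alpha \sum_{i=1}^{N} \bigl(\sigma(w^\top x_i + b) - y_i\bigr)\, x_i,
\qquad
b \leftarrow b - \alpha \sum_{i=1}^{N} \bigl(\sigma(w^\top x_i + b) - y_i\bigr)
$$

Note that the code sums the per-example gradients rather than averaging them, whereas `average_cost` divides by $N$. The step is therefore $N$ times the gradient of the average cost, which is equivalent to using a learning rate of $\alpha N$ on the averaged objective.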

In [17]:
w = [0.0, 0.0, 0.0]
b = 0.0
println("The initial cost is: ", average_cost(x_data, y_data, w, b))

w, b = batch_gradient_descent(x_data, y_data, w, b, 0.0000001)
println("The new cost is: ", average_cost(x_data, y_data, w, b))

w, b = batch_gradient_descent(x_data, y_data, w, b, 0.0000001)
println("The new cost is: ", average_cost(x_data, y_data, w, b))

w, b = batch_gradient_descent(x_data, y_data, w, b, 0.0000001)
println("The new cost is: ", average_cost(x_data, y_data, w, b))

The initial cost is: 0.6931471805599451
The new cost is: 0.6931177157407156
The new cost is: 0.6931074013001591
The new cost is: 0.6931032778951178


In [18]:
function train_batch_gradient_descent(features, labels, w, b, α, epochs)
    # Print the cost only at these selected epochs.
    report_epochs = (1, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000)

    for i = 1:epochs
        w, b = batch_gradient_descent(features, labels, w, b, α)

        # Evaluate the cost on the function arguments, not on the globals x_data and y_data.
        if i in report_epochs
            println("Epoch ", i, " with cost: ", average_cost(features, labels, w, b))
        end
    end

    return w, b
end


train_batch_gradient_descent (generic function with 1 method)

In [20]:
w = [0.0, 0.0, 0.0]
b = 0.0

w, b = train_batch_gradient_descent(x_data, y_data, w, b, 0.000001, 14000000)

Epoch 1 with cost: 0.6935528232717131
Epoch 100 with cost: 1.95992233790548
Epoch 1000 with cost: 1.9483437679105173
Epoch 10000 with cost: 1.8384129779157607
Epoch 100000 with cost: 0.5857047037944407
Epoch 1000000 with cost: 0.42154237597461997
Epoch 10000000 with cost: 0.28502171820793104


([-0.0046450250984596715, 2.3105787535268325, 0.9784990913616533], -7.717336212336842)
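As a rough check of what the learned parameters mean, the sketch below plugs the printed `w` and `b` into the model for rows 1 and 3 of the table shown earlier. The variable names `candidate_1` and `candidate_3` are made up for this illustration, and `σ` is redefined so the snippet runs on its own.

```julia
σ(z) = 1 / (1 + exp(-z))

# Parameters copied from the training output above.
w = [-0.0046450250984596715, 2.3105787535268325, 0.9784990913616533]
b = -7.717336212336842

# Rows 1 and 3 of the table: [gmat, gpa, work_experience].
candidate_1 = [780, 4.0, 3]   # admitted = 1
candidate_3 = [690, 3.3, 3]   # admitted = 0

# Estimated probability of admission for each candidate.
p1 = σ(w'candidate_1 + b)
p3 = σ(w'candidate_3 + b)

println("P(admitted | candidate 1) = ", round(p1, digits = 3))
println("P(admitted | candidate 3) = ", round(p3, digits = 3))
```

The first probability comes out above 0.5 and the second below it, so with a 0.5 threshold the model labels candidate 1 as accepted and candidate 3 as not accepted, matching their recorded outcomes.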

In [21]:
function predict(x, y, w, b)
    if σ(w'x + b) >= 0.5
        println("Predict Accepted")
        y == 1 ? println("Was Accepted") : println("Was not Accepted")
    else
        println("Predict Not Accepted")
        y == 1 ? println("Was Accepted") : println("Was Not Accepted")
    end
end

predict (generic function with 1 method)

In [22]:
for i = 1:length(x_data)
    predict(x_data[i], y_data[i], w, b)
    println()
end

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was not Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was not Accepted

Predict Not Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Not Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Not Accepted
Was Not Accepted

Predict Acce

In [23]:
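function newpredict(x, y, w, b)
    # Return the predicted label: 1 if the estimated probability is at least 0.5, else 0.
    # The label y is not used; it is kept so the signature matches predict.
    return σ(w'x + b) >= 0.5 ? 1 : 0
end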

newpredict (generic function with 1 method)

In [24]:
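# For 0/1 labels, (prediction - label)^2 is 1 for a misclassification and 0 otherwise,
# so the mean computed below is the fraction of misclassified training examples.
mean_error = 0.0
for i = 1:length(x_data)
    mean_error += (newpredict(x_data[i], y_data[i], w, b) - y_data[i]) ^ 2
end

println(mean_error/length(x_data))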

0.1
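A single error rate does not say which kind of mistake the model makes. The sketch below shows one way to split the errors into false positives and false negatives. The `predictions` and `labels` vectors are invented for illustration and are not the notebook's data.

```julia
# Hypothetical predictions and labels, made up purely for illustration.
predictions = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
labels      = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]

# For 0/1 values the squared difference is 1 exactly when prediction and label disagree.
error_rate = sum((predictions .- labels) .^ 2) / length(labels)
accuracy = 1 - error_rate

# Count the two kinds of mistakes separately.
false_positives = count((predictions .== 1) .& (labels .== 0))
false_negatives = count((predictions .== 0) .& (labels .== 1))

println("error rate: ", error_rate, ", accuracy: ", accuracy)
println("false positives: ", false_positives, ", false negatives: ", false_negatives)
```

Applied to the notebook's own predictions, the same counts would show whether the misclassified candidates are mostly wrongly accepted or wrongly rejected.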
