## LOGISTIC REGRESSION ALGORITHM

**Logistic regression** *is a statistical method for predicting binary classes. The outcome, or target variable, is dichotomous: there are only two possible classes. For example, it can be used for cancer detection problems. The model computes the probability that an event occurs.*

*Logistic regression is closely related to linear regression, but it is used when the target variable is categorical. Instead of modeling the target directly, it models the log of the odds (the logit) as a linear function of the predictors. The inverse of the logit, the sigmoid function, then converts that linear score into the probability of the binary event.*
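
As a minimal sketch of the relationship between log-odds and probability described above, the snippet below checks that the logit and the sigmoid undo each other. The names `sigmoid` and `logit` and the score `0.8` are chosen here for illustration only and are not part of the notebook.

```julia
# sigmoid maps a real-valued linear score to a probability in (0, 1)
sigmoid(z) = 1 / (1 + exp(-z))

# logit (log of the odds) maps a probability back to a real-valued score
logit(p) = log(p / (1 - p))

p = sigmoid(0.8)      # probability for an illustrative score of 0.8
println(p)
println(logit(p))     # recovers the score, up to floating-point rounding
```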

#### Properties of Logistic Regression:

**1. The dependent variable in logistic regression follows a Bernoulli distribution.**

**2. Estimation is done through maximum likelihood.**

**3. There is no R-squared; model fit is assessed through measures such as concordance and the KS statistic.**
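
Properties 1 and 2 can be connected to the loss used later in this notebook. As a short derivation, assuming the same symbols as the code ($\sigma$ for the sigmoid, $w$ for the weights, $b$ for the bias), the Bernoulli likelihood of one labeled example $(x, y)$ is

$$
P(y \mid x) = \sigma(w^\top x + b)^{y}\,\bigl(1 - \sigma(w^\top x + b)\bigr)^{1-y}, \qquad y \in \{0, 1\},
$$

and its negative logarithm is

$$
-\log P(y \mid x) = -y \log \sigma(w^\top x + b) - (1 - y)\log\bigl(1 - \sigma(w^\top x + b)\bigr).
$$

Maximizing the likelihood over all examples is therefore the same as minimizing the sum (or average) of this quantity, which is the expression computed by `cross_entropy_loss`.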


In [1]:
#loading packages
using CSV
using DataFrames 

In [2]:
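# load the candidates table; the columns used below are gmat, gpa, work_experience and admitted
data = CSV.read("candidates_data.csv", DataFrame)
# one feature vector [gmat, gpa, work_experience] per candidate
x_data = [[g, p, e] for (g, p, e) in zip(data.gmat, data.gpa, data.work_experience)]
# binary labels taken from the admitted column
y_data = collect(data.admitted);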

In [3]:
using Lathe.preprocess: TrainTestSplit
train, test = TrainTestSplit(data, 0.6);

## TRAIN DATA

In [4]:
#training data
train_x = [[x[1], x[2], x[3]] for x in zip(train.gmat, train.gpa, train.work_experience)]
train_y = [x for x in train.admitted]
train_data = [x for x in zip(train_x, train_y)]

25-element Array{Tuple{Array{Float64,1},Int64},1}:
 ([780.0, 4.0, 3.0], 1)
 ([690.0, 3.3, 3.0], 0)
 ([710.0, 3.7, 5.0], 1)
 ([680.0, 3.9, 4.0], 0)
 ([690.0, 2.3, 1.0], 0)
 ([720.0, 3.3, 4.0], 1)
 ([740.0, 3.3, 5.0], 1)
 ([690.0, 1.7, 1.0], 0)
 ([610.0, 2.7, 3.0], 0)
 ([690.0, 3.7, 5.0], 1)
 ([680.0, 3.3, 4.0], 0)
 ([610.0, 3.0, 1.0], 0)
 ([650.0, 3.7, 6.0], 1)
 ([540.0, 2.7, 2.0], 0)
 ([670.0, 3.3, 6.0], 1)
 ([660.0, 3.7, 4.0], 1)
 ([580.0, 2.3, 2.0], 0)
 ([650.0, 3.7, 6.0], 1)
 ([640.0, 3.0, 1.0], 0)
 ([620.0, 2.7, 2.0], 0)
 ([660.0, 3.3, 6.0], 1)
 ([670.0, 2.7, 2.0], 0)
 ([580.0, 3.3, 1.0], 0)
 ([590.0, 1.7, 4.0], 0)
 ([690.0, 3.7, 5.0], 1)

## Running algorithm on train data 

In [5]:
σ(x) = 1/(1+ exp(-x))

function cross_entropy_loss(x, y, w, b)
    return -y*log(σ(w'x + b)) - (1-y)*log(1- σ(w'x + b))
end

function average_cost(features, labels, w, b)
    N = length(features)
    return (1/N)*sum([cross_entropy_loss(features[i], labels[i], w, b) for i = 1:N])
end

average_cost (generic function with 1 method)

In [6]:
function batch_gradient_descent(features, labels, w, b, α)
    del_w = [0.0 for i = 1:length(w)]
    del_b = 0.0
    
    N = length(features)
    
    for i = 1:N
        del_w += (σ(w'features[i]+b) - labels[i])*features[i]
        del_b += (σ(w'features[i]+b) - labels[i])
    end
    
    w = w - α*del_w
    b = b - α*del_b
    
    return w, b
end

batch_gradient_descent (generic function with 1 method)
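
The update above relies on the gradient of the summed cross-entropy loss being `(σ(w'x + b) - y) * x` for the weights and `σ(w'x + b) - y` for the bias. One way to sanity-check that formula is to compare it with central finite differences. The sketch below does this on a tiny made-up data set; the helper names (`total_loss`, `analytic_gradient`, `numeric_gradient`) and the step `h` are assumptions for illustration, not part of the notebook.

```julia
σ(x) = 1 / (1 + exp(-x))
loss(x, y, w, b) = -y * log(σ(w'x + b)) - (1 - y) * log(1 - σ(w'x + b))
total_loss(X, Y, w, b) = sum(loss(X[i], Y[i], w, b) for i in eachindex(X))

# analytic gradient of the summed loss, accumulated the same way as in batch_gradient_descent
function analytic_gradient(X, Y, w, b)
    del_w = zeros(length(w))
    del_b = 0.0
    for i in eachindex(X)
        r = σ(w'X[i] + b) - Y[i]    # residual: predicted probability minus label
        del_w += r * X[i]
        del_b += r
    end
    return del_w, del_b
end

# central finite-difference approximation of the same gradient
function numeric_gradient(X, Y, w, b; h = 1e-6)
    del_w = zeros(length(w))
    for j in eachindex(w)
        e = zeros(length(w))
        e[j] = h
        del_w[j] = (total_loss(X, Y, w + e, b) - total_loss(X, Y, w - e, b)) / (2h)
    end
    del_b = (total_loss(X, Y, w, b + h) - total_loss(X, Y, w, b - h)) / (2h)
    return del_w, del_b
end

# made-up points, for illustration only
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
Y = [1, 0, 1]
w0 = [0.1, -0.2]
b0 = 0.05

println(analytic_gradient(X, Y, w0, b0))
println(numeric_gradient(X, Y, w0, b0))
```

The two printed tuples should agree to several decimal places; a large gap would point to a mistake in the gradient expression.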

In [7]:
w = [0.0, 0.0, 0.0]
b = 0.0
println("The initial cost is: ", average_cost(train_x, train_y, w, b))

w, b = batch_gradient_descent(train_x, train_y, w, b, 0.0000001)
println("The new cost is: ", average_cost(train_x, train_y, w, b))

w, b = batch_gradient_descent(train_x, train_y, w, b, 0.0000001)
println("The new cost is: ", average_cost(train_x, train_y, w, b))

w, b = batch_gradient_descent(train_x, train_y, w, b, 0.0000001)
println("The new cost is: ", average_cost(train_x, train_y, w, b))


The initial cost is: 0.6931471805599452
The new cost is: 0.6917978742964508
The new cost is: 0.6910855407631421
The new cost is: 0.6907090512344928
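
The initial cost printed above is not arbitrary. With `w` and `b` all zero, every score `w'x + b` is 0, every predicted probability is 0.5, and each example contributes `-log(0.5) = log(2)` to the loss regardless of its label. A quick check, as a sketch:

```julia
σ(x) = 1 / (1 + exp(-x))

# with zero weights and zero bias the score is 0 for every example
println(σ(0.0))       # predicted probability for any input
println(-log(0.5))    # per-example cross-entropy loss, for label 0 or label 1
println(log(2))       # the same value, matching the initial cost up to rounding
```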


In [9]:
function train_batch_gradient_descent(features, labels, w, b, α, epochs)
    # report the cost at powers of ten so the log stays short
    checkpoints = (1, 10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000)

    for i = 1:epochs
        w, b = batch_gradient_descent(features, labels, w, b, α)

        # evaluate on the data passed in, not on the global train_x and train_y
        if i in checkpoints
            println("Epoch ", i, " with cost: ", average_cost(features, labels, w, b))
        end
    end

    return w, b
end

train_batch_gradient_descent (generic function with 1 method)

In [10]:
# random initialization of weights and bias
w = randn(3)
b = randn()

w, b = train_batch_gradient_descent(train_x, train_y, w, b, 0.000001, 1000000)

Epoch 1 with cost: 36.30210868507543
Epoch 10 with cost: 15.397233275476252
Epoch 100 with cost: 1.697734553359878
Epoch 1000 with cost: 1.6680420014082895
Epoch 10000 with cost: 1.3910592693614807
Epoch 100000 with cost: 0.6145896711798952
Epoch 1000000 with cost: 0.31534522119154834


([-0.012170782556997174, 0.7484944014006082, 1.5396491746592345], -0.053599965799610516)

In [11]:
#using w,b assigned above
w, b = train_batch_gradient_descent(train_x, train_y, w, b, 0.000001, 1000000)

Epoch 1 with cost: 0.31534519677326356
Epoch 10 with cost: 0.31534497700999503
Epoch 100 with cost: 0.3153427795054063
Epoch 1000 with cost: 0.3153208172610333
Epoch 10000 with cost: 0.3151024667856411
Epoch 100000 with cost: 0.3130386167929044
Epoch 1000000 with cost: 0.2999624686893665


([-0.01428750704562652, 1.262010425141379, 1.5668715436555116], -0.38665064635636504)

In [12]:
w, b = train_batch_gradient_descent(train_x, train_y, w, b, 0.000001, 1000000)

Epoch 1 with cost: 0.2999624590332158
Epoch 10 with cost: 0.29996237212818094
Epoch 100 with cost: 0.2999615031096004
Epoch 1000 with cost: 0.2999528160993786
Epoch 10000 with cost: 0.2998662622744392
Epoch 100000 with cost: 0.2990311122890295
Epoch 1000000 with cost: 0.2927999589535113


([-0.015162733620390877, 1.5438643384796278, 1.5651062397162048], -0.6987150074923895)
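
The very small step size (`0.000001`) and the millions of epochs used above are consistent with features on very different scales: GMAT scores are in the hundreds, while GPA and work experience are single digits. One common remedy, which this notebook does not use, is to standardize each feature before training. The sketch below is only an illustration under that assumption; `standardize` is a hypothetical helper, and the four rows are copied from the training listing above.

```julia
using Statistics

# standardize each feature column to zero mean and unit standard deviation;
# also return the column means and standard deviations so new rows can be scaled the same way
function standardize(features)
    X = reduce(hcat, features)'      # rows = examples, columns = features
    μ = mean(X, dims = 1)
    s = std(X, dims = 1)
    Z = (X .- μ) ./ s
    return [Z[i, :] for i in 1:size(Z, 1)], vec(μ), vec(s)
end

rows = [[780.0, 4.0, 3.0], [690.0, 3.3, 3.0], [710.0, 3.7, 5.0], [680.0, 3.9, 4.0]]
scaled, μ, s = standardize(rows)

println(scaled[1])   # first row after scaling
println(μ)           # column means of the original rows
```

If training were run on scaled features, the learned weights would apply to the scaled inputs, so any row to be classified would need the same `μ` and `s` applied first.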

In [13]:
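# classify x as 1 when the predicted probability is at least 0.5, otherwise 0
# (the label argument y is not used; it is kept so the calls below stay unchanged)
function predict(x, y, w, b)
    return σ(w'x + b) >= 0.5 ? 1 : 0
end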


predict (generic function with 1 method)

In [14]:
# misclassification rate: for 0/1 labels the squared error is 1 exactly when the prediction is wrong
mean_error = sum((predict(train_x[i], train_y[i], w, b) - train_y[i])^2 for i in eachindex(train_x)) / length(train_x)
println(mean_error)

0.16
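
The error rate above depends on the 0.5 threshold. The properties listed at the top mention concordance as an alternative measure of fit: the fraction of (admitted, not admitted) pairs in which the admitted example gets the higher predicted probability. A minimal sketch follows, assuming ties count as half a concordant pair (one common convention). The `concordance` helper and the scores are made up for illustration and are not output from the model above.

```julia
# fraction of (positive, negative) pairs in which the positive example
# receives the higher score; a tie counts as half a pair (assumed convention)
function concordance(scores, labels)
    pos = [s for (s, y) in zip(scores, labels) if y == 1]
    neg = [s for (s, y) in zip(scores, labels) if y == 0]
    total = 0.0
    for sp in pos, sn in neg
        total += sp > sn ? 1.0 : (sp == sn ? 0.5 : 0.0)
    end
    return total / (length(pos) * length(neg))
end

# made-up predicted probabilities and labels, for illustration only
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
println(concordance(scores, labels))
```

Here five of the six positive/negative pairs are ordered correctly; only the positive scored 0.35 falls below the negative scored 0.6.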


## TEST DATA

In [15]:
#test data
test_x = [[x[1], x[2], x[3]] for x in zip(test.gmat, test.gpa, test.work_experience)]
test_y = [x for x in test.admitted]
test_data = [x for x in zip(test_x, test_y)]

15-element Array{Tuple{Array{Float64,1},Int64},1}:
 ([750.0, 3.9, 4.0], 1)
 ([730.0, 3.7, 6.0], 1)
 ([710.0, 3.7, 6.0], 1)
 ([770.0, 3.3, 3.0], 1)
 ([580.0, 2.7, 4.0], 0)
 ([590.0, 2.3, 3.0], 0)
 ([620.0, 3.3, 2.0], 1)
 ([600.0, 2.0, 1.0], 0)
 ([550.0, 2.3, 4.0], 0)
 ([550.0, 2.7, 1.0], 0)
 ([570.0, 3.0, 2.0], 0)
 ([660.0, 3.3, 5.0], 1)
 ([660.0, 4.0, 4.0], 1)
 ([680.0, 3.3, 5.0], 1)
 ([650.0, 2.3, 1.0], 0)
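
With held-out rows available, the weights already fitted on the training data can be scored on them directly, without any further training. The sketch below illustrates this with the weights and bias printed after the third training run above and three rows copied from the test listing. The names `predict_label`, `error_rate`, `w_train`, `b_train`, `held_out_x` and `held_out_y` are hypothetical and do not appear elsewhere in the notebook.

```julia
σ(x) = 1 / (1 + exp(-x))

# hypothetical helper: predicted class for one feature vector
predict_label(x, w, b) = σ(w'x + b) >= 0.5 ? 1 : 0

# hypothetical helper: fraction of rows whose predicted label differs from the true label
error_rate(features, labels, w, b) =
    sum(predict_label(features[i], w, b) != labels[i] for i in eachindex(features)) / length(features)

# weights and bias as printed after the third training run above
w_train = [-0.015162733620390877, 1.5438643384796278, 1.5651062397162048]
b_train = -0.6987150074923895

# three rows copied from the test listing above (not the full test set)
held_out_x = [[750.0, 3.9, 4.0], [600.0, 2.0, 1.0], [620.0, 3.3, 2.0]]
held_out_y = [1, 0, 1]

println(error_rate(held_out_x, held_out_y, w_train, b_train))
```

Passing the full `test_x` and `test_y` to the same function would give a held-out error estimate for the model trained on the training data.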

## Running algorithm on test data

In [18]:
w = [0.0, 0.0, 0.0]
b = 0.0
println("The initial cost is: ", average_cost(test_x, test_y, w, b))

w, b = batch_gradient_descent(test_x, test_y, w, b, 0.0000001)
println("The new cost is: ", average_cost(test_x, test_y, w, b))

w, b = batch_gradient_descent(test_x, test_y, w, b, 0.0000001)
println("The new cost is: ", average_cost(test_x, test_y, w, b))

w, b = batch_gradient_descent(test_x, test_y, w, b, 0.0000001)
println("The new cost is: ", average_cost(test_x, test_y, w, b))


The initial cost is: 0.6931471805599453
The new cost is: 0.6897379588079332
The new cost is: 0.68731872501649
The new cost is: 0.6856011930418255


In [20]:
function test_batch_gradient_descent(features, labels, w, b, α, epochs)
    # report the cost at powers of ten so the log stays short
    checkpoints = (1, 10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000)

    for i = 1:epochs
        w, b = batch_gradient_descent(features, labels, w, b, α)

        # evaluate on the data passed in, not on the global test_x and test_y
        if i in checkpoints
            println("Epoch ", i, " with cost: ", average_cost(features, labels, w, b))
        end
    end

    return w, b
end

test_batch_gradient_descent (generic function with 1 method)

In [21]:
# random initialization of weights and bias
w = randn(3)
b = randn()

w, b = test_batch_gradient_descent(test_x, test_y, w, b, 0.000001, 1000000)

Epoch 1 with cost: 84.46479790802545
Epoch 10 with cost: 65.78169715802545
Epoch 100 with cost: 0.640615368893551
Epoch 1000 with cost: 0.6400653877413574
Epoch 10000 with cost: 0.6346395963083428
Epoch 100000 with cost: 0.5877927093831397
Epoch 1000000 with cost: 0.4749454842722376


([-0.011655922027250062, 1.6308276456454425, 0.6520970352847868], 0.6252010518163176)

In [22]:
#using w,b assigned above
w, b = test_batch_gradient_descent(test_x, test_y, w, b, 0.000001, 1000000)

Epoch 1 with cost: 0.474945443419782
Epoch 10 with cost: 0.47494507574865186
Epoch 100 with cost: 0.47494139913354877
Epoch 1000 with cost: 0.4749046425953545
Epoch 10000 with cost: 0.47453803167649206
Epoch 100000 with cost: 0.47096147795253346
Epoch 1000000 with cost: 0.44149744938233004


([-0.012853418284992051, 2.1092273614214068, 0.5974533011824606], 0.10849657236775667)

In [23]:
w, b = test_batch_gradient_descent(test_x, test_y, w, b, 0.000001, 1000000)

Epoch 1 with cost: 0.4414974214801037
Epoch 10 with cost: 0.4414971703604529
Epoch 100 with cost: 0.44149465920209496
Epoch 1000 with cost: 0.4414695514326542
Epoch 10000 with cost: 0.4412188543007811
Epoch 100000 with cost: 0.4387491059840249
Epoch 1000000 with cost: 0.41709081998654524


([-0.013602825259597221, 2.453200636145337, 0.5745545892345967], -0.3877274323863138)

In [24]:
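# classify x as 1 when the predicted probability is at least 0.5, otherwise 0
# (the label argument y is not used; it is kept so the calls below stay unchanged)
function predict(x, y, w, b)
    return σ(w'x + b) >= 0.5 ? 1 : 0
end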


predict (generic function with 1 method)

In [25]:
# misclassification rate: for 0/1 labels the squared error is 1 exactly when the prediction is wrong
mean_error = sum((predict(test_x[i], test_y[i], w, b) - test_y[i])^2 for i in eachindex(test_x)) / length(test_x)
println(mean_error)

0.26666666666666666


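**We can conclude that the model predicts whether or not a student was admitted with reasonably high accuracy. The fit is better on the train data, with a misclassification rate of 0.16 (84% correct) versus 0.27 (73% correct) on the test data. Note that the weights behind the test figure were re-fitted on the test rows themselves, so 0.27 is not a held-out estimate for the model trained on the train data.**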