# Logistic Regression
In this notebook we will be implementing batch gradient descent to train for Logistic Regression.

Logistic Regression is a predictive algorithm in Machine Learning that is used for binary classification. It predicts the probability of a class and then classifies it based on the predictor variables' values. Logistic regression uses logistic sigmoid activation, in contrast to linear regression, which
uses the identity function. As we've seen before, the output of the logistic sigmoid is in the
(0,1) range and can be interpreted as a probability function. We can use logistic regression
for a 2-class (binary) classification problem, where our target, t, can have two values,
usually 0 and 1 for the two corresponding classes. These discrete values shouldn't be
confused with the values of the logistic sigmoid function, which is a continuous real-valued
function between 0 and 1. The value of the sigmoid function represents the probability that
the output is in class 0 or class 1.

A logistic function or logistic curve is a common S-shaped curve (sigmoid curve) with the equation:

$${\displaystyle f(x)={\frac {L}{1+e^{-k(x-x_{0})}}},}$$

In statistics, logistic regression is a predictive analysis that used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. Here is the basic formula of logistic regression:

$$ln({\frac{P}{1-P})}=a+bX$$

$${\frac{P}{1-P}}=e^{a+bX}$$

$$P={\frac{e^{a+bX}}{1+e^{a+bX}}}$$

We will be implementing these formulas to determine the likelyhood of college applicants being admitted to a univeristy. For this we will need the following packages:
* CSV [documentation](https://juliadata.github.io/CSV.jl/stable/)
* DataFrames [documentation](https://juliadata.github.io/DataFrames.jl/stable/)
_______

In [2]:
using CSV
using DataFrames

data = CSV.read("candidates_data.csv", DataFrame)
first(data, 10)

Unnamed: 0_level_0,gmat,gpa,work_experience,admitted
Unnamed: 0_level_1,Int64,Float64,Int64,Int64
1,780,4.0,3,1
2,750,3.9,4,1
3,690,3.3,3,0
4,710,3.7,5,1
5,680,3.9,4,0
6,730,3.7,6,1
7,690,2.3,1,0
8,720,3.3,4,1
9,740,3.3,5,1
10,690,1.7,1,0


### Selecting Features
Here, we need to divide the given columns into two types of variables dependent(or target variable) and independent variable(or feature variables).

In [3]:
x_data = [[x[1], x[2]] for x in zip(data.gmat, data.gpa)]
y_data = [x for x in data.admitted];

In the code below we will model the probability using the logistic regression function: $$g(z)=1/(1+e^{-x})$$ 

We will us cross-entrophy  to define a loss function $$(-ylog\hat{y}-(1-y)log(1-\hat{y}))$$ 

along with the average loss. $$-1/N \sum_{n=1}^{N} [y_n log\hat{y}_n + (1 - y_n) log (1 - \hat{y}_n)]$$




In [4]:
σ(x) = 1/(1+exp(-x))

function cross_entrophy_loss(x, y, w, b)
    return -y*log(σ(w'x + b)) -(1-y)*log(1 - σ(w'x+b))
end

function average_loss(features, labels, w, b)
    N = length(features)
    return (1/N)*sum([cross_entrophy_loss(features[i], labels[i], w, b) for i = 1:N])
end

average_loss (generic function with 1 method)

In [5]:
function batch_gradient_descent(features, labels, w, b, α)
    del_w = [0.0 for i = 1:length(w)]
    del_b = 0.0
    
    N = length(features)
    
    for i = 1;N
        del_w += (σ(w'features[i]+b) - labels[i])*features[i]
        del_b += (σ(w'features[i]+b) - labels[i])
    end
    
    w = w - α*del_w
    b = b - α*del_b
    
    return w, b
end

batch_gradient_descent (generic function with 1 method)

In [6]:
w = [0.0, 0.0]
b = 0.0
println("The initial cost is: ", average_loss(x_data, y_data, w, b))

         #notice it getting smaller

The initial cost is: 0.6931471805599451


In [11]:
function train_batch_gradient_descent(features,labels, w,b,α, epochs)
    for i = 1:epochs 
        
        w, b = batch_gradient_descent(features, labels, w,b,α)
        if i == 1
            println("Epochs ", i , " with loss ", average_loss(x_data, y_data,w,b))
        end
        if i == epochs/10
            println("Epochs ", i , " with loss ", average_loss(x_data, y_data,w,b))
        end
        if i == epochs/8
            println("Epochs ", i , " with loss ", average_loss(x_data, y_data,w,b))
        end
        if i == epochs/4
            println("Epochs ", i , " with loss ", average_loss(x_data, y_data,w,b))
        end
        if i == epochs/2
            println("Epochs ", i , " with loss ", average_loss(x_data, y_data,w,b))
        end
        if i == epochs
            println("Epochs ", i , " with loss ", average_loss(x_data, y_data,w,b))
        end
        end 
    return w,b
end

train_batch_gradient_descent (generic function with 1 method)

In [12]:
w = [0.0, 0.0]
b = 0.0

w, b = train_batch_gradient_descent(x_data, y_data, w, b, 0.0000001, 10000)

Epochs 1 with loss 0.6931121727730898
Epochs 1000 with loss 1.7201365047552426
Epochs 1250 with loss 1.8100004134107341
Epochs 2500 with loss 2.0925733760893857
Epochs 5000 with loss 2.3779706486277505
Epochs 10000 with loss 2.6646298882428856


([0.008207757919005253, 4.209106625130911e-5], 1.0522766562827277e-5)

In [13]:
function predict(x, y, w, b)
    if σ(w'x+b) >= .5
        println("Predict Accepted")
        y == 1 ? println("Was Accepted") : println("Was Not Accepted")
    else
        println("Predict Not Accepted")
        y == 1 ? println("Was Accepted") : println("Was Not Accepted")
        
    end
end

predict (generic function with 1 method)

In [14]:
for i = 1:length(x_data)
    predict(x_data[i], y_data[i], w, b)
    println()
end

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Not Accepted

Predict Accepted
Was Accepted

Predict Accepted
Was Accepted

Predict Ac

In [15]:
function predict(x, y, w, b)
    if σ(w'x+b) >= .5
        return 1
    else
        return 0
    end
end

predict (generic function with 1 method)

In [16]:
mean_error = 0.0
for i = 1:length(x_data)
    mean_error += (predict(x_data[i], y_data[i], w, b) - y_data[i])^2
end

println(mean_error/length(x_data))

0.525


### Closing statements:


- We see that the Logistic Regression model can accurately predict the likelyhood of being admitted based on data and results using GPA, GMAT, and work experience. 