# Project 3 - Logistic Regression - using Julia
## Nhi Le
## DATA 4319 

Logistic regression uses the logistic sigmoid activation, in contrast to linear regression, which
uses the identity function. As we've seen before, the output of the logistic sigmoid lies in the
(0,1) range and can be interpreted as a probability. We can use logistic regression
for a 2-class (binary) classification problem, where our target, t, can take two values,
usually 0 and 1 for the two corresponding classes. These discrete values shouldn't be
confused with the values of the logistic sigmoid function, which is a continuous real-valued
function between 0 and 1. The value of the sigmoid function represents the probability that
the output is in class 1; one minus that value is the probability of class 0.

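A logistic function or logistic curve is a common S-shaped curve (sigmoid curve) with equation

$${\displaystyle f(x)={\frac {L}{1+e^{-k(x-x_{0})}}}}$$

where $L$ is the curve's maximum value, $k$ is the steepness (growth rate) of the curve, and $x_{0}$ is the $x$ value of the curve's midpoint.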


In statistics, logistic regression is a predictive analysis that is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables. Here is the basic formula of logistic regression:

![](https://camo.githubusercontent.com/c5e464fcd1955db626a19adf846bfb57ab5007e607b040e8f07ac9f579c8a5a1/687474703a2f2f666163756c74792e6361732e7573662e6564752f6d6272616e6e69636b2f72656772657373696f6e2f676966732f6c6f382e676966)

![](https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/480px-Logistic-curve.svg.png)

_Standard logistic sigmoid function, i.e. L = 1, k = 1, x0 = 0_
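
To make the role of each parameter concrete, here is a minimal sketch of the general logistic function from the equation above. The function name `logistic` and its keyword arguments are illustrative choices, not part of the original notebook.

```julia
# General logistic function; keyword names mirror L, k and x0 in the equation.
logistic(x; L=1.0, k=1.0, x0=0.0) = L / (1 + exp(-k * (x - x0)))

# The standard sigmoid is the special case L = 1, k = 1, x0 = 0.
sigmoid(x) = logistic(x)

println(sigmoid(0.0))            # value at the midpoint: 0.5
println(logistic(2.0; x0=2.0))   # shifting x0 moves the midpoint: 0.5
println(logistic(0.0; L=4.0))    # L scales the curve, so the midpoint value is 2.0
```

Larger values of `k` make the transition around `x0` steeper, while the curve still stays between 0 and `L`.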

#### Linear Regression Vs. Logistic Regression
Linear regression gives you a continuous output, but logistic regression provides a discrete output. Examples of continuous output are house prices and stock prices. Examples of discrete output are predicting whether a patient has cancer or not and predicting whether a customer will churn. Linear regression is estimated using Ordinary Least Squares (OLS), while logistic regression is estimated using the Maximum Likelihood Estimation (MLE) approach.

![](https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1534281070/linear_vs_logistic_regression_edxw03.png)

### Types of Logistic Regression:

**Binary Logistic Regression:** The target variable has only two possible outcomes such as Spam or Not Spam, Cancer or No Cancer.

**Multinomial Logistic Regression:** The target variable has three or more nominal categories, such as the type of wine.

**Ordinal Logistic Regression:** The target variable has three or more ordinal categories such as restaurant or product rating from 1 to 5.

In [1]:
# Load the dataset (recent CSV.jl versions require a sink type such as DataFrame)
using CSV, DataFrames
data = CSV.read("candidates_data.csv", DataFrame)

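| Row | gmat (Int64) | gpa (Float64) | work_experience (Int64) | admitted (Int64) |
|-----|--------------|---------------|-------------------------|------------------|
| 1   | 780          | 4.0           | 3                       | 1                |
| 2   | 750          | 3.9           | 4                       | 1                |
| 3   | 690          | 3.3           | 3                       | 0                |
| 4   | 710          | 3.7           | 5                       | 1                |
| 5   | 680          | 3.9           | 4                       | 0                |
| 6   | 730          | 3.7           | 6                       | 1                |
| 7   | 690          | 2.3           | 1                       | 0                |
| 8   | 720          | 3.3           | 4                       | 1                |
| 9   | 740          | 3.3           | 5                       | 1                |
| 10  | 690          | 1.7           | 1                       | 0                |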


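### Selecting Features
Here, you need to divide the given columns into two types of variables: the dependent variable (or target variable) and the independent variables (or feature variables). The code below uses `gmat` and `gpa` as the features and `admitted` as the target; `work_experience` is not used.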

In [2]:
x_data=[[x[1],x[2]] for x in zip(data.gmat,data.gpa)]
y_data=[x for x in data.admitted]

40-element Array{Int64,1}:
 1
 1
 0
 1
 0
 1
 0
 1
 1
 0
 0
 1
 1
 ⋮
 1
 1
 0
 0
 1
 1
 1
 0
 0
 0
 0
 1

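**Implementing logistic regression**

The model is trained by minimizing the cross-entropy cost over the $m$ training samples, where $\phi$ is the logistic sigmoid (written `σ` in the code) and $z^{(i)} = \mathbf{w}^{T}\mathbf{x}^{(i)} + b$. The `average_cost` function below reports this sum divided by the number of samples.

$$J(\mathbf{w}) = \sum_{i=1}^{m} - y^{(i)} \log \bigg( \phi\big(z^{(i)}\big) \bigg) - \big(1 - y^{(i)}\big) \log\bigg(1-\phi\big(z^{(i)}\big)\bigg).$$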

In [3]:
σ(x) = 1/(1+exp(-x))

function cross_entropy_loss(x,y,w,b)
    return -y*log(σ(w'x + b)) - (1-y)*log(1-σ(w'x+b))
end 

function average_cost(features, labels, w, b)
    N = length(features)
    return (1/N)*sum([cross_entropy_loss(features[i], labels[i],w,b) for i = 1:N])
end

average_cost (generic function with 1 method)
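
As a quick sanity check on the cost function: with all-zero weights and bias, the sigmoid returns 0.5 for every sample, so each sample contributes a loss of log 2 ≈ 0.6931 regardless of its label. This is why the training log further down starts near 0.6931. The sketch below repeats the definitions so it runs on its own; `toy_x` and `toy_y` are hypothetical values, not rows from `candidates_data.csv`.

```julia
σ(x) = 1 / (1 + exp(-x))

cross_entropy_loss(x, y, w, b) = -y * log(σ(w' * x + b)) - (1 - y) * log(1 - σ(w' * x + b))

function average_cost(features, labels, w, b)
    N = length(features)
    return sum(cross_entropy_loss(features[i], labels[i], w, b) for i in 1:N) / N
end

# Hypothetical toy data, for illustration only
toy_x = [[700.0, 3.5], [650.0, 2.9], [720.0, 3.8]]
toy_y = [1, 0, 1]

# With zero weights and bias, the average loss equals log(2) for any labels
initial_loss = average_cost(toy_x, toy_y, [0.0, 0.0], 0.0)
println(initial_loss ≈ log(2))
```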

In [4]:
function batch_gradient_descent(features,labels,w,b,α)
    del_w = [0.0 for i = 1:length(w)]
    del_b = 0.0
    N = length(features)
    for i = 1:N
        del_w += (σ(w'features[i] + b) - labels[i])*features[i]
        del_b += (σ(w'features[i] + b) - labels[i])
    end
    w = w - α*del_w
    b = b - α*del_b
    return w,b
end

batch_gradient_descent (generic function with 1 method)
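
The update inside `batch_gradient_descent` follows from differentiating the cross-entropy loss. As a sketch, write $z = \mathbf{w}^{T}\mathbf{x} + b$ for a single sample and use $\sigma'(z) = \sigma(z)\big(1-\sigma(z)\big)$:

$$\frac{\partial}{\partial z}\Big[-y\log\sigma(z) - (1-y)\log\big(1-\sigma(z)\big)\Big] = -y\big(1-\sigma(z)\big) + (1-y)\,\sigma(z) = \sigma(z) - y$$

Applying the chain rule with $\partial z/\partial \mathbf{w} = \mathbf{x}$ and $\partial z/\partial b = 1$, then summing over the samples, gives

$$\frac{\partial J}{\partial \mathbf{w}} = \sum_{i=1}^{m}\big(\sigma(z^{(i)}) - y^{(i)}\big)\,\mathbf{x}^{(i)}, \qquad \frac{\partial J}{\partial b} = \sum_{i=1}^{m}\big(\sigma(z^{(i)}) - y^{(i)}\big)$$

These sums are what the loop accumulates in `del_w` and `del_b` before taking one step of size `α` against the gradient.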

In [6]:
w,b = batch_gradient_descent(x_data, y_data, [0.0,0.0], 0.0, 0.0001)

([0.012, 0.0006000000000000002], -0.0001)

In [7]:
function train_batch_gradient_descent(features, labels, w, b, α, epochs)
    # Epochs at which the average loss is reported
    checkpoints = (1, epochs/10, epochs/8, epochs/4, epochs/2, epochs)
    for i = 1:epochs
        w, b = batch_gradient_descent(features, labels, w, b, α)
        # Use the function arguments rather than the global x_data and y_data
        if i in checkpoints
            println("Epochs ", i, " with loss ", average_cost(features, labels, w, b))
        end
    end
    return w, b
end

train_batch_gradient_descent (generic function with 1 method)

In [11]:
w = [0.0,0.0]
b = 0.0

w,b = train_batch_gradient_descent(x_data,y_data, w,b,0.0000001,1000000)

Epochs 1 with loss 0.6931188566349795
Epochs 100000 with loss 0.6855799117618873
Epochs 125000 with loss 0.6837589079497152
Epochs 250000 with loss 0.6749998518952888
Epochs 500000 with loss 0.6590868605720882
Epochs 1000000 with loss 0.6326918737673819


([-0.0020551903863979, 0.47622113690915635], -0.11626329950708125)
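
To see how the learned parameters turn into a prediction, the sketch below plugs the weights and bias printed above into the sigmoid for two candidates. The feature values are copied from rows 1 and 7 of the data table; the variable names are illustrative.

```julia
σ(x) = 1 / (1 + exp(-x))

# Weights and bias copied from the training output above
w = [-0.0020551903863979, 0.47622113690915635]
b = -0.11626329950708125

# [gmat, gpa] for rows 1 and 7 of the data table
candidate_1 = [780.0, 4.0]
candidate_7 = [690.0, 2.3]

# Predicted probability of admission for each candidate
p1 = σ(w' * candidate_1 + b)
p7 = σ(w' * candidate_7 + b)

println("row 1: p = ", round(p1, digits=3), ", predicted admitted = ", p1 >= 0.5)
println("row 7: p = ", round(p7, digits=3), ", predicted admitted = ", p7 >= 0.5)
```

The first probability falls above the 0.5 threshold and the second below it, so under this threshold row 1 is classified as admitted and row 7 as not admitted.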

In [12]:
function predict(x,y,w,b)
    # Print the model's prediction, then the actual outcome
    println(σ(w'x+b) >= 0.5 ? "predict accepted" : "predict not accepted")
    println(y == 1 ? "was accepted" : "was not accepted")
end

predict (generic function with 1 method)

In [14]:
for i =1:length(x_data)
    predict(x_data[i],y_data[i],w,b)
    println("")
end

predict accepted
was accepted

predict accepted
was accepted

predict accepted
was not accepted

predict accepted
was accepted

predict accepted
was not accepted

predict accepted
was accepted

predict not accepted
was not accepted

predict not accepted
was accepted

predict not accepted
was accepted

predict not accepted
was not accepted

predict not accepted
was not accepted

predict accepted
was accepted

predict accepted
was accepted

predict accepted
was not accepted

predict not accepted
was accepted

predict accepted
was not accepted

predict not accepted
was not accepted

predict accepted
was accepted

predict accepted
was not accepted

predict not accepted
was not accepted

predict accepted
was accepted

predict not accepted
was not accepted

predict not accepted
was not accepted

predict accepted
was not accepted

predict accepted
was not accepted

predict accepted
was accepted

predict accepted
was accepted

predict not accepted
was not accepted

predict accepted
was accepted

In [16]:
function predict(x,y,w,b)
    if σ(w'x+b) >= 0.5
        return 1
    else
        return 0
    end 
end

predict (generic function with 1 method)

In [17]:
mean_error = 0.0
for i = 1:length(x_data)
    mean_error += (predict(x_data[i], y_data[i], w, b) - y_data[i])^2
end
print(mean_error/length(x_data))

0.275
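
Because predictions and labels are both 0 or 1, the squared difference is 1 exactly when they disagree, so the value above is the misclassification rate (an accuracy of 0.725). A breakdown into confusion counts can show which kind of mistake dominates. The sketch below assumes the predictions have already been collected into a vector; the `predictions` and `labels` values are hypothetical and not taken from the notebook's data.

```julia
# Hypothetical predictions and labels, for illustration only
predictions = [1, 1, 0, 0, 1, 0]
labels      = [1, 0, 0, 1, 1, 0]

# Fraction of samples where prediction and label disagree
error_rate = sum(predictions .!= labels) / length(labels)
accuracy = 1 - error_rate

# Confusion counts: how each kind of prediction lines up with the labels
true_pos  = sum((predictions .== 1) .& (labels .== 1))
false_pos = sum((predictions .== 1) .& (labels .== 0))
true_neg  = sum((predictions .== 0) .& (labels .== 0))
false_neg = sum((predictions .== 0) .& (labels .== 1))

println("error rate = ", error_rate, ", accuracy = ", accuracy)
println("TP = ", true_pos, ", FP = ", false_pos, ", TN = ", true_neg, ", FN = ", false_neg)
```

With the notebook's variables, the prediction vector could be built as `[predict(x_data[i], y_data[i], w, b) for i in 1:length(x_data)]`.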

___
### Conclusion:
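The logistic regression model, trained on GMAT and GPA, misclassifies 27.5% of the 40 candidates it was trained on, which corresponds to an accuracy of 72.5%. Work experience was not used as a feature, and the loss was still decreasing after 1,000,000 epochs, so these results are a starting point rather than an accurate predictor of admission.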
___