# Project 1: Logistic Regression with Julia

## Huy Huynh

## DATA 4319

Logistic regression is a supervised learning classification algorithm used to predict the probability of a target variable. The target (dependent) variable is dichotomous, meaning that it has only two possible classes.

In other words, the dependent variable Y is binary, coded as either 1 (success/yes) or 0 (failure/no). For example, if the objective is to determine whether a given transaction is fraudulent, Y takes the value 1 if it is fraudulent and 0 if it is not.

![1](https://miro.medium.com/max/1400/0*qDhdxS4TjN_TUJHV.jpg)

Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous values, logistic regression passes its output through the logistic sigmoid function to obtain a probability, which can then be mapped to two or more discrete classes.

The sigmoid function (also called the logistic function) produces an “S” shaped curve when plotted on a graph. It takes any real number and “squishes” it into a value between 0 and 1: large negative inputs are pushed towards 0 and large positive inputs towards 1. The resulting probability is then labeled as class 0 or class 1 by comparing it with a threshold such as 0.5.

The equation for the sigmoid function is:

![2](https://miro.medium.com/max/1280/1*OUOB_YF41M-O4GgZH_F2rw.png)
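For reference, here is the same function written in LaTeX, using the symbol $\sigma$ that the code cells in the project use, together with a few standard properties that follow directly from the definition:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma(0) = \tfrac{1}{2}, \qquad \lim_{x \to -\infty} \sigma(x) = 0, \qquad \lim_{x \to +\infty} \sigma(x) = 1
$$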

### Types of Logistic Regression ###

In general, logistic regression refers to binary logistic regression with a binary target variable, but it can also predict two other kinds of target variable. Based on the number and nature of the categories, logistic regression can be divided into the following types:

- Binary or binomial: the dependent variable has only two possible types, 1 or 0.

- Multinomial: the dependent variable has 3 or more possible unordered types, that is, types with no quantitative significance.

- Ordinal: the dependent variable has 3 or more possible ordered types, that is, types with a quantitative significance.

## Project ##

In [8]:
# Load the dataset into a DataFrame
using CSV
using DataFrames
data = CSV.read("candidates.csv", DataFrame)

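| Row | Column1 | gmat | gpa | work_experience | admitted |
|-----|---------|------|-----|-----------------|----------|
|     | Int64 | Int64 | Float64 | Int64 | Int64 |
| 1 | 0 | 780 | 4.0 | 3 | 1 |
| 2 | 1 | 750 | 3.9 | 4 | 1 |
| 3 | 2 | 690 | 3.3 | 3 | 0 |
| 4 | 3 | 710 | 3.7 | 5 | 1 |
| 5 | 4 | 680 | 3.9 | 4 | 0 |
| 6 | 5 | 730 | 3.7 | 6 | 1 |
| 7 | 6 | 690 | 2.3 | 1 | 0 |
| 8 | 7 | 720 | 3.3 | 4 | 1 |
| 9 | 8 | 740 | 3.3 | 5 | 1 |
| 10 | 9 | 690 | 1.7 | 1 | 0 |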


### Selecting Features
Here, the columns are divided into two kinds of variables: the dependent variable (the target) and the independent variables (the features). The cell below uses `gmat` and `gpa` as features and `admitted` as the target.

In [9]:
# One [gmat, gpa] feature vector per candidate
x_data = [[g, p] for (g, p) in zip(data.gmat, data.gpa)]
# 0/1 admission labels
y_data = [y for y in data.admitted]

40-element Vector{Int64}:
 1
 1
 0
 1
 0
 1
 0
 1
 1
 0
 0
 1
 1
 ⋮
 1
 1
 0
 0
 1
 1
 1
 0
 0
 0
 0
 1

In [10]:
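# Logistic sigmoid: maps any real number into (0, 1)
σ(x) = 1/(1+exp(-x))

# Cross-entropy loss for one sample x with label y (0 or 1)
function cross_entropy_loss(x,y,w,b)
    return -y*log(σ(w'x + b)) - (1-y)*log(1-σ(w'x+b))
end

# Mean cross-entropy loss over all samples
function average_cost(features, labels, w, b)
    N = length(features)
    return (1/N)*sum([cross_entropy_loss(features[i], labels[i],w,b) for i = 1:N])
end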

average_cost (generic function with 1 method)
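
Written out, the two functions in the cell above appear to correspond to the binary cross-entropy for a single sample and its average over the $N$ training samples. The symbols $\hat{y}_i$, $\ell_i$ and $J$ are shorthand introduced here and do not appear in the code:

$$
\hat{y}_i = \sigma(w^\top x_i + b), \qquad
\ell_i(w, b) = -y_i \log \hat{y}_i - (1 - y_i)\log(1 - \hat{y}_i), \qquad
J(w, b) = \frac{1}{N}\sum_{i=1}^{N} \ell_i(w, b)
$$

With $w = 0$ and $b = 0$ every $\hat{y}_i$ equals $0.5$, so $J = \log 2 \approx 0.6931$ regardless of the labels. That is the starting point of the loss values printed during training.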

In [11]:
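# One step of batch gradient descent with learning rate α
function batch_gradient_descent(features,labels,w,b,α)
    del_w = zeros(length(w))   # accumulated gradient with respect to w
    del_b = 0.0                # accumulated gradient with respect to b
    N = length(features)
    for i = 1:N
        err = σ(w'features[i] + b) - labels[i]   # prediction error for sample i
        del_w += err*features[i]
        del_b += err
    end
    # Step against the summed (not averaged) gradient
    w = w - α*del_w
    b = b - α*del_b
    return w,b
end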

batch_gradient_descent (generic function with 1 method)
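
The accumulation loop in the cell above matches the gradient of the cross-entropy loss summed over all samples. As a sketch of why, let $\ell_i$ denote the loss of sample $i$ as computed by `cross_entropy_loss` (the symbol is introduced here for illustration). Using $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ and the chain rule, the per-sample gradients simplify to:

$$
\frac{\partial \ell_i}{\partial w} = \big(\sigma(w^\top x_i + b) - y_i\big)\, x_i, \qquad
\frac{\partial \ell_i}{\partial b} = \sigma(w^\top x_i + b) - y_i
$$

One call to `batch_gradient_descent` then performs the update:

$$
w \leftarrow w - \alpha \sum_{i=1}^{N} \big(\sigma(w^\top x_i + b) - y_i\big)\, x_i, \qquad
b \leftarrow b - \alpha \sum_{i=1}^{N} \big(\sigma(w^\top x_i + b) - y_i\big)
$$

Because the gradient is summed rather than divided by $N$, the step taken is $N$ times larger than a step of size $\alpha$ on the average cost would be.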

In [13]:
w,b = batch_gradient_descent(x_data, y_data, [0.0,0.0], 0.0, 0.0001)

([0.012, 0.0006000000000000002], -0.0001)

In [14]:
function train_batch_gradient_descent(features,labels, w,b,α, epochs)
    # Epochs at which the average loss is reported
    checkpoints = (1, epochs/10, epochs/8, epochs/4, epochs/2, epochs)
    for i = 1:epochs
        w, b = batch_gradient_descent(features, labels, w,b,α)
        if i in checkpoints
            # Evaluate on the data passed in, not on the globals x_data and y_data
            println("Epochs ", i , " with loss ", average_cost(features, labels,w,b))
        end
    end
    return w,b
end

train_batch_gradient_descent (generic function with 1 method)

In [15]:
w = [0.0,0.0]
b = 0.0

w,b = train_batch_gradient_descent(x_data,y_data, w,b,0.0000001,1000000)

Epochs 1 with loss 0.6931188566349795
Epochs 100000 with loss 0.6855799117618873
Epochs 125000 with loss 0.6837589079497152
Epochs 250000 with loss 0.6749998518952888
Epochs 500000 with loss 0.6590868605720882
Epochs 1000000 with loss 0.6326918737673819


([-0.0020551903863979, 0.47622113690915635], -0.11626329950708124)

In [16]:
function predict(x,y,w,b)
    # Threshold the predicted probability at 0.5
    println(σ(w'x+b) >= 0.5 ? "predict accepted" : "predict not accepted")
    # Print the true label next to the prediction
    println(y==1 ? "was accepted" : "was not accepted")
end

predict (generic function with 1 method)
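
To make the 0.5 threshold concrete, the sketch below traces the decision by hand for the first candidate in the table (GMAT 780, GPA 4.0), using the weights and bias printed by the training cell. The names `w_trained`, `b_trained`, `x_first`, `z_first` and `p_first` are introduced here for illustration only and are not part of the notebook.

```julia
# Logistic sigmoid, as defined earlier in the notebook
σ(x) = 1 / (1 + exp(-x))

# Weights and bias copied from the training output
w_trained = [-0.0020551903863979, 0.47622113690915635]
b_trained = -0.11626329950708124

# First candidate in the table: GMAT 780, GPA 4.0
x_first = [780.0, 4.0]

# Linear score w'x + b, then squash it into a probability
z_first = w_trained' * x_first + b_trained
p_first = σ(z_first)

println("score = ", z_first)        # about 0.186
println("probability = ", p_first)  # about 0.546, above 0.5, so "predict accepted"
```

The probability is only slightly above the threshold, which suggests that the model separates the two classes weakly for this candidate even though the prediction is correct.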

In [17]:
for i =1:length(x_data)
    predict(x_data[i],y_data[i],w,b)
    println("")
end

predict accepted
was accepted

predict accepted
was accepted

predict accepted
was not accepted

predict accepted
was accepted

predict accepted
was not accepted

predict accepted
was accepted

predict not accepted
was not accepted

predict not accepted
was accepted

predict not accepted
was accepted

predict not accepted
was not accepted

predict not accepted
was not accepted

predict accepted
was accepted

predict accepted
was accepted

predict accepted
was not accepted

predict not accepted
was accepted

predict accepted
was not accepted

predict not accepted
was not accepted

predict accepted
was accepted

predict accepted
was not accepted

predict not accepted
was not accepted

predict accepted
was accepted

predict not accepted
was not accepted

predict not accepted
was not accepted

predict accepted
was not accepted

predict accepted
was not accepted

predict accepted
was accepted

predict accepted
was accepted

predict not accepted
was not accepted

predict accepted
was accepted

In [18]:
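# Return the predicted class (1 = accepted, 0 = not accepted).
# The argument y is unused; it is kept so the call signature stays the same.
function predict(x,y,w,b)
    if σ(w'x+b) >= 0.5
        return 1
    else
        return 0
    end
end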

predict (generic function with 1 method)

In [19]:
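# Each squared difference is 0 (correct) or 1 (wrong),
# so the mean is the misclassification rate on the training data
mean_error = 0.0
for i = 1:length(x_data)
    mean_error += (predict(x_data[i], y_data[i], w, b) - y_data[i])^2
end
print(mean_error/length(x_data))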

0.275
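
Since predictions and labels are both 0 or 1, the value 0.275 is the fraction of the 40 candidates that were misclassified (11 of 40). A single error rate does not show which kind of mistake dominates, so the sketch below splits the result into true/false positives and negatives. The helper `confusion_counts` and the toy vectors are hypothetical and are used only for illustration; they are not the notebook's data.

```julia
# Hypothetical helper: count true/false positives and negatives
# for 0/1 predictions against 0/1 labels.
function confusion_counts(preds, labels)
    tp = count(i -> preds[i] == 1 && labels[i] == 1, eachindex(labels))
    tn = count(i -> preds[i] == 0 && labels[i] == 0, eachindex(labels))
    fp = count(i -> preds[i] == 1 && labels[i] == 0, eachindex(labels))
    fn = count(i -> preds[i] == 0 && labels[i] == 1, eachindex(labels))
    return (tp = tp, tn = tn, fp = fp, fn = fn)
end

# Toy data for illustration only (not the notebook's results)
toy_preds  = [1, 1, 0, 1, 0, 0]
toy_labels = [1, 0, 0, 1, 1, 0]

c = confusion_counts(toy_preds, toy_labels)
error_rate = (c.fp + c.fn) / length(toy_labels)

println(c)
println("error rate = ", error_rate)
```

In the notebook, the same helper could be applied to `[predict(x_data[i], y_data[i], w, b) for i in eachindex(x_data)]` and `y_data`.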

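## Interpretation

Using GMAT score and GPA as features, the logistic regression model misclassifies 27.5% of the 40 candidates (error 0.275), so it predicts admission correctly for 72.5% of them. Work experience is present in the dataset but was not used as a feature. The error is measured on the same data the model was trained on, and the loss was still decreasing after 1,000,000 epochs (0.6327, down from 0.6931), so the model has not fully converged and this figure should not be read as an estimate of accuracy on new candidates.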