# INTRODUCTION

In this kernel, we will apply Logistic Regression to the "Gender Recognition by Voice" data.
1. [Read Data](#1)
1. [Logistic Regression](#2)
    1. [Determine Values](#3)
    1. [Train Test Split](#4)
    1. [Forward Backward Propagation](#5)
    1. [Prediction](#6)
    1. [Logistic Regression Algorithm](#7)
    1. [Logistic Regression with sklearn Library](#8)
1. [Conclusion](#9) 

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt


# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

<a id="1"></a>
# Read Data

In [None]:
data=pd.read_csv("../input/voice.csv")

In [None]:
data.head()

In [None]:
data.info()

In [None]:
data.label.value_counts()

Let's encode the labels as numbers: male=1 and female=0.

In [None]:
data.label=[  1 if i=="male" else 0 for i in data.label]

Now label is a binary output, so our data is suitable for Logistic Regression.
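
As a side note, the same encoding can also be written with pandas `Series.map`. The sketch below is only an illustration on a hypothetical toy frame named `toy`, not the voice data itself, and is an alternative to the list comprehension used in this kernel.

```python
import pandas as pd

# Hypothetical toy frame standing in for the voice data
toy = pd.DataFrame({"label": ["male", "female", "female", "male"]})

# Map each string label to its integer code: male -> 1, female -> 0
toy["label"] = toy["label"].map({"male": 1, "female": 0})

print(toy["label"].tolist())
```

A label that is missing from the dictionary would become NaN with `map`, whereas the list comprehension would silently turn it into 0.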

In [None]:
data.head()

In [None]:
data.info()

<a id="2"></a>
# Logistic Regression

Logistic Regression is a classification algorithm. It can be viewed as the simplest neural network: a single neuron with a sigmoid activation.

To train the model, we will use a computation graph. Here are its components:
* parameters: weights and bias (w and b)
* weights: the coefficients that multiply the value of each feature
* z = ((w)^T)*x + b, or, written out for our data, z = b + p1*w1 + p2*w2 + ... + p20*w20
* p1, p2,..., p20: values of each feature in the data (this will become clearer after the train test split)
* y_head = sigmoid(z)
    * The sigmoid function (the activation function) maps z to a value between 0 and 1, so the result can be read as a probability.
    * The sigmoid function is defined as $f(x)=\displaystyle \frac{1}{1+e^{-x}}$.
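
The components above can be sketched for a single sample as follows. This is a minimal illustration with hypothetical values and only 3 features instead of the 20 in the voice data; the names `w`, `b` and `x` are chosen to match the formulas.

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy values: 3 features instead of 20
w = np.array([[0.5], [-1.0], [2.0]])   # weights, shape (3, 1)
b = 0.1                                # bias
x = np.array([[1.0], [2.0], [0.5]])    # one sample, features as rows, shape (3, 1)

z = np.dot(w.T, x) + b                 # same as b + p1*w1 + p2*w2 + p3*w3
y_head = sigmoid(z)                    # probability-like value between 0 and 1

print(z.item(), y_head.item())
```

Because z is negative here, y_head falls below 0.5, so this sample would be assigned to class 0.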


<a id="3"></a>
# Determine Values
First of all, we will determine x and y values for Logistic Regression.

In [None]:
y=data.label.values
x_data=data.drop(["label"],axis=1)

Let's check what y and x_data are.
* y is our output
* the values in x_data are the feature values that the weights will multiply.

In [None]:
y

In [None]:
x_data.head()

To get an appropriate model, we need to normalize the values in x_data so that every feature lies on the same scale, between 0 and 1.
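
To make the formula concrete, here is a small sketch of min-max normalization on a hypothetical two-column frame (`toy`, `f1` and `f2` are illustrative names, not columns of the voice data).

```python
import pandas as pd

# Hypothetical toy frame with two features on very different scales
toy = pd.DataFrame({"f1": [10.0, 20.0, 30.0], "f2": [0.1, 0.3, 0.2]})

# Min-max normalization per column: (a - min(a)) / (max(a) - min(a))
toy_norm = (toy - toy.min()) / (toy.max() - toy.min())

print(toy_norm)
```

After this step the smallest value in each column is 0 and the largest is 1, so no feature dominates the weighted sum just because of its units.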

In [None]:
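# min-max normalization, applied per column: (a - min(a)) / (max(a) - min(a))
# DataFrame.min()/max() return per-column values regardless of the pandas version
x=(x_data-x_data.min())/(x_data.max()-x_data.min())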

In [None]:
x.head()

<a id="4"></a>
# Train Test Split
We want to train our model with Logistic Regression. After fitting the model, we need separate data to test it, so we will use **train_test_split**
to check the accuracy of our model.
* With test_size=0.2, train_test_split uses 80% of the data to fit the model and holds out 20% of the data to test it.
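
As a quick illustration of the 80/20 split, here is a sketch on hypothetical toy arrays (`X_toy` and `y_toy` are made-up names with 10 samples and 2 features).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# Hold out 20% of the samples; random_state makes the split reproducible
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

print(X_tr.shape, X_te.shape)  # 8 training samples and 2 test samples
```

Note that train_test_split returns arrays with samples as rows, which is why this kernel transposes them afterwards.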

In [None]:
# create x_train, y_train, x_test, y_test arrays
from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)

# features must be the rows of our matrix (one column per sample).

x_train=x_train.T
x_test=x_test.T
y_train=y_train.T
y_test=y_test.T

print("x_train: ", x_train.shape)
print("x_test: ", x_test.shape)
print("y_train: ", y_train.shape)
print("y_test: ", y_test.shape)


Now that we have split our data with train_test_split, we will use x_train and y_train to fit the Logistic Regression model.
* We will define the functions that we'll use in Logistic Regression.
* First, we will define the initial weights, the initial bias and the sigmoid function.

In [None]:
# let's initialize the parameters
# the only argument our initialize function needs is dimension, that is, the number of features
# dimension=20
# initial weights=0.01, initial bias=0

def initialize_weights_and_bias(dimension):
    w = np.full((dimension,1),0.01)
    b = 0.0
    return w, b

#sigmoid function

def sigmoid(z):
    
    y_head=1/(1+np.exp(-z))
    return y_head
    

<a id="5"></a>
# Forward Backward Propagation

**Forward propagation** covers all the steps from the features (x_train) to the cost.
* z = ((w)^T)*x + b, or written out, z = b + p1*w1 + p2*w2 + ... + p20*w20
* Then compute y_head=sigmoid(z)
* Calculate the loss (error) function $-(1-y)\log(1-\widetilde{y})-y\log(\widetilde{y})$, where $\widetilde{y}$ is y_head (we actually compute y_head for each column of the x_train matrix, that is, for each sample).
    * The loss measures how far a prediction is from the true label: it is close to 0 when the prediction is right and grows as the prediction gets worse.
* Cost function = sum of the losses over all samples, divided by the number of samples.

**Backward propagation** means updating the parameters according to the gradient of the cost function. It reuses the y_head that we found in forward propagation.

* Updating: the cost function depends on the weights and the bias. Take the derivative of the cost function with respect to the weights, multiply it by α (the learning rate), and subtract the result from the weights.
    * w = w - learning_rate * gradients["derivative_weight"]
* We do the same thing for the bias.
    * i.e. take the derivative of the cost function with respect to the bias, multiply it by α (the learning rate), and subtract the result from the bias.
        * b = b - learning_rate * gradients["derivative_bias"]
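
One way to gain confidence in the gradient formulas is to compare them with numerical derivatives. The sketch below does this on a hypothetical toy problem (3 features, 5 random samples); `cost_fn`, `eps` and the toy arrays are illustrative names, and the analytic expressions are the ones used in this kernel's backward propagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_fn(w, b, x, y):
    """Mean cross-entropy loss; x has shape (features, samples)."""
    y_head = sigmoid(np.dot(w.T, x) + b)
    return np.mean(-y * np.log(y_head) - (1 - y) * np.log(1 - y_head))

# Hypothetical toy problem: 3 features, 5 samples
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
y = np.array([0, 1, 1, 0, 1])
w = np.full((3, 1), 0.01)
b = 0.0

# Analytic gradients, as in backward propagation
y_head = sigmoid(np.dot(w.T, x) + b)
derivative_weight = np.dot(x, (y_head - y).T) / x.shape[1]
derivative_bias = np.sum(y_head - y) / x.shape[1]

# Numerical gradients by central finite differences
eps = 1e-6
numeric_weight = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    numeric_weight[j, 0] = (cost_fn(w + step, b, x, y) - cost_fn(w - step, b, x, y)) / (2 * eps)
numeric_bias = (cost_fn(w, b + eps, x, y) - cost_fn(w, b - eps, x, y)) / (2 * eps)

print(np.allclose(derivative_weight, numeric_weight, atol=1e-6))
print(np.isclose(derivative_bias, numeric_bias, atol=1e-6))
```

If the two sets of values agree, the analytic derivatives are consistent with the cost function they are supposed to differentiate.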



In [None]:
# forward backward propagation

def forward_backward_propagation(w,b,x_train,y_train):
    #forward propagation
    z=np.dot(w.T,x_train)+b
    y_head=sigmoid(z)
    loss=-y_train*np.log(y_head)-(1-y_train)*np.log(1-y_head)
    cost=(np.sum(loss))/x_train.shape[1] # mean loss: divide by the number of samples (x_train.shape[1])
    
    #backward propagation
    # backward propagation reuses the y_head found in forward propagation
    derivative_weight=(np.dot(x_train,((y_head-y_train).T)))/x_train.shape[1] # averaged over the samples
    derivative_bias=np.sum(y_head-y_train)/x_train.shape[1]                   # averaged over the samples
    gradients = {"derivative_weight": derivative_weight,"derivative_bias": derivative_bias}
    
    return cost,gradients

When updating the parameters, we need to choose the learning rate wisely: it should be neither too big nor too small.
* Here, the number of iterations and the learning rate are called hyperparameters. That is, they are not learned from the data; we need to set their values by hand.
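
To see why the learning rate matters, here is a toy sketch of gradient descent on the simple function f(t) = t**2 rather than on the logistic cost. The function `gradient_descent` and the three rates are hypothetical, and the thresholds for "too small" or "too big" are specific to this toy function.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(t) = t**2 (gradient 2*t) and return the final t."""
    t = start
    for _ in range(steps):
        t = t - learning_rate * 2 * t
    return t

too_small = gradient_descent(0.001)  # barely moves away from the start
reasonable = gradient_descent(0.1)   # approaches the minimum at 0
too_big = gradient_descent(1.5)      # overshoots further on every step

print(too_small, reasonable, too_big)
```

The same trade-off applies to logistic regression, although the values that work well depend on the data and its scaling.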

In [None]:
# Updating(learning) parameters
def update(w, b, x_train, y_train, learning_rate,number_of_iteration):
    cost_list = []
    cost_list2 = []
    index = []
    # update (learn) the parameters number_of_iteration times
    for i in range(number_of_iteration):
        # make forward and backward propagation and find cost and gradients
        cost,gradients = forward_backward_propagation(w,b,x_train,y_train)
        cost_list.append(cost)
        # lets update
        w = w - learning_rate * gradients["derivative_weight"]
        b = b - learning_rate * gradients["derivative_bias"]
        if i % 10 == 0:
            cost_list2.append(cost)
            index.append(i)
            print ("Cost after iteration %i: %f" %(i, cost))
    # we update(learn) parameters weights and bias
    parameters = {"weight": w,"bias": b}
    plt.plot(index,cost_list2)
    plt.xticks(index,rotation='vertical')
    plt.xlabel("Number of Iteration")
    plt.ylabel("Cost")
    plt.show()
    return parameters, gradients, cost_list

<a id="6"></a>
# Prediction
So far we have covered:
* preparing our data for LR
* parameters: weights and bias
* initializing the parameters
* sigmoid function
* loss function
* cost function
* updating the parameters

Now let's predict. In the prediction step, x_test is the input.
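
The thresholding rule used for prediction can also be written without a loop. The following is a minimal sketch on hypothetical sigmoid outputs (`probabilities` is a made-up array), using the same 0.5 cut-off as this kernel.

```python
import numpy as np

# Hypothetical sigmoid outputs for four test samples, shape (1, 4)
probabilities = np.array([[0.2, 0.5, 0.51, 0.9]])

# Values above 0.5 become class 1; values at or below 0.5 become class 0
Y_prediction = (probabilities > 0.5).astype(float)

print(Y_prediction)
```

A value of exactly 0.5 is assigned to class 0, which matches the `<= 0.5` test in the loop-based version.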

In [None]:
#prediction
def predict(w,b,x_test):
    # x_test is an input for forward propagation
    z = sigmoid(np.dot(w.T,x_test)+b)
    Y_prediction = np.zeros((1,x_test.shape[1]))
    # if z is greater than 0.5, we predict class 1 (y_head=1),
    # otherwise (z less than or equal to 0.5) we predict class 0 (y_head=0)
    for i in range(z.shape[1]):
        if z[0,i]<= 0.5:
            Y_prediction[0,i] = 0
        else:
            Y_prediction[0,i] = 1

    return Y_prediction

<a id="7"></a>
# Logistic Regression Algorithm
Now that we can make predictions, let's define the logistic_regression function and run it with learning_rate = 1, num_iterations = 100.

In [None]:
#Logistic Regression

def logistic_regression(x_train, y_train, x_test, y_test, learning_rate ,  num_iterations):
    # initialize
    dimension =  x_train.shape[0]  # that is 20
    w,b = initialize_weights_and_bias(dimension)
    # do not change learning rate
    parameters, gradients, cost_list = update(w, b, x_train, y_train, learning_rate,num_iterations)
    
    y_prediction_test = predict(parameters["weight"],parameters["bias"],x_test)

    # print the test accuracy as a percentage
    print("test accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100))
    
logistic_regression(x_train, y_train, x_test, y_test,learning_rate = 1, num_iterations = 100)

In [None]:
logistic_regression(x_train, y_train, x_test, y_test,learning_rate = 1, num_iterations = 300)

In [None]:
logistic_regression(x_train, y_train, x_test, y_test,learning_rate = 1, num_iterations = 500)

In [None]:
logistic_regression(x_train, y_train, x_test, y_test,learning_rate = 2, num_iterations = 500)

In [None]:
logistic_regression(x_train, y_train, x_test, y_test,learning_rate = 3, num_iterations = 300)

As we see, we get the best results with learning_rate = 3, num_iterations = 300 or with learning_rate = 2, num_iterations = 500.
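
The accuracy printed by logistic_regression is computed as 100 minus the mean absolute difference between predictions and labels, in percent. For 0/1 labels this is the same as the percentage of correct predictions, as the sketch below illustrates on hypothetical values (the arrays are made up; `accuracy_a` and `accuracy_b` are illustrative names).

```python
import numpy as np

# Hypothetical predictions and true labels for five test samples
y_prediction_test = np.array([[1.0, 0.0, 1.0, 1.0, 0.0]])
y_test = np.array([1, 0, 0, 1, 0])

# Formula used in logistic_regression: 100 minus the mean absolute error in percent
accuracy_a = 100 - np.mean(np.abs(y_prediction_test - y_test)) * 100

# Same quantity expressed as the percentage of matching labels
accuracy_b = np.mean(y_prediction_test == y_test) * 100

print(accuracy_a, accuracy_b)
```

Here one prediction out of five is wrong, so both expressions give an accuracy of about 80 %.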

<a id="8"></a>
# Logistic Regression with sklearn Library
We can also use the sklearn library to perform Logistic Regression.
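
Two details are worth keeping in mind when switching to sklearn: it expects samples as rows, and `score` returns accuracy as a fraction rather than a percentage. The sketch below shows both on hypothetical random data (`x_toy`, `y_toy` and `clf` are illustrative names), stored in the same (features, samples) layout as x_train.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data stored like x_train in this kernel: (features, samples)
rng = np.random.default_rng(0)
x_toy = rng.normal(size=(2, 40))
y_toy = (x_toy[0] + x_toy[1] > 0).astype(int)

# sklearn expects (samples, features), so transpose before fitting
clf = LogisticRegression()
clf.fit(x_toy.T, y_toy)

# score returns accuracy as a fraction between 0 and 1, not a percentage
print(clf.score(x_toy.T, y_toy))
```

This is why the arrays are transposed back with `.T` before being passed to `fit` and `score`.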

In [None]:
from sklearn.linear_model import LogisticRegression
lr=LogisticRegression()

lr.fit(x_train.T,y_train.T)
print("test accuracy {}".format(lr.score(x_test.T,y_test.T)))


<a id="9"></a>
# Conclusion

With the functions we wrote ourselves, the best accuracy we get is
* test accuracy: 97.94952681388013 % when learning_rate = 2, num_iterations = 500.
* With the sklearn library's Logistic Regression, our test accuracy is 0.9810725552050473 (about 98.11 %).