
<font color='black'>
# Content:
*  [Introduction](#1)
    
*  [Data Set](#2)

    * [Loading the Dataset](#2)
    * [Exploratory Data Analysis](#3)
    * [Missing Values Treatment](#4)
    * [Visualization](#5)
    
    
*  [Logistic Regression](#6)

    * [Preprocessing: Label encoder and Normalization](#7)
    * [Train Test Split](#8)
    * [Parameter Initialize and Sigmoid Function](#9)
    * [Forward and Backward Propagation Combined](#10)
    * [Gradient Descent for Logistic Regression](#11)
    * [Prediction](#12)
    * [Logistic Regression with Math](#13)
    * [Logistic Regression with Sklearn](#14)

---

<a id = '1'></a>
## Introduction
Determining a person's gender as male or female from a sample of their voice initially seems an easy task. The human ear can often tell a male voice from a female voice within the first few spoken words. However, designing a computer program to do this turns out to be trickier.

The model is constructed using 3,168 recorded samples of male and female voices, speech, and utterances. The samples are processed using acoustic analysis and then applied to an artificial intelligence/machine learning algorithm to learn gender-specific traits.

---

#### A short description of the features, as given on the 'Data' tab on Kaggle:

meanfreq: mean frequency (in kHz)

sd: standard deviation of frequency

median: median frequency (in kHz)

Q25: first quartile (in kHz)

Q75: third quartile (in kHz)

IQR: interquartile range (in kHz)

skew: skewness (see note in specprop description)

kurt: kurtosis (see note in specprop description)

sp.ent: spectral entropy

sfm: spectral flatness

mode: mode frequency

centroid: frequency centroid (see specprop)

peakf: peak frequency (frequency with highest energy)

meanfun: average of fundamental frequency measured across acoustic signal

minfun: minimum fundamental frequency measured across acoustic signal

maxfun: maximum fundamental frequency measured across acoustic signal

meandom: average of dominant frequency measured across acoustic signal

mindom: minimum of dominant frequency measured across acoustic signal

maxdom: maximum of dominant frequency measured across acoustic signal

dfrange: range of dominant frequency measured across acoustic signal

modindx: modulation index. Calculated as the accumulated absolute difference between adjacent measurements of fundamental frequencies divided by the frequency range

label: male or female

Note that we have 3,168 voice samples, and 20 acoustic properties are recorded for each sample. The 'label' column is the target variable we have to predict: the gender of the speaker.


---

<h2><center>Importing Various Modules</center></h2>

---

In [None]:

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import warnings
warnings.filterwarnings("ignore")

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


---

<a id = '2'></a>
<h2><center>Loading the Dataset</center></h2>


---

In [None]:
data_voice = pd.read_csv("/kaggle/input/voicegender/voice.csv")

data = data_voice.copy()

data.tail()

---

<a id = '3'></a>
<h2><center>Exploratory Data Analysis</center></h2>

---

In [None]:
data.describe().T

In [None]:
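# correlations are only defined for numeric columns, so drop the text label first
data.drop(columns="label").corr()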

In [None]:
data.info()

---

<a id = '4'></a>
<h2><center>Missing Values Treatment</center></h2>

---

In [None]:
data.isnull().sum()

<h2><center>Missing data visualization</center></h2>

In [None]:
msno.matrix(data)
plt.show()

---

<a id = '5'></a>
<h2><center>Visualization</center></h2>

---

In [None]:
# one density plot per feature (the first 20 columns), split by label
fig, axes = plt.subplots(4, 5, figsize=(15, 15))
for ax, col in zip(axes.ravel(), data.columns[:20]):
    ax.set_title(col)
    sns.kdeplot(data.loc[data['label'] == "female", col], color='red', label='Female', ax=ax)
    sns.kdeplot(data.loc[data['label'] == "male", col], color='blue', label='Male', ax=ax)
axes[0, 0].legend()
plt.tight_layout()
plt.show()

---
* The features that appear to separate the two classes most clearly are Q25, IQR and meanfun. The models below are built on all 20 features.
* Next, we plot the female and male classes by their meanfun (x), IQR (y) and Q25 (z).
---

In [None]:
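# reload the data so that this cell starts from the original, unmodified table

import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)  # enable inline plotly rendering in the notebook
data_voice = pd.read_csv("/kaggle/input/voicegender/voice.csv")
data = data_voice.copy()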

male = data[data.label == "male"]

female = data[data.label == "female"]

# trace1
trace1 = go.Scatter3d(
    x=male.meanfun,
    y=male.IQR,
    z=male.Q25,
    mode='markers',
    name = "MALE",
    marker=dict(
        color='rgb(54, 170, 127)',
        size=12,
        line=dict(
            color='rgb(204, 204, 204)',
            width=0.1
        )
    )
)

trace2 = go.Scatter3d(
    x=female.meanfun,
    y=female.IQR,
    z=female.Q25,
    mode='markers',
    name = "FEMALE",
    marker=dict(
        color='rgb(217, 100, 100)',
        size=12,
        line=dict(
            color='rgb(255, 255, 255)',
            width=0.1
        )
    )
)

data1 = [trace1, trace2]
layout = go.Layout(
    title = ' 3D VOICE DATA ',
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=0
    )
)
fig = go.Figure(data=data1, layout=layout)

iplot(fig)

---

<a id = '6'></a>
<h2><center>Logistic Regression</center></h2>

Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression which outputs continuous number values, logistic regression transforms its output using the logistic sigmoid function to return a probability value which can then be mapped to two or more discrete classes.

---

---

<h2><center>Computation Graph of Logistic Regression</center></h2>

![](https://machinethink.net/images/tensorflow-on-ios/LogisticRegression@2x.png)

*      The parameters are the weights and the bias.
*      Weights: the coefficient of each acoustic property
*      Bias: the intercept
*      z = (w.T)x + b  => z equals (transpose of weights times input x) + bias
*      In other words => z = b + x1*w1 + x2*w2 + ... + x19*w19 + x20*w20
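
As a minimal sketch of the formula above, the snippet below computes z both as a matrix product and as an explicit sum for one sample. The random inputs are made up for illustration; only the shapes (20 features stored column-wise, as the notebook does after transposing) and the initial weight value 0.01 follow the notebook.

```python
import numpy as np

# Toy input: 20 features (rows) for 3 samples (columns); values are hypothetical
rng = np.random.default_rng(0)
x = rng.random((20, 3))
w = np.full((20, 1), 0.01)   # one weight per feature
b = 0.0                      # bias (intercept)

# Vectorized form: z = (w.T)x + b, one score per sample
z_vectorized = np.dot(w.T, x) + b

# Explicit form for the first sample: z = b + x1*w1 + ... + x20*w20
z_manual = b + sum(w[j, 0] * x[j, 0] for j in range(20))

print(z_vectorized.shape)                        # one row, one column per sample
print(np.isclose(z_vectorized[0, 0], z_manual))  # both forms agree
```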


---

---

<a id = '7'></a>
<h2><center>Preprocessing: label encoder and normalization</center></h2>

Label encoding turns the text labels into numbers: here "male" becomes 1 and "female" becomes 0. Normalization (min-max scaling) rescales every feature to the range [0, 1] using x' = (x - min) / (max - min), computed separately for each column, so that features measured on different scales contribute comparably to the weighted sum and gradient descent behaves better.
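
To make the two preprocessing steps concrete, here is a small sketch on a made-up table. The column names match the dataset, but the values and the `toy` frame are hypothetical.

```python
import pandas as pd

# Hypothetical mini-dataset with two features and the text label
toy = pd.DataFrame({
    "meanfun": [0.08, 0.12, 0.20],
    "IQR": [0.02, 0.10, 0.06],
    "label": ["male", "female", "female"],
})

# Label encoding: male -> 1, female -> 0
y = toy["label"].map({"male": 1, "female": 0}).values

# Min-max scaling, column by column: (x - min) / (max - min)
features = toy.drop(columns="label")
scaled = (features - features.min()) / (features.max() - features.min())

print(y)
print(scaled)
```

After scaling, every column has minimum 0 and maximum 1, regardless of its original units.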

---

In [None]:
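# Label encoding: male -> 1, female -> 0
data.label = [1 if each == "male" else 0 for each in data_voice.label]
y = data.label.values
x_data = data.drop(["label"],axis=1)

# Min-max normalization per column. DataFrame.min()/max() reduce column-wise,
# whereas np.min(DataFrame) reduces over the whole table in recent pandas versions.
x = (x_data - x_data.min())/(x_data.max() - x_data.min())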

---

<a id = '8'></a>
<h2><center>Train Test Split</center></h2>


<center>80% of the data will be used for the training, rest of the data will be used for the test</center>

---

In [None]:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2,random_state=42)

x_train = x_train.T
x_test = x_test.T
y_train = y_train.T
y_test = y_test.T

---

<a id = '9'></a>
<h2><center>Parameter Initialize and Sigmoid Function</center></h2>


Sigmoid activation:
To map predicted values to probabilities, we use the sigmoid function, which maps any real value z to a value between 0 and 1.
#  $$S(z) = \frac{1} {1 + e^{-z}}$$

![](https://upload.wikimedia.org/wikipedia/commons/8/88/Logistic-curve.svg)


*Figure1 : https://upload.wikimedia.org/wikipedia/commons/8/88/Logistic-curve.svg*
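
A quick numerical check of the curve in the figure, as a sketch: the sample points are arbitrary and the function is restated here so the snippet is self-contained.

```python
import numpy as np

def sigmoid(z):
    # S(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
s = sigmoid(z)

print(s)                             # values stay strictly between 0 and 1
print(s[1])                          # S(0) is exactly 0.5
print(np.isclose(s[0], 1 - s[2]))    # symmetry: S(-z) = 1 - S(z)
```

Because S(0) = 0.5, thresholding the probability at 0.5 is the same as checking the sign of z.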

---

In [None]:

# w,b = Initialize Weights and Bias

def initialize_weights_and_bias(dimension):
    
    w = np.full((dimension,1),0.01)
    b = 0.0
    return w,b


# Sigmoid Function

def sigmoid(z):
    
    y_head = 1/(1+ np.exp(-z))
    return y_head

---

<a id = '10'></a>
<h2><center>Forward and Backward Propagation Combined</center></h2>

---

## $$\text{Loss Function} \Rightarrow L(\hat{y},y) = -[y\log\hat{y}+(1-y)\log(1-\hat{y})]$$

* The cost function is the average of the losses (errors) over all m training samples.

## $$\text{Cost Function} = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)},y^{(i)})$$
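
As a small worked example of these two formulas, the sketch below evaluates the loss for three made-up predictions. The labels and probabilities are hypothetical; a confident correct prediction gets a small loss and a wrong-leaning one gets a large loss.

```python
import numpy as np

y = np.array([1, 0, 1])              # true labels (hypothetical)
y_hat = np.array([0.9, 0.2, 0.4])    # predicted probabilities of class 1 (hypothetical)

# Per-sample loss: -[y*log(y_hat) + (1-y)*log(1-y_hat)]
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Cost: mean of the per-sample losses
cost = loss.mean()

print(loss)
print(cost)
```

The third sample (true label 1, predicted 0.4) dominates the cost, which is what drives the weights to change during training.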

In [None]:

def forward_backward_propagation(w,b,x_train,y_train):
    # forward propagation: linear score, then sigmoid to get probabilities
    z = np.dot(w.T,x_train) + b
    y_head = sigmoid(z)
    loss = -y_train*np.log(y_head)-(1-y_train)*np.log(1-y_head) # Loss Function (per sample)
    cost = (np.sum(loss))/x_train.shape[1]                      # Cost Function (mean loss over the m samples)
    # backward propagation: gradients of the cost with respect to w and b
    derivative_weight = (np.dot(x_train,((y_head-y_train).T)))/x_train.shape[1] # dividing by x_train.shape[1] (m) averages over the samples
    derivative_bias = np.sum(y_head-y_train)/x_train.shape[1]                 # dividing by x_train.shape[1] (m) averages over the samples
    gradients = {"derivative_weight": derivative_weight, "derivative_bias": derivative_bias}
    
    return cost,gradients

---
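
One way to gain confidence in the gradient formulas used in the backward pass is a finite-difference check. The sketch below is not part of the original notebook: `cost_and_gradients` is a hypothetical, self-contained restatement of the same forward and backward computation, and the data is random.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_gradients(w, b, x, y):
    # Same math as the notebook's forward/backward pass, restated for this sketch
    m = x.shape[1]
    y_head = sigmoid(np.dot(w.T, x) + b)
    cost = np.sum(-y * np.log(y_head) - (1 - y) * np.log(1 - y_head)) / m
    dw = np.dot(x, (y_head - y).T) / m
    db = np.sum(y_head - y) / m
    return cost, dw, db

# Random toy problem: 4 features, 10 samples stored column-wise
rng = np.random.default_rng(0)
x = rng.random((4, 10))
y = rng.integers(0, 2, size=10)
w = np.full((4, 1), 0.01)
b = 0.0

_, dw, db = cost_and_gradients(w, b, x, y)

# Central differences: (J(p + eps) - J(p - eps)) / (2 * eps) for each parameter
eps = 1e-6
dw_numeric = np.zeros_like(w)
for j in range(w.shape[0]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j, 0] += eps
    w_minus[j, 0] -= eps
    dw_numeric[j, 0] = (cost_and_gradients(w_plus, b, x, y)[0]
                        - cost_and_gradients(w_minus, b, x, y)[0]) / (2 * eps)
db_numeric = (cost_and_gradients(w, b + eps, x, y)[0]
              - cost_and_gradients(w, b - eps, x, y)[0]) / (2 * eps)

print(np.max(np.abs(dw - dw_numeric)))  # should be very close to zero
print(abs(db - db_numeric))
```

If the analytic and numeric gradients disagree by more than rounding noise, the backward pass has a bug.
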
<a id = '11'></a>
<h2><center>Gradient Descent for Logistic Regression</center></h2>

* Unlike linear regression, logistic regression has no closed-form solution, so its parameters are found with gradient descent. The general idea of gradient descent is to tweak the parameters w and b iteratively to minimize a cost function. There are three common variants: batch gradient descent, mini-batch gradient descent and stochastic gradient descent. This notebook uses batch gradient descent.

---


![](https://miro.medium.com/max/1200/1*iNPHcCxIvcm7RwkRaMTx1g.jpeg)
Figure Source: https://saugatbhattarai.com.np/what-is-gradient-descent-in-machine-learning/

---

* Initial values are assigned to w and b; each iteration then updates them by subtracting learning rate * gradient of the cost function. In principle this repeats until convergence; in practice the loop is run for a fixed number of iterations.

![](https://miro.medium.com/max/966/1*kUmtH0lRS-euZriSNaqgHQ.gif)
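
The update rule can be tried on a one-parameter toy problem before applying it to logistic regression. This is only an illustrative sketch: the quadratic cost, the starting point and the learning rate are made up.

```python
# Toy cost J(w) = (w - 3)^2 with gradient dJ/dw = 2 * (w - 3); the minimum is at w = 3
def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0                  # initial value
learning_rate = 0.1
for _ in range(100):     # fixed number of iterations
    # step against the gradient, scaled by the learning rate
    w = w - learning_rate * gradient(w)

print(w)  # ends up very close to the minimum at 3
```

Each step shrinks the distance to the minimum by a constant factor here; with a learning rate that is too large the same loop would overshoot and diverge.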

---

In [None]:
def update(w, b, x_train, y_train, learning_rate,number_of_iterarion):
    cost_list = []
    cost_list2 = []
    index = []
    
    # update (learn) the parameters number_of_iterarion times
    for i in range(number_of_iterarion):
        # make forward and backward propagation and find cost and gradients
        cost,gradients = forward_backward_propagation(w,b,x_train,y_train)
        cost_list.append(cost)
        # lets update
        w = w - learning_rate * gradients["derivative_weight"]
        b = b - learning_rate * gradients["derivative_bias"]
        if i % 1000 == 0:
            cost_list2.append(cost)
            index.append(i)
            print ("Cost after iteration %i: %f" %(i, cost))
            
    # store the learned weights and bias
    parameters = {"weight": w,"bias": b}
    plt.plot(index,cost_list2)
    plt.xticks(index,rotation='vertical')
    plt.xlabel("Number of Iterations")
    plt.ylabel("Cost")
    plt.grid(True)
    plt.title("Cost While Updating (Learning) the Weights and Bias")
    plt.show()
    return parameters, gradients, cost_list

---

<a id = '12'></a>
<h2><center>Prediction</center></h2>

---

In [None]:
def predict(w,b,x_test):
    # forward propagation: predicted probability of class 1 (male) for each sample
    y_head = sigmoid(np.dot(w.T,x_test)+b)
    # predict 1 if the probability is above 0.5, otherwise 0;
    # the result has shape (1, number of samples)
    Y_prediction = (np.asarray(y_head) > 0.5).astype(float)

    return Y_prediction

---

<a id = '13'></a>
<h2><center>Logistic Regression with Math</center></h2>

---

<h2><center>Hyperparameters</center></h2>
<h4><center>* Learning Rate = 0.2 </center></h4>
<h4><center>* Iteration = 15000</center></h4>



In [None]:

def logistic_regression(x_train, y_train, x_test, y_test, learning_rate ,  num_iterations):
    # initialize
    dimension =  x_train.shape[0]  
    w,b = initialize_weights_and_bias(dimension)
    parameters, gradients, cost_list = update(w, b, x_train, y_train, learning_rate,num_iterations)
    
    y_prediction_test = predict(parameters["weight"],parameters["bias"],x_test)
    y_prediction_train = predict(parameters["weight"],parameters["bias"],x_train)

    # Print train and test accuracies
    print("train accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_train - y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(y_prediction_test - y_test)) * 100))
    
logistic_regression(x_train, y_train, x_test, y_test,learning_rate = 0.2, num_iterations = 15000)

---
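
The accuracy expression used above, `100 - mean(|prediction - label|) * 100`, works because predictions and labels are both 0 or 1, so the absolute difference is 1 exactly when a sample is misclassified. A small sketch with made-up labels shows that it matches the usual "fraction correct" definition:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])                # hypothetical labels
y_pred = np.array([[1.0, 0.0, 0.0, 1.0, 0.0]])    # shape (1, n), like the output of predict()

# Form used in the notebook: 100 minus the percentage of mismatches
acc_notebook = 100 - np.mean(np.abs(y_pred - y_true)) * 100

# Direct form: percentage of predictions equal to the label
acc_direct = np.mean(y_pred[0] == y_true) * 100

print(acc_notebook, acc_direct)
```
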

<a id = '14'></a>
<h2><center>Logistic Regression with Sklearn</center></h2>

---

In [None]:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
# sklearn expects samples in rows, so transpose back to (n_samples, n_features)
lr.fit(x_train.T,y_train)
print("train accuracy {} %".format(lr.score(x_train.T,y_train)*100))
print("test accuracy {} %".format(lr.score(x_test.T,y_test)*100))



---

|   | <center><h2>With Math </h2></center>|<center><h2> With Sklearn </h2></center>|
|------------|-------------|--------------|
|<h3>Train Accuracy</h3>|<center><h3>96.882399368 %</h3></center>|<center><h3>96.8034727703 %</h3></center>|
|<h3>Test Accuracy</h3>|<h2><font color='red'>98.107255520 %</font></h2>|<h2><font color='red'>98.264984227 %</font></h2>|


---

## References

https://towardsdatascience.com/an-introduction-to-logistic-regression-8136ad65da2e<br>
https://www.kaggle.com/kanncaa1/deep-learning-tutorial-for-beginners<br>
https://machinethink.net/blog/tensorflow-on-ios/