# Probabilistic Neural Network : Classification
[Iqbal Basyar](https://github.com/underground-11), 2018

*This notebook serves as a reference for Machine Learning assignment 1.3: PNN* <br>
Assignment description: [link](https://github.com/underground-11/Semester_6/blob/master/Asdos%20Machine%20Learning/Tugas%203%20Machine%20Learning/Tugas1.3.pdf) <br>
Training data: [link](https://github.com/underground-11/Semester_6/blob/master/Asdos%20Machine%20Learning/Tugas%203%20Machine%20Learning/data_train_PNN.txt) <br>
Test data (no labels): [link](https://github.com/underground-11/Semester_6/blob/master/Asdos%20Machine%20Learning/Tugas%203%20Machine%20Learning/data_test_PNN.txt) <br>

*Note: Install Plotly on your machine to use the plots in this notebook*




In [1]:
# Library Needs
from IPython.display import Image
import numpy as np
import pandas as pd
import plotly
import plotly.graph_objs as go
import matplotlib
import matplotlib.pyplot as plt
from numpy import linalg as la
import plotly.plotly as ply
from plotly.offline import *
from plotly.graph_objs import *
from plotly import __version__
# print ('Plotly Version : ',__version__) # requires version >= 1.9.0
init_notebook_mode(connected=True) # Set this to True if you want to use plotly Offline

# You can replace the credentials below if you have a Plotly account
plotly.tools.set_credentials_file(username='Underground-11', api_key='jCY5gx5Il7KMN2I03FeH') # The author's credentials, shared for public use




## 1. Load data to Dataframe

In [2]:
dataTrain = pd.read_csv('data_train_PNN.txt', delimiter = "\t").round(9)
dataTest = pd.read_csv('data_test_PNN.txt', delimiter = "\t").round(9)

Separate the data by class to make it easier to plot (you don't have to do this, but it helps).

In [3]:
class0 = dataTrain.loc[dataTrain['label'] == 0].reset_index(drop = True)
class1 = dataTrain.loc[dataTrain['label'] == 1].reset_index(drop = True)
class2 = dataTrain.loc[dataTrain['label'] == 2].reset_index(drop = True)
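
The three filters above can also be produced in one pass. The sketch below uses `DataFrame.groupby` on a small made-up frame (the values are illustrative, not taken from `data_train_PNN.txt`) and stores each class in a dictionary keyed by label. `split_by_label` is a hypothetical helper name, and it assumes integer class labels.

```python
import pandas as pd

def split_by_label(df, label_col='label'):
    """Return {label: rows with that label}, each with a fresh 0..n-1 index."""
    # int(label) assumes the labels are integers, as in this notebook
    return {int(label): group.reset_index(drop=True)
            for label, group in df.groupby(label_col)}

# Made-up rows for illustration only
toy = pd.DataFrame({
    'att1': [0.1, 2.5, -2.6, 1.0],
    'att2': [0.2, 1.3, 2.6, -3.2],
    'att3': [0.3, 2.0, 1.0, -0.8],
    'label': [2, 0, 1, 2],
})
by_class = split_by_label(toy)
print(sorted(by_class))   # [0, 1, 2]
print(len(by_class[2]))   # 2
```

With this layout, `by_class[0]` plays the role of `class0`, and the code keeps working if the number of classes changes.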

## 2. Plot Data Train


In [4]:

trace1 = go.Scatter3d(
    name = 'Class 0',
    x=class0['att1'],
    y=class0['att2'],
    z=class0['att3'],
    mode='markers',
    marker=dict(
        size=12,
        line=dict(
            color='rgba(217, 217, 0, 0.14)',
            width=0.2
        ),
        opacity=0.9
    )
)

trace2 = go.Scatter3d(
    name = 'Class 1',
    x=class1['att1'],
    y=class1['att2'],
    z=class1['att3'],
    mode='markers',
    marker=dict(
        size=12,
        line=dict(
            color='rgba(217, 217, 217, 0.14)',
            width=0.2
        ),
        opacity=0.9
    )
)

trace3 = go.Scatter3d(
    name = 'Class 2',
    x=class2['att1'],
    y=class2['att2'],
    z=class2['att3'],
    mode='markers',
    marker=dict(
        size=12,
        line=dict(
            color='rgba(217, 217, 217, 0.14)',
            width=0.2
        ),
        opacity=0.9
    )
)

data = [trace1, trace2, trace3]

this_layout = go.Layout(
    title = 'Data Train',
#     height = '800' ,
    scene = dict(
        xaxis = dict(
            title='Att 1'),
        yaxis = dict(
            title='Att 2'),
        zaxis = dict(
            title='Att 3'),),
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=25
    )
)

fig = go.Figure(data=data, layout=this_layout)
ply.iplot(fig, filename='Data Train Scatter') # may take a while, depending on your internet connection to Plotly
# iplot(fig, filename='Data Train Scatter') # for offline usage (not recommended by the author)

### PNN Architecture
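The figure below shows the PNN architecture. The following sections implement its hidden (pattern), summation, and output layers in turn.
![PNN Architecture](img/PNN.PNG)
*Source: [Probabilistic Neural Network Slide - Telkom University](https://www.google.com)* <br>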


## Hidden Layer (or Pattern Layer)
#### $g_i$ function 

$$\huge g_i(X; \sigma_i, W_i) = e^{-\frac{1}{2} (\frac{\lVert X - W_i \rVert}{\sigma_i})^2} = e^{-\frac{\lVert  X - W_i  \rVert ^2}{2\sigma_i^2}} $$

Where: <br>
    $X$: the input vector to classify <br>
    $\sigma_i$: the smoothing parameter of the class that $W_i$ belongs to <br>
    $W_i$: the $i$-th training sample

In [5]:
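def g(x, sigma, w):
    # x: input vector, w: one training sample (both pandas Series), sigma: smoothing parameter
    y = x.sub(w, axis = 0)  # element-wise difference X - W_i
    y = la.norm(y.values)   # Euclidean norm ||X - W_i||
    # Note: this evaluates exp(-||X - W_i|| / (2*sigma)**2), i.e. the unsquared norm over 4*sigma^2,
    # whereas the formula above is exp(-||X - W_i||**2 / (2*sigma**2)).
    return np.exp(- y / (2*sigma)**2)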
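
The formula squares the distance and divides by $2\sigma^2$, while `g` as written uses the unsquared distance over $(2\sigma)^2$, so the two give different numbers. The outputs recorded in this notebook come from `g` as written. For comparison, a minimal sketch of a kernel that follows the formula term by term is given below. The helper name `gaussian_kernel` is hypothetical, and it works on plain array-likes rather than pandas Series.

```python
import numpy as np

def gaussian_kernel(x, w, sigma):
    """exp(-||x - w||^2 / (2 * sigma^2)) for 1-D array-likes x and w."""
    diff = np.asarray(x, dtype=float) - np.asarray(w, dtype=float)
    sq_dist = float(np.dot(diff, diff))  # squared Euclidean distance
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)))

# Rows 1 and 0 of the training table printed later in this notebook
x = [1.628673, -3.215970, -3.151889]
w = [1.026777, -3.279030, -0.883644]
print(gaussian_kernel(x, x, 1.0))  # identical points give 1.0
print(gaussian_kernel(x, w, 1.0))
```

Because the distance is squared here, the kernel falls off faster with distance than `g` does for the same $\sigma$.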
    

In general, the pattern layer can be written as follows:

In [6]:
def patternLayer(data, dataTrain, sigma_dict):
    # Evaluate g for `data` against every training row.
    # Note: the 'sigma' and 'g' columns are written into dataTrain itself.
    for i in dataTrain.index:
        this_label = int(dataTrain.loc[i,'label'])
        dataTrain.loc[i,'sigma'] = sigma_dict[this_label]
        dataTrain.loc[i,'g'] = g(data, sigma_dict[this_label], dataTrain.loc[i,'att1':'att3'])
    return dataTrain

# def patternLayer(data,dataTrain, sigma_dict):
#     for index, row in dataTrain.iterrows():
#         this_label = row['label']
#         row['sigma'] = sigma_dict[this_label]
#         row['g']= g(data, sigma_dict[this_label], dataTrain.loc[i,'att1':'att3'])
#     return dataTrain
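
The row-by-row loop above calls `g` once per training sample. As a hedged alternative, the sketch below evaluates the kernel from the formula for all training rows at once with NumPy broadcasting. The function name `pattern_layer_vectorized` and the toy arrays are assumptions made for illustration. It follows the squared-norm formula, so its numbers will differ from those of `g` as written.

```python
import numpy as np

def pattern_layer_vectorized(x, train_X, train_labels, sigma_dict):
    """Return g_i = exp(-||x - W_i||^2 / (2 * sigma_i^2)) for every training row."""
    train_X = np.asarray(train_X, dtype=float)
    # Look up the smoothing parameter of each row's class
    sigmas = np.array([sigma_dict[int(c)] for c in train_labels], dtype=float)
    # Squared Euclidean distance from x to every training row
    sq_dist = np.sum((train_X - np.asarray(x, dtype=float)) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigmas ** 2))

# Toy training set: two rows of class 0, one row of class 1
train_X = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0]])
train_labels = np.array([0, 0, 1])
g_values = pattern_layer_vectorized([0.0, 0.0, 0.0], train_X, train_labels,
                                    {0: 1.0, 1: 2.0})
print(g_values)  # first entry is exactly 1.0 because x equals the first row
```

Unlike `patternLayer`, this version returns a new array and leaves the training data untouched.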

Here is an example use of the pattern layer:

In [7]:
z1 = dataTrain.loc[1,'att1':'att3']
z2 = dataTrain.loc[:, 'att1':'label']

sigmas = {0:1, 1:1, 2:1} # assume every class has sigma = 1
z3 = patternLayer(z1,z2, sigmas)
print(z3)


         att1      att2      att3  label  sigma         g
0    1.026777 -3.279030 -0.883644      2    1.0  0.556049
1    1.628673 -3.215970 -3.151889      2    1.0  1.000000
2    0.923110  0.185698 -3.081089      2    1.0  0.419497
3    1.210612  0.291462 -2.449537      2    1.0  0.406426
4    2.544333  1.333560  2.078647      0    1.0  0.174100
5   -0.505071  1.875051  3.537703      2    1.0  0.114376
6    2.568030  1.993095  1.384366      0    1.0  0.175040
7    1.145914 -3.007590 -1.695142      2    1.0  0.678966
8   -2.642700  2.619429  1.057048      1    1.0  0.123462
9    2.967126  0.940226  2.333582      0    1.0  0.173292
10   1.241284  1.923449  1.571323      0    1.0  0.174171
11  -0.440388  1.470366  3.067793      1    1.0  0.133396
12   3.222067  2.810043  3.331670      0    1.0  0.105563
13   2.378012  1.866489  2.025627      0    1.0  0.161468
14   2.401919 -2.380048 -2.996957      2    1.0  0.750285
15   2.649838 -3.573183 -1.532058      2    1.0  0.614486
16   1.641112 

## Summation Layer
![PNN Architecture](img/PNN.PNG)
*Source: [Probabilistic Neural Network Slide - Telkom University](https://www.google.com)* <br>


The summation layer adds up the $g$ values, but separately for each class. For example, $f_1$ is the summation for class 1: it sums the $g$ values of all training samples whose label is 1. <br>

The $f$ function is written as follows:

Univariate cases:
$$ \large f_i(g, \sigma) = \frac{1}{n\sigma} \sum_{k=1}^{n} g_k $$

Multivariate cases:
$$ \large f_i(g,  \sigma) = 
\frac{1}{(2\pi)^\frac{p}{2}  \sigma^p n} \sum_{k=1}^{n}g_k $$

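Here $n$ is the number of training samples in class $i$ and $p$ is the number of attributes ($p = 3$ for this data set). In code, the multivariate $f_i$ can be written as below: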


In [8]:
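def f(g, sigma):
    # g: DataFrame of pattern-layer outputs for a single class (must have a 'g' column)
    # Univariate form: g['g'].sum() / (len(g.index) * sigma)
    # Multivariate form with p = 3 attributes:
    return g['g'].sum() / ((2*np.pi)**(3/2) * sigma**3 * len(g.index))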
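
The function above hard-codes $p = 3$ because the data set has three attributes. A minimal sketch that takes the dimension as a parameter, assuming the $g$ values arrive as a plain array (the name `class_density` is hypothetical), could look like this:

```python
import numpy as np

def class_density(g_values, sigma, p):
    """Multivariate f_i: sum(g) / ((2*pi)^(p/2) * sigma^p * n)."""
    g_values = np.asarray(g_values, dtype=float)
    n = g_values.size  # number of training samples in the class
    norm = (2.0 * np.pi) ** (p / 2.0) * sigma ** p * n
    return float(g_values.sum() / norm)

# Two samples of one class, three attributes, sigma = 1
print(class_density([1.0, 0.5], sigma=1.0, p=3))
```

For $p = 3$ and $\sigma = 1$ the constant $(2\pi)^{p/2}$ is roughly 15.75, which is why the summation outputs are much smaller than the raw $g$ values.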

We could try this function by the following code 

In [9]:
sigmas = {0:1, 1:1, 2:1} # assume every class has sigma = 1
z1 = dataTrain.loc[1,'att1':'att3'] # the sample we want to test
z2 = dataTrain.loc[:, 'att1':'label'] # training data

plo = patternLayer(z1,z2, sigmas) # Pattern Layer Output
plo_0 = plo.loc[plo['label'] == 0].reset_index(drop = True) # Pattern Layer Output, Class 0
plo_1 = plo.loc[plo['label'] == 1].reset_index(drop = True) # Pattern Layer Output, Class 1
plo_2 = plo.loc[plo['label'] == 2].reset_index(drop = True) # Pattern Layer Output, Class 2

print ('Summation for class 0: ',f(plo_0, sigmas[0]))
print ('Summation for class 1: ',f(plo_1, sigmas[1]))
print ('Summation for class 2: ',f(plo_2, sigmas[2]))

Summation for class 0:  0.0102739804738
Summation for class 1:  0.00929605611774
Summation for class 2:  0.0314150702323


### What is the smoothing parameter ($\sigma$)?
 $\sigma$ is the per-class smoothing parameter that controls the width of the class's pdf curve. To find $\sigma$, use the function
 $$\sigma_i(\textbf{data}) = \large \alpha  \frac{\sum_{k=1}^{n}  d_k}{n}   $$

*Notes*: <br>
    data: all the training samples whose label is $i$. <br>
    $\alpha$: a constant that you must tune by experiment <br>
    $d_k$: the distance from the $k$-th sample to its nearest neighbour within the same class (you can simply use the Euclidean distance)
    
In Python you could write this as follows:

In [74]:
def f_sigma(data, a):
    # data: all training rows of one class; a: the constant alpha
    d = []
    for i in data.index:
        currentData  = data.loc[i]
        neighborData = data.drop(i) # every other row of the class (drop by index label)
        neighborData = neighborData.sub(currentData,axis=1).values # difference vectors to the current row
        norms = [la.norm(j) for j in neighborData ]
        d.append(min(norms)) # distance to the nearest neighbour
    return (a * np.mean(d))
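
For larger classes the nested loop can be slow. The sketch below computes the same quantity, $\alpha$ times the mean nearest-neighbour distance, from a full pairwise distance matrix in NumPy. The name `sigma_from_neighbours` and the toy points are illustrative assumptions. It expects only the attribute columns and at least two rows.

```python
import numpy as np

def sigma_from_neighbours(points, alpha):
    """alpha * mean distance from each point to its nearest other point."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]   # shape (n, n, p)
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))       # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                   # ignore the distance to itself
    return float(alpha * dist.min(axis=1).mean())

# Toy points forming a 3-4-5 right triangle
pts = [[0.0, 0.0, 0.0],
       [3.0, 0.0, 0.0],
       [3.0, 4.0, 0.0]]
print(sigma_from_neighbours(pts, alpha=5))  # nearest distances are 3, 3, 4
```

The distance matrix needs memory proportional to $n^2$, so this trade-off only pays off for classes of moderate size.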

For example, let's find the sigma for each class using $\alpha$ = 5.

In [75]:
a = 5
print ('Sigma for class 0: ', f_sigma(class0, a))
print ('Sigma for class 1: ', f_sigma(class1, a))
print ('Sigma for class 2: ', f_sigma(class2, a))

Sigma for class 0:  3.01123472667
Sigma for class 1:  3.23698550025
Sigma for class 2:  3.88967586818


Now, let's continue to the summation layer

## Summation Layer (cont'd)

As we have seen, the summation layer sums the $g$ values class by class. In other words, $f_i$ must be evaluated once for every class, and the layer passes one value per class to the output layer.

$$ Summation(data, \sigma) = \{A \mid A_i = f_i(data_{class\:i}, \sigma_i)\}$$

So, in general, we could write the summation layer as:

In [12]:
def summation(data, sigma):
    classes = np.unique(data['label']) # the distinct class labels
    f_values = {}
    for c in classes:
        class_c = data.loc[data['label'] == c].reset_index(drop = True)
        f_values[f(class_c, sigma[c])] = c # maps each f value to its class label
    return f_values
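
Because `summation` uses the $f$ value as the dictionary key, two classes that happen to produce exactly the same value would overwrite each other. One possible variant, sketched here with a hypothetical name `summation_by_class` and a made-up pattern-layer output, keys the result by class label instead. It is not a drop-in replacement: the notebook's `outputLayer` expects the value-to-label mapping.

```python
import numpy as np
import pandas as pd

def summation_by_class(plo, sigma, p=3):
    """Return {label: f_label} from a frame with 'label' and 'g' columns."""
    scores = {}
    for label, group in plo.groupby('label'):
        s = sigma[int(label)]
        # Multivariate normalisation: (2*pi)^(p/2) * sigma^p * n
        norm = (2.0 * np.pi) ** (p / 2.0) * s ** p * len(group)
        scores[int(label)] = float(group['g'].sum() / norm)
    return scores

# Made-up pattern-layer output for illustration only
toy_plo = pd.DataFrame({'label': [0, 0, 1, 2],
                        'g':     [0.2, 0.4, 0.9, 0.1]})
print(summation_by_class(toy_plo, {0: 1.0, 1: 1.0, 2: 1.0}))
```

In this toy case class 1 gets the largest score, since its single sample has the largest $g$ and all classes share the same $\sigma$.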

Now, let's try to use it

In [99]:
sigmas = {0:1, 1:1, 2:1} # assume every sigma is 1
z5 = dataTrain[1:2]

for i in z5.index:
    newData = z5.loc[i,'att1':'att3']
    newDataTrain = dataTrain
    plo = patternLayer(newData,newDataTrain, sigmas)
    f_values = summation(plo, sigmas)
    print (f_values)

{0.010273980473821663: 0, 0.0092960561177364096: 1, 0.031415070232304311: 2}


## Output Layer
Once we have `f_values`, we can pick its largest value with the following code: <br>
*Note: the previous `f_values` is {0.010273980473821663: 0, 0.0092960561177364096: 1, 0.031415070232304311: 2}. <br>
The largest value is 0.031415070232304311, so the function should return 2*

In [103]:
def outputLayer(f_values):
    # f_values maps each f value to its class label; return the label of the largest f value
    return f_values[max(f_values)]
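
If the summation output is stored as label-to-score instead (an assumption, since the notebook stores score-to-label), the output layer reduces to an arg-max over the dictionary. A minimal sketch with a hypothetical helper `predict_label`, using the three scores printed above rounded to six decimals:

```python
def predict_label(scores):
    """Return the label whose score is largest; scores maps label -> f value."""
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores, key=scores.get)

# The three class scores from the earlier output, rounded to six decimals
scores = {0: 0.010274, 1: 0.009296, 2: 0.031415}
print(predict_label(scores))  # 2
```

If several labels tie for the largest score, `max` returns the first one in the dictionary's insertion order.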

In [104]:
print (outputLayer(f_values))

2


## Done. 
Now, let's try to wrap it all as one classification function. :)


In [105]:
def classify(data, dataTrain, a):
    
    # Find the sigma of each class
    sigmas = {}
    classlist = np.unique(dataTrain['label'])
    for i in range (len(classlist)):
        
        class_i = dataTrain.loc[dataTrain['label'] == classlist[i]].reset_index(drop = True)
        # Keep att1..att3 and label only; drops any 'sigma'/'g' columns left by patternLayer
        class_i = class_i.iloc[:,0:4]
        sigmas.update({classlist[i]:f_sigma(class_i, a)})
    
    
    #Pattern Layer
    plo = patternLayer(data,dataTrain, sigmas)
    
    #Summation Layer
    f_values = summation(plo, sigmas)
    #output Layer
    newLabel = outputLayer(f_values)
    return newLabel

In [126]:
for i in dataTest.index:
    newData = dataTest.loc[i,'att1':'att3']
    print (classify(newData,dataTrain,1))

0
0
0
0
0
0
0
1
0
0
1
1
1
1
1
2
1
1
1
1
2
2
2
0
0
2
2
2
2
0


## 3.  $\sigma$ observation

In order to observe the $\sigma$, we must observe the value of $\alpha$.

In [None]:
# Try it yourself. Find the value of a that gives the best accuracy
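
One way to carry out this search is leave-one-out validation on the training set: classify each training sample using all the others and count the hits for each candidate $\alpha$. The sketch below is a compact NumPy version under several assumptions that are not from the notebook: synthetic Gaussian blobs stand in for `data_train_PNN.txt`, the kernel follows the squared-norm formula, and the names `class_sigma`, `pnn_predict`, and `loo_accuracy` are hypothetical.

```python
import numpy as np

def class_sigma(points, alpha):
    """alpha * mean nearest-neighbour distance within one class."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore the distance to itself
    return alpha * d.min(axis=1).mean()

def pnn_predict(x, X, y, alpha):
    """Classify x with a PNN built from training data X (n x p) and labels y."""
    p = X.shape[1]
    best_label, best_score = None, -np.inf
    for label in np.unique(y):
        pts = X[y == label]
        sigma = class_sigma(pts, alpha)
        # Pattern layer: Gaussian kernel against every sample of this class
        g = np.exp(-np.sum((pts - x) ** 2, axis=1) / (2.0 * sigma ** 2))
        # Summation layer: normalised sum for this class
        score = g.sum() / ((2.0 * np.pi) ** (p / 2.0) * sigma ** p * len(pts))
        # Output layer: keep the class with the largest score
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def loo_accuracy(X, y, alpha):
    """Leave-one-out accuracy: each sample is classified by all the others."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        hits += int(pnn_predict(X[i], X[keep], y[keep], alpha) == y[i])
    return hits / len(X)

# Synthetic stand-in data (an assumption): three separated blobs, 20 points each
rng = np.random.default_rng(0)
centers = np.array([[3.0, 2.0, 2.0], [-2.0, 2.0, 1.0], [1.0, -3.0, -2.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 3)) for c in centers])
y = np.repeat([0, 1, 2], 20)

for alpha in (0.5, 1.0, 2.0, 5.0):
    print(alpha, loo_accuracy(X, y, alpha))
```

On the real training file the accuracies will depend on how much the classes overlap, so the grid of $\alpha$ values here is only a starting point.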

## 4. Classification

Once you have found the $\alpha$ that gives the best accuracy (or the minimum error), use it to classify. As an example, assume $\alpha = 1$ and classify training rows 1 to 10 to see how accurate the result is.

In [131]:
a = 1 #assume that a = 1
newDataTrain = dataTrain.iloc[:,0:4]
outputData = dataTrain.iloc[1:11,0:4] # take rows 1 to 10 as the data to classify
print ('Original data')
print (outputData)
print()
outputData = outputData.drop('label', axis=1, inplace=False) 
print ('After label deletion')
print (outputData)
print()

# Classification
for i in outputData.index:
    newData = outputData.loc[i,'att1':'att3']
    outputData.loc[i,'label'] = (classify (newData, newDataTrain,a))  
outputData['label'] = outputData['label'].astype(np.int64) 
print ('After Classification')

print (outputData)


Original data
        att1      att2      att3  label
1   1.628673 -3.215970 -3.151889      2
2   0.923110  0.185698 -3.081089      2
3   1.210612  0.291462 -2.449537      2
4   2.544333  1.333560  2.078647      0
5  -0.505071  1.875051  3.537703      2
6   2.568030  1.993095  1.384366      0
7   1.145914 -3.007590 -1.695142      2
8  -2.642700  2.619429  1.057048      1
9   2.967126  0.940226  2.333582      0
10  1.241284  1.923449  1.571323      0

After label deletion
        att1      att2      att3
1   1.628673 -3.215970 -3.151889
2   0.923110  0.185698 -3.081089
3   1.210612  0.291462 -2.449537
4   2.544333  1.333560  2.078647
5  -0.505071  1.875051  3.537703
6   2.568030  1.993095  1.384366
7   1.145914 -3.007590 -1.695142
8  -2.642700  2.619429  1.057048
9   2.967126  0.940226  2.333582
10  1.241284  1.923449  1.571323

After Classification
        att1      att2      att3  label
1   1.628673 -3.215970 -3.151889      2
2   0.923110  0.185698 -3.081089      2
3   1.210612  0.291

As you can see, it's not 100% accurate. You should try to find another $\alpha$. 
Thank you :)