### Congressional Voting ANN

The [Congressional Voting Records](https://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records) dataset used here contains 435 instances and 17 attributes (the class label plus 16 votes). The instances fall into two classes, the two parties in the US Congress: Republicans and Democrats. The remaining attributes are the issues or bills that were put before Congress for a vote.

Citation: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [2]:
dataset = pd.read_csv('house-votes-84.data')
dataset

Unnamed: 0,class,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missile,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
0,republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
2,democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,?,y,y,y,y
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
430,republican,n,n,y,y,y,y,n,n,y,y,n,y,y,y,n,y
431,democrat,n,n,y,n,n,n,y,y,y,y,n,n,n,n,n,y
432,republican,n,?,n,y,y,y,n,n,n,n,y,y,y,y,n,y
433,republican,n,n,n,y,y,y,?,?,?,?,n,y,y,y,n,y


Because Republicans and Democrats are not equally represented, the minority class (Republicans) is up-sampled so that both classes have the same number of rows.

In [3]:
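from sklearn.utils import resample

# Split the rows by party
r = dataset[dataset['class'] == 'republican']
d = dataset[dataset['class'] == 'democrat']

# Keep a reference to the original, unbalanced data
non_normalized_dataset = dataset

# Up-sample republicans (with replacement) to 267 rows, the number of democrats
rup = resample(r, replace=True, n_samples=267, random_state=10)
dataset = pd.concat([rup, d])
dataset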

Unnamed: 0,class,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missile,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
28,republican,y,n,n,y,y,n,y,y,y,n,n,y,y,y,n,y
327,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,y
38,republican,n,y,n,y,y,y,n,n,n,y,n,y,y,y,n,n
158,republican,n,y,n,y,y,y,n,n,n,y,n,y,y,y,n,n
300,republican,n,n,n,y,y,n,y,y,y,y,n,y,y,y,n,y
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
425,democrat,n,n,y,n,n,n,y,y,n,y,y,n,n,n,y,?
426,democrat,y,n,y,n,n,n,y,y,y,y,n,n,n,n,y,y
428,democrat,?,?,?,n,n,n,y,y,y,y,n,n,y,n,y,y
429,democrat,y,n,y,n,?,n,y,y,y,y,n,y,n,?,y,y
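
As a small illustration of the up-sampling step, the sketch below balances a made-up toy frame with `resample`. The frame `toy` and its column `vote` are hypothetical stand-ins, not the real voting data.

```python
import pandas as pd
from sklearn.utils import resample

# Toy stand-in for the voting data: 3 republicans, 5 democrats (made-up rows).
toy = pd.DataFrame({
    "class": ["republican"] * 3 + ["democrat"] * 5,
    "vote": ["y", "n", "y", "n", "n", "y", "n", "n"],
})

minority = toy[toy["class"] == "republican"]
majority = toy[toy["class"] == "democrat"]

# Sample the minority class with replacement until it matches the majority size.
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=10)
balanced = pd.concat([minority_up, majority])

counts_before = toy["class"].value_counts().to_dict()
counts_after = balanced["class"].value_counts().to_dict()
print(counts_before)
print(counts_after)
```

Because sampling is done with replacement, some index labels repeat in the balanced frame. When up-sampling happens before a train/test split, copies of the same row can end up in both sets, which is worth keeping in mind when reading the test accuracy.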


The representatives' votes are recorded as boolean values, "y" for "yea" and "n" for "nay", so they are mapped to 1 and 0 respectively. Since there are only two parties, the class column is encoded the same way (democrat as 1, republican as 0). Missing values, marked with "?", are then filled with the most frequent value of the corresponding column.

In [4]:
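# Encode votes and party labels: y -> 1, n -> 0, democrat -> 1, republican -> 0
dataset = dataset.replace(['y', 'n', 'democrat', 'republican'], [1, 0, 1, 0])

# Features: columns 1 to 15 (the slice end is exclusive); target: the class column
X = dataset.iloc[:, 1:16].values
y = dataset.iloc[:, 0].values.astype('int64')

# Fill missing votes ('?') with the most frequent value of each column,
# then cast to a numeric dtype so the array can be fed to the network
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values='?', strategy='most_frequent')
X = imputer.fit_transform(X).astype('float32')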
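
To make the imputation step concrete, here is a minimal sketch on a made-up array. `X_toy` and its values are assumptions for illustration only; the array mixes integers with the `'?'` marker, as the real features do after the y/n mapping.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up votes after the y/n -> 1/0 mapping; '?' still marks a missing vote.
X_toy = np.array(
    [
        [1, 0, "?"],
        [1, "?", 0],
        [0, 0, 0],
        ["?", 0, 1],
    ],
    dtype=object,
)

# Each '?' is replaced by the most frequent non-missing value of its column.
imputer = SimpleImputer(missing_values="?", strategy="most_frequent")
X_filled = imputer.fit_transform(X_toy).astype("float32")
print(X_filled)
```

In this toy array the most frequent values per column are 1, 0 and 0, so those are the values written into the missing cells.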

The neural network is built with the TensorFlow and Keras libraries for training deep learning models.

Since this is a classification problem, the classifier is created as a Sequential model, to which layers with different activation functions can be added. The model has an input layer that takes the 15 features into 8 units, one hidden layer of 8 units, and a single-unit output layer at the end.
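
The shape of this network can be sketched in plain NumPy. This is only an illustration of the forward pass under assumed random weights, using the layer sizes from the notebook's code (15 inputs, two layers of 8 ReLU units, one sigmoid output); it is not the Keras implementation, and the helper names `relu`, `sigmoid` and `forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes taken from the notebook: 15 inputs -> 8 -> 8 -> 1.
sizes = [15, 8, 8, 1]

# Small uniform weights and zero biases (illustrative values only).
weights = [
    rng.uniform(-0.05, 0.05, size=(n_in, n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    a = x
    # ReLU on every layer except the last one.
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # Sigmoid output: one probability per sample.
    return sigmoid(a @ weights[-1] + biases[-1])

batch = rng.integers(0, 2, size=(4, 15)).astype(float)
probs = forward(batch)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(probs.shape, n_params)
```

The parameter count follows directly from the sizes: (15·8 + 8) + (8·8 + 8) + (8·1 + 1) = 209 trainable values.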

In [5]:
from sklearn.model_selection import train_test_split

# Hold out 25% of the data for testing, then 25% of the remainder for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.
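
The two chained `train_test_split` calls above leave roughly 56% of the rows for training. The sketch below checks the resulting sizes on placeholder arrays, assuming the balanced dataset has 267 + 267 = 534 rows; `X_demo` and `y_demo` are hypothetical stand-ins for the real features and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the assumed number of rows of the balanced dataset.
X_demo = np.zeros((534, 15))
y_demo = np.array([0] * 267 + [1] * 267)

# Same two-step split as in the notebook: 25% test, then 25% of the rest for validation.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X_tr, y_tr, test_size=0.25, random_state=0)

print(len(X_tr), len(X_va), len(X_te))  # 300 100 134
```

The test size of 134 agrees with the total of the confusion matrix reported at the end of the notebook.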


In [6]:
# Initialize the ANN
classifier = Sequential()

# Input layer: 15 features feeding 8 ReLU units
classifier.add(Dense(units=8, kernel_initializer='uniform', activation='relu', input_dim=15))

# Hidden layer: 8 ReLU units
classifier.add(Dense(units=8, kernel_initializer='uniform', activation='relu'))

# Output layer: a single sigmoid unit for binary classification
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

For optimization, Stochastic Gradient Descent (SGD), Adam, and SGD variants with momentum and Nesterov momentum can be used.
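
The difference between the SGD variants comes down to the update rule. The sketch below applies plain SGD, SGD with momentum, and the Nesterov variant to the toy loss f(w) = w², following the velocity formulation documented for the Keras SGD optimizer. The function `run_sgd` and the toy loss are assumptions made for illustration; Adam is not covered here.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = w**2.
    return 2.0 * w

def run_sgd(w0, lr=0.1, momentum=0.0, nesterov=False, steps=50):
    """Return the sequence of parameter values visited by the chosen SGD variant."""
    w, velocity = w0, 0.0
    path = [w]
    for _ in range(steps):
        g = grad(w)
        velocity = momentum * velocity - lr * g
        if nesterov:
            # Nesterov: look ahead along the updated velocity.
            w = w + momentum * velocity - lr * g
        else:
            w = w + velocity
        path.append(w)
    return np.array(path)

plain = run_sgd(5.0)
with_momentum = run_sgd(5.0, momentum=0.9)
nesterov_no_momentum = run_sgd(5.0, nesterov=True)
nesterov_momentum = run_sgd(5.0, momentum=0.9, nesterov=True)

print(plain[-1], with_momentum[-1], nesterov_momentum[-1])
```

With `momentum=0` the Nesterov update reduces to plain SGD, so enabling `nesterov` only changes the trajectory when a non-zero momentum is set as well.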

In [7]:
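# Compile the ANN; opt1 to opt3 are alternative optimizers that can be passed instead of opt
opt = 'adam'
opt1 = tf.keras.optimizers.SGD(learning_rate=0.1)
opt2 = tf.keras.optimizers.SGD(momentum=0.1)
# Note: nesterov=True only changes the update when momentum > 0
opt3 = tf.keras.optimizers.SGD(nesterov=True)

classifier.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])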

With the Adam optimizer and the relu activation function, the accuracy is 93.28%.

With the Adam optimizer and the tanh activation function, the accuracy is 91.78%.

With the SGD optimizer and the relu activation function, the accuracy is 94.77%.

The same accuracy is obtained for SGD with momentum=0.01, while SGD with nesterov=True gives 94.02%.

In [8]:
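# Train for 100 epochs with mini-batches of 10 samples
classifier.fit(X_train, y_train, batch_size=10, epochs=100)

# Predict probabilities on the test set and threshold them at 0.5
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)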

Epoch 1/100
Epoch 2/100
...

The classifier (the model) is trained with a batch size of 10 for 100 epochs, after which predictions are made on the test set. The sigmoid outputs are thresholded at 0.5 to obtain class labels.
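
The thresholding step can be illustrated on a few made-up probabilities. The values in `probs` are hypothetical; only the comparison mirrors the notebook, where 1 stands for democrat and 0 for republican.

```python
import numpy as np

# Made-up sigmoid outputs with shape (n_samples, 1), the shape a predict call returns here.
probs = np.array([[0.08], [0.93], [0.51], [0.50]])

# Strict comparison: a probability of exactly 0.5 maps to class 0.
labels = (probs > 0.5).astype(int).ravel()
print(labels)  # [0 1 1 0]
```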

In [9]:
y_pred

array([[False],
       [ True],
       [ True],
       [ True],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [ True],
       [False],
       [ True],
       [ True],
       [ True],
       [ True],
       [False],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [False],
       [False],
       [False],
       [False],
       [ True],
       [False],
       [False],
       [ True],
       [False],
       [False],
       [ True],
       [False],
       [ True],
       [False],
       [False],
       [ True],
       [ True],
       [ True],
       [False],
       [ True],
       [ True],
       [ True],
       [False],
       [False],
       [ True],
       [ True],
       [False],
       [False],
       [ True],
       [False],
       [ True],
       [ True],
       [ True],
       ...

The confusion matrix and the accuracy of the artificial neural network model are computed below.

In [10]:
from sklearn.metrics import accuracy_score, confusion_matrix

# Rows of the matrix are true classes, columns are predicted classes
mat = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)

print(mat, ac)

[[57  6]
 [ 3 68]] 0.9328358208955224
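
The reported accuracy can be reproduced from the confusion matrix by hand, and the same four counts also give precision and recall. The sketch below assumes the scikit-learn layout (rows are true classes, columns are predictions) and treats democrat (1) as the positive class.

```python
import numpy as np

# Confusion matrix reported above: 0 = republican, 1 = democrat.
mat = np.array([[57, 6],
                [3, 68]])

# For a 2x2 matrix in this layout, ravel() yields TN, FP, FN, TP.
tn, fp, fn, tp = mat.ravel()

accuracy = (tp + tn) / mat.sum()
precision = tp / (tp + fp)  # share of predicted democrats that are democrats
recall = tp / (tp + fn)     # share of actual democrats that were found

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
# accuracy=0.9328 precision=0.9189 recall=0.9577
```

Of the 134 test rows, 125 are classified correctly, which gives the 93.28% accuracy quoted for the Adam and relu configuration.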
