# Keras Exercise

## Predict political party based on votes

As a fun little example, we'll use a public data set of how US congressmen voted on 16 different issues in the year 1984. Let's see if we can figure out their political party based on their votes alone, using a deep neural network!

For those outside the United States, our two main political parties are "Democrat" and "Republican." In modern times they represent progressive and conservative ideologies, respectively.

Politics in 1984 weren't quite as polarized as they are today, but you should still be able to get over 90% accuracy without much trouble.

Since the point of this exercise is implementing neural networks in Keras, I'll help you to load and prepare the data.

Let's start by importing the raw CSV file using Pandas, and make a DataFrame out of it with nice column labels:

In [1]:
import pandas as pd

feature_names =  ['party','handicapped-infants', 'water-project-cost-sharing', 
                    'adoption-of-the-budget-resolution', 'physician-fee-freeze',
                    'el-salvador-aid', 'religious-groups-in-schools',
                    'anti-satellite-test-ban', 'aid-to-nicaraguan-contras',
                    'mx-missle', 'immigration', 'synfuels-corporation-cutback',
                    'education-spending', 'superfund-right-to-sue', 'crime',
                    'duty-free-exports', 'export-administration-act-south-africa']

voting_data = pd.read_csv('house-votes-84.data.txt', na_values=['?'], 
                          names = feature_names)
voting_data.head()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
0,republican,n,y,n,y,y,y,n,n,n,y,,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,
2,democrat,,y,y,,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,,y,y,y,y


We can use describe() to get a feel for how the data looks in aggregate:

In [2]:
voting_data.describe()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
count,435,423,387,424,424,420,424,421,420,413,428,414,404,410,418,407,331
unique,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
top,democrat,n,y,y,n,y,y,y,y,y,y,n,n,y,y,n,y
freq,267,236,195,253,247,212,272,239,242,207,216,264,233,209,248,233,269


We can see there's some missing data to deal with here: some politicians abstained on some votes, or just weren't present when the vote was taken. We will just drop the rows with missing data to keep it simple, but in practice you'd want to first make sure that doing so doesn't introduce any sort of bias into your analysis (if one party abstains more than the other, for example, that could be problematic).

In [3]:
voting_data.dropna(inplace=True)
voting_data.describe()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
count,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232
unique,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
top,democrat,n,n,y,n,y,y,y,y,n,y,n,n,y,y,n,y
freq,124,136,125,123,119,128,149,124,119,119,128,152,124,127,149,146,189
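As a rough check on the bias concern raised above, the party counts in the two describe() outputs can be compared directly. The sketch below uses only numbers read off those tables (435 rows with 267 Democrats before dropping, 232 rows with 124 Democrats after); the variable names are just for illustration.

```python
# Counts read off the 'party' column of the two describe() outputs.
total_before, democrats_before = 435, 267
total_after, democrats_after = 232, 124

# Everyone who is not a Democrat in this data set is a Republican.
republicans_before = total_before - democrats_before
republicans_after = total_after - democrats_after

# Fraction of each party's rows removed by dropna().
dem_dropped = 1 - democrats_after / democrats_before
rep_dropped = 1 - republicans_after / republicans_before

print(f"Democrat rows dropped:   {dem_dropped:.1%}")
print(f"Republican rows dropped: {rep_dropped:.1%}")
```

On these counts, a noticeably larger share of Democrat rows is dropped than Republican rows, which is exactly the kind of imbalance you'd want to be aware of before trusting the results too much.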


Our neural network needs normalized numbers, not strings, to work. So let's replace all the y's and n's with 1's and 0's, and represent the parties as 1's and 0's as well.

In [4]:
voting_data.replace(('y', 'n'), (1, 0), inplace=True)
voting_data.replace(('democrat', 'republican'), (1, 0), inplace=True)

In [27]:
voting_data.head()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
5,1,0,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1
8,0,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
19,1,1,1,1,0,0,0,1,1,1,0,1,0,0,0,1,1
23,1,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
25,1,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1


Finally, let's extract the features and labels in the form that Keras will expect:

In [25]:
all_features = voting_data[feature_names].drop('party', axis=1).values
all_classes = voting_data['party'].values

all_features.shape

(232, 16)

OK, so have a go at it! You'll want to refer back to the slide on using Keras with binary classification - there are only two parties, so this is a binary problem. This also saves us the hassle of representing classes with "one-hot" format like we had to do with MNIST; our output is just a single 0 or 1 value.
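To make the "single 0 or 1 output" idea concrete, here is a minimal sketch in plain Python of what a sigmoid output unit and the binary cross-entropy loss compute for one sample. The input value z and the 0.5 decision threshold are assumptions chosen for illustration; this is not how Keras implements it internally, just the underlying arithmetic.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p):
    """Loss for one sample: small when p is close to the true label."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

z = 2.0                    # hypothetical raw output of the last layer
p = sigmoid(z)             # interpreted as the probability of class 1
predicted = int(p >= 0.5)  # threshold to get a hard 0/1 prediction

loss_if_label_is_1 = binary_crossentropy(1, p)
loss_if_label_is_0 = binary_crossentropy(0, p)
print(p, predicted, loss_if_label_is_1, loss_if_label_is_0)
```

The loss is much larger when the confident prediction turns out to be wrong, which is what pushes the network toward the right answer during training.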

Also refer to the scikit_learn integration slide, and use cross_val_score to evaluate your resulting model with 10-fold cross-validation.
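If it helps to picture what 10-fold cross-validation does with our 232 rows, the sketch below works out the fold sizes for a plain k-fold split, where any leftover samples go to the first folds. The helper name kfold_test_sizes is made up for this illustration, and the exact splitter used for classifiers may be stratified and distribute samples slightly differently.

```python
def kfold_test_sizes(n_samples, n_splits):
    """Test-set size of each fold when samples are split as evenly as possible."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds each take one additional sample.
    return [base + 1 if i < extra else base for i in range(n_splits)]

test_sizes = kfold_test_sizes(232, 10)
train_sizes = [232 - t for t in test_sizes]

print(test_sizes)
print(train_sizes)
```

Each model is trained on roughly 90% of the data and scored on the remaining 10%, which is consistent with the 208 and 209 sample counts that show up in the cross-validation training log further down.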

**If you're using tensorflow-gpu on a Windows machine** by the way, you probably *do* want to peek a little bit at my solution - if you run into memory allocation errors, there's a workaround there you can use.

Try out your code here:

In [23]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

In [29]:
# separate to train/test
mask = np.random.rand(len(all_features)) < 0.8
X_train, X_test = all_features[mask], all_features[~mask]
y_train, y_test = all_classes[mask], all_classes[~mask]

# build model
model = Sequential()
model.add(Dense(64, input_dim=16, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(64, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["accuracy"])

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 64)                1088      
_________________________________________________________________
dropout_2 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_4 (Dense)              (None, 64)                4160      
_________________________________________________________________
dropout_3 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 65        
Total params: 5,313
Trainable params: 5,313
Non-trainable params: 0
_________________________________________________________________
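The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input per unit, plus one bias per unit, and Dropout layers add no parameters. A small sketch of that arithmetic (the dense_params helper is just for illustration):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for the three Dense layers in the model above.
layers = [(16, 64), (64, 64), (64, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
total = sum(counts)

print(counts, total)
```

These match the 1088, 4160, and 65 shown in the summary, and the 5,313 total.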


In [30]:
# fit
history = model.fit(X_train, y_train,
                    batch_size=20,
                    epochs=10,
                    verbose=2,
                    validation_data=(X_test, y_test))

Train on 187 samples, validate on 45 samples
Epoch 1/10
187/187 - 4s - loss: 0.6480 - accuracy: 0.5882 - val_loss: 0.5447 - val_accuracy: 0.7778
Epoch 2/10
187/187 - 0s - loss: 0.5829 - accuracy: 0.7273 - val_loss: 0.4652 - val_accuracy: 0.8889
Epoch 3/10
187/187 - 0s - loss: 0.5055 - accuracy: 0.7594 - val_loss: 0.4008 - val_accuracy: 0.9333
Epoch 4/10
187/187 - 0s - loss: 0.4808 - accuracy: 0.8235 - val_loss: 0.3496 - val_accuracy: 0.9333
Epoch 5/10
187/187 - 0s - loss: 0.4081 - accuracy: 0.8770 - val_loss: 0.3129 - val_accuracy: 0.9333
Epoch 6/10
187/187 - 0s - loss: 0.4062 - accuracy: 0.8610 - val_loss: 0.2771 - val_accuracy: 0.9333
Epoch 7/10
187/187 - 0s - loss: 0.3713 - accuracy: 0.8556 - val_loss: 0.2498 - val_accuracy: 0.9333
Epoch 8/10
187/187 - 0s - loss: 0.3347 - accuracy: 0.8717 - val_loss: 0.2251 - val_accuracy: 0.9333
Epoch 9/10
187/187 - 0s - loss: 0.3222 - accuracy: 0.8984 - val_loss: 0.2033 - val_accuracy: 0.9333
Epoch 10/10
187/187 - 0s - loss: 0.3459 - accuracy: 0.8

In [31]:
# evaluate
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.1924165040254593
Test accuracy: 0.93333334
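With only 45 test samples, accuracy can only move in steps of about 2.2 percentage points, so the repeated 0.9333 values are less mysterious than they look. Assuming the reported accuracy is simply the fraction of test samples classified correctly, the values in the log correspond to these counts:

```python
n_test = 45  # size of the validation/test split reported in the training log

# Numbers of correct predictions that reproduce the accuracies seen above.
for correct in (35, 40, 42):
    print(correct, "correct ->", round(correct / n_test, 4))

step = 1 / n_test  # change in accuracy from a single extra correct prediction
print("one sample is worth", round(step, 4))
```

In other words, 0.9333 means 42 of 45 right, and a single different prediction would shift the score by a couple of points. That is one reason the 10-fold cross-validation is a more trustworthy measure here.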


In [38]:
# using sklearn cross validation
from sklearn.model_selection import cross_val_score
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def create_model():
    model = Sequential()
    model.add(Dense(64, input_dim=16, activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation="relu"))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
    return model

estimator = KerasClassifier(build_fn=create_model, epochs=100, verbose=2)
cv_scores = cross_val_score(estimator, all_features, all_classes, cv=10)
print(cv_scores.mean())

Train on 208 samples
Epoch 1/100
208/208 - 1s - loss: 0.6851 - accuracy: 0.5721
Epoch 2/100
208/208 - 0s - loss: 0.5622 - accuracy: 0.7308
Epoch 3/100
208/208 - 0s - loss: 0.4604 - accuracy: 0.8365
Epoch 4/100
208/208 - 0s - loss: 0.4499 - accuracy: 0.8558
Epoch 5/100
208/208 - 0s - loss: 0.4082 - accuracy: 0.8606
Epoch 6/100
208/208 - 0s - loss: 0.3599 - accuracy: 0.8894
Epoch 7/100
208/208 - 0s - loss: 0.3322 - accuracy: 0.8558
Epoch 8/100
208/208 - 0s - loss: 0.3095 - accuracy: 0.8894
Epoch 9/100
208/208 - 0s - loss: 0.3149 - accuracy: 0.8702
Epoch 10/100
208/208 - 0s - loss: 0.2769 - accuracy: 0.8990
Epoch 11/100
208/208 - 0s - loss: 0.2640 - accuracy: 0.9327
Epoch 12/100
208/208 - 0s - loss: 0.2206 - accuracy: 0.9423
Epoch 13/100
208/208 - 0s - loss: 0.2697 - accuracy: 0.9038
Epoch 14/100
208/208 - 0s - loss: 0.2501 - accuracy: 0.8990
Epoch 15/100
208/208 - 0s - loss: 0.2036 - accuracy: 0.9231
Epoch 16/100
208/208 - 0s - loss: 0.2432 - accuracy: 0.9135
Epoch 17/100
208/208 - 0s - 

Epoch 37/100
208/208 - 0s - loss: 0.1169 - accuracy: 0.9519
Epoch 38/100
208/208 - 0s - loss: 0.1042 - accuracy: 0.9663
Epoch 39/100
208/208 - 0s - loss: 0.0857 - accuracy: 0.9760
Epoch 40/100
208/208 - 0s - loss: 0.1090 - accuracy: 0.9423
Epoch 41/100
208/208 - 0s - loss: 0.0982 - accuracy: 0.9615
Epoch 42/100
208/208 - 0s - loss: 0.0842 - accuracy: 0.9663
Epoch 43/100
208/208 - 0s - loss: 0.0859 - accuracy: 0.9808
Epoch 44/100
208/208 - 0s - loss: 0.0841 - accuracy: 0.9760
Epoch 45/100
208/208 - 0s - loss: 0.1128 - accuracy: 0.9567
Epoch 46/100
208/208 - 0s - loss: 0.0898 - accuracy: 0.9663
Epoch 47/100
208/208 - 0s - loss: 0.0684 - accuracy: 0.9712
Epoch 48/100
208/208 - 0s - loss: 0.0910 - accuracy: 0.9519
Epoch 49/100
208/208 - 0s - loss: 0.0971 - accuracy: 0.9663
Epoch 50/100
208/208 - 0s - loss: 0.0794 - accuracy: 0.9760
Epoch 51/100
208/208 - 0s - loss: 0.0774 - accuracy: 0.9808
Epoch 52/100
208/208 - 0s - loss: 0.0982 - accuracy: 0.9663
Epoch 53/100
208/208 - 0s - loss: 0.0836

Epoch 73/100
209/209 - 0s - loss: 0.0577 - accuracy: 0.9904
Epoch 74/100
209/209 - 0s - loss: 0.0698 - accuracy: 0.9713
Epoch 75/100
209/209 - 0s - loss: 0.0646 - accuracy: 0.9713
Epoch 76/100
209/209 - 0s - loss: 0.0643 - accuracy: 0.9856
Epoch 77/100
209/209 - 0s - loss: 0.0802 - accuracy: 0.9522
Epoch 78/100
209/209 - 0s - loss: 0.0349 - accuracy: 0.9904
Epoch 79/100
209/209 - 0s - loss: 0.0574 - accuracy: 0.9809
Epoch 80/100
209/209 - 0s - loss: 0.0887 - accuracy: 0.9713
Epoch 81/100
209/209 - 0s - loss: 0.0437 - accuracy: 0.9809
Epoch 82/100
209/209 - 0s - loss: 0.0440 - accuracy: 0.9856
Epoch 83/100
209/209 - 0s - loss: 0.0699 - accuracy: 0.9809
Epoch 84/100
209/209 - 0s - loss: 0.0502 - accuracy: 0.9809
Epoch 85/100
209/209 - 0s - loss: 0.0649 - accuracy: 0.9809
Epoch 86/100
209/209 - 0s - loss: 0.0459 - accuracy: 0.9856
Epoch 87/100
209/209 - 0s - loss: 0.0713 - accuracy: 0.9809
Epoch 88/100
209/209 - 0s - loss: 0.0477 - accuracy: 0.9904
Epoch 89/100
209/209 - 0s - loss: 0.0567

Epoch 8/100
209/209 - 0s - loss: 0.3841 - accuracy: 0.8325
Epoch 9/100
209/209 - 0s - loss: 0.3405 - accuracy: 0.8660
Epoch 10/100
209/209 - 0s - loss: 0.3138 - accuracy: 0.8900
Epoch 11/100
209/209 - 0s - loss: 0.3049 - accuracy: 0.8900
Epoch 12/100
209/209 - 0s - loss: 0.3009 - accuracy: 0.8660
Epoch 13/100
209/209 - 0s - loss: 0.3009 - accuracy: 0.8852
Epoch 14/100
209/209 - 0s - loss: 0.2779 - accuracy: 0.9043
Epoch 15/100
209/209 - 0s - loss: 0.2561 - accuracy: 0.9091
Epoch 16/100
209/209 - 0s - loss: 0.2417 - accuracy: 0.9139
Epoch 17/100
209/209 - 0s - loss: 0.2564 - accuracy: 0.8995
Epoch 18/100
209/209 - 0s - loss: 0.2202 - accuracy: 0.9234
Epoch 19/100
209/209 - 0s - loss: 0.1991 - accuracy: 0.9282
Epoch 20/100
209/209 - 0s - loss: 0.1932 - accuracy: 0.9426
Epoch 21/100
209/209 - 0s - loss: 0.2100 - accuracy: 0.9234
Epoch 22/100
209/209 - 0s - loss: 0.1907 - accuracy: 0.9282
Epoch 23/100
209/209 - 0s - loss: 0.1977 - accuracy: 0.9187
Epoch 24/100
209/209 - 0s - loss: 0.2000 -

Epoch 44/100
209/209 - 0s - loss: 0.0915 - accuracy: 0.9809
Epoch 45/100
209/209 - 0s - loss: 0.1258 - accuracy: 0.9569
Epoch 46/100
209/209 - 0s - loss: 0.1031 - accuracy: 0.9713
Epoch 47/100
209/209 - 0s - loss: 0.0835 - accuracy: 0.9761
Epoch 48/100
209/209 - 0s - loss: 0.0886 - accuracy: 0.9713
Epoch 49/100
209/209 - 0s - loss: 0.1149 - accuracy: 0.9665
Epoch 50/100
209/209 - 0s - loss: 0.0996 - accuracy: 0.9665
Epoch 51/100
209/209 - 0s - loss: 0.0930 - accuracy: 0.9617
Epoch 52/100
209/209 - 0s - loss: 0.0986 - accuracy: 0.9569
Epoch 53/100
209/209 - 0s - loss: 0.0935 - accuracy: 0.9617
Epoch 54/100
209/209 - 0s - loss: 0.0889 - accuracy: 0.9713
Epoch 55/100
209/209 - 0s - loss: 0.0634 - accuracy: 0.9856
Epoch 56/100
209/209 - 0s - loss: 0.0744 - accuracy: 0.9761
Epoch 57/100
209/209 - 0s - loss: 0.0732 - accuracy: 0.9713
Epoch 58/100
209/209 - 0s - loss: 0.1200 - accuracy: 0.9569
Epoch 59/100
209/209 - 0s - loss: 0.0642 - accuracy: 0.9713
Epoch 60/100
209/209 - 0s - loss: 0.0638

Epoch 80/100
209/209 - 0s - loss: 0.0831 - accuracy: 0.9665
Epoch 81/100
209/209 - 0s - loss: 0.0725 - accuracy: 0.9617
Epoch 82/100
209/209 - 0s - loss: 0.0769 - accuracy: 0.9713
Epoch 83/100
209/209 - 0s - loss: 0.0651 - accuracy: 0.9713
Epoch 84/100
209/209 - 0s - loss: 0.0658 - accuracy: 0.9809
Epoch 85/100
209/209 - 0s - loss: 0.0374 - accuracy: 0.9856
Epoch 86/100
209/209 - 0s - loss: 0.0541 - accuracy: 0.9761
Epoch 87/100
209/209 - 0s - loss: 0.0374 - accuracy: 0.9856
Epoch 88/100
209/209 - 0s - loss: 0.0543 - accuracy: 0.9761
Epoch 89/100
209/209 - 0s - loss: 0.0654 - accuracy: 0.9761
Epoch 90/100
209/209 - 0s - loss: 0.0489 - accuracy: 0.9809
Epoch 91/100
209/209 - 0s - loss: 0.0583 - accuracy: 0.9665
Epoch 92/100
209/209 - 0s - loss: 0.0598 - accuracy: 0.9809
Epoch 93/100
209/209 - 0s - loss: 0.0475 - accuracy: 0.9809
Epoch 94/100
209/209 - 0s - loss: 0.0606 - accuracy: 0.9713
Epoch 95/100
209/209 - 0s - loss: 0.0593 - accuracy: 0.9713
Epoch 96/100
209/209 - 0s - loss: 0.0708

Epoch 15/100
209/209 - 0s - loss: 0.2096 - accuracy: 0.9282
Epoch 16/100
209/209 - 0s - loss: 0.2181 - accuracy: 0.9234
Epoch 17/100
209/209 - 0s - loss: 0.1852 - accuracy: 0.9378
Epoch 18/100
209/209 - 0s - loss: 0.1679 - accuracy: 0.9234
Epoch 19/100
209/209 - 0s - loss: 0.1953 - accuracy: 0.9330
Epoch 20/100
209/209 - 0s - loss: 0.1632 - accuracy: 0.9378
Epoch 21/100
209/209 - 0s - loss: 0.1600 - accuracy: 0.9330
Epoch 22/100
209/209 - 0s - loss: 0.1667 - accuracy: 0.9330
Epoch 23/100
209/209 - 0s - loss: 0.1542 - accuracy: 0.9330
Epoch 24/100
209/209 - 0s - loss: 0.1699 - accuracy: 0.9378
Epoch 25/100
209/209 - 0s - loss: 0.1457 - accuracy: 0.9474
Epoch 26/100
209/209 - 0s - loss: 0.1584 - accuracy: 0.9522
Epoch 27/100
209/209 - 0s - loss: 0.1331 - accuracy: 0.9426
Epoch 28/100
209/209 - 0s - loss: 0.1568 - accuracy: 0.9378
Epoch 29/100
209/209 - 0s - loss: 0.1344 - accuracy: 0.9474
Epoch 30/100
209/209 - 0s - loss: 0.1332 - accuracy: 0.9522
Epoch 31/100
209/209 - 0s - loss: 0.1244

Epoch 51/100
209/209 - 0s - loss: 0.0877 - accuracy: 0.9761
Epoch 52/100
209/209 - 0s - loss: 0.0720 - accuracy: 0.9665
Epoch 53/100
209/209 - 0s - loss: 0.0720 - accuracy: 0.9713
Epoch 54/100
209/209 - 0s - loss: 0.1087 - accuracy: 0.9569
Epoch 55/100
209/209 - 0s - loss: 0.0784 - accuracy: 0.9713
Epoch 56/100
209/209 - 0s - loss: 0.0856 - accuracy: 0.9665
Epoch 57/100
209/209 - 0s - loss: 0.0806 - accuracy: 0.9617
Epoch 58/100
209/209 - 0s - loss: 0.0878 - accuracy: 0.9617
Epoch 59/100
209/209 - 0s - loss: 0.1066 - accuracy: 0.9713
Epoch 60/100
209/209 - 0s - loss: 0.0901 - accuracy: 0.9617
Epoch 61/100
209/209 - 0s - loss: 0.0814 - accuracy: 0.9713
Epoch 62/100
209/209 - 0s - loss: 0.1021 - accuracy: 0.9569
Epoch 63/100
209/209 - 0s - loss: 0.0871 - accuracy: 0.9665
Epoch 64/100
209/209 - 0s - loss: 0.0722 - accuracy: 0.9713
Epoch 65/100
209/209 - 0s - loss: 0.0886 - accuracy: 0.9665
Epoch 66/100
209/209 - 0s - loss: 0.0432 - accuracy: 0.9952
Epoch 67/100
209/209 - 0s - loss: 0.0661

## My implementation is below

# No peeking!

![title](peek.jpg)

In [37]:
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from sklearn.model_selection import cross_val_score

def create_model():
    model = Sequential()
    # 16 feature inputs (votes) going into a 32-unit layer
    model.add(Dense(32, input_dim=16, kernel_initializer='normal', activation='relu'))
    # Another hidden layer of 16 units
    model.add(Dense(16, kernel_initializer='normal', activation='relu'))
    # Output layer with a binary classification (Democrat or Republican political party)
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Wrap our Keras model in an estimator compatible with scikit_learn
estimator = KerasClassifier(build_fn=create_model, epochs=100, verbose=0)
# Now we can use scikit_learn's cross_val_score to evaluate this model identically to the others
cv_scores = cross_val_score(estimator, all_features, all_classes, cv=10)
cv_scores.mean()

0.9438405811786652

94% without even trying too hard! Did you do better? Maybe more neurons, more layers, or Dropout layers would help even more.