# Predicting Political Party Based on Votes

The goal of this project is to predict the political party of US congressmen from their votes on sixteen different issues in 1984, using a deep neural network.

## Data Source
We are going to use a public dataset that records how US congressmen voted on sixteen different issues in 1984.

## Exploring Data

Let's start by reading the dataset into a DataFrame.

In [1]:
# Importing libraries
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

# Adding column names to the dataset
col_names =  ['party','handicapped-infants', 'water-project-cost-sharing', 
                    'adoption-of-the-budget-resolution', 'physician-fee-freeze',
                    'el-salvador-aid', 'religious-groups-in-schools',
                    'anti-satellite-test-ban', 'aid-to-nicaraguan-contras',
                    'mx-missle', 'immigration', 'synfuels-corporation-cutback',
                    'education-spending', 'superfund-right-to-sue', 'crime',
                    'duty-free-exports', 'export-administration-act-south-africa']

voting_df = pd.read_csv('house-votes-84.data.txt', na_values=["?"], names=col_names)
voting_df.head()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
0,republican,n,y,n,y,y,y,n,n,n,y,,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,
2,democrat,,y,y,,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,,y,y,y,y


## Data Cleaning

We can use the `describe` function on the DataFrame to get more information about the data.

In [2]:
voting_df.describe()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
count,435,423,387,424,424,420,424,421,420,413,428,414,404,410,418,407,331
unique,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
top,democrat,n,y,y,n,y,y,y,y,y,y,n,n,y,y,n,y
freq,267,236,195,253,247,212,272,239,242,207,216,264,233,209,248,233,269


The counts show that some data is missing. For example, all 435 members are associated with a party, but only 387 have a recorded vote on the water project cost sharing issue. This means that some congressmen abstained from voting or were not present when the vote took place. Let's drop the rows with missing values to keep the data simple and clean.

In [3]:
voting_df.dropna(inplace=True)
voting_df.describe()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
count,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232
unique,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
top,democrat,n,n,y,n,y,y,y,y,n,y,n,n,y,y,n,y
freq,124,136,125,123,119,128,149,124,119,119,128,152,124,127,149,146,189
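Before dropping rows, it can help to see how the missing votes are spread across columns and how many rows a blanket `dropna` would discard. The sketch below uses a small made-up frame (`toy_df` and its values are illustrative, not the real dataset) with the same `y`/`n`/missing encoding:

```python
import pandas as pd

# Toy stand-in for the voting data; None marks a missing vote.
toy_df = pd.DataFrame({
    "party": ["republican", "democrat", "democrat", "republican"],
    "immigration": ["y", None, "n", "y"],
    "crime": ["y", "y", None, "n"],
})

# Number of missing votes in each column.
missing_per_column = toy_df.isna().sum()
print(missing_per_column)

# dropna() keeps only the rows that have no missing value at all.
complete_rows = toy_df.dropna()
print(len(toy_df), "rows before,", len(complete_rows), "rows after dropna")
```

On the real data, the two `describe` outputs above show the cost of this choice: the row count falls from 435 to 232, so nearly half of the members are discarded.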


Our neural network expects numbers, not strings. So let's replace every 'y' and 'n' with 1 and 0, and replace the party names democrat and republican with 1 and 0, respectively.

In [4]:
# Encode votes: 'y' -> 1, 'n' -> 0
voting_df = voting_df.replace(['y', 'n'], [1, 0])
# Encode the label: democrat -> 1, republican -> 0
voting_df = voting_df.replace(['democrat', 'republican'], [1, 0])

In [5]:
voting_df.head()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
5,1,0,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1
8,0,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
19,1,1,1,1,0,0,0,1,1,1,0,1,0,0,0,1,1
23,1,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
25,1,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1
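The same encoding can also be written with explicit dictionaries, which makes the mapping easy to audit. This is only a minimal sketch on made-up rows; `toy_df`, `vote_map`, and `party_map` are illustrative names that do not appear in the notebook:

```python
import pandas as pd

# Made-up rows that use the same string encoding as the voting data.
toy_df = pd.DataFrame({
    "party": ["democrat", "republican", "democrat"],
    "immigration": ["y", "n", "y"],
    "crime": ["n", "n", "y"],
})

vote_map = {"y": 1, "n": 0}
party_map = {"democrat": 1, "republican": 0}

encoded = toy_df.copy()
encoded["party"] = encoded["party"].map(party_map)
vote_cols = [c for c in encoded.columns if c != "party"]
for col in vote_cols:
    encoded[col] = encoded[col].map(vote_map)

# map() turns any value missing from the dictionary into NaN,
# so this check catches stray or misspelled labels.
assert not encoded.isna().any().any()
print(encoded)
```

Mapping each column through its own dictionary avoids accidentally replacing a value in the wrong column, at the cost of a few more lines.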


Let's extract the features and labels to feed to our neural network.


In [6]:
features =  voting_df[col_names].drop("party",axis=1).values # To extract feature columns
labels = voting_df["party"].values  # To extract the label array
#print(features)
#print(labels)
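A quick shape check is a cheap way to confirm that the split produced one label per row and kept the label out of the feature matrix. The sketch below uses a made-up, already-encoded frame (`toy_df` is illustrative, not the real data):

```python
import pandas as pd

# Made-up, already-encoded rows: one label column and two vote columns.
toy_df = pd.DataFrame({
    "party": [1, 0, 1, 1],
    "immigration": [1, 0, 1, 0],
    "crime": [0, 1, 1, 0],
})

toy_features = toy_df.drop("party", axis=1).values  # 2-D array: rows x vote columns
toy_labels = toy_df["party"].values                 # 1-D array: one label per row

print(toy_features.shape)  # (4, 2)
print(toy_labels.shape)    # (4,)
```

For the cleaned voting data, the corresponding shapes should be (232, 16) for `features` and (232,) for `labels`.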

## Model Training and Evaluation

This is a binary classification problem, as there are only two parties: democrat and republican.

In [7]:
# Importing Keras Libraries

from tensorflow.keras.layers import Dense,Dropout
from tensorflow.keras.models import Sequential
from sklearn.model_selection import cross_val_score

def model_creation():
    model = Sequential()
    # First hidden layer: 32 units fed by the 16 vote features
    model.add(Dense(32, input_dim=16, kernel_initializer="normal", activation="relu"))
    # Second hidden layer: 16 units
    model.add(Dense(16, kernel_initializer="normal", activation="relu"))
    # Output layer: one sigmoid unit giving the probability of class 1 (democrat)
    model.add(Dense(1, kernel_initializer="normal", activation="sigmoid"))
    # Compile the model
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
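
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. Each fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this arithmetic sketch:

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

layer_sizes = [16, 32, 16, 1]  # inputs, hidden layer 1, hidden layer 2, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [544, 528, 17]
print(total)      # 1089
```

That is 1,089 trainable parameters for 232 rows of data, which is worth keeping in mind when judging the risk of overfitting.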


In [8]:
# Let's create an estimator object that is compatible with scikit-learn and then perform cross-validation
# Note: this wrapper was deprecated in later TensorFlow releases in favor of the SciKeras package
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
estimator = KerasClassifier(build_fn=model_creation, epochs=100, verbose=0)
cross_scores = cross_val_score(estimator, features, labels, cv=25)
# Compute the mean of the cross-validation scores
print(cross_scores.mean())

0.9520000028610229
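
The cross-validation call above uses `cv=25` on 232 rows, so each held-out fold is small. The sketch below works out the fold sizes under the plain K-fold rule; `kfold_sizes` is a hypothetical helper, and scikit-learn may stratify the folds for classifiers, so the exact sizes in the real run can differ slightly:

```python
def kfold_sizes(n_samples, n_splits):
    """Test-fold sizes under the plain K-fold rule: the first
    n_samples % n_splits folds get one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

sizes = kfold_sizes(232, 25)
print(sizes.count(10), "folds of 10 and", sizes.count(9), "folds of 9")

# One misclassified row in a 9-row fold lowers that fold's accuracy by 1/9.
print(round(1 / 9, 3))  # 0.111
```

With folds this small, individual fold scores move in steps of roughly ten percentage points, so the mean over all 25 folds is the number to read, not any single fold.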


## Conclusion

We achieved a mean cross-validation accuracy of approximately 95%, which is pretty good, but we can still do better. The next step would be to tune the hyperparameters to see if we can get better accuracy.
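
For context, the accuracy can be compared with a majority-class baseline, that is, always predicting the most common party. The counts below are taken from the `describe` output after dropping rows with missing values; the variable names are illustrative:

```python
n_rows = 232       # rows left after dropna (from the describe() output)
n_democrat = 124   # frequency of the most common party in those rows

# Accuracy of a model that always predicts the majority class.
baseline_accuracy = n_democrat / n_rows
print(round(baseline_accuracy, 3))  # 0.534
```

Always guessing democrat would be right about 53% of the time on this subset, so the network's roughly 95% is a substantial improvement over that baseline.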