# Keras Exercise

## Predict political party based on votes

As a fun little example, we'll use a public data set of how US congressmen voted on 16 different issues in the year 1984. Let's see if we can figure out their political party based on their votes alone, using a deep neural network!

For those outside the United States, our two main political parties are "Democrat" and "Republican." In modern times they represent progressive and conservative ideologies, respectively.

Politics in 1984 weren't quite as polarized as they are today, but you should still be able to get over 90% accuracy without much trouble.

Since the point of this exercise is implementing neural networks in Keras, I'll help you to load and prepare the data.

Let's start by importing the raw CSV file using Pandas, and make a DataFrame out of it with nice column labels:

In [1]:
import pandas as pd

feature_names =  ['party','handicapped-infants', 'water-project-cost-sharing', 
                    'adoption-of-the-budget-resolution', 'physician-fee-freeze',
                    'el-salvador-aid', 'religious-groups-in-schools',
                    'anti-satellite-test-ban', 'aid-to-nicaraguan-contras',
                    'mx-missle', 'immigration', 'synfuels-corporation-cutback',
                    'education-spending', 'superfund-right-to-sue', 'crime',
                    'duty-free-exports', 'export-administration-act-south-africa']

voting_data = pd.read_csv('house-votes-84.data.txt', na_values=['?'], 
                          names = feature_names)
voting_data.head()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
0,republican,n,y,n,y,y,y,n,n,n,y,,y,y,y,n,y
1,republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,
2,democrat,,y,y,,y,y,n,n,n,n,y,n,y,y,n,n
3,democrat,n,y,y,n,,y,n,n,n,n,y,n,y,n,n,y
4,democrat,y,y,y,n,y,y,n,n,n,n,y,,y,y,y,y


We can use describe() to get a feel for how the data looks in aggregate:

In [2]:
voting_data.describe()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
count,435,423,387,424,424,420,424,421,420,413,428,414,404,410,418,407,331
unique,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
top,democrat,n,y,y,n,y,y,y,y,y,y,n,n,y,y,n,y
freq,267,236,195,253,247,212,272,239,242,207,216,264,233,209,248,233,269


We can see there's some missing data to deal with here: some politicians abstained on some votes, or just weren't present when the vote was taken. To keep it simple, we will just drop the rows with missing data, which cuts the data set from 435 rows to 232. In practice you'd first want to make sure that doing so doesn't introduce bias into your analysis; if one party abstains more often than the other, for example, dropping those rows would skew the sample.
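Before dropping rows, one quick way to check for that kind of bias is to compare how often each party has at least one missing vote. The sketch below runs on a tiny made-up DataFrame (not the real voting data) that follows the same `party` column convention; `toy`, `has_missing` and `drop_rate` are illustrative names.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for voting_data (values are illustrative only)
toy = pd.DataFrame({
    'party': ['democrat', 'democrat', 'republican', 'republican'],
    'vote-a': ['y', np.nan, 'n', np.nan],
    'vote-b': ['n', 'y', np.nan, 'y'],
})

# Flag the rows that dropna() would remove (any missing value in the row)
has_missing = toy.isna().any(axis=1)

# Share of rows per party that would be dropped
drop_rate = has_missing.groupby(toy['party']).mean()
print(drop_rate)
```

Applied to `voting_data` before the `dropna` call, the same two lines would show whether one party loses a noticeably larger share of its rows.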

In [3]:
voting_data.dropna(inplace=True)
voting_data.describe()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
count,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232,232
unique,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
top,democrat,n,n,y,n,y,y,y,y,n,y,n,n,y,y,n,y
freq,124,136,125,123,119,128,149,124,119,119,128,152,124,127,149,146,189


Our neural network needs normalized numbers, not strings, to work. So let's replace all the y's and n's with 1's and 0's, and represent the parties as 1's and 0's as well.

In [4]:
voting_data.replace(('y', 'n'), (1, 0), inplace=True)
voting_data.replace(('democrat', 'republican'), (1, 0), inplace=True)

In [5]:
voting_data.head()

Unnamed: 0,party,handicapped-infants,water-project-cost-sharing,adoption-of-the-budget-resolution,physician-fee-freeze,el-salvador-aid,religious-groups-in-schools,anti-satellite-test-ban,aid-to-nicaraguan-contras,mx-missle,immigration,synfuels-corporation-cutback,education-spending,superfund-right-to-sue,crime,duty-free-exports,export-administration-act-south-africa
5,1,0,1,1,0,1,1,0,0,0,0,0,0,1,1,1,1
8,0,0,1,0,1,1,1,0,0,0,0,0,1,1,1,0,1
19,1,1,1,1,0,0,0,1,1,1,0,1,0,0,0,1,1
23,1,1,1,1,0,0,0,1,1,1,0,0,0,0,0,1,1
25,1,1,0,1,0,0,0,1,1,1,1,0,0,0,0,1,1
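An alternative to calling `replace` on the whole frame is to map each column explicitly, which keeps the vote encoding and the party encoding separate and makes the resulting dtype explicit. This is only a sketch on a made-up two-row frame; `toy`, `vote_map` and `party_map` are illustrative names, and it assumes rows with missing values were already dropped.

```python
import pandas as pd

# Made-up stand-in for voting_data after dropna()
toy = pd.DataFrame({
    'party': ['democrat', 'republican'],
    'vote-a': ['y', 'n'],
    'vote-b': ['n', 'n'],
})

vote_map = {'y': 1, 'n': 0}
party_map = {'democrat': 1, 'republican': 0}

encoded = toy.copy()
vote_cols = [c for c in encoded.columns if c != 'party']
for col in vote_cols:
    # Series.map yields NaN for any value that is not a key in the dict
    encoded[col] = encoded[col].map(vote_map)
encoded['party'] = encoded['party'].map(party_map)

# Cast to a compact integer dtype; this raises if any NaN slipped through
encoded = encoded.astype('int8')
print(encoded)
```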


Finally, let's extract the features and labels in the form that Keras will expect:

In [6]:
all_features = voting_data[feature_names].drop('party', axis=1).values
all_classes = voting_data['party'].values

OK, so have a go at it! You'll want to refer back to the slide on using Keras with binary classification - there are only two parties, so this is a binary problem. This also saves us the hassle of representing classes with "one-hot" format like we had to do with MNIST; our output is just a single 0 or 1 value.
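To make the single-output idea concrete, here is a small NumPy sketch (independent of Keras) of how one sigmoid output per sample becomes a 0/1 prediction and how binary cross-entropy scores it. The raw scores in `logits` and the `labels` are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    # Squash a raw score into a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# Made-up raw outputs of a final single-unit layer, and made-up true labels
logits = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1, 0, 0, 0])

probs = sigmoid(logits)              # probability of class 1
preds = (probs >= 0.5).astype(int)   # threshold at 0.5 to get a 0/1 class

# Binary cross-entropy averaged over samples (eps guards against log(0))
eps = 1e-12
bce = -np.mean(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
accuracy = np.mean(preds == labels)
print(preds, accuracy, bce)
```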

Also refer to the scikit_learn integration slide, and use cross_val_score to evaluate your resulting model with 10-fold cross-validation.

Try out your code here:

## My implementation is below

# No peeking!

![title](peek.jpg)

In [15]:
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from sklearn.model_selection import cross_val_score

def create_model():
    model = Sequential()
    # 16 feature inputs (votes) going into a 32-unit layer
    model.add(Dense(32, input_dim=16, kernel_initializer='normal', activation='relu'))
    # Another hidden layer of 16 units
    model.add(Dense(16, kernel_initializer='normal', activation='relu'))
    # Output layer with a binary classification (Democrat or Republican political party)
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Wrap our Keras model in an estimator compatible with scikit_learn
estimator = KerasClassifier(build_fn=create_model, epochs=100, verbose=0)
# Now we can use scikit_learn's cross_val_score to evaluate this model identically to the others
cv_scores = cross_val_score(estimator, all_features, all_classes, cv=10)
cv_scores.mean()

0.9349637687206268

About 93% without even trying too hard! Did you do better? Maybe more neurons, more layers, or Dropout layers would help even more.
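As one possible variation to try, the sketch below adds `Dropout` layers between the dense layers and evaluates the model by running the folds by hand with scikit-learn's `StratifiedKFold`. The manual loop is used because the `tensorflow.keras.wrappers.scikit_learn` module is not available in newer TensorFlow releases, so the wrapper import above may fail depending on the installed version. This is not the original solution: `create_model_with_dropout` and `manual_cv_scores` are illustrative names, the dropout rate of 0.2 is an arbitrary starting point, and the random data only stands in for `all_features` and `all_classes`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential


def create_model_with_dropout(n_features=16, rate=0.2):
    # Same layer sizes as the model above, with Dropout after each hidden layer
    model = Sequential([
        Input(shape=(n_features,)),
        Dense(32, activation='relu'),
        Dropout(rate),
        Dense(16, activation='relu'),
        Dropout(rate),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


def manual_cv_scores(features, classes, n_splits=10, epochs=100):
    # Train a fresh model on each fold and collect the held-out accuracy
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in folds.split(features, classes):
        model = create_model_with_dropout(n_features=features.shape[1])
        model.fit(features[train_idx], classes[train_idx], epochs=epochs, verbose=0)
        _, accuracy = model.evaluate(features[test_idx], classes[test_idx], verbose=0)
        scores.append(accuracy)
    return np.array(scores)


# Random stand-in data: 60 samples with 16 binary features (not the voting data)
rng = np.random.default_rng(0)
demo_features = rng.integers(0, 2, size=(60, 16)).astype('float32')
demo_classes = demo_features[:, 0].astype('int32')  # label copied from feature 0 for the demo
demo_scores = manual_cv_scores(demo_features, demo_classes, n_splits=3, epochs=5)
print(demo_scores.mean())
```

With the real data, the call would be something like `manual_cv_scores(all_features, all_classes)`. Whether dropout actually improves on the score above would need to be checked on the voting data itself.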

In [1]:
import pandas as pd 
import numpy as np
from xlwings import view

In [2]:
dataframe = pd.read_csv('iris.csv')
dataframe.head(2)
# view(dataframe)
dataframe.columns

Index(['150', '4', 'setosa', 'versicolor', 'virginica'], dtype='object')

In [3]:
dataframe.head(2)

Unnamed: 0,150,4,setosa,versicolor,virginica
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
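The column labels `150`, `4`, `setosa`, `versicolor`, `virginica` suggest that the first line of `iris.csv` is a metadata row (sample count, feature count, class names) rather than a real header, as in the copy of the iris data bundled with scikit-learn. Assuming that layout, a sketch of loading the file with meaningful labels follows. The column names and the inline sample text are assumptions for illustration, not taken from the file itself.

```python
import io
import pandas as pd

# Inline stand-in for iris.csv: one metadata line followed by data rows
sample_csv = io.StringIO(
    "150,4,setosa,versicolor,virginica\n"
    "5.1,3.5,1.4,0.2,0\n"
    "4.9,3.0,1.4,0.2,0\n"
)

# Assumed column meanings (the usual iris feature order plus a class index)
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'target']

# Skip the metadata line and supply the labels explicitly
iris = pd.read_csv(sample_csv, skiprows=1, header=None, names=column_names)
print(iris.head(2))
```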


In [24]:
test = [dataframe[col] for col in dataframe.columns if col in ['setosa', 'versicolor']]

In [25]:
type(test)

list

In [26]:
test

[0      1.4
 1      1.4
 2      1.3
 3      1.5
 4      1.4
       ... 
 145    5.2
 146    5.0
 147    5.2
 148    5.4
 149    5.1
 Name: setosa, Length: 150, dtype: float64,
 0      0.2
 1      0.2
 2      0.2
 3      0.2
 4      0.2
       ... 
 145    2.3
 146    1.9
 147    2.0
 148    2.3
 149    1.8
 Name: versicolor, Length: 150, dtype: float64]

In [21]:
df = pd.DataFrame(test, columns=['col1', 'col2'])

In [23]:
df.shape

(2, 2)
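Passing a list of Series to `pd.DataFrame` treats each Series as a row, which is why the result has 2 rows rather than 150. If the goal was a frame with the two selected columns side by side, either of the approaches sketched below should do it. The small `dataframe` here is a made-up stand-in that reuses the same column labels.

```python
import pandas as pd

# Made-up stand-in with the same column labels as the frame above
dataframe = pd.DataFrame({
    '150': [5.1, 4.9, 4.7],
    '4': [3.5, 3.0, 3.2],
    'setosa': [1.4, 1.4, 1.3],
    'versicolor': [0.2, 0.2, 0.2],
    'virginica': [0, 0, 0],
})

# Option 1: select the columns directly with a list of labels
subset = dataframe[['setosa', 'versicolor']]

# Option 2: concatenate a list of Series column-wise
test = [dataframe[col] for col in dataframe.columns if col in ['setosa', 'versicolor']]
combined = pd.concat(test, axis=1)

# Rename afterwards if different labels are wanted
combined.columns = ['col1', 'col2']
print(subset.shape, combined.shape)
```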

In [27]:
df = dataframe

In [28]:
filter_value = 1.4
for col in df.columns:
    if col != '150':
        filtered_df = df.loc[df[col] == filter_value]
        if not filtered_df.empty:
            print(f"{col} contains {filter_value}:")
            print(filtered_df)

setosa contains 1.4:
    150    4  setosa  versicolor  virginica
0   5.1  3.5     1.4         0.2          0
1   4.9  3.0     1.4         0.2          0
4   5.0  3.6     1.4         0.2          0
6   4.6  3.4     1.4         0.3          0
8   4.4  2.9     1.4         0.2          0
12  4.8  3.0     1.4         0.1          0
17  5.1  3.5     1.4         0.3          0
28  5.2  3.4     1.4         0.2          0
33  5.5  4.2     1.4         0.2          0
37  4.9  3.6     1.4         0.1          0
45  4.8  3.0     1.4         0.3          0
47  4.6  3.2     1.4         0.2          0
49  5.0  3.3     1.4         0.2          0
versicolor contains 1.4:
     150    4  setosa  versicolor  virginica
50   7.0  3.2     4.7         1.4          1
59   5.2  2.7     3.9         1.4          1
63   6.1  2.9     4.7         1.4          1
65   6.7  3.1     4.4         1.4          1
75   6.6  3.0     4.4         1.4          1
76   6.8  2.8     4.8         1.4          1
91   6.1  3.0     4.6  

In [29]:
df['setosa'] == 1.4

0       True
1       True
2      False
3      False
4       True
       ...  
145    False
146    False
147    False
148    False
149    False
Name: setosa, Length: 150, dtype: bool
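A boolean Series like the one above is what `df.loc[...]` uses as a row mask. Exact `==` comparison evidently matches the stored values here, but it can miss floats that come out of arithmetic with rounding error; `numpy.isclose` is a more tolerant option. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up column: the last value is 1.4 only up to floating-point rounding
df = pd.DataFrame({'setosa': [1.4, 1.3, 0.1 + 1.3]})

exact_mask = df['setosa'] == 1.4            # strict equality
close_mask = np.isclose(df['setosa'], 1.4)  # equality within a small tolerance

print(df.loc[exact_mask])
print(df.loc[close_mask])
```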

In [37]:
filter_value = 1.4
filtered_data_list = []
for col in dataframe.columns:
    filtered_data = dataframe.loc[dataframe[col] == filter_value]
    filtered_data_list.append(filtered_data)

In [30]:
dataframe.head(2)

Unnamed: 0,150,4,setosa,versicolor,virginica
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0


In [34]:
filtered_data.head()

Unnamed: 0,150,4,setosa,versicolor,virginica


In [36]:
final_filtered_data(4)

TypeError: 'list' object is not callable
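The `TypeError` comes from the parentheses, which try to call the list as if it were a function; list elements are accessed with square brackets instead. Keeping the filtered frames in a dict keyed by column name can also make them easier to look up than a positional list. A sketch with made-up data (`filtered_by_col` and `non_empty` are illustrative names):

```python
import pandas as pd

# Made-up stand-in for the frame being filtered
dataframe = pd.DataFrame({
    'setosa': [1.4, 1.3, 4.7],
    'versicolor': [0.2, 0.2, 1.4],
})
filter_value = 1.4

# One filtered frame per column, in column order
filtered_data_list = [
    dataframe.loc[dataframe[col] == filter_value] for col in dataframe.columns
]
first = filtered_data_list[0]  # square brackets index the list

# Same results keyed by column name, keeping only the non-empty ones
filtered_by_col = {
    col: dataframe.loc[dataframe[col] == filter_value] for col in dataframe.columns
}
non_empty = {col: frame for col, frame in filtered_by_col.items() if not frame.empty}
print(list(non_empty))
```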

In [38]:
filtered_data_list

[Empty DataFrame
 Columns: [150, 4, setosa, versicolor, virginica]
 Index: [],
 Empty DataFrame
 Columns: [150, 4, setosa, versicolor, virginica]
 Index: [],
     150    4  setosa  versicolor  virginica
 0   5.1  3.5     1.4         0.2          0
 1   4.9  3.0     1.4         0.2          0
 4   5.0  3.6     1.4         0.2          0
 6   4.6  3.4     1.4         0.3          0
 8   4.4  2.9     1.4         0.2          0
 12  4.8  3.0     1.4         0.1          0
 17  5.1  3.5     1.4         0.3          0
 28  5.2  3.4     1.4         0.2          0
 33  5.5  4.2     1.4         0.2          0
 37  4.9  3.6     1.4         0.1          0
 45  4.8  3.0     1.4         0.3          0
 47  4.6  3.2     1.4         0.2          0
 49  5.0  3.3     1.4         0.2          0,
      150    4  setosa  versicolor  virginica
 50   7.0  3.2     4.7         1.4          1
 59   5.2  2.7     3.9         1.4          1
 63   6.1  2.9     4.7         1.4          1
 65   6.7  3.1     4.4     