In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import sklearn.decomposition
%matplotlib notebook

# Dimensionality Reduction

`house-votes-84.data` contains the voting record of every member of the 1984 House of Representatives. In particular, it records whether each representative voted yes, voted no, or abstained on each of 16 different bills. Once each vote is encoded as a number, every congressperson becomes a point in $\mathbb R^{16}$. We can't visualize such high-dimensional data directly, but if we could, we'd expect to see a cluster of Republicans and a cluster of Democrats.

In this notebook, we'll use principal component analysis (PCA) to reduce the dimensionality of the data from 16 to 3 so that we can plot it.
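
To make the reduction concrete, here is a minimal sketch of what PCA does, written with NumPy only. The matrix `X` is a synthetic stand-in invented for illustration (it is not the vote data), and the variable names are assumptions: the data is centered, the principal directions are read off a singular value decomposition, and each point is projected onto the first three directions.

```python
import numpy as np

# Synthetic stand-in for the vote matrix (an assumption, not the real data):
# 100 points in 16 dimensions with entries in {0, 0.5, 1}.
rng = np.random.default_rng(0)
X = rng.choice([0.0, 0.5, 1.0], size=(100, 16))

# Step 1: center each feature at zero.
X_centered = X - X.mean(axis=0)

# Step 2: the rows of Vt are the principal directions of the centered data,
# ordered from most to least variance.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: project every point onto the first k directions.
k = 3
X_reduced = X_centered @ Vt[:k].T

print(X_reduced.shape)
```

Up to the sign of each axis, this is the same kind of projection that `sklearn.decomposition.PCA` computes in the cells that follow.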

In [4]:
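# Load the raw records (the file has no header row) and shuffle the rows.
data = pd.read_csv('house-votes-84.data', header=None).sample(frac=1)
# Encode votes and party labels as numbers; an abstention ('?') sits halfway between no and yes.
data = data.replace({'y': 1, 'n': 0, '?': .5, 'republican': 0, 'democrat': 1})
# Column 0 holds the party label; the remaining 16 columns are the votes.
features = data.iloc[:,1:].values
labels = data.iloc[:,0].values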

In [5]:
pca = sklearn.decomposition.PCA(n_components=3)

In [6]:
new_data = pca.fit_transform(features)
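
Projecting 16 dimensions onto 3 discards some information, so it can be worth checking how much variance the kept components retain. The sketch below uses the fitted estimator's `explained_variance_ratio_` attribute. It runs on a made-up two-group dataset (`demo_features` and the other `demo_*` names are hypothetical stand-ins for `features`, `pca`, and `new_data`), so the printed numbers say nothing about the real voting records.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for `features`: two noisy groups of 16 votes each.
rng = np.random.default_rng(0)
group_a = (rng.random((50, 16)) < 0.8).astype(float)  # mostly "yes"
group_b = (rng.random((50, 16)) < 0.2).astype(float)  # mostly "no"
demo_features = np.vstack([group_a, group_b])

demo_pca = PCA(n_components=3)
demo_points = demo_pca.fit_transform(demo_features)

# One entry per kept component, largest first; their sum is the share of
# the total variance that survives the projection to 3D.
ratios = demo_pca.explained_variance_ratio_
retained = ratios.sum()
print(ratios, retained)
```

Running the same two lines on the notebook's own `pca` object after `fit_transform` would report the retained share for the vote data.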

In [15]:
%matplotlib notebook
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(*new_data.T)
# uncomment next line to add color
# ax.scatter(*new_data.T, c=labels, cmap='RdBu')

<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x7f3152bc52e8>