0a) Unpack the images into a folder that you can find from a Jupyter notebook. Important: do not look at the images. Part of the 'fun' of this homework is that the outlier images are hidden. This is mandatory fun.

0b) Write a method that converts an image into a graph. There are several steps you'll need to follow:
 * Import the image using skimage.io.imread, and normalize it to have float entries in [0,1] rather than ints [0,255];
 * Increase the contrast of the image. I did this by setting any pixel with value 0< pixel < .5 to 0, and .5 < pixel 1.0 to 1.0. I'd recommend doing the same thing, but you're welcome to try other methods.
 * Skeletonize the image. We discussed this a little in class; you should just use skimage.morphology.skeletonize. This produces an image where each line has been reduced to a single pixel width.
 * Turn the skeletonized image into a graph. Each node of this graph represents a 'live' pixel of the skeleton image, and two pixels are connected if they are adjacent. This includes the pixels that are diagonal from the current pixel. There are many ways to construct this graph; I would suggest using sklearn.neighbors.radius_neigbors_graph, with an appropriate choice of radius.


In [1]:
import os
import skimage.io
from skimage.morphology import skeletonize
from sklearn.neighbors import radius_neighbors_graph
from skimage.color import rgb2gray, rgba2rgb
import matplotlib.pyplot as plt
import networkx as nx
import math
import pandas as pd
import numpy as np

In [2]:
def im_2_graph(im):
    if len(im.shape) == 3:
        # if im.shape[2] == 3:
        #     im = rgb2gray(im)
        # else:
        #     im = rgb2gray(rgba2rgb(im)) # ignore alhpa channel
        im = im[:,:,0]
    
    im = im.astype('float') / 255.0 # normalize
    im = 1 - im
    im[im <= 0.5] = 0.0
    im[im >0.5] = 1.0
    # im = skeletonize(im, method='lee')
    im = skeletonize(im)
    # plt.matshow(im)
    locs_x, locs_y = np.nonzero(im)
    adj = radius_neighbors_graph(np.stack([locs_x, locs_y]).T, math.sqrt(2))
    
    return nx.from_numpy_matrix(adj)

In [3]:
im_2_graph(skimage.io.imread('/Users/maxperozek/CP341/Day6/Fingerprint_data/final_ims/ef2d127d.png'))

<networkx.classes.graph.Graph at 0x7f94475264f0>

In [4]:
im_2_graph(skimage.io.imread('/Users/maxperozek/CP341/Day6/Fingerprint_data/final_ims/677fe64a.png'))

<networkx.classes.graph.Graph at 0x7f94474fc1c0>

In [5]:
labels = pd.read_csv('/Users/maxperozek/CP341/Day6/Fingerprint_data/labels.csv', header=None)

In [6]:
labels[labels[0].isin(['2abaca49'])].iloc()[0][1]

'CLEAN'

In [7]:
# running into an issue with some of the files where there are some RGB/ RGBA images which don't play nice with the operations that
print(skimage.io.imread('/Users/maxperozek/CP341/Day6/Fingerprint_data/final_ims/677fe64a.png').shape)
print(skimage.io.imread('/Users/maxperozek/CP341/Day6/Fingerprint_data/final_ims/ef2d127d.png').shape)

(189, 188)
(190, 188, 4)


In [8]:
rootdir = '/Users/maxperozek/CP341/Day6/Fingerprint_data/final_ims/'
graphs = 20000
fp_graphs = []

i = 0
for file in os.listdir(rootdir):
    if i >= graphs:
        break
    im = skimage.io.imread(rootdir + file)
    # print(file)
    gr = im_2_graph(im)
    label = labels[labels[0].isin([file[:-4]])].iloc()[0][1]
    fp_graphs.append((gr, label))
    i += 1

1a) Write a method that measures the following things about each graph:
 * A histogram of component sizes.
 * A histogram of node degrees.
 * A histogram of lengths of the components which are path graphs.

In [9]:
import torch

  from .autonotebook import tqdm as notebook_tqdm


In [10]:
def graph_2_vec(gr):
    comp_list = [len(c) for c in sorted(nx.connected_components(gr), key=len, reverse=True)]
    comp_hist = np.histogram(comp_list, bins=[0,4,8,12,16,20,24,28,32,5000])
    # print(comp_list)
    # print(comp_hist)
    # print(len(comp_list), comp_hist[0].sum())

    deg_list = [gr.degree[i] for i in range(len(gr.nodes))]
    # print(deg_list)
    deg_hist = np.histogram(deg_list, bins=[0,1,2,3,4,5,5000])
    # print(deg_hist)
    
    path_lengths = []
    for comp in nx.connected_components(gr):
        ind_sg = gr.subgraph(comp)
        length = len(ind_sg.nodes)
        # all nodes should have degree 2 except 2 nodes with degree 1
        if (np.array([deg == 1 for node, deg in ind_sg.degree]).sum() == 2 and 
            np.array([deg == 2 for node, deg in ind_sg.degree]).sum() == length - 2):
            path_lengths.append(len(ind_sg.nodes))
    path_hist = np.histogram(path_lengths, bins=[0,4,8,12,16,20,24,28,32,5000])
    # print(path_hist)
    return np.hstack((comp_hist[0], deg_hist[0], path_hist[0]))

In [11]:
gr_feat_vecs = []
for gr, lab in fp_graphs:
    gr_feat_vecs.append(graph_2_vec(gr))
    
    

In [12]:
label_arr = np.array([x[1] for x in fp_graphs])
bin_label_arr = [0 if label == 'CLEAN' else 1 for label in label_arr]
label_tensor = torch.tensor(bin_label_arr).float()

1b) Train a simple neural network model to predict whether a fingerprint is damaged or not from the features you collected earlier. Try some of the best practices we talked about today for training neural networks. You should set aside a random chunk of your data as a 'test' set, and report the final accuracy on that dataset.

In [13]:
np.arange(len(gr_feat_vecs)).dtype

dtype('int64')

In [14]:
test_idx = np.random.choice(np.arange(len(gr_feat_vecs)), size=int(len(gr_feat_vecs)/10), replace=False)

In [15]:
test = np.take(np.array(gr_feat_vecs), test_idx, 0)
test_labels = np.take(np.array(label_tensor), test_idx, 0)

In [16]:
train = np.take(np.array(gr_feat_vecs), [i for i in range(len(gr_feat_vecs)) if not i in test_idx], 0)
train_labels = np.take(np.array(label_tensor), [i for i in range(len(gr_feat_vecs)) if not i in test_idx], 0)

In [17]:
train = torch.tensor(train).float()
train_labels = torch.tensor(train_labels).float()

test = torch.tensor(test).float()
test_labels = torch.tensor(test_labels).float()

In [18]:
neural_net_classifier = torch.nn.Sequential(
    torch.nn.Linear(24,8),
    torch.nn.ELU(),
    torch.nn.Linear(8,8),
    torch.nn.ELU(),
    torch.nn.Linear(8,16),
    torch.nn.ELU(),
    torch.nn.Linear(16,1)
)

In [19]:
error_function = torch.nn.MSELoss()

In [20]:
optimizer = torch.optim.Adam(neural_net_classifier.parameters(), .001)

In [21]:
data_tensor = torch.tensor(train).float()

  data_tensor = torch.tensor(train).float()


In [None]:
for step in range(1000):
    optimizer.zero_grad()
    predictions = torch.sigmoid(neural_net_classifier(train))
    error = error_function(predictions, train_labels)
    # print(error)
    error.backward()
    optimizer.step()

In [23]:
error

tensor(0.2242, grad_fn=<MseLossBackward0>)

In [25]:
test_predictions = torch.sigmoid(neural_net_classifier(test))

In [26]:
error = error_function(test_predictions, test_labels)

  return F.mse_loss(input, target, reduction=self.reduction)


In [27]:
error

tensor(0.2051, grad_fn=<MseLossBackward0>)

# Problem 2

Fingerprint analysis falls into the field of biometric identification --a field whose ethics are widely debated. While the field of biometric identification is broad and developing quickly with DNA and facial recognition technology, fingerprint analysis was the original means of modern biometric identification and it has become a fixture of crime and detective media in the last century. In the last 2 decades, fingerprints have been used widely outside of law enforcement in applications such as: personal and private security, border control, and health organizations. Additionally, advances in the ability to analyze fingerprints by powerful governments like the Next Generation Identification system (NGI) which is operated by the FBI have significantly increased the number of 'matches' found in searches relating to crimes. In his paper titled: "Biometric Identification, Law and Ethics: The Rise of Biometric Identification: Fingerprints and Applied Ethics" Marcus Smith notes that the scale and heirarchical structure of the organizations which implement large scale biometric analysis can result in a diminished sense of moral responsibility since their role is merely to carry out the instructions of their supperiors. The distributed moral responsibility from these large organizations for biometric analysis makes the responsibility for the ethical implications of fingerprint analysis more ambiguous. One key ethical concern is the idea that fingerprints (as with other biometric identification data) are the property of an individual, and the means by which authorities collect this data may be coercive and may impede on the right of an individual to not self incriminate.