In [3]:
!pip install pandas
!pip install keras

Collecting pandas
[?25l  Downloading https://files.pythonhosted.org/packages/74/24/0cdbf8907e1e3bc5a8da03345c23cbed7044330bb8f73bb12e711a640a00/pandas-0.24.2-cp35-cp35m-manylinux1_x86_64.whl (10.0MB)
[K    100% |████████████████████████████████| 10.0MB 1.3MB/s ta 0:00:01    32% |██████████▍                     | 3.2MB 7.1MB/s eta 0:00:01
Collecting pytz>=2011k (from pandas)
[?25l  Downloading https://files.pythonhosted.org/packages/3d/73/fe30c2daaaa0713420d0382b16fbb761409f532c56bdcc514bf7b6262bb6/pytz-2019.1-py2.py3-none-any.whl (510kB)
[K    100% |████████████████████████████████| 512kB 1.6MB/s ta 0:00:01
Installing collected packages: pytz, pandas
Successfully installed pandas-0.24.2 pytz-2019.1
[33mYou are using pip version 19.0.3, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.[0m
Collecting keras
[?25l  Downloading https://files.pythonhosted.org/packages/5e/10/aa32dad071ce52b5502266b5c659451cfd6ffcbf14e6c8c4f1

# AdamNet
##### A closer look at targeted dropout

I have attached the model weights, and you can [skip ahead](#pruning) to that part of the notebook.

### Now we will train our neural network

Please note, in the process of training this, I took some time to play around and push the hyperparams around to push the test accuracy to over 90%, and to allow the model to converge with as many weights falling victim to dying ReLU.

In [None]:
from keras.datasets import fashion_mnist
from keras.layers import Input, Flatten
from keras.optimizers import Adam
from keras.layers import InputLayer
from keras.models import Sequential
from keras.layers import Dense

(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()

training_images = training_images / 255.0
test_images = test_images / 255.0

model = Sequential([
        Flatten(),
        Dense(1000, activation="relu"),
        Dense(1000, activation="relu"),
        Dense(500, activation="relu"),
        Dense(200, activation="relu"),
        Dense(10, activation="softmax")
])


model.compile(optimizer = 'Adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])

model.fit(training_images, training_labels, batch_size = 32, epochs = 100)

print("Test Accuracy = ", model.evaluate(test_images, test_labels)[1])

In [None]:
model_json = model.to_json()
with open("AdamNet2.json", "w") as json_file:
    json_file.write(model_json)
    
model.save_weights("AdamNet2.h5")
print("Saved model to disk")

<a id='pruning'></a>
# Pruning AdamNet

Skip to here if you do not want to recook my network.

### Start by loading the saved model

In [4]:
import numpy as np
import pandas as pd
import tensorflow as tf
from keras.optimizers import Adam
from keras.datasets import fashion_mnist
from keras.models import model_from_json

json_file = open('AdamNet2.json', 'r')
loaded_model_json = json_file.read()
json_file.close()

(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()

test_images = test_images / 255.0

df = pd.DataFrame(columns=['Sparcity', "Unit or Weight Pruned", "Test Loss", "Test Accuracy"])

percents = [0, 25, 50, 60, 70, 80, 90, 95, 97, 99]

Using TensorFlow backend.


Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading data from http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz


Now that the model is loaded, we need to construct our weight and unit pruning tools.

### Weight Pruning

I found this on [some blog somewhere](https://for.ai/blog/targeted-dropout/), and modified my code to match the architecture. 

Side note, the code from the blog was missing some things (where does w come from?)

In [73]:
#from model.get_weights, not layer.get_weights()
def prune_weights(weights, k):
    
    new_weights = []

    for w in weights[:-1]:
        
        w_shape = w.shape
        
        w = np.reshape(w, [-1, w_shape[-1]])
        w = np.abs(w)
    
        kth_percentile = int(w.shape[0] * k / 100.0)
        threshold = np.sort(w, axis =0)[kth_percentile]
                       
        discard_mask = w < threshold[None, :]
        w = (1. - discard_mask) * w 
        w = np.reshape(w, w_shape)
        
        new_weights.append(w)
        
    new_weights.append(weights[-1])
        
    return new_weights

        
for k in percents:
    
    loaded_model = model_from_json(loaded_model_json)
    loaded_model.load_weights("AdamNet2.h5")
    
    new_weights = prune_weights(loaded_model.get_weights(), k)
    loaded_model.set_weights(new_weights)
    
    loaded_model.compile(optimizer = 'Adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
    
    print(k, loaded_model.evaluate(test_images, test_labels))
        
    df = df.append({
        'Sparcity' : k,
        "Unit or Weight Pruned" : "Weight",
#         'Test Loss' : w_loss,
#         "Test Accuracy" : w_acc
    }, ignore_index=True)
    


0 [14.50628757019043, 0.1]
25 [14.50628757019043, 0.1]
50 [14.50628757019043, 0.1]


KeyboardInterrupt: 


### Unit Pruning

I tried to reimpliment the previous architecture with tf, but couldn't and since tensorflow is nice, but unessecary, I decided to replicate with numpy.

In [45]:
def prune_units(weights, k):
    
    weights_shape = weights.shape
    final_weights = np.zeros(weights_shape)
    total_weights = weights_shape[0] * weights_shape[1]
    
    num_cols = weights_shape[1]
    
    l1 = np.linalg.norm(weights, ord=1, axis = 0)
    
    k_cutoff = np.float32(np.percentile(l1 , k))
    k_arr = np.full((1, num_cols), k_cutoff)
    discard = l1 < k_arr
    
    final_weights = (1. - discard) * weights
    
    percent = round(100. * (total_weights - np.count_nonzero(final_weights)) / float(total_weights))
    if percent != k:
        print("FAILURE: Pruned by ", percent,"%, when k =", k)
    
    return final_weights


for k in percents:
    
    loaded_model = model_from_json(loaded_model_json)
    loaded_model.load_weights("model.h5")

    unit_pruned_model = loaded_model

    for layer in range(1, 5):
                
        weights_and_bias = loaded_model.layers[layer].get_weights()
        bias = weights_and_bias[1]         
        original_weights = weights_and_bias[0]
        total_units = original_weights.shape[1]
        
        new_unit_weights = prune_units(original_weights, k)
        
        unit_pruned_model.layers[layer].set_weights([new_unit_weights, bias])
    
    optimizer = Adam(lr = 0.0003)

    unit_pruned_model.compile(optimizer = optimizer, loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
        
    u_loss, u_acc = unit_pruned_model.evaluate(test_images, test_labels)
    df = df.append({
        'Sparcity' : k,
        "Unit or Weight Pruned" : "Unit",
        'Test Loss' : u_loss,
        "Test Accuracy" : u_acc
    }, ignore_index=True)
    
df



Unnamed: 0,Sparcity,Unit or Weight Pruned,Test Loss,Test Accuracy
0,0,Weight,0.86999,0.8978
1,25,Weight,14.506286,0.1
2,50,Weight,14.506286,0.1
3,60,Weight,14.506286,0.1
4,70,Weight,14.506286,0.1
5,80,Weight,14.506286,0.1
6,90,Weight,14.506286,0.1
7,95,Weight,14.506286,0.1
8,97,Weight,14.506286,0.1
9,99,Weight,14.210718,0.1


### Modify the Original Model

Now that we have our methods that return pruned weight matrixes, we must implement them and record their performance

### Graph and Analyze the Results

# Questions


# TODO

Find a guide to manipulating weight matrices

Edit dockerfile to execute to this jupyter notebook

Build repository + README.md

edit var names

maybe add an exploration into which type of hyper params cause more or less zeroing

comments!

# Rules

1. Please use TensorFlow (it’s what we use for all of our projects). Consider using colab for access to free GPUs/TPUs.

2. You may use frameworks or libraries as you see fit. If you borrow code please include proper attribution and have​ a ​clear​ ​separation​ ​between​ ​the​ ​code​ ​you borrowed​ ​and​ ​the​ ​code​ ​you​ ​wrote​ ​yourself​.

3. You should keep​ ​your​ ​code​ ​simple​ ​and​ ​focus​ ​on​ ​readability​ of your code. Include any instructions for running and reading your code in a README file. We​ ​value​ ​thoughtfully written ​ ​clean,​ ​and​ ​communicative​ ​code​ ​so other​ ​contributors​ ​can​ ​easily​ ​understand​ ​and​ ​build​ ​on​ ​top​ ​of​ ​it.

4. You may skip any parts of the challenge if you get stuck or don’t have relevant experience. However, we encourage you to learn and demonstrate newly acquired skills. 

5. You should check your solution into GitHub​ or Colab and provide basic instructions on how to reproduce your results. 

6. You are free to spend as little or as much time as you want on this challenge.

7. You are expected to learn something new after you complete the challenge :)


# PROMPT

[Here is the original google doc](https://docs.google.com/document/d/1cW-bP_7hw22Wi5nwWOcmMo7Pp9J04nwif8OiNqXRQ3o/edit)

1. Read the ​Rules​! 

2. Install Tensorflow 

3. Construct a ReLU-activated neural network with four hidden layers with sizes [1000, 1000, 500, 200]. Note: you’ll have a fifth layer for your output logits, which you will have 10 of.

4. Prune away (set to zero) the k% of weights using weight and unit pruning for k in [0, 25, 50, 60, 70, 80, 90, 95, 97, 99]. Remember not to prune the weights leading to the output logits.

5. Create a table or plot showing the percent sparsity (number of weights in your network that are zero) versus percent accuracy with two curves (one for weight pruning and one for unit pruning).

6. Make your code clean and readable. Add comments where needed. 

7. Analyze your results. What interesting insights did you find? Do the curves differ? Why do you think that is/isn’t? 

8. Do you have any hypotheses as to why we are able to delete so much of the network without hurting performance (this is an open research question)?

9. Bonus: See if you can find a way to use your new-found sparsity to speed up the execution of your neural net! Hint: ctrl + f “sparse” in the TF docs, or use unit level sparsity (which deletes entire rows and columns from weight matrices). This can be tricky but is a worthwhile engineering lesson in the optimization of Tensorflow models.


### Answers

#### Interesting insights found, differing curves, potential reasons?


#### Potential reasons for successfully deleting massive amounts of the network
- Relu is mostly 0
- lack of training dropout encourages subnetwork dependance allowing for
- lottery ticket hypothesis

#### How much faster is this? %%timeit