# AdamNet

A closer look at targeted dropout

### Let's install our dependencies

In [177]:
!pip install keras
!pip freeze

You are using pip version 19.0.3, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
absl-py==0.7.0
astor==0.7.1
attrs==18.2.0
backcall==0.1.0
bleach==3.1.0
cycler==0.10.0
decorator==4.3.2
defusedxml==0.5.0
entrypoints==0.3
enum34==1.1.6
gast==0.2.2
grpcio==1.19.0
h5py==2.9.0
ipykernel==5.1.0
ipython==7.3.0
ipython-genutils==0.2.0
ipywidgets==7.4.2
jedi==0.13.3
Jinja2==2.10
jsonschema==3.0.0
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
jupyter-http-over-ws==0.0.3
Keras==2.2.4
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
kiwisolver==1.0.1
Markdown==3.0.1
MarkupSafe==1.1.1
matplotlib==3.0.2
mistune==0.8.4
mock==2.0.0
nbconvert==5.4.1
nbformat==4.4.0
notebook==5.7.4
numpy==1.16.2
pandocfilters==1.4.2
parso==0.3.4
pbr==5.1.2
pexpect==4.6.0
pickleshare==0.7.5
prometheus-client==0.6.0
prompt-toolkit==2.0.9
protobuf==3.6.1
ptyprocess==0.6.0
pycurl==7.43.0
Pygments==2.3.1
pygobject=

### Now we will train our neural network

Note: while training this network, I spent some time tuning the hyperparameters for fun, trying to push test accuracy above 90%. If that does not sound fun to you, I have attached the model weights, and you can [skip ahead](#pruning) to that part of the notebook.

In [124]:
from keras.datasets import fashion_mnist
from keras.layers import Dense, Flatten
from keras.models import Sequential
from keras.optimizers import Adam

(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1]
training_images = training_images / 255.0
test_images = test_images / 255.0

# Four ReLU hidden layers plus a 10-way softmax output.
# input_shape is given explicitly so the architecture saved to JSON
# can be rebuilt before the weights are loaded.
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(1000, activation="relu"),
    Dense(1000, activation="relu"),
    Dense(500, activation="relu"),
    Dense(200, activation="relu"),
    Dense(10, activation="softmax")
])

optimizer = Adam(lr=0.0003)

model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(training_images, training_labels, batch_size=32, epochs=10)

# evaluate() returns [loss, accuracy]
print("Test Accuracy = ", model.evaluate(test_images, test_labels)[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Accuracy =  0.8905


In [None]:
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
    
model.save_weights("model.h5")
print("Saved model to disk")

<a id='pruning'></a>
# Pruning AdamNet

Skip to here if you do not want to retrain my network.

### Start by loading the saved model

In [75]:
from keras.models import model_from_json

with open('model.json', 'r') as json_file:
    loaded_model_json = json_file.read()

loaded_model = model_from_json(loaded_model_json)

loaded_model.load_weights("model.h5")

Now that the model is loaded, we need to construct our weight and unit pruning tools.

### Weight Pruning

The approach below is adapted from the [targeted dropout post on the for.ai blog](https://for.ai/blog/targeted-dropout/); I only changed some formatting.

Side note: the code from the blog was missing some things (where does `w` come from?), and since TensorFlow is nice but unnecessary here, I decided to replicate it with NumPy.

In [180]:
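import tensorflow as tf
import numpy as np

def prune_weights(weights, k, targeted_portion):
    """Targeted dropout on individual weights, ported to NumPy.

    weights: 2-D array of shape (inputs, units).
    k: probability of dropping each targeted weight (1.0 zeroes all of them).
    targeted_portion: fraction in [0, 1] of lowest-magnitude weights to target.
    """
    weights_shape = weights.shape

    # Flatten to one column so magnitudes are ranked across the whole matrix
    w = np.reshape(weights, [-1, 1])
    importance = np.absolute(w)

    # Position of the magnitude threshold in the sorted importances
    idx = min(int(targeted_portion * w.shape[0]), w.shape[0] - 1)
    importance_threshold = np.sort(importance, axis=0)[idx]

    # Weights below the threshold are candidates for removal
    unimportance_mask = importance < importance_threshold[None, :]

    # Each candidate is dropped with probability k
    lessthan_mask = np.random.uniform(size=w.shape) < k
    dropout_mask = np.logical_and(lessthan_mask, unimportance_mask)

    w = (1. - dropout_mask) * w
    return np.reshape(w, weights_shape)

# weights = loaded_model.layers[2].get_weights()

# targeted_portion is a fraction here, not an index
# prune_weights(weights[0], .8, .5)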

### Unit Pruning

The structure is largely borrowed from the authors of the previous method.

In [154]:
import numpy as np

def prune_units(weights, k, targeted_portion):
    """Targeted dropout on whole units (columns), in NumPy.

    Each column of `weights` holds the incoming connections of one unit.
    Units are ranked by the L1 norm of their column; the lowest
    `targeted_portion` (a fraction in [0, 1]) are targeted, and each
    targeted unit has its column set to zero with probability `k`.
    """
    n_units = weights.shape[1]

    # L1 norm of each unit's incoming connections
    importance = np.sum(np.absolute(weights), axis=0)

    # Norm threshold below which a unit is targeted
    idx = min(int(targeted_portion * n_units), n_units - 1)
    importance_threshold = np.sort(importance)[idx]
    unimportance_mask = importance < importance_threshold

    # Each targeted unit is dropped with probability k
    dropout_mask = np.logical_and(np.random.uniform(size=n_units) < k,
                                  unimportance_mask)

    # Broadcasting zeroes every row of the dropped columns
    return weights * (1. - dropout_mask)[None, :]

### Modify the Original Model

Now that we have methods that return pruned weight matrices, we must apply them to the model.
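
One way to wire this up, as a sketch rather than a finished implementation: walk over the model's layers, skip any layer without weights (such as `Flatten`), leave the final layer alone so the weights leading to the output logits stay untouched, and overwrite each kernel with its pruned version. `apply_pruning` and `magnitude_prune` are hypothetical helper names, not part of Keras; the only Keras calls assumed are `layer.get_weights()` and `layer.set_weights()`. `magnitude_prune` is a deterministic stand-in that zeroes the smallest-magnitude fraction of a matrix.

```python
import numpy as np

def magnitude_prune(kernel, fraction):
    """Zero the `fraction` of entries in `kernel` with the smallest magnitude."""
    n_prune = int(round(fraction * kernel.size))
    if n_prune == 0:
        return kernel.copy()
    # Flat indices of the n_prune smallest magnitudes
    flat_idx = np.argsort(np.abs(kernel), axis=None)[:n_prune]
    pruned = kernel.copy().ravel()
    pruned[flat_idx] = 0.0
    return pruned.reshape(kernel.shape)

def apply_pruning(model, prune_fn, fraction):
    """Prune the kernel of every weighted layer except the last one, in place."""
    for layer in model.layers[:-1]:    # the last layer feeds the output logits
        params = layer.get_weights()   # [kernel, bias] for a Dense layer
        if not params:                 # e.g. Flatten has no weights
            continue
        params[0] = prune_fn(params[0], fraction)
        layer.set_weights(params)
```

With the loaded model this would be called as `apply_pruning(loaded_model, magnitude_prune, 0.5)`, once per value of k in the prompt, reloading the original weights between runs.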

### Graph and Analyze the Results
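
As a starting point for the table or plot, the sketch below computes percent sparsity as the share of kernel entries that are exactly zero, which is how the prompt defines it. `percent_sparsity` is a hypothetical helper; it takes a plain list of NumPy arrays, assumed to come from something like `[layer.get_weights()[0] for layer in loaded_model.layers if layer.get_weights()]`. Biases are left out of the count, which is a choice made here rather than something the prompt specifies.

```python
import numpy as np

def percent_sparsity(kernels):
    """Percentage of entries across all kernels that are exactly zero."""
    total = sum(k.size for k in kernels)
    zeros = sum(int(np.count_nonzero(k == 0)) for k in kernels)
    return 100.0 * zeros / total

# Toy check: 3 of the 6 entries are zero
kernels = [np.array([[0.0, 1.5], [0.0, -2.0]]), np.array([[0.0], [3.0]])]
print(percent_sparsity(kernels))  # 50.0
```

Pairing this value with the test accuracy from `model.evaluate` for each k gives the points for the two curves.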

# Questions


# TODO

Find a guide to manipulating weight matrices

Edit dockerfile to execute to this jupyter notebook

Build repository + README.md

edit var names

maybe add an exploration into which type of hyper params cause more or less zeroing

# Rules

1. Please use TensorFlow (it’s what we use for all of our projects). Consider using colab for access to free GPUs/TPUs.

2. You may use frameworks or libraries as you see fit. If you borrow code please include proper attribution and have a clear separation between the code you borrowed and the code you wrote yourself.

3. You should keep your code simple and focus on readability of your code. Include any instructions for running and reading your code in a README file. We value thoughtfully written, clean, and communicative code so other contributors can easily understand and build on top of it.

4. You may skip any parts of the challenge if you get stuck or don’t have relevant experience. However, we encourage you to learn and demonstrate newly acquired skills.

5. You should check your solution into GitHub or Colab and provide basic instructions on how to reproduce your results.

6. You are free to spend as little or as much time as you want on this challenge.

7. You are expected to learn something new after you complete the challenge :)


# PROMPT

[Here is the original google doc](https://docs.google.com/document/d/1cW-bP_7hw22Wi5nwWOcmMo7Pp9J04nwif8OiNqXRQ3o/edit)

1. Read the Rules!

2. Install Tensorflow 

3. Construct a ReLU-activated neural network with four hidden layers with sizes [1000, 1000, 500, 200]. Note: you’ll have a fifth layer for your output logits, which you will have 10 of.

4. Prune away (set to zero) the k% of weights using weight and unit pruning for k in [0, 25, 50, 60, 70, 80, 90, 95, 97, 99]. Remember not to prune the weights leading to the output logits.

5. Create a table or plot showing the percent sparsity (number of weights in your network that are zero) versus percent accuracy with two curves (one for weight pruning and one for unit pruning).

6. Make your code clean and readable. Add comments where needed. 

7. Analyze your results. What interesting insights did you find? Do the curves differ? Why do you think that is/isn’t? 

8. Do you have any hypotheses as to why we are able to delete so much of the network without hurting performance (this is an open research question)?

9. Bonus: See if you can find a way to use your new-found sparsity to speed up the execution of your neural net! Hint: ctrl + f “sparse” in the TF docs, or use unit level sparsity (which deletes entire rows and columns from weight matrices). This can be tricky but is a worthwhile engineering lesson in the optimization of Tensorflow models.


In [None]:
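import numpy as np

# Converting the class probabilities to labels.
# `predictions` is assumed to already hold the model's softmax output,
# with shape (n_samples, 10). np_utils.categorical_probas_to_classes is not
# available in Keras 2, so take the argmax over the class axis instead.
predictions = np.argmax(predictions, axis=1)

labelNames = ["top", "trouser", "pullover", "dress", "coat", "sandal", "shirt", "sneaker", "bag", "ankle boot"]

# Writing data to the output: one row per sample as (1-based id, predicted class)
out = np.column_stack((np.arange(1, predictions.shape[0] + 1), predictions))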

In [None]:
#https://github.com/keras-team/keras/issues/91#issuecomment-97583594

import h5py

def print_structure(weight_file_path):
    f = h5py.File(weight_file_path, "r")  # open read-only
    try:
        if len(f.attrs.items()):
            print("{} contains: ".format(weight_file_path))
            print("Root attributes:")
        for key, value in f.attrs.items():
            print("  {}: {}".format(key, value))

        if len(f.items())==0:
            return 

        for layer, g in f.items():
            print("  {}".format(layer))
            print("    Attributes:")
            for key, value in g.attrs.items():
                print("      {}: {}".format(key, value))

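            print("    Dataset:")

            def print_dataset(name, obj):
                # Weights may sit in nested groups; only datasets have a shape
                if isinstance(obj, h5py.Dataset):
                    print("      {}: {}".format(name, obj.shape))

            g.visititems(print_dataset)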
    finally:
        f.close()
        
print_structure("model.h5")