# AdamNet

A closer look at targeted dropout

### Let's install our dependencies

In [1]:
!pip install keras
# !pip freeze
!pip install tensorflow

Collecting keras
[?25l  Downloading https://files.pythonhosted.org/packages/5e/10/aa32dad071ce52b5502266b5c659451cfd6ffcbf14e6c8c4f16c0ff5aaab/Keras-2.2.4-py2.py3-none-any.whl (312kB)
[K    100% |████████████████████████████████| 317kB 8.7MB/s ta 0:00:01
Installing collected packages: keras
[31mCould not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib64/python3.6/site-packages/Keras-2.2.4.dist-info'
Consider using the `--user` option or check the permissions.
[0m
[33mYou are using pip version 19.0.3, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.[0m
[33mYou are using pip version 19.0.3, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.[0m


### Now we will train our neural network

Please note, in the process of training this, I took some time to play around and push the hyperparams around for fun trying to increase test accuracy to over 90%. But, if this does not sound fun to you, I have attached the model weights, and you can [skip ahead](#pruning) to that part of the notebook.

In [124]:
from keras.datasets import fashion_mnist
from keras.layers import Input, Flatten
from keras.optimizers import Adam
from keras.layers import InputLayer
from keras.models import Sequential
from keras.layers import Dense

(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()

training_images = training_images / 255.0
test_images = test_images / 255.0

model = Sequential([
        Flatten(),
        Dense(1000, activation="relu"),
        Dense(1000, activation="relu"),
        Dense(500, activation="relu"),
        Dense(200, activation="relu"),
        Dense(10, activation="softmax")
])
    
optimizer = Adam(lr = 0.0003)

model.compile(optimizer = optimizer, loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])

model.fit(training_images, training_labels, batch_size = 32, epochs = 10)

print("Test Accuracy = ", model.evaluate(test_images, test_labels)[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Accuracy =  0.8905


In [None]:
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
    
model.save_weights("model.h5")
print("Saved model to disk")

<a id='pruning'></a>
# Pruning AdamNet

Skip to here if you do not want to recook my network.

### Start by loading the saved model

In [2]:
from keras.models import model_from_json

json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()

loaded_model = model_from_json(loaded_model_json)

loaded_model.load_weights("model.h5")

Instructions for updating:
Colocations handled automatically by placer.


Using TensorFlow backend.


Now that the model is loaded, we need to construct our weight and unit pruning tools.

### Weight Pruning

I found this on [some blog somewhere](https://for.ai/blog/targeted-dropout/), and obfuscated the original author by changing some formating ;)

Side note, the code from the blog was missing some things (where does w come from?), and since tensorflow is nice, but unessecary, I decided to replicate with numpy.

In [61]:
import tensorflow as tf
import numpy as np

def prune_weights(weights, k, targeted_unit):
    
    weights_shape = weights.shape
    weights_total = weights_shape[0] * weights_shape[1]
    
    w = weights[:,targeted_unit - 1]
    
    importance = np.absolute(w)
    importance_threshold = np.sort(importance, axis=0)
    unimportance_mask = importance < importance_threshold 
    
#     lessthan_mask = np.random.uniform( w.shape ) < k
    lessthan_mask = tf.random_uniform(tf.shape(w)) < k
    print(lessthan_mask)
#                                                  
    dropout_mask = np.all(np.random.uniform( w.shape ) < k, unimportance_mask) #WOOHOOO DEMORGANS LAW BABY
    
    w = (1. - lessthan_mask) * w #dropout_mask
    
    weights[:, targeted_unit - 1] = w
    return weights


weights = loaded_model.layers[3].get_weights()[0]

new = prune_weights(weights, .8, 5)
new

Tensor("Less_2:0", shape=(1000,), dtype=bool)


TypeError: only integer scalar arrays can be converted to a scalar index


### Unit Pruning

Architecture largely inspired by the authors of the previous method

In [67]:
def prune_units(weights, k):
    
    weights_shape = weights.shape
    final_weights = np.zeros(weights_shape)
    
    num_cols = weights_shape[1]
    
    d = {} #nd init 2(x)num_cols
    
    for i in range(num_cols): #nditter
        
        column = weights[:,i]
        
        l1 = np.linalg.norm(column, ord=1)
        d[i] = l1
        
    np.percentile(d , k) #to take the kth percentile
    
    #create a mask
    
    #if mask then populate keyth column or final_weights with original weights column

    
    """
    organize into units... how?
    
    grapple their incoming connections
    
    find the L1 norm of those connections
    
    rank the units by the L1 connections
    
    select the lower k% of those units
        change their inputs to zero
        
    return the new weight matrix
    """
    
    
#     weights_shape = weights.shape
#     w = tf.reshape(weights, [-1,weights[-1]])
    
#     importance = tf.abs(w)
#     idx = tf.to_int32(targeted_portion * tf.to_float(tf.shape(w)[0]))
    
#     importance_threshold = tf.contrib.framework.sort(importance, axis=0)[idx]
#     unimportance_mask = importance < importance_threshold[None, :]
  
#     dropout_mask = tf.to_float(tf.logical_and(tf.random_uniform(tf.shape(weights)) < 
#                                               k, unimportance_mask))
    
#     weights = (1. - dropout_mask) * weights
#     weights = tf.reshape(weights, weights_shape)

    return weights


weights = loaded_model.layers[3].get_weights()[0]

prune_units(weights , 50)

[(19, 30.1453), (160, 30.48764), (190, 30.670528), (197, 30.691174), (273, 30.759851), (238, 30.882511), (277, 30.885025), (165, 30.89934), (203, 30.919605), (178, 30.922749), (41, 30.965157), (481, 30.99667), (324, 31.006624), (341, 31.00729), (272, 31.017365), (212, 31.070967), (296, 31.076511), (499, 31.100332), (357, 31.112171), (93, 31.149559), (215, 31.159388), (302, 31.165188), (242, 31.18659), (344, 31.205662), (309, 31.224598), (245, 31.235931), (394, 31.252478), (437, 31.259018), (280, 31.306561), (198, 31.323797), (244, 31.331583), (107, 31.354841), (469, 31.375763), (94, 31.376011), (345, 31.384953), (379, 31.394802), (59, 31.423632), (480, 31.457111), (372, 31.471775), (434, 31.485882), (172, 31.48941), (260, 31.491617), (233, 31.509159), (102, 31.514992), (320, 31.527737), (58, 31.549093), (450, 31.557928), (322, 31.573526), (35, 31.57785), (6, 31.584248), (405, 31.584389), (335, 31.584576), (99, 31.585121), (412, 31.588703), (287, 31.611925), (447, 31.628237), (209, 31.6

array([[ 0.0358753 ,  0.00203734, -0.03108707, ...,  0.0472178 ,
         0.05643078, -0.02678137],
       [ 0.0421016 ,  0.05543165, -0.03793364, ..., -0.05707404,
         0.04389518,  0.04346066],
       [-0.04796297,  0.03422289, -0.01008656, ...,  0.01951565,
         0.00379465,  0.03773079],
       ..., 
       [ 0.01552797, -0.03858643,  0.01705509, ...,  0.05192143,
        -0.0167532 , -0.00647251],
       [-0.03163313,  0.01798149, -0.05768332, ..., -0.00999958,
        -0.02496116,  0.03626469],
       [ 0.04924151, -0.01077353,  0.01674547, ..., -0.06326522,
        -0.0086967 , -0.02718662]], dtype=float32)

### Modify the Original Model

Now that we have our methods that return pruned weight matrixes, we must implement them

### Graph and Analyze the Results

In [72]:
 percents = [.0, .25, .50, .60, .70, .80, .90, .95, .97, .99]

for k in percents:
    pass

# Questions


# TODO

Find a guide to manipulating weight matrices

Edit dockerfile to execute to this jupyter notebook

Build repository + README.md

edit var names

maybe add an exploration into which type of hyper params cause more or less zeroing

comments!

# Rules

1. Please use TensorFlow (it’s what we use for all of our projects). Consider using colab for access to free GPUs/TPUs.

2. You may use frameworks or libraries as you see fit. If you borrow code please include proper attribution and have​ a ​clear​ ​separation​ ​between​ ​the​ ​code​ ​you borrowed​ ​and​ ​the​ ​code​ ​you​ ​wrote​ ​yourself​.

3. You should keep​ ​your​ ​code​ ​simple​ ​and​ ​focus​ ​on​ ​readability​ of your code. Include any instructions for running and reading your code in a README file. We​ ​value​ ​thoughtfully written ​ ​clean,​ ​and​ ​communicative​ ​code​ ​so other​ ​contributors​ ​can​ ​easily​ ​understand​ ​and​ ​build​ ​on​ ​top​ ​of​ ​it.

4. You may skip any parts of the challenge if you get stuck or don’t have relevant experience. However, we encourage you to learn and demonstrate newly acquired skills. 

5. You should check your solution into GitHub​ or Colab and provide basic instructions on how to reproduce your results. 

6. You are free to spend as little or as much time as you want on this challenge.

7. You are expected to learn something new after you complete the challenge :)


# PROMPT

[Here is the original google doc](https://docs.google.com/document/d/1cW-bP_7hw22Wi5nwWOcmMo7Pp9J04nwif8OiNqXRQ3o/edit)

1. Read the ​Rules​! 

2. Install Tensorflow 

3. Construct a ReLU-activated neural network with four hidden layers with sizes [1000, 1000, 500, 200]. Note: you’ll have a fifth layer for your output logits, which you will have 10 of.

4. Prune away (set to zero) the k% of weights using weight and unit pruning for k in [0, 25, 50, 60, 70, 80, 90, 95, 97, 99]. Remember not to prune the weights leading to the output logits.

5. Create a table or plot showing the percent sparsity (number of weights in your network that are zero) versus percent accuracy with two curves (one for weight pruning and one for unit pruning).

6. Make your code clean and readable. Add comments where needed. 

7. Analyze your results. What interesting insights did you find? Do the curves differ? Why do you think that is/isn’t? 

8. Do you have any hypotheses as to why we are able to delete so much of the network without hurting performance (this is an open research question)?

9. Bonus: See if you can find a way to use your new-found sparsity to speed up the execution of your neural net! Hint: ctrl + f “sparse” in the TF docs, or use unit level sparsity (which deletes entire rows and columns from weight matrices). This can be tricky but is a worthwhile engineering lesson in the optimization of Tensorflow models.


### Answers

#### Interesting insights found, differing curves, potential reasons?


#### Potential reasons for successfully deleting massive amounts of the network
- Relu is mostly 0
- lack of training dropout encourages subnetwork dependance allowing for
- lottery ticket hypothesis

#### How much faster is this? %%timeit

In [None]:
# Converting the categories to the labels
predictions = np_utils.categorical_probas_to_classes(predictions)

labelNames = ["top", "trouser", "pullover", "dress", "coat", "sandal", "shirt", "sneaker", "bag", "ankle boot"]

#Writing data to the output
out = numpy.column_stack((range(1, predictions.shape[0]+1), predictions))

In [None]:
#https://github.com/keras-team/keras/issues/91#issuecomment-97583594

import h5py

def print_structure(weight_file_path):
    f = h5py.File(weight_file_path)
    try:
        if len(f.attrs.items()):
            print("{} contains: ".format(weight_file_path))
            print("Root attributes:")
        for key, value in f.attrs.items():
            print("  {}: {}".format(key, value))

        if len(f.items())==0:
            return 

        for layer, g in f.items():
            print("  {}".format(layer))
            print("    Attributes:")
            for key, value in g.attrs.items():
                print("      {}: {}".format(key, value))

            print("    Dataset:")
            for p_name in g.keys():
                param = g[p_name]
                print("      {}: {}".format(p_name, param.shape))
    finally:
        f.close()
        
print_structure("model.h5")