_Note_: The three exercises in this tutorial can be done in any order. Decide which interests you the most, and start with that one. You don't have to do all of them.

## Installation

1. If you haven't already installed Python3, get it from [Python.org](https://www.python.org/downloads/)
1. If you haven't already installed Jupyter Notebook, run `python3 -m pip install jupyter`
1. In Terminal, cd to the folder in which you downloaded this file and run `jupyter notebook`. This should open up a page in your web browser that shows all of the files in the current directory, so that you can open this file. You will need to leave this Terminal window up and running and use a different one for the rest of the instructions.
1. If you didn't install keras previously, install it now
    1. Install the tensorflow machine learning library by typing the following into Terminal:
    `pip3 install --upgrade tensorflow`
    1. Install the keras machine learning library by typing the following into Terminal:
    `pip3 install keras`
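Before opening the notebook, it can help to confirm that the libraries are visible to the Python interpreter you will be using. The following is a small sketch that relies only on the standard library; the helper name `missing_packages` is made up for this example.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Libraries named in the installation steps above.
required = ["tensorflow", "keras"]
missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If anything is reported as missing, re-run the matching `pip3 install` command with the same Python that Jupyter uses.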


## Documentation/Sources
* [https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/) for information on sequence classification with keras
* [https://keras.io/](https://keras.io/) Keras API documentation
* [Keras recurrent tutorial](https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent)

## The IMDB Dataset
The [IMDB dataset](https://keras.io/datasets/#imdb-movie-reviews-sentiment-classification) consists of movie reviews (x_train) that have been marked as positive or negative (y_train). See the [Word Vectors Tutorial](https://github.com/jennselby/MachineLearningTutorials/blob/master/WordVectors.ipynb) for more details on the IMDB dataset.

In [2]:
from keras.datasets import imdb
from keras.preprocessing import sequence
import numpy as np

Using TensorFlow backend.


In [3]:
(imdb_x_train, imdb_y_train), (imdb_x_test, imdb_y_test) = imdb.load_data()

In [4]:
imdb_y_test_reverse = np.subtract(1, imdb_y_test)

For a standard Keras model, every input has to be the same length, so we need to choose a cutoff length at which longer reviews are truncated. (We will also need to pad the shorter reviews with zeros to make them the same length.)
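To make the padding and truncation concrete, here is a NumPy-only sketch. It mimics the default behavior of `pad_sequences` as I understand it (zeros are added at the front, and long sequences keep their last `maxlen` items). The function name `pad_or_truncate` and the toy `reviews` are made up for illustration.

```python
import numpy as np

def pad_or_truncate(sequences, maxlen):
    """Left-pad each sequence with zeros, or keep only its last `maxlen` items."""
    out = np.zeros((len(sequences), maxlen), dtype="int32")
    for row, seq in enumerate(sequences):
        trimmed = list(seq)[-maxlen:]           # keep the end of long sequences
        if trimmed:
            out[row, -len(trimmed):] = trimmed  # zeros stay on the left
    return out

# Three toy "reviews" of different lengths, already encoded as word indices.
reviews = [[5, 8, 2], [7], [1, 2, 3, 4, 5, 6]]
print(pad_or_truncate(reviews, maxlen=4))
```

Every row of the result has length 4, which is what lets the reviews be stacked into a single array for the model.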

## Classification

In [5]:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

Define our model.

Unlike last time, when we used convolutional layers, we're going to use an LSTM, a special type of recurrent network.

Using recurrent networks means that rather than seeing these reviews as one input happening all at once, with the convolutional layers taking into account which words are next to each other, we are going to see them as a sequence of inputs, with one word occurring at each timestep.

In [6]:
imdb_lstm_model = Sequential()
imdb_lstm_model.add(Embedding(input_dim=len(imdb.get_word_index())+3, output_dim=100, input_length=cutoff))
# return_sequences=True tells the LSTM to output the full sequence, for use by the next LSTM layer. The final
# LSTM layer should return only its last output, for use in the Dense output layer
imdb_lstm_model.add(LSTM(units=32, return_sequences=True))
imdb_lstm_model.add(LSTM(units=32))
imdb_lstm_model.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
imdb_lstm_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

NameError: name 'cutoff' is not defined

Train the model. __This takes a while. You might not want to re-run it.__

In [None]:
imdb_lstm_model.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)

Assess the model. __This takes a while. You might not want to re-run it.__

In [None]:
imdb_lstm_scores = imdb_lstm_model.evaluate(imdb_x_test_padded, imdb_y_test)
print('loss: {} accuracy: {}'.format(*imdb_lstm_scores))

# Exercise 1

Experiment with model configurations that differ from the one above. Try other recurrent layers, use a different number of layers, or change some of the defaults. See [Keras Recurrent Layers](https://keras.io/layers/recurrent/).

In [6]:
cutoff = 500
out_dim = 80
units = 40
dropout_1 = 0.14
dropout_2 = 0.01
activation_1 = 'selu'
activation_2 = 'relu'

In [7]:
imdb_x_train_padded = sequence.pad_sequences(imdb_x_train, maxlen=cutoff)
imdb_x_test_padded = sequence.pad_sequences(imdb_x_test, maxlen=cutoff)

In [8]:

model = Sequential()
model.add(Embedding(input_dim=len(imdb.get_word_index())+3, output_dim=out_dim, input_length=cutoff))
# return_sequences=True tells the LSTM to output the full sequence, for use by the next LSTM layer. The final
# LSTM layer should return only its last output, for use in the Dense output layer
model.add(LSTM(units=40, activation=activation_1, return_sequences=True))#dropout=dropout_1)
model.add(LSTM(units=40, activation=activation_2)) #dropout=dropout_2))
model.add(Dense(units=1, activation='sigmoid')) # because at the end, we want one yes/no answer
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])

In [184]:
model.fit(imdb_x_train_padded, imdb_y_train, epochs=1, batch_size=64)

Epoch 1/1


<keras.callbacks.History at 0x123e310f0>

In [185]:
scores = model.evaluate(imdb_x_test_padded, imdb_y_test_reverse)
print('loss: {} accuracy: {}'.format(*scores))

loss: nan accuracy: 0.0


In [186]:
scores = model.evaluate(imdb_x_test_padded, imdb_y_test)
print('loss: {} accuracy: {}'.format(*scores))



KeyboardInterrupt: 

In [26]:
model.predict(imdb_x_train_padded)

array([[nan],
       [nan],
       [nan],
       ...,
       [nan],
       [nan],
       [nan]], dtype=float32)

In [23]:
imdb_y_test_reverse, imdb_y_test

(array([1, 0, 0, ..., 1, 1, 1]), array([0, 1, 1, ..., 0, 0, 0]))

## Exploring Simple Recurrent Layers

Before we dive into something as complicated as LSTMs, let's take a deeper look at simple recurrent layer weights.

In [9]:
import numpy
from keras.layers import SimpleRNN

The neurons in the recurrent layer pass their output to the next layer, but also back to themselves. The input shape says that we'll be passing in one-dimensional inputs of unspecified length (the None is what makes it unspecified).

In [10]:
one_unit_SRNN = Sequential()
one_unit_SRNN.add(SimpleRNN(units=1, input_shape=(None, 1), activation='linear', use_bias=False))

In [11]:
one_unit_SRNN_weights = one_unit_SRNN.get_weights()
one_unit_SRNN_weights

[array([[1.366263]], dtype=float32), array([[1.]], dtype=float32)]

In [12]:
one_unit_SRNN_weights[0][0][0] = 1
one_unit_SRNN_weights[1][0][0] = 1
one_unit_SRNN.set_weights(one_unit_SRNN_weights)
one_unit_SRNN.get_weights()

[array([[1.]], dtype=float32), array([[1.]], dtype=float32)]

This passes in a single sample that has three time steps.

In [13]:
one_unit_SRNN.predict(numpy.array([ [[3], [3], [7]] ]))

array([[13.]], dtype=float32)

# Exercise 2a
Figure out what the two weights in the one_unit_SRNN model control. Be sure to test your hypothesis thoroughly. Use different weights and different inputs.

In [14]:
one_unit_SRNN_weights[0][0][0] = 1
one_unit_SRNN_weights[1][0][0] = 0.1
one_unit_SRNN.set_weights(one_unit_SRNN_weights)
one_unit_SRNN.get_weights()

[array([[1.]], dtype=float32), array([[0.1]], dtype=float32)]

In [15]:
one_unit_SRNN.predict(numpy.array([ [[3], [3], [7]] ]))

array([[7.33]], dtype=float32)
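The two predictions above can be reproduced by hand. The sketch below assumes that a linear, bias-free recurrent unit computes `h_t = x_t * input_weight + h_prev * recurrent_weight`, starting from zero; the function name `simple_rnn_1unit` is made up for this example.

```python
def simple_rnn_1unit(inputs, input_weight, recurrent_weight):
    """Linear one-unit recurrence without bias, starting from h = 0."""
    h = 0.0
    for x in inputs:
        # The new output mixes the current input with the previous output.
        h = x * input_weight + h * recurrent_weight
    return h

print(simple_rnn_1unit([3, 3, 7], 1, 1))    # 13.0, matching the first prediction
print(simple_rnn_1unit([3, 3, 7], 1, 0.1))  # about 7.33, matching the second
```

With a recurrent weight of 1, the unit simply sums its inputs; with 0.1, each earlier input is scaled down by another factor of ten per timestep.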

Let's try a slightly larger simple recurrent model.

In [16]:
two_unit_SRNN = Sequential()
two_unit_SRNN.add(SimpleRNN(units=2, input_shape=(None, 1), activation='linear', use_bias=False))

In [17]:
two_unit_SRNN_weights = two_unit_SRNN.get_weights()
two_unit_SRNN_weights

[array([[-0.8075575,  1.1117176]], dtype=float32),
 array([[-0.7666786 ,  0.64203113],
        [ 0.64203113,  0.7666786 ]], dtype=float32)]

In [18]:
two_unit_SRNN_weights[0][0][0] = 1
two_unit_SRNN_weights[0][0][1] = 1
two_unit_SRNN_weights[1][0][0] = 0
two_unit_SRNN_weights[1][0][1] = 1
two_unit_SRNN_weights[1][1][0] = 0
two_unit_SRNN_weights[1][1][1] = 1
two_unit_SRNN.set_weights(two_unit_SRNN_weights)
two_unit_SRNN.get_weights()

[array([[1., 1.]], dtype=float32), array([[0., 1.],
        [0., 1.]], dtype=float32)]

This passes in a single sample with four time steps.

In [19]:
two_unit_SRNN.predict(numpy.array([ [[3], [3], [7], [5]] ]))

array([[ 5., 31.]], dtype=float32)

# Exercise 2b
What does each of the six weights of the two_unit_SRNN control? Again, test out your hypotheses carefully.

In [20]:
two_unit_SRNN_weights[0][0][0] = 1
two_unit_SRNN_weights[0][0][1] = 2
two_unit_SRNN_weights[1][0][0] = 0.1
two_unit_SRNN_weights[1][0][1] = 0
two_unit_SRNN_weights[1][1][0] = 0
two_unit_SRNN_weights[1][1][1] = 0.01
two_unit_SRNN.set_weights(two_unit_SRNN_weights)
two_unit_SRNN.get_weights()

[array([[1., 2.]], dtype=float32), array([[0.1 , 0.  ],
        [0.  , 0.01]], dtype=float32)]

In [21]:
two_unit_SRNN.predict(numpy.array([ [[3], [3], [7], [5]] ]))

array([[ 5.733   , 10.140606]], dtype=float32)

In [22]:
two_unit_SRNN_weights[0][0][0] = 1
two_unit_SRNN_weights[0][0][1] = 0
two_unit_SRNN_weights[1][0][0] = 0.1
two_unit_SRNN_weights[1][0][1] = 1
two_unit_SRNN_weights[1][1][0] = 0.00001
two_unit_SRNN_weights[1][1][1] = 0.0001
two_unit_SRNN.set_weights(two_unit_SRNN_weights)
two_unit_SRNN.get_weights()

[array([[1., 0.]], dtype=float32), array([[1.e-01, 1.e+00],
        [1.e-05, 1.e-04]], dtype=float32)]

In [23]:
two_unit_SRNN.predict(numpy.array([ [[3], [3], [7], [5]] ]))

array([[5.733036, 7.33036 ]], dtype=float32)

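The weights are organized as follows:

* `weights[0]`: input weights, multiplied by the current input
* `weights[1]`: recurrent weights, multiplied by the previous output
* `weights[1][0]`: the weights multiplied by the first element of the previous output
* `weights[1][1]`: the weights multiplied by the second element of the previous output
* Last index `0` (for example `weights[0][0][0]` or `weights[1][1][0]`): contributes to the first element of the new output
* Last index `1` (for example `weights[0][0][1]` or `weights[1][1][1]`): contributes to the second element of the new output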
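This layout can be checked against the three predictions above with a few lines of NumPy. The sketch assumes the layer computes `h_t = x_t @ kernel + h_prev @ recurrent_kernel` with a linear activation and no bias; the function name `simple_rnn` is made up for this example.

```python
import numpy as np

def simple_rnn(inputs, kernel, recurrent_kernel):
    """Linear, bias-free recurrence over a sequence of scalar inputs."""
    h = np.zeros(recurrent_kernel.shape[0])
    for x in inputs:
        # kernel has shape (1, units); recurrent_kernel has shape (units, units).
        h = np.atleast_1d(x) @ kernel + h @ recurrent_kernel
    return h

sequence_in = [3, 3, 7, 5]
print(simple_rnn(sequence_in, np.array([[1.0, 1.0]]),
                 np.array([[0.0, 1.0], [0.0, 1.0]])))      # compare with [5, 31]
print(simple_rnn(sequence_in, np.array([[1.0, 2.0]]),
                 np.array([[0.1, 0.0], [0.0, 0.01]])))     # compare with [5.733, 10.140606]
print(simple_rnn(sequence_in, np.array([[1.0, 0.0]]),
                 np.array([[0.1, 1.0], [1e-5, 1e-4]])))    # compare with [5.733036, 7.33036]
```

Row `r` of the recurrent matrix is scaled by element `r` of the previous output, and column `c` collects everything that ends up in element `c` of the new output.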



## Exploring LSTMs


In [24]:
one_unit_LSTM = Sequential()
one_unit_LSTM.add(LSTM(units=1, input_shape=(None, 1),
                       activation='linear', recurrent_activation='linear',
                       use_bias=False, unit_forget_bias=False,
                       kernel_initializer='zeros',
                       recurrent_initializer='zeros',
                       return_sequences=True))

In [25]:
one_unit_LSTM_weights = one_unit_LSTM.get_weights()
one_unit_LSTM_weights

[array([[0., 0., 0., 0.]], dtype=float32),
 array([[0., 0., 0., 0.]], dtype=float32)]

In [26]:
one_unit_LSTM_weights[0][0][0] = 1
one_unit_LSTM_weights[0][0][1] = 0
one_unit_LSTM_weights[0][0][2] = 1
one_unit_LSTM_weights[0][0][3] = 1
one_unit_LSTM_weights[1][0][0] = 0
one_unit_LSTM_weights[1][0][1] = 0
one_unit_LSTM_weights[1][0][2] = 0
one_unit_LSTM_weights[1][0][3] = 0
one_unit_LSTM.set_weights(one_unit_LSTM_weights)
one_unit_LSTM.get_weights()

[array([[1., 0., 1., 1.]], dtype=float32),
 array([[0., 0., 0., 0.]], dtype=float32)]

In [27]:
one_unit_LSTM.predict(numpy.array([ [[0], [1], [2], [4]] ]))

array([[[ 0.],
        [ 1.],
        [ 8.],
        [64.]]], dtype=float32)

# Exercise 2c
Conceptually, the [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) has several _gates_:

* __Forget gate__: these weights allow some long-term memories to be forgotten.
* __Input gate__: these weights decide what new information will be added to the context cell.
* __Output gate__: these weights decide what pieces of the new information and updated context will be passed on to the output.

It also has a __cell__ that can hold onto information from the current input (as well as things it has remembered from previous inputs), so that it can be used in later outputs.

Identify which weights in the one_unit_LSTM model are connected with the context cell and which are associated with the three gates.

_Note_: The output from the predict call is what the linked explanation calls $h_{t}$.

In [28]:
one_unit_LSTM_weights[0][0][0] = 1
one_unit_LSTM_weights[0][0][1] = 0.1
one_unit_LSTM_weights[0][0][2] = 1
one_unit_LSTM_weights[0][0][3] = 1
one_unit_LSTM_weights[1][0][0] = 0
one_unit_LSTM_weights[1][0][1] = 2
one_unit_LSTM_weights[1][0][2] = 0
one_unit_LSTM_weights[1][0][3] = 0
one_unit_LSTM.set_weights(one_unit_LSTM_weights)
one_unit_LSTM.get_weights()

[array([[1. , 0.1, 1. , 1. ]], dtype=float32),
 array([[0., 2., 0., 0.]], dtype=float32)]

In [29]:
one_unit_LSTM.predict(numpy.array([ [[1], [1], [1], [1]] ]))

array([[[  1.      ],
        [  3.1     ],
        [ 20.529999],
        [846.01465 ]]], dtype=float32)

## How the weights are organized

* `weights[0]`: input weights, multiplied by the current input
* `weights[0][0][0]`: input gate weight
* `weights[0][0][1]`: forget gate weight
* `weights[0][0][2]`: weight for the new information offered to the memory cell
* `weights[0][0][3]`: output gate weight
* With these linear activations, `weights[0][0][0]`, `weights[0][0][2]` and `weights[0][0][3]` have an identical impact on the output.
* `weights[1]`: recurrent weights, multiplied by the previous output
* `weights[1][0][k]`: feeds the same gate (or the cell input) as `weights[0][0][k]`, but is multiplied by the previous output instead of the current input
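Both LSTM predictions above can be reproduced with a short loop. This is a sketch under two assumptions: that the four columns are stored in the order input gate, forget gate, cell candidate, output gate (which is consistent with the notes above), and that with linear activations and no bias each of them is just `x * kernel + h_prev * recurrent_kernel`. The function name `linear_lstm_1unit` is made up for this example.

```python
import numpy as np

def linear_lstm_1unit(inputs, kernel, recurrent_kernel):
    """One-unit LSTM with all activations linear and no bias.

    kernel and recurrent_kernel each hold four numbers, assumed to be in the
    order: input gate, forget gate, cell candidate, output gate.
    """
    h, c = 0.0, 0.0
    outputs = []
    for x in inputs:
        i, f, g, o = x * np.asarray(kernel) + h * np.asarray(recurrent_kernel)
        c = f * c + i * g   # forget part of the old cell, add gated new information
        h = o * c           # the output gate scales what leaves the cell
        outputs.append(h)
    return outputs

# Forget gate at zero: each output is just x cubed (compare with 0, 1, 8, 64).
print(linear_lstm_1unit([0, 1, 2, 4], [1, 0, 1, 1], [0, 0, 0, 0]))
# Forget gate driven by the previous output (compare with 1, 3.1, 20.53, 846.01).
print(linear_lstm_1unit([1, 1, 1, 1], [1, 0.1, 1, 1], [0, 2, 0, 0]))
```

In the second run the forget gate grows with the previous output, so the cell keeps multiplying its own contents, which is why the values increase so quickly.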



# Exercise 3

Take the model from exercise 1 (imdb_lstm_model) and modify it to classify the [Reuters data](https://keras.io/datasets/#reuters-newswire-topics-classification).

Think about what you are trying to predict in this case, and how you will have to change your model to deal with this.

In [30]:
from keras.datasets import reuters
from keras.utils import np_utils

In [31]:
(reuters_x_train, reuters_y_train), (reuters_x_test, reuters_y_test) = reuters.load_data()

In [32]:
reuters_map = reuters.get_word_index()

In [33]:
# reuters_x_train[0:5], reuters_y_train[0:5]

In [34]:
lengths = [len(review) for review in list(reuters_x_train) + list(reuters_x_test)]
print('Longest review: {} Shortest review: {}'.format(max(lengths), min(lengths)))

cutoff = 220
print('{} reviews out of {} are over {}.'.format(
    sum([1 for length in lengths if length > cutoff]), 
    len(lengths), 
    cutoff))

Longest review: 2376 Shortest review: 2
2032 reviews out of 11228 are over 220.


In [73]:
reuters_x_train_padded = sequence.pad_sequences(reuters_x_train, maxlen=cutoff)
reuters_x_test_padded = sequence.pad_sequences(reuters_x_test, maxlen=cutoff)
reuters_y_train_one_hot = np_utils.to_categorical(reuters_y_train)
reuters_y_test_one_hot = np_utils.to_categorical(reuters_y_test)

In [36]:
reuters_y_test_one_hot

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
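The one-hot arrays above have one column per topic, with a single 1 marking the correct class. As an illustration of the idea (not the Keras implementation), the same encoding can be sketched in NumPy; the function name `one_hot` and the toy labels are made up for this example.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Turn integer class labels into rows with a single 1 in the label's column."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

topics = [3, 0, 1]                      # toy labels for a 4-class problem
encoded = one_hot(topics, num_classes=4)
print(encoded)
print(encoded.argmax(axis=1))           # argmax recovers the original labels
```

This is why the output layer needs one unit per class: the model's prediction is compared column by column with the one-hot target.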

In [67]:

model = Sequential()
model.add(Embedding(input_dim=len(reuters_map)+3, output_dim=100, input_length=cutoff))
# return_sequences=True tells the LSTM to output the full sequence, for use by the next LSTM layer. The final
# LSTM layer should return only its last output, for use in the Dense output layer
model.add(LSTM(units=32, activation='relu', return_sequences=True))#dropout=dropout_1)
model.add(LSTM(units=32, activation='relu')) #dropout=dropout_2))
model.add(Dense(units=46, activation='sigmoid')) # one output per Reuters topic (46 classes)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])

In [74]:
model.fit(reuters_x_train_padded, reuters_y_train_one_hot, epochs=1, batch_size=64)

Epoch 1/1


<keras.callbacks.History at 0x127e5b438>

In [37]:
scores = model.evaluate(reuters_x_test_padded, reuters_y_test_one_hot)
print('loss: {} accuracy: {}'.format(*scores))

loss: 3.132006221866353 accuracy: 0.2506678539891383
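One thing worth experimenting with here is the activation of the final layer. For single-label problems with many classes, `softmax` is the more common companion to `categorical_crossentropy`, because its outputs form a probability distribution over the classes, while independent sigmoids do not. The sketch below only illustrates that difference on made-up scores; it makes no claim about how much the accuracy above would change.

```python
import numpy as np

def softmax(logits):
    """Exponentiate and normalize so the outputs sum to 1."""
    shifted = logits - np.max(logits)   # subtracting the max avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid(logits):
    """Squash each score into (0, 1) independently of the others."""
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([2.0, 1.0, -1.0])     # made-up scores for three classes
print(softmax(logits), softmax(logits).sum())   # sums to 1 (up to rounding)
print(sigmoid(logits), sigmoid(logits).sum())   # no such constraint
```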


In [68]:
def _gridsearch(lstm_units=[32], activations=['sigmoid'], cutoffs=[220], output_dims=[100]):
    table = []
    for cutoff in cutoffs:
        reuters_x_train_padded = sequence.pad_sequences(reuters_x_train, maxlen=cutoff)
        reuters_x_test_padded = sequence.pad_sequences(reuters_x_test, maxlen=cutoff)
        for lstm1_units in lstm_units:
            for lstm2_units in lstm_units:
                for lstm1_activation in activations:
                    for lstm2_activation in activations:
                        model = Sequential()
                        model.add(Embedding(input_dim=len(reuters_map)+3, output_dim=100, input_length=cutoff))
                        model.add(LSTM(units=lstm1_units, activation=lstm1_activation, return_sequences=True))#dropout=dropout_1)
                        model.add(LSTM(units=lstm2_units, activation=lstm2_activation)) #dropout=dropout_2))
                        model.add(Dense(units=46, activation='sigmoid')) # one output per Reuters topic (46 classes)
                        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])
                        model.fit(reuters_x_train_padded, reuters_y_train_one_hot, epochs=1, batch_size=64)
                        scores = model.evaluate(reuters_x_test_padded, reuters_y_test_one_hot)
                        loss, accuracy = scores[0], scores[1]
                        table.append([accuracy, loss, lstm2_activation, lstm1_activation, lstm2_units, lstm1_units, cutoff])
                        print("ACCURACY: {}, LOSS: {}".format(accuracy, loss))
                        print("lstm2_activation: {}, lstm1_activation: {}, lstm2_units: {}, lstm1_units: {}, cutoff: {}".format(lstm2_activation, lstm1_activation, lstm2_units, lstm1_units, cutoff))
    return table

In [39]:
baseline = _gridsearch()

Epoch 1/1
ACCURACY: 0.36197684778237277, LOSS: 2.524043882944068
lstm2_activation: sigmoid, lstm1_activation: sigmoid, lstm2_units: 32, lstm1_units: 32, cutoff: 220


In [107]:
def make_model(hyperparameters):
    model = Sequential()
    model.add(Embedding(input_dim=hyperparameters['input_dim'], output_dim=hyperparameters['embedding_out_dim'], input_length=hyperparameters['cutoff']))
    model.add(LSTM(units=hyperparameters['lstm1_units'], activation=hyperparameters['lstm1_activation'], return_sequences=True))
    model.add(LSTM(units=hyperparameters['lstm2_units'], activation=hyperparameters['lstm2_activation']))
    model.add(Dense(units=hyperparameters['output_dim'], activation=hyperparameters['final_activation'])) # one output unit per class
    model.compile(loss=hyperparameters['loss'], optimizer=hyperparameters['optimizer'], metrics=list(hyperparameters['metrics']))
    return model

def train_model(model, x_train, y_train, hyperparameters):
    model.fit(x_train, y_train, epochs=hyperparameters['epochs'], batch_size=hyperparameters['batch_size'])

def test_model(model, x_test, y_test):
    scores = model.evaluate(x_test, y_test)
    loss, accuracy = scores[0], scores[1]
    return loss, accuracy

In [108]:
possible_units = [32, 64]
possible_activations = ['sigmoid', 'tanh', 'relu']
# Each value is a list of candidates; gridsearch() tries every combination.
# Earlier drafts of this grid also varied embedding_out_dim (80, 130), cutoff (180, 280),
# lstm2_units, lstm2_activation, final_activation and batch_size (32, 128).
hyperparameters_possibilities = {
    'input_dim': [len(reuters_map)+3], # TODO: find actual value
    'output_dim': [46],
    'embedding_out_dim': [100],
    'cutoff': [220],
    'lstm1_units': possible_units,
    'lstm2_units': [32],
    'lstm1_activation': possible_activations,
    'lstm2_activation': ['sigmoid'],
    'final_activation': ['sigmoid'],
    'optimizer': ['adam'],
    'loss': ['categorical_crossentropy'], #probably shouldn't optimize with this
    'metrics': [tuple(['categorical_accuracy'])], # a tuple, because gridsearch() uses each candidate as a dict key
    'batch_size': [32],
    'epochs': [1]
}
hyperparameters_possibilities

{'batch_size': [32],
 'cutoff': [220],
 'embedding_out_dim': [100],
 'epochs': [1],
 'final_activation': ['sigmoid'],
 'input_dim': [30982],
 'loss': ['categorical_crossentropy'],
 'lstm1_activation': ['sigmoid', 'tanh', 'relu'],
 'lstm1_units': [32, 64],
 'lstm2_activation': ['sigmoid'],
 'lstm2_units': [32],
 'metrics': [('categorical_accuracy',)],
 'optimizer': ['adam'],
 'output_dim': [46]}

In [111]:
import itertools
def get_possible_dicts(dicts):
    return [dict(zip(dicts, x)) for x in itertools.product(*dicts.values())]

def gridsearch(hyperparameters_possibilities, x_train, y_train, x_test, y_test):
#     results = {}
    results = {key: {possibility: 0 for possibility in hyperparameters_possibilities[key]} for key in hyperparameters_possibilities}
    listed_possibilities = get_possible_dicts(hyperparameters_possibilities)
    n = len(listed_possibilities)
    for params in listed_possibilities:
        if 'cutoff' in params:
            x_train_padded = sequence.pad_sequences(x_train, maxlen=params['cutoff'])
            x_test_padded = sequence.pad_sequences(x_test, maxlen=params['cutoff'])
        else:
            x_train_padded = x_train
            x_test_padded = x_test
#         print(x_train_padded.shape, params['cutoff'])
        print(list(params['metrics']))
        model = make_model(params)
        train_model(model, x_train_padded, y_train, params)
        loss, accuracy = test_model(model, x_test_padded, y_test)
        for p in params:
            results[p][params[p]] += loss
#         results[params] = [loss, accuracy]
    for key in results:
        for possibility in results[key]:
            results[key][possibility] = results[key][possibility]/n*len(results[key])
    return results

# def sort_results(results):
#     for key in results.keys()[0]:
#         for 
    

In [None]:
tests = gridsearch(hyperparameters_possibilities, reuters_x_train, reuters_y_train_one_hot, reuters_x_test, reuters_y_test_one_hot)

['categorical_accuracy']
Epoch 1/1
['categorical_accuracy']
Epoch 1/1
['categorical_accuracy']
Epoch 1/1
['categorical_accuracy']
Epoch 1/1
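To see what the grid search is doing without waiting for models to train, here is a toy sketch of the same two steps: expanding a grid into every combination with `itertools.product`, then averaging a score for each candidate value. The names `expand_grid`, `mean_score_per_value`, `toy_grid`, and `toy_loss` are made up for this example, and `toy_loss` is a stand-in for "train a model and return its test loss".

```python
import itertools

def expand_grid(grid):
    """List one dict per combination of candidate values."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*grid.values())]

def mean_score_per_value(grid, score_fn):
    """Average the score over all combinations that contain each candidate value."""
    totals = {k: {v: 0.0 for v in vals} for k, vals in grid.items()}
    counts = {k: {v: 0 for v in vals} for k, vals in grid.items()}
    for params in expand_grid(grid):
        score = score_fn(params)
        for k, v in params.items():
            totals[k][v] += score
            counts[k][v] += 1
    return {k: {v: totals[k][v] / counts[k][v] for v in totals[k]} for k in totals}

toy_grid = {"units": [32, 64], "activation": ["sigmoid", "tanh", "relu"]}

def toy_loss(params):
    # Hypothetical score, chosen only so the averages are easy to check by hand.
    return params["units"] / 32 + len(params["activation"])

print(len(expand_grid(toy_grid)))                 # 2 * 3 = 6 combinations
print(mean_score_per_value(toy_grid, toy_loss))
```

Averaging per value, rather than ranking whole combinations, gives a rough sense of which setting of each hyperparameter tends to help, at the cost of hiding interactions between them.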

In [41]:
def sortby(reward, params):
    rewards = {}
    for i in range(len(params)):
        if params[i] not in rewards:
            rewards[params[i]] = [reward[i]]
        else:
            rewards[params[i]].append(reward[i])
    for param in rewards:
        rewards[param] = sum(rewards[param])/len(rewards[param])
    sort