# Evolve a neural network with a genetic algorithm


Building the perfect deep learning network involves a hefty amount of art to accompany sound science. One way to go about finding the right hyperparameters is through brute force trial and error: Try every combination of sensible parameters, send them to your Spark cluster, go about your daily jive, and come back when you have an answer.

But there’s gotta be a better way!

Here, we try to improve upon the brute force method by applying a genetic algorithm to evolve a network, with the goal of finding optimal hyperparameters in a fraction of the time of a brute force search.

# How much faster?
Let’s say it takes five minutes to train and evaluate a network on your dataset. And let’s say we have four parameters with five possible settings each. To try them all would take (5**4) * 5 minutes, or 3,125 minutes, or about 52 hours.

Now let’s say we use a genetic algorithm to evolve 10 generations with a population of 20 (more on what this means below), with a plan to keep the top 25% plus a few more, so ~8 per generation. This means that in our first generation we score 20 networks (20 * 5 = 100 minutes). Every generation after that only requires around 12 runs, since we don’t have to score the ones we keep. That’s 100 + (9 generations * 5 minutes * 12 networks) = 640 minutes, or about 11 hours.

We’ve just reduced our parameter tuning time by almost 80%! That is, assuming it finds the best parameters…
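
The arithmetic above is easy to check in a few lines. This is only a back-of-the-envelope sketch using the numbers from the text (five minutes per network, four parameters with five settings each, roughly 12 new networks per later generation); the variable names are made up for illustration.

```python
# Back-of-the-envelope cost comparison using the numbers from the text.
minutes_per_network = 5

# Brute force: every combination of 4 parameters with 5 settings each.
brute_force_runs = 5 ** 4
brute_force_minutes = brute_force_runs * minutes_per_network

# Genetic algorithm: 20 networks in the first generation, then about
# 12 new networks in each of the remaining 9 generations.
ga_runs = 20 + 9 * 12
ga_minutes = ga_runs * minutes_per_network

savings = 1 - ga_minutes / brute_force_minutes

print(f"Brute force: {brute_force_runs} runs, {brute_force_minutes / 60:.1f} hours")
print(f"Genetic:     {ga_runs} runs, {ga_minutes / 60:.1f} hours")
print(f"Time saved:  {savings:.1%}")
```

The saving only materializes if the genetic search actually lands on a good configuration, which the estimate cannot tell you.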

# How do genetic algorithms work?
At its core, a genetic algorithm…

- Creates a population of (randomly generated) members
- Scores each member of the population based on some goal. The function that produces this score is called a fitness function.
- Selects and breeds the best members of the population to produce more like them
- Mutates some members randomly to attempt to find even better candidates
- Kills off the rest (survival of the fittest and all), and
- Repeats from step 2. Each iteration through these steps is called a generation.
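
The steps above can be sketched on a toy problem before bringing neural networks into it. The example below is a minimal illustration, not the code used later in this notebook: each member is a list of bits, the fitness function counts the 1s, and the `retain` and `mutate_chance` values are arbitrary assumptions.

```python
import random

def fitness(member):
    """Toy fitness function: the number of 1s in the bit list."""
    return sum(member)

def evolve(population, retain=0.4, mutate_chance=0.2):
    """Run one generation: select, breed, mutate, and drop the rest."""
    # Rank members from fittest to least fit and keep the top fraction.
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:max(2, int(len(ranked) * retain))]

    # Breed children until the population is back to its original size.
    children = []
    while len(parents) + len(children) < len(population):
        mother, father = random.sample(parents, 2)
        # Each gene comes from one of the two parents at random.
        child = [random.choice(pair) for pair in zip(mother, father)]
        # Occasionally flip one random bit.
        if random.random() < mutate_chance:
            i = random.randrange(len(child))
            child[i] = 1 - child[i]
        children.append(child)

    return parents + children

random.seed(0)
population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
first_best = max(fitness(m) for m in population)

for generation in range(25):
    population = evolve(population)

final_best = max(fitness(m) for m in population)
print("best fitness before:", first_best, "after:", final_best)
```

Because the fittest members are always carried over unchanged, the best score in the population can never get worse from one generation to the next.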

# Applying genetic algorithms to Neural Networks
We’ll attempt to evolve a recurrent network (the code below uses bidirectional LSTM layers). Our goal is to find the best parameters for the Toxic Comment Classification task.

We’ll tune four parameters:

- Number of layers (or the network depth)
- Neurons per layer (or the network width)
- Dense layer activation function
- Network optimizer

Let's start by importing the libraries we need.

In [None]:
import random
import logging
from functools import reduce
from operator import add

import numpy as np
import pandas as pd
from tqdm import tqdm
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Input, LSTM, Embedding, Dropout, Activation
from keras.layers import Bidirectional, GlobalMaxPool1D
from keras.models import Model
from keras import initializers, regularizers, constraints, optimizers, layers
from keras.callbacks import EarlyStopping
tqdm.monitor_interval = 0

The Network class below stores a model's parameter configuration, builds the model, trains it, and predicts on the test set. It also represents the individual network being evolved.


In [None]:
class Network():
    """Represent a network and let us operate on it.
    """

    def __init__(self, nn_param_choices=None):
        """Initialize our network.
        Args:
            nn_param_choices (dict): Parameters for the network, includes:
                nb_neurons (list): [64, 128, 256]
                nb_layers (list): [1, 2, 3, 4]
                activation (list): ['relu', 'elu']
                optimizer (list): ['rmsprop', 'adam']
        """
        self.accuracy = 0.
        self.nn_param_choices = nn_param_choices
        self.network = {}  # (dict): the parameters chosen for this network
        self.predictions = []

    def create_random(self):
        """Create a random network."""
        for key in self.nn_param_choices:
            self.network[key] = random.choice(self.nn_param_choices[key])

    def create_set(self, network):
        """Set network properties.
        Args:
            network (dict): The network parameters
        """
        self.network = network

    def compile_model(self, network, nb_classes, embedding_matrix):
        """Build and compile the model.
        Args:
            network (dict): the parameters of the network
            nb_classes (int): number of output labels
            embedding_matrix (np.ndarray): pretrained word vectors
        Returns:
            a compiled network.
        """
        # Get our network parameters.
        nb_layers = network['nb_layers']
        nb_neurons = network['nb_neurons']
        activation = network['activation']
        optimizer = network['optimizer']
        embed_size = 50 # how big is each word vector
        max_features = 20000 # how many unique words to use (i.e num rows in embedding vector)
        maxlen = 100 # max number of words in a comment to use

        inp = Input(shape=(maxlen,))
        x = Embedding(max_features, embed_size, weights=[embedding_matrix])(inp)
        # Add each layer: a bidirectional LSTM followed by a dense layer.
        for i in range(nb_layers):
            # The LSTM size is fixed at 50 units; nb_neurons sets the dense width.
            x = Bidirectional(LSTM(50, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))(x)
            x = Dense(nb_neurons, activation=activation)(x)
            x = Dropout(0.2)(x)  # hard-coded dropout
        x = GlobalMaxPool1D()(x)
        x = Dense(nb_classes, activation="sigmoid")(x)
        model = Model(inputs=inp, outputs=x)

        model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        return model

    def train_and_score(self, network, train, label, embedding_matrix, test):
        """Train the model, return validation accuracy.
        Args:
            network (dict): the parameters of the network
            train (np.ndarray): padded training sequences
            label (np.ndarray): training labels
            embedding_matrix (np.ndarray): pretrained word vectors
            test (np.ndarray): padded test sequences to predict on
        """
        batch_size = 32
        validation_split = 0.1

        # Hold out the last 10% of the training data for scoring. This is the
        # same slice Keras would take with validation_split=0.1.
        n_val = int(len(train) * validation_split)
        x_fit, y_fit = train[:-n_val], label[:-n_val]
        x_val, y_val = train[-n_val:], label[-n_val:]

        model = self.compile_model(network, 6, embedding_matrix)
        early_stopper = EarlyStopping(patience=5)
        model.fit(x_fit, y_fit,
                  batch_size=batch_size,
                  epochs=10000,  # using early stopping, so no real limit
                  verbose=0,
                  validation_data=(x_val, y_val),
                  callbacks=[early_stopper])
        score = model.evaluate(x_val, y_val, verbose=0)

        # Keep the test-set predictions so the best networks can be ensembled.
        self.predictions = model.predict([test], batch_size=1024, verbose=1)
        return score[1]  # 1 is accuracy. 0 is loss.

    def train(self, train, label, embedding_matrix, test):
        """Train the network and record the accuracy.
        Args:
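            train (np.ndarray): padded training sequences
            label (np.ndarray): training labels
            embedding_matrix (np.ndarray): pretrained word vectors
            test (np.ndarray): padded test sequences to predict on
        """
        # Only train networks that have not been scored yet.
        if self.accuracy == 0.: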
            self.accuracy = self.train_and_score(self.network, train, label, embedding_matrix, test)

    def print_network(self):
        """Print out a network."""
        logging.info(self.network)
        logging.info("Network accuracy: %.2f%%" % (self.accuracy * 100))
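
Training above has no fixed epoch count; it relies on `EarlyStopping(patience=5)` to end each run. As a rough illustration of what patience means, here is a simplified sketch in plain Python. The function name and the loss values are made up, and the real Keras callback has more options than this.

```python
def epochs_until_stop(val_losses, patience=5):
    """Return the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` epochs in a row.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses: the best value is reached at epoch 3.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.30, 0.29]
stopped_at = epochs_until_stop(losses, patience=5)
print(stopped_at)  # 8
```

Note that the run stops at epoch 8 even though later epochs would have improved. That is the trade-off a patience setting makes to keep each evaluation short.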

This is the Optimizer class that holds a genetic algorithm for evolving a network.

In [None]:
class Optimizer():
    """Class that implements genetic algorithm for MLP optimization."""

    def __init__(self, nn_param_choices, retain=0.4,
                 random_select=0.1, mutate_chance=0.2):
        """Create an optimizer.
        Args:
            nn_param_choices (dict): Possible network parameters
            retain (float): Percentage of population to retain after
                each generation
            random_select (float): Probability of a rejected network
                remaining in the population
            mutate_chance (float): Probability a network will be
                randomly mutated
        """
        self.mutate_chance = mutate_chance
        self.random_select = random_select
        self.retain = retain
        self.nn_param_choices = nn_param_choices

    def create_population(self, count):
        """Create a population of random networks.
        Args:
            count (int): Number of networks to generate, aka the
                size of the population
        Returns:
            (list): Population of network objects
        """
        pop = []
        for _ in range(0, count):
            # Create a random network.
            network = Network(self.nn_param_choices)
            network.create_random()

            # Add the network to our population.
            pop.append(network)

        return pop

    @staticmethod
    def fitness(network):
        """Return the accuracy, which is our fitness function."""
        return network.accuracy

    def grade(self, pop):
        """Find average fitness for a population.
        Args:
            pop (list): The population of networks
        Returns:
            (float): The average accuracy of the population
        """
        summed = reduce(add, (self.fitness(network) for network in pop))
        return summed / float((len(pop)))

    def breed(self, mother, father):
        """Make two children as parts of their parents.
        Args:
            mother (Network): First parent network
            father (Network): Second parent network
        Returns:
            (list): Two network objects
        """
        children = []
        for _ in range(2):

            child = {}

            # Loop through the parameters and pick params for the kid.
            for param in self.nn_param_choices:
                child[param] = random.choice(
                    [mother.network[param], father.network[param]]
                )

            # Now create a network object.
            network = Network(self.nn_param_choices)
            network.create_set(child)

            # Randomly mutate some of the children.
            if self.mutate_chance > random.random():
                network = self.mutate(network)

            children.append(network)

        return children

    def mutate(self, network):
        """Randomly mutate one part of the network.
        Args:
            network (Network): The network to mutate
        Returns:
            (Network): A randomly mutated network object
        """
        # Choose a random key.
        mutation = random.choice(list(self.nn_param_choices.keys()))

        # Mutate one of the params.
        network.network[mutation] = random.choice(self.nn_param_choices[mutation])

        return network

    def evolve(self, pop):
        """Evolve a population of networks.
        Args:
            pop (list): A list of Network objects
        Returns:
            (list): The evolved population of networks
        """
        # Get scores for each network.
        graded = [(self.fitness(network), network) for network in pop]

        # Sort on the scores.
        graded = [x[1] for x in sorted(graded, key=lambda x: x[0], reverse=True)]

        # Get the number we want to keep for the next gen.
        retain_length = int(len(graded)*self.retain)

        # The parents are every network we want to keep.
        parents = graded[:retain_length]

        # For those we aren't keeping, randomly keep some anyway.
        for individual in graded[retain_length:]:
            if self.random_select > random.random():
                parents.append(individual)

        # Now find out how many spots we have left to fill.
        parents_length = len(parents)
        desired_length = len(pop) - parents_length
        children = []

        # Add children, which are bred from two remaining networks.
        while len(children) < desired_length:

            # Get a random mom and dad.
            male = random.randint(0, parents_length-1)
            female = random.randint(0, parents_length-1)

            # Assuming they aren't the same network...
            if male != female:
                male = parents[male]
                female = parents[female]

                # Breed them.
                babies = self.breed(male, female)

                # Add the children one at a time.
                for baby in babies:
                    # Don't grow larger than desired length.
                    if len(children) < desired_length:
                        children.append(baby)

        parents.extend(children)

        return parents
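
To see breeding and mutation in isolation, here is a small sketch that works on plain parameter dicts instead of `Network` objects. The `crossover` and `mutate` helpers and the parent values are illustrative assumptions that mirror `Optimizer.breed` and `Optimizer.mutate`; they are not a drop-in replacement.

```python
import random

# Hypothetical search space, mirroring the shape of nn_param_choices.
param_choices = {
    'nb_neurons': [64, 128, 256],
    'nb_layers': [1, 2, 3, 4],
    'activation': ['relu', 'elu'],
    'optimizer': ['rmsprop', 'adam'],
}

def crossover(mother, father):
    """Build a child by picking each parameter from one parent at random."""
    return {key: random.choice([mother[key], father[key]]) for key in mother}

def mutate(params, choices):
    """Return a copy of params with one randomly chosen parameter re-drawn."""
    mutated = dict(params)
    key = random.choice(list(choices))
    mutated[key] = random.choice(choices[key])
    return mutated

mother = {'nb_neurons': 64, 'nb_layers': 1, 'activation': 'relu', 'optimizer': 'adam'}
father = {'nb_neurons': 256, 'nb_layers': 4, 'activation': 'elu', 'optimizer': 'rmsprop'}

child = crossover(mother, father)
mutant = mutate(child, param_choices)
print("child: ", child)
print("mutant:", mutant)
```

A mutation can re-draw the value the child already had, so a "mutated" network is sometimes identical to the original. Unlike `Optimizer.mutate`, which changes the network in place, this sketch returns a copy.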

This sets up logging and defines the functions that train the networks with the genetic algorithm.

In [None]:
logging.basicConfig(
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%m/%d/%Y %I:%M:%S %p',
    level=logging.DEBUG,
    filename='log.txt'
)

def train_networks(networks, train, label, test):
    """Train each network.
    Args:
        networks (list): Current population of networks
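        train (np.ndarray): padded training sequences
        label (np.ndarray): training labels
        test (np.ndarray): padded test sequences to predict on
    """
    pbar = tqdm(total=len(networks))
    for network in networks:
        # embedding_matrix is read from the notebook's global scope.
        network.train(train, label, embedding_matrix, test)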
        pbar.update(1)
    pbar.close()

def get_average_accuracy(networks):
    """Get the average accuracy for a group of networks.
    Args:
        networks (list): List of networks
    Returns:
        float: The average accuracy of a population of networks.
    """
    total_accuracy = 0
    for network in networks:
        total_accuracy += network.accuracy

    return total_accuracy / len(networks)

def generate(generations, population, nn_param_choices, train, label, embedding_matrix, test):
    """Generate a network with the genetic algorithm.
    Args:
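        generations (int): Number of times to evolve the population
        population (int): Number of networks in each generation
        nn_param_choices (dict): Parameter choices for networks
        train (np.ndarray): padded training sequences
        label (np.ndarray): training labels
        embedding_matrix (np.ndarray): pretrained word vectors
        test (np.ndarray): padded test sequences to predict on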
    """
    optimizer = Optimizer(nn_param_choices)
    networks = optimizer.create_population(population)

    # Evolve the generation.
    for i in range(generations):
        logging.info("***Doing generation %d of %d***" %
                     (i + 1, generations))

        # Train and get accuracy for networks.
        train_networks(networks, train, label, test)

        # Get the average accuracy for this generation.
        average_accuracy = get_average_accuracy(networks)

        # Print out the average accuracy each generation.
        logging.info("Generation average: %.2f%%" % (average_accuracy * 100))
        logging.info('-'*80)

        # Evolve, except on the last iteration.
        if i != generations - 1:
            # Do the evolution.
            networks = optimizer.evolve(networks)

    # Sort our final population.
    networks = sorted(networks, key=lambda x: x.accuracy, reverse=True)

    # Print out the top 5 networks.
    print_networks(networks[:5])

    # Ensemble the top 5 networks: take the geometric mean of their test
    # predictions, then raise it to a fixed power, which pushes the
    # probabilities toward 0.
    PROBABILITIES_NORMALIZE_COEFFICIENT = 1.4
    top_predictions = []
    for network in networks[:5]:
        top_predictions.append(network.predictions)

    test_predicts = np.ones(top_predictions[0].shape)
    for pred in top_predictions:
        test_predicts *= pred

    test_predicts **= (1. / len(top_predictions))
    test_predicts **= PROBABILITIES_NORMALIZE_COEFFICIENT

    path = '../input/'
    comp = 'jigsaw-toxic-comment-classification-challenge/'
    sample_submission = pd.read_csv(f'{path}{comp}sample_submission.csv')
    list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
    sample_submission[list_classes] = test_predicts
    sample_submission.to_csv('genetic_algorithm_predict.csv', index=False)

def print_networks(networks):
    """Print a list of networks.
    Args:
        networks (list): The population of networks
    """
    logging.info('-'*80)
    for network in networks:
        network.print_network()

We include the GloVe word vectors in our input files. To include these in your kernel, simply click 'input files' at the top of the notebook, and search 'glove' in the 'datasets' section.

In [None]:
path = '../input/'
comp = 'jigsaw-toxic-comment-classification-challenge/'
EMBEDDING_FILE=f'{path}glove6b50d/glove.6B.50d.txt'
TRAIN_DATA_FILE=f'{path}{comp}train.csv'
TEST_DATA_FILE=f'{path}{comp}test.csv'

Set some basic config parameters:

In [None]:
embed_size = 50 # how big is each word vector
max_features = 20000 # how many unique words to use (i.e num rows in embedding vector)
maxlen = 100 # max number of words in a comment to use

Read in our data and replace missing values:

In [None]:
train = pd.read_csv(TRAIN_DATA_FILE)
test = pd.read_csv(TEST_DATA_FILE)

list_sentences_train = train["comment_text"].fillna("_na_").values
list_classes = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
y = train[list_classes].values
list_sentences_test = test["comment_text"].fillna("_na_").values

Standard Keras preprocessing, to turn each comment into a list of word indexes of equal length (with truncation or padding as needed).

In [None]:
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(list(list_sentences_train))
list_tokenized_train = tokenizer.texts_to_sequences(list_sentences_train)
list_tokenized_test = tokenizer.texts_to_sequences(list_sentences_test)
X_t = pad_sequences(list_tokenized_train, maxlen=maxlen)
X_te = pad_sequences(list_tokenized_test, maxlen=maxlen)
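
To make "truncation or padding as needed" concrete, here is a plain-Python sketch of what happens to a single sequence. It assumes the `pad_sequences` defaults, which pad and truncate at the front; the helper name is made up for illustration, and `maxlen` is assumed to be positive.

```python
def pad_or_truncate(sequence, maxlen, value=0):
    """Mimic the default pad_sequences behaviour for a single sequence.

    Sequences longer than maxlen lose their earliest items; shorter ones
    are left-padded with `value`.
    """
    kept = sequence[-maxlen:]
    return [value] * (maxlen - len(kept)) + kept

short = pad_or_truncate([7, 3, 12], maxlen=5)
long = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)  # [0, 0, 7, 3, 12]
print(long)   # [3, 4, 5, 6, 7]
```

With `maxlen = 100`, every comment therefore ends up as exactly 100 word indexes, keeping the last 100 words of longer comments.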

Read the GloVe word vectors (space delimited strings) into a dictionary from word->vector.

In [None]:
def get_coefs(word, *arr):
    return word, np.asarray(arr, dtype='float32')

# Use a context manager so the file is closed after reading.
with open(EMBEDDING_FILE, encoding='utf8') as f:
    embeddings_index = dict(get_coefs(*o.strip().split()) for o in f)
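
Each line of the GloVe file is a word followed by its vector components, separated by spaces. The sketch below parses two made-up lines in that layout, shortened to 3 dimensions for readability. The values and the `parse_glove_line` helper are illustrative only, and it returns plain lists where the notebook uses NumPy arrays.

```python
def parse_glove_line(line):
    """Split one GloVe text line into (word, list of floats)."""
    word, *values = line.strip().split()
    return word, [float(v) for v in values]

# Two made-up lines in the same layout as glove.6B.50d.txt,
# shortened to 3 dimensions for readability.
sample_lines = [
    "the 0.418 0.24968 -0.41242\n",
    "cat 0.45281 -0.50108 -0.53714\n",
]

embeddings = dict(parse_glove_line(line) for line in sample_lines)
print(embeddings["cat"])  # [0.45281, -0.50108, -0.53714]
```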

Use these vectors to create our embedding matrix, with random initialization for words that aren't in GloVe. We'll draw the random values from a normal distribution with the same mean and standard deviation as the GloVe embeddings.

In [None]:
all_embs = np.stack(list(embeddings_index.values()))
emb_mean,emb_std = all_embs.mean(), all_embs.std()
emb_mean,emb_std

In [None]:
word_index = tokenizer.word_index
nb_words = min(max_features, len(word_index))
embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))
for word, i in word_index.items():
    if i >= nb_words: continue  # skip indexes that fall outside the matrix
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None: embedding_matrix[i] = embedding_vector

Let's evolve the network!

In [None]:
generations = 10  # Number of times to evolve the population.
population = 20  # Number of networks in each generation.

nn_param_choices = {
    'nb_neurons': [64, 128, 256, 512, 768, 1024],
    'nb_layers': [1, 2, 3, 4],
    'activation': ['relu', 'elu', 'tanh', 'sigmoid'],
    'optimizer': ['rmsprop', 'adam', 'sgd', 'adagrad',
                    'adadelta', 'adamax', 'nadam'],
}

logging.info("***Evolving %d generations with population %d***" %
                (generations, population))

generate(generations, population, nn_param_choices, X_t, y, embedding_matrix, X_te)
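
For a sense of scale, the snippet below counts how many configurations a brute force search over these choices would have to train, and compares that with an upper bound on what the genetic algorithm trains. It is a rough sketch: the bound assumes every network in every generation is new, which overstates the real cost because retained networks are not retrained.

```python
from itertools import product

# Same choices as in the cell above.
nn_param_choices = {
    'nb_neurons': [64, 128, 256, 512, 768, 1024],
    'nb_layers': [1, 2, 3, 4],
    'activation': ['relu', 'elu', 'tanh', 'sigmoid'],
    'optimizer': ['rmsprop', 'adam', 'sgd', 'adagrad',
                  'adadelta', 'adamax', 'nadam'],
}

# Every combination a brute force search would have to train.
grid = list(product(*nn_param_choices.values()))
print(len(grid))  # 672

# Upper bound on networks trained by the genetic algorithm: a full first
# generation, then at most one new population per later generation.
generations, population = 10, 20
max_ga_runs = generations * population
print(max_ga_runs)  # 200
```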

Please do try adding more parameter choices to increase the variation in training!

PS: Most of the original code comes from https://blog.coast.ai/lets-evolve-a-neural-network-with-a-genetic-algorithm-code-included-8809bece164. Hope the code runs! Please fork it to run, as Kaggle doesn't allow such long processing times. Unfortunately, I don't have the resources to try the code at the moment, so if anyone finds any errors, please comment and I will update. Thanks!