!!!This README.md will contain all of the markdown information stored within "Evolutionary Algorithm MISP.ipynb"
For this project, I will be applying an evolutionary algorithm to solve/approximate a solution to the maximum independent set problem. I will be using the DEAP library to implement my genetic algorithms and the NetworkX library for graph representation.
The first step is to import the required modules needed for this algorithm. If attempting to run this code on your own machine and run into "module not found" errors, you may need to run a pip install
command for the specified libraries.
random
is used for generating random bits, seeding for deterministic randomness, and graph generationmath
is used in generating the underlying graphsnetworkx
provides framework for generating and manipulating graph datamatplotlib.pyplot
allows us to plot the graphs generated by NetworkXbase
provides access to Toolbox and base Fitnesscreator
is used to create typestools
provides access to genetic operators and other useful methodsalgorithms
provides access to premade evolutionary loops. (will be implementing custom version, however some methods are still necessary)
I will now create the underlying graph to be used in the algorithm. To create the graph, I will be using the nx.random_tree()
function which takes the following parameters:
n
[int] The number of nodes in the treeseed
[integer, random_state, or None (default)] Indicator of random number generation state
I use a tree as a base for the graph to ensure that the graph is connected. If any nodes are not connected, then the fitness function will always be biased towards the unconnected nodes. After creating the base for the graph, I create a second graph using the nx.gnm_random_graph()
function which takes in the following parameters:
n
[int] The number of nodes in the graphm
[int] The number of edgesseed
[integer, random_state, or None (default)] Indicator of random number generation statedirected
[bool, optional (default=False)] If True return a directed graph
The edges from this graph are then added to the original tree to produce a deterministically random connected graph.
When using the DEAP library, you must create a fitness and individual type. For the MISP, we want to maximize the amount of nodes included in the independent set so we create our FitnessMax
type with positive weights.
The individual type represents a single solution to the MISP. We will use binary vectors of length n (num_nodes) where a bit of '1' represents membership to the independent set.
Example: Binary Vector [0, 1, 0, 0, 1] implies a subset of graph G that includes nodes 1 and 4
Naturally, we will use the list data type to represent out individual and assign our FitnessMax
to the fitness attribute of the Individual
type.
After creating our types, the next step is to create our toolbox and register the necessary methods. The goal of the toolbox is to register a standard set of functions under an alias that are to be later used in your algorithms.
We will now register a few functions to the toolbox that will aid in generating individuals and populations in a heirarchical fashion:
attr_bool
is used to generate a random binary bit where a bit indicates set membershipindividual
repeatedly callstoolbox.attr_bool
to populate acreator.Individual
type with binary bitspopulation
repeatedly calledtoolbox.individual
to create a list of individuals (population)
The goal of the MISP is to maximize the amount of nodes included in the solution subset. To accomplish this, we need a fitness function that can accurately measure the quality of the solution. For this implementation that uses binary vectors to represent the individual, there are two options for measuring fitness that come to mind:
- Measure fitness based on number of nodes included in the subset, i.e. the sum of all elements in the vector, and then penalizing the fitness score for infeasible solutions (not an independent set).
- Before measuring fitness of individual, run it through a repair function that checks to see if the solution is feasible. If it is not, correct the solution, then proceed to measure the fitness via summation.
In this implementation, I will be using the second option. More information will be provided about the repair funtion near it's code's declaration. Below I define the simple fitness function and register it to the toolbox.
The next step to using the DEAP library is registering the genetic operators. DEAP provides multiple predefined operators for the crossover, mutation, and selection steps. We can register these methods to the toolbox with the appropriate alias.
Now I will implement the repair function mentioned earlier. Since each individual will be passed into the repair function, we need to check if the individual represents a feasible solution. Here are the primary steps we use to check if a solution is an independent set:
- Record index of each bit that equals 1 from the individual and store them to list
nodes
. This is so we can reference the correct nodes in the underlying graph. - Iterate through each node in
nodes
using a nested for loop to check if there exists an edge between u and v. We useG.has_edge(u,v)
provided by NetworkX. - Check
u != v
to ensure you're not comparing the same node. - If there exists an edge between u and v, return
False, nodes
- If the solution is feasible, return
True, nodes
We return nodes, since the indexes will be reused in the repair function and we do not want to recompute the values.
Next I define the greedy recursive function repairIndividual
to actually take action on the individuals if need be. The steps of the method are as follows:
- Check if the individual needs to be repaired by calling
is_ind_set(G, individual)
- If the individual is an independent set return. Otherwise, iterate through each node in the subset.
- Call
G.degree[node]
and compare to the currentmax_deg
(highest degree of a node in set). - If current node has a higher degree, store the index of the node in
node_to_remove
and updatemax_deg
- "Remove"
node_to_remove
from the individual by updating the cooresponding bit in the individual to 0 - Make a recursive calls to
repairIndividual(G, individual)
until the solution is feasible
After defining the repair function, register it to the toolbox under the alias repair
The DEAP library provides multiple predefined algorithms for executing the evolutionary loop. Since I am implementing the repair function and want to run the repair step mid-loop, I need to either design my own evolutionary loop, or modify the DEAP source code. I've decided to copy the eaSimple
algorithm from the DEAP source code and modify it.
I've added the repair step after the crossover and mutation steps. This way, individuals can be "repaired" if they are broken during the previous steps and before they are evaluated. The rest of the steps in eaSimpleWithRepair
are identical to eaSimple
source code from the DEAP GitHub repository.
In GA()
, I begin by creating our population using the methods registered in the toolbox earlier. After creating the first generation, I run the repair function for each individual before starting the evolutionary loop. This is so that computation time is not wasted on the first generation, as some of the individuals may not be feasible do to random generation.
The eaSimpleWithRepair
method takes in a stats object similar to the toolbox. You can register metrics to the stats object that will be measure on each generation of the algorithm. Here I register avg
, min
, and max
that will calculate the average, minimum, and maximum fitness score of each generation respectively.
Next I call eaSimpleRepair
with the following parameters:
population
: The initial list of individuals the algorithm starts withtoolbox
: The toolbox with all registered methods and genetic operatorscxpb
: The probability of mating two individualsmutpb
: The probability of mutating an individualngen
: The number of generations (stopping condition of evolutionary loop)stats
: The statistics object created previously that will collect data on population fitness valueshalloffame
: A HallOfFame object that contains the best individualsverbose
: Boolean value that determines if logbook is printed during running or not
This function returns the final population, the logbook, and the HallOfFame object.
The last step is to run the algorithm on some input graphs. Keeping in mind the structure and execution order of Jupyter Notebooks along with certain DEAP restrictions, all of the code previously written has been placed into methods. This is so that new DEAP environments can be iteratively created based on the problem input. It also allows a user of this notebook to run the entire notebook from top to bottom with no error messages.