Skip to content

A repository for CSCI 7501 Computational Intelligence Project 1

Notifications You must be signed in to change notification settings

hspinos/Evolutionary-Algorithm-MISP

Repository files navigation

Evolutionary-Algorithm-MISP

!!!This README.md will contain all of the markdown information stored within "Evolutionary Algorithm MISP.ipynb"

Solving Maximum Independent Set Problem w/ Evolutionary Algorithm

For this project, I will be applying an evolutionary algorithm to solve/approximate a solution to the maximum independent set problem. I will be using the DEAP library to implement my genetic algorithms and the NetworkX library for graph representation.

Imports

The first step is to import the required modules needed for this algorithm. If attempting to run this code on your own machine and run into "module not found" errors, you may need to run a pip install command for the specified libraries.

  • random is used for generating random bits, seeding for deterministic randomness, and graph generation
  • math is used in generating the underlying graphs
  • networkxprovides framework for generating and manipulating graph data
  • matplotlib.pyplot allows us to plot the graphs generated by NetworkX
  • base provides access to Toolbox and base Fitness
  • creator is used to create types
  • tools provides access to genetic operators and other useful methods
  • algorithms provides access to premade evolutionary loops. (will be implementing custom version, however some methods are still necessary)

Creating Underlying Graph

I will now create the underlying graph to be used in the algorithm. To create the graph, I will be using the nx.random_tree() function which takes the following parameters:

  • n [int] The number of nodes in the tree
  • seed [integer, random_state, or None (default)] Indicator of random number generation state

I use a tree as a base for the graph to ensure that the graph is connected. If any nodes are not connected, then the fitness function will always be biased towards the unconnected nodes. After creating the base for the graph, I create a second graph using the nx.gnm_random_graph() function which takes in the following parameters:

  • n [int] The number of nodes in the graph
  • m [int] The number of edges
  • seed [integer, random_state, or None (default)] Indicator of random number generation state
  • directed [bool, optional (default=False)] If True return a directed graph

The edges from this graph are then added to the original tree to produce a deterministically random connected graph.

Creating DEAP Required Types

When using the DEAP library, you must create a fitness and individual type. For the MISP, we want to maximize the amount of nodes included in the independent set so we create our FitnessMax type with positive weights.

The individual type represents a single solution to the MISP. We will use binary vectors of length n (num_nodes) where a bit of '1' represents membership to the independent set.

Example: Binary Vector [0, 1, 0, 0, 1] implies a subset of graph G that includes nodes 1 and 4

Naturally, we will use the list data type to represent out individual and assign our FitnessMax to the fitness attribute of the Individual type.

Toolbox

After creating our types, the next step is to create our toolbox and register the necessary methods. The goal of the toolbox is to register a standard set of functions under an alias that are to be later used in your algorithms.

Creating Individuals and Populations

We will now register a few functions to the toolbox that will aid in generating individuals and populations in a heirarchical fashion:

  • attr_bool is used to generate a random binary bit where a bit indicates set membership
  • individual repeatedly calls toolbox.attr_bool to populate a creator.Individual type with binary bits
  • population repeatedly called toolbox.individual to create a list of individuals (population)

Fitness Function

The goal of the MISP is to maximize the amount of nodes included in the solution subset. To accomplish this, we need a fitness function that can accurately measure the quality of the solution. For this implementation that uses binary vectors to represent the individual, there are two options for measuring fitness that come to mind:

  1. Measure fitness based on number of nodes included in the subset, i.e. the sum of all elements in the vector, and then penalizing the fitness score for infeasible solutions (not an independent set).
  2. Before measuring fitness of individual, run it through a repair function that checks to see if the solution is feasible. If it is not, correct the solution, then proceed to measure the fitness via summation.

In this implementation, I will be using the second option. More information will be provided about the repair funtion near it's code's declaration. Below I define the simple fitness function and register it to the toolbox.

Register Genetic Operators

The next step to using the DEAP library is registering the genetic operators. DEAP provides multiple predefined operators for the crossover, mutation, and selection steps. We can register these methods to the toolbox with the appropriate alias.

Repair Function: Part 1

Now I will implement the repair function mentioned earlier. Since each individual will be passed into the repair function, we need to check if the individual represents a feasible solution. Here are the primary steps we use to check if a solution is an independent set:

  1. Record index of each bit that equals 1 from the individual and store them to list nodes. This is so we can reference the correct nodes in the underlying graph.
  2. Iterate through each node in nodes using a nested for loop to check if there exists an edge between u and v. We use G.has_edge(u,v) provided by NetworkX.
  3. Check u != v to ensure you're not comparing the same node.
  4. If there exists an edge between u and v, return False, nodes
  5. If the solution is feasible, return True, nodes

We return nodes, since the indexes will be reused in the repair function and we do not want to recompute the values.

Repair Function: Part 2

Next I define the greedy recursive function repairIndividual to actually take action on the individuals if need be. The steps of the method are as follows:

  1. Check if the individual needs to be repaired by calling is_ind_set(G, individual)
  2. If the individual is an independent set return. Otherwise, iterate through each node in the subset.
  3. Call G.degree[node] and compare to the current max_deg (highest degree of a node in set).
  4. If current node has a higher degree, store the index of the node in node_to_remove and update max_deg
  5. "Remove" node_to_remove from the individual by updating the cooresponding bit in the individual to 0
  6. Make a recursive calls to repairIndividual(G, individual) until the solution is feasible

After defining the repair function, register it to the toolbox under the alias repair

Implement Evolutionary Loop

The DEAP library provides multiple predefined algorithms for executing the evolutionary loop. Since I am implementing the repair function and want to run the repair step mid-loop, I need to either design my own evolutionary loop, or modify the DEAP source code. I've decided to copy the eaSimple algorithm from the DEAP source code and modify it.

I've added the repair step after the crossover and mutation steps. This way, individuals can be "repaired" if they are broken during the previous steps and before they are evaluated. The rest of the steps in eaSimpleWithRepair are identical to eaSimple source code from the DEAP GitHub repository.

Define Genetic Algorithm

In GA(), I begin by creating our population using the methods registered in the toolbox earlier. After creating the first generation, I run the repair function for each individual before starting the evolutionary loop. This is so that computation time is not wasted on the first generation, as some of the individuals may not be feasible do to random generation.

Data Collection

The eaSimpleWithRepair method takes in a stats object similar to the toolbox. You can register metrics to the stats object that will be measure on each generation of the algorithm. Here I register avg, min, and max that will calculate the average, minimum, and maximum fitness score of each generation respectively.

Evolutionary Loop

Next I call eaSimpleRepair with the following parameters:

  • population: The initial list of individuals the algorithm starts with
  • toolbox: The toolbox with all registered methods and genetic operators
  • cxpb: The probability of mating two individuals
  • mutpb: The probability of mutating an individual
  • ngen: The number of generations (stopping condition of evolutionary loop)
  • stats: The statistics object created previously that will collect data on population fitness values
  • halloffame: A HallOfFame object that contains the best individuals
  • verbose: Boolean value that determines if logbook is printed during running or not

Return Values

This function returns the final population, the logbook, and the HallOfFame object.

Running the Algorithm

The last step is to run the algorithm on some input graphs. Keeping in mind the structure and execution order of Jupyter Notebooks along with certain DEAP restrictions, all of the code previously written has been placed into methods. This is so that new DEAP environments can be iteratively created based on the problem input. It also allows a user of this notebook to run the entire notebook from top to bottom with no error messages.

About

A repository for CSCI 7501 Computational Intelligence Project 1

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published