## Genetic Algorithm for Course Schedule

##### We will see how a genetic algorithm can solve a course-prerequisite ordering problem, the kind of problem usually handled by topological sort. We will then look at how it extends beyond what a traditional algorithm offers, to optimizing a more complex utility. This will hopefully show how we as students could use genetic algorithms for the constrained optimization problems we face.


```python
import random
from copy import deepcopy
```

This code defines a class `Course` that stores a course's name, credit hours, subject, and difficulty rating.

```python
class Course:
    def __init__(self, name, hours, subject, difficulty):
        self.name = name
        self.hours = hours
        self.subject = subject
        self.difficulty = difficulty

    def __str__(self):
        return self.name
```

Next, the class `Semester` holds a list of courses and exposes their total credit hours, total difficulty, and the set of subjects they cover.

```python
class Semester:
    def __init__(self, courses):
        self.courses = courses
    # computed on access so the totals stay correct after courses are swapped
    @property
    def hours(self):
        return sum(course.hours for course in self.courses)

    @property
    def difficulty(self):
        return sum(course.difficulty for course in self.courses)

    @property
    def subjects(self):
        return {course.subject for course in self.courses}

    def __str__(self):
        return f"Semester: {self.hours} hours, {self.difficulty} difficulty, {self.subjects} subjects"
```

Finally, the class `Schedule` holds an ordered list of semesters; its string form prints one semester per line.

```python
class Schedule:
    def __init__(self, semesters):
        self.semesters = semesters

    def __str__(self):
        # one line per semester, course names separated by spaces
        return "\n".join(
            " ".join(course.name for course in semester.courses)
            for semester in self.semesters
        )
```

Below are all 20 courses to schedule; each is worth 3 credit hours.

```python
# create courses
courses = [
    # CS classes
    Course("CS 1101", 3, "CS", 2),
    Course("CS 2201", 3, "CS", 5),
    Course("CS 2212", 3, "CS", 3),
    Course("CS 3251", 3, "CS", 5),
    Course("CS 3250", 3, "CS", 5),
    Course("CS 3270", 3, "CS", 4),
    Course("CS 3281", 3, "CS", 4),
    # Math classes
    Course("MATH 1300", 3, "MATH", 4),
    Course("MATH 1301", 3, "MATH", 4),
    Course("MATH 2300", 3, "MATH", 4),
    # Physics classes
    Course("PHYS 1600", 3, "PHYS", 4),
    Course("PHYS 1601", 3, "PHYS", 4),
    # English classes
    Course("ENGL 1101", 3, "ENGL", 3),
    Course("ENGL 1102", 3, "ENGL", 3),
    # History classes
    Course("HIST 1101", 3, "HIST", 3),
    Course("HIST 1102", 3, "HIST", 3),
    # Philosophy classes
    Course("PHIL 1101", 3, "PHIL", 3),
    Course("PHIL 1102", 3, "PHIL", 3),
    # Religion classes
    Course("RELG 1101", 3, "RELG", 3),
    Course("RELG 1102", 3, "RELG", 3),
]
```

We store prerequisites in a dictionary that maps a course name to the set of course names that must be completed first. Courses without prerequisites are simply left out.

```python
# set up prereqs: course name -> set of prerequisite course names
prereqs = {
    "CS 2201": {"CS 1101"},
    "CS 2212": {"CS 1101"},
    "CS 3251": {"CS 2201", "CS 2212"},
    "CS 3250": {"CS 2201", "CS 2212"},
    "CS 3270": {"CS 2201", "CS 2212"},
    "CS 3281": {"CS 2201", "CS 2212"},
    "MATH 1301": {"MATH 1300"},
    "MATH 2300": {"MATH 1301"},
    "PHYS 1601": {"PHYS 1600"},
    "ENGL 1102": {"ENGL 1101"},
    "HIST 1102": {"HIST 1101"},
    "PHIL 1102": {"PHIL 1101"},
    "RELG 1102": {"RELG 1101"},
}
```
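If the table contains a typo or a cycle, no schedule can reach a fitness of 0 and the genetic algorithm will simply never find a valid one, so a quick check up front can save time. The sketch below is a hypothetical helper, `check_prereqs`, that is not part of the original notebook. It assumes, as above, that course names are strings and prerequisites are sets, and it uses a depth-first search to report unknown course names and prerequisite cycles.

```python
def check_prereqs(course_names, prereqs):
    """Hypothetical helper: list problems in a {course: set_of_prereqs} table."""
    problems = []
    known = set(course_names)

    # every course mentioned as a key or as a prerequisite must exist
    for course, reqs in prereqs.items():
        for name in sorted(reqs | {course}):
            if name not in known:
                problems.append(f"unknown course: {name}")

    # depth-first search; reaching an "in progress" course again means a cycle
    NEW, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: NEW for name in known}

    def visit(name):
        state[name] = IN_PROGRESS
        for req in prereqs.get(name, ()):
            if state.get(req) == IN_PROGRESS:
                problems.append(f"cycle: {name} -> {req}")
            elif state.get(req) == NEW:
                visit(req)
        state[name] = DONE

    for name in sorted(known):
        if state[name] == NEW:
            visit(name)
    return problems
```

On the notebook's data, `check_prereqs([c.name for c in courses], prereqs)` returns an empty list: every referenced course exists and the graph is acyclic.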

The function `schedule_fitness` calculates the utility (fitness) of a schedule. It is the key to our genetic algorithm: the search favors schedules that score higher, so every goal we want to achieve goes into this function. For example, we could try to minimize the number of math classes or maximize the number of philosophy classes.

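In our case we take a more realistic goal: take every class without violating any prerequisite. Each missing prerequisite costs 1000 points, and so does each repeated course, so a fitness of 0 means the schedule is valid. A prerequisite only counts if it was completed in an earlier semester; taking a course in the same semester as its prerequisite is a violation.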

```python
def schedule_fitness(schedule, verbose=False):
    utility = 0

    # penalize every prerequisite not completed in an earlier semester
    taken_names = set()
    for semester in schedule.semesters:
        for course in semester.courses:
            for prereq in prereqs.get(course.name, ()):
                if prereq not in taken_names:
                    if verbose:
                        print(f"Prereq not taken: {prereq} for {course.name}")
                    utility -= 1000
        # this semester's courses only count as taken from the next semester on
        taken_names.update(course.name for course in semester.courses)

    # penalize any course that appears more than once
    taken_names = set()
    for semester in schedule.semesters:
        for course in semester.courses:
            if course.name in taken_names:
                utility -= 1000
            taken_names.add(course.name)

    return utility
```
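To make the scoring rules concrete, here is a stripped-down sketch that applies the same two penalties to plain lists of course names instead of `Schedule` objects. The names `toy_fitness` and `toy_prereqs` are illustrative and not part of the notebook. The second call loses 1000 points for CS 2201 (taken alongside CS 1101) and 1000 for CS 3251 (taken alongside CS 2212).

```python
PENALTY = 1000

def toy_fitness(semesters, prereqs):
    """Same rules as schedule_fitness, on a list of lists of course names."""
    utility = 0
    taken = set()
    for semester in semesters:
        for name in semester:
            # prerequisites count only if finished in an earlier semester
            missing = prereqs.get(name, set()) - taken
            utility -= PENALTY * len(missing)
        taken.update(semester)

    seen = set()
    for semester in semesters:
        for name in semester:
            if name in seen:  # repeated course
                utility -= PENALTY
            seen.add(name)
    return utility

toy_prereqs = {"CS 2201": {"CS 1101"}, "CS 3251": {"CS 2201", "CS 2212"}}
print(toy_fitness([["CS 1101"], ["CS 2201", "CS 2212"], ["CS 3251"]], toy_prereqs))  # 0
print(toy_fitness([["CS 1101", "CS 2201"], ["CS 2212", "CS 3251"]], toy_prereqs))    # -2000
```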

The mutation operator is just as important: it copies a schedule, picks two different semesters at random, and swaps one course between them. Repeated mutations let the search explore the space of possible schedules.

```python
def schedule_mutate(schedule):
    # work on a copy so the parent schedule is left unchanged
    schedule = deepcopy(schedule)
    # pick two distinct semesters (sampling without replacement)
    semester1, semester2 = random.sample(schedule.semesters, 2)
    # pick one course position in each semester and swap the courses
    i = random.randrange(len(semester1.courses))
    j = random.randrange(len(semester2.courses))
    semester1.courses[i], semester2.courses[j] = semester2.courses[j], semester1.courses[i]
    return schedule
```
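A useful property of this swap is that it can never create or lose a course: it only moves two courses between semesters, and every semester keeps its size. Starting from a schedule that contains each course once, the duplicate penalty in `schedule_fitness` therefore acts only as a safeguard. The sketch below checks the property on plain lists of names; `swap_mutate` is a hypothetical stand-in for `schedule_mutate` that skips the `Course` and `Semester` classes, and the fixed seed only makes the run repeatable.

```python
import random
from collections import Counter

def swap_mutate(semesters, rng=random):
    """Hypothetical list-based stand-in for schedule_mutate."""
    new = [list(semester) for semester in semesters]  # copy; parent stays untouched
    a, b = rng.sample(range(len(new)), 2)              # two distinct semesters
    i = rng.randrange(len(new[a]))
    j = rng.randrange(len(new[b]))
    new[a][i], new[b][j] = new[b][j], new[a][i]
    return new

def course_counts(semesters):
    return Counter(course for semester in semesters for course in semester)

plan = [["CS 1101", "MATH 1300"], ["CS 2201", "MATH 1301"], ["CS 3251", "MATH 2300"]]
rng = random.Random(0)
current = plan
for _ in range(1000):
    current = swap_mutate(current, rng)
    # same courses, same number of times, same semester sizes
    assert course_counts(current) == course_counts(plan)
    assert [len(s) for s in current] == [len(s) for s in plan]
print("swap mutation preserved every course across 1000 chained mutations")
```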

`random_schedule` builds a starting candidate by shuffling the 20 courses and splitting them into 4 semesters of 5 courses each.

```python
def random_schedule():
    # shuffle a copy of the course list
    course_list = deepcopy(courses)
    random.shuffle(course_list)
    # split it into 4 semesters of 5 courses each
    semesters = [Semester(course_list[i * 5:(i + 1) * 5]) for i in range(4)]
    return Schedule(semesters)
```

The main loop starts from 100 random schedules and runs for a given number of generations. In each generation it keeps the 10 fittest schedules as parents and mutates randomly chosen parents to create 90 children. Parents and children together form the next generation, and the process repeats. Because the parents survive unchanged, the best fitness never decreases from one generation to the next.

We do not use crossover here: splicing two schedules together would almost certainly duplicate some courses and drop others, producing a nonsensical schedule. Instead we use something closer to asexual reproduction, where each child is a copy of a single parent with one course swapped between two of its semesters.
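The sketch below illustrates the problem with a naive one-point crossover (the first semesters from one parent, the rest from the other) on a hypothetical 8-course, 4-semester plan. Neither `one_point_crossover` nor `duplicates_and_missing` is used anywhere in the notebook.

```python
from collections import Counter

def one_point_crossover(parent_a, parent_b, cut):
    """Naive crossover: semesters before `cut` from parent_a, the rest from parent_b."""
    return [list(s) for s in parent_a[:cut]] + [list(s) for s in parent_b[cut:]]

def duplicates_and_missing(plan, all_courses):
    """Return (courses taken more than once, courses never taken)."""
    counts = Counter(course for semester in plan for course in semester)
    duplicated = sorted(c for c, n in counts.items() if n > 1)
    missing = sorted(set(all_courses) - set(counts))
    return duplicated, missing

all_courses = [f"C{i}" for i in range(8)]
# two valid parents: each course appears exactly once in each
parent_a = [["C0", "C1"], ["C2", "C3"], ["C4", "C5"], ["C6", "C7"]]
parent_b = [["C4", "C0"], ["C6", "C2"], ["C1", "C5"], ["C3", "C7"]]

child = one_point_crossover(parent_a, parent_b, cut=2)
print(child)  # [['C0', 'C1'], ['C2', 'C3'], ['C1', 'C5'], ['C3', 'C7']]
print(duplicates_and_missing(child, all_courses))  # (['C1', 'C3'], ['C4', 'C6'])
```

With the notebook's 4 × 5 layout and independently shuffled parents, a cut after two semesters yields a valid child only when both parents happen to place the same ten courses in their first two semesters, a chance of 1 in C(20, 10) = 184,756.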

```python
def run(num_iterations=10000, fitness_fn=schedule_fitness):
    # start from 100 random schedules
    population = [random_schedule() for _ in range(100)]
    best_schedule = None
    best_fitness = float("-inf")

    for i in range(num_iterations):
        # selection: the 10 fittest schedules become parents
        parents = sorted(population, key=fitness_fn, reverse=True)[:10]
        # reproduction: 90 children, each a mutated copy of a random parent
        children = [schedule_mutate(random.choice(parents)) for _ in range(90)]
        # the parents survive alongside their children (elitism)
        population = parents + children
        # remember the best schedule seen so far
        top = max(population, key=fitness_fn)
        top_fitness = fitness_fn(top)
        if top_fitness > best_fitness:
            best_fitness = top_fitness
            best_schedule = top

        if i % 1000 == 0:
            print(f"Iteration: {i}")
            print(f"Best fitness: {best_fitness}")
            print(best_schedule)

    print(f"Best fitness: {best_fitness}")
    print(best_schedule)
    return best_schedule
```

```python
schedule = run(10)
```

```
Iteration: 0
Best fitness: -4000
MATH 2300 MATH 1301 CS 2201 CS 2212 MATH 1300
HIST 1101 RELG 1101 PHYS 1600 ENGL 1101 CS 3270
PHYS 1601 CS 3281 ENGL 1102 CS 3250 PHIL 1101
CS 3251 HIST 1102 PHIL 1102 CS 1101 RELG 1102

Best fitness: 0
ENGL 1101 CS 1101 PHYS 1600 PHIL 1101 HIST 1101
CS 2201 PHYS 1601 MATH 1300 RELG 1101 CS 2212
CS 3270 MATH 1301 CS 3250 PHIL 1102 ENGL 1102
HIST 1102 CS 3251 MATH 2300 RELG 1102 CS 3281
```
After only 10 generations the algorithm has found a schedule with fitness 0: no prerequisite is violated and no course is repeated. The same result could be reached with a simple topological sort, but the genetic algorithm has an advantage: we can change the fitness function to optimize for other goals.
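For comparison, here is a sketch of the topological-sort route: Kahn's algorithm, extended to record how deep each course sits in the prerequisite graph, so that each course lands in the earliest semester its prerequisites allow. `earliest_semesters` and `group_by_semester` are hypothetical helpers, and the sketch assumes every prerequisite also appears in `course_names`. (Python's standard library also offers `graphlib.TopologicalSorter` for a plain ordering.)

```python
from collections import deque

def earliest_semesters(course_names, prereqs):
    """Map each course to the earliest 0-based semester its prerequisites allow."""
    indegree = {name: 0 for name in course_names}  # unmet prerequisites per course
    unlocks = {name: [] for name in course_names}  # courses that depend on this one
    for course, reqs in prereqs.items():
        for req in reqs:
            indegree[course] += 1
            unlocks[req].append(course)

    semester = {name: 0 for name in course_names}
    queue = deque(name for name in course_names if indegree[name] == 0)
    while queue:
        name = queue.popleft()
        for nxt in unlocks[name]:
            # a course must come at least one semester after each prerequisite
            semester[nxt] = max(semester[nxt], semester[name] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if any(indegree.values()):
        raise ValueError("prerequisite cycle: no valid ordering exists")
    return semester

def group_by_semester(assignment):
    """Turn {course: semester} into a list of course lists, one per semester."""
    plan = [[] for _ in range(max(assignment.values()) + 1)]
    for name, sem in assignment.items():
        plan[sem].append(name)
    return plan
```

Applied to the notebook's 20 courses, this places 7, 8, and 5 courses in three semesters. That respects every prerequisite but ignores the five-course load, which a topological sort does not handle on its own; the genetic algorithm gets it from the fixed 4 × 5 layout, and further preferences are just more terms in the fitness function.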

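Suppose we plan to do a physics-related internship the summer after freshman year. We can modify the fitness function to reward taking PHYS 1600 in the first semester and PHYS 1601 in the second, 300 points each. Because the full bonus (600) is smaller than a single penalty (1000), the search never benefits from trading a prerequisite violation for a physics bonus.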

```python
def schedule_fitness_2(schedule, verbose=False):
    # start from the same prerequisite and duplicate penalties
    utility = schedule_fitness(schedule, verbose)

    # let's reward physics classes in the first two semesters
    if "PHYS 1600" in [course.name for course in schedule.semesters[0].courses]:
        utility += 300
    if "PHYS 1601" in [course.name for course in schedule.semesters[1].courses]:
        utility += 300

    return utility
```

```python
schedule_2 = run(1000, schedule_fitness_2)
```

```
Iteration: 0
Best fitness: -3700
MATH 1301 HIST 1101 CS 2201 PHIL 1101 PHYS 1600
HIST 1102 MATH 1300 CS 2212 MATH 2300 ENGL 1102
ENGL 1101 CS 3281 RELG 1101 PHIL 1102 PHYS 1601
CS 3250 CS 3251 CS 1101 CS 3270 RELG 1102

Best fitness: 600
HIST 1101 PHIL 1101 PHYS 1600 CS 1101 MATH 1300
HIST 1102 CS 2212 PHYS 1601 ENGL 1101 MATH 1301
RELG 1101 PHIL 1102 ENGL 1102 MATH 2300 CS 2201
CS 3251 CS 3270 RELG 1102 CS 3281 CS 3250
```
The genetic algorithm has now found a schedule with PHYS 1600 in the first semester and PHYS 1601 in the second, reaching the maximum fitness of 600. This is a simple example, but it shows how changing the fitness function changes what we optimize for. In many cases no schedule can collect every bonus, but the search still finds one that scores well on the goals we weight most heavily, even when we don't know ourselves what an ideal schedule looks like.

Hopefully this shows the power of genetic algorithms and how widely they apply, including to problems in our own lives whose optimal solution we don't know.