
# Schedule Generator with Genetic Algorithms

This notebook is a proof of concept of a schedule generator for schools.

There are three types of resources:
 - Classrooms
 - Teachers
 - Subjects

In this script we assume that all students in a class stay together and don't move, so they are effectively attached to the classroom. They are therefore not modeled as a resource, although they could be.

## Description

In this version, two of the three resource types are held constant and the third one varies. For example:
 - 1 classroom
 - 1 teacher
 - Multiple subjects (where the number of subjects is greater than the segment length)

### Distribution

In the first version, subjects were distributed randomly, but now we want each subject to get a sensible number of hours. Some subjects must have 5 hours per week, while others have just one. The target hours are stored in a dictionary (DISTRIBUTION), and get_fitness compares it with the actual counts returned by numpy.unique (with return_counts=True).
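
As a minimal sketch of that comparison, the snippet below scores two toy individuals against a made-up two-subject target. The names `target_distribution` and `distribution_penalty` are hypothetical and only illustrate the idea; the notebook's own version lives in get_fitness.

```python
import numpy

# Hypothetical target: subject 1 needs 3 hours, subject 2 needs 1 hour.
target_distribution = {1: 3, 2: 1}

def distribution_penalty(individual, target):
    """Sum of absolute differences between actual and target hours."""
    unique, counts = numpy.unique(individual, return_counts=True)
    actual = {int(key): int(count) for key, count in zip(unique, counts)}
    return sum(abs(actual.get(key, 0) - hours) for key, hours in target.items())

good = [1, 2, 1, 1]  # matches the target exactly
bad = [1, 2, 2, 2]   # subject 1 is short by 2 hours, subject 2 has 2 too many

print(distribution_penalty(good, target_distribution))  # 0
print(distribution_penalty(bad, target_distribution))   # 4
```

A penalty of 0 means the individual has exactly the required hours for every subject.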

### Constraints

In this example, we will also use constraints. Sometimes a resource (subject, teacher, ...) cannot be used at certain hours. This is configured with an array of zeros and ones that has one entry per time slot: 0 means the resource can be used in that slot, 1 means it cannot.
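
The following sketch shows how such a mask could turn into a penalty, using a toy 5-slot day. The names `forbidden`, `PENALTY` and `constraint_penalty` are assumptions made for illustration, and the weight of 3 simply mirrors the 'medium' severity used in the settings.

```python
# Hypothetical mask for a 5-slot day: subject 1 may not be taught in the first slot.
forbidden = [1, 0, 0, 0, 0]  # 1 = slot not allowed, 0 = slot allowed
PENALTY = 3                  # assumed weight, same value as SEVERITY['medium']

def constraint_penalty(schedule, resource, mask, weight=PENALTY):
    """Add `weight` for every slot where `resource` sits on a forbidden slot."""
    return sum(weight for slot, blocked in zip(schedule, mask)
               if slot == resource and blocked == 1)

print(constraint_penalty([1, 2, 3, 4, 5], 1, forbidden))  # 3
print(constraint_penalty([2, 1, 3, 4, 5], 1, forbidden))  # 0
```

Only the constrained resource is penalized: subject 2 in the first slot costs nothing here.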

## Requirements

In [0]:
import numpy

## Settings

All these settings would be better stored in a database table, but we will use constants in these examples.

In [0]:
# Number of individuals (candidate schedules) in each generation
POPULATION_SIZE = 100
# Each segment represents a day; this is the number of time slots per day
SEGMENT = 5
# Time slots per individual: the schedule is weekly, so 5 segments of 5 slots
INDIVIDUAL_SIZE = 25
# Resources, currently, of just one type, let's say subjects
RESOURCES = 10
# Fraction of the population that passes to the next generation
SURVIVAL_RATE = .3
# Fraction of the population that is picked for mutation in each generation
MUTATION_RATE = .1
# Fraction of the population (the first POPULATION_SIZE * STEADY_POPULATION positions) that never mutates
STEADY_POPULATION = .1
# Fraction of the genes of a mutant that are going to mutate
MUTATIONS = .2
# Maximum number of iterations we do in order to get the best approach
STEPS = 500
# Positions that must remain empty (1 means the slot must be empty)
EMPTY = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Penalty weights by severity
SEVERITY = {
    'high': 6,
    'medium': 3,
    'low': 1
}
DISTRIBUTION = {  # With examples for a primary school in Spain
    1: 5,  # Spanish
    2: 4,  # Math
    3: 4,  # English
    4: 2,  # Social Science
    5: 2,  # Natural Science
    6: 2,  # PE
    7: 1,  # Religion
    8: 1,  # Arts & crafts
    9: 1,  # Music
    10: 1  # Local culture (Asturian)
}
CONSTRAINTS = {
    1: [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1],
    2: None,
    3: None,
    4: None,
    5: None,
    6: None,
    7: None,
    8: None,
    9: None,
    10: None
}
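
A fitness of 0 is only reachable if the required hours fit exactly into the slots that are allowed to be used. The check below is a small sketch of that sanity test; it copies the values from the settings cell so that it runs on its own, and the names `required_hours` and `free_slots` are introduced here only for illustration.

```python
# Values copied from the settings cell.
INDIVIDUAL_SIZE = 25
EMPTY = [0] * 23 + [1, 1]
DISTRIBUTION = {1: 5, 2: 4, 3: 4, 4: 2, 5: 2, 6: 2, 7: 1, 8: 1, 9: 1, 10: 1}

required_hours = sum(DISTRIBUTION.values())  # hours that must be scheduled
free_slots = INDIVIDUAL_SIZE - sum(EMPTY)    # slots that may hold a subject

print(required_hours, free_slots)  # 23 23
assert len(EMPTY) == INDIVIDUAL_SIZE, "EMPTY needs one entry per time slot"
assert required_hours == free_slots, "no schedule could reach fitness 0"
```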

In [0]:
def create_individual(resources_size, individual_size):
  """
  Individuals are created so that they already satisfy DISTRIBUTION, which
  makes it easier to find a valid one. Subjects are shuffled and the remaining
  slots at the end are filled with 0 (empty). resources_size is currently unused
  """
  individual = []
  for key, hours in DISTRIBUTION.items():
    individual.extend([key] * hours)
  numpy.random.shuffle(individual)
  # Pad with empty slots (0) up to the requested size
  individual.extend([0] * (individual_size - len(individual)))

  return individual

In [0]:
create_individual(RESOURCES, INDIVIDUAL_SIZE)


In [0]:
def initialize_population(population_size, individual_size, resources_size):
  # One row per individual, one column per time slot
  return numpy.array([create_individual(resources_size, individual_size) for _ in range(population_size)])

In [0]:
def get_distribution(individual):
  unique, counts = numpy.unique(individual, return_counts=True)
  actual_distribution = dict(zip(unique, counts))
  return actual_distribution

In [0]:
def get_fitness(the_try):
  """
  Penalty score of an individual: lower is better, 0 means no rule is broken
  """
  fitness = 0
  for i in range(int(numpy.ceil(INDIVIDUAL_SIZE / SEGMENT))):
    # print(f'Evaluating', i*SEGMENT, (i+1)*SEGMENT, the_try[i*SEGMENT:(i+1)*SEGMENT], 'from', the_try)
    unique, counts = numpy.unique(the_try[i*SEGMENT:(i+1)*SEGMENT], return_counts=True)
    # A resource must not be repeated within the same segment (day),
    # except 0, which marks an empty slot
    count = dict(zip(unique, counts))
    count.pop(0, None)
    fitness += sum(SEVERITY['medium'] for item in count.values() if item > 1)

  # Fitness by distribution: hours of each subject compared with the target
  actual_distribution = get_distribution(the_try)
  fitness += sum(numpy.abs(actual_distribution.get(key, 0) - DISTRIBUTION[key]) for key in DISTRIBUTION.keys())
  # Some positions must remain empty
  fitness += sum(SEVERITY['high'] for index, item in enumerate(the_try) if item != 0 and EMPTY[index] == 1)
  # No empty slots unless the position must remain empty
  fitness += sum(SEVERITY['medium'] for index, item in enumerate(the_try) if item == 0 and EMPTY[index] == 0)

  # Fitness by constraints
  for resource in CONSTRAINTS:
    if CONSTRAINTS[resource] is not None:
      items = numpy.array([1 if key == resource else 0 for key in the_try])
      constraints = numpy.array(CONSTRAINTS[resource])
      fitness += sum(items * constraints) * SEVERITY['medium']

  return fitness

In [0]:
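def get_selection(population):
  """
  First, remove duplicates
  Then sort the individuals by fitness (lowest first)
  Finally keep just the best SURVIVAL_RATE of the total population
  Returns an array of [fitness, individual] rows
  """
  selection = numpy.unique(population, axis=0)
  ranked = sorted([[get_fitness(the_try), the_try] for the_try in selection], key=lambda x: x[0])
  survivors = int(numpy.ceil(POPULATION_SIZE * SURVIVAL_RATE))
  # dtype=object is needed because each row mixes a number and an array
  selection = numpy.array(ranked[:survivors], dtype=object)
  return selection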


In [0]:
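def breed(selection):
  """
  Refill the population up to POPULATION_SIZE: keep the survivors, add one
  child per pair of neighbouring individuals, then add new random individuals
  """
  population = numpy.array([the_try for the_try in selection])
  i = 0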
  while len(population) < POPULATION_SIZE:
    if i <= len(selection):
      parents = population[i:i+2]
      numpy.random.shuffle(parents)
      slicery = numpy.random.randint(INDIVIDUAL_SIZE) 
      population = numpy.append(population, numpy.array([numpy.append(parents[0][slicery:], parents[1][:slicery], axis=0)]), axis=0)
      i += 1
    # Once reached the end, create new random individuals
    else:
      population = numpy.append(population, [create_individual(RESOURCES, INDIVIDUAL_SIZE)], axis=0)

  return population
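
Note that breed builds each child from the tail of one parent followed by the head of the other, so the inherited genes end up in different time slots than they had in the parents. For comparison only, and not as a replacement for the notebook's method, here is a sketch of a conventional one-point crossover that keeps every gene in its original position. The function name `one_point_crossover` is hypothetical.

```python
import numpy

def one_point_crossover(parent_a, parent_b, cut):
    """Head of parent_a up to `cut`, followed by the tail of parent_b from `cut`."""
    return numpy.concatenate([parent_a[:cut], parent_b[cut:]])

parent_a = numpy.array([1, 1, 1, 1, 1])
parent_b = numpy.array([2, 2, 2, 2, 2])
child = one_point_crossover(parent_a, parent_b, 2)

print(child)  # [1 1 2 2 2]
```

Keeping positions fixed matters here because the EMPTY and CONSTRAINTS rules are tied to specific slots, although whether it converges faster on this problem would have to be measured.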

In [0]:
def mutate(population):
  mutated_population = population.copy()
  mutants = int(numpy.ceil(POPULATION_SIZE * MUTATION_RATE))
  # The first STEADY_POPULATION fraction of the population is never mutated
  first_mutable = int(POPULATION_SIZE * STEADY_POPULATION)
  for i in range(mutants):
    index = numpy.random.randint(first_mutable, POPULATION_SIZE)
    # Do NOT mutate individuals that are already valid (fitness 0)
    if get_fitness(mutated_population[index]) > 0:
      mutant = mutated_population[index]
      num_mutations = int(numpy.ceil(INDIVIDUAL_SIZE * MUTATIONS))
      for mutation in range(num_mutations):
        # randint excludes the upper bound, so RESOURCES + 1 includes the last resource
        mutant[numpy.random.randint(INDIVIDUAL_SIZE)] = numpy.random.randint(1, RESOURCES + 1)
      mutated_population[index] = mutant

  return mutated_population

In [0]:
# Run the whole algorithm 100 times and record, for each run that converges,
# the step at which it did
stats = []
for times in range(100):
  population = initialize_population(POPULATION_SIZE, INDIVIDUAL_SIZE, RESOURCES)
  for i in range(STEPS):
    # print(f'Step {i}')
    selection = get_selection(population)
    # print(selection[0:3])
    # Stop as soon as the three best individuals are all valid (fitness 0)
    if all(item[0] == 0 for item in selection[0:3]):
      stats.append(i)
      print(times + 1)
      break
    population = breed(selection[:,1])
    population = mutate(population)

# Show the three best schedules of the last run, one row per day
for i in range(3):
  print(numpy.reshape(selection[:,1][i], (5, 5)))
  print(get_distribution(selection[:,1][i]))

print(stats)
print('times', len(stats))
print('max', numpy.max(stats))
print('min', numpy.min(stats))
print('average', numpy.average(stats))
print('median', numpy.median(stats))

1
2
...
90


## Stats stored for comparison

In [0]:
stats = [33, 86, 233, 41, 65, 64, 155, 74, 118, 101, 115, 109, 71, 85, 70, 64, 184, 75, 232, 53, 82, 26, 238, 125, 76, 67, 83, 15, 222, 78, 211, 236, 187, 124, 106, 84, 199, 64, 110, 180, 26, 114, 90, 239, 69, 217, 49, 143, 164, 94, 135, 37, 152, 134, 123, 57, 55, 125, 25, 182, 176, 144, 56, 135, 96, 76, 31, 9, 163, 46, 24, 268, 175, 43, 125, 56, 95, 49, 128, 93, 205, 216, 73, 240, 178, 142, 127, 122, 342, 88, 109, 71, 150, 162, 86, 70, 94, 119, 266, 55]
print('times', len(stats))
print('max', numpy.max(stats))
print('min', numpy.min(stats))
print('average', numpy.average(stats))
print('median', numpy.median(stats))