# Genetic Algorithm
## Description
A genetic algorithm simulates natural selection. At each step, it selects individuals from the current population as parents and uses them to produce the children of the next generation. Over successive generations, the population evolves toward an optimal solution. As in natural selection, the algorithm uses crossover to create new individuals and mutation to apply random changes to them.

### Fitness score
The fitness score shows how fit an individual is. Individuals with a better fitness score have a higher chance of becoming parents in the next round of reproduction.

On average, each new generation carries more “better genes” than the individuals (solutions) of previous generations, so each new generation holds better “partial solutions” than the ones before it. Once the offspring no longer differ significantly from the offspring produced by previous populations, the population has converged, and the algorithm is said to have converged to a set of solutions for the problem.
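
One way to make the convergence criterion concrete is to stop once the best fitness score has not improved for a fixed number of generations. The sketch below assumes that rule; the names `has_converged`, `patience`, and `tolerance` are illustrative and do not appear in the notebook's own code.

```python
def has_converged(best_history, patience=5, tolerance=0):
    """Return True when the best fitness has not improved by more than
    `tolerance` during the last `patience` generations.

    best_history: best fitness score of each generation, oldest first.
    (Hypothetical helper: the stopping rule itself is an assumption.)
    """
    if len(best_history) <= patience:
        return False  # not enough generations to judge yet
    recent_best = max(best_history[-patience:])
    earlier_best = max(best_history[:-patience])
    return recent_best - earlier_best <= tolerance
```

A generation loop could append the best fitness of each generation to `best_history` and break out as soon as `has_converged(best_history)` returns `True`.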

### Operators of genetic algorithm
- **Selection operator**: The genetic algorithm randomly selects 2 individuals for reproduction. To mirror natural selection, the probability of being chosen is proportional to the fitness score.
- **Crossover operator**: After selecting the parents, the algorithm randomly chooses a crossover point and exchanges the genes on either side of it.
- **Mutation operator**: The algorithm randomly chooses some of the genes and changes them. This creates new, unique genes and keeps the population from converging prematurely.
## Explanation
### Individual Class
To simplify the explanation, we build a class called `Individual` that holds the gene and the fitness score:
- The gene is an array of bits: 0 means the item is not chosen and 1 means it is chosen.
- The fitness score is an integer showing how well the individual fits the problem.

``` 
CLASS Individual
    PUBLIC gene
    PUBLIC fitness
```

The fitness score is calculated by a private method. We use a set called class set to store the unique classes of the chosen items. For each chosen item, we add its weight to the current weight. If the current weight exceeds the capacity of the backpack, we return 0 because this individual does not fit the problem. Otherwise, we add the item's class to the class set and its value to the current value. Finally, if the class set holds at least *m* unique classes, which means enough classes were chosen, we return the current value; otherwise we return 0.

```
private FUNCTION calculateFitness
    RETURNS fitness score
    class set <- empty
    current weight, current value <- 0, 0
    FOR i=0 TO SIZE(gene)-1 DO
        IF (gene[i] = 0) CONTINUE
        current weight = current weight + w[i]
        IF (current weight > capacity) RETURN 0
        class set ADD c[i]
        current value = current value + v[i]

    IF (number of values in class set >= number of classes) RETURN current value
    ELSE RETURN 0
```
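
The fitness rule can also be written as a standalone function that receives the problem data as arguments instead of reading globals. This is a minimal sketch under that assumption; `knapsack_fitness`, its parameter names, and the three-item instance are hypothetical and chosen only for illustration.

```python
def knapsack_fitness(gene, weights, values, classes, capacity, num_classes):
    """Fitness of a 0/1 gene: total value of the chosen items, or 0 when
    the capacity is exceeded or fewer than num_classes classes are covered."""
    chosen_classes = set()
    total_weight = 0
    total_value = 0
    for bit, weight, value, cls in zip(gene, weights, values, classes):
        if bit == 0:
            continue  # item not chosen
        total_weight += weight
        if total_weight > capacity:
            return 0  # overweight individuals are unfit
        chosen_classes.add(cls)
        total_value += value
    return total_value if len(chosen_classes) >= num_classes else 0


# Illustrative data (not from the notebook's input files).
weights = [3, 4, 5]
values = [10, 20, 30]
classes = [1, 2, 1]

feasible = knapsack_fitness([1, 1, 0], weights, values, classes, 8, 2)      # 30
one_class = knapsack_fitness([1, 0, 1], weights, values, classes, 8, 2)     # 0
overweight = knapsack_fitness([1, 1, 1], weights, values, classes, 8, 2)    # 0
```

The second gene stays within the capacity but covers only class 1, so it scores 0 just like the overweight third gene.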
### Selection Operator
This operator selects a parent from the population. To mirror natural selection, the fitter an individual is, the higher its chance of being chosen as a parent.

First, we compute the total fitness score of the population and call that value max. If max is 0, no individual fits the problem, so we can return any individual in the population. Otherwise, each individual's probability of being chosen is $p = \frac{fitness}{max}$. Finally, we return an individual drawn at random from the population according to these probabilities.

```
FUNCTION selection(population)
    RETURNS the individual that becomes a parent
    max <- 0
    FOR i=0 TO SIZE(population)-1 DO
        max = max + population[i].fitness

    IF (max = 0) RETURN an individual chosen uniformly at random from population

    selection props <- empty list
    FOR i=0 TO SIZE(population)-1 DO
        selection props APPEND population[i].fitness/max

    RETURN an individual chosen at random from population with the probabilities in selection props
```
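
As a small worked example of fitness-proportional selection, the sketch below draws an index with the standard library's `random.choices`, which accepts relative weights directly. The function name `roulette_select` and the sample scores are assumptions made for illustration.

```python
import random


def roulette_select(fitness_scores, rng=random):
    """Return the index of one individual, chosen with probability
    proportional to its fitness (uniformly if every score is 0)."""
    total = sum(fitness_scores)
    if total == 0:
        # No individual is fit: fall back to a uniform choice.
        return rng.randrange(len(fitness_scores))
    indices = range(len(fitness_scores))
    return rng.choices(indices, weights=fitness_scores, k=1)[0]


# Illustrative scores: the probabilities are fitness / total.
scores = [10, 30, 60]
probabilities = [s / sum(scores) for s in scores]  # [0.1, 0.3, 0.6]
parent_index = roulette_select(scores)
```

With these scores the third individual is six times as likely to be picked as the first, yet every individual with a nonzero score can still become a parent.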

### Crossover Operator
After selecting two parents, we randomly choose a cross point. The child takes the genes up to and including the cross point from the first parent and the remaining genes from the second parent.

```
FUNCTION crossover (p1, p2)
    RETURNS an individual
    p1 gene, p2 gene <- p1.gene, p2.gene
    cross point is randomly chosen with value between 0 and SIZE(p1 gene)-1
    new gene = p1 gene[0 .. cross point] APPEND p2 gene[cross point+1 .. SIZE(p2 gene)-1]
    RETURN an individual whose gene is new gene
```
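
To see single-point crossover on concrete data, the sketch below takes the cross point as an argument so the result is deterministic. `single_point_crossover` is a hypothetical helper name, and the two parent genes are made up for the example.

```python
def single_point_crossover(gene1, gene2, point):
    """Child gets gene1[0..point] (inclusive) followed by gene2[point+1..]."""
    return gene1[:point + 1] + gene2[point + 1:]


parent1 = [1, 1, 1, 1, 1]
parent2 = [0, 0, 0, 0, 0]
child = single_point_crossover(parent1, parent2, 1)  # [1, 1, 0, 0, 0]
```

Each position of the child comes from the same position in one of the parents, so the child always has the same length as the parents and every bit still refers to the same item.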

### Mutation Operator
Each bit of the gene has a chance of being changed.

We loop through the gene and, with the mutation probability, flip the current bit.

```
FUNCTION mutation(individual)
    RETURNS an individual
    FOR i=0 TO SIZE(individual.gene)-1 DO
        IF (this bit is chosen to be changed)
            individual.gene[i] = 1 - individual.gene[i]
    recalculate individual.fitness
    RETURN individual
```
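
A bit-flip mutation can be sketched in Python as below. Note that Python's `~` operator is a bitwise NOT on integers (`~0` is `-1` and `~1` is `-2`), so the sketch flips a bit with `1 - bit` instead. The name `flip_bits` and the `rate` and `rng` parameters are assumptions for illustration.

```python
import random


def flip_bits(gene, rate, rng=random):
    """Return a new gene in which each bit is flipped with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in gene]


original = [0, 1, 0, 1]
unchanged = flip_bits(original, rate=0.0)  # no bit can flip
inverted = flip_bits(original, rate=1.0)   # every bit flips
```

Returning a new list leaves the input gene untouched, which avoids accidentally mutating a parent that is still part of the population.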

# Source Code

In [53]:
# create individual
class Individual(object):
    def __init__(self, gene: list) -> None:
        # the gene is the list of item choices (0 or 1 per item)
        self.gene = gene
        self.fitness = self.calculateFitness()

    def calculateFitness(self):
        # relies on the module-level w, v, c, W and m set when an input file is loaded
        # create set to memorize the chosen classes
        classSet = set()
        # initial weight and value of individual
        curW, value = 0, 0
        # for each item in individual
        for i in range(len(self.gene)):
            # skip if the item is not chosen
            if (self.gene[i]==0):
                continue
            # try adding item to the backpack
            curW = curW + w[i]
            # an overweight backpack makes the individual unfit, so return 0
            if (curW>W):
                return 0
            # record the item's class and add its value
            classSet.add(c[i])
            value = value + v[i]
        # the value only counts if at least m classes are covered
        return value if (len(classSet)>=m) else 0

In [54]:
# import libraries for roulette wheel selection
import random
import numpy
def select(population):
    # find total fitness value (named totalFitness to avoid shadowing the built-in max)
    totalFitness = sum(i.fitness for i in population)
    # if no individual is fit, pick one uniformly at random
    if (totalFitness == 0):
        return random.choice(population)
    # selection probability of each individual, proportional to its fitness
    selection_props = [i.fitness/totalFitness for i in population]
    # return the chosen individual
    return population[numpy.random.choice(len(population), p=selection_props)]

In [55]:
from random import randint

# performing crossover
def crossover (p1, p2):
    p1Gene, p2Gene = p1.gene, p2.gene
    # random position to cross over
    crossPoint = randint(0,len(p1Gene)-1)
    # the child takes genes [0..crossPoint] from p1 and the rest from p2
    newGene = p1Gene[:crossPoint+1] + p2Gene[crossPoint+1:]
    return Individual(newGene)

In [56]:
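import random

# constant for mutation rate
MUTATION_RATE = 0.25
# performing mutation on individual
def mutation(individual):
    # for each bit of the gene
    for i in range(len(individual.gene)):
        # if this bit mutates
        if (random.random()<MUTATION_RATE):
            # flip this bit (0 -> 1, 1 -> 0)
            individual.gene[i] = 1 - individual.gene[i]
    # the gene may have changed, so recompute the fitness
    individual.fitness = individual.calculateFitness()
    return individual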

In [57]:
# get initial population
# n is the number of individuals of initial population
def initialPopulation(n):
    # collect individuals in a set; Individual defines no __eq__/__hash__,
    # so two individuals with equal genes still count as distinct members
    population = set()
    # generate population
    numItems = len(w)
    while (len(population)!=n):
        # randomly choose items
        gene=[]
        for _ in range(numItems):
            gene.append(randint(0,1))
        # add new individual to population
        individual = Individual(gene)
        population.add(individual)
    return list(population)
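
If the initial population should really contain no duplicate genes, one option is to deduplicate on the gene itself, stored as a hashable tuple. The sketch below assumes that goal; `unique_random_genes` and its parameters are hypothetical names, not part of the notebook.

```python
import random


def unique_random_genes(n, num_items, rng=random):
    """Return n distinct random 0/1 genes, each of length num_items."""
    if n > 2 ** num_items:
        # Only 2**num_items different genes exist, so the loop would never end.
        raise ValueError("not enough distinct genes for the requested size")
    seen = set()
    while len(seen) < n:
        # Tuples are hashable, so the set compares genes by content.
        seen.add(tuple(rng.randint(0, 1) for _ in range(num_items)))
    return [list(gene) for gene in seen]


genes = unique_random_genes(8, 3)  # all 8 possible genes of length 3
```

Each returned gene could then be wrapped in an `Individual` to build the starting population.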

In [58]:
from copy import copy


def geneticAlgorithm(loop, numInitialPopulation):
    # initial population of numInitialPopulation individuals
    population = initialPopulation(numInitialPopulation)
    # sorting population to get the fittest individual
    population = sorted(population, key = lambda i: i.fitness, reverse=True)

    # start from the fittest individual, or from the empty backpack if none is feasible
    bestIndividual = population[0] if population[0].fitness>0 else Individual([0]*len(w))
    # produce a new generation in each of the `loop` iterations
    for _ in range(loop):
        # initial new population
        newPopulation = set()
        for _ in range(len(population)):
            # select two parents
            p1,p2 = select(population), select(population)
            # print(f'{p1}\n{p2}\n')
            # doing crossover
            child = crossover(p1,p2)
            # print(f'New child: {child}\n')
            # doing mutation
            child = mutation(child)
            # print(f'After mutation: {child}')
            # add child to new population
            newPopulation.add(child)
            # keep track of the fittest individual seen so far
            if (bestIndividual.fitness < child.fitness):
                bestIndividual=copy(child)
        # the new generation replaces the old population
        population = list(newPopulation)

    # return the fittest individual
    # print(f'Fittest: {bestIndividual}')
    return bestIndividual

In [59]:
import os
import time
from ast import literal_eval

W,m=0,0
w,v,c = [], [] , []
NUM_GENERATION = 2000
NUM_INITIAL_POPULATION = 10

# set path for input files
input_dir='data/input/' 
for file in os.listdir(input_dir):
    # skip files that are not .txt files
    if (not file.endswith('.txt')): continue
    # open each file
    with open(input_dir+file) as f:
        # read 5 input strings from file to variables
        capacity,class_num,weight,val,label=f.readlines()
        # set value for W and m
        W,m=int(capacity),int(class_num)
        # set value for w (literal_eval parses a number without running arbitrary code)
        w=[literal_eval(num.strip()) for num in weight.split(',')]
        # set value for v
        v=[literal_eval(num.strip()) for num in val.split(',')]
        # set value for c
        c=[literal_eval(num.strip()) for num in label.split(',')]
        #solving problem using genetic algorithm
        start = time.time()
        bestIndividual = geneticAlgorithm(NUM_GENERATION, NUM_INITIAL_POPULATION)
        end = time.time()
        print(f'{bestIndividual.fitness}')
        print(', '.join(str(i) for i in bestIndividual.gene))
        print(f'Duration: {end-start}')
        print('\n')

117
0, 1, 0, 0, 1, 0, 0, 0, 0, 1
Duration: 3.973566770553589


1000
1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Duration: 4.140781402587891


14211
1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Duration: 5.871917247772217


0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
Duration: 1.2932159900665283


0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
Duration: 1.183478593826294


5000
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Duration: 4.847155570983887
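
The input format read by the last cell (capacity, number of classes, then comma-separated weights, values, and class labels, one per line) can also be parsed by a small standalone function. This is only a sketch: `parse_instance`, `_number`, and the sample text are hypothetical and not taken from the notebook's data files.

```python
def _number(token):
    """Parse a token as int when possible, otherwise as float."""
    token = token.strip()
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_instance(text):
    """Parse a five-line problem description into
    (capacity, num_classes, weights, values, classes)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 5:
        raise ValueError("expected exactly 5 non-empty lines")
    capacity, num_classes = _number(lines[0]), int(lines[1])
    weights, values, classes = (
        [_number(tok) for tok in line.split(",")] for line in lines[2:]
    )
    if not (len(weights) == len(values) == len(classes)):
        raise ValueError("weights, values and classes must have equal length")
    return capacity, num_classes, weights, values, classes


# Illustrative input text (made up for this example).
sample = "101\n2\n85, 26, 48\n79, 32, 47\n1, 1, 2\n"
instance = parse_instance(sample)
```

Checking the line count and the list lengths up front turns a malformed file into a clear error instead of an index error deep inside the fitness calculation.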


