# Genetic Algorithm
## Description
The Genetic Algorithm simulates natural selection. At each step, it chooses members of the current population to act as parents and uses them to create children for the next generation. Over successive generations, the population evolves toward a good solution. As in natural evolution, the algorithm uses crossover to combine parents into new individuals and mutation to apply random changes to them.

### Fitness score
The fitness score shows how well an individual fits the problem. Individuals with better fitness scores have a higher chance of being chosen as parents for the next generation.

Because fitter individuals reproduce more often, a new generation tends to contain more "good genes" on average than the one before it, and therefore more effective "partial solutions". The population has converged once the offspring are no longer noticeably different from the individuals of earlier generations; at that point, the population has settled on a set of solutions to the problem.
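The report's implementation simply runs a fixed number of generations rather than testing for convergence. One common heuristic, sketched here as an assumption and not taken from the report, is to treat the population as converged when the best fitness has not improved for a given number of generations. The function name `has_converged` and the `patience` parameter are hypothetical.

```python
def has_converged(best_history: list[int], patience: int) -> bool:
    """Return True if the best fitness has not improved during the last
    `patience` generations (a hypothetical stopping rule, not the report's)."""
    if len(best_history) <= patience:
        # not enough generations recorded yet to judge
        return False
    # best value reached before the most recent `patience` generations
    earlier_best = max(best_history[:-patience])
    # converged if none of the recent generations beat it
    return max(best_history[-patience:]) <= earlier_best


print(has_converged([1, 2, 3, 3, 3], 2))  # True: no improvement in the last 2
print(has_converged([1, 2, 3, 4], 2))     # False: still improving
```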

### Operators of Genetic Algorithm
- **Selection operator**: The algorithm chooses two individuals at random for reproduction. To mimic natural selection, the higher an individual's fitness score, the more likely it is to be chosen.
- **Crossover operator**: After choosing the parents, the algorithm picks a crossover point at random and exchanges the parents' genes around that point.
- **Mutation operator**: The algorithm flips bits of the gene at random. This introduces new genes into the population and helps prevent premature convergence.
## Explanation
### Individual Class
To keep the explanation simple, we build a class called `Individual` that holds the gene and the fitness score:
- The gene is an array of bits: bit $i$ is 1 if item $i$ is selected and 0 otherwise.
- The fitness score is an integer that indicates how well the individual fits the problem.
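As a small illustration of this encoding, a gene can be decoded into the indices of the chosen items. `chosen_items` is a hypothetical helper, not part of the report's code.

```python
def chosen_items(gene: list[int]) -> list[int]:
    """Return the indices of the items selected by a 0/1 gene."""
    # bit i == 1 means item i goes into the knapsack
    return [i for i, bit in enumerate(gene) if bit == 1]


print(chosen_items([0, 1, 0, 0, 1, 0, 0, 0, 0, 1]))  # [1, 4, 9]
```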

``` 
CLASS Individual
    PUBLIC gene
    PUBLIC fitness
```

The fitness score is calculated by a private method. While going through the gene, we use a set, called the class set, to record the distinct classes of the chosen items. For each chosen item, we add its weight to the current weight; if the current weight exceeds the knapsack's capacity, we return 0 because the individual is not a valid solution. Otherwise, we add the item's value to the current value and its class to the class set. Finally, we return the current value if the class set contains at least $m$ distinct classes, meaning that enough classes were selected, and 0 otherwise.

```
private FUNCTION calculateFitness
    RETURNS fitness score
    class set <- empty
    current weight, current value <- 0,0
    FOR i=0 TO size(gene) DO
        IF (gene[i]=0) CONTINUE
        current weight = current weight + w[i]
        IF (current weight > capacity) RETURN 0
        class set ADD c[i]
        current value = current value + v[i]
    
    IF (SIZE(class set) >= number classes) RETURN current value
    ELSE RETURN 0
```
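The pseudocode reads the problem data from outside the function. A minimal standalone sketch, assuming the same notation as the source code below (capacity `W`, required number of distinct classes `m`, and per-item lists `w`, `v`, `c` for weight, value and class), passes them in explicitly so it can be checked in isolation. The name `knapsack_fitness` and the toy instance are hypothetical.

```python
def knapsack_fitness(gene, w, v, c, W, m):
    """Fitness of a 0/1 gene for the knapsack problem with class constraints.

    Returns the total value of the chosen items, or 0 when the weight
    exceeds the capacity W or fewer than m distinct classes are chosen.
    """
    class_set = set()
    current_weight, current_value = 0, 0
    for i, bit in enumerate(gene):
        if bit == 0:
            continue  # item i is not selected
        current_weight += w[i]
        if current_weight > W:
            return 0  # over capacity: the individual is not fit
        class_set.add(c[i])
        current_value += v[i]
    # only solutions that cover at least m classes are valid
    return current_value if len(class_set) >= m else 0


# toy instance: 4 items, capacity 10, at least 2 classes required
w, v, c = [4, 5, 3, 6], [10, 12, 7, 9], [1, 1, 2, 2]
print(knapsack_fitness([1, 1, 0, 0], w, v, c, 10, 2))  # 0: only class 1 chosen
print(knapsack_fitness([1, 0, 1, 0], w, v, c, 10, 2))  # 17: weight 7, classes {1, 2}
print(knapsack_fitness([1, 1, 1, 0], w, v, c, 10, 2))  # 0: weight 12 > 10
```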
### Selection Operator
This operator selects a parent from the population. To mimic natural selection, the fitter an individual is, the higher its chance of becoming a parent.

We start by computing the total fitness score of the population, which we call max (it is the sum of all fitness scores, not the largest one). If the total is $0$, no individual fits the problem, so we return a random member of the population. Otherwise, individual $i$ is selected with probability $p_i = \frac{fitness_i}{max}$, so the probabilities sum to 1. Finally, we pick a member of the population at random according to these probabilities (roulette-wheel selection).

```
FUNCTION selection(population)
    RETURNS the individual that becomes a parent
    max <- 0
    FOR i=0 TO SIZE(population) DO
        max = max + population[i].fitness

    IF (max = 0) RETURN a randomly chosen individual in population

    selection props <- empty list
    FOR i=0 TO SIZE(population) DO
        selection props APPEND population[i].fitness/max

    RETURN an individual in population chosen at random, with the probabilities in selection props
```
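The last step of the pseudocode, picking an individual with probability $p_i$, is roulette-wheel selection. The source code uses `numpy.random.choice`; as an alternative, here is a standard-library sketch that builds the cumulative sums of the fitness scores and uses `bisect` to find where a random point on the "wheel" falls. The function name `roulette_select` and the `rng` parameter are assumptions for illustration.

```python
import bisect
import itertools
import random


def roulette_select(fitnesses, rng=random):
    """Return an index chosen with probability fitness / total fitness.

    Falls back to a uniform choice when every fitness is 0, as in the pseudocode.
    """
    total = sum(fitnesses)
    if total == 0:
        return rng.randrange(len(fitnesses))
    # cumulative[k] = fitnesses[0] + ... + fitnesses[k]
    cumulative = list(itertools.accumulate(fitnesses))
    # a random point in [0, total); the slot it lands in is the winner
    point = rng.random() * total
    # bisect_right skips zero-width slots; min() guards against float rounding
    return min(bisect.bisect_right(cumulative, point), len(fitnesses) - 1)
```

For example, with fitness scores `[10, 30, 60]` the slots cover `[0, 10)`, `[10, 40)` and `[40, 100)`, so the indices 0, 1 and 2 are returned with probabilities 0.1, 0.3 and 0.6.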

### Crossover Operator
Given two parents, we choose a crossover point at random. The child takes the first parent's genes up to and including that point and the second parent's genes after it, so every bit stays at the position of the item it describes.

```
FUNCTION crossover (p1, p2)
    RETURNS an individual
    p1 gene, p2 gene <- p1.gene, p2.gene
    cross point is randomly chosen with value between 0 and SIZE(p1 gene)-1
    new gene = p1 gene[0 .. cross point] APPEND p2 gene[cross point+1 .. SIZE(p2 gene)-1]
    RETURN an individual whose gene is new gene
```
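Because bit $i$ always refers to item $i$, the child must keep every gene at its original position. A minimal sketch of one-point crossover on plain lists, following the pseudocode (the first parent contributes positions `0..cross_point`, the second parent the rest); `one_point_crossover` and its optional `cross_point` argument are hypothetical, the latter added only so the split can be fixed in an example.

```python
import random


def one_point_crossover(gene1, gene2, cross_point=None):
    """Child gene = gene1[0..cross_point] followed by gene2[cross_point+1..]."""
    if cross_point is None:
        # same range as the pseudocode: 0 .. len(gene1) - 1
        cross_point = random.randint(0, len(gene1) - 1)
    # slicing keeps each bit at the same index, so bit i still means item i
    return gene1[:cross_point + 1] + gene2[cross_point + 1:]


print(one_point_crossover([1, 1, 1, 1, 1], [0, 0, 0, 0, 0], cross_point=1))
# [1, 1, 0, 0, 0]
```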

### Mutation Operator
Each bit of the gene has a small chance of being flipped.

We loop through the gene and flip each bit with a fixed probability. In this report, the mutation probability is $0.1$.

```
FUNCTION mutation(individual)
    RETURNS an individual
    FOR i=0 TO SIZE(individual.gene) DO
        IF (random number in [0, 1) < mutation rate)
            individual.gene[i] = 1 - individual.gene[i]
        ENDIF
    RETURN a new individual whose gene is individual.gene
```
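A minimal bit-flip mutation sketch that returns a new gene instead of modifying its argument. It flips each bit with probability `rate` (0.1 in this report) using `1 - bit`, which keeps values in {0, 1}; in Python, `~` is bitwise NOT on integers (`~1 == -2`), so it is not a bit flip here. The name `flip_bits` and the `rng` parameter are assumptions.

```python
import random


def flip_bits(gene, rate=0.1, rng=random):
    """Return a copy of gene where each bit is flipped with probability rate."""
    # 1 - bit maps 0 -> 1 and 1 -> 0
    return [1 - bit if rng.random() < rate else bit for bit in gene]


print(flip_bits([0, 1, 0, 1], rate=1.0))  # every bit flipped: [1, 0, 1, 0]
print(flip_bits([0, 1, 0, 1], rate=0.0))  # unchanged: [0, 1, 0, 1]
```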
### Generating Initial Population
This function is not strictly necessary, but we keep it separate from the main algorithm to make the presentation easier to follow.

For each initial individual, we create a gene in which each bit is randomly 0 or 1. To avoid duplicate individuals, we store the population in a set and then return it as an array.

```
FUNCTION initialPopulation(n)
    RETURNS list of initial population
    INITIALIZE population AS A SET
    WHILE (SIZE(population)!=n) DO
        INITIALIZE gene AS AN ARRAY
        FOR i=0 TO number of items DO
            gene APPEND choose randomly 0 or 1
        ADD gene TO population

    RETURN population AS AN ARRAY
```
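Python lists are not hashable, so a set of genes has to store them as tuples; a set of `Individual` objects without `__eq__`/`__hash__` would compare them by identity and keep duplicates. A minimal sketch of duplicate-free generation, assuming genes of length `num_items`; it caps `n` at $2^{num\_items}$ so the loop always terminates. The function name `random_unique_genes` is hypothetical.

```python
import random


def random_unique_genes(n, num_items, rng=random):
    """Return min(n, 2**num_items) distinct random 0/1 genes."""
    # only 2**num_items distinct genes exist; asking for more would loop forever
    n = min(n, 2 ** num_items)
    seen = set()  # tuples, because lists cannot be put in a set
    genes = []
    while len(genes) < n:
        gene = tuple(rng.randint(0, 1) for _ in range(num_items))
        if gene not in seen:
            seen.add(gene)
            genes.append(list(gene))
    return genes
```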

### Fully implemented Genetic Algorithm
The algorithm needs two parameters: the number of generations and the number of initial individuals. In this report, we use $2000$ generations and $10$ initial individuals.

First, we generate the initial population as the first generation and record the best individual, that is, the fittest one among the generated individuals.

For each generation, we create a set called the new population to store the distinct individuals of the next generation. To create each child, we select two parents, perform crossover to produce a child, and apply mutation to it. We then add the child to the new population; if its fitness score is higher than that of the best individual, it becomes the new best individual. We repeat this as many times as there are individuals in the current population, and then use the new population, converted to an array, as the population of the next generation.

After the given number of generations, we return the best individual as the answer to the Knapsack problem.

```
FUNCTION geneticAlgorithm(number of generation, number initial individual)
    RETURNS the best individual
    population <- initialPopulation(number initial individual)
    best individual <- fittest individual in population

    FOR i=1 TO number of generation DO
        INITIALIZE new population AS A SET
        FOR j=1 TO SIZE(population) DO
            p1 <- selection(population)
            p2 <- selection(population)

            child <- crossover(p1,p2)
            child <- mutation(child)

            ADD child TO new population
            IF (fitness score OF best individual < fitness score OF child) 
                best individual = child
            ENDIF
        population = new population AS AN ARRAY

    RETURN best individual
```
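For readers who want to run the loop end to end, here is a compact, self-contained sketch that follows the pseudocode on plain 0/1 lists rather than `Individual` objects. Assumptions: the function names, the `seed` parameter and the toy instance are hypothetical; selection uses `random.Random.choices` with the fitness scores as weights instead of `numpy.random.choice`; and, for brevity, the initial population is not de-duplicated.

```python
import random


def fitness(gene, w, v, c, W, m):
    """Total value of the chosen items, or 0 if over capacity or < m classes."""
    weight = sum(wi for wi, bit in zip(w, gene) if bit)
    classes = {ci for ci, bit in zip(c, gene) if bit}
    if weight > W or len(classes) < m:
        return 0
    return sum(vi for vi, bit in zip(v, gene) if bit)


def genetic_algorithm(w, v, c, W, m, generations=2000, pop_size=10,
                      mutation_rate=0.1, seed=None):
    """Return (best gene, best fitness) found by a simple generational GA."""
    rng = random.Random(seed)  # one generator, so a fixed seed is reproducible
    n = len(w)

    def score(gene):
        return fitness(gene, w, v, c, W, m)

    def select(population, scores):
        # roulette wheel; uniform choice when no individual is valid
        if sum(scores) == 0:
            return rng.choice(population)
        return rng.choices(population, weights=scores, k=1)[0]

    # random initial population (not de-duplicated in this sketch)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=score)
    best_score = score(best)

    for _ in range(generations):
        scores = [score(gene) for gene in population]
        children = []
        for _ in range(len(population)):
            p1, p2 = select(population, scores), select(population, scores)
            # one-point crossover: p1 up to and including cut, p2 after it
            cut = rng.randint(0, n - 1)
            child = p1[:cut + 1] + p2[cut + 1:]
            # bit-flip mutation
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
            child_score = score(child)
            if child_score > best_score:
                best, best_score = child, child_score
        population = children
    return best, best_score


# toy instance (hypothetical): 4 items, capacity 10, at least 2 classes
w, v, c = [4, 5, 3, 6], [10, 12, 7, 9], [1, 1, 2, 2]
best, best_score = genetic_algorithm(w, v, c, W=10, m=2, seed=42)
print(best, best_score)
```

Computing the scores once per generation avoids re-evaluating every parent for each selection; the result is still only the best individual the random search happened to find, not a guaranteed optimum.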
## Evaluation
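Let $G$ be the number of generations, $P$ the size of the population, and $F$ the run-time complexity of the fitness function ($F = O(n)$ for $n$ items). Each generation creates $P$ children, and each child needs two selections, one crossover, one mutation, and one fitness evaluation. Because the selection function recomputes the total fitness on every call, each selection costs $O(P)$, so the run-time complexity of the genetic algorithm is $O(G\cdot P\cdot(P + F))$, which simplifies to $O(G\cdot P\cdot F)$ when the population is small compared with the number of items, as in this report.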

## Comments
The space used by the algorithm depends only on the population size and the gene length, not on the number of generations, because only the current and the next generation are kept in memory: $O(P\cdot n)$. With this implementation, memory use therefore stays small.

The number of initial individuals has limited impact: a larger population only provides more parents, which gives more chances to find the best individual early. Running more generations matters more, since later generations can inherit genes that contain portions of the solution and combine those components into a very good solution.

The ideal implementation would use a single population: at each generation, we add the new children to the population set until its size stops changing. However, this is not realistic, because the population could grow to $2^n$ distinct individuals, where $n$ is the number of items, which is too many to store in a computer.
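To put the $2^n$ figure in perspective, a quick calculation for $n = 50$ items, optimistically assuming one bit of storage per gene position; `population_bytes` is a hypothetical helper.

```python
def population_bytes(num_items: int) -> int:
    """Bytes needed to store all 2**num_items genes at one bit per item."""
    return (2 ** num_items) * num_items // 8


print(2 ** 50)               # 1125899906842624 distinct genes
print(population_bytes(50))  # 7036874417766400 bytes, about 7 PB
```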

We could also make a few small adjustments. To maintain population diversity, the crossover operation can return two children instead of one. We could also eliminate bad individuals, for example those with a fitness score of $0$. However, we found this unwise, since those individuals may still contain good parts of a solution.
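A sketch of the variant in which crossover returns two children: the same cross point produces one child from each ordering of the parents, so no parental gene segment is lost. `crossover_two` is a hypothetical name; the optional `cross_point` argument exists only to make the example deterministic.

```python
import random


def crossover_two(gene1, gene2, cross_point=None):
    """Return both children of one-point crossover at the same cross point."""
    if cross_point is None:
        cross_point = random.randint(0, len(gene1) - 1)
    cut = cross_point + 1  # first parent contributes positions 0..cross_point
    child1 = gene1[:cut] + gene2[cut:]
    child2 = gene2[:cut] + gene1[cut:]
    return child1, child2


print(crossover_two([1, 1, 1, 1], [0, 0, 0, 0], cross_point=1))
# ([1, 1, 0, 0], [0, 0, 1, 1])
```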

## Conclusion
The Genetic Algorithm is simple and fast: its run time is fixed by the number of generations and the population size rather than by the size of the search space. Because it is a randomized heuristic inspired by natural selection, the result is not guaranteed to be optimal, but it is usually acceptable. Hence, it is a good choice for hard problems or problems with a large state space.


## References
- [Generating initial population idea and run-time complexity](https://arpitbhayani.me/blogs/genetic-knapsack)
- [Roulette wheel simulation in source code](https://rocreguant.com/roulette-wheel-selection-python/2019/)

# Source Code

```python
# create individual
# relies on the problem data defined in the last cell: w (weights), v (values),
# c (classes), W (capacity) and m (required number of distinct classes)
class Individual(object):
    def __init__(self, gene: list) -> None:
        # the gene is the list of 0/1 choices, one per item
        self.gene = gene
        self.fitness = self.calculateFitness()

    def calculateFitness(self):
        # create set to memorize the chosen classes
        classSet = set()
        # initial weight and value of individual
        curW, value = 0, 0
        # for each item in individual
        for i in range(len(self.gene)):
            # skip if the item is not chosen
            if self.gene[i] == 0:
                continue
            # try adding item to the backpack
            curW = curW + w[i]
            # if the current weight exceeds the backpack's capacity, return 0 as this individual does not fit the problem
            if curW > W:
                return 0
            # record the item's class and add its value
            classSet.add(c[i])
            value = value + v[i]
        # valid only if at least m distinct classes were chosen
        return value if len(classSet) >= m else 0
```

```python
# import libraries for roulette-wheel selection
import random
import numpy


def select(population):
    # total fitness of the population (named total to avoid shadowing the built-in max)
    total = sum(i.fitness for i in population)
    # no individual fits the problem: pick any member uniformly at random
    if total == 0:
        return random.choice(population)
    # selection probability of each individual, proportional to its fitness
    selection_props = [i.fitness / total for i in population]
    # spin the roulette wheel and return the chosen individual
    return population[numpy.random.choice(len(population), p=selection_props)]
```

```python
from random import randint


# performing one-point crossover
def crossover(p1, p2):
    p1Gene, p2Gene = p1.gene, p2.gene
    # random position to crossover, between 0 and len - 1 as in the pseudocode
    crossPoint = randint(0, len(p1Gene) - 1)
    # the child takes p1's genes up to crossPoint and p2's genes after it;
    # every bit stays at its index, so bit i still refers to item i
    newGene = p1Gene[:crossPoint + 1] + p2Gene[crossPoint + 1:]
    return Individual(newGene)
```

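```python
import random

# constant for mutation rate
MUTATION_RATE = 0.1


# performing mutation on individual
def mutation(individual):
    # copy the gene so the parents' lists are never modified in place
    gene = list(individual.gene)
    # for each bit of the gene
    for i in range(len(gene)):
        # flip this bit with probability MUTATION_RATE
        if random.random() < MUTATION_RATE:
            # 1 - bit maps 0 -> 1 and 1 -> 0 (~bit would give -1 or -2)
            gene[i] = 1 - gene[i]
    # build a new Individual so the fitness is recomputed for the mutated gene
    return Individual(gene)
```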

```python
from random import randint


# get initial population
# n is the number of individuals of initial population
def initialPopulation(n):
    numItems = len(w)
    # only 2**numItems distinct genes exist; asking for more would loop forever
    n = min(n, 2 ** numItems)
    # Individual does not define __eq__/__hash__, so duplicates are tracked
    # through the genes themselves, stored as (hashable) tuples
    seenGenes = set()
    population = []
    while len(population) < n:
        # randomly choose items
        gene = [randint(0, 1) for _ in range(numItems)]
        if tuple(gene) in seenGenes:
            continue
        seenGenes.add(tuple(gene))
        # add new individual to population
        population.append(Individual(gene))
    return population
```

```python
from copy import copy


def geneticAlgorithm(numberGeneration, numInitialIndividual):
    # initial population of numInitialIndividual distinct individuals
    population = initialPopulation(numInitialIndividual)
    # the fittest individual of the initial population
    fittest = max(population, key=lambda i: i.fitness)
    # fall back to the empty selection if no initial individual is valid
    bestIndividual = fittest if fittest.fitness > 0 else Individual([0] * len(w))
    # run a fixed number of generations
    for _ in range(numberGeneration):
        # new population; Individual has no __eq__/__hash__, so this set
        # keeps every child (genes are not de-duplicated here)
        newPopulation = set()
        for _ in range(len(population)):
            # select two parents
            p1, p2 = select(population), select(population)
            # doing crossover
            child = crossover(p1, p2)
            # doing mutation
            child = mutation(child)
            # add child to new population
            newPopulation.add(child)
            # keep track of the fittest individual seen so far
            if bestIndividual.fitness < child.fitness:
                bestIndividual = copy(child)
        # the children form the next generation
        population = list(newPopulation)

    # return the fittest individual
    return bestIndividual
```

```python
import os
import time

W, m = 0, 0
w, v, c = [], [], []
NUM_GENERATION = 2000
NUM_INITIAL_POPULATION = 10

# set path for input files
input_dir = 'data/input/'
for file in os.listdir(input_dir):
    # skip anything that is not a .txt file
    if not file.endswith('.txt'):
        continue
    # open each file
    with open(os.path.join(input_dir, file)) as f:
        # read 5 input lines from file to variables
        capacity, class_num, weight, val, label = f.readlines()
    # set value for W (capacity) and m (number of classes)
    W, m = int(capacity), int(class_num)
    # parse the comma-separated weights, values and classes
    # (int() ignores surrounding spaces and the trailing newline)
    w = [int(num) for num in weight.split(',')]
    v = [int(num) for num in val.split(',')]
    c = [int(num) for num in label.split(',')]
    # solving problem using genetic algorithm
    start = time.time()
    bestIndividual = geneticAlgorithm(NUM_GENERATION, NUM_INITIAL_POPULATION)
    end = time.time()
    print(f'{bestIndividual.fitness}')
    print(', '.join(str(i) for i in bestIndividual.gene))
    print(f'Duration: {end-start}')
    print('\n')
```

```
117
0, 1, 0, 0, 1, 0, 0, 0, 0, 1
Duration: 2.7638871669769287


1000
1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Duration: 3.424774408340454


14211
1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Duration: 2.7349798679351807


5843
0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0
Duration: 2.0752854347229004


0
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
Duration: 0.34926772117614746


5000
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Duration: 2.8802521228790283


77300
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0
```