# Genetic Algorithm
## Description
A genetic algorithm simulates natural selection. At each step, it selects individuals from the current population to be parents and uses them to produce the children of the next generation. Over successive generations, the population evolves toward an optimal solution. As in natural selection, the algorithm uses crossover to create new individuals and mutation to apply random changes to them.

### Fitness score
The fitness score shows how fit an individual is. Individuals with a better fitness score have a higher chance of being chosen as parents in the next reproduction.

Each new generation has, on average, more “better genes” than the individuals (solutions) of previous generations. Thus each new generation has better “partial solutions” than the previous ones. Once the offspring show no significant difference from the offspring produced by previous populations, the population has converged, and the algorithm is said to have converged to a set of solutions for the problem.
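
The convergence idea above can be turned into a stopping rule. The following is only a minimal sketch, not part of the notebook's code: the function name `has_converged` and the parameters `patience` and `tolerance` are assumptions introduced for illustration. It treats the population as converged when the best fitness has not improved over the last `patience` generations.

```python
def has_converged(best_history, patience=20, tolerance=0.0):
    """Return True when the best fitness has stopped improving.

    best_history holds the best fitness seen in each generation so far.
    The run counts as converged when the best value of the last `patience`
    generations exceeds the best earlier value by at most `tolerance`.
    """
    # Not enough generations yet to compare against an earlier period.
    if len(best_history) <= patience:
        return False
    recent_best = max(best_history[-patience:])
    earlier_best = max(best_history[:-patience])
    return recent_best - earlier_best <= tolerance


# Hypothetical best-fitness values recorded over seven generations.
history = [10, 40, 55, 60, 60, 60, 60]
print(has_converged(history, patience=3))  # True
```

A rule like this could replace a fixed number of generations, at the cost of choosing `patience` by hand.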

### Operators of genetic algorithm
- **Selection operator**: The genetic algorithm randomly selects 2 individuals for reproduction. To be more realistic, the probability of being chosen is proportional to the fitness score.
- **Crossover**: After selecting the parents, the algorithm randomly chooses a crossover point, then exchanges the parents' genes at that point.
- **Mutation Operator**: The algorithm randomly chooses some of the genes and changes them. This is how new, unique genes are created, which prevents the population from converging prematurely.
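
The three operators can be sketched on plain lists of bits, independently of any particular problem. This is a hedged illustration rather than the notebook's implementation: the helper names `roulette_select`, `single_point_crossover`, and `bit_flip_mutation`, the toy population, and the mutation rate of 0.1 are all assumptions made for the example.

```python
import random


def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    if sum(fitnesses) == 0:
        # No individual is fit: fall back to a uniform choice.
        return rng.choice(population)
    return rng.choices(population, weights=fitnesses, k=1)[0]


def single_point_crossover(parent_a, parent_b, point):
    """Child takes genes [0, point) from parent_a and the rest from parent_b."""
    return parent_a[:point] + parent_b[point:]


def bit_flip_mutation(gene, rate, rng=random):
    """Return a copy of gene with each bit flipped with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in gene]


# One reproduction step on a toy population (values are illustrative only).
rng = random.Random(0)
population = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0]]
fitnesses = [4, 0, 2]
parent_a = roulette_select(population, fitnesses, rng)
parent_b = roulette_select(population, fitnesses, rng)
point = rng.randint(1, len(parent_a) - 1)
child = bit_flip_mutation(
    single_point_crossover(parent_a, parent_b, point), 0.1, rng
)
print(child)
```

Keeping each gene at its original index during crossover matters when, as in the backpack problem below, position `i` of the gene refers to item `i`.
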
## Explanation
### Individual Class
For a simpler explanation, we build a class called Individual to hold the gene and the fitness score:
- The gene is an array of bits: 0 means the item is not chosen and 1 means it is chosen.
- The fitness score is a number showing how well the individual fits the problem.

``` 
class Individual
    public gene
    public fitness
```

The fitness score is calculated by a private method. We use a set called class set to store the unique classes of the chosen items. If an item is chosen, we add its weight to the current weight. If the current weight then exceeds the capacity of the backpack, we return 0, because this individual does not fit the problem. Otherwise, we add the item's class to the class set and its value to the current value. Finally, if the class set contains *m* unique classes, which means every class is represented, we return the current value; otherwise we return 0.

```
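private function calculateFitness
    class set <- empty; current weight <- 0; current value <- 0
    for each chosen item i (gene[i] = 1)
        current weight <- current weight + weight of i
        if current weight > capacity then return 0
        add class of i to class set
        current value <- current value + value of i
    return current value if class set has m classes, otherwise 0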
```
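
To make the rule concrete, here is a small worked example. It is a sketch under stated assumptions: the stand-alone function `knapsack_fitness` and the three-item instance are hypothetical and only mirror the steps described above, taking the problem data as arguments instead of reading it from the surrounding notebook.

```python
def knapsack_fitness(gene, weights, values, classes, capacity, num_classes):
    """Total value of the chosen items, or 0 if the choice is infeasible."""
    chosen_classes = set()
    total_weight = 0
    total_value = 0
    for i, bit in enumerate(gene):
        if bit == 0:
            continue  # item i is not chosen
        total_weight += weights[i]
        if total_weight > capacity:
            return 0  # the backpack is overloaded
        chosen_classes.add(classes[i])
        total_value += values[i]
    # Every class must be represented by at least one chosen item.
    return total_value if len(chosen_classes) >= num_classes else 0


# Hypothetical instance: 3 items, 2 classes, capacity 8.
weights = [4, 3, 5]
values = [10, 7, 12]
classes = [1, 2, 1]

print(knapsack_fitness([1, 1, 0], weights, values, classes, 8, 2))  # 17
print(knapsack_fitness([1, 0, 1], weights, values, classes, 8, 2))  # 0 (weight 9 > 8)
print(knapsack_fitness([1, 0, 0], weights, values, classes, 8, 2))  # 0 (class 2 missing)
```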

# Source Code

In [107]:
# create individual
class Individual(object):
    def __init__(self, gene: list) -> None:
        # gene[i] is 1 if item i is chosen and 0 otherwise
        self.gene = gene
        self.fitness = self.calculateFitness()

    def calculateFitness(self):
        # create set to memorize the chosen classes
        classSet = set()
        # initial weight and value of individual
        curW, value = 0, 0
        # for each item in individual
        for i in range(len(self.gene)):
            # skip if the item is not chosen
            if (self.gene[i]==0):
                continue
            # try adding item to the backpack
            curW = curW + w[i]
            # if the current weight exceeds the capacity of the backpack, return 0 because this individual does not fit the problem
            if (curW>W):
                return 0
            # record the item's class and add its value
            classSet.add(c[i])
            value = value + v[i]
        return value if (len(classSet)>=m) else 0

In [108]:
# import libraries for roulette-wheel selection
import random
import numpy
def select(population):
    # total fitness of the population (named total to avoid shadowing the built-in max)
    total = sum(i.fitness for i in population)
    # if no individual is fit, choose uniformly at random
    if (total == 0):
        return random.choice(population)
    # selection probability of each individual, proportional to its fitness
    selection_props = [i.fitness/total for i in population]
    # return the chosen individual
    return population[numpy.random.choice(len(population), p=selection_props)]

In [109]:
from random import randint

# performing crossover
def crossover (p1, p2):
    p1Gene, p2Gene = p1.gene, p2.gene
    # random position to cross over
    crossPoint = randint(0,len(p1Gene)-1)
    # the child keeps the genes before crossPoint from p1 and the rest from p2,
    # so every gene stays at the position of the item it refers to
    newGene = p1Gene[:crossPoint] + p2Gene[crossPoint:]
    return Individual(newGene)

In [110]:
import random

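# constant for the mutation rate (probability of flipping each gene)
MUTATION_RATE = 0.5
# perform mutation on an individual
def mutation(individual):
    # for each gene of the individual
    for i in range(len(individual.gene)):
        # if this gene mutates
        if (random.random()<MUTATION_RATE):
            # flip it in place (0 -> 1, 1 -> 0)
            individual.gene[i] = 1 - individual.gene[i]
    # the gene may have changed, so recalculate the fitness
    individual.fitness = individual.calculateFitness()
    return individual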

In [111]:
# get initial population
# n is the number of individuals of initial population
def initialPopulation(n):
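    # track the genes already generated (as tuples) to prevent duplicated individuals;
    # a set of Individual objects would not remove duplicates because
    # Individual defines neither __eq__ nor __hash__
    seenGenes = set()
    population = []
    numItems = len(w)
    # only 2**numItems distinct genes exist, so never ask for more
    n = min(n, 2**numItems)
    while (len(population)!=n):
        # randomly choose items
        gene = [randint(0,1) for _ in range(numItems)]
        # add the individual only if its gene is new
        if tuple(gene) not in seenGenes:
            seenGenes.add(tuple(gene))
            population.append(Individual(gene))
    return population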

In [112]:
from copy import copy


def geneticAlgorithm(loop, numInitialPopulation):
    # initial population of numInitialPopulation individuals
    population = initialPopulation(numInitialPopulation)
    # sorting population to get the fittest individual
    population = sorted(population, key = lambda i: i.fitness, reverse=True)

    bestIndividual = population[0] if population[0].fitness>0 else Individual([0]*len(w))
    # produce a new generation loop times
    for _ in range(loop):
        # initial new population
        newPopulation = []
        for _ in range(len(population)):
            # select two parents
            p1,p2 = select(population), select(population)
            # print(f'{p1}\n{p2}\n')
            # doing crossover
            child = crossover(p1,p2)
            # print(f'New child: {child}\n')
            # doing mutation
            child = mutation(child)
            # print(f'After mutation: {child}')
            # add child to new population
            newPopulation.append(child)
            if (bestIndividual.fitness < child.fitness):
                bestIndividual=copy(child)
        # the new generation replaces the old population
        population = newPopulation

    # return the fittest individual
    # print(f'Fittest: {bestIndividual}')
    return bestIndividual

In [113]:
import os
import time

W,m=0,0
w,v,c = [], [] , []
NUM_GENERATION = 1000
NUM_INITIAL_POPULATION = 10

start = time.time()

# set path for input files
input_dir='data/input/' 
for file in os.listdir(input_dir):
    # skip files that are not .txt files
    if (not file.endswith('.txt')): continue
    # open each file
    with open(input_dir+file) as f:
        # read 5 input lines from file to variables
        capacity,class_num,weight,val,label=f.readlines()
        # set value for W and m
        W,m=int(capacity),int(class_num)
        # set value for w (float() ignores surrounding spaces and newlines)
        w=[float(num) for num in weight.split(',')]
        # set value for v
        v=[float(num) for num in val.split(',')]
        # set value for c
        c=[float(num) for num in label.split(',')]
        #solving problem using genetic algorithm
        bestIndividual = geneticAlgorithm(NUM_GENERATION, NUM_INITIAL_POPULATION)
        print(f'{bestIndividual.fitness}\n')
        print(', '.join(str(i) for i in bestIndividual.gene))
        print('\n')

end = time.time()
print(f'Duration: {end-start}')

117.0

0, 1, 0, 0, 1, 0, 0, 0, 0, 1


1000.0

1, 1, 1, 1, 1, 1, 1, 1, 1, 1


83228.0

1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0


67330.0

0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0


0

0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0


5000.0

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1


65300.0

1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1,