## Before running

A virtual environment can be created and activated with:
- `pipenv install` (creates the environment and installs the dependencies)
- `pipenv shell` (activates it)

This ensures everyone uses the same packages and versions, which are listed in the Pipfile.

In [1]:
from refactoring import BestHistory, GeneParameters, Individual, Population, breed_individuals

## Inputs

The inputs are dictionaries, read from a parameter file, each containing the parameters for one SOAP descriptor.

In [2]:
descDict1 = {'lower': 1, 'upper': 50, 'centres': '{8, 7, 6, 1, 16, 17, 9}',
             'neighbours': '{8, 7, 6, 1, 16, 17, 9}', 'mu': 0, 
             'mu_hat': 0, 'nu': 2, 'nu_hat': 0, 'mutation_chance': 0.50, 
             'min_cutoff': 1, 'max_cutoff': 50, 'min_sigma': 0.1, 
             'max_sigma': 0.9}

descDict2 = {'lower': 51, 'upper': 100, 'centres': '{8, 7, 6, 1, 16, 17, 9}',
             'neighbours': '{8, 7, 6, 1, 16, 17, 9}', 'mu': 0, 
             'mu_hat': 0, 'nu': 2, 'nu_hat': 0, 'mutation_chance': 0.50,
             'min_cutoff': 51, 'max_cutoff': 100, 'min_sigma': 1.1, 
             'max_sigma': 1.9}

Other parameters are also taken as input, and are automatically checked for viability.

In [3]:
num_gens = 100
best_sample, lucky_few, population_size, number_of_children = 4, 2, 12, 4
early_stop = 2
early_number = 3 
min_generations = 5
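The viability check itself lives in `refactoring` and is not shown here. The sketch below assumes the common "best sample plus lucky few" scheme, in which the `best_sample + lucky_few` parents are paired off and each pair produces `number_of_children` children, so the values above refill the population exactly: (4 + 2) / 2 * 4 = 12. The function name `ga_parameter_problems` is hypothetical.

```python
def ga_parameter_problems(best_sample, lucky_few, population_size, number_of_children):
    """Hypothetical viability check for the GA parameters.

    Assumes the parents (best_sample + lucky_few) are paired off and every pair
    produces number_of_children children. Returns a list of problems; an empty
    list means the parameters look viable under that assumption.
    """
    problems = []
    n_parents = best_sample + lucky_few
    if min(best_sample, lucky_few, population_size, number_of_children) < 0:
        problems.append("parameters must be non-negative")
    if n_parents > population_size:
        problems.append("best_sample + lucky_few exceeds population_size")
    if n_parents % 2 != 0:
        problems.append("best_sample + lucky_few must be even so parents can be paired")
    elif (n_parents // 2) * number_of_children != population_size:
        problems.append("(best_sample + lucky_few) / 2 * number_of_children != population_size")
    return problems


print(ga_parameter_problems(4, 2, 12, 4))  # the values above -> []
print(ga_parameter_problems(4, 2, 10, 4))  # 3 pairs * 4 children = 12, not 10
```

Returning a list of messages rather than raising on the first failure lets every problem be reported at once.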

## GeneParameters

A GeneParameters object is created from each descriptor dictionary.

In [4]:
params1 = GeneParameters(**descDict1)
params2 = GeneParameters(**descDict2)

In [5]:
params1

GeneParameters(lower=1, upper=50, centres='{8, 7, 6, 1, 16, 17, 9}', neighbours='{8, 7, 6, 1, 16, 17, 9}', mu=0, mu_hat=0, nu=2, nu_hat=0, mutation_chance=0.5, min_cutoff=1, max_cutoff=50, min_sigma=0.1, max_sigma=0.9)
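`GeneParameters(**descDict1)` unpacks the dictionary into keyword arguments, so every key must match a constructor parameter. The printed form above looks like the default `repr` of a Python dataclass; the stand-in below is a hypothetical sketch under that assumption (field types are inferred from the example values), not the class defined in `refactoring`.

```python
from dataclasses import dataclass


@dataclass
class GeneParametersSketch:
    """Hypothetical stand-in mirroring the fields printed above."""
    lower: int
    upper: int
    centres: str
    neighbours: str
    mu: int
    mu_hat: int
    nu: int
    nu_hat: int
    mutation_chance: float
    min_cutoff: int
    max_cutoff: int
    min_sigma: float
    max_sigma: float


desc = {'lower': 1, 'upper': 50, 'centres': '{8, 7, 6, 1, 16, 17, 9}',
        'neighbours': '{8, 7, 6, 1, 16, 17, 9}', 'mu': 0, 'mu_hat': 0,
        'nu': 2, 'nu_hat': 0, 'mutation_chance': 0.50, 'min_cutoff': 1,
        'max_cutoff': 50, 'min_sigma': 0.1, 'max_sigma': 0.9}
params = GeneParametersSketch(**desc)  # each key becomes a keyword argument
print(params)

# A misspelt or extra key is rejected at construction time.
try:
    GeneParametersSketch(**desc, cutof=5)
except TypeError as err:
    print(err)
```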

## GeneSet

These objects can then create a specific set of parameters consistent with their bounds: `make_gene_set()` returns a randomly generated GeneSet object.

In [6]:
example_gene_set = params1.make_gene_set()
example_gene_set

GeneSet(32, 23, 28, 0.31)
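Matching the GeneSet values against the SOAP string printed further down suggests the four genes are `cutoff`, `l_max`, `n_max` and `atom_sigma`. In both example gene sets, the cutoff is consistent with `min_cutoff`..`max_cutoff`, `l_max` and `n_max` with `lower`..`upper`, and sigma with `min_sigma`..`max_sigma` rounded to two decimals. A minimal sketch of `make_gene_set` under that assumption; `GeneSetSketch` and `make_gene_set_sketch` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class GeneSetSketch:
    """Hypothetical four-gene container: cutoff, l_max, n_max, atom_sigma."""
    cutoff: int
    l_max: int
    n_max: int
    sigma: float


def make_gene_set_sketch(lower, upper, min_cutoff, max_cutoff,
                         min_sigma, max_sigma, rng=random):
    """Draw each gene uniformly from its assumed range."""
    return GeneSetSketch(
        cutoff=rng.randint(min_cutoff, max_cutoff),  # inclusive bounds
        l_max=rng.randint(lower, upper),
        n_max=rng.randint(lower, upper),
        sigma=round(rng.uniform(min_sigma, max_sigma), 2),
    )


rng = random.Random(0)  # seeded so the draw is reproducible
gene_set = make_gene_set_sketch(1, 50, 1, 50, 0.1, 0.9, rng)
gene_set_two = make_gene_set_sketch(51, 100, 51, 100, 1.1, 1.9, rng)
print(gene_set, gene_set_two)
```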

The GeneParameters used to create a GeneSet can be retrieved from it:

In [7]:
example_gene_set.gene_parameters

GeneParameters(lower=1, upper=50, centres='{8, 7, 6, 1, 16, 17, 9}', neighbours='{8, 7, 6, 1, 16, 17, 9}', mu=0, mu_hat=0, nu=2, nu_hat=0, mutation_chance=0.5, min_cutoff=1, max_cutoff=50, min_sigma=0.1, max_sigma=0.9)

A descriptor string can be generated for use as input when computing SOAP descriptors:

In [8]:
example_gene_set.get_soap_string()

'soap average cutoff=32 l_max=23 n_max=28 atom_sigma=0.31 n_Z=9 Z={8, 7, 6, 1, 16, 17, 9} n_species=9 species_Z={8, 7, 6, 1, 16, 17, 9} mu=0 mu_hat=0 nu=2 nu_hat=0'
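A sketch of how such a string could be assembled from a gene set and its parameters, reproducing the format above; `soap_string_sketch` is a hypothetical helper, not the `get_soap_string` implementation. It derives `n_Z` and `n_species` by counting the atomic numbers in each set. `{8, 7, 6, 1, 16, 17, 9}` contains seven elements, so the sketch emits `n_Z=7` and `n_species=7`; the `n_Z=9` in the output above is worth checking against the number of species actually listed.

```python
def soap_string_sketch(cutoff, l_max, n_max, atom_sigma, centres, neighbours,
                       mu=0, mu_hat=0, nu=2, nu_hat=0):
    """Build a descriptor string in the format printed above (hypothetical helper)."""
    # Parse a set literal such as '{8, 7, 6}' into atomic numbers and count them.
    n_z = len({int(z) for z in centres.strip("{}").split(",")})
    n_species = len({int(z) for z in neighbours.strip("{}").split(",")})
    return (f"soap average cutoff={cutoff} l_max={l_max} n_max={n_max} "
            f"atom_sigma={atom_sigma} n_Z={n_z} Z={centres} "
            f"n_species={n_species} species_Z={neighbours} "
            f"mu={mu} mu_hat={mu_hat} nu={nu} nu_hat={nu_hat}")


species = '{8, 7, 6, 1, 16, 17, 9}'
print(soap_string_sketch(32, 23, 28, 0.31, species, species))
```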

A gene set can also be mutated, using the mutation chance from its GeneParameters:

In [9]:
print(f"Before mutation {example_gene_set}")
example_gene_set.mutate_gene()
print(f"After mutation {example_gene_set}")

Before mutation [32, 23, 28, 0.31]
After mutation [32, 38, 40, 0.64]
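The output above is consistent with each gene being redrawn from its range, independently, with probability `mutation_chance` (here three of the four genes changed and the new values stay within the descDict1 bounds). A minimal sketch of that rule, assuming uniform redraws; `mutate_genes` and the redraw functions are hypothetical. Unlike `mutate_gene`, which updates the gene set in place, the sketch returns a new list and leaves the original untouched.

```python
import random


def mutate_genes(genes, redraws, mutation_chance, rng=random):
    """Return a copy of `genes` in which each gene is, with probability
    `mutation_chance`, replaced by a fresh value from its redraw function."""
    return [redraw(rng) if rng.random() < mutation_chance else gene
            for gene, redraw in zip(genes, redraws)]


# Hypothetical redraw functions for the ranges in descDict1.
redraws = [
    lambda r: r.randint(1, 50),               # cutoff
    lambda r: r.randint(1, 50),               # l_max
    lambda r: r.randint(1, 50),               # n_max
    lambda r: round(r.uniform(0.1, 0.9), 2),  # atom_sigma
]
before = [32, 23, 28, 0.31]
after = mutate_genes(before, redraws, mutation_chance=0.5, rng=random.Random(1))
print(f"Before mutation {before}")
print(f"After mutation {after}")
```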


## Individual

An Individual is made up of a list of GeneSet objects.

In [10]:
example_gene_set_two = params2.make_gene_set()
gene_set_list = [example_gene_set, example_gene_set_two]
example_individual = Individual(gene_set_list)
example_individual

Individual(['GeneSet(32, 38, 40, 0.64)', 'GeneSet(77, 74, 65, 1.72)'])

Computing the score for an individual:

In [11]:
example_individual.get_score()
example_individual.score

109
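Every score in this notebook equals the sum of the cutoffs of an individual's gene sets: 32 + 77 = 109 here, and 30 + 55 = 85 for the first individual in the population printed below. That suggests a stand-in fitness function used for testing. A sketch reproducing it, with plain lists standing in for GeneSet objects (`placeholder_score` is a hypothetical name):

```python
def placeholder_score(individual):
    """Sum the cutoff (first gene) of each gene set in an individual.

    Hypothetical: reproduces the scores printed in this notebook, which appear
    to come from a stand-in fitness function.
    """
    return sum(gene_set[0] for gene_set in individual)


print(placeholder_score([[32, 38, 40, 0.64], [77, 74, 65, 1.72]]))  # 109
```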

Two individuals can be bred to create a child. Mutation is performed automatically during breeding.

In [12]:
example_individual_two = Individual(gene_set_list)
print(f"Breeding {example_individual} with {example_individual_two}")
child = breed_individuals(example_individual, example_individual_two)
print(f"Created child {child}")

Breeding Individual(['[32, 38, 40, 0.64]', '[77, 74, 65, 1.72]']) with Individual(['[32, 38, 40, 0.64]', '[77, 74, 65, 1.72]'])
Created child Individual(['[15, 43, 40, 0.73]', '[77, 67, 65, 1.4]'])
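The notebook does not show how genes are combined. Because both parents here hold the same gene sets, any crossover that copies genes from the parents would return them unchanged, so the differences in the child come from mutation. A minimal sketch assuming uniform crossover (each gene taken from either parent with equal probability), with mutation passed in as a function; `breed_sketch` is a hypothetical name and plain lists stand in for Individual objects.

```python
import random


def breed_sketch(parent_a, parent_b, mutate, rng=random):
    """Uniform crossover: each gene of each gene set comes from either parent
    with equal probability; `mutate` is then applied to the child."""
    child = []
    for set_a, set_b in zip(parent_a, parent_b):
        child.append([rng.choice((gene_a, gene_b))
                      for gene_a, gene_b in zip(set_a, set_b)])
    return mutate(child)


a = [[32, 38, 40, 0.64], [77, 74, 65, 1.72]]
b = [[10, 20, 30, 0.50], [60, 61, 62, 1.50]]
# An identity "mutation" isolates the crossover step.
child = breed_sketch(a, b, mutate=lambda c: c, rng=random.Random(2))
print(child)
```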


## Population

A Population is a collection of Individual objects. It is created from a list of GeneParameters objects, together with the selection parameters defined above.

In [21]:
gene_parameters = [params1, params2]
pop = Population(best_sample, lucky_few, population_size,
                 number_of_children, gene_parameters,
                 maximise_scores=True)
pop

Population(4, 2, 12, 4, [GeneParameters(lower=1, upper=50, centres='{8, 7, 6, 1, 16, 17, 9}', neighbours='{8, 7, 6, 1, 16, 17, 9}', mu=0, mu_hat=0, nu=2, nu_hat=0, mutation_chance=0.5, min_cutoff=1, max_cutoff=50, min_sigma=0.1, max_sigma=0.9), GeneParameters(lower=51, upper=100, centres='{8, 7, 6, 1, 16, 17, 9}', neighbours='{8, 7, 6, 1, 16, 17, 9}', mu=0, mu_hat=0, nu=2, nu_hat=0, mutation_chance=0.5, min_cutoff=51, max_cutoff=100, min_sigma=1.1, max_sigma=1.9)], True)

To initialise the population:

In [14]:
pop.initialise_population()

Initial population of size 12 generated
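`initialise_population` presumably builds `population_size` random individuals, each with one gene set per entry in `gene_parameters`. A minimal sketch under that assumption, with plain lists standing in for GeneSet and Individual objects and hypothetical factory functions standing in for `make_gene_set`:

```python
import random


def initialise_population_sketch(population_size, gene_set_factories, rng=random):
    """Return `population_size` individuals, each holding one freshly drawn
    gene set per factory (one factory per GeneParameters)."""
    return [[make(rng) for make in gene_set_factories]
            for _ in range(population_size)]


def make_factory(lower, upper, min_cutoff, max_cutoff, min_sigma, max_sigma):
    """Hypothetical stand-in for GeneParameters.make_gene_set."""
    def make(rng):
        return [rng.randint(min_cutoff, max_cutoff),
                rng.randint(lower, upper),
                rng.randint(lower, upper),
                round(rng.uniform(min_sigma, max_sigma), 2)]
    return make


factories = [make_factory(1, 50, 1, 50, 0.1, 0.9),
             make_factory(51, 100, 51, 100, 1.1, 1.9)]
population = initialise_population_sketch(12, factories, random.Random(0))
print(f"Initial population of size {len(population)} generated")
```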


To print each individual in the population together with its score:

In [15]:
pop.print_population()

Individual(['[30, 43, 44, 0.33]', '[55, 97, 99, 1.14]']) has a score of: 85
Individual(['[37, 48, 45, 0.68]', '[98, 64, 52, 1.66]']) has a score of: 135
Individual(['[46, 49, 27, 0.83]', '[74, 61, 60, 1.39]']) has a score of: 120
Individual(['[1, 19, 7, 0.64]', '[93, 93, 66, 1.73]']) has a score of: 94
Individual(['[6, 5, 36, 0.23]', '[91, 74, 86, 1.54]']) has a score of: 97
Individual(['[9, 14, 34, 0.58]', '[70, 88, 79, 1.16]']) has a score of: 79
Individual(['[42, 25, 25, 0.47]', '[87, 95, 93, 1.69]']) has a score of: 129
Individual(['[23, 14, 36, 0.52]', '[66, 53, 75, 1.85]']) has a score of: 89
Individual(['[26, 9, 47, 0.21]', '[62, 53, 92, 1.49]']) has a score of: 88
Individual(['[8, 8, 9, 0.79]', '[98, 52, 59, 1.41]']) has a score of: 106
Individual(['[40, 33, 38, 0.89]', '[95, 85, 73, 1.81]']) has a score of: 135
Individual(['[13, 22, 6, 0.34]', '[94, 62, 54, 1.13]']) has a score of: 107


The next generation can then be produced:

In [16]:
pop.next_generation()
pop.print_population()

Individual(['[20, 33, 45, 0.62]', '[74, 64, 53, 1.81]']) has a score of: 94
Individual(['[37, 33, 45, 0.59]', '[84, 91, 73, 1.66]']) has a score of: 121
Individual(['[8, 8, 12, 0.2]', '[62, 53, 59, 1.87]']) has a score of: 70
Individual(['[14, 38, 27, 0.61]', '[53, 51, 76, 1.87]']) has a score of: 67
Individual(['[26, 27, 26, 0.33]', '[98, 52, 60, 1.49]']) has a score of: 124
Individual(['[19, 19, 6, 0.46]', '[52, 85, 94, 1.39]']) has a score of: 71
Individual(['[14, 48, 2, 0.68]', '[65, 85, 81, 1.25]']) has a score of: 79
Individual(['[46, 44, 27, 0.83]', '[54, 95, 56, 1.34]']) has a score of: 100
Individual(['[46, 25, 27, 0.83]', '[63, 61, 93, 1.69]']) has a score of: 109
Individual(['[8, 8, 45, 0.29]', '[77, 56, 59, 1.45]']) has a score of: 85
Individual(['[8, 8, 39, 0.81]', '[72, 66, 91, 1.49]']) has a score of: 80
Individual(['[46, 45, 23, 0.49]', '[74, 95, 93, 1.15]']) has a score of: 120
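How `next_generation` selects and pairs parents is not shown. The constructor arguments suggest the common "best sample plus lucky few" scheme: rank by score (highest first when `maximise_scores=True`), keep the `best_sample` best individuals, add `lucky_few` chosen at random from the rest, then pair the parents so that each pair produces `number_of_children` children. A sketch under that assumption, using a toy breed function; `next_generation_sketch` is hypothetical.

```python
import random


def next_generation_sketch(population, score, best_sample, lucky_few,
                           number_of_children, breed, maximise=True, rng=random):
    """Hypothetical 'best sample plus lucky few' generation step."""
    ranked = sorted(population, key=score, reverse=maximise)
    # Keep the best_sample top-ranked individuals as parents...
    parents = ranked[:best_sample]
    # ...plus lucky_few drawn at random from the rest, to keep some diversity.
    parents += rng.sample(ranked[best_sample:], lucky_few)
    rng.shuffle(parents)
    # Pair the parents off; each pair produces number_of_children children.
    children = []
    for mother, father in zip(parents[::2], parents[1::2]):
        children.extend(breed(mother, father, rng) for _ in range(number_of_children))
    return children


# Toy run: individuals are numbers, the score is the number itself, and a
# child is the parents' mean plus a little noise.
pop = list(range(12))
new_pop = next_generation_sketch(
    pop, score=lambda x: x, best_sample=4, lucky_few=2, number_of_children=4,
    breed=lambda a, b, r: (a + b) / 2 + r.uniform(-1, 1), rng=random.Random(0))
print(len(new_pop))  # 12
```

With 4 + 2 parents forming 3 pairs of 4 children each, the population size stays at 12 from one generation to the next.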


To run the full GA for num_gens generations:

In [17]:
for _ in range(num_gens):
    pop.next_generation()
pop.print_population()

Individual(['[10, 28, 49, 0.48]', '[53, 77, 52, 1.52]']) has a score of: 63
Individual(['[17, 37, 19, 0.58]', '[98, 83, 88, 1.46]']) has a score of: 115
Individual(['[45, 30, 48, 0.24]', '[57, 76, 71, 1.52]']) has a score of: 102
Individual(['[42, 18, 38, 0.46]', '[87, 92, 82, 1.63]']) has a score of: 129
Individual(['[44, 3, 25, 0.89]', '[87, 56, 94, 1.1]']) has a score of: 131
Individual(['[39, 25, 11, 0.48]', '[87, 56, 68, 1.63]']) has a score of: 126
Individual(['[27, 37, 19, 0.29]', '[98, 83, 87, 1.7]']) has a score of: 125
Individual(['[39, 32, 17, 0.79]', '[96, 57, 88, 1.7]']) has a score of: 135
Individual(['[27, 37, 19, 0.79]', '[51, 83, 62, 1.89]']) has a score of: 78
Individual(['[42, 45, 12, 0.48]', '[53, 75, 71, 1.6]']) has a score of: 95
Individual(['[34, 28, 6, 0.44]', '[90, 55, 61, 1.75]']) has a score of: 124
Individual(['[39, 1, 1, 0.48]', '[90, 92, 68, 1.53]']) has a score of: 129


## BestHistory

BestHistory stores the best individual from each generation and checks the convergence criteria. The entire GA can therefore be run, printed, and stored as follows:

In [18]:
hist = BestHistory(early_stop, early_number, min_generations)
pop = Population(best_sample, lucky_few, population_size,
                 number_of_children, gene_parameters,
                 maximise_scores=True)

pop.initialise_population()    
for gen in range(num_gens):
    if hist.converged:
        break
    print(f"Generation {gen}")
    pop.next_generation()
    hist.append(pop)
    print("-------")

Initial population of size 12 generated
Generation 0
Best Individual Individual(['[45, 21, 41, 0.52]', '[98, 69, 61, 1.47]']) with a score of 143 added to history
-------
Generation 1
Best Individual Individual(['[45, 21, 31, 0.4]', '[98, 82, 53, 1.26]']) with a score of 143 added to history
-------
Generation 2
Best Individual Individual(['[45, 44, 31, 0.35]', '[99, 82, 53, 1.26]']) with a score of 144 added to history
-------
Generation 3
Best Individual Individual(['[45, 44, 31, 0.35]', '[98, 74, 80, 1.74]']) with a score of 143 added to history
-------
Generation 4
Best Individual Individual(['[45, 44, 43, 0.69]', '[98, 74, 80, 1.54]']) with a score of 143 added to history
SOAP_GAS has converged
-------
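The rule BestHistory applies is not shown. One reading that is consistent with the run above (convergence declared after five generations whose best scores were 143, 143, 144, 143, 143, with `early_stop=2`, `early_number=3`, `min_generations=5`) is sketched below; `has_converged` and this interpretation of the three parameters are assumptions.

```python
def has_converged(best_scores, early_stop, early_number, min_generations):
    """Hypothetical convergence rule, not necessarily BestHistory's.

    Converged once at least `min_generations` best scores are recorded and the
    last `early_number` of them differ by no more than `early_stop`.
    """
    if len(best_scores) < max(min_generations, early_number):
        return False
    recent = best_scores[-early_number:]
    return max(recent) - min(recent) <= early_stop


# Best scores printed for generations 0-4 in the run above.
scores = [143, 143, 144, 143, 143]
for n in range(1, len(scores) + 1):
    print(n - 1, has_converged(scores[:n], early_stop=2, early_number=3, min_generations=5))
```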


The history of the best Individual from each generation is now stored in hist, and can be saved and accessed easily:

In [19]:
vars(hist)

{'history': [Individual(['GeneSet(45, 21, 41, 0.52)', 'GeneSet(98, 69, 61, 1.47)']),
  Individual(['GeneSet(45, 21, 31, 0.4)', 'GeneSet(98, 82, 53, 1.26)']),
  Individual(['GeneSet(45, 44, 31, 0.35)', 'GeneSet(99, 82, 53, 1.26)']),
  Individual(['GeneSet(45, 44, 31, 0.35)', 'GeneSet(98, 74, 80, 1.74)']),
  Individual(['GeneSet(45, 44, 43, 0.69)', 'GeneSet(98, 74, 80, 1.54)'])],
 'converged': True,
 'early_stop': 2,
 'early_number': 3,
 'min_generations': 5}
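`vars(hist)` returns a plain dictionary, so saving is straightforward once each Individual is reduced to plain values. The sketch below assumes individuals can be represented as lists of gene lists (as their printed form suggests) and writes the history as JSON; `history_to_json` is hypothetical. Pickling `hist` with the standard library's `pickle.dump` should also work, provided the `refactoring` classes are importable when loading.

```python
import json


def history_to_json(history, converged, early_stop, early_number, min_generations):
    """Serialise a best-individual history to JSON (hypothetical helper).

    Each individual is assumed to be given as plain gene lists,
    [cutoff, l_max, n_max, atom_sigma], one per gene set.
    """
    return json.dumps({"history": history,
                       "converged": converged,
                       "early_stop": early_stop,
                       "early_number": early_number,
                       "min_generations": min_generations}, indent=2)


# The first two best individuals from the run above, as plain lists.
history = [[[45, 21, 41, 0.52], [98, 69, 61, 1.47]],
           [[45, 21, 31, 0.4], [98, 82, 53, 1.26]]]
text = history_to_json(history, converged=True, early_stop=2,
                       early_number=3, min_generations=5)
restored = json.loads(text)
print(restored["history"][0])
```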