# Optimization with a Genetic Algorithm
https://wiki.fysik.dtu.dk/ase/tutorials/ga/ga_optimize.html
A genetic algorithm (GA) has been implemented for global structure optimization within ASE. The optimizer lives in its own module, ase.ga, which includes all the classes it needs.

# A Brief Overview of the Implementation
The GA relies on the ase.db module for tracking which structures have been found. Before the GA optimization starts, the user therefore needs to prepare this database and the appropriate folders. This is done through an initialization script such as the one described in the next section. During this initialization the starting population is generated and added to the database.

After initialization, the main script is run. This script defines the objects responsible for the different parts of the GA and then creates and locally relaxes new candidates. It is up to the user to define when the main script should terminate. An example of a main script is given in the next section. Because the data storage is persistent, the main script can be executed multiple times to generate new candidates.
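
The effect of persistent storage on the workflow can be illustrated with a small standalone sketch. It does not use ase.db; the SQLite table, its columns, and the helper names (`open_store`, `add_unrelaxed`, `relax_pending`) are hypothetical and exist only to show why a main script can be stopped and restarted without losing track of candidates.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a tiny candidate store. The schema is hypothetical."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS candidates ("
        "id INTEGER PRIMARY KEY, relaxed INTEGER NOT NULL, raw_score REAL)"
    )
    return con


def add_unrelaxed(con, n):
    """Initialization step: register n unrelaxed candidates."""
    for _ in range(n):
        con.execute("INSERT INTO candidates (relaxed) VALUES (0)")
    con.commit()


def relax_pending(con, limit):
    """Main-script step: mark up to `limit` pending candidates as relaxed."""
    rows = con.execute(
        "SELECT id FROM candidates WHERE relaxed = 0 ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for (cid,) in rows:
        # Placeholder score; a real run would store the outcome of a relaxation.
        con.execute(
            "UPDATE candidates SET relaxed = 1, raw_score = ? WHERE id = ?",
            (float(cid), cid),
        )
    con.commit()
    return len(rows)


def count_pending(con):
    """Number of candidates that still need a relaxation."""
    query = "SELECT COUNT(*) FROM candidates WHERE relaxed = 0"
    return con.execute(query).fetchone()[0]


def demo():
    """Run the 'main script' twice against the same database file."""
    path = os.path.join(tempfile.mkdtemp(), "toy_ga.db")
    con = open_store(path)
    add_unrelaxed(con, 5)
    first = relax_pending(con, 3)   # first run handles three candidates
    con.close()

    con = open_store(path)          # second run resumes from the stored state
    second = relax_pending(con, 3)
    pending = count_pending(con)
    con.close()
    return first, second, pending
```

Calling `demo()` processes three candidates in the first run and the remaining two in the second, which is the same pattern the GA main script follows when it is restarted in the same folder.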

The GA implementation generally follows a responsibility-driven approach. This means that each part of the GA is isolated in its own class, making it possible to put together an optimizer that satisfies the needs of a specific optimization problem.

This tutorial will use the following parts of the GA:

*  A population responsible for proposing new candidates to pair together.
*  A pairing operator which combines two candidates.
*  A set of mutations.
*  A comparator which determines if two structures are different.
*  A starting population generator.

Each of the above components is described in the supplemental material of the first reference given in the original tutorial and will not be discussed here. The example will instead focus on the technical aspects of executing the GA.
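
To make the division of responsibilities concrete, here is a library-free toy GA in which each component is a plain function. It is only a sketch under simplifying assumptions: candidates are lists of floats rather than atomic structures, the "energy" is a sum of squares, and none of the function names correspond to the ase.ga classes.

```python
import random


def start_generator(rng, n_genes):
    """Starting population generator: return one random candidate."""
    return [rng.uniform(-5.0, 5.0) for _ in range(n_genes)]


def raw_score(candidate):
    """Fitness to maximize; here the negative of a toy 'energy'."""
    return -sum(x * x for x in candidate)


def looks_like(a, b, tol=1e-6):
    """Comparator: two candidates are the same if all genes agree within tol."""
    return all(abs(x - y) < tol for x, y in zip(a, b))


def pair(rng, a, b):
    """Pairing operator: cut both parents at one point and splice the parts."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def rattle(rng, candidate, strength=0.3):
    """Mutation: displace every gene by a small random amount."""
    return [x + rng.gauss(0.0, strength) for x in candidate]


def update_population(population, candidate, size):
    """Population: keep the `size` best candidates and skip duplicates."""
    if any(looks_like(candidate, other) for other in population):
        return population
    ranked = sorted(population + [candidate], key=raw_score, reverse=True)
    return ranked[:size]


def run_toy_ga(n_steps=200, size=10, n_genes=4,
               mutation_probability=0.3, seed=0):
    """Return the best raw_score before and after n_steps GA iterations."""
    rng = random.Random(seed)
    population = []
    for _ in range(size):
        candidate = start_generator(rng, n_genes)
        population = update_population(population, candidate, size)
    initial_best = raw_score(population[0])
    for _ in range(n_steps):
        parent_a, parent_b = rng.sample(population, 2)
        child = pair(rng, parent_a, parent_b)
        if rng.random() < mutation_probability:
            child = rattle(rng, child)
        population = update_population(population, child, size)
    return initial_best, raw_score(population[0])
```

Because the population always keeps its best members, the best `raw_score` returned by `run_toy_ga()` can never be worse after the loop than before it. Swapping one function for another (a different mutation, say) leaves the rest untouched, which is the point of the responsibility-driven design.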

# A Basic Example
The user needs to specify the following three properties about the structure that needs to be optimized.

*  A list of atomic numbers for the structure to be optimized.
*  A supercell in which to do the optimization. If the structure to optimize resides on a surface or in a support, this supercell contains the atoms which should not be considered explicitly by the GA.
*  A box defining the volume of the supercell in which to randomly distribute the starting population.

As an example we will find the structure of an Ag2Au2 cluster on a Au(111) surface using the EMT calculator.

The script doing all the initialisations should be run in the folder in which the GA optimisation is to take place. The script looks as follows:

In [1]:
from ase.ga.data import PrepareDB
from ase.ga.startgenerator import StartGenerator
from ase.ga.utilities import closest_distances_generator
from ase.ga.utilities import get_all_atom_types
from ase.constraints import FixAtoms
import numpy as np
from ase.build import fcc111

In [2]:
db_file = 'gadb.db'

Create the surface

In [3]:
slab = fcc111('Au', size=(4, 4, 1), vacuum=10.0, orthogonal=True)
slab.set_constraint(FixAtoms(mask=len(slab) * [True]))

Define the volume in which the adsorbed cluster is optimized. The volume is defined by a corner position (p0) and three spanning vectors (v1, v2, v3).

In [4]:
pos = slab.get_positions()
cell = slab.get_cell()
p0 = np.array([0., 0., max(pos[:, 2]) + 2.])
v1 = cell[0, :] * 0.8
v2 = cell[1, :] * 0.8
v3 = cell[2, :]
v3[2] = 3.
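
A corner position and three spanning vectors describe a parallelepiped, and a point inside it can be written as p0 + a·v1 + b·v2 + c·v3 with a, b, c between 0 and 1. The sketch below draws such points with the standard library only. It is an illustration of the geometry, assuming uniform sampling, and is not taken from the `StartGenerator` source; the box dimensions in the usage lines are made-up numbers.

```python
import random


def random_point_in_box(p0, spanning_vectors, rng):
    """Return p0 + a*v1 + b*v2 + c*v3 with a, b, c uniform in [0, 1)."""
    point = list(p0)
    for vector in spanning_vectors:
        fraction = rng.random()
        for axis in range(3):
            point[axis] += fraction * vector[axis]
    return point


# Hypothetical orthogonal box: 9 x 8 x 3 Å, starting 12 Å above the origin.
example_p0 = [0.0, 0.0, 12.0]
example_vectors = [[9.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 3.0]]
example_point = random_point_in_box(example_p0, example_vectors,
                                    random.Random(0))
```

With `v3[2] = 3.` as in the cell above, the z coordinate of every sampled point stays within 3 Å of the corner height, which keeps the initial cluster atoms close to the surface.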

Define the composition of the atoms to optimize

In [5]:
atom_numbers = 2 * [47] + 2 * [79]

Define the minimum allowed distance between two atoms for each pair of species

In [6]:
unique_atom_types = get_all_atom_types(slab, atom_numbers)
cd = closest_distances_generator(atom_numbers=unique_atom_types,
                                 ratio_of_covalent_radii=0.7)
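
The idea behind the minimum-distance table can be sketched without ASE, assuming that the allowed distance for a pair is the sum of the two covalent radii scaled by `ratio_of_covalent_radii`. The radii below are approximate values hard-coded for illustration, and `closest_distances` is a hypothetical stand-in, not the ASE function.

```python
from itertools import product

# Approximate covalent radii in Å for Ag (47) and Au (79); illustration only.
COVALENT_RADII = {47: 1.45, 79: 1.36}


def closest_distances(atomic_numbers, ratio):
    """Map each ordered pair (Z1, Z2) to ratio * (r_cov(Z1) + r_cov(Z2))."""
    return {
        (z1, z2): ratio * (COVALENT_RADII[z1] + COVALENT_RADII[z2])
        for z1, z2 in product(atomic_numbers, repeat=2)
    }


toy_cd = closest_distances([47, 79], ratio=0.7)
```

Under these assumptions an Ag-Au pair may not come closer than roughly 0.7 × 2.81 Å. A smaller ratio admits more compressed starting structures; a larger one rejects more random placements.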

Create the starting population generator

In [7]:
sg = StartGenerator(slab=slab,
                    atom_numbers=atom_numbers,
                    closest_allowed_distances=cd,
                    box_to_place_in=[p0, [v1, v2, v3]])

Generate the starting population

In [8]:
population_size = 20
starting_population = [sg.get_new_candidate() for i in range(population_size)]

Create the database to store information in

In [9]:
d = PrepareDB(db_file_name=db_file,
              simulation_cell=slab,
              stoichiometry=atom_numbers)

In [10]:
for a in starting_population:
    d.add_unrelaxed_candidate(a)

Having initialized the GA optimization, we now need to run the GA itself. The main script consists of an initialization part followed by a loop that proposes new structures and locally optimizes them. It can look as follows:

In [11]:
from random import random
from ase.io import write
from ase.optimize import BFGS
from ase.calculators.emt import EMT

In [12]:
from ase.ga.data import DataConnection
from ase.ga.population import Population
from ase.ga.standard_comparators import InteratomicDistanceComparator
from ase.ga.cutandsplicepairing import CutAndSplicePairing
from ase.ga.utilities import closest_distances_generator
from ase.ga.utilities import get_all_atom_types
from ase.ga.offspring_creator import OperationSelector
from ase.ga.standardmutations import MirrorMutation
from ase.ga.standardmutations import RattleMutation
from ase.ga.standardmutations import PermutationMutation

Change the following three parameters to suit your needs

In [13]:
population_size = 20
mutation_probability = 0.3
n_to_test = 20

Initialize the different components of the GA

In [14]:
da = DataConnection('gadb.db')
atom_numbers_to_optimize = da.get_atom_numbers_to_optimize()
n_to_optimize = len(atom_numbers_to_optimize)
slab = da.get_slab()
all_atom_types = get_all_atom_types(slab, atom_numbers_to_optimize)
blmin = closest_distances_generator(all_atom_types,
                                    ratio_of_covalent_radii=0.7)

In [15]:
comp = InteratomicDistanceComparator(n_top=n_to_optimize,
                                     pair_cor_cum_diff=0.015,
                                     pair_cor_max=0.7,
                                     dE=0.02,
                                     mic=False)
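
A comparator of this kind decides whether two relaxed structures are duplicates. The sketch below is a simplified stand-in, not the ASE class: it ignores element types, compares the sorted lists of all interatomic distances, and applies thresholds named after the parameters above (cumulative difference, maximum difference, energy difference). The exact normalization used here is an assumption.

```python
from itertools import combinations
from math import dist


def sorted_distances(positions):
    """All interatomic distances of one structure, in ascending order."""
    return sorted(dist(p, q) for p, q in combinations(positions, 2))


def looks_like(pos_a, pos_b, energy_a, energy_b,
               cum_diff_max=0.015, max_diff_max=0.7, dE=0.02):
    """Return True if two structures are considered the same."""
    # Structures with clearly different energies are never duplicates.
    if abs(energy_a - energy_b) > dE:
        return False
    d_a, d_b = sorted_distances(pos_a), sorted_distances(pos_b)
    diffs = [abs(x - y) for x, y in zip(d_a, d_b)]
    # Cumulative difference, normalized by the mean of the two distance sums.
    cum_diff = 2.0 * sum(diffs) / (sum(d_a) + sum(d_b))
    return cum_diff <= cum_diff_max and max(diffs) <= max_diff_max


# Hypothetical four-atom structures: a corner cluster, a shifted copy, a chain.
cluster = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
shifted = [(x + 2.0, y - 1.0, z + 0.5) for x, y, z in cluster]
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
```

Because only distances enter the comparison, a rigidly translated copy such as `shifted` is recognized as the same structure as `cluster`, while `chain` is not.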

In [16]:
pairing = CutAndSplicePairing(slab, n_to_optimize, blmin)
mutations = OperationSelector([1., 1., 1.],
                              [MirrorMutation(blmin, n_to_optimize),
                               RattleMutation(blmin, n_to_optimize),
                               PermutationMutation(n_to_optimize)])
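
The first argument of `OperationSelector` is a list of weights, one per operator. Assuming these act as relative probabilities, the selection step can be sketched with the standard library as follows; `select_operation` and the string labels are hypothetical and stand in for the mutation objects.

```python
import random
from collections import Counter


def select_operation(weights, operations, rng):
    """Pick one operation with probability proportional to its weight."""
    return rng.choices(operations, weights=weights, k=1)[0]


rng = random.Random(0)
labels = ["mirror", "rattle", "permutation"]

# Equal weights: each mutation is chosen about one third of the time.
counts = Counter(select_operation([1.0, 1.0, 1.0], labels, rng)
                 for _ in range(300))
```

Changing the weights to, for example, `[2., 1., 1.]` would make the first operator twice as likely as each of the others without any other change to the script.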

Relax all unrelaxed structures (e.g. the starting population)

In [17]:
while da.get_number_of_unrelaxed_candidates() > 0:
    a = da.get_an_unrelaxed_candidate()
    a.set_calculator(EMT())
    print('Relaxing starting candidate {0}'.format(a.info['confid']))
    dyn = BFGS(a, trajectory=None, logfile=None)
    dyn.run(fmax=0.05, steps=100)
    a.info['key_value_pairs']['raw_score'] = -a.get_potential_energy()
    da.add_relaxed_step(a)

Relaxing starting candidate 2
Relaxing starting candidate 3
Relaxing starting candidate 4
Relaxing starting candidate 5
Relaxing starting candidate 6
Relaxing starting candidate 7
Relaxing starting candidate 8
Relaxing starting candidate 9
Relaxing starting candidate 10
Relaxing starting candidate 11
Relaxing starting candidate 12
Relaxing starting candidate 13
Relaxing starting candidate 14
Relaxing starting candidate 15
Relaxing starting candidate 16
Relaxing starting candidate 17
Relaxing starting candidate 18
Relaxing starting candidate 19
Relaxing starting candidate 20
Relaxing starting candidate 21


Create the population

In [18]:
population = Population(data_connection=da,
                        population_size=population_size,
                        comparator=comp)

Test `n_to_test` new candidates

In [19]:
for i in range(n_to_test):
    print('Now starting configuration number {0}'.format(i))
    a1, a2 = population.get_two_candidates()
    a3, desc = pairing.get_new_individual([a1, a2])
    if a3 is None:
        continue
    da.add_unrelaxed_candidate(a3, description=desc)

    # Check if we want to do a mutation
    if random() < mutation_probability:
        a3_mut, desc = mutations.get_new_individual([a3])
        if a3_mut is not None:
            da.add_unrelaxed_step(a3_mut, desc)
            a3 = a3_mut
        
    # Relax the new candidate
    a3.set_calculator(EMT())
    dyn = BFGS(a3, trajectory=None, logfile=None)
    dyn.run(fmax=0.05, steps=100)
    a3.info['key_value_pairs']['raw_score'] = -a3.get_potential_energy()
    da.add_relaxed_step(a3)
    population.update()

Now starting configuration number 0
Now starting configuration number 1
Now starting configuration number 2
Now starting configuration number 3
Now starting configuration number 4
Now starting configuration number 5
Now starting configuration number 6
Now starting configuration number 7
Now starting configuration number 8
Now starting configuration number 9
Now starting configuration number 10
Now starting configuration number 11
Now starting configuration number 12
Now starting configuration number 13
Now starting configuration number 14
Now starting configuration number 15
Now starting configuration number 16
Now starting configuration number 17
Now starting configuration number 18
Now starting configuration number 19


In [20]:
write('all_candidates.traj', da.get_all_relaxed_candidates())

The above script proposes and locally relaxes up to 20 new candidates (iterations in which the pairing fails are skipped). To speed up the execution of this sample, the local relaxations are limited to 100 steps; this restriction should not be set in a real application. Note that it is important to set the `raw_score`, as it is the quantity being optimized (maximized). It is stored as an entry in the `atoms.info['key_value_pairs']` dictionary.
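
Since the GA maximizes `raw_score` and the script sets it to the negative potential energy, the best candidate is the one with the lowest energy. A minimal sketch of that sign convention, using made-up energies and a hypothetical helper name:

```python
def best_candidate(energies):
    """Return the index of the candidate with the highest raw_score (-energy)."""
    raw_scores = [-energy for energy in energies]
    return max(range(len(raw_scores)), key=raw_scores.__getitem__)


# Hypothetical potential energies in eV for three relaxed candidates.
toy_energies = [-3.2, -4.1, -3.9]
best_index = best_candidate(toy_energies)
```

Here the second candidate wins because -4.1 eV is the lowest energy and therefore gives the highest `raw_score`. Any other quantity can be optimized the same way, as long as larger values of `raw_score` mean better candidates.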

The GA progress can be monitored by running the tool `ase/ga/tools/get_all_candidates` in the same folder as the GA. This creates a trajectory file `all_candidates.traj` which includes all locally relaxed candidates the GA has tried. The tool can be run while the main script is running, because the ase.db database is updated as the GA progresses.