# Genetic algorithm search for stable FCC alloys
https://wiki.fysik.dtu.dk/ase/tutorials/ga/ga_fcc_alloys.html

In this tutorial we will emulate an older paper and determine the most stable FCC alloy using a genetic algorithm. Since this is only a tutorial, we will limit the phase space to the elements supported by the EMT potential.

# Basic outline of the search

* Choose the phase space of your problem. Is the number of possible individuals large enough to rule out a full screening, and is the fitness function too discontinuous for a traditional derivative-based optimization? If so, continue.
* Choose model structures and calculate references in those structures. Put the results somewhere accessible to a script initiated by the genetic algorithm.
* Choose suitable parameters such as the population size (a general rule of thumb is $\log_2(N) <$ population size $< 2\log_2(N)$, where $N$ is the size of the phase space), convergence criteria, etc.
* Create the initial population.
* Choose procreation operators, i.e. how should offspring be produced. New operators can easily be created by modifying the existing operators.
* Run the algorithm.

Here we would like to predict the most stable fcc alloys. In this tutorial only the EMT calculator (`ase.calculators.emt`) is available, so we are limited to the metals it supports: Al, Ni, Cu, Pd, Ag, Pt and Au. We limit ourselves to at most 4 different metals in one structure (the model structure has four sites), which gives only $7^4=2401$ candidates in the phase space. Symmetry would reduce this number further, but the size is well suited to a tutorial.
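The counting above and the population-size rule of thumb from the outline can be checked with a few lines of plain Python. This is a small sketch: the count of compositions assumes that only the composition matters, i.e. that every ordering of the same four metals is symmetry-equivalent, which is one way symmetry could shrink the phase space and is stated here as an assumption.

```python
import itertools
import math

metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']
n_sites = 4  # atoms in the fcc model structure

# Every ordered assignment of one metal to each of the four sites
ordered = list(itertools.product(metals, repeat=n_sites))
N = len(ordered)  # 7**4

# Assumption: site order is irrelevant, so only compositions are counted
compositions = list(itertools.combinations_with_replacement(metals, n_sites))

# Rule of thumb from the outline: log2(N) < population size < 2 log2(N)
low, high = math.log2(N), 2 * math.log2(N)

print(N, len(compositions))                  # 2401 210
print(f'{low:.1f} < pop size < {high:.1f}')  # 11.2 < pop size < 22.5
```

For comparison, the run below uses a population of 10, just under the lower end of this range; the rule is only a guideline.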

For a real application of the algorithm it is necessary to use a more sophisticated calculator. In that case each individual calculation is performed on a cluster by submitting it to a queuing system; how this is achieved is covered in the tutorial Optimization with a Genetic Algorithm.

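Candidates are ranked by their heat of formation $\Delta H_f$, defined for an alloy $ABC_2$ formed in the reaction $A + B + 2C \rightarrow ABC_2$ as $\Delta H_f = E_{ABC_2} - E_A - E_B - 2E_C$, where $E_A$, $E_B$ and $E_C$ are the energies of the pure metals in the fcc reference structure. A lower (more negative) $\Delta H_f$ means a more stable alloy.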
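The definition above can be evaluated per atom, as the relaxation function later in this tutorial does with `hof = (ef - ei) / len(atoms)`. The sketch below is a minimal, library-free version; the helper name `heat_of_formation_per_atom` and all energies in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
import math


def heat_of_formation_per_atom(e_alloy, symbols, e_ref_per_atom):
    """Heat of formation per atom.

    e_alloy        -- total energy of the alloy cell
    symbols        -- chemical symbols of the atoms in the cell
    e_ref_per_atom -- energy per atom of each pure metal in its reference structure
    """
    counts = Counter(symbols)
    # Sum of reference energies, e.g. E_A + E_B + 2 E_C for ABC2
    e_refs = sum(n * e_ref_per_atom[s] for s, n in counts.items())
    return (e_alloy - e_refs) / len(symbols)


# Hypothetical numbers for an ABC2 cell: references sum to -9.0 eV, alloy is -9.4 eV
refs = {'A': -1.0, 'B': -2.0, 'C': -3.0}
hof = heat_of_formation_per_atom(-9.4, ['A', 'B', 'C', 'C'], refs)
raw_score = -hof  # the GA maximizes the raw score, so a lower hof scores higher
print(round(hof, 3), round(raw_score, 3))  # -0.1 0.1
```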

# Setting up reference database

Now we need to set up a database in which reference calculations can be stored. This can either be a central database server, where keywords distinguish between different references, or a separate dedicated database for each type of reference calculation.

In the following script, ga_fcc_references.py, we store the references in the database file refs.db. Our model structure is fcc, which is created with `ase.lattice.cubic.FaceCenteredCubic()`. We perform a volume relaxation to find the optimal lattice constant and the lowest energy, which we save in the database as key-value pairs for quick retrieval.

In [1]:
import numpy as np

In [2]:
from ase.lattice.cubic import FaceCenteredCubic
from ase.calculators.emt import EMT
from ase.eos import EquationOfState
from ase.db import connect

In [3]:
db = connect('refs.db')

In [4]:
metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']
for m in metals:
    # Conventional 4-atom fcc cell with ASE's default lattice constant for m
    atoms = FaceCenteredCubic(m)
    atoms.calc = EMT()
    a = atoms.cell[0][0]

    # Sample nine lattice constants within +/-5 % of the starting value
    eps = 0.05
    volumes = (a * np.linspace(1 - eps, 1 + eps, 9))**3
    energies = []
    for v in volumes:
        atoms.set_cell([v**(1. / 3)] * 3, scale_atoms=True)
        energies.append(atoms.get_potential_energy())

    # Fit an equation of state to find the equilibrium volume
    eos = EquationOfState(volumes, energies)
    v1, e1, B = eos.fit()

    # Recompute the energy at the equilibrium lattice constant
    atoms.set_cell([v1**(1. / 3)] * 3, scale_atoms=True)
    ef = atoms.get_potential_energy()

    db.write(atoms, metal=m,
             latticeconstant=v1**(1. / 3),
             energy_per_atom=ef / len(atoms))
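The `EquationOfState` fit above is what turns nine single-point energies into an equilibrium lattice constant. As a hedged illustration of the idea only, the sketch below swaps in a plain parabola fitted with `numpy.polyfit` (ASE's `EquationOfState` uses a more physically motivated equation of state by default) and replaces EMT with a hypothetical model energy curve, `model_energy`, so that it runs without ASE.

```python
import numpy as np


def fit_parabola_minimum(volumes, energies):
    """Fit E(V) = c2*V**2 + c1*V + c0 and return (volume, energy) at the minimum."""
    c2, c1, c0 = np.polyfit(volumes, energies, 2)
    if c2 <= 0:
        raise ValueError('energies do not curve upwards; widen or recentre the volume range')
    v_min = -c1 / (2 * c2)
    e_min = np.polyval([c2, c1, c0], v_min)
    return v_min, e_min


# Hypothetical energy curve with its minimum at a lattice constant of 4.0
a_true = 4.0


def model_energy(v):
    return -3.0 + 0.01 * (v - a_true**3)**2


# Same sampling scheme as the reference script, starting from a slightly wrong guess
a_guess = 4.1
eps = 0.05
volumes = (a_guess * np.linspace(1 - eps, 1 + eps, 9))**3
energies = [model_energy(v) for v in volumes]

v1, e1 = fit_parabola_minimum(volumes, energies)
latticeconstant = v1**(1. / 3)
print(round(latticeconstant, 4), round(e1, 4))
```

A parabola is only reasonable close to the minimum, which is one reason to keep the sampled range narrow (here +/-5 % in lattice constant).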

# Initial population

We choose a population size of 10 individuals and create the initial population by randomly selecting four elements for each starting individual.

In [5]:
import random

In [6]:
from ase import Atoms
from ase.ga.data import PrepareDB

In [7]:
metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']

In [8]:
population_size = 10

Create database

In [9]:
db = PrepareDB('fcc_alloys.db',
               population_size=population_size,
               metals=metals)

Create starting population

In [10]:
for i in range(population_size):
    atoms_string = [random.choice(metals) for _ in range(4)]
    db.add_unrelaxed_candidate(Atoms(atoms_string),
                               atoms_string=''.join(atoms_string))
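The loop above draws each starting individual independently, so two of them can in principle be identical, and the population changes on every run. A minimal sketch of one way to get a reproducible, duplicate-free start is shown below; the helper `random_unique_individuals` and the `seed` value are hypothetical additions, not part of the tutorial or of ASE.

```python
import random

metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']
population_size = 10


def random_unique_individuals(metals, n_individuals, n_sites=4, seed=None):
    """Return n_individuals distinct lists of n_sites randomly chosen metals."""
    if n_individuals > len(metals) ** n_sites:
        raise ValueError('more individuals requested than the phase space holds')
    rng = random.Random(seed)  # local, seeded generator: same seed, same population
    seen = set()
    individuals = []
    while len(individuals) < n_individuals:
        symbols = tuple(rng.choice(metals) for _ in range(n_sites))
        if symbols not in seen:  # reject exact repeats of an earlier draw
            seen.add(symbols)
            individuals.append(list(symbols))
    return individuals


start = random_unique_individuals(metals, population_size, seed=42)
print([''.join(s) for s in start])
```

Each list in `start` could take the place of `atoms_string` in the loop above.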

Note how we add the population size and metals as extra key-value pairs when we create the database fcc_alloys.db. We can then retrieve these parameters later when running the main script to avoid having to input the same parameters twice.

We can study our initial population by running the following on the command line:

In [11]:
! ase db fcc_alloys.db -c +atoms_string

id|age|user  |formula |pbc|charge|   mass|atoms_string
 1| 2s|jovyan|        |FFF| 0.000|  0.000|            
 2| 2s|jovyan|AgAlNiPt|FFF| 0.000|388.627|NiAlPtAg    
 3| 2s|jovyan|AgAlPdPt|FFF| 0.000|436.354|AlPtAgPd    
 4| 2s|jovyan|CuNiPd2 |FFF| 0.000|335.079|NiCuPdPd    
 5| 2s|jovyan|AgAu3   |FFF| 0.000|698.768|AgAuAuAu    
 6| 2s|jovyan|Al2AuPt |FFF| 0.000|446.014|AlAuAlPt    
 7| 2s|jovyan|AgAlCuPt|FFF| 0.000|393.480|CuPtAlAg    
 8| 2s|jovyan|Al2CuPt |FFF| 0.000|312.593|CuAlAlPt    
 9| 2s|jovyan|AgCu2Pt |FFF| 0.000|430.044|PtAgCuCu    
10| 1s|jovyan|AgAuCuPd|FFF| 0.000|474.801|AgCuPdAu    
11| 1s|jovyan|Ag2AlPt |FFF| 0.000|437.802|AgAgPtAl    
Rows: 11
Keys: atoms_string, extinct, gaid, generation, origin, relaxed, simulation_cell


The key `atoms_string` determines the order in which the elements are placed into the model structure, so an individual is fully described by its `atoms_string` alone.
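Because `atoms_string` is just the chemical symbols concatenated, it can be split back into a list, for example to rebuild the structure with `set_chemical_symbols` as the relaxation function does. A small sketch, assuming every symbol is one capital letter optionally followed by one lowercase letter (true for the metals used here); the helper name `split_atoms_string` is hypothetical.

```python
import re


def split_atoms_string(atoms_string):
    """Split a string such as 'NiCuPdPd' into ['Ni', 'Cu', 'Pd', 'Pd']."""
    symbols = re.findall(r'[A-Z][a-z]?', atoms_string)
    # Round-trip check: reject strings that are not purely element symbols
    if ''.join(symbols) != atoms_string:
        raise ValueError(f'cannot parse {atoms_string!r} into element symbols')
    return symbols


print(split_atoms_string('NiCuPdPd'))  # ['Ni', 'Cu', 'Pd', 'Pd']
```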

# Run the algorithm
The following script runs the algorithm; it is also available as `ga_fcc_alloys_main.py`. In that standalone script the relaxation function is imported from an external file, `ga_fcc_alloys_relax.py`; in this notebook it is defined inline.

In [12]:
import numpy as np

In [13]:
from ase.lattice.cubic import FaceCenteredCubic
from ase.calculators.emt import EMT
from ase.eos import EquationOfState
from ase.db import connect

In [14]:
def relax(input_atoms, ref_db):
    atoms_string = input_atoms.get_chemical_symbols()

    # Open connection to the database with reference data
    db = connect(ref_db)

    # Load our model structure which is just FCC
    atoms = FaceCenteredCubic('X', latticeconstant=1.)
    atoms.set_chemical_symbols(atoms_string)

    # Compute the average lattice constant of the metals in this individual
    # and the sum of energies of the constituent metals in the fcc lattice
    # we will need this for calculating the heat of formation
    a = 0
    ei = 0
    for m in set(atoms_string):
        dct = db.get(metal=m)
        count = atoms_string.count(m)
        a += count * dct.latticeconstant
        ei += count * dct.energy_per_atom
    a /= len(atoms_string)
    atoms.set_cell([a, a, a], scale_atoms=True)

    # Since calculations are extremely fast with EMT we can also do a volume
    # relaxation
    atoms.calc = EMT()
    eps = 0.05
    volumes = (a * np.linspace(1 - eps, 1 + eps, 9))**3
    energies = []
    for v in volumes:
        atoms.set_cell([v**(1. / 3)] * 3, scale_atoms=True)
        energies.append(atoms.get_potential_energy())

    eos = EquationOfState(volumes, energies)
    v1, ef, B = eos.fit()
    latticeconstant = v1**(1. / 3)

    # Heat of formation per atom: subtract the reference energies ei from ef
    hof = (ef - ei) / len(atoms)

    # Place the calculated parameters in the info dictionary of the
    # input_atoms object
    input_atoms.info['key_value_pairs']['hof'] = hof
    
    # Raw score must always be set; the GA maximizes it, so use -hof
    # Use one of the following two; they are equivalent
    input_atoms.info['key_value_pairs']['raw_score'] = -hof
    # set_raw_score(input_atoms, -hof)  # requires: from ase.ga import set_raw_score
    
    input_atoms.info['key_value_pairs']['latticeconstant'] = latticeconstant

    # Setting the atoms_string directly for easier analysis
    atoms_string = ''.join(input_atoms.get_chemical_symbols())
    input_atoms.info['key_value_pairs']['atoms_string'] = atoms_string

In [15]:
from ase.ga.data import DataConnection
from ase.ga.element_mutations import RandomElementMutation
from ase.ga.element_crossovers import OnePointElementCrossover
from ase.ga.offspring_creator import OperationSelector
from ase.ga.population import Population
from ase.ga.convergence import GenerationRepetitionConvergence

Specify the number of generations this script will run

In [16]:
num_gens = 40

In [17]:
db = DataConnection('fcc_alloys.db')
ref_db = 'refs.db'

Retrieve saved parameters

In [18]:
population_size = db.get_param('population_size')
metals = db.get_param('metals')

Specify the procreation operators for the algorithm. Try experimenting with the mutation operators that move to nearby places in the periodic table.

In [19]:
oclist = ([1, 1], [RandomElementMutation(metals),
                   OnePointElementCrossover(metals)])
operation_selector = OperationSelector(*oclist)
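To make the operator list concrete, here is a conceptual sketch of what a one-point crossover, a random element mutation and a weighted operator choice do to lists of element symbols. It is written for illustration and is not ASE's implementation; the details of `RandomElementMutation` and `OnePointElementCrossover` may differ, and all function names below are hypothetical.

```python
import random

metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']


def one_point_crossover(parent1, parent2, rng):
    """Child takes the sites before a random cut from parent1 and the rest from parent2."""
    cut = rng.randint(1, len(parent1) - 1)  # cut strictly inside, so both parents contribute
    return parent1[:cut] + parent2[cut:]


def random_element_mutation(parent, element_pool, rng):
    """Child equals the parent except that one random site holds a different element."""
    child = list(parent)
    site = rng.randrange(len(child))
    child[site] = rng.choice([m for m in element_pool if m != child[site]])
    return child


def pick_operator(weights, operators, rng):
    """Choose an operator with probability proportional to its weight."""
    return rng.choices(operators, weights=weights, k=1)[0]


# Equal weights, mirroring oclist = ([1, 1], [mutation, crossover])
operators = [
    lambda parents, rng: random_element_mutation(parents[0], metals, rng),
    lambda parents, rng: one_point_crossover(parents[0], parents[1], rng),
]

rng = random.Random(0)
p1, p2 = ['Ni', 'Cu', 'Pd', 'Pd'], ['Al', 'Pt', 'Ag', 'Au']
op = pick_operator([1, 1], operators, rng)
child = op([p1, p2], rng)
print('-'.join(child))
```

Raising one weight (for example `[2, 1]`) makes that operator proportionally more likely to be picked.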

Pass parameters to the population instance

In [20]:
pop = Population(data_connection=db,
                 population_size=population_size)

Since this run forms generations, we can set a convergence criterion based on generations.

In [21]:
cc = GenerationRepetitionConvergence(pop, 3)
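`GenerationRepetitionConvergence(pop, 3)` stops the run once the population has stagnated over a number of generations. The class below, `RepeatedPopulationConvergence`, is a hypothetical, simplified stand-in written for illustration, not ASE's implementation, and its exact rule may differ: it records a snapshot of the population after every update and reports convergence once the last `n_generations` updates have left it unchanged.

```python
class RepeatedPopulationConvergence:
    """Converged once the last n_generations updates left the population unchanged."""

    def __init__(self, n_generations):
        self.n_generations = n_generations
        self.history = []

    def record(self, population):
        # Order within a population does not matter, so store a sorted snapshot
        self.history.append(tuple(sorted(population)))

    def converged(self):
        # n unchanged updates need n + 1 identical snapshots
        if len(self.history) < self.n_generations + 1:
            return False
        recent = self.history[-(self.n_generations + 1):]
        return all(snapshot == recent[0] for snapshot in recent)


cc_sketch = RepeatedPopulationConvergence(3)
for population in (['PtNiNiNi', 'AlNiNiNi'], ['AlNiNiNi', 'PtNiNiNi'],
                   ['PtNiNiNi', 'AlNiNiNi'], ['PtNiNiNi', 'AlNiNiNi']):
    cc_sketch.record(population)
print(cc_sketch.converged())  # True
```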

Relax the starting population

In [22]:
while db.get_number_of_unrelaxed_candidates() > 0:
    a = db.get_an_unrelaxed_candidate()
    relax(a, ref_db)
    db.add_relaxed_step(a)
pop.update()

Run the algorithm

In [23]:
for _ in range(num_gens):
    if cc.converged():
        print('converged')
        break
    for i in range(population_size):
        a1, a2 = pop.get_two_candidates(with_history=False)
        op = operation_selector.get_operator()
        a3, desc = op.get_new_individual([a1, a2])

        db.add_unrelaxed_candidate(a3, description=desc)

        relax(a3, ref_db)
        db.add_relaxed_step(a3)

    pop.update()

    # Print the current population to monitor the evolution
    print(['-'.join(p.get_chemical_symbols()) for p in pop.pop])

['Ni-Cu-Pt-Pd', 'Pt-Ag-Cu-Cu', 'Ag-Au-Au-Au', 'Al-Au-Au-Au', 'Pt-Ag-Pt-Cu', 'Ni-Cu-Pd-Pd', 'Pt-Ag-Pd-Cu', 'Ag-Cu-Pd-Au', 'Al-Au-Al-Pd', 'Ag-Au-Al-Pt']
['Ni-Cu-Pt-Pd', 'Pt-Ag-Cu-Cu', 'Ag-Au-Au-Au', 'Al-Au-Au-Au', 'Pt-Ag-Pt-Cu', 'Ni-Cu-Pd-Pd', 'Pt-Ag-Pd-Cu', 'Ag-Cu-Pt-Pd', 'Ag-Au-Cu-Cu', 'Ag-Cu-Pd-Au']
['Pt-Cu-Pt-Pd', 'Ni-Cu-Pt-Pd', 'Pt-Ag-Cu-Cu', 'Ag-Au-Au-Au', 'Al-Au-Au-Au', 'Pt-Ag-Pt-Cu', 'Ni-Pd-Pt-Pd', 'Ni-Cu-Pd-Cu', 'Ag-Au-Au-Pd', 'Ni-Cu-Pd-Pd']
['Pt-Cu-Pt-Pd', 'Ni-Cu-Pt-Pd', 'Pt-Ag-Cu-Cu', 'Ag-Au-Au-Au', 'Al-Au-Au-Au', 'Pt-Ag-Pt-Cu', 'Ni-Pd-Pt-Pd', 'Ni-Cu-Pt-Au', 'Ni-Cu-Pd-Cu', 'Ag-Cu-Au-Au']
['Pt-Cu-Pt-Ni', 'Ni-Cu-Pt-Pt', 'Pt-Cu-Pt-Pd', 'Ni-Cu-Pt-Pd', 'Ag-Al-Au-Au', 'Pt-Ag-Cu-Cu', 'Ag-Au-Au-Au', 'Al-Au-Au-Au', 'Pt-Ag-Pt-Cu', 'Ni-Pd-Pt-Pd']
['Pt-Cu-Pt-Ni', 'Ni-Cu-Pt-Pt', 'Ni-Cu-Pt-Ni', 'Pt-Cu-Pt-Pt', 'Pt-Cu-Pt-Pd', 'Ni-Cu-Pt-Pd', 'Ag-Al-Au-Au', 'Pt-Ag-Cu-Cu', 'Ag-Au-Au-Au', 'Al-Au-Au-Au']
['Pt-Cu-Pt-Cu', 'Pt-Cu-Pt-Ni', 'Ni-Cu-Pt-Pt', 'Ni-Cu-Pt-Ni', 'Pt-Cu-Pt-Pt', 'Pt-Cu-Pt-Pd', 'Ni

In this script we run a generational GA, as opposed to the pool GA outlined in Optimization with a Genetic Algorithm. This is achieved with two for-loops. The inner loop runs as many times as the population size, which corresponds to one generation; the outer loop runs as many generations as specified in `num_gens`, or until the convergence criterion is met. The function `pop.update()` is called after the inner loop has finished, so individuals are only added to the population after a whole generation has been calculated.
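The structure of a generational GA can be seen in a toy, library-free version of the main loop. This is a sketch under stated assumptions: `toy_raw_score` is a hypothetical fitness standing in for `-hof`, parents are picked uniformly (ASE's `Population` favours fitter candidates), and `update` simply keeps the best distinct individuals, which only approximates what `pop.update()` does.

```python
import random

metals = ['Al', 'Au', 'Cu', 'Ag', 'Pd', 'Pt', 'Ni']


def toy_raw_score(symbols):
    # Hypothetical fitness standing in for -hof: rewards Pt and Ni, penalizes Al
    return symbols.count('Pt') + 0.5 * symbols.count('Ni') - symbols.count('Al')


def make_child(parents, rng):
    p1, p2 = parents
    if rng.random() < 0.5:  # equal operator weights, like oclist = ([1, 1], ...)
        cut = rng.randint(1, len(p1) - 1)
        return p1[:cut] + p2[cut:]
    child = list(p1)
    site = rng.randrange(len(child))
    child[site] = rng.choice([m for m in metals if m != child[site]])
    return child


def update(population, offspring, size):
    """Keep the `size` best distinct individuals from parents plus offspring."""
    unique = {tuple(ind): ind for ind in population + offspring}
    ranked = sorted(unique.values(), key=toy_raw_score, reverse=True)
    return ranked[:size]


def run_generational_ga(population_size=10, num_gens=40, seed=0):
    rng = random.Random(seed)
    initial = [[rng.choice(metals) for _ in range(4)] for _ in range(population_size)]
    population = update([], initial, population_size)
    best_per_generation = [toy_raw_score(population[0])]
    for _ in range(num_gens):
        offspring = []
        for _ in range(population_size):  # one generation = population_size new individuals
            parents = rng.choices(population, k=2)
            offspring.append(make_child(parents, rng))
        # As with pop.update(), new individuals only enter after the whole generation
        population = update(population, offspring, population_size)
        best_per_generation.append(toy_raw_score(population[0]))
    return population, best_per_generation


population, best = run_generational_ga()
print('-'.join(population[0]), best[-1])
```

Because the parents compete with their offspring in `update`, the best score can never decrease from one generation to the next in this sketch.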

After each generation the population is printed to the screen so we can follow the evolution. The calculated individuals are continuously added to `fcc_alloys.db`, so we can evaluate them directly from the command line (in another shell if the GA is still running):

In [24]:
! ase db fcc_alloys.db -c +atoms_string,raw_score,generation,hof -s raw_score

 id| age|user  |formula |pbc|charge|   mass|atoms_string|raw_score|generation|  hof
 73|106s|jovyan|AlCuNiPd|FFF| 0.000|255.641|NiCuAlPd    |   -0.139|         3|0.139
 12|146s|jovyan|AgAlNiPt|FFF| 0.000|388.627|NiAlPtAg    |   -0.112|         0|0.112
235|  6s|jovyan|AgCuNi2 |FFF| 0.000|288.801|NiCuAgNi    |   -0.105|        11|0.105
211| 21s|jovyan|AlCuNiPt|FFF| 0.000|344.305|NiCuPtAl    |   -0.098|        10|0.098
151| 58s|jovyan|AlCuNiPt|FFF| 0.000|344.305|AlCuPtNi    |   -0.098|         7|0.098
231|  8s|jovyan|AlCuNiPt|FFF| 0.000|344.305|PtCuAlNi    |   -0.098|        11|0.098
189| 34s|jovyan|AlAu2Ni |FFF| 0.000|479.608|NiAlAuAu    |   -0.093|         9|0.093
 79|103s|jovyan|AgAlCu2 |FFF| 0.000|261.942|AlAgCuCu    |   -0.079|         3|0.079
 18|140s|jovyan|Al2CuPt |FFF| 0.000|312.593|CuAlAlPt    |   -0.075|         0|0.075
 29|132s|jovyan|Au3Ni   |FFF| 0.000|649.593|NiAuAuAu    |   -0.068|         1|0.068
 93| 95s|jovyan|AgAu2Ni |FFF| 0.000|560.495|AgAuAuNi    |   -0.06

Note: When reading the database using `ase db`, it might be necessary to increase the number of shown entries, e.g. `ase db fcc_alloys.db --limit N`, where `N` is the number of entries to show (by default only the first 20 entries are shown; `--limit 0` shows all). For further information use `ase db --help` or consult the ase db manual.

The relaxation script is naturally similar to the script we used to calculate the references.

Note that the global optimum is $PtNi_3$ with a heat of formation of -0.12 eV, whereas the second worst alloy is $AlNi_3$ with a heat of formation of 0.26 eV. This result is in complete contrast to the conclusion obtained by Johannesson et al., where $AlNi_3$ is the most stable alloy within the phase space chosen here. Obviously there is a limit to the predictive power of EMT!