# Starting a new EvoMol optimization where a previous one stopped

In this tutorial, we show how it is possible to start a new optimization procedure using as initial population the molecules that were in the population at the end of a previous EvoMol run. We also cover the case where we build a cache in order to avoid another calculation of DFT-dependent properties.

## General case

### Base optimization

Let's execute our base optimization instance. Here we maximize the HOMO energy value for 10 optimization steps.

In [1]:
from evomol import run_model

model_path = "./data/homo_max_1s"

In [None]:
run_model({
    "obj_function": "homo",
    "io_parameters":{
        "model_path": model_path
    },
    "optimization_parameters":{
        "max_steps": 10
    }
})

### New execution starting from previous experiment's data

Now we would want to know what would have happened if the previous experiment had been running for a longer time. We start a new experiment that runs for 20 additional steps, using as initial population the final population from previous experiment.

We use for this the *smiles_list_init_path* parameter, that can accept the path to a file that contain a list of SMILES, but also in our case the path to the population file (pop.csv) of a previous EvoMol experiment.

Note that here **the objective function values of the initial population must be computed again**, which is unfortunate since the objective function depends on costly DFT calculations. This particular issue is addressed in next cell

In [None]:
from os.path import join

new_model_path = "./data/homo_max_2s"

run_model({
    "obj_function": "homo",
    "io_parameters":{
        "model_path": new_model_path,
        "smiles_list_init_path": join(model_path, "pop.csv")
    },
    "optimization_parameters":{
        "max_steps": 20
    }
})

## Using a cache to avoid another computation of the electronic properties

Now, we want to launch a final optimization starting from the population of the previous experiment. This time, we start by building a cache of all electronic properties values that are stored in the *pop.csv* file.

### Building the cache

We define the following function, that depends on a list of electronic properties to be extracted. Here we only need the HOMO ("homo") value. In other contexts, the other properties could be also retrieved ("lumo" for LUMO, "gap" for HOMO/LUMO gap and "homo-1" for HOMO-1).

In [7]:
import pandas as pd
import json 

def create_cache_OPT(model_path, output_cache_path, properties_list):
    
    # Loading pop.csv file
    df = pd.read_csv(join(model_path, "pop.csv"))
    
    cache_dict = {}
    
    # Iterating over all SMILES
    for i, smi in enumerate(df["smiles"]):
        curr_dict = {}
        
        # Iterating over all electronic properties to be cached
        for prop in properties_list:
            curr_dict[prop] = df[prop][i]
        
        # Saving cached values for current SMILES
        cache_dict[smi] = curr_dict
    
    # Writing the cache data to a JSON file
    with open(output_cache_path, "w") as f:
        json.dump(cache_dict, f)



In [8]:
cache_path = "./data/homo_max_2s_cache.json"

create_cache_OPT(new_model_path, cache_path, ["homo"])

### Running the final experiment using the cache

Now we run the final experiment performing 10 additional optimization steps. This time, we use the cache we built ("dft_cache_files" parameter) in order not to compute the properties of the initial population again.

In [9]:
final_model_path = "./data/homo_max_3s"

run_model({
    "obj_function": "homo",
    "io_parameters":{
        "model_path": new_model_path,
        "smiles_list_init_path": join(model_path, "pop.csv"),
        "dft_cache_files": [cache_path]
    },
    "optimization_parameters":{
        "max_steps": 10
    }
})

DFT MM obabel_mmff94
256 molecules in cache
SYMBOLS LIST : ['C', 'N', 'O', 'F', 'P', 'S', 'Cl', 'Br']
objective_calls
Computing scores at initialization...
Start pop algorithm
homo_mean : -7.05545
homo_med : -6.81019
homo_std : 1.19662
homo_min : -10.58115
homo_max : -4.92526
total_mean : -7.05545
total_med : -6.81019
total_std : 1.19662
total_min : -10.58115
total_max : -4.92526
new step
step : 0
best : C=CN
computing dft for CC=N
Starting OPT
Execution time OPT: 10s
There are 8 atoms and 37 MOs
computing dft for N=P
MM error (Evaluation error)
computing dft for C=NN
Starting OPT


1 molecule converted


Execution time OPT: 9s
There are 7 atoms and 35 MOs
computing dft for CN(C)F
Starting OPT


1 molecule converted


Execution time OPT: 22s
There are 10 atoms and 48 MOs
computing dft for C=NBr
MM error (Evaluation error)
computing dft for CC(N)Cl


1 molecule converted


Starting OPT
Execution time OPT: 62s
There are 10 atoms and 58 MOs
computing dft for N#N


1 molecule converted


MM error (Evaluation error)
computing dft for FN1CN1CS
Starting OPT
Execution time OPT: 62s
There are 11 atoms and 74 MOs
computing dft for OPCNPO


1 molecule converted


Starting OPT
Execution time OPT: 155s


1 molecule converted


There are 13 atoms and 88 MOs
computing dft for FN1CN1
Starting OPT
Execution time OPT: 17s
There are 7 atoms and 42 MOs
homo_mean : -6.97365
homo_med : -6.72448
homo_std : 1.13207
homo_min : -10.58115
homo_max : -4.92526
total_mean : -6.97365
total_med : -6.72448
total_std : 1.13207
total_min : -10.58115
total_max : -4.92526
new step
step : 1
best : C=CN
computing dft for C=CCNS
Starting OPT


1 molecule converted


Execution time OPT: 120s
There are 12 atoms and 69 MOs


1 molecule converted


computing dft for NCN(S)Cl
Starting OPT
Execution time OPT: 106s
There are 10 atoms and 75 MOs


1 molecule converted


computing dft for C=NCC
MM error (Evaluation error)
computing dft for CNO
Starting OPT
Execution time OPT: 10s
There are 8 atoms and 37 MOs
computing dft for 
Starting OPT


1 molecule converted
Error: segmentation violation
   rax 0000000000000000, rbx ffffffffffffffff, rcx 0000151a470d12a7
   rdx 0000151a26e89000, rsp 00007ffe9607b5e8, rbp 00007ffe9607bb60
   rsi 000000000000000b, rdi 0000000000001392, r8  0000151a4747f8c0
   r9  0000151a47c3cd80, r10 0000000000000006, r11 0000000000000206
   r12 0000000000000000, r13 0000000000000000, r14 00007ffe9607bba8
   r15 00000000000003e6
  --- traceback not available
Aborted (core dumped)


Execution time OPT: 0s
DFT error : Error during OPT for  (Evaluation error)
computing dft for N=CN
Starting OPT
Execution time OPT: 7s
There are 7 atoms and 35 MOs
computing dft for OCNPO
Starting OPT


1 molecule converted


Execution time OPT: 113s
There are 11 atoms and 67 MOs


1 molecule converted


computing dft for NCNCNF
Starting OPT
Execution time OPT: 73s
There are 14 atoms and 70 MOs
computing dft for CN(C)CF


1 molecule converted


Starting OPT
Execution time OPT: 42s
There are 13 atoms and 61 MOs
homo_mean : -6.84783
homo_med : -6.62679
homo_std : 1.10505
homo_min : -10.58115
homo_max : -4.92526
total_mean : -6.84783
total_med : -6.62679
total_std : 1.10505
total_min : -10.58115
total_max : -4.92526
new step
step : 2
best : C=CN
computing dft for C=NC
MM error (Evaluation error)
computing dft for N#N


1 molecule converted


MM error (Evaluation error)
computing dft for 
Starting OPT


Error: segmentation violation
   rax 0000000000000000, rbx ffffffffffffffff, rcx 000014d1f5ed12a7
   rdx 000014d1f5c89000, rsp 00007ffeca5456e8, rbp 00007ffeca545c60
   rsi 000000000000000b, rdi 0000000000001728, r8  000014d1f627f8c0
   r9  000014d1f6a3cd80, r10 0000000000000006, r11 0000000000000206
   r12 0000000000000000, r13 0000000000000000, r14 00007ffeca545ca8
   r15 00000000000003e6
  --- traceback not available
Aborted (core dumped)


Execution time OPT: 0s
DFT error : Error during OPT for  (Evaluation error)
computing dft for NN=CO
MM error (Evaluation error)
computing dft for NSO
Starting OPT
Execution time OPT: 20s
There are 6 atoms and 43 MOs
computing dft for NSP
Starting OPT


1 molecule converted


Execution time OPT: 47s
There are 7 atoms and 55 MOs
computing dft for N=C(N)Cl


1 molecule converted


Starting OPT
Execution time OPT: 15s
There are 7 atoms and 52 MOs
computing dft for NCCNCNP
Starting OPT


1 molecule converted


Execution time OPT: 190s


1 molecule converted


There are 19 atoms and 97 MOs
computing dft for OPCNPOS
Starting OPT
Execution time OPT: 264s


1 molecule converted


There are 14 atoms and 107 MOs
homo_mean : -6.77709
homo_med : -6.57509
homo_std : 1.11897
homo_min : -10.58115
homo_max : -4.92526
total_mean : -6.77709
total_med : -6.57509
total_std : 1.11897
total_min : -10.58115
total_max : -4.92526
new step
step : 3
best : C=CN
computing dft for N=CO
MM error (Evaluation error)
computing dft for N=P
MM error (Evaluation error)
computing dft for NCC1=NC1
MM error (Evaluation error)
computing dft for CCN(N)CNP
Starting OPT
Execution time OPT: 310s


1 molecule converted


There are 19 atoms and 97 MOs
computing dft for C#CCN
Starting OPT
Execution time OPT: 41s
There are 9 atoms and 46 MOs
computing dft for 
Starting OPT


1 molecule converted
Error: segmentation violation
   rax 0000000000000000, rbx ffffffffffffffff, rcx 000014bfdcb712a7
   rdx 000014bfbc929000, rsp 00007ffc6d9a32c8, rbp 00007ffc6d9a3840
   rsi 000000000000000b, rdi 0000000000001cfb, r8  000014bfdcf1f8c0
   r9  000014bfdd6dcd80, r10 0000000000000006, r11 0000000000000202
   r12 0000000000000000, r13 0000000000000000, r14 00007ffc6d9a3888
   r15 00000000000003e6
  --- traceback not available
Aborted (core dumped)


Execution time OPT: 0s
DFT error : Error during OPT for  (Evaluation error)
computing dft for CC(N)P
Starting OPT
Execution time OPT: 32s
There are 12 atoms and 62 MOs
computing dft for CC(N)N
Starting OPT


1 molecule converted


Execution time OPT: 38s
There are 12 atoms and 52 MOs
computing dft for [nH]1[nH]o1
Starting OPT


1 molecule converted


Execution time OPT: 7s
There are 5 atoms and 31 MOs
homo_mean : -6.70140
homo_med : -6.50298
homo_std : 1.09775
homo_min : -10.58115
homo_max : -4.92526
total_mean : -6.70140
total_med : -6.50298
total_std : 1.09775
total_min : -10.58115
total_max : -4.92526
new step
step : 4
best : C=CN
computing dft for CN=O
MM error (Evaluation error)
computing dft for N=P
MM error (Evaluation error)
computing dft for N#CCN


1 molecule converted


MM error (Evaluation error)
computing dft for CNCCl
Starting OPT
Execution time OPT: 80s
There are 10 atoms and 58 MOs
computing dft for NCCC(N)N


1 molecule converted


Starting OPT
Execution time OPT: 78s
There are 17 atoms and 76 MOs
computing dft for CC#N


1 molecule converted


MM error (Evaluation error)
computing dft for NCCNCP
Starting OPT
Execution time OPT: 80s
There are 17 atoms and 86 MOs
computing dft for NC(N)CBr


1 molecule converted


Starting OPT
Execution time OPT: 53s
There are 12 atoms and 79 MOs
computing dft for N#CCN


1 molecule converted


MM error (Evaluation error)
computing dft for NCC(F)CO
Starting OPT
Execution time OPT: 83s
There are 14 atoms and 70 MOs
computing dft for CCNS


1 molecule converted


Starting OPT
Execution time OPT: 51s
There are 11 atoms and 60 MOs
computing dft for 
Starting OPT


1 molecule converted
Error: segmentation violation
   rax 0000000000000000, rbx ffffffffffffffff, rcx 000014d9e86e42a7
   rdx 000014d9e849d000, rsp 00007fff44c58508, rbp 00007fff44c58a80
   rsi 000000000000000b, rdi 0000000000001e8c, r8  000014d9e8a928c0
   r9  000014d9e924fd80, r10 0000000000000006, r11 0000000000000202
   r12 0000000000000000, r13 0000000000000000, r14 00007fff44c58ac8
   r15 00000000000003e6
  --- traceback not available
Aborted (core dumped)


Execution time OPT: 0s
DFT error : Error during OPT for  (Evaluation error)
computing dft for CC(N)CP
Starting OPT
Execution time OPT: 199s


1 molecule converted


There are 15 atoms and 75 MOs
computing dft for NC(P)CO
Starting OPT
Execution time OPT: 59s
There are 13 atoms and 71 MOs
computing dft for C=NNCl


1 molecule converted


MM error (Evaluation error)
computing dft for CCCCNN
Starting OPT
Execution time OPT: 53s
There are 18 atoms and 78 MOs
homo_mean : -6.61461
homo_med : -6.39821
homo_std : 1.08380
homo_min : -10.58115
homo_max : -4.92526
total_mean : -6.61461
total_med : -6.39821
total_std : 1.08380
total_min : -10.58115
total_max : -4.92526
new step
step : 5
best : C=CN
computing dft for PCl


1 molecule converted


Starting OPT
Execution time OPT: 5s
There are 4 atoms and 42 MOs
computing dft for CCNCP
Starting OPT


1 molecule converted


Execution time OPT: 52s
There are 15 atoms and 75 MOs
computing dft for CCC(N)N


1 molecule converted


Starting OPT
Execution time OPT: 64s
There are 15 atoms and 65 MOs
computing dft for NCCCS


1 molecule converted


Starting OPT
Execution time OPT: 47s
There are 14 atoms and 73 MOs
computing dft for N#CO


1 molecule converted


Starting OPT
Execution time OPT: 5s
There are 4 atoms and 29 MOs
computing dft for NCCCNCNP
Starting OPT


1 molecule converted


Execution time OPT: 243s


1 molecule converted


There are 22 atoms and 110 MOs
homo_mean : -6.58958
homo_med : -6.39685
homo_std : 1.07951
homo_min : -10.58115
homo_max : -4.92526
total_mean : -6.58958
total_med : -6.39685
total_std : 1.07951
total_min : -10.58115
total_max : -4.92526
new step
step : 6
best : C=CN
computing dft for N#N
MM error (Evaluation error)
computing dft for 
Starting OPT


Error: segmentation violation
   rax 0000000000000000, rbx ffffffffffffffff, rcx 0000152a142012a7
   rdx 00001529f3fb9000, rsp 00007ffe5302f678, rbp 00007ffe5302fbf0
   rsi 000000000000000b, rdi 00000000000023ee, r8  0000152a145af8c0
   r9  0000152a14d6cd80, r10 0000000000000006, r11 0000000000000206
   r12 0000000000000000, r13 0000000000000000, r14 00007ffe5302fc38
   r15 00000000000003e6
  --- traceback not available
Aborted (core dumped)


Execution time OPT: 0s
DFT error : Error during OPT for  (Evaluation error)
computing dft for NC(N)S
Starting OPT
Execution time OPT: 107s
There are 10 atoms and 58 MOs


1 molecule converted


computing dft for PCCNCCCl
Starting OPT
Execution time OPT: 83s
There are 18 atoms and 105 MOs


1 molecule converted


computing dft for NCCCP(N)N
Starting OPT
Execution time OPT: 224s


1 molecule converted


There are 19 atoms and 97 MOs
computing dft for C=NCl
MM error (Evaluation error)
computing dft for C=NCC
MM error (Evaluation error)
computing dft for CCCNN
Starting OPT
Execution time OPT: 45s
There are 15 atoms and 65 MOs
computing dft for 
Starting OPT


1 molecule converted
Error: segmentation violation
   rax 0000000000000000, rbx ffffffffffffffff, rcx 000014d4599cc2a7
   rdx 000014d439785000, rsp 00007ffd5d475728, rbp 00007ffd5d475ca0
   rsi 000000000000000b, rdi 00000000000027cf, r8  000014d459d7a8c0
   r9  000014d45a537d80, r10 0000000000000006, r11 0000000000000206
   r12 0000000000000000, r13 0000000000000000, r14 00007ffd5d475ce8
   r15 00000000000003e6
  --- traceback not available
Aborted (core dumped)


Execution time OPT: 0s
DFT error : Error during OPT for  (Evaluation error)
computing dft for NCCNCN
Starting OPT
Execution time OPT: 89s
There are 17 atoms and 76 MOs
computing dft for NC1C=N1


1 molecule converted


MM error (Evaluation error)
computing dft for CN(F)CN
Starting OPT
Execution time OPT: 77s
There are 12 atoms and 59 MOs
homo_mean : -6.52383
homo_med : -6.34651
homo_std : 1.06368
homo_min : -10.58115
homo_max : -4.92526
total_mean : -6.52383
total_med : -6.34651
total_std : 1.06368
total_min : -10.58115
total_max : -4.92526
new step
step : 7
best : C=CN
computing dft for C=CNS


1 molecule converted


Starting OPT
Execution time OPT: 61s
There are 9 atoms and 56 MOs
computing dft for N=N


1 molecule converted


Starting OPT
Execution time OPT: 4s
DFT error : Different SMILES : N=N [NH][NH] (Evaluation error)
computing dft for NCN1CC1(N)Br
Starting OPT


1 molecule converted


Execution time OPT: 181s


1 molecule converted


There are 15 atoms and 99 MOs
computing dft for NCCCCNCP
Starting OPT
Execution time OPT: 115s


1 molecule converted


There are 23 atoms and 112 MOs
computing dft for NCPC(N)N
Starting OPT
Execution time OPT: 103s
There are 16 atoms and 84 MOs


1 molecule converted


computing dft for CN(C)O
Starting OPT
Execution time OPT: 21s
There are 11 atoms and 50 MOs
computing dft for NCCCBr
Starting OPT


1 molecule converted


Execution time OPT: 48s
There are 13 atoms and 81 MOs
computing dft for NCCCP(N)CN


1 molecule converted


Starting OPT
Execution time OPT: 330s


1 molecule converted


There are 22 atoms and 110 MOs
homo_mean : -6.46872
homo_med : -6.23141
homo_std : 1.04466
homo_min : -10.58115
homo_max : -4.92526
total_mean : -6.46872
total_med : -6.23141
total_std : 1.04466
total_min : -10.58115
total_max : -4.92526
new step
step : 8
best : C=CN
computing dft for C1=NC1
MM error (Evaluation error)
computing dft for N=N
Starting OPT
Execution time OPT: 4s
DFT error : Different SMILES : N=N [NH][NH] (Evaluation error)
computing dft for CNCCN
Starting OPT


1 molecule converted


Execution time OPT: 104s
There are 15 atoms and 65 MOs
computing dft for PP


1 molecule converted


Starting OPT
Execution time OPT: 7s
There are 6 atoms and 46 MOs
computing dft for C1N=N1
MM error (Evaluation error)
computing dft for PC1NCCN1P


1 molecule converted


Starting OPT
Execution time OPT: 197s


1 molecule converted


There are 17 atoms and 103 MOs
computing dft for NCCC1NC1N
Starting OPT
Execution time OPT: 87s
There are 18 atoms and 85 MOs


1 molecule converted


computing dft for N=C(N)PCN
Starting OPT


Error: segmentation violation
   rax 0000000000000000, rbx ffffffffffffffff, rcx 00001552c585d2a7
   rdx 00001552a5615000, rsp 00007ffc6ab1ebf8, rbp 00007ffc6ab1f170
   rsi 000000000000000b, rdi 0000000000002e9f, r8  00001552c5c0b8c0
   r9  00001552c63c8d80, r10 0000000000000006, r11 0000000000000202
   r12 0000000000000000, r13 0000000000000000, r14 00007ffc6ab1f1b8
   r15 00000000000003e6
  --- traceback not available
Aborted (core dumped)


Execution time OPT: 459s
DFT error : Error during OPT for N=C(N)PCN (Evaluation error)
computing dft for NCCCP(N)O
Starting OPT
Execution time OPT: 192s


1 molecule converted


There are 18 atoms and 95 MOs
computing dft for NCCP(N)CCN
Starting OPT
Execution time OPT: 260s


1 molecule converted


There are 22 atoms and 110 MOs
homo_mean : -6.43522
homo_med : -6.17127
homo_std : 1.05308
homo_min : -10.58115
homo_max : -4.92200
total_mean : -6.43522
total_med : -6.17127
total_std : 1.05308
total_min : -10.58115
total_max : -4.92200
new step
step : 9
best : CNCCN
computing dft for CN=CNC
MM error (Evaluation error)
computing dft for C=CS
Starting OPT
Execution time OPT: 11s
There are 7 atoms and 45 MOs
computing dft for N=N
Starting OPT


1 molecule converted


Execution time OPT: 4s
DFT error : Different SMILES : N=N [NH][NH] (Evaluation error)
computing dft for NCCNN
Starting OPT


1 molecule converted


Execution time OPT: 47s
There are 14 atoms and 63 MOs
computing dft for PNS
Starting OPT


1 molecule converted


Execution time OPT: 34s
There are 7 atoms and 55 MOs
computing dft for NCSN


1 molecule converted


Starting OPT
Execution time OPT: 45s
There are 10 atoms and 58 MOs
computing dft for NC1PN1CP


1 molecule converted


Starting OPT
Execution time OPT: 117s
There are 14 atoms and 90 MOs


1 molecule converted


computing dft for CC(N)C(N)(N)Br
Starting OPT
Execution time OPT: 420s


1 molecule converted


There are 17 atoms and 103 MOs
computing dft for NCCCCN
Starting OPT
Execution time OPT: 55s
There are 18 atoms and 78 MOs
homo_mean : -6.41614
homo_med : -6.13889
homo_std : 1.04769
homo_min : -10.58115
homo_max : -4.92200
total_mean : -6.41614
total_med : -6.13889
total_std : 1.04769
total_min : -10.58115
total_max : -4.92200
Stopping : stop condition reached


1 molecule converted


<evomol.popalg.PopAlg at 0x7fa7b68ad8e0>