<div class="alert alert-block alert-danger"> <b>This notebook can overwrite experiment setup information if run carelessly.</b> An exception has been added to prevent existing experiment data from being overwritten, but the notebook should still be executed with care.</div>

# This notebook is for setting up the reference.sxyz files needed to run SILVR

As currently implemented (March 2023), all experiments must be contained in a directory named "experiments", and each individual experiment must be in its own sub directory. This file structure is hard coded. Sub directories can be given any name. Examples below.

`Experiment directory format
code/experiments/exp_1/
code/experiments/exp_2/
code/experiments/test_run_for_silvr/
`
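
The layout above can be inspected with a short helper. This is a minimal sketch, not part of SILVR: the function name `list_experiments` is hypothetical, and reporting whether each sub directory already holds a `reference.sxyz` is an assumption made for illustration.

```python
from pathlib import Path


def list_experiments(root="experiments"):
    """Map each experiment sub directory name to True if it holds reference.sxyz.

    Hypothetical helper, not part of SILVR. `root` defaults to the hard-coded
    "experiments" directory described above.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # No experiments directory yet, so there is nothing to report
        return {}
    return {
        sub.name: (sub / "reference.sxyz").is_file()
        for sub in sorted(root_path.iterdir())
        if sub.is_dir()
    }
```

Calling `list_experiments()` from the `code/` directory before designing a new experiment shows which directory names are already taken.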

All SILVR experiments are defined using a file named `reference.sxyz`. This is a modified XYZ file that contains strictly no bonding information. The first line gives the total number of atoms in the reference, the second line gives the experimental setup information, and each subsequent line defines an atom's element, coordinates, and silvr rate value. The format is almost identical to XYZ, with the addition of the silvr rate column. The silvr rate can be any float `0 < rate < 1`; typically, values of `0.005` and `0.01` perform well for low and high similarity samples respectively.

The experiment setup must be written exactly in the format below.
`dummy` - number of dummy atoms to be added during generation. These atoms are not guided towards any reference coordinate, and instead are free to explore the entire generative space. Do note however that dummy atoms are not well tested and can be unstable. 

`samples` - total number of molecules to be generated by SILVR. The generator script checks how many molecule files are currently listed in the given experiment directory, and continues to generate samples until the required number of samples is exceeded. 

`comment` - Any comment to describe the experiments taking place. Avoid using colons in the comment as later code may incorrectly parse the comment!

`Example reference.sxyz file:
6
dummy:0 samples:1000 comment:Example SILVR reference.sxyz file
C 6.565000 -5.040000 26.574000 0.01
N 9.451000 -3.115000 24.169000 0.005
C 9.351000 -1.831000 23.357000 0.005
O 10.748000 -1.232000 23.404000 0.02
C 11.059000 -0.192000 24.271000 0.002
C 12.351000 0.308000 24.296000 0.01
`
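
To make the format concrete, the sketch below reads sxyz text back into Python values. It is an illustration only and not SILVR's own parser: the name `parse_sxyz` and the returned structure are assumptions. It keeps everything after `comment:` verbatim, so it tolerates colons in the comment even though other code may not.

```python
def parse_sxyz(text):
    """Parse the text of a reference.sxyz file.

    Hypothetical helper, not SILVR's own parser. Returns (setup, atoms), where
    setup is a dict with keys dummy, samples and comment, and atoms is a list
    of (element, x, y, z, rate) tuples.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n_atoms = int(lines[0])

    # Line 2 has the form "dummy:<int> samples:<int> comment:<free text>"
    before_comment, _, comment = lines[1].partition(" comment:")
    fields = dict(item.split(":", 1) for item in before_comment.split())
    setup = {
        "dummy": int(fields["dummy"]),
        "samples": int(fields["samples"]),
        "comment": comment,
    }

    atoms = []
    for line in lines[2:]:
        element, x, y, z, rate = line.split()
        atoms.append((element, float(x), float(y), float(z), float(rate)))

    if len(atoms) != n_atoms:
        raise ValueError(f"line 1 declares {n_atoms} atoms, found {len(atoms)}")
    return setup, atoms


example = """2
dummy:0 samples:1000 comment:Example SILVR reference.sxyz file
C 6.565000 -5.040000 26.574000 0.01
N 9.451000 -3.115000 24.169000 0.005
"""
setup, atoms = parse_sxyz(example)
```

Here `setup["samples"]` is `1000` and `atoms[1]` is `("N", 9.451, -3.115, 24.169, 0.005)`. The atom count check catches the common mistake of editing atom lines by hand without updating line 1.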

In [1]:
import glob
import py3Dmol
import time

all_xyz = glob.glob("mpro_ligands/Mpro-*")

def read_xyz_file(path):
    with open(path,"r") as readfile:
        string = readfile.read()
        
    return string

In [10]:
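def create_superposition(paths, output_path):
    """
    Combine several XYZ files into a single superposed XYZ file.
    Atoms are written in the order the paths are given, and the
    comment lines of the input files are joined into one comment line.
    """

    total_atoms = 0
    comments = []
    atom_lines = []

    for path in paths:
        with open(path, "r") as readfile:
            mol = readfile.readlines()
        total_atoms += int(mol[0].strip())
        comments.append(mol[1].strip())
        #Strip each atom line and skip blank lines, so a file without a trailing
        #newline cannot fuse its last atom with the first atom of the next file
        atom_lines += [line.strip() for line in mol[2:] if line.strip()]

    output_file = f"{total_atoms}\n"
    output_file += " ".join(comments) + "\n"
    output_file += "\n".join(atom_lines) + "\n"

    with open(output_path, "w") as writefile:
        writefile.write(output_file)

    return output_file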


def xyz_to_silvr(xyz_path, silvr_rate="0.005", dummy=0, samples=500, added_comment="scalar"):
    """
    Convert a reference set of coordinates (XYZ file) into sxyz text by appending
    the silvr rate column to every atom line and writing the required experiment
    information on line 2. Returns the sxyz text; nothing is written to disk.
    """

    with open(xyz_path, "r") as readfile:
        lines = readfile.readlines()

    total_atoms = int(lines[0].rstrip())
    comment = f"dummy:{dummy} samples:{samples} comment:{added_comment}"

    silvr_xyz_out = f"{total_atoms}\n"
    silvr_xyz_out += f"{comment}\n"

    for line in lines[2:]:
        fields = line.split()
        if not fields:
            #Skip blank lines so they do not become rows holding only a rate
            continue
        #str() lets silvr_rate be given as either a float or a string
        fields.append(str(silvr_rate))
        silvr_xyz_out += " ".join(fields) + "\n"

    return silvr_xyz_out.rstrip()

# Visualisation of Mpro fragments - example with all fragments

In [4]:
#To help select which fragments are interesting to combine, the fragments can be visualised below.
#Rather than passing all_xyz (list of all fragment files), pass a sliced list containing
#only the fragments of interest. SILVR has only been tested with 2 and 3 fragments. The EDM can
#only accept a maximum of 181 atoms in total.

create_superposition(all_xyz, "mpro_ligands/tmp.xyz")
view = py3Dmol.view(query="mpro_ligands/tmp.xyz")  
view.setStyle({'stick': {'color':'spectrum'}})
view.show()
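
The 181 atom limit mentioned in the comment above can be checked before any files are combined. The following is a minimal sketch under stated assumptions: the names `MAX_EDM_ATOMS`, `count_reference_atoms` and `check_atom_budget` are hypothetical, and only reference atoms are counted, because the notebook does not say whether dummy atoms count towards the limit.

```python
MAX_EDM_ATOMS = 181  # limit quoted in the notebook comment; the name is hypothetical


def count_reference_atoms(paths):
    """Sum the atom counts declared on line 1 of each XYZ file."""
    total = 0
    for path in paths:
        with open(path, "r") as readfile:
            total += int(readfile.readline().strip())
    return total


def check_atom_budget(paths, max_atoms=MAX_EDM_ATOMS):
    """Return the total atom count, or raise ValueError if it exceeds max_atoms."""
    total = count_reference_atoms(paths)
    if total > max_atoms:
        raise ValueError(
            f"{total} reference atoms selected, but at most {max_atoms} are accepted"
        )
    return total
```

For example, `check_atom_budget(all_xyz[:3])` would report the combined size of the first three fragment files before `create_superposition` is called on them.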

# Designing an experiment
All experiment information can be defined within a list of dictionaries. Each dictionary contains the parameters defining one experiment setup. Details of the key-value pairs are below:

<b>directory:</b> (string) name of the experiment directory to use. It is automatically created inside the experiments/ directory.

<b>fragment_files:</b> (array of paths to XYZ files) Which fragment XYZ files to combine together. Fragment files must only contain XYZ coordinates and no bonding information. 

<b>silvr_rate:</b> (float, written as a string in the examples below) the silvr rate value to be used. Only a scalar can be given here. To define an individual silvr rate for each atom, either edit the sxyz file manually or write a new script that builds an sxyz file from a vector of rates.

<b>dummy:</b> (int) number of dummy atoms to be included during the generation process. These are atoms without any mapping to a reference atom, and as such can explore the whole generative space. Please note this feature can be unstable, and has not been thoroughly tested. In most cases set this value to 0. 

<b>samples:</b> (int) the total number of samples to be taken from the SILVR model. The total number of samples within the experiment directory will not (significantly) exceed this value.

<b>added_comment:</b> (string) Any useful experiment setup information to note. I would highly suggest explicitly writing the fragment file names within this comment.
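
The `silvr_rate` entry above mentions writing a separate script for per-atom rates. One possible shape for such a script is sketched below. It is not part of SILVR: the function name `xyz_to_silvr_vector` and its signature are assumptions, modelled on `xyz_to_silvr` but taking one rate per atom.

```python
def xyz_to_silvr_vector(xyz_path, silvr_rates, dummy=0, samples=500, added_comment="vector"):
    """Build sxyz text from an XYZ file and one silvr rate per atom.

    Hypothetical per-atom variant of xyz_to_silvr. `silvr_rates` is a sequence
    of floats or strings, in the same order as the atoms in the XYZ file.
    """
    with open(xyz_path, "r") as readfile:
        lines = readfile.read().splitlines()

    total_atoms = int(lines[0].strip())
    atom_lines = [line for line in lines[2:] if line.strip()]

    if len(atom_lines) != len(silvr_rates):
        raise ValueError(
            f"{len(atom_lines)} atoms in {xyz_path}, but {len(silvr_rates)} rates given"
        )

    out = [
        str(total_atoms),
        f"dummy:{dummy} samples:{samples} comment:{added_comment}",
    ]
    for atom_line, rate in zip(atom_lines, silvr_rates):
        # Normalise whitespace, then append this atom's own rate
        out.append(" ".join(atom_line.split() + [str(rate)]))
    return "\n".join(out)
```

The returned text can be written to `reference.sxyz` in the same way as the output of `xyz_to_silvr`. Raising on a length mismatch avoids silently dropping atoms, which `zip` alone would do.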

In [13]:
"""
#This commented code contains the experiment params used in the paper for testing silvr rates

designed_experiments_params = [
    {"directory":"exp_22", "fragment_files":["mpro_ligands/Mpro-x0072_0A.xyz", "mpro_ligands/Mpro-x0354_0A.xyz"], "silvr_rate":"0.0", "dummy":0, "samples":1000, "added_comment":"Effect of SILVR rate on sampling using fragments Mpro-x0072_0A and Mpro-x0354_0A, rate=0"},
    {"directory":"exp_23", "fragment_files":["mpro_ligands/Mpro-x0072_0A.xyz", "mpro_ligands/Mpro-x0354_0A.xyz"], "silvr_rate":"0.001", "dummy":0, "samples":1000, "added_comment":"Effect of SILVR rate on sampling using fragments Mpro-x0072_0A and Mpro-x0354_0A, rate=0.001"},
    {"directory":"exp_24", "fragment_files":["mpro_ligands/Mpro-x0072_0A.xyz", "mpro_ligands/Mpro-x0354_0A.xyz"], "silvr_rate":"0.0025", "dummy":0, "samples":1000, "added_comment":"Effect of SILVR rate on sampling using fragments Mpro-x0072_0A and Mpro-x0354_0A, rate=0.0025"},
    {"directory":"exp_25", "fragment_files":["mpro_ligands/Mpro-x0072_0A.xyz", "mpro_ligands/Mpro-x0354_0A.xyz"], "silvr_rate":"0.005", "dummy":0, "samples":1000, "added_comment":"Effect of SILVR rate on sampling using fragments Mpro-x0072_0A and Mpro-x0354_0A, rate=0.005"},
    {"directory":"exp_26", "fragment_files":["mpro_ligands/Mpro-x0072_0A.xyz", "mpro_ligands/Mpro-x0354_0A.xyz"], "silvr_rate":"0.01", "dummy":0, "samples":1000, "added_comment":"Effect of SILVR rate on sampling using fragments Mpro-x0072_0A and Mpro-x0354_0A, rate=0.01"},
    {"directory":"exp_27", "fragment_files":["mpro_ligands/Mpro-x0072_0A.xyz", "mpro_ligands/Mpro-x0354_0A.xyz"], "silvr_rate":"0.02", "dummy":0, "samples":1000, "added_comment":"Effect of SILVR rate on sampling using fragments Mpro-x0072_0A and Mpro-x0354_0A, rate=0.02"},
    {"directory":"exp_28", "fragment_files":["mpro_ligands/Mpro-x0072_0A.xyz", "mpro_ligands/Mpro-x0354_0A.xyz"], "silvr_rate":"0.03", "dummy":0, "samples":1000, "added_comment":"Effect of SILVR rate on sampling using fragments Mpro-x0072_0A and Mpro-x0354_0A, rate=0.03"},
]
"""

#To show an example experiment setup
designed_experiments_params = [
    {"directory":"exp_3000", "fragment_files":["mpro_ligands/Mpro-x0072_0A.xyz", "mpro_ligands/Mpro-x0354_0A.xyz"], "silvr_rate":"0.0", "dummy":0, "samples":1000, "added_comment":"Effect of SILVR rate on sampling using fragments Mpro-x0072_0A and Mpro-x0354_0A, rate=0"},
]

In [12]:
import os

for experiment in designed_experiments_params:
    directory = os.path.join("experiments", experiment["directory"])
    if os.path.exists(directory):
        raise FileExistsError("Experiment directory already exists. Experiment information will not be overwritten. Delete the directory, or rename the experiment.")

    #Make superposition file
    create_superposition(experiment["fragment_files"], "mpro_ligands/tmp.xyz")

    #Make .sxyz file
    sxyz_file = xyz_to_silvr("mpro_ligands/tmp.xyz", silvr_rate=experiment["silvr_rate"], dummy=experiment["dummy"], samples=experiment["samples"], added_comment=experiment["added_comment"])

    #Create the directory only once the file contents are ready, so that a failure
    #above does not leave an empty experiment directory behind
    os.makedirs(directory)
    with open(os.path.join(directory, "reference.sxyz"), "w") as writefile:
        writefile.write(sxyz_file)