# Protein preparation and set-up for MD simulations - using Amber

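Store the names of all designs (without the .pdb or any other extension), one per line, in a file named design_list.txt in the root directory where the simulation input files will be saved. The bash cells below read it from there.

The starting model structures (named model_name.pdb) should be stored in a directory named "designs" inside the same root directory.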
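Before running the cells below, it can help to confirm that every name in design_list.txt has a matching structure in designs/. The following is a minimal sketch. `check_design_inputs` is a hypothetical helper, and it assumes that both design_list.txt and designs/ sit in the root directory that the bash cells `cd` into.

```python
import os


def check_design_inputs(root_dir, list_name='design_list.txt', pdb_dir='designs'):
    """Return (design names, names whose <pdb_dir>/<name>.pdb is missing).

    Hypothetical helper: assumes one design name per line (no extension)
    and the starting structures stored as <root_dir>/<pdb_dir>/<name>.pdb.
    """
    with open(os.path.join(root_dir, list_name)) as fh:
        # Strip newlines/whitespace and skip blank lines
        names = [line.strip() for line in fh if line.strip()]
    missing = [n for n in names
               if not os.path.isfile(os.path.join(root_dir, pdb_dir, n + '.pdb'))]
    return names, missing
```

For example, `names, missing = check_design_inputs(outdir)` should return an empty `missing` list before the preparation cells are run.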

In [None]:
outdir = '/path/to/root/directory/to/store/trajectory/files/' # root directory where simulation input files will be saved (keep the trailing '/')
designs_file = outdir + 'design_list.txt' # file containing all design names, one per line (the bash cells also read it from outdir)

In [None]:
## import necessary libraries
import re

## Apo

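- The initial protein preparation step is run with Schrödinger's Protein Preparation Wizard (the Maestro prepwizard utility). For apo simulations, the ligand is first removed from the starting PDB structure by deleting all HETATM records. This also removes any crystallographic waters, ions, or other hetero groups.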
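Because `sed -i '/HETATM/d'` drops every HETATM record, it is worth checking which residues were actually removed. The following is a minimal sketch that performs the same filtering in Python and reports the removed residue names, so you can confirm that only the ligand, plus anything you deliberately discard, went away. `strip_hetatm` is a hypothetical helper, not part of the workflow.

```python
def strip_hetatm(pdb_lines):
    """Drop HETATM records; return (kept lines, sorted residue names removed).

    Hypothetical helper mirroring sed '/HETATM/d' on a fixed-width PDB file.
    """
    kept, removed = [], set()
    for line in pdb_lines:
        if line.startswith('HETATM'):
            removed.add(line[17:20].strip())  # residue name, PDB columns 18-20
        else:
            kept.append(line)
    return kept, sorted(removed)
```

Usage: `with open(name + '.pdb') as f: kept, removed = strip_hetatm(f)`. If `removed` lists anything besides the ligand (e.g. HOH), those groups are gone from the apo model too.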

In [None]:
%%bash
cd /path/to/root/directory/to/store/trajectory/files/
while IFS= read -r file
        do
            cd /path/to/root/directory/to/store/trajectory/files/
            mkdir "$file"   ## Create directory for each design
            cp designs/"$file".pdb "$file"  # copying starting structure to specific design directory
            cd "$file"
            mkdir apo
            cp "$file".pdb apo/"$file"_apo.pdb
            cd apo
            sed -i '/HETATM/d' "$file"_apo.pdb
            
            ## Running Maestro
            /soft/schrodinger/latest/utilities/prepwizard -disulfides -label_pkas -mse -noepik -fix -rehtreat -propka_pH 7 "$file"_apo.pdb "$file"_apo_preparedByMaestro.pdb
            
        done < "design_list.txt"

- Run Dowser to place water molecules in internal cavities of the prepared structure:

In [None]:
%%bash
cd /path/to/root/directory/to/store/trajectory/files/
while IFS= read -r file
        do
            cd /path/to/root/directory/to/store/trajectory/files/
            cd "$file"/apo
            rm -f dowserwat*   ## remove output of a previous Dowser run (-f: no error if none exists)
            dowserx "$file"_apo_preparedByMaestro.pdb  ## Running dowser
            cp "$file"_apo_preparedByMaestro.pdb "$file"_apo_readyForLeap.pdb 
            sed -i '/REMARK/d' ./dowserwat.pdb 
            sed -i '/REMARK/d' "$file"_apo_readyForLeap.pdb
            sed -i '/END/d' "$file"_apo_readyForLeap.pdb
            sed -i '/CONECT/d' "$file"_apo_readyForLeap.pdb
        done < "design_list.txt"
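The `sed '/END/d'`-style cleanup deletes any line that contains the pattern anywhere on the line. The following sketch shows a stricter alternative that matches only the record name in columns 1-6 of each PDB line. `drop_records` is a hypothetical helper. ENDMDL is listed explicitly because, unlike the sed pattern, an exact record-name match does not catch it via the substring END.

```python
DEFAULT_DROP = ('REMARK', 'END', 'ENDMDL', 'CONECT')


def drop_records(pdb_lines, records=DEFAULT_DROP):
    """Keep only lines whose record name (PDB columns 1-6) is not in `records`.

    Hypothetical helper; blank lines and other records (ATOM, TER, ...) are kept.
    """
    drop = set(records)
    return [line for line in pdb_lines if line[:6].strip() not in drop]
```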

- Preparing files for tLeap:

First, check the histidine protonation states assigned by Maestro so that the residue names can be changed to the appropriate Amber residue types. In the Maestro-prepared file, a histidine with an HD1 hydrogen is HID, and one with an HE2 hydrogen is HIE.

The histidine type (HID or HIE) and its residue number are printed for each design. If the same histidine is listed as both HID and HIE, it carries both hydrogens and is positively charged. Its Amber residue name is HIP.

In [None]:
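with open(designs_file, 'r') as designs:
    for design_line in designs:
        name = design_line.strip()
        if not name:
            continue
        with open(outdir + name + '/apo/' + name + '_apo_readyForLeap.pdb', 'r') as pdb:
            print('\n' + name)
            for line in pdb:
                # Read fixed PDB columns instead of splitting on whitespace (TER lines are short)
                if not line.startswith(('ATOM', 'HETATM')):
                    continue
                atom_name = line[12:16].strip()  # columns 13-16
                res_name = line[17:20].strip()   # columns 18-20
                chain = line[21]                 # column 22
                res_num = line[22:26].strip()    # columns 23-26
                if res_name == 'HIS' and atom_name == 'HD1':
                    print('HID ' + chain + ' ' + res_num)
                elif res_name == 'HIS' and atom_name == 'HE2':
                    print('HIE ' + chain + ' ' + res_num)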
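The cell above prints one line per ring hydrogen, so a doubly protonated histidine shows up twice. The following is a minimal sketch that reads the same PDB columns and assigns HID, HIE or HIP to each residue directly, keyed by chain and residue number. `histidine_states` is a hypothetical helper.

```python
from collections import defaultdict


def histidine_states(pdb_lines):
    """Map (chain, residue number) of each HIS to 'HID', 'HIE' or 'HIP'.

    Hypothetical helper: HD1 only -> HID, HE2 only -> HIE, both -> HIP.
    A HIS with neither ring hydrogen is left out of the result.
    """
    ring_h = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith(('ATOM', 'HETATM')) or line[17:20] != 'HIS':
            continue
        atom_name = line[12:16].strip()       # PDB columns 13-16
        if atom_name in ('HD1', 'HE2'):
            key = (line[21], int(line[22:26]))  # chain (col 22), residue number (cols 23-26)
            ring_h[key].add(atom_name)
    names = {frozenset({'HD1'}): 'HID',
             frozenset({'HE2'}): 'HIE',
             frozenset({'HD1', 'HE2'}): 'HIP'}
    return {key: names[frozenset(hs)] for key, hs in sorted(ring_h.items())}
```

For example, a file where HIS A 28 has HD1 and HIS A 50 has both HD1 and HE2 gives `{('A', 28): 'HID', ('A', 50): 'HIP'}`.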

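Append the Dowser waters to the prepared protein file. A TER record is added before the waters and after each water molecule (every three atom lines) so that tLeap treats each water as a separate molecule: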

In [None]:
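with open(designs_file, 'r') as designs:
    for design_line in designs:
        name = design_line.strip()
        if not name:
            continue
        apo_dir = outdir + name + '/apo/'
        # Append mode: running this cell twice adds the waters twice
        with open(apo_dir + name + '_apo_readyForLeap.pdb', 'a') as pdb, \
             open(apo_dir + 'dowserwat.pdb', 'r') as infile:
            # Keep only atom records; each Dowser water is three lines (O and two H)
            water_lines = [rec for rec in infile if rec.startswith(('ATOM', 'HETATM'))]
            print(name + ': ' + str(len(water_lines) // 3) + ' Dowser waters')
            pdb.write('TER\n')  # end of the protein
            for i, line in enumerate(water_lines, start=1):
                pdb.write(line)
                if i % 3 == 0:
                    pdb.write('TER\n')  # end of one water molecule
            pdb.write('END\n')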

Finally, rename the histidines according to the protonation states identified above. This has to be done manually for each design. If a design contains cysteine residues that form disulfide bridges, rename them to CYX here as well.

The oxygen atoms of the waters added by Dowser are also renamed from OW to O, the atom name expected by Amber.

In [None]:
%%bash -s "HBI_49" "HBI_49_apo_readyForLeap.pdb"
## Change HBI_49 above to the design name (the %%bash cell magic must be the first line of the cell)
cd /path/to/root/directory/to/store/trajectory/files/$1/apo
sed -i 's/OW  HOH/O   HOH/g' $2

sed -i 's/HIS A  28/HID A  28/g' $2
sed -i 's/HIS A  44/HID A  44/g' $2

#sed -i 's/CYS A   1/CYX A   1/g' $2
#sed -i 's/CYS A  59/CYX A  59/g' $2
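The sed substitutions depend on the exact spacing of the fixed-width PDB line. For example, 'HIS A  28' has two spaces because the residue number is right-aligned in columns 23-26. The following is a minimal sketch of the same renaming in Python, keyed by chain and residue number pairs such as `{('A', 28): 'HID'}`. `rename_residues` is a hypothetical helper and assumes three-letter residue names.

```python
def rename_residues(pdb_lines, new_names):
    """Return lines with residue names replaced according to `new_names`.

    Hypothetical helper: new_names maps (chain ID, residue number) to a
    3-letter name, e.g. {('A', 28): 'HID', ('A', 1): 'CYX'}.
    Only ATOM/HETATM lines are changed; all other lines pass through.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(('ATOM', 'HETATM')):
            key = (line[21], int(line[22:26]))  # chain (col 22), residue number (cols 23-26)
            if key in new_names:
                # Residue name occupies PDB columns 18-20 (indices 17-19)
                line = line[:17] + new_names[key] + line[20:]
        out.append(line)
    return out
```

Usage: read the file with `readlines()`, call `rename_residues(lines, {('A', 28): 'HID', ('A', 44): 'HID'})`, and write the result back. Column widths are preserved, so the file stays valid PDB.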

- Run tLeap

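Create a per-design tleap1 input file from the reference template tleap1.in, which is provided with this workflow and stored in the root directory. Line 3 of the template is replaced with the command that loads the design's PDB file. This first tLeap run reports the total charge and box size, which are used to calculate the number of ions to add: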

In [None]:
with open(designs_file, 'r') as designs:
    for design_line in designs:
        name = design_line.strip()
        if not name:
            continue
        print(name)
        ## Reference input file provided with this workflow; line 3 is replaced with the loadpdb command
        with open(outdir + 'tleap1.in', 'r') as infile, \
             open(outdir + name + '/apo/' + name + '_tleap1.in', 'w') as leap:
            for i, line in enumerate(infile, start=1):
                if i == 3:
                    leap.write('mol=loadpdb ' + name + '_apo_readyForLeap.pdb\n')
                else:
                    leap.write(line)
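The cells replace template lines by fixed line number, which silently breaks if the template is edited. The following sketch shows a more robust variant that replaces lines by their leading command instead. `fill_template` is a hypothetical helper, and it assumes the template already contains lines starting with the commands being replaced (e.g. mol=loadpdb). The template contents are not shown in this notebook, so this is an assumption.

```python
def fill_template(template_lines, replacements):
    """Replace each template line that starts with a key of `replacements`.

    Hypothetical helper: keys are line prefixes such as 'mol=loadpdb';
    values are full replacement lines (a newline is added if missing).
    Raises ValueError if a prefix never matches.
    """
    used = set()
    out = []
    for line in template_lines:
        stripped = line.lstrip()
        for prefix, new_line in replacements.items():
            if stripped.startswith(prefix):
                line = new_line if new_line.endswith('\n') else new_line + '\n'
                used.add(prefix)
                break
        out.append(line)
    unused = set(replacements) - used
    if unused:
        raise ValueError('template lines not found: ' + ', '.join(sorted(unused)))
    return out
```

Failing loudly on an unmatched prefix avoids writing an input file that still loads the template's original structure.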

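ATTENTION: For designs with disulfide bonds, add the line bond mol.ID1.SG mol.ID2.SG after the loadpdb line of each per-design tLeap input file (both tleap1 and tleap2). ID1 and ID2 are the residue numbers of the two bonded cysteines (renamed CYX). tLeap numbers residues sequentially from 1, which may differ from the numbering in the PDB file.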
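The following is a minimal sketch that inserts the bond commands into a generated tLeap input file right after its loadpdb line. `add_disulfide_bonds` is a hypothetical helper, and the residue numbers passed to it must be tLeap's sequential numbers.

```python
def add_disulfide_bonds(leap_lines, cys_pairs, mol='mol'):
    """Insert 'bond mol.i.SG mol.j.SG' lines right after the loadpdb line.

    Hypothetical helper: cys_pairs holds (i, j) tLeap residue numbers of
    bonded CYX residues. Raises ValueError if no loadpdb line is found.
    """
    bonds = ['bond %s.%d.SG %s.%d.SG\n' % (mol, i, mol, j) for i, j in cys_pairs]
    out = []
    inserted = False
    for line in leap_lines:
        out.append(line)
        if not inserted and 'loadpdb' in line.lower():
            out.extend(bonds)  # the unit must exist before bond commands refer to it
            inserted = True
    if not inserted:
        raise ValueError('no loadpdb line found')
    return out
```

For the commented-out CYX example in the renaming cell (residues 1 and 59, assuming the PDB numbering starts at 1), `add_disulfide_bonds(lines, [(1, 59)])` adds bond mol.1.SG mol.59.SG.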

In [None]:
%%bash
cd /path/to/root/directory/to/store/trajectory/files/
while IFS= read -r file
        do
            cd /path/to/root/directory/to/store/trajectory/files/
            cd "$file"/apo
            tleap -s -f "$file"_tleap1.in > "$file"_tleap1.out     
        done < "design_list.txt"

From each tleap1.out, read the box size and total charge. Then calculate how many Na+ and Cl- ions are needed for neutrality and a 0.15 M salt concentration, and write the tleap2.in file that builds the final system:

In [None]:
box = re.compile(r'Total vdw box size')
charge = re.compile(r'Total unperturbed charge')

ionic_strength = 0.15 # Desired NaCl concentration, 0.15 M (for a 1:1 salt this equals the ionic strength)
NA = 6.022E23 ## Avogadro's number

with open(designs_file, 'r') as designs:
    for design_line in designs:
        name = design_line.strip()
        if not name:
            continue
        apo_dir = outdir + name + '/apo/'
        x = y = z = charge_value = None
        with open(apo_dir + name + '_tleap1.out', 'r') as leap:
            print('\n' + name)
            for line in leap:
                # "Total vdw box size:  X Y Z angstroms." -> edges in dm (1 Å = 1e-9 dm), so V is in L
                if box.search(line):
                    columns = line.split()
                    x = float(columns[4])*(1.0E-09)
                    y = float(columns[5])*(1.0E-09)
                    z = float(columns[6])*(1.0E-09)
                # "Total unperturbed charge:  Q"
                elif charge.search(line):
                    columns = line.split()
                    charge_value = float(columns[3])
        if x is None or charge_value is None:
            raise ValueError('box size or charge not found in ' + name + '_tleap1.out')

        print('Box dimensions (dm) = ' + str(x) + ' ' + str(y) + ' ' + str(z))
        print('Charge = ' + str(charge_value))

        ## Calculating total number of ions necessary for target concentration:
        ## a 1:1 salt at concentration c contributes c*V*NA cations AND c*V*NA anions
        V = x*y*z  # box volume in L
        ions = 2*V*ionic_strength*NA

        print('total number of ions = ' + str(ions) + ' or ' + str(int(round(ions))))

        ## Reserving |charge| counter-ions for neutrality and splitting the rest equally
        neutralizing = int(round(abs(charge_value)))
        remaining = max(int(round(ions)) - neutralizing, 0)

        Cl = remaining // 2    ## if remaining is odd, it is rounded down
        Na = remaining // 2

        if charge_value < 0:
            Na = Na + neutralizing
        else:
            Cl = Cl + neutralizing

        ## Final number of ions to be added to system - This is saved automatically in the tleap2 input file
        ## and is printed only for record keeping
        print('Na+ = ' + str(Na))
        print('Cl- = ' + str(Cl))

        ## Writing <name>_tleap2.in from the reference template (lines 3, 7 and 10 are replaced)
        with open(outdir + 'tleap2.in', 'r') as infile, \
             open(apo_dir + name + '_tleap2.in', 'w') as leap2:
            for i, line in enumerate(infile, start=1):
                if i == 3:
                    leap2.write('mol=loadpdb ' + name + '_apo_readyForLeap.pdb\n')
                elif i == 7:
                    leap2.write('addions mol Na+ ' + str(Na) + ' Cl- ' + str(Cl) + '\n')
                elif i == 10:
                    leap2.write('saveamberparm mol ' + name + '_apo.prmtop ' + name + '_apo.inpcrd\n')
                else:
                    leap2.write(line)
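The ion count follows from N = c·V·N_A for each ion species, with V in litres. tLeap reports the box edges in Å, and 1 Å = 10⁻⁹ dm, so 1 Å³ = 10⁻²⁷ L. The following is a minimal sketch of a neutralize-then-split scheme written as a pure function, counting c·V·N_A ions of each species. `ion_counts` is a hypothetical helper. As a hand check, a 60 × 60 × 60 Å box at 0.15 M gives c·V·N_A ≈ 19.5 ions of each species.

```python
AVOGADRO = 6.022e23          # 1/mol
ANGSTROM3_TO_LITRE = 1e-27   # 1 Å^3 = (1e-9 dm)^3 = 1e-27 dm^3 = 1e-27 L


def ion_counts(box_angstrom, net_charge, conc_molar=0.15):
    """Return (n_Na, n_Cl) for a rectangular box and a 1:1 salt.

    Hypothetical helper: the box holds c*V*N_A cations and c*V*N_A anions,
    i.e. 2*c*V*N_A ions in total. |net_charge| of those are reserved as
    counter-ions, and the rest are split equally (rounded down).
    """
    x, y, z = box_angstrom
    volume_l = x * y * z * ANGSTROM3_TO_LITRE
    total = round(2 * conc_molar * volume_l * AVOGADRO)
    q = round(abs(net_charge))
    pairs = max(total - q, 0) // 2
    if net_charge < 0:
        return pairs + q, pairs   # extra Na+ neutralizes a negative system
    return pairs, pairs + q       # extra Cl- neutralizes a positive system
```

The volume is that of the whole box, protein included, so the solvent-only concentration comes out slightly above the target. For a small protein in a large box, the difference is modest.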

In [None]:
%%bash
cd /path/to/root/directory/to/store/trajectory/files/
while IFS= read -r file
        do
            cd /path/to/root/directory/to/store/trajectory/files/
            cd "$file"/apo
            tleap -s -f "$file"_tleap2.in  > "$file"_tleap2.out 
        done < "design_list.txt"

The Amber topology (prmtop) and coordinate (inpcrd) files needed for the simulations are now saved in each design's apo directory.
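Before using the prmtop and inpcrd files, it is worth checking each tleap2 output for errors and warnings. The following is a minimal sketch that assumes the tLeap log ends with a summary of the form `Exiting LEaP: Errors = 0; Warnings = 1; Notes = 1.`, as recent AmberTools versions print. If no such line is present, the function returns None. `leap_summary` is a hypothetical helper.

```python
import re

SUMMARY = re.compile(r'Errors\s*=\s*(\d+);\s*Warnings\s*=\s*(\d+);\s*Notes\s*=\s*(\d+)')


def leap_summary(out_text):
    """Return (errors, warnings, notes) from the last summary line, or None."""
    matches = SUMMARY.findall(out_text)
    if not matches:
        return None
    return tuple(int(n) for n in matches[-1])
```

Usage: `leap_summary(open(name + '_tleap2.out').read())`. A nonzero error count means the prmtop/inpcrd pair may not have been written.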

## Holo

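The holo systems are prepared with the same steps, except that the ligand is kept and Dowser is not run. The holo cells assume that the per-design directories, with the copied starting structure, already exist from the apo preparation. The ligand GAFF parameter files (frcmod and lib) should be saved in the same root directory, /path/to/root/directory/to/store/trajectory/files/.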

Run Maestro:

In [None]:
%%bash
cd /path/to/root/directory/to/store/trajectory/files/
while IFS= read -r file
        do
            cd /path/to/root/directory/to/store/trajectory/files/
            cd "$file"
            mkdir holo
            cp "$file".pdb holo/"$file"_holo.pdb
            cd holo
            
            ## Running Maestro
            /soft/schrodinger/latest/utilities/prepwizard -disulfides -label_pkas -mse -noepik -fix -rehtreat -propka_pH 7 "$file"_holo.pdb "$file"_holo_preparedByMaestro.pdb
            
        done < "design_list.txt"

In [None]:
%%bash
cd /path/to/root/directory/to/store/trajectory/files/
while IFS= read -r file
        do

            cd /path/to/root/directory/to/store/trajectory/files/
            cd "$file"/holo
            cp "$file"_holo_preparedByMaestro.pdb "$file"_holo_readyForLeap.pdb 
            sed -i '/REMARK/d' "$file"_holo_readyForLeap.pdb
            sed -i '/END/d' "$file"_holo_readyForLeap.pdb 
            sed -i '/CONECT/d' "$file"_holo_readyForLeap.pdb
            sed -i '/TITLE/d' "$file"_holo_readyForLeap.pdb
            sed -i '/HET /d' "$file"_holo_readyForLeap.pdb
            sed -i '/HETNAM/d' "$file"_holo_readyForLeap.pdb
            sed -i '/FORMUL/d' "$file"_holo_readyForLeap.pdb
            sed -i '/MODEL/d' "$file"_holo_readyForLeap.pdb
            
        done < "design_list.txt"

Check the histidine protonation states, as for the apo systems:

In [None]:
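with open(designs_file, 'r') as designs:
    for design_line in designs:
        name = design_line.strip()
        if not name:
            continue
        with open(outdir + name + '/holo/' + name + '_holo_readyForLeap.pdb', 'r') as pdb:
            print('\n' + name)
            for line in pdb:
                # Read fixed PDB columns instead of splitting on whitespace (TER lines are short)
                if not line.startswith(('ATOM', 'HETATM')):
                    continue
                atom_name = line[12:16].strip()  # columns 13-16
                res_name = line[17:20].strip()   # columns 18-20
                chain = line[21]                 # column 22
                res_num = line[22:26].strip()    # columns 23-26
                if res_name == 'HIS' and atom_name == 'HD1':
                    print('HID ' + chain + ' ' + res_num)
                elif res_name == 'HIS' and atom_name == 'HE2':
                    print('HIE ' + chain + ' ' + res_num)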

In [None]:
%%bash -s "HBI_49" "HBI_49_holo_readyForLeap.pdb"
## Change HBI_49 above to the design name (the %%bash cell magic must be the first line of the cell)
cd /path/to/root/directory/to/store/trajectory/files/$1/holo

sed -i 's/HIS A  28/HID A  28/g' $2
sed -i 's/HIS A  44/HID A  44/g' $2

#sed -i 's/CYS A   1/CYX A   1/g' $2
#sed -i 's/CYS A  59/CYX A  59/g' $2

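ATTENTION: make sure the ligand residue name and atom names in the PDB file match those in the generated parameter (lib) file. tLeap matches residues to library units by residue name, and then atoms by atom name.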
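The following is a minimal sketch for this check. It assumes the ligand library is an Amber OFF file whose atoms table starts with a line `!entry.<RES>.unit.atoms table` and lists one quoted atom name at the start of each row, and that the unit name equals the ligand residue name in the PDB. All three helpers are hypothetical.

```python
def lib_atom_names(lib_lines, unit):
    """Atom names of `unit` from an Amber OFF (.lib) atoms table (assumed format)."""
    header = '!entry.%s.unit.atoms table' % unit
    names, in_table = [], False
    for line in lib_lines:
        if line.startswith(header):
            in_table = True
        elif in_table and line.startswith('!'):
            break  # the next table has started
        elif in_table and line.strip():
            names.append(line.split()[0].strip('"'))  # first field is the quoted name
    return names


def pdb_atom_names(pdb_lines, resname):
    """Atom names (PDB columns 13-16) of residues named `resname`."""
    return [line[12:16].strip() for line in pdb_lines
            if line.startswith(('ATOM', 'HETATM')) and line[17:20].strip() == resname]


def name_mismatch(pdb_lines, lib_lines, resname):
    """Return (in PDB but not lib, in lib but not PDB) as sorted lists."""
    pdb_set = set(pdb_atom_names(pdb_lines, resname))
    lib_set = set(lib_atom_names(lib_lines, resname))
    return sorted(pdb_set - lib_set), sorted(lib_set - pdb_set)
```

Two empty lists mean the names agree. An empty PDB list usually means the residue name itself differs from the lib unit name.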

Write and run the tleap1 input file:

In [None]:
# Write input file
with open(designs_file, 'r') as designs:
    for design_line in designs:
        name = design_line.strip()
        if not name:
            continue
        print(name)
        ## Reference input file provided with this workflow; line 6 is replaced with the loadpdb command
        with open(outdir + 'tleap1_holo.in', 'r') as infile, \
             open(outdir + name + '/holo/' + name + '_tleap1_holo.in', 'w') as leap:
            for i, line in enumerate(infile, start=1):
                if i == 6:
                    leap.write('mol=loadpdb ' + name + '_holo_readyForLeap.pdb\n')
                else:
                    leap.write(line)

In [None]:
%%bash
cd /path/to/root/directory/to/store/trajectory/files/
while IFS= read -r file
        do
            cd /path/to/root/directory/to/store/trajectory/files/
            cd "$file"/holo
            tleap -s -f "$file"_tleap1_holo.in > "$file"_tleap1_holo.out     
        done < "design_list.txt"

Calculate the ion counts, then write and run the tleap2 input file:

In [None]:
# Write input file
box = re.compile(r'Total vdw box size')
charge = re.compile(r'Total unperturbed charge')

ionic_strength = 0.15 # Desired NaCl concentration, 0.15 M (for a 1:1 salt this equals the ionic strength)
NA = 6.022E23 ## Avogadro's number

with open(designs_file, 'r') as designs:
    for design_line in designs:
        name = design_line.strip()
        if not name:
            continue
        holo_dir = outdir + name + '/holo/'
        x = y = z = charge_value = None
        with open(holo_dir + name + '_tleap1_holo.out', 'r') as leap:
            print('\n' + name)
            for line in leap:
                # "Total vdw box size:  X Y Z angstroms." -> edges in dm (1 Å = 1e-9 dm), so V is in L
                if box.search(line):
                    columns = line.split()
                    x = float(columns[4])*(1.0E-09)
                    y = float(columns[5])*(1.0E-09)
                    z = float(columns[6])*(1.0E-09)
                # "Total unperturbed charge:  Q"
                elif charge.search(line):
                    columns = line.split()
                    charge_value = float(columns[3])
        if x is None or charge_value is None:
            raise ValueError('box size or charge not found in ' + name + '_tleap1_holo.out')

        print('Box dimensions (dm) = ' + str(x) + ' ' + str(y) + ' ' + str(z))
        print('Charge = ' + str(charge_value))

        ## Calculating total number of ions necessary for target concentration:
        ## a 1:1 salt at concentration c contributes c*V*NA cations AND c*V*NA anions
        V = x*y*z  # box volume in L
        ions = 2*V*ionic_strength*NA

        print('total number of ions = ' + str(ions) + ' or ' + str(int(round(ions))))

        ## Reserving |charge| counter-ions for neutrality and splitting the rest equally
        neutralizing = int(round(abs(charge_value)))
        remaining = max(int(round(ions)) - neutralizing, 0)

        Cl = remaining // 2    ## if remaining is odd, it is rounded down
        Na = remaining // 2

        if charge_value < 0:
            Na = Na + neutralizing
        else:
            Cl = Cl + neutralizing

        ## Final number of ions to be added to system - This is saved automatically in the tleap2 input file
        ## and is printed only for record keeping
        print('Na+ = ' + str(Na))
        print('Cl- = ' + str(Cl))

        ## Writing <name>_tleap2_holo.in from the reference template (lines 6, 10 and 13 are replaced)
        with open(outdir + 'tleap2_holo.in', 'r') as infile, \
             open(holo_dir + name + '_tleap2_holo.in', 'w') as leap2:
            for i, line in enumerate(infile, start=1):
                if i == 6:
                    leap2.write('mol=loadpdb ' + name + '_holo_readyForLeap.pdb\n')
                elif i == 10:
                    leap2.write('addions mol Na+ ' + str(Na) + ' Cl- ' + str(Cl) + '\n')
                elif i == 13:
                    leap2.write('saveamberparm mol ' + name + '_holo.prmtop ' + name + '_holo.inpcrd\n')
                else:
                    leap2.write(line)

In [None]:
%%bash
cd /path/to/root/directory/to/store/trajectory/files/
while IFS= read -r file
        do
            cd /path/to/root/directory/to/store/trajectory/files/
            cd "$file"/holo
            tleap -s -f "$file"_tleap2_holo.in  > "$file"_tleap2_holo.out 
        done < "design_list.txt"