/
AutoDock.py
446 lines (422 loc) · 15.2 KB
/
AutoDock.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
#!/usr/bin/python
'''
Instructions:
-------------
This script uses Python 3.6+ and requires openbabel and PyMOL 2.2.
Follow this precise order of steps.
0. Use the command python3 AutoDock.py -h for the help menu.
1. Update system and install required programs:
sudo apt update && sudo apt full-upgrade && sudo apt install openbabel pymol
2. Choose the search space:
pymol AutoDock.py -b FILENAME.pdb
select an amino acid (has to be an amino acid only) where you want the center of the search box to be
then in PyMOL command terminal type Box("sele",1,1,1) where 1, 1, 1 are the x y z distances
from the center of the box. Adjust the x y z distances until the box is of satisfactory
size, you have to delete the Box and Position objects everytime before adjusting the numbers.
When finished the last line that is printed in the pymol terminal (Ca center of selection is)
will be the Center_X Center_Y Center_Z values. The Size_X Size_Y Size_Z values are the
values that you have adjusted. Use Box_fine(pX, pY, pZ, x, y, z) to fine adjust the center of the box,
where pX, pY, pZ are the values from Ca center of selection (the center of the box).
3. Prepare and convert the protein receptor from PDB to PDBQT:
python3 AutoDock.py -r FILENAME.pdb
4. Get Ligands from ZINC15 database.
5. Download the ligands and combine them into a file:
python3 AutoDock.py -d FILENAME.wget
6. Split ligands file for virtual screaning:
python3 AutoDock.py -s FILENAME.pdbqt 300000
7. Generate a PBS job submission file for a high performance computer:
python3 AutoDock.py -j Center_X Center_Y Center_Z Size_X Size_Y Size_Z Seed Exhaustiveness Output CPUs Array Email
8. Download Autodock vina from the following link:
http://vina.scripps.edu/download.html
9. Combine computed files and sort them to see which ligand binds strongest:
python3 AutoDock.py -c DIRECTORY
10. Generate PBS job script:
python3 AutoDock.py -p pX pY pZ x y z seed exhaust out CPU array email
pX pY pZ x y z -> these values are the box dementions
seed -> choose a random number for a starting seed
exhaust -> how exhaustive you want the search
out -> true or false (true will keep the output for each search - I recommend false)
CPU -> number of CPUs you have available per node
array -> number of array to repeat the code
email -> your email to be notified when the job completes
Here is a video explaning how to perform virtual screaning using AutoDock Vina
and how to use this script:
For running on a local computer use this Bash command:
for file in ./Ligands/*/*; do tmp=${file%.pdbqt}; name="${tmp##*/}"; ./vina --receptor receptor.pdbqt --ligand "$file" --out $name_out.pdbqt --log $name.log --exhaustiveness 10 --center_x 0 --center_y 0 --center_z 10 --size_x 15 --size_y 15 --size_z 15; awk '/^[-+]+$/{getline;print FILENAME,$0}' $name.log >> temp; done; sort temp -nk 3 > Results; rm temp; mkdir logs; mv *.log *.out logs
Another Bash command for running on a local computer:
* Edit this script to have all the correct values in the {} positions
* You should have all of the following in the same location:
1. A directory called "Ligands" with ligands in .pdbqt format within it.
2. The receptor.pdbqt file.
3. The vina executable
* Run this script using the command: bash dock.local
* MAKE SURE you backup the "Ligands" directory because this script will delete
each ligand after it finishes searching it.
COMMENT
process() { local n=${1##*/}
./vina \
--receptor receptor.pdbqt \
--ligand "$1" \
--out '/dev/null' \
--log "log_$n" \
--exhaustiveness 1 \
--cpu 1 \
--seed 1991540417 \
--center_x {} \
--center_y {} \
--center_z {} \
--size_x {} \
--size_y {} \
--size_z {} \
| awk -v name="$n" '$1 == "1" {print name "\t" $0;exit}' >> Docks \
&& rm log_$n \
&& rm "$1"
}
export -f process
find ./Ligands/ -type f -print0 | xargs -0 -P 24 -I{} bash -c 'process "$1"' _ {}
--------------------------------------------------------------------------------
MIT License
Copyright (c) 2018 Sari Sabban
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
'''
import os
import sys
import math
import numpy
import pymol
import argparse
import itertools
from pymol.cgo import *
def center_of_box(selection):
'''
output the avarage Ca coordinate of the given selection
'''
selection = selection + ' and n. CA'
model = querying.get_model(selection)
coords = model.get_coord_list()
coords_matrix = numpy.array(coords)
coords_ave = coords_matrix.mean(axis=0)
coords_ave = coords_ave.tolist()
print('Ca coordinates of selection are:', coords)
print('Ca center of selection is:', coords_ave)
cmd.extend("center_of_box", center_of_box)
return coords_ave
def Box(selection, x, y, z):
'''
Sets up the search box within the protein, which is
used in the docking protocol
'''
pX, pY, pZ = center_of_box(selection)
pymol.cmd.pseudoatom('Position', pos=[pX, pY, pZ])
([X, Y, Z],[a, b, c]) = pymol.cmd.get_extent('Position')
pymol.cmd.show('spheres', 'Position')
minX = X+float(x)
minY = Y+float(y)
minZ = Z+float(z)
maxX = X-float(x)
maxY = Y-float(y)
maxZ = Z-float(z)
boundingBox = [BEGIN, LINES,
VERTEX, minX, minY, minZ,
VERTEX, minX, minY, maxZ,
VERTEX, minX, maxY, minZ,
VERTEX, minX, maxY, maxZ,
VERTEX, maxX, minY, minZ,
VERTEX, maxX, minY, maxZ,
VERTEX, maxX, maxY, minZ,
VERTEX, maxX, maxY, maxZ,
VERTEX, minX, minY, minZ,
VERTEX, maxX, minY, minZ,
VERTEX, minX, maxY, minZ,
VERTEX, maxX, maxY, minZ,
VERTEX, minX, maxY, maxZ,
VERTEX, maxX, maxY, maxZ,
VERTEX, minX, minY, maxZ,
VERTEX, maxX, minY, maxZ,
VERTEX, minX, minY, minZ,
VERTEX, minX, maxY, minZ,
VERTEX, maxX, minY, minZ,
VERTEX, maxX, maxY, minZ,
VERTEX, minX, minY, maxZ,
VERTEX, minX, maxY, maxZ,
VERTEX, maxX, minY, maxZ,
VERTEX, maxX, maxY, maxZ,
END]
boxName = 'Box'
pymol.cmd.load_cgo(boundingBox, boxName)
return(boxName)
def Box_fine(pX, pY, pZ, x, y, z):
'''
Fine selection of the search box within the protein, which is
used in the docking protocol
'''
pymol.cmd.pseudoatom('Position', pos=[pX, pY, pZ])
([X, Y, Z],[a, b, c]) = pymol.cmd.get_extent('Position')
pymol.cmd.show('spheres', 'Position')
minX = X+float(x)
minY = Y+float(y)
minZ = Z+float(z)
maxX = X-float(x)
maxY = Y-float(y)
maxZ = Z-float(z)
boundingBox = [BEGIN, LINES,
VERTEX, minX, minY, minZ,
VERTEX, minX, minY, maxZ,
VERTEX, minX, maxY, minZ,
VERTEX, minX, maxY, maxZ,
VERTEX, maxX, minY, minZ,
VERTEX, maxX, minY, maxZ,
VERTEX, maxX, maxY, minZ,
VERTEX, maxX, maxY, maxZ,
VERTEX, minX, minY, minZ,
VERTEX, maxX, minY, minZ,
VERTEX, minX, maxY, minZ,
VERTEX, maxX, maxY, minZ,
VERTEX, minX, maxY, maxZ,
VERTEX, maxX, maxY, maxZ,
VERTEX, minX, minY, maxZ,
VERTEX, maxX, minY, maxZ,
VERTEX, minX, minY, minZ,
VERTEX, minX, maxY, minZ,
VERTEX, maxX, minY, minZ,
VERTEX, maxX, maxY, minZ,
VERTEX, minX, minY, maxZ,
VERTEX, minX, maxY, maxZ,
VERTEX, maxX, minY, maxZ,
VERTEX, maxX, maxY, maxZ,
END]
boxName = 'Box'
pymol.cmd.load_cgo(boundingBox, boxName)
return(boxName)
def download(filename):
'''
Download, unzip, combine, renumber ligands
'''
with open(filename, 'r')as infile:
for line in infile:
try:
namegz = line.split()[-1]
name = line.split()[-1].split('gz')[0][:-1]
get = line.split()[1]
#wget = 'wget {} -O {}'.format(get, namegz)
wget = line.strip()
gunzip = 'gunzip {}'.format(namegz)
cat = 'cat {} >> temp'.format(name)
os.system(wget)
os.system(gunzip)
with open(name) as f:
first = f.readline()
if first.split()[0] == 'MODEL':
os.system(cat)
else:
os.system('echo "MODEL 1" >> temp')
os.system(cat)
os.system('echo "ENDMDL" >> temp')
except:
with open('error', 'a') as e:
e.write(line)
count = 0
with open('temp', 'r') as infile:
with open('temp2', 'a') as outfile:
for line in infile:
if line.startswith('MODEL'):
count += 1
outfile.write('MODEL {:15}\n'.format(count))
else:
outfile.write(line)
#os.system('ls *.pdbqt | grep -v receptor.pdbqt | xargs rm')
os.system("rm -R `ls -1 -d */`")
os.remove('temp')
os.rename('temp2', 'ZINC15.pdbqt')
def receptor(filename):
'''
Prepares the receptor by first removing all the water molecules from
the protein's structure, then adds only the polar hydrogens, then
it exports the resulting structure and converts it to a .pdbqt file.
'''
cmd.load(filename)
cmd.remove('resn HOH')
cmd.h_add(selection='acceptors or donors')
cmd.save('protein.pdb')
#os.system('babel protein.pdb temp.pdbqt -xh') #OLD
os.system('obabel protein.pdb -O temp.pdbqt -xh --partialcharge gasteiger')
os.system('grep ATOM temp.pdbqt > receptor.pdbqt')
os.remove('temp.pdbqt')
os.remove('protein.pdb')
def ligand(filename):
name = filename[:-4]
cmd.load(filename)
cmd.remove('resn HOH')
cmd.h_add()
cmd.save('{}.pdb'.format(name))
os.system('obabel {}.pdb -O {}.pdbqt -xh --partialcharge gasteiger'.format(name, name))
os.remove('{}.pdb'.format(name))
def split(filename, direct, prefix, limit):
'''
Separates a .pdbqt file with multiple molecules into separate files with
singles molecules segmented over sub directories.
'''
with open(filename) as infile:
count = 0
in_dir_count = 0
dircount = 0
for dircount in itertools.count():
for line in infile:
#if line.strip() == 'MODEL{:16}'.format(count+1):
if line.strip().split()[0] == 'MODEL' and line.strip().split()[1] == '{}'.format(count+1):
directory = os.path.join(direct, '{}'.format(dircount+1))
os.makedirs(directory, exist_ok=True)
name = '{}_{:09}.pdbqt'.format(prefix, count+1)
out = os.path.join(directory, name)
with open(out, 'w') as outfile:
for line in infile:
if line.strip() == 'ENDMDL':
break
if line.split()[0] == 'REMARK' and\
line.split()[1] == 'Name':
NewName = os.path.join(directory,\
'{}.pdbqt'.format(line.split()[3]))
outfile.write(line)
os.rename(out, NewName)
count += 1
in_dir_count += 1
if in_dir_count >= limit:
in_dir_count = 0
print('[+] Finished directory {}'.format(directory))
break
else: break
print('----------\n[+] Done')
def PBS(pX, pY, pZ, x, y, z, seed, exhaust, out, CPU, array, email):
'''
Write a PBS file for HPC virtual screening
'''
if out == 'True' or out == 'true':
output = '"out_$n"'
elif out == 'False' or out == 'false':
output = '/dev/null'
with open('dock.pbs', 'w') as dock:
dock.write('#!/bin/bash\n\n')
dock.write('#PBS -N Docking\n')
dock.write('#PBS -m e\n')
dock.write('#PBS -M {}\n'.format(email))
dock.write('#PBS -q thin_1m\n')
dock.write('#PBS -l select=1:ncpus=24:ompthreads=24\n')
dock.write('#PBS -j oe\n')
dock.write('#PBS -J 1-{}\n\n'.format(array))
dock.write('cd $PBS_O_WORKDIR\n\n')
dock.write('mkdir -p ../Ligands_Completed/${PBS_ARRAY_INDEX}\n')
dock.write('process() { local n=${1##*/}\n')
dock.write('\t./vina \\\n')
dock.write('\t\t--receptor receptor.pdbqt \\\n')
dock.write('\t\t--ligand "$1" \\\n')
dock.write('\t\t--out {} \\\n'.format(output))
dock.write('\t\t--log "log_$n" \\\n')
dock.write('\t\t--exhaustiveness {} \\\n'.format(exhaust))
dock.write('\t\t--cpu {} \\\n'.format(CPU))
dock.write('\t\t--seed {} \\\n'.format(seed))
dock.write('\t\t--center_x {} \\\n'.format(pX))
dock.write('\t\t--center_y {} \\\n'.format(pY))
dock.write('\t\t--center_z {} \\\n'.format(pZ))
dock.write('\t\t--size_x {} \\\n'.format(x))
dock.write('\t\t--size_y {} \\\n'.format(y))
dock.write('\t\t--size_z {} \\\n'.format(z))
dock.write('''\t\t| awk -v name="$n" '$1 == "1" {print name "\\t" $0;exit}' >> Docks_${PBS_ARRAY_INDEX} \\\n''')
dock.write('\t\t&& rm log_$n \\\n')
dock.write('\t\t&& mv "$1" ../Ligands_Completed/${PBS_ARRAY_INDEX}\n')
dock.write('}\n')
dock.write('export -f process\n')
dock.write('''find ../Ligands/${PBS_ARRAY_INDEX}/ -type f -print0 | xargs -0 -P 24 -I{} bash -c 'process "$1"' _ {}''')
def affinity(name, Kd=None, dG=None, R=0.00198720425864083, T=298):
if Kd != None and dG == None:
Kd = float(Kd)
dG = R*T*math.log(Kd)
print('{} dG = {} Kcal/mol'.format(name, round(dG, 2)))
if dG != None and Kd == None:
dG = float(dG)
Kd = math.e**(dG/(R*T))
print('{} Kd = {:0.2e} M'.format(name, Kd))
parser = argparse.ArgumentParser(description='Prep ligands for AutoDock Vina')
parser.add_argument('-r',
'--receptor',
nargs='+',
help='Prep and convert protein receptor from PDB to PDBQT')
parser.add_argument('-b',
'--box',
nargs='+',
help='Draw search box')
parser.add_argument('-d',
'--download',
nargs='+',
help='Download, unzip, renumber, combine ligands')
parser.add_argument('-s',
'--split',
nargs='+',
help='Split a file with multiple models into single files\
segmented into directories')
parser.add_argument('-j',
'--job',
nargs='+',
help='Write the PBS file for HPC virtual screaning')
parser.add_argument('-c',
'--combine',
nargs='+',
help='Sort and combine the docking results into a file')
parser.add_argument('-Kd',
'--Kd_to_dG',
nargs='+',
help='Convert Kd to delta G')
parser.add_argument('-dG',
'--dG_to_Kd',
nargs='+',
help='Convert delta G to Kd')
parser.add_argument('-p',
'--pbs',
nargs='+',
help='Generates an HPC PBS job script')
args = parser.parse_args()
def main():
if args.receptor:
receptor(sys.argv[2])
elif args.box:
pymol.cmd.load(str(sys.argv[2]))
pymol.cmd.extend('Box', Box)
elif args.download:
download(sys.argv[2])
elif args.split:
split(sys.argv[2], 'Ligands', 'model', int(sys.argv[3]))
elif args.job:
PBS(sys.argv[2], # pX
sys.argv[3], # pY
sys.argv[4], # pZ
sys.argv[5], # x
sys.argv[6], # y
sys.argv[7], # z
sys.argv[8], # Seed
sys.argv[9], # Exhaustiveness
sys.argv[10], # Output
sys.argv[11], # CPUs
sys.argv[12], # Array
sys.argv[13]) # Email
elif args.combine:
os.system('cat {}/Docks_* | sort -nk 3 > Result'.format(sys.argv[2]))
elif args.Kd_to_dG:
affinity('molecule', Kd=sys.argv[2])
elif args.dG_to_Kd:
affinity('molecule', dG=sys.argv[2])
elif args.pbs:
PBS(sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5], sys.argv[6], sys.argv[7], sys.argv[8], sys.argv[9], sys.argv[10], sys.argv[11], sys.argv[12], sys.argv[13])
if __name__ == '__main__': main()