ABC of Cheminformatics
... with focus on macromolecule-ligand interactions. Inspired by magnus' RNA Structural Bioinformatics Crash Course
- Intro - to read
- Databases and online tools :: macromolecules
- Databases and online tools :: small molecules
- Other tools and useful links
- Software corner
- RNA Structural Bioinformatics
- Fun (optional)
Intro - to read
Ligands activity measures
- IC50 - half maximal inhibitory concentration
- pIC50 = -log10(IC50), with IC50 expressed in mol/L (e.g. IC50 = 1 µM gives pIC50 = 6)
- EC50 - half maximal effective concentration
- LD50 - median lethal dose
- Ki - inhibition constant: the equilibrium dissociation constant of the inhibitor-target complex and a measure of binding affinity (the lower the Ki, the higher the affinity). Binding affinity describes the interaction of most ligands with their binding sites: high-affinity binding results from stronger intermolecular forces between the ligand and its receptor, while low-affinity binding involves weaker forces. In general, a high-affinity ligand has a longer residence time at its receptor binding site than a low-affinity one.
The median affinity (IC50, EC50, ED50, Ki, Kd) for current small-molecule drugs is around 20 nM (source: doi:10.1038/nrd2199)
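A minimal Python sketch of the pIC50 conversion, assuming IC50 values are given in nanomolar; `pic50_from_nm` and `ic50_nm_from_pic50` are hypothetical helper names, not part of any library:

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    ic50_molar = ic50_nm / 1e9  # 1 nM = 1e-9 mol/L
    return -math.log10(ic50_molar)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Inverse conversion: pIC50 back to IC50 in nanomolar."""
    return 10 ** (-pic50) * 1e9


# The ~20 nM median drug affinity quoted above corresponds to a pIC50 of about 7.7
print(round(pic50_from_nm(20), 2))
# 1 µM (1000 nM) corresponds to a pIC50 of 6
print(round(pic50_from_nm(1000), 2))
```

Working on the logarithmic scale makes activities spanning several orders of magnitude comparable and easier to average or plot.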
Drawing and visualizing small molecules
MarvinSketch from MarvinBeans Suite. Download for free (after registration): https://www.chemaxon.com/download/marvin-suite/#mbeans
- Draw the LSD molecule
- Save it as SMILES, MOL2 and SDF
- copy it to the clipboard as SMILES and paste it into a text editor (e.g. Notepad)
- optimize the 3D structure
Molecule formats and formats conversion
- convert saved molecules to sdf, mol2 and PDB
- add hydrogens
- optimize 3D structure
- generate conformers
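One way to script these conversions is the Open Babel command-line tool `obabel`. The sketch below assumes Open Babel is installed and uses only its basic options (`-O` output file, `-h` add hydrogens, `--gen3d` generate 3D coordinates); the file names are hypothetical. It builds the command line and only runs it when `obabel` is found on the PATH:

```python
import shutil
import subprocess
from typing import List, Optional


def obabel_command(infile: str, outfile: str, add_h: bool = True, gen3d: bool = True) -> List[str]:
    """Build an obabel command line; formats are inferred from the file extensions."""
    cmd = ["obabel", infile, "-O", outfile]
    if add_h:
        cmd.append("-h")  # add hydrogens
    if gen3d:
        cmd.append("--gen3d")  # generate 3D coordinates
    return cmd


def convert(infile: str, outfile: str, **options: bool) -> Optional[int]:
    """Run obabel if it is installed; otherwise only report the command."""
    cmd = obabel_command(infile, outfile, **options)
    if shutil.which("obabel") is None:
        print("obabel not found, would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True).returncode


# Hypothetical file names: the LSD molecule saved from MarvinSketch as SMILES
for ext in ("sdf", "mol2", "pdb"):
    print(" ".join(obabel_command("lsd.smi", f"lsd.{ext}")))
```

Calling `convert("lsd.smi", "lsd.sdf")` would perform the actual conversion. Conformer generation is also available in Open Babel through its `--conformer` operation; see the Open Babel documentation for the options supported by your version.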
Installing PyMOL under Debian/Ubuntu/Mint
- download the script: https://raw.githubusercontent.com/filipsPL/ABChemoinformatics/master/compile-pymol.sh
- make it executable:
chmod +x compile-pymol.sh
- run the compilation: ./compile-pymol.sh
- run pymol
- Practical Pymol for Beginners http://www.pymolwiki.org/index.php/Practical_Pymol_for_Beginners
- See also: https://github.com/mmagnus/RNA-Structural-Bioinformatics-Crash-Course/blob/master/README.md#pymol
For protein 4N49 prepare images:
- general view of the complex; transparency, raytracing
- hydrogen bond network in the binding pocket
- distance and angle measurements; side-chain labelling
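Distances and angles measured in PyMOL are plain geometry on atom coordinates. A minimal Python sketch of what is being computed, assuming coordinates in Å; the donor, hydrogen and acceptor positions below are made up for illustration:

```python
import math
from typing import Sequence

Coord = Sequence[float]


def distance(a: Coord, b: Coord) -> float:
    """Euclidean distance between two atoms (same units as the input, here Å)."""
    return math.dist(a, b)


def angle(a: Coord, b: Coord, c: Coord) -> float:
    """Angle a-b-c in degrees, with b as the vertex atom."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos_t = dot / (math.hypot(*ba) * math.hypot(*bc))
    cos_t = max(-1.0, min(1.0, cos_t))  # clamp rounding noise before acos
    return math.degrees(math.acos(cos_t))


# Made-up donor N, hydrogen H and acceptor O positions (Å)
n_donor, h, o_acceptor = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.3, 0.0)
print(f"N...O distance: {distance(n_donor, o_acceptor):.2f} Å")
print(f"N-H...O angle: {angle(n_donor, h, o_acceptor):.1f} deg")
```

A donor-acceptor distance up to roughly 3.5 Å with a near-linear D-H...A angle is a common rule of thumb for a hydrogen bond; the exact cutoffs differ between programs.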
UCSF Chimera (optional)
Another structure visualisation/editing program.
- Download: https://www.cgl.ucsf.edu/chimera/download.html
- Getting started: https://www.cgl.ucsf.edu/Outreach/Tutorials/GettingStarted.html
Useful tutorials and howtos:
- how to make high-quality images of a protein surface colored by hydrophobicity and electrostatic potential: https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/surfprop.html
- Self-Guided Volume Data Exercises: https://www.cgl.ucsf.edu/chimera/data/tutorials/maps08/exercises.html
- Structure Analysis and Comparison Tutorial: https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/squalene.html#surfaces
- Image Tutorial: Surface Properties: https://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/tutorials/surfprop.html
For protein 4N49:
- visualize the hydrophobicity surface of the protein, limited to 6 Å around the ligand(s)
- color the molecular surface by electrostatic potential (Coulombic coloring is enough)
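Coulombic surface coloring is based on Coulomb's law: the potential at a surface point is a sum of contributions q_i / (ε r_i) from the atomic partial charges. A sketch of that sum in Python, assuming point charges in units of e, coordinates in Å and a distance-dependent dielectric ε = 4r (a common simple choice; treat the defaults here as assumptions, not Chimera's exact settings). The charges and positions are made up:

```python
import math
from typing import Iterable, Sequence, Tuple

# Coulomb constant in kcal*Å/(mol*e^2), so the potential is in kcal/(mol*e)
COULOMB_CONST = 332.0636


def coulombic_potential(point: Sequence[float],
                        charges: Iterable[Tuple[Sequence[float], float]],
                        eps: float = 4.0,
                        distance_dependent: bool = True) -> float:
    """Sum of q_i / (epsilon * r_i); epsilon = eps * r_i when distance-dependent."""
    phi = 0.0
    for position, q in charges:
        r = math.dist(point, position)
        if r == 0.0:
            raise ValueError("point coincides with a charge")
        dielectric = eps * r if distance_dependent else eps
        phi += COULOMB_CONST * q / (dielectric * r)
    return phi


# Made-up partial charges (e) near a surface point, coordinates in Å
atoms = [((1.5, 0.0, 0.0), -0.5), ((0.0, 2.0, 0.0), 0.4)]
print(f"{coulombic_potential((0.0, 0.0, 0.0), atoms):.2f} kcal/(mol*e)")
```

Negative values (usually drawn red) mark regions favourable for positively charged ligand groups, positive values (blue) the opposite.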
- Download KNIME: https://www.knime.org/knime
- Quick Start Guide: https://tech.knime.org/files/KNIME_quickstart.pdf (PDF)
From ChEMBL, download IC50 activity data for JAK2 kinase.
Prepare a workflow:
- read the data from the CSV file
- convert SMILES strings to structures
- calculate molecular descriptors: AMW, logP, TPSA
- calculate the Pareto rank, minimizing IC50, AMW, logP and TPSA
For these data:
- create a 3D plot:
  - xyz: TPSA / logP / AMW
  - color by: IC50
  - point size: logBB
  - save as PNG
- play with other plot types (parallel coordinates, bar plots, conditional bar plots etc.)
- sort the table by Pareto rank (ascending)
- save the 10 top-ranking compounds to CSV and XLS files
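The idea behind a Pareto rank can be sketched in plain Python. This uses front peeling (non-dominated sorting): rank 1 is the set of compounds not dominated by any other, rank 2 the non-dominated set after removing rank 1, and so on. That is one common definition; the KNIME node you use may define the rank differently, so treat it as an assumption. Compound names and values are made up:

```python
import csv
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one (all minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points: List[Sequence[float]]) -> List[int]:
    """Rank 1 = non-dominated front; remove it, rank the next front 2, and so on."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def top_compounds(rows: List[Tuple[str, Sequence[float]]], n: int = 10) -> List[Tuple[str, int]]:
    """Sort compounds by Pareto rank (ascending) and keep the first n."""
    ranks = pareto_ranks([values for _, values in rows])
    ranked = sorted(zip((name for name, _ in rows), ranks), key=lambda pair: pair[1])
    return ranked[:n]


def write_csv(ranked: List[Tuple[str, int]], path: str) -> None:
    """Save (compound, rank) pairs to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["compound", "pareto_rank"])
        writer.writerows(ranked)


# Made-up compounds: (IC50 [nM], AMW, logP, TPSA), all to be minimized
data = [
    ("cpd1", (12.0, 350.4, 2.1, 75.0)),
    ("cpd2", (8.0, 420.5, 3.5, 90.0)),
    ("cpd3", (15.0, 360.0, 2.5, 80.0)),
    ("cpd4", (30.0, 500.1, 4.8, 120.0)),
]
print(top_compounds(data))
```

`write_csv(top_compounds(data), "top10.csv")` would save the result (the file name is hypothetical). The pairwise dominance check is quadratic per front, which is fine for a few thousand compounds.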
- AutoDock Vina official tutorial: http://vina.scripps.edu/tutorial.html
- PyMOL plugin: http://www.pymolwiki.org/index.php/Autodock_plugin
- For RNA-ligand complexes: LigandRNA: http://ligandrna.genesilico.pl/
- For Protein-ligand complexes: NNScore 2.0: http://nbcr.ucsd.edu/data/sw/hosted/nnscore/
- http://www.tcd.uni-konstanz.de/research/plants.php - PLANTS - Protein-Ligand ANT System
- http://dock.compbio.ucsf.edu/ - Dock 6.x
- for complexes of interest perform redocking of the native ligands to the macromolecule structure
- use various docking programs (AutoDock Vina, rDock...)
- check the influence of various ligand preparation steps (e.g. native X-ray ligand structure vs. structure optimized with Open Babel vs. ...)
- check the influence of rescoring on the docking results
- which combination gives the best results? (and what does "best results" mean?)
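For redocking, a common measure of success is how closely the top-scored pose reproduces the crystallographic ligand, expressed as heavy-atom RMSD, with 2 Å a widely used threshold. A minimal sketch, assuming both poses list the same atoms in the same order and are already in the receptor's coordinate frame (so no superposition is done), and ignoring molecular symmetry, which dedicated tools handle. The coordinates are made up:

```python
import math
from typing import Sequence

Coords = Sequence[Sequence[float]]


def rmsd(pose: Coords, reference: Coords) -> float:
    """Root-mean-square deviation between two equally ordered atom lists, without fitting."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose, reference))
    return math.sqrt(squared / len(pose))


def redocking_success(pose: Coords, reference: Coords, cutoff: float = 2.0) -> bool:
    """Count a redocking as successful if the RMSD to the native pose is within the cutoff (Å)."""
    return rmsd(pose, reference) <= cutoff


# Made-up 3-atom native pose and a docked pose shifted by 1.5 Å along x
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 1.5, y, z) for x, y, z in native]
print(f"RMSD = {rmsd(docked, native):.2f} Å, success: {redocking_success(docked, native)}")
```

Comparing success rates over a set of complexes (for each program, ligand preparation and rescoring combination) gives one concrete answer to "which combination works best".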
Databases and online tools :: macromolecules
- Google Patents: http://www.google.com/patents
- Espacenet: http://worldwide.espacenet.com/
- DEPATISnet: https://depatisnet.dpma.de
- WIPO: https://patentscope.wipo.int/search/en/search.jsf
- https://www.surechembl.org/search/ - Open Patent Data
- http://www.uniprot.org/ - high-quality and freely accessible resource of protein sequence and functional information.
- http://www.ebi.ac.uk/interpro/ - provides functional analysis of proteins by classifying them into families and predicting domains and important sites
- http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE=Proteins&PROGRAM=blastp&RUN_PSIBLAST=on - search protein databases using a protein query.
- http://www.ebi.ac.uk/Tools/sss/ncbiblast/ - to find regions of sequence similarity, which will yield functional and evolutionary clues about the structure and function of your novel sequence.
- http://toolkit.tuebingen.mpg.de/cs_blast - CS-BLAST is an extension of standard NCBI BLAST that more than doubles its sensitivity on remote homologs at the same speed.
- http://toolkit.tuebingen.mpg.de/hhpred - Homology detection & structure prediction by HMM-HMM comparison
- Other search tools: http://toolkit.tuebingen.mpg.de/sections/search
- Pairwise Sequence Alignment: https://www.ebi.ac.uk/Tools/psa/
- Multiple Sequence Alignment: http://www.ebi.ac.uk/Tools/msa/clustalo/
- https://www.targetvalidation.org/ - helps answer questions such as:
- I am interested in target T: Which diseases can be treated by modulating target T?
- I am interested in disease D: Which targets can be modulated to treat disease D?
- http://www.rcsb.org/pdb/home/home.do - This resource is powered by the Protein Data Bank archive-information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.
- http://ndbserver.rutgers.edu/ - contains information about experimentally-determined nucleic acids and complex assemblies.
- http://dogsite.zbh.uni-hamburg.de/ - online pocket finder
- https://www.ebi.ac.uk/pdbsum -> Cleft analysis
- http://mole.upol.cz/ - rapid and fully automated location and characterization of channels, tunnels and pores
- Druggability: https://www.ebi.ac.uk/chembl/drugebility/structure (try: 1UV5)
Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). (from: wikipedia)
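Both the pairwise alignment tools listed above and the template search that starts homology modelling rest on sequence alignment by dynamic programming. A minimal Needleman-Wunsch global alignment sketch with a toy scoring scheme (match +1, mismatch -1, linear gap -2, all assumptions; real tools use substitution matrices such as BLOSUM62 and affine gap penalties), plus percent identity computed over the alignment length (one of several conventions in use):

```python
from typing import List, Tuple


def needleman_wunsch(s: str, t: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if s[i - 1] == t[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # (mis)match
                              score[i - 1][j] + gap,              # gap in t
                              score[i][j - 1] + gap)              # gap in s
    # Trace back from the bottom-right cell to recover one optimal alignment
    aligned_s: List[str] = []
    aligned_t: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_s.append(s[i - 1])
            aligned_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_s.append(s[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_s.append("-")
            aligned_t.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_s)), "".join(reversed(aligned_t))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical aligned positions divided by alignment length (gaps included), in %."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty alignment rows of equal length")
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


score, a, b = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
print(a)
print(b)
print(score, f"{percent_identity(a, b):.1f}% identity")
```

Higher sequence identity between target and template generally means a more reliable homology model.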
- Find residues crucial for the 2'-O-methyltransferase activity of Dengue virus type 2 (strain Thailand/16681/1984)
- Find all sequences containing mRNA cap 0-1 NS5-type MT domain
- Find reviewed sequences of Flaviviridae methyltransferases (
- Find sequences similar to the human IDO1 protein
- What are the differences between the four most similar sequences?
- Find x-ray structure of Cap-specific mRNA (nucleoside-2'-O-)-methyltransferase 1 Protein in complex with m7GpppG and SAM
- download it and visualize with pymol
- fetch fasta sequence
- show interactions diagram for both ligands
- find other structures containing SAM ligand
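Once a FASTA sequence has been downloaded, it can be read without extra libraries. A minimal parser sketch; the records below are made-up placeholders, not the real 4N49 sequences:

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>chain_A placeholder
MKTAYIAKQR
QISFVKSHFS
>chain_B placeholder
GSHMLE
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```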
Databases and online tools :: small molecules
- http://zinc.docking.org/ (aspirin)
- Also: https://pubchem.ncbi.nlm.nih.gov/ - BioAssay
- http://chemoinfo.ipmc.cnrs.fr/MOLDB/index.html - e-Drug3D offers a facility to explore FDA approved drugs and active metabolites.
- http://bitterdb.agri.huji.ac.il/ - currently holds over 680 bitter compounds collected from the literature and the Merck Index, together with their associated 25 human bitter taste receptors (hT2Rs).
- http://www.eidogen-sertanty.com/kinasekb.php - Kinase Knowledgebase (KKB)
- https://www.ebi.ac.uk/chembl/malaria/ - resource for publicly available compounds, targets, assays and data for malaria research
- http://www.swissbioisostere.ch/ - database of bioisosteric replacements
- http://www.drugbank.ca/ (1-phenyl-2-aminopropane)
- http://bidd.nus.edu.sg/group/ttd/ttd.asp (ALK)
- http://www.genome.jp/kegg/drug/ - a comprehensive drug information resource for approved drugs in Japan, USA, and Europe unified based on the chemical structures and/or the chemical components, and associated with target, metabolizing enzyme, and other molecular interaction network information.
ADMET properties estimation
- http://www.organic-chemistry.org/prog/peo/ - OSIRIS Property Explorer: calculates various drug-relevant properties on the fly
- https://disco.chemaxon.com/apps/demos/ - ChemAxon - Calculator Plugin Demos
- http://bleoberis.bioc.cam.ac.uk/pkcsm/prediction - pkCSM: predicting small-molecule pharmacokinetic properties using graph-based signatures
- http://lmmd.ecust.edu.cn:8000/ - A comprehensive source and free tool for evaluating chemical ADMET properties
- http://tox.charite.de/tox/ - a webserver for the prediction of oral toxicities of small molecules in rodents!
- http://www.swisstargetprediction.ch/ - website allows you to predict the targets of a small molecule
- http://toxnet.nlm.nih.gov/newtoxnet/hsdb.htm - a toxicology database that focuses on the toxicology of potentially hazardous chemicals. It provides information on human exposure, industrial hygiene, emergency handling procedures, environmental fate, regulatory requirements, nanomaterials, and related areas. The information in HSDB has been assessed by a Scientific Review Panel.
- http://bioinformatics.charite.de/supertoxic/index.php?site=home - collected toxic compounds from literature and web sources in the database SuperToxic.
- What is Cytochrome P450: https://en.wikipedia.org/wiki/Cytochrome_P450
- http://www.farma.ku.dk/smartcyp/index.php - SMARTCyp predicts the sites in molecules that are most liable to cytochrome P450 mediated metabolism
- http://www.farma.ku.dk/whichcyp/index.php - WhichCyp predicts which P450 isoform will bind/metabolize a molecule using simple yes/no classification models.
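Many of the property calculators listed above report the descriptors behind Lipinski's rule of five, a common first-pass drug-likeness filter: MW ≤ 500 Da, logP ≤ 5, at most 5 H-bond donors and at most 10 H-bond acceptors, with poor oral absorption more likely when two or more rules are broken. A minimal sketch that takes already-computed values as input (computing them needs a toolkit such as RDKit or one of the web tools above); the function names are hypothetical:

```python
from typing import List


def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> List[str]:
    """Return the rule-of-five criteria that a compound violates."""
    violations = []
    if mw > 500:
        violations.append("MW > 500")
    if logp > 5:
        violations.append("logP > 5")
    if hbd > 5:
        violations.append("H-bond donors > 5")
    if hba > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def drug_like(mw: float, logp: float, hbd: int, hba: int) -> bool:
    """Common reading of the rule: at most one violation is tolerated."""
    return len(lipinski_violations(mw, logp, hbd, hba)) <= 1


# Aspirin, approximate values: MW 180.16, logP about 1.2, 1 donor, 4 acceptors (N+O count)
print(lipinski_violations(180.16, 1.2, 1, 4), drug_like(180.16, 1.2, 1, 4))
```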
- Download values of IC50 in the csv file
Find all registered drugs that target RNA
Find possible bioisosteres of 3-methylindole (R in position 5)
For compound named
- find vendors
- find calculated values of logP and TPSA
- find commercially available compounds with at least 80% Tanimoto similarity
- Find patents mentioning this structure
- Predict various physicochemical/ADMET properties of aspirin
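The 80% similarity criterion refers to the Tanimoto coefficient between molecular fingerprints: the number of on-bits shared by two molecules divided by the number of on-bits set in either of them. A sketch assuming fingerprints are represented as sets of on-bit indices (in practice they come from a toolkit such as RDKit); the bit sets and compound names are made up:

```python
from typing import Dict, List, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """|A & B| / |A | B| for two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def similar_compounds(query: Set[int], library: Dict[str, Set[int]],
                      threshold: float = 0.8) -> List[str]:
    """Names of library compounds with Tanimoto similarity >= threshold to the query."""
    return [name for name, fp in library.items() if tanimoto(query, fp) >= threshold]


query_fp = {1, 4, 9, 16, 25}
library = {
    "vendor_cpd_A": {1, 4, 9, 16},
    "vendor_cpd_B": {1, 4, 36, 49},
}
print(similar_compounds(query_fp, library))
```

Similarity values depend on the fingerprint type, so a 0.8 cutoff with one fingerprint is not equivalent to 0.8 with another.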
Other tools and useful links
- https://opensourcemolecularmodeling.github.io/ - updated list of the Open Source Molecular Modeling software
- http://click2drug.org/ - Directory of computer-aided Drug Design tools
Linux and Bash
- Linux Bash Shell Cheat Sheet - http://cli.learncodethehardway.org/bash_cheat_sheet.pdf (PDF)
- Other useful commands:
- https://seaborn.pydata.org/ - a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
- https://python-graph-gallery.com/ - displays hundreds of charts, always with reproducible Python code. It aims to showcase the dataviz possibilities of Python and to help you benefit from them.
- http://rdkit.sourceforge.net/ - Cheminformatics and Machine Learning Software
Git (/ɡɪt/) is a version control system that is widely used for software development and other version control tasks. It is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. Git was created by Linus Torvalds in 2005 for development of the Linux kernel, with other kernel developers contributing to its initial development.
GitLab, the software, is a web-based Git repository manager with wiki and issue tracking features.
- git - for storing files
- wiki - for documentation (and/or IPython Notebook: https://ipython.org/notebook.html)
Deduplicating archiver with compression and authenticated encryption, called "The holy grail of backup software":
- see: https://en.wikipedia.org/wiki/Markdown
- Mastering Markdown: https://guides.github.com/features/mastering-markdown/
- http://www.emoji-cheat-sheet.com/ - Emoji cheat sheet
- My notes about markdown and emoji: https://github.com/filipsPL/ABChemoinformatics/blob/master/markdown%2Bemoji.md#markdown
- Ten simple rules for making research software more robust: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005412
RNA Structural Bioinformatics
- see: https://github.com/mmagnus/RNA-Structural-Bioinformatics-Crash-Course/blob/master/README.md#table-of-contents
- see: http://genesilico.pl/ -> "software"