
filipsPL/ABChemoinformatics


ABC of Cheminformatics

... with a focus on macromolecule-ligand interactions. Inspired by magnus' RNA Structural Bioinformatics Crash Course



Intro - to read

Basic concepts

Molecular scaffolds


Ligand activity measures

  • IC50 - half maximal inhibitory concentration
    • pIC50 = -log10(IC50), with IC50 expressed in mol/L
  • EC50 - half maximal effective concentration
  • LD50 - median lethal dose
  • Ki - inhibition constant, a measure of binding affinity. Binding affinity describes how strongly a ligand interacts with its binding site: high-affinity binding results from stronger intermolecular forces between the ligand and its receptor, low-affinity binding from weaker ones. In general, high-affinity binding also involves a longer residence time of the ligand at the receptor binding site than low-affinity binding.

The median affinity (IC50, EC50, ED50, Ki, Kd) for current small-molecule drugs is around 20 nM (source: doi:10.1038/nrd2199)
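A minimal sketch of the unit handling behind pIC50, assuming IC50 values are given in nM and converted to mol/L before taking the base-10 logarithm. The Cheng-Prusoff relation for a competitive inhibitor, Ki = IC50 / (1 + [S]/Km), is included to show how IC50 and Ki are related; the substrate concentration [S] and the Michaelis constant Km are assay-specific inputs you have to supply, and the function names are illustrative.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in nanomolar."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    ic50_molar = ic50_nm * 1e-9  # nM -> M
    return -math.log10(ic50_molar)


def ki_cheng_prusoff(ic50: float, substrate_conc: float, km: float) -> float:
    """Estimate Ki from IC50 for a competitive inhibitor (Cheng-Prusoff).

    ic50 and the returned Ki share one concentration unit;
    substrate_conc and km must share a unit with each other.
    """
    return ic50 / (1.0 + substrate_conc / km)


if __name__ == "__main__":
    # The ~20 nM median drug affinity quoted above corresponds to pIC50 of about 7.7
    print(round(pic50_from_nm(20.0), 2))
    # When [S] equals Km, Ki is half of the IC50
    print(ki_cheng_prusoff(100.0, substrate_conc=10.0, km=10.0))
```

Working in pIC50 units makes activity values additive on a log scale, which is why they are preferred for averaging and plotting.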

Software

Drawing and visualizing small molecules

MarvinSketch, part of the Marvin Beans suite. Download it for free (after registration): https://www.chemaxon.com/download/marvin-suite/#mbeans

Practicals:

  • Draw the LSD molecule
  • Save it as SMILES, MOL2 and SDF
  • Copy it to the clipboard as SMILES and paste it into a text editor (e.g. Notepad)
  • Optimize the 3D structure

Molecule formats and format conversion

Practicals:

  • convert saved molecules to sdf, mol2 and PDB
  • add hydrogens
  • optimize 3D structure
  • generate conformers
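The conversion steps above can also be scripted. A minimal sketch using RDKit (a swapped-in toolkit, not necessarily the one used in the course): it parses a SMILES string, adds explicit hydrogens, embeds several 3D conformers with ETKDG, optimizes them with the MMFF94 force field and writes them all to SDF, plus the first conformer to PDB. RDKit cannot write MOL2; Open Babel's command line (e.g. `obabel conformers.sdf -O conformers.mol2`) is one option for that. The function name, output file names, seed and number of conformers are illustrative choices.

```python
import tempfile
from pathlib import Path

from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_3d_files(smiles: str, out_dir: str, n_confs: int = 5, seed: int = 42):
    """Write MMFF94-optimized 3D conformers of `smiles` to SDF and PDB files.

    Returns (sdf_path, pdb_path): all conformers go to the SDF file,
    the first conformer alone goes to the PDB file.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    mol = Chem.AddHs(mol)  # explicit hydrogens are needed for a sensible 3D geometry

    # Conformer generation with the ETKDG (v3) distance-geometry method
    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # fixed seed for reproducible coordinates
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))
    if not conf_ids:
        raise RuntimeError("3D embedding failed")

    # Force-field optimization; one (not_converged, energy) tuple per conformer
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=1000)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    sdf_path = out / "conformers.sdf"
    writer = Chem.SDWriter(str(sdf_path))
    for cid, (_, energy) in zip(conf_ids, results):
        mol.SetProp("MMFF_energy_kcal_mol", f"{energy:.3f}")
        writer.write(mol, confId=cid)
    writer.close()

    pdb_path = out / "first_conformer.pdb"
    Chem.MolToPDBFile(mol, str(pdb_path), confId=conf_ids[0])
    return sdf_path, pdb_path


if __name__ == "__main__":
    # LSD without stereochemistry (a simplification), as drawn in the MarvinSketch practical
    lsd = "CCN(CC)C(=O)C1CN(C)C2Cc3c[nH]c4cccc(c34)C2=C1"
    print(smiles_to_3d_files(lsd, tempfile.mkdtemp()))
```

Since the SMILES above carries no stereo information, the embedded stereocenters are arbitrary; for real work, start from the isomeric SMILES exported from MarvinSketch.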

PyMOL

Installing under Debian/Ubuntu/Mint

Usage

Practicals:

For protein 4N49 prepare images:

  • general view of the complex; transparency, ray tracing
  • hydrogen bond network in the binding pocket
  • distance and angle measurements; side-chain labelling
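PyMOL's distance and angle measurements reduce to simple vector geometry. A minimal, dependency-free sketch of what is being computed, assuming atom coordinates in Å (e.g. taken from the 4N49 PDB file); the 3.5 Å donor-acceptor cutoff and 120° D-H...A angle used in the hypothetical `looks_like_hbond` helper are common rule-of-thumb values, not PyMOL's exact polar-contact criteria.

```python
import math
from typing import Sequence

Point = Sequence[float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two points (Å)."""
    return math.dist(a, b)


def angle(a: Point, b: Point, c: Point) -> float:
    """Angle a-b-c in degrees, with b as the vertex."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    cos_theta = dot / (math.hypot(*ba) * math.hypot(*bc))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_theta))


def looks_like_hbond(donor: Point, hydrogen: Point, acceptor: Point,
                     max_dist: float = 3.5, min_angle: float = 120.0) -> bool:
    """Rule-of-thumb H-bond test: D...A distance and D-H...A angle thresholds."""
    return (distance(donor, acceptor) <= max_dist
            and angle(donor, hydrogen, acceptor) >= min_angle)


if __name__ == "__main__":
    # Hypothetical coordinates (Å) for a donor N, its H and an acceptor O
    n, h, o = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)
    print(distance(n, o), angle(n, h, o), looks_like_hbond(n, h, o))
```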

UCSF Chimera (optional)

Another structure visualisation/editing program.

Useful tutorials and howtos:

🌀 See also: my notes about Chimera (useful commands, pymol vs chimera etc.): https://github.com/filipsPL/ABChemoinformatics/blob/master/pymol_chimera.md

Practicals:

For protein 4N49:

  • visualize the hydrophobicity surface of the protein, limited to 6 Å around the ligand(s)
  • color the molecular surface by electrostatic potential (Coulombic coloring is sufficient)
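Restricting a surface to 6 Å around the ligand amounts to a distance-based atom selection, and Coulombic surface coloring evaluates a point-charge potential at each surface point. The sketch below shows both under stated assumptions: atoms are hypothetical `Atom` tuples with partial charges, 332.0637 converts e²/Å to kcal/mol, and a distance-dependent dielectric ε = 4r is assumed to mirror Chimera's default for Coulombic coloring. Check the Chimera documentation before relying on the exact numbers.

```python
import math
from typing import Iterable, List, NamedTuple, Sequence


class Atom(NamedTuple):
    name: str
    x: float
    y: float
    z: float
    charge: float = 0.0  # partial charge in units of e


COULOMB_KCAL = 332.0637  # kcal*Å/(mol*e^2)


def atoms_near(atoms: Iterable[Atom], ligand: Iterable[Atom], cutoff: float = 6.0) -> List[Atom]:
    """Return atoms that have at least one ligand atom within `cutoff` Å."""
    lig_xyz = [(a.x, a.y, a.z) for a in ligand]
    return [a for a in atoms
            if any(math.dist((a.x, a.y, a.z), p) <= cutoff for p in lig_xyz)]


def coulombic_potential(point: Sequence[float], atoms: Iterable[Atom],
                        eps_per_angstrom: float = 4.0) -> float:
    """Potential (kcal/mol/e) at `point` with a distance-dependent dielectric eps = k*r."""
    total = 0.0
    for a in atoms:
        r = math.dist(point, (a.x, a.y, a.z))
        if r > 0:  # skip a point that coincides with an atom centre
            total += COULOMB_KCAL * a.charge / (eps_per_angstrom * r * r)
    return total


if __name__ == "__main__":
    # Hypothetical protein atoms and a one-atom "ligand"
    protein = [Atom("N", 0.0, 0.0, 0.0, -0.5), Atom("CA", 10.0, 0.0, 0.0, 0.1)]
    ligand = [Atom("C1", 3.0, 0.0, 0.0)]
    print([a.name for a in atoms_near(protein, ligand)])
    print(coulombic_potential((1.0, 0.0, 0.0), protein))
```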

KNIME

Practicals:

  • From ChEMBL, download IC50 activity data for JAK2 kinase
  • prepare a workflow:
    • read the data from a CSV file
    • convert SMILES strings to structures
    • calculate molecular descriptors: AMW, logP, TPSA
    • calculate logBB according to the given formula (💡 use the Math Formula node)
    • calculate the Pareto rank, minimizing IC50, MW, logP and TPSA
  • For these data:
    • create a 3D plot:
      • xyz: TPSA / logP / AMW
      • color by: IC50
      • point size: logBB
      • save as PNG
    • try other plot types (parallel coordinates, bar plots, conditional bar plots etc.)
    • sort the table by Pareto rank (ascending)
    • save the 10 top-ranking compounds to CSV and XLS files
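To see what the KNIME nodes compute after the descriptor step, the same logic can be sketched in plain Python. Two assumptions: the logBB formula is not reproduced in this text, so Clark's equation logBB = 0.152·ClogP - 0.0148·PSA + 0.139 stands in (with logP and TPSA as its inputs); and the Pareto rank is computed by repeated non-dominated sorting (rank 1 = compounds that no other compound beats on all minimized criteria), which may differ in detail from the KNIME node used in the course. Column names are hypothetical, and only the CSV export is shown.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical column names; all four objectives are minimized
OBJECTIVES = ("IC50", "AMW", "logP", "TPSA")


def load_rows(path: str) -> List[Dict]:
    """Read a CSV file and convert the objective columns to floats."""
    with open(path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    for row in rows:
        for key in OBJECTIVES:
            row[key] = float(row[key])
    return rows


def logbb_clark(clogp: float, tpsa: float) -> float:
    """Stand-in logBB model (Clark's equation); replace with the course formula."""
    return 0.152 * clogp - 0.0148 * tpsa + 0.139


def dominates(a: Dict, b: Dict) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return (all(a[k] <= b[k] for k in OBJECTIVES)
            and any(a[k] < b[k] for k in OBJECTIVES))


def pareto_rank(rows: List[Dict]) -> None:
    """Add 'pareto_rank' in place: 1 = non-dominated front, 2 = next front, ..."""
    remaining = list(range(len(rows)))
    rank = 1
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(rows[j], rows[i]) for j in remaining if j != i)]
        for i in front:
            rows[i]["pareto_rank"] = rank
        remaining = [i for i in remaining if i not in front]
        rank += 1


def save_top(rows: List[Dict], path: str, n: int = 10) -> List[Dict]:
    """Sort by Pareto rank (ascending) and write the first `n` rows to CSV."""
    top = sorted(rows, key=lambda r: r["pareto_rank"])[:n]
    if top:
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(top[0].keys()))
            writer.writeheader()
            writer.writerows(top)
    return top


if __name__ == "__main__":
    # Invented example values, not real JAK2 data
    rows = [
        {"id": "a", "IC50": 10.0, "AMW": 300.0, "logP": 2.0, "TPSA": 60.0},
        {"id": "b", "IC50": 20.0, "AMW": 350.0, "logP": 3.0, "TPSA": 70.0},
        {"id": "c", "IC50": 5.0, "AMW": 400.0, "logP": 2.5, "TPSA": 50.0},
    ]
    for r in rows:
        r["logBB"] = round(logbb_clark(r["logP"], r["TPSA"]), 3)
    pareto_rank(rows)
    out = Path(tempfile.mkdtemp()) / "top10.csv"
    for r in save_top(rows, str(out)):
        print(r["id"], r["pareto_rank"], r["logBB"])
```

The quadratic non-dominated sort is fine for a few thousand compounds; larger tables would call for a faster algorithm.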

Molecular docking

AutoDock Vina

rDock

Rescoring of docking results

Other programs

Practicals:

  • for complexes of interest perform redocking of the native ligands to the macromolecule structure
  • use various docking programs (AutoDock Vina, rDock...)
  • check the influence of various ligand preparation steps (e.g. native X-ray ligand structure vs. structure optimized with Open Babel vs. ...)
  • check the influence of rescoring on the docking results
  • which combination gives the best results? (and what does "best results" mean?)
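For redocking, "best results" is often judged by the heavy-atom RMSD between the docked pose and the native (X-ray) ligand, with RMSD ≤ 2 Å a widely used success threshold. A minimal sketch, assuming both poses are already in the receptor's coordinate frame (so no superposition is applied) and list the same heavy atoms in the same order; it ignores molecular symmetry, which dedicated RMSD tools handle. All coordinates and per-program RMSD values below are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Coords = Sequence[Sequence[float]]


def pose_rmsd(docked: Coords, native: Coords) -> float:
    """RMSD (Å) between matched atoms, computed in place (no superposition)."""
    if len(docked) != len(native) or not docked:
        raise ValueError("poses must list the same, non-zero number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(docked, native))
    return math.sqrt(squared / len(docked))


def success_rate(rmsds: Sequence[float], threshold: float = 2.0) -> float:
    """Fraction of redocked complexes whose top pose is within `threshold` Å."""
    return sum(r <= threshold for r in rmsds) / len(rmsds)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    docked = [(0.3, 0.0, 0.0), (1.8, 0.1, 0.0), (1.5, 1.9, 0.2)]
    print(f"RMSD = {pose_rmsd(docked, native):.2f} Å")

    # Hypothetical top-pose RMSDs per program over a small redocking benchmark
    results: Dict[str, List[float]] = {"Vina": [0.8, 1.5, 3.2], "rDock": [1.1, 2.4, 0.9]}
    for program, rmsds in results.items():
        print(program, f"{success_rate(rmsds):.0%}")
```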

Databases and online tools :: macromolecules

‼️ ⚠️ Never reveal/use confidential structures on public servers!

Scientific literature

Patents

Sequences

  • http://www.uniprot.org/ high-quality and freely accessible resource of protein sequence and functional information.
  • http://www.ebi.ac.uk/interpro/ - provides functional analysis of proteins by classifying them into families and predicting domains and important sites

Similar sequences

Sequences Alignment

Other

  • https://www.targetvalidation.org/ - helps answer questions such as:
    • I am interested in target T: Which diseases can be treated by modulating target T?
    • I am interested in disease D: Which targets can be modulated to treat disease D?

PDB

  • http://www.rcsb.org/pdb/home/home.do - This resource is powered by the Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.
  • http://ndbserver.rutgers.edu/ - contains information about experimentally-determined nucleic acids and complex assemblies.
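Once a PDB entry (e.g. 4N49, downloadable from https://files.rcsb.org/download/4N49.pdb) is on disk, its ligands can be listed from the HETATM records, which use fixed columns in the PDB format: residue name in columns 18-20, chain in column 22, residue number in columns 23-26. A minimal sketch that skips water; the `SAMPLE_PDB` lines are made up for illustration, not taken from a real entry.

```python
from typing import Iterable, List, Tuple

SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "HETATM 2001  N   SAM A 901      10.123  20.456  30.789  1.00 20.00           N",
    "HETATM 2002  CA  SAM A 901      11.000  21.000  31.000  1.00 20.00           C",
    "HETATM 3001  O   HOH A1001       1.000   2.000   3.000  1.00 30.00           O",
])


def list_ligands(pdb_text: str, skip: Iterable[str] = ("HOH", "WAT")) -> List[Tuple[str, str, int]]:
    """Unique (resname, chain, resseq) tuples from HETATM records, in file order."""
    skip = set(skip)
    found = {}  # a dict preserves insertion order, acting as an ordered set
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        resname = line[17:20].strip()  # columns 18-20
        chain = line[21:22].strip()    # column 22
        resseq = int(line[22:26])      # columns 23-26
        if resname not in skip:
            found[(resname, chain, resseq)] = None
    return list(found)


if __name__ == "__main__":
    print(list_ligands(SAMPLE_PDB))
```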

Tools

Homology modeling

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). (from: wikipedia)


Practicals:

‼️ ⚠️ Never reveal/use confidential structures on public servers!

  • Find residues crucial for the 2'-O-methyltransferase activity of Dengue virus type 2 (strain Thailand/16681/1984)
  • Find all sequences containing the mRNA cap 0-1 NS5-type MT domain
  • Find reviewed sequences of Flaviviridae methyltransferases (taxonomy: Flaviviridae)
  • Find sequences similar to the human IDO1 protein
  • What are the differences between the four most similar sequences?
  • Find the X-ray structure of Cap-specific mRNA (nucleoside-2'-O-)-methyltransferase 1 protein in complex with m7GpppG and SAM
    • download it and visualize it with PyMOL
    • fetch its FASTA sequence
    • show the interaction diagram for both ligands
    • find other structures containing the SAM ligand
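Comparing the most similar sequences usually starts from a multiple alignment. Given aligned sequences in FASTA format (equal length, "-" for gaps), percent identity and the differing positions are easy to compute. A minimal sketch; identity here is counted only over columns where neither sequence has a gap, which is one of several conventions, and the sequences in the tests are invented.

```python
from typing import Dict, List, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first word after '>'."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences over gap-free columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def differences(a: str, b: str) -> List[Tuple[int, str, str]]:
    """1-based alignment columns where the two sequences differ (gaps included)."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]
```

Running `differences` pairwise over the four closest hits gives a quick list of the positions worth inspecting in the structure.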

Databases and online tools :: small molecules

‼️ ⚠️ Never reveal/use confidential structures on public servers!

Chemical structures

Activity

Other

Drugs

ADMET properties estimation

Cytochrome P450


Practicals:

‼️ ⚠️ Never reveal/use confidential structures on public servers!

  • Find all activities of the compound: [structure]
    • Download the IC50 values as a CSV file
  • Find all registered drugs that target RNA
  • Find possible bioisosteres of 3-methylindole (R in position 5)
  • For the compound named (2S)-4-[3-(5-methyl-2-furyl)benzofuran-2-yl]-2-phenyl-butan-2-ol:
    • find vendors
    • find calculated values of logP and TPSA
    • find commercially available compounds with at least 80% Tanimoto similarity
    • find patents mentioning this structure
  • Predict various physicochemical/ADMET properties for aspirin
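The 80% similarity criterion refers to the Tanimoto coefficient between binary fingerprints, T(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of bits switched on in each molecule's fingerprint. A minimal sketch on plain Python sets; in practice the bit sets would come from a fingerprint such as RDKit's Morgan fingerprints, and the toy bit indices and compound names in the tests are invented.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0  # convention assumed here: two empty fingerprints score 0
    return len(a & b) / len(union)


def similar_hits(query: Set[int], library: Dict[str, Set[int]],
                 threshold: float = 0.8) -> List[str]:
    """Names of library entries with Tanimoto >= threshold, most similar first."""
    scored = [(tanimoto(query, fp), name) for name, fp in library.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [name for score, name in scored if score >= threshold]
```

Note that the similarity value depends on the fingerprint type, so an 80% cutoff on one website is not directly comparable to 80% on another.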

Other tools and useful links

Software corner

Linux and Bash

Python

  • https://seaborn.pydata.org/ - a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
  • https://python-graph-gallery.com/ - displays hundreds of charts, always providing the reproducible Python code! It aims to showcase the awesome dataviz possibilities of Python and to help you benefit from them.
  • http://rdkit.sourceforge.net/ - Cheminformatics and Machine Learning Software
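As an RDKit example tied to the KNIME practical, the same descriptors (AMW, logP, TPSA) can be computed directly. A minimal sketch: `Descriptors.MolWt` gives the average molecular weight, `Crippen.MolLogP` the Wildman-Crippen logP estimate and `rdMolDescriptors.CalcTPSA` the topological polar surface area; values may differ slightly from those produced by KNIME's descriptor nodes, and the function name is illustrative.

```python
from typing import Dict

from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, rdMolDescriptors


def basic_descriptors(smiles: str) -> Dict[str, float]:
    """Return AMW, logP and TPSA for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    return {
        "AMW": Descriptors.MolWt(mol),          # average molecular weight (g/mol)
        "logP": Crippen.MolLogP(mol),           # Wildman-Crippen logP estimate
        "TPSA": rdMolDescriptors.CalcTPSA(mol), # topological polar surface area (Å^2)
    }


if __name__ == "__main__":
    print(basic_descriptors("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```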

Git

Git (/ɡɪt/) is a version control system that is widely used for software development and other version control tasks. It is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows.[9] Git was created by Linus Torvalds in 2005 for development of the Linux kernel, with other kernel developers contributing to its initial development.

Gitlab

GitLab, the software, is a web-based Git repository manager with wiki and issue tracking features.

Backups

Borg

Deduplicating archiver with compression and authenticated encryption, called "The holy grail of backup software":

rsync

duplicity

Markdown

Programming

RNA Structural Bioinformatics

Science

Fun (optional)

xkcd
