
ABC of Cheminformatics

... with focus on macromolecule-ligand interactions. Inspired by magnus' RNA Structural Bioinformatics Crash Course




Intro - to read

Basic concepts

Molecular scaffolds


Ligand activity measures

  • IC50 - half maximal inhibitory concentration
    • pIC50 = -log10(IC50), with IC50 expressed in mol/L
  • EC50 - half maximal effective concentration
  • LD50 - median lethal dose
  • Ki - inhibition constant, a measure of binding affinity (a lower Ki means tighter binding). High-affinity binding results from stronger intermolecular forces between the ligand and its receptor, and low-affinity binding from weaker ones. In general, high-affinity binding also means a longer residence time of the ligand at its receptor binding site.

The median affinity (IC50, EC50, ED50, Ki, Kd) for current small-molecule drugs is around 20 nM (source: doi:10.1038/nrd2199)
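As a quick check on the units, here is a minimal sketch of the pIC50 conversion, assuming the usual convention that the IC50 is expressed in mol/L before taking the logarithm. The function name `pic50` and the unit table are illustrative and not part of any library.

```python
import math

# Multipliers that convert a concentration in the given unit to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def pic50(ic50, unit="nM"):
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])


# The ~20 nM median affinity quoted above corresponds to a pIC50 of about 7.7.
print(round(pic50(20, "nM"), 2))
print(round(pic50(1, "uM"), 2))
```

The same conversion applies to pKi, pKd and pEC50; a higher value always means a more potent compound.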


Drawing and visualizing small molecules

MarvinSketch from MarvinBeans Suite. Download for free (after registration):

⚡️ Practicals:

  • Draw the LSD molecule
  • Save it as SMILES, mol2 and sdf
  • copy it to the clipboard as SMILES and paste it into a text editor
  • optimize the 3D structure

Molecule formats and format conversion

⚡️ Practicals:

  • convert saved molecules to sdf, mol2 and PDB
  • add hydrogens
  • optimize 3D structure
  • generate conformers
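The first two practicals can be scripted. The sketch below only builds Open Babel command lines and prints them. It assumes the `obabel` executable is installed and on the PATH, and that the molecule was saved as `lsd.smi` (a hypothetical file name). The options used are `-O` (output file), `-h` (add hydrogens) and `--gen3d` (generate 3D coordinates). Conformer generation is not covered here.

```python
import shlex


def obabel_command(infile, outfile, add_hydrogens=True, gen3d=True):
    """Build an argument list for the Open Babel command-line tool.

    Input and output formats are inferred by obabel from the file extensions.
    """
    cmd = ["obabel", infile, "-O", outfile]
    if add_hydrogens:
        cmd.append("-h")       # add explicit hydrogens
    if gen3d:
        cmd.append("--gen3d")  # build and roughly optimize 3D coordinates
    return cmd


# 'lsd.smi' and the output names are placeholders for your own files.
for target in ("lsd.sdf", "lsd.mol2", "lsd.pdb"):
    print(shlex.join(obabel_command("lsd.smi", target)))
```

To actually run a conversion, pass the returned list to `subprocess.run(cmd, check=True)`.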


Installing under Debian/Ubuntu/Mint


⚡️ Practicals:

For protein 4N49 prepare images:

  • general view of the complex; transparency, raytracing


  • hydrogen bond network in the binding pocket:


  • distance and angle measurements; side-chain labelling:


UCSF Chimera (optional)

Another structure visualisation/editing program.

Useful tutorials and howtos:

🌀 See also: my notes about Chimera (useful commands, pymol vs chimera etc.):

⚡️ Practicals:

For protein 4N49:

  • visualize the hydrophobicity surface of the protein, limited to 6 Å around the ligand(s)
  • color molecular surface using Electrostatic Potential (Coulombic is enough)


⚡️ Practicals:

  • From the ChEMBL database, download IC50 activity data for JAK2 kinase

  • prepare a workflow:

  • Read the data from CSV file

  • convert smiles string to structures

  • calculate molecular descriptors: AMW, logP, TPSA

  • calculate logBB according to the formula: [formula] (💡 use the Math Node)

  • calculate the pareto rank, minimizing IC50 value, MW, logP and TPSA

  • For these data:

  • Create a 3D plot:

  • xyz: TPSA / logP / AMW

  • color by: IC50

  • points size: logBB

  • save as png

  • try other plot types (parallel coordinates, bar plots, conditional bar plots etc.)

  • sort the table according to the Pareto rank value (ascending)

  • save 10 top ranking compounds to csv and xls files.
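To see what the Pareto rank step in the workflow above computes, here is a minimal pure-Python sketch of ranking by successive non-dominated fronts, with every objective minimized. The compound values are made up for illustration. The workflow tool's own ranking node may number fronts differently, so treat this as an explanation of the idea rather than a reimplementation.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points):
    """Rank 1 is the non-dominated front; remove it and repeat for rank 2, etc."""
    ranks = [0] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        rank += 1
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
    return ranks


# Hypothetical rows: (IC50 in nM, MW, logP, TPSA)
compounds = [
    (10, 350, 2.0, 80),
    (50, 400, 3.0, 90),
    (5, 500, 4.0, 120),
    (60, 420, 3.5, 95),
]
print(pareto_ranks(compounds))
```

Compounds with rank 1 are those for which no other compound is at least as good in all four properties and better in at least one.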

Molecular docking

AutoDock Vina


Results' rescoring

Other programs

⚡️ Practicals:

  • for complexes of interest perform redocking of the native ligands to the macromolecule structure
  • use various docking programs (AutoDock Vina, rDock...)
  • check the influence of various ligand preparation steps (eg: ligand: native X-ray structure vs optimized with openbabel vs ...)
  • check the influence of rescoring on the docking results
  • which combination gives the best results? (and what does "best results" mean?)
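One common way to make "best results" concrete in redocking is the RMSD between the docked pose and the native (crystal) ligand pose. The sketch below is a simplified version. It assumes both poses list the same heavy atoms in the same order, are in the same coordinate frame (no superposition is done), and it ignores molecular symmetry. The 2 Å success cutoff is a widely used convention, not a rule from this course.

```python
import math


def heavy_atom_rmsd(ref, pose):
    """RMSD in Å between two equally ordered lists of (x, y, z) coordinates.

    No alignment is performed: docking poses are compared in the receptor frame.
    """
    if len(ref) != len(pose) or not ref:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_ref, atom_pose in zip(ref, pose)
        for a, b in zip(atom_ref, atom_pose)
    )
    return math.sqrt(squared / len(ref))


def redocking_success(ref, pose, cutoff=2.0):
    """True if the pose reproduces the native ligand within the RMSD cutoff."""
    return heavy_atom_rmsd(ref, pose) <= cutoff


# Toy coordinates: the pose is the native ligand shifted by 1 Å along x.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in native]
print(heavy_atom_rmsd(native, docked), redocking_success(native, docked))
```

For symmetric ligands a symmetry-aware RMSD is needed, otherwise equivalent poses can be scored as failures.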

Databases and online tools :: macromolecules

‼️ ⚠️ Never reveal/use confidential structures on public servers!

Scientific literature



  • high-quality and freely accessible resource of protein sequence and functional information.
  • - provides functional analysis of proteins by classifying them into families and predicting domains and important sites

Similar sequences

Sequences Alignment


  • - helps answering the questions:
    • I am interested in target T: Which diseases can be treated by modulating target T?
    • I am interested in disease D: Which targets can be modulated to treat disease D?


  • - This resource is powered by the Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.
  • - contains information about experimentally-determined nucleic acids and complex assemblies.


Homology modeling

Homology modeling, also known as comparative modeling of protein, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). (from: wikipedia)

⚡️ Practicals:

‼️ ⚠️ Never reveal/use confidential structures on public servers!

  • Find residues crucial for the 2'-O-methyltransferase activity of Dengue virus type 2 (strain Thailand/16681/1984)
  • Find all sequences containing mRNA cap 0-1 NS5-type MT domain
  • Find reviewed sequences of Flaviviridae methyltransferases (taxonomy: Flaviviridae)
  • Find sequences similar to the human IDO1 protein
  • What are the differences between the four most similar sequences?
  • Find the X-ray structure of Cap-specific mRNA (nucleoside-2'-O-)-methyltransferase 1 protein in complex with m7GpppG and SAM
  • download it and visualize with pymol
  • fetch fasta sequence
  • show interactions diagram for both ligands
  • find other structures containing SAM ligand

Databases and online tools :: small molecules

‼️ ⚠️ Never reveal/use confidential structures on public servers!

Chemical structures




ADMET properties estimation

Cytochrome P450

⚡️ Practicals:

‼️ ⚠️ Never reveal/use confidential structures on public servers!

  • Find all activities of the compound: structure

    • Download values of IC50 in the csv file
  • Find all registered drugs that target RNA

  • Find possible bioisosteres of 3-methylindole (R in position 5)

  • For compound named (2S)-4-[3-(5-methyl-2-furyl)benzofuran-2-yl]-2-phenyl-butan-2-ol

  • find vendors
  • find calculated values of logP and TPSA
  • find commercially available compounds that are at least 80% similar (Tanimoto similarity)
  • Find patents mentioning this structure:


  • Predict various physicochemical/ADMET properties for aspirin
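The 80% Tanimoto similarity used in the practicals above is a ratio of shared to total fingerprint bits. A minimal sketch, assuming each fingerprint is represented as the set of indices of its "on" bits (the bit sets below are made up, real ones come from a fingerprinting tool):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: |A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical fingerprints
query = {1, 2, 3, 4, 5, 6, 7, 8, 9}
close_analog = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
distant = {1, 2, 3, 4, 20, 21}

for name, fp in (("close_analog", close_analog), ("distant", distant)):
    score = tanimoto(query, fp)
    print(name, round(score, 2), "hit" if score >= 0.8 else "miss")
```

The score depends on the fingerprint type, so an 80% threshold in one database is not directly comparable to 80% in another.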

Other tools and useful links

Software corner

Linux and Bash


  • - a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
  • - displays hundreds of charts, always providing the reproducible Python code! It aims to showcase the dataviz possibilities of Python and to help you benefit from them.
  • - Cheminformatics and Machine Learning Software


Git (/ɡɪt/) is a version control system that is widely used for software development and other version control tasks. It is a distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows. Git was created by Linus Torvalds in 2005 for development of the Linux kernel, with other kernel developers contributing to its initial development.


GitLab, the software, is a web-based Git repository manager with wiki and issue tracking features.



Deduplicating archiver with compression and authenticated encryption, called "The holy grail of backup software":





RNA Structural Bioinformatics


Fun (optional)

