\title{Supplement History of Cheminformatics as part of Cheminformatics: The Computer Science of Chemical Discovery}
Joerg Kurt Wegner\\
\affaddr{Tibotec BVBA}\\
\affaddr{Turnhoutseweg 30}\\
\affaddr{2340 Beerse Turnhout, Belgium}\\
% 2nd. author
Aaron Sterling\\
\affaddr{Department of Computer Science}\\
\affaddr{Iowa State University}\\
\affaddr{Ames, Iowa, USA}\\
% 3rd author
Rajarshi Guha\\
\affaddr{NIH Center for Translational Therapeutics}\\
\affaddr{9800 Medical Center Drive}\\
\affaddr{Rockville, MD 20850}\\
\additionalauthors{Additional authors:
Andreas Bender (University of Cambridge, email: {\texttt{}}),
Jean-Loup Faulon (University of Evry, email: {\texttt{}}),
Janna Hastings (European Bioinformatics Institute, Cambridge, UK, email: {\texttt{}}),
Noel O'Boyle (University College Cork, Cork, Ireland, email: {\texttt{}}),
John Overington (European Bioinformatics Institute, Cambridge, UK, email: {\texttt{}}),
Herman Van Vlijmen (Tibotec, Beerse, Belgium, email: {\texttt{}}), and
Egon Willighagen (Karolinska Institutet, Stockholm, Sweden, email: {\texttt{}})}
\date{25 June 2011}
\section{Definitions and References}
The aim of this brief review of the history of cheminformatics is to
put the content of the main article into a broader perspective. If we
consider all information, analysis, and \emph{in silico} optimization
of a molecule as ``cheminformatics'', then the field is very large,
since a molecule plays the central role in many related
disciplines. Clearly, this article is not meant to be an exhaustive
overview; rather, we point the reader to more detailed references and
highlight some historic milestones. We also discuss
some terms as they relate to cheminformatics. It is thus useful to
first provide some definitions of cheminformatics itself, given that
they also shed some light on the founding principles of the field. For
example, we have the following definitions:
\begin{itemize}
\item ``\textit{The mixing of information resources to transform data
into information, and information into knowledge, for the intended
purpose of making better decisions faster in the arena of drug
lead identification and optimization.}'' Frank K. Brown, 1998.
\item ``\textit{[Chemoinformatics involves]... the computer
manipulation of two- or three-dimensional chemical structures and
excludes textual information. This distinguishes the term from
chemical information, largely a discipline of chemical librarians
and does not include the development of computational methods.}''
Peter Willett, 2002.
\item ``\textit{\ldots the application of informatics to solve chemical
problems.}'' and ``\textit{\ldots chemoinformatics makes the point that
you're using one scientific discipline to understand another
scientific discipline.}'' Johann Gasteiger, 2002.
\item ``\textit{The set of approaches to computer-aided drug design
that do not rely on the 3D structure of the [protein] target.}''
John Van Drie, 2011 (personal communication).
\end{itemize}
More generally, Chen \cite{Chen2006} and Brown \cite{brown2009} cited various
definitions by pioneers in the field, with Brown concluding that
``\textit{The differences in definitions (of the term
cheminformatics) are largely a result of the types of analyses that
particular scientists practice and no single definition is intended
to be all-encompassing}'', as is evident from the four definitions above.
It is also useful to clarify some other terms and how
cheminformatics might relate to them:
\begin{itemize}
\item \textbf{Quantum chemistry} (QM) is generally associated with
theoretical chemistry or chemical physics. Briefly, it focuses on
the description of chemical systems starting from first principles
and is based on the Schr\"{o}dinger wave equation and frameworks
built on top of it. The use of quantum mechanics allows one to
describe molecules in terms of electron density, and hence calculate
molecular properties (spectra, energies, etc.) in a physically
accurate manner. The results of quantum chemistry calculations are
used in molecular modeling (energy force field parameters) and
cheminformatics (atomic partial charges, or whole-molecule properties,
e.g., molecular polarizabilities).
\item \textbf{Molecular modeling} or \textbf{Computational chemistry}
can be considered an approximation of quantum chemistry and aims to
evaluate many of the properties considered by QM methods. The key
difference is that these methods employ a variety of
parametrizations (usually, but not always, derived from QM studies),
allowing one to evaluate energies for small molecules very rapidly
as well as handle larger systems (such as proteins) that are too
time-consuming for QM methods. A variety of techniques, including
semi-empirical methods, molecular mechanics, and docking, belong to
this category. While it is true that many of these methods involve
cheminformatics concepts (choice of tautomer, partial charge
assignments, etc.), the methodologies are fundamentally physical in
nature. We note that computational chemistry has close ties to
topics such as X-ray crystallography and NMR structure elucidation.
\item \textbf{Chemometrics} is a focused application of statistical
methods to problems in analytical chemistry, such as infrared, mass,
and NMR spectroscopy.
\item \textbf{Translational research/medicine} aims to make more
direct links between academic (or early stage) research and clinical
practice in an effort to speed up the translation of research results into
actual treatments. When applied to medical research, this usually
implies efforts to speed up the process by which a compound goes
from the lead stage to the clinical trial stage. Naturally,
cheminformatics plays a role at various stages of this process (lead
identification, lead optimization, ADMET modeling) as well as more
broadly, such as in drug repurposing
efforts \cite{Dudley:2011fk,Swamidass:2011uq}. Related topics, each
covered in dedicated books, include pharmacogenomics
\cite{yan2008pharmacogenomics} and
chemogenomics \cite{kubinyi2004chemogenomics}.
\end{itemize}
Obviously, we cannot cover all domains that involve or refer to
cheminformatics concepts. However, there are a number of publications
that cover cheminformatics topics in different ways. Probably the
most comprehensive cheminformatics treatise was published by Gasteiger
\cite{Gasteiger2003} in 2003. More concise, introductory texts
followed from Gasteiger \& Engel \cite{gasteigerengel2003} and Leach
\& Gillet \cite{leachgillet2007}. The books of Bajorath
\cite{Bajorath2004} and Oprea \cite{oprea2005} focus on the
applications of cheminformatics in drug design and span a number of
topics. The first books discussing chemical graph theory and
algorithms were published in 1989 by Zupan \cite{zupan1989}, 1991 by
Bonchev and Rouvray \cite{bonchevrouvrey1991,bonchevrouvrey2003} and
1992 by Trinajsti\'{c} \cite{Trinajstic1992}. More recently, Faulon \&
Bender \cite{faulon2010} have edited a collection focusing on
algorithms in cheminformatics, including various applications of graph
theory. Other books focus more on specific topics such as mathematical
challenges \cite{mathchallenges1995}, molecular diversity
\cite{moleculardiversity1999}, factor analysis \cite{Malinowski2002},
evolutionary algorithms \cite{clark2000} and molecular descriptors
\section{Historical Milestones}
In this section we highlight the development of a variety of
cheminformatics concepts and techniques over time. The shorter term
``chemoinformatics'' was coined by Brown in
1998 \cite{brown1998}. However, the main journal for this field,
the \textit{Journal of Chemical Documentation}, was founded in
1961. Interestingly, the journal was renamed to \textit{Journal of
Chemical Information and Computer Sciences} in 1975 to reflect the
tight connection between chemical information and computer
science. The Chemical Abstracts Service (CAS) of the American Chemical
Society (ACS) formed a research department in 1955, began providing
its Chemical Registry System in 1965, and in 1968
made the first computer-readable file of all abstracted documents
available \cite{Chen2006}.
From a more technical viewpoint, the first chemical graphs were drawn
in 1758 by the Scottish chemist William Cullen, who initially called
them affinity graphs \cite{bonchevrouvrey1991}.
After the concept of bonds between atoms was developed (Couper, 1858), the
first chemical graphs appeared in publications by Brown in 1864 and
Cayley in 1874 \cite{bonchevrouvrey1991,brown2009}. The notion of
mathematical chemistry in general was discussed by Helm in 1897
\cite{Helm:1897ys} and the reader is referred to Balaban
\cite{Balaban:2005zr} for a historical overview of this field.
The year 1946 may be regarded as the birth year of chemoinformatics
\cite{Chen2006}: \textit{``In 1946 King et al.\cite{kct1946} published
an article illustrating the use of IBM's business accounting
machines in carrying out the construction of the rotational spectra
of asymmetric rotors by the evaluation of mathematical equations for
line position and line intensity''}. One could argue that this is
more of a computational chemistry application.
From an informatics perspective, the management of chemical
information goes back to 1771, the year since which the chemical
literature has been indexed. In 1881 the first edition of Beilstein's
\textit{Handbuch der Organischen Chemie} encyclopedia was published
\cite{polanski2009}, registering 1500 chemical compounds. Many key
cheminformatics algorithms appeared in the early to mid-20th century
(though in some cases, the underlying mathematics was known much
earlier). In 1957, Ray and Kirsch published the first substructure
search algorithm to support retrieval of computerized structure
records \cite{RayKirsch1957}. Subsequent developments in the 1970s
improved substructure searches, notably through pre-screening via
fragments \cite{Adamson:1973fk,Feldman:1975uq}. One of the key
algorithms for molecular graph canonicalization is the Morgan
algorithm \cite{Morgan1965}, described in 1965.
In 1965 the DENDRAL expert system project started with the aim of
automatically determining the structure of an unknown chemical compound
from the corresponding mass spectrum \cite{Gray1986}. The system is
also often cited as one of the earliest artificial intelligence
and expert systems \cite{Chen2006}.
The first system supporting chemical synthesis planning was OCSS
(Organic Chemical Simulation of Syntheses), developed by Corey and
Wipke in 1969 \cite{CoreyWipke1969}, which in turn was an
implementation of theories described by Vleduts in 1963.
Quantitative structure-activity relationship (QSAR) modeling is one of
the best-known applications of predictive
cheminformatics and originates with the work of Hansch
\cite{Hansch:1962vn} in 1962. While Hansch's approach focused on
property-activity relationships, true structure-activity relationships
originated with the work of Free and Wilson in 1964
\cite{Free:1964ys}. Subsequent work in this area led to 3D-QSAR
techniques including CoMFA \cite{Cramer:1988zr} and
In 1978 Gasteiger and Marsili published a fast algorithm to calculate
partial charges in organic molecules by a Partial Equalization of
Orbital Electronegativities (PEOE) \cite{gm78}, which remains to this day
one of the gold standards for calculating atomic partial charges.
Instead of converting molecular graphs to a vector representation for
applying machine learning methods, it is also possible to use direct
molecular graph mining methods \cite{okada2006}. Some of the first
published methods use neural networks \cite{kireev1995}, inductive
logic programming \cite{yh02a}, or graph kernels \cite{kti03}.
\bibliographystyle{abbrv} \bibliography{paper}