\documentclass{sig-alternate}
\begin{document}
\title{Supplement: History of Cheminformatics, as Part of ``Cheminformatics: The Computer Science of Chemical Discovery''}
\numberofauthors{9}
\author{
\alignauthor Joerg Kurt Wegner\\
  \affaddr{Tibotec BVBA}\\
  \affaddr{Turnhoutseweg 30}\\
  \affaddr{2340 Beerse Turnhout, Belgium}\\
  \email{jwegner@its.jnj.com}
% 2nd. author
\alignauthor Aaron Sterling\\
  \affaddr{Department of Computer Science}\\
  \affaddr{Iowa State University}\\
  \affaddr{Ames, Iowa, USA}\\
  \email{sterling@iastate.edu}
% 3rd author
\alignauthor Rajarshi Guha\\
  \affaddr{NIH Center for Translational Therapeutics}\\
  \affaddr{9800 Medical Center Drive}\\
  \affaddr{Rockville, MD 20850}\\
  \email{guhar@mail.nih.gov}
}
\additionalauthors{Additional authors: Andreas Bender (University of Cambridge, email: {\texttt{andreas.bender@cantab.net}}), Jean-Loup Faulon (University of Evry, email: {\texttt{Jean-Loup.Faulon@issb.genopole.fr}}), Janna Hastings (European Bioinformatics Institute, Cambridge, UK, email: {\texttt{janna.hastings@gmail.com}}), Noel O'Boyle (University College Cork, Cork, Ireland, email: {\texttt{baoilleach@gmail.com}}), John Overington (European Bioinformatics Institute, Cambridge, UK, email: {\texttt{jpo@ebi.ac.uk}}), Herman Van Vlijmen (Tibotec, Beerse, Belgium, email: {\texttt{hvvlijme@its.jnj.com}}), and Egon Willighagen (Karolinska Institutet, Stockholm, Sweden, email: {\texttt{egon.willighagen@ki.se}}).}
\date{25 June 2011}
\maketitle

\section{Definitions and References}
The aim of this brief review of the history of cheminformatics is to put the content of the main article into a broader perspective. If we consider all information management, analysis, and \emph{in silico} optimization of a molecule as ``cheminformatics'', then the field is very large, since molecules play a central role in many related disciplines.
Clearly, this article is not meant to be an exhaustive overview; rather, we point the reader to more detailed references and highlight some historic milestones. We also discuss several neighboring terms and how they relate to cheminformatics. It is useful to begin with some definitions of cheminformatics itself, since they also shed light on the founding principles of the field. For example: \begin{itemize} \item ``\textit{The mixing of information resources to transform data into information, and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization.}'' Frank K. Brown, 1998. \item ``\textit{[Chemoinformatics involves]\ldots the computer manipulation of two- or three-dimensional chemical structures and excludes textual information. This distinguishes the term from chemical information, largely a discipline of chemical librarians and does not include the development of computational methods.}'' Peter Willett, 2002. \item ``\textit{\ldots the application of informatics to solve chemical problems.}'' and ``\textit{\ldots chemoinformatics makes the point that you're using one scientific discipline to understand another scientific discipline.}'' Johann Gasteiger, 2002. \item ``\textit{The set of approaches to computer-aided drug design that do not rely on the 3D structure of the [protein] target}'' John Van Drie, 2011 (personal communication). \end{itemize} More generally, Chen~\cite{Chen2006} and Brown~\cite{brown2009} have collected various definitions from pioneers in the field, with Brown concluding that ``\textit{The differences in definitions [of the term cheminformatics] are largely a result of the types of analyses that particular scientists practice and no single definition is intended to be all-encompassing}'', as is evident from the four definitions above.
It is also useful to clarify several related terms and how cheminformatics relates to them: \begin{itemize} \item \textbf{Quantum chemistry} (QM) is generally associated with theoretical chemistry or chemical physics. Briefly, it describes chemical systems from first principles, starting from the Schr\"{o}dinger wave equation and the frameworks built on top of it. Quantum mechanics describes molecules in terms of their electron density, and hence allows molecular properties (spectra, energies, etc.) to be calculated in a physically accurate manner. The results of quantum chemistry calculations are used in molecular modeling (e.g., force field parameters) and in cheminformatics (e.g., atomic partial charges, or whole-molecule properties such as molecular polarizabilities). \item \textbf{Molecular modeling} or \textbf{computational chemistry} can be considered an approximation of quantum chemistry that aims to evaluate many of the properties addressed by QM methods. The key difference is that these methods employ a variety of parametrizations (usually, but not always, derived from QM studies), which make it possible to evaluate energies of small molecules very rapidly and to handle larger systems (such as proteins) that are too time-consuming for QM methods. Techniques in this category include semi-empirical methods, molecular mechanics and docking. While many of these methods involve cheminformatics concepts (choice of tautomer, partial charge assignment, etc.), the methodologies are fundamentally physical in nature. We note that computational chemistry has close ties to areas such as X-ray crystallography and NMR structure elucidation. \item \textbf{Chemometrics} is a focused application of statistical methods to problems in analytical chemistry, such as infrared, mass and NMR spectroscopy.
\item \textbf{Translational research/medicine} aims to create more direct links between academic (or early-stage) research and clinical practice, in an effort to speed the translation of research results into actual treatments. In medical research, this usually implies efforts to speed up the process by which a compound moves from the lead stage to the clinical trial stage. Naturally, cheminformatics plays a role at various stages of this process (lead identification, lead optimization, ADMET modeling) as well as more broadly, for example in drug repurposing efforts~\cite{Dudley:2011fk,Swamidass:2011uq}. Related topics, each covered by dedicated books, include pharmacogenomics~\cite{yan2008pharmacogenomics} and chemogenomics~\cite{kubinyi2004chemogenomics}. \end{itemize} Obviously, we cannot cover all domains that involve or refer to cheminformatics concepts. However, a number of publications cover cheminformatics topics from different angles. Probably the most comprehensive cheminformatics treatise was published by Gasteiger~\cite{Gasteiger2003} in 2003. More concise, introductory texts followed from Gasteiger \& Engel~\cite{gasteigerengel2003} and Leach \& Gillet~\cite{leachgillet2007}. The books by Bajorath~\cite{Bajorath2004} and Oprea~\cite{oprea2005} focus on applications of cheminformatics in drug design and cover a range of topics. The first books discussing chemical graph theory and algorithms were published in 1989 by Zupan~\cite{zupan1989}, in 1991 by Bonchev and Rouvray~\cite{bonchevrouvrey1991,bonchevrouvrey2003} and in 1992 by Trinajsti\'{c}~\cite{Trinajstic1992}. More recently, Faulon \& Bender~\cite{faulon2010} have edited a collection focusing on algorithms in cheminformatics, including various applications of graph theory.
Other books focus on more specific topics such as mathematical challenges~\cite{mathchallenges1995}, molecular diversity~\cite{moleculardiversity1999}, factor analysis~\cite{Malinowski2002}, evolutionary algorithms~\cite{clark2000} and molecular descriptors~\cite{todeschini2000}.

\section{Historical Milestones}
In this section we highlight the development of a variety of cheminformatics concepts and techniques over time. The term ``chemoinformatics'' was coined by Brown in 1998~\cite{brown1998}. However, the main journal of the field, the \textit{Journal of Chemical Documentation}, was founded much earlier, in 1961. Interestingly, the journal was renamed \textit{Journal of Chemical Information and Computer Sciences} in 1975, reflecting the tight connection between chemical information and computer science. The Chemical Abstracts Service (CAS) of the American Chemical Society (ACS) formed a research department in 1955; starting in 1965 it provided its Chemical Registry System, and in 1968 it made available the first computer-readable file of all abstracted documents~\cite{Chen2006}. From a more technical viewpoint, the first chemical graphs were drawn by the Scottish chemist William Cullen in 1758~\cite{bonchevrouvrey1991}, who initially called them affinity graphs. After the concept of bonds between atoms had been developed (Couper, 1858), the first chemical graphs depicting such bonds appeared in publications by Brown in 1864 and Cayley in 1874~\cite{bonchevrouvrey1991,brown2009}. The notion of mathematical chemistry in general was discussed by Helm in 1897~\cite{Helm:1897ys}; the reader is referred to Balaban~\cite{Balaban:2005zr} for a historical overview of this field.
The year 1946 may be regarded as the birth year of chemoinformatics~\cite{Chen2006}: ``\textit{In 1946 King et al.~\cite{kct1946} published an article illustrating the use of IBM's business accounting machines in carrying out the construction of the rotational spectra of asymmetric rotors by the evaluation of mathematical equations for line position and line intensity}''. One could argue, however, that this is more a computational chemistry application. From an informatics perspective, the first records of managing chemical information go back to the indexing of the chemical literature, which began in 1771. In 1881 the first edition of the encyclopedia \textit{Beilstein's Handbuch der Organischen Chemie} was published~\cite{polanski2009}, registering 1500 chemical compounds. Many key cheminformatics algorithms appeared in the early to mid-20th century (though in some cases the underlying mathematics was known much earlier). In 1957, Ray and Kirsch published the first substructure searching algorithm to support retrieval of computerized structure records~\cite{RayKirsch1957}. Subsequent developments improved substructure searching, notably through pre-screening via fragments~\cite{Adamson:1973fk,Feldman:1975uq} in the 1970s. One of the key algorithms for molecular graph canonicalization, the Morgan algorithm~\cite{Morgan1965}, was described in 1965. Also in 1965, the DENDRAL expert system project started, with the aim of automatically determining the structure of an unknown chemical compound from its mass spectrum~\cite{Gray1986}. DENDRAL is also often cited as one of the earliest artificial intelligence and expert systems~\cite{Chen2006}. The first system supporting chemical synthesis planning was OCSS (Organic Chemical Simulation of Syntheses), developed by Corey and Wipke in 1969~\cite{CoreyWipke1969}, which in turn implemented theories described by Vleduts in 1963~\cite{Vleduts:1963kx}.
Quantitative structure-activity relationship (QSAR) modeling is one of the best-known applications of predictive cheminformatics and originates with the work of Hansch~\cite{Hansch:1962vn} in 1962. While Hansch's approach focused on property-activity relationships, true structure-activity relationships originated with the work of Free and Wilson in 1964~\cite{Free:1964ys}. Subsequent work in this area led to 3D-QSAR techniques, including CoMFA~\cite{Cramer:1988zr} and CoMSIA~\cite{Klebe:1994ly}. In 1978 Gasteiger and Marsili published a fast algorithm to calculate partial charges in organic molecules by Partial Equalization of Orbital Electronegativities (PEOE)~\cite{gm78}, which remains one of the standard methods for calculating atomic partial charges. Instead of converting molecular graphs into a vector representation before applying machine learning methods, it is also possible to mine molecular graphs directly~\cite{okada2006}. Some of the first published methods of this kind used neural networks~\cite{kireev1995}, inductive logic programming~\cite{yh02a}, or graph kernels~\cite{kti03}.

\bibliographystyle{abbrv}
\bibliography{paper}
\end{document}