// doc/history.dox

// Copyright 2009-2011  Microsoft Corporation
//           2012-2014  Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at

//  http://www.apache.org/licenses/LICENSE-2.0

// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.


/**
 \page history History of the Kaldi project

 Kaldi began its existence in the 2009 Johns Hopkins University workshop
 cumbersomely titled "Low Development Cost, High Quality Speech Recognition for
 New Languages and Domains" (see \ref history_ack).  The focus of that project
 was Subspace Gaussian Mixture Model (SGMM) based modeling and some
 investigations into lexicon learning.  The software which is now Kaldi began to
 be developed there, but the recipe we developed at that time was still
 dependent on HTK.  A list of participants in that workshop, official and
 unofficial, is (alphabetically by last name):

<small> Mohit Agarwal, Pinar Akyazi, Lukas Burget, Arnab Ghoshal, Ondrej Glembek, Nagendra Goel,
 Martin Karafiat, Feng Kai, Daniel Povey, Ariya Rastrow, Richard C. Rose, Petr Schwarz,
 Samuel Thomas. </small>

 Some of the participants of that workshop agreed to meet again in the summer of
 2010 in Brno, Czech Republic (hosted by the Brno University of Technology).  The
 aim of that workshop was to create a recipe based on the work done in 2009 that
 was clean and releasable, and to create a general-purpose speech toolkit as a
 byproduct.  The problem we were trying to solve was that our previous recipe was
 based on disparate scripts involving both HTK and our own early "Kaldi" code,
 and was not easy to encapsulate.  We also felt that a well-engineered, modern,
 general-purpose speech toolkit with an open license would be an asset to the
 speech-recognition community.  During August of 2010 the following group of
 people met in Brno to work on this (again alphabetically):

<small> Pinar Akyazi, Lukas Burget, Gilles Boullianne, Ondrej Glembek, Arnab Ghoshal,
 Nagendra Goel, Mirko Hannemann, Petr Motlicek, Daniel Povey, Yanmin Qian, Petr
 Schwarz, Jan Silowsky, Georg Stemmer, and Karel Vesely. </small>

 We also had some remote help around this time and shortly afterward, from
 Sandeep Boda, Sandeep Reddy and Haihua Xu (who helped with coding, code cleanup
 and documentation); we were visited by Michael Riley (who helped us to understand
 OpenFst and gave some lectures on FSTs), and would like to acknowledge the help of
 Honza Cernocky (for negotiating the venue and some support for the workshop from
 the Faculty of Information Technology of BUT and helping to organize it),
 Renata Kohlova (administration), and Tomas Kasparek (system administration).
 It is possible that this list of contributors contains
 oversights; any important omissions are unlikely to be intentional.

 A lot of code was written during the summer of 2010 but we still did not have a
 complete working system.  Some of the participants of the 2010 workshop
 continued working to complete the toolkit and get a working set of training scripts.
 The code was released on May 14th, 2011, and presented to the public at
 ICASSP 2011 in Prague
 (<a href="https://www.superlectures.com/icassp2011/category.php?lang=en&id=131">see
 the recordings</a>).

 Since the initial release, Kaldi has been maintained and developed to a large
 extent by Daniel Povey, who worked at Microsoft Research until early 2012 and
 has been at Johns Hopkins University since then.  Others have also made major
 contributions: notably Karel Vesely, who developed the neural-net training
 framework, and Arnab Ghoshal, who coordinated the acoustic modeling work early
 on.  There are other major contributors whom we do not name here because it is
 too hard to determine where to cut off the list, and a long tail of minor
 contributors.  The total number of people who have contributed code, scripts
 or patches is about 70 so far.

 \section history_ack Acknowledgements

 The JHU 2009 workshop was supported by National Science Foundation Grant Number
 IIS-0833652, with supplemental funding from Google Research, DARPA's GALE
 program and the Johns Hopkins University Human Language Technology Center of
 Excellence.  BUT researchers were partially supported during this time by the
 Czech Ministry of Trade and Commerce project no. FR-TI1/034, Grant Agency of
 the Czech Republic project no. 102/08/0707, and Czech Ministry of Education project
 no. MSM0021630528.
 Arnab Ghoshal was affiliated with Saarland University, supported by
 the European Community's Seventh Framework Programme
 grant number 213850 (SCALE), and with The University of Edinburgh,
 supported by the United Kingdom's Engineering and Physical Sciences
 Research Council grant number EP/I031022/1 (Natural Speech
 Technology).

 The work of BUT researchers on Kaldi was supported by the Technology Agency
 of the Czech Republic under project No. TA01011328.

 We would like to acknowledge the support of Geoffrey Zweig and Alex Acero
 at Microsoft Research, as well as the generosity of Henrique (Rico) Malvar in
 allowing the use of his FFT code.  Thanks are also due to Patrick Nguyen
 for his help in organizing the JHU'09 workshop and with the Wall Street
 Journal recipe.  We would also like to acknowledge the help
 of faculty and staff at Johns Hopkins University's Center for Language and
 Speech Processing during the JHU'09 workshop: particularly
 Sanjeev Khudanpur, Desiree Cleves and the late Fred Jelinek.

 Since 2012, Kaldi development has received significant support from IARPA's
 BABEL program (IARPA-BAA-11-02) and from the Human Language Technology
 Center of Excellence (HLTCOE); and since 2015, from the NSF computing
 research infrastructure (CRI) award "CI-EN: Enhancements for the Kaldi Speech
 Recognition Toolkit".

 Sanjeev Khudanpur deserves special mention for creating the conditions for the
 Kaldi project to succeed, first at the JHU'09 workshop where in his role as
 workshop organizer he was instrumental in putting the team together
 (e.g. suggesting that we add Lukas Burget, without whom none of this would have
 happened); and since 2012 by making it possible for Daniel Povey to work at
 Johns Hopkins University in a position which allows him to devote much of his
 time to Kaldi development.

*/