// doc/history.dox
// Copyright 2009-2011 Microsoft Corporation
// 2012-2014 Johns Hopkins University (author: Daniel Povey)
// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// http://www.apache.org/licenses/LICENSE-2.0
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABLITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
/**
\page history History of the Kaldi project
Kaldi began its existence in the 2009 Johns Hopkins University workshop
cumbersomely titled "Low Development Cost, High Quality Speech Recognition for
New Languages and Domains" (see \ref history_ack). The focus of that project
was Subspace Gaussian Mixture Model (SGMM) based modeling and some
investigations into lexicon learning. The software which is now Kaldi began to
be developed there, but the recipe we developed at that time was still
dependent on HTK. A list of participants in that workshop, official and
unofficial, is (alphabetically by last name):
<small> Mohit Agarwal, Pinar Akyazi, Lukas Burget, Arnab Ghoshal, Ondrej Glembek, Nagendra Goel,
Martin Karafiat, Feng Kai, Daniel Povey, Ariya Rastrow, Richard C. Rose, Petr Schwarz,
Samuel Thomas. </small>
Some of the participants of that workshop agreed to meet again in the summer of
2010 in Brno, Czech Republic (hosted by the Brno University of Technology). The
aim of that workshop was to create a recipe based on the work done in 2009 that
was clean and releasable, and to create a general-purpose speech toolkit as a
byproduct. The problem we were trying to solve was that our previous recipe was
based on disparate scripts involving both HTK and our own early "Kaldi" code,
and was not easy to encapsulate. We also felt that a well-engineered, modern,
general-purpose speech toolkit with an open license would be an asset to the
speech-recognition community. During August of 2010 the following group of
people met in Brno to work on this (again alphabetically):
<small> Pinar Akyazi, Lukas Burget, Gilles Boulianne, Ondrej Glembek, Arnab Ghoshal,
Nagendra Goel, Mirko Hannemann, Petr Motlicek, Daniel Povey, Yanmin Qian, Petr
Schwarz, Jan Silowsky, Georg Stemmer, and Karel Vesely. </small>
We also had some remote help around this time and shortly afterward from
Sandeep Boda, Sandeep Reddy and Haihua Xu (who helped with coding, code cleanup
and documentation), and we were visited by Michael Riley (who helped us to understand
OpenFst and gave some lectures on FSTs). We would also like to acknowledge the help of
Honza Cernocky (for negotiating the venue and some support for the workshop from
the Faculty of Information Technology of BUT, and for helping to organize it),
Renata Kohlova (administration), and Tomas Kasparek (system administration).
It is possible that this list of contributors contains
oversights; any important omissions are unlikely to be intentional.
A lot of code was written during the summer of 2010 but we still did not have a
complete working system. Some of the participants of the 2010 workshop
continued working to complete the toolkit and get a working set of training scripts.
The code was released on May 14th, 2011, and presented to the public at ICASSP 2011
in Prague,
<a href="https://www.superlectures.com/icassp2011/category.php?lang=en&id=131">
see the recordings</a>.
Since the initial release, Kaldi has been maintained and developed to a large
extent by Daniel Povey, working at Microsoft Research until early 2012 and
since then at Johns Hopkins University. Others have also made major
contributions: notably Karel Vesely, who developed the neural-net training
framework, and Arnab Ghoshal, who coordinated the acoustic modeling work early
on. There are other major contributors whom we do not name here because it is
too hard to determine where to cut off the list, as well as a long tail of minor
contributors. The total number of people who have contributed code, scripts or
patches is about 70 so far.
\section history_ack Acknowledgements
The JHU 2009 workshop was supported by National Science Foundation Grant Number
IIS-0833652, with supplemental funding from Google Research, DARPA's GALE
program and the Johns Hopkins University Human Language Technology Center of
Excellence. BUT researchers were partially supported during this time by the Czech
Ministry of Trade and Commerce project no. FR-TI1/034, Grant Agency of the Czech
Republic project no. 102/08/0707, and Czech Ministry of Education project
no. MSM0021630528.
Arnab Ghoshal was affiliated with Saarland University, supported by
the European Community's Seventh Framework Programme
grant number 213850 (SCALE), and with The University of Edinburgh,
supported by the United Kingdom's Engineering and Physical Sciences
Research Council grant number EP/I031022/1 (Natural Speech
Technology).
The work of BUT researchers on Kaldi was supported by the Technology Agency
of the Czech Republic under project No. TA01011328.
We would like to acknowledge the support of Geoffrey Zweig and Alex Acero
at Microsoft Research, as well as the generosity of Henrique (Rico) Malvar in
allowing the use of his FFT code. Thanks are also due to Patrick Nguyen
for his help in organizing the JHU'09 workshop and with the Wall Street
Journal recipe. We would also like to acknowledge the help
of faculty and staff at Johns Hopkins University's Center for Language and
Speech Processing during the JHU'09 workshop: particularly
Sanjeev Khudanpur, Desiree Cleves and the late Fred Jelinek.
Since 2012, Kaldi development has received significant support from IARPA's
BABEL program (IARPA-BAA-11-02) and from the Human Language Technology
Center of Excellence (HLTCOE); and since 2015, from the NSF computing
research infrastructure (CRI) award "CI-EN: Enhancements for the Kaldi Speech
Recognition Toolkit".
Sanjeev Khudanpur deserves special mention for creating the conditions for the
Kaldi project to succeed, first at the JHU'09 workshop where in his role as
workshop organizer he was instrumental in putting the team together
(e.g. suggesting the addition of Lukas Burget, without whom none of this would have
happened); and since 2012 by making it possible for Daniel Povey to work at
Johns Hopkins University in a position which allows him to devote much of his
time to Kaldi development.
*/