egs/icsi/README.txt


About the ICSI corpora and this particular recipe
=================================================================================

This recipe builds ASR models using ICSI data [3] (available from LDC) and where possible 
follows the best practises from AMI s5b recipe (look also at: ../ami/s5b).

ICSI data comprises around 72 hours of natural, meeting-style overlapped English speech 
recorded at International Computer Science Institute (ICSI), Berkley. 
Speech is captured using the set of parallel microphones, including close-talk headsets,
and several distant independent microhones (i.e. mics that do not form any explicitly 
known geometry, see below for an example layout). Recordings are sampled at 16kHz.

See [1] for more details on ICSI, or [2,3] to access the data. 
The correpodning paper describing the ICSI corpora is [4]

[1] http://www1.icsi.berkeley.edu/Speech/mr/
[2] LDC: LDC2004S02 for audio, and LDC2004T04 for transcripts (used in this recipe)
[3] http://groups.inf.ed.ac.uk/ami/icsi/ (free access, but for now only ihm data is available for download)
[4] A Janin, D Baron, J Edwards, D Ellis, D Gelbart, N Morgan, B Peskin,
    T Pfau, E Shriberg, A Stolcke, and C Wooters, The ICSI meeting corpus. 
    in Proc IEEE ICASSP, 2003, pp. 364-367


ICSI data did not come with any pre-defined splits for train/valid/eval sets as it was
mostly used as a training material for NIST RT evaluations. Some portions of the unrelased ICSI 
data (as a part of this corpora) can be found in, for example, NIST RT04 amd RT05 evaluation sets.

This recipe, however, to be self-contained factors out training (67.5 hours), development (2.2 hours 
and evaluation (2.8 hours) sets in a way to minimise the speaker-overlap between different partitions, 
and to avoid known issues with available recordings during evaluation. This recipe follows [5] where 
dev and eval sets are making use of {Bmr021, Bns00} and {Bmr013, Bmr018, Bro021} meetings, respectively.

[5] S Renals and P Swietojanski, Neural networks for distant speech recognition. 
    in Proc IEEE HSCMA 2014 pp. 172-176. DOI:10.1109/HSCMA.2014.6843274


============================================================================================
To train models, for an arbitrary mic, go to s5 and run something like (after you
set ICSI_DIR and/or ICSI_TRANS variables in the below scripts):

./run_prepare_shared.sh

and then

./run.sh --mic mic_type 

where mic_type depends on whether you want to use individual headset mic (ihm),
distant (but beamformed) multiple mics (mdm) or distant single mic (D1...D4).
Mutliple distant microphones (mdm) setup is using only up to 4 PZM mics. Look below
for more details, on notations and a typical meeting layout that ICSI recordings followed.

Look at run.sh for more details on what mic_type is expected to be.


Below description is (mostly) copied from ICSI documentation for convenience.
=================================================================================

Simple diagram of the seating arrangement in the ICSI meeting room.             
                                                                                
The ordering of seat numbers is as specified below, but their                   
alignment with microphones may not always be as precise as indicated            
here. Also, the seat number only indicates where the participant                
started the meeting. Since most of the microphones are wireless, they           
were able to move around.                                                       
                                                                                                                                                                
   Door                                                                         
                                                                                
                                                                                
          1         2            3           4                                  
     -----------------------------------------------------------------------    
     |                      |                       |                      |   S
     |                      |                       |                      |   c
     |                      |                       |                      |   r
    9|   D1        D2       |   D3  PDA     D4      |                      |   e
     |                      |                       |                      |   e
     |                      |                       |                      |   n
     |                      |                       |                      |    
     -----------------------------------------------------------------------    
          8         7            6           5                                  
                                                                                
                                                                                
D1, D2, D3, D4  - Desktop PZM microphones                                       
PDA - The mockup PDA with two cheap microphones                                 
                                                                                
The following are the TYPICAL channel assignments, although a handful           
of meetings (including Bmr003, Btr001, Btr002) differed in assignment.         

The mapping from the above, to the actual waveform channels in the corpora,
and (this recipe for a signle distant mic case) is:
                                                                                
D1 - chanE - (this recipe: sdm3)                                                                      
D2 - chanF - (this recipe: sdm4)                                                                     
D3 - chan6 - (this recipe: sdm1)                                                                     
D4 - chan7 - (this recipe: sdm2)                                                                     
PDA left - chanC                                                                
PDA right - chanD 

-----------
Note (Pawel): The mapping for headsets is being extracted from mrt files. 
In cases where IHM channels are missing for some speakers in some meetings, 
in this recipe we either back off to distant channel (typically D2, default)
or (optionally) skip this speaker's segments entirely from processing. 
This is not the case for eval set, where all the channels come with the 
expected recordings, and split is the same for all conditions (thus allowing 
for direct comparisons between IHM, SDM and MDM settings).