
Commit

Fix a couple LaTeX typos in synthetic AMI README.md.
PiperOrigin-RevId: 455643636
Sound Separation Team authored and stwisdom committed Jun 17, 2022
1 parent 1858967 commit 3915614
Showing 1 changed file with 1 addition and 1 deletion.
datasets/synthetic_ami/README.md (1 addition, 1 deletion)
@@ -20,7 +20,7 @@ python3 make_synthetic_ami.py -a ${AMI_DIRECTORY} -o ${OUTPUT_DIRECTORY}

A "segment" is a section of the meeting that is single-speaker, as indicated by the AMI annotations. The wav file path from which a segment is extracted is given by `wav_<bg,fg>`, and the start and end times of the segments are given by `seg_start_<bg,fg>` and `seg_end_<bg,fg>`. For each segment, we use least-squares to estimate the best linear time-invariant finite impulse response (FIR) filter that maps single-speaker headset audio to distant microphone audio. This provides clean reverberant versions of the anechoic headset audio, which can then be mixed together.
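The least-squares FIR estimate for a single segment can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the code in `make_synthetic_ami.py`: the helper names `convolution_matrix` and `estimate_fir`, the filter length `num_taps`, and the synthetic test signals are all hypothetical. The filter is treated as causal, and the sketch solves $\min_h \lVert y - x*h \rVert_2^2$ over the samples of the segment.

```python
import numpy as np


def convolution_matrix(x: np.ndarray, num_taps: int) -> np.ndarray:
    """Matrix X such that X @ h == np.convolve(x, h)[:len(x)] for a causal FIR h."""
    n = len(x)
    if not 0 < num_taps <= n:
        raise ValueError("num_taps must be between 1 and len(x)")
    X = np.zeros((n, num_taps))
    for k in range(num_taps):
        # Column k holds x delayed by k samples (zeros before the delay).
        X[k:, k] = x[: n - k]
    return X


def estimate_fir(x: np.ndarray, y: np.ndarray, num_taps: int) -> np.ndarray:
    """Least-squares FIR filter h_hat minimizing ||y - x*h||^2 over the segment."""
    if len(x) != len(y):
        raise ValueError("headset and distant-mic signals must be aligned and equal length")
    X = convolution_matrix(x, num_taps)
    h_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h_hat


# Toy check with synthetic signals (not AMI data): recover a known filter.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                                    # stand-in headset audio
h_true = rng.standard_normal(64) * np.exp(-np.arange(64) / 10.0)  # toy decaying response
y = np.convolve(x, h_true)[: len(x)] + 0.01 * rng.standard_normal(len(x))
h_hat = estimate_fir(x, y, num_taps=64)
print(np.max(np.abs(h_hat - h_true)))  # maximum per-tap estimation error
```

A dense convolution matrix keeps the sketch short but scales poorly: a realistic room response spans thousands of taps, where a more memory-efficient solver would likely be preferable.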

-This filtering procedure also provides an estimate of the background noise. Given headset audio $x$, distant microphone audio $y$, and inferred filter $\hat{h}$, the filtered headset is $x*\hat{h}$, and the residual $y - x*\hat{h}$ is an estimate of the background noise. Note that the residual may still contain some speech and thus is an imperfect reference for background noise, since the linear filtering is not perfect.
+This filtering procedure also provides an estimate of the background noise. Given headset audio $ x $, distant microphone audio $ y $, and inferred filter $\hat{h}$, the filtered headset is $x*\hat{h}$, and the residual $y - x*\hat{h}$ is an estimate of the background noise. Note that the residual may still contain some speech and thus is an imperfect reference for background noise, since the linear filtering is not perfect.
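Given $x$, $y$, and $\hat{h}$, splitting the distant-microphone signal into a filtered headset and a residual is one convolution and one subtraction. The sketch below uses a hypothetical helper `split_distant_mic` and toy signals. It also illustrates why the residual is an imperfect noise reference: with the exact filter the residual equals the noise (up to floating-point rounding), while a truncated filter leaks delayed speech into it.

```python
import numpy as np


def split_distant_mic(x: np.ndarray, y: np.ndarray, h_hat: np.ndarray):
    """Return (x * h_hat, y - x * h_hat), both truncated to len(y)."""
    speech_reverb = np.convolve(x, h_hat)[: len(y)]  # filtered headset
    residual = y - speech_reverb                      # imperfect background-noise reference
    return speech_reverb, residual


# Toy signals: the distant mic hears filtered speech plus noise.
rng = np.random.default_rng(1)
x = rng.standard_normal(8000)
h = np.array([1.0, 0.5, 0.25])
noise = 0.1 * rng.standard_normal(8000)
y = np.convolve(x, h)[:8000] + noise

# With the exact filter, the residual matches the noise up to rounding.
speech_reverb, residual = split_distant_mic(x, y, h)

# With a truncated filter, the missing tap leaks speech into the residual:
# the leak equals 0.25 * x delayed by 2 samples.
_, residual_bad = split_distant_mic(x, y, h[:2])
leak = residual_bad - noise
```

By construction the two outputs always sum back to $y$, so any filter error shows up as speech in the residual rather than as a mismatch with the recording.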

To construct the synthetic AMI mixtures with their corresponding references, we extract shorter "clips" from the segments described above. The offset of a clip within a segment is given by `offset_<bg,fg>`, and the duration of a clip from this offset is given by `duration_<bg,fg>`. Each synthetic AMI example is constructed from two clips: a "background" clip, which is always 5 seconds long, and a "foreground" clip, whose duration is at most 5 seconds. For the background clip, two sources are created: the reverberant filtered headset $x*\hat{h}$, and the reverberant residual $y - x*\hat{h}$, which serves as an imperfect reference for background noise. For the foreground clip, a single source is created: the reverberant filtered headset, shifted by `shift_fg`.
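One way to assemble an example from these fields is sketched below. It rests on assumptions the README does not spell out: the fields are in seconds, the audio is at 16 kHz, the foreground clip is placed `shift_fg` seconds into a silent 5-second buffer, and the mixture is the plain sum of the three sources. The function names and the `meta` dictionary are hypothetical; only the field names come from the README.

```python
import numpy as np

SAMPLE_RATE = 16000     # assumption: 16 kHz audio
EXAMPLE_SECONDS = 5.0   # background clip length stated in the README


def to_samples(seconds: float, sr: int = SAMPLE_RATE) -> int:
    return int(round(seconds * sr))


def extract_clip(segment: np.ndarray, offset_s: float, duration_s: float,
                 sr: int = SAMPLE_RATE) -> np.ndarray:
    """Cut the clip [offset, offset + duration) out of a segment-level signal."""
    start = to_samples(offset_s, sr)
    stop = start + to_samples(duration_s, sr)
    if stop > len(segment):
        raise ValueError("clip extends past the end of the segment")
    return segment[start:stop]


def build_example(bg_speech, bg_residual, fg_speech, meta, sr=SAMPLE_RATE):
    """Hypothetical assembly of one example from README-style fields (assumed in seconds)."""
    n = to_samples(EXAMPLE_SECONDS, sr)
    # Background: filtered headset and residual cut from the same window.
    s_bg = extract_clip(bg_speech, meta["offset_bg"], meta["duration_bg"], sr)
    r_bg = extract_clip(bg_residual, meta["offset_bg"], meta["duration_bg"], sr)
    if len(s_bg) != n:
        raise ValueError("background clip must be exactly 5 seconds")
    # Foreground: a clip of at most 5 s, placed shift_fg seconds into a silent buffer.
    fg_clip = extract_clip(fg_speech, meta["offset_fg"], meta["duration_fg"], sr)
    shift = to_samples(meta["shift_fg"], sr)
    if shift + len(fg_clip) > n:
        raise ValueError("shifted foreground clip does not fit in 5 seconds")
    s_fg = np.zeros(n)
    s_fg[shift:shift + len(fg_clip)] = fg_clip
    sources = np.stack([s_bg, r_bg, s_fg])  # (3, n): bg speech, bg residual, fg speech
    mixture = sources.sum(axis=0)            # assumption: mixture = sum of sources
    return mixture, sources


# Toy usage with random stand-ins for segment-level signals (10 s each).
rng = np.random.default_rng(2)
bg_speech, bg_residual, fg_speech = rng.standard_normal((3, 10 * SAMPLE_RATE))
meta = {"offset_bg": 1.0, "duration_bg": 5.0,
        "offset_fg": 0.5, "duration_fg": 2.0, "shift_fg": 1.5}
mixture, sources = build_example(bg_speech, bg_residual, fg_speech, meta)
```

Keeping the background residual as its own source means a separation model can be scored against it separately, while the mixture stays consistent with the sum of its references.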

