This is the code and dataset for our paper [Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment. AAAI 2018].

AAAI2018-Modeling Attention and Memory for Auditory Selection in a Cocktail Party Environment
=======================================================================

Our demo code is implemented in Keras (written in Python, with Theano as the backend).
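Because the demo depends on a specific Keras setup, a quick sanity check before running it may save some debugging. The following is a minimal sketch, not part of the repository: `describe_mismatch`, `EXPECTED_VERSION`, and `EXPECTED_BACKEND` are hypothetical names introduced here, and the check assumes it is run from a directory where the Keras copy bundled with this project is the one that gets imported.

```python
import os

# Keras reads KERAS_BACKEND at import time, so set it before importing keras.
os.environ.setdefault("KERAS_BACKEND", "theano")

# Values taken from this README; the constant names are illustrative only.
EXPECTED_VERSION = "1.2.2"
EXPECTED_BACKEND = "theano"


def describe_mismatch(version, backend):
    """Return a list of human-readable problems (empty if everything matches)."""
    problems = []
    if version != EXPECTED_VERSION:
        problems.append("Keras version is %s, expected %s" % (version, EXPECTED_VERSION))
    if backend != EXPECTED_BACKEND:
        problems.append("Keras backend is %s, expected %s" % (backend, EXPECTED_BACKEND))
    return problems


try:
    import keras
    from keras import backend as K

    issues = describe_mismatch(keras.__version__, K.backend())
    print("\n".join(issues) if issues else "Keras setup matches the README.")
except Exception as exc:  # Keras missing, or the backend failed to load
    print("Could not import Keras with the Theano backend: %s" % exc)
```

If the script reports a different version, Python is most likely picking up a system-wide Keras installation instead of the bundled copy.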

Usage:
$ python main_run.py
or run it in the background from a terminal:
$ bash run.sh

Notice:
(1). To avoid Keras version mismatches, we include a fork of Keras version 1.2.2 in this project.
(2). We use the MATLAB version of BSS_eval to evaluate NSDR.
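For readers unfamiliar with the metric, NSDR (normalized SDR) is the SDR of the separated signal minus the SDR of the unprocessed mixture, both measured against the clean target. The sketch below illustrates only that subtraction. It uses a simplified energy-ratio SDR, which is an assumption made for brevity and is not the BSS_eval decomposition used for the reported results; `simple_sdr` and `nsdr` are hypothetical helper names.

```python
import numpy as np


def simple_sdr(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: target energy over residual energy.

    NOTE: this is NOT the BSS_eval SDR, which first projects the estimate
    onto the reference; it only serves to illustrate the NSDR definition.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    residual = estimate - reference
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(residual ** 2) + eps))


def nsdr(reference, estimate, mixture):
    """SDR improvement of the estimate over the raw mixture, in dB."""
    return simple_sdr(reference, estimate) - simple_sdr(reference, mixture)


# Synthetic example: the "separation" attenuates the interferer by a factor of 10.
rng = np.random.RandomState(0)
target = rng.randn(8000)
interferer = rng.randn(8000)
mixture = target + interferer
estimate = target + 0.1 * interferer

# Attenuating the interferer amplitude by 10x gives an improvement of about 20 dB.
print("NSDR: %.2f dB" % nsdr(target, estimate, mixture))
```

A positive NSDR means the separated output is closer to the target than the mixture was; a value near zero means the model left the mixture essentially unchanged.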

Figure 1: Auditory Attention

Figure 1: Two specific attention tasks for auditory selection in a three speech mixture environment. One is top-down task-specific attention, and the other is bottom-up stimulus-driven attention.

Figure 2: Framework

Figure 2: An illustration of our Auditory Selection with Attention and Memory (ASAM). (a): The overall architecture of the proposed ASAM. (b): Life-long memory module that memorizes the prior knowledge. In the top-down attention scenario, the dashed boxes and arrow are used only in the training phase and are removed at evaluation time.

Figure 3: Attention Heat Map

Figure 3: Effects of attention with different amounts of stimulus on one male and female mixture sample from WSJ0. (a) shows the SIR (Signal-to-Interference Ratio), SAR (Signal-to-Artifacts Ratio) and NSDR results, (b)-(d) are the auditory stimuli whose magnitudes are divided by the maximum magnitude, (e) is the mixture input spectrogram, (i) is the target spectrogram, (f)-(h) are attention maps based on the corresponding auditory stimuli and (j)-(l) are the corresponding predictions with their NSDR performances.    

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.