LSTM Continuous Turn-Taking Prediction

PyTorch implementation of two papers:

  1. Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs (ICMI '18)
  2. Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs (INTERSPEECH '18)

The supplied code is designed to reproduce the main results from [1], which show the utility of the multiscale approach. The code can potentially be adapted to reproduce other results from both papers, and it can also be used to investigate other user-defined feature sets and architectures. I hope it is useful! Feel free to contact me if you find any errors or have any queries. Please note that this is still a work in progress. The data preparation script takes roughly 4 hours on a modern computer with 4 cores, and the script that reproduces the results takes several hours on a single GTX 1080.
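For orientation: the multiscale architecture in [1] runs each modality through its own "subnet" LSTM at a timescale suited to that modality (e.g. fast 50 ms acoustic frames vs. slower linguistic features) and fuses the subnet outputs in a master LSTM that predicts upcoming speech activity. Below is a minimal PyTorch sketch of the idea; the layer sizes, names, and fusion-by-upsampling step are illustrative assumptions, not the repository's exact implementation:

import torch
import torch.nn as nn

# Sketch of a multiscale RNN: two "subnet" LSTMs operate at different
# frame rates and a master LSTM fuses their outputs at the fastest rate.
class MultiscaleSketch(nn.Module):
    def __init__(self, acous_dim, ling_dim, hidden=64):
        super().__init__()
        self.acous_lstm = nn.LSTM(acous_dim, hidden, batch_first=True)   # fast (50 ms) frames
        self.ling_lstm = nn.LSTM(ling_dim, hidden, batch_first=True)     # slower timescale
        self.master_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)  # per-frame speech-activity score

    def forward(self, acous, ling, rate_ratio):
        h_a, _ = self.acous_lstm(acous)   # (batch, T, hidden)
        h_l, _ = self.ling_lstm(ling)     # (batch, T // rate_ratio, hidden)
        # Hold each slow-rate output for rate_ratio fast frames so the
        # two streams can be concatenated frame by frame.
        h_l = h_l.repeat_interleave(rate_ratio, dim=1)
        h, _ = self.master_lstm(torch.cat([h_a, h_l], dim=-1))
        return torch.sigmoid(self.out(h))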

Requirements:

  • Linux
  • PyTorch (> 0.3.0)
  • Anaconda
  • nltk
  • SoX
  • openSMILE 2.3.0

Setup

Clone the repository:

git clone https://github.com/mattroddy/lstm_turn_taking_prediction 

Download the Map Task corpus audio data from http://groups.inf.ed.ac.uk/maptask/maptasknxt.html by running the wget.sh script obtained from the site. Run the script from within the lstm_turn_taking_prediction/data/ folder, then fetch and unpack the NXT-format annotations:

cd lstm_turn_taking_prediction/data
sh 'maptaskBuild-xxxxx.wget.sh'
wget http://groups.inf.ed.ac.uk/maptask/hcrcmaptask.nxtformatv2-1.zip
unzip hcrcmaptask.nxtformatv2-1.zip
rm hcrcmaptask.nxtformatv2-1.zip
cd ..

Split the audio channels:

sh scripts/split_channels.sh
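Each Map Task recording is a stereo file with one participant per channel, so this step writes each channel to its own mono wav using SoX. If you want to adapt it, here is a Python sketch of the same operation; the folder layout and speaker labels are assumptions, not the script's actual paths:

import subprocess
from pathlib import Path

# Pull each channel of every stereo recording into its own mono file
# with SoX's "remix" effect. The input folder and naming are hypothetical.
for wav in Path('data/dialogues').glob('*.wav'):
    for channel, speaker in [(1, 'g'), (2, 'f')]:   # assumed giver/follower channel order
        out = wav.with_name(f'{wav.stem}.{speaker}.wav')
        subprocess.run(['sox', str(wav), str(out), 'remix', str(channel)], check=True)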

Download openSMILE from https://audeering.com/technology/opensmile/#download and extract it into lstm_turn_taking_prediction/utils. Then replace the stock config files with the modified ones supplied in this repository (the modified configs use a 50 ms step size, disable smoothing, and adopt the left-alignment convention):

rm -r utils/opensmile-2.3.0/config
mv utils/config utils/opensmile-2.3.0/
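Before running the full pipeline, you can smoke-test the openSMILE setup on a single file. SMILExtract's -C/-I/-O options are standard, but the binary location, config name, and file paths below are illustrative guesses rather than what prepare_data.py actually uses:

import subprocess

# Run SMILExtract once to confirm openSMILE and the modified configs work.
subprocess.run([
    'utils/opensmile-2.3.0/bin/SMILExtract',                  # assumed binary location after extraction
    '-C', 'utils/opensmile-2.3.0/config/MFCC12_0_D_A.conf',   # any of the modified configs
    '-I', 'data/example.g.wav',                               # hypothetical split-channel wav
    '-O', 'smoke_test_output.txt',                            # output format is defined by the config's sink
], check=True)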

Extract features and evaluation metrics:

python prepare_data.py

Running the code

At this point a model can be trained and tested by running:

python run_json.py 

To reproduce the main results in [1], set the path to your Python environment in the appropriate icmi_18_results file, then run:

python icmi_18_results_no_subnets.py 
python icmi_18_results_two_subnets.py

This will reproduce Table 1 from [1] and should take about a day on a modern computer with a GTX 1080 GPU. We reduce the number of trials from 5 to 3 to save time. The results can be viewed in the report_dict.json files within each respective output directory.
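To inspect a result file programmatically rather than by eye, something like the following works (point the path at whichever report_dict.json you want; the one below is illustrative):

import json

# Pretty-print a generated results dictionary.
with open('icmi_18_results_two_subnets/report_dict.json') as f:   # illustrative path
    report = json.load(f)
print(json.dumps(report, indent=2))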
