THIS IS THE OLD REPO. FOR THE NEWER STEREO VERSION OF THIS MODEL, SEE:
IT SIMPLIFIES INFERENCE AND AVOIDS DEPENDENCY HELL.
Model training for
- Voice Activity Projection: Self-supervised Learning of Turn-taking Events
- How Much Does Prosody Help Turn-taking? Investigations using Voice Activity Projection Models
- Create conda env:
conda create -n conv_ssl python=3.9
- source env:
conda activate conv_ssl
- PyTorch:
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
- Dependencies:
- Install requirements:
pip install -r requirements.txt
- NOTE: If you have problems, install cython manually first (pip install cython) and then rerun pip install -r requirements.txt (there is trouble automating the install of the CPC_audio repo).
- [Optional] Manual installation of CPC_audio (if the note above does not work):
git clone https://github.com/facebookresearch/CPC_audio.git
- cd to the repo and install its dependencies (see that repository); you will probably need:
pip install cython
- run:
python setup.py develop
(again see original implementation)
- VAP: Voice Activity Projection multi-purpose "head".
- Install vap_turn_taking:
git clone https://github.com/ErikEkstedt/vap_turn_taking.git
- cd to the repo and install dependencies:
pip install -r requirements.txt
- Install:
pip install -e .
- DATASET
- Install datasets_turntaking:
git clone https://github.com/ErikEkstedt/datasets_turntaking.git
- cd to the repo and install dependencies:
pip install -r requirements.txt
- Install repo:
pip install -e .
- WARNING: Requires Switchboard and/or Fisher data!
- Install conv_ssl:
- cd to the root directory and run:
pip install -e .
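After installation, a quick import check can confirm that everything resolves and that PyTorch sees a GPU. This is only a sanity-check sketch; the package names are assumed to match the repositories installed above.
# Sanity check of the installation (package names assumed to match the repos above)
import torch
import conv_ssl
import vap_turn_taking
import datasets_turntaking
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("conv_ssl installed at:", conv_ssl.__file__)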
- Train: cd to the root directory and run:
python conv_ssl/train.py data.datasets=['switchboard','fisher'] +trainer.val_check_interval=0.5 early_stopping.patience=20
- Evaluate a trained checkpoint:
python conv_ssl/evaluation/evaluation.py \
+checkpoint_path=/full/path/checkpoint.ckpt \
+savepath=assets/vap_fis \
data.num_workers=4 \
data.batch_size=16
The run.py script loads a pretrained model and evaluates it on a sample: a waveform plus either a text_grid_name.TextGrid or a vad_list_name.json. See the example/ folder for the expected formats.
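The vad_list format is assumed here to be one list of [start, end] voice-activity segments, in seconds, per speaker/channel; the sketch below writes such a file with made-up segment values.
import json
# Assumed layout: one entry per speaker/channel, each a list of [start, end] times in seconds.
vad_list = [
    [[0.0, 1.5], [2.1, 3.2]],  # speaker/channel 0
    [[1.6, 2.0], [3.3, 4.1]],  # speaker/channel 1
]
with open("vad_list_name.json", "w") as f:
    json.dump(vad_list, f)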
- Using defaults:
python run.py
- A custom run requires an audio file (e.g. sample.wav) and either a text_grid_name.TextGrid or a vad_list_name.json:
# -c: checkpoint, -w: waveform, -tg: text grid, -v: vad-list, -o: output file
python run.py \
  -c example/cpc_48_50hz_15gqq5s5.ckpt \
  -w example/student_long_female_en-US-Wavenet-G.wav \
  -tg example/student_long_female_en-US-Wavenet-G.TextGrid \
  -v example/vad_list.json \
  -o VAP_OUTPUT.json
The paper investigates performance over kfold splits (see conv_ssl/config/swb_kfolds) for 4 different model architectures ('discrete', 'independent', 'independent-40', 'comparative').
- Save samples to disk: conv_ssl/dataset_save_samples_to_disk.py
- Train on the samples on disk: conv_ssl/train_disk.py
- Run: scripts/model_kfold.bash
- We evaluate (find a threshold over the validation set + final evaluation on the test set):
- see conv_ssl/evaluation/evaluate_paper_model.py
- the ids are the WandB ids.
- We save all model scores to disk.
- In conv_ssl/evaluation/anova.py we compare the scores to extract the final values in the paper.
- Training uses WandB by default.
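If you have not used WandB on the machine before, you need to authenticate once; wandb.login() is the standard call in the wandb package for this and prompts for an API key the first time it is used.
import wandb
# One-time authentication so training runs can log to your WandB account.
wandb.login()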
- The event settings used in the paper (hold/shift "hs", backchannel "bc", and general metric settings) are included in conv_ssl/config/event_settings.json.
- See Section 3 of the paper.
from conv_ssl.utils import read_json
event_settings = read_json("conv_ssl/config/event_settings.json")
hs_kwargs = event_settings['hs']
bc_kwargs = event_settings['bc']
metric_kwargs = event_settings['metric']
{
"hs": {
"post_onset_shift": 1,
"pre_offset_shift": 1,
"post_onset_hold": 1,
"pre_offset_hold": 1,
"non_shift_horizon": 2,
"metric_pad": 0.05,
"metric_dur": 0.1,
"metric_pre_label_dur": 0.5,
"metric_onset_dur": 0.2
},
"bc": {
"max_duration_frames": 1.0,
"pre_silence_frames": 1.0,
"post_silence_frames": 2.0,
"min_duration_frames": 0.2,
"metric_dur_frames": 0.2,
"metric_pre_label_dur": 0.5
},
"metric": {
"pad": 0.05,
"dur": 0.1,
"pre_label_dur": 0.5,
"onset_dur": 0.2,
"min_context": 3.0
}
}
TBA