
ndkgit339/filledpause_prediction_group


Group-dependent filled pause prediction models and training scripts

About this repository

This is an official implementation of "Personalized Filled-pause Generation with Group-wise Prediction Models" (LREC 2022). We implement group-dependent filled pause (FP) prediction based on speakers' FP usage, using the Corpus of Spontaneous Japanese (CSJ). Pre-trained group-dependent models (grouped by FP words and by FP positions) and a training script for group-dependent models are available.

Requirements

  • You can install the Python requirements with

      $ pip install -r requirements.txt

    • We recommend Python 3.8.
  • Install the BERT model to the directory bert/ from here. We use the LARGE WWM version of pytorch-pretrained-BERT.

1. Pre-trained group-dependent filled pause prediction models

Pre-trained group-dependent filled pause prediction models are available at model_files. File names and model descriptions are listed below. Model files follow the pytorch-lightning checkpoint format. We recommend using predict.py to get prediction results (detailed here). The detailed procedure for grouping speakers and training models is described in the paper.

| filename              | description        |
| --------------------- | ------------------ |
| word_group1.ckpt      | group 1 (word)     |
| word_group2.ckpt      | group 2 (word)     |
| word_group3.ckpt      | group 3 (word)     |
| word_group4.ckpt      | group 4 (word)     |
| position_group1.ckpt  | group 1 (position) |
| position_group2.ckpt  | group 2 (position) |
| position_group3.ckpt  | group 3 (position) |
| position_group4.ckpt  | group 4 (position) |
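The file names follow a fixed convention (grouping criterion, then group number). A minimal sketch of reading that convention off a filename; the helper name is ours and is not part of the repository:

```python
import re

def parse_checkpoint_name(filename):
    """Split a checkpoint name like 'word_group2.ckpt' into its
    grouping criterion ('word' or 'position') and its group id."""
    m = re.fullmatch(r"(word|position)_group(\d)\.ckpt", filename)
    if m is None:
        raise ValueError(f"unexpected checkpoint name: {filename}")
    return m.group(1), int(m.group(2))

print(parse_checkpoint_name("word_group2.ckpt"))      # ('word', 2)
print(parse_checkpoint_name("position_group4.ckpt"))  # ('position', 4)
```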

2. Training of group-dependent filled pause prediction models

Step 0: Prepare CSJ data

Install CSJ to the directory corpus/ from here. The transcription files of both the core and noncore data in Form1 are needed.

Step 1: Get CSJ information

The script get_csj_info.py gets the list of pairs of speaker and lecture IDs and the list of core speakers.

$ python get_csj_info.py path/to/CSJ path/to/CSJ/fileList.csv
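The pairing logic can be illustrated with a small sketch. The column names below (`speaker_id`, `lecture_id`, `is_core`) are assumptions for illustration only; the real fileList.csv schema may differ:

```python
import csv, io

# Hypothetical stand-in for CSJ's fileList.csv; real columns may differ.
sample = io.StringIO(
    "lecture_id,speaker_id,is_core\n"
    "A01M0007,S001,1\n"
    "A01M0008,S002,0\n"
)

pairs = []           # (speaker, lecture) pairs
core_speakers = set()
for row in csv.DictReader(sample):
    pairs.append((row["speaker_id"], row["lecture_id"]))
    if row["is_core"] == "1":
        core_speakers.add(row["speaker_id"])

print(pairs)          # [('S001', 'A01M0007'), ('S002', 'A01M0008')]
print(core_speakers)  # {'S001'}
```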

Step 2: Preprocess

The script preprocess.py gets the list of utterances from the transcription files, segments them into morphemes, extracts features, splits them into training, validation, and evaluation data, and counts the frequency of FPs. It follows the settings in conf/preprocess/config.yaml; change the settings as needed.

$ python preprocess.py
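One part of the preprocessing is the three-way data split. A minimal sketch of such a split; the ratios and function name are assumptions (the repository's actual split is configured in conf/preprocess/config.yaml):

```python
import random

def split_dataset(items, train_ratio=0.8, valid_ratio=0.1, seed=0):
    """Shuffle items and split them into training, validation,
    and evaluation subsets. Ratios here are illustrative only."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train_ratio)
    n_valid = int(n * valid_ratio)
    return (items[:n_train],
            items[n_train:n_train + n_valid],
            items[n_train + n_valid:])

train, valid, evaluation = split_dataset(range(100))
print(len(train), len(valid), len(evaluation))  # 80 10 10
```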

Step 3: Training

The script train.py trains the non-personalized model or the group-dependent models. It follows the settings in conf/train/config.yaml; change the settings as needed.

$ python train.py
  1. To train the non-personalized model, write the following in conf/train/config.yaml.

    train:
        model_type: non_personalized
        fine_tune: False
    
  2. To train the group-dependent models, write the following in conf/train/config.yaml.

    train:
        model_type: group
        group_id: <group_id>
        fine_tune: True
        load_ckpt_step: <step>
    
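The two config variants above differ only in the train section. A small sketch that checks a config dict against the fields shown; this helper is hypothetical and not part of the repository, which reads conf/train/config.yaml itself:

```python
def validate_train_config(cfg):
    """Check the train-section fields described above for the two
    supported model types. Illustrative only."""
    train = cfg["train"]
    if train["model_type"] == "non_personalized":
        assert train["fine_tune"] is False
    elif train["model_type"] == "group":
        assert train["fine_tune"] is True
        assert "group_id" in train and "load_ckpt_step" in train
    else:
        raise ValueError(f"unknown model_type: {train['model_type']}")
    return True

print(validate_train_config(
    {"train": {"model_type": "non_personalized", "fine_tune": False}}))  # True
print(validate_train_config(
    {"train": {"model_type": "group", "group_id": 1,
               "fine_tune": True, "load_ckpt_step": 1000}}))             # True
```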

Contributors

Citation

@inproceedings{matsunaga22personalizedfpgeneration,
    title = "Personalized Filled-pause Generation with Group-wise Prediction Models",
    author = "Yuta Matsunaga and Takaaki Saeki and Shinnosuke Takamichi and Hiroshi Saruwatari",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = "Jun.",
    year = "2022",
    pages = "385--392",
}

