SpeechPrompt-v2

Website: https://ga642381.github.io/SpeechPrompt/
Paper Link: https://arxiv.org/abs/2303.00733
Pipeline Charts: https://github.com/ga642381/SpeechPrompt-v2/blob/main/docs/pipeline.png
Datasets Doc: https://github.com/ga642381/SpeechPrompt-v2/blob/main/docs/dataset.md

Update Reminder:

Sampling Rate for Downstream Task:

When performing prompting on a downstream task, make sure the audio is sampled at 16 kHz.
Modification: A recent commit forces librosa to load the audio at 16 kHz.
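librosa resamples on load to whatever `sr` you pass to `librosa.load`, but it can still help to check the raw files before prompting. The following is a minimal sketch using only Python's standard `wave` module, so it assumes uncompressed PCM WAV input (true for Google Speech Commands, for example). The function names are hypothetical helpers, not part of this repository.

```python
import io
import wave

TARGET_SR = 16_000  # sampling rate expected for downstream prompting


def wav_sample_rate(source) -> int:
    """Return the sampling rate of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def is_16k(source) -> bool:
    """True if the WAV file is already at the 16 kHz target rate."""
    return wav_sample_rate(source) == TARGET_SR


def make_silent_wav(sample_rate: int, n_frames: int = 80) -> io.BytesIO:
    """Build a short mono 16-bit silent WAV in memory (demo helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(is_16k(make_silent_wav(16_000)))  # True
    print(is_16k(make_silent_wav(8_000)))   # False
```

Running `is_16k` over a dataset directory before preprocessing flags files that rely on resampling at load time.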

Pre-trained Model Loading:

Make sure the pre-trained model is loaded correctly; otherwise prompting will not give reasonable results.
Observation: When the pre-trained model is loaded correctly, prompt training should start at epoch 46, not epoch 1, because the pre-trained GSLM has already been trained for 45 epochs.
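The epoch check above can also be done on the checkpoint itself before training. This is a sketch under an assumption: that the pre-trained uLM checkpoint follows fairseq's usual layout, where the training-iterator state (including the epoch) is stored under `extra_state["train_iterator"]["epoch"]`. The function names are hypothetical.

```python
PRETRAINED_EPOCHS = 45  # GSLM uLM training length, per the note above


def saved_epoch(checkpoint: dict) -> int:
    """Epoch recorded in a fairseq-style checkpoint dict (assumed layout)."""
    return checkpoint["extra_state"]["train_iterator"]["epoch"]


def expected_first_prompt_epoch(checkpoint: dict) -> int:
    # Resuming from a checkpoint saved at the end of epoch N continues at N + 1.
    return saved_epoch(checkpoint) + 1


def check_pretrained_loaded(checkpoint: dict) -> None:
    """Raise if prompt training would not resume right after pre-training."""
    start = expected_first_prompt_epoch(checkpoint)
    if start != PRETRAINED_EPOCHS + 1:
        raise RuntimeError(
            f"prompt training would start at epoch {start}, expected "
            f"{PRETRAINED_EPOCHS + 1}; the pre-trained uLM may not be loaded"
        )
```

To use it, load the checkpoint file into a dict (for example with `torch.load(path, map_location="cpu")`) and pass it to `check_pretrained_loaded`.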

🐘 Pre-trained models and files

You will need 4 files:

- HuBERT model: encodes speech
- K-means model: quantizes the speech representations into discrete units
- dictionary file: defines the unit space for the unit language model
- unit Language Model (uLM): performs generative language modeling on the discrete units

These files can be downloaded automatically when you run the preprocessing pipeline.
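To make the roles of the first two files concrete, here is a toy sketch of the quantization idea: each frame-level feature vector (in the real pipeline, a HuBERT frame) is replaced by the index of its nearest k-means centroid, and the resulting unit IDs are what the dictionary and uLM operate on. The 2-D vectors and 3 centroids are made-up stand-ins, not the real models' dimensions or cluster count.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_centroid(frame: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    def sq_dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(frame, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def quantize(frames: List[Vector], centroids: List[Vector]) -> List[int]:
    """Map each feature frame to a discrete unit ID."""
    return [nearest_centroid(f, centroids) for f in frames]


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]             # toy "K-means model"
    frames = [[0.1, -0.2], [0.9, 1.2], [4.8, 5.1], [5.2, 4.9]]   # toy "HuBERT features"
    units = quantize(frames, centroids)
    print(units)                      # [0, 1, 2, 2]
    print(" ".join(map(str, units)))  # 0 1 2 2
```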

🔧 Preprocessing

Concept

The data preprocessing (Speech2unit) pipeline has 4 steps. Its main job is to convert speech into discrete units and collate the task labels:
1. generate_manifest
2. quantize
3. reduce_quantized
4. create_lm_dataset
Intermediate data is saved at every step, so you can analyze any stage you are interested in. Inspecting the intermediate outputs is also a good way to understand how the pipeline works.
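As an illustration of the reduce_quantized step, here is a minimal sketch assuming that, as in GSLM, it collapses runs of identical consecutive units produced by frame-level quantization (so a unit held over several frames appears once). This is an assumption about what the step does; `reduce_units` is a hypothetical helper.

```python
from itertools import groupby
from typing import List


def reduce_units(units: List[int]) -> List[int]:
    """Collapse each run of identical consecutive units into a single unit."""
    return [unit for unit, _run in groupby(units)]


if __name__ == "__main__":
    frames = [71, 71, 71, 12, 12, 71, 3]  # toy frame-level k-means output
    print(reduce_units(frames))           # [71, 12, 71, 3]
```

Note that only adjacent repeats are merged: unit 71 appears twice in the output because another unit separates its two runs.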

Steps

1. Download the dataset.
2. Modify the dataset config ([downstream]/config.yaml).
3. Modify the global config (preprocess/config.yaml).
4. Run preprocess/runner.py

option 1

# Run all 4 stages at once with --action all:
python runner.py --model GSLM --downstream SCR_google_speech_commands --action all

option 2

# Or run the 4 stages one at a time, in order:
python runner.py --model GSLM --downstream SCR_google_speech_commands --action generate_manifest
python runner.py --model GSLM --downstream SCR_google_speech_commands --action quantize
python runner.py --model GSLM --downstream SCR_google_speech_commands --action reduce_quantized
python runner.py --model GSLM --downstream SCR_google_speech_commands --action create_lm_dataset
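If you want to script option 2, the following is a sketch of a small Python driver that issues exactly the commands above, one per stage, and stops at the first failing stage. It assumes it is run from the directory containing runner.py; `stage_commands` and `run_all` are hypothetical helpers.

```python
import subprocess
import sys
from typing import List

STAGES = ["generate_manifest", "quantize", "reduce_quantized", "create_lm_dataset"]


def stage_commands(downstream: str, model: str = "GSLM") -> List[List[str]]:
    """Build one runner.py command per preprocessing stage, in pipeline order."""
    return [
        [sys.executable, "runner.py", "--model", model,
         "--downstream", downstream, "--action", stage]
        for stage in STAGES
    ]


def run_all(downstream: str, dry_run: bool = False) -> None:
    for cmd in stage_commands(downstream):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises on a non-zero exit, so later stages never
            # run on missing or partial intermediate data.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("SCR_google_speech_commands", dry_run=True)
```

Using `sys.executable` instead of a bare `python` keeps every stage in the same interpreter (and conda environment) as the driver.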

🔄 Verbalizer

Concept

The verbalizer maps the task labels into the language model's vocabulary. It consists of 2 steps.

Steps

Run verbalizer.py

Example:

python verbalizer.py --downstream SCR_google_speech_commands --action all --method freq
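To give a feel for what a frequency-based mapping could look like, here is an illustrative sketch under an explicit assumption: that `--method freq` pairs labels with the most frequent units in the training data, most frequent label first. This is not necessarily the exact rule verbalizer.py implements; `freq_verbalizer` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def freq_verbalizer(unit_seqs: List[List[int]], labels: List[str]) -> Dict[str, int]:
    """Map each label to a unit: most frequent label -> most frequent unit, etc."""
    unit_counts = Counter(u for seq in unit_seqs for u in seq)
    label_counts = Counter(labels)
    top_units = [u for u, _ in unit_counts.most_common(len(label_counts))]
    if len(top_units) < len(label_counts):
        raise ValueError("fewer distinct units than labels")
    ranked_labels = [lab for lab, _ in label_counts.most_common()]
    return dict(zip(ranked_labels, top_units))


if __name__ == "__main__":
    seqs = [[3, 3, 7], [3, 9, 7], [7, 3]]   # toy unit sequences
    labels = ["yes", "no", "yes"]           # toy task labels
    mapping = freq_verbalizer(seqs, labels)
    print(mapping)                          # {'yes': 3, 'no': 7}
    print([mapping[lab] for lab in labels]) # [3, 7, 3]
```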

🐟 Fairseq Preprocess

Concept

This step converts the verbalized data to binary files that will be used for fairseq training.

Steps

Run fairseq_preprocess.py

Example:

python fairseq_preprocess.py --downstream SCR_google_speech_commands --vb_method freq
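Before writing binary files, fairseq's preprocessing maps every token to an integer index using a dictionary. The following toy sketch mirrors that idea for unit sequences: fairseq's default special symbols `<s>`, `<pad>`, `</s>`, `<unk>` take indices 0 to 3, symbols from a dict.txt-style file (`symbol count` per line) follow in file order, and each encoded line ends with `</s>`. It is a simplified stand-in, not fairseq's API, and it does not write the actual .bin/.idx format.

```python
from typing import Dict, List

SPECIALS = ["<s>", "<pad>", "</s>", "<unk>"]  # fairseq default specials, indices 0-3


def load_dict(lines: List[str]) -> Dict[str, int]:
    """Parse dict.txt-style 'symbol count' lines into a symbol -> index table."""
    table = {sym: i for i, sym in enumerate(SPECIALS)}
    for line in lines:
        symbol = line.split()[0]
        table.setdefault(symbol, len(table))
    return table


def encode_line(line: str, table: Dict[str, int]) -> List[int]:
    """Split on whitespace, map to indices (unknown -> <unk>), append </s>."""
    ids = [table.get(tok, table["<unk>"]) for tok in line.split()]
    return ids + [table["</s>"]]


if __name__ == "__main__":
    table = load_dict(["12 500", "7 420", "3 90"])
    print(encode_line("12 12 7 99", table))  # [4, 4, 5, 3, 2]
```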

🔥 Training

Concept

During training, 2 kinds of checkpoints are saved:
- base_model
- prompt

Steps

Run train.py

Example:

python train.py \
    --downstream SCR_google_speech_commands \
    --vb_method freq \
    --exp_name SCR_google_speech_commands_plen.5 \
    --prompt_length 5 \
    --deep_prompt
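The `--prompt_length` and `--deep_prompt` options can be pictured with a toy forward pass. This sketch assumes prompt tuning in the usual sense: a fixed number of trainable vectors is prepended to the sequence, and with deep prompting a fresh set is inserted before every layer rather than only at the input. The real implementation may attach prompts differently (for example, to attention keys and values), and the toy layers act on each vector independently, unlike a transformer. Only the prompt vectors would be trained; the layers stand in for the frozen base_model.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_with_prompts(hidden: List[Vector], layers: List[Layer],
                     prompts: List[List[Vector]], deep: bool) -> List[Vector]:
    """Toy forward pass with prompt vectors prepended to the sequence.

    prompts[i] holds the vectors for layer i, each set of length prompt_length.
    Shallow prompting uses only prompts[0] at the input; deep prompting swaps
    in prompts[i] before every later layer.
    """
    p_len = len(prompts[0])
    h = prompts[0] + hidden
    for i, layer in enumerate(layers):
        if deep and i > 0:
            # Replace the previous layer's outputs at the prompt positions.
            h = prompts[i] + h[p_len:]
        h = [layer(v) for v in h]
    return h


def double(v: Vector) -> Vector:
    """Toy frozen layer."""
    return [2 * x for x in v]


if __name__ == "__main__":
    hidden = [[1.0], [3.0]]        # two toy input frames
    prompts = [[[10.0]], [[7.0]]]  # prompt length 1, one prompt set per layer
    print(run_with_prompts(hidden, [double, double], prompts, deep=False))
    # [[40.0], [4.0], [12.0]]
    print(run_with_prompts(hidden, [double, double], prompts, deep=True))
    # [[14.0], [4.0], [12.0]]
```

In both modes the output is prompt_length positions longer than the input; deep prompting only changes which vectors occupy those positions at each layer.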

✒️ Sampling

Concept

Load the saved base_model and prompt checkpoints to perform sampling.

Steps

Run sample.py

Example:

python sample.py \
    --exp_name SCR_google_speech_commands_plen.5 \
    --downstream SCR_google_speech_commands \
    --vb_method freq

The output is a JSON file containing the file_name, source units, ground truth (label), and model prediction.
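A quick way to score the output is to compare predictions with labels. This sketch assumes the file holds a JSON list of per-utterance records and that the ground truth and prediction are stored under keys named `label` and `pred`; both the layout and the key names are assumptions, so adjust them to the actual file.

```python
import json
from typing import Dict, List


def accuracy(records: List[Dict]) -> float:
    """Fraction of records whose prediction matches the ground-truth label."""
    if not records:
        return 0.0
    correct = sum(1 for r in records if r["pred"] == r["label"])
    return correct / len(records)


def accuracy_from_file(path: str) -> float:
    # Assumes a JSON list of records with hypothetical "label"/"pred" keys.
    with open(path) as f:
        return accuracy(json.load(f))


if __name__ == "__main__":
    records = [
        {"file_name": "a.wav", "label": "yes", "pred": "yes"},
        {"file_name": "b.wav", "label": "no", "pred": "yes"},
    ]
    print(accuracy(records))  # 0.5
```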
