# S2T-Perceiver: Dynamic Latent Access
Set the following environment variables:

```bash
# path to clone this repo
export PERCEIVER_ROOT=...

# path to the MuST-C data
export MUSTC_ROOT=...

# path to save experiment outputs
export OUTPUT_ROOT=...
```
Clone this repository to `$PERCEIVER_ROOT`:

```bash
git clone https://github.com/mt-upc/s2t_perceiver.git ${PERCEIVER_ROOT} && \
cd ${PERCEIVER_ROOT} && \
git submodule init && \
git submodule update
```
Create a conda environment from the `environment.yml` file and activate it:

```bash
conda env create -f ${PERCEIVER_ROOT}/environment.yml && \
conda activate s2t_perceiver && \
pip install --editable ${PERCEIVER_ROOT}/fairseq/
```
To prepare the MuST-C data, follow the instructions here. We used en-de (v2.0), and en-es and en-ru (both v1.0).
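As a rough sketch of what the preparation step involves, fairseq's standard MuST-C recipe (`examples/speech_to_text/prep_mustc_data.py`, bundled in the `fairseq` submodule) builds the tsv manifests, vocabularies, and config files per task. Note this is an assumption based on the generic fairseq example, not this repo's own instructions; flags and vocabulary sizes may differ here:

```shell
# Hypothetical sketch: fairseq's generic MuST-C preparation commands.
# Assumes the dataset archives are already extracted under ${MUSTC_ROOT}.
python ${PERCEIVER_ROOT}/fairseq/examples/speech_to_text/prep_mustc_data.py \
  --data-root ${MUSTC_ROOT} --task asr \
  --vocab-type unigram --vocab-size 5000
python ${PERCEIVER_ROOT}/fairseq/examples/speech_to_text/prep_mustc_data.py \
  --data-root ${MUSTC_ROOT} --task st \
  --vocab-type unigram --vocab-size 8000
```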
To train the model, first run the ASR pre-training step, then start the ST training with the pre-trained encoder.
- To train without DLA-train, set `$k_train = $n`.
- The example is for English-to-German (en-de).
- The suggested values for `base_update_freq` and `batch_size` are for an NVIDIA GeForce RTX 2080 Ti; adjust them accordingly for other devices.
- For training with multiple devices, make sure that `base_update_freq` is divisible by `n_gpus`.
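The divisibility requirement above can be sketched as follows (hypothetical values; the effective per-GPU update frequency is `base_update_freq / n_gpus`, so the division must be exact for the total batch size to be preserved):

```shell
# Sketch with assumed example values, not values from this repo.
base_update_freq=8   # tuned for a single RTX 2080 Ti
n_gpus=2

# Guard against a non-integer effective update frequency.
if [ $((base_update_freq % n_gpus)) -ne 0 ]; then
  echo "base_update_freq must be divisible by n_gpus" >&2
  exit 1
fi

# Effective gradient-accumulation steps per GPU.
update_freq=$((base_update_freq / n_gpus))
echo "update_freq=${update_freq}"
```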
```bash
# total number of latents
n=...
# number of latents for DLA-train
k_train=...
# language pair, for example: "en-de"
lang_pair=...

# ASR pre-training
bash ${PERCEIVER_ROOT}/scripts/train_perceiver_asr.sh $n $k_train

# ST training
bash ${PERCEIVER_ROOT}/scripts/train_perceiver_st.sh $lang_pair $n $k_train
```
To evaluate without DLA-inf, set `$k_inf = $n`.
```bash
# path to the trained model ($path_to_exp/ckpts)
path_to_exp=...
# language pair, for example: "en-de"
lang_pair=...
# number of latents for DLA-inf
k_inf=...

bash ${PERCEIVER_ROOT}/scripts/eval_perceiver_st.sh $path_to_exp $lang_pair $k_inf
```
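Since DLA-inf decouples the inference latent budget from training, one natural usage pattern is sweeping several `k_inf` values with the same checkpoint to trade quality against speed. A hypothetical sketch (the specific budget values are illustrative, not from this repo):

```shell
# Hypothetical sweep over DLA-inf latent budgets for one trained model.
# Assumes path_to_exp and lang_pair are set as in the block above.
for k_inf in 4 8 16; do
  bash ${PERCEIVER_ROOT}/scripts/eval_perceiver_st.sh $path_to_exp $lang_pair $k_inf
done
```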
Citation:

```bibtex
@INPROCEEDINGS{10095276,
  author={Tsiamas, Ioannis and Gállego, Gerard I. and Fonollosa, José A. R. and Costa-jussà, Marta R.},
  title={Efficient Speech Translation with Dynamic Latent Perceivers},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023},
  pages={1-5},
  doi={10.1109/ICASSP49357.2023.10095276}
}
```