specified by the command -d ctc
- Download Common Voice Corpus 7.0
  - es: Spanish
  - zh-CN: Chinese (China)
  - ar: Arabic
- Modify the preprocessing script corpus/preprocess_cv.sh
  - cv_root: path to the Common Voice 7.0 dataset
  - data_root: path to save the preprocessed data (.tsv files)
- Run the following script
This step includes preprocessing transcriptions and downsampling audio waveforms.
cd downstream/ctc/corpus
bash preprocess_cv.sh
- Check processed files (in ${data_root}).
├── ar
│   ├── dev.tsv
│   ├── test.tsv
│   ├── train.tsv
│   └── train.txt
├── es
│   ├── dev.tsv
│   ├── test.tsv
│   ├── train.tsv
│   └── train.txt
└── zh-CN
    ├── dev.tsv
    ├── test.tsv
    ├── train.tsv
    └── train.txt
- Modify training configs in cv_config/
downstream_expert:
  corpus:
    name: 'common_voice'
    path: 'path/to/cv-corpus-7.0-2021-07-21/.../clips'
    train: ['path/to/train.tsv']
    dev: ['path/to/dev.tsv']
    test: ['path/to/test.tsv']
- Training
python3 run_downstream.py -n ExpName -m train -u Upstream -d ctc -c downstream/ctc/cv_config/cv_${lang}.yaml
Replace ${lang} with es, zh, or ar.
- Testing
python3 run_downstream.py -m evaluate -e result/downstream/ExpName/dev-best.ckpt
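
Before launching training, it can help to confirm that preprocessing produced the layout listed in the "Check processed files" step. The following is a minimal sketch, not part of S3PRL: `find_missing_files` is a hypothetical helper, and the language folders and file names are simply copied from the tree above.

```python
from pathlib import Path

# Language folders and file names copied from the expected layout above.
LANGS = ("ar", "es", "zh-CN")
EXPECTED = ("dev.tsv", "test.tsv", "train.tsv", "train.txt")


def find_missing_files(data_root, langs=LANGS, expected=EXPECTED):
    """Return the expected preprocessed files that are absent under data_root.

    Hypothetical helper for illustration; it only checks that the files
    exist and does not validate their contents.
    """
    root = Path(data_root)
    return [
        root / lang / name
        for lang in langs
        for name in expected
        if not (root / lang / name).is_file()
    ]
```

Calling `find_missing_files("path/to/data_root")` with the same directory used as data_root in preprocess_cv.sh returns an empty list when every file is in place.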
- Clone vectominist/SBCSAE-preprocess for data preprocessing
git clone https://github.com/vectominist/SBCSAE-preprocess.git
- Follow the instructions in vectominist/SBCSAE-preprocess to download and process data.
- Modify the training config in sbcsae.yaml
downstream_expert:
  corpus:
    name: 'sbcsae'
    path: 'path/to/SBCSAE/wav'
    train: ['path/to/sbcsae/train.tsv']
    dev: ['path/to/sbcsae/dev.tsv']
    test: ['path/to/sbcsae/test.tsv']
- Training
python3 run_downstream.py -n ExpName -m train -u Upstream -d ctc -c downstream/ctc/sbcsae.yaml
- Testing
python3 run_downstream.py -m evaluate -e result/downstream/ExpName/dev-best.ckpt
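
The training and testing commands for Common Voice and SBCSAE differ only in the config path and experiment name, so they can be generated rather than typed by hand. The sketch below is one possible way to do that, not something shipped with the repository: `train_command`, `evaluate_command`, and `CONFIGS` are hypothetical names, the flags are copied from the commands above, and using the config key as the experiment name is an assumption.

```python
import shlex

# Config paths taken from the training commands above.
CONFIGS = {
    f"cv_{lang}": f"downstream/ctc/cv_config/cv_{lang}.yaml"
    for lang in ("es", "zh", "ar")
}
CONFIGS["sbcsae"] = "downstream/ctc/sbcsae.yaml"


def train_command(config, exp_name="ExpName", upstream="Upstream"):
    """Build the training command as an argument list (hypothetical helper)."""
    return [
        "python3", "run_downstream.py",
        "-n", exp_name, "-m", "train",
        "-u", upstream, "-d", "ctc", "-c", config,
    ]


def evaluate_command(exp_name="ExpName"):
    """Build the evaluation command for the dev-best checkpoint of exp_name."""
    ckpt = f"result/downstream/{exp_name}/dev-best.ckpt"
    return ["python3", "run_downstream.py", "-m", "evaluate", "-e", ckpt]


if __name__ == "__main__":
    # Assumption: each experiment is named after its config key.
    for name, config in CONFIGS.items():
        print(shlex.join(train_command(config, exp_name=name)))
        print(shlex.join(evaluate_command(exp_name=name)))
```

The argument lists can be passed directly to `subprocess.run` from the repository root; `Upstream` remains a placeholder for the upstream model name.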