OOD-ASR: Out-of-domain Automatic Speech Recognition Tasks

These tasks are selected with the command-line option -d ctc.

Cross-lingual Tasks

Download Common Voice Corpus 7.0
- es: Spanish
- zh-CN: Chinese (China)
- ar: Arabic
Modify the preprocessing script corpus/preprocess_cv.sh
- cv_root: path to the Common Voice 7.0 dataset
- data_root: path where the preprocessed data (.tsv files) will be saved
Run the following script
```
cd downstream/ctc/corpus
bash preprocess_cv.sh
```
This step includes preprocessing transcriptions and downsampling audio waveforms.

Check processed files (in ${data_root})

.
├── ar
│   ├── dev.tsv
│   ├── test.tsv
│   ├── train.tsv
│   └── train.txt
├── es
│   ├── dev.tsv
│   ├── test.tsv
│   ├── train.tsv
│   └── train.txt
└── zh-CN
    ├── dev.tsv
    ├── test.tsv
    ├── train.tsv
    └── train.txt
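
The check above can also be scripted. The following is a minimal sketch, assuming the layout shown in the tree (three language folders, each with the same four files); the function name `missing_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Language folders and file names taken from the directory tree above.
LANGS = ("ar", "es", "zh-CN")
FILES = ("dev.tsv", "test.tsv", "train.tsv", "train.txt")


def missing_files(data_root, langs=LANGS, files=FILES):
    """Return the expected files that are absent under data_root, as 'lang/name' strings."""
    root = Path(data_root)
    return [
        f"{lang}/{name}"
        for lang in langs
        for name in files
        if not (root / lang / name).is_file()
    ]
```

Calling `missing_files("path/to/data_root")` returns an empty list when preprocessing produced every expected file, and otherwise names the ones to look into.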

Modify training configs in cv_config/

downstream_expert:
    corpus:
        name: 'common_voice'
        path: 'path/to/cv-corpus-7.0-2021-07-21/.../clips'

        train: ['path/to/train.tsv']
        dev: ['path/to/dev.tsv']
        test: ['path/to/test.tsv']
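
As an illustration of how these fields relate to the preprocessing output, here is a small sketch that assembles the same structure as a Python dictionary. It assumes the .tsv files sit under `data_root/<lang>/` as in the tree above; `build_corpus_config` is a hypothetical helper, and `clips_dir` must be supplied by the reader because the exact clips path depends on the local download.

```python
from pathlib import Path


def build_corpus_config(clips_dir, data_root, lang):
    """Mirror the corpus section of a cv_config file for one language (hypothetical helper)."""
    lang_dir = Path(data_root) / lang
    return {
        "downstream_expert": {
            "corpus": {
                "name": "common_voice",
                # Folder holding the audio clips for this language.
                "path": str(clips_dir),
                # Each split is a list with a single .tsv path, as in the YAML above.
                "train": [(lang_dir / "train.tsv").as_posix()],
                "dev": [(lang_dir / "dev.tsv").as_posix()],
                "test": [(lang_dir / "test.tsv").as_posix()],
            }
        }
    }
```

The returned dictionary can be compared against the values written into the YAML file to catch a mistyped split path before training starts.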

Training

parser.add_argument('-k', '--upstream_ckpt', metavar='{PATH,URL,GOOGLE_DRIVE_ID}', help='Only set when the specified upstream need it')
parser.add_argument('-g', '--upstream_model_config', help='The config file for constructing the pretrained model')
parser.add_argument('-r', '--upstream_refresh', action='store_true', help='Re-download cached ckpts for on-the-fly upstream variants')
parser.add_argument('-f', '--upstream_trainable', action='store_true', help='Fine-tune, set upstream.train(). Default is upstream.eval()')
parser.add_argument('-s', '--upstream_feature_selection', default='hidden_states', help='Specify the layer to be extracted as the representation')
parser.add_argument('-l', '--upstream_layer_selection', type=int, help='Select a specific layer for the features selected by -s')
parser.add_argument('--upstream_feature_normalize', action='store_true', help='Specify whether to normalize hidden features before weighted sum')
parser.add_argument('--upstream_model_name', default="model.pt", help='The name of the model file in the HuggingFace Hub repo.')
parser.add_argument('--upstream_revision', help="The commit hash of the specified HuggingFace Repository")
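
To see how a few of these options behave, the sketch below copies a subset of the argument definitions above into a standalone parser and parses a sample command line. The sample values (`-f` together with layer `6`) are made up for illustration, and the standalone `build_parser` function is an assumption, not how the repository organizes its parser.

```python
import argparse


def build_parser():
    """Standalone parser with a subset of the upstream options listed above."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-k', '--upstream_ckpt', metavar='{PATH,URL,GOOGLE_DRIVE_ID}',
                        help='Only set when the specified upstream need it')
    parser.add_argument('-f', '--upstream_trainable', action='store_true',
                        help='Fine-tune, set upstream.train(). Default is upstream.eval()')
    parser.add_argument('-s', '--upstream_feature_selection', default='hidden_states',
                        help='Specify the layer to be extracted as the representation')
    parser.add_argument('-l', '--upstream_layer_selection', type=int,
                        help='Select a specific layer for the features selected by -s')
    return parser


# Flags given: fine-tune the upstream and take layer 6; -s keeps its default.
args = build_parser().parse_args(['-f', '-l', '6'])
print(args.upstream_trainable, args.upstream_layer_selection, args.upstream_feature_selection)
# prints: True 6 hidden_states
```

Options that are not passed keep their defaults: `upstream_trainable` is `False`, and `upstream_ckpt` and `upstream_layer_selection` are `None`.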

python3 run_downstream.py -n ExpName -m train -u Upstream -d ctc -c downstream/ctc/cv_config/cv_${lang}.yaml

Replace ${lang} with es, zh, or ar.
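
When training all three languages, the substitution can be scripted. This is a minimal sketch that only builds and prints the commands; the helper `train_command` and the per-language experiment names (`ExpName_es` and so on) are assumptions for illustration.

```python
import shlex

# Language codes accepted by the config file names, as listed above.
LANGS = ("es", "zh", "ar")


def train_command(exp_name, upstream, lang):
    """Build the training command for one language as an argument list (hypothetical helper)."""
    config = f"downstream/ctc/cv_config/cv_{lang}.yaml"
    return ["python3", "run_downstream.py", "-n", exp_name, "-m", "train",
            "-u", upstream, "-d", "ctc", "-c", config]


for lang in LANGS:
    # Print the command instead of running it; replace "Upstream" with a real upstream name.
    print(shlex.join(train_command(f"ExpName_{lang}", "Upstream", lang)))
```

Each argument list can be passed to `subprocess.run` unchanged once the experiment name and upstream are filled in.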

Testing

python3 run_downstream.py -m evaluate -e result/downstream/ExpName/dev-best.ckpt

Spontaneous Speech

Clone vectominist/SBCSAE-preprocess for data preprocessing

git clone https://github.com/vectominist/SBCSAE-preprocess.git

Follow the instructions in vectominist/SBCSAE-preprocess to download and process the data.

Modify the training config in sbcsae.yaml

downstream_expert:
    corpus:
        name: 'sbcsae'
        path: 'path/to/SBCSAE/wav'

        train: ['path/to/sbcsae/train.tsv']
        dev: ['path/to/sbcsae/dev.tsv']
        test: ['path/to/sbcsae/test.tsv']

Training

python3 run_downstream.py -n ExpName -m train -u Upstream -d ctc -c downstream/ctc/sbcsae.yaml

Testing

python3 run_downstream.py -m evaluate -e result/downstream/ExpName/dev-best.ckpt
