This repository contains the code for the paper "Improving Generalization of Norwegian ASR with Limited Linguistic Resources", presented at NoDaLiDa 2023. To cite the paper, please use:
@InProceedings{SolbergEtAlNoDaLiDa2023,
author = {Per Erik Solberg and Pablo Ortiz and Phoebe Parsons and Torbjørn Svendsen and Giampiero Salvi},
title = {Improving Generalization of Norwegian ASR with Limited Linguistic Resources},
booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics},
year = {2023},
month = {May},
address = {Tórshavn, Faroe Islands},
}
- analysis/ contains the analyses in the paper. analysis.ipynb contains the analyses without a language model, and analysis_w_lm.ipynb contains the analyses with a language model.
- make_datasets/ contains the code for making the different datasets used for training and testing. It also contains a notebook for retrieving statistics about these datasets.
- training/ contains the code for training the different models used in the paper. Each script is named after the corresponding model in the paper.
See also the separate repository that contains the code for standardizing the datasets used in this paper.