Transformer based ASR System for German Language

This repository contains code to build a German ASR system based on transformers and the Common Voice 9.0 dataset. We use the pre-trained Wav2Vec Transformer as well as the pre-trained Wav2Vec Conformer, both from Facebook.

Hardware Requirements

Warning: The transformer model has around 100 million trainable parameters, and the conformer model is almost 7 times as large. Plan your hardware accordingly.

  1. At least 100 GB of disk space.
  2. At least 32 GB of GPU memory if you intend to use a GPU (particularly for the conformer).


Recommended Python version: 3.9

  1. Create a new conda environment and activate it.

    • conda create -n ailab python=3.9
    • conda activate ailab
  2. Install all requirements. (Please install CUDA manually if you plan to use a GPU.)

    • pip install -r requirements.txt


The preprocessing does the following:

  1. Downloads the dataset via HuggingFace. You need to create an account and get a token to be able to download this particular dataset.
  2. Removes unnecessary columns from the dataset.
  3. Resamples the audio files from an initial sampling rate of 48 000 Hz to 16 000 Hz.
  4. Removes special characters.
  5. Takes care of padding the sentences.
  6. Saves the dataset in the directories training_set, validation_set, and test_set.
  7. Creates the model's tokenizer and saves it.
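Steps 4 and 7 above can be illustrated with a minimal sketch. This is not the repository's code: the set of characters removed, the lowercasing, and the vocabulary layout ("|" as word delimiter, plus [UNK] and [PAD] entries, as is common for CTC tokenizers) are assumptions, and the function names clean_sentence and build_vocab are hypothetical.

```python
import json
import re
import string

# Assumption: ASCII punctuation plus a few German/typographic quotes count as
# "special characters"; the repository's actual list may differ.
SPECIAL_CHARS = re.compile("[" + re.escape(string.punctuation + "„“”»«‘’") + "]")


def clean_sentence(sentence: str) -> str:
    """Remove special characters, lowercase, and trim a transcript."""
    return SPECIAL_CHARS.sub("", sentence).lower().strip()


def build_vocab(sentences):
    """Map every character found in the transcripts to an integer id."""
    chars = sorted(set("".join(sentences)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Assumption: the space is replaced by a visible word delimiter "|".
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Assumption: extra tokens for unknown characters and padding.
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


if __name__ == "__main__":
    cleaned = [clean_sentence(s) for s in ["Guten Morgen!", "Wie geht's?"]]
    # Print the vocabulary as JSON; a real script would write it to vocab.json.
    print(json.dumps(build_vocab(cleaned), ensure_ascii=False))
```

Serializing the mapping as JSON is what would produce a vocab.json file like the one the testing instructions refer to.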

The script requires the following arguments: Token (string from Hugging Face) and num_workers (int). It can be launched via: python --token Token --workers num_workers


There are 2 training scripts: one for the Transformer model and one for the Conformer model. The scripts take the following arguments:

  • Number of epochs: an integer.
  • Percentage of data to use: an integer in [0, 100].
  • Whether to resume the training or not.

To train the Transformer, for example, one could use the following command:

python --epochs 10 --data 50 --no-resume_training > output_transformer.txt
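For readers who want to see how flags such as --epochs, --data, and --no-resume_training can be parsed, here is a minimal sketch using Python's argparse. It is an illustration under assumptions, not the repository's implementation: the helper names percentage, build_parser, and subset_size are hypothetical, as are the default values and the rounding used for the data percentage.

```python
import argparse


def percentage(value: str) -> int:
    """argparse type: accept only integers in [0, 100]."""
    number = int(value)
    if not 0 <= number <= 100:
        raise argparse.ArgumentTypeError(f"{value} is not in [0, 100]")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train an ASR model (sketch).")
    parser.add_argument("--epochs", type=int, required=True, help="number of epochs")
    # Assumption: use all data when --data is not given.
    parser.add_argument("--data", type=percentage, default=100,
                        help="percentage of data to use, in [0, 100]")
    # BooleanOptionalAction (Python 3.9+) registers both --resume_training
    # and --no-resume_training for the same destination.
    parser.add_argument("--resume_training", action=argparse.BooleanOptionalAction,
                        default=False, help="resume training from a checkpoint")
    return parser


def subset_size(total_examples: int, percent: int) -> int:
    """Number of examples kept when using `percent` of the data (rounded down)."""
    return total_examples * percent // 100


# Parse the same flags as in the example command above.
args = build_parser().parse_args(["--epochs", "10", "--data", "50", "--no-resume_training"])
print(args.epochs, args.data, args.resume_training)
```

Validating the percentage in a custom type function means a value outside [0, 100] is rejected at the command line, before any dataset or model is loaded.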

Model hyperparameters can be changed in the scripts.


To test the trained model, first copy the vocab.json file created during tokenization into the directory where the model is saved.
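The copy can be done by hand or with a few lines of Python. The following is a small sketch, assuming the vocabulary file and the model directory already exist; the helper name copy_vocab is hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def copy_vocab(vocab_file: str, model_dir: str) -> Path:
    """Copy the tokenizer vocabulary into the model directory; return the new path."""
    source = Path(vocab_file)
    target_dir = Path(model_dir)
    if not source.is_file():
        raise FileNotFoundError(f"vocabulary file not found: {source}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"model directory not found: {target_dir}")
    # The copy is always named vocab.json inside the model directory.
    return Path(shutil.copy(source, target_dir / "vocab.json"))
```

With a model saved in a directory called transformer_model, this could be called as copy_vocab("vocab.json", "transformer_model"); the location of the source vocab.json depends on where the preprocessing step saved the tokenizer.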

Then test the model by running the test script. It requires the following arguments:

  • Model Directory: The path to the model you want to test.
  • Model: transformer or conformer.
  • Percentage of data to use for testing: an integer in [0, 100].
  • Whether to print predicted sentences or not.

An example command could be: python --model transformer --model_dir transformer_model --data 50 --print_examples > output_results.txt

Warning: At the time this repository was created, the conformer model was still in development. Therefore, some compatibility problems may arise.

Note: Feel free to contact me if you have any issues or questions.