Prompt-Whisper

This repository aims to improve the accuracy of Whisper on specialized ASR (Automatic Speech Recognition) tasks, such as code-switched speech, through well-crafted prompts.

It can be applied to:

  • Assignment 6 in the Deep Learning for Human Language Processing (DLHLP) course at National Taiwan University, Fall 2023.

  • Assignment 7 in the Deep Learning for Human Language Processing (DLHLP) course at National Taiwan University, Fall 2023. Note that this assignment involves overly long input speech, which must be handled, for example by splitting the audio into 30-second segments.

  • Any task where you believe adding prompts could enhance the performance of Whisper.

Objective

  • Utilize Whisper for Chinese-English code-switched speech recognition.

  • Enhance Whisper's recognition accuracy by supplying additional inputs such as the language ID, task tag, and prompts.

  • For instance, prompts could include hints about common errors made by the model or domain knowledge related to the speech content.

Setup

conda create --prefix conda/whisper python=3.10

pip install openai-whisper datasets transformers librosa soundfile opencc-python-reimplemented jiwer
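
If the setup succeeded, a short sanity check like the one below should run without errors. This is only a sketch; openai/whisper-base is the same model used in the examples later in this README.

```python
# Sanity check: load a small Whisper model and its processor via transformers.
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_name = "openai/whisper-base"
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)
model.to("cuda" if torch.cuda.is_available() else "cpu")
print(model.config.model_type)  # should print "whisper"
```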

Prompt Whisper

The first six videos from chiyuanhsiao/ML2021_HungyiLee_Corpus are used as test data in prompt_whisper.py.

  • --model_name_or_path, -m: This parameter allows you to specify the Whisper model you want to use. For example, you can use models like openai/whisper-large-v3 or openai/whisper-base.
  • --dataset_path, -d: Specify the dataset path (name).
  • --device, -v: Specify the device. For instance, cuda or cpu.
  • --cache_dir, -s: Specify the cache directory in which the downloaded dataset is stored.
  • --batch_size, -b: Specify the batch size.
  • --output_dir: Path for the results file.
  • Generation Options: You have the flexibility to customize the generation process using several options. Refer to the transformers.WhisperForConditionalGeneration.generate function for more details. These options include:
    • --task, -t: Specify the task you want the model to perform, which can be either transcribe or translate.
    • --language, -l: Provide the language tag for the input or output text. For instance, you can use language codes like zh for Chinese or en for English.
    • --prompt, -p: Input your prompt text.
  • --overwrite_forced_decoder_ids, -c: This option allows you to override the forced_decoder_ids passed to the generate() function, giving you finer control over the model's behavior during generation.
python prompt_whisper.py -t transcribe -l zh -m "openai/whisper-base"

python prompt_whisper.py -t transcribe -l zh -p "太強了Whisper"

python prompt_whisper.py -p "真是太厲害了"

python prompt_whisper.py -c "<|en|><|zh|><|transcribe|><|notimestamps|>"

python prompt_whisper.py -t transcribe -l zh -c "<|en|><|zh|><|transcribe|><|notimestamps|>" -p "加油吧, Whisper" 
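
For reference, the sketch below shows roughly how these options map onto the transformers generation API. It is not the actual prompt_whisper.py implementation; the audio file name sample.mp3 and the prompt text are placeholders.

```python
# Rough mapping from the CLI options above to
# transformers.WhisperForConditionalGeneration.generate() (not the real script).
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_name = "openai/whisper-base"                 # --model_name_or_path / -m
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

audio, sr = librosa.load("sample.mp3", sr=16000)   # placeholder utterance
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# --prompt / -p: the prompt text is tokenized and prepended to the decoder input.
prompt_ids = processor.get_prompt_ids("加油吧, Whisper", return_tensors="pt")

# --language / -l and --task / -t are passed straight to generate();
# --overwrite_forced_decoder_ids / -c replaces the equivalent forced decoder ids,
# e.g. processor.get_decoder_prompt_ids(language="zh", task="transcribe").
generated = model.generate(
    inputs.input_features,
    language="zh",
    task="transcribe",
    prompt_ids=prompt_ids,
)
# Note: the decoded string may include the prompt text as a prefix.
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```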

Error Rate

To determine the mixed error rate, we will follow this procedure:

  • Convert simplified Chinese characters to traditional Chinese characters.
  • Insert spaces between Chinese characters and English words.

Example:

[{
    "id": "0_1891_1894.mp3",
    "prediction": "我 們 不 止 訓 練 一 個 classifier 來 解 任 務 一",
    "transcription": "我 們 不 止 訓 練 一 個 classifier 來 解 任 務 一",
    "raw_prediction": "我们不止训练一个classifier来解任务一"
},
{
    "id": "5_1722_1725.mp3",
    "prediction": "這 個 tensor 的 大 小 是 5 乘 以 10 乘 以 3",
    "transcription": "這 個 tensor 的 大 小 是 5 乘 以 10 乘 以 3",
    "raw_prediction": "這個 tensor的大小是5乘以10乘以3"
},
{
    "id": "6_1153_1156.mp3",
    "prediction": "是 要 把 source domain 跟 target domain 分 開",
    "transcription": "是 要 把 source domain 跟 target domain 分 開",
    "raw_prediction": "是要把source domain跟target domain分開"
}]

raw_prediction is the original output sequence from Whisper, before the normalization described above.
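
A minimal sketch of this scoring procedure, using the opencc-python-reimplemented and jiwer packages installed in Setup (the sample strings are taken from the example above):

```python
import re

from jiwer import wer
from opencc import OpenCC

cc = OpenCC("s2t")  # simplified -> traditional Chinese


def normalize(text: str) -> str:
    text = cc.convert(text)
    # Surround every CJK character with spaces so Chinese is scored per
    # character while English words stay whole, then collapse extra spaces.
    text = re.sub(r"([\u4e00-\u9fff])", r" \1 ", text)
    return " ".join(text.split())


raw_prediction = "我们不止训练一个classifier来解任务一"
transcription = "我 們 不 止 訓 練 一 個 classifier 來 解 任 務 一"

prediction = normalize(raw_prediction)
print(prediction)                      # 我 們 不 止 訓 練 一 個 classifier 來 解 任 務 一
print(wer(transcription, prediction))  # error rate over the space-separated tokens
```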

Dataset

chiyuanhsiao/ML2021_HungyiLee_Corpus
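
The corpus can be pulled with the datasets library installed in Setup. A minimal sketch; inspect the available splits and columns before relying on specific names:

```python
from datasets import load_dataset

# Download the corpus into a local cache directory (cf. --cache_dir / -s).
ds = load_dataset("chiyuanhsiao/ML2021_HungyiLee_Corpus", cache_dir="./cache")
print(ds)  # shows the splits and column names
```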
