seanghay/kfa


KFA

A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus.

  • Built-in Speech Enhancement
  • Word-level Alignment

pip install kfa

CLI

Note

The input audio file (audio.wav) must have a sample rate of 16 kHz. Use ffmpeg or any other tool to resample the audio before processing.

ffmpeg -i audio_orig.wav -ac 1 -ar 16000 audio.wav
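
To confirm that a resampled file meets the requirement above before running the aligner, a small check along these lines may help. It is a minimal sketch, not part of kfa: `check_wav` is a hypothetical helper name, and it relies on Python's standard `wave` module, which reads PCM WAV files only.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return (ok, sample_rate, channels) for a PCM WAV file.

    Hypothetical helper, not part of kfa. `ok` is True only when the
    file matches both the expected sample rate and channel count.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    ok = rate == expected_rate and channels == expected_channels
    return ok, rate, channels
```

Calling `check_wav("audio.wav")` on the output of the ffmpeg command above should report 16000 Hz and 1 channel, since `-ar 16000` sets the sample rate and `-ac 1` downmixes to mono.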

kfa -a audio.wav -t text.txt -o alignments.jsonl

# Write the output in Whisper-style JSON format
kfa -a audio.wav -t text.txt --format whisper -o alignments.json
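
The default output above uses a `.jsonl` extension, which suggests the JSON Lines format (one JSON value per line). Assuming that is the case, the file can be loaded with the standard library as sketched below. `read_jsonl` is a hypothetical helper, not part of kfa, and it makes no assumption about the field names inside each record.

```python
import json


def read_jsonl(path):
    """Parse a JSON Lines file into a list, one item per non-empty line.

    Hypothetical helper, not part of kfa. Assumes UTF-8 text with one
    JSON value per line; blank lines are skipped.
    """
    records = []
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `read_jsonl("alignments.jsonl")` would return the alignment records in file order, ready to inspect or convert to another format.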

Python

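from kfa import align, create_session
import librosa

# Read the transcript (Khmer text, so read it as UTF-8).
with open("text.txt", encoding="utf-8") as infile:
    text = infile.read()

# Load the audio as 16 kHz mono, the sample rate kfa expects.
y, sr = librosa.load("audio.wav", sr=16000, mono=True)

# Create a session and pass it to align().
session = create_session()

# Print each alignment as it is produced.
for alignment in align(y, sr, text, session=session):
    print(alignment)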

License

Apache-2.0
