- Budapest, Hungary (UTC +01:00)
- https://gyorgy.orosz.link
- in/oroszgy
Highlights
Speech
Robust Speech Recognition via Large-Scale Weak Supervision
Foundational Models for State-of-the-Art Speech and Text Translation
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Whisper realtime streaming for long speech-to-text transcription and translation
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)
Timething is a library for aligning text transcripts with their audio recordings.
📈 A forced aligner intended for synchronization of narrated text
Text to speech alignment using CTC forced alignment
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voic…