oroszgy

Organizations

@ec-doris @huspacy

Stars

Speech

19 repositories

Robust Speech Recognition via Large-Scale Weak Supervision

Python 95,471 11,821 Updated Dec 15, 2025

Port of OpenAI's Whisper model in C/C++

C++ 47,238 5,258 Updated Mar 5, 2026

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,760 1,166 Updated Mar 3, 2026

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Python 4,052 349 Updated Jan 8, 2025

Whisper realtime streaming for long speech-to-text transcription and translation

Python 3,546 414 Updated Nov 12, 2025

Jupyter Notebook 8,823 633 Updated Oct 25, 2025

A fast, local neural text to speech system

C++ 10,633 913 Updated Aug 26, 2025

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

Python 2,809 268 Updated Jun 22, 2024

Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Python 132 16 Updated Feb 6, 2026

Timething is a library for aligning text transcripts with their audio recordings.

Jupyter Notebook 130 14 Updated Dec 3, 2024

A small speech recognizer

C 4,277 729 Updated Mar 2, 2026

📈 A forced aligner intended for synchronization of narrated text

Python 102 14 Updated Aug 9, 2025

Text to speech alignment using CTC forced alignment

Python 451 78 Updated Feb 23, 2026

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,856 3,355 Updated Mar 5, 2026

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 20,493 2,168 Updated Feb 22, 2026

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Python 2,171 227 Updated Oct 29, 2025

Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voic…

TypeScript 438 42 Updated Sep 1, 2025

State-of-the-art TTS model under 25MB 😻

Python 11,195 627 Updated Feb 24, 2026

Open-Source Frontier Voice AI

Python 23,615 2,611 Updated Feb 28, 2026