@mratanusarkar mratanusarkar released this 26 May 19:11

Release v0.0.7

First stable release + publish to PyPI

What's Changed

see full changelog

Audim - First Stable Release 🎉

see release notes (short)

Audim (v0.0.7) - First Stable Release

Transform audio podcasts into engaging animated videos with precise programmatic control.

What's New

🎙️ Audio to Video Pipeline

  • Automatic Transcription: Convert audio/video files to timestamped subtitles
  • Speaker Diarization: Identify and label multiple speakers
  • Animated Video Generation: Transform subtitles + audio into professional podcast videos

🎬 Key Features

  • Multi-format Support: MP3, M4A, WAV, MP4, MKV, AVI
  • Layout System: Customizable scenes with headers, profiles, and text elements
  • Effects Engine: Smooth transitions and highlight effects
  • Parallel Processing: Optimized rendering for faster video generation
  • Watermark System: Built-in branding and attribution

Installation

pip install audim

Documentation

License

Apache 2.0 - Free for personal and commercial use. Please retain the default watermark or add attribution.

Contributing

We welcome contributions! Please check our development guide to get started.


Note: This is our first stable release. Although it has been extensively tested, please report any issues you encounter. Your feedback helps us improve!

Report Issues | View Examples | Join Discussion

see full release notes (long)

Audim (v0.0.7) - Audio Podcast Animation Engine

We are excited to announce the first stable release of Audim, an animation and video rendering engine designed for creating visually engaging podcast videos from audio-based and voice-based content. This release is the culmination of development work spanning multiple iterations and marks a significant milestone in programmatic podcast video generation.

🎯 Overview

Audim provides precise programmatic animation and video rendering for audio-based content. The engine lets creators convert raw audio recordings into animated podcast videos with layout-based scenes, automated subtitle generation, and customizable visual elements.

✨ Core Features

Audio Processing and Transcription

  • Audio to Subtitle Generation: Complete audio transcription engine with support for multiple audio formats (.mp3, .m4a, .wav)
  • Real-time Processing: Generate subtitles and transcripts from audio/video files with timestamp synchronization
  • Speaker Recognition: Speaker identification, with placeholder labels replaced by actual speaker names
  • Multi-format Support: Compatible with various video formats (.mp4, .mkv, .avi) for audio extraction
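The speaker placeholder replacement mentioned above can be illustrated with a minimal standard-library sketch. It does not use Audim's own API. The "[Speaker N]" placeholder format and the replace_speakers helper are assumptions made for illustration; the labels Audim actually writes may differ.

```python
import re

# Assumed placeholder format: "[Speaker 1] some text".
# This is hypothetical; adjust the pattern to match your subtitle files.
PLACEHOLDER = re.compile(r"\[Speaker (\d+)\]")


def replace_speakers(srt_text: str, names: dict) -> str:
    """Replace "[Speaker N]" placeholders with names from a {number: name} map.

    Speakers that are missing from the map are left unchanged.
    """

    def substitute(match):
        name = names.get(int(match.group(1)))
        return f"[{name}]" if name is not None else match.group(0)

    return PLACEHOLDER.sub(substitute, srt_text)


sample = (
    "1\n"
    "00:00:00,000 --> 00:00:02,500\n"
    "[Speaker 1] Welcome to the show.\n"
    "\n"
    "2\n"
    "00:00:02,500 --> 00:00:04,000\n"
    "[Speaker 2] Thanks for having me.\n"
)

print(replace_speakers(sample, {1: "Host", 2: "Guest"}))
```

Leaving unmapped speakers untouched keeps the file valid when only some names are known.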

Video Generation and Animation

  • Subtitle to Podcast Conversion: Transform subtitle files (.srt) into fully animated podcast videos
  • Precise Programmatic Animations: Engine designed for exact frame-level control and smooth transitions
  • Layout-based Scene Rendering: Professional video generation with customizable scene compositions
  • Parallel Processing: Optimized and parallelized video generation engine for enhanced performance
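Frame-level control over subtitle-driven animation comes down to mapping subtitle timestamps onto frame indices. The sketch below shows one way to do that with the standard library. The function names and the 30 fps default are illustrative assumptions, not part of Audim's API.

```python
import re

# SRT timestamps look like "HH:MM:SS,mmm".
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_ms(stamp: str) -> int:
    """Convert an SRT timestamp into integer milliseconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def frame_range(start: str, end: str, fps: int = 30) -> range:
    """Return the frame indices during which a subtitle cue is visible.

    Integer arithmetic avoids floating-point drift on long recordings.
    """
    first = srt_time_to_ms(start) * fps // 1000
    last = srt_time_to_ms(end) * fps // 1000
    return range(first, last)


print(frame_range("00:00:01,000", "00:00:02,500"))  # range(30, 75)
```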

Visual Elements and Customization

  • Watermark Integration: Built-in watermark system for content attribution and branding
  • Multiple Layout Options: Flexible layout system supporting various podcast video styles
  • Effect System: Comprehensive effects engine including transitions and highlights
  • Element Customization: Configurable header, profile, text, and watermark elements

🏗️ Architecture and Modules

Aud2Sub Module

The audio-to-subtitle conversion system provides robust transcription capabilities with support for multiple transcriber backends. This module handles the initial processing of audio content into time-synchronized subtitle files.

Sub2Pod Module

The subtitle-to-podcast conversion engine is the core animation system, featuring layout management, element positioning, and visual effects. This module includes specialized components for content position offsets and timestamp normalization.
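These notes do not define timestamp normalization. As a rough sketch, assuming it means shifting all cues so that the first one starts at zero, the idea could look like the following. The Cue tuple and normalize_cues function are hypothetical names and do not reflect Audim's actual implementation.

```python
from typing import List, Tuple

# Hypothetical cue representation: (start_ms, end_ms, text).
Cue = Tuple[int, int, str]


def normalize_cues(cues: List[Cue]) -> List[Cue]:
    """Sort cues by start time and shift them so the earliest starts at 0 ms.

    This is only one possible meaning of "normalization", assumed here
    for illustration.
    """
    if not cues:
        return []
    ordered = sorted(cues, key=lambda cue: cue[0])
    offset = ordered[0][0]
    return [(start - offset, end - offset, text) for start, end, text in ordered]


cues = [(1500, 3000, "Thanks for having me."), (500, 1500, "Welcome to the show.")]
print(normalize_cues(cues))
```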

Utils Module

A comprehensive utility suite offering audio playback capabilities, subtitle processing tools, and video-to-audio extraction functionality. These utilities provide essential support functions for the entire pipeline.
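Video-to-audio extraction is commonly done by calling the ffmpeg command-line tool. The sketch below shows that general approach and is not Audim's own utility. The helper names and the mono 16 kHz defaults are assumptions, and ffmpeg must be installed separately for extract_audio to run.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_extract_command(video_path: str, audio_path: str,
                          sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that drops the video stream and writes audio.

    -y   overwrite the output file if it exists
    -vn  disable video
    -ac  number of audio channels (1 = mono)
    -ar  audio sample rate in Hz
    """
    return ["ffmpeg", "-y", "-i", video_path, "-vn",
            "-ac", "1", "-ar", str(sample_rate), audio_path]


def extract_audio(video_path: str, audio_path: Optional[str] = None) -> str:
    """Run ffmpeg and return the path of the extracted audio file."""
    if audio_path is None:
        # Default: same file name with a .wav extension.
        audio_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
    return audio_path


print(build_extract_command("episode.mp4", "episode.wav"))
```

Building the command in a separate function lets it be inspected or logged without invoking ffmpeg.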

Effects and Transitions

The effects system supports smooth transitions, content highlights, and professional-grade animation sequences. The effects subsystem has been completely refactored for a cleaner API design and improved performance.
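To give a sense of what a transition involves, here is a minimal sketch of a linear fade-in expressed as a per-frame opacity value. It is a generic illustration, not Audim's effects API. The fade_alpha function and its parameters are hypothetical.

```python
def fade_alpha(frame: int, start_frame: int, duration_frames: int) -> float:
    """Return the opacity (0.0 to 1.0) of a linear fade-in at a given frame.

    Frames before the fade starts are fully transparent; frames after it
    ends are fully opaque.
    """
    if duration_frames <= 0:
        return 1.0  # a zero-length fade is an instant cut
    progress = (frame - start_frame) / duration_frames
    return max(0.0, min(1.0, progress))


# Opacity at the midpoint of a 30-frame fade that starts at frame 0.
print(fade_alpha(15, 0, 30))  # 0.5
```

A renderer would typically multiply this value into an element's alpha channel when compositing each frame.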

📚 Documentation and Examples

Comprehensive Documentation

  • API Documentation: Complete API reference covering all modules and functions
  • Usage Examples: Extensive collection of usage scripts covering various real-world scenarios
  • Development Blog: Detailed development insights and version progression documentation
  • Installation Guide: Step-by-step setup instructions for both users and developers

Example Scripts

The release includes multiple example scripts demonstrating various use cases and implementation patterns. These examples serve as practical starting points for different podcast video generation scenarios.

🚀 Installation and Setup

PyPI Distribution

Audim is now available on PyPI for easy installation and distribution. The package follows standard Python packaging conventions and supports modern Python environments.

Development Environment

Complete development setup instructions are provided for contributors, including proper development environment configuration and contribution guidelines.

📄 Licensing and Attribution

Apache 2.0 License

Audim is released under the Apache 2.0 license, allowing free use for both personal and commercial projects. The license provides broad permissions while maintaining appropriate attribution requirements.

Attribution Guidelines

  • Default watermark retention in generated videos
  • Optional "Made with Audim" attribution in video descriptions
  • Repository linking in project documentation
  • Comprehensive attribution guidelines available in the NOTICE file

Citation Support

Academic and research citation formats are provided for users incorporating Audim into scholarly work. The project includes formal citation guidelines accessible through GitHub's citation feature.

⚠️ Important Notes

Development Stage Disclaimer

Although this is the first stable release, Audim remains under active development and may have limitations in some usage scenarios. The rendering engine still needs further development and testing across a range of use cases.

API Stability

Users should monitor documentation updates as the API may evolve based on community feedback and usage patterns. We are committed to maintaining backward compatibility while improving functionality.

Community Engagement

We encourage users to try Audim for their podcast video projects, report issues when encountered, and contribute improvements through pull requests. Community feedback is essential for continued development and enhancement.

🔗 Resources

This release represents a significant milestone in programmatic podcast video generation, providing creators with powerful tools for transforming audio content into engaging visual experiences. We look forward to seeing the innovative podcast videos created with Audim and welcome community feedback and contributions.

Full Changelog: https://github.com/mratanusarkar/audim/commits/v0.0.7