Creating a **real-time subtitle generator from a video** involves:

### 🎯 Key Objectives:

1. **Capture audio from video in real-time**
2. **Transcribe the speech to text using ASR (Automatic Speech Recognition)**
3. **Display or save the subtitles in sync with video**

---

## ✅ Recommended Tech Stack:

| Component            | Tool/Library                                                                 |
| -------------------- | ---------------------------------------------------------------------------- |
| Video Input          | OpenCV or FFmpeg                                                             |
| Audio Extraction     | PyAV / MoviePy / ffmpeg-python                                               |
| ASR (Speech-to-Text) | 🔥 [Whisper](https://github.com/openai/whisper), `Vosk`, `SpeechRecognition` |
| Real-Time Sync       | `threading` / `asyncio`                                                      |
| Optional GUI         | Streamlit / Tkinter / Gradio                                                 |

---

## 🔥 Best Choice for Accuracy: **Whisper (OpenAI)**

### Whisper supports:

* Multilingual speech recognition
* Timestamps for every segment, plus optional word-level timestamps via `word_timestamps=True` (good for subtitles)
* Model sizes from `tiny` to `large`, trading speed for accuracy
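
As a rough sketch of how word-level timing can be pulled out: recent `openai-whisper` releases accept `word_timestamps=True` in `transcribe()`, and each segment then carries a `words` list. The helper names `collect_words` and `transcribe_with_words` are illustrative and not part of Whisper itself.

```python
def collect_words(result):
    """Flatten a Whisper result into (start, end, word) tuples.

    Assumes the result came from transcribe(..., word_timestamps=True);
    segments without a "words" key are skipped.
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append((w["start"], w["end"], w["word"].strip()))
    return words


def transcribe_with_words(path, model_name="base"):
    # Imported lazily so collect_words can be used without Whisper installed
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path, word_timestamps=True)
    return collect_words(result)
```

Word-level timing adds extra processing on top of normal transcription, so it is probably only worth enabling when you need karaoke-style highlighting or very tight subtitle cuts.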

---

## 🧪 Simple Implementation (Live Subtitles from Video)

### 🛠 Step 1: Install Required Packages

```bash
pip install opencv-python ffmpeg-python openai-whisper
```

Also install `ffmpeg` (must be on your system PATH):
🔗 [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)

---

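### 🧠 Step 2: Code to Extract Audio and Generate Timestamped Subtitles

> This example transcribes a **pre-recorded video** in a single pass, so it is not truly real-time yet. The same steps can be adapted for **live video** (e.g., webcam or stream) by transcribing the audio in short chunks.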

```python
import whisper
import ffmpeg
import tempfile
import os
import cv2

# Load Whisper model
model = whisper.load_model("base")  # use "tiny" for speed or "medium"/"large" for accuracy

def extract_audio_from_video(video_path, output_audio_path):
    # Extract audio using ffmpeg
    ffmpeg.input(video_path).output(output_audio_path, ac=1, ar='16000').run(overwrite_output=True)

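def generate_subtitles(video_path):
    # Step 1: Reserve a temp file name, then close the handle so ffmpeg can
    # write to it (an open NamedTemporaryFile cannot be reopened on Windows)
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_audio:
        audio_path = tmp_audio.name

    try:
        extract_audio_from_video(video_path, audio_path)

        # Step 2: Transcribe (verbose=False avoids printing every segment twice)
        print("🔊 Transcribing audio with Whisper...")
        result = model.transcribe(audio_path, verbose=False)
    finally:
        os.remove(audio_path)  # clean up even if extraction or transcription fails

    # Step 3: Print results
    for segment in result["segments"]:
        print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text'].strip()}")

    return result  # keep the segments for display or export

# Run
result = generate_subtitles("sample_video.mp4")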
```

---

### 🧼 Output Example:

```
[0.00s - 3.50s] Hello, my name is John.
[3.50s - 6.20s] Welcome to the real-time subtitle demo.
```

You can also save the segments as `.srt` or `.vtt` subtitle files so that regular video players can load them.
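
A minimal sketch of the `.srt` export, assuming each segment is a dict with `start`, `end` and `text` keys (the shape of Whisper's `result["segments"]`). The function names `format_srt_time`, `segments_to_srt` and `write_srt` are made up for this example.

```python
def format_srt_time(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = max(0, int(round(seconds * 1000)))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text: numbered cues separated by a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


def write_srt(segments, path):
    """Write the cues to a UTF-8 .srt file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(segments))
```

Usage would look like `write_srt(result["segments"], "sample_video.srt")`. WebVTT is close to this: the file starts with a `WEBVTT` header line and timestamps use `.` instead of `,` before the milliseconds.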

---

## 🖥️ Display Subtitles on Screen with Video (Real-Time Preview)

You can modify the code to use **OpenCV** to play the video and overlay text. Note that OpenCV shows the frames only (no audio), and playback speed is approximate:

```python
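def display_video_with_subtitles(video_path, transcription):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video: {video_path}")

    # Some files report 0 FPS; fall back to a default to avoid dividing by zero
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    delay_ms = max(1, int(1000 / fps))
    current_segment = 0

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        current_time = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0

        # Skip past segments that have already ended
        while (current_segment < len(transcription)
               and current_time > transcription[current_segment]["end"]):
            current_segment += 1

        if current_segment < len(transcription):
            seg = transcription[current_segment]
            if seg["start"] <= current_time:
                # Draw near the bottom of the frame, whatever its height
                y = frame.shape[0] - 40
                cv2.putText(frame, seg["text"].strip(), (50, y),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        cv2.imshow("Video with Subtitles", frame)
        if cv2.waitKey(delay_ms) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()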
```

Then call:

```python
result = model.transcribe("sample_video.mp4")  # Whisper decodes the audio track via ffmpeg
display_video_with_subtitles("sample_video.mp4", result["segments"])
```
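
If you later add seeking or frame skipping, stepping through segments one by one stops working, because playback time can jump. One possible alternative is a binary search over segment start times. This is a sketch that assumes the segments are sorted by `start` and do not overlap; `build_subtitle_lookup` is a hypothetical helper name.

```python
from bisect import bisect_right


def build_subtitle_lookup(segments):
    """Return a function mapping playback time (seconds) to subtitle text or None."""
    starts = [seg["start"] for seg in segments]

    def lookup(current_time):
        # Index of the last segment that starts at or before current_time
        i = bisect_right(starts, current_time) - 1
        if i >= 0 and current_time <= segments[i]["end"]:
            return segments[i]["text"].strip()
        return None  # before the first segment or in a gap between segments

    return lookup
```

Inside the display loop you could then call `text = lookup(current_time)` and draw it only when it is not `None`.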

---

## 🎙️ Want True **Live Mic or Webcam Feed** Subtitles?

You can also do this:

* Capture **mic audio** in chunks (e.g., 5 sec)
* Run `model.transcribe()` on each chunk
* Display live transcription as subtitles or text stream
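
The chunked flow above can be sketched with a worker thread and two queues. This is only an outline under some assumptions: audio capture (e.g., with `pyaudio`) is left out, `transcribe` is any callable you supply that returns segments with times relative to the chunk, and all chunks are exactly `chunk_seconds` long. The names `transcription_worker` and `run_pipeline` are invented for this example.

```python
import queue
import threading


def transcription_worker(chunks, subtitles, transcribe, chunk_seconds):
    """Consume audio chunks and emit subtitle segments with absolute times.

    A None chunk is the shutdown signal and is forwarded to the consumer.
    """
    index = 0
    while True:
        chunk = chunks.get()
        if chunk is None:
            subtitles.put(None)
            break
        offset = index * chunk_seconds  # where this chunk starts in the stream
        for seg in transcribe(chunk):
            subtitles.put({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"].strip(),
            })
        index += 1


def run_pipeline(audio_chunks, transcribe, chunk_seconds=5.0):
    """Feed chunks to the worker and collect subtitles in arrival order."""
    chunks, subtitles = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=transcription_worker,
        args=(chunks, subtitles, transcribe, chunk_seconds),
        daemon=True,
    )
    worker.start()
    for chunk in audio_chunks:
        chunks.put(chunk)
    chunks.put(None)  # no more audio

    results = []
    while (item := subtitles.get()) is not None:
        results.append(item)  # a real app would display each item here
    worker.join()
    return results
```

With Whisper, `transcribe` could be something like `lambda audio: model.transcribe(audio)["segments"]`, where `audio` is a 16 kHz mono float32 NumPy array. Fixed-size chunks can cut a word in half at the boundary, so overlapping chunks or silence-based splitting may give cleaner results.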

Let me know if you want the **live mic version** or subtitles for **YouTube livestreams**, Zoom, etc.

---

## 🔚 Summary

| Feature              | Tool                                 |
| -------------------- | ------------------------------------ |
| Audio Extraction     | `ffmpeg`                             |
| ASR (speech-to-text) | `whisper` (most accurate) or `Vosk`  |
| Subtitles on video   | `OpenCV`, `pysubs2`, `.srt`          |
| Real-time audio      | `pyaudio`, `threading`               |

---

Would you like the live microphone version next, or export to `.srt` file for video players?
