# YouTube Video Pipeline: Azure TTS, Subtitles, BGM, Video, Thumbnail, Metadata

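This notebook automates the creation of a YouTube-ready video from text: Azure TTS narration, subtitles (ASS and SRT), background music, an auto-generated thumbnail, and an attribution line in the exported metadata. Settings are read from the `.env` and `config.yaml` files, which keeps runs easy to edit and reproduce. The pipeline also calls `ffmpeg` and `ffprobe`, which must be installed and available on the PATH.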

In [31]:
# --- 1. Install Required Packages ---
!pip install azure-cognitiveservices-speech pysubs2 pillow snownlp python-dotenv tqdm pyyaml





## 2. Load User Config and Secrets
Settings and secrets are loaded from `config.yaml` and `.env` files. Edit those files to change project parameters.

In [71]:
import yaml
from dotenv import load_dotenv
import os

load_dotenv()
AZURE_SPEECH_KEY = os.environ.get("AZURE_SPEECH_KEY")
AZURE_SPEECH_REGION = os.environ.get("AZURE_SPEECH_REGION")
if not AZURE_SPEECH_KEY or not AZURE_SPEECH_REGION:
    raise ValueError("Missing AZURE_SPEECH_KEY or AZURE_SPEECH_REGION in .env file!")

with open('config.yaml', encoding='utf-8') as f:
    config = yaml.safe_load(f)

print("Loaded config:\n", config)

Loaded config:
 {'title': 'Demo Project Video', 'author': 'flyregit842', 'bgm_volume': 0.25, 'video_resolution': [1920, 1080], 'voice_name': 'zh-CN-XiaoxiaoNeural', 'subtitle_font': 'NotoSansCJKtc-Regular.otf', 'subtitle_fontsize': 14, 'background': 'background.jpg', 'bgm': 'bgm.mp3', 'text': 'text.txt', 'thumbnail': 'thumbnail.jpg', 'attribution': 'Background by Unsplash, Music by Pixabay, Voice by Azure TTS. Assets used under free license.'}
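
The config printed above omits optional keys such as `voice_gender`, `tts_speed`, and `tts_pitch`, so later cells fall back to per-key defaults through `config.get(...)`. One way to make those fallbacks explicit is to merge the loaded dictionary over a single table of defaults. The sketch below is only an illustration: `DEFAULTS` and `with_defaults` are hypothetical names that do not exist in the notebook, and the values mirror the defaults used in the next cell.

```python
# Hypothetical helper: merge a loaded config over one table of defaults.
DEFAULTS = {
    "title": "My YouTube Video",
    "author": "Anonymous",
    "bgm_volume": 0.3,
    "voice_gender": "male",
    "tts_speed": 1.0,
    "tts_pitch": 0,
    "subtitle_fontsize": 30,
    "video_resolution": [1920, 1080],
}

def with_defaults(loaded, defaults=DEFAULTS):
    """Return a new dict: defaults first, then every non-None loaded value."""
    merged = dict(defaults)
    merged.update({k: v for k, v in (loaded or {}).items() if v is not None})
    return merged

example = with_defaults({"title": "Demo Project Video", "bgm_volume": 0.25})
print(example["title"], example["bgm_volume"], example["tts_speed"])
```

With this approach, `config = with_defaults(yaml.safe_load(f))` would give every later cell a complete dictionary, at the cost of keeping the defaults in one extra place.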


In [73]:
import os
import re

# --- 1. Text content (edit input_text here, or edit text.txt directly) ---
text_file = 'text.txt'
input_text = """神奇的三角小幫手——三角函數

你有沒有想過，為什麼我們不用爬到屋頂上，也能知道那棟樓多高？或是，飛機師怎麼知道自己飛得有多斜、有多高？這可不是魔法，而是「三角函數」在幫忙！

想像你畫了一座「山」，然後從地面畫一條斜線上去，就像你在爬山一樣。這條斜線、地面、和山頂之間，會形成一個三角形。只要知道三角形的幾個邊有多長，就能算出山的高度，或爬坡的角度。這種「用三角形找答案」的數學，就是三角函數的魔力！

那三角函數到底是什麼？你可以把它想成三個愛幫忙的小精靈：

正弦小精靈（sin）：專門告訴你「有多高」。

餘弦小精靈（cos）：會幫你看「有多長」。

正切小精靈（tan）：最聰明，能幫你比較「高和長的比例」。

只要你知道其中一個角，這三個小精靈就能幫你找到其他邊的長度。

比如說，你站在地上，想知道前面那棵樹有多高。你只要量出你離樹的距離，還有你抬頭看的角度，三角函數小精靈就能幫你算出樹的高度！

三角函數聽起來像大人用的數學，其實它就像一把「角度的祕密鑰匙」，讓我們不用爬高、不用量一堆東西，就能知道世界的大小。

所以下次看到山坡、滑梯或斜屋頂時，你可以偷偷想：
「嗯，這裡面一定藏著三角函數的小魔法！」"""
# Comment out this block if you prefer to edit text.txt by hand
with open(text_file, 'w', encoding='utf-8') as f:
    f.write(input_text)

# Read the file and split it into non-empty lines
with open(text_file, 'r', encoding='utf-8') as f:
    sentences = [line.strip() for line in f if line.strip()]

# --- 2. Parameters (read from config.yaml, with optional manual overrides) ---
# config must already have been loaded by the previous cell
project_title = config.get('title', 'My YouTube Video')
project_author = config.get('author', 'Anonymous')
bgm_volume = config.get('bgm_volume', 0.3)
attribution = config.get('attribution', 'Assets used under free license. See description.')

# TTS parameters
voice_gender = config.get('voice_gender', 'male')    # 'male' or 'female'
tts_speed = config.get('tts_speed', 1.0)             # normal speed
tts_pitch = config.get('tts_pitch', 0)               # default pitch

voice_map = {
    'female': 'zh-CN-XiaoxiaoNeural',
    'male': 'zh-CN-YunyangNeural'
}
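# Note: voice_gender takes precedence. The voice_name from config.yaml is only
# used as a fallback when voice_gender is not a key in voice_map.
voice_name = voice_map.get(voice_gender.lower(), config.get('voice_name', 'zh-CN-YunyangNeural'))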

# Subtitle and video parameters
subtitle_font = config.get('subtitle_font', 'NotoSansCJKtc-Regular.otf')
subtitle_fontsize = config.get('subtitle_fontsize', 30)
video_resolution = config.get('video_resolution', [1920, 1080])
if not (isinstance(video_resolution, (tuple, list)) and len(video_resolution) == 2):
    print("Invalid video_resolution, using default (1920, 1080).")
    video_resolution = (1920, 1080)
else:
    video_resolution = tuple(video_resolution)

# File settings
background_file = config.get('background', 'background.jpg')
bgm_file = config.get('bgm', 'bgm.mp3')
thumbnail_file = config.get('thumbnail', 'thumbnail.jpg')
text_file = config.get('text', text_file)  # use the path from config.yaml if it sets one

# --- 3. Validate parameters and fall back to defaults ---
if voice_gender.lower() not in ['male', 'female']:
    print("Invalid voice_gender, using default 'male'.")
    voice_gender = 'male'
    voice_name = voice_map['male']
if not isinstance(tts_speed, (int, float)) or tts_speed <= 0:
    print("Invalid tts_speed, using default 1.0.")
    tts_speed = 1.0
if not isinstance(tts_pitch, (int, float)):
    print("Invalid tts_pitch, using default 0.")
    tts_pitch = 0
if not isinstance(subtitle_fontsize, int) or subtitle_fontsize <= 0:
    print("Invalid subtitle_fontsize, using default 30.")
    subtitle_fontsize = 30

# --- 4. Check that input files exist ---
# thumbnail_file is generated in step 8, so it is not required up front
required_files = [background_file, bgm_file, text_file, subtitle_font]
for fname in required_files:
    if not os.path.exists(fname):
        print(f"Warning: missing file {fname}")

# --- 5. Print all settings for review ---
print("="*30)
print(f"Project title: {project_title}")
print(f"Author: {project_author}")
print(f"Attribution: {attribution}")
print(f"Loaded {len(sentences)} sentences from: {text_file}")
print(f"TTS voice: {voice_name} (gender: {voice_gender})")
print(f"TTS speed: {tts_speed}, pitch: {tts_pitch}")
print(f"Subtitle font: {subtitle_font}, font size: {subtitle_fontsize}")
print(f"Video resolution: {video_resolution}")
print(f"Background file: {background_file}")
print(f"BGM file: {bgm_file}")
print(f"Thumbnail file: {thumbnail_file}")
print("="*30)

# --- 6. Hint: how to update parameters in bulk ---
print("To change many parameters at once, you do not need to edit them one by one in the notebook.\n"
      "Open config.yaml and write in the settings you want to change (such as voice_gender, subtitle_fontsize, or video_resolution),\n"
      "then re-run the config-loading cell above followed by this cell.\n"
      "The new values from config.yaml will then be used.\n")

Project title: Demo Project Video
Author: flyregit842
Attribution: Background by Unsplash, Music by Pixabay, Voice by Azure TTS. Assets used under free license.
Loaded 12 sentences from: text.txt
TTS voice: zh-CN-YunyangNeural (gender: male)
TTS speed: 1.0, pitch: 0
Subtitle font: NotoSansCJKtc-Regular.otf, font size: 14
Video resolution: (1920, 1080)
Background file: background.jpg
BGM file: bgm.mp3
Thumbnail file: thumbnail.jpg
To change many parameters at once, you do not need to edit them one by one in the notebook.
Open config.yaml and write in the settings you want to change (such as voice_gender, subtitle_fontsize, or video_resolution),
then re-run the config-loading cell above followed by this cell.
The new values from config.yaml will then be used.



## 3. Check Input Files

In [74]:
required_files = [background_file, bgm_file, text_file, subtitle_font]
for fname in required_files:
    if not os.path.exists(fname):
        raise FileNotFoundError(f"Missing required file: {fname}")

## 4. Read and Split Text into Sentences

In [75]:
import re
from snownlp import SnowNLP

with open(text_file, encoding="utf-8") as f:
    text = f.read().strip()

# Keep only sentences with at least one CJK character, letter, or digit
def is_pronounceable(s):
    return bool(re.search(r'[\u4e00-\u9fffA-Za-z0-9]', s))

sentences = [s.strip() for s in SnowNLP(text).sentences if s.strip()]
sentences = [s for s in sentences if is_pronounceable(s)]

print(f"Total sentences after filtering: {len(sentences)}")
for idx, s in enumerate(sentences):
    print(f"{idx}: '{s}'")

Total sentences after filtering: 40
0: '神奇的三角小幫手——三角函數'
1: '你有沒有想過'
2: '為什麼我們不用爬到屋頂上'
3: '也能知道那棟樓多高'
4: '或是'
5: '飛機師怎麼知道自己飛得有多斜、有多高'
6: '這可不是魔法'
7: '而是「三角函數」在幫忙'
8: '想像你畫了一座「山」'
9: '然後從地面畫一條斜線上去'
10: '就像你在爬山一樣'
11: '這條斜線、地面、和山頂之間'
12: '會形成一個三角形'
13: '只要知道三角形的幾個邊有多長'
14: '就能算出山的高度'
15: '或爬坡的角度'
16: '這種「用三角形找答案」的數學'
17: '就是三角函數的魔力'
18: '那三角函數到底是什麼'
19: '你可以把它想成三個愛幫忙的小精靈：'
20: '正弦小精靈（sin）：專門告訴你「有多高」'
21: '餘弦小精靈（cos）：會幫你看「有多長」'
22: '正切小精靈（tan）：最聰明'
23: '能幫你比較「高和長的比例」'
24: '只要你知道其中一個角'
25: '這三個小精靈就能幫你找到其他邊的長度'
26: '比如說'
27: '你站在地上'
28: '想知道前面那棵樹有多高'
29: '你只要量出你離樹的距離'
30: '還有你抬頭看的角度'
31: '三角函數小精靈就能幫你算出樹的高度'
32: '三角函數聽起來像大人用的數學'
33: '其實它就像一把「角度的祕密鑰匙」'
34: '讓我們不用爬高、不用量一堆東西'
35: '就能知道世界的大小'
36: '所以下次看到山坡、滑梯或斜屋頂時'
37: '你可以偷偷想：'
38: '「嗯'
39: '這裡面一定藏著三角函數的小魔法'
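
The SnowNLP output above also splits at commas, which yields 40 short segments, some as brief as `'或是'` or `'「嗯'`. If longer subtitle lines are preferred, one alternative is to split only on sentence-final punctuation and line breaks. The sketch below uses only the standard library; `split_sentences` is a hypothetical helper, not part of the notebook, and it makes no attempt to reproduce SnowNLP's behaviour.

```python
import re

# Sentence-final punctuation (full-width and ASCII), optionally followed by
# closing quotes or brackets that belong to the same sentence.
_SENTENCE_END = re.compile(r"([。！？!?]+[」』）)]*)")

def split_sentences(text):
    """Split on sentence-final punctuation and line breaks, keeping commas inside."""
    sentences = []
    for line in text.splitlines():
        parts = _SENTENCE_END.split(line)
        # re.split with a capture group alternates: text, delimiter, text, ...
        for i in range(0, len(parts), 2):
            body = parts[i].strip()
            tail = parts[i + 1] if i + 1 < len(parts) else ""
            if body:
                sentences.append(body + tail)
    return sentences

sample = (
    "你有沒有想過，為什麼我們不用爬到屋頂上，也能知道那棟樓多高？"
    "這可不是魔法，而是「三角函數」在幫忙！\n"
    "比如說，你站在地上。"
)
for s in split_sentences(sample):
    print(s)
```

Fewer, longer segments mean fewer TTS requests, but each subtitle stays on screen longer, so the right granularity depends on how much text fits on one line at the chosen font size.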


## 5. Azure TTS Synthesis (with Progress Bar & Error Logging)

In [76]:
import azure.cognitiveservices.speech as speechsdk
from tqdm.notebook import tqdm
import subprocess
import os

speech_config = speechsdk.SpeechConfig(subscription=AZURE_SPEECH_KEY, region=AZURE_SPEECH_REGION)
speech_config.speech_synthesis_voice_name = voice_name
speech_config.set_speech_synthesis_output_format(
    speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3
)

audio_files, durations, error_log = [], [], []
failed_sentences = []

max_retries = 3

for i, sentence in enumerate(tqdm(sentences, desc="Synthesizing TTS")):
    mp3_fname = f"voice_{i}.mp3"
    success = False
    for attempt in range(max_retries):
        try:
            audio_config = speechsdk.audio.AudioOutputConfig(filename=mp3_fname)
            synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
            result = synthesizer.speak_text_async(sentence).get()
            if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
                if os.path.exists(mp3_fname) and os.path.getsize(mp3_fname) > 0:
                    r = subprocess.run([
                        "ffprobe", "-v", "error", "-show_entries",
                        "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", mp3_fname
                    ], capture_output=True)
                    try:
                        duration_val = float(r.stdout.decode().strip())
                        durations.append(duration_val)
                        audio_files.append(mp3_fname)
                        success = True
                        break
                    except ValueError:
                        error_log.append(f"Could not get duration for {mp3_fname}")
                else:
                    error_log.append(f"File not created or empty: {mp3_fname}")
            else:
                error_log.append(
                    f"Synthesis failed for sentence {i}: {result.reason} Details: {getattr(result, 'error_details', '')}"
                )
        except Exception as e:
            error_log.append(f"Exception for sentence {i} try {attempt+1}: {str(e)}")

    if not success:
        failed_sentences.append((i, sentence))

print("TTS synthesis complete. Valid files:", len(audio_files))
if error_log:
    print("Errors encountered:\n", "\n".join(error_log))
if failed_sentences:
    print("Failed sentences:")
    for idx, s in failed_sentences:
        print(f"Sentence {idx}: '{s}'")

Synthesizing TTS:   0%|          | 0/40 [00:00<?, ?it/s]

TTS synthesis complete. Valid files: 40
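
The loop above calls `speak_text_async`, so the `tts_speed` and `tts_pitch` values loaded earlier are never applied. One way to apply them would be to wrap each sentence in SSML with a `<prosody>` element and call `speak_ssml_async` instead. The sketch below only builds the SSML string. `build_ssml` is a hypothetical helper, and reading `tts_speed` as a rate multiplier and `tts_pitch` as semitones is an assumption that should be checked against the Azure SSML documentation.

```python
from xml.sax.saxutils import escape, quoteattr

def build_ssml(text, voice, speed=1.0, pitch_semitones=0, lang="zh-CN"):
    """Wrap text in SSML with a <prosody> element for rate and pitch."""
    # Assumption: speed is a multiplier, expressed here as a relative percentage.
    rate = f"{(speed - 1.0) * 100:+.0f}%"
    # Assumption: pitch is given in semitones.
    pitch = f"{pitch_semitones:+g}st"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>'
        f'{escape(text)}'
        '</prosody></voice></speak>'
    )

ssml = build_ssml("三角函數 sin & cos", "zh-CN-YunyangNeural", speed=1.2, pitch_semitones=0)
print(ssml)
# In the synthesis loop this would replace speak_text_async(sentence):
# result = synthesizer.speak_ssml_async(ssml).get()
```

Escaping the sentence text matters because characters such as `&` or `<` would otherwise make the SSML document invalid.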


## 6. Concatenate Valid MP3 Files into One

In [77]:
import os
import subprocess

audio_files = []
durations = []
valid_sentences = []

for i, sentence in enumerate(sentences):
    mp3_fname = f"voice_{i}.mp3"
    if os.path.exists(mp3_fname) and os.path.getsize(mp3_fname) > 0:
        r = subprocess.run([
            "ffprobe", "-v", "error", "-show_entries",
            "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", mp3_fname
        ], capture_output=True)
        try:
            duration_val = float(r.stdout.decode().strip())
            # Accept very short sentences (>0.01s) and filter out very long (>30s)
            if duration_val > 0.01 and duration_val < 30:
                durations.append(duration_val)
                audio_files.append(mp3_fname)
                valid_sentences.append(sentence)
                print(f"OK: {i}: '{sentence}' ({duration_val:.2f}s)")
            else:
                print(f"Skipped {i}: '{sentence}' - duration {duration_val:.2f}s")
        except Exception as e:
            print(f"Error reading duration for {mp3_fname}: {e}")
    else:
        print(f"Missing or empty audio: {i}: '{sentence}'")

print(f"\nKept {len(audio_files)} valid audio files and sentences")
if valid_sentences and valid_sentences[-1] == sentences[-1]:
    print("✅ Last sentence included:", valid_sentences[-1])
else:
    print("❌ Last sentence missing! Check synthesis and validation steps.")

# Write list for ffmpeg concat
with open("files.txt", "w", encoding="utf-8") as f:
    for af in audio_files:
        f.write(f"file '{af}'\n")

# Concatenate using ffmpeg
concat_result = subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "files.txt", "-c", "copy", "voice.mp3"],
    capture_output=True
)
if concat_result.returncode != 0:
    print("[FFmpeg Error] Failed to concatenate audio files.")
    print(concat_result.stderr.decode())
else:
    print("Concatenated voice.mp3 created successfully.")

OK: 0: '神奇的三角小幫手——三角函數' (3.31s)
OK: 1: '你有沒有想過' (1.66s)
OK: 2: '為什麼我們不用爬到屋頂上' (2.74s)
OK: 3: '也能知道那棟樓多高' (2.48s)
OK: 4: '或是' (1.15s)
OK: 5: '飛機師怎麼知道自己飛得有多斜、有多高' (4.03s)
OK: 6: '這可不是魔法' (1.73s)
OK: 7: '而是「三角函數」在幫忙' (2.59s)
OK: 8: '想像你畫了一座「山」' (2.05s)
OK: 9: '然後從地面畫一條斜線上去' (2.88s)
OK: 10: '就像你在爬山一樣' (2.12s)
OK: 11: '這條斜線、地面、和山頂之間' (3.42s)
OK: 12: '會形成一個三角形' (2.27s)
OK: 13: '只要知道三角形的幾個邊有多長' (3.38s)
OK: 14: '就能算出山的高度' (2.16s)
OK: 15: '或爬坡的角度' (1.84s)
OK: 16: '這種「用三角形找答案」的數學' (2.99s)
OK: 17: '就是三角函數的魔力' (2.27s)
OK: 18: '那三角函數到底是什麼' (2.63s)
OK: 19: '你可以把它想成三個愛幫忙的小精靈：' (3.38s)
OK: 20: '正弦小精靈（sin）：專門告訴你「有多高」' (3.64s)
OK: 21: '餘弦小精靈（cos）：會幫你看「有多長」' (3.42s)
OK: 22: '正切小精靈（tan）：最聰明' (3.06s)
OK: 23: '能幫你比較「高和長的比例」' (2.59s)
OK: 24: '只要你知道其中一個角' (2.52s)
OK: 25: '這三個小精靈就能幫你找到其他邊的長度' (3.67s)
OK: 26: '比如說' (1.19s)
OK: 27: '你站在地上' (1.58s)
OK: 28: '想知道前面那棵樹有多高' (2.77s)
OK: 29: '你只要量出你離樹的距離' (2.45s)
OK: 30: '還有你抬頭看的角度' (2.20s)
OK: 31: '三角函數小精靈就能幫你算出樹的高度' (3.82s)
OK: 32: '三角函數聽起來像大人用的數學' (3.35s)
OK: 33: '其
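
The `files.txt` list written above works because the `voice_N.mp3` names contain no special characters. If file names ever come from user input, single quotes inside a path need escaping for ffmpeg's concat demuxer. The following is a minimal sketch of that escaping; `write_concat_list` is a hypothetical helper and the file names are made up.

```python
import os
import tempfile

def write_concat_list(paths, list_path):
    """Write an ffmpeg concat-demuxer list, escaping single quotes in paths."""
    with open(list_path, "w", encoding="utf-8") as f:
        for p in paths:
            # Inside a single-quoted string, a literal ' is written as '\''
            escaped = p.replace("'", "'\\''")
            f.write(f"file '{escaped}'\n")

tmp_dir = tempfile.mkdtemp()
list_path = os.path.join(tmp_dir, "files.txt")
write_concat_list(["voice_0.mp3", "it's.mp3"], list_path)
with open(list_path, encoding="utf-8") as f:
    listing = f.read()
print(listing)
```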

## 7. Generate ASS (burned-in) and SRT (YouTube CC) Subtitles

In [78]:
import pysubs2
import re

subs = pysubs2.SSAFile()
subs.info["playresx"], subs.info["playresy"] = video_resolution

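style = subs.styles["Default"]
style.fontsize = 22  # Overridden at burn-in time by force_style in step 12
style.alignment = pysubs2.Alignment.BOTTOM_CENTER
style.marginv = 10
# fontname is the font family name and must match an installed font;
# pysubs2 styles have no field for a font file path such as subtitle_font.
style.fontname = "Noto Sans CJK TC"
style.primarycolor = pysubs2.Color(255, 255, 255)  # opaque white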

def clean_subtitle(text):
    return re.sub(r"[^\w\s\u4e00-\u9fff.,;:?!()（）「」《》'\"-]", "", text)

start = 0
for i, sentence in enumerate(valid_sentences):
    dur = durations[i]
    cleaned = clean_subtitle(sentence)
    subs.events.append(pysubs2.SSAEvent(
        start=int(start * 1000),
        end=int((start + dur) * 1000),
        text=cleaned
    ))
    start += dur
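subs.save("subtitle.ass")
# Also save an SRT copy for YouTube closed captions; the metadata in step 13 references it
subs.save("subtitle.srt")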
print("Saved subtitle.ass")

Saved subtitle.ass
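
The subtitle timing above is a running sum: each cue starts where the previous one ended and lasts as long as its audio clip. The sketch below isolates that arithmetic and adds SRT timestamp formatting. `build_cues` and `srt_timestamp` are hypothetical helpers, and rounding to the nearest millisecond (instead of the truncation done by `int(...)` above) is a small deviation chosen to avoid float artefacts.

```python
def build_cues(durations):
    """Turn per-sentence durations (seconds) into (start_ms, end_ms) pairs."""
    cues, elapsed = [], 0.0
    for dur in durations:
        start_ms = round(elapsed * 1000)
        elapsed += dur
        cues.append((start_ms, round(elapsed * 1000)))
    return cues

def srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

cues = build_cues([3.31, 1.66, 2.74])  # first three durations reported in step 6
for start, end in cues:
    print(srt_timestamp(start), "-->", srt_timestamp(end))
```

Because the cues are derived from per-file durations, any padding that MP3 concatenation introduces can make the subtitles drift slightly from the audio over a long video.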


## 8. Image Processing for Background and Thumbnail

In [79]:
from PIL import Image, ImageDraw, ImageFont

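# Resize background to the target resolution; convert to RGB so it can be saved as JPEG
bg = Image.open(background_file).convert("RGB")
bg_fixed = bg.resize(tuple(video_resolution), resample=Image.LANCZOS)
bg_fixed.save("background_youtube.jpg")
print("Saved background_youtube.jpg")

# Generate thumbnail with title overlay
try:
    font = ImageFont.truetype(subtitle_font, 72)
except Exception:
    font = ImageFont.load_default()
# Alpha is ignored when drawing directly on an RGB image, so draw the
# translucent banner on an RGBA overlay and composite it onto the background.
overlay = Image.new("RGBA", bg_fixed.size, (0, 0, 0, 0))
draw = ImageDraw.Draw(overlay)
draw.rectangle([0, 0, video_resolution[0], 160], fill=(0, 0, 0, 160))
draw.text((40, 40), project_title, font=font, fill=(255, 255, 255, 255))
thumb = Image.alpha_composite(bg_fixed.convert("RGBA"), overlay).convert("RGB")
thumb.save(thumbnail_file)
print(f"Generated thumbnail: {thumbnail_file}")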

Saved background_youtube.jpg
Generated thumbnail: thumbnail.jpg


## 9. Convert Voice/BGM to WAV

In [80]:
def convert_to_wav(infile, outfile):
    r = subprocess.run([
        "ffmpeg", "-y", "-i", infile,
        "-acodec", "pcm_s16le", "-ar", "44100", "-ac", "1",
        outfile
    ], capture_output=True)
    if r.returncode != 0:
        print(r.stderr.decode())
        raise RuntimeError(f"Failed to convert {infile} to WAV.")
    print(f"Converted {infile} to {outfile}")

convert_to_wav("voice.mp3", "voice_fixed.wav")
convert_to_wav(bgm_file, "bgm_fixed.wav")

Converted voice.mp3 to voice_fixed.wav
Converted bgm.mp3 to bgm_fixed.wav


## 10. Mix Voice and BGM with Volume Control

In [81]:
# This cell overrides the bgm_volume loaded from config.yaml with a louder,
# hard-coded value so the music is more present but still sits under the voice.
bgm_loud_volume = 0.6

mix_cmd = [
    "ffmpeg", "-y",
    "-i", "voice_fixed.wav",
    "-i", "bgm_fixed.wav",
    "-filter_complex", f"[1:a]volume={bgm_loud_volume}[a1];[0:a][a1]amix=inputs=2:duration=first[aout]",
    "-map", "[aout]",
    "-acodec", "aac",
    "output_audio.aac"
]
mix_result = subprocess.run(mix_cmd, capture_output=True)
if mix_result.returncode != 0:
    print(mix_result.stderr.decode())
    raise RuntimeError("Failed to mix voice and bgm.")
print("Mixed output_audio.aac with BGM volume set to", bgm_loud_volume)

Mixed output_audio.aac with BGM volume set to 0.6
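
The cell above hard-codes the BGM level, so the `bgm_volume` value from `config.yaml` has no effect. A small helper could build the same filter string from the configured value instead. The sketch below is one possible shape: `build_mix_filter` is a hypothetical name, and treating values outside 0 to 1 as invalid is an assumption rather than an ffmpeg requirement.

```python
def build_mix_filter(bgm_volume, default=0.3):
    """Return the -filter_complex string for voice (input 0) plus BGM (input 1)."""
    is_number = isinstance(bgm_volume, (int, float)) and not isinstance(bgm_volume, bool)
    if not is_number or not 0 <= bgm_volume <= 1:
        bgm_volume = default  # fall back when the value is missing or out of range
    return (
        f"[1:a]volume={bgm_volume}[a1];"
        "[0:a][a1]amix=inputs=2:duration=first[aout]"
    )

mix_filter = build_mix_filter(0.25)
print(mix_filter)
```

The result can be passed as the argument after `-filter_complex` in `mix_cmd`, for example `build_mix_filter(bgm_volume)` with the value loaded in step 2.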


## 11. Get Audio Duration and Build the Main Video

In [82]:
import wave
import subprocess
import os
from datetime import datetime, timedelta

# Final audio track to mux: the voice + BGM mix from step 10
audio_file = "output_audio.aac"
# Measure the duration from the voice WAV instead: amix ran with duration=first,
# so the mix has the same length, and the WAV header gives an exact value.
duration_source = "voice_fixed.wav"

# Step 1: Get audio duration (wave for .wav, ffprobe for any other format)
if duration_source.endswith('.wav'):
    with wave.open(duration_source, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        audio_duration = frames / float(rate)
else:
    probe2 = subprocess.run([
        "ffprobe", "-v", "error", "-show_entries",
        "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", duration_source
    ], capture_output=True)
    audio_duration = float(probe2.stdout.decode().strip())
print(f"Audio duration: {audio_duration:.2f} seconds")

# Step 2: Create main video to match audio length
background_img = "background_youtube.jpg"  # Resized background from step 8
# video_resolution comes from config.yaml (loaded in step 2); it is not overridden here

video_cmd = [
    "ffmpeg", "-y",
    "-loop", "1",
    "-i", background_img,
    "-t", str(audio_duration),
    "-vf", f"scale={video_resolution[0]}:{video_resolution[1]}",
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    "main_video.mp4"
]
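video_result = subprocess.run(video_cmd, capture_output=True)
if video_result.returncode != 0:
    print(video_result.stderr.decode())
    raise RuntimeError("Failed to create main video.")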

# Step 3: Mux video and audio WITHOUT '-shortest'
mux_cmd = [
    "ffmpeg", "-y",
    "-i", "main_video.mp4",
    "-i", audio_file,
    "-c:v", "copy",
    "-c:a", "aac",
    "main_video_with_audio.mp4"
]
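mux_result = subprocess.run(mux_cmd, capture_output=True)
if mux_result.returncode != 0:
    print(mux_result.stderr.decode())
    raise RuntimeError("Failed to mux video and audio.")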
print("Main video with audio created: main_video_with_audio.mp4")

Audio duration: 106.35 seconds
Main video with audio created: main_video_with_audio.mp4
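
For WAV input, the duration reported above is simply the frame count divided by the sample rate. The sketch below checks that relationship on a generated file, so it runs without any of the pipeline's assets. `wav_duration` is a hypothetical helper, and the silent clip is test data in the same 16-bit, mono, 44.1 kHz format that step 9 produces.

```python
import os
import tempfile
import wave

def wav_duration(path):
    """Duration in seconds = frame count / sample rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())

# Write 0.5 s of 16-bit mono silence at 44.1 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 22050)

demo_duration = wav_duration(demo_path)
print(f"{demo_duration:.2f} seconds")
```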


## 12. Final MP4 Video Synthesis (Full HD, Subtitles, Closing Pause and Fade-Out)

In [83]:
import subprocess
import os
from datetime import datetime, timedelta

# --- CONFIG ---
extra_pause = 2
background_img = "background_youtube.jpg"
pause_video = "pause_video.mp4"
main_video_with_audio = "main_video_with_audio.mp4"  # Already has synced audio/subtitle duration
subtitle_file = "subtitle.ass"  # Use your generated ASS file

# --- 1. Create pause video ---
pause_cmd = [
    "ffmpeg", "-y",
    "-loop", "1",
    "-i", background_img,
    "-t", str(extra_pause),
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    pause_video
]
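pause_result = subprocess.run(pause_cmd, capture_output=True)
if pause_result.returncode != 0:
    print(pause_result.stderr.decode())
    raise RuntimeError("Failed to create pause video.")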

# --- 2. Concatenate main video + pause ---
concat_list = "concat_list_final.txt"
with open(concat_list, 'w', encoding='utf-8') as f:
    f.write(f"file '{main_video_with_audio}'\n")
    f.write(f"file '{pause_video}'\n")

output_with_pause = "output_with_pause.mp4"
concat_cmd = [
    "ffmpeg", "-y",
    "-f", "concat",
    "-safe", "0",
    "-i", concat_list,
    "-c", "copy",
    output_with_pause
]
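concat_result = subprocess.run(concat_cmd, capture_output=True)
if concat_result.returncode != 0:
    print(concat_result.stderr.decode())
    raise RuntimeError("Failed to concatenate main video and pause.")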

# --- 3. Calculate total duration ---
probe = subprocess.run([
    "ffprobe", "-v", "error", "-show_entries",
    "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", main_video_with_audio
], capture_output=True)
audio_duration = float(probe.stdout.decode().strip())
total_duration = audio_duration + extra_pause

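# --- 4. Fade out last second ---
# Timestamp for the output filename, in UTC+8. The variable name is kept for
# later cells, but the value is local (UTC+8) time, not UTC.
from datetime import timezone
dt_utc_now = datetime.now(timezone(timedelta(hours=8)))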
outname_no_sub = f"output_{dt_utc_now.strftime('%Y%m%d_%H%M%S')}_nosub.mp4"
fade_cmd = [
    "ffmpeg", "-y",
    "-i", output_with_pause,
    "-vf", f"fade=t=out:st={total_duration-1}:d=1",
    "-af", f"afade=t=out:st={total_duration-1}:d=1",
    "-c:v", "libx264",
    "-c:a", "aac",
    "-pix_fmt", "yuv420p",
    outname_no_sub
]
fade_result = subprocess.run(fade_cmd, capture_output=True)
if fade_result.returncode != 0:
    print(fade_result.stderr.decode())
    raise RuntimeError("Failed to apply fade out to video.")
# Build the final filename from the timestamp and the first sentence.
# Strip characters that are not allowed in filenames and keep at most 20 characters.
import re
first_sentence = valid_sentences[0]
first_sentence_clean = re.sub(r'[\\/:*?"<>|]', '', first_sentence).strip()[:20]

outname_final = f"output_{dt_utc_now.strftime('%Y%m%d_%H%M%S')}_{first_sentence_clean}.mp4"
# --- 5. Burn subtitles into the final video with adjustable font size ---
font_size = 18  # <<< Change this number to easily adjust subtitle font size!
#
burn_cmd = [
    "ffmpeg", "-y",
    "-i", outname_no_sub,
    "-vf", f"subtitles={subtitle_file}:force_style='Fontsize={font_size}'",
    "-c:a", "copy",
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    outname_final
]
burn_result = subprocess.run(burn_cmd, capture_output=True)
if burn_result.returncode != 0:
    print(burn_result.stderr.decode())
    raise RuntimeError("Failed to burn subtitles into video.")

print(f"Final video created: {outname_final} (subtitles font size: {font_size}, no credits!)")

Final video created: output_20251024_010532_神奇的三角小幫手——三角函數.mp4 (subtitles font size: 18, no credits!)
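
The fade-out above starts one second before the end, computed as `total_duration - 1`. For very short clips that start time could become negative, so a helper that clamps the values may be useful. The sketch below shows the arithmetic only; `fade_filters` is a hypothetical name, and the example reuses the 106.35 s duration from step 11 plus the 2 s pause.

```python
def fade_filters(total_duration, fade_seconds=1.0):
    """Return (video_filter, audio_filter) strings for a fade-out at the end."""
    fade_seconds = min(fade_seconds, total_duration)  # never longer than the clip
    start = max(total_duration - fade_seconds, 0.0)   # never before the first frame
    vf = f"fade=t=out:st={start:.3f}:d={fade_seconds:.3f}"
    af = f"afade=t=out:st={start:.3f}:d={fade_seconds:.3f}"
    return vf, af

vf, af = fade_filters(106.35 + 2)
print(vf)
print(af)
```

The two strings correspond to the `-vf` and `-af` arguments in `fade_cmd`.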


## 13. Preview and Export Metadata

In [45]:
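from IPython.display import Video as ShowVideo, Image as ShowImage, display
# Only a cell's last expression is rendered automatically, so display both explicitly
display(ShowImage(thumbnail_file))
display(ShowVideo(outname_final))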

# Export YouTube metadata
with open("youtube_metadata.txt", "w", encoding="utf-8") as meta:
    meta.write(f"Title: {project_title}\nAuthor: {project_author}\nDate: {dt_utc_now.strftime('%Y-%m-%d')}\n")
    meta.write(f"Description: {attribution}\n")
    meta.write(f"Subtitle SRT: subtitle.srt\nThumbnail: {thumbnail_file}\n")
print("Exported metadata for YouTube upload: youtube_metadata.txt")

Exported metadata for YouTube upload: youtube_metadata.txt
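
The plain-text metadata file above is easy to read but awkward to parse. If the metadata is later consumed by an upload script, writing the same fields as JSON is one option. The sketch below assumes a hypothetical `export_metadata_json` helper and a `youtube_metadata.json` file name. The sample values are copied from this notebook's output, and the date is taken from the timestamp in the final video's filename.

```python
import json
import os
import tempfile

def export_metadata_json(path, metadata):
    """Write metadata as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, ensure_ascii=False, indent=2)

metadata = {
    "title": "Demo Project Video",
    "author": "flyregit842",
    "date": "2025-10-24",
    "description": "Background by Unsplash, Music by Pixabay, Voice by Azure TTS. "
                   "Assets used under free license.",
    "subtitle_srt": "subtitle.srt",
    "thumbnail": "thumbnail.jpg",
}
json_path = os.path.join(tempfile.mkdtemp(), "youtube_metadata.json")
export_metadata_json(json_path, metadata)

with open(json_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["title"])
```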
