## Cell 1: Environment Setup and Dependency Installation

**Technologies and Models Overview:**

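*   **Package manager:** `pip` (installs and manages Python libraries).
*   **Core compute library:** `PyTorch` (including `torch`, `torchvision`, `torchaudio`). It is the base deep-learning framework, and the later TTS, image-generation, and video-generation models all depend on it. A recent version is usually specified or expected here to support newer features (such as Zeroscope).
*   **Speech synthesis (TTS):**
    *   `f5-tts`: the TTS system used mainly for high-quality voice cloning (for example, a specific person's voice).
    *   `gTTS`: the Google Text-to-Speech Python library, used as a fallback or for simple English TTS.
    *   `openai-whisper`: primarily a speech-recognition (ASR) model. Installing it may also pull in useful audio-processing dependencies.
*   **Image generation (Diffusers):**
    *   `diffusers`: Hugging Face's library for running diffusion models such as Stable Diffusion for text-to-image generation.
    *   `transformers`: Hugging Face's core library of pretrained models and pipelines, and an important dependency of Diffusers.
    *   `accelerate`: a Hugging Face library that simplifies running PyTorch code on different hardware (CPU, GPU, TPU). Both F5-TTS and Diffusers rely on it.
    *   `compel`, `invisible-watermark`: helper libraries for Diffusers, used for prompt weighting and watermarking respectively.
*   **Large language model (LLM):**
    *   `google-generativeai`: the Python SDK for the Google Gemini API. It is used for:
        *   ancient-character recognition;
        *   text generation (descriptions, stories, jokes);
        *   text refinement;
        *   image and video prompt generation.
*   **Video processing:**
    *   `moviepy`: video editing, such as combining images with audio and concatenating video clips.
    *   `imageio`, `imageio-ffmpeg`: MoviePy dependencies for reading and writing image and video files.
*   **Other helper libraries:** `Pillow` (image processing), `numpy` (numerical computing), `soundfile`/`wavio` (audio file handling), `requests` (HTTP requests), `gradio` (UI, if used), `datasets`, `pandas`, and others.
*   **System tools:** `ffmpeg` (audio/video encoding and decoding).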
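The cell below installs the system `ffmpeg` with `apt-get`. A quick pre-flight check can confirm that the binary is actually on `PATH` before any video work starts.

The following is a minimal sketch using only the standard library. The helper name `ffmpeg_version_line` is hypothetical and is not part of the notebook.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version_line(binary: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<binary> -version`, or None if the binary is unavailable."""
    path = shutil.which(binary)  # search PATH the same way a shell would
    if path is None:
        return None
    try:
        result = subprocess.run([path, "-version"], capture_output=True, text=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return None
    lines = result.stdout.splitlines()
    # Only trust the output if the command succeeded and printed something
    return lines[0] if result.returncode == 0 and lines else None
```

Usage: `print(ffmpeg_version_line() or "ffmpeg not found on PATH")`.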

**Features:**

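The main goal of this cell is to build and configure the complete software environment and dependencies that the whole multi-purpose notebook needs. It performs the following key steps:
1.  **Upgrade `pip`** and set environment variables to streamline installation.
2.  **Uninstall** a long list of possibly related libraries, so that installation starts from a clean state and version conflicts are avoided.
3.  **Install or upgrade the core libraries**, notably `PyTorch`, `f5-tts`, `accelerate`, `transformers`, and `diffusers`. Exact versions or minimum versions are pinned where compatibility and functionality depend on them.
4.  **Install the remaining project-specific helper libraries**, such as the Gemini SDK, gTTS, MoviePy, and Pillow.
5.  **Check the installed versions of key libraries** and print them, so you can confirm they match expectations.
6.  **Check that the F5-TTS command-line tool** is available.
7.  **[Important]** After the first successful run, **restart the Colab runtime and run this cell again**. This ensures that all library versions and dependencies are loaded and resolved correctly.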
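Step 5 only prints versions. It can be turned into an explicit pass/fail check.

The following is a minimal sketch. It assumes that comparing the leading numeric release components is good enough: pre-release tags and local suffixes such as `+cu126` are ignored, and `packaging.version` would be the more rigorous choice. The minimum versions come from the comments in the cell below. `MINIMUMS`, `release_tuple`, `meets_minimum`, and `check_minimums` are hypothetical names.

```python
import re
from importlib import metadata
from typing import Dict, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """'2.7.0+cu126' -> (2, 7, 0); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("+")[0].split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # e.g. '0rc1': keep 0, ignore the rest
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.7' and '2.7.0' compare as equal
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions taken from the comments in Cell 1 (hypothetical dictionary name)
MINIMUMS: Dict[str, str] = {"torch": "2.7.0", "transformers": "4.52.1", "diffusers": "0.33.1"}


def check_minimums(minimums: Dict[str, str]) -> Dict[str, str]:
    """Map each distribution name to 'MISSING' or '<version>: OK / TOO OLD'."""
    report = {}
    for dist, minimum in minimums.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "MISSING"
            continue
        status = "OK" if meets_minimum(installed, minimum) else f"TOO OLD (need >= {minimum})"
        report[dist] = f"{installed}: {status}"
    return report
```

Running `check_minimums(MINIMUMS)` after the restart gives one status line per library.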

In [None]:
# Cell 1: Environment setup (v8.10.8 - emphasizes PyTorch >= 2.7.0 for Zeroscope)
print("--- 0. 初始 Python 版本 ---")
!python --version

print("\n--- 1. 升級 pip ---")
!pip install --upgrade pip
import os
os.environ['PIP_NO_CACHE_DIR'] = 'off'
os.environ['PIP_DISABLE_PIP_VERSION_CHECK'] = '1'

print("\n--- 2. 徹底卸載所有相關包，重新開始 ---")
packages_to_uninstall = [
    "TTS", "CoquiTTS", "f5-tts", "openai-whisper", "autoawq",
    "diffusers", "compel", "invisible-watermark", "xformers",
    "accelerate", "transformers", "huggingface-hub",
    "torch", "torchaudio", "torchvision",
    "datasets", "pandas", "networkx",
    "gradio", "wavio", "gtts", "Pillow",
    "moviepy", "imageio", "imageio-ffmpeg", "decorator",
    "google-generativeai", "google-api-python-client", "google-auth-oauthlib", "google-auth-httplib2",
    "requests", "gdown", "rich",
    "numpy", "scipy", "librosa", "soundfile",
    "thinc", "spacy", "torchtune", "gcsfs", "fsspec",
    "sentence-transformers", "peft", "fastai", "timm"
]
print("    確保 ffmpeg 系統工具存在...")
!apt-get update -qq && apt-get install -y -qq ffmpeg
for pkg in packages_to_uninstall:
    print(f"Uninstalling {pkg}...")
    !pip uninstall -y {pkg}

print("\n--- 3. 【優先】安裝 PyTorch >= 2.7.0 (為 Zeroscope 和安全更新) ---")
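# Uninstall again so that the versions in use are the ones installed by the command below
!pip uninstall -y torch torchvision torchaudio
# Install from the default index, expecting a >=2.7.0 build for Colab (e.g., 2.7.0+cu126).
# The explicit lower bound makes pip fail loudly instead of silently installing an older torch.
print("    正在安裝 PyTorch, torchvision, torchaudio (期望 >=2.7.0)...")
!pip install --no-cache-dir "torch>=2.7.0" torchvision torchaudio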

print("\n--- 4. 【核心組件A】安裝 F5-TTS 並嘗試固定其 accelerate 版本 ---")
print("    4a. 安裝 F5-TTS (它會引入其依賴的 transformers, accelerate 等)...")
!pip install --no-cache-dir -v f5-tts
print("    4b. 嘗試強制 F5-TTS 的 accelerate 版本 (如果 F5-TTS 依賴較舊版本)...")
F5_ACCELERATE_VERSION = "0.30.1"
print(f"    嘗試安裝/固定 accelerate=={F5_ACCELERATE_VERSION} (如果 F5-TTS 有特定需求)...")
# Note: if PyTorch 2.7.0 or the transformers/diffusers installed later conflict with this accelerate version, pip may upgrade it or report an error.
!pip install --no-cache-dir accelerate=={F5_ACCELERATE_VERSION} openai-whisper

print("\n--- 5. 【核心組件B】安裝/升級 Transformers, Diffusers ---")
# We need recent transformers (e.g., 4.52.1) and diffusers (e.g., 0.33.1)
TARGET_TRANSFORMERS_VERSION = "4.52.1"
TARGET_DIFFUSERS_VERSION = "0.33.1"
# pip reconciles the accelerate version with the requirements of transformers and F5-TTS; below we only set an explicit, newer lower bound
MIN_ACCELERATE_FOR_NEW_FEATURES = "0.21.0" # e.g., a version that PEFT and similar libraries may need

print(f"    安裝 transformers>={TARGET_TRANSFORMERS_VERSION}, diffusers>={TARGET_DIFFUSERS_VERSION}, invisible-watermark, compel, accelerate>={MIN_ACCELERATE_FOR_NEW_FEATURES}...")
!pip install --no-cache-dir \
    "transformers>={TARGET_TRANSFORMERS_VERSION}" \
    "diffusers>={TARGET_DIFFUSERS_VERSION}" \
    "accelerate>={MIN_ACCELERATE_FOR_NEW_FEATURES}" \
    invisible-watermark \
    compel \
    huggingface-hub # make sure huggingface-hub is installed as well

print("\n--- 6. 安裝其他項目特定輔助庫 ---")
print("    正在安裝 Pillow (特定版本)...")
!pip install --no-cache-dir Pillow==10.4.0
print("    正在安裝 google-generativeai (用於 Gemini)...")
!pip install --no-cache-dir google-generativeai
print("    正在安裝 gTTS (備用 TTS)...")
!pip install --no-cache-dir gTTS
print("    正在安裝 Wavio (WAV 文件處理)...")
!pip install --no-cache-dir wavio
print("    正在安裝 MoviePy (特定版本 1.0.3) 及其相關依賴...")
!pip install --no-cache-dir moviepy==1.0.3 imageio==2.31.1 imageio-ffmpeg==0.4.7 decorator==4.4.2 tqdm==4.64.0 proglog==0.1.10
# soundfile may already be installed by f5-tts; if it is not, add a separate line: !pip install soundfile

print("    檢查並安裝 Gradio (如果需要)...")
try:
    import gradio
    print("        Gradio 已被其他依賴安裝。")
except ImportError:
    print("        Gradio 未安裝，現在補裝...")
    !pip install --no-cache-dir gradio

print("\n--- 7. 檢查關鍵庫版本 ---")
import subprocess, sys
from importlib import metadata

def get_package_version(package_name):
    """Return the installed version of a pip distribution (e.g. 'openai-whisper', 'Pillow').

    importlib.metadata reads the installed package metadata instead of importing the module,
    so no import-name mapping is needed. Packages whose import name differs from the pip name
    (invisible-watermark -> imwatermark, google-generativeai -> google.generativeai) are found too.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return f"Not found: {package_name}"
    except Exception as e:
        return f"Error for {package_name}: {str(e)}"

print("--- Python ---")
print(f"Python: {sys.version.split()[0]}")

print("\n--- 核心 PyTorch / Hugging Face / TTS 依賴 ---")
core_deps_to_check = [
    "torch", "torchvision", "torchaudio",
    "transformers", "diffusers", "accelerate", "huggingface-hub",
    "Pillow", "openai-whisper", "f5-tts"
]
for dep in core_deps_to_check: print(f"{dep}: {get_package_version(dep)}")

print("\n--- 其他主要輔助庫 ---")
aux_deps_to_check = [
    "gradio", "google-generativeai", "gTTS", "wavio", "soundfile",
    "datasets", "pandas", "scipy", "librosa", "requests", "rich",
    "moviepy", "imageio-ffmpeg", "decorator", "imageio", "tqdm", "proglog",
    "compel", "invisible-watermark"
]
for dep in aux_deps_to_check: print(f"{dep}: {get_package_version(dep)}")

print("\n--- 8. 檢查 F5-TTS CLI ---")
try:
    import shutil
    f5_cli_path = shutil.which("f5-tts_infer-cli")
    if f5_cli_path:
        print(f"Found f5-tts_infer-cli at: {f5_cli_path}")
        process = subprocess.run([f5_cli_path, "--help"], capture_output=True, text=True, timeout=15, check=False)
        if process.returncode == 0: print("F5-TTS CLI --help: OK")
        else: print(f"F5-TTS CLI --help: FAILED (Code: {process.returncode})\n    STDERR: {process.stderr.strip() if process.stderr else '(empty)'}")
    else: print("f5-tts_infer-cli: Not found in PATH.")
except Exception as e_cli: print(f"F5-TTS CLI check error: {e_cli}")

print("\n環境設定完成 (v8.10.8 - Zeroscope env integrated).")
print("="*60)
print("【重要】請仔細檢查上面列出的庫版本。")
print("        特別關注 Torch (期望 >=2.7.0), Transformers (期望 ~4.52.1),")
print("        Diffusers (期望 ~0.33.1), Accelerate (觀察最終版本是否與 F5-TTS 和新庫兼容)。")
print("【【【 運行此 Cell 後，務必重新啟動執行階段，然後再次運行此 Cell 以確保版本正確加載！ 】】】")
print("      (第二次運行時，卸載和安裝步驟會很快)")
print("="*60)

## Cell 2: API Key, Device Check, Drive Mount, and Data Preparation

**Technologies and Models Overview:**

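*   **API interaction:**
    *   `google-generativeai`: configures and calls the Google Gemini API.
    *   `google.colab.userdata`: reads the API key securely from Colab Secrets.
*   **Hardware acceleration:** PyTorch (`torch.cuda`) detects whether a GPU is available and selects GPU or CPU as the compute device.
*   **Cloud storage integration:** `google.colab.drive` mounts Google Drive, so that files stored there (such as reference audio) are accessible.
*   **Dataset handling (if used):** `datasets` (a Hugging Face library) loads and processes datasets such as `tuenguyen/trump-speech-dataset-tts`. In this project the dataset is probably used mainly for demonstration or early reference; the core F5-TTS reference audio comes from Drive.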
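Outside Colab, `google.colab.userdata` cannot be imported. Inside Colab, `userdata.get` raises an exception rather than returning `None` when a secret cannot be read (for example, when it does not exist).

The following is a minimal sketch of a more forgiving lookup that falls back to an environment variable. The helper name `get_api_key` and the environment-variable fallback are assumptions, not part of the notebook.

```python
import os
from typing import Optional


def get_api_key(name: str = "GOOGLE_API_KEY") -> Optional[str]:
    """Try Colab Secrets first, then the environment; return None if neither has the key."""
    try:
        from google.colab import userdata  # only importable inside Colab
        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # ImportError outside Colab, or a Colab error such as a missing/unauthorized secret
        pass
    return os.environ.get(name) or None
```

With this helper, the same notebook code can run locally after `export GOOGLE_API_KEY=...`.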

**Features:**

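This cell performs the key initialization steps before the project runs. It makes sure the core services are available and the compute environment is configured:
1.  **API key configuration:** Reads `GOOGLE_API_KEY` from Colab Secrets and configures the Gemini API. This is a prerequisite for using Gemini for ancient-character recognition and content generation.
2.  **GPU check and device definition:** Detects whether the Colab runtime has a GPU. If it does, the global compute device `device` is set to `"cuda"`; otherwise it is set to `"cpu"`. This determines how fast later models (such as Diffusers and F5-TTS) run, and which of them can run at all.
3.  **Model cache initialization:** Initializes global variables that cache loaded models, so that models are not reloaded repeatedly inside loops. Examples are `loaded_pipeline` for Diffusers and `loaded_zeroscope_pipe` for Zeroscope.
4.  **Mount Google Drive:** Mounts the user's Google Drive at `/content/drive` in the Colab environment. This is essential for reading the F5-TTS reference audio files stored in Drive (for example, a specific person's Chinese or English voice).
5.  **(Optional) Load a Hugging Face dataset:** Loads the specified Hugging Face dataset. The project includes code to load `tuenguyen/trump-speech-dataset-tts`, but voice cloning relies mainly on the reference audio the user provides in Drive.
6.  **Path variables (implicit):** The `VOICE_OPTIONS` dictionary holds the paths to the reference audio in Drive. It is typically defined in this cell or in the initialization of a later cell.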
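The exact structure of `VOICE_OPTIONS` is defined in a later cell and is not shown here. Assuming it maps a voice label to a reference-audio path on the mounted Drive, a small check can flag missing files before any TTS call.

In the sketch below, `missing_reference_audio` and the label `trump_zh` are hypothetical. The example path is the one given in the Cell 2 comments.

```python
from pathlib import Path
from typing import Dict, List


def missing_reference_audio(voice_options: Dict[str, str]) -> List[str]:
    """Return the labels whose reference-audio path does not point to an existing file."""
    return [label for label, path in voice_options.items() if not Path(path).is_file()]


# Hypothetical layout: label -> path on the mounted Drive
VOICE_OPTIONS_EXAMPLE = {
    "trump_zh": "/content/drive/MyDrive/MyTTSAudio/trump_speaks_chinese_ref_24khz.wav",
}
```

Calling `missing_reference_audio(VOICE_OPTIONS_EXAMPLE)` right after the Drive mount shows whether the upload step was skipped.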

In [None]:
# Cell 2: API key setup, GPU check, Drive mount, HF dataset loading (v9.2.2)

# --- Import the modules this cell needs ---
import traceback  # stdlib; used by the dataset-loading error handler below
print("--- 正在導入 Cell 2 所需模塊 ---")
try:
    from google.colab import userdata, drive
    import torch
    import google.generativeai as genai
    import os
    from datasets import load_dataset
    print("模塊導入成功。")
except ImportError as e:
    print(f"導入模塊失敗，請確保 Cell 1 已成功執行並重啟: {e}")

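# --- Read the API key from Colab Secrets ---
print("\n--- 正在讀取 Google API Key ---")
try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
except Exception as e_secret:
    # userdata.get raises instead of returning None when the secret is missing or access is denied
    print(f"Could not read GOOGLE_API_KEY from Colab Secrets: {e_secret}")
    GOOGLE_API_KEY = None

# --- Configure the Google Gemini API ---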
if GOOGLE_API_KEY:
    try:
        genai.configure(api_key=GOOGLE_API_KEY)
        print("Google Gemini API 金鑰已設定。")
    except Exception as e: print(f"配置 Google Gemini API 時出錯: {e}")
else:
    print("警告：未設置 GOOGLE_API_KEY。 Gemini 功能將無法使用。")

print("-" * 30)

# --- Check for a GPU and define the global device ---
print("--- 正在檢查 GPU 並設置 device ---")
if 'torch' not in globals(): import torch
if torch.cuda.is_available():
    print(f"GPU 可用！設備名稱: {torch.cuda.get_device_name(0)}")
    device = "cuda"
else:
    print("警告：未檢測到 GPU。")
    device = "cpu"
print(f"將使用的計算設備: {device}")

print("-" * 30)
if not GOOGLE_API_KEY: print("\n【錯誤】缺少 Google API 金鑰，核心功能無法執行！")
else: print("\nAPI 金鑰與設備檢查完畢。")

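# --- Initialize model-cache variables ---
print("\n--- 初始化模型緩存變數 ---")
loaded_pipeline = None; loaded_pipeline_name = None; loaded_coqui_synthesizer = None
loaded_zeroscope_pipe = None  # Zeroscope video pipeline cache (Cell 3 also initializes it)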
print("-" * 30)

# --- Mount Google Drive ---
print("\n--- 正在嘗試掛載 Google Drive ---")
try:
    drive.mount('/content/drive', force_remount=True) # force_remount re-mounts (and re-authenticates) every time
    print("Google Drive 掛載成功於 /content/drive")
    # [Important] Make sure your reference audio files are uploaded to Google Drive,
    # e.g. '/content/drive/MyDrive/MyTTSAudio/trump_speaks_chinese_ref_24khz.wav'
    # Use these Drive paths in VOICE_OPTIONS in Cell 5.
except Exception as e_drive:
    print(f"【錯誤】掛載 Google Drive 失敗: {e_drive}")
    print("      如果 F5-TTS 參考音頻存儲在 Drive 中，將無法訪問。")
print("-" * 30)


# --- Load the Hugging Face dataset (unchanged; may be used for other purposes) ---
print("\n--- 正在加載 Hugging Face 數據集: tuenguyen/trump-speech-dataset-tts ---")
hf_dataset = None
try:
    hf_dataset = load_dataset("tuenguyen/trump-speech-dataset-tts", split="train", streaming=False)
    print("Hugging Face 數據集加載命令已執行。")
    if hf_dataset:
        print("Hugging Face 數據集對象已創建。")
        dataset_iterator = iter(hf_dataset)
        try:
            example_sample = next(dataset_iterator)
            print("\n數據集樣本結構預覽 (第一個樣本):")
            for key, value in example_sample.items():
                if key == "audio" or (isinstance(value, dict) and 'path' in value and 'array' in value): # also handles the older dict layout with 'path'
                    audio_info = value if isinstance(value, dict) else {}
                    path_val = audio_info.get('path', 'N/A')
                    array_val = audio_info.get('array', None)
                    array_shape_val = array_val.shape if hasattr(array_val, 'shape') else 'N/A'
                    sampling_rate_val = audio_info.get('sampling_rate', 'N/A')
                    print(f"  {key}: {{'path': '{path_val}', 'array_shape': {array_shape_val}, 'sampling_rate': {sampling_rate_val}}}")
                else:
                    str_value = str(value)
                    print(f"  {key}: {str_value[:100]}{'...' if len(str_value) > 100 else ''}")
        except StopIteration: print("【警告】HF數據集成功加載但無法獲取樣本（可能是空的）。")
        except Exception as e_sample: print(f"【警告】獲取HF數據集樣本時出錯: {e_sample}")
except Exception as e_hf_load:
    print(f"【錯誤】加載 Hugging Face 數據集失敗: {e_hf_load}"); traceback.print_exc(); hf_dataset = None
if hf_dataset: print("\n【成功】Hugging Face 數據集處理流程完成。")
else: print("\n【【警告】】未能成功加載 HF 數據集。")

print("-" * 30)
print("Cell 2 執行完畢 (v9.2.2 - 已加入 Drive 掛載)。")
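
The GPU check in Cell 2 can be factored into a helper that also works when PyTorch is not importable.

The following is a minimal sketch. The name `pick_device` is hypothetical. Falling back to `"cpu"` on an import failure is a choice made here; the cell itself assumes `torch` imports.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: nothing can run on the GPU anyway
    if torch.cuda.is_available():
        print(f"GPU: {torch.cuda.get_device_name(0)}")
        return "cuda"
    return "cpu"


device = pick_device()
```
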

## Cell 3: Core Function Definitions

**Technologies and Models Overview:**

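This cell defines the implementation functions for all of the project's core features. It makes extensive use of the libraries installed and the APIs configured in earlier cells. Key technologies include:
*   **Google Gemini API calls:**
    *   Ancient-character recognition (`recognize_ancient_char_with_gemini`).
    *   Text evaluation and correction (`evaluate_and_correct_text_with_gemini`).
    *   Persona-driven single-speaker story/description/joke generation (`generate_story_with_persona`, `generate_character_joke_with_persona`).
    *   Persona-driven multi-character dialogue story/joke script generation (`generate_dialogue_story_with_personas`, `generate_dialogue_joke_with_personas`).
    *   Prompt generation for static images (`generate_final_image_prompt`).
    *   Multiple storyboard prompts for Zeroscope video generation (`generate_multiple_video_prompts_for_story_gemini`).
    *   Multiple image prompts for multi-image static videos (`generate_multiple_image_prompts_for_story`, ported from version 9).
*   **Text processing and parsing:**
    *   Parsing recognition results (`parse_recognition_result`).
    *   Parsing multi-character dialogue scripts (`parse_dialogue_script`).
    *   Regular expressions (`re`) for text matching and cleanup.
*   **Speech synthesis (TTS):**
    *   F5-TTS voice cloning (`clone_voice_f5tts`), implemented by calling the `f5-tts_infer-cli` command-line tool.
    *   gTTS (`generate_and_play_speech_gtts`) as a fallback or general-purpose TTS.
*   **Image generation (Diffusers):**
    *   Static image generation with Stable Diffusion v1.5 or other models such as SDXL (`generate_image_with_diffusers`). This includes model loading, caching, and attempts at memory optimizations (such as xformers and attention slicing).
*   **Video generation (Zeroscope via Diffusers):**
    *   Loading the Zeroscope model (`load_zeroscope_pipeline_once`).
    *   Generating short video frame sequences from text prompts (`generate_video_with_zeroscope`).
*   **Video/audio editing (MoviePy):**
    *   Combining still images (a single image or a sequence) with audio into an MP4 video (`create_video_from_images_and_audio`).
    *   Concatenating multiple audio clips (`concatenate_audio_clips_moviepy`).
    *   Concatenating multiple video clips and attaching a full soundtrack (`combine_video_clips_and_set_audio_moviepy`).
*   **Image processing:** `Pillow (PIL)` for opening images, converting formats, resizing, and similar tasks.
*   **File and system operations:** `os`, `shutil`, `pathlib`, `subprocess`.
*   **Colab-specific features:** `google.colab.files` for image upload.
*   **Output display:** `IPython.display` for showing Markdown, HTML, Audio, Image, and so on in Colab.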
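`evaluate_and_correct_text_with_gemini` asks Gemini for JSON, but it has to cope with replies wrapped in a Markdown code fence.

The following is a minimal, stdlib-only sketch of that normalization step. `parse_model_json` is a hypothetical helper. It only handles a single JSON object, either bare or fenced (with or without a `json` tag).

```python
import json
import re
from typing import Any, Dict, Optional

# Matches an optional fence (three backticks, optionally tagged json) around the whole reply
_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def parse_model_json(reply: str) -> Optional[Dict[str, Any]]:
    """Strip an optional Markdown code fence and parse the remainder as a JSON object."""
    text = reply.strip()
    m = _FENCE.match(text)
    if m:
        text = m.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Only a JSON object is useful to the caller (it reads "corrected_text", "score", ...)
    return data if isinstance(data, dict) else None
```

Returning `None` leaves the decision to the caller, which in the notebook falls back to regex-based field extraction.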

**Features:**

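This cell is the project's "brain": it defines almost every Python function needed to carry out the tasks. These functions are called by the main processing loops in Cells 5, 6, and 7. The main features are:
1.  **Image processing and upload:** Upload a local image file.
2.  **Ancient-character recognition and parsing:** Call the Gemini API to recognize the ancient character in an image, then parse the returned text into structured data.
3.  **LLM content creation:**
    *   Based on the character information and a specified persona, use Gemini to generate one of the following (Chinese and English are supported):
        *   a description;
        *   a single-speaker story or joke;
        *   a multi-character dialogue story or joke script.
    *   Evaluate the generated text and refine its grammar, with emphasis on keeping the target language.
4.  **Prompts for multimedia generation:**
    *   Generate English prompts suitable for Stable Diffusion for static image content (descriptions, jokes, and so on).
    *   Generate storyboard descriptions and multiple English prompts suitable for Zeroscope for dynamic video content (stories, dialogue scripts).
    *   Generate multiple image prompts for multi-image static videos.
5.  **Speech synthesis (TTS):**
    *   Voice cloning via the F5-TTS command-line tool, supporting Chinese and English (reference audio in the corresponding language is required).
    *   Chinese or English speech via gTTS as a fallback.
6.  **Static image generation:** Generate images from text prompts with Hugging Face Diffusers and Stable Diffusion models, with on-demand model loading and caching.
7.  **Video generation:** Generate short dynamic video clips (frame sequences) from text prompts with the Zeroscope model, through a Diffusers pipeline.
8.  **Audio/video composition:**
    *   Combine one or more still images with audio into an MP4 video with MoviePy.
    *   Concatenate separate audio clips (for example, each line of a multi-character dialogue) into one soundtrack with MoviePy.
    *   Concatenate several Zeroscope clips with MoviePy and attach the full narration soundtrack.
9.  **Helpers and checks:** Helper logic such as making sure the Gemini SDK is available and checking which libraries are usable.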
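The notebook concatenates per-line dialogue audio with MoviePy. As an illustration of the same idea with fewer moving parts, the sketch below joins WAV files with the standard-library `wave` module.

This is a swapped-in technique, not the notebook's `concatenate_audio_clips_moviepy`. The name `concatenate_wavs` is hypothetical. The sketch assumes every clip shares the same channel count, sample width, and sample rate; it raises an error otherwise rather than resampling.

```python
import wave
from typing import Sequence


def concatenate_wavs(inputs: Sequence[str], output: str) -> float:
    """Append PCM WAV files end to end; return the output duration in seconds."""
    if not inputs:
        raise ValueError("no input files")
    params = None
    frames = []
    for path in inputs:
        with wave.open(str(path), "rb") as clip:
            # Compare (channels, sample width, frame rate); frame counts may differ
            fmt = clip.getparams()[:3]
            if params is None:
                params = clip.getparams()
            elif fmt != params[:3]:
                raise ValueError(f"{path}: format {fmt} differs from {params[:3]}")
            frames.append(clip.readframes(clip.getnframes()))
    with wave.open(str(output), "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        for chunk in frames:
            out.writeframes(chunk)
    total_frames = sum(len(c) for c in frames) // (params.nchannels * params.sampwidth)
    return total_frames / params.framerate
```

The returned duration is handy for deciding how long each still image should stay on screen in a multi-image video.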

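This cell produces no user-visible final output by itself. Instead, it provides the utility functions that the later interactive cells use, and running it successfully is the foundation for the rest of the project.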
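Several functions in this cell keep loaded pipelines in module-level globals (`loaded_pipeline`, `loaded_pipeline_name`, `loaded_zeroscope_pipe`), so that repeated calls do not reload a model.

The following is a minimal sketch of that pattern as a small class. `PipelineCache` and its `loader` callable are hypothetical. In the notebook, the loader would be something like a Diffusers `DiffusionPipeline.from_pretrained` call.

```python
from typing import Any, Callable, Optional


class PipelineCache:
    """Keep one loaded model; reload only when a different model name is requested."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self.name: Optional[str] = None
        self.pipeline: Any = None

    def get(self, name: str) -> Any:
        if self.pipeline is None or self.name != name:
            # Drop our reference first so the old model can be freed if nothing else holds it
            self.pipeline = None
            self.pipeline = self._loader(name)
            self.name = name
        return self.pipeline
```

Compared with loose globals, the class keeps the model name and the pipeline object in sync.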

In [None]:
# Cell 3: Function definitions (v9.3.25 - tighter language and format control for dialogue-joke generation)

import sys, os, io, json, requests, textwrap, traceback, shutil, re
from pathlib import Path; import shlex, subprocess
import PIL.Image
from PIL import ImageDraw, ImageFont
from IPython.display import display, Markdown, HTML, Audio
from gtts import gTTS
import gradio as gr; import wavio
import numpy as np
from diffusers.utils import export_to_video # Assuming this is for Zeroscope parts
import math

print("--- Cell 3 開始執行 (v9.3.25 - 強化對話笑話生成時的語言和格式控制) ---")
DIFFUSERS_AVAILABLE = False; GEMINI_AVAILABLE = False; MOVIEPY_AVAILABLE = False

loaded_zeroscope_pipe = None
loaded_pipeline = None # For static image generation (Stable Diffusion)
loaded_pipeline_name = None

try:
    from google.colab import files
    from transformers import (AutoProcessor, AutoTokenizer, TextGenerationPipeline,
                            AutoModelForSpeechSeq2Seq, AutomaticSpeechRecognitionPipeline, pipeline)
    print("S2T 和 LLM 相關的 transformers 組件導入成功。")
    try:
        import diffusers
        from diffusers import DiffusionPipeline, StableDiffusionPipeline, StableDiffusionXLPipeline
        print(f"Diffusers 模塊頂層導入成功！版本: {diffusers.__version__}")
        DIFFUSERS_AVAILABLE = True
    except ImportError as e_diffusers: print(f"【警告】導入 Diffusers 失敗: {e_diffusers}"); DIFFUSERS_AVAILABLE = False
    try:
        import google.generativeai as genai_check
        print(f"Google Generative AI 模塊頂層導入成功！版本: {genai_check.__version__ if hasattr(genai_check, '__version__') else '未知'}")
        GEMINI_AVAILABLE = True
        if 'genai' not in globals() and genai_check:
            import google.generativeai as genai
    except ImportError as e_gemini: print(f"【警告】頂層導入 google.generativeai 失敗: {e_gemini}")
    try:
        import moviepy.editor
        print(f"MoviePy 頂層導入檢查成功！(版本通常在 Cell 1 設定)")
        MOVIEPY_AVAILABLE = True
    except ImportError as e_moviepy:
        print(f"【警告】頂層導入 moviepy 失敗: {e_moviepy}。視頻生成功能可能受影響。")
        MOVIEPY_AVAILABLE = False
    print("模塊導入（或嘗試導入）完成。")
except ImportError as e: print(f"導入 Cell 3 核心模塊時發生錯誤: {e}")
except NameError as e_name: print(f"全局變數初始化可能不完整: {e_name}")


def _ensure_gemini_available_in_func():
    if 'genai' not in globals() or not callable(getattr(globals().get('genai'), 'GenerativeModel', None)):
        try:
            print("    (嘗試在函數內動態導入 google.generativeai...)")
            import google.generativeai as genai_local
            globals()['genai'] = genai_local
            print("    (Gemini SDK 已在函數內動態導入並設置到全局)")
            return genai_local
        except ImportError:
            print("    【錯誤】無法在函數內動態導入 google.generativeai。")
            return None
    return globals()['genai']
# === Image upload ===
def upload_image():
    """Prompt for a file upload in Colab; return (filename, bytes) or (None, None)."""
    print("請選擇一個古文字圖片文件上傳：")
    uploaded = files.upload()
    if not uploaded:
        print("未選擇任何文件！")
        return None, None
    filename = next(iter(uploaded))
    print(f"已上傳文件： {filename}\n")
    return filename, uploaded[filename]

# === Ancient-character recognition (Gemini) ===
def recognize_ancient_char_with_gemini(pil_image, filename="input_image"):
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk: return "錯誤：Gemini SDK 不可用 (recognize)。", None
    global GOOGLE_API_KEY
    if GOOGLE_API_KEY:
        try: genai_sdk.configure(api_key=GOOGLE_API_KEY)
        except Exception: pass
    else: return "錯誤：缺少 Gemini API Key (recognize)。", None
    if not pil_image: return "錯誤：沒有圖像數據 (recognize)。", None
    print(f"--- 預處理圖像 (recognize): {filename} ---"); processed_image = pil_image
    try:
        current_format = getattr(processed_image, 'format', 'N/A'); current_mode = getattr(processed_image, 'mode', 'N/A'); print(f"傳入圖像格式: {current_format}, 模式: {current_mode}")
        if not isinstance(processed_image, PIL.Image.Image):
            if isinstance(processed_image, bytes): processed_image = PIL.Image.open(io.BytesIO(processed_image))
            else: return "錯誤：圖像數據類型無法處理。", None
            print(f"重新打開後格式: {getattr(processed_image, 'format', 'N/A')}, 模式: {getattr(processed_image, 'mode', 'N/A')}")
        if getattr(processed_image, 'format', None) == "GIF" or not hasattr(processed_image, '_im'):
            print("檢測到 GIF 或潛在問題對象，嘗試處理...");
            try: processed_image.seek(0); frame_copy = processed_image.copy(); processed_image = frame_copy.convert("RGB"); print(f"內部 GIF 處理後模式: {processed_image.mode}")
            except Exception as e_gif_internal: print(f"內部處理 GIF 失敗: {e_gif_internal}"); pass
        if processed_image.mode != "RGB": print(f"圖像模式為 {processed_image.mode}，轉為 RGB..."); processed_image = processed_image.convert("RGB"); print(f"轉換後模式: {processed_image.mode}")
        print("圖像“淨化”..."); img_byte_arr = io.BytesIO(); processed_image.save(img_byte_arr, format='PNG'); img_byte_arr.seek(0)
        purified_image = PIL.Image.open(img_byte_arr); purified_image.load(); print(f"淨化後模式: {purified_image.mode}, 格式: {getattr(purified_image, 'format', '從BytesIO加載通常為None')}")
        image_for_gemini = purified_image
    except Exception as e_preprocess: print(f"圖像預處理/淨化出錯: {e_preprocess}"); traceback.print_exc(); return f"圖像預處理/淨化出錯: {e_preprocess}", None
    if not isinstance(image_for_gemini, PIL.Image.Image): return "錯誤：最終圖像對象無效。", None
    try: model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
    except Exception as e_model: return f"創建 Gemini 模型出錯: {e_model}", None
    prompt_gemini_rec = "你是一位精通多種古文字（包括甲骨文、金文、篆書等）的專家。\n請仔細分析這張圖片中的主要古文字。如果有多個文字，請專注於最清晰、最完整或最中心的那一個。\n請提供以下資訊，並確保格式清晰：1. **辨識結果** (或 **古文字**) 2. **文字類型** 3. **拼音** (ma3格式) 4. **基本含義/來源** (或 **釋義**) 5. **置信度** (高/中/低)。\n請直接回答，不要包含額外的對話。"
    print(f"使用 {model.model_name} (圖像模式: {getattr(image_for_gemini, 'mode', 'N/A')}) 進行辨識...")
    try:
        response = model.generate_content([prompt_gemini_rec, image_for_gemini])
        if response.parts:
            return None, response.text.strip()
        else:
            feedback_msg = "Gemini API (辨識) 返回空回應。"
            if hasattr(response, 'prompt_feedback'):
                feedback = response.prompt_feedback;
                if hasattr(feedback, 'block_reason'): feedback_msg += f" 原因：{feedback.block_reason}."
                if hasattr(feedback, 'safety_ratings') and feedback.safety_ratings: feedback_msg += f" 安全評級: {feedback.safety_ratings}."
            return feedback_msg, None
    except Exception as e_gemini_call:
        print(f"Gemini API (辨識) 調用出錯: {type(e_gemini_call).__name__} - {str(e_gemini_call)}");
        traceback.print_exc();
        return f"Gemini API (辨識) 調用出錯: {e_gemini_call}", None

# === Parse the recognition result ===
def parse_recognition_result(text_content):
    info = {'character': '未知', 'type': '未知', 'pinyin': '未知', 'meaning': '未知', 'confidence': '未知'}
    if not text_content or not isinstance(text_content, str) or text_content.startswith("錯誤") or text_content.startswith("無法生成"):
        return info
    processed_text = text_content.replace("**", "")
    lines = processed_text.splitlines()
    patterns = {
        'character': r"^(?:\d+\.\s*)?(?:辨識結果|古文字(?:描述)?|推斷漢字|識別出的字|現代漢字)\s*[:：]\s*(.+?)(?:\s*\(.+?\)|（.+?）|\s+-\s+.+)?\s*$",
        'type':      r"^(?:\d+\.\s*)?(?:文字類型|類型)\s*[:：]\s*(.+?)\s*$",
        'pinyin':    r"^(?:\d+\.\s*)?拼音\s*[:：]\s*(.+?)\s*$",
        'meaning':   r"^(?:\d+\.\s*)?(?:基本含義|含義|釋義|基本含義/來源|意義)\s*[:：]\s*(.+?)\s*$",
        'confidence':r"^(?:\d+\.\s*)?置信度\s*[:：]\s*(.+?)\s*$"
    }
    def clean_extracted_value(value_str, field_key="generic"):
        if not value_str: return ""
        cleaned = value_str.strip()
        if field_key == 'character':
            cleaned = cleaned.split('，')[0].split('；')[0].strip()
            cleaned = re.sub(r"\s*\([^\)]+\)\s*$", "", cleaned).strip()
            cleaned = re.sub(r"\s*（[^）]+）\s*$", "", cleaned).strip()
        return cleaned
    for line_idx, line in enumerate(lines):
        line_orig_case = line.strip()
        if not line_orig_case: continue
        for key_field, pattern_str in patterns.items():
            match = re.match(pattern_str, line_orig_case)
            if match:
                value = match.group(1).strip()
                if value: info[key_field] = clean_extracted_value(value, key_field)
                break
    if info['character'] == '未知':
        char_search_pattern = r"(?:辨識結果|古文字(?:描述)?|推斷漢字|識別出的字|現代漢字)\s*[:：]\s*([^\n（(]+)"
        full_text_match = re.search(char_search_pattern, processed_text)
        if full_text_match:
            candidate_char = full_text_match.group(1).strip()
            info['character'] = clean_extracted_value(candidate_char, 'character')
    if info['meaning'] == '未知':
        meaning_search_pattern = r"(?:基本含義|含義|釋義|基本含義/來源|意義)\s*[:：]\s*([^\n]+)"
        full_text_match_meaning = re.search(meaning_search_pattern, processed_text)
        if full_text_match_meaning:
            candidate_meaning = full_text_match_meaning.group(1).strip()
            info['meaning'] = clean_extracted_value(candidate_meaning, 'meaning')
    if info['character'] == '未知' or info['character'] == '提取失敗':
        found_char_label = False
        for line in lines:
            line = line.strip()
            if re.match(r"^(?:\d+\.\s*)?(?:辨識結果|古文字(?:描述)?|推斷漢字|識別出的字|現代漢字)", line):
                found_char_label = True
                potential_char = re.sub(r"^(?:\d+\.\s*)?(?:辨識結果|古文字(?:描述)?|推斷漢字|識別出的字|現代漢字)\s*[:：]\s*", "", line).strip()
                potential_char = clean_extracted_value(potential_char, 'character')
                if potential_char and not any(kw in potential_char for kw in ["類型", "拼音", "含義", "釋義", "置信度"]):
                    info['character'] = potential_char
                    break
            elif found_char_label and line and not any(re.match(pat_str, line) for fk, pat_str in patterns.items()):
                info['character'] = clean_extracted_value(line, 'character')
                break
        if info['character'] == '未知': info['character'] = '提取失敗'
    return info
# === LLM text evaluation/correction ===
def evaluate_and_correct_text_with_gemini(text_to_evaluate, target_lang="中文", context_for_llm="常規文本"):
    genai_sdk = _ensure_gemini_available_in_func();
    if not genai_sdk: return f"錯誤：Gemini SDK 不可用 (evaluate)。", text_to_evaluate, "N/A", "N/A"
    global GOOGLE_API_KEY
    if not GOOGLE_API_KEY: return "錯誤：缺少 Google API Key (evaluate)。", text_to_evaluate, "N/A", "N/A"
    try: genai_sdk.configure(api_key=GOOGLE_API_KEY)
    except Exception: pass
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
        # The prompt insists that the text stays in target_lang (no translation)
        prompt = f"""
請仔細評估以下這段關於「{context_for_llm}」的「{target_lang}」文本。
你的任務是：
1.  **嚴格保持文本的原始語言為「{target_lang}」**。不要將文本翻譯成其他任何語言。
2.  如果文本在「{target_lang}」的語法、流暢度或口語自然度方面存在明顯錯誤或不自然的表達，請提供一個僅針對這些方面進行修正的「{target_lang}」版本。
3.  如果原始文本在「{target_lang}」表達上已經很好，沒有必要修改，則 "corrected_text" 應與 "original_text" 相同。
4.  請對原始文本在「{target_lang}」下的自然口語表達給出一個評分（1-5分，5分為非常自然）。
5.  請簡要說明你的評分理由和（如果進行了）修改建議，修改建議也應基於「{target_lang}」。

請嚴格按照以下JSON格式返回結果（確保 "corrected_text" 仍然是 "{target_lang}"）：
{{
  "original_text": "{text_to_evaluate}",
  "corrected_text": "[修正後的 {target_lang} 文本]",
  "score": [評分數字],
  "reason": "[基於 {target_lang} 的理由和建議]"
}}

待評估「{target_lang}」文本：
"{text_to_evaluate}"
"""
        print(f"\n--- 使用 Gemini 評估/修正 ({context_for_llm}, 強調保持語言: {target_lang}): '{text_to_evaluate[:50]}...' ---")
        response = model.generate_content(prompt)
        if response.parts:
            response_text = response.text.strip(); print(f"Gemini 評估/修正原始返回: \n{response_text}")
            final_corrected_text = text_to_evaluate; final_score = "N/A"; final_reason = "未能解析評估。"
            try:
                # Strip an optional Markdown code fence (with or without a json tag) around the JSON
                json_response_text = re.sub(r"^```(?:json)?\s*|\s*```$", "", response_text.strip())
                eval_result = json.loads(json_response_text)

                # Check whether corrected_text is empty or whitespace-only
                corrected_candidate = eval_result.get("corrected_text")
                if corrected_candidate and corrected_candidate.strip():
                    final_corrected_text = corrected_candidate
                else:
                    print("    【警告】Gemini 返回的 corrected_text 為空或無效，將使用原始文本。")
                    final_corrected_text = text_to_evaluate # Fallback to original if corrected is empty

                final_score = str(eval_result.get("score", "N/A (JSON)"));
                final_reason = eval_result.get("reason", "未能解析評估理由 (JSON)。")
                print(f"JSON解析成功: 修正後文本='{final_corrected_text}', 評分={final_score}, 理由='{final_reason}'");
                return None, final_corrected_text, final_score, final_reason
            except json.JSONDecodeError as e_json_parse:
                # Fallback: extract fields with regexes. Unlike a plain substring search, these
                # tolerate whitespace around ':' and escaped quotes inside values.
                # Note that the extracted text is not re-checked for the target language.
                print(f"解析 Gemini 返回的 JSON 時出錯: {e_json_parse}")
                print("    回退到標籤解析...")
                temp_corrected = text_to_evaluate # default: fall back to the original text
                temp_score = "N/A"
                temp_reason = "N/A (標籤解析)"

                def _extract_json_string(field_name):
                    # Match "field_name": "value"; the value may contain escapes such as \" or \n
                    m = re.search(rf'"{field_name}"\s*:\s*"((?:[^"\\]|\\.)*)"', response_text, re.DOTALL)
                    return m.group(1).replace('\\"', '"').replace('\\n', '\n') if m else None

                try:
                    extracted_val = _extract_json_string("corrected_text")
                    if extracted_val and extracted_val.strip(): # make sure it is not an empty string
                        temp_corrected = extracted_val
                except Exception as e_parse_corr:
                    print(f"    後備解析 corrected_text 時發生其他錯誤: {e_parse_corr}")

                try:
                    score_match = re.search(r'"score"\s*:\s*"?(\d+(?:\.\d+)?)', response_text)
                    if score_match:
                        temp_score = score_match.group(1)
                except Exception as e_parse_score:
                    print(f"    後備解析 score 時發生其他錯誤: {e_parse_score}")

                try:
                    extracted_reason = _extract_json_string("reason")
                    if extracted_reason is not None:
                        temp_reason = extracted_reason
                except Exception as e_parse_reason:
                    print(f"    後備解析 reason 時發生其他錯誤: {e_parse_reason}")

                print(f"後備解析結果: 修正後文本='{temp_corrected}', 評分='{temp_score}', 理由='{temp_reason}'")
                return "JSON解析失敗，使用後備標籤解析。", temp_corrected, temp_score, temp_reason
        else: # response.parts is empty
            return "Gemini API 返回空回應", text_to_evaluate, "N/A", "N/A"
    except Exception as e_gemini_eval:
        print(f"調用 Gemini 進行文本評估時出錯: {e_gemini_eval}")
        traceback.print_exc()
        return f"調用 Gemini 進行文本評估時出錯: {e_gemini_eval}", text_to_evaluate, "N/A", "N/A"
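
# --- Illustrative sketch (hypothetical helper; nothing in this notebook calls it) ---
# The fallback above recovers "corrected_text", "score" and "reason" once json.loads() has
# rejected Gemini's reply. Assuming the reply is either valid JSON (possibly wrapped in a
# ``` fence) or JSON-like text, the same idea can be written as "strict parse first, then
# tolerant per-field regexes". The name _sketch_parse_eval_response is made up for this example.
# Note: the strict path keeps JSON types (e.g. an int score); the regex path returns strings.
import json
import re

def _sketch_parse_eval_response(response_text):
    """Return {'corrected_text', 'score', 'reason'}; fields that cannot be found are None."""
    # Remove an optional leading ```json / ``` fence and a trailing ``` fence
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", response_text.strip(), flags=re.IGNORECASE)
    try:
        data = json.loads(text)
        if isinstance(data, dict):
            return {key: data.get(key) for key in ("corrected_text", "score", "reason")}
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    result = {}
    for key in ("corrected_text", "reason"):
        # Quoted string value; tolerates spaces around ':' and escapes such as \" inside the value
        m = re.search(rf'"{key}"\s*:\s*"((?:[^"\\]|\\.)*)"', text, re.DOTALL)
        result[key] = m.group(1).replace('\\"', '"').replace('\\n', '\n') if m else None
    # The score may be a bare number or a quoted number
    m_score = re.search(r'"score"\s*:\s*"?(\d+(?:\.\d+)?)', text)
    result["score"] = m_score.group(1) if m_score else None
    return result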

# === Generate story/commentary text in a given persona's style ===
def generate_story_with_persona(char_info, persona_description, target_lang="中文", story_length_hint="大約3-5句話"): # (No changes from v9.3.24)
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk: return "錯誤：Gemini SDK 不可用 (story generation)。", None
    global GOOGLE_API_KEY
    if not GOOGLE_API_KEY: return "錯誤：缺少 Google API Key (story generation)。", None
    try: genai_sdk.configure(api_key=GOOGLE_API_KEY)
    except Exception: pass
    if not char_info or char_info.get('character', '未知') in ['未知', '提取失敗']: return "錯誤：缺少有效的古文字信息以生成故事。", None
    if not persona_description: return "錯誤：缺少角色人設描述以生成故事。", None
    character = char_info.get('character'); meaning = char_info.get('meaning', '不詳'); char_type = char_info.get('type', '古文字')
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
        prompt = f"""{persona_description}\n\n現在，請你圍繞以下古文字信息，創作一個簡短的、符合你上述人設風格的{target_lang}故事或一段評論。故事或評論的內容應該與這個古文字相關，可以涉及到它的起源、含義、或者引申出來的文化意義。長度請控制在 {story_length_hint} 左右。請直接輸出故事或評論的文本，不要任何額外的開場白、解釋或標籤。\n\n古文字信息：\n- 文字: {character}\n- 類型: {char_type}\n- 基本含義: {meaning}"""
        print(f"\n--- 使用 Gemini (某人設) 為 '{character}' 生成故事/評論 ---")
        response = model.generate_content(prompt)
        if response.parts:
            story_text = response.text.strip()
            prefixes_to_remove = ["好的，這有一個故事：", "故事：", "評論：", "好的，這是一個笑話：", "好的，這有個笑話：", "笑話：", "這是一個笑話：", "這有個笑話：", "趣事：", "俏皮話："]
            for prefix in prefixes_to_remove:
                if story_text.lower().startswith(prefix.lower()): story_text = story_text[len(prefix):].strip()
            print(f"Gemini 生成的原始故事/評論 (已嘗試移除前綴): \n{story_text}")
            return None, story_text
        else:
            feedback_msg = "Gemini API (故事生成) 返回空回應。"
            if hasattr(response, 'prompt_feedback'):
                feedback = response.prompt_feedback
                if hasattr(feedback, 'block_reason'): feedback_msg += f" 原因：{feedback.block_reason}."
                if hasattr(feedback, 'safety_ratings') and feedback.safety_ratings: feedback_msg += f" 安全評級: {feedback.safety_ratings}."
            return feedback_msg, None
    except Exception as e_gemini_story:
        print(f"調用 Gemini 生成故事/評論時出錯: {e_gemini_story}")
        traceback.print_exc()
        return f"調用 Gemini 生成故事/評論時出錯: {e_gemini_story}", None
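
# --- Illustrative sketch (hypothetical helper; not used by the functions in this cell) ---
# generate_story_with_persona() walks prefixes_to_remove once, in list order. When two
# preambles are stacked in the opposite order to the list, the inner one survives: for
# "故事：好的，這有一個故事：...", "故事：" is removed but "好的，這有一個故事：" is kept.
# Assuming the same case-insensitive startswith() matching, one fix is to repeat the pass
# until no prefix matches. _sketch_strip_preambles is a made-up name.
def _sketch_strip_preambles(text, prefixes):
    """Repeatedly remove any known leading preamble (case-insensitive) until none matches."""
    text = text.strip()
    changed = True
    while changed:
        changed = False
        for prefix in prefixes:
            # Skip empty prefixes, which would otherwise match forever
            if prefix and text.lower().startswith(prefix.lower()):
                text = text[len(prefix):].strip()
                changed = True
    return text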

# === Generate a single-speaker joke in a given persona's style ===
def generate_character_joke_with_persona(char_info, persona_description, target_lang="中文"): # (No changes from v9.3.24)
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk: return "錯誤：Gemini SDK 不可用 (joke generation)。", None
    global GOOGLE_API_KEY
    if not GOOGLE_API_KEY: return "錯誤：缺少 Google API Key (joke generation)。", None
    try: genai_sdk.configure(api_key=GOOGLE_API_KEY)
    except Exception: pass
    if not char_info or char_info.get('character', '未知') in ['未知', '提取失敗']:
        return "錯誤：缺少有效的古文字信息以生成笑話。", None
    if not persona_description:
        return "錯誤：缺少角色人設描述以生成笑話。", None
    character = char_info.get('character'); meaning = char_info.get('meaning', '不詳'); char_type = char_info.get('type', '古文字')
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
        prompt = f"""{persona_description}\n\n現在，請你圍繞以下古文字信息，創作一個簡短的、符合你上述人設風格的【{target_lang}笑話】。笑話的內容應該巧妙地與這個古文字相關，可以涉及到它的字形、含義、或者相關的趣聞。請直接輸出笑話的文本，不要任何額外的開場白、解釋或標籤。\n\n古文字信息：\n- 文字: {character}\n- 類型: {char_type}\n- 基本含義: {meaning}"""
        print(f"\n--- 使用 Gemini (某人設) 為 '{character}' 生成【單人笑話】---")
        response = model.generate_content(prompt)
        if response.parts:
            joke_text = response.text.strip()
            prefixes_to_remove = ["好的，這有一個笑話：", "好的，這有個笑話：", "笑話：", "這是一個笑話：", "這有個笑話：", "來個笑話："]
            for prefix in prefixes_to_remove:
                if joke_text.lower().startswith(prefix.lower()):
                    joke_text = joke_text[len(prefix):].strip()
            print(f"Gemini 生成的原始單人笑話 (已嘗試移除前綴): \n{joke_text}")
            return None, joke_text
        else:
            feedback_msg = "Gemini API (單人笑話生成) 返回空回應。"
            if hasattr(response, 'prompt_feedback'):
                feedback = response.prompt_feedback
                if hasattr(feedback, 'block_reason'): feedback_msg += f" 原因：{feedback.block_reason}."
                if hasattr(feedback, 'safety_ratings') and feedback.safety_ratings: feedback_msg += f" 安全評級: {feedback.safety_ratings}."
            return feedback_msg, None
    except Exception as e_gemini_joke:
        print(f"調用 Gemini 生成單人笑話時出錯: {e_gemini_joke}")
        traceback.print_exc()
        return f"調用 Gemini 生成單人笑話時出錯: {e_gemini_joke}", None
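
# --- Illustrative sketch (hypothetical helper; not called by the functions in this cell) ---
# The story, joke and image-prompt functions each build their "empty response" message by
# probing response.prompt_feedback with hasattr(). The same probing can be written once with
# getattr() defaults, which also tolerates prompt_feedback being None. The function name is
# made up, and the test uses SimpleNamespace stubs in place of a real generate_content() result.
def _sketch_empty_response_message(response, label):
    msg = f"Gemini API ({label}) 返回空回應。"
    feedback = getattr(response, "prompt_feedback", None)
    block_reason = getattr(feedback, "block_reason", None)
    if block_reason:  # only mention a block reason when one is actually set
        msg += f" 原因：{block_reason}."
    safety_ratings = getattr(feedback, "safety_ratings", None)
    if safety_ratings:
        msg += f" 安全評級: {safety_ratings}."
    return msg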

# Cell 3 - Part X: Modified Dialogue Joke Generation and Parsing

# === [Revised] Generate a multi-character dialogue joke (also returns script_tags) ===
def generate_dialogue_joke_with_personas(char_info,
                                        persona_A_description,
                                        persona_B_description,
                                        speaker_A_display_name,  # full display name, e.g. "川普"
                                        speaker_B_display_name,
                                        target_lang="中文",
                                        num_dialogue_turns=2,
                                        joke_context_hint=""):
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk:
        return "錯誤：Gemini SDK 不可用 (dialogue joke generation)。", None, None, None
    global GOOGLE_API_KEY
    if not GOOGLE_API_KEY:
        return "錯誤：缺少 Google API Key (dialogue joke generation)。", None, None, None
    try:
        genai_sdk.configure(api_key=GOOGLE_API_KEY)
    except Exception:
        pass

    if not char_info or char_info.get('character', '未知') in ['未知', '提取失敗']:
        return "錯誤：缺少有效的古文字信息以生成對話式笑話。", None, None, None
    if not persona_A_description or not persona_B_description:
        return "錯誤：缺少角色人設描述以生成對話式笑話。", None, None, None
    if not speaker_A_display_name or not speaker_B_display_name: # Changed from speaker_A_name
        return "錯誤：缺少對話角色顯示名稱。", None, None, None

    character = char_info.get('character')
    meaning = char_info.get('meaning', '不詳')
    char_type = char_info.get('type', '古文字')

    # --- Determine the speaker tags used inside the script ---
    # Simple rule: take the first word of the display name; for English output, map a few known names.
    speaker_A_script_tag = speaker_A_display_name.split(' ')[0]
    speaker_B_script_tag = speaker_B_display_name.split(' ')[0]

    # English tags for known characters (extend this mapping as needed)
    if target_lang == 'en':
        name_map_en = {"川普": "Trump", "歐巴馬": "Obama", "minecraft村民": "Villager"}  # keys assume Chinese display names
        # If the display name is already English, split(' ')[0] is usually enough
        speaker_A_script_tag = name_map_en.get(speaker_A_display_name, speaker_A_script_tag)
        speaker_B_script_tag = name_map_en.get(speaker_B_display_name, speaker_B_script_tag)
        # Names missing from the map keep the split(' ')[0] result as their default tag

    lang_for_prompt = "英文" if target_lang == "en" else "中文"
    output_instruction_dialogue = f"請直接輸出【{lang_for_prompt}】的對話腳本。"
    if target_lang == 'en':
        output_instruction_dialogue = "Please directly output the dialogue script in ENGLISH."


    prompt = f"""
你是一位幽默的編劇。請圍繞以下古文字信息，創作一個簡短的、包含兩個角色之間多輪對話的【{lang_for_prompt}笑話腳本】。

古文字信息 (Ancient Character Information):
- 文字 (Character): {character}
- 類型 (Type): {char_type}
- 基本含義 (Basic Meaning): {meaning}
{joke_context_hint}

對話角色A:
- 在腳本中請使用此標籤 (Use this tag in script): {speaker_A_script_tag}
- 人設 (Persona): {persona_A_description}

對話角色B:
- 在腳本中請使用此標籤 (Use this tag in script): {speaker_B_script_tag}
- 人設 (Persona): {persona_B_description}

任務要求:
1. 笑話內容應巧妙地與上述古文字的【字形】、【含義】或【相關趣聞】相關。
2. 對話應在角色A ({speaker_A_script_tag}) 和角色B ({speaker_B_script_tag}) 之間進行。
3. 每個角色大約說 {num_dialogue_turns} 輪話。
4. 確保對話符合各自的人設風格，並且整體構成一個有趣的笑話。
5. 請嚴格按照以下格式輸出笑話腳本（每行一個角色的話，並以【角色腳本標籤名】和冒號開頭，例如 "{speaker_A_script_tag}: [內容]" 或 "{speaker_B_script_tag}: [內容]"）：
   {speaker_A_script_tag}: [角色A說的話]
   {speaker_B_script_tag}: [角色B說的話]
   ...
6. {output_instruction_dialogue} 不要包含任何額外的開場白、解釋、標題（如“腳本：”）、Markdown標記（如 **）或結尾語。

"""
    print(f"\n--- 使用 Gemini 為 '{character}' 生成對話式笑話 ({speaker_A_script_tag} vs {speaker_B_script_tag}, 語言: {lang_for_prompt}) ---")
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
        response = model.generate_content(prompt)

        if response.parts:
            dialogue_script_text = response.text.strip()
            # (Cleanup logic largely carried over from the previous version)
            prefixes_to_remove_patterns = [
                r"^\s*好的，這是一個.*腳本：\s*", r"^\s*這是一個由.*?演繹的笑話：\s*",
                r"^\s*腳本：\s*", r"^\s*笑話腳本：\s*", r"^\s*Script:\s*", r"^\s*Joke Script:\s*",
                r"^\s*\[SCENE START\]\s*", r"^\s*\[場景開始\]\s*",
                r"^\s*好的，這有一個對話笑話：\s*", r"^\s*Okay, here's the dialogue script:\s*",
                r"^\s*Here is the dialogue in Chinese:\s*", r"^\s*Chinese dialogue:\s*",
                r"^\s*中文对话：\s*", r"^\s*以下是中文对话：\s*"
            ]
            for pat in prefixes_to_remove_patterns:
                dialogue_script_text = re.sub(pat, "", dialogue_script_text, flags=re.IGNORECASE | re.MULTILINE).strip()
            dialogue_script_text = dialogue_script_text.replace("**", "")

            # Line-level cleanup: keep only lines that look like "tag: content"
            cleaned_lines = []
            potential_dialogue_lines = dialogue_script_text.splitlines()
            # Build the regex from the same script tags that are returned to the caller, so filtering and parsing agree
            pattern_for_parsing = re.compile(rf"^\s*(?:{re.escape(speaker_A_script_tag)}|{re.escape(speaker_B_script_tag)})\s*[:：].*", re.IGNORECASE)

            for line in potential_dialogue_lines:
                if pattern_for_parsing.match(line.strip()):
                    cleaned_lines.append(line)

            dialogue_script_text_original_cleaned = dialogue_script_text
            if cleaned_lines:
                # Join with a real newline; joining with "\\n" would insert a literal backslash-n
                # and leave the script as a single line for downstream parsing.
                dialogue_script_text_after_aggressive_clean = "\n".join(cleaned_lines).strip()
                if not dialogue_script_text_after_aggressive_clean and dialogue_script_text_original_cleaned:
                    print("    【警告】對話腳本積極清理後結果為空，將回退到僅移除已知前綴的結果。")
                    # dialogue_script_text remains dialogue_script_text_original_cleaned
                elif len(dialogue_script_text_after_aggressive_clean) < len(dialogue_script_text_original_cleaned) * 0.5 and \
                     len(potential_dialogue_lines) > 3 and \
                     len(dialogue_script_text_after_aggressive_clean) > 0:
                    print("    【警告】對話腳本積極清理後文本大幅縮短，但仍保留積極清理的結果。")
                    dialogue_script_text = dialogue_script_text_after_aggressive_clean
                elif dialogue_script_text_after_aggressive_clean:
                    dialogue_script_text = dialogue_script_text_after_aggressive_clean
            elif not dialogue_script_text_original_cleaned:
                dialogue_script_text = ""

            print(f"    Gemini 生成的對話腳本 (已處理): \n{dialogue_script_text}")
            return None, dialogue_script_text, speaker_A_script_tag, speaker_B_script_tag  # also return the script tags
        else:
            feedback_msg = "Gemini API (對話笑話生成) 返回空回應。"
            if hasattr(response, 'prompt_feedback') and response.prompt_feedback and hasattr(response.prompt_feedback, 'block_reason') and response.prompt_feedback.block_reason:
                feedback_msg += f" 原因: {response.prompt_feedback.block_reason}"
            return feedback_msg, None, None, None  # tags are None on failure
    except Exception as e_gemini_dialogue:
        print(f"調用 Gemini 生成對話式笑話時出錯: {e_gemini_dialogue}")
        traceback.print_exc()
        return f"調用 Gemini 生成對話式笑話時出錯: {e_gemini_dialogue}", None, None, None # 【返回 None for tags】
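
# --- Illustrative sketch (hypothetical helper; not called by the functions in this cell) ---
# generate_dialogue_joke_with_personas() derives each speaker's script tag from the display
# name: the first word, optionally replaced by an English name when target_lang == 'en'.
# This sketch isolates that rule so it can be checked on its own. It uses split() with no
# argument, which (unlike split(' ')) ignores leading whitespace; the mapping in the test is
# only an example, and _sketch_script_tag is a made-up name.
def _sketch_script_tag(display_name, target_lang="中文", name_map_en=None):
    words = display_name.split()
    tag = words[0] if words else display_name
    if target_lang == "en" and name_map_en:
        # The map is keyed by the full display name, as in the original function
        tag = name_map_en.get(display_name, tag)
    return tag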

# === [Revised] Parse a dialogue script using the tags that actually appear in it ===
# Cell 3 - parse_dialogue_script (v9.3.43 - fixed line splitting, debug output kept)
def parse_dialogue_script(script_text, speaker_A_tag_in_script, speaker_B_tag_in_script):
    if not script_text or not isinstance(script_text, str): return []
    if not speaker_A_tag_in_script or not speaker_B_tag_in_script:
        print("    [Parser Debug] 缺少有效的說話者標籤傳入 parse_dialogue_script。")
        return []

    print(f"    [Parser Debug] Input script_text (first 300 chars):\n'''{script_text[:300]}'''")
    print(f"    [Parser Debug] Parsing with Speaker A Tag: '{speaker_A_tag_in_script}', Speaker B Tag: '{speaker_B_tag_in_script}'")

    turns = []

    # Core change: more robust line splitting.
    # First turn any literal backslash-n / backslash-r sequences into real line breaks, just in case
    script_text_for_splitting = script_text.replace('\\n', '\n').replace('\\r', '\r')
    lines = re.split(r'\r\n|\r|\n', script_text_for_splitting)  # then split on any newline convention

    print(f"    [Parser Debug] Number of lines after splitting: {len(lines)}")  # line count after splitting

    norm_A_tag = re.escape(speaker_A_tag_in_script.strip())
    norm_B_tag = re.escape(speaker_B_tag_in_script.strip())

    pattern_A = re.compile(rf"^\s*{norm_A_tag}\s*[:：]\s*(.*)", re.IGNORECASE)
    pattern_B = re.compile(rf"^\s*{norm_B_tag}\s*[:：]\s*(.*)", re.IGNORECASE)

    current_speaker_matched_tag = None
    current_dialogue_buffer = []

    for line_num, line_content in enumerate(lines):
        line_stripped = line_content.strip()
        # To limit log noise, only print the line being processed when it is non-empty
        if line_stripped:
            print(f"    [Parser Debug] Processing line {line_num+1}/{len(lines)}: '{line_stripped[:100]}'...")

        if not line_stripped: continue  # skip blank lines

        matched_tag_this_line = None
        dialogue_part_this_line = ""

        match_A_obj = pattern_A.match(line_stripped)
        match_B_obj = pattern_B.match(line_stripped)

        if match_A_obj:
            matched_tag_this_line = speaker_A_tag_in_script
            dialogue_part_this_line = match_A_obj.group(1).strip()
            # print(f"        [Parser Debug] Matched Speaker A. Dialogue part: '{dialogue_part_this_line[:30]}...'")  # uncomment for more detailed logs
        elif match_B_obj:
            matched_tag_this_line = speaker_B_tag_in_script
            dialogue_part_this_line = match_B_obj.group(1).strip()
            # print(f"        [Parser Debug] Matched Speaker B. Dialogue part: '{dialogue_part_this_line[:30]}...'")

        if matched_tag_this_line:
            if current_speaker_matched_tag and current_dialogue_buffer:
                turn_to_add = {"speaker": current_speaker_matched_tag, "dialogue": " ".join(current_dialogue_buffer).strip()}
                turns.append(turn_to_add)
                # print(f"        [Parser Debug] Appended previous turn: {turn_to_add}")
            current_speaker_matched_tag = matched_tag_this_line
            current_dialogue_buffer = [dialogue_part_this_line] if dialogue_part_this_line else []  # never add an empty string to the buffer
            # print(f"        [Parser Debug] Starting new turn for: {current_speaker_matched_tag}. Buffer: {current_dialogue_buffer}")
        elif current_speaker_matched_tag:
            current_dialogue_buffer.append(line_stripped)  # untagged continuation line (already stripped) belongs to the current speaker
            # print(f"        [Parser Debug] Appending to current speaker {current_speaker_matched_tag}. Buffer now: {len(current_dialogue_buffer)} lines.")
        # else:
        #     print(f"        [Parser Debug] Line did not match any speaker and no current speaker active: '{line_stripped}'")

    if current_speaker_matched_tag and current_dialogue_buffer:  # flush the last speaker's turn
        turn_to_add = {"speaker": current_speaker_matched_tag, "dialogue": " ".join(current_dialogue_buffer).strip()}
        turns.append(turn_to_add)
        # print(f"    [Parser Debug] Appended final turn: {turn_to_add}")

    turns = [turn for turn in turns if turn.get("dialogue")]  # drop any empty turns

    print(f"    [Parser Debug] Final parsed turns count: {len(turns)}")
    # Print only a few turns to keep the output short, unless detailed debugging is needed
    for i_debug_turn, debug_turn in enumerate(turns[:5]):  # at most the first 5 turns
        print(f"        Turn {i_debug_turn+1}: Speaker='{debug_turn['speaker']}', Dialogue (first 50 chars)='{debug_turn['dialogue'][:50]}...'")
    if len(turns) > 5:
        print(f"        ... (and {len(turns) - 5} more turns not shown in debug log)")

    if not turns and script_text.strip():  # warn only when the input was non-empty but nothing was parsed
        print(f"    [Parser Warning] Script parsing resulted in zero turns despite non-empty input script.")
        print(f"        Original script (first 200 chars):\n'''{script_text[:200]}'''")
        print(f"        Speaker tags used for parsing: A='{speaker_A_tag_in_script}', B='{speaker_B_tag_in_script}'")
    return turns
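
# --- Illustrative sketch (hypothetical helper; not called by this notebook) ---
# parse_dialogue_script() is a small state machine: a line starting with "<tag>:" (ASCII or
# full-width colon, case-insensitive) opens a new turn, and any following untagged line is
# appended to the current turn. The same logic without the debug logging is sketched below;
# it uses str.splitlines() and does not reproduce the literal-"\n" workaround.
# _sketch_parse_turns is a made-up name.
import re

def _sketch_parse_turns(script_text, tag_a, tag_b):
    pattern = re.compile(rf"^\s*({re.escape(tag_a)}|{re.escape(tag_b)})\s*[:：]\s*(.*)$", re.IGNORECASE)
    turns = []
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = pattern.match(line)
        if m:
            # Normalise the tag to the caller's spelling, whatever case the model used
            speaker = tag_a if m.group(1).lower() == tag_a.lower() else tag_b
            turns.append({"speaker": speaker, "dialogue": m.group(2).strip()})
        elif turns:
            # Continuation line: attach it to the most recent turn
            turns[-1]["dialogue"] = (turns[-1]["dialogue"] + " " + line).strip()
    return [t for t in turns if t["dialogue"]]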

# === Generate multiple video prompts for a story or joke ===
def generate_multiple_video_prompts_for_story_gemini(story_or_joke_text, # (No changes from v9.3.24)
                                                    num_clips_target=3, content_type="故事",
                                                    char_info=None, persona_style_hint="",
                                                    video_style_hint="cinematic animation, visually rich, detailed"):
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk:
        return [{"scene_id": i+1, "scene_description_chinese": "錯誤", "english_video_prompt": f"Error: Gemini SDK not available"} for i in range(num_clips_target)], f"Error: Gemini SDK not available ({content_type})"
    global GOOGLE_API_KEY
    if not GOOGLE_API_KEY:
        return [{"scene_id": i+1, "scene_description_chinese": "錯誤", "english_video_prompt": "Error: Missing API Key"} for i in range(num_clips_target)], f"Error: Missing API Key ({content_type})"
    try: genai_sdk.configure(api_key=GOOGLE_API_KEY)
    except Exception: pass
    if not story_or_joke_text or not story_or_joke_text.strip():
        return [{"scene_id": i+1, "scene_description_chinese": "錯誤", "english_video_prompt": f"Error: Input {content_type} text is empty."} for i in range(num_clips_target)], f"Error: Input {content_type} text is empty."
    video_prompts_data_list = []; error_message = None
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
        char_info_str = ""
        if char_info and char_info.get('character') and char_info.get('character') != '未知':
            char_info_str = f"相關古文字信息：文字「{char_info.get('character','')}」，含義「{char_info.get('meaning','')}」。"
        gemini_task_intro = f"我有一段「{content_type}」文本，請你將其分解成大約 {num_clips_target} 個獨立的、適合用 AI 生成短視頻片段（每個約2-5秒）來表達的關鍵視覺場景或核心概念。"
        if content_type == "笑話":
            gemini_task_intro += f" 由於這是個（可能是對話式的）笑話，請確保分解出的場景和生成的 Prompt 能體現其幽默感或笑點所在，可能需要表現角色互動或反應。"
        scene_division_prompt = f"""{gemini_task_intro}
{char_info_str}
「{content_type}」的整體風格基調是：{persona_style_hint if persona_style_hint else "富有想象力和藝術感"}。
對於每個分解出來的場景/概念，請完成兩件事：
1.  提供一個簡潔的【中文場景描述】，概括這個視頻片段應該展示什麼。
2.  基於這個中文場景描述（以及古文字信息，如果相關，請巧妙融入），生成一個詳細的、引人入勝的【英文視頻生成 Prompt】。這個英文 Prompt 應該適合直接輸入到 Zeroscope 或類似的 AI 視頻模型中。 Prompt 應包含：主體、動作/運動（適合短視頻）、環境、氛圍、光照、攝像機視角/運鏡（例如 "dynamic camera pan", "slow zoom in", "birds-eye view transitioning to close-up"），以及明確的視覺風格（例如 "{video_style_hint}", "surreal dreamlike animation", "ancient scroll revealing secrets", "glowing ethereal energy"）。 強調動態感和視覺衝擊力。
「{content_type}」文本如下：
"{story_or_joke_text}"
請嚴格按照以下 JSON 格式返回一個包含 {num_clips_target} 個（或最接近此數量，至少1個）場景對象的列表：
[ {{\"scene_id\": 1, \"scene_description_chinese\": \"場景1的中文描述...\", \"english_video_prompt\": \"Detailed English prompt for video scene 1...\"}}, {{\"scene_id\": 2, \"scene_description_chinese\": \"場景2的中文描述...\", \"english_video_prompt\": \"Detailed English prompt for video scene 2...\"}} ]
如果故事/笑話很短，分解成1-2個場景也可以。確保每個 "english_video_prompt" 都是高質量且具體的。 """
        print(f"\n--- 使用 Gemini 為「{content_type}」生成 {num_clips_target} 個視頻片段的 Prompts ---")
        print(f"    基於文本 (前50字符): {story_or_joke_text[:50]}...")
        response = model.generate_content(scene_division_prompt)
        if response.parts:
            response_text = response.text.strip()
            print(f"    Gemini 返回的視頻 Prompts (原始): \n{response_text}")
            try:
                # Strip an optional Markdown code fence (```json ... ``` or plain ``` ... ```) before parsing
                json_response_text = re.sub(r"^```(?:json)?\s*|\s*```$", "", response_text, flags=re.IGNORECASE).strip()
                scenes_data = json.loads(json_response_text)
                if isinstance(scenes_data, list):
                    for idx, scene in enumerate(scenes_data):
                        if isinstance(scene, dict) and "english_video_prompt" in scene and scene["english_video_prompt"].strip():
                            video_prompts_data_list.append({"scene_id": scene.get('scene_id', idx + 1), "scene_description_chinese": scene.get('scene_description_chinese', 'N/A'), "english_video_prompt": scene["english_video_prompt"]})
                            print(f"      - 片段 {idx+1} 描述 (中): {scene.get('scene_description_chinese', 'N/A')}"); print(f"      - 片段 {idx+1} Prompt (英): {scene['english_video_prompt']}")
                    if not video_prompts_data_list:
                        error_message = f"未能從 Gemini 返回中解析出「{content_type}」的有效視頻 Prompts 列表（可能是空的有效JSON列表）。"; print(f"    【警告】{error_message}")
                        video_prompts_data_list = [{"scene_id":1, "scene_description_chinese": "通用場景", "english_video_prompt": f"A visually engaging animation related to: {story_or_joke_text[:50]}..., {video_style_hint}"}]
                else:
                    error_message = f"Gemini 為「{content_type}」返回的不是預期的列表格式 (而是 {type(scenes_data)}）。"; print(f"    【警告】{error_message}")
                    video_prompts_data_list = [{"scene_id":1, "scene_description_chinese": "通用場景", "english_video_prompt": f"A general animation about: {story_or_joke_text[:50]}..., {video_style_hint}"}]
            except json.JSONDecodeError as e_json:
                error_message = f"解析 Gemini 為「{content_type}」返回的 JSON 失敗: {e_json}"; print(f"    【警告】{error_message}")
                video_prompts_data_list = [{"scene_id":1, "scene_description_chinese": "通用場景", "english_video_prompt": f"An artistic animation representing: {story_or_joke_text[:70]}..., {video_style_hint}"}]
        else:
            error_message = f"Gemini ({content_type}視頻 Prompts) 返回空回應。"
            block_reason = getattr(getattr(response, 'prompt_feedback', None), 'block_reason', None)
            if block_reason: error_message += f" 原因: {block_reason}"
            print(f"    {error_message}"); video_prompts_data_list = [{"scene_id":1, "scene_description_chinese": "通用場景", "english_video_prompt": f"Video clip related to the {content_type}: {story_or_joke_text[:50]}..., {video_style_hint}"}]
    except Exception as e_gen_multi_video_prompt:
        error_message = f"為「{content_type}」生成多視頻 Prompts 時出錯: {e_gen_multi_video_prompt}"; print(f"    {error_message}"); traceback.print_exc()
        video_prompts_data_list = [{"scene_id":1, "scene_description_chinese": "錯誤後備", "english_video_prompt": f"Default video prompt due to error for {content_type}: {story_or_joke_text[:50]}..., {video_style_hint}"}]
    if not video_prompts_data_list:
        video_prompts_data_list = [{"scene_id":i+1, "scene_description_chinese": "通用視頻片段", "english_video_prompt": f"A general animated video clip {i+1}, {video_style_hint}"} for i in range(num_clips_target)]
        if not error_message: error_message = "未知原因導致 video_prompts_data_list 為空，使用默認值。"
    current_num_prompts = len(video_prompts_data_list)
    if current_num_prompts > num_clips_target: video_prompts_data_list = video_prompts_data_list[:num_clips_target]
    elif 0 < current_num_prompts < num_clips_target:
        last_prompt_data = video_prompts_data_list[-1].copy()
        for i in range(num_clips_target - current_num_prompts):
            new_data = last_prompt_data.copy(); new_data["scene_id"] = current_num_prompts + i + 1
            video_prompts_data_list.append(new_data)
    elif current_num_prompts == 0 and num_clips_target > 0 :
        video_prompts_data_list = [{"scene_id":i+1, "scene_description_chinese": "最終後備", "english_video_prompt": f"Fallback animated clip {i+1}, {video_style_hint}"} for i in range(num_clips_target)]
    return video_prompts_data_list, error_message
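
# --- Illustrative sketch (hypothetical helper; not called by this notebook) ---
# generate_multiple_video_prompts_for_story_gemini() ends by forcing the scene list to exactly
# num_clips_target entries: extra scenes are dropped, and a short list is padded with copies of
# the last scene, renumbered. The empty-list case is handled separately in that function, so
# this sketch simply returns [] for it. _sketch_fit_scene_count is a made-up name.
def _sketch_fit_scene_count(scenes, target):
    scenes = [dict(s) for s in scenes[:target]]  # truncate, copying so the input is untouched
    while scenes and len(scenes) < target:
        padded = dict(scenes[-1])
        padded["scene_id"] = len(scenes) + 1
        scenes.append(padded)
    return scenes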

# === Generate an image prompt with Gemini (for a single still image) ===
def generate_final_image_prompt(
    text_content_for_image,
    content_type_hint="一段描述",
    char_info=None,
    persona_description=None,
    user_specific_visual_request=""
):
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk: return "錯誤：Gemini SDK 不可用 (final_img_prompt)。", None
    global GOOGLE_API_KEY
    if not GOOGLE_API_KEY: return "錯誤：缺少 Google API Key (final_img_prompt)。", None
    try: genai_sdk.configure(api_key=GOOGLE_API_KEY)
    except Exception: pass
    if not text_content_for_image or not text_content_for_image.strip():
        return f"錯誤：用於生成圖像 Prompt 的「{content_type_hint}」文本內容為空。", None
    context_str = f"主要的文本內容 (類型: {content_type_hint}):\n\"\"\"\n{text_content_for_image}\n\"\"\""
    if char_info and char_info.get('character') and char_info.get('character') != '未知':
        context_str += f"\n\n相關古文字: '{char_info.get('character')}' (含義: {char_info.get('meaning', 'N/A')}, 類型: {char_info.get('type', 'N/A')})"
    if persona_description: context_str += f"\n\n當前講述者人設風格: {persona_description}"
    if user_specific_visual_request: context_str += f"\n\n用戶的額外視覺要求: {user_specific_visual_request}"
    task_description = ""
    if "笑話" in content_type_hint or "對話" in content_type_hint:
        task_description = f"請為上述「{content_type_hint}」內容生成一個能夠【幽默地】或【形象地】捕捉到其核心情景或互動的圖像。思考如何用視覺來增強其趣味性。"
        if "對話" in content_type_hint and persona_description and ";" in persona_description:
            task_description += " 如果是多角色對話，嘗試表現角色間的互動或對比。"
    elif "故事" in content_type_hint:
        task_description = f"請為上述「{content_type_hint}」內容生成一個能夠捕捉其【核心情感】或【關鍵時刻】的圖像。思考如何用視覺來輔助故事的敘述。"
    else: task_description = f"請為上述「{content_type_hint}」內容生成一個能夠準確且富有藝術感地表達其核心信息的圖像。"
    if char_info and char_info.get('character') and char_info.get('character') != '未知':
        task_description += f" 如果可能，巧妙地將古文字「{char_info.get('character')}」的【字形】或【含義】融入到圖像設計中，可以通過抽象、符號化或環境元素的方式。"
    prompt_for_gemini = f"""
You are an expert prompt engineer for AI image generation models like Stable Diffusion.
Your goal is to generate a single, highly detailed, and creative **English prompt** suitable for direct input into a text-to-image model.
The prompt MUST strictly adhere to safety guidelines and avoid generating harmful or inappropriate content.
**Provided Context and Content:**
{context_str}
**Specific Task:**
{task_description}
**Instructions for the English Prompt:**
1.  **Language:** Must be in English.
2.  **Visual Focus:** Emphasize visual elements: subjects, actions, environment, atmosphere, lighting, camera composition (e.g., close-up, wide shot, bird's eye view), and artistic style.
3.  **Keywords & Style:** Include relevant keywords that would guide an AI image generator. Suggested styles could be: photorealistic, cinematic, fantasy art, cartoonish, abstract, impressionistic, specific artist styles (e.g., "in the style of Van Gogh"), etc. Consider the persona style if provided.
4.  **Conciseness with Detail:** Be descriptive and evocative, but aim for a prompt length that is effective for image models (e.g., 50-75 words is often a good range, but can be longer if necessary for detail).
5.  **Direct Output:** Output ONLY the final English prompt. No extra text, no explanations, no "Prompt:".
Generate the English prompt. """
    print(f"\n--- 正在使用 Gemini 為「{content_type_hint}」生成圖像 Prompt ---")
    print(f"    基於內容 (前50字): {text_content_for_image[:50]}...")
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
        response = model.generate_content(prompt_for_gemini)
        if response.parts:
            final_prompt = response.text.strip().replace('```', '').replace("Prompt:", "").replace("prompt:", "").strip()
            if final_prompt.startswith('"') and final_prompt.endswith('"'): final_prompt = final_prompt[1:-1]
            if not final_prompt: return "錯誤：Gemini 生成了空的圖像 Prompt。", None
            print(f"\n--- 生成的圖像 Prompt (英文) for {content_type_hint} ---")
            display(Markdown(f"`{final_prompt}`"))
            return None, final_prompt
        else:
            feedback_msg = f"Gemini API ({content_type_hint} Img Prompt) 返回了空的回應。"
            if hasattr(response, 'prompt_feedback'):
                feedback = response.prompt_feedback
                if hasattr(feedback, 'block_reason'): feedback_msg += f" 原因：{feedback.block_reason}."
                if hasattr(feedback, 'safety_ratings') and feedback.safety_ratings: feedback_msg += f" 安全評級: {feedback.safety_ratings}."
            return feedback_msg, None
    except Exception as e: traceback.print_exc(); return f"調用 Gemini 為「{content_type_hint}」生成圖像 Prompt 時出錯: {e}", None
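
# --- Illustrative sketch (hypothetical helper; not called by this notebook) ---
# generate_final_image_prompt() removes every ``` and every "Prompt:"/"prompt:" anywhere in
# Gemini's reply, then one pair of surrounding double quotes. Removing the label anywhere can
# also delete a legitimate "prompt:" inside the prompt text. A narrower alternative, assuming
# the label only ever appears at the very start of the reply, strips a leading label only.
# _sketch_clean_image_prompt is a made-up name.
import re

def _sketch_clean_image_prompt(raw):
    text = raw.replace("```", "").strip()
    # Drop a leading "Prompt:" label (any case), but leave later occurrences alone
    text = re.sub(r"^prompt\s*:\s*", "", text, flags=re.IGNORECASE)
    if len(text) >= 2 and text[0] == text[-1] == '"':
        text = text[1:-1].strip()
    return text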

# === Image generation (Hugging Face Diffusers) ===
def generate_image_with_diffusers(prompt, model_name="runwayml/stable-diffusion-v1-5", num_inference_steps=30): # (No changes from v9.3.24)
    import torch
    global loaded_pipeline, loaded_pipeline_name, device, DIFFUSERS_AVAILABLE
    if not DIFFUSERS_AVAILABLE: print("【警告】Diffusers 未加載，圖像生成跳過。"); return "Diffusers 不可用", None
    if 'device' not in globals(): device = "cuda" if torch.cuda.is_available() else "cpu"; print(f"generate_image_with_diffusers: device 設置為 {device}")
    current_device_str = str(device)
    if not prompt or not isinstance(prompt, str) or len(prompt.strip()) == 0: return "錯誤：缺少 Prompt。", None
    pipe_to_use = None
    try:
        if loaded_pipeline is None or loaded_pipeline_name != model_name:
            print(f"\n--- 加載 Diffusers 模型: {model_name} 到 {current_device_str} ---")
            display(HTML("<p><i>圖像模型加載中... (可能需要1-2分鐘)</i></p>"))
            pipe_class_to_use = StableDiffusionPipeline
            if "xl" in model_name.lower(): pipe_class_to_use = StableDiffusionXLPipeline
            try:
                pipe_to_use = pipe_class_to_use.from_pretrained(model_name, torch_dtype=torch.float16 if current_device_str == "cuda" else torch.float32, use_safetensors=True)
                print(f"    使用 {pipe_class_to_use.__name__} 加載 {model_name} 成功。")
            except Exception as e_load_primary:
                print(f"    主要加載方法失敗 ({e_load_primary})，嘗試後備...")
                if "xl" in model_name.lower() and pipe_class_to_use == StableDiffusionXLPipeline:
                    try: pipe_to_use = pipe_class_to_use.from_pretrained(model_name, torch_dtype=torch.float16, variant="fp16", use_safetensors=True); print("    SDXL (fp16 variant) 加載成功。")
                    except Exception as e_load_variant: print(f"    SDXL (fp16 variant) 也失敗: {e_load_variant}。")
                # Import before the comparison: because this function imports DiffusionPipeline locally,
                # the name is local to the whole function, and reading it before the import raises UnboundLocalError.
                from diffusers import DiffusionPipeline
                if pipe_to_use is None and pipe_class_to_use is not DiffusionPipeline:
                    print(f"    嘗試通用 DiffusionPipeline 加載 {model_name}...")
                    try:
                        pipe_to_use = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.float16 if current_device_str == "cuda" else torch.float32, use_safetensors=True)
                        print("    通用 DiffusionPipeline 加載成功。")
                    except Exception as e_load_generic: print(f"    通用 DiffusionPipeline 也失敗: {e_load_generic}。")
            if pipe_to_use is None: return f"模型 Pipeline '{model_name}' 初始化失敗。", None
            pipe_to_use = pipe_to_use.to(current_device_str)
            if current_device_str == "cuda" and hasattr(pipe_to_use, "enable_xformers_memory_efficient_attention"):
                try: pipe_to_use.enable_xformers_memory_efficient_attention(); print("    已啟用 xformers (若可用)。")
                except Exception: print("    xformers 不可用或啟用失敗。")
            if hasattr(pipe_to_use, "enable_attention_slicing"):
                try: pipe_to_use.enable_attention_slicing(); print("    已啟用 attention slicing。")
                except Exception as e_slice: print(f"    啟用 attention slicing 失敗: {e_slice}")
            loaded_pipeline = pipe_to_use; loaded_pipeline_name = model_name;
            print(f"模型 {model_name} 加載完成並已緩存。")
        else:
            print(f"\n--- 使用已緩存 Diffusers 模型: {loaded_pipeline_name} ---"); pipe_to_use = loaded_pipeline; pipe_to_use.to(current_device_str)
        print(f"--- Diffusers 生成圖片中 (設備: {current_device_str}, Prompt: '{prompt[:70]}...') ---"); display(HTML("<p><i>圖像生成中...</i></p>"))
        height_param, width_param = (1024, 1024) if "xl" in loaded_pipeline_name.lower() else (512, 512)
        if current_device_str == "cpu":
            height_param, width_param = (512, 512) if "xl" in loaded_pipeline_name.lower() else (256, 256)
            print(f"    警告：CPU 運行，圖像尺寸限制為 {width_param}x{height_param}。")
            if num_inference_steps > 15: num_inference_steps = 15
        generator = torch.Generator(device=current_device_str).manual_seed(int(time.time()))
        with torch.inference_mode():
            image_output = pipe_to_use(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=7.5, height=height_param, width=width_param, generator=generator).images[0]
        print("--- 圖像生成成功！ ---\n")
        if current_device_str == "cuda" and 'torch' in sys.modules and hasattr(torch.cuda, 'empty_cache'): torch.cuda.empty_cache()
        return None, image_output
    except Exception as e_img_gen_outer: print(f"【嚴重錯誤】Diffusers 圖像生成意外錯誤: {e_img_gen_outer}"); traceback.print_exc(); return f"Diffusers 圖像生成嚴重錯誤: {e_img_gen_outer}", None
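
# --- Sketch (hypothetical helper, not called anywhere else in this notebook) ---
# The resolution/step logic of the Diffusers image-generation function above can be isolated
# as a pure function, which makes the CPU fallback easy to check without loading a model.
# `_pick_sd_generation_params` is an assumed name; the sizes and the 15-step CPU cap simply
# mirror the values used above.
def _pick_sd_generation_params(model_name: str, device: str, num_inference_steps: int):
    """Return (height, width, steps) for a Stable Diffusion / SDXL run."""
    is_xl = "xl" in model_name.lower()
    if device == "cpu":
        # CPU: shrink the canvas and cap the number of denoising steps
        height, width = (512, 512) if is_xl else (256, 256)
        return height, width, min(num_inference_steps, 15)
    # GPU: SDXL at 1024x1024, SD 1.x/2.x at 512x512
    height, width = (1024, 1024) if is_xl else (512, 512)
    return height, width, num_inference_steps

# Example: _pick_sd_generation_params("stabilityai/stable-diffusion-xl-base-1.0", "cpu", 30)
# returns (512, 512, 15)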
# Cell 3 - Part 5: TTS and Audio/Video Combination Functions (Corrected Prints)

# === F5-TTS 的 clone_voice 函數 ===
def clone_voice_f5tts(path_to_ref_audio: str, gen_text: str, ref_text: str = "", output_file: str = "f5_tts_output.wav", output_dir: str = "f5_tts_results", speed: float = 1.0, nfe_step_f5: int = 48): # (No changes from v9.3.24 content)
    global device; cli_executable = "f5-tts_infer-cli"; model_id_f5 = "F5TTS_v1_Base"
    if 'device' not in globals():
        try: import torch; device = "cuda" if torch.cuda.is_available() else "cpu"
        except: device = "cpu"
    output_path_obj = Path(output_dir) / output_file; output_path_obj.parent.mkdir(parents=True, exist_ok=True)
    f5_device_str = str(device)
    if "cuda" in f5_device_str and ":" not in f5_device_str: f5_device_str = "cuda:0"
    command = [cli_executable, "--model", model_id_f5, "--ref_audio", str(Path(path_to_ref_audio).resolve()), "--ref_text", ref_text, "--gen_text", gen_text, "--output_dir", str(output_path_obj.parent.resolve()), "--output_file", output_path_obj.name, "--remove_silence", "--speed", str(speed), "--nfe_step", str(nfe_step_f5), "--device", f5_device_str]
    print(f"準備執行 F5-TTS 命令: {shlex.join(command)}")
    try:
        result = subprocess.run(command, check=True, capture_output=True, text=True, encoding='utf-8', errors='replace', timeout=180)
        print("\nF5-TTS 命令執行成功。"); print(f"生成的音訊檔案: {output_path_obj}"); return None, str(output_path_obj)
    except subprocess.CalledProcessError as e:
        print(f"\nF5-TTS 命令執行失敗 (錯誤碼 {e.returncode}):"); stderr_msg = e.stderr
        if isinstance(stderr_msg, bytes):
            try: stderr_msg = stderr_msg.decode('utf-8', errors='replace')
            except Exception: stderr_msg = str(stderr_msg)
        print(f"STDOUT: {e.stdout if e.stdout else '(無)'}"); print(f"STDERR: {stderr_msg if stderr_msg else '(無)'}")
        if stderr_msg and ("out of memory" in stderr_msg.lower() or "oom" in stderr_msg.lower()): print("【【F5-TTS OOM detected in STDERR!】】"); return "F5-TTS 命令失敗 (OOM)", None
        return "F5-TTS 命令失敗", None
    except FileNotFoundError: print(f"\n錯誤：找不到命令 '{cli_executable}'。"); return f"找不到 '{cli_executable}'", None
    except subprocess.TimeoutExpired: print("\nF5-TTS 命令執行超時。文本可能過長或系統負載過高。"); return "F5-TTS 超時", None
    except Exception as e_f5: print(f"\n執行 F5-TTS 時錯誤: {e_f5}"); traceback.print_exc(); return f"執行 F5-TTS 時出錯: {e_f5}", None
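
# --- Sketch (hypothetical helper) ---
# clone_voice_f5tts above follows a common pattern: run a CLI via subprocess.run(check=True,
# timeout=...) and map each failure type to an (error_message, result) tuple. This minimal
# sketch shows that mapping in isolation so it can be exercised with any command (for example
# the current Python interpreter instead of f5-tts_infer-cli). `_run_cli_as_result` is an
# assumed name and the English messages are illustrative only.
import subprocess

def _run_cli_as_result(command, timeout=180):
    """Return (None, stdout) on success or (error_message, None) on failure."""
    try:
        result = subprocess.run(command, check=True, capture_output=True, text=True,
                                encoding="utf-8", errors="replace", timeout=timeout)
        return None, result.stdout
    except subprocess.CalledProcessError as e:
        # Non-zero exit code: flag likely GPU OOM when stderr says so
        stderr = (e.stderr or "").lower()
        if "out of memory" in stderr:
            return f"command failed (OOM, exit code {e.returncode})", None
        return f"command failed (exit code {e.returncode})", None
    except FileNotFoundError:
        # The executable is not on PATH
        return f"executable not found: {command[0]}", None
    except subprocess.TimeoutExpired:
        return f"command timed out after {timeout}s", None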

# === 文字轉語音函數 (使用 gTTS - 基礎備選) ===
def generate_and_play_speech_gtts(text, lang='zh-cn', filename="gtts_output.mp3"): # (No changes from v9.3.24 content)
    if text is None: return "錯誤：沒有文字可轉換。", None  # avoid speaking the literal string "None"
    if not isinstance(text, str): text = str(text)
    if not text.strip(): return "錯誤：沒有文字可轉換。", None
    try:
        print(f"\n--- 正在使用 gTTS 生成語音 (文本: '{text[:50]}...') ---"); tts_instance_gtts = gTTS(text=text, lang=lang, slow=False)
        output_dir = os.path.dirname(filename)
        if output_dir and not os.path.exists(output_dir): os.makedirs(output_dir, exist_ok=True)
        tts_instance_gtts.save(filename); print(f"gTTS 語音已保存為 {filename}"); return None, filename
    except Exception as e_gtts: print(f"gTTS 生成語音時出錯: {e_gtts}"); return f"gTTS 錯誤: {e_gtts}", None

# === 拼接多個音頻片段 (MoviePy) ===
def concatenate_audio_clips_moviepy(audio_paths_list, output_path, silence_duration_ms=200, target_sample_rate=24000):
    if not MOVIEPY_AVAILABLE: return "錯誤: MoviePy 不可用於音頻拼接。", None
    if not audio_paths_list: return "錯誤: 沒有提供音頻路徑列表用於拼接。", None
    try:
        from moviepy.editor import AudioFileClip, concatenate_audioclips
        from moviepy.audio.AudioClip import AudioArrayClip
    except ImportError as e_moviepy_concat:
        return f"導入MoviePy組件失敗(拼接): {e_moviepy_concat}", None

    print(f"--- MoviePy: 準備拼接 {len(audio_paths_list)} 個音頻片段 ---")
    print(f"    輸出至: {output_path}")
    if silence_duration_ms < 0: silence_duration_ms = 0

    clips_to_process = []
    valid_clips_loaded = 0
    try:
        for i, path in enumerate(audio_paths_list):
            if not os.path.exists(path):
                print(f"    【警告】音頻文件未找到，跳過: {path}")
                continue
            try:
                clip = AudioFileClip(path)
                if clip.fps != target_sample_rate:
                    print(f"    重新採樣片段 {i+1} 從 {clip.fps}Hz 到 {target_sample_rate}Hz")
                    clip = clip.set_fps(target_sample_rate)
                # Add the silence gap before every clip except the first valid one, so a missing
                # or unreadable final file never leaves a trailing gap
                if silence_duration_ms > 0 and valid_clips_loaded > 0:
                    num_samples_silence = int((silence_duration_ms / 1000.0) * target_sample_rate)
                    nchannels_for_silence = getattr(clip, 'nchannels', None) or 1
                    silence_array = np.zeros((num_samples_silence, nchannels_for_silence))
                    clips_to_process.append(AudioArrayClip(silence_array, fps=target_sample_rate))
                clips_to_process.append(clip)
                valid_clips_loaded += 1
                print(f"    已加載音頻片段 {i+1}: {path} (時長: {clip.duration:.2f}s, SR: {clip.fps}Hz)")
            except Exception as e_load_clip:
                print(f"    【警告】加載或處理音頻片段 '{path}' 失敗: {e_load_clip}，跳過此片段。")

        if not clips_to_process or valid_clips_loaded == 0:
            return "錯誤: 未能加載任何有效的音頻片段進行拼接。", None

        final_audio = concatenate_audioclips(clips_to_process)
        output_dir_concat = os.path.dirname(output_path)
        if output_dir_concat and not os.path.exists(output_dir_concat):
            os.makedirs(output_dir_concat, exist_ok=True)

        final_audio.write_audiofile(output_path, codec='pcm_s16le', fps=target_sample_rate)
        total_duration_final = final_audio.duration

        # Release file handles now that the output has been written
        for c in clips_to_process:
            try: c.close()
            except Exception: pass
        try: final_audio.close()  # close the concatenated clip as well
        except Exception: pass

        print(f"--- 【成功】音頻已拼接至 {output_path} (總時長: {total_duration_final:.2f}s, SR: {target_sample_rate}Hz) ---")
        return None, output_path

    except Exception as e:
        traceback.print_exc()
        # Release any clips opened before the failure
        for c_err in clips_to_process:
            try:
                c_err.close()
            except Exception:
                pass
        return f"音頻拼接過程中出錯: {e}", None
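
# --- Sketch (hypothetical helper) ---
# concatenate_audio_clips_moviepy above places a block of zeros between clips, sized
# int(silence_ms / 1000 * sample_rate) samples by n_channels. A minimal sketch of the same
# idea with plain NumPy arrays is shown here, assuming every array is already at one sample
# rate and has the same channel count. `_concat_with_silence` is an assumed name; it does
# no resampling.
import numpy as np

def _concat_with_silence(clips, sample_rate, silence_ms=200):
    """Join (n_samples, n_channels) arrays with silence_ms of zeros between them."""
    if not clips:
        raise ValueError("no clips to concatenate")
    n_channels = clips[0].shape[1]
    gap = np.zeros((int(silence_ms / 1000.0 * sample_rate), n_channels), dtype=clips[0].dtype)
    pieces = []
    for i, clip in enumerate(clips):
        pieces.append(clip)
        if i < len(clips) - 1 and gap.shape[0] > 0:
            pieces.append(gap)  # gap only between clips, never after the last one
    return np.concatenate(pieces, axis=0)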

# === 創建視頻函數 (來自靜態圖像序列) ===
def create_video_from_images_and_audio(image_files, audio_file, output_video_path, fps=24, image_duration=None): # (Corrected f-string issues from snippet)
    global MOVIEPY_AVAILABLE
    if not MOVIEPY_AVAILABLE:
        try: import moviepy.editor; MOVIEPY_AVAILABLE = True; print("    (create_video: MoviePy 動態檢查並設置為 True)")
        except ImportError: MOVIEPY_AVAILABLE = False; print("    (create_video: MoviePy 動態檢查失敗，設置為 False)")
    if not MOVIEPY_AVAILABLE: print("【錯誤】創建視頻失敗：MoviePy 不可用或未導入。"); return f"MoviePy不可用", None
    try: from moviepy.editor import ImageClip, AudioFileClip, concatenate_videoclips
    except ImportError as e_moviepy_func: print(f"【錯誤】MoviePy 組件無法在函數內導入: {e_moviepy_func}"); return f"MoviePy導入失敗: {e_moviepy_func}", None

    print(f"--- 開始創建視頻 (靜態圖像序列): {os.path.basename(output_video_path)} ---")
    print(f"  音頻文件: {audio_file}")
    print(f"  圖像文件數量: {len(image_files)}") # Corrected: Removed extra quote and parenthesis

    if not os.path.exists(audio_file): print(f"【錯誤】音頻文件未找到: {audio_file}"); return f"音頻文件未找到", None
    if not image_files: print(f"【錯誤】沒有提供圖像文件。"); return "沒有圖像文件", None
    for i, img_f_path in enumerate(image_files):
        if not os.path.exists(img_f_path): print(f"【錯誤】圖像文件 (索引 {i}) 未找到: {img_f_path}"); return f"圖像文件 {i} 未找到", None # Corrected: Removed extra quote

    video_clip = None; audio_clip_to_use = None; final_video_clip = None; temp_clips = []; temp_rgb_image_paths = []
    try:
        audio_clip_to_use = AudioFileClip(audio_file); audio_duration = audio_clip_to_use.duration; print(f"  音頻時長: {audio_duration:.2f} 秒")
        if image_duration is None or image_duration <= 0: image_duration = audio_duration / len(image_files) if len(image_files) > 0 else audio_duration
        print(f"  每張圖片計劃顯示時長: {image_duration:.2f} 秒")
        for i, img_path in enumerate(image_files):
            try:
                pil_img = PIL.Image.open(img_path)
                if pil_img.mode != "RGB":
                    pil_img = pil_img.convert("RGB"); temp_rgb_path = f"/content/temp_rgb_for_moviepy_{i}_{Path(img_path).stem}.png"
                    pil_img.save(temp_rgb_path); temp_rgb_image_paths.append(temp_rgb_path); path_for_clip = temp_rgb_path
                    print(f"    圖像 '{img_path}' 已轉換為 RGB 並保存至 '{temp_rgb_path}'")
                else: path_for_clip = img_path
                img_clip_obj = ImageClip(path_for_clip).set_duration(image_duration)
                if len(image_files) > 1:
                    fade_time = min(0.5, image_duration / 4)
                    if i == 0: img_clip_obj = img_clip_obj.crossfadein(fade_time)
                    if i == len(image_files) - 1: img_clip_obj = img_clip_obj.crossfadeout(fade_time)
                temp_clips.append(img_clip_obj)
            except Exception as e_img_clip: print(f"    創建圖像片段 '{img_path}' 出錯: {e_img_clip}")
        if not temp_clips: print("【錯誤】未能創建有效圖像片段。"); return "無有效圖像片段", None
        video_clip = concatenate_videoclips(temp_clips, method="compose")
        final_target_duration = min(video_clip.duration, audio_clip_to_use.duration)
        if abs(video_clip.duration - audio_clip_to_use.duration) > 0.1:
            print(f"    視頻軌道 ({video_clip.duration:.2f}s) 與音頻 ({audio_clip_to_use.duration:.2f}s) 時長不完全匹配。將使用最短時長 {final_target_duration:.2f}s。")
            video_clip = video_clip.subclip(0, final_target_duration); audio_clip_to_use = audio_clip_to_use.subclip(0, final_target_duration)
        final_video_clip = video_clip.set_audio(audio_clip_to_use); final_video_clip = final_video_clip.set_duration(final_target_duration)
        print(f"  正在寫入視頻文件 (FPS={fps}, 最終時長約 {final_target_duration:.2f}s)...");
        output_dir_video = os.path.dirname(output_video_path)
        if output_dir_video and not os.path.exists(output_dir_video): os.makedirs(output_dir_video, exist_ok=True)
        final_video_clip.write_videofile(output_video_path, fps=fps, codec="libx264", audio_codec="aac", threads=os.cpu_count() or 2, preset="medium", logger='bar')
        print(f"【成功】視頻已生成: {output_video_path}"); return None, output_video_path
    except Exception as e_create_vid: print(f"創建視頻時嚴重錯誤: {e_create_vid}"); traceback.print_exc(); return f"創建視頻嚴重錯誤: {e_create_vid}", None
    finally:
        # Close every clip defensively so a failing close() cannot skip the temp-file cleanup below
        for clip_obj in [audio_clip_to_use, video_clip, final_video_clip, *temp_clips]:
            if clip_obj is None: continue
            try: clip_obj.close()
            except Exception: pass
        for temp_path in temp_rgb_image_paths:
            if os.path.exists(temp_path):
                try: os.remove(temp_path)
                except Exception as e_remove: print(f"清理臨時RGB文件 {temp_path} 失敗: {e_remove}")
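
# --- Sketch (hypothetical helper) ---
# Timing rules used by create_video_from_images_and_audio above: each image gets
# audio_duration / n_images seconds unless image_duration is given, fades last
# min(0.5, image_duration / 4) seconds (only with more than one image), and the output is
# trimmed to the shorter of the video and audio tracks. `_plan_slideshow_timing` is an
# assumed name; it only computes the numbers and touches no files.
def _plan_slideshow_timing(audio_duration, n_images, image_duration=None):
    """Return (per_image_seconds, fade_seconds, final_duration)."""
    if n_images <= 0:
        raise ValueError("need at least one image")
    if image_duration is None or image_duration <= 0:
        image_duration = audio_duration / n_images
    # Fades are only applied when there is more than one image
    fade = min(0.5, image_duration / 4) if n_images > 1 else 0.0
    # Concatenated clips last image_duration * n_images; cut to whichever track is shorter
    final_duration = min(image_duration * n_images, audio_duration)
    return image_duration, fade, final_duration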
# === Zeroscope 視頻生成函數 ===
def load_zeroscope_pipeline_once(model_id="cerspense/zeroscope_v2_576w"): # (No changes from v9.3.24)
    import torch
    global loaded_zeroscope_pipe, device
    if 'device' not in globals(): device = "cuda" if torch.cuda.is_available() else "cpu"; print(f"load_zeroscope_pipeline_once: 'device' (在函數內初始化) 設置為: {device}")
    if loaded_zeroscope_pipe is not None:
        print(f"--- 使用已緩存的 Zeroscope pipeline ({model_id}) ---")
        try: loaded_zeroscope_pipe.to(device); return loaded_zeroscope_pipe
        except Exception as e_to_device:
            print(f"    將 Zeroscope 移至 {device} 失敗: {e_to_device}")
            if str(device) == "cpu" and hasattr(loaded_zeroscope_pipe, 'device') and str(loaded_zeroscope_pipe.device) != "cpu":
                try: loaded_zeroscope_pipe.to("cpu"); print("    已將Zeroscope移至CPU")
                except: pass
            return loaded_zeroscope_pipe
    print(f"\n--- 首次加載 Zeroscope pipeline: {model_id} 到 {device} ---"); display(HTML("<p><i>Zeroscope 模型加載中 (可能需要幾分鐘)...</i></p>"))
    pipe_zeroscope_local = None
    try:
        from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler # Ensure these are imported
        pipe_zeroscope_local = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16 if str(device) == "cuda" else torch.float32)
        pipe_zeroscope_local.scheduler = DPMSolverMultistepScheduler.from_config(pipe_zeroscope_local.scheduler.config)
        if str(device) == "cuda" and hasattr(pipe_zeroscope_local, "enable_model_cpu_offload"):
            # Model CPU offload moves each sub-model to the GPU only while it runs, so the whole
            # pipeline is not moved to CUDA first (that would load all weights into VRAM at once)
            try: pipe_zeroscope_local.enable_model_cpu_offload(); print("    Zeroscope: 已啟用模型 CPU offloading。")
            except Exception: pipe_zeroscope_local.to(device); print("    Zeroscope: CPU offloading 啟用失敗，改為整體移至 GPU。")
        else:
            pipe_zeroscope_local.to(device)
            if str(device) == "cpu" and hasattr(pipe_zeroscope_local, "enable_attention_slicing"):
                try: pipe_zeroscope_local.enable_attention_slicing(); print("    Zeroscope (CPU): 已啟用 attention slicing。")
                except Exception: pass
        print(f"    Zeroscope pipeline ({model_id}) 加載成功並已配置 scheduler。"); loaded_zeroscope_pipe = pipe_zeroscope_local; return loaded_zeroscope_pipe
    except Exception as e_load_zeroscope: print(f"【錯誤】加載 Zeroscope pipeline ({model_id}) 失敗: {e_load_zeroscope}"); traceback.print_exc(); return None
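
# --- Sketch (hypothetical helper) ---
# load_zeroscope_pipeline_once above caches one pipeline in a global so the expensive
# from_pretrained call runs only once per session; note that its cached branch returns the
# stored pipeline even if a different model_id is passed. A generic version of the pattern,
# keyed by model id and not caching failed loads, is sketched below. `_PIPELINE_CACHE` and
# `_get_or_load` are assumed names; `loader` stands in for a call such as
# DiffusionPipeline.from_pretrained.
_PIPELINE_CACHE = {}

def _get_or_load(model_id, loader):
    """Return the cached object for model_id, calling loader(model_id) only on a cache miss."""
    if model_id not in _PIPELINE_CACHE:
        obj = loader(model_id)
        if obj is None:
            return None  # do not cache failures, so a later call can retry
        _PIPELINE_CACHE[model_id] = obj
    return _PIPELINE_CACHE[model_id]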

def generate_video_with_zeroscope(prompt_text, num_frames=24, height=320, width=576, num_inference_steps=25, guidance_scale=9.0, seed=None): # (No changes from v9.3.24)
    import torch; from PIL import Image # Ensure these are imported
    global device
    if 'device' not in globals(): device = "cuda" if torch.cuda.is_available() else "cpu"; print(f"generate_video_with_zeroscope: 'device' (在函數內初始化) 設置為: {device}")
    pipe = load_zeroscope_pipeline_once()
    if pipe is None: return "Zeroscope pipeline 未能加載", None
    if not prompt_text or not isinstance(prompt_text, str) or len(prompt_text.strip()) == 0: return "錯誤：缺少有效的視頻生成 Prompt。", None
    actual_seed = seed if seed is not None else torch.Generator(device=pipe.device).seed(); generator = torch.Generator(device=pipe.device).manual_seed(actual_seed)
    print(f"\n--- Zeroscope 視頻生成中 (目標設備: {pipe.device}，實際運行可能因 offload 而變化) ---")
    print(f"    Prompt: '{prompt_text[:100]}...'")
    print(f"    參數: frames={num_frames}, size=({height}x{width}), steps={num_inference_steps}, guidance={guidance_scale}, seed={actual_seed}")
    display(HTML("<p><i>AI 動態視頻生成中 (Zeroscope)...</i></p>"))
    video_frames_pil = None
    try:
        with torch.inference_mode():
            video_output = pipe(prompt=prompt_text, num_inference_steps=num_inference_steps, num_frames=num_frames, height=height, width=width, guidance_scale=guidance_scale, generator=generator)
        frames_data = getattr(video_output, 'frames', None)
        # Newer diffusers versions nest the output per prompt (frames[0] is the clip for our single prompt)
        if isinstance(frames_data, list) and len(frames_data) == 1 and (isinstance(frames_data[0], list) or (isinstance(frames_data[0], np.ndarray) and frames_data[0].ndim == 4)):
            frames_data = frames_data[0]
        def _to_uint8(frame):
            # Float frames are assumed to lie in [0, 1]; clip before scaling so values cannot wrap around
            if np.issubdtype(frame.dtype, np.floating): return (np.clip(frame, 0.0, 1.0) * 255).round().astype(np.uint8)
            return frame.astype(np.uint8)
        if isinstance(frames_data, list) and frames_data and all(isinstance(f, Image.Image) for f in frames_data):
            video_frames_pil = frames_data; print(f"    Zeroscope 原始輸出為 PIL 圖像列表 ({len(video_frames_pil)} 幀)。")
        elif isinstance(frames_data, list) and frames_data and all(isinstance(f, np.ndarray) for f in frames_data):
            frames_data = np.stack(frames_data)  # older diffusers: one ndarray per frame
        if video_frames_pil is None and isinstance(frames_data, np.ndarray):
            print(f"    Zeroscope 原始輸出 frames (ndarray) shape: {frames_data.shape}, dtype: {frames_data.dtype}")
            if frames_data.ndim == 5 and frames_data.shape[0] == 1: frames_data = frames_data.squeeze(0)
            if frames_data.ndim == 4 and frames_data.shape[-1] == 3: video_frames_pil = [Image.fromarray(_to_uint8(frame)) for frame in frames_data]
            elif frames_data.ndim == 4 and frames_data.shape[1] == 3: video_frames_pil = [Image.fromarray(_to_uint8(frame.transpose(1, 2, 0))) for frame in frames_data]
            else: print(f"    NumPy 數組格式無法直接轉為 PIL 列表: {frames_data.shape}")
        elif video_frames_pil is None: print(f"    Zeroscope pipeline 輸出 video_output.frames 格式非預期: {type(frames_data)}")
        if video_frames_pil and len(video_frames_pil) > 0: print(f"    【成功】Zeroscope 生成了 {len(video_frames_pil)} 幀 PIL 圖像。"); return None, video_frames_pil # Corrected print
        else: return "未能從 Zeroscope 輸出中提取有效的 PIL 圖像列表。", None
    except Exception as e_zeroscope_gen: print(f"【錯誤】Zeroscope 視頻生成時發生錯誤: {e_zeroscope_gen}"); traceback.print_exc(); return f"Zeroscope 視頻生成錯誤: {e_zeroscope_gen}", None
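
# --- Sketch (hypothetical helper) ---
# Depending on the diffusers version, text-to-video pipelines may hand back frames as a float
# array with shape (batch, frames, H, W, 3), or channels-first. The conversion done inline in
# generate_video_with_zeroscope can be written as one NumPy function that yields uint8 HWC
# frames ready for PIL.Image.fromarray. `_frames_to_uint8_list` is an assumed name, and the
# [0, 1] value range for float input is an assumption.
import numpy as np

def _frames_to_uint8_list(frames):
    arr = np.asarray(frames)
    if arr.ndim == 5 and arr.shape[0] == 1:
        arr = arr[0]  # drop the batch axis for a single prompt
    if arr.ndim != 4:
        raise ValueError(f"expected 4-D frames, got shape {arr.shape}")
    if arr.shape[-1] != 3 and arr.shape[1] == 3:
        arr = arr.transpose(0, 2, 3, 1)  # (F, 3, H, W) -> (F, H, W, 3)
    if np.issubdtype(arr.dtype, np.floating):
        # Clip before scaling so out-of-range values do not wrap around in uint8
        arr = (np.clip(arr, 0.0, 1.0) * 255).round()
    return [frame.astype(np.uint8) for frame in arr]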

# === 使用 Gemini 優化視頻生成 Prompt (用於 Zeroscope) ===
def optimize_video_prompt_with_gemini(original_story_text, # (No changes from v9.3.24)
                                    char_info=None, persona_description=None,
                                    target_video_style_hint="cinematic animation, visually rich, detailed, high quality",
                                    model_name_for_prompt="Zeroscope or Stable Video Diffusion style prompt", max_prompt_length_words=60):
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk: return "錯誤：Gemini SDK 不可用 (optimize_video_prompt)。", None
    global GOOGLE_API_KEY
    if not GOOGLE_API_KEY: return "錯誤：缺少 Google API Key (optimize_video_prompt)。", None
    try: genai_sdk.configure(api_key=GOOGLE_API_KEY)
    except Exception: pass
    if not original_story_text or not original_story_text.strip(): return "錯誤：原始故事文本為空，無法優化 Prompt。", None
    context_parts = [f"Original Story (Chinese):\n{original_story_text}"]
    if char_info and char_info.get('character') and char_info.get('character') != '未知': context_parts.append(f"Related Ancient Character: '{char_info.get('character')}' (Meaning: {char_info.get('meaning', 'N/A')})")
    if persona_description: context_parts.append(f"Narrator Persona: {persona_description.split('.')[0]}")
    full_context = "\n\n".join(context_parts)
    gemini_prompt_for_optimization = f"""
You are an expert prompt engineer for AI text-to-video generation models like {model_name_for_prompt}.
Your task is to take the following context (original story, character information, narrator persona) and distill its essence into a single, highly descriptive, and visually compelling **English prompt**.
This prompt should be suitable for generating a **short, focused video clip (e.g., 3-7 seconds)** that captures a core visual theme or a key symbolic moment from the story, rather than trying to narrate the entire story.
**Context to use:**
{full_context}
**Instructions for the English Prompt:**
1.  **Language:** The output prompt MUST be in English.
2.  **Identify Core Visual Theme:** From the story, extract ONE central visual theme, a symbolic image, or a pivotal moment. Think about what single, short, animated scene could best represent or evoke the story's feeling or main idea.
3.  **Focus and Detail:** Build the prompt around this *single* core theme. Describe it with rich visual details:
    *   **Subject(s):** What is the main focus? (e.g., "an ancient scroll unfurling," "a stylized representation of the character '{char_info.get('character','it') if char_info else 'it'}' morphing," "a hand carving the character onto a surface").
    *   **Action/Movement:** What is happening? Keep it suitable for a short clip (e.g., "slowly revealing text," "glowing with ethereal light," "particles swirling around it," "gentle zoom"). Avoid complex multi-step actions.
    *   **Environment/Setting:** Where is this happening? (e.g., "on a scholar's desk," "against a backdrop of swirling mists," "in a mystical cave").
    *   **Atmosphere/Mood:** What is the feeling? (e.g., "ancient and mysterious," "serene and contemplative," "powerful and reverent").
    *   **Lighting:** (e.g., "dramatic spotlight," "soft, diffused light," "glowing from within").
    *   **Artistic Style:** (e.g., "{target_video_style_hint}", "ink wash painting style," "bronze texture," "glowing runes effect," "abstract representation"). If the character is mentioned, how can its form or meaning be *visually and abstractly* integrated into the style or scene?
    *   **Camera:** (e.g., "close-up shot," "dynamic camera movement," "slow pan").
4.  **Keywords for Video AI:** Use strong keywords. Examples: "animated sequence," "motion graphics," "visual metaphor," "symbolic animation."
5.  **Conciseness and Length:** The prompt should be detailed but aim for a length of around **{max_prompt_length_words} words (approximately 70-77 tokens for CLIP)**. Prioritize impactful visual descriptions.
6.  **Direct Output:** Output ONLY the final English prompt. No preambles, no explanations, no "Prompt:".
Based on the context, generate the optimized English video prompt. """
    print("\n--- 使用 Gemini 優化視頻 Prompt (新指令) ---"); print(f"    原始上下文摘要 (故事前50字): {original_story_text[:50]}...")
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
        response = model.generate_content(gemini_prompt_for_optimization)
        if response.parts:
            optimized_prompt = response.text.strip()
            if optimized_prompt.lower().startswith("prompt:"): optimized_prompt = optimized_prompt[len("prompt:"):].strip()
            optimized_prompt = optimized_prompt.replace("```", "").strip()
            if optimized_prompt.startswith('"') and optimized_prompt.endswith('"'): optimized_prompt = optimized_prompt[1:-1]
            if not optimized_prompt: return "Gemini (optimize_video_prompt) 生成了空的 Prompt。", None
            print(f"    【成功】Gemini 優化後的英文視頻 Prompt:\n    {optimized_prompt}"); return None, optimized_prompt
        else:
            feedback_msg = "Gemini API (optimize_video_prompt) 返回了空的回應。"
            if hasattr(response, 'prompt_feedback'):
                feedback = response.prompt_feedback;
                if hasattr(feedback, 'block_reason'): feedback_msg += f" 原因：{feedback.block_reason}."
                if hasattr(feedback, 'safety_ratings') and feedback.safety_ratings: feedback_msg += f" 安全評級: {feedback.safety_ratings}."
            return feedback_msg, None
    except Exception as e_gemini_optimize: print(f"【錯誤】調用 Gemini 優化視頻 Prompt 時出錯: {e_gemini_optimize}"); traceback.print_exc(); return f"Gemini 優化視頻 Prompt 錯誤: {e_gemini_optimize}", None
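
# --- Sketch (hypothetical helper) ---
# optimize_video_prompt_with_gemini above post-processes the model reply by removing a leading
# "Prompt:" label, code fences and wrapping double quotes. The same cleanup as a standalone
# function is sketched here, plus an optional word cap. `_clean_llm_prompt` and the max_words
# trimming are assumptions: the original only asks the model for roughly
# max_prompt_length_words words and does not enforce that limit.
def _clean_llm_prompt(text, max_words=None):
    cleaned = text.strip()
    if cleaned.lower().startswith("prompt:"):
        cleaned = cleaned[len("prompt:"):].strip()
    cleaned = cleaned.replace("```", "").strip()
    if len(cleaned) >= 2 and cleaned.startswith('"') and cleaned.endswith('"'):
        cleaned = cleaned[1:-1].strip()
    if max_words is not None:
        # Keep the first max_words whitespace-separated words
        cleaned = " ".join(cleaned.split()[:max_words])
    return cleaned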

# === Build several image prompts for one story (returns a list of {"scene_description_chinese", "image_prompt_english"} dicts) ===
def generate_multiple_image_prompts_for_story(story_text, num_images_target=3, char_info=None, persona_style_hint=""):
    # Resolve the Gemini SDK through the helper defined elsewhere in Cell 3
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk:
        print("【錯誤】generate_multiple_image_prompts_for_story: Gemini SDK 不可用。")
        return [{"scene_description_chinese": "錯誤", "image_prompt_english": "Error: Gemini SDK not available"} for _ in range(num_images_target)]

    global GOOGLE_API_KEY # consistent with the other Gemini functions in v9.3.25
    if not GOOGLE_API_KEY:
        print("【錯誤】generate_multiple_image_prompts_for_story: 缺少 Gemini API Key。");
        return [{"scene_description_chinese": "錯誤", "image_prompt_english": "Error: Missing API Key"} for _ in range(num_images_target)]
    try:
        genai_sdk.configure(api_key=GOOGLE_API_KEY)
    except Exception:
        pass # Already configured or error during configure

    prompts_data_list = []
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest') # same model as v9.3.25
        char_info_str = ""
        if char_info and char_info.get('character') and char_info.get('character') != '未知': # also skip the '未知' (unknown) placeholder
            char_info_str = f"古文字參考信息：文字「{char_info.get('character','')}」，含義「{char_info.get('meaning','')}」。"

        # Prompt taken from Cell 3 of Demo_version_9_
        scene_division_prompt = f"""我有一段故事文本，請你將其分解成大約 {num_images_target} 個獨立的、適合用圖像來表達的關鍵場景或視覺概念。{char_info_str} 故事的整體風格基調是：{persona_style_hint if persona_style_hint else "富有想象力和藝術感"}。對於每個分解出來的場景/概念，請完成兩件事：1. 提供一個簡潔的【中文場景描述】。2. 基於這個中文場景描述（以及古文字信息，如果相關），生成一個詳細的、引人入勝的【英文圖像生成Prompt】。這個英文Prompt應該適合直接輸入到Stable Diffusion等AI繪畫模型中。請在Prompt中描述主體、動作、環境、氛圍、光照、構圖和建議的藝術風格。故事文本如下：\"{story_text}\" 請嚴格按照以下JSON格式返回一個包含 {num_images_target} 個（或接近數量）場景對象的列表：[ {{\"scene_description_chinese\": \"場景1的中文描述...\", \"image_prompt_english\": \"Detailed English prompt for scene 1...\"}}, {{\"scene_description_chinese\": \"場景2的中文描述...\", \"image_prompt_english\": \"Detailed English prompt for scene 2...\"}} ] 如果無法很好地分解成 {num_images_target} 個，2到4個也可以。如果故事很短，一個場景也可以。"""

        print(f"\n--- 使用 Gemini 為故事生成多個場景描述和圖像 Prompts (目標 {num_images_target} 張) ---")
        print(f"    故事文本 (前50字符): {story_text[:50]}...")
        response = model.generate_content(scene_division_prompt)

        if response.parts:
            response_text = response.text.strip()
            print(f"    Gemini 返回的多場景 Prompts (原始): \n{response_text}")
            try:
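                json_response_text = response_text
                # Strip a Markdown code fence (```json or a bare ```) if the model wrapped its JSON in one
                if json_response_text.startswith("```"):
                    json_response_text = json_response_text[3:]
                    if json_response_text.lower().startswith("json"): json_response_text = json_response_text[4:]
                if json_response_text.rstrip().endswith("```"): json_response_text = json_response_text.rstrip()[:-3]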
                scenes_data = json.loads(json_response_text)

                if isinstance(scenes_data, list):
                    for scene in scenes_data:
                        if isinstance(scene, dict) and isinstance(scene.get("image_prompt_english"), str) and scene["image_prompt_english"].strip():
                            prompts_data_list.append({
                                "scene_description_chinese": scene.get('scene_description_chinese', 'N/A'),
                                "image_prompt_english": scene["image_prompt_english"]
                            })
                            print(f"      - 場景描述 (中): {scene.get('scene_description_chinese', 'N/A')}")
                            print(f"      - 提取的圖像 Prompt (英): {scene['image_prompt_english']}")
                    if not prompts_data_list: # If list is empty after processing
                        print("    【警告】未能從 Gemini 返回中解析出有效的圖像 Prompts 列表（可能是空的有效JSON列表）。使用後備。")
                        prompts_data_list = [{"scene_description_chinese": "通用場景", "image_prompt_english": f"A creative depiction related to the story: {story_text[:50]}..."}]
                else: # Not a list
                    print(f"    【警告】Gemini 返回的不是預期的列表格式 (而是 {type(scenes_data)}）。使用後備。")
                    prompts_data_list = [{"scene_description_chinese": "通用場景", "image_prompt_english": f"A fantasy scene about the story: {story_text[:50]}..."}]
            except json.JSONDecodeError as e_json:
                print(f"    【警告】解析 Gemini 返回的多場景 Prompts JSON 失敗: {e_json}。使用後備。")
                prompts_data_list = [{"scene_description_chinese": "通用場景", "image_prompt_english": f"An artistic representation of the story: {story_text[:70]}..."}]
        else: # No parts in response
            feedback_msg = "    Gemini (多場景 Prompts) 返回空回應。"
            if hasattr(response, 'prompt_feedback') and response.prompt_feedback and hasattr(response.prompt_feedback, 'block_reason') and response.prompt_feedback.block_reason:
                feedback_msg += f" 原因: {response.prompt_feedback.block_reason}"
            print(feedback_msg + " 使用後備。")
            prompts_data_list = [{"scene_description_chinese": "通用場景", "image_prompt_english": f"Image related to the story: {story_text[:50]}..."}]

    except Exception as e_gen_multi_prompt:
        print(f"    生成多圖像 Prompts 時出錯: {e_gen_multi_prompt}")
        traceback.print_exc()
        prompts_data_list = [{"scene_description_chinese": "錯誤後備", "image_prompt_english": f"Default prompt due to error: {story_text[:50]}..."}]

    # Ensure we return *something*, even if it's just one fallback prompt if everything failed.
    if not prompts_data_list:
        prompts_data_list = [{"scene_description_chinese": "最終後備圖像", "image_prompt_english": "A general artistic image representing the story's theme."}]

    # Trim or pad the result to num_images_target entries (same behavior as Demo_version_9)
    if len(prompts_data_list) > num_images_target:
        prompts_data_list = prompts_data_list[:num_images_target]
    elif 0 < len(prompts_data_list) < num_images_target: # Not enough scenes: repeat the last one
        # Build a separate dict per padded entry; `[d] * n` would repeat one shared object
        last_prompt_data = prompts_data_list[-1]
        prompts_data_list.extend([dict(last_prompt_data) for _ in range(num_images_target - len(prompts_data_list))])
    elif not prompts_data_list and num_images_target > 0: # If list is empty and target > 0
        prompts_data_list = [{"scene_description_chinese": f"後備場景 {i+1}", "image_prompt_english": "A fallback artistic image."} for i in range(num_images_target)]

    return prompts_data_list
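
# --- Sketch (hypothetical helper) ---
# generate_multiple_image_prompts_for_story above strips a ```json fence, parses the list,
# keeps entries with a non-empty "image_prompt_english", then truncates or pads the list to
# num_images_target. This minimal sketch combines those steps for an already-received reply
# string and pads with independent dict copies. `_parse_scene_prompts` and its
# `fallback_prompt` parameter are assumed names; no Gemini call is made here.
import json

def _parse_scene_prompts(reply_text, target, fallback_prompt):
    text = reply_text.strip()
    if text.startswith("```"):
        text = text[3:]
        if text.lower().startswith("json"):
            text = text[4:]  # drop the language tag of a ```json fence
    if text.rstrip().endswith("```"):
        text = text.rstrip()[:-3]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = []
    scenes = [
        {"scene_description_chinese": s.get("scene_description_chinese", "N/A"),
         "image_prompt_english": s["image_prompt_english"].strip()}
        for s in (data if isinstance(data, list) else [])
        if isinstance(s, dict) and isinstance(s.get("image_prompt_english"), str)
        and s["image_prompt_english"].strip()
    ]
    if not scenes:
        scenes = [{"scene_description_chinese": "N/A", "image_prompt_english": fallback_prompt}]
    scenes = scenes[:target]
    # Pad with fresh copies of the last scene rather than one shared object
    scenes += [dict(scenes[-1]) for _ in range(target - len(scenes))]
    return scenes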

# === 合併多個視頻片段並配上一個完整音頻的函數 ===
def combine_video_clips_and_set_audio_moviepy(video_file_paths_list, # (No changes from v9.3.24)
                                            full_audio_path, output_video_path, target_fps=24):
    global MOVIEPY_AVAILABLE
    if not MOVIEPY_AVAILABLE:
        try: import moviepy.editor; MOVIEPY_AVAILABLE = True
        except ImportError: MOVIEPY_AVAILABLE = False
    if not MOVIEPY_AVAILABLE: print("【錯誤】MoviePy 不可用，無法合併多片段視頻。"); return "MoviePy不可用", None
    if not video_file_paths_list: return "錯誤：視頻文件路徑列表為空。", None
    if not os.path.exists(full_audio_path): return f"完整音頻文件未找到: {full_audio_path}", None
    existing_video_paths = [p for p in video_file_paths_list if os.path.exists(p)]
    if not existing_video_paths: return "錯誤：所有提供的視頻片段文件均未找到。", None
    if len(existing_video_paths) < len(video_file_paths_list): print(f"【警告】部分視頻片段文件未找到。將使用 {len(existing_video_paths)} 個有效片段。")
    print(f"\n--- MoviePy 拼接 {len(existing_video_paths)} 個視頻片段並配音 ---"); print(f"    完整音頻: {full_audio_path}"); print(f"    輸出路徑: {output_video_path}")
    all_video_clips_objects = []; stitched_video_track_internal = None; final_stitched_video_with_correct_duration = None; full_audio_clip_object = None; final_video_output_clip_obj = None; final_video_segments_for_composite = []
    try:
        from moviepy.editor import VideoFileClip, AudioFileClip, concatenate_videoclips, CompositeVideoClip; from moviepy.video.fx.all import resize # Ensure resize is imported
        full_audio_clip_object = AudioFileClip(full_audio_path); total_audio_duration = full_audio_clip_object.duration; print(f"    完整音頻總時長: {total_audio_duration:.2f} 秒")
        if total_audio_duration <=0: return "錯誤：完整音頻時長為0或無效。", None
        common_width, common_height = None, None
        for i, vid_path in enumerate(existing_video_paths):
            try:
                clip = VideoFileClip(vid_path, fps_source="fps")
                if common_width is None: common_width, common_height = clip.size; print(f"    使用視頻片段的通用尺寸: {common_width}x{common_height}")
                if clip.size != (common_width, common_height) and common_width is not None: print(f"    片段 {i+1} 尺寸 {clip.size} 與通用尺寸不符，將調整大小..."); clip = resize(clip, newsize=(common_width, common_height))
                all_video_clips_objects.append(clip); print(f"    已加載視頻片段 {i+1}: {vid_path}, 時長: {clip.duration:.2f}s, 尺寸: {clip.size}")
            except Exception as e_load_vclip: print(f"    【警告】加載視頻片段 '{vid_path}' 失敗: {e_load_vclip}，跳過此片段。")
        if not all_video_clips_objects: return "未能成功加載任何有效的視頻片段對象。", None
        current_video_time = 0; video_idx_to_loop = 0
        while current_video_time < total_audio_duration:
            if not all_video_clips_objects: break
            clip_to_add_raw = all_video_clips_objects[video_idx_to_loop % len(all_video_clips_objects)]; clip_to_add = clip_to_add_raw.copy()
            segment_start_time = current_video_time; segment_duration = clip_to_add.duration
            if segment_start_time + segment_duration > total_audio_duration: segment_duration = total_audio_duration - segment_start_time
            if segment_duration <= 0: break
            clip_to_add = clip_to_add.subclip(0, segment_duration).set_start(segment_start_time); final_video_segments_for_composite.append(clip_to_add)
            current_video_time += segment_duration; video_idx_to_loop += 1
        if not final_video_segments_for_composite: return "未能準備任何視頻片段進行最終合成（可能是音頻時長問題）。", None
        final_size = (common_width, common_height) if common_width and common_height else all_video_clips_objects[0].size
        final_stitched_video_with_correct_duration = CompositeVideoClip(final_video_segments_for_composite, size=final_size).set_duration(total_audio_duration)
        print(f"    視頻軌道已通過循環/裁剪調整至目標時長: {final_stitched_video_with_correct_duration.duration:.2f}s")
        final_video_output_clip_obj = final_stitched_video_with_correct_duration.set_audio(full_audio_clip_object)
        print(f"    最終視頻合成中 (目標FPS: {target_fps})..."); os.makedirs(os.path.dirname(output_video_path), exist_ok=True)
        final_video_output_clip_obj.write_videofile(output_video_path, fps=target_fps, codec="libx264", audio_codec="aac", temp_audiofile=f'temp-audio-multiclip-{int(time.time())}.m4a', remove_temp=True, threads=os.cpu_count() or 2, preset="medium", logger='bar')
        print(f"    【成功】多片段拼接並配音的視頻已保存到: {output_video_path}"); return None, output_video_path # Corrected print
    except Exception as e_combine_multi: print(f"【錯誤】使用 MoviePy 拼接多片段視頻並配音時發生錯誤: {e_combine_multi}"); traceback.print_exc(); return f"MoviePy多片段合併錯誤: {e_combine_multi}", None
    finally:
        # Best-effort cleanup of every MoviePy object opened above. All of these names are
        # bound before the try block, so no locals() checks are needed; close() errors are ignored.
        clips_to_close = [full_audio_clip_object]
        clips_to_close += all_video_clips_objects              # original clips loaded from files
        clips_to_close += final_video_segments_for_composite   # copies / subclips of those clips
        clips_to_close += [stitched_video_track_internal,      # unused in this logic, closed if ever set
                           final_stitched_video_with_correct_duration,
                           final_video_output_clip_obj]
        for clip_obj in clips_to_close:
            if clip_obj is None:
                continue
            try:
                clip_obj.close()
            except Exception:
                pass
# Cell 3 - 新增 generate_dialogue_story_with_personas

# === 【新增】生成【多角色對話式故事】文本函數 ===
def generate_dialogue_story_with_personas(char_info,
                                          persona_A_description,
                                          persona_B_description,
                                          speaker_A_display_name,
                                          speaker_B_display_name,
                                          target_lang="中文",
                                          num_dialogue_turns=3, # 故事可能需要更多輪次
                                          story_theme_hint=""): # 可以給一些故事主題提示
    genai_sdk = _ensure_gemini_available_in_func()
    if not genai_sdk:
        return "錯誤：Gemini SDK 不可用 (dialogue story generation)。", None, None, None
    global GOOGLE_API_KEY
    # ... (API Key 和參數檢查，與 generate_dialogue_joke_with_personas 類似) ...
    if not char_info or char_info.get('character', '未知') in ['未知', '提取失敗']: return "錯誤：缺少古文字信息。", None, None, None
    if not persona_A_description or not persona_B_description: return "錯誤：缺少人設描述。", None, None, None
    if not speaker_A_display_name or not speaker_B_display_name: return "錯誤：缺少角色名。", None, None, None

    character = char_info.get('character')
    meaning = char_info.get('meaning', '不詳')
    char_type = char_info.get('type', '古文字')

    speaker_A_script_tag = speaker_A_display_name.split(' ')[0]
    speaker_B_script_tag = speaker_B_display_name.split(' ')[0]
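    if target_lang == 'en':
        # Map Chinese display names to English script tags. VOICE_OPTIONS spells the villager
        # "minecraft 村民" (with a space), so both spellings are listed.
        name_map_en = { "川普": "Trump", "歐巴馬": "Obama", "minecraft村民": "Villager", "minecraft 村民": "Villager" }
        speaker_A_script_tag = name_map_en.get(speaker_A_display_name, speaker_A_script_tag)
        speaker_B_script_tag = name_map_en.get(speaker_B_display_name, speaker_B_script_tag)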

    lang_for_prompt = "英文" if target_lang == "en" else "中文"
    output_instruction_dialogue = f"請直接輸出【{lang_for_prompt}】的對話故事腳本。"
    if target_lang == 'en':
        output_instruction_dialogue = "Please directly output the dialogue story script in ENGLISH."

    prompt = f"""
你是一位富有想象力的故事編劇。請圍繞以下古文字信息，創作一個簡短的、包含兩個角色之間多輪對話的【{lang_for_prompt}故事腳本】。
故事應該有一個簡單的開頭、發展和結局（或者一個懸念）。
{story_theme_hint}

古文字信息 (Ancient Character Information):
- 文字 (Character): {character}
- 類型 (Type): {char_type}
- 基本含義 (Basic Meaning): {meaning}

對話角色A:
- 在腳本中請使用此標籤 (Use this tag in script): {speaker_A_script_tag}
- 人設 (Persona): {persona_A_description}

對話角色B:
- 在腳本中請使用此標籤 (Use this tag in script): {speaker_B_script_tag}
- 人設 (Persona): {persona_B_description}

任務要求:
1. 故事內容應巧妙地與上述古文字相關，可以圍繞其起源、發現、意義或引發的事件展開。
2. 對話應在角色A ({speaker_A_script_tag}) 和角色B ({speaker_B_script_tag}) 之間進行。
3. 每個角色大約說 {num_dialogue_turns} 輪話。
4. 確保對話符合各自的人設風格，並共同推動故事情節發展。
5. 請嚴格按照以下格式輸出故事腳本（每行一個角色的話，並以【角色腳本標籤名】和冒號開頭）：
   {speaker_A_script_tag}: [角色A說的話]
   {speaker_B_script_tag}: [角色B說的話]
   ...
6. {output_instruction_dialogue} 不要包含任何額外的開場白、解釋、標題、Markdown標記或結尾語。
"""
    print(f"\n--- 使用 Gemini 為 '{character}' 生成對話式【故事】 ({speaker_A_script_tag} vs {speaker_B_script_tag}, 語言: {lang_for_prompt}) ---")
    try:
        model = genai_sdk.GenerativeModel('gemini-1.5-pro-latest')
        response = model.generate_content(prompt)
        if response.parts:
            dialogue_script_text = response.text.strip()
            # Strip chatty preambles (same cleanup logic as generate_dialogue_joke_with_personas)
            prefixes_to_remove_patterns = [
                r"^\s*好的，這是一個.*腳本：\s*", r"^\s*這是一個由.*?演繹的.*：\s*",
                r"^\s*腳本：\s*", r"^\s*故事腳本：\s*", r"^\s*Script:\s*", r"^\s*Story Script:\s*",
                r"^\s*Okay, here's the dialogue story script:\s*"
            ]  # more patterns can be added
            for pat in prefixes_to_remove_patterns:
                dialogue_script_text = re.sub(pat, "", dialogue_script_text, flags=re.IGNORECASE | re.MULTILINE).strip()
            dialogue_script_text = dialogue_script_text.replace("**", "")

            # Line-level cleanup (same as the joke version): keep only "Tag: text" lines of the two speakers
            cleaned_lines = []
            potential_dialogue_lines = dialogue_script_text.splitlines()
            pattern_for_parsing = re.compile(rf"^\s*(?:{re.escape(speaker_A_script_tag)}|{re.escape(speaker_B_script_tag)})\s*[:：].*", re.IGNORECASE)
            for line in potential_dialogue_lines:
                if pattern_for_parsing.match(line.strip()): cleaned_lines.append(line)

            dialogue_script_text_original_cleaned = dialogue_script_text
            if cleaned_lines:
                dialogue_script_text_after_aggressive_clean = "\n".join(cleaned_lines).strip()  # real newlines, so the script can be split into lines again
                if not dialogue_script_text_after_aggressive_clean and dialogue_script_text_original_cleaned:
                    print("    【警告】對話故事腳本積極清理後結果為空，將回退。")
                elif len(dialogue_script_text_after_aggressive_clean) < len(dialogue_script_text_original_cleaned) * 0.5 and len(potential_dialogue_lines) > 3 and len(dialogue_script_text_after_aggressive_clean) > 0 :
                    print("    【警告】對話故事腳本積極清理後文本大幅縮短。"); dialogue_script_text = dialogue_script_text_after_aggressive_clean
                elif dialogue_script_text_after_aggressive_clean: dialogue_script_text = dialogue_script_text_after_aggressive_clean
            elif not dialogue_script_text_original_cleaned : dialogue_script_text = ""

            print(f"    Gemini 生成的對話式【故事】腳本 (已處理): \n{dialogue_script_text}")
            return None, dialogue_script_text, speaker_A_script_tag, speaker_B_script_tag
        else:
            # ... (錯誤處理與 joke 版本相同)
            return "Gemini API (對話故事生成) 返回空回應。", None, None, None
    except Exception as e_gemini_dialogue_story:
        # ... (錯誤處理與 joke 版本相同)
        return f"調用 Gemini 生成對話式【故事】時出錯: {e_gemini_dialogue_story}", None, None, None

print("Cell 3 所有函數定義（或嘗試定義）完成 (v9.3.25)。")
if 'get_package_version' in globals() and callable(get_package_version): # Ensure get_package_version is callable
    if 'DIFFUSERS_AVAILABLE' in globals() and DIFFUSERS_AVAILABLE: print(f"Diffusers 可用狀態: True (版本: {get_package_version('diffusers')})")
    else: print("Diffusers 可用狀態: False")
    if 'GEMINI_AVAILABLE' in globals() and GEMINI_AVAILABLE: print(f"Gemini 可用狀態: True (版本: {get_package_version('google-generativeai')})")
    else: print("Gemini 可用狀態: False")
    if 'MOVIEPY_AVAILABLE' in globals() and MOVIEPY_AVAILABLE: print(f"MoviePy 可用狀態: True (版本: {get_package_version('moviepy')})")
    else: print("MoviePy 可用狀態: False")
else:
    print("get_package_version 函數未定義或不可調用，無法打印部分庫版本。")

if shutil.which("f5-tts_infer-cli"): print("F5-TTS CLI 工具已找到。")
else: print("【警告】F5-TTS CLI 工具未在 PATH 中找到。F5-TTS 功能將無法使用。")
print("--- Cell 3 執行完畢 (v9.3.25) ---")

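## Cell 4: Initialize Global State Variables

**Technology and Model Overview:**

*   This cell only initializes Python variables; it does not directly involve any AI model or complex technique.
*   If `device` is not already defined, it checks `torch.cuda.is_available()` to set it.

**Functionality:**

This cell initializes or resets the global state variables that may be shared or passed between processing flows, especially across loop iterations or repeated runs of Cells 5, 6 and 7. They include:
1.  `current_char_info`: the parsed recognition result for the ancient character currently being processed.
2.  `last_recognition_text`: the raw text returned by the most recent Gemini recognition call.
3.  `uploaded_image_pil`, `image_filename`, `image_data`: information about the uploaded image.
4.  `loaded_pipeline`, `loaded_pipeline_name`: the Diffusers model cache variables. Their main initialization is in Cell 2; here they are only set to `None` if they do not exist yet, so an already-loaded pipeline is kept.
5.  The global `device` variable, which is likewise set only if it is not already defined.

The purpose of this cell is to give the main processing loop that follows (e.g. Cell 5) a clean starting state, especially when the user runs the main loop several times on different images. The per-image variables (items 1-3) are always reset to `None`, while the pipeline cache and `device` are only filled in when missing. In more tightly integrated notebooks, this logic may instead be merged into the start of the main processing cell's loop.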
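The reset-versus-keep behaviour described above can be sketched as a single helper. This is a hypothetical restatement, not code from the notebook: `init_loop_state`, `detect_device`, `PER_IMAGE_DEFAULTS` and `CACHED_NAMES` are illustrative names. The namespace is passed in explicitly so the helper can be tested on a plain dict.

```python
"""Hypothetical sketch of Cell 4's state initialization (not the notebook's own code)."""
from typing import Any, Dict

# Per-image state: always reset before a new image is processed.
PER_IMAGE_DEFAULTS: Dict[str, Any] = {
    "current_char_info": None,
    "last_recognition_text": None,
    "uploaded_image_pil": None,
    "image_filename": None,
    "image_data": None,
}

# Expensive caches: created only if missing, so an already-loaded pipeline survives.
CACHED_NAMES = ("loaded_pipeline", "loaded_pipeline_name")


def detect_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu" (also when torch is missing)."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except Exception:
        return "cpu"


def init_loop_state(namespace: Dict[str, Any]) -> None:
    """Reset per-image variables and fill in missing cache/device entries in `namespace`."""
    for name, default in PER_IMAGE_DEFAULTS.items():
        namespace[name] = default
    for name in CACHED_NAMES:
        namespace.setdefault(name, None)
    if "device" not in namespace:
        namespace["device"] = detect_device()
```

In a notebook cell this would be called as `init_loop_state(globals())`; `dict.setdefault` gives the same "define only if missing" behaviour as the `if ... not in globals()` checks in the cell below.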

In [None]:
# Cell 4: 初始化狀態變數 (v9.2 - 移除本地數據集路徑)

import os
try:
    import torch
except ImportError:
    print("警告: Cell 4 無法導入 torch，可能影響 device 初始化。")

print("--- 正在初始化狀態變數 (F5-TTS 流程) ---")
current_char_info = None
last_recognition_text = None
uploaded_image_pil = None
image_filename = None
image_data = None

if 'loaded_pipeline' not in globals(): loaded_pipeline = None
if 'loaded_pipeline_name' not in globals(): loaded_pipeline_name = None

if 'device' not in globals():
    try:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except Exception:  # e.g. NameError if the torch import above failed
        device = "cpu"; print("警告: 無法確定 GPU 狀態，device 默認為 cpu。")
print(f"Cell 4 確認將使用的計算設備: {device}")

# [Removed] the tts_dataset_path / dataset_source_path setup, because the F5-TTS reference audio is now taken from a Hugging Face dataset
# print(f"F5-TTS 參考音頻數據集路徑: {tts_dataset_path}")

print("\n狀態變數配置完成。 Hugging Face 數據集應在 Cell 2 中加載。")

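## Cell 5: Main Processing Loop - Ancient Character Exploration and Multimedia Creation

**Technology and Model Overview:**

This cell is the project's central interaction and processing hub. It orchestrates the functions defined in Cell 3 into one complete multimedia generation flow; the technologies are almost the same as those listed for Cell 3, but here they are actually applied and integrated:
*   **User interaction:** Python's `input()` collects the user's choices.
*   **Flow control:** a `while True` loop makes the session repeatable, and `if/elif/else` handles the branching.
*   **Language selection:** the user chooses the working language (Chinese or English).
*   **Voice style selection:** a voice-selection mechanism driven by the `VOICE_OPTIONS` dictionary.
*   **Core function calls:** extensive use of the Cell 3 functions, including:
    *   `upload_image`, `recognize_ancient_char_with_gemini`, `parse_recognition_result`
    *   `generate_story_with_persona` (for the description), `generate_character_joke_with_persona` (single-character joke), `generate_dialogue_joke_with_personas` (multi-character joke); all support multiple languages.
    *   `evaluate_and_correct_text_with_gemini` (optional text refinement).
    *   `clone_voice_f5tts` (F5-TTS) and `generate_and_play_speech_gtts` (gTTS), with multilingual support and a gTTS fallback when F5-TTS is unavailable.
    *   `parse_dialogue_script` (parsing the multi-character script).
    *   `concatenate_audio_clips_moviepy` (concatenating the multi-character speech).
    *   `generate_final_image_prompt` (image prompts for the description and the jokes).
    *   `generate_image_with_diffusers` (still image generation).
    *   `create_video_from_images_and_audio` (video from still images plus audio).
*   **File system operations:** creates output folders and saves the generated audio, image and video files.
*   **GPU memory management:** tries to clear the CUDA cache before and after calls to large models.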
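
The voice selection at the top of the Cell 5 loop boils down to one decision per speaker and language: use F5-TTS only when the chosen voice is an F5-TTS voice and its reference clip for that language is configured and present, otherwise fall back to gTTS. The sketch below restates that rule as a hypothetical `choose_tts_backend` function (not part of the notebook); unlike the notebook, it only checks that the reference file exists and does not open it with `soundfile`.

```python
import os
from typing import Any, Dict, Optional, Tuple


def choose_tts_backend(
    voice_config: Dict[str, Any], lang: str
) -> Tuple[str, Optional[str], Optional[str], str]:
    """Return (backend, ref_audio_path, ref_text, gtts_lang) for one speaker.

    backend is "f5tts" only when the voice is an F5-TTS voice AND a reference clip
    for `lang` is configured and exists on disk; otherwise it is "gtts".
    """
    gtts_lang = "zh-cn" if lang == "zh" else "en"  # same mapping as target_gtts_lang_param
    if voice_config.get("type") == "f5tts":
        lang_config = voice_config.get(lang) or {}
        ref_path = lang_config.get("ref_audio_path")
        if ref_path and os.path.exists(ref_path):
            return "f5tts", ref_path, lang_config.get("ref_text", ""), gtts_lang
    return "gtts", None, None, gtts_lang
```

For example, with the `VOICE_OPTIONS` entries defined in Cell 5, `choose_tts_backend(VOICE_OPTIONS["4"], "en")` returns `("gtts", None, None, "en")`, because that entry's `type` is `"gtts"`.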

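**Functionality:**

Cell 5 provides an interactive end-to-end flow that takes the user from an input image of an ancient character to multimedia output. Its main steps are:
1.  **Language selection:** the user picks the language for this round (Chinese or English), which governs the subsequent text generation and speech synthesis.
2.  **Main voice style selection:** the user picks a main character voice (e.g. "川普" (Trump), "歐巴馬" (Obama), "minecraft 村民" (the Minecraft villager) or gTTS). This voice is used for the "description" flow, the "single-character joke" flow, and as the first speaker of the "multi-character joke". Entering `q` at this prompt exits the program.
3.  **Image upload and recognition:** the user uploads an image containing an ancient character; the system calls the Gemini API to recognize it and parses out the character, its type, pinyin, meaning and other details.
4.  **"Description" flow:**
    *   From the recognition result, Gemini generates a description of the character (in Chinese or English).
    *   The selected main voice speaks the description (F5-TTS or gTTS, Chinese or English).
    *   Gemini generates a matching image prompt (in English) for the description.
    *   Stable Diffusion generates a still image related to the description.
    *   **(New)** The description audio and image are combined into a short video.
5.  **"Single-character joke" flow (optional):**
    *   Asks whether the user wants to hear a joke about this character told by the main character.
    *   If so, Gemini generates a single-character joke in the main character's persona (Chinese or English).
    *   The joke is voiced with the main character's voice.
    *   A related still image is generated for the joke.
    *   **(New)** The joke audio and image are combined into a short video.
6.  **"Multi-character dialogue joke" flow (optional):**
    *   Asks whether the user wants to hear a multi-character dialogue joke.
    *   If so, guides the user through choosing the second speaker.
    *   Gemini generates a multi-turn dialogue joke script about the character between the two selected speakers (main vs. second), in Chinese or English.
    *   The script is parsed, and every line of every turn is voiced separately with the corresponding speaker's TTS (F5-TTS or gTTS, Chinese or English).
    *   These individual clips are concatenated into one complete dialogue-joke audio track.
    *   A related still image is generated for the whole dialogue joke (or a summary of it).
    *   **(New)** The full dialogue audio track and the image are combined into a short video.
7.  **Loop and exit:** when a round finishes, the user is asked whether to start a new session or exit.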
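
Step 6 depends on splitting the generated script into per-speaker lines. The notebook's `parse_dialogue_script` is defined elsewhere and not shown in this section, so the following is only a hypothetical stand-in (`parse_dialogue_lines` is an illustrative name). It accepts the `Tag: text` format the Gemini prompts ask for, with either an ASCII `:` or a full-width `：`, matches tags case-insensitively, and drops lines that belong to neither speaker.

```python
import re
from typing import List, Tuple


def parse_dialogue_lines(script: str, tag_a: str, tag_b: str) -> List[Tuple[str, str]]:
    """Split a 'Tag: text' script into (canonical_tag, text) turns.

    Accepts ASCII ':' and full-width '：', matches tags case-insensitively,
    and skips lines that belong to neither speaker or carry no text.
    """
    pattern = re.compile(
        rf"^\s*({re.escape(tag_a)}|{re.escape(tag_b)})\s*[:：]\s*(.*)$", re.IGNORECASE
    )
    turns: List[Tuple[str, str]] = []
    for line in script.splitlines():
        match = pattern.match(line)
        if not match:
            continue  # preamble, headings or stray text from the model
        # Normalize the speaker to the caller's tag, whatever case the model used.
        speaker = tag_a if match.group(1).lower() == tag_a.lower() else tag_b
        text = match.group(2).strip()
        if text:
            turns.append((speaker, text))
    return turns
```

Returning the canonical tag (rather than whatever capitalization the model produced) lets the caller pick each speaker's voice with a plain dictionary lookup.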

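The goal of this cell is to chain the project's core AI features into a coherent, creative user experience whose end product is multimedia content combining speech with images (or video).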
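
Concatenating the per-line clips into one track is what the notebook delegates to `concatenate_audio_clips_moviepy`. As a swapped-in alternative, not the notebook's method, WAV files that share one format can be joined with the standard-library `wave` module alone. The sketch assumes every input is uncompressed PCM WAV with the same channel count, sample width and sample rate; gTTS produces MP3 data, so its output would need converting first. `concat_wavs` and its `gap_seconds` parameter are hypothetical.

```python
import wave
from typing import List, Sequence


def concat_wavs(input_paths: Sequence[str], output_path: str, gap_seconds: float = 0.3) -> int:
    """Join PCM WAV files that share one format, with silence between consecutive clips.

    Returns the number of frames written; raises ValueError on empty input or a format mismatch.
    """
    if not input_paths:
        raise ValueError("no input files")
    params = None
    chunks: List[bytes] = []
    for path in input_paths:
        with wave.open(path, "rb") as reader:
            p = reader.getparams()
            fmt = (p.nchannels, p.sampwidth, p.framerate)
            if params is None:
                params = p
            elif fmt != (params.nchannels, params.sampwidth, params.framerate):
                raise ValueError(f"format mismatch in {path}: {fmt}")
            chunks.append(reader.readframes(reader.getnframes()))

    frame_size = params.nchannels * params.sampwidth
    # 8-bit PCM is unsigned (silence = 0x80); wider sample widths are signed (silence = 0x00).
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silence_byte * (int(round(gap_seconds * params.framerate)) * frame_size)

    with wave.open(output_path, "wb") as writer:
        writer.setnchannels(params.nchannels)
        writer.setsampwidth(params.sampwidth)
        writer.setframerate(params.framerate)
        for index, chunk in enumerate(chunks):
            if index > 0:
                writer.writeframes(silence)  # pause between two speakers' lines
            writer.writeframes(chunk)

    total_bytes = sum(len(c) for c in chunks) + len(silence) * (len(chunks) - 1)
    return total_bytes // frame_size
```

Refusing mismatched inputs, instead of silently resampling, keeps the sketch dependency-free; a MoviePy- or ffmpeg-based path such as the notebook's is the better fit when clips come from different TTS engines.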

In [None]:
# Cell 5: 主處理循環 (v9.3.43 - 解決多角色解析與圖像/視頻生成鏈路問題)
import os
import IPython
from IPython.display import display, Markdown, HTML, Audio
import shutil
import numpy as np
import soundfile as sf
import PIL.Image
import io
import json
import time
import traceback
import re

if 'device' not in globals():
    try:
        if 'torch' not in globals(): import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except Exception: device = "cpu"
    print(f"Cell 5: 'device' (重新)設置為: {device}")

DRIVE_AUDIO_BASE_PATH = "/content/drive/MyDrive/Voice data/"
VOICE_OPTIONS = {
    "1": {
        "name": "川普",
        "persona_description": "你現在扮演唐納德·特朗普，說話風格誇張、自信，常用標誌性口頭禪，並時不時強調自己的偉大和美國的偉大。",
        "type": "f5tts",
        "zh": { "ref_audio_path": os.path.join(DRIVE_AUDIO_BASE_PATH, "Donald Trump chinese.wav"), "ref_text": "大家好，我是小明。今天的天氣真好，陽光明媚，適合出門散步。希望你們也有個愉快的一天！" },
        "en": { "ref_audio_path": os.path.join(DRIVE_AUDIO_BASE_PATH, "Donald Trump english.wav"), "ref_text": "Technology is changing the way we live and communicate. Every day, we are surrounded by innovations that make our lives easier and more connected. This is only the beginning of what’s possible." } # 使用您已驗證的英文參考
    },
    "2": {
        "name": "minecraft 村民",
        "persona_description": "嗯？嗯！你現在扮演一個《我的世界》遊戲中的村民，說話時會發出'嗯？'、'嗯！'的聲音，語氣好奇、友善，有時帶點呆萌和重複。",
        "type": "f5tts",
        "zh": { "ref_audio_path": os.path.join(DRIVE_AUDIO_BASE_PATH, "Minecraft Villager chinese.wav"), "ref_text": "大家好，我是小明。今天的天氣真好，陽光明媚，適合出門散步。希望你們也有個愉快的一天！" },
        "en": { "ref_audio_path": os.path.join(DRIVE_AUDIO_BASE_PATH, "Minecraft Villager english.wav"), "ref_text": "Technology is changing the way we live and communicate. Every day, we are surrounded by innovations that make our lives easier and more connected. This is only the beginning of what’s possible." } # 【請替換為您的村民英文參考】
    },
    "3": {
        "name": "歐巴馬",
        "persona_description": "你現在扮演貝拉克·奧巴馬，說話風格沉穩、富有邏輯和感染力，常用排比和富有哲理的句子，語氣溫和且堅定。",
        "type": "f5tts",
        "zh": { "ref_audio_path": os.path.join(DRIVE_AUDIO_BASE_PATH, "Barack Obama chinese.wav"), "ref_text": "大家好，我是小明。今天的天氣真好，陽光明媚，適合出門散步。希望你們也有個愉快的一天！" },
        "en": { "ref_audio_path": os.path.join(DRIVE_AUDIO_BASE_PATH, "Barack Obama english.wav"), "ref_text": "Technology is changing the way we live and communicate. Every day, we are surrounded by innovations that make our lives easier and more connected. This is only the beginning of what’s possible." } # 【請替換為您的歐巴馬英文參考】
    },
    "4": { "name": "gTTS (預設音色)", "persona_description": "你是一個友好、知識淵博的AI助手...", "type": "gtts" }
}
CURRENT_LANGUAGE = "zh"

while True:
    print("\n" + "="*70); print("      ✨ 古文字探索、多角色故事/笑話、語音合成與圖像創作 ✨"); print("="*70)
    print("\n--- 步驟 -1: 請選擇本輪操作語言 ---")
    prev_lang_display = '中文' if CURRENT_LANGUAGE == 'zh' else 'English'
    lang_choice_input = input(f"  請選擇語言 (1: 中文, 2: English, Enter 使用 '{prev_lang_display}'): ").strip()
    if lang_choice_input == "1": CURRENT_LANGUAGE = "zh"
    elif lang_choice_input == "2": CURRENT_LANGUAGE = "en"

    current_lang_display = '中文' if CURRENT_LANGUAGE == 'zh' else 'English'
    print(f"--- 當前操作語言: {current_lang_display} ---")
    target_gtts_lang_param = 'zh-cn' if CURRENT_LANGUAGE == 'zh' else 'en'

    print("\n--- 步驟 0: 請選擇本輪【主要講述者】的聲音風格 ---")
    for key, option in VOICE_OPTIONS.items(): print(f"  {key}: {option['name']}")
    print("  q: 退出程序")
    voice_choice_main_key = input("請輸入選項數字 (例如 1) 或 'q' 退出: ").strip().lower()

    if voice_choice_main_key == 'q': print("好的，退出程序。感謝使用！"); break
    if voice_choice_main_key not in VOICE_OPTIONS:
        voice_choice_main_key = next(iter(VOICE_OPTIONS))
        print(f"無效選擇，使用默認聲音: {VOICE_OPTIONS[voice_choice_main_key]['name']}.")

    selected_voice_config_main = VOICE_OPTIONS[voice_choice_main_key]
    print(f"\n您已選擇主要講述者: {selected_voice_config_main['name']}")
    current_persona_main = selected_voice_config_main.get("persona_description")

    use_f5_tts_main = (selected_voice_config_main.get("type") == "f5tts")
    ref_audio_path_main, ref_text_main, f5_tts_main_ready = None, None, False
    if use_f5_tts_main:
        lang_config_main = selected_voice_config_main.get(CURRENT_LANGUAGE)
        if lang_config_main and lang_config_main.get("ref_audio_path"):
            ref_audio_path_main = lang_config_main.get("ref_audio_path")
            ref_text_main = lang_config_main.get("ref_text", "")
            if os.path.exists(ref_audio_path_main):
                try:
                    _, sr = sf.read(ref_audio_path_main)
                    print(f"  - 主要角色 F5-TTS ({current_lang_display}) 參考: '{os.path.basename(ref_audio_path_main)}' (SR: {sr}Hz)")
                    f5_tts_main_ready = True
                except Exception as e: print(f"  【錯誤】讀取主要角色 F5-TTS ({current_lang_display}) 參考音頻失敗: {e}")
            else: print(f"  【錯誤】主要角色 F5-TTS ({current_lang_display}) 參考音頻未找到: {ref_audio_path_main}")
        else: print(f"  【警告】主要角色 '{selected_voice_config_main['name']}' 未配置 {current_lang_display} F5-TTS 參考。")

    if use_f5_tts_main and not f5_tts_main_ready:
        use_f5_tts_main = False
        print(f"    主要角色 '{selected_voice_config_main['name']}' 的 F5-TTS ({current_lang_display}) 未就緒，將使用 gTTS。")

    # --- Step 1: upload and recognition ---
    # current_char_info_loop holds the parsed result on success and stays None otherwise.
    print("\n--- 步驟 1: 上傳並辨識古文字圖片 ---")
    image_to_process_for_gemini = None; image_filename_loop = None; current_char_info_loop = None
    try:
        if 'upload_image' not in globals(): raise NameError("函數 'upload_image' 未定義")
        image_filename_loop, image_data_loop = upload_image()
        if image_data_loop:
            temp_pil_image = PIL.Image.open(io.BytesIO(image_data_loop))
            if getattr(temp_pil_image, 'format', None) == "GIF":
                try: temp_pil_image.seek(0); frame_image = temp_pil_image.copy(); image_to_process_for_gemini = frame_image.convert("RGB")
                except Exception: image_to_process_for_gemini = None
            elif temp_pil_image.mode not in ['RGB', 'L', 'RGBA']: image_to_process_for_gemini = temp_pil_image.convert("RGB")
            else: image_to_process_for_gemini = temp_pil_image
            if image_to_process_for_gemini:
                display(image_to_process_for_gemini.resize((200, int(200 * image_to_process_for_gemini.height / image_to_process_for_gemini.width))) if image_to_process_for_gemini.width > 0 else image_to_process_for_gemini)
                print("\n--- 正在進行古文字辨識 (Gemini) ---")
                rec_err, recognition_text_loop = recognize_ancient_char_with_gemini(image_to_process_for_gemini, image_filename_loop or "uploaded_image")
                if not rec_err and recognition_text_loop:
                    display(Markdown(recognition_text_loop))
                    current_char_info_loop = parse_recognition_result(recognition_text_loop)
                    print("\n--- 已解析文字資訊 ---"); print(json.dumps(current_char_info_loop, indent=2, ensure_ascii=False))
                else: current_char_info_loop = None; print(f"辨識失敗: {rec_err or recognition_text_loop}")
            else: print("圖像預處理失敗"); continue
        else: print("未上傳圖片"); continue
    except Exception as e: print(f"圖像處理或辨識出錯: {e}"); traceback.print_exc(); continue

    # --- 變量初始化 ---
    description_text_final, desc_audio_path, generated_desc_image_path = None, None, None
    single_joke_text_final, single_joke_audio_path, generated_single_joke_image_path = None, None, None
    dialogue_joke_script_raw, dialogue_turns_parsed, full_dialogue_joke_audio_path, generated_dialogue_joke_image_path = None, [], None, None
    script_tag_A_dialogue, script_tag_B_dialogue = None, None # For dialogue script parsing

    output_folder_suffix = selected_voice_config_main['name'].split(' ')[0].lower().replace('(','').replace(')','').replace('/','')
    char_filename_base = current_char_info_loop.get('character','char').replace(' ','_').replace('*', '_').replace('/','_').replace('?','_') if isinstance(current_char_info_loop, dict) else "unknown"

    # --- 步驟 2: 描述流程 ---
    if isinstance(current_char_info_loop, dict) and current_char_info_loop.get('character', '未知') not in ['未知', '提取失敗', None]:
        print(f"\n--- 步驟 2.1: LLM 生成【描述文本】 ({current_lang_display}) ---")
        desc_persona = "你是一位知識淵博的古文字解說員。" if CURRENT_LANGUAGE == 'zh' else "You are a knowledgeable commentator on ancient characters."
        desc_len_hint = "1-2句話" if CURRENT_LANGUAGE == 'zh' else "1-2 sentences"
        err_desc_gen, raw_desc_text = generate_story_with_persona(current_char_info_loop, desc_persona, target_lang=CURRENT_LANGUAGE, story_length_hint=desc_len_hint)
        if err_desc_gen or not (raw_desc_text and raw_desc_text.strip()):
            description_text_final = "無法生成描述。" if CURRENT_LANGUAGE == 'zh' else "Failed to generate description."
        else: description_text_final = raw_desc_text
        print(f"    原始描述 ({current_lang_display}): '{description_text_final}'")

        if description_text_final and description_text_final.strip() != ("無法生成描述。" if CURRENT_LANGUAGE == 'zh' else "Failed to generate description."):
            print(f"\n--- 步驟 2.2: 為【描述文本】生成語音 ({selected_voice_config_main['name']}) ---")
            audio_out_dir = f"/content/generated_audio/{output_folder_suffix}/description_{CURRENT_LANGUAGE}"
            audio_out_fn = f"desc_{char_filename_base}_{int(time.time())}.wav"
            if use_f5_tts_main and f5_tts_main_ready:
                err_tts, desc_audio_path = clone_voice_f5tts(ref_audio_path_main, description_text_final, ref_text_main, audio_out_fn, audio_out_dir)
                if err_tts: print(f"    F5-TTS (描述) 錯誤: {err_tts}")
            else:
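                # use_f5_tts_main is already False here if F5-TTS was not ready, so test the voice type to label a fallback
                print(f"    {'回退到 ' if selected_voice_config_main.get('type') == 'f5tts' else ''}gTTS ({target_gtts_lang_param}) 為描述合成...")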
                if 'generate_and_play_speech_gtts' in globals():
                    os.makedirs(audio_out_dir, exist_ok=True)
                    err_tts, desc_audio_path = generate_and_play_speech_gtts(description_text_final, lang=target_gtts_lang_param, filename=os.path.join(audio_out_dir, audio_out_fn))
                    if err_tts: print(f"    gTTS (描述) 錯誤: {err_tts}")
            if desc_audio_path and os.path.exists(desc_audio_path): print(f"    【成功】描述語音: {desc_audio_path}"); display(Audio(desc_audio_path))
            else: print("    描述語音生成失敗。")

        print(f"\n--- 步驟 2.3: 為【描述文本】生成相關圖像 ---")
        if description_text_final and description_text_final.strip() != ("無法生成描述。" if CURRENT_LANGUAGE == 'zh' else "Failed to generate description.") and \
           'generate_final_image_prompt' in globals() and 'generate_image_with_diffusers' in globals() and globals().get('DIFFUSERS_AVAILABLE', False):
            # Build an English image prompt for the description, then render it with Diffusers.
            persona_for_desc_image = f"{current_persona_main.split('.')[0]} ({current_lang_display} 解說風格)"
            err_prompt, desc_img_prompt_eng = generate_final_image_prompt(description_text_final, f"描述 ({current_lang_display})", current_char_info_loop, persona_for_desc_image, "Artistic, relevant to the character's meaning.")
            if not err_prompt and desc_img_prompt_eng:
                final_desc_img_prompt = desc_img_prompt_eng # 直接使用，或添加用戶確認
                print(f"    使用描述圖像 Prompt: {final_desc_img_prompt}")
                if 'loaded_pipeline' in globals() and loaded_pipeline is not None: del loaded_pipeline; loaded_pipeline = None; loaded_pipeline_name = None
                if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache()
                err_img, desc_img_obj = generate_image_with_diffusers(final_desc_img_prompt, num_inference_steps=25)
                if not err_img and desc_img_obj:
                    img_dir = f"/content/generated_images_cell5/{output_folder_suffix}/description_{CURRENT_LANGUAGE}"
                    os.makedirs(img_dir, exist_ok=True)
                    generated_desc_image_path = os.path.join(img_dir, f"desc_img_{char_filename_base}_{int(time.time())}.png")
                    desc_img_obj.save(generated_desc_image_path); print(f"    描述圖像保存至: {generated_desc_image_path}"); display(desc_img_obj)
                else: print(f"    描述圖像生成失敗: {err_img}")
            else: print(f"    為描述生成圖像 Prompt 失敗: {err_prompt}")
        else: print(f"    跳過描述圖像生成（文本/函數/Diffusers不可用）。")
    else: print("\n由於未成功辨識古文字，跳過描述流程。")

    # --- 步驟 3: 單角色笑話流程 ---
    ask_single_joke_prompt = f"您想聽一個關於這個古文字的、由「{selected_voice_config_main['name']}」講述的{current_lang_display}笑話嗎？ (y/n，默認n): "
    user_wants_single_joke = input(ask_single_joke_prompt).strip().lower()
    if user_wants_single_joke == 'y' and isinstance(current_char_info_loop, dict) and current_char_info_loop.get('character', '未知') not in ['未知', '提取失敗', None]:
        print(f"\n--- 步驟 3.S.1: LLM 生成【單角色笑話文本】 ({current_lang_display}) ---")
        err_sjoke_gen, raw_sjoke_text = generate_character_joke_with_persona(current_char_info_loop, current_persona_main, target_lang=CURRENT_LANGUAGE)
        if err_sjoke_gen or not (raw_sjoke_text and raw_sjoke_text.strip()):
            single_joke_text_final = "無法生成單角色笑話。" if CURRENT_LANGUAGE == 'zh' else "Failed to generate single-character joke."
        else: single_joke_text_final = raw_sjoke_text
        print(f"    原始單角色笑話 ({current_lang_display}): '{single_joke_text_final}'")

        if single_joke_text_final and single_joke_text_final.strip() != ("無法生成單角色笑話。" if CURRENT_LANGUAGE == 'zh' else "Failed to generate single-character joke."):
            print(f"\n--- 步驟 3.S.2: 為【單角色笑話】生成語音 ({selected_voice_config_main['name']}) ---")
            sjoke_audio_dir = f"/content/generated_audio/{output_folder_suffix}/single_joke_{CURRENT_LANGUAGE}"
            sjoke_audio_fn = f"sjoke_{char_filename_base}_{int(time.time())}.wav"
            if use_f5_tts_main and f5_tts_main_ready:
                err_tts, single_joke_audio_path = clone_voice_f5tts(ref_audio_path_main, single_joke_text_final, ref_text_main, sjoke_audio_fn, sjoke_audio_dir)
                if err_tts: print(f"    F5-TTS (單角色笑話) 錯誤: {err_tts}")
            else:
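                # use_f5_tts_main is already False here if F5-TTS was not ready, so test the voice type to label a fallback
                print(f"    {'回退到 ' if selected_voice_config_main.get('type') == 'f5tts' else ''}gTTS ({target_gtts_lang_param}) 為單角色笑話合成...")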
                if 'generate_and_play_speech_gtts' in globals():
                    os.makedirs(sjoke_audio_dir, exist_ok=True)
                    err_tts, single_joke_audio_path = generate_and_play_speech_gtts(single_joke_text_final, lang=target_gtts_lang_param, filename=os.path.join(sjoke_audio_dir, sjoke_audio_fn))
                    if err_tts: print(f"    gTTS (單角色笑話) 錯誤: {err_tts}")
            if single_joke_audio_path and os.path.exists(single_joke_audio_path): print(f"    【成功】單角色笑話語音: {single_joke_audio_path}"); display(Audio(single_joke_audio_path))
            else: print("    單角色笑話語音生成失敗。")

            print(f"\n--- 步驟 3.S.3: 為【單角色笑話】生成相關圖像 ---")
            if 'generate_final_image_prompt' in globals() and 'generate_image_with_diffusers' in globals() and globals().get('DIFFUSERS_AVAILABLE', False):
                # (Same logic as the description image generation above.)
                persona_for_sjoke_image = f"{current_persona_main.split('.')[0]} ({current_lang_display} 笑話風格)"
                err_prompt_sj, sjoke_img_prompt_eng = generate_final_image_prompt(single_joke_text_final, f"單角色笑話 ({current_lang_display})", current_char_info_loop, persona_for_sjoke_image, "Humorous, relevant to the joke.")
                if not err_prompt_sj and sjoke_img_prompt_eng:
                    final_sjoke_img_prompt = sjoke_img_prompt_eng
                    print(f"    使用單角色笑話圖像 Prompt: {final_sjoke_img_prompt}")
                    if 'loaded_pipeline' in globals() and loaded_pipeline is not None: del loaded_pipeline; loaded_pipeline = None; loaded_pipeline_name = None
                    if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache()
                    err_img_sj, sjoke_img_obj = generate_image_with_diffusers(final_sjoke_img_prompt, num_inference_steps=25)
                    if not err_img_sj and sjoke_img_obj:
                        img_dir_sj = f"/content/generated_images_cell5/{output_folder_suffix}/single_joke_{CURRENT_LANGUAGE}"
                        os.makedirs(img_dir_sj, exist_ok=True)
                        generated_single_joke_image_path = os.path.join(img_dir_sj, f"sjoke_img_{char_filename_base}_{int(time.time())}.png")
                        sjoke_img_obj.save(generated_single_joke_image_path); print(f"    單角色笑話圖像保存至: {generated_single_joke_image_path}"); display(sjoke_img_obj)
                    else: print(f"    單角色笑話圖像生成失敗: {err_img_sj}")
                else: print(f"    為單角色笑話生成圖像 Prompt 失敗: {err_prompt_sj}")
            else: print(f"    跳過單角色笑話圖像生成（函數/Diffusers不可用）。")
    elif user_wants_single_joke == 'y': print("\n由於未成功辨識古文字，無法生成單角色笑話。")
    else: print("\n用戶選擇不聽單角色笑話。")

    # --- Step 4: multi-character dialogue joke flow ---
    print("-" * 30)
    ask_dialogue_joke_prompt = f"您想聽一個關於這個古文字的、涉及兩個角色對話的 {current_lang_display} 笑話嗎？ (y/n，默認n): "
    user_wants_dialogue_joke = input(ask_dialogue_joke_prompt).strip().lower()
    if user_wants_dialogue_joke == 'y' and isinstance(current_char_info_loop, dict) and current_char_info_loop.get('character', '未知') not in ['未知', '提取失敗', None]:
        print(f"\n--- 步驟 4.1: 選擇對話笑話的【第二個角色】聲音風格 ---")
        available_voices_for_B = {k: v for k, v in VOICE_OPTIONS.items() if k != voice_choice_main_key}
        if not available_voices_for_B: print("    【警告】沒有其他可用聲音風格。")
        else:
            for key_b, option_b in available_voices_for_B.items(): print(f"  {key_b}: {option_b['name']}")
            voice_choice_B_key = input(f"請為第二個角色選擇聲音，或按 Enter 跳過: ").strip().lower()
            if voice_choice_B_key and voice_choice_B_key in available_voices_for_B:
                selected_voice_config_B = VOICE_OPTIONS[voice_choice_B_key]
                print(f"    第二個角色聲音: {selected_voice_config_B['name']}")

                use_f5_tts_B = (selected_voice_config_B.get("type") == "f5tts")
                ref_audio_path_B, ref_text_B, f5_tts_B_ready = None, None, False
                if use_f5_tts_B:
                    lang_config_B = selected_voice_config_B.get(CURRENT_LANGUAGE)
                    if lang_config_B and lang_config_B.get("ref_audio_path"):
                        ref_audio_path_B = lang_config_B.get("ref_audio_path")
                        ref_text_B = lang_config_B.get("ref_text", "")
                        if os.path.exists(ref_audio_path_B):
                            try: _,_ = sf.read(ref_audio_path_B); f5_tts_B_ready = True
                            except Exception as e: print(f"    【錯誤】讀取角色B F5 ({current_lang_display}) 參考音頻失敗: {e}")
                        else: print(f"    【錯誤】角色B F5 ({current_lang_display}) 參考音頻未找到: {ref_audio_path_B}")
                    else: print(f"    【警告】角色B '{selected_voice_config_B['name']}' 未配置 {current_lang_display} F5-TTS 參考。")
                if use_f5_tts_B and not f5_tts_B_ready:
                    use_f5_tts_B = False
                    print(f"        角色B '{selected_voice_config_B['name']}' 的 F5-TTS ({current_lang_display}) 未就緒，將使用 gTTS。")

                print(f"\n--- 步驟 4.2: LLM 生成【對話式笑話文本】 ({current_lang_display}) ---")
                if 'generate_dialogue_joke_with_personas' in globals():
                    err_dialogue, dialogue_joke_script_raw, script_tag_A_dialogue, script_tag_B_dialogue = generate_dialogue_joke_with_personas(
                        current_char_info_loop, current_persona_main, selected_voice_config_B.get("persona_description"),
                        selected_voice_config_main['name'], selected_voice_config_B['name'], target_lang=CURRENT_LANGUAGE
                    )
                    if err_dialogue or not (dialogue_joke_script_raw and dialogue_joke_script_raw.strip()): print(f"    對話式笑話生成失敗: {err_dialogue or '空返回'}")
                    else:
                        print(f"    原始對話腳本 ({current_lang_display}):\n{dialogue_joke_script_raw}")
                        if 'parse_dialogue_script' in globals() and script_tag_A_dialogue and script_tag_B_dialogue:
                            dialogue_turns_parsed = parse_dialogue_script(dialogue_joke_script_raw, script_tag_A_dialogue, script_tag_B_dialogue)
                            if dialogue_turns_parsed:
                                print(f"    腳本解析成功，共 {len(dialogue_turns_parsed)} 輪。")
                                for tdp_idx, tdp_turn in enumerate(dialogue_turns_parsed): print(f"      輪 {tdp_idx+1} - {tdp_turn['speaker']}: {tdp_turn['dialogue'][:30]}...")
                            else: print("    【警告】未能解析對話腳本。")
                        else: print("    【警告】`parse_dialogue_script` 未定義或腳本標籤無效。")
                else: print("    【警告】`generate_dialogue_joke_with_personas` 未定義。")

                if dialogue_turns_parsed:
                    print(f"\n--- 步驟 4.3: 為多角色笑話的每一輪生成語音 ({current_lang_display}) ---")
                    individual_dialogue_audio_paths = []
                    temp_dialogue_audio_dir = f"/content/temp_dialogue_audio_{int(time.time())}"
                    os.makedirs(temp_dialogue_audio_dir, exist_ok=True)
                    all_turns_succeeded = True

                    for i_turn, turn_data in enumerate(dialogue_turns_parsed):
                        speaker_script_tag_turn = turn_data['speaker']
                        dialogue_line_turn = turn_data['dialogue']
                        turn_audio_path = None

                        is_speaker_A_turn = (speaker_script_tag_turn == script_tag_A_dialogue)
                        current_turn_voice_config = selected_voice_config_main if is_speaker_A_turn else selected_voice_config_B
                        speaker_display_name_for_log = current_turn_voice_config['name']

                        current_turn_use_f5 = (use_f5_tts_main if is_speaker_A_turn else use_f5_tts_B)
                        current_turn_f5_ready = (f5_tts_main_ready if is_speaker_A_turn else f5_tts_B_ready)
                        current_turn_ref_audio = (ref_audio_path_main if is_speaker_A_turn else ref_audio_path_B)
                        current_turn_ref_text = (ref_text_main if is_speaker_A_turn else ref_text_B)

                        print(f"    合成角色「{speaker_display_name_for_log}」 (標籤: {speaker_script_tag_turn}) 的語音: \"{dialogue_line_turn[:30]}...\"")
                        turn_audio_fn = f"turn_{i_turn+1}_{speaker_script_tag_turn.replace(' ','_')}_{int(time.time())}.wav"

                        if current_turn_use_f5 and current_turn_f5_ready:
                            err_tts_turn, turn_audio_path = clone_voice_f5tts(current_turn_ref_audio, dialogue_line_turn, current_turn_ref_text, turn_audio_fn, temp_dialogue_audio_dir)
                            if err_tts_turn: print(f"        F5-TTS (輪次 {i_turn+1}, {speaker_display_name_for_log}) 錯誤: {err_tts_turn}")
                        else:
                            print(f"        {'回退到' if current_turn_use_f5 else ''} gTTS ({target_gtts_lang_param}) 為角色 {speaker_display_name_for_log} 合成...")
                            if 'generate_and_play_speech_gtts' in globals():
                                err_tts_turn, turn_audio_path = generate_and_play_speech_gtts(dialogue_line_turn, lang=target_gtts_lang_param, filename=os.path.join(temp_dialogue_audio_dir, turn_audio_fn))
                                if err_tts_turn: print(f"        gTTS (輪次 {i_turn+1}, {speaker_display_name_for_log}) 錯誤: {err_tts_turn}")

                        if turn_audio_path and os.path.exists(turn_audio_path):
                            individual_dialogue_audio_paths.append(turn_audio_path)
                            print(f"        【成功】輪次 {i_turn+1} ({speaker_display_name_for_log}) 語音: {turn_audio_path}")
                        else: all_turns_succeeded = False; break

                    if all_turns_succeeded and individual_dialogue_audio_paths:
                        # Concatenate the per-turn clips into one complete dialogue track
                        if 'concatenate_audio_clips_moviepy' in globals() and MOVIEPY_AVAILABLE:
                            concat_audio_dir = f"/content/generated_audio/{output_folder_suffix}/dialogue_joke_{CURRENT_LANGUAGE}"
                            os.makedirs(concat_audio_dir, exist_ok=True)
                            concat_audio_fn = f"dialogue_joke_{char_filename_base}_{int(time.time())}.wav"
                            full_dialogue_joke_audio_path_temp = os.path.join(concat_audio_dir, concat_audio_fn)
                            concat_err, concat_res_path = concatenate_audio_clips_moviepy(individual_dialogue_audio_paths, full_dialogue_joke_audio_path_temp)
                            if not concat_err and concat_res_path:
                                full_dialogue_joke_audio_path = concat_res_path
                                print(f"    【成功】拼接後的完整對話笑話音頻: {full_dialogue_joke_audio_path}"); display(Audio(full_dialogue_joke_audio_path))
                    if os.path.exists(temp_dialogue_audio_dir): shutil.rmtree(temp_dialogue_audio_dir)

                if dialogue_joke_script_raw and dialogue_joke_script_raw.strip() and full_dialogue_joke_audio_path:
                    print(f"\n--- 步驟 4.4: 為【多角色對話式笑話】生成相關圖像 ---")
                    # Image generation; the saved path is stored in generated_dialogue_joke_image_path
                    if 'generate_final_image_prompt' in globals() and 'generate_image_with_diffusers' in globals() and DIFFUSERS_AVAILABLE:
                        persona_A_name_short = selected_voice_config_main['name'].split(' ')[0]
                        persona_B_name_short = selected_voice_config_B['name'].split(' ')[0]
                        persona_for_dico_img = f"角色A ({persona_A_name_short}): {current_persona_main.split('.')[0]}; 角色B ({persona_B_name_short}): {selected_voice_config_B.get('persona_description','').split('.')[0]}"
                        err_prompt_dj, djoke_img_prompt_eng = generate_final_image_prompt(dialogue_joke_script_raw,f"多角色對話笑話 ({current_lang_display})", current_char_info_loop, persona_for_dico_img,"Visualize the interaction.")
                        if not err_prompt_dj and djoke_img_prompt_eng:
                            final_djoke_img_prompt = djoke_img_prompt_eng
                            if 'loaded_pipeline' in globals() and loaded_pipeline is not None: del loaded_pipeline; loaded_pipeline = None; loaded_pipeline_name = None
                            if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache()
                            err_img_dj, djoke_img_obj = generate_image_with_diffusers(final_djoke_img_prompt)
                            if not err_img_dj and djoke_img_obj:
                                img_dir_dj = f"/content/generated_images_cell5/{output_folder_suffix}/dialogue_joke_{CURRENT_LANGUAGE}"
                                os.makedirs(img_dir_dj, exist_ok=True)
                                generated_dialogue_joke_image_path = os.path.join(img_dir_dj, f"djoke_img_{char_filename_base}_{int(time.time())}.png")
                                djoke_img_obj.save(generated_dialogue_joke_image_path); print(f"    多角色笑話圖像保存至: {generated_dialogue_joke_image_path}"); display(djoke_img_obj)
            else: print("    未選擇有效的第二角色，跳過多角色笑話。")
    elif user_wants_dialogue_joke == 'y': print("\n由於未成功辨識古文字，無法生成對話式笑話。")
    else: print("\n用戶選擇不聽對話式笑話。")

    # --- Step 5: video composition ---
    print(f"\n--- 步驟 5: 準備視頻合成 ---")
    video_to_create_info_list = []
    if desc_audio_path and os.path.exists(desc_audio_path) and generated_desc_image_path and os.path.exists(generated_desc_image_path):
        video_to_create_info_list.append({"audio_path": desc_audio_path, "image_path": generated_desc_image_path, "output_filename_base": f"desc_{char_filename_base}_{CURRENT_LANGUAGE}", "content_type": f"描述 ({current_lang_display})"})
    if single_joke_audio_path and os.path.exists(single_joke_audio_path) and generated_single_joke_image_path and os.path.exists(generated_single_joke_image_path):
        video_to_create_info_list.append({"audio_path": single_joke_audio_path, "image_path": generated_single_joke_image_path, "output_filename_base": f"sjoke_{char_filename_base}_{CURRENT_LANGUAGE}", "content_type": f"單角色笑話 ({current_lang_display})"})
    if full_dialogue_joke_audio_path and os.path.exists(full_dialogue_joke_audio_path) and generated_dialogue_joke_image_path and os.path.exists(generated_dialogue_joke_image_path):
        video_to_create_info_list.append({"audio_path": full_dialogue_joke_audio_path, "image_path": generated_dialogue_joke_image_path, "output_filename_base": f"djoke_{char_filename_base}_{CURRENT_LANGUAGE}", "content_type": f"多角色笑話 ({current_lang_display})"})

    if video_to_create_info_list:
        for video_info in video_to_create_info_list:
            print(f"\n--- 正在為【{video_info['content_type']}】合成靜態圖視頻 ---")
            if 'create_video_from_images_and_audio' in globals() and MOVIEPY_AVAILABLE:
                vid_out_dir = f"/content/final_static_videos/{output_folder_suffix}"
                os.makedirs(vid_out_dir, exist_ok=True)
                vid_out_fn = os.path.join(vid_out_dir, f"{video_info['output_filename_base']}_{int(time.time())}.mp4")
                vid_err, created_vid_path = create_video_from_images_and_audio([video_info["image_path"]], video_info["audio_path"], vid_out_fn, fps=1)
                if not vid_err and created_vid_path:
                    print(f"    【成功】「{video_info['content_type']}」視頻已生成: {created_vid_path}")
                    display(HTML(f'<a href="{created_vid_path}" target="_blank" download="{os.path.basename(created_vid_path)}">下載「{video_info["content_type"]}」視頻</a>'))
                    from base64 import b64encode  # for the inline <video> preview
                    with open(created_vid_path, 'rb') as f_vid:  # close the file handle after reading
                        mp4_data = f_vid.read()
                    data_url = "data:video/mp4;base64," + b64encode(mp4_data).decode()
                    display(HTML(f'<h4>「{video_info["content_type"]}」視頻:</h4><video width="360" controls loop muted><source src="{data_url}" type="video/mp4"></video>'))
                else: print(f"    「{video_info['content_type']}」視頻生成失敗: {vid_err}")
            else: print("    【警告】視頻創建函數或 MoviePy 不可用。")
    else: print("\n步驟 5: 無有效素材進行視頻合成。")

    print("\n" + "-" * 60)
    continue_session = input("本輪流程結束。是否要開始一個新的會話？ (輸入 'y' 繼續, 其他任意鍵退出程序): ").strip().lower()
    if continue_session != 'y': print("好的，退出主程序。感謝使用！"); break

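## Cell 6: AI Storyteller - Zeroscope Dynamic Video Creation

**Technology and Model Overview:**

This Cell focuses on generating story videos built from **dynamic video clips**, complementing the static-image videos of Cell 5 and Cell 7.
*   **Core video generation model:** `Zeroscope` (invoked through a `diffusers` pipeline), used to generate short video clips from text.
*   **User interaction and flow control:** As in Cell 5, this includes language selection, voice style selection, and story type selection.
*   **Content generation (LLM):**
    *   `generate_story_with_persona` (single-narrator story).
    *   `generate_dialogue_story_with_personas` (multi-character dialogue story script).
    *   `generate_multiple_video_prompts_for_story_gemini` (splits the story/script into shots and generates several Zeroscope prompts).
*   **Speech processing (TTS):**
    *   `clone_voice_f5tts` / `generate_and_play_speech_gtts` (synthesize speech for the whole story, or for each line of a multi-character script).
    *   `parse_dialogue_script` (parses the multi-character script).
    *   `concatenate_audio_clips_moviepy` (joins the per-character speech clips into one complete audio track).
*   **Video frame processing and composition (MoviePy & Diffusers utils):**
    *   `export_to_video` (Diffusers utility that turns the sequence of PIL image frames produced by Zeroscope into an MP4 clip).
    *   `combine_video_clips_and_set_audio_moviepy` (concatenates the Zeroscope MP4 clips and sets the complete narration track as their audio).
*   **Other techniques:** API calls, file operations, GPU memory management, etc., similar to Cells 3 and 5.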
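The overview lists `parse_dialogue_script`, whose body lives in an earlier cell and is not shown here. The sketch below is a hypothetical stand-in, assuming the LLM script puts one turn per line in the form `Tag: line` (ASCII or full-width colon) and that untagged lines continue the previous turn. It returns the same `{'speaker': ..., 'dialogue': ...}` dictionaries that the Cell 6 code reads; the name `parse_dialogue_script_sketch` and the script format are assumptions, not the notebook's actual implementation.

```python
import re
from typing import Dict, List


def parse_dialogue_script_sketch(script: str, tag_a: str, tag_b: str) -> List[Dict[str, str]]:
    """Split a 'Speaker: line' script into turns (hypothetical script format).

    Lines that do not start with one of the two tags are treated as a
    continuation of the previous turn; untagged lines before the first
    tagged line are dropped.
    """
    # "TagA:" or "TagB:" at the start of a line, ASCII or full-width colon.
    pattern = re.compile(
        r"^\s*(" + re.escape(tag_a) + "|" + re.escape(tag_b) + r")\s*[:：]\s*(.*)$"
    )
    turns: List[Dict[str, str]] = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = pattern.match(line)
        if match:
            turns.append({"speaker": match.group(1), "dialogue": match.group(2).strip()})
        elif turns:
            # Continuation line: append it to the current speaker's dialogue.
            turns[-1]["dialogue"] += " " + line
    # Drop turns whose dialogue ended up empty (e.g. "Fox:" with nothing after it).
    return [t for t in turns if t["dialogue"]]


script = "Narrator: Once upon a time.\nFox: I am hungry!\nand tired.\nNarrator: The end."
for turn in parse_dialogue_script_sketch(script, "Narrator", "Fox"):
    print(turn["speaker"], "->", turn["dialogue"])
# Narrator -> Once upon a time.
# Fox -> I am hungry! and tired.
# Narrator -> The end.
```

`re.escape` keeps tags that contain punctuation (such as `Narrator (A)`) from being interpreted as regex syntax.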

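**Functionality:**

The goal of Cell 6 is an "AI storyteller": it tells a story (single narrator or multi-character dialogue), generates a series of **dynamic short video clips** for the story's key plot points (using the Zeroscope model), and finally combines these clips with the complete narration audio into one coherent animated story video.
1.  **Language and voice selection:**
    *   The user selects the working language for this video (Chinese/English).
    *   The user selects the voice style of the main narrator / Character A.
2.  **Story type selection:** The user chooses between a "single-narrator story" and a "multi-character dialogue story".
    *   If multi-character is chosen, the user is prompted to pick a second dialogue character, and TTS configurations in the selected language are prepared for both characters.
3.  **Story text generation:**
    *   Depending on the user's choice, Gemini generates either a single-narrator story or a multi-character dialogue script in the selected language.
4.  **Complete narration audio generation:**
    *   **Single narrator:** TTS is run directly on the generated story text.
    *   **Multi-character dialogue:** The script is parsed, each line is synthesized with the correct character voice and language, and all speech clips are concatenated into one complete dialogue track.
5.  **Zeroscope shot breakdown and prompt generation:**
    *   The complete story text (or dialogue script) is sent to Gemini.
    *   Gemini breaks the story into several key visual scenes that suit short video clips.
    *   For each scene, it generates a detailed English video-generation prompt tailored to the Zeroscope model.
6.  **Zeroscope short clip generation:**
    *   Loop over each Zeroscope prompt produced in the previous step.
    *   Call `generate_video_with_zeroscope` to generate a short multi-frame clip for each prompt (e.g., 2-5 seconds).
    *   Save each frame sequence as a separate short MP4 file with `export_to_video`.
7.  **Final dynamic video composition:**
    *   Use `combine_video_clips_and_set_audio_moviepy` to:
        *   Concatenate all generated Zeroscope MP4 clips in order.
        *   (Optional) If the total clip duration does not match the narration duration, loop or truncate the clips.
        *   Set the complete narration track from step 4 (single narrator or multi-character) as the audio of the final concatenated video.
8.  **Output and cleanup:** Display the final video and delete the temporary video clips created along the way.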
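Step 7 mentions looping or truncating the clips when their total length differs from the narration. One way to plan that, sketched here under the assumption that each clip's duration in seconds is already known (for example, read from the exported MP4 files), is to cycle through the clips in order and cut the last one short. The function name `plan_clip_timeline` is hypothetical; the notebook's `combine_video_clips_and_set_audio_moviepy` may handle this differently.

```python
from typing import List, Tuple


def plan_clip_timeline(
    clip_durations: List[float], audio_duration: float, eps: float = 1e-6
) -> List[Tuple[int, float]]:
    """Plan (clip_index, seconds_used) pairs that cover audio_duration.

    Clips play in order and restart from the first one when the narration is
    longer than all clips combined; the final clip is cut short so the video
    ends together with the audio.
    """
    if audio_duration <= 0:
        raise ValueError("audio_duration must be positive")
    if not clip_durations or any(d <= 0 for d in clip_durations):
        raise ValueError("clip durations must be a non-empty list of positive numbers")

    timeline: List[Tuple[int, float]] = []
    remaining = audio_duration
    i = 0
    # eps absorbs floating-point residue so no near-zero clip is appended.
    while remaining > eps:
        idx = i % len(clip_durations)               # wrap around = loop the clip list
        use = min(clip_durations[idx], remaining)   # truncate the last clip if needed
        timeline.append((idx, use))
        remaining -= use
        i += 1
    return timeline


# Two clips of 2 s and 3 s under 7.5 s of narration.
print(plan_clip_timeline([2.0, 3.0], 7.5))
# [(0, 2.0), (1, 3.0), (0, 2.0), (1, 0.5)]
```

Each `(index, seconds)` pair could then be trimmed and passed, in order, to MoviePy's `concatenate_videoclips` before the narration track is attached. Planning the timeline as plain numbers keeps the looping logic testable without loading any video.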

Unlike the static-image videos produced by Cell 5 and Cell 7, Cell 6 aims for a more dynamic visual storytelling experience.
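For the multi-character narration in step 4, the notebook joins the per-turn clips with `concatenate_audio_clips_moviepy`. When every turn is a PCM WAV file with the same channel count, sample width and sample rate, the standard-library `wave` module is enough, as in the sketch below (`concat_wavs_sketch` is a hypothetical name). It deliberately refuses mismatched formats, and `wave.open` raises `wave.Error` on non-WAV input. Both cases are plausible here: F5-TTS and gTTS outputs need not share a sample rate, and gTTS itself produces MP3 data, which is one reason an ffmpeg-backed path such as MoviePy is the safer general choice.

```python
import wave
from typing import List


def concat_wavs_sketch(paths: List[str], out_path: str) -> str:
    """Concatenate PCM WAV files that share channels, sample width and rate."""
    if not paths:
        raise ValueError("no input files")
    # Use the first file's format as the format of the output file.
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # the frame count in the header is patched on close
        for p in paths:
            with wave.open(p, "rb") as w:
                # Refuse to mix formats instead of writing garbled audio.
                if (w.getnchannels(), w.getsampwidth(), w.getframerate()) != (
                    params.nchannels, params.sampwidth, params.framerate
                ):
                    raise ValueError(f"format mismatch in {p}")
                out.writeframes(w.readframes(w.getnframes()))
    return out_path
```

A call would look like `concat_wavs_sketch(individual_audio_paths_c6, output_path)`, with the turn files listed in speaking order.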

In [None]:
# Cell 6: AI Storyteller - [multi-clip Zeroscope video + optional multi-character dialogue narration] (v3.1.1 - multi-character stories integrated)
import os
import IPython
from IPython.display import display, Markdown, HTML, Audio
import shutil
import numpy as np
import soundfile as sf
import PIL.Image
import io
import json
import time
import sys
import math  # used to compute how many times clips must loop
import traceback
import re  # make sure re is imported

try:
    if 'torch' not in globals(): import torch  # import torch only if it is not already loaded
except ImportError:
    print("【警告】Cell 6 頂層無法導入 torch。部分 PyTorch 操作可能受影響。")

print("\n" + "=" * 70)
print("      🎬 AI 說書人 - 多片段影片創作工坊 (Zeroscope, 可選多角色) 🎬")
print("=" * 70)

# --- Initialization and variable retrieval ---
if 'device' not in globals():
    try:
        if 'torch' not in globals(): import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except Exception as e_dev_init_c6:
        print(f"Cell 6 初始化 device 失敗: {e_dev_init_c6}, 使用 cpu 後備。")
        device = "cpu"
    print(f"Cell 6: 'device' 設置為: {device}")

if 'current_char_info_loop' not in globals() or current_char_info_loop is None:
    print("【【警告】】Cell 6: 未從 Cell 5 獲取 'current_char_info_loop'。將使用演示信息。")
    current_char_info_c6 = {'character': '演示字C6', 'type': '示例類型C6', 'pinyin': 'yǎn shì zì', 'meaning': '這是 Cell 6 用於演示的古文字。'}
else:
    current_char_info_c6 = current_char_info_loop
    print(f"Cell 6: 已獲取古文字信息: {current_char_info_c6.get('character', '未知')}")

if 'VOICE_OPTIONS' not in globals() or not VOICE_OPTIONS:
    print("【【嚴重錯誤】】Cell 6: VOICE_OPTIONS 未定義。請確保 Cell 5 已正確初始化此變量。")
    # No fallback here: VOICE_OPTIONS is essential to the core functionality
    user_wants_to_proceed_c6_multi = False  # block all later steps
else:
    user_wants_to_proceed_c6_multi = True

if 'CURRENT_LANGUAGE' not in globals():
    print("【警告】Cell 6: 未從 Cell 5 獲取 CURRENT_LANGUAGE，默認為中文 'zh'。")
    CURRENT_LANGUAGE = "zh"
current_lang_display_c6 = '中文' if CURRENT_LANGUAGE == 'zh' else 'English'
target_gtts_lang_param_c6 = 'zh-cn' if CURRENT_LANGUAGE == 'zh' else 'en'

# [New: Cell 6 has its own language selection]
if user_wants_to_proceed_c6_multi:  # only ask for a language when VOICE_OPTIONS is valid
    # Take CURRENT_LANGUAGE from globals (possibly set by Cell 5);
    # fall back to 'zh' if it is missing
    CURRENT_LANGUAGE_C6 = globals().get('CURRENT_LANGUAGE', 'zh')

    print("\n--- Cell 6 - 步驟 -1: 請選擇本輪影片的語言 ---")
    prev_lang_display_c6 = '中文' if CURRENT_LANGUAGE_C6 == 'zh' else 'English'
    lang_choice_input_c6 = input(f"  請選擇語言 (1: 中文, 2: English, Enter 使用當前默認 '{prev_lang_display_c6}'): ").strip()
    if lang_choice_input_c6 == "1":
        CURRENT_LANGUAGE_C6 = "zh"
    elif lang_choice_input_c6 == "2":
        CURRENT_LANGUAGE_C6 = "en"
    # On a bare Enter, CURRENT_LANGUAGE_C6 keeps the global (or default) value

    current_lang_display_c6 = '中文' if CURRENT_LANGUAGE_C6 == 'zh' else 'English'
    print(f"--- Cell 6 當前操作語言: {current_lang_display_c6} ---")
    target_gtts_lang_param_c6 = 'zh-cn' if CURRENT_LANGUAGE_C6 == 'zh' else 'en'
else:  # VOICE_OPTIONS is invalid: set defaults so later code avoids a NameError; the flow does not continue
    CURRENT_LANGUAGE_C6 = "zh"
    current_lang_display_c6 = "中文"
    target_gtts_lang_param_c6 = "zh-cn"

# --- Variable initialization (Cell 6 scope) ---
selected_voice_config_c6_main = None
current_persona_c6_main = None
use_f5_tts_main_c6 = False
ref_audio_path_main_c6, ref_text_main_c6, f5_tts_main_ready_c6 = None, None, False
voice_choice_c6_main_key = None  # key of the selected main voice

story_type_c6 = "single"
selected_voice_config_B_c6 = None
use_f5_tts_B_c6 = False
ref_audio_path_B_c6, ref_text_B_c6, f5_tts_B_ready_c6 = None, None, False
speaker_A_script_tag_c6, speaker_B_script_tag_c6 = None, None

final_narrative_text_for_video_c6 = None
full_narrative_audio_path_c6 = None
video_prompts_list_c6 = []
temp_video_clips_paths_c6 = []


if user_wants_to_proceed_c6_multi:
    # --- Step 1: choose/confirm the voice of the main narrator / Character A ---
    print(f"\n--- Cell 6 - 步驟 1: 選擇主要敘述者/角色A聲音 ({current_lang_display_c6}) ---")
    for key, option in VOICE_OPTIONS.items(): print(f"  {key}: {option['name']}")
    print("  q: 退出此影片創作流程")

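    # Default to the voice key chosen in Cell 5 (voice_choice_main_key) if it is still valid;
    # the previous version returned the literal string 'voice_choice_main' instead of its value
    prev_main_voice_key_c6 = globals().get('voice_choice_main_key')
    default_voice_key_c6 = prev_main_voice_key_c6 if prev_main_voice_key_c6 in VOICE_OPTIONS else next(iter(VOICE_OPTIONS))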
    prompt_voice_A = f"請輸入主要敘述者/角色A的選項數字 (Enter 使用 Cell 5 選擇 '{VOICE_OPTIONS.get(default_voice_key_c6,{}).get('name','N/A')}', 或選其他), 或 'q' 退出: "
    voice_choice_input_c6_A = input(prompt_voice_A).strip().lower()

    if voice_choice_input_c6_A == 'q':
        print("用戶選擇退出影片創作流程。")
        user_wants_to_proceed_c6_multi = False
    else:
        voice_choice_c6_main_key = voice_choice_input_c6_A if voice_choice_input_c6_A in VOICE_OPTIONS else default_voice_key_c6
        if voice_choice_c6_main_key not in VOICE_OPTIONS: # Fallback if default was also invalid
            voice_choice_c6_main_key = next(iter(VOICE_OPTIONS))
            print(f"    選擇無效或默認值異常，使用第一個可用聲音。")

        selected_voice_config_c6_main = VOICE_OPTIONS[voice_choice_c6_main_key]
        print(f"    主要敘述者/角色A: {selected_voice_config_c6_main['name']}")
        current_persona_c6_main = selected_voice_config_c6_main.get("persona_description")
        use_f5_tts_main_c6 = (selected_voice_config_c6_main.get("type") == "f5tts")

        if use_f5_tts_main_c6:
            lang_config_main_c6 = selected_voice_config_c6_main.get(CURRENT_LANGUAGE_C6)  # use the language chosen in Cell 6
            if lang_config_main_c6 and lang_config_main_c6.get("ref_audio_path"):
                ref_audio_path_main_c6 = lang_config_main_c6.get("ref_audio_path")
                ref_text_main_c6 = lang_config_main_c6.get("ref_text", "")
                if os.path.exists(ref_audio_path_main_c6):
                    try: _, sr = sf.read(ref_audio_path_main_c6); f5_tts_main_ready_c6 = True; print(f"      F5-TTS ({current_lang_display_c6}) 參考 '{os.path.basename(ref_audio_path_main_c6)}' (SR:{sr}Hz) 就緒。")
                    except Exception as e: print(f"      【錯誤】讀取角色A F5 ({current_lang_display_c6}) 參考音頻失敗: {e}")
                else: print(f"      【錯誤】角色A F5 ({current_lang_display_c6}) 參考音頻未找到: {ref_audio_path_main_c6}")
            else: print(f"      【警告】角色A '{selected_voice_config_c6_main['name']}' 未配置 {current_lang_display_c6} F5-TTS 參考。")
        if use_f5_tts_main_c6 and not f5_tts_main_ready_c6:
            use_f5_tts_main_c6 = False; print(f"      角色A 的 F5-TTS ({current_lang_display_c6}) 未就緒，將使用 gTTS。")

if user_wants_to_proceed_c6_multi:
    # --- Step 1.5: choose the story type (single narrator vs. multi-character dialogue) ---
    print(f"\n--- Cell 6 - 步驟 1.5: 選擇故事敘述類型 ({current_lang_display_c6}) ---")
    story_type_choice = input("您希望生成 (1) 單人旁白式故事，還是 (2) 多角色對話式故事？ (輸入 1 或 2，默認1): ").strip()
    if story_type_choice == "2":
        story_type_c6 = "dialogue"
        print("    將生成多角色對話式故事。")
        print(f"\n--- 為多角色故事選擇【第二個對話角色】聲音風格 ---")
        available_voices_for_B_c6 = {k: v for k, v in VOICE_OPTIONS.items() if k != voice_choice_c6_main_key}
        if not available_voices_for_B_c6:
            print("    【警告】沒有其他可用聲音風格作為第二角色，將退回單人旁白故事。")
            story_type_c6 = "single"
        else:
            for key_b, option_b in available_voices_for_B_c6.items(): print(f"  {key_b}: {option_b['name']}")
            voice_choice_B_key_c6 = input(f"請為第二個角色選擇聲音 (或按 Enter 使用第一個可用選項): ").strip().lower()

            chosen_B_key = voice_choice_B_key_c6 if voice_choice_B_key_c6 in available_voices_for_B_c6 else next(iter(available_voices_for_B_c6))
            if chosen_B_key not in available_voices_for_B_c6:  # cannot happen while available_voices_for_B_c6 is non-empty
                print("    選擇角色B時出錯，退回單人旁白。"); story_type_c6 = "single"
            else:
                selected_voice_config_B_c6 = VOICE_OPTIONS[chosen_B_key]
                print(f"    第二個對話角色: {selected_voice_config_B_c6['name']}")
                use_f5_tts_B_c6 = (selected_voice_config_B_c6.get("type") == "f5tts")
                if use_f5_tts_B_c6:
                    lang_config_B_c6 = selected_voice_config_B_c6.get(CURRENT_LANGUAGE_C6)  # use the language chosen in Cell 6
                    if lang_config_B_c6 and lang_config_B_c6.get("ref_audio_path"):
                        ref_audio_path_B_c6 = lang_config_B_c6.get("ref_audio_path")
                        ref_text_B_c6 = lang_config_B_c6.get("ref_text", "")
                        if os.path.exists(ref_audio_path_B_c6):
                            try: _, sr = sf.read(ref_audio_path_B_c6); f5_tts_B_ready_c6 = True; print(f"      角色B F5-TTS ({current_lang_display_c6}) 參考 '{os.path.basename(ref_audio_path_B_c6)}' (SR:{sr}Hz) 就緒。")
                            except Exception as e: print(f"      【錯誤】讀取角色B F5 ({current_lang_display_c6}) 參考音頻失敗: {e}")
                        else: print(f"      【錯誤】角色B F5 ({current_lang_display_c6}) 參考音頻未找到: {ref_audio_path_B_c6}")
                    else: print(f"      【警告】角色B '{selected_voice_config_B_c6['name']}' 未配置 {current_lang_display_c6} F5-TTS 參考。")
                if use_f5_tts_B_c6 and not f5_tts_B_ready_c6:
                    use_f5_tts_B_c6 = False; print(f"      角色B 的 F5-TTS ({current_lang_display_c6}) 未就緒，將使用 gTTS。")
    else:
        story_type_c6 = "single" # Default or if user input 1
        print("    將生成單人旁白式故事。")

if user_wants_to_proceed_c6_multi:
    # --- Step 2: generate the story text ---
    print(f"\n--- Cell 6 - 步驟 2: 生成故事文本 ({current_lang_display_c6}, 類型: {story_type_c6}) ---")
    story_len_hint_c6 = "一個包含多個發展階段的短故事，總長度約15-30秒語音，適合製作成多個短視頻片段。" if CURRENT_LANGUAGE_C6 == 'zh' else "A short story with multiple plot points, about 15-30 seconds of speech, suitable for multiple short video clips."

    if story_type_c6 == "single":
        if 'generate_story_with_persona' in globals():
            err_story, raw_story_text_c6 = generate_story_with_persona(
                current_char_info_c6, current_persona_c6_main,
                target_lang=CURRENT_LANGUAGE_C6, story_length_hint=story_len_hint_c6
            )
            if err_story or not (raw_story_text_c6 and raw_story_text_c6.strip()):
                print(f"    單人旁白故事生成失敗: {err_story or '空返回'}"); user_wants_to_proceed_c6_multi = False
            else: final_narrative_text_for_video_c6 = raw_story_text_c6
        else: print("【錯誤】`generate_story_with_persona` 未定義!"); user_wants_to_proceed_c6_multi = False

    elif story_type_c6 == "dialogue" and selected_voice_config_B_c6:
        if 'generate_dialogue_story_with_personas' in globals():
            err_story, raw_dialogue_script_c6, spk_A_tag, spk_B_tag = generate_dialogue_story_with_personas(
                current_char_info_c6, current_persona_c6_main, selected_voice_config_B_c6.get("persona_description"),
                selected_voice_config_c6_main['name'], selected_voice_config_B_c6['name'],
                target_lang=CURRENT_LANGUAGE_C6, num_dialogue_turns=3  # a story can afford a few more turns
            )
            if err_story or not (raw_dialogue_script_c6 and raw_dialogue_script_c6.strip()):
                print(f"    多角色對話故事生成失敗: {err_story or '空返回'}"); user_wants_to_proceed_c6_multi = False
            else:
                final_narrative_text_for_video_c6 = raw_dialogue_script_c6
                speaker_A_script_tag_c6 = spk_A_tag  # keep the script speaker tags
                speaker_B_script_tag_c6 = spk_B_tag
        else: print("【錯誤】`generate_dialogue_story_with_personas` 未定義!"); user_wants_to_proceed_c6_multi = False
    else: # Should not happen if logic above is correct
        print("【錯誤】無效的故事類型或角色B未選擇。"); user_wants_to_proceed_c6_multi = False

    if user_wants_to_proceed_c6_multi and final_narrative_text_for_video_c6:
        print(f"    生成的敘事文本 ({story_type_c6}, {current_lang_display_c6}):\n'''{final_narrative_text_for_video_c6[:300]}...'''")
        # (Optional: text refinement step)
    elif user_wants_to_proceed_c6_multi: # Text generation failed but proceed was true
        print("    未能生成有效的敘事文本。"); user_wants_to_proceed_c6_multi = False


if user_wants_to_proceed_c6_multi:
    # --- Step 3: generate the complete TTS narration audio ---
    if 'loaded_pipeline' in globals() and loaded_pipeline is not None: del loaded_pipeline; loaded_pipeline = None; loaded_pipeline_name = None
    if 'loaded_zeroscope_pipe' in globals() and loaded_zeroscope_pipe is not None: del loaded_zeroscope_pipe; loaded_zeroscope_pipe = None
    if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache(); print("    Cell 6: CUDA cache cleared (before TTS).")

    print(f"\n--- Cell 6 - 步驟 3: 生成完整旁白音頻 ({current_lang_display_c6}) ---")
    audio_out_dir_c6 = f"/content/generated_narrative_audio_c6/{selected_voice_config_c6_main['name'].split(' ')[0].lower()}_{story_type_c6}_{CURRENT_LANGUAGE_C6}"
    audio_out_fn_c6_base = f"narrative_{current_char_info_c6.get('character','story')}_{int(time.time())}"

    if story_type_c6 == "single":
        audio_out_fn_c6 = f"{audio_out_fn_c6_base}_single.wav"
        if use_f5_tts_main_c6 and f5_tts_main_ready_c6:
            err_tts, full_narrative_audio_path_c6 = clone_voice_f5tts(ref_audio_path_main_c6, final_narrative_text_for_video_c6, ref_text_main_c6, audio_out_fn_c6, audio_out_dir_c6)
        else: # gTTS or F5 fallback
            if 'generate_and_play_speech_gtts' in globals():
                os.makedirs(audio_out_dir_c6, exist_ok=True)
                err_tts, full_narrative_audio_path_c6 = generate_and_play_speech_gtts(final_narrative_text_for_video_c6, lang=target_gtts_lang_param_c6, filename=os.path.join(audio_out_dir_c6, audio_out_fn_c6))
            else: err_tts = "gTTS function missing"
        if err_tts: print(f"    單人旁白TTS失敗: {err_tts}"); user_wants_to_proceed_c6_multi = False

    elif story_type_c6 == "dialogue":
        dialogue_turns_parsed_c6 = []
        if 'parse_dialogue_script' in globals() and speaker_A_script_tag_c6 and speaker_B_script_tag_c6:
            dialogue_turns_parsed_c6 = parse_dialogue_script(final_narrative_text_for_video_c6, speaker_A_script_tag_c6, speaker_B_script_tag_c6)
        if not dialogue_turns_parsed_c6:
            print("    【錯誤】多角色故事腳本解析失敗或為空，無法生成對話式音頻。"); user_wants_to_proceed_c6_multi = False
        else:
            individual_audio_paths_c6 = []
            temp_dialogue_dir_c6 = f"/content/temp_dialogue_story_audio_c6_{int(time.time())}"
            os.makedirs(temp_dialogue_dir_c6, exist_ok=True)
            all_dialogue_tts_succeeded = True
            for i_turn, turn_data in enumerate(dialogue_turns_parsed_c6):
                speaker_tag = turn_data['speaker']
                line = turn_data['dialogue']
                is_speaker_A_turn = (speaker_tag == speaker_A_script_tag_c6)
                turn_voice_config = selected_voice_config_c6_main if is_speaker_A_turn else selected_voice_config_B_c6
                turn_use_f5 = (use_f5_tts_main_c6 if is_speaker_A_turn else use_f5_tts_B_c6)
                turn_f5_ready = (f5_tts_main_ready_c6 if is_speaker_A_turn else f5_tts_B_ready_c6)
                turn_ref_audio = (ref_audio_path_main_c6 if is_speaker_A_turn else ref_audio_path_B_c6)
                turn_ref_text = (ref_text_main_c6 if is_speaker_A_turn else ref_text_B_c6)

                print(f"      合成角色「{turn_voice_config['name']}」 (標籤:{speaker_tag}) 第 {i_turn+1} 句: \"{line[:30]}...\"")
                turn_audio_fn = f"turn_{i_turn+1}_{speaker_tag.replace(' ','_')}.wav"
                turn_audio_path_temp = None
                if turn_use_f5 and turn_f5_ready:
                    err_tts_turn, turn_audio_path_temp = clone_voice_f5tts(turn_ref_audio, line, turn_ref_text, turn_audio_fn, temp_dialogue_dir_c6)
                else:
                    if 'generate_and_play_speech_gtts' in globals():
                        err_tts_turn, turn_audio_path_temp = generate_and_play_speech_gtts(line, lang=target_gtts_lang_param_c6, filename=os.path.join(temp_dialogue_dir_c6, turn_audio_fn))
                    else: err_tts_turn = "gTTS func missing"
                if err_tts_turn or not (turn_audio_path_temp and os.path.exists(turn_audio_path_temp)):
                    print(f"        【錯誤】角色 {turn_voice_config['name']} 第 {i_turn+1} 句TTS失敗: {err_tts_turn or '無輸出'}"); all_dialogue_tts_succeeded = False; break
                individual_audio_paths_c6.append(turn_audio_path_temp)

            if all_dialogue_tts_succeeded and individual_audio_paths_c6:
                if 'concatenate_audio_clips_moviepy' in globals() and MOVIEPY_AVAILABLE:
                    audio_out_fn_c6 = f"{audio_out_fn_c6_base}_dialogue.wav"
                    full_narrative_audio_path_c6_temp = os.path.join(audio_out_dir_c6, audio_out_fn_c6)
                    os.makedirs(audio_out_dir_c6, exist_ok=True)
                    concat_err, concat_res_path = concatenate_audio_clips_moviepy(individual_audio_paths_c6, full_narrative_audio_path_c6_temp)
                    if not concat_err and concat_res_path: full_narrative_audio_path_c6 = concat_res_path
                    else: print(f"        拼接對話音頻失敗: {concat_err}"); user_wants_to_proceed_c6_multi = False
                else: print("        【錯誤】音頻拼接函數或MoviePy不可用。"); user_wants_to_proceed_c6_multi = False
            else: print("        部分或全部對話輪次TTS失敗，不進行拼接。"); user_wants_to_proceed_c6_multi = False
            if os.path.exists(temp_dialogue_dir_c6): shutil.rmtree(temp_dialogue_dir_c6)

    if user_wants_to_proceed_c6_multi and full_narrative_audio_path_c6 and os.path.exists(full_narrative_audio_path_c6):
        print(f"    【成功】完整旁白音頻 ({story_type_c6}): {full_narrative_audio_path_c6}"); display(Audio(full_narrative_audio_path_c6))
    elif user_wants_to_proceed_c6_multi: # Audio path not set or not exists
        print("完整旁白音頻生成失敗或未找到，無法繼續。"); user_wants_to_proceed_c6_multi = False


if user_wants_to_proceed_c6_multi:
    # --- Step 4: split the story into shots and generate multiple Zeroscope prompts ---
    print(f"\n--- Cell 6 - 步驟 4: 為敘事文本分鏡並生成 Zeroscope Prompts ---")
    num_video_clips_target_c6 = 0
    try:
        import math  # needed for math.ceil below
        from moviepy.editor import AudioFileClip  # local import for the duration check (MoviePy 1.x module path)
        audio_clip_temp_c6 = AudioFileClip(full_narrative_audio_path_c6)
        total_audio_dur_c6 = audio_clip_temp_c6.duration
        audio_clip_temp_c6.close()
        num_video_clips_target_c6 = max(1, math.ceil(total_audio_dur_c6 / 2.5))  # assume ~2.5 s per clip (step 5 renders 24 frames at 10 fps, i.e. 2.4 s)
        print(f"    旁白音頻時長約 {total_audio_dur_c6:.2f} 秒，計劃生成約 {num_video_clips_target_c6} 個 Zeroscope 片段。")
    except Exception as e_dur_c6:
        print(f"    獲取音頻時長失敗 ({e_dur_c6})，請手動輸入片段數。")
        num_clips_str_c6 = input("    希望將故事分解成多少個短視頻片段? (例如 3-10，默認5): ").strip()
        num_video_clips_target_c6 = int(num_clips_str_c6) if num_clips_str_c6.isdigit() and int(num_clips_str_c6) > 0 else 5

    if 'generate_multiple_video_prompts_for_story_gemini' in globals():
        content_type_for_prompt_c6 = "對話式故事" if story_type_c6 == "dialogue" else "故事"
        video_prompts_list_c6, video_prompt_gen_err_c6 = generate_multiple_video_prompts_for_story_gemini(
            story_or_joke_text=final_narrative_text_for_video_c6,  # full story or dialogue script
            num_clips_target=num_video_clips_target_c6,
            content_type=content_type_for_prompt_c6,
            char_info=current_char_info_c6,
            persona_style_hint=current_persona_c6_main.split('.')[0] if story_type_c6 == "single" else f"{selected_voice_config_c6_main['name']}({current_persona_c6_main.split('.')[0]}) 与 {selected_voice_config_B_c6['name']}({selected_voice_config_B_c6.get('persona_description','').split('.')[0]}) 的对话",
            video_style_hint="dynamic, engaging, cinematic short clips, matching the narrative flow"
        )
        if video_prompt_gen_err_c6: print(f"    生成多視頻 Prompts 時遇到問題: {video_prompt_gen_err_c6}")
        if not video_prompts_list_c6: print(f"    未能為敘事文本生成任何視頻 Prompts。"); user_wants_to_proceed_c6_multi = False
    else: print("【錯誤】`generate_multiple_video_prompts_for_story_gemini` 未定義!"); user_wants_to_proceed_c6_multi = False

if user_wants_to_proceed_c6_multi:
    # --- Step 5: generate the Zeroscope clips one at a time in a loop ---
    if 'loaded_pipeline' in globals() and loaded_pipeline is not None: del loaded_pipeline; loaded_pipeline = None
    if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache(); print("    Cell 6: CUDA cache cleared (before Zeroscope loop).")

    print(f"\n--- Cell 6 - 步驟 5: 循環生成 {len(video_prompts_list_c6)} 個 Zeroscope 短視頻片段 ---")
    temp_zeroscope_clips_dir_c6 = f"/content/temp_zeroscope_clips_c6_{int(time.time())}"
    os.makedirs(temp_zeroscope_clips_dir_c6, exist_ok=True)
    temp_video_clips_paths_c6 = []  # MP4 paths of successfully exported clips, in order

    for idx, scene_data in enumerate(video_prompts_list_c6):
        print(f"\n  --- 正在生成第 {idx + 1}/{len(video_prompts_list_c6)} 個視頻片段 ---")
        current_clip_prompt_c6 = scene_data.get("english_video_prompt")
        if not current_clip_prompt_c6: print("    【警告】此場景缺少 Prompt，跳過。"); continue
        print(f"    使用 Prompt: {current_clip_prompt_c6[:100]}...")

        clip_err, pil_frames_for_clip_c6 = generate_video_with_zeroscope(
            current_clip_prompt_c6, num_frames=24, height=320, width=576, num_inference_steps=20
        )  # these parameters can be tuned
        if not clip_err and pil_frames_for_clip_c6:
            clip_mp4_path_c6 = os.path.join(temp_zeroscope_clips_dir_c6, f"clip_{idx+1:03d}.mp4")
            try:
                if 'export_to_video' in globals():  # make sure the helper is defined
                    export_to_video(pil_frames_for_clip_c6, clip_mp4_path_c6, fps=10)
                    print(f"    【成功】片段 {idx+1} 已保存為 MP4: {clip_mp4_path_c6}")
                    temp_video_clips_paths_c6.append(clip_mp4_path_c6)
                else: print("     【錯誤】 `export_to_video` 函數未定義，無法保存片段。")
            except Exception as e_export_clip: print(f"    導出片段 {idx+1} 為 MP4 時出錯: {e_export_clip}")
        else: print(f"    生成片段 {idx+1} 失敗: {clip_err or '未返回有效幀'}")

    if not temp_video_clips_paths_c6:
        print("未能成功生成任何短視頻片段，無法繼續。"); user_wants_to_proceed_c6_multi = False

if user_wants_to_proceed_c6_multi:
    # --- Step 6: join all clips and attach the full narration track ---
    if 'loaded_zeroscope_pipe' in globals() and loaded_zeroscope_pipe is not None: del loaded_zeroscope_pipe; loaded_zeroscope_pipe = None
    if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache(); print("    Cell 6: CUDA cache cleared (before final combine).")

    print(f"\n--- Cell 6 - 步驟 6: 拼接 {len(temp_video_clips_paths_c6)} 個短視頻片段並配音 ---")
    if 'combine_video_clips_and_set_audio_moviepy' in globals() and MOVIEPY_AVAILABLE:
        final_video_output_dir_c6 = f"/content/final_multiclip_videos_c6/{selected_voice_config_c6_main['name'].split(' ')[0].lower()}_{story_type_c6}_{CURRENT_LANGUAGE}"
        os.makedirs(final_video_output_dir_c6, exist_ok=True)
        final_video_filename_c6 = f"multiclip_{story_type_c6}_{current_char_info_c6.get('character','video')}_{int(time.time())}.mp4"
        final_output_path_c6 = os.path.join(final_video_output_dir_c6, final_video_filename_c6)

        combine_err_multi, final_video_result_path_c6 = combine_video_clips_and_set_audio_moviepy(
            temp_video_clips_paths_c6, full_narrative_audio_path_c6, final_output_path_c6, target_fps=24
        )
        if not combine_err_multi and final_video_result_path_c6:
            print(f"    【成功】最終多片段視頻已生成: {final_video_result_path_c6}")
            display(HTML(f'<a href="{final_video_result_path_c6}" target="_blank" download="{os.path.basename(final_video_result_path_c6)}">點此下載最終視頻</a>'))
            from base64 import b64encode
            with open(final_video_result_path_c6, 'rb') as video_file:  # close the file handle after reading
                mp4_final_data = video_file.read()
            data_url_final = "data:video/mp4;base64," + b64encode(mp4_final_data).decode()
            display(HTML(f'<h4>最終多片段合成視頻:</h4><video width="480" controls loop muted><source src="{data_url_final}" type="video/mp4"></video>'))
        else: print(f"    最終視頻合成失敗: {combine_err_multi or '未知錯誤'}")
        if os.path.exists(temp_zeroscope_clips_dir_c6): shutil.rmtree(temp_zeroscope_clips_dir_c6); print(f"    已清理臨時短視頻片段文件夾。")
    else: print("    【警告】MoviePy 或相關合併函數不可用/未定義。")

if not user_wants_to_proceed_c6_multi:
    print("\nCell 6 因用戶選擇或前期錯誤而中止。")
print("\n--- AI 說書人 (多片段 Zeroscope 版) 單次流程執行完畢 ---")

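## Cell 7: AI Storyteller - Multi-Image Static Video Creation

**Technology and Model Overview:**

This cell builds an "AI storyteller" that presents a story as a sequence of **still images** played over a complete narration track. It extends the single-image video of Cell 5 and contrasts with the animated clips of Cell 6.
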
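*   **User interaction and flow control:** language selection, voice-style selection, and story-type selection.
*   **Content generation (LLM):**
    *   `generate_story_with_persona` (single-narrator story).
    *   `generate_dialogue_story_with_personas` (multi-character dialogue script).
    *   `generate_multiple_image_prompts_for_story` (an image prompt for each major part of the story or script).
*   **Speech synthesis (TTS):**
    *   `clone_voice_f5tts` / `generate_and_play_speech_gtts` (speech for the whole story, or for each line of a multi-character script).
    *   `parse_dialogue_script` (splits a multi-character script into speaker turns).
    *   `concatenate_audio_clips_moviepy` (joins the per-line clips into one complete narration track).
*   **Still-image generation (Diffusers):**
    *   `generate_image_with_diffusers` (renders a series of still images from multiple prompts).
*   **Video assembly (MoviePy):**
    *   `create_video_from_images_and_audio` (combines the sequence of still images with the full narration track into an MP4 video).
*   **Other:** API calls, file handling, GPU-memory management, and so on, similar to Cells 3, 5, and 6.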
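
The per-line TTS in the speech bullets above depends on `parse_dialogue_script` returning an ordered list of turns, each a dict with `'speaker'` and `'dialogue'` keys (the keys the dialogue loops in Cells 6 and 7 read). The exact script format Gemini emits is not shown here, so this minimal sketch *assumes* one turn per line in the form `Tag: text`, with either an ASCII or a full-width colon. `parse_dialogue_script_sketch` is a hypothetical name, chosen so it does not overwrite the notebook's own helper.

```python
import re
from typing import Dict, List


def parse_dialogue_script_sketch(script_text: str, tag_a: str, tag_b: str) -> List[Dict[str, str]]:
    """Split a two-speaker script into ordered turns (hypothetical format).

    Assumes each turn starts a line as "Tag: text" (ASCII ':' or full-width '：').
    Lines that start with neither tag are appended to the previous turn.
    """
    pattern = re.compile(
        r"^\s*(" + re.escape(tag_a) + "|" + re.escape(tag_b) + r")\s*[:：]\s*(.*)$"
    )
    turns: List[Dict[str, str]] = []
    for raw_line in script_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        match = pattern.match(line)
        if match:
            turns.append({"speaker": match.group(1), "dialogue": match.group(2).strip()})
        elif turns:
            # Continuation line: attach it to the current speaker's turn.
            turns[-1]["dialogue"] += " " + line
    # Drop turns whose text is empty; there is nothing to synthesize for them.
    return [turn for turn in turns if turn["dialogue"]]
```

Each parsed turn can then be routed to character A's or B's voice by comparing `turn["speaker"]` with A's tag, which is what the dialogue loop in the code cell does before calling F5-TTS or gTTS.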

**Features:**

Cell 7 tells a story through a series of generated still images, which suits slideshow-style or comic-strip-style videos.

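1.  **Language, voice, and story type:**
    *   The user picks the working language for this video (Chinese or English).
    *   The user picks the voice style of the main narrator (character A).
    *   The user chooses between a single-narrator story and a multi-character dialogue story; for a dialogue story, the user also picks character B.
2.  **Story text generation:**
    *   Based on these choices, Gemini generates either a single-narrator story or a multi-character dialogue script in the selected language.
3.  **Full narration audio:**
    *   As in Cell 6, a complete narration track is synthesized for the generated text (single narrator or multiple characters).
4.  **Multiple image prompts:**
    *   The full story text (or dialogue script) is passed to `generate_multiple_image_prompts_for_story`.
    *   That function breaks the story into several key scenes or ideas, each of which can be conveyed by a single still image.
    *   For each scene it produces a Chinese description and an English image prompt suited to Stable Diffusion.
5.  **Iterative image generation and selection:**
    *   The cell loops over the image prompts from the previous step.
    *   Each prompt is rendered with `generate_image_with_diffusers`.
    *   Each image is shown to the user, who decides whether to keep it for the final video or to edit the prompt and regenerate (the code cell allows one retry per image).
    *   The paths of all images the user accepts are collected.
6.  **Final still-image video:**
    *   `create_video_from_images_and_audio` combines all images accepted in step 5, in a set order and with set display times (for example, dividing the total narration length evenly among the images), with the full narration track from step 3 into one MP4 video.
7.  **Output and cleanup:** The final video is displayed, and the temporary image files created along the way are removed.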
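
Step 6 leaves the timing rule open ("for example, dividing the narration length evenly among the images"). One way to do that is sketched below, assuming an even split counted in whole frames so that every image change lands on a frame boundary. `allocate_image_durations` is a hypothetical helper; the internals of `create_video_from_images_and_audio` are not shown here. Rounding the frame count up keeps the video at least as long as the narration.

```python
import math
from typing import List


def allocate_image_durations(audio_duration_s: float, num_images: int, fps: int) -> List[float]:
    """Split `audio_duration_s` seconds of narration evenly across `num_images` stills.

    Works in whole frames at `fps`; leftover frames go to the earliest images.
    Each image gets at least one frame, so a very short narration can produce
    a video slightly longer than the audio.
    """
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Round up so the last image stays on screen until the narration ends.
    total_frames = max(num_images, math.ceil(audio_duration_s * fps))
    base, extra = divmod(total_frames, num_images)
    frame_counts = [base + (1 if i < extra else 0) for i in range(num_images)]
    return [count / fps for count in frame_counts]
```

With `fps=1`, which the code cell passes, a 10.4 s narration over three images gives `[4.0, 4.0, 3.0]`. The durations could then be applied per still with MoviePy's `ImageClip` (`set_duration` in MoviePy 1.x, `with_duration` in 2.x).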

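Compared with Cell 6, this cell offers a different visual style: the narration is supported by a carefully chosen sequence of still images rather than animated clips.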
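
The accept-or-retry loop of step 5 can be expressed independently of Diffusers and `input()`. This is a minimal sketch, assuming the generator returns an `(error, image_path)` pair (the notebook's `generate_image_with_diffusers` returns an image object instead, which the cell then saves to a file). `select_images_with_retries` and its three callbacks are hypothetical names. With `max_retries=1` it mirrors the code cell: at most one regeneration per scene, and an empty revised prompt falls back to the original.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical generator signature: prompt -> (error message or None, image path or None).
GenerateFn = Callable[[str], Tuple[Optional[str], Optional[str]]]


def select_images_with_retries(
    prompts: List[str],
    generate: GenerateFn,
    confirm: Callable[[str], bool],
    revise_prompt: Callable[[str], Optional[str]],
    max_retries: int = 1,
) -> List[str]:
    """Generate one image per prompt and let the user accept it or retry.

    `confirm(path)` returns True to keep an image. `revise_prompt(original)`
    returns a new prompt ("" means reuse the original) or None to stop
    retrying this scene.
    """
    selected: List[str] = []
    for original_prompt in prompts:
        prompt = original_prompt
        for attempt in range(max_retries + 1):
            if attempt > 0:
                revised = revise_prompt(original_prompt)
                if revised is None:
                    break  # user declined to retry this scene
                prompt = revised or original_prompt
            error, image_path = generate(prompt)
            if error or not image_path:
                continue  # generation failed; move on to the next attempt, if any
            if confirm(image_path):
                selected.append(image_path)
                break  # scene done; go to the next prompt
    return selected
```

In the notebook, `confirm` would wrap the `(y/n)` `input()` prompt, and `revise_prompt` would return `None` unless the user answers `y`.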

In [None]:
# Cell 7: AI Storyteller - [multi-image static video + optional multi-character dialogue narration] (v3.1.3 - fixes the flow after single-narrator TTS)
import os
import IPython
from IPython.display import display, Markdown, HTML, Audio
import shutil
import numpy as np
import soundfile as sf
import PIL.Image
import io
import json
import time
import sys
import traceback
import re

try:
    if 'torch' not in globals(): import torch
except ImportError:
    print("【警告】Cell 7 頂層無法導入 torch。")

print("\n" + "=" * 70)
print("      🖼️🎙️ AI 說書人 - 多圖靜態影片創作工坊 (可選多角色) 🖼️🎙️")
print("=" * 70)

# --- Initialization: fetch variables set by earlier cells ---
if 'device' not in globals():
    try:
        if 'torch' not in globals(): import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
    except Exception as e_dev_init_c7: device = "cpu"
    print(f"Cell 7: 'device' 設置為: {device}")

if 'current_char_info_loop' not in globals() or current_char_info_loop is None:
    print("【【警告】】Cell 7: 未從 Cell 5 獲取 'current_char_info_loop'。使用演示信息。")
    current_char_info_c7 = {'character': '演示C7', 'type': '示例C7', 'pinyin': 'yǎn shì', 'meaning': 'Cell 7 演示用。'}
else:
    current_char_info_c7 = current_char_info_loop
    print(f"Cell 7: 已獲取古文字信息: {current_char_info_c7.get('character', '未知')}")

if 'VOICE_OPTIONS' not in globals() or not VOICE_OPTIONS:
    print("【【嚴重錯誤】】Cell 7: VOICE_OPTIONS 未定義。流程中止。")
    user_wants_to_proceed_c7 = False
else:
    user_wants_to_proceed_c7 = True

if user_wants_to_proceed_c7:
    CURRENT_LANGUAGE_C7 = globals().get('CURRENT_LANGUAGE', 'zh')
    print("\n--- Cell 7 - 步驟 -1: 請選擇本輪影片的語言 ---")
    prev_lang_display_c7 = '中文' if CURRENT_LANGUAGE_C7 == 'zh' else 'English'
    lang_choice_input_c7 = input(f"  請選擇語言 (1: 中文, 2: English, Enter 使用當前默認 '{prev_lang_display_c7}'): ").strip()
    if lang_choice_input_c7 == "1": CURRENT_LANGUAGE_C7 = "zh"
    elif lang_choice_input_c7 == "2": CURRENT_LANGUAGE_C7 = "en"
    current_lang_display_c7 = '中文' if CURRENT_LANGUAGE_C7 == 'zh' else 'English'
    print(f"--- Cell 7 當前操作語言: {current_lang_display_c7} ---")
    target_gtts_lang_param_c7 = 'zh-cn' if CURRENT_LANGUAGE_C7 == 'zh' else 'en'
else:
    CURRENT_LANGUAGE_C7 = "zh"; current_lang_display_c7 = "中文"; target_gtts_lang_param_c7 = "zh-cn"

selected_voice_config_c7_main = None; current_persona_c7_main = None
use_f5_tts_main_c7 = False; ref_audio_path_main_c7, ref_text_main_c7, f5_tts_main_ready_c7 = None, None, False
voice_choice_c7_main_key = None
story_type_c7 = "single"; selected_voice_config_B_c7 = None
use_f5_tts_B_c7 = False; ref_audio_path_B_c7, ref_text_B_c7, f5_tts_B_ready_c7 = None, None, False
speaker_A_script_tag_c7, speaker_B_script_tag_c7 = None, None
final_story_text_for_tts_c7 = None
full_story_audio_path_c7 = None
final_selected_image_paths_for_video_c7 = []

if user_wants_to_proceed_c7:
    print(f"\n--- Cell 7 - 步驟 1: 選擇主要敘述者/角色A聲音 ({current_lang_display_c7}) ---")
    for key, option in VOICE_OPTIONS.items(): print(f"  {key}: {option['name']}")
    print("  q: 退出此流程")
    default_key_c7 = globals().get('voice_choice_main_key', next(iter(VOICE_OPTIONS)))
    prompt_voice_A_c7 = f"請輸入主要敘述者/角色A的選項數字 (Enter 使用默認 '{VOICE_OPTIONS.get(default_key_c7,{}).get('name','N/A')}'), 或 'q': "
    voice_choice_input_c7_A = input(prompt_voice_A_c7).strip().lower()
    if voice_choice_input_c7_A == 'q': user_wants_to_proceed_c7 = False
    else:
        voice_choice_c7_main_key = voice_choice_input_c7_A if voice_choice_input_c7_A in VOICE_OPTIONS else default_key_c7
        if voice_choice_c7_main_key not in VOICE_OPTIONS: voice_choice_c7_main_key = next(iter(VOICE_OPTIONS))
        selected_voice_config_c7_main = VOICE_OPTIONS[voice_choice_c7_main_key]
        print(f"    主要敘述者/角色A: {selected_voice_config_c7_main['name']}")
        current_persona_c7_main = selected_voice_config_c7_main.get("persona_description")
        use_f5_tts_main_c7 = (selected_voice_config_c7_main.get("type") == "f5tts")
        if use_f5_tts_main_c7:
            lang_config = selected_voice_config_c7_main.get(CURRENT_LANGUAGE_C7)
            if lang_config and lang_config.get("ref_audio_path"):
                ref_audio_path_main_c7, ref_text_main_c7 = lang_config.get("ref_audio_path"), lang_config.get("ref_text","")
                if os.path.exists(ref_audio_path_main_c7):
                    try: _, sr = sf.read(ref_audio_path_main_c7); f5_tts_main_ready_c7 = True; print(f"      F5 ({current_lang_display_c7}) Ref: '{os.path.basename(ref_audio_path_main_c7)}' (SR:{sr}Hz) OK.")
                    except Exception as e: print(f"      讀取角色A F5 Ref失敗: {e}")
                else: print(f"      角色A F5 Ref未找到: {ref_audio_path_main_c7}")
            else: print(f"      角色A 未配置 {current_lang_display_c7} F5 Ref。")
        if use_f5_tts_main_c7 and not f5_tts_main_ready_c7: use_f5_tts_main_c7 = False; print(f"      角色A F5 ({current_lang_display_c7}) 未就緒，將用gTTS。")

if user_wants_to_proceed_c7:
    print(f"\n--- Cell 7 - 步驟 1.5: 選擇故事敘述類型 ({current_lang_display_c7}) ---")
    story_type_choice_c7 = input("您希望生成 (1) 單人旁白式故事，還是 (2) 多角色對話式故事？ (1 或 2，默認1): ").strip()
    if story_type_choice_c7 == "2":
        story_type_c7 = "dialogue"
        print(f"\n--- 為多角色故事選擇【第二個對話角色】聲音風格 ({current_lang_display_c7}) ---")
        available_voices_for_B_c7 = {k: v for k, v in VOICE_OPTIONS.items() if k != voice_choice_c7_main_key}
        if not available_voices_for_B_c7: print("    【警告】無其他可用聲音，退回單人旁白。"); story_type_c7 = "single"
        else:
            for key_b, option_b in available_voices_for_B_c7.items(): print(f"  {key_b}: {option_b['name']}")
            voice_choice_B_key_c7 = input(f"請為第二個角色選擇聲音: ").strip().lower()
            if voice_choice_B_key_c7 and voice_choice_B_key_c7 in available_voices_for_B_c7:
                selected_voice_config_B_c7 = VOICE_OPTIONS[voice_choice_B_key_c7]
                print(f"    第二個角色: {selected_voice_config_B_c7['name']}")
                use_f5_tts_B_c7 = (selected_voice_config_B_c7.get("type") == "f5tts")
                if use_f5_tts_B_c7:
                    lang_config_B = selected_voice_config_B_c7.get(CURRENT_LANGUAGE_C7)
                    if lang_config_B and lang_config_B.get("ref_audio_path"):
                        ref_audio_path_B_c7, ref_text_B_c7 = lang_config_B.get("ref_audio_path"), lang_config_B.get("ref_text","")
                        if os.path.exists(ref_audio_path_B_c7):
                            try: _, sr = sf.read(ref_audio_path_B_c7); f5_tts_B_ready_c7 = True; print(f"      角色B F5 ({current_lang_display_c7}) Ref: '{os.path.basename(ref_audio_path_B_c7)}' (SR:{sr}Hz) OK.")
                            except Exception as e: print(f"      讀取角色B F5 Ref失敗: {e}")
                        else: print(f"      角色B F5 Ref未找到: {ref_audio_path_B_c7}")
                    else: print(f"      角色B 未配置 {current_lang_display_c7} F5 Ref。")
                if use_f5_tts_B_c7 and not f5_tts_B_ready_c7: use_f5_tts_B_c7 = False; print(f"      角色B F5 ({current_lang_display_c7}) 未就緒，將用gTTS。")
            else: print("    未選有效第二角色，退回單人旁白。"); story_type_c7 = "single"
    else: story_type_c7 = "single"; print("    將生成單人旁白式故事。")

if user_wants_to_proceed_c7:
    print(f"\n--- Cell 7 - 步驟 2: 生成故事文本 ({current_lang_display_c7}, 類型: {story_type_c7}) ---")
    story_len_hint_c7 = "一個包含多個情節或轉折的短故事，總長度大約5到8句話，適合配上多張圖片來演繹。" if CURRENT_LANGUAGE_C7 == 'zh' else "A short story with multiple plot points or twists, about 5-8 sentences long, suitable for a multi-image slideshow video."
    if story_type_c7 == "single":
        if 'generate_story_with_persona' in globals():
            err_st, raw_st_text_c7 = generate_story_with_persona(current_char_info_c7, current_persona_c7_main, target_lang=CURRENT_LANGUAGE_C7, story_length_hint=story_len_hint_c7)
            if err_st or not (raw_st_text_c7 and raw_st_text_c7.strip()): print(f"    單人旁白故事生成失敗: {err_st or '空返回'}"); user_wants_to_proceed_c7=False
            else: final_story_text_for_tts_c7 = raw_st_text_c7
        else: user_wants_to_proceed_c7=False; print("【錯誤】`generate_story_with_persona` 未定義!")
    elif story_type_c7 == "dialogue" and selected_voice_config_B_c7:
        if 'generate_dialogue_story_with_personas' in globals():
            err_st, raw_dialogue_c7, tag_A, tag_B = generate_dialogue_story_with_personas(
                current_char_info_c7, current_persona_c7_main, selected_voice_config_B_c7.get("persona_description"),
                selected_voice_config_c7_main['name'], selected_voice_config_B_c7['name'],
                target_lang=CURRENT_LANGUAGE_C7, num_dialogue_turns=3
            )
            if err_st or not (raw_dialogue_c7 and raw_dialogue_c7.strip()): print(f"    多角色對話故事生成失敗: {err_st or '空返回'}"); user_wants_to_proceed_c7=False
            else: final_story_text_for_tts_c7 = raw_dialogue_c7; speaker_A_script_tag_c7 = tag_A; speaker_B_script_tag_c7 = tag_B
        else: user_wants_to_proceed_c7=False; print("【錯誤】`generate_dialogue_story_with_personas` 未定義!")
    if not final_story_text_for_tts_c7 and user_wants_to_proceed_c7: user_wants_to_proceed_c7=False; print("    未能生成有效敘事文本。")
    elif user_wants_to_proceed_c7: print(f"    敘事文本 ({story_type_c7}, {current_lang_display_c7}):\n'''{final_story_text_for_tts_c7[:300]}...'''")

if user_wants_to_proceed_c7:
    if 'loaded_pipeline' in globals() and loaded_pipeline is not None: del loaded_pipeline; loaded_pipeline = None; loaded_pipeline_name = None
    if 'loaded_zeroscope_pipe' in globals() and loaded_zeroscope_pipe is not None: del loaded_zeroscope_pipe; loaded_zeroscope_pipe = None
    if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache(); print("    Cell 7: CUDA cache cleared (before TTS).")

    print(f"\n--- Cell 7 - 步驟 3: 生成完整旁白音頻 ({current_lang_display_c7}) ---")
    audio_out_dir_c7 = f"/content/generated_narrative_audio_c7/{selected_voice_config_c7_main['name'].split(' ')[0].lower()}_{story_type_c7}_{CURRENT_LANGUAGE_C7}"
    audio_out_fn_c7_base = f"narrative_c7_{current_char_info_c7.get('character','story')}_{int(time.time())}"
    err_tts = "TTS 未執行或意外跳過" # Default error for TTS

    if story_type_c7 == "single":
        audio_out_fn_c7 = f"{audio_out_fn_c7_base}_single.wav"
        if use_f5_tts_main_c7 and f5_tts_main_ready_c7:
            err_tts, full_story_audio_path_c7 = clone_voice_f5tts(ref_audio_path_main_c7, final_story_text_for_tts_c7, ref_text_main_c7, audio_out_fn_c7, audio_out_dir_c7)
        else:
            if 'generate_and_play_speech_gtts' in globals():
                os.makedirs(audio_out_dir_c7, exist_ok=True)
                err_tts, full_story_audio_path_c7 = generate_and_play_speech_gtts(final_story_text_for_tts_c7, lang=target_gtts_lang_param_c7, filename=os.path.join(audio_out_dir_c7, audio_out_fn_c7))
            else: err_tts = "gTTS function 'generate_and_play_speech_gtts' is not defined."

        if err_tts:  # the TTS helper reported an error
            print(f"    單人旁白TTS失敗: {err_tts}")
            user_wants_to_proceed_c7 = False
        elif not (full_story_audio_path_c7 and os.path.exists(full_story_audio_path_c7)):  # even if err_tts is None, the output file must exist
            print(f"    單人旁白TTS聲稱成功，但音頻文件無效或未找到: {full_story_audio_path_c7}")
            user_wants_to_proceed_c7 = False
        # On success, user_wants_to_proceed_c7 stays True.

    elif story_type_c7 == "dialogue":
        dialogue_turns_parsed_c7 = []
        if 'parse_dialogue_script' in globals() and speaker_A_script_tag_c7 and speaker_B_script_tag_c7:
            dialogue_turns_parsed_c7 = parse_dialogue_script(final_story_text_for_tts_c7, speaker_A_script_tag_c7, speaker_B_script_tag_c7)

        if not dialogue_turns_parsed_c7:
            print("    【錯誤】多角色故事腳本解析失敗或為空，無法生成對話式音頻。")
            user_wants_to_proceed_c7 = False
        else:
            print(f"    多角色故事腳本解析成功，共 {len(dialogue_turns_parsed_c7)} 輪。開始逐輪TTS...")
            individual_audio_paths_c7 = []
            temp_dialogue_dir_c7 = f"/content/temp_dialogue_story_audio_c7_{int(time.time())}"
            os.makedirs(temp_dialogue_dir_c7, exist_ok=True)
            all_dialogue_tts_succeeded = True

            for i_turn, turn_data in enumerate(dialogue_turns_parsed_c7):
                speaker_script_tag_turn = turn_data['speaker']
                dialogue_line_turn = turn_data['dialogue']
                turn_audio_path_temp = None
                err_tts_turn = "TTS for turn not executed"

                is_speaker_A_turn = (speaker_script_tag_turn == speaker_A_script_tag_c7)
                current_turn_voice_config = selected_voice_config_c7_main if is_speaker_A_turn else selected_voice_config_B_c7
                speaker_display_name_for_log = current_turn_voice_config['name']
                current_turn_use_f5 = (use_f5_tts_main_c7 if is_speaker_A_turn else use_f5_tts_B_c7)
                current_turn_f5_ready = (f5_tts_main_ready_c7 if is_speaker_A_turn else f5_tts_B_ready_c7)
                current_turn_ref_audio = (ref_audio_path_main_c7 if is_speaker_A_turn else ref_audio_path_B_c7)
                current_turn_ref_text = (ref_text_main_c7 if is_speaker_A_turn else ref_text_B_c7)

                print(f"      合成角色「{speaker_display_name_for_log}」 (腳本標籤: {speaker_script_tag_turn}) 第 {i_turn+1} 句: \"{dialogue_line_turn[:30]}...\"")
                turn_audio_fn = f"turn_{i_turn+1}_{speaker_script_tag_turn.replace(' ','_')}_{int(time.time())}.wav"

                if current_turn_use_f5 and current_turn_f5_ready:
                    err_tts_turn, turn_audio_path_temp = clone_voice_f5tts(current_turn_ref_audio, dialogue_line_turn, current_turn_ref_text, turn_audio_fn, temp_dialogue_dir_c7)
                else:
                    if 'generate_and_play_speech_gtts' in globals():
                        err_tts_turn, turn_audio_path_temp = generate_and_play_speech_gtts(dialogue_line_turn, lang=target_gtts_lang_param_c7, filename=os.path.join(temp_dialogue_dir_c7, turn_audio_fn))
                    else: err_tts_turn = "gTTS 函數 'generate_and_play_speech_gtts' 未定義。"

                if err_tts_turn or not (turn_audio_path_temp and os.path.exists(turn_audio_path_temp)):
                    print(f"        【失敗】角色 {speaker_display_name_for_log} 第 {i_turn+1} 句TTS失敗: {err_tts_turn or '無輸出'}。中止後續對話TTS。")
                    all_dialogue_tts_succeeded = False; break
                else:
                    individual_audio_paths_c7.append(turn_audio_path_temp)
                    print(f"        【成功】角色 {speaker_display_name_for_log} 第 {i_turn+1} 句語音: {turn_audio_path_temp}")

            if all_dialogue_tts_succeeded and individual_audio_paths_c7:
                if 'concatenate_audio_clips_moviepy' in globals() and MOVIEPY_AVAILABLE:
                    audio_out_fn_c7 = f"{audio_out_fn_c7_base}_dialogue.wav"
                    os.makedirs(audio_out_dir_c7, exist_ok=True)
                    full_narrative_audio_path_c7_temp = os.path.join(audio_out_dir_c7, audio_out_fn_c7)
                    concat_err, concat_res_path = concatenate_audio_clips_moviepy(individual_audio_paths_c7, full_narrative_audio_path_c7_temp)
                    if not concat_err and concat_res_path: full_story_audio_path_c7 = concat_res_path
                    else: print(f"        拼接對話音頻失敗: {concat_err}"); user_wants_to_proceed_c7 = False
                else: print("        【錯誤】音頻拼接函數或MoviePy不可用。"); user_wants_to_proceed_c7 = False
            elif not individual_audio_paths_c7 and all_dialogue_tts_succeeded:
                print("        沒有有效的對話輪次進行TTS。"); user_wants_to_proceed_c7 = False
            else: # all_dialogue_tts_succeeded is False
                print("        部分或全部對話輪次TTS失敗，不進行拼接。"); user_wants_to_proceed_c7 = False
            if os.path.exists(temp_dialogue_dir_c7):
                try: shutil.rmtree(temp_dialogue_dir_c7); print(f"    已清理臨時對話音頻文件夾: {temp_dialogue_dir_c7}")
                except Exception as e_clean_diag: print(f"    清理臨時對話音頻文件夾失敗: {e_clean_diag}")

    # --- Check the TTS result in one place ---
    if user_wants_to_proceed_c7:  # only check if no earlier step has already set the flag to False
        if full_story_audio_path_c7 and os.path.exists(full_story_audio_path_c7):
            print(f"    【成功】完整旁白音頻 ({story_type_c7}, {current_lang_display_c7}): {full_story_audio_path_c7}"); display(Audio(full_story_audio_path_c7))
        else:
            print(f"    完整旁白音頻 ({story_type_c7}) 生成後文件無效或未找到。")
            user_wants_to_proceed_c7 = False  # a missing audio file stops the flow here

if user_wants_to_proceed_c7:
    if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache(); print("    Cell 7: CUDA cache cleared (before Multi-Image Gen).")
    print(f"\n--- Cell 7 - 步驟 4: 為故事「{final_story_text_for_tts_c7[:30]}...」生成多張圖像 ---")
    if not (final_story_text_for_tts_c7 and final_story_text_for_tts_c7.strip()):
        print("    故事文本為空，無法為其生成圖像。"); user_wants_to_proceed_c7 = False
    elif 'generate_multiple_image_prompts_for_story' not in globals() or \
         'generate_image_with_diffusers' not in globals() or \
         not DIFFUSERS_AVAILABLE:
        print("【警告】缺少圖像生成函數或Diffusers不可用。"); user_wants_to_proceed_c7 = False

if user_wants_to_proceed_c7:
    num_images_str_c7 = input(f"您希望為這個故事生成大約多少張圖片？ (1-5張，默認3): ").strip()  # valid range: 1-5
    num_images_target_c7 = int(num_images_str_c7) if num_images_str_c7.isdigit() and int(num_images_str_c7) in range(1,6) else 3
    print(f"    將為故事生成 {num_images_target_c7} 張圖像。")
    persona_hint_for_img_c7 = current_persona_c7_main.split('.')[0] if story_type_c7 == "single" else f"{selected_voice_config_c7_main['name'].split(' ')[0]} & {selected_voice_config_B_c7['name'].split(' ')[0]} dialogue style"
    image_prompts_data_list_c7 = generate_multiple_image_prompts_for_story(final_story_text_for_tts_c7, num_images_target_c7, current_char_info_c7, persona_hint_for_img_c7)

    if not image_prompts_data_list_c7: user_wants_to_proceed_c7 = False; print("    未能生成圖像Prompts。")
    else:
        temp_story_image_dir_c7 = f"/content/temp_images_cell7_story_frames_{int(time.time())}"
        os.makedirs(temp_story_image_dir_c7, exist_ok=True)
        print(f"    臨時圖像將保存到: {temp_story_image_dir_c7}")
        for i_img, scene_data_c7 in enumerate(image_prompts_data_list_c7):
            if not user_wants_to_proceed_c7: break
            print(f"\n  --- 正在處理故事片段 {i_img + 1}/{len(image_prompts_data_list_c7)} 的圖像 ---")
            scene_desc_c7 = scene_data_c7.get("scene_description_chinese", f"場景 {i_img+1}")
            suggested_img_prompt_c7 = scene_data_c7.get("image_prompt_english")
            if not suggested_img_prompt_c7: print(f"    【警告】片段 {i_img+1} ({scene_desc_c7}) 缺少Prompt，跳過。"); continue
            print(f"    對應場景 (中文): {scene_desc_c7}")
            current_iter_img_prompt_c7 = suggested_img_prompt_c7; max_retries_per_image_c7 = 1
            for attempt_img in range(max_retries_per_image_c7 + 1):
                if attempt_img > 0:
                    user_retry_choice_c7 = input(f"    圖像不滿意。修改Prompt重試第{i_img+1}張? (y/n, Enter跳過重試): ").strip().lower()
                    if user_retry_choice_c7 != 'y': break
                    new_prompt_input = input(f"    原Prompt: {suggested_img_prompt_c7}\n    新Prompt > ").strip()
                    current_iter_img_prompt_c7 = new_prompt_input if new_prompt_input else suggested_img_prompt_c7
                print(f"    生成圖像 (場景 {i_img+1}, 嘗試 {attempt_img+1}): \"{current_iter_img_prompt_c7[:100]}...\"")
                if 'loaded_pipeline' in globals() and loaded_pipeline is not None: del loaded_pipeline; loaded_pipeline = None; loaded_pipeline_name = None
                if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available(): torch.cuda.empty_cache()
                img_gen_err_c7, img_obj_c7 = generate_image_with_diffusers(prompt=current_iter_img_prompt_c7, num_inference_steps=25)
                generated_single_image_path_this_iteration_c7 = None
                if img_gen_err_c7: print(f"        圖像生成失敗: {img_gen_err_c7}")
                elif img_obj_c7:
                    print("        【成功】圖像已生成！"); display(img_obj_c7)
                    img_filename_c7 = f"story_c7_img_s{i_img+1}_a{attempt_img+1}_{int(time.time())}.png"
                    generated_single_image_path_this_iteration_c7 = os.path.join(temp_story_image_dir_c7, img_filename_c7)
                    try: img_obj_c7.save(generated_single_image_path_this_iteration_c7); print(f"        臨時圖像已保存至: {generated_single_image_path_this_iteration_c7}")
                    except Exception as e_save_img_c7: print(f"        保存圖像失敗: {e_save_img_c7}"); generated_single_image_path_this_iteration_c7 = None
                else: print("        圖像生成函數未返回有效圖像對象。")
                if generated_single_image_path_this_iteration_c7:
                    confirm_img_c7 = input(f"    是否將此圖片加入最終影片素材？ (y/n，默認y，Enter選中): ").strip().lower()
                    if confirm_img_c7 != 'n':
                        final_selected_image_paths_for_video_c7.append(generated_single_image_path_this_iteration_c7)
                        print(f"    圖片 '{os.path.basename(generated_single_image_path_this_iteration_c7)}' 已加入列表。"); break
                    elif attempt_img < max_retries_per_image_c7: print(f"    用戶不滿意，準備重試...")
                    else: print(f"    已達最大重試次數，圖片未選中。")
                elif attempt_img < max_retries_per_image_c7: print(f"    圖像 {i_img+1} 生成失敗，準備重試...")
                else: print(f"    圖像 {i_img+1} 生成失敗且已達最大重試次數。")
    if not final_selected_image_paths_for_video_c7 and user_wants_to_proceed_c7:
        print("    No images were selected for the video in the end.")
        user_wants_to_proceed_c7 = False

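# Sketch (hypothetical helper, not called anywhere in this cell): the per-scene loop above follows a
# "generate, then ask the user to keep or retry" pattern. Assuming a `generate` callable that returns a
# saved file path (or None on failure) and a `confirm` callable that returns True to keep a result, the
# pattern can be factored out as below. Attempts are numbered 0..max_retries, which is what the
# `attempt_img < max_retries_per_image_c7` checks above suggest; that numbering is an assumption here.
from typing import Callable, Optional

def select_with_retries_sketch(
    generate: Callable[[int], Optional[str]],
    confirm: Callable[[str], bool],
    max_retries: int,
) -> Optional[str]:
    """Return the first generated result the user accepts, or None if every attempt fails or is rejected."""
    for attempt in range(max_retries + 1):
        result = generate(attempt)
        if result is None:
            continue  # generation failed; move on to the next attempt if any remain
        if confirm(result):
            return result  # accepted: stop retrying for this scene
    return None

# Example wiring (`generate_one_image` is hypothetical and would return a saved path or None):
#   path = select_with_retries_sketch(
#       generate_one_image,
#       lambda p: input(f"Keep {p}? (y/n, default y): ").strip().lower() != 'n',
#       max_retries=max_retries_per_image_c7,
#   )
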
if user_wants_to_proceed_c7:
    # Free the diffusion pipeline and cached CUDA memory before combining the final video.
    if 'loaded_pipeline' in globals() and loaded_pipeline is not None:
        loaded_pipeline = None
    if 'torch' in globals() and hasattr(torch, 'cuda') and torch.cuda.is_available():
        torch.cuda.empty_cache()
        print("    Cell 7: CUDA cache cleared (before final static video combine).")
    print(f"\n--- Cell 7 - Step 5: Combining {len(final_selected_image_paths_for_video_c7)} selected image(s) with the narration into a static-image video ---")
    if not (full_story_audio_path_c7 and os.path.exists(full_story_audio_path_c7)):
        user_wants_to_proceed_c7 = False
        print("    The narration audio is invalid.")
    elif not final_selected_image_paths_for_video_c7:  # check again
        user_wants_to_proceed_c7 = False
        print("    No images selected.")
    elif 'create_video_from_images_and_audio' not in globals() or not MOVIEPY_AVAILABLE:
        user_wants_to_proceed_c7 = False
        print("    The video creation function or MoviePy is unavailable.")

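# Sketch (hypothetical helper, not called in this cell): the cell repeats the same two steps before each
# image generation and again before the final combine. It drops the global reference to the diffusion
# pipeline, then calls torch.cuda.empty_cache() to return cached, unused CUDA blocks to the driver. This
# version takes the namespace dict explicitly (e.g. globals()). The helper name and the added
# gc.collect() call, which frees objects kept alive only by reference cycles, are not in the original cell.
import gc

def release_pipeline_sketch(namespace: dict, key: str = "loaded_pipeline") -> bool:
    """Set namespace[key] to None; return True if a pipeline object was actually released."""
    released = namespace.get(key) is not None
    namespace[key] = None  # drop the reference so the pipeline's tensors become collectable
    gc.collect()
    try:
        import torch
    except ImportError:
        return released  # PyTorch not installed: nothing more to free
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # frees only cached blocks no longer held by live tensors
    return released
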
if user_wants_to_proceed_c7:
    video_out_dir_c7 = f"/content/final_static_videos_c7/{selected_voice_config_c7_main['name'].split(' ')[0].lower()}_{story_type_c7}_{CURRENT_LANGUAGE_C7}"
    os.makedirs(video_out_dir_c7, exist_ok=True)
    video_fn_c7_base = f"multi_img_story_{current_char_info_c7.get('character','video')}_{int(time.time())}"
    final_video_path_c7 = os.path.join(video_out_dir_c7, f"{video_fn_c7_base}.mp4")
    print(f"    Output video path: {final_video_path_c7}")
    vid_err_c7, created_vid_path_c7 = create_video_from_images_and_audio(
        image_files=final_selected_image_paths_for_video_c7, audio_file=full_story_audio_path_c7,
        output_video_path=final_video_path_c7, fps=1
    )
    if not vid_err_c7 and created_vid_path_c7:
        print(f"    [Success] Multi-image static video generated: {created_vid_path_c7}")
        display(HTML(f'<a href="{created_vid_path_c7}" target="_blank" download="{os.path.basename(created_vid_path_c7)}">Download video</a>'))
        from base64 import b64encode
        # Embed the MP4 as a base64 data URL so it plays inline; the context manager closes the file.
        with open(created_vid_path_c7, 'rb') as f_vid_c7:
            mp4_data_c7 = f_vid_c7.read()
        data_url_c7 = "data:video/mp4;base64," + b64encode(mp4_data_c7).decode()
        # No `muted` attribute: the narration track is the point of this video.
        display(HTML(f'<h4>Multi-image static story video ({current_lang_display_c7}):</h4><video width="480" controls loop><source src="{data_url_c7}" type="video/mp4"></video>'))
    else:
        print(f"    Multi-image static video creation failed: {vid_err_c7}")
    if 'temp_story_image_dir_c7' in locals() and os.path.exists(temp_story_image_dir_c7):  # make sure the variable exists
        try:
            shutil.rmtree(temp_story_image_dir_c7)
            print(f"    Cleaned up temporary image folder: {temp_story_image_dir_c7}")
        except Exception as e_clean_img_c7:
            print(f"    Failed to clean up temporary image folder: {e_clean_img_c7}")

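# Sketch (hypothetical helper, not called in this cell): the inline player above works by reading the
# whole MP4 into memory and embedding it as a base64 `data:` URL inside a <video> tag. Base64 encodes
# every 3 input bytes as 4 output characters, so the embedded payload is about a third larger than the
# file. That makes this approach best suited to short clips. The helper name is illustrative only.
from base64 import b64encode

def mp4_data_url_sketch(path: str) -> str:
    """Return a data URL that an HTML <video><source src=...> element can play inline."""
    with open(path, "rb") as f:  # context manager closes the file handle
        return "data:video/mp4;base64," + b64encode(f.read()).decode("ascii")
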
if not user_wants_to_proceed_c7:
    print("\nCell 7 was aborted by user choice or an earlier error.")
print("\n--- AI Storyteller (multi-image static video edition): single run complete ---")
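
# Sketch (alternative design, not what this cell does): above, the temporary image folder is removed
# with shutil.rmtree only inside the `if user_wants_to_proceed_c7:` branch. An early abort therefore
# leaves it on disk. Wrapping the scratch folder in tempfile.TemporaryDirectory guarantees removal even
# when the run aborts or raises. `build_video` is a hypothetical callable standing in for the
# image-generation and video-combining steps. It must write the final MP4 outside the scratch folder,
# or the MP4 is deleted too.
import os
import tempfile
from typing import Callable

def run_with_scratch_dir_sketch(build_video: Callable[[str], str]) -> str:
    """Call build_video(scratch_dir); the directory and its contents are deleted on exit."""
    with tempfile.TemporaryDirectory(prefix="story_c7_imgs_") as scratch_dir:
        return build_video(scratch_dir)
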