## Recommended: load the model in "custom" mode
First, download the ChatTTS model (https://huggingface.co/2Noise/ChatTTS) from Hugging Face into the `OpenMic/models` directory.
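If you prefer to script the download, the sketch below uses `snapshot_download` from the `huggingface_hub` package (an extra dependency that this notebook does not otherwise import) to place the repository files directly in the models directory. It assumes the pipeline's "custom" loader expects the files at the top level of `models/`, as the load message in step 1 suggests. The helper name `download_chattts` is hypothetical.

```python
from pathlib import Path


def download_chattts(models_dir: Path) -> Path:
    """Download the 2Noise/ChatTTS snapshot into models_dir (requires network access)."""
    # Imported lazily so this helper can be defined without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    models_dir.mkdir(parents=True, exist_ok=True)
    # local_dir writes the repository files into models_dir instead of only the HF cache.
    path = snapshot_download(repo_id="2Noise/ChatTTS", local_dir=str(models_dir))
    return Path(path)
```

Run it once from the project root, e.g. `download_chattts(Path("models"))`.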

In [None]:
import os
import sys

# Add project root to path for imports
sys.path.append(os.getcwd())

from src.speech import StandupSpeechPipeline
from src.config import config_manager
from IPython.display import Audio
import numpy as np

# Verify configuration (optional)
print(f"LLM Model: {config_manager.llm_config.model}")



LLM Model: deepseek-chat




## 1. Initialize the Pipeline

We will create an instance of `StandupSpeechPipeline`. 
It automatically loads the ChatTTS model and the voice bank.

*   `device="cuda"` is recommended for faster synthesis; use `"cpu"` if you don't have a GPU.
*   `llm_config` lets the pipeline use the configured LLM for text refinement (for example, inserting pauses and laughter).
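Instead of editing the `device` argument by hand, you could choose it at runtime. This is a small sketch, assuming PyTorch (which ChatTTS runs on) may or may not be installed; the helper `pick_device` is hypothetical and not part of OpenMic.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # ChatTTS runs on PyTorch
    except ImportError:
        # No PyTorch available: fall back to the CPU setting.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The result can then be passed as `device=pick_device()` when constructing `StandupSpeechPipeline`.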

In [2]:
pipeline = StandupSpeechPipeline(
    device="cuda",  # Change to "cpu" if needed
    llm_config=config_manager.get_autogen_llm_config()
)

print("Pipeline initialized successfully!")

Loading ChatTTS (source='custom', path='/home/zhuran/ran/OpenMic/models')...
Pipeline initialized successfully!


## 2. Check Available Voices

The `voices/` directory contains speaker embeddings. Let's see what's available.

In [3]:
voices = pipeline.list_voices()
print(f"Found {len(voices)} voices.")

for name, meta in voices.items():
    print(f"- {name}: {meta.get('comment', 'No comment')}")

# Select a voice (optional; a random speaker is used if none is set)
pipeline.set_voice("spk_3_c82f45a1")

Found 9 voices.
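- spk_0_24cdc42c: 女声：知性 (female: intellectual)
- spk_5_603bb373: 女声：温柔 (female: gentle)
- spk_1_0925164c: 女声：主持 (female: host)
- spk_0_5f76b23b: 男声：沉静磁性 (male: calm, magnetic)
- spk_5_7f14bb13: 女声：自信大方 (female: confident, poised)
- spk_4_45102fed: 男声：主持 (male: host)
- spk_1_73c52e45: 男声：开朗 (male: cheerful)
- spk_3_c82f45a1: 男声：自信大方 (male: confident, poised)
- spk_0_02149fbc: 女声：活泼 (female: lively)
Loaded speaker: spk_3_c82f45a1 (男声：自信大方; male: confident, poised)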
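The loop above reads `meta.get('comment')`, so `list_voices()` appears to return a dict mapping voice names to metadata dicts. Assuming that shape, and that comments start with 男声 (male) or 女声 (female) as in this voice bank, a hypothetical helper like `voices_by_prefix` can pick a voice by gender instead of hard-coding an ID:

```python
def voices_by_prefix(voices: dict[str, dict], prefix: str) -> list[str]:
    """Return sorted voice names whose 'comment' starts with prefix, e.g. "男声" (male)."""
    return sorted(
        name
        for name, meta in voices.items()
        if meta.get("comment", "").startswith(prefix)
    )


# A subset of the voice bank listed above, in the {name: {"comment": ...}} shape
# that the listing loop reads.
sample_voices = {
    "spk_0_24cdc42c": {"comment": "女声：知性"},
    "spk_0_5f76b23b": {"comment": "男声：沉静磁性"},
    "spk_4_45102fed": {"comment": "男声：主持"},
    "spk_3_c82f45a1": {"comment": "男声：自信大方"},
}

male = voices_by_prefix(sample_voices, "男声")
print(male)  # ['spk_0_5f76b23b', 'spk_3_c82f45a1', 'spk_4_45102fed']
```

Any returned name can then be passed to `pipeline.set_voice(...)`.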



## 3. Simple Synthesis

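Let's synthesize a greeting followed by a short stand-up bit. The pipeline handles:
1.  **Refinement**: rewriting the text for spoken delivery, e.g. inserting pauses and laughter (via the LLM, or rule-based only if the LLM call fails).
2.  **Synthesis**: generating audio with ChatTTS.
3.  **Post-processing**: resampling to 16 kHz (default).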
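To illustrate the post-processing step, here is a minimal resampling sketch. It assumes the synthesizer emits 24 kHz audio (the rate used in ChatTTS's own examples) and uses plain linear interpolation with NumPy; `resample_linear` is a hypothetical name, not the pipeline's code.

```python
import numpy as np


def resample_linear(audio: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample a mono waveform by linear interpolation (illustrative; no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return audio.astype(np.float32, copy=False)
    n_out = int(round(len(audio) * dst_rate / src_rate))
    # Timestamps (in seconds) of the input samples and of the desired output samples.
    src_times = np.arange(len(audio)) / src_rate
    dst_times = np.arange(n_out) / dst_rate
    return np.interp(dst_times, src_times, audio).astype(np.float32)


# One second of a 440 Hz tone at 24 kHz (the assumed ChatTTS output rate).
sr_in = 24_000
t = np.arange(sr_in) / sr_in
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)

tone_16k = resample_linear(tone, sr_in, 16_000)
print(tone_16k.shape)  # (16000,)
```

Linear interpolation has no anti-aliasing filter, so content above 8 kHz can alias when downsampling. A polyphase filter such as `scipy.signal.resample_poly(audio, 2, 3)` (24000 × 2 / 3 = 16000) is the more robust choice.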

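In [None]:
# Input: a greeting plus a short Chinese stand-up bit about final exams, the twice-yearly "渡劫仪式" ("tribulation ritual").
# Stage directions in （*...*） mark pauses, stress, pace changes, and impressions for the refiner.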
text = """大家好！我是OpenMic的语音助手。今天给大家讲个段子。(*停顿，环视全场*）我想跟大家聊聊咱们大学生一年两度的（*重音，语速放慢*）“**渡劫仪式**”——期末考试。\n\n（*语速转为正常，叙述感*）考试前一周，整个校园的气质都变了。（*语速加快*）平时空荡荡的图书馆，突然成了兵家必争之地。早上七点，门口就排起了长队，（*停顿，夸张语气*）那阵仗，不知道的还以为这儿免费发iPhone呢。\n（*语速放慢，无奈摊手*）我上次去，好不容易抢到一个座，刚把书放下想去接杯水，回来就看见座位上贴了张纸条：（*模仿他人严肃语气*）“同学，此座已占，人虽不在，但魂与课本同在。”（*转向观众，困惑表情*）好家伙，现在占座都开始搞“**灵魂出窍**”了是吧？\n\n（*恢复自然语速*）复习就更魔幻了。（*自嘲语气*）一学期没翻开的书，一周之内要建立深厚的革命友谊。我的大脑就像个快挤爆的U盘，（*语速加快，手势辅助*）每天都在“写入失败”和“存储空间不足”之间反复横跳。"""

# run() returns a dictionary containing 'audio', 'text' (refined), etc.
result = pipeline.run(text, return_text=True, return_control=True)

audio_data = result["audio"]


# Play the audio
# audio_data is a NumPy array sampled at 16000 Hz (the default output rate)
Audio(audio_data, rate=16000, autoplay=False)

##### Note: if the LLM API call fails, a yellow warning is shown and refinement falls back to rule-based only. Check that your configuration is correct. #####

Refining text...


found invalid characters: {'—', '‘', '！'}


Refined text segments: 4
Synthesizing audio...
EmotionRhythmController: analyzing controls for 4 segments.


text:   5%|▌         | 54/1024(max) [00:00, 111.23it/s]
code:  26%|██▌       | 531/2048(max) [00:04, 110.01it/s]
text:   8%|▊         | 79/1024(max) [00:00, 111.79it/s]
code:  36%|███▌      | 731/2048(max) [00:06, 109.93it/s]
found invalid characters: {'：', '‘', '？'}
text:   8%|▊         | 86/1024(max) [00:00, 112.37it/s]
code:  38%|███▊      | 776/2048(max) [00:07, 110.13it/s]
found invalid characters: {'‘'}
text:   8%|▊         | 78/1024(max) [00:00, 111.32it/s]
code:  36%|███▌      | 730/2048(max) [00:06, 109.88it/s]


Audio generated, shape: (973503,)
Refined Text: None
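The shape `(973503,)` printed above is a sample count: at 16 kHz that is 973503 / 16000 ≈ 60.8 s of audio. To keep the result outside the notebook, a sketch using only the standard-library `wave` module and NumPy could write it to a 16-bit WAV file. It assumes `result["audio"]` is mono float audio in [-1, 1]; `save_wav` is a hypothetical helper.

```python
import wave

import numpy as np


def save_wav(path: str, audio: np.ndarray, rate: int = 16_000) -> float:
    """Write mono float audio in [-1, 1] as 16-bit PCM WAV; return its duration in seconds."""
    # Clip to the valid range, scale to int16, and force little-endian byte order for WAV.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm) / rate


# Stand-in for result["audio"]: 0.5 s of silence at 16 kHz.
demo = np.zeros(8_000, dtype=np.float32)
duration = save_wav("demo.wav", demo)
print(f"{duration:.2f} s")  # 0.50 s
```

For the run above, `save_wav("standup.wav", audio_data)` would write the clip and return its duration.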



## 4. View Refined Text and Control Signals

For transparency, the pipeline can also return the LLM-refined text and the control signals it used (pass `return_text=True` and `return_control=True` to `run()`).
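To make "refined text" and "control signals" concrete, here is a hypothetical rule-based refiner in the spirit of the rule-based fallback mentioned in section 3. It is not OpenMic's implementation: it rewrites （*...*） stage directions into ChatTTS control tokens (`[uv_break]` for a pause, `[laugh]` for laughter), strips markdown bold markers, and returns the directions it consumed. The keyword table and function name are assumptions.

```python
import re

# Hypothetical keyword table: stage-direction keyword -> ChatTTS control token.
DIRECTION_TOKENS = {
    "停顿": "[uv_break]",  # "pause"
    "笑": "[laugh]",       # "laugh"
}

# Matches (*...*) with ASCII or full-width parentheses, e.g. （*停顿*）.
STAGE_DIRECTION = re.compile(r"[（(]\*(.*?)\*[）)]")


def rule_based_refine(text: str) -> tuple[str, list[str]]:
    """Replace stage directions with control tokens; return (refined_text, directions)."""
    directions: list[str] = []

    def to_tokens(match: re.Match) -> str:
        direction = match.group(1)
        directions.append(direction)
        # Directions without a known keyword are dropped from the spoken text.
        return "".join(tok for key, tok in DIRECTION_TOKENS.items() if key in direction)

    refined = STAGE_DIRECTION.sub(to_tokens, text)
    refined = refined.replace("**", "")  # strip markdown bold markers
    return refined, directions


sample = "大家好。(*停顿，环视全场*）我聊聊“**渡劫仪式**”。（*笑*）"
refined, directions = rule_based_refine(sample)
print(refined)     # 大家好。[uv_break]我聊聊“渡劫仪式”。[laugh]
print(directions)  # ['停顿，环视全场', '笑']
```

An LLM-based refiner can make richer choices (where to laugh, how to split segments), which is why the pipeline prefers it when the API is configured.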

In [None]:
refined_text = result.get("text")  # None if no refined text was returned
print(f"Refined Text: {refined_text}")
controls = result.get("controls", [])
print(f"Control signals count: {len(controls)}")
# You can inspect these to see how text was segmented

## Conclusion

You have successfully used the OpenMic Speech Module!
This module is ready to be integrated with the `JokeWriter` agent to voice your generated scripts.