
# 🚀 Chengyu Bites — Generate & Publish (GitHub Releases + Jekyll)

This notebook:
1. Generates a **random 成语** episode (script in your structure).
2. Lets you **review/edit**.
3. Creates **cover.png**, **transcript.txt**, and **metadata.json**.
4. Generates **audio.mp3** via OpenAI TTS (optional).
5. **Publishes**:
   - Uploads MP3 as a **GitHub Release asset** (stable HTTPS URL for RSS).
   - Commits Jekyll post + assets to your repo (`_posts/` + `episodes/…/`).

> It’s preconfigured for your repo `kohlenberg/chengyudaily`. Adjust in the *Config* cell if needed.


## 1) Setup

In [51]:

# If needed, install dependencies (uncomment the next line):
# !pip install --upgrade openai pillow requests pyyaml

import os, io, re, json, textwrap, unicodedata, datetime, base64, requests
from pathlib import Path
from typing import Dict, Any
from PIL import Image, ImageDraw, ImageFont

print("OPENAI_API_KEY set? ", bool(os.environ.get("OPENAI_API_KEY")))
print("GITHUB_TOKEN set?   ", bool(os.environ.get("GITHUB_TOKEN")))


OPENAI_API_KEY set?  True
GITHUB_TOKEN set?    True


## 2) Config

In [52]:

# ---- Configurable values ----
SHOW_NAME   = "Chengyu Bites"
REPO        = "kohlenberg/chengyudaily"                    # GitHub repo: owner/name
SITE_URL    = "https://kohlenberg.github.io/chengyudaily"  # Deployed Pages base
GEN_MODEL   = "gpt-4o-mini"        # text generation model
TTS_MODEL   = "gpt-4o-mini-tts"    # TTS model
TTS_VOICE   = "alloy"              # voice name
PUBLISH_TIME_UTC = "10:00:00 +0000" # front-matter time

DRY_RUN = False                    # True = don't hit GitHub APIs, just preview
DO_TTS  = True                     # False = skip audio (useful if out of quota)
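
`PUBLISH_TIME_UTC` is labelled as the front-matter time, and posts are committed under `_posts/`. A minimal sketch of how those values might combine into a Jekyll post header, assuming a hypothetical `build_front_matter` helper and hypothetical `layout` and `audio_url` keys (the notebook's actual post template is not shown here):

```python
import json

def build_front_matter(title: str, date_str: str, publish_time: str, audio_url: str = "") -> str:
    """Return a Jekyll front-matter block (hypothetical helper, not from the notebook)."""
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar,
    # so titles containing colons or quotes stay well-formed.
    lines = [
        "---",
        "layout: post",  # assumed layout name
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"date: {date_str} {publish_time}",  # Jekyll accepts 'YYYY-MM-DD HH:MM:SS +/-TTTT'
    ]
    if audio_url:
        lines.append(f"audio_url: {json.dumps(audio_url)}")  # assumed key name
    lines.append("---")
    return "\n".join(lines) + "\n"

header = build_front_matter("画蛇添足", "2025-08-20", "10:00:00 +0000")
print(header)
```

Quoting the title through `json.dumps` is a design choice to avoid hand-escaping; `pyyaml` (installed in Setup) would be an alternative.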


## 3) Helpers (slugify, cover drawing, OpenAI, GitHub API)

In [53]:

import textwrap

def slugify(text: str) -> str:
    text = unicodedata.normalize("NFKD", text)
    text = re.sub(r"[^\w\s-]", "", text, flags=re.U).strip().lower()
    text = re.sub(r"[-\s]+", "-", text, flags=re.U)
    return text or "episode"

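def ensure_font(size: int):
    # Try CJK-capable fonts first so the 成语 renders as glyphs rather than empty boxes.
    for cand in [
        "/System/Library/Fonts/PingFang.ttc",
        "/System/Library/Fonts/Supplemental/Arial Unicode.ttf",
        "/Library/Fonts/Arial Unicode.ttf",
        "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",  # no CJK glyphs; Latin/pinyin fallback only
    ]:
        try:
            return ImageFont.truetype(cand, size)
        except OSError:  # font file missing or unreadable; try the next candidate
            pass
    return ImageFont.load_default()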

def draw_cover_png(chengyu: str, pinyin: str, gloss: str) -> bytes:
    W = H = 3000
    bg = "#0e1116"
    img = Image.new("RGB", (W, H), bg)
    d = ImageDraw.Draw(img)

    font_show = ensure_font(120)
    font_cn   = ensure_font(440)
    font_py   = ensure_font(150)
    font_gl   = ensure_font(90)

    d.text((150, 180), SHOW_NAME, font=font_show, fill=(180,200,255))

    bbox_cn = d.textbbox((0,0), chengyu, font=font_cn)
    w_cn = bbox_cn[2]-bbox_cn[0]; h_cn = bbox_cn[3]-bbox_cn[1]
    x_cn = (W - w_cn)//2; y_cn = (H - h_cn)//2 - 140
    d.text((x_cn, y_cn), chengyu, font=font_cn, fill=(255,255,255))

    bbox_py = d.textbbox((0,0), pinyin, font=font_py)
    w_py = bbox_py[2]-bbox_py[0]
    x_py = (W - w_py)//2; y_py = y_cn + h_cn + 60
    d.text((x_py, y_py), pinyin, font=font_py, fill=(200,220,255))

    gloss_wrapped = textwrap.fill(gloss, width=30)
    d.multiline_text((150, H-520), gloss_wrapped, font=font_gl, fill=(160,180,220), spacing=12)

    buf = io.BytesIO()
    img.save(buf, "PNG", optimize=True)
    return buf.getvalue()

# ----- OpenAI (gen + tts) -----
def gen_episode(show_name: str):
    from openai import OpenAI
    import json
    client = OpenAI()

    SYSTEM = (
        "You create short, conversational podcast episodes about Chinese 成语. "
        "Return ONLY a JSON object. Do not include code fences or extra text."
    )

    STRUCT = f"""
Pick a well-known Chinese 成语 at random and create a short, conversational episode.

Follow this structure EXACTLY in the "script" field:
1) Intro: Start with: "Welcome to {show_name} — your quick summary on Chinese 成语." Add a one-sentence teaser about the theme. Add [break 1s].
2) Reveal: Say "The phrase is:" then the idiom in CHINESE CHARACTERS, followed by the pinyin.
3) Character breakdown: Each character with pinyin and meaning, each line ending with [break 0.5s].
4) Full idiom again: characters + literal & figurative meaning. Add [break 1s].
5) Origin story: 4–5 sentences. Start with "Here’s the story behind it:" then [break 1.5s], then the story, then [break 1.5s].
6) Three examples: For each, give Mandarin on one line and English on the next. Put [break 1s] after each pair.
7) Closing: Repeat the idiom in Chinese and the short English meaning; thank the listener and sign off with: "Thanks for listening to {show_name}! See you next time for another idiom." End with [break 1s].

Important:
- Keep the idiom in CHINESE CHARACTERS in the script (use pinyin only where asked).
- Use [break 0.5s], [break 1s], [break 1.5s]. No SSML.
- Slightly slower tone via wording and breaks (≈90%).

Return JSON with keys:
{{
  "chengyu": "<characters>",
  "pinyin": "<pinyin with tone marks>",
  "gloss": "<literal + figurative meaning in one short line>",
  "teaser": "<one-sentence teaser>",
  "script": "<full episode script with [break] tags>"
}}
"""

    resp = client.chat.completions.create(
        model=GEN_MODEL,             # e.g., "gpt-4o-mini"
        temperature=0.7,
        response_format={"type":"json_object"},   # << force valid JSON
        messages=[
            {"role":"system","content":SYSTEM},
            {"role":"user","content":STRUCT}
        ]
    )

    content = resp.choices[0].message.content
    # JSON mode should return parseable JSON; json.loads still raises if the output was cut off.
    data = json.loads(content)
    # Sanity check: every expected key must be present as a non-empty string.
    for k in ("chengyu","pinyin","gloss","teaser","script"):
        if not (isinstance(data.get(k), str) and data[k].strip()):
            raise ValueError(f"Missing or empty field in model response: {k}")
    return data


def tts_mp3(script_text: str) -> bytes:
    from openai import OpenAI
    client = OpenAI()
    cleaned = re.sub(r"\[break\s*[0-9.]+s\]", "\n\n", script_text)
    with client.audio.speech.with_streaming_response.create(
        model=TTS_MODEL,
        voice=TTS_VOICE,
        input=cleaned
    ) as response:
        buf = io.BytesIO()
        for chunk in response.iter_bytes():
            buf.write(chunk)
    return buf.getvalue()

# ----- GitHub API -----
GITHUB_API = "https://api.github.com"
def gh_headers():
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN not set.")
    return {"Authorization": f"token {token}", "Accept": "application/vnd.github+json"}

def gh_put_file(repo: str, path: str, content_bytes: bytes, message: str, branch="main"):
    # Creates a new file via the contents API. Updating an existing path also requires
    # its blob "sha" in the payload, so a re-run for the same path fails as written.
    url = f"{GITHUB_API}/repos/{repo}/contents/{path}"
    payload = {"message": message, "content": base64.b64encode(content_bytes).decode("ascii"), "branch": branch}
    r = requests.put(url, headers=gh_headers(), json=payload, timeout=60)
    if r.status_code not in (200,201):
        raise RuntimeError(f"PUT {path} failed: {r.status_code} {r.text}")
    return r.json()

def gh_create_release(repo: str, tag: str, name: str, body: str = "", draft=False, prerelease=False):
    url = f"{GITHUB_API}/repos/{repo}/releases"
    payload = {"tag_name": tag, "name": name, "body": body, "draft": draft, "prerelease": prerelease}
    r = requests.post(url, headers=gh_headers(), json=payload, timeout=60)
    if r.status_code not in (200,201):
        if r.status_code == 422 and "already_exists" in r.text:
            r2 = requests.get(f"{GITHUB_API}/repos/{repo}/releases/tags/{tag}", headers=gh_headers(), timeout=60)
            r2.raise_for_status()
            return r2.json()
        raise RuntimeError(f"Create release failed: {r.status_code} {r.text}")
    return r.json()

def gh_upload_asset(upload_url_template: str, filename: str, data: bytes, content_type: str = "application/octet-stream"):
    # upload_url is a URI template ending in "{?name,label}"; keep only the base URL.
    upload_url = upload_url_template.split("{")[0]
    headers = gh_headers(); headers["Content-Type"] = content_type
    # Pass the name via params so requests URL-encodes it (filenames here may contain CJK characters).
    r = requests.post(upload_url, headers=headers, params={"name": filename}, data=data, timeout=300)
    if r.status_code not in (200,201):
        raise RuntimeError(f"Upload asset failed: {r.status_code} {r.text}")
    return r.json()
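
Because `\w` matches Unicode word characters in Python 3, `slugify` keeps Chinese characters intact, which is why the build folder and release tag later contain the idiom itself. A small sketch that repeats the function so it runs on its own:

```python
import re
import unicodedata

def slugify(text: str) -> str:
    # Same logic as the helper above, repeated so this snippet is self-contained.
    text = unicodedata.normalize("NFKD", text)
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    text = re.sub(r"[-\s]+", "-", text)
    return text or "episode"

print(slugify("画蛇添足"))           # CJK characters survive unchanged
print(slugify("Less is More!"))    # punctuation dropped, spaces become hyphens
print(slugify("huà shé tiān zú"))  # NFKD splits off tone marks, which are then removed
print(slugify("?!"))               # nothing left, so the fallback is used
```

If ASCII-only paths or tags are preferred, building the slug from `pinyin` instead of `chengyu` is one option; that would be a change from what the notebook does.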


## 4) Generate random 成语 + script (review)

In [54]:
ep = gen_episode(SHOW_NAME)
print("Chengyu :", ep["chengyu"])
print("Pinyin  :", ep["pinyin"])
print("Gloss   :", ep["gloss"])
print("Teaser  :", ep["teaser"])

print("\n--- SCRIPT (first 1200 chars) ---\n")
print(ep["script"][:1200] + ("..." if len(ep["script"])>1200 else ""))


Chengyu : 画蛇添足
Pinyin  : huà shé tiān zú
Gloss   : To add unnecessary details; to ruin something by overdoing it.
Teaser  : Today, we’ll explore how sometimes less is more.

--- SCRIPT (first 1200 chars) ---

Welcome to Chengyu Bites — your quick summary on Chinese 成语. Today, we’ll explore how sometimes less is more.[break 1s] The phrase is: 画蛇添足, huà shé tiān zú.[break 0.5s] 画 (huà) - to draw[break 0.5s] 蛇 (shé) - snake[break 0.5s] 添 (tiān) - to add[break 0.5s] 足 (zú) - feet[break 1s] Full idiom: 画蛇添足 - literally 'to draw a snake and add feet' and figuratively means to add unnecessary details; to ruin something by overdoing it.[break 1s] Here’s the story behind it:[break 1.5s] In ancient times, there was a contest to see who could draw a snake the fastest. One man finished quickly, but another, wanting to show off, added feet to his snake. In the end, the man who drew the simple snake won, and the embellished one lost. This tale teaches us to appreciate simplicity.[break 1.5s] Example
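
During review it can help to confirm that the generated script only uses the pause tags the prompt allows (`[break 0.5s]`, `[break 1s]`, `[break 1.5s]`). A minimal sketch, assuming a hypothetical `check_breaks` helper that is not part of the notebook:

```python
import re
from collections import Counter

ALLOWED = {"0.5", "1", "1.5"}  # durations permitted by the prompt
BREAK_RE = re.compile(r"\[break\s*([0-9.]+)s\]")

def check_breaks(script: str):
    """Return (count per duration, sorted list of durations outside ALLOWED)."""
    found = BREAK_RE.findall(script)
    counts = Counter(found)
    unexpected = sorted(set(found) - ALLOWED)
    return counts, unexpected

# Hand-written sample; the last tag is deliberately outside the allowed set.
sample = "Intro.[break 1s] 画 (huà) - to draw[break 0.5s] Story.[break 2s]"
counts, unexpected = check_breaks(sample)
print(dict(counts))
print("Unexpected durations:", unexpected)
```

In the notebook this could be called as `check_breaks(ep["script"])` right after generation.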

## 5) (Optional) Edit fields before publishing

In [55]:

chengyu = ep["chengyu"]
pinyin  = ep["pinyin"]
gloss   = ep["gloss"]
teaser  = ep["teaser"]
script  = ep["script"]

# Example: tweak teaser
# teaser = "A quick bite about perspective."

print(chengyu, "|", pinyin)
print(gloss)
print("\nPreview script start:\n", script[:600])


画蛇添足 | huà shé tiān zú
To add unnecessary details; to ruin something by overdoing it.

Preview script start:
 Welcome to Chengyu Bites — your quick summary on Chinese 成语. Today, we’ll explore how sometimes less is more.[break 1s] The phrase is: 画蛇添足, huà shé tiān zú.[break 0.5s] 画 (huà) - to draw[break 0.5s] 蛇 (shé) - snake[break 0.5s] 添 (tiān) - to add[break 0.5s] 足 (zú) - feet[break 1s] Full idiom: 画蛇添足 - literally 'to draw a snake and add feet' and figuratively means to add unnecessary details; to ruin something by overdoing it.[break 1s] Here’s the story behind it:[break 1.5s] In ancient times, there was a contest to see who could draw a snake the fastest. One man finished quickly, but another, wa


## 6) Build local assets (cover, transcript, optional TTS)

In [56]:

today = datetime.date.today()
date_str = today.strftime("%Y-%m-%d")
slug = slugify(chengyu)
folder = f"{date_str}-{slug}"
ep_dir = Path("build") / folder
ep_dir.mkdir(parents=True, exist_ok=True)

# cover
cover_png = draw_cover_png(chengyu, pinyin, gloss)
( ep_dir / "cover.png").write_bytes(cover_png)

# transcript
( ep_dir / "transcript.txt").write_text(script, encoding="utf-8")

# metadata
metadata = {
    "show": SHOW_NAME,
    "chengyu": chengyu,
    "pinyin": pinyin,
    "gloss": gloss,
    "teaser": teaser,
    "pubDate": today.isoformat()
}
( ep_dir / "metadata.json").write_text(json.dumps(metadata, ensure_ascii=False, indent=2), encoding="utf-8")

audio_mp3 = b""
if DO_TTS:
    try:
        audio_mp3 = tts_mp3(script)
        ( ep_dir / "audio.mp3").write_bytes(audio_mp3)
        print("Audio generated:", (ep_dir/"audio.mp3").resolve())
    except Exception as e:
        print("TTS failed:", e)
else:
    print("Skipping TTS (DO_TTS=False)")

print("Built assets in:", ep_dir.resolve())


Audio generated: /Users/tilman/github3/chengyudaily/build/2025-08-20-画蛇添足/audio.mp3
Built assets in: /Users/tilman/github3/chengyudaily/build/2025-08-20-画蛇添足
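
`tts_mp3` does not send the `[break Ns]` tags to the TTS model. It replaces each one with a blank line, so all three durations are treated the same. A short sketch of that substitution on a sample string (`strip_breaks` is a name introduced here for illustration):

```python
import re

def strip_breaks(script_text: str) -> str:
    # Same substitution tts_mp3 applies before sending text to the TTS model.
    return re.sub(r"\[break\s*[0-9.]+s\]", "\n\n", script_text)

sample = "The phrase is: 画蛇添足.[break 0.5s] 画 (huà) - to draw[break 1.5s]"
cleaned = strip_breaks(sample)
print(repr(cleaned))
```

`transcript.txt` is written from the unmodified `script`, so the published transcript still contains the tags; the same function could be applied there if a clean transcript is wanted.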


## 7) Publish to GitHub (Release asset + commit files)

In [57]:
if DRY_RUN:
    print("DRY_RUN=True — skipping GitHub upload and commit.")
else:
    # --- publish: single commit via git + release asset upload ---
    import os, json, tempfile, shutil, subprocess
    from pathlib import Path

    def run(cmd, cwd=None, hide_token=False):
        display = " ".join(["***" if hide_token and "@" in str(x) else str(x) for x in cmd])
        print("+", display)
        subprocess.check_call(cmd, cwd=cwd)

    # 1) Create release + upload MP3 (same as before)
    audio_url = ""
    audio_bytes = 0
    if DO_TTS and audio_mp3:
        tag  = f"v{today.strftime('%Y%m%d')}-{slug}"
        name = f"{chengyu} ({pinyin})"
        rel  = gh_create_release(REPO, tag=tag, name=name, body=f"Episode: {chengyu}")
        asset = gh_upload_asset(
            rel["upload_url"],
            filename=f"{folder}.mp3",
            data=audio_mp3,
            content_type="audio/mpeg"
        )
        audio_url = asset["browser_download_url"]
        audio_bytes = asset.get("size", len(audio_mp3))
        print("Uploaded asset:", audio_url)
    else:
        print("No audio to upload (TTS disabled or failed). You can upload manually later and set audio_url in the post.")

    # 2) Pr


RuntimeError: Upload asset failed: 422 {"message":"Validation Failed","request_id":"3FFF:CC577:24165F:28A3EA:68A62407","documentation_url":"https://docs.github.com/rest","errors":[{"resource":"ReleaseAsset","code":"already_exists","field":"name"}]}
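
The 422 `already_exists` error above means the release already holds an asset with this name, which happens when the cell is re-run for the same day and idiom. `gh_create_release` falls back to the existing release in that case, but `gh_upload_asset` does not. One way to make the upload repeatable, sketched under the assumption that deleting and re-uploading the asset is acceptable. `find_asset_id` and `delete_asset` are hypothetical helpers, and the name match assumes GitHub stored the asset under the requested filename, which may not hold for names with non-ASCII characters:

```python
from typing import Optional

GITHUB_API = "https://api.github.com"

def find_asset_id(release: dict, filename: str) -> Optional[int]:
    """Return the id of the asset named `filename` in a release JSON object, if any."""
    for asset in release.get("assets", []):
        if asset.get("name") == filename:
            return asset.get("id")
    return None

def delete_asset(repo: str, asset_id: int, headers: dict) -> None:
    """Delete a release asset via DELETE /repos/{repo}/releases/assets/{asset_id}."""
    import requests  # imported here so find_asset_id stays usable offline
    r = requests.delete(f"{GITHUB_API}/repos/{repo}/releases/assets/{asset_id}",
                        headers=headers, timeout=60)
    if r.status_code != 204:
        raise RuntimeError(f"Delete asset failed: {r.status_code} {r.text}")

# Offline demonstration with a hand-written release object (id and name are made up).
release = {"assets": [{"id": 101, "name": "2025-08-20-example.mp3"}]}
print(find_asset_id(release, "2025-08-20-example.mp3"))
print(find_asset_id(release, "other.mp3"))
```

In the publish cell this could run between `gh_create_release` and `gh_upload_asset`: look up `find_asset_id(rel, f"{folder}.mp3")` and, if it returns an id, call `delete_asset(REPO, asset_id, gh_headers())` before uploading. An alternative that avoids deletion is to reuse the existing asset's `browser_download_url`.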

## 8) Done


If you later add a feed-builder Action, pushing the post will update `podcast.xml` automatically.
Submit your Pages feed URL to Spotify:  
`https://kohlenberg.github.io/chengyudaily/podcast.xml`
