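# One-Click Note Fixer

【Arabic numerals for 條/項/款 + article references + whitespace cleanup】

This Python notebook automatically performs the following cleanup:

- Converts the Chinese numerals in `第…條`, `第…項`, and `第…款` to Arabic numerals (e.g. 第六條 → 第6條).
- Normalizes article references of the form `第…條之…` to `第X-Y條` and similar forms, tolerating whitespace between the Chinese characters and the numbers.
- Removes the whitespace between Chinese or Latin letters and adjacent digits (e.g. `第 4 條` and `在 2.0 版` become `第4條` and `在2.0版`).
- Ensures every heading line is followed by at least one blank line before the body text.
- Adds one blank line before and after Markdown blockquote and list blocks (without changing the content of code blocks).

Usage: adjust `TARGET_PATHS` / `DRY_RUN` / `VERBOSE` in the Settings section below, then choose "Run All".

Safety tip: on the first run, set `DRY_RUN = True` to preview the changes, then switch it to `False` to actually write them.

## Settings
- `TARGET_PATHS`: the files or folders to process (several are allowed). Folders are scanned recursively for `.md`/`.markdown`/`.txt` files. Relative paths are resolved against the `work` folder under the notebook's current working directory.
- `DRY_RUN`: `True` previews only; `False` writes the changes back to the files.
- `VERBOSE`: shows each changed file and its change counts.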
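
As a rough illustration of the folder scan described above, the sketch below filters file names by the three extensions. `has_target_suffix` and `TARGET_EXTS` are hypothetical names used only for this example; the comparison is case-insensitive, which matches the `suffix.lower()` check in the file-collection code of this notebook.

```python
from pathlib import Path

# Extensions picked up by the recursive folder scan.
TARGET_EXTS = {".md", ".markdown", ".txt"}


def has_target_suffix(path: str) -> bool:
    """Return True if the path ends with one of the target extensions (hypothetical helper)."""
    return Path(path).suffix.lower() in TARGET_EXTS


names = ["notes/law.md", "README.MARKDOWN", "todo.txt", "mkdocs.yml", ".DS_Store"]
print([n for n in names if has_target_suffix(n)])
# prints ['notes/law.md', 'README.MARKDOWN', 'todo.txt']
```

Dot-files such as `.DS_Store` have an empty suffix in `pathlib`, so they are skipped without any special handling.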

In [7]:
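# --- Settings (edit as needed) ---
TARGET_PATHS = ["./mkdocs"]  # file or folder paths; several are allowed
DRY_RUN = False  # True = preview only; False = write files
VERBOSE = True   # show each changed file and its change counts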


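## How It Works and Limitations
- 條/項/款 conversion: only `第…條`, `第…項`, and `第…款` are handled. Supported Chinese numerals include 十/百/千/萬, 零/〇/○/◯, and 兩.
- Article references: forms containing whitespace, such as `第 14 條之 4` and `24 條之 5`, are supported and become `第14-4條` and `24-5條`.
- Text × digit spacing: whitespace between Chinese or Latin letters and digits is removed; the spacing between one number and another is left alone.
- Heading spacing: if body text directly follows a heading, one blank line is inserted (for readability and to follow Markdown conventions).
- Blockquote/list spacing: unordered lists, ordered lists, and blockquotes are detected; if a block lacks a blank line before or after it, one is added. The indentation of list lines is kept.
- Fenced code blocks (``` or ~~~): the blockquote/list pass leaves their content alone. The numeral and spacing conversions run on the whole file text, so they can also change text inside code blocks.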
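
The text × digit spacing rule above can be sketched in isolation as follows. This is a simplified illustration, not the notebook's exact patterns: `LETTER`, `LEFT_RE`, `RIGHT_RE`, and `squeeze` are names made up for the example, and the character class is reduced to the main CJK range plus ASCII letters.

```python
import re

# Simplified character class: common CJK ideographs plus ASCII letters.
# (The notebook's own patterns also cover full-width letters and more CJK ranges.)
LETTER = r"[\u4E00-\u9FFFA-Za-z]"
LEFT_RE = re.compile(rf"({LETTER})[ \t]+(\d+)")   # letter, gap, digits
RIGHT_RE = re.compile(rf"(\d+)[ \t]+({LETTER})")  # digits, gap, letter


def squeeze(text: str) -> str:
    """Drop spaces/tabs between a letter and a digit run; digit-digit gaps stay."""
    text = LEFT_RE.sub(r"\1\2", text)
    return RIGHT_RE.sub(r"\1\2", text)


print(squeeze("第 4 條"))    # prints 第4條
print(squeeze("在 2.0 版"))  # prints 在2.0版
print(squeeze("1 2 3"))      # prints 1 2 3
```

Because Latin letters are part of the character class, `squeeze("version 2 of")` gives `version2of`, which is worth keeping in mind for notes that mix Chinese and English.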


In [8]:
# --- Conversion utilities (self-contained) ---
import os, glob, re
from pathlib import Path
from typing import Optional, Tuple, List, Union

# Chinese numeral lookup tables
CN_DIGIT = {
    '零': 0, '〇': 0, '○': 0, '◯': 0,
    '一': 1, '二': 2, '兩': 2, '三': 3, '四': 4, '五': 5,
    '六': 6, '七': 7, '八': 8, '九': 9,
}
CN_UNIT = {'十': 10, '百': 100, '千': 1000}

NUMBER_TOKEN = r'[零〇○◯兩二一三四五六七八九十百千萬0-9]+'


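def cn_to_int(text: str) -> Optional[int]:
    """Convert a Chinese numeral (with 十/百/千/萬) or an Arabic numeral to an int.

    Returns None if an unknown character is encountered.
    """
    if re.fullmatch(r'\d+', text):
        return int(text)
    total = 0    # completed 萬 groups
    section = 0  # current group below 萬
    number = 0   # digits not yet attached to a unit
    for ch in text:
        if ch in CN_DIGIT:
            # Accumulate positionally so 一〇二 reads as 102 rather than 2.
            number = number * 10 + CN_DIGIT[ch]
        elif ch in CN_UNIT:
            unit = CN_UNIT[ch]
            if number == 0:
                number = 1  # e.g. 十 = 10 (the leading 1 is omitted)
            section += number * unit
            number = 0
        elif ch == '萬':
            part = section + number
            if part == 0:
                part = 1  # a lone 萬 is treated as 1 萬
            total += part * 10000
            section = 0
            number = 0
        else:
            return None
    return total + section + number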

# Convert 之 article references, e.g. 十四之四條 -> 14-4條, 十四之四 -> 14-4
ARTICLE_RE_FORM1 = re.compile(rf'({NUMBER_TOKEN})\s*之\s*({NUMBER_TOKEN})\s*(條|項|款)')
ARTICLE_RE_FORM2 = re.compile(rf'({NUMBER_TOKEN})\s*(條|項|款)\s*之\s*({NUMBER_TOKEN})')
ARTICLE_RE_FORM3 = re.compile(rf'({NUMBER_TOKEN})\s*之\s*({NUMBER_TOKEN})')


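# Only spaces and tabs (including the full-width space) are collapsed, so a line
# ending in 條/項/款 is not merged with a following line that starts with 第N.
LEGAL_GAP_RE = re.compile(r'(?<=[條項款])[ \t\u3000]+(?=第\d)')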
TEXT_DIGIT_LEFT_RE = re.compile(r'([\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFFＡ-Ｚａ-ｚA-Za-z])[ \t]+(\d+)')
TEXT_DIGIT_RIGHT_RE = re.compile(r'(\d+)[ \t]+([\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFFＡ-Ｚａ-ｚA-Za-z])')
HEADING_RE = re.compile(r'^(#{1,6})\s+[^#\s].*')


def convert_article_refs(s: str) -> Tuple[str, int]:
    count = 0

    def repl_form1(m: re.Match) -> str:
        nonlocal count
        left = cn_to_int(m.group(1))
        right = cn_to_int(m.group(2))
        unit = m.group(3)
        if left is None or right is None:
            return m.group(0)
        count += 1
        return f"{left}-{right}{unit}"

    def repl_form2(m: re.Match) -> str:
        nonlocal count
        left = cn_to_int(m.group(1))
        unit = m.group(2)
        right = cn_to_int(m.group(3))
        if left is None or right is None:
            return m.group(0)
        count += 1
        return f"{left}-{right}{unit}"

    def repl_form3(m: re.Match) -> str:
        nonlocal count
        left = cn_to_int(m.group(1))
        right = cn_to_int(m.group(2))
        if left is None or right is None:
            return m.group(0)
        count += 1
        return f"{left}-{right}"

    out = ARTICLE_RE_FORM1.sub(repl_form1, s)
    out = ARTICLE_RE_FORM2.sub(repl_form2, out)
    out = ARTICLE_RE_FORM3.sub(repl_form3, out)
    return out, count


def collapse_legal_spacing(s: str) -> Tuple[str, int]:
    new_text, fixes = LEGAL_GAP_RE.subn('', s)
    return new_text, fixes


def collapse_text_digit_spacing(s: str) -> Tuple[str, int]:
    tmp, fixes_left = TEXT_DIGIT_LEFT_RE.subn(r'\1\2', s)
    tmp, fixes_right = TEXT_DIGIT_RIGHT_RE.subn(r'\1\2', tmp)
    return tmp, fixes_left + fixes_right


# Matches 第…條 / 第…項 / 第…款; whitespace around the number is tolerated
PATTERN = re.compile(rf'第\s*({NUMBER_TOKEN})\s*(條|項|款)')


def convert_ordinals(s: str) -> Tuple[str, int]:
    count = 0

    def repl(m):
        nonlocal count
        numtxt = m.group(1)
        unit = m.group(2)
        val = cn_to_int(numtxt)
        if val is None:
            return m.group(0)
        normalized = f'第{val}{unit}'
        if normalized == m.group(0):
            return m.group(0)
        count += 1
        return normalized

    out = PATTERN.sub(repl, s)
    return out, count


# Blank-line fixes for blockquotes, list blocks, and headings; FENCE_RE marks fenced code blocks to skip
LIST_ITEM_RE = re.compile(r'^([	 ]*)(?:> ?)*([	 ]*)(?:[-*+]|\d+[.)])\s+')
QUOTE_RE = re.compile(r'^[	 ]*>')
FENCE_RE = re.compile(r'^(?:[	 ]*)(```|~~~)')


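def ensure_blank_line_after_headings(s: str) -> Tuple[str, int]:
    lines = s.splitlines()
    out: List[str] = []
    fixes = 0
    fence_seq = None  # opening fence marker while inside a code block ('#' lines there are not headings)
    for idx, line in enumerate(lines):
        out.append(line)
        m_f = FENCE_RE.match(line)
        if m_f:
            if fence_seq is None:
                fence_seq = m_f.group(1)
            elif m_f.group(1) == fence_seq:
                fence_seq = None
            continue
        if fence_seq is None and HEADING_RE.match(line):
            next_line = lines[idx + 1] if idx + 1 < len(lines) else None
            if next_line is not None and next_line.strip() != '':
                out.append('')
                fixes += 1
    result = '\n'.join(out)
    # splitlines() drops the final newline; restore it so the file ending is unchanged.
    if s.endswith('\n'):
        result += '\n'
    return result, fixes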


def ensure_blank_line_before_lists(s: str) -> Tuple[str, int]:
    lines = s.splitlines()
    out: List[str] = []
    fixes = 0
    in_code = False
    fence_seq = None
    prev_block_active = False
    prev_block_type: Optional[str] = None

    for line in lines:
        m_f = FENCE_RE.match(line)
        if m_f:
            tick = m_f.group(1)
            if not in_code:
                in_code = True
                fence_seq = tick
            elif tick == fence_seq:
                in_code = False
                fence_seq = None

        is_list_line = False
        is_quote_line = False
        if not in_code:
            is_list_line = bool(LIST_ITEM_RE.match(line))
            is_quote_line = not is_list_line and bool(QUOTE_RE.match(line))
        block_line = is_list_line or is_quote_line
        block_type = 'list' if is_list_line else 'quote' if is_quote_line else None

        continuation = False
        if (prev_block_active and prev_block_type == 'list'
                and not block_line and line.strip() != '' and line[:1] in ' 	'):
            continuation = True

        if block_line and not prev_block_active:
            prev = out[-1] if out else ''
            if prev.strip() != '':
                out.append('')
                fixes += 1

        if not block_line and not continuation and prev_block_active:
            if line.strip() != '':
                if not (out and out[-1] == ''):
                    out.append('')
                    fixes += 1
            prev_block_active = False
            prev_block_type = None

        out.append(line)

        if block_line:
            prev_block_active = True
            prev_block_type = block_type
        elif continuation:
            prev_block_active = True
            prev_block_type = 'list'
        elif line.strip() == '':
            prev_block_active = False
            prev_block_type = None

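    result = '\n'.join(out)
    # splitlines() drops the final newline; restore it so the file ending is unchanged.
    if s.endswith('\n'):
        result += '\n'
    return result, fixes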


def iter_target_files(paths: List[Path]) -> List[Path]:
    """Collect the files to process. Supports file paths, folders, and glob wildcards."""
    exts = {'.md', '.markdown', '.txt'}
    files: List[Path] = []

    def maybe_add(fp: Path):
        try:
            if fp.is_file() and fp.suffix.lower() in exts:
                files.append(fp)
        except Exception:
            pass

    for p in paths:
        pat = str(p)
        if any(ch in pat for ch in '*?['):
            for g in glob.glob(pat, recursive=True):
                gp = Path(g)
                if gp.is_dir():
                    for root, dirs, names in os.walk(gp, followlinks=True):
                        for name in names:
                            maybe_add(Path(root) / name)
                else:
                    maybe_add(gp)
        elif Path(p).is_dir():
            for root, dirs, names in os.walk(p, followlinks=True):
                for name in names:
                    maybe_add(Path(root) / name)
        elif Path(p).exists():
            maybe_add(Path(p))
    return files
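
The handling of 之 references can be tried in isolation with the reduced sketch below: it tightens `第 N 條` first and then folds `條之M` into `N-M條`, mirroring the order in which the notebook applies `convert_ordinals` and `convert_article_refs`. The names `ORDINAL_RE`, `REF_RE`, and `normalize_ref` are hypothetical, and the patterns accept Arabic digits only, which is an assumption made to keep the example short.

```python
import re

# Reduced patterns that accept Arabic digits only (an assumption made for brevity;
# the notebook's NUMBER_TOKEN also accepts Chinese numerals).
ORDINAL_RE = re.compile(r"第\s*(\d+)\s*(條|項|款)")
REF_RE = re.compile(r"(\d+)\s*(條|項|款)\s*之\s*(\d+)")


def normalize_ref(text: str) -> str:
    """Tighten 第 N 條 first, then fold 條之M into N-M條."""
    text = ORDINAL_RE.sub(lambda m: f"第{int(m.group(1))}{m.group(2)}", text)
    return REF_RE.sub(
        lambda m: f"{int(m.group(1))}-{int(m.group(3))}{m.group(2)}", text
    )


print(normalize_ref("第 14 條之 4"))  # prints 第14-4條
print(normalize_ref("24 條之 5"))     # prints 24-5條
```

The second call shows that a reference without a leading 第 is still folded, which is the `24 條之 5` case mentioned in the limitations list.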




In [9]:

# Use the notebook's mounted working folder as the fixed root for relative paths
_base = Path.cwd() / "work"


def _resolve_target(p: Union[str, os.PathLike]) -> Path:
    p = Path(p)
    return (_base / p).resolve() if not p.is_absolute() else p


target_paths: List[Path] = [_resolve_target(p) for p in TARGET_PATHS]

if VERBOSE:
    print(f"[root] {_base}")
    for _t in target_paths:
        try:
            _rel = _t.relative_to(_base)
        except Exception:
            _rel = _t
        print(f"[target] {_rel}")

    # List all files and folders under each target (recursively) to check visibility and that the mount is correct
    for _t in target_paths:
        try:
            _t_rel = _t.relative_to(_base)
        except Exception:
            _t_rel = _t
        print(f"[tree] Listing under {_t_rel}:")
        if _t.is_dir():
            for _root, _dirs, _names in os.walk(_t, followlinks=True):
                for _d in _dirs:
                    _p = Path(_root) / _d
                    try:
                        _r = _p.relative_to(_base)
                    except Exception:
                        _r = _p
                    print(f"DIR  {_r}")
                for _n in _names:
                    _p = Path(_root) / _n
                    try:
                        _r = _p.relative_to(_base)
                    except Exception:
                        _r = _p
                    print(f"FILE {_r}")
        elif _t.is_file():
            print(f"FILE {_t_rel}")
        else:
            print(f"[warn] {_t_rel} not found")

# ---- Collect files (iter_target_files already returns a list, so emptiness can be checked directly) ----
files_list = iter_target_files(target_paths)
if not files_list and VERBOSE:
    print("[warn] No files found under targets. Check mounts and paths.")

article_total = 0
spacing_total = 0
heading_total = 0
ordinal_total = 0
block_fixes_total = 0
files_changed = 0

for f in files_list:
    try:
        text = f.read_text(encoding="utf-8")
    except Exception as e:
        print(f"[skip ] {f} (read error: {e})")
        continue

    tmp_text, ord_changes = convert_ordinals(text)
    tmp_text, article_changes = convert_article_refs(tmp_text)
    tmp_text, legal_spacing_changes = collapse_legal_spacing(tmp_text)
    tmp_text, text_digit_changes = collapse_text_digit_spacing(tmp_text)
    spacing_changes = legal_spacing_changes + text_digit_changes
    tmp_text, heading_changes = ensure_blank_line_after_headings(tmp_text)
    new_text, block_changes = ensure_blank_line_before_lists(tmp_text)

    if article_changes > 0 or spacing_changes > 0 or heading_changes > 0 or ord_changes > 0 or block_changes > 0:
        files_changed += 1
        article_total += article_changes
        spacing_total += spacing_changes
        heading_total += heading_changes
        ordinal_total += ord_changes
        block_fixes_total += block_changes

        if VERBOSE:
            print(
                f"[update] {f} ("
                f"{article_changes} article-ref, "
                f"{spacing_changes} spacing-fix (legal:{legal_spacing_changes}, text-digit:{text_digit_changes}), "
                f"{heading_changes} heading-fix, {ord_changes} ordinals, {block_changes} block-fixes)"
            )

        if not DRY_RUN:
            try:
                f.write_text(new_text, encoding="utf-8")
            except Exception as e:
                print(f"[error] {f} (write error: {e})")

summary = "Dry run" if DRY_RUN else "Done"
print(
    f"{summary}: {files_changed} file(s) "
    f"{'would be' if DRY_RUN else 'were'} updated, "
    f"{article_total} article-ref fix(es), {spacing_total} spacing-fix(es), "
    f"{heading_total} heading-fix(es), {ordinal_total} ordinal-fix(es), {block_fixes_total} block-fix(es)."
)



[root] /home/jovyan/work
[target] mkdocs
[tree] Listing under mkdocs:
DIR  mkdocs/plugins
DIR  mkdocs/includes
DIR  mkdocs/My_Notes
FILE mkdocs/.DS_Store
FILE mkdocs/mkdocs.yml
FILE mkdocs/Dockerfile
DIR  mkdocs/plugins/multiline_abbr
DIR  mkdocs/plugins/mkdocs_multiline_abbr.egg-info
FILE mkdocs/plugins/setup.py
DIR  mkdocs/plugins/multiline_abbr/__pycache__
FILE mkdocs/plugins/multiline_abbr/__init__.py
FILE mkdocs/plugins/multiline_abbr/plugin.py
FILE mkdocs/plugins/multiline_abbr/__pycache__/plugin.cpython-313.pyc
FILE mkdocs/plugins/multiline_abbr/__pycache__/__init__.cpython-313.pyc
FILE mkdocs/plugins/mkdocs_multiline_abbr.egg-info/PKG-INFO
FILE mkdocs/plugins/mkdocs_multiline_abbr.egg-info/SOURCES.txt
FILE mkdocs/plugins/mkdocs_multiline_abbr.egg-info/entry_points.txt
FILE mkdocs/plugins/mkdocs_multiline_abbr.egg-info/top_level.txt
FILE mkdocs/plugins/mkdocs_multiline_abbr.egg-info/dependency_links.txt
FILE mkdocs/includes/abbreviations.md
DIR  mkdocs/My_Notes/_老師要的其他文件
DIR  mk

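## Done
- With `DRY_RUN=True`, only the files that "would be modified" and their change counts are shown.
- After checking the result, set `DRY_RUN=False` and run the notebook again to write the changes back to the files.
- To add other rules, feel free to extend this notebook with your own processing logic.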
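
One way to add a rule is to follow the convention the existing passes use: take the text and return the new text together with a change count. The sketch below is only an illustration; `collapse_blank_lines` is a hypothetical rule that is not part of this notebook, and it does not skip fenced code blocks.

```python
import re
from typing import Tuple

# Three or more consecutive newlines, i.e. two or more blank lines in a row.
BLANK_RUN_RE = re.compile(r"\n{3,}")


def collapse_blank_lines(s: str) -> Tuple[str, int]:
    """Collapse runs of blank lines into a single blank line (hypothetical extra rule)."""
    return BLANK_RUN_RE.subn("\n\n", s)


new_text, fixes = collapse_blank_lines("a\n\n\n\nb\n\nc\n")
print(repr(new_text), fixes)  # prints 'a\n\nb\n\nc\n' 1
```

To wire such a rule in, it could be called in the processing loop next to the other passes, with its count added to the condition that decides whether a file has changed.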