## 1) Imports & configuration
- `AI_API_KEY` is **read only from environment variables** (putting it in `.env` is recommended)
- Adjust `HOMEWORK_URL` as needed
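A minimal sketch of the "environment only" rule, assuming a hypothetical helper `load_ai_config` that accepts any mapping, so it can be tested without touching the real environment. It fails fast when the key is missing instead of letting the first API call fail later. The fallback URL is the same default the configuration cell uses. A matching `.env` file would hold a line such as `AI_API_KEY=...`, which `load_dotenv()` copies into `os.environ`.

```python
import os
from typing import Mapping, Tuple

DEFAULT_BASE_URL = "https://api.openai-proxy.org/v1"


def load_ai_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Hypothetical helper: read the AI settings from environment variables only."""
    api_key = (env.get("AI_API_KEY") or "").strip()
    if not api_key:
        # Fail fast instead of discovering the missing key at the first API call
        raise RuntimeError("AI_API_KEY is not set (export it or add it to .env)")
    # A missing or empty AI_BASE_URL falls back to the default endpoint
    base_url = env.get("AI_BASE_URL") or DEFAULT_BASE_URL
    return api_key, base_url
```

Passing a plain dict in tests (for example `load_ai_config({"AI_API_KEY": "sk-test"})`) keeps the real key out of test code entirely.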

In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

import time
import os
import glob
import re

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

HOMEWORK_URL = "https://next.jinshuju.net/forms/NHaQvT/entries"
MODEL_NAME = "gpt-5-mini"

# Read only from environment variables (hardcoding is no longer allowed)
API_KEY = os.getenv("AI_API_KEY")
BASE_URL = os.getenv("AI_BASE_URL") or "https://api.openai-proxy.org/v1"
DOWNLOAD_DIR = os.path.join(os.getcwd(), "downloads")

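SCORING_CRITERIA = """
You are a C++ homework grading assistant for first-year university students. Score each submission against the criteria below (out of 10, with an average around 8). Do not be too strict: a submission that does very well on any one criterion can receive a high score.
1. Logical correctness: the code meets the assignment requirements and has no logic holes.
2. Code style: consistent naming, tidy indentation, clear structure.
3. Comment completeness: key steps are commented so the code is easy to follow.
4. Conciseness: no redundant code; an efficient implementation.
Output format:
Line 1: the score (number only, e.g. 8.5)
Line 2: a short comment (e.g. Logic is correct, naming is consistent, comments are complete; consider simplifying the loop structure)
"""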

os.makedirs(DOWNLOAD_DIR, exist_ok=True)
print('DOWNLOAD_DIR =', DOWNLOAD_DIR)
print('AI_API_KEY present =', bool(API_KEY))

DOWNLOAD_DIR = c:\workspace\workspace4python\selenium_operator\downloads
AI_API_KEY present = True


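## 2) Function definitions (all in one place)
All function definitions live in a single code cell; each later cell runs just one step as a test, which makes step-by-step debugging easier.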
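As an example of testing one step in isolation, the attachment-name helpers used later (`_extract_filename_from_href` and `_contains_cpp_hint`) need no browser at all. The sketch below restates their logic as standalone functions; the names `extract_attname` and `contains_cpp_hint` and the sample URL are made up for illustration. It lets the `.cpp` detection be checked before any Selenium session starts.

```python
import re
from urllib.parse import unquote


def extract_attname(href: str) -> str:
    """Return the URL-decoded value of an `attname=` query parameter, or ''."""
    m = re.search(r'[?&]attname=([^&]+)', href or '')
    return unquote(m.group(1)) if m else ''


def contains_cpp_hint(s: str) -> bool:
    """True if the string mentions a name ending in .cpp (case-insensitive)."""
    return bool(re.search(r'(?i)\.cpp(\b|$)', s or ''))


# Hypothetical download link; only the attname parameter matters here
href = 'https://example.com/download?id=42&attname=hw%201.cpp'
print(extract_attname(href))                     # hw 1.cpp
print(contains_cpp_hint(extract_attname(href)))  # True
print(contains_cpp_hint('report.docx'))          # False
```

The word-boundary check means a name like `main.cppx` is not mistaken for a C++ source file.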

In [2]:
# All function definitions are collected here (run this cell first)


def setup_driver():
    chrome_options = Options()
    prefs = {
        'download.default_directory': DOWNLOAD_DIR,
        'download.prompt_for_download': False,
        'download.directory_upgrade': True,
        'safebrowsing.enabled': True,
    }
    chrome_options.add_experimental_option('prefs', prefs)

    service = Service(ChromeDriverManager().install())
    d = webdriver.Chrome(service=service, options=chrome_options)
    d.implicitly_wait(10)
    return d


def wait_for_grid(driver):
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'ag-root'))
    )
    viewport = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'ag-body-viewport'))
    )
    return viewport


def get_visible_rows(driver):
    return driver.find_elements(
        By.XPATH,
        "//div[contains(@class, 'ag-center-cols-container')]//div[@role='row']",
    )


def clear_download_dir():
    for f in glob.glob(os.path.join(DOWNLOAD_DIR, '*')):
        try:
            os.remove(f)
        except Exception:
            pass


def wait_download_complete(timeout=60, poll_interval=0.5, settle_rounds=3):
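    """Wait for a download to finish.

    Handles two kinds of temporary files:
    - Chrome's *.crdownload
    - *.tmp files that the site or browser may create

    Rules:
    - Ignore *.crdownload / *.tmp
    - Once the newest non-temporary file is found, treat it as complete only after its size stays unchanged for settle_rounds consecutive polls
    """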

    start = time.time()
    last_path = None
    stable_count = 0
    last_size = None

    while time.time() - start < timeout:
        files = glob.glob(os.path.join(DOWNLOAD_DIR, '*'))
        candidates = [
            p for p in files
            if not p.lower().endswith('.crdownload') and not p.lower().endswith('.tmp')
        ]

        if candidates:
            candidates.sort(key=lambda p: os.path.getmtime(p), reverse=True)
            path = candidates[0]

            try:
                size = os.path.getsize(path)
            except OSError:
                time.sleep(poll_interval)
                continue

            if path == last_path and size == last_size:
                stable_count += 1
            else:
                stable_count = 0
                last_path = path
                last_size = size

            if stable_count >= settle_rounds:
                return path

        time.sleep(poll_interval)

    return None


def _extract_filename_from_href(href: str) -> str:
    if not href:
        return ''
    m = re.search(r'(?:\?|&)(?:attname)=([^&]+)', href)
    if not m:
        return ''
    try:
        from urllib.parse import unquote
        return unquote(m.group(1))
    except Exception:
        return m.group(1)


def _contains_cpp_hint(s: str) -> bool:
    if not s:
        return False
    return bool(re.search(r'(?i)\.cpp(\b|$)', s))


def _is_cpp_download_link(a) -> bool:
    href = (a.get_attribute('href') or '').strip()
    dl = (a.get_attribute('download') or '').strip()
    title = (a.get_attribute('title') or '').strip()
    aria = (a.get_attribute('aria-label') or '').strip()
    text = (a.text or '').strip()
    attname = _extract_filename_from_href(href).strip()

    candidates = [dl, attname, title, aria, text, href]
    return any(_contains_cpp_hint(c) for c in candidates)


def _get_top_visible_ant_modal(driver):
    """Return the topmost visible Ant Design modal container, or None if there is none."""
    try:
        modals = driver.find_elements(By.CSS_SELECTOR, '.ant-modal')
    except Exception:
        return None

    for m in reversed(modals):
        try:
            if m.is_displayed():
                return m
        except Exception:
            continue
    return None


def _get_ant_modal_body(modal):
    try:
        return modal.find_element(By.CSS_SELECTOR, '.ant-modal-body')
    except Exception:
        return None


def _scroll_ant_modal_to_bottom(driver, modal, steps=10, pause=0.2):
    """Scroll the modal body to the very bottom (triggers lazy loading and reveals the download buttons at the bottom)."""
    body = _get_ant_modal_body(modal)
    if not body:
        return

    last_top = None
    for _ in range(steps):
        driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight;", body)
        time.sleep(pause)
        try:
            top = driver.execute_script("return arguments[0].scrollTop;", body)
        except Exception:
            top = None
        if top is not None and top == last_top:
            break
        last_top = top


def _click_modal_close(driver, modal, timeout=10):
    """Click the X in the top-right corner to close the modal and wait for it to disappear."""
    if not modal:
        return False

    try:
        close_btn = modal.find_element(By.CSS_SELECTOR, 'button.ant-modal-close')
    except Exception:
        try:
            close_btn = modal.find_element(
                By.XPATH,
                ".//button[@type='button' and @aria-label='Close' and contains(@class,'ant-modal-close')]",
            )
        except Exception:
            return False

    try:
        driver.execute_script("arguments[0].click();", close_btn)
    except Exception:
        close_btn.click()

    def _gone(_):
        try:
            return not modal.is_displayed()
        except Exception:
            return True

    WebDriverWait(driver, timeout).until(_gone)
    return True


def _get_top_visible_select_listbox(driver):
    """Return the topmost visible custom dropdown option container (role=listbox)."""
    try:
        boxes = driver.find_elements(
            By.XPATH,
            "//div[@role='listbox' and contains(@class,'SelectOptions-module')]",
        )
    except Exception:
        return None

    for b in reversed(boxes):
        try:
            if b.is_displayed():
                return b
        except Exception:
            continue
    return None


def _get_top_visible_listbox_with_options(driver):
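    """Return the topmost visible listbox that actually contains options.

    The listbox can appear before its optionLabel elements have rendered,
    so role=option is used as the more reliable signal for both detection and selection.
    """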
    try:
        boxes = driver.find_elements(
            By.XPATH,
            "//div[@role='listbox' and contains(@class,'SelectOptions-module')]",
        )
    except Exception:
        return None

    for b in reversed(boxes):
        try:
            if not b.is_displayed():
                continue
            opts = b.find_elements(By.XPATH, ".//*[@role='option']")
            if opts:
                return b
        except Exception:
            continue
    return None


def _extract_option_text(option_el):
    """Extract the text used for score matching from an option element, as robustly as possible."""
    try:
        label = option_el.find_elements(By.XPATH, ".//*[contains(@class,'SelectOptions-module__optionLabel')]")
        if label:
            t = (label[0].text or '').strip()
            if t:
                return t
    except Exception:
        pass
    try:
        return (option_el.text or '').strip()
    except Exception:
        return ''


def download_homework_file(driver, row, row_index, post_click_wait=2.0):
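    """Download the .cpp attachment for one row (new page layout):
    1) Click the row's field_5 cell to open the detail modal
    2) The modal contains several download buttons (<a> elements)
    3) Download only attachments ending in .cpp (if there are several, try them in order until a .cpp file is obtained)

    Key adaptations:
    - After the detail view opens, the modal body must be scrolled to the bottom before the download buttons appear
    - The site may first write a *.tmp file during download; wait until it becomes the final file
    - After clicking download, wait a fixed post_click_wait seconds (default 2 s) before polling
    """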

    current_row_index = row.get_attribute('row-index')

    # Locate and click field_5 first (opens the modal/detail view)
    try:
        cell = row.find_element(By.XPATH, ".//div[@col-id='field_5']")
    except NoSuchElementException:
        cell = driver.find_element(
            By.XPATH,
            f"//div[@role='row' and @row-index='{current_row_index}']//div[@col-id='field_5']",
        )

    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center', inline: 'center'});",
        cell,
    )
    time.sleep(0.2)
    driver.execute_script("arguments[0].click();", cell)

    # Wait for the modal to appear
    WebDriverWait(driver, 20).until(lambda drv: _get_top_visible_ant_modal(drv) is not None)
    modal = _get_top_visible_ant_modal(driver)

    # Scroll the modal to the bottom (reveals the download buttons), then look for download links inside it
    start = time.time()
    all_links = []
    while time.time() - start < 20:
        modal = _get_top_visible_ant_modal(driver) or modal
        if modal:
            _scroll_ant_modal_to_bottom(driver, modal)
            try:
                all_links = modal.find_elements(
                    By.XPATH,
                    ".//a[contains(@href,'download') and contains(@class,'download')]",
                )
                all_links = [a for a in all_links if a.is_displayed()]
            except Exception:
                all_links = []

        if all_links:
            break
        time.sleep(0.2)

    cpp_links = [a for a in all_links if _is_cpp_download_link(a)]

    if not cpp_links:
        print(f'Row {row_index + 1}: no download button with a .cpp hint found in the modal (skipping download)')
        return None

    if len(cpp_links) > 1:
        print(f'Row {row_index + 1}: found {len(cpp_links)} possible .cpp attachments; trying each until a .cpp is downloaded:')
        for a in cpp_links:
            href = a.get_attribute('href') or ''
            dl = a.get_attribute('download') or ''
            att = _extract_filename_from_href(href)
            txt = (a.text or '').strip()
            print('  -', dl or att or txt or href)

    for idx, target in enumerate(cpp_links, start=1):
        clear_download_dir()

        file_name_hint = (target.get_attribute('download') or '').strip() or _extract_filename_from_href(
            target.get_attribute('href') or ''
        ) or (target.text or '').strip()

        print(f'Downloading row {row_index + 1} (candidate {idx}/{len(cpp_links)}): {file_name_hint or "(cpp attachment)"}')

        try:
            driver.execute_script("arguments[0].scrollIntoView({block:'center'});", target)
            time.sleep(0.2)
        except Exception:
            pass

        driver.execute_script("arguments[0].click();", target)

        time.sleep(post_click_wait)

        downloaded = wait_download_complete(timeout=60)
        if not downloaded:
            print('Download timed out; trying the next candidate')
            continue

        base = os.path.basename(downloaded)
        print('Download complete:', base)

        if base.lower().endswith('.cpp'):
            return downloaded

        if _contains_cpp_hint(file_name_hint):
            print('Note: the downloaded file does not end in .cpp, but the link suggests .cpp; reading it as text:', base)
            return downloaded

        print('Downloaded file is not a .cpp; it will be cleared before trying the next candidate:', base)

    return None


def read_cpp_file(file_path, max_bytes=2_000_000):
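    """Read a downloaded C/C++ source file as text, as robustly as possible.

    Garbled text usually comes from guessing the wrong encoding. This function decodes
    with several candidate encodings, scores each result, and keeps the best one, so
    that a bad decode does not drag down the AI score.

    - Supports common encodings such as utf-8, utf-16 and gb18030
    - If the download is actually a binary (docx/pdf/zip, ...), print a note and return None
    """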

    if not file_path or not os.path.exists(file_path):
        print('read_cpp_file: file does not exist:', file_path)
        return None

    size = os.path.getsize(file_path)
    print('File size:', size, 'bytes')

    with open(file_path, 'rb') as f:
        data = f.read(max_bytes + 1)

    if len(data) > max_bytes:
        data = data[:max_bytes]
        print(f'Note: file is too large; only the first {max_bytes} bytes were read')

    if data.startswith(b'PK\x03\x04'):
        print('read_cpp_file: looks like a ZIP/Office document (possibly docx/xlsx), not source text')
        return None
    if data.startswith(b'%PDF'):
        print('read_cpp_file: looks like a PDF, not source text')
        return None

    def _score_text(text: str) -> tuple:
        # Smaller tuples are better: the caller keeps the minimum score tuple
        if not text:
            return (10**9, 10**9, 10**9, 0)

        length = len(text)
        repl = text.count('�')
        nul = text.count('\x00')
        ctrl = sum(1 for c in text if ord(c) < 32 and c not in ('\n', '\r', '\t'))

        # The more key tokens match, the more the text looks like source code
        tokens = ['#include', 'int', 'main', 'std::', 'using', 'return', ';', '{', '}']
        token_hits = sum(1 for t in tokens if t in text)

        # Many replacement or NUL characters usually mean the decode is wrong
        repl_ratio = repl / max(1, length)
        nul_ratio = nul / max(1, length)

        penalty = 0
        if repl_ratio > 0.02:
            penalty += int(repl_ratio * 10_000)
        if nul_ratio > 0.001:
            penalty += int(nul_ratio * 10_000)

        # Score tuple: replacement chars first, then control chars, then the penalty; negated token_hits breaks ties
        return (repl, ctrl, penalty, -token_hits)

    # Candidate encodings: common Chinese and UTF encodings first
    encodings = [
        'utf-8-sig',
        'utf-8',
        'gb18030',
        'gbk',
        'cp936',
        'utf-16',
        'utf-16-le',
        'utf-16-be',
        'big5',
    ]

    best = None
    best_enc = None
    best_score = None

    for enc in encodings:
        try:
            text = data.decode(enc, errors='replace')
        except Exception:
            continue

        # Clearly wrong: too many NUL characters
        if text.count('\x00') > max(50, len(text) // 10):
            continue

        sc = _score_text(text)
        if best is None or sc < best_score:
            best = text
            best_enc = enc
            best_score = sc

    if best is None:
        # Last-resort fallback
        best = data.decode('utf-8', errors='replace')
        best_enc = 'utf-8 (fallback)'
        best_score = _score_text(best)

    print('Decoded with:', best_enc, 'score=', best_score)

    # If the text still looks garbled, warn (but still return it so it can be checked manually)
    if best.count('�') > max(10, len(best) // 50):
        print('Warning: the text may still be garbled (many replacement characters). Check the actual encoding of this source file.')

    return best


def score_homework_with_ai(cpp_code):
    if not API_KEY:
        return None, 'Missing AI_API_KEY (environment variable / .env)'
    if not cpp_code or not cpp_code.strip():
        return None, 'File content is empty'

    client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
    resp = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {'role': 'system', 'content': SCORING_CRITERIA},
            {'role': 'user', 'content': f'Please score the following C++ code:\n{cpp_code}'},
        ],
        timeout=30,
    )

    # message.content is Optional in the SDK, so guard before calling strip()
    result = (resp.choices[0].message.content or '').strip()
    lines = result.split('\n')
    score = None
    comment = ''
    for line in lines:
        m = re.search(r'\d+(?:\.\d+)?', line)
        if m and score is None:
            # The first number found is taken as the score
            score = m.group()
        elif line.strip() and not line.strip().isdigit():
            comment += line.strip() + ' '
    print('Raw AI scoring result:', repr(result))
    return score, comment.strip()


def fill_score_and_comment(driver, row, score, comment=None):
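    """Write the score back (new modal + custom select):

    Flow:
    - Click "修改" (Edit), then wait 1 s
    - In the modal, click the input with placeholder "请选择" ("Please select") to open the dropdown
    - Click a score in the dropdown (e.g. 8.5 / 9.0)
    - Click the "提交" (Submit) button to finish grading
    - Finally click the close button (X) in the top-right corner to return to the list
    """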

    score_str = str(score).strip() if score is not None else ''
    if not score_str:
        raise ValueError('score is empty; cannot write it back')

    # Look for "修改" (Edit) inside the modal first, then fall back to the whole page
    modal = _get_top_visible_ant_modal(driver)

    def _find_edit():
        m = _get_top_visible_ant_modal(driver) or modal
        if m:
            try:
                btn = m.find_element(By.XPATH, ".//button[.//span[normalize-space()='修改']]")
                if btn.is_displayed() and btn.is_enabled():
                    return btn
            except Exception:
                pass
        try:
            btn = driver.find_element(By.XPATH, "//button[.//span[normalize-space()='修改']]")
            if btn.is_displayed() and btn.is_enabled():
                return btn
        except Exception:
            return None
        return None

    edit_btn = WebDriverWait(driver, 10).until(lambda _: _find_edit())
    try:
        edit_btn.click()
    except Exception:
        driver.execute_script("arguments[0].click();", edit_btn)

    time.sleep(1)  # wait 1 s after clicking, as the workflow requires

    modal = _get_top_visible_ant_modal(driver) or modal

    # Click the "请选择" (Please select) input to open the dropdown (listbox)
    def _find_score_input():
        m = _get_top_visible_ant_modal(driver) or modal
        if not m:
            return None

        xpaths = [
            ".//input[@placeholder='请选择' and not(@disabled)]",
            ".//input[contains(@class,'ant-select-selection-search-input') and not(@disabled)]",
        ]
        for xp in xpaths:
            try:
                el = m.find_element(By.XPATH, xp)
                if el.is_displayed() and el.is_enabled():
                    return el
            except Exception:
                continue
        return None

    score_input = WebDriverWait(driver, 10).until(lambda _: _find_score_input())

    score_input.click()
    time.sleep(0.2)

    # Wait for the listbox AND its options (the listbox alone is not enough; its labels may not have rendered yet)
    WebDriverWait(driver, 10).until(lambda _: _get_top_visible_listbox_with_options(driver) is not None)
    listbox = _get_top_visible_listbox_with_options(driver) or _get_top_visible_select_listbox(driver)

    if not listbox:
        raise RuntimeError('Score dropdown (listbox) not found')

    # Get the option list (role=option is the more reliable selector)
    options = listbox.find_elements(By.XPATH, ".//*[@role='option']")
    if not options:
        # Fallback: wait a little longer (sometimes the listbox is present but its options render late)
        time.sleep(0.5)
        options = listbox.find_elements(By.XPATH, ".//*[@role='option']")

    if not options:
        # Last resort: print some debug info to help locate the problem
        try:
            html = listbox.get_attribute('outerHTML')
            print('DEBUG: listbox outerHTML (truncated)=', (html or '')[:500])
        except Exception:
            pass
        raise RuntimeError('No usable score options found (listbox is present but has no role=option)')

    # Extract text from each option for exact / closest matching
    parsed = []
    for opt in options:
        txt = _extract_option_text(opt)
        if not txt:
            continue
        parsed.append((opt, txt))

    if not parsed:
        raise RuntimeError('No usable score options found (options exist but their text could not be extracted)')

    # Exact match first (text equals score_str)
    chosen = None
    for opt, txt in parsed:
        if txt == score_str:
            chosen = opt
            break

    # Then try the numerically closest option
    if chosen is None:
        try:
            target_val = float(score_str)
        except Exception:
            target_val = None

        if target_val is not None:
            best = None
            for opt, txt in parsed:
                try:
                    v = float(txt)
                except Exception:
                    continue
                diff = abs(v - target_val)
                if best is None or diff < best[0]:
                    best = (diff, opt, txt)
            if best is not None:
                chosen = best[1]
                print(f'Score {score_str} is not in the dropdown; selecting the closest option instead: {best[2]}')

    # Still nothing chosen: pick the first option so the flow does not stall
    if chosen is None:
        print('Warning: could not parse the score option text; defaulting to the first option:', parsed[0][1])
        chosen = parsed[0][0]

    driver.execute_script("arguments[0].click();", chosen)
    time.sleep(0.2)

    # Click the "提交" (Submit) button (inside the modal first)
    def _find_submit():
        m = _get_top_visible_ant_modal(driver) or modal
        if m:
            try:
                btn = m.find_element(By.XPATH, ".//button[.//span[normalize-space()='提交']]")
                if btn.is_displayed() and btn.is_enabled():
                    return btn
            except Exception:
                pass
        try:
            btn = driver.find_element(By.XPATH, "//button[.//span[normalize-space()='提交']]")
            if btn.is_displayed() and btn.is_enabled():
                return btn
        except Exception:
            return None
        return None

    submit_btn = WebDriverWait(driver, 10).until(lambda _: _find_submit())
    try:
        submit_btn.click()
    except Exception:
        driver.execute_script("arguments[0].click();", submit_btn)

    # After submitting, click the X in the top-right corner to close the modal and return to the list
    time.sleep(0.5)
    modal = _get_top_visible_ant_modal(driver) or modal
    closed = _click_modal_close(driver, modal, timeout=10)
    if not closed:
        print('Submitted, but the close button was not found or could not be clicked (close the modal manually)')
    else:
        print('Write-back complete (submitted and modal closed)')

    # comment is not written to the page yet; this no-op only keeps linters from flagging it as unused
    if comment and str(comment).strip():
        pass


def _get_ag_row_cell_text(row, col_id: str) -> str:
    """Return the visible text of one column in an AG Grid row (used to check for an existing score)."""
    if not row or not col_id:
        return ''
    try:
        cell = row.find_element(By.XPATH, f".//div[@col-id='{col_id}']")
    except Exception:
        return ''

    # In AG Grid the value usually sits inside ag-cell-value
    try:
        val = cell.find_element(By.XPATH, ".//div[contains(@class,'ag-cell-value')]")
        txt = (val.text or '').strip()
    except Exception:
        txt = (cell.text or '').strip()

    if not txt:
        try:
            txt = (cell.get_attribute('title') or '').strip()
        except Exception:
            txt = ''
    return txt


def _row_has_teacher_score(row, score_col_id: str = 'field_11') -> tuple[bool, str]:
    """Check whether the row already has a teacher score.
    Returns: (already scored?, raw text).
    """
    txt = _get_ag_row_cell_text(row, score_col_id)
    if not txt:
        return False, ''
    # Any digit counts as scored (e.g. 8, 8.5, 10)
    if re.search(r'\d', txt):
        return True, txt
    return False, txt


def process_all_visible_then_scroll(driver, viewport, max_loops=9999, skip_if_scored=True, score_col_id='field_11'):
    processed = set()

    for _ in range(max_loops):
        rows = get_visible_rows(driver)
        new_rows = 0

        for r in rows:
            idx_str = r.get_attribute('row-index')
            if not idx_str:
                continue
            idx = int(idx_str)
            if idx in processed:
                continue

            processed.add(idx)
            new_rows += 1

            # Skip rows that already have a teacher score (field_11)
            if skip_if_scored:
                has_score, raw = _row_has_teacher_score(r, score_col_id=score_col_id)
                if has_score:
                    print(f"\n--- Skipping submission {idx + 1}: already has teacher score {raw} ---")
                    continue

            print(f"\n--- Processing submission {idx + 1} ---")
            downloaded = download_homework_file(driver, r, idx)
            if not downloaded:
                print('Download failed, skipping')
                continue

            cpp_code = read_cpp_file(downloaded)
            if not cpp_code:
                print('Read failed (the download may not be a source file), skipping')
                continue

            score, comment = score_homework_with_ai(cpp_code)
            if not score:
                print('Scoring failed, skipping:', comment)
                continue

            print('score =', score)
            print('comment =', comment)
            fill_score_and_comment(driver, r, score, comment)

        is_bottom = driver.execute_script(
            "return arguments[0].scrollTop + arguments[0].clientHeight >= arguments[0].scrollHeight - 50;",
            viewport,
        )

        if is_bottom and new_rows == 0:
            print('Reached the bottom, done. Total processed:', len(processed))
            break

        print('Scrolling down to load more...')
        driver.execute_script('arguments[0].scrollTop += arguments[0].clientHeight;', viewport)
        time.sleep(2)

    return processed
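
The completion rule in `wait_download_complete` (ignore `*.crdownload`/`*.tmp`, then require the newest file's size to stay unchanged for several polls) can be exercised without a browser. The sketch below is a simplified, hypothetical re-implementation; `wait_for_settled_file` is not part of the notebook. It swaps in `time.monotonic()` for `time.time()` so a system clock change cannot stretch or cut the timeout, and it treats a file that vanishes mid-poll as "not settled yet".

```python
import glob
import os
import time
from typing import Optional

TEMP_SUFFIXES = ('.crdownload', '.tmp')


def wait_for_settled_file(folder: str, timeout: float = 10.0, poll: float = 0.1,
                          settle_rounds: int = 3) -> Optional[str]:
    """Return the newest non-temporary file in `folder` once its size has been
    unchanged for `settle_rounds` consecutive polls; return None on timeout."""
    deadline = time.monotonic() + timeout
    last = None    # (path, size) observed on the previous poll
    stable = 0     # consecutive polls with an identical (path, size)
    while time.monotonic() < deadline:
        files = [p for p in glob.glob(os.path.join(folder, '*'))
                 if not p.lower().endswith(TEMP_SUFFIXES)]
        try:
            newest = max(files, key=os.path.getmtime) if files else None
            current = (newest, os.path.getsize(newest)) if newest else None
        except OSError:
            current = None  # file was renamed or deleted between calls
        if current is not None and current == last:
            stable += 1
            if stable >= settle_rounds:
                return current[0]
        else:
            stable = 0
            last = current
        time.sleep(poll)
    return None
```

Comparing the `(path, size)` pair rather than the size alone means that a different file becoming the newest one restarts the count, which is the same reset the notebook version performs.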

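`score_homework_with_ai` takes the first number anywhere in the reply as the score. Because `SCORING_CRITERIA` asks for the score alone on the first line, a stricter parser could read only the first non-empty line and treat the rest as the comment. The sketch below does that under a hypothetical name, `parse_score_reply`. As an added assumption, it rejects values outside 0 to 10, so a malformed reply such as `85` makes the row be skipped instead of being written back with a wrong value.

```python
import re
from typing import Optional, Tuple


def parse_score_reply(text: Optional[str], max_score: float = 10.0) -> Tuple[Optional[float], str]:
    """Parse a reply shaped like 'score on line 1, comment on line 2+'.

    Returns (score, comment); score is None when line 1 has no number
    or the number is outside [0, max_score]."""
    lines = [ln.strip() for ln in (text or '').splitlines() if ln.strip()]
    if not lines:
        return None, ''
    m = re.search(r'\d+(?:\.\d+)?', lines[0])
    if not m:
        # No score on the first line: keep everything as the comment for debugging
        return None, ' '.join(lines)
    comment = ' '.join(lines[1:])
    score = float(m.group())
    if not 0 <= score <= max_score:
        return None, comment
    return score, comment
```

The returned score is a float, so `fill_score_and_comment` would receive e.g. `9.0`; its exact-text match against an option labelled `9` would fail, but the numeric-closest fallback still selects it.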

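The option-picking order in `fill_score_and_comment` (exact text match, then the numerically closest option, then the first option) does not depend on Selenium either. Restated over plain strings under the hypothetical name `choose_option_index`, it can be checked directly. On a tie, the earlier option wins because only a strictly smaller difference replaces the current best, which matches the original loop.

```python
from typing import List, Optional


def choose_option_index(score_str: str, option_texts: List[str]) -> int:
    """Return the index of the option to click for `score_str`."""
    if not option_texts:
        raise ValueError('no options to choose from')
    # 1) Exact text match
    for i, txt in enumerate(option_texts):
        if txt == score_str:
            return i
    # 2) Numerically closest option (non-numeric option texts are ignored)
    try:
        target = float(score_str)
    except ValueError:
        return 0  # 3) unparseable score: fall back to the first option
    best_i: int = 0
    best_diff: Optional[float] = None
    for i, txt in enumerate(option_texts):
        try:
            diff = abs(float(txt) - target)
        except ValueError:
            continue
        if best_diff is None or diff < best_diff:
            best_i, best_diff = i, diff
    return best_i
```

For options `['7.0', '7.5', '8.0', '8.5', '9.0']`, a score of `'9'` maps to `'9.0'` through the numeric step, and `'8.3'` maps to `'8.5'`.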
In [None]:
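# Patch: when opening the detail view from field_5, support both Ant Design Modal and Drawer, and retry the click.
# Goal: a row whose detail view fails to open should no longer hang or time out the whole run.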
from selenium.common.exceptions import TimeoutException


def _get_top_visible_ant_modal(driver):
    """返回当前最上层、可见的弹层容器。

    兼容：
    - AntD Modal: .ant-modal
    - AntD Drawer: .ant-drawer
    - 兜底：任意 role=dialog 或 aria-modal=true（有些版本/页面结构不完全一致）

    说明：为了尽量少改动原有代码，仍沿用旧函数名。
    """

    # 1) modal
    try:
        modals = driver.find_elements(By.CSS_SELECTOR, '.ant-modal')
    except Exception:
        modals = []

    for m in reversed(modals):
        try:
            if m.is_displayed():
                return m
        except Exception:
            continue

    # 2) drawer
    try:
        drawers = driver.find_elements(By.CSS_SELECTOR, '.ant-drawer')
    except Exception:
        drawers = []

    for d in reversed(drawers):
        try:
            if d.is_displayed():
                return d
        except Exception:
            continue

    # 3) fallback dialog
    try:
        dialogs = driver.find_elements(By.XPATH, "//*[@role='dialog' or @aria-modal='true']")
    except Exception:
        dialogs = []

    for dlg in reversed(dialogs):
        try:
            if dlg.is_displayed():
                return dlg
        except Exception:
            continue

    return None


def _get_ant_modal_body(modal):
    if not modal:
        return None

    # modal
    try:
        return modal.find_element(By.CSS_SELECTOR, '.ant-modal-body')
    except Exception:
        pass

    # drawer
    try:
        return modal.find_element(By.CSS_SELECTOR, '.ant-drawer-body')
    except Exception:
        pass

    # Fallback: some dialogs are not AntD, so use the container itself
    return modal


def _click_modal_close(driver, modal, timeout=10):
    """Click the close button in the top-right corner and wait for the overlay to disappear (works for modal and drawer)."""
    if not modal:
        return False

    close_btn = None
    for sel in ('button.ant-modal-close', 'button.ant-drawer-close'):
        try:
            close_btn = modal.find_element(By.CSS_SELECTOR, sel)
            break
        except Exception:
            continue

    if close_btn is None:
        # Fallback: aria-label=Close
        try:
            close_btn = modal.find_element(
                By.XPATH,
                ".//button[@type='button' and @aria-label='Close']",
            )
        except Exception:
            return False

    try:
        driver.execute_script("arguments[0].click();", close_btn)
    except Exception:
        try:
            close_btn.click()
        except Exception:
            return False

    def _gone(_):
        try:
            return not modal.is_displayed()
        except Exception:
            return True

    WebDriverWait(driver, timeout).until(_gone)
    return True


def _find_cell_by_row_index_and_col_id(driver, row_index: str, col_id: str):
    # AG Grid may render several copies of a row (pinned/center); find the visible one by row-index + col-id
    cells = driver.find_elements(
        By.XPATH,
        f"//div[@role='row' and @row-index='{row_index}']//div[@col-id='{col_id}']",
    )
    for c in cells:
        try:
            if c.is_displayed():
                return c
        except Exception:
            continue
    return cells[0] if cells else None


def _click_open_detail(driver, cell):
    """Try to click the actual clickable control (some cells wrap an <a> or <button>)."""
    if cell is None:
        return False

    # If the cell contains a link/button, click that first
    for xp in (".//a", ".//button", ".//*[@role='button']"):
        try:
            targets = cell.find_elements(By.XPATH, xp)
            targets = [t for t in targets if t.is_displayed()]
            if targets:
                driver.execute_script("arguments[0].click();", targets[0])
                return True
        except Exception:
            continue

    # Otherwise click the cell itself
    try:
        driver.execute_script("arguments[0].click();", cell)
        return True
    except Exception:
        try:
            cell.click()
            return True
        except Exception:
            return False


def download_homework_file(
    driver,
    row,
    row_index,
    post_click_wait=2.0,
    open_attempts=4,
    per_attempt_wait=8,
):
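    """Download the .cpp attachment, tolerating the case where clicking field_5 occasionally does not open the modal/drawer.

    On raising the timeout:
    - A larger timeout can help, but it matters more that a failure is skipped instead of hanging.
    - This version clicks up to open_attempts times and waits per_attempt_wait seconds each time.
      The default worst case is about 32 s, but processing continues immediately once the overlay opens.
    """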

    current_row_index = row.get_attribute('row-index')

    # If an overlay is already open, reuse it (avoids a failed wait when the previous one was not closed)
    modal = _get_top_visible_ant_modal(driver)

    if modal is None:
        cell = _find_cell_by_row_index_and_col_id(driver, current_row_index, 'field_5')
        if cell is None:
            print(f'Row {row_index + 1}: field_5 cell not found, skipping')
            return None

        try:
            driver.execute_script(
                "arguments[0].scrollIntoView({block: 'center', inline: 'center'});",
                cell,
            )
        except Exception:
            pass
        time.sleep(0.2)

        # Try clicking several times to open the detail view
        modal = None
        for attempt in range(1, open_attempts + 1):
            ok = _click_open_detail(driver, cell)
            time.sleep(0.2)

            if not ok:
                continue

            try:
                WebDriverWait(driver, per_attempt_wait).until(
                    lambda drv: _get_top_visible_ant_modal(drv) is not None
                )
                modal = _get_top_visible_ant_modal(driver)
                break
            except TimeoutException:
                # Scroll the cell into view again; the next attempt clicks it again
                try:
                    driver.execute_script(
                        "arguments[0].scrollIntoView({block: 'center', inline: 'center'});",
                        cell,
                    )
                except Exception:
                    pass

        if modal is None:
            print(
                f'Row {row_index + 1}: no modal/drawer appeared after clicking field_5 (retried {open_attempts} times), skipping'
            )
            return None

    # Scroll the overlay to the bottom (reveals the download buttons), then look for download links/buttons inside it
    start = time.time()
    all_links = []
    while time.time() - start < 20:
        modal = _get_top_visible_ant_modal(driver) or modal
        if modal:
            _scroll_ant_modal_to_bottom(driver, modal)
            try:
                # Accept <a> or <button> (the '下载' text means "Download"); also keep the old class=download hint
                all_links = modal.find_elements(
                    By.XPATH,
                    ".//a[contains(@href,'download')] | .//button[contains(.,'下载')] | .//*[contains(@class,'download')]",
                )
                all_links = [a for a in all_links if a.is_displayed()]
            except Exception:
                all_links = []

        if all_links:
            break
        time.sleep(0.2)

    cpp_links = [a for a in all_links if _is_cpp_download_link(a)]

    if not cpp_links:
        print(f'Row {row_index + 1}: no download button with a .cpp hint found in the overlay (skipping download)')
        return None

    for idx, target in enumerate(cpp_links, start=1):
        clear_download_dir()

        file_name_hint = (target.get_attribute('download') or '').strip() or _extract_filename_from_href(
            target.get_attribute('href') or ''
        ) or (target.get_attribute('title') or '').strip() or (target.text or '').strip()

        print(f'Downloading row {row_index + 1} (candidate {idx}/{len(cpp_links)}): {file_name_hint or "(cpp attachment)"}')

        try:
            driver.execute_script("arguments[0].scrollIntoView({block:'center'});", target)
            time.sleep(0.2)
        except Exception:
            pass

        try:
            driver.execute_script("arguments[0].click();", target)
        except Exception:
            try:
                target.click()
            except Exception:
                print('Clicking download failed; trying the next candidate')
                continue

        time.sleep(post_click_wait)

        downloaded = wait_download_complete(timeout=60)
        if not downloaded:
            print('Download timed out; trying the next candidate')
            continue

        base = os.path.basename(downloaded)
        print('Download complete:', base)

        if base.lower().endswith('.cpp'):
            return downloaded

        if _contains_cpp_hint(file_name_hint):
            print('Note: the downloaded file does not end in .cpp, but the link suggests .cpp; reading it as text:', base)
            return downloaded

        print('Downloaded file is not a .cpp; it will be cleared before trying the next candidate:', base)

    return None


print('Patch loaded: modal/drawer open retry enabled')

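# During batch processing, AG Grid may re-render its row nodes after scrolling or after a dialog closes,
# which turns previously cached WebElements stale (StaleElementReferenceException).
# Work around this by capturing the row-index first and re-locating the row element by row-index when needed.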


def _find_center_row_by_index(driver, row_index: int):
    return driver.find_element(
        By.XPATH,
        f"//div[contains(@class,'ag-center-cols-container')]//div[@role='row' and @row-index='{row_index}']",
    )


def process_all_visible_then_scroll(driver, viewport, max_loops=9999, skip_if_scored=True, score_col_id='field_11'):
    processed = set()

    for _ in range(max_loops):
        # Snapshot the row-index values currently visible (do not keep row WebElements around)
        row_indices = []
        for r in get_visible_rows(driver):
            try:
                idx_str = r.get_attribute('row-index')
            except StaleElementReferenceException:
                continue
            except Exception:
                continue

            if not idx_str:
                continue
            try:
                row_indices.append(int(idx_str))
            except ValueError:
                continue

        new_rows = 0

        for idx in row_indices:
            if idx in processed:
                continue

            processed.add(idx)
            new_rows += 1

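            # Re-locate a fresh row element right before processing each row
            try:
                r = _find_center_row_by_index(driver, idx)
            except Exception:
                # Row no longer rendered; idx is already in `processed`, so it is not retried this run
                continue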

            # Skip rows that already have a teacher score (score_col_id, default field_11)
            if skip_if_scored:
                try:
                    has_score, raw = _row_has_teacher_score(r, score_col_id=score_col_id)
                except StaleElementReferenceException:
                    try:
                        r = _find_center_row_by_index(driver, idx)
                        has_score, raw = _row_has_teacher_score(r, score_col_id=score_col_id)
                    except Exception:
                        continue

                if has_score:
                    print(f"\n--- Skipping submission {idx + 1}: already has a teacher score of {raw} ---")
                    continue

            print(f"\n--- Processing submission {idx + 1} ---")

            try:
                downloaded = download_homework_file(driver, r, idx)
            except StaleElementReferenceException:
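                # Row was re-rendered: skip it. idx is already in `processed`,
                # so this row will not be retried in the current run.
                print('Row element is stale; skipping this row and continuing...')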
                continue

            if not downloaded:
                print('Download failed, skipping')
                continue

            cpp_code = read_cpp_file(downloaded)
            if not cpp_code:
                print('Read failed (the download may not be a source file), skipping')
                continue

            score, comment = score_homework_with_ai(cpp_code)
            if not score:
                print('Scoring failed, skipping:', comment)
                continue

            print('score =', score)
            print('comment =', comment)

            # fill_score_and_comment does not currently use the row's contents, but r is still passed to keep the interface consistent
            try:
                fill_score_and_comment(driver, r, score, comment)
            except StaleElementReferenceException:
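                # The grid re-rendering after submitting/closing the dialog is expected; safe to ignore
                print('Row element went stale after write-back (expected), continuing...')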

        is_bottom = driver.execute_script(
            "return arguments[0].scrollTop + arguments[0].clientHeight >= arguments[0].scrollHeight - 50;",
            viewport,
        )

        if is_bottom and new_rows == 0:
            print('Reached the bottom; done. Total processed:', len(processed))
            break

        print('Scrolling down to load more...')
        driver.execute_script('arguments[0].scrollTop += arguments[0].clientHeight;', viewport)
        time.sleep(2)

    return processed

Patch loaded: modal/drawer open retry enabled
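The row-index workaround described in the comments above can be illustrated without a browser. The following is a toy model, not Selenium: `FakeGrid`, `FakeRow`, and `StaleRow` are hypothetical stand-ins in which every re-render invalidates previously returned row handles, roughly as AG Grid re-rendering invalidates cached WebElements. Reusing cached handles fails after a re-render, while snapshotting plain `row-index` integers and re-locating each row right before use keeps working.

```python
class StaleRow(Exception):
    """Hypothetical stand-in for selenium's StaleElementReferenceException."""


class FakeGrid:
    """Toy grid: every re-render invalidates all previously returned row handles."""

    def __init__(self, data):
        self.data = data
        self.generation = 0

    def rerender(self):
        # Simulates AG Grid rebuilding row nodes after a scroll or a dialog closing.
        self.generation += 1

    def find_row(self, row_index):
        # Analogue of _find_center_row_by_index: always returns a fresh handle.
        return FakeRow(self, row_index, self.generation)

    def visible_rows(self):
        # Analogue of get_visible_rows: fresh handles for the current render.
        return [self.find_row(i) for i in range(len(self.data))]


class FakeRow:
    def __init__(self, grid, row_index, generation):
        self._grid = grid
        self._row_index = row_index
        self._generation = generation

    def _check(self):
        # A handle from an older render is stale, like a detached WebElement.
        if self._generation != self._grid.generation:
            raise StaleRow(self._row_index)

    def get_row_index(self):
        self._check()
        return self._row_index

    def text(self):
        self._check()
        return self._grid.data[self._row_index]


def process_cached(grid):
    """Anti-pattern: keep handles across re-renders. Returns how many went stale."""
    stale = 0
    for row in grid.visible_rows():  # list built once, handles reused later
        try:
            row.text()
        except StaleRow:
            stale += 1
        grid.rerender()  # e.g. the score dialog was submitted and closed
    return stale


def process_by_index(grid):
    """Pattern used above: snapshot plain ints, re-locate each row right before use."""
    row_indices = [r.get_row_index() for r in grid.visible_rows()]
    seen = []
    for idx in row_indices:
        seen.append(grid.find_row(idx).text())
        grid.rerender()
    return seen


print(process_cached(FakeGrid(["a.cpp", "b.cpp", "c.cpp"])))    # 2
print(process_by_index(FakeGrid(["a.cpp", "b.cpp", "c.cpp"])))  # ['a.cpp', 'b.cpp', 'c.cpp']
```

In the notebook the same split holds: `get_visible_rows` is only used to collect integers, and `_find_center_row_by_index` supplies the handle that is actually used.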


## 3) Initialize the browser (safe to re-run)
If repeated runs leave stray browser windows, close them manually or run the close cell below first.

In [3]:
# Before running this cell, run sections 1) imports & config and 2) function definitions first

driver = None


In [None]:
# Launch the browser
driver = setup_driver()

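# Important: disable the implicit wait (set it to 0) so it does not stack with WebDriverWait (explicit waits), which made score write-back very slow.
# Reason: inside an explicit wait's polling loop, every find_element call whose element is absent blocks for the full implicit wait (e.g. 10 s), multiplying the total wait.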
driver.implicitly_wait(0)

driver.get(HOMEWORK_URL)
print('Page opened:', HOMEWORK_URL)
print('Log in through the browser window, then run the next cell.')



Page opened: https://next.jinshuju.net/forms/NHaQvT/entries
Log in through the browser window, then run the next cell.
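The comment in the cell above says implicit and explicit waits compound. A simplified timing model shows why this mainly hurts waits for something to *disappear* (or fallback lookups that miss): a lookup that finds its element returns as soon as it appears, but a lookup for an absent element blocks for the whole implicit wait before raising `NoSuchElementException`. `absence_wait_seconds` is a hypothetical helper and the numbers are made up; it is a sketch of the effect, not Selenium's actual code.

```python
def absence_wait_seconds(gone_after: float, implicit_wait: float, poll: float = 0.5) -> float:
    """Approximate wall time of a polling wait for an element to disappear.

    Simplified model (not Selenium's source): each check calls find_element.
    While the element is present (t < gone_after) the lookup returns at once,
    the condition is False, and the loop sleeps `poll` seconds. Once the element
    is gone, the lookup blocks for the full `implicit_wait` before raising
    NoSuchElementException, which the condition treats as success.
    """
    t = 0.0
    while t < gone_after:      # element still there: cheap check, then sleep
        t += poll
    return t + implicit_wait   # the final, successful check pays the implicit wait


# Hypothetical: three such waits per row, element disappears after 1.2 s each time.
for implicit in (10, 0):
    per_row = 3 * absence_wait_seconds(gone_after=1.2, implicit_wait=implicit)
    print(f"implicit_wait={implicit}s -> about {per_row:.1f}s per row")
```

With these made-up numbers the model prints about 34.5 s per row with `implicit_wait(10)` versus about 4.5 s with `implicit_wait(0)`, which is the kind of slowdown the comment describes.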


In [None]:
# Optional: close the browser (run only when needed)
if driver is not None:
    try:
        driver.quit()
    except Exception as e:
        print('quit error:', e)
driver = None
print('driver closed')

## 4) Wait for the grid to load + locate the scroll viewport
This step confirms that the key AG Grid nodes can be found.

In [17]:
viewport = wait_for_grid(driver)
print('AG Grid ready')


AG Grid ready


## 5) Read the rows in the current viewport (center scrolling columns only)
Start here to confirm that you can read `row-index` and that each row contains `field_5/field_11/field_12`.

In [18]:
rows = get_visible_rows(driver)
print('Visible rows:', len(rows))

# Optional: list the first row's col-ids (helps confirm the column IDs)
if rows:
    first = rows[0]
    col_ids = [c.get_attribute('col-id') for c in first.find_elements(By.XPATH, ".//div[@role='gridcell']")]
    print('First row col-ids:', col_ids)


Visible rows: 23
First row col-ids: ['field_7', 'field_1', 'field_2', 'field_3', 'field_4', 'field_8', 'field_9', 'field_5', 'field_6', 'field_10', 'field_11', 'field_12', 'x_field_weixin_nickname', 'x_field_weixin_openid', 'x_field_weixin_unionid', 'created_at', 'info_filling_duration', 'info_region', 'placeholder-column']
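The `col-id` list printed above can also be checked programmatically instead of by eye. A minimal sketch: `missing_columns` is a hypothetical helper, and the required set simply mirrors the `field_5/field_11/field_12` named in this section (the batch code treats `field_11` as the teacher-score column). In the notebook you would pass `col_ids` directly; here the printed list is pasted in so the snippet runs on its own.

```python
def missing_columns(col_ids, required=("field_5", "field_11", "field_12")):
    """Return the required col-ids that are absent, in the order they were required."""
    present = set(col_ids)
    return [c for c in required if c not in present]


first_row_col_ids = [
    'field_7', 'field_1', 'field_2', 'field_3', 'field_4', 'field_8', 'field_9',
    'field_5', 'field_6', 'field_10', 'field_11', 'field_12',
    'x_field_weixin_nickname', 'x_field_weixin_openid', 'x_field_weixin_unionid',
    'created_at', 'info_filling_duration', 'info_region', 'placeholder-column',
]
print(missing_columns(first_row_col_ids))  # []
```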


In [19]:
# Check that the first row has the key column (for debugging the XPath)
if not rows:
    raise RuntimeError('No visible rows yet: make sure you are logged in and the grid has loaded')

sample = rows[0]
for col in ['field_5']:
    try:
        _ = sample.find_element(By.XPATH, f".//div[@col-id='{col}']")
        print('found col:', col)
    except Exception as e:
        print('missing col:', col, 'err=', type(e).__name__)

found col: field_5


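## 6) Download, read the file, AI scoring (step-by-step test)
Start with a single row to confirm that the download and scoring pipeline works end to end.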
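The log of this step shows downloaded files being read as `gb18030` or `utf-8-sig`, as can happen with sources saved on Chinese-locale Windows. `read_cpp_file` is defined elsewhere and its exact logic is not shown here; the following is only a minimal sketch of an ordered encoding fallback, using the hypothetical name `decode_source`. It tries `utf-8-sig` first because UTF-8 decoding is strict and this codec also strips a BOM, then falls back to `gb18030`.

```python
def decode_source(raw: bytes, encodings=("utf-8-sig", "gb18030")):
    """Decode bytes with the first encoding that succeeds; return (text, encoding)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: keep going but mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (replace)"


print(decode_source("\ufeffint main() {}".encode("utf-8")))  # ('int main() {}', 'utf-8-sig')
print(decode_source("// 完全数\n".encode("gb18030")))         # ('// 完全数\n', 'gb18030')
```

Order matters here: GB18030 accepts many byte sequences, so putting it first could silently mis-decode a UTF-8 file.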

In [36]:
# Process only the first row in the current viewport (change to rows[n] for another row)
rows = get_visible_rows(driver)
if not rows:
    raise RuntimeError('No visible rows')

row = rows[0]
row_index = int(row.get_attribute('row-index'))

downloaded = download_homework_file(driver, row, row_index)
print(downloaded)
if downloaded:
    cpp_code = read_cpp_file(downloaded)
    print('Code length:', 0 if not cpp_code else len(cpp_code))
    if cpp_code:
        score, comment = score_homework_with_ai(cpp_code)
        print('score =', score)
        print('comment =', comment)

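Downloading row 119 (candidate 1/1): FileName.cpp
Download complete: FileName.cpp
c:\workspace\workspace4python\selenium_operator\downloads\FileName.cpp
File size: 611 bytes
Read with encoding: gb18030 score= (0, 0, 0, -8)
Code length: 607
score = 6.5
comment = Choosing the bisection method is the right idea, but the implementation is flawed: it does not check that the endpoints have opposite signs (the initial interval is invalid), it redeclares x3 inside the loop and shadows the outer variable, and it hard-codes a large number of iterations with no convergence criterion. Suggest checking f(x1)*f(x2)<0 first, using an error-based stopping condition, removing redundant variables, and adding comments and more meaningful names to improve correctness and readability.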
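The scoring prompt asks the model for the score alone on the first line and a short comment on the second, and the raw reply logged in the batch run below has exactly that shape (`'8.5\n...'`). How `score_homework_with_ai` parses it is not shown here; a defensive parser along these lines is one option. `parse_score_reply` is a hypothetical name, and the 0 to 10 range check follows the prompt's "out of 10".

```python
import re


def parse_score_reply(reply: str, max_score: float = 10.0):
    """Split a two-line reply (score, then comment) into (score, comment).

    Returns (None, reason) when no usable score is found.
    """
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    if not lines:
        return None, "empty reply"
    # Tolerate extras such as 'Score: 8.5' by taking the first number on line 1.
    m = re.search(r"\d+(?:\.\d+)?", lines[0])
    if not m:
        return None, f"no number in first line: {lines[0]!r}"
    score = float(m.group())
    if not 0 <= score <= max_score:
        return None, f"score out of range: {score}"
    return score, " ".join(lines[1:])


print(parse_score_reply("8.5\nLogic is essentially correct; add comments."))
# (8.5, 'Logic is essentially correct; add comments.')
```

Because the score comes back as a float, a legitimate score of 0 would be falsy and the batch loop's `if not score` check would treat it as a failure; comparing with `None` avoids that if zero scores are possible.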


## 7) Write back the score/comment (step-by-step test)
This cell only writes the `score/comment` from the previous step back to the table.

In [37]:
# Run this cell only after confirming that score/comment are correct
# Depends on row/score/comment from the previous section
if not score:
    raise RuntimeError('score is empty; write-back cancelled')

fill_score_and_comment(driver, row, score, comment)

Score 6.5 is not in the dropdown; selecting the closest option instead: 7.0
Write-back complete (submitted and closed the dialog)
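The log above shows a score that is not among the dropdown's options being snapped to the closest one (6.5 became 7.0). How `fill_score_and_comment` breaks ties is not shown; the sketch below, with the hypothetical helper `nearest_option` and a made-up option list, picks the closest option and prefers the larger one on a tie. That is one rule consistent with the log line if both 6.0 and 7.0 are offered.

```python
def nearest_option(score: float, options):
    """Return the option closest to `score`; ties go to the larger option."""
    if not options:
        raise ValueError("dropdown has no options")
    # Sort key: distance first, then -option so the larger value wins a tie.
    return min(options, key=lambda o: (abs(o - score), -o))


OPTIONS = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]  # hypothetical dropdown values
print(nearest_option(6.5, OPTIONS))  # 7.0
print(nearest_option(8.4, OPTIONS))  # 8.0
```

If the AI score arrives as a string, convert it with `float(score)` before calling the helper.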


## 8) Batch processing (scroll loop)
Once the single-row pipeline works end to end, use this cell for batch processing.
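Before a long batch run it can help to sanity-check the loop's stopping rule offline. The pure-Python model below mirrors the control flow of `process_all_visible_then_scroll` (snapshot visible rows, handle new ones, stop only when near the bottom *and* nothing new was seen, otherwise scroll by one viewport). All names are hypothetical, and the model assumes a fixed row height, that only rows overlapping the viewport are rendered (AG Grid actually renders some buffer rows as well), and that `scrollTop` is clamped at `scrollHeight - clientHeight`.

```python
def visible_rows(scroll_top: int, viewport_h: int, row_h: int, n_rows: int) -> range:
    """Row indices overlapping the window [scroll_top, scroll_top + viewport_h)."""
    first = scroll_top // row_h
    last = min(n_rows - 1, (scroll_top + viewport_h - 1) // row_h)
    return range(first, last + 1)


def simulate_scroll_loop(n_rows: int, row_h: int = 40, viewport_h: int = 400,
                         slack: int = 50, max_loops: int = 1000):
    """Mirror the notebook's scroll loop on a fake viewport.

    Returns (set of visited row indices, number of loop iterations).
    """
    scroll_height = n_rows * row_h
    max_top = max(0, scroll_height - viewport_h)  # scrollTop cannot go past this
    scroll_top, processed, loops = 0, set(), 0
    for loops in range(1, max_loops + 1):
        new_rows = 0
        for idx in visible_rows(scroll_top, viewport_h, row_h, n_rows):
            if idx not in processed:
                processed.add(idx)
                new_rows += 1
        # Same stop rule as the notebook: near the bottom AND nothing new this pass.
        is_bottom = scroll_top + viewport_h >= scroll_height - slack
        if is_bottom and new_rows == 0:
            break
        scroll_top = min(max_top, scroll_top + viewport_h)  # scrollTop += clientHeight
    return processed, loops


visited, loops = simulate_scroll_loop(100)
print(len(visited), loops)  # 100 11
```

With 100 rows of 40 px in a 400 px viewport, the model visits every row and stops on the 11th pass: the 10th pass reaches the bottom but still finds new rows, so one more pass (and, in the real loop, one more 2-second sleep) is needed before `new_rows == 0`.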

In [6]:
# Batch run (only after logging in and passing the single-row test)
viewport = wait_for_grid(driver)
processed = process_all_visible_then_scroll(driver, viewport)
processed


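--- Skipping submission 112: already has a teacher score of 9.0 ---

--- Skipping submission 110: already has a teacher score of 7.5 ---

--- Skipping submission 111: already has a teacher score of 8.5 ---

--- Skipping submission 109: already has a teacher score of 8.5 ---

--- Skipping submission 108: already has a teacher score of 7.5 ---

--- Skipping submission 107: already has a teacher score of 8.5 ---

--- Skipping submission 105: already has a teacher score of 8.5 ---

--- Skipping submission 106: already has a teacher score of 8.5 ---

--- Skipping submission 100: already has a teacher score of 8.0 ---

--- Skipping submission 101: already has a teacher score of 8.5 ---

--- Processing submission 102 ---
Row 102: found 4 possible .cpp attachments; downloading them in turn until a .cpp is obtained:
  - 牛顿法.cpp
  - 二分法.cpp
  - 割线法.cpp
  - 不动点法.cpp
Downloading row 102 (candidate 1/4): 牛顿法.cpp
Download complete: 牛顿法.cpp
File size: 532 bytes
Read with encoding: gb18030 score= (0, 0, 0, -8)
Raw AI scoring result: '8.5\nLogic is essentially correct and implements the Newton method for root finding; naming and formatting are acceptable, but it should break on convergence to save iterations, check the residual (delta) using its absolute value, avoid global variables and the name max, and add comments and simplify the loop to improve robustness and readability.'
score = 8.5
comment = Logic is essentially correct and implements the Newton method for root finding; naming and formatting are acceptable, but it should break on convergence to save iterations, check the residual (delta) using its absolute value, avoid global variables and the name max, and add comments and simplify the loop to improve robustness and readability.
Write-back complete (submitted and closed the dialog)

--- Skipping submission 103: already has a teacher score of 9.0 ---

--- Skipping submission 104: already has a teacher score of 8.5 ---

--- Skipping submission 94: already has a teacher score of 9.0 ---

--- Skipping submission 95: already has a teacher score of 8.5 ---

--- Processing submission 96 ---
Downloading row 96 (candidate 1/1): 完全数.cpp
Download complete: 完全数.cpp
File size: 342 bytes
Read with encoding: utf-8-sig score= (0, 0, 0, -8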

{84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,
 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131,
 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160}