TEST: stop GCG unit tests from hitting HuggingFace #1886

Merged
romanlutz merged 2 commits into microsoft:main from romanlutz:romanlutz/gcg-unit-tests-no-network
Jun 2, 2026

Conversation

@romanlutz
Contributor

Description

Five tests under tests/unit/auxiliary_attacks/gcg/ silently downloaded the gpt2 tokenizer at test time via AutoTokenizer.from_pretrained("gpt2"). When HuggingFace rate-limits requests (HTTP 429), the dev_all matrix on windows-latest + Python 3.10 fails on otherwise unrelated PRs. The most recent example is PR #1866, which hit five failures of the form "OSError: We couldn't connect to 'https://huggingface.co' ...". Per .github/instructions/test.instructions.md, unit tests must not hit the network.

This PR splits the offenders two ways based on what each test actually needs gpt2 for:

  • Mocked in place -- the lone tokenizer-edge-case test, test_gcg_core.py::TestUpdateIdsErrorPaths::test_end_tok_returns_len_toks_when_target_is_at_prompt_end, was rewritten to use a fully mocked tokenizer whose encoding.char_to_token returns None past the target's end. This mirrors the pattern already used by the two adjacent test_start_tok_* tests in the same class, so it is consistent with its neighbors and still exercises the exact "return len(toks) if tok is None else tok" branch in end_tok.

  • Moved to integration -- four wiring tests that exist precisely to exercise the real chat-template pipeline end-to-end (constructing real IndividualPromptAttack / ProgressiveMultiPromptAttack and running them through _update_ids). Mocking the tokenizer richly enough to satisfy that pipeline would defeat the test's purpose. Destination matches the existing tests/integration/auxiliary_attacks/test_gcg_integration.py precedent (same gpt2 + custom chat-template pattern):

    • whole tests/unit/auxiliary_attacks/gcg/test_attack_wiring.py (both tests in it)
    • just the TestCreateAttackWiring class extracted from tests/unit/auxiliary_attacks/gcg/test_generator.py

    Consolidated into a new tests/integration/auxiliary_attacks/test_gcg_attack_wiring_integration.py. No @pytest.mark.run_only_if_all_tests marker -- that marker is for tests needing real API credentials; these only need a HF tokenizer, and the precedent file doesn't use it either.
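
For illustration only, the mocked-tokenizer approach can be sketched as below. This is not the code from the PR: end_tok here is a hypothetical stand-in that reproduces only the quoted branch, make_mock_tokenizer is an invented helper, and the one-character-per-token mapping is an assumption chosen to keep the example small.

```python
from unittest.mock import MagicMock


def end_tok(encoding, toks, char_index):
    """Hypothetical stand-in for the branch quoted in the description."""
    tok = encoding.char_to_token(char_index)
    return len(toks) if tok is None else tok


def make_mock_tokenizer(prompt, target, n_tokens):
    """Hypothetical helper: a tokenizer mock that never touches the network.

    Assumption: one token per character, and char_to_token returns None
    for any index at or past the end of the target.
    """
    target_end = prompt.find(target) + len(target)
    encoding = MagicMock()
    encoding.input_ids = list(range(n_tokens))
    encoding.char_to_token.side_effect = (
        lambda i: i if i < target_end else None
    )
    tokenizer = MagicMock()
    tokenizer.return_value = encoding  # tokenizer(prompt) -> encoding
    return tokenizer


prompt = "user: hi assistant: ok"  # the target "ok" sits at the prompt end
tokenizer = make_mock_tokenizer(prompt, target="ok", n_tokens=len(prompt))
encoding = tokenizer(prompt)
toks = encoding.input_ids

# Past the target's end, char_to_token is None, so len(toks) is returned.
result_at_end = end_tok(encoding, toks, len(prompt))
# Inside the prompt, the mocked token index is returned unchanged.
result_inside = end_tok(encoding, toks, 3)
```

Because char_to_token is driven by side_effect, the None path is hit deterministically without downloading any tokenizer files.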

GitHub Actions PR CI only runs make unit-test, so this fully removes the offenders from the PR-time matrix. The Azure DevOps integration pipeline still exercises them on push to main.

Tests and Documentation

Verified:

  • uv run pytest tests/unit/auxiliary_attacks/gcg/ -> 110 passed, no network
  • uv run --with pytest-socket pytest tests/unit/auxiliary_attacks/gcg/ --disable-socket --allow-hosts=127.0.0.1,localhost,::1 -> 110 passed (empirical proof: any off-loopback socket call would raise SocketBlockedError)
  • uv run pytest tests/integration/auxiliary_attacks/test_gcg_attack_wiring_integration.py -> 4 passed
  • rg 'from_pretrained\("gpt2"\)' tests/unit/ -> no matches
  • pre-commit (ruff format + ruff check) clean
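
As a rough illustration of why the --disable-socket run counts as proof, the sketch below blocks socket creation using only the standard library. It is not how pytest-socket is implemented; block_network and NetworkBlockedError are hypothetical names, and unlike the --allow-hosts invocation above it blocks loopback connections as well.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Hypothetical error raised when code under test opens a socket."""


@contextmanager
def block_network():
    """Temporarily replace socket.socket so any attempt to create one fails."""
    original = socket.socket

    def guarded(*args, **kwargs):
        raise NetworkBlockedError("socket creation is blocked in this test")

    socket.socket = guarded
    try:
        yield
    finally:
        # Always restore the real class, even if the body raised.
        socket.socket = original


def try_open_socket():
    """Return "blocked" if creating a socket under the guard raises."""
    try:
        with block_network():
            socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    except NetworkBlockedError:
        return "blocked"
    return "allowed"


outcome = try_open_socket()
```

A test that silently reaches for the network fails loudly under a guard like this, which is the property the 110-passed run relies on.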

No documentation changes needed; this is a test-only refactor.

Out of scope but noted

tests/unit/prompt_converter/test_pdf_converter.py::test_filename_extension_existing_pdf makes a real requests.get to raw.githubusercontent.com/.../fake_CV.pdf -- same class of bug but not in today's failing job. Worth a separate follow-up.

romanlutz and others added 2 commits June 1, 2026 17:26
5 tests under `tests/unit/auxiliary_attacks/gcg/` were silently downloading
the gpt2 tokenizer at test time via `AutoTokenizer.from_pretrained("gpt2")`,
which flakes the dev_all CI matrix when HuggingFace rate-limits (e.g. 5 OSError
failures on windows-latest+py3.10+dev_all in PR microsoft#1866).

Per `.github/instructions/test.instructions.md`, unit tests must not hit the
network. Two paths taken:

- **Mocked in place** the lone tokenizer-edge-case test whose adjacent siblings
  in the same class already use a fully-mocked tokenizer pattern:
  `test_gcg_core.py::TestUpdateIdsErrorPaths::test_end_tok_returns_len_toks_when_target_is_at_prompt_end`.

- **Moved to integration tier** four wiring tests that exist specifically to
  exercise the real chat-template pipeline end-to-end. Mocking the tokenizer
  richly enough to satisfy `_update_ids` would defeat the test's purpose.
  Destination matches the existing
  `tests/integration/auxiliary_attacks/test_gcg_integration.py` precedent
  (same gpt2 + custom chat-template pattern; no marker needed — these run in
  `make integration-test`, not in the PR-time `make unit-test` matrix):
    - `test_attack_wiring.py::TestAttackClassWiring::*` (whole file)
    - `test_generator.py::TestCreateAttackWiring::*` (just the class)
  → consolidated into `tests/integration/auxiliary_attacks/test_gcg_attack_wiring_integration.py`.

Verification:
  uv run pytest tests/unit/auxiliary_attacks/gcg/        # 110 passed
  uv run pytest tests/integration/auxiliary_attacks/...  #   4 passed
  rg 'from_pretrained\("gpt2"\)' tests/unit/             # no matches

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz added this pull request to the merge queue Jun 2, 2026
Merged via the queue into microsoft:main with commit 9bb005f Jun 2, 2026
47 checks passed
@romanlutz romanlutz deleted the romanlutz/gcg-unit-tests-no-network branch June 2, 2026 19:22

2 participants