TEST: stop GCG unit tests from hitting HuggingFace#1886
Merged
romanlutz merged 2 commits intoJun 2, 2026
Conversation
5 tests under `tests/unit/auxiliary_attacks/gcg/` were silently downloading the gpt2 tokenizer at test time via `AutoTokenizer.from_pretrained(`gpt2`)`, which flakes the dev_all CI matrix when HuggingFace rate-limits (e.g. 5 OSError failures on windows-latest+py3.10+dev_all in PR microsoft#1866). Per `.github/instructions/test.instructions.md`, unit tests must not hit the network. Two paths taken: - **Mocked in place** the lone tokenizer-edge-case test whose adjacent siblings in the same class already use a fully-mocked tokenizer pattern: `test_gcg_core.py::TestUpdateIdsErrorPaths::test_end_tok_returns_len_toks_when_target_is_at_prompt_end`. - **Moved to integration tier** four wiring tests that exist specifically to exercise the real chat-template pipeline end-to-end. Mocking the tokenizer richly enough to satisfy `_update_ids` would defeat the test's purpose. Destination matches the existing `tests/integration/auxiliary_attacks/test_gcg_integration.py` precedent (same gpt2 + custom chat-template pattern; no marker needed — these run in `make integration-test`, not in the PR-time `make unit-test` matrix): - `test_attack_wiring.py::TestAttackClassWiring::*` (whole file) - `test_generator.py::TestCreateAttackWiring::*` (just the class) → consolidated into `tests/integration/auxiliary_attacks/test_gcg_attack_wiring_integration.py`. Verification: uv run pytest tests/unit/auxiliary_attacks/gcg/ # 110 passed uv run pytest tests/integration/auxiliary_attacks/... # 4 passed rg 'from_pretrained\(`gpt2`\)' tests/unit/ # no matches Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rlundeen2
approved these changes
Jun 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Five tests under
tests/unit/auxiliary_attacks/gcg/silently downloaded the gpt2 tokenizer at test time viaAutoTokenizer.from_pretrained("gpt2"). When HuggingFace rate-limits (429), thedev_allmatrix onwindows-latest+ Python 3.10 fails on otherwise-unrelated PRs (most recent example: 5OSError: We couldn't connect to 'https://huggingface.co' ...failures on PR #1866). Per.github/instructions/test.instructions.md, unit tests must not hit the network.This PR splits the offenders two ways based on what each test actually needs gpt2 for:
Mocked in place -- the lone tokenizer-edge-case test
test_gcg_core.py::TestUpdateIdsErrorPaths::test_end_tok_returns_len_toks_when_target_is_at_prompt_endwas rewritten to use a fully-mocked tokenizer whereencoding.char_to_tokenreturnsNonepast the target's end. This mirrors the pattern already used by the two adjacenttest_start_tok_*tests in the same class, so it both fits in and exercises the exactreturn len(toks) if tok is None else tokbranch inend_tok.Moved to integration -- four wiring tests that exist precisely to exercise the real chat-template pipeline end-to-end (constructing real
IndividualPromptAttack/ProgressiveMultiPromptAttackand running them through_update_ids). Mocking the tokenizer richly enough to satisfy that pipeline would defeat the test's purpose. Destination matches the existingtests/integration/auxiliary_attacks/test_gcg_integration.pyprecedent (same gpt2 + custom chat-template pattern):tests/unit/auxiliary_attacks/gcg/test_attack_wiring.py(both tests in it)TestCreateAttackWiringclass extracted fromtests/unit/auxiliary_attacks/gcg/test_generator.pyConsolidated into a new
tests/integration/auxiliary_attacks/test_gcg_attack_wiring_integration.py. No@pytest.mark.run_only_if_all_testsmarker -- that marker is for tests needing real API credentials; these only need a HF tokenizer, and the precedent file doesn't use it either.GitHub Actions PR CI only runs
make unit-test, so this fully removes the offenders from the PR-time matrix. The Azure DevOps integration pipeline still exercises them on push tomain.Tests and Documentation
Verified:
uv run pytest tests/unit/auxiliary_attacks/gcg/-> 110 passed, no networkuv run --with pytest-socket pytest tests/unit/auxiliary_attacks/gcg/ --disable-socket --allow-hosts=127.0.0.1,localhost,::1-> 110 passed (empirical proof: any off-loopback socket call would raiseSocketBlockedError)uv run pytest tests/integration/auxiliary_attacks/test_gcg_attack_wiring_integration.py-> 4 passedrg 'from_pretrained\("gpt2"\)' tests/unit/-> no matchesNo documentation changes needed; this is a test-only refactor.
Out of scope but noted
tests/unit/prompt_converter/test_pdf_converter.py::test_filename_extension_existing_pdfmakes a realrequests.gettoraw.githubusercontent.com/.../fake_CV.pdf-- same class of bug but not in today's failing job. Worth a separate follow-up.