FIX: Restore airt.cyber E2E + azure-ai-evaluation partner contract #1864
Open
romanlutz wants to merge 2 commits into
Two unrelated CI failures on main, surfaced together because the GitHub check-run annotations only carried wrapper messages.

End-to-end scenario tests (AzDO build #11909)
---------------------------------------------

Every test_scenario_with_pyrit_scan[*] parametrization fails with "Server not available at http://localhost:8000" because PR microsoft#1545 turned pyrit_scan into a thin client of a separate pyrit_backend server, and the e2e test never started one. The "airt.cyber" attribution in the GitHub annotation is just whichever scenario pytest printed last; every airt scenario, benchmark.adversarial, garak.encoding, etc., fail the same way.

Fixing the server launch surfaced a second latent failure: PR microsoft#1785 (scenario technique consolidation) made the catalog endpoint (GET /api/scenarios/catalog/<name>) instantiate every scenario class. Cyber.__init__ then calls _build_cyber_strategy(), which requires the AttackTechniqueRegistry to be populated, but the scenario_technique initializer was not in the test config, so the catalog GET returned 500 before any scenario run could begin.

Fixes:
- tests/end_to_end/conftest.py (new): session-scoped autouse fixture that launches pyrit_backend via ServerLauncher and tears it down on exit. Idempotent if a backend is already healthy.
- tests/end_to_end/test_config.yaml: declare scenario_technique as a backend startup initializer so the catalog endpoint can instantiate scenarios that rely on AttackTechniqueRegistry.

Verified locally end-to-end with a dummy API key: the catalog endpoint succeeds, the scenario runs both strategies (2/2 attacks), and only the final OpenAI request fails with 401 (fake key). On AzDO with real Key Vault credentials the scenario will pass.
Partner integration test (AzDO build #11908)
--------------------------------------------

test_scorer_identifier_importable fails with ImportError: cannot import name 'ScorerIdentifier' from 'pyrit.identifiers' (the AzDO "exit code 2" is the script wrapper, not a pytest collection error; pytest itself reported 1 failed, 97 passed, 3 skipped).

PR microsoft#1387 collapsed ScorerIdentifier/AttackIdentifier/ConverterIdentifier/TargetIdentifier into ComponentIdentifier with no deprecation alias. azure-ai-evaluation's _rai_scorer.py still does "from pyrit.identifiers import ScorerIdentifier" and uses it as a return-type annotation (verified against the live partner source), so the test correctly flagged a real partner contract break.

Fix:
- pyrit/identifiers/__init__.py: PEP 562 __getattr__ returns ComponentIdentifier for the name ScorerIdentifier and emits print_deprecation_message(removed_in="0.16.0") per the project deprecation policy. No partner code change required to keep azure-ai-evaluation working; the alias buys them a normal deprecation window to migrate.

Verification
------------

- tests/partner_integration: 98 passed, 3 skipped (was 97 passed, 1 failed, 3 skipped)
- tests/unit/identifiers: 210 passed
- tests/unit/cli/test_pyrit_scan.py + test_scenario_service: clean
- tests/end_to_end/test_scenarios.py[airt.cyber] locally: backend starts, catalog 200s, scenario runs to LLM call (failed only on fake-key 401)

Out of scope: the 3 test_all_datasets.py HuggingFace/GitHub fetch failures in build #11909 (flaky 3rd-party HTTP, unrelated to either root cause).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds unit tests that exercise the new module-level `__getattr__` in pyrit/identifiers/__init__.py: the deprecated ScorerIdentifier alias resolves to ComponentIdentifier and emits a DeprecationWarning mentioning the 0.16.0 removal version, and unknown attributes raise AttributeError. This restores diff-cover (>=90%) on the changed lines in pyrit/identifiers/__init__.py (was 44.4%).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
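
The alias behavior and the checks described above can be illustrated with a small, self-contained sketch. This is not the code from the PR: the module name `identifiers_demo` and the helper `_module_getattr` are hypothetical, `ComponentIdentifier` is an empty stand-in class, and a plain `warnings.warn` call with an assumed message stands in for the project's `print_deprecation_message`, whose exact signature is not shown in this thread.

```python
import sys
import types
import warnings


class ComponentIdentifier:
    """Empty stand-in for the consolidated identifier class."""


def _module_getattr(name: str):
    # PEP 562: Python calls a module's __getattr__ only after normal
    # attribute lookup on that module has failed.
    if name == "ScorerIdentifier":
        warnings.warn(
            "ScorerIdentifier is deprecated and will be removed in 0.16.0; "
            "use ComponentIdentifier instead.",  # assumed wording
            DeprecationWarning,
            stacklevel=2,
        )
        return ComponentIdentifier
    raise AttributeError(f"module 'identifiers_demo' has no attribute {name!r}")


# Build a throwaway module so the sketch runs on its own. In a real package
# the function would simply be defined as __getattr__ in __init__.py.
identifiers_demo = types.ModuleType("identifiers_demo")
identifiers_demo.ComponentIdentifier = ComponentIdentifier
identifiers_demo.__getattr__ = _module_getattr
sys.modules["identifiers_demo"] = identifiers_demo

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The legacy import path keeps working and triggers the warning.
    from identifiers_demo import ScorerIdentifier

assert ScorerIdentifier is ComponentIdentifier
assert any("0.16.0" in str(w.message) for w in caught)
```

Because the hook only runs for names that are missing from the module, existing imports of `ComponentIdentifier` are unaffected, and any other unknown name still raises `AttributeError` as usual.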
Two unrelated CI failures on `main`, surfaced together because the GitHub check-run annotations only carried wrapper messages. Full investigation lives in the commit body; short version below.

End-to-End scenario tests (AzDO build #11909)
---------------------------------------------

Every `test_scenario_with_pyrit_scan[*]` parametrization fails with `Server not available at http://localhost:8000` because PR #1545 turned `pyrit_scan` into a thin client of a separate `pyrit_backend` server, and the e2e test never started one. The `airt.cyber` attribution in the GitHub annotation is just whichever scenario pytest printed last; every airt scenario, `benchmark.adversarial`, `garak.encoding`, etc., fail the same way.

Fixing the server launch surfaced a second latent failure: PR #1785 (scenario technique consolidation) made `GET /api/scenarios/catalog/<name>` instantiate every scenario class. `Cyber.__init__` calls `_build_cyber_strategy()`, which requires `AttackTechniqueRegistry` to be populated, but the `scenario_technique` initializer was not in the test config, so the catalog GET returned 500 before any scenario run could begin.

Fixes:

- `tests/end_to_end/conftest.py` (new): session-scoped autouse fixture that launches `pyrit_backend` via `ServerLauncher` and tears it down on exit. Idempotent if a backend is already healthy.
- `tests/end_to_end/test_config.yaml`: declare `scenario_technique` as a backend startup initializer so the catalog endpoint can instantiate scenarios that rely on `AttackTechniqueRegistry`.

Verified locally end-to-end with a dummy API key: the catalog endpoint succeeds, the scenario runs both strategies (2/2 attacks), and only the final OpenAI request fails with 401 (fake key). On AzDO with real Key Vault credentials the scenario will pass.
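
The start-or-reuse behavior described for the new fixture can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's `conftest.py`: `backend_session`, `is_healthy`, `start`, and `stop` are hypothetical names, and the real fixture delegates to `ServerLauncher`, whose interface is not reproduced here.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def backend_session(
    is_healthy: Callable[[], bool],
    start: Callable[[], None],
    stop: Callable[[], None],
    timeout_s: float = 30.0,
    poll_s: float = 0.5,
) -> Iterator[bool]:
    """Yield True if this session started the backend, False if it reused one.

    All three callables are hypothetical stand-ins for whatever the real
    launcher provides (health probe, process start, process stop).
    """
    if is_healthy():
        # A backend is already serving requests: reuse it and never stop it.
        yield False
        return

    start()
    try:
        # Wait until the freshly started backend reports healthy.
        deadline = time.monotonic() + timeout_s
        while not is_healthy():
            if time.monotonic() >= deadline:
                raise RuntimeError("backend did not become healthy in time")
            time.sleep(poll_s)
        yield True
    finally:
        # Only tear down a backend that this session launched.
        stop()
```

In a `conftest.py`, logic like this would typically sit inside a generator fixture decorated with `@pytest.fixture(scope="session", autouse=True)`, so the backend is started once per test session and a developer's already-running backend is left alone.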
Partner integration test (AzDO build #11908)
--------------------------------------------

`test_scorer_identifier_importable` fails with `ImportError: cannot import name 'ScorerIdentifier' from 'pyrit.identifiers'`. (The AzDO "exit code 2" is the script wrapper, not a pytest collection error; pytest itself reported 1 failed / 97 passed / 3 skipped.)

PR #1387 collapsed `ScorerIdentifier`/`AttackIdentifier`/`ConverterIdentifier`/`TargetIdentifier` into `ComponentIdentifier` with no deprecation alias. `azure-ai-evaluation`'s `_rai_scorer.py` still does `from pyrit.identifiers import ScorerIdentifier` and uses it as a return-type annotation (verified against the live partner source), so the test correctly flagged a real partner contract break.

Fix:

- `pyrit/identifiers/__init__.py`: PEP 562 `__getattr__` returns `ComponentIdentifier` for the name `ScorerIdentifier` and emits `print_deprecation_message(removed_in="0.16.0")` per the project deprecation policy. No partner code change required to keep `azure-ai-evaluation` working; the alias buys them a normal deprecation window to migrate.

Verification
------------

- `tests/partner_integration`: 98 passed, 3 skipped (was 97 passed, 1 failed, 3 skipped)
- `tests/unit/identifiers`: 210 passed
- `tests/unit/cli/test_pyrit_scan.py` + `test_scenario_service`: clean
- `tests/end_to_end/test_scenarios.py[airt.cyber]` locally: backend starts, catalog 200s, scenario runs to LLM call (failed only on fake-key 401)

Out of scope
------------
The 3 `test_all_datasets.py` HuggingFace/GitHub fetch failures in build #11909 (flaky 3rd-party HTTP) are unrelated to either root cause and are being handled in a separate session.