Skip to content

MAINT: Migrate os.path.* to pathlib.Path in pyrit/#1877

Merged
romanlutz merged 5 commits into
microsoft:mainfrom
romanlutz:romanlutz/migrate-os-path-to-pathlib
Jun 2, 2026
Merged

MAINT: Migrate os.path.* to pathlib.Path in pyrit/#1877
romanlutz merged 5 commits into
microsoft:mainfrom
romanlutz:romanlutz/migrate-os-path-to-pathlib

Conversation

@romanlutz
Copy link
Copy Markdown
Contributor

Description

The project convention (and modern Python style) is to use pathlib.Path for filesystem paths, but ~22 os.path.* call sites across 11 pyrit/ source files were still using the legacy os.path API, plus a few paired sibling calls (os.makedirs / os.remove / os.unlink). This PR does a focused, mechanical pass to bring those files into line.

Standard 1-for-1 swaps applied throughout:

  • os.path.exists -> Path(...).exists()
  • os.path.isfile -> Path(...).is_file()
  • os.path.splitext(p)[1] -> Path(p).suffix
  • os.path.basename -> Path(...).name
  • os.path.join(os.path.expanduser('~'), ...) -> Path.home() / ...
  • os.path.getsize -> Path(...).stat().st_size
  • os.makedirs(p, exist_ok=True) -> Path(p).mkdir(parents=True, exist_ok=True)
  • os.remove / os.unlink -> Path(...).unlink()

Public parameter and attribute types are kept as str / Optional[str]; conversion to Path(...) happens at the boundary, so callers passing strings continue to work unchanged. No public signatures changed.

One callee improvement worth flagging: HuggingFaceChatTarget previously built cache_dir as a str and passed it to download_specific_files_async, whose declared parameter is cache_dir: Path. The runtime now honors the declared type.

Ruff's PTH ruleset is in fixable but not yet select in pyproject.toml. This PR intentionally does not flip that switch; enabling it (or at least the rules that catch the patterns above) is a natural follow-up now that pyrit/ is clean.

Tests and Documentation

No new tests added. The audit identified one test file (tests/unit/models/test_seed_prompt.py) whose @patch("os.path.isfile") / @patch("os.path.splitext") decorators would have become inert after the source migration, so those two tests were updated to use the tmp_path fixture with real touched files instead.

Verification:

  • ruff check + ruff format --check (full pyrit/ tree): clean
  • Pre-commit hooks (ruff, ty type-check, etc.): clean
  • pytest tests/unit -n auto: 8382 passed / 120 skipped

Coverage on the touched lines was measured with pytest tests/unit --cov=pyrit. 6 of 11 source files have every changed line exercised by the unit suite; in the remaining 5 files some of the swapped lines are not directly hit (cleanup branches in scorers/serializers and the HF cache_dir resolution). Each of those is a literal 1-for-1 stdlib equivalence, so behavior is unchanged.

No documentation changes required.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

romanlutz and others added 3 commits June 1, 2026 11:24
Audit found ~22 os.path.* call sites across 11 pyrit/ source files,
plus a few paired sibling APIs (os.makedirs/os.remove/os.unlink).
This change does a mechanical pass to the pathlib equivalents:

- os.path.exists -> Path(...).exists()
- os.path.isfile -> Path(...).is_file()
- os.path.splitext(p)[1] -> Path(p).suffix
- os.path.basename -> Path(...).name
- os.path.join(os.path.expanduser('~'), ...) -> Path.home() / ...
- os.path.getsize -> Path(...).stat().st_size
- os.makedirs(p, exist_ok=True) -> Path(p).mkdir(parents=True, exist_ok=True)
- os.remove / os.unlink -> Path(...).unlink()

Public parameter and attribute types remain str / Optional[str];
Path(...) conversion happens at the boundary so callers passing
strings continue to work unchanged.

In tests/unit/models/test_seed_prompt.py, two tests previously
patched os.path.isfile / os.path.splitext on the source module.
Those mocks would be inert after the migration, so they were
replaced with real files created via the tmp_path fixture.

Ruff PTH ruleset is intentionally left disabled in this PR;
enabling it is a recommended follow-up now that pyrit/ is clean.

Verification:
- ruff check pyrit/ tests/unit/models/test_seed_prompt.py: passed
- ruff format --check (same scope): passed
- pytest tests/unit/{models/test_seed_prompt.py,
                    models/test_data_type_serializer.py, score,
                    message_normalizer, common, prompt_target,
                    prompt_converter}: 3245 passed, 81 skipped

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Path(value).is_file() raises OSError (ENAMETOOLONG: Errno 36 on Linux, Errno 63 on macOS) for strings exceeding the filesystem's name limit, and raises ValueError for embedded null bytes. The previous os.path.isfile(value) silently returned False in both cases, so SeedPrompt construction worked for long text-only values such as the academic-paper jailbreak template at pyrit/datasets/jailbreak/templates/Arth_Singh/context_flood_academic.yaml. The pathlib migration regressed this, causing TestJailbreakInitialization / TestJailbreakAttackGeneration to fail on macos-3.11 and the Linux coverage job.

Wrap the is_file() probe in try/except (OSError, ValueError) so the inference falls through to data_type='text', preserving the prior os.path.isfile semantics. Add two unit tests in tests/unit/models/test_seed_prompt.py covering the long-value and null-byte cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The diff_cover gate (>=90% on changed lines) was failing because four
lines in finally-style cleanup branches added during the os.path ->
pathlib migration were not exercised by the existing test suite:

- pyrit/models/data_type_serializer.py:208 (local_temp_path.unlink()
  after Azure storage upload in save_formatted_audio)
- pyrit/prompt_converter/add_image_to_video_converter.py:185
  (local_temp_path.unlink() in azure_storage_flag finally branch)
- pyrit/score/audio_transcript_scorer.py:250 (Path(wav_path).unlink()
  cleanup when _ensure_wav_format produced a converted temp WAV)
- pyrit/score/video_scorer.py:285 (Path(audio_path).unlink() cleanup
  after successful audio scoring in _score_video_audio_async)

Added four tests that patch DB_DATA_PATH / extract_audio_from_video /
_ensure_wav_format so the cleanup branches run against real temp files
and verify the files are unlinked. Diff coverage now reports 98% on
this PR (target: >=90%).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jsong468 jsong468 self-assigned this Jun 2, 2026
Comment thread pyrit/score/audio_transcript_scorer.py Outdated
Comment thread pyrit/score/video_scorer.py Outdated
Comment thread pyrit/models/data_type_serializer.py
Applies @jsong468's review suggestions on PR microsoft#1877: the two temp-file
cleanup branches in audio_transcript_scorer.py and video_scorer.py
were probing Path.exists() before calling Path.unlink(). Collapse both
to Path(...).unlink(missing_ok=True), which is race-safe (no TOCTOU
gap between the existence check and the unlink) and a single syscall.

Behavior is unchanged: if the file is absent the call is a no-op;
if present it is removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Brings in 29 new commits from upstream main. One conflict in
pyrit/prompt_converter/add_image_to_video_converter.py: PR microsoft#1878 on
main refactored _add_image_to_video into an async wrapper plus a sync
helper (_add_image_to_video_sync) invoked via asyncio.to_thread, and
guarded the temp-file cleanup with `local_temp_path is not None`.
This branch had concurrently migrated the same cleanup from os.remove
to local_temp_path.unlink(). Resolved by keeping main's structure
(helper split, None guard, dropped the orphan logger.info/return that
moved to the wrapper) and applying the pathlib migration on top:
`local_temp_path.unlink()` instead of `os.remove(local_temp_path)`.

Verification:
- ruff check + ruff format --check on pyrit/ and tests/: clean
- pytest tests/unit -n auto: 9160 passed, 6 skipped

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz enabled auto-merge June 2, 2026 21:33
@romanlutz romanlutz added this pull request to the merge queue Jun 2, 2026
Merged via the queue into microsoft:main with commit 414ac8c Jun 2, 2026
52 checks passed
@romanlutz romanlutz deleted the romanlutz/migrate-os-path-to-pathlib branch June 2, 2026 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants