feat: Artifactory archive entry download for virtual file packages #525
Pull request overview
Adds an Artifactory "archive entry download" fast path for virtual file package installs so APM can fetch a single file from a proxied repository archive without downloading and extracting the full zip, while retaining a full-archive fallback.
Changes:
- Introduces a `fetch_entry_from_archive()` utility to download a specific file from inside an Artifactory-proxied archive via the `!/{path}` entry URL pattern.
- Updates the Artifactory single-file download flow to try entry download first, then fall back to full-archive extraction.
- Expands unit tests and adds a changelog entry for the behavior change.
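The fast-path-then-fallback ordering described in the list above can be sketched as follows. This is a minimal illustration rather than the PR's code: `download_single_file`, `fetch_entry`, and `fetch_full_archive` are hypothetical names, and the only behavior taken from the PR is that a `None` result from the entry download triggers the full-archive path.

```python
from typing import Callable, Optional


def download_single_file(
    fetch_entry: Callable[[], Optional[bytes]],
    fetch_full_archive: Callable[[], bytes],
) -> bytes:
    """Try the cheap entry download first; fall back to the full archive.

    Both callables are hypothetical stand-ins for the real download steps.
    """
    content = fetch_entry()
    # Compare against None explicitly so an empty file (b"") still counts as a hit.
    if content is not None:
        return content
    return fetch_full_archive()
```

Passing the two steps in as callables keeps the ordering logic testable without any network access.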
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/apm_cli/deps/github_downloader.py` | Uses archive entry download as a fast path before full-archive extraction for single-file Artifactory downloads. |
| `src/apm_cli/deps/artifactory_entry.py` | New shared helper implementing the Archive Entry Download API attempt logic across URL patterns. |
| `tests/unit/test_artifactory_support.py` | Adds coverage for entry-download success/failure ordering and the new helper’s behavior. |
| `CHANGELOG.md` | Documents the Artifactory virtual file download behavior change under Unreleased. |
Thanks for the contribution -- the performance optimization is real and the security work (path traversal, URL encoding) is solid. However, this PR deepens an architectural problem we need to address before adding more registry-specific code.
1. No registry abstraction — Artifactory is hardcoded throughout
github_downloader.py now has 8 Artifactory-specific methods. This PR adds a 9th file (artifactory_entry.py) that's entirely JFrog-specific. APM's goal is to support any package registry, but today it's impossible to add Nexus, Gitea, or another proxy without forking the download path.
Before landing more Artifactory features, we need a RegistryClient interface that different backends implement. The env vars (PROXY_REGISTRY_*) already suggest this was the intent — the code just never followed through.
2. The new module should live behind an interface, not be called directly
fetch_entry_from_archive() takes Artifactory-specific params (host, prefix, archive URL patterns). If we add a registry abstraction later, this module becomes an internal detail of an ArtifactoryRegistryClient — but today it's imported directly from github_downloader.py, coupling the two.
What I'd like to see:
Land this behind a minimal RegistryClient protocol — even a thin one that only Artifactory implements today. That way this PR becomes the first proper registry backend rather than more hardcoded logic.
The optimization itself is great — just needs the right home.
@danielmeppiel -- addressed. The entry download now lives behind a `RegistryClient` protocol:

    from typing import Optional, Protocol, runtime_checkable

    @runtime_checkable
    class RegistryClient(Protocol):
        def fetch_file(self, owner, repo, file_path, ref, resilient_get=None) -> Optional[bytes]: ...

    # Method on the config object, which passes itself as `config`
    def get_client(self) -> RegistryClient:
        from .artifactory_entry import ArtifactoryRegistryClient
        return ArtifactoryRegistryClient(config=self)

Adding a Nexus or Gitea backend is now a matter of implementing `RegistryClient`. 3 new tests for the protocol + 103 total artifactory tests passing. Full suite: 3563 passed.
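To illustrate what implementing the protocol involves, here is a minimal sketch of a second backend. `InMemoryRegistryClient` is hypothetical and exists only for this example; the `RegistryClient` definition is repeated from the comment above so the block is self-contained.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class RegistryClient(Protocol):
    def fetch_file(self, owner, repo, file_path, ref, resilient_get=None) -> Optional[bytes]: ...


class InMemoryRegistryClient:
    """Hypothetical backend that serves files from a dict (illustration only)."""

    def __init__(self, files):
        # Maps (owner, repo, ref, file_path) -> file contents.
        self._files = files

    def fetch_file(self, owner, repo, file_path, ref, resilient_get=None) -> Optional[bytes]:
        # Returns None when the file is unknown, mirroring the Optional[bytes] contract.
        return self._files.get((owner, repo, ref, file_path))


client = InMemoryRegistryClient({("acme", "prompts", "main", "a.prompt.md"): b"hello"})

# @runtime_checkable lets isinstance() verify that fetch_file exists;
# it does not validate the method's signature.
assert isinstance(client, RegistryClient)
```

No inheritance is needed: any class with a matching `fetch_file` method satisfies the protocol structurally.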
…icrosoft#417)

Virtual file packages (.prompt.md, .agent.md, etc.) downloaded via Artifactory now use the Archive Entry Download API to fetch the single file directly, avoiding a full multi-MB archive download. The entry URL appends `!/{repo}-{ref}/{path}` to the standard archive URL. If the entry API is unavailable (404, connection error, or unsupported Artifactory version), the existing full-archive fallback is used transparently.

New shared utility `artifactory_entry.fetch_entry_from_archive()` is reusable by the marketplace client for proxy-aware index fetches (microsoft#506).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
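The entry URL shape from the commit message (`!/{repo}-{ref}/{path}` appended to the archive URL) could be built roughly as below. This is a sketch under stated assumptions, not the code in `artifactory_entry.py`: `build_entry_url` is a hypothetical helper, the example host is made up, and rejecting `..` plus per-segment `urllib.parse.quote` is one plausible way to apply the traversal and encoding checks the PR mentions.

```python
from typing import Optional
from urllib.parse import quote


def build_entry_url(archive_url: str, repo: str, ref: str, file_path: str) -> Optional[str]:
    """Return the archive-entry URL, or None if file_path looks unsafe.

    Hypothetical helper: only the `!/{repo}-{ref}/{path}` suffix comes from the PR.
    """
    parts = file_path.replace("\\", "/").split("/")
    # Reject path traversal before any URL is constructed.
    if ".." in parts:
        return None
    # Percent-encode each segment separately so "/" stays a separator;
    # empty segments (leading or doubled slashes) are dropped.
    segments = [f"{repo}-{ref}", *parts]
    inner = "/".join(quote(seg, safe="") for seg in segments if seg)
    return f"{archive_url}!/{inner}"


url = build_entry_url(
    "https://art.example.com/archive/main.zip",  # assumed archive URL
    "prompts",
    "main",
    "docs/my file.prompt.md",
)
# url == "https://art.example.com/archive/main.zip!/prompts-main/docs/my%20file.prompt.md"
```

Returning `None` for a rejected path matches the "`None` means fall back or fail" convention the PR describes for the entry download itself.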
Summary
- Virtual file packages (`.prompt.md`, `.agent.md`, etc.) downloaded via Artifactory now use the Archive Entry Download API to fetch individual files without downloading the full multi-MB archive
- `artifactory_entry.fetch_entry_from_archive()` is reusable across the codebase

Closes #417 (Phase 1: virtual file packages).
How it works
JFrog Artifactory supports downloading individual entries from inside a zip archive by appending `!/{path}` to the archive URL.

The new `fetch_entry_from_archive()` utility tries this for each candidate archive URL (GitHub heads, GitLab, GitHub tags). On success, it returns the raw bytes. On failure, it returns `None` so the caller falls back to the existing full-archive extraction.

Performance impact

`.prompt.md` from 6MB repo

Complementary to #506

The shared `fetch_entry_from_archive()` utility is designed to be reused by the marketplace client for proxy-aware index fetches (#506). The marketplace client can call the utility directly.

Security

- `file_path` containing `../` components is rejected before constructing the URL
- URL encoding via `urllib.parse.quote`

Test plan

- `TestArchiveEntryDownload` covering success, 404, connection error, all URL patterns, URL encoding, headers, stop-on-first-success, path traversal rejection, tag refs, no-auth
- `TestArtifactoryFileDownload` for entry-before-archive ordering and fallback

🤖 Generated with Claude Code