feat: Artifactory archive entry download for virtual file packages #525

Merged
danielmeppiel merged 3 commits into microsoft:main from chkp-roniz:feat/archive-entry-download
Apr 4, 2026

@chkp-roniz
Contributor

Summary

  • Virtual file packages (.prompt.md, .agent.md, etc.) downloaded via Artifactory now use the Archive Entry Download API to fetch individual files without downloading the full multi-MB archive
  • New shared utility artifactory_entry.fetch_entry_from_archive() is reusable across the codebase
  • Graceful fallback to full-archive download when the entry API is unavailable (404, connection error, unsupported Artifactory version)

Closes #417 (Phase 1: virtual file packages).

How it works

JFrog Artifactory supports downloading individual entries from inside a zip archive by appending !/{path} to the archive URL:

GET {archive_url}!/{repo}-{ref}/{file_path}

The new fetch_entry_from_archive() utility tries this against each candidate archive URL (GitHub heads, GitLab, GitHub tags) and stops at the first success, returning the raw bytes. If every attempt fails, it returns None so the caller falls back to the existing full-archive extraction.
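
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: `build_entry_url`, `try_entry_download`, and the injected `http_get` callable are hypothetical names, and the candidate archive URLs are passed in rather than derived, because the exact URL patterns are not spelled out in this description.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import quote


def build_entry_url(archive_url: str, repo: str, ref: str, file_path: str) -> str:
    """Append the `!/{repo}-{ref}/{file_path}` entry suffix to an archive URL."""
    entry_path = f"{repo}-{ref}/{file_path}"
    # Percent-encode special characters but keep "/" as the path separator.
    return f"{archive_url}!/{quote(entry_path, safe='/')}"


def try_entry_download(
    archive_urls: Iterable[str],
    repo: str,
    ref: str,
    file_path: str,
    http_get: Callable[[str], Optional[bytes]],  # assumed: returns None on any failure
) -> Optional[bytes]:
    """Return the first successful entry download, or None so the caller can fall back."""
    for archive_url in archive_urls:
        content = http_get(build_entry_url(archive_url, repo, ref, file_path))
        if content is not None:
            return content  # stop on first success
    return None
```

Injecting `http_get` keeps the sketch free of any particular HTTP library; a real caller would presumably map a 404 or a connection error to `None` inside that callable.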

Performance impact

| Scenario | Before | After |
| --- | --- | --- |
| Single .prompt.md from 6MB repo | 6MB download + unzip | ~1KB download |
| Entry API unavailable | 6MB download + unzip | Same (graceful fallback) |

Complementary to #506

The shared fetch_entry_from_archive() utility is designed to be reused by the marketplace client for proxy-aware index fetches (#506). The marketplace client can call:

from apm_cli.deps.artifactory_entry import fetch_entry_from_archive

content = fetch_entry_from_archive(
    host=registry_config.host,
    prefix=registry_config.prefix,
    owner=source.owner,
    repo=source.repo,
    file_path=source.path,  # "marketplace.json"
    ref=source.branch,
    headers=registry_config.get_headers(),
)

Security

  • Path traversal protection (CWE-22): file_path containing ../ components is rejected before constructing the URL
  • URL encoding: special characters in file paths are properly encoded via urllib.parse.quote
  • No internal data, credentials, or company-specific URLs in source or tests
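
The path traversal check in the first bullet could look something like the sketch below. `is_safe_entry_path` is a hypothetical name, and rejecting empty or absolute paths is an extra assumption of this sketch; the description above only states that `../` components are rejected.

```python
from pathlib import PurePosixPath


def is_safe_entry_path(file_path: str) -> bool:
    """Return False for paths that are empty, absolute, or contain a ".." component."""
    if not file_path or file_path.startswith("/"):
        return False  # assumption: empty and absolute paths are also refused
    # Normalise Windows-style separators before splitting into components.
    parts = PurePosixPath(file_path.replace("\\", "/")).parts
    # Compare whole components so names such as "a..b" are still accepted.
    return ".." not in parts
```

Checking components rather than searching for the substring `..` avoids rejecting legitimate file names that merely contain two dots.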

Test plan

  • 12 new unit tests in TestArchiveEntryDownload covering success, 404, connection error, all URL patterns, URL encoding, headers, stop-on-first-success, path traversal rejection, tag refs, no-auth
  • 4 updated/added tests in TestArtifactoryFileDownload for entry-before-archive ordering and fallback
  • Full unit suite: 3411 passed
  • Live-tested against real Artifactory instance (entry download + fallback)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 1, 2026 05:15
Contributor

Copilot AI left a comment

Pull request overview

Adds an Artifactory "archive entry download" fast path for virtual file package installs so APM can fetch a single file from a proxied repository archive without downloading and extracting the full zip, while retaining a full-archive fallback.

Changes:

  • Introduces fetch_entry_from_archive() utility to download a specific file from inside an Artifactory-proxied archive via the !/{path} entry URL pattern.
  • Updates Artifactory single-file download flow to try entry download first, then fall back to full-archive extraction.
  • Expands unit tests and adds a changelog entry for the behavior change.
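
The entry-first, archive-second ordering summarized above can be illustrated with a small sketch. All names here (`download_single_file`, `extract_from_archive`, and the two injected callables) are hypothetical and stand in for the real download code, which is not shown in this conversation.

```python
import io
import zipfile
from typing import Callable, Optional


def extract_from_archive(archive_bytes: bytes, entry_name: str) -> Optional[bytes]:
    """Read one entry out of an in-memory zip, or return None if it is absent."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        try:
            return zf.read(entry_name)
        except KeyError:  # ZipFile.read raises KeyError for unknown names
            return None


def download_single_file(
    entry_name: str,
    fetch_entry: Callable[[str], Optional[bytes]],  # fast path; None means unavailable
    fetch_archive: Callable[[], bytes],             # slow path; full zip download
) -> Optional[bytes]:
    """Try the single-entry download first, then fall back to the full archive."""
    content = fetch_entry(entry_name)
    if content is not None:
        return content
    return extract_from_archive(fetch_archive(), entry_name)
```

The full archive is only requested when the fast path returns `None`, which is what keeps the common case at roughly the size of the single file.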

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/apm_cli/deps/github_downloader.py | Uses archive entry download as a fast path before full-archive extraction for single-file Artifactory downloads. |
| src/apm_cli/deps/artifactory_entry.py | New shared helper implementing the Archive Entry Download API attempt logic across URL patterns. |
| tests/unit/test_artifactory_support.py | Adds coverage for entry-download success/failure ordering and the new helper’s behavior. |
| CHANGELOG.md | Documents the Artifactory virtual file download behavior change under Unreleased. |

@chkp-roniz chkp-roniz force-pushed the feat/archive-entry-download branch 3 times, most recently from 6537ed4 to 897b340 (April 2, 2026 22:02)
Collaborator

@danielmeppiel danielmeppiel left a comment

Thanks for the contribution -- the performance optimization is real and the security work (path traversal, URL encoding) is solid. However, this PR deepens an architectural problem we need to address before adding more registry-specific code.

1. No registry abstraction — Artifactory is hardcoded throughout

github_downloader.py now has 8 Artifactory-specific methods. This PR adds a 9th file (artifactory_entry.py) that's entirely JFrog-specific. APM's goal is to support any package registry, but today it's impossible to add Nexus, Gitea, or another proxy without forking the download path.

Before landing more Artifactory features, we need a RegistryClient interface that different backends implement. The env vars (PROXY_REGISTRY_*) already suggest this was the intent — the code just never followed through.

2. The new module should live behind an interface, not be called directly

fetch_entry_from_archive() takes Artifactory-specific params (host, prefix, archive URL patterns). If we add a registry abstraction later, this module becomes an internal detail of an ArtifactoryRegistryClient — but today it's imported directly from github_downloader.py, coupling the two.

What I'd like to see:

Land this behind a minimal RegistryClient protocol — even a thin one that only Artifactory implements today. That way this PR becomes the first proper registry backend rather than more hardcoded logic.

The optimization itself is great — just needs the right home.

@chkp-roniz chkp-roniz force-pushed the feat/archive-entry-download branch from 897b340 to b6557a9 (April 3, 2026 22:11)
@chkp-roniz
Contributor Author

@danielmeppiel -- addressed. The entry download now lives behind a RegistryClient protocol:

registry_proxy.py:

from typing import Optional, Protocol, runtime_checkable

@runtime_checkable
class RegistryClient(Protocol):
    def fetch_file(self, owner, repo, file_path, ref, resilient_get=None) -> Optional[bytes]: ...

artifactory_entry.py:

registry_proxy.py -- RegistryConfig.get_client():

def get_client(self) -> RegistryClient:
    from .artifactory_entry import ArtifactoryRegistryClient
    return ArtifactoryRegistryClient(config=self)

github_downloader.py:

  • Uses cfg.get_client().fetch_file() when a RegistryConfig is available
  • Falls back to the standalone helper for explicit-FQDN mode (no config)

Adding a Nexus or Gitea backend is now: implement RegistryClient, add a factory branch in get_client().
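
To make the extension point concrete, here is a hedged sketch of what a second backend might look like. `InMemoryRegistryClient` and `make_client` are hypothetical and exist only for illustration; in particular, the `kind` selector is an assumption, since the `get_client()` shown above takes no such argument. The protocol is repeated so the example runs on its own.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class RegistryClient(Protocol):
    def fetch_file(self, owner, repo, file_path, ref, resilient_get=None) -> Optional[bytes]: ...


class InMemoryRegistryClient:
    """Hypothetical backend for illustration only; serves files from a dict."""

    def __init__(self, files: dict):
        self._files = files

    def fetch_file(self, owner, repo, file_path, ref, resilient_get=None) -> Optional[bytes]:
        # Returning None signals "not available", mirroring the fallback contract.
        return self._files.get((owner, repo, ref, file_path))


def make_client(kind: str, files: dict) -> RegistryClient:
    """Hypothetical factory showing where a new backend branch would be added."""
    if kind == "in-memory":
        return InMemoryRegistryClient(files)
    raise ValueError(f"unsupported registry kind: {kind}")
```

Because the protocol is structural, the new class does not need to inherit from `RegistryClient`; with `@runtime_checkable`, `isinstance(client, RegistryClient)` only verifies that a `fetch_file` attribute exists, not that its signature matches.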

3 new tests for the protocol + 103 total artifactory tests passing. Full suite: 3563 passed.

…icrosoft#417)

Virtual file packages (.prompt.md, .agent.md, etc.) downloaded via
Artifactory now use the Archive Entry Download API to fetch the single
file directly, avoiding a full multi-MB archive download.

The entry URL appends `!/{repo}-{ref}/{path}` to the standard archive
URL.  If the entry API is unavailable (404, connection error, or
unsupported Artifactory version), the existing full-archive fallback
is used transparently.

New shared utility `artifactory_entry.fetch_entry_from_archive()` is
reusable by the marketplace client for proxy-aware index fetches (microsoft#506).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chkp-roniz chkp-roniz force-pushed the feat/archive-entry-download branch from b79ef6f to 67ee4fe (April 4, 2026 17:58)
@danielmeppiel danielmeppiel merged commit 44eebf1 into microsoft:main Apr 4, 2026
6 checks passed
Development

Successfully merging this pull request may close these issues.

feat: Artifactory Archive Entry Download Optimization for subdirectory/file packages

4 participants