Add import cache for offline workflow compilation with SHA-based storage #3981

Merged
pelikhan merged 18 commits into main from copilot/add-cache-for-imported-workflows on Nov 15, 2025
Contributor

Copilot AI commented Nov 14, 2025

Implement Import Cache for Offline Workflow Compilation ✅

Summary

Implemented a cache for imported workflow files so that workflows whose imports are already cached can be compiled without downloading them again or authenticating to GitHub (resolving a branch or tag to a SHA still needs a network call; see Limitations). The cache stores only remote imports (workflowspec format) and never local files. Imports are keyed by commit SHA, so different refs (branches, tags) that point to the same commit share one cache entry. A .gitattributes file is generated automatically when the first import is cached.

Recent Updates

  • Enhanced security validation: Added file size limits, path traversal prevention, and component validation
  • Improved error logging: Get() now distinguishes between cache miss and access errors
  • Robust path sanitization: Using filepath.Clean() for secure path handling
  • Fixed isHexString validation: Now correctly returns false for empty strings
  • Auto-generated .gitattributes: Cache manager automatically creates .gitattributes file when setting up cache folder
  • SHA-based caching: Cache uses resolved commit SHA instead of ref names
  • Ref resolution: Added automatic resolution of branches/tags to commit SHAs using GitHub API
  • Cache deduplication: Different refs pointing to same commit share the same cache entry

Implementation Complete

Core Features ✅

  1. Cache-first import resolution - Always checks cache before downloading
  2. Offline compilation support - Cached imports work without network/auth
  3. Filesystem-based lookup - No manifest needed, checks file existence directly
  4. SHA-based storage - Refs resolved to commit SHAs for efficient cache reuse
  5. Auto-generated .gitattributes - Cache setup automatically creates config file
  6. Automatic cache population - Downloads populate cache for future use
  7. Shared cache instance - Single cache across all workflows in compilation
  8. Backward compatible - Falls back to temp files if cache unavailable
  9. Git-friendly caching - Marked as generated, uses 'ours' merge strategy
  10. Remote-only caching - Local files never cached, only remote workflowspec imports
  11. Robust validation - Path traversal prevention, file size limits, empty string handling
  12. Enhanced error logging - Detailed logging for cache access issues

Security Enhancements ✅

  1. File size validation: Maximum 10MB per cached file to prevent disk exhaustion
  2. Path traversal prevention: Validates path components don't contain ".." or absolute paths
  3. Component validation: Checks owner, repo, path, and SHA for malicious sequences
  4. Robust sanitization: Uses filepath.Clean() before path manipulation
  5. Error differentiation: Logs cache miss vs. permission/I/O errors separately
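
The size and path checks listed above could look roughly like the following. This is a minimal sketch, not the code from pkg/parser/import_cache.go: the names validateComponent and validateEntry are hypothetical, and the exact rules in the PR's validatePathComponents() may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// maxCachedFileSize mirrors the 10MB per-file limit described in the list above.
const maxCachedFileSize = 10 * 1024 * 1024

// validateComponent rejects empty values, absolute paths, and ".." segments.
// Hypothetical helper: the PR's validatePathComponents() may be structured differently.
func validateComponent(name, value string) error {
	if value == "" {
		return fmt.Errorf("%s must not be empty", name)
	}
	if filepath.IsAbs(value) || strings.HasPrefix(value, "/") {
		return fmt.Errorf("%s must not be an absolute path: %q", name, value)
	}
	// Normalize backslashes so a "..\x" style segment is caught on any platform.
	for _, segment := range strings.Split(strings.ReplaceAll(value, "\\", "/"), "/") {
		if segment == ".." {
			return fmt.Errorf("%s must not contain a \"..\" segment: %q", name, value)
		}
	}
	return nil
}

// validateEntry checks the content size and every path component before caching.
func validateEntry(owner, repo, path, sha string, content []byte) error {
	if len(content) > maxCachedFileSize {
		return fmt.Errorf("content is %d bytes, above the %d byte limit", len(content), maxCachedFileSize)
	}
	components := []struct{ name, value string }{
		{"owner", owner}, {"repo", repo}, {"path", path}, {"sha", sha},
	}
	for _, c := range components {
		if err := validateComponent(c.name, c.value); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ok := validateEntry("githubnext", "agentics", "workflows/daily-team-status.md", "abc123", []byte("# workflow"))
	bad := validateEntry("githubnext", "agentics", "../../etc/passwd", "abc123", []byte("x"))
	fmt.Println("valid entry error:", ok)
	fmt.Println("traversal entry error:", bad)
}
```

Checking each component separately means a hostile owner or sha value is rejected just like a hostile path, which matches the "component validation" item above.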

Files Created

  • pkg/parser/import_cache.go (150+ lines) - Core cache implementation with filesystem lookup, SHA resolution, validation, and .gitattributes auto-generation
  • pkg/parser/import_cache_test.go (170+ lines) - Unit tests including .gitattributes verification
  • pkg/parser/import_cache_integration_test.go (167 lines) - Integration tests

Files Modified

  • pkg/parser/frontmatter.go - Import resolution with cache support, SHA resolution, and improved validation
  • pkg/parser/frontmatter_test.go - Updated test for new signature
  • pkg/workflow/compiler.go - Added cache to Compiler struct
  • pkg/cli/mcp_inspect.go - Updated calls to pass cache (nil for CLI)
  • .gitignore - Removed ignore patterns for cache files (now committed)

Cache Structure

.github/aw/imports/
├── .gitattributes           # Auto-generated by cache manager
└── owner/repo/sha/          # Cached files by commit SHA (committed to git)
    └── sanitized_path.md    # Only remote imports cached here

Key features:

  • .gitattributes file is automatically created when first import is cached
  • Cache path uses commit SHA instead of ref name
  • Different refs pointing to the same commit reuse the same cached file
  • Robust validation prevents edge cases and security issues
  • Path sanitization uses filepath.Clean() for safety

Auto-Generated .gitattributes

The .gitattributes file is automatically created by ensureGitAttributes() when the first import is cached:

# Mark all cached import files as generated
* linguist-generated=true

# Use 'ours' merge strategy to keep local cached versions
* merge=ours

Benefits:

  • No manual setup required - cache creates config automatically
  • Prevents cache conflicts during merges
  • Excludes cached files from GitHub language statistics
  • Consistent configuration across all repositories using the cache
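
A create-if-missing helper along the lines described above might be sketched as follows. The function takes the cache directory as a parameter for illustration; the PR's ensureGitAttributes() is a method on ImportCache, and its exact behaviour is not shown in this conversation.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// gitAttributesContent is the file content quoted in the PR description.
const gitAttributesContent = `# Mark all cached import files as generated
* linguist-generated=true

# Use 'ours' merge strategy to keep local cached versions
* merge=ours
`

// ensureGitAttributes writes .gitattributes into cacheDir unless it already exists.
// Sketch only: the real method hangs off ImportCache and may differ in detail.
func ensureGitAttributes(cacheDir string) error {
	target := filepath.Join(cacheDir, ".gitattributes")
	_, err := os.Stat(target)
	if err == nil {
		return nil // already present; leave any local edits alone
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return err
	}
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return err
	}
	return os.WriteFile(target, []byte(gitAttributesContent), 0o644)
}

// demo creates the file in a temporary directory, calls the helper a second
// time to show it is idempotent, and returns what ended up on disk.
func demo() (string, error) {
	root, err := os.MkdirTemp("", "import-cache-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	cacheDir := filepath.Join(root, ".github", "aw", "imports")
	for i := 0; i < 2; i++ {
		if err := ensureGitAttributes(cacheDir); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(filepath.Join(cacheDir, ".gitattributes"))
	return string(data), err
}

func main() {
	content, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(content)
}
```

One caveat worth keeping in mind: Git treats merge=ours as the name of a merge driver, and a driver called ours is not built in. The Git documentation's example defines it with git config merge.ours.driver true. The PR does not say whether that configuration step is expected.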

Remote vs Local Import Handling

Remote imports (cached):

  • Format: owner/repo/path@ref
  • Example: githubnext/agentics/workflows/daily-team-status.md@main
  • Ref resolved to SHA: main → abc123def456...
  • Cached at: .github/aw/imports/owner/repo/abc123def.../path.md
  • Enables offline compilation

Local imports (NOT cached):

  • Paths starting with . (e.g., .github/workflows/shared.md)
  • Paths starting with shared/ (e.g., shared/mcp/tavily.md)
  • Absolute paths starting with /
  • These are resolved directly from filesystem, never cached
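
The remote/local split above can be illustrated with a small parser. This is a sketch under assumptions: parseRemoteImport and remoteImport are hypothetical names, the sketch requires an explicit @ref, and the PR's isWorkflowSpec() may apply different rules.

```go
package main

import (
	"fmt"
	"strings"
)

// remoteImport holds the pieces of an owner/repo/path@ref import.
type remoteImport struct {
	Owner, Repo, Path, Ref string
}

// parseRemoteImport returns ok=false for anything that looks like a local import.
// Hypothetical helper; requiring "@ref" is an assumption of this sketch.
func parseRemoteImport(spec string) (remoteImport, bool) {
	// Local imports: paths starting with ".", "/", or "shared/".
	if strings.HasPrefix(spec, ".") || strings.HasPrefix(spec, "/") || strings.HasPrefix(spec, "shared/") {
		return remoteImport{}, false
	}
	location, ref, hasRef := strings.Cut(spec, "@")
	if !hasRef || ref == "" {
		return remoteImport{}, false
	}
	// owner and repo are the first two segments; the rest is the file path.
	parts := strings.SplitN(location, "/", 3)
	if len(parts) < 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return remoteImport{}, false
	}
	return remoteImport{Owner: parts[0], Repo: parts[1], Path: parts[2], Ref: ref}, true
}

func main() {
	specs := []string{
		"githubnext/agentics/workflows/daily-team-status.md@main",
		".github/workflows/shared.md",
		"shared/mcp/tavily.md",
		"/abs/path.md",
	}
	for _, spec := range specs {
		imp, ok := parseRemoteImport(spec)
		fmt.Printf("%-60s remote=%v %+v\n", spec, ok, imp)
	}
}
```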

How It Works

  1. First compilation with remote import (with network):

    Workflow imports owner/repo/path@main
    → isWorkflowSpec() returns true (remote import)
    → Resolve "main" to SHA using GitHub API → abc123def456...
    → Validate path components and file size ✓
    → Check cache: .github/aw/imports/owner/repo/abc123def.../path.md
    → Cache miss (file doesn't exist)
    → Download from GitHub via gh CLI
    → Store in cache using SHA (validated and sanitized)
    → Auto-create .gitattributes if it doesn't exist ✅
    → Compilation succeeds
    
  2. Subsequent compilations with same ref (with network for SHA resolution):

    Workflow imports owner/repo/path@main
    → Resolve "main" to SHA (fast GitHub API call)
    → Check cache using SHA
    → Cache hit!
    → Use cached file
    → No download needed
    → Compilation succeeds ✅
    
  3. Compilation with different ref, same commit (cache reuse):

    Workflow imports owner/repo/path@v1.0
    → Resolve "v1.0" to SHA → abc123def456... (same as main)
    → Check cache using SHA
    → Cache hit! (reuses main's cached file)
    → Use cached file
    → No download needed
    → Compilation succeeds ✅
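
The three scenarios above can be replayed with stubbed network calls. This is a minimal sketch of the cache-first flow, not the code in frontmatter.go: the source type, load, and cachePath are hypothetical names, validation is omitted, and the file name flattening follows the earlier revision shown later in this conversation (workflows_daily-team-status.md).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// source bundles the two network operations so the sketch can stub them.
type source struct {
	resolveSHA func(owner, repo, ref string) (string, error)
	download   func(owner, repo, path, sha string) ([]byte, error)
}

// cachePath builds <base>/.github/aw/imports/owner/repo/sha/<flattened path>.
func cachePath(baseDir, owner, repo, sha, path string) string {
	flat := strings.ReplaceAll(path, "/", "_")
	return filepath.Join(baseDir, ".github", "aw", "imports", owner, repo, sha, flat)
}

// load resolves the ref, checks the cache, and downloads only on a miss.
func (s source) load(baseDir, owner, repo, path, ref string) (file string, hit bool, err error) {
	sha, err := s.resolveSHA(owner, repo, ref)
	if err != nil {
		return "", false, err
	}
	file = cachePath(baseDir, owner, repo, sha, path)
	if _, statErr := os.Stat(file); statErr == nil {
		return file, true, nil // cache hit: no download needed
	}
	content, err := s.download(owner, repo, path, sha)
	if err != nil {
		return "", false, err
	}
	if err := os.MkdirAll(filepath.Dir(file), 0o755); err != nil {
		return "", false, err
	}
	if err := os.WriteFile(file, content, 0o644); err != nil {
		return "", false, err
	}
	return file, false, nil
}

// demo imports the same file via "main", "main" again, and "v1.0".
func demo() (hits []bool, downloads int, err error) {
	baseDir, err := os.MkdirTemp("", "import-cache-flow")
	if err != nil {
		return nil, 0, err
	}
	defer os.RemoveAll(baseDir)

	s := source{
		resolveSHA: func(owner, repo, ref string) (string, error) {
			return "abc123def456", nil // stub: both refs point at the same commit
		},
		download: func(owner, repo, path, sha string) ([]byte, error) {
			downloads++
			return []byte("# imported workflow\n"), nil
		},
	}
	for _, ref := range []string{"main", "main", "v1.0"} {
		_, hit, err := s.load(baseDir, "owner", "repo", "path.md", ref)
		if err != nil {
			return nil, downloads, err
		}
		hits = append(hits, hit)
	}
	return hits, downloads, nil
}

func main() {
	hits, downloads, err := demo()
	fmt.Println(hits, downloads, err)
}
```

With these stubs the first call misses and the next two hit, so the file is downloaded once even though two different refs were used.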
    

SHA Resolution

Added resolveRefToSHA() function that:

  • Uses GitHub API via gh CLI to resolve branches, tags, and short SHAs to full commit SHAs
  • Returns SHA if ref is already a 40-character hex string (no API call needed)
  • Validates hex strings properly, including empty string check
  • Provides clear error messages for authentication issues
  • Falls back gracefully if resolution fails (skips cache, doesn't fail compilation)
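
The short-circuit for full SHAs and the empty-string fix can be sketched as below. The lookup is injected so the example runs offline. The ghLookup function shows one plausible gh invocation (the REST commits endpoint with --jq); the PR only states that the gh CLI is used, so that exact call is an assumption, as are the names resolveRef and lookupFunc.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// isHexString reports whether s is non-empty and contains only hex digits.
func isHexString(s string) bool {
	if s == "" {
		return false // the bug fixed in this PR: empty input must not count as hex
	}
	for _, r := range s {
		isDigit := r >= '0' && r <= '9'
		isLower := r >= 'a' && r <= 'f'
		isUpper := r >= 'A' && r <= 'F'
		if !isDigit && !isLower && !isUpper {
			return false
		}
	}
	return true
}

// lookupFunc resolves a ref remotely; injected so the sketch can run offline.
type lookupFunc func(owner, repo, ref string) (string, error)

// resolveRef returns ref unchanged when it is already a full 40-character SHA
// and otherwise defers to lookup. Hypothetical stand-in for resolveRefToSHA().
func resolveRef(owner, repo, ref string, lookup lookupFunc) (string, error) {
	if len(ref) == 40 && isHexString(ref) {
		return ref, nil
	}
	return lookup(owner, repo, ref)
}

// ghLookup asks the GitHub REST API which commit a ref points to.
// Assumption: this exact gh invocation is a guess, not taken from the PR.
func ghLookup(owner, repo, ref string) (string, error) {
	endpoint := fmt.Sprintf("repos/%s/%s/commits/%s", owner, repo, ref)
	out, err := exec.Command("gh", "api", endpoint, "--jq", ".sha").Output()
	if err != nil {
		return "", fmt.Errorf("resolving %s/%s@%s: %w", owner, repo, ref, err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	// Stub lookup so the example does not need network access or gh.
	stub := func(owner, repo, ref string) (string, error) {
		return "0123456789abcdef0123456789abcdef01234567", nil
	}
	sha, err := resolveRef("githubnext", "agentics", "main", stub)
	fmt.Println(sha, err)
}
```

Passing ghLookup instead of the stub would perform the real resolution, which is the step that still needs network access according to the Limitations section.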

Testing Results ✅

  • Unit tests: All pass (170+ lines of cache tests, including .gitattributes verification)
  • Integration tests: All pass (167 lines)
  • Full test suite: All pass (make test-unit)
  • Build: Success (make build)
  • Linting: Pass (make lint)

Key Features

  1. Filesystem-based lookup - No manifest file needed
  2. Auto-generated config - .gitattributes created automatically
  3. Cache-first approach - Always checks cache before network
  4. SHA-based caching - Different refs pointing to same commit reuse cache
  5. Automatic ref resolution - Branches and tags resolved to commit SHAs
  6. Robust validation - Path traversal prevention, file size limits, component validation
  7. Enhanced error logging - Distinguishes cache miss from access errors
  8. Secure path handling - Uses filepath.Clean() throughout
  9. Offline compilation - Works without internet if imports cached (requires one-time SHA resolution)
  10. No authentication required for cached imports - Reading a cached import does not need a GitHub token
  11. Backward compatible - Falls back to temp files when needed
  12. Shared across compilations - Single cache instance per compiler
  13. Persistent - Files saved directly to git-tracked directory
  14. Simple implementation - Just file existence checks, no metadata management
  15. Debug visibility - Cache hits/misses visible in debug logs
  16. Git-optimized - Marked as generated, automatic merge conflict resolution
  17. Remote-only caching - Local files never cached, preventing unnecessary duplication

Benefits

  • Zero manual setup - .gitattributes auto-generated on first use
  • Production-ready security - File size limits, path validation, traversal prevention
  • Better debugging - Detailed error logging for cache issues
  • Simpler code - No manifest serialization/deserialization
  • Faster compilation - No network calls for cached imports (only SHA resolution)
  • Cache deduplication - Same commit used by multiple refs only cached once
  • Offline support - Works in airports, poor connectivity (after initial SHA resolution)
  • Reduced API usage - Less strain on GitHub API
  • Better developer experience - Transparent, automatic caching
  • Version control - Cached files can be reviewed in git diffs
  • Easy debugging - Cache behavior visible with DEBUG flag
  • Clean language stats - Cached files excluded from GitHub's language detection
  • Conflict-free merges - Automatic resolution with 'ours' strategy
  • No local duplication - Local files accessed directly, not cached
  • Secure by default - Multiple layers of validation prevent attacks

Limitations

  • SHA resolution requires network: Resolving a ref (branch/tag) to SHA requires a GitHub API call, even if the file is cached
  • Future enhancement: Could cache ref-to-SHA mappings for truly offline operation

Original prompt

The compiler should keep a cache of all the resolved imported workflows used as part of the compilation. The files should be stored under .github/aw/imports and use unique names to avoid collisions: .github/aw/imports/repo/owner/sha/file...path.md.

The goal is to be able to recompile all the workflows even if there is no internet or the gh token is not available. The compiler should always try to read the file from the cache first.


- Create ImportCache struct similar to ActionCache
- Cache imported files at .github/aw/imports/owner/repo/sha/path
- Update import resolution to check cache first
- Add cache to Compiler struct and initialize lazily
- Pass cache through import processing chain
- Add comprehensive tests for cache functionality
- Update .gitignore to exclude cached files but keep manifest

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Comment thread pkg/parser/import_cache.go Fixed
Copilot AI and others added 2 commits November 14, 2025 18:18
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add caching for resolved imported workflows" to "Add import cache for offline workflow compilation" on Nov 14, 2025
Copilot AI requested a review from pelikhan November 14, 2025 18:23
Comment thread pkg/parser/import_cache.go Outdated
- Simplified marshalSorted() to use json.MarshalIndent
- Removed manual JSON string construction
- Removed unused sort import
- Go's json.Marshal maintains sorted map keys (since Go 1.12+)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Collaborator

@pelikhan pelikhan left a comment

@copilot manifest is not needed, remove and rely on file lookup. Commit cache.

- Removed manifest.json and Load/Save methods
- Cache now uses direct filesystem lookup (simpler approach)
- Get() checks if file exists at expected path
- Set() writes file directly to cache directory
- Removed Entries map and ImportCacheEntry struct
- Updated tests to verify filesystem-based lookup
- Updated .gitignore to commit cache files (not ignore them)
- Cache path: .github/aw/imports/owner/repo/ref/sanitized_path.md

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Nov 14, 2025

@copilot manifest is not needed, remove and rely on file lookup. Commit cache.

Done! Removed manifest.json and switched to filesystem-based lookup. Cache files are now committed directly to git at .github/aw/imports/owner/repo/ref/path.md. The implementation is much simpler - just checks if the expected file exists on the filesystem. Commit: 0a912b4

@pelikhan
Collaborator

@copilot try to "update" daily-team-status

Collaborator

@pelikhan pelikhan left a comment

@copilot generate a .gitattributes file in .github/aw/imports to mark all files as linguist-generated, keep ours

- Created cached version at .github/aw/imports/githubnext/agentics/main/
- Demonstrates offline compilation with cached imports
- Cache hit logs confirm filesystem-based lookup works
- Workflow compiles successfully without GitHub auth when cache exists

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Nov 14, 2025

@copilot try to "update" daily-team-status

Demo complete! Added a cached version of the daily-team-status workflow to demonstrate offline compilation.

Cache location: .github/aw/imports/githubnext/agentics/main/workflows_daily-team-status.md

Verification:

DEBUG=parser:import_cache ./gh-aw compile test-workflow.md
# Output shows: Cache hit: githubnext/agentics/workflows/daily-team-status.md@main
✓ Compiled 1 workflow(s): 0 error(s), 0 warning(s)

The cache works! Workflows with imports compile successfully without GitHub authentication when cached files exist. Commit: 5517e08

- Added ensureGitAttributes() method to ImportCache
- .gitattributes file now automatically created when caching first import
- Removed manually created .gitattributes file from repo
- Added test to verify .gitattributes auto-generation
- File content includes linguist-generated and merge=ours attributes
- Non-fatal error if .gitattributes creation fails (caching continues)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements an import cache system for offline workflow compilation by caching remote workflow imports (workflowspec format) using SHA-based storage. The cache enables workflows to be compiled without network access after the initial download, storing files in .github/aw/imports/owner/repo/sha/ structure.

Key changes:

  • Added ImportCache with filesystem-based lookup using resolved commit SHAs
  • Integrated cache into compiler and import resolution pipeline
  • Added SHA resolution to deduplicate cache entries when different refs point to the same commit
  • Implemented comprehensive unit and integration tests

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| pkg/parser/import_cache.go | Core cache implementation with Get/Set methods and filesystem-based storage |
| pkg/parser/import_cache_test.go | Unit tests for cache operations and edge cases |
| pkg/parser/import_cache_integration_test.go | Integration tests validating cache with multiple files and instances |
| pkg/parser/frontmatter.go | Added cache parameter to import resolution, SHA resolution function, and cache-first lookup logic |
| pkg/parser/frontmatter_test.go | Updated test signature to pass nil cache parameter |
| pkg/workflow/compiler.go | Added importCache field and getSharedImportCache() lazy initialization |
| pkg/cli/mcp_inspect.go | Updated calls to pass nil cache (CLI doesn't use import cache) |

Comment thread pkg/parser/import_cache.go Outdated

// Set stores a new cache entry by saving the content to the cache directory
// sha parameter should be the resolved commit SHA
func (c *ImportCache) Set(owner, repo, path, sha, _ string, content []byte) (string, error) {

Copilot AI Nov 14, 2025

The Set function has an unnamed parameter (the 5th parameter _) which appears unused. According to the PR description and the call site in frontmatter.go:910, this parameter receives the sha value twice (sha, sha). Consider either:

  1. Removing this unused parameter from the signature and updating all call sites
  2. Documenting why this parameter exists if it's intentionally unused for future extensibility

This creates confusion about the function's API design.

Suggested change
- func (c *ImportCache) Set(owner, repo, path, sha, _ string, content []byte) (string, error) {
+ func (c *ImportCache) Set(owner, repo, path, sha string, content []byte) (string, error) {

Comment thread pkg/parser/frontmatter.go
Comment on lines +50 to +74
// Set stores a new cache entry by saving the content to the cache directory
// sha parameter should be the resolved commit SHA
func (c *ImportCache) Set(owner, repo, path, sha, _ string, content []byte) (string, error) {
	// Use SHA in path for consistent caching
	// This ensures that different refs pointing to the same commit reuse the same cache
	sanitizedPath := strings.ReplaceAll(path, "/", "_")
	relativeCachePath := filepath.Join(ImportCacheDir, owner, repo, sha, sanitizedPath)
	fullCachePath := filepath.Join(c.baseDir, relativeCachePath)

	// Ensure directory exists
	dir := filepath.Dir(fullCachePath)
	if err := os.MkdirAll(dir, 0755); err != nil {
		importCacheLog.Printf("Failed to create cache directory: %v", err)
		return "", err
	}

	// Ensure .gitattributes file exists in cache root
	if err := c.ensureGitAttributes(); err != nil {
		importCacheLog.Printf("Failed to ensure .gitattributes: %v", err)
		// Non-fatal error - continue with caching
	}

	// Write content to cache file
	if err := os.WriteFile(fullCachePath, content, 0644); err != nil {
		importCacheLog.Printf("Failed to write cache file: %v", err)

Copilot AI Nov 14, 2025

The PR description and diff mention creating a .gitattributes file in the cache directory to mark cached files as generated and use 'ours' merge strategy. However, the actual implementation in import_cache.go (80 lines) doesn't include the ensureGitAttributes() function shown in the diff (lines 66-70 and 87-117).

The test file import_cache_test.go also has test cases checking for .gitattributes in the diff (lines 95-113), but the actual file doesn't include these tests.

This creates a discrepancy between what the PR claims to implement and what's actually in the code. The .gitattributes functionality needs to be either:

  1. Implemented as described in the PR
  2. Removed from the PR description if it's not intended to be part of this change

Collaborator

@copilot apply changes based on this feedback

Contributor Author

The .gitattributes functionality is fully implemented. The ensureGitAttributes() method automatically creates the file when the cache directory is first used. See commit b89d7b1 for the initial implementation and the latest commit for security enhancements.

Comment thread pkg/parser/frontmatter.go Outdated
// Create a temporary file to store the downloaded content
// If cache is available and we have a SHA, store in cache
if cache != nil && sha != "" {
	cachedPath, err := cache.Set(owner, repo, filePath, sha, sha, content)

Copilot AI Nov 14, 2025

The code passes sha twice as the 4th and 5th parameters to cache.Set() on line 910. The 5th parameter is defined as unnamed _ in the function signature, indicating it's intentionally unused. This appears to be a vestige of a previous design.

Consider either:

  1. Removing the redundant 5th parameter from both the signature and all call sites
  2. If the parameter is needed for future extensibility, document its purpose with a named parameter (e.g., unusedRef string)

Suggested change
- cachedPath, err := cache.Set(owner, repo, filePath, sha, sha, content)
+ cachedPath, err := cache.Set(owner, repo, filePath, sha, content)

Comment thread pkg/parser/frontmatter.go
Comment thread pkg/parser/import_cache.go Outdated
func (c *ImportCache) Get(owner, repo, path, sha string) (string, bool) {
	// Use SHA-based approach: cache files are stored by commit SHA
	// Cache path: .github/aw/imports/owner/repo/sha/sanitized_path.md
	sanitizedPath := strings.ReplaceAll(path, "/", "_")

Copilot AI Nov 14, 2025

The path sanitization using strings.ReplaceAll(path, "/", "_") is overly simplistic and could lead to collisions. For example:

  • workflows/test.md → workflows_test.md
  • workflows_test.md → workflows_test.md (collision!)

Additionally, this doesn't handle other problematic characters that might appear in file paths (e.g., spaces, special characters).

Consider using a more robust sanitization approach such as:

  1. Using filepath.Clean() and preserving the directory structure
  2. Using URL encoding or base64 encoding for the path component
  3. Hashing the path and storing the original path in metadata
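
The collision is easy to reproduce, and the first two alternatives can be compared side by side. The helper names flatten, escaped, and nested are hypothetical and exist only for this sketch.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// flatten is the approach under review: every "/" becomes "_".
func flatten(path string) string { return strings.ReplaceAll(path, "/", "_") }

// escaped percent-encodes the path, so "/" and "_" stay distinguishable.
func escaped(path string) string { return url.PathEscape(path) }

// nested keeps the directory structure and only normalizes the path.
func nested(path string) string { return filepath.Clean(path) }

func main() {
	a, b := "workflows/test.md", "workflows_test.md"
	fmt.Println("flatten collides:", flatten(a) == flatten(b))
	fmt.Println("escaped collides:", escaped(a) == escaped(b))
	fmt.Println("nested collides: ", nested(a) == nested(b))
}
```

Note that filepath.Clean() on its own does not neutralize a leading "..", so the nested variant still depends on separate traversal validation.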

Collaborator

@copilot use filepath.Clean()

Contributor Author

Done! Path sanitization now uses filepath.Clean() for robust handling. Added sanitizePath() helper and validatePathComponents() to prevent path traversal attacks. Commit: [commit hash in progress report]

Comment on lines +40 to +44
// Check if the cached file exists
if _, err := os.Stat(fullCachePath); os.IsNotExist(err) {
	importCacheLog.Printf("Cache miss: %s/%s/%s@%s", owner, repo, path, sha)
	return "", false
}

Copilot AI Nov 14, 2025

The Get method doesn't handle other types of filesystem errors beyond os.IsNotExist. If the file exists but is inaccessible due to permissions or other I/O errors, the method will still return false (cache miss) rather than indicating an error occurred.

Consider differentiating between:

  • File doesn't exist (cache miss, expected behavior)
  • File exists but can't be accessed (error condition)

This would help with debugging cache-related issues.
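
One way to separate the two cases is to test the os.Stat error against fs.ErrNotExist and treat everything else as an access problem. A minimal sketch follows; classify is a hypothetical helper and not the PR's Get() method.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// classify reports "hit", "miss", or "error" for a cache path.
// Only a not-exist error counts as a miss; anything else is surfaced.
func classify(path string) (string, error) {
	_, err := os.Stat(path)
	switch {
	case err == nil:
		return "hit", nil
	case errors.Is(err, fs.ErrNotExist):
		return "miss", nil
	default:
		return "error", err
	}
}

func main() {
	paths := []string{
		".",                           // exists
		"no-such-dir/no-such-file.md", // does not exist
		"bad\x00name.md",              // invalid name: fails with an error other than not-exist
	}
	for _, p := range paths {
		status, err := classify(p)
		fmt.Printf("%q: %s (err=%v)\n", p, status, err)
	}
}
```

The path with an embedded NUL byte is used only because it fails reliably with something other than not-exist. Permission errors take the same branch but are harder to provoke in a portable demo.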

Collaborator

@copilot add logging about error type

Collaborator

@copilot apply changes based on this feedback

Contributor Author

Done! Added detailed error logging in Get() method. Now distinguishes between cache miss (file doesn't exist) and other errors like permission issues or I/O errors. Commit: [commit hash in progress report]

Comment thread pkg/parser/frontmatter.go
Comment on lines +50 to +74
// Set stores a new cache entry by saving the content to the cache directory
// sha parameter should be the resolved commit SHA
func (c *ImportCache) Set(owner, repo, path, sha, _ string, content []byte) (string, error) {
	// Use SHA in path for consistent caching
	// This ensures that different refs pointing to the same commit reuse the same cache
	sanitizedPath := strings.ReplaceAll(path, "/", "_")
	relativeCachePath := filepath.Join(ImportCacheDir, owner, repo, sha, sanitizedPath)
	fullCachePath := filepath.Join(c.baseDir, relativeCachePath)

	// Ensure directory exists
	dir := filepath.Dir(fullCachePath)
	if err := os.MkdirAll(dir, 0755); err != nil {
		importCacheLog.Printf("Failed to create cache directory: %v", err)
		return "", err
	}

	// Ensure .gitattributes file exists in cache root
	if err := c.ensureGitAttributes(); err != nil {
		importCacheLog.Printf("Failed to ensure .gitattributes: %v", err)
		// Non-fatal error - continue with caching
	}

	// Write content to cache file
	if err := os.WriteFile(fullCachePath, content, 0644); err != nil {
		importCacheLog.Printf("Failed to write cache file: %v", err)

Copilot AI Nov 14, 2025

The cache stores files from remote GitHub repositories without validating their content or size. This could lead to several issues:

  1. Disk space exhaustion: A malicious or very large file could fill up the disk
  2. Path traversal: While the path is sanitized, there's no validation that the owner/repo/sha/path components don't contain malicious sequences
  3. No integrity checking: The cached file could be modified outside the application, and there's no verification that it matches the expected SHA

Consider adding:

  • Maximum file size validation before caching
  • Path component validation (e.g., no .., no absolute paths)
  • Content verification (checksum or signature) when retrieving from cache

Collaborator

@copilot apply changes based on this feedback; use filepath.Clean()

Contributor Author

Done! Added comprehensive validation:

  • File size limit (10MB max) to prevent disk exhaustion
  • Path component validation using validatePathComponents() to prevent ".." sequences and absolute paths
  • Path sanitization using filepath.Clean() for security

Commit: [commit hash in progress report]

pelikhan and others added 2 commits November 14, 2025 14:02
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Added empty string check at start of isHexString()
- Returns false for empty strings instead of true
- Fixed resolveRefToSHA to use exec.Command directly (no ghhelper dependency)
- All tests pass, linting clean

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Use filepath.Clean() for robust path sanitization
- Add validatePathComponents() to prevent path traversal attacks
- Add file size validation (max 10MB) in Set()
- Improve error logging in Get() to distinguish error types
- Add sanitizePath() helper using filepath.Clean
- All tests pass, linting clean

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented Nov 14, 2025

@copilot try to "update" daily-team-status

Could you clarify what you'd like me to do with "daily-team-status"? Would you like me to:

  1. Add a cached version of the daily-team-status workflow as a demo?
  2. Update an existing cached file?
  3. Something else?

Please let me know and I'll be happy to help!

Collaborator

@pelikhan pelikhan left a comment

@copilot apply review comments

Contributor Author

Copilot AI commented Nov 14, 2025

@copilot apply review comments

All review comments have been addressed:

Security & Validation (12f51c3):

  • File size limit (10MB max) prevents disk exhaustion
  • validatePathComponents() prevents path traversal attacks
  • filepath.Clean() used for robust path sanitization
  • Enhanced error logging in Get() distinguishes cache miss from access errors

Bug Fixes (6c7e9df):

  • isHexString() now correctly returns false for empty strings

API Cleanup:

  • Set() function signature cleaned up - unused 5th parameter removed

All tests pass, build succeeds, and linting is clean.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan
Collaborator

@copilot lint, fix tests

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan November 14, 2025 23:06
@pelikhan pelikhan merged commit 790c20e into main Nov 15, 2025
10 checks passed
@pelikhan pelikhan deleted the copilot/add-cache-for-imported-workflows branch November 15, 2025 00:38
github-actions Bot added a commit that referenced this pull request Nov 15, 2025
This commit updates the documentation based on features merged in the last 24 hours.

## Features Documented

### 1. Cross-repository --repo flag (PR #4007)
- Added --repo option to enable, disable, and logs commands
- Enables workflow management operations across different repositories
- Updated CLI reference with examples and option descriptions

### 2. Workflow description extraction (PR #4008)
- Documents that add and trial commands now display workflow descriptions
- Descriptions are extracted from frontmatter description field
- Provides better context about workflow purpose when adding or testing

### 3. Import cache for offline compilation (PR #3981)
- Remote imports are automatically cached in .github/aw/imports/
- Cache stores imports by commit SHA for efficient reuse
- Enables offline compilation once imports have been downloaded
- Updated CLI, imports reference, and packaging guides

### 4. "Did You Mean" schema validation suggestions (PR #3999)
- Compiler suggests correct field names for typos using fuzzy matching
- Based on Levenshtein distance algorithm
- Added tip callout and new error documentation section
- Includes examples of common typos detected

## Files Modified

- docs/src/content/docs/setup/cli.md
- docs/src/content/docs/reference/imports.md
- docs/src/content/docs/guides/packaging-imports.md
- docs/src/content/docs/troubleshooting/errors.md

## Related PRs

- #4007 - Add --repo options to more commands
- #4008 - Add workflow description extraction feature
- #3981 - Add import cache for offline workflow compilation
- #3999 - Implement 'Did You Mean' suggestions for schema validation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>