Conversation
|
@andybons main decision to discuss here is passing all traffic through the proxy. There's pros/cons. AWS doesn't strictly need it, because credentials aren't injected at the request level, but then there'd be no auditing of network traffic. |
📦 Previous Review (superseded by newer review)Pull Request Review: AWS Credential SupportThis PR adds comprehensive AWS credential support via IAM role assumption. The implementation is well-designed and follows good security practices. Below is my detailed review: ✅ StrengthsArchitecture & Design
Code Quality
Documentation
🔍 Issues & Concerns1. ARN Parsing Vulnerability (Security - Medium)Location: The ARN parser only validates structure but doesn't handle all edge cases: parts := strings.Split(arn, ":")
if len(parts) < 6 {
return nil, fmt.Errorf("invalid ARN format: %s", arn)
}Issues:
Recommendation: Join all parts from index 5 onward: if len(parts) < 6 {
return nil, fmt.Errorf("invalid ARN format: %s", arn)
}
resource := strings.Join(parts[5:], ":")
if !strings.HasPrefix(resource, "role/") {
return nil, fmt.Errorf("invalid ARN: must be a role ARN (got %s)", resource)
}2. Credential Storage Abuse of Scopes Field (Design - Medium)Location: AWS config is packed into the cred := credential.Credential{
Provider: credential.ProviderAWS,
Token: awsCfg.RoleARN, // Store role ARN in Token field
Scopes: []string{awsCfg.Region, awsCfg.SessionDurationStr, awsCfg.ExternalID},
CreatedAt: time.Now(),
}Issues:
Recommendation: Either:
3. Missing Context Cancellation Check (Reliability - Low)Location:
result, err := p.stsClient.AssumeRole(ctx, input)Recommendation: Add early check: if err := ctx.Err(); err != nil {
return nil, err
}
result, err := p.stsClient.AssumeRole(ctx, input)4. Credential Helper Dependency on curl (Portability - Low)Location: The shell script requires exec curl -sf -m 10 -H "Authorization: Bearer $AGENTOPS_CREDENTIAL_TOKEN" "$AGENTOPS_CREDENTIAL_URL"Issues:
Recommendation:
5. Potential Goroutine Leak in Handler (Performance - Low)Location:
Recommendation: Consider adding timeout to credential fetching or documenting expected behavior. 6. Error Information Disclosure (Security - Low)Location: http.Error(w, fmt.Sprintf("failed to get credentials: %v", err), http.StatusInternalServerError)Issue: Internal errors from AWS SDK might leak sensitive information to the container Recommendation: Sanitize or wrap error: // Log full error server-side
fmt.Fprintf(os.Stderr, "AWS credential fetch failed: %v\n", err)
// Return generic message to client
http.Error(w, "failed to get credentials", http.StatusInternalServerError)7. Unchecked Array Access (Reliability - Low)Location: if len(cred.Scopes) > 0 && cred.Scopes[0] != "" {
region = cred.Scopes[0]
}
if len(cred.Scopes) > 1 {
sessionDurationStr = cred.Scopes[1]
}
if len(cred.Scopes) > 2 {
externalID = cred.Scopes[2]
}Issue: Brittle if credential format changes. Related to Issue #2 above. 📝 Minor ImprovementsCode Style
Testing
Documentation
🎯 Performance Considerations
🔒 Security AssessmentPositive:
Concerns:
📊 Test CoverageGood coverage on:
Missing coverage:
🎬 Recommendations SummaryMust Fix (Before Merge):
Should Fix (Before Merge):
Nice to Have (Future):
✨ Overall AssessmentThis is a high-quality PR with thoughtful architecture and good security practices. The design documentation is excellent, and the code is generally clean and well-tested. The main concerns are around the ARN parsing edge case and the abuse of the With the recommended fixes, this feature would be production-ready and a valuable addition to AgentOps. Recommendation: Request changes for Issues #1 and #2, then approve. Great work on this feature! The credential_process approach is clever and solves the refresh problem elegantly. 🚀 |
Design for AWS credential injection via ECS container credential protocol. Key decisions: - Role-only (no credential passthrough) - 15-minute default session duration with auto-refresh - Proxy serves /_aws/credentials endpoint - Uses host AWS credentials to assume role at runtime 🤖 Generated with [Claude Code](https://claude.com/claude-code)
- Add AWSConfig struct with role ARN, region, session duration, external ID - Add ParseRoleARN function to validate IAM role ARNs - Add SessionDuration method with validation (15m-12h range) - Include comprehensive tests
- Add `agent grant aws --role=ARN` command - Support --region, --session-duration, --external-id options - Validate role ARN format and test AssumeRole at grant time - Store role config (not credentials) for runtime assumption - Add AWS SDK v2 dependencies
- Add AWSCredentialHandler for serving credentials via HTTP - Add AWSCredentialProvider with STS AssumeRole and 5-minute cache buffer - Returns credentials in ECS container credential format - Include comprehensive tests with mock STS client
- Add AWSCredentialProvider field to Run struct - Add SetAWSHandler method and routing in proxy.go - Set up AWS credential endpoint in manager.go - Configure AWS_CONTAINER_CREDENTIALS_FULL_URI and AWS_REGION env vars - Add AWS hosts to network policy allow list
- Add AWS credentials section explaining IAM role assumption - Document --role, --region, --session-duration options - Explain how credentials auto-refresh - Note required IAM setup steps
- Add agent.yaml using aws sts get-caller-identity - Include CloudFormation and CLI snippets for IAM role setup - Document the credential flow and verification steps
- Add platform detection in BuildImage (linux/arm64 on Apple Silicon) - Update AWS CLI install to detect architecture via uname -m - Fixes Rosetta errors when building on Apple Silicon
AWS SDK only trusts container credential endpoints from loopback addresses unless AWS_CONTAINER_AUTHORIZATION_TOKEN is set. Generate a secure token for the credential endpoint and verify it in the handler. - Add authToken to AWSCredentialProvider and handler - Generate and set AWS_CONTAINER_AUTHORIZATION_TOKEN - Verify Bearer token in credential handler - Fixes 'Unsupported host' error on macOS Docker
AWS SDKs only trust container credential endpoints from loopback addresses. host.docker.internal is not in the allowed list and there's no workaround. See: boto/botocore#2515 - On Linux (host network): use credential endpoint at localhost - On macOS/Windows (bridge mode): inject credentials as env vars directly Credentials are already short-lived from AssumeRole, so security is maintained. Auto-refresh is not available on macOS/Windows; set longer --session-duration if needed.
Instead of injecting static AWS credentials as env vars (which expire), use AWS SDK's credential_process feature for automatic refresh. Implementation: - Add aws-credential-helper binary that fetches creds from proxy - Embed cross-compiled linux/amd64 and linux/arm64 binaries - Mount helper + AWS config into container at /agentops/aws/ - Proxy endpoint returns credential_process JSON format Also fixes: - MITM all HTTPS when CA is present (not just for credential injection) - Include CA cert in TLS chain for Python SSL compatibility - Add AWS_PAGER="" to disable pager in containers without less Agents can now run for days without credential expiration issues.
Updates the AWS credentials implementation plan to accurately document what was built: - Changed status to IMPLEMENTED - Documented use of credential_process instead of ECS-style credentials - Added Task 8 for embedded credential helper binary - Added Architecture Notes explaining design decisions - Fixed file paths (run.go vs types.go) - Added credential flow diagram - Listed all files changed 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Merge content from the credential_process design doc into the main implementation plan: - Added problem statement about credential expiration - Added endpoint reachability table by platform - Added error handling notes Remove the now-redundant design document. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Simplify the AWS credential helper by using a shell script instead of a cross-compiled Go binary. This works because: - Containers using --grant aws must have the aws dependency - The aws dependency installation includes curl - A shell script with curl is simpler and has no build step Removes: - cmd/aws-credential-helper/main.go - internal/run/generate.go - internal/run/helpers/ (embedded binaries) The credential helper is now a simple shell script embedded as a string constant in internal/run/credential_helper.go. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Fixes from code review: 1. Resource leak: Track and clean up AWS temp directory in Wait() and Destroy() - prevents temp dir accumulation on host 2. Nil pointer safety: Check result.Credentials before dereferencing and use aws.ToString/aws.ToTime helpers 3. JSON encoding: Handle json.Encode error (log warning since response already started) 4. Context propagation: NewAWSCredentialProvider now accepts context for proper timeout/cancellation support 5. Session uniqueness: Grant test uses timestamp-based session name for clearer CloudTrail logs 6. MITM documentation: Added security note explaining that ALL HTTPS traffic is intercepted when CA is configured (intentional for observability) 🤖 Generated with [Claude Code](https://claude.com/claude-code)
📦 Previous Review (superseded by newer review)Code Review: AWS Credential Support (PR #24)SummaryThis PR adds comprehensive AWS credential support via IAM role assumption with automatic refresh. The implementation is well-architected and follows security best practices. I've identified several issues ranging from critical security concerns to minor improvements. Critical Issues1. Timing Attack Vulnerability in Auth Token ComparisonLocation: if auth == "" || auth \!= "Bearer "+h.authToken {
http.Error(w, "unauthorized", http.StatusUnauthorized)
return
}Issue: Using direct string comparison ( Fix: Use import "crypto/subtle"
expectedAuth := "Bearer " + h.authToken
if auth == "" || subtle.ConstantTimeCompare([]byte(auth), []byte(expectedAuth)) \!= 1 {
http.Error(w, "unauthorized", http.StatusUnauthorized)
return
}2. Missing Error Context Leakage PreventionLocation: http.Error(w, fmt.Sprintf("failed to get credentials: %v", err), http.StatusInternalServerError)Issue: Exposing raw error messages could leak sensitive information about the AWS account, role configuration, or internal state to unauthorized clients. Fix: Log the detailed error server-side but return generic message to client: fmt.Fprintf(os.Stderr, "AWS credential fetch error: %v\n", err)
http.Error(w, "internal server error", http.StatusInternalServerError)3. Race Condition in Credential CachingLocation: Issue: While the double-checked locking pattern is implemented, there's a subtle issue. After releasing the read lock at line 137, another goroutine could acquire the write lock, refresh credentials, and release it before this goroutine acquires the write lock at line 140. However, this is actually handled correctly by the double-check at line 144-146. This is not a bug, but the pattern is error-prone. Suggestion: Consider using High Priority Issues4. ARN Validation IncompleteLocation: Issue: ARN validation doesn't check for:
Example of what should be rejected:
Fix: func ParseRoleARN(arn string) (*AWSConfig, error) {
if arn == "" {
return nil, fmt.Errorf("role ARN is required")
}
parts := strings.Split(arn, ":")
if len(parts) \!= 6 {
return nil, fmt.Errorf("invalid ARN format: expected 6 parts, got %d", len(parts))
}
partition, service, region, account, resource := parts[1], parts[2], parts[3], parts[4], parts[5]
// Validate partition
validPartitions := map[string]bool{"aws": true, "aws-cn": true, "aws-us-gov": true}
if \!validPartitions[partition] {
return nil, fmt.Errorf("invalid ARN partition: %s (expected aws, aws-cn, or aws-us-gov)", partition)
}
if service \!= "iam" {
return nil, fmt.Errorf("invalid ARN: must be an IAM ARN (got %s)", service)
}
if account == "" {
return nil, fmt.Errorf("invalid ARN: account ID is required")
}
if \!strings.HasPrefix(resource, "role/") {
return nil, fmt.Errorf("invalid ARN: must be a role ARN (got %s)", resource)
}
roleName := strings.TrimPrefix(resource, "role/")
if roleName == "" {
return nil, fmt.Errorf("invalid ARN: role name is required")
}
return &AWSConfig{
RoleARN: arn,
Region: "us-east-1",
}, nil
}5. Insufficient Input Validation on Session DurationLocation: Issue: Maximum duration depends on the role configuration. Service-linked roles and roles with MFA may have different limits. The hardcoded 12h limit might not work for all scenarios. Suggestion: Document this limitation and consider validating against the actual role's max session duration during the grant test. 6. Proxy MITM Interception Change Has Broad ImpactLocation: -if _, hasCredential := p.getCredential(host); p.ca \!= nil && hasCredential {
+if p.ca \!= nil {Issue: This changes the proxy to intercept ALL HTTPS traffic, not just credential-injected hosts. While the comment explains this is intentional for observability, this is a significant behavior change that affects all grants, not just AWS. Concerns:
Suggestion: Consider adding a configuration flag to opt-in to full interception, or at least ensure this is prominently documented in upgrade notes. Medium Priority Issues7. Temp Directory Not Cleaned on Early FailuresLocation: Issue: If an error occurs after creating awsDir, err := os.MkdirTemp("", "agentops-aws-*")
if err \!= nil {
return nil, fmt.Errorf("creating AWS credential helper directory: %w", err)
}
r.awsTempDir = awsDir // Track for cleanup
// If WriteFile fails here, awsDir leaks
if err := os.WriteFile(helperPath, GetAWSCredentialHelper(), 0755); err \!= nil {
return nil, fmt.Errorf("writing AWS credential helper: %w", err)
}Fix: Use defer cleanup or assign awsDir, err := os.MkdirTemp("", "agentops-aws-*")
if err \!= nil {
return nil, fmt.Errorf("creating AWS credential helper directory: %w", err)
}
r.awsTempDir = awsDir // Assign immediately for cleanup
if err := os.WriteFile(helperPath, GetAWSCredentialHelper(), 0755); err \!= nil {
// Now cleanup will happen via existing defer or error path
return nil, fmt.Errorf("writing AWS credential helper: %w", err)
}8. Missing Context Cancellation CheckLocation: The Suggestion: if ctx.Err() \!= nil {
return nil, ctx.Err()
}
result, err := p.stsClient.AssumeRole(ctx, input)9. Credential Helper Dependency on curlLocation: Issue: The helper script assumes Suggestions:
10. Missing Audit Logging for AWS Credential AccessIssue: The design doc mentions audit logging ( Expected: Each credential fetch should be logged to the audit system for compliance and debugging. Low Priority / Style Issues11. Inconsistent Error MessagesSome error messages use title case, others don't. Example:
Suggestion: Standardize to lowercase for error messages (Go convention). 12. Magic Number - 5 Minute Refresh BufferLocation: The 5-minute buffer is hardcoded. Consider making this configurable or at least a named constant: const credentialRefreshBuffer = 5 * time.Minute
// In GetCredentials:
if p.cached \!= nil && time.Now().Add(credentialRefreshBuffer).Before(p.expiration) {13. Unused Return ValueLocation: if _, sdErr := awsCfg.SessionDuration(); sdErr \!= nil {The duration is validated but discarded. Consider returning it for consistency: if d, sdErr := awsCfg.SessionDuration(); sdErr \!= nil {
return sdErr
} else {
// Could store d in awsCfg if needed
}14. File Permissions on Credential HelperLocation: Helper is written with 15. Import Alias Missing for goruntimeLocation: goruntime "runtime"The alias is used but not shown in the diff. Ensure it's properly imported. Positive Aspects
Test Coverage Recommendations
Documentation Suggestions
Commit Message Compliance✅ Follows Conventional Commits format: Overall AssessmentThis is a solid implementation of AWS credential support with good architecture and security practices. The critical issues around timing attacks and error message leakage should be addressed before merge. The ARN validation improvements and temp directory cleanup are important for robustness. The change to intercept all HTTPS traffic (not just credential hosts) is significant and should be carefully considered - it may warrant a separate PR or at minimum prominent documentation. Recommendation: Request changes for critical security issues, then approve after fixes. |
Pull Request Review: AWS Credential SupportThis PR adds AWS credential injection via IAM role assumption with automatic refresh. Overall, this is a well-designed and thoughtfully implemented feature that aligns well with the project's security model. The code quality is high, tests are comprehensive, and the documentation is thorough. ✅ StrengthsSecurity Design
Code Quality
Documentation
🔍 Issues & Suggestions1. Security: Credential Helper Script Depends on
|
- Use constant-time comparison for auth token to prevent timing attacks - Log detailed errors server-side, return generic message to clients - Improve ARN validation: check partition, account ID, role name - Add context cancellation check before STS AssumeRole call - Use named constant for credential refresh buffer (5 min) - Restrict credential helper permissions to 0700
Pull Request Review: AWS Credential SupportThis is a well-designed and thorough implementation of AWS credential support. The architecture is sound, with strong security considerations and comprehensive documentation. Below are my findings across several areas: Code Quality & ArchitectureStrengths:
Minor Issues:
SecurityExcellent security practices:
Considerations:
PerformanceWell-optimized:
Potential BugsNone critical, but verify:
Test CoverageStrong coverage:
Suggested additions:
DocumentationExcellent documentation:
Minor improvements:
Conventional CommitsAll commits follow the conventional commits format correctly with proper scopes and no Co-Authored-By lines (per CLAUDE.md). SummaryThis is production-ready code with excellent security practices, clean architecture, and comprehensive testing. The implementation elegantly solves the credential refresh problem using credential_process. Recommendation: Approve with optional minor enhancements for ARN validation and documentation. The main architectural decision—using credential_process over static environment variables—is the right choice for long-running agents and significantly improves security posture. |
Summary
AWS Credential Support
Adds agent grant aws command that stores IAM role configuration (not secrets). At runtime, containers receive credentials via credential_process - a small embedded binary fetches fresh credentials from the proxy on demand.
Usage:
Key design decisions:
Documentation Updates
Test plan