From f464bdc3550958f876dfeb7b9714b11f80e25624 Mon Sep 17 00:00:00 2001
From: shivasurya
Date: Fri, 28 Nov 2025 15:53:06 -0500
Subject: [PATCH] chore: Move R2_SETUP.md and SANDBOX.md to cpf_plans/kb/

These documentation files have been moved to the centralized knowledge base
repository for better organization and maintainability.

Files are now located at:
- cpf_plans/kb/R2_SETUP.md
- cpf_plans/kb/SANDBOX.md
---
 R2_SETUP.md | 219 ----------------------------------------------------
 SANDBOX.md  | 129 -------------------------------
 2 files changed, 348 deletions(-)
 delete mode 100644 R2_SETUP.md
 delete mode 100644 SANDBOX.md

diff --git a/R2_SETUP.md b/R2_SETUP.md
deleted file mode 100644
index 8752e95e..00000000
--- a/R2_SETUP.md
+++ /dev/null
@@ -1,219 +0,0 @@
-# Cloudflare R2 Setup for Stdlib Registries
-
-This document contains the configuration needed to set up Cloudflare R2 for hosting Python stdlib registries at `assets.codepathfinder.dev`.
-
-## ✅ What's Been Done
-
-- Created upload script: `sourcecode-parser/tools/upload_to_r2.sh`
-- Created GitHub Action workflow: `.github/workflows/stdlib-r2-upload.yml`
-- Updated version to `1.0.0`
-- Updated all URLs to use `assets.codepathfinder.dev`
-- Tested generation locally (8.2MB for Python 3.14, 190 modules)
-- Removed outdated Cloudflare Pages deploy workflow
-
-## 📋 What You Need to Configure
-
-### 1. Cloudflare R2 Bucket Setup
-
-**Bucket Name:** `code-pathfinder-assets`
-
-**Configuration:**
-- Region: Auto (Cloudflare handles this)
-- Public Access: Enabled via custom domain
-- CORS: Allow all origins (for browser access)
-
-### 2. Custom Domain Configuration
-
-**Domain:** `assets.codepathfinder.dev`
-
-**DNS Setup:**
-1. Go to Cloudflare DNS settings
-2. Add CNAME record:
-   - Name: `assets`
-   - Target: `` (provided by Cloudflare)
-   - Proxy: Enabled (orange cloud)
-
-**R2 Custom Domain:**
-1. Go to R2 bucket settings
-2. Add custom domain: `assets.codepathfinder.dev`
-3. Enable public access
-
-### 3. API Credentials
-
-You need to create R2 API tokens for the GitHub Action workflow.
-
-**Steps:**
-1. Go to Cloudflare Dashboard → R2 → Manage R2 API Tokens
-2. Create API Token with:
-   - Name: `code-pathfinder-github-actions`
-   - Permissions: `Edit` (for upload/sync)
-   - Scope: Bucket `code-pathfinder-assets`
-
-You'll receive:
-- **Access Key ID** - like AWS access key
-- **Secret Access Key** - like AWS secret key
-- **Account ID** - your Cloudflare account ID
-
-### 4. GitHub Secrets Configuration
-
-Add these secrets to your GitHub repository:
-
-**Navigate to:** `Settings` → `Secrets and variables` → `Actions` → `New repository secret`
-
-Add the following secrets:
-
-| Secret Name | Value | Description |
-|-------------|-------|-------------|
-| `R2_ACCOUNT_ID` | Your Cloudflare Account ID | Found in R2 dashboard |
-| `R2_ACCESS_KEY_ID` | Your R2 Access Key ID | From API token creation |
-| `R2_SECRET_ACCESS_KEY` | Your R2 Secret Access Key | From API token creation |
-
-## 🧪 Testing Locally (Optional)
-
-If you want to test the upload script locally before using GitHub Actions:
-
-```bash
-# Set environment variables
-export R2_ACCOUNT_ID="your-account-id"
-export R2_ACCESS_KEY_ID="your-access-key-id"
-export R2_SECRET_ACCESS_KEY="your-secret-access-key"
-
-# Install AWS CLI if not already installed
-# brew install awscli # macOS
-# apt-get install awscli # Linux
-
-# Run the upload script
-cd sourcecode-parser/tools
-./upload_to_r2.sh
-```
-
-## 📦 What Gets Uploaded
-
-**Directory Structure in R2:**
-```
-code-pathfinder-assets/
-└── registries/
-    ├── python3.11/
-    │   └── stdlib/
-    │       └── v1/
-    │           ├── manifest.json
-    │           ├── os_stdlib.json
-    │           ├── sys_stdlib.json
-    │           └── ... (all stdlib modules)
-    ├── python3.12/
-    │   └── stdlib/v1/... (similar structure)
-    └── python3.14/
-        └── stdlib/v1/... (similar structure)
-```
-
-**Size Estimates:**
-- Python 3.11: ~8-10 MB (190 modules)
-- Python 3.12: ~8-10 MB (190 modules)
-- Python 3.14: ~8.2 MB (190 modules)
-- **Total: ~25-30 MB**
-
-Well within Cloudflare R2's 10 GB free tier! ✅
-
-## 🚀 How It Works
-
-### GitHub Action Trigger
-
-The workflow runs automatically when:
-1. A new release is published (after binaries are built)
-2. Manual workflow dispatch (for testing)
-
-### Workflow Steps
-
-1. Checkout code
-2. Setup Python 3.11, 3.12, and 3.14
-3. Configure AWS CLI for R2
-4. Run `upload_to_r2.sh` which:
-   - Generates stdlib registries for each Python version
-   - Validates JSON files
-   - Uploads to R2 using `aws s3 sync --delete`
-5. Verify uploads
-6. Test public accessibility
-
-### URL Structure
-
-After upload, registries will be available at:
-- `https://assets.codepathfinder.dev/registries/python3.11/stdlib/v1/manifest.json`
-- `https://assets.codepathfinder.dev/registries/python3.12/stdlib/v1/manifest.json`
-- `https://assets.codepathfinder.dev/registries/python3.14/stdlib/v1/manifest.json`
-
-## ✅ Verification Checklist
-
-After setting up R2 and adding GitHub secrets:
-
-- [ ] R2 bucket `code-pathfinder-assets` created
-- [ ] Custom domain `assets.codepathfinder.dev` configured
-- [ ] Public access enabled on R2 bucket
-- [ ] R2 API token created with Edit permissions
-- [ ] GitHub secrets added (`R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`)
-- [ ] Test manual workflow dispatch
-- [ ] Verify public URLs are accessible
-
-## 🔄 Ongoing Maintenance
-
-### When to Regenerate
-
-Regenerate stdlib registries when:
-- New Python version is released (add to `PYTHON_VERSIONS` in scripts)
-- Generator improvements (better type inference)
-- Bug fixes in introspection
-
-### How to Regenerate
-
-**Option 1: GitHub Action (Recommended)**
-- Go to Actions tab → "Upload Stdlib Registries to R2"
-- Click "Run workflow" → Select branch → Run
-
-**Option 2: Local Upload**
-```bash
-export R2_ACCOUNT_ID="..."
-export R2_ACCESS_KEY_ID="..."
-export R2_SECRET_ACCESS_KEY="..."
-cd sourcecode-parser/tools
-./upload_to_r2.sh
-```
-
-## 💰 Cost Estimate
-
-**Storage:** 30 MB / 10 GB free tier = **$0**
-**Operations:** ~100/month / 1M free = **$0**
-**Egress:** Unlimited free = **$0**
-
-**Total Monthly Cost: $0** ✅
-
-## 📚 Related Files
-
-- Upload script: `sourcecode-parser/tools/upload_to_r2.sh`
-- Generator: `sourcecode-parser/tools/generate_stdlib_registry.py`
-- Test script: `sourcecode-parser/tools/test_generation_local.sh`
-- GitHub Action: `.github/workflows/stdlib-r2-upload.yml`
-- Go client: `sourcecode-parser/graph/callgraph/registry/stdlib_remote.go`
-- Builder integration: `sourcecode-parser/graph/callgraph/builder/builder.go`
-
-## 🐛 Troubleshooting
-
-### Upload fails with "Access Denied"
-- Verify R2 API token has Edit permissions
-- Check GitHub secrets are correctly set
-- Ensure token scope includes the correct bucket
-
-### Public URLs return 404
-- Verify custom domain is configured in R2
-- Check DNS CNAME record is set
-- Wait a few minutes for DNS propagation
-
-### Generation fails for Python version
-- Ensure Python version is installed on runner
-- Check `PYTHON_VERSIONS` array in scripts
-- Windows-only modules (msvcrt, winreg) will fail on Linux/macOS (this is expected)
-
-## 📞 Support
-
-For issues with:
-- **Cloudflare R2 setup:** Check Cloudflare documentation or support
-- **GitHub Actions:** Review workflow logs in Actions tab
-- **Code/scripts:** Open an issue on GitHub
diff --git a/SANDBOX.md b/SANDBOX.md
deleted file mode 100644
index 29b273da..00000000
--- a/SANDBOX.md
+++ /dev/null
@@ -1,129 +0,0 @@
-# Python Sandboxing with nsjail
-
-## Overview
-
-Code Pathfinder uses **nsjail** (Google's production-grade sandboxing tool) to safely execute untrusted Python DSL rules with maximum isolation.
-
-## Security Features
-
-✅ **Network Isolation**: All network access blocked (no socket connections, no HTTP requests)
-✅ **Filesystem Isolation**: Cannot read sensitive files (/etc/passwd, /etc/shadow, ~/.ssh/, etc.)
-✅ **Process Isolation**: Cannot see or interact with other processes (isolated PID namespace)
-✅ **Resource Limits**: CPU, memory, file size, and execution time limits enforced
-✅ **Environment Isolation**: Minimal environment variable exposure
-✅ **Read-Only System**: Cannot modify /usr, /lib, or system files
-
-## Installation Method
-
-**Built from source** (Alpine apk not available in Wolfi)
-- Source: https://github.com/google/nsjail.git (tag 3.4)
-- Build dependencies: flex, bison, protobuf-dev, libnl3-dev
-- Compiler warning `-Werror` removed for compatibility with GCC 15.2.0
-
-## Runtime Requirements
-
-### For Digital Ocean / Self-Hosted Deployments
-
-**Docker/Podman run command**:
-```bash
-podman run --cap-add=SYS_ADMIN your-image:tag
-```
-
-**Why CAP_SYS_ADMIN is needed**:
-- Required for Linux namespace creation (network, PID, mount, user, IPC, UTS)
-- Provides strongest isolation (95%+ attack surface reduction)
-- Used by Google internally for sandboxing untrusted code
-
-**Security note**: CAP_SYS_ADMIN is needed ONLY for the outer container to create nested namespaces. The Python code inside nsjail runs as UID 65534 (nobody) with ALL capabilities dropped and ALL namespaces isolated.
-
-### Configuration
-
-Set environment variable in Dockerfile (already configured):
-```dockerfile
-ENV PATHFINDER_SANDBOX_ENABLED=true
-```
-
-To disable sandbox (development only):
-```bash
-export PATHFINDER_SANDBOX_ENABLED=false
-```
-
-## nsjail Command Template
-
-The Go code (PR-02) will use this command template:
-
-```bash
-nsjail -Mo \
-  --user nobody \
-  --chroot /tmp/nsjail_root \
-  --iface_no_lo \
-  --disable_proc \
-  --bindmount_ro /usr:/usr \
-  --bindmount_ro /lib:/lib \
-  --bindmount /tmp:/tmp \
-  --cwd /tmp \
-  --rlimit_as 512 \
-  --rlimit_cpu 30 \
-  --rlimit_fsize 1 \
-  --rlimit_nofile 64 \
-  --time_limit 30 \
-  -- /usr/bin/python3 /tmp/rule.py
-```
-
-## Security Test Results
-
-All tests pass with 100% isolation:
-
-| Test | Result | Details |
-|------|--------|---------|
-| Network Access | ✅ BLOCKED | OSError: Network unreachable |
-| /etc/passwd | ✅ BLOCKED | FileNotFoundError |
-| /etc/shadow | ✅ BLOCKED | FileNotFoundError |
-| ~/.ssh/id_rsa | ✅ BLOCKED | FileNotFoundError |
-| /proc/self/environ | ✅ BLOCKED | FileNotFoundError |
-| PID Namespace | ✅ ISOLATED | Process sees itself as PID 1 |
-| Filesystem Write | ✅ READ-ONLY | Cannot write to /, /usr, /etc |
-| Environment Vars | ✅ MINIMAL | Only 1 var visible (LC_CTYPE) |
-
-## Python Version
-
-**Installed**: Python 3.13.9 (wolfi-base doesn't have 3.14 yet)
-- Goal was Python 3.14, actual is Python 3.13.9
-- Provides all necessary security features
-- Will upgrade to 3.14 when available in Wolfi repos
-
-## Build Details
-
-### Docker Image Size
-- Base image: cgr.dev/chainguard/wolfi-base
-- Added components: Python 3.13.9, nsjail (built from source), flex, bison
-- Final image: ~200-250MB (including build dependencies cleanup)
-
-### Build Time
-- nsjail compilation: ~2-3 minutes (includes kafel submodule)
-- Total Docker build: ~4-5 minutes
-
-## Troubleshooting
-
-### Error: "Operation not permitted"
-**Solution**: Run container with `--cap-add=SYS_ADMIN`
-
-### Error: "nsjail: command not found"
-**Solution**: Rebuild Docker image with latest Dockerfile
-
-### Error: "Cannot read /tmp/rule.py"
-**Solution**: Ensure file is created BEFORE entering nsjail sandbox
-
-## Next Steps (PR-02)
-
-1. Integrate nsjail into `dsl/loader.go`
-2. Add `buildNsjailCommand()` helper function
-3. Add `isSandboxEnabled()` environment check
-4. Update `/tmp/nsjail_root` creation in entrypoint.sh
-5. Add comprehensive Go tests
-
-## References
-
-- nsjail GitHub: https://github.com/google/nsjail
-- Tech Spec: /Users/shiva/src/shivasurya/cpf_plans/docs/planning/python-sandboxing/tech-spec.md
-- PR-01 Doc: /Users/shiva/src/shivasurya/cpf_plans/docs/planning/python-sandboxing/pr-details/PR-01-docker-nsjail-setup.md
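
The nsjail command template and the `PATHFINDER_SANDBOX_ENABLED` switch described in the removed SANDBOX.md can be sketched in a few lines. The snippet below is only an illustration in Python, not the Go code planned for PR-02: the names `is_sandbox_enabled` and `build_rule_command` are hypothetical, the flags are copied verbatim from the template in the patch, and treating an unset variable as "enabled" is an assumption that the document does not state.

```python
import os
import shlex

# Flags copied from the "nsjail Command Template" section of SANDBOX.md.
NSJAIL_TEMPLATE = [
    "nsjail", "-Mo",
    "--user", "nobody",
    "--chroot", "/tmp/nsjail_root",
    "--iface_no_lo",
    "--disable_proc",
    "--bindmount_ro", "/usr:/usr",
    "--bindmount_ro", "/lib:/lib",
    "--bindmount", "/tmp:/tmp",
    "--cwd", "/tmp",
    "--rlimit_as", "512",
    "--rlimit_cpu", "30",
    "--rlimit_fsize", "1",
    "--rlimit_nofile", "64",
    "--time_limit", "30",
]


def is_sandbox_enabled(env=None):
    """Hypothetical check of PATHFINDER_SANDBOX_ENABLED.

    Assumption: anything other than the literal "false" (including an
    unset variable) keeps the sandbox on.
    """
    env = os.environ if env is None else env
    value = env.get("PATHFINDER_SANDBOX_ENABLED", "true")
    return value.strip().lower() != "false"


def build_rule_command(rule_path="/tmp/rule.py", env=None):
    """Return the argv list used to run a rule, sandboxed when enabled."""
    python_cmd = ["/usr/bin/python3", rule_path]
    if not is_sandbox_enabled(env):
        # Development-only path: run the rule directly, without nsjail.
        return python_cmd
    return NSJAIL_TEMPLATE + ["--"] + python_cmd


if __name__ == "__main__":
    # Print the sandboxed command as a shell-quoted string for inspection.
    print(shlex.join(build_rule_command(env={})))
```

Returning an argument list rather than a single shell string means the rule path is never re-parsed by a shell, which is one reasonable way to keep the template and the path separate.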