# 🚀 Syft Code Queue - Complete Tutorial

This notebook demonstrates both **Data Scientist** and **Data Owner** perspectives in a single, simplified walkthrough.

## Overview

Syft Code Queue enables secure code execution on remote datasites:
- **Data Scientists** submit code packages for execution
- **Data Owners** review and approve code before execution
- All execution happens in a sandboxed environment

## Architecture
```
Data Scientist → Submit Code → Data Owner Reviews → Approve → Execute → Results
```
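The flow above can be read as a small state machine. The sketch below is only an illustration: the status names match the ones that appear in the queue statistics later in this tutorial, but the transition table is an assumption inferred from the diagram, not the library's actual implementation.

```python
from enum import Enum


class Status(Enum):
    """Status names as they appear in the queue statistics in this tutorial."""
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions, inferred from the diagram (not taken from the library).
ALLOWED = {
    Status.PENDING: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.RUNNING},
    Status.RUNNING: {Status.COMPLETED, Status.FAILED},
    Status.REJECTED: set(),
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}


def advance(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the move skips a required step."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


# Walk one job through the happy path.
status = Status.PENDING
for step in (Status.APPROVED, Status.RUNNING, Status.COMPLETED):
    status = advance(status, step)
print(status.value)
```

The point of the table is that a job cannot go from `pending` straight to `running`: approval is a required step.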


## Setup

First, let's install and import the necessary components:


In [15]:
# Install syft-code-queue if needed
# !pip install -e .

import syft_code_queue as scq
from syft_code_queue import DataScientistAPI, DataOwnerAPI, JobStatus
import tempfile
from pathlib import Path
import time

print(f"✅ Syft Code Queue v{scq.__version__} loaded")


✅ Syft Code Queue v0.1.0 loaded


## Part 1: Data Scientist Perspective 🔬

As a data scientist, you want to run analysis on remote data without seeing the raw data.


In [16]:
# Initialize the Data Scientist API
ds_api = DataScientistAPI()
print(f"📊 Data Scientist: {ds_api.email}")


2025-06-29 14:18:09.962 | INFO     | syft_code_queue.data_scientist_api:__init__:54 - Initialized Data Scientist API for andrew@openmined.org


📊 Data Scientist: andrew@openmined.org


### Submit a Simple Analysis Job

Let's submit a privacy-safe analysis job:


In [17]:
# Simple analysis script
analysis_script = """
import json
import random

print("🔍 Starting privacy-safe analysis...")

# Simulate analyzing data and computing aggregate statistics.
# In a real scenario, this would read the datasite's data instead.
sample_data = [random.randint(1, 100) for _ in range(1000)]

results = {
    "count": len(sample_data),
    "mean": sum(sample_data) / len(sample_data),
    "min": min(sample_data),
    "max": max(sample_data),
    "analysis_type": "aggregate_statistics"
}

print(f"📈 Analysis Results: {results}")

# Save results to a file
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

print("✅ Analysis complete! Results saved to results.json")
"""

# Submit the job
job = ds_api.create_simple_job(
    target_email="andrew@openmined.org",  # The target datasite's email (your own for this single-user demo)
    script_content=analysis_script,
    name="Privacy-Safe Statistics",
    description="Compute aggregate statistics without accessing raw data",
    tags=["statistics", "privacy-safe", "aggregate"]
)

print("\n🎯 Job Submitted!")
print(f"   ID: {str(job.uid)[:8]}...")
print(f"   Name: {job.name}")
print(f"   Status: {job.status.value}")
print(f"   Target: {job.target_email}")


2025-06-29 14:18:13.474 | INFO     | syft_code_queue.client:submit_code:92 - Submitted job 'Privacy-Safe Statistics' to andrew@openmined.org
2025-06-29 14:18:13.474 | INFO     | syft_code_queue.data_scientist_api:submit_job:104 - Submitted job 'Privacy-Safe Statistics' to andrew@openmined.org (ID: a1debf39-feb5-4dc9-a0a9-2f9ed7bfe6f8)



🎯 Job Submitted!
   ID: a1debf39...
   Name: Privacy-Safe Statistics
   Status: pending
   Target: andrew@openmined.org
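Before submitting, it can help to confirm that a script runs cleanly and writes the file you expect. The sketch below is one way to do that locally with only the standard library; the `dry_run` helper and the `main.py` file name are hypothetical and not part of syft-code-queue. It is a convenience check, not a sandbox.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for an analysis script; any script that writes results.json works.
script = '''
import json
data = list(range(1, 1001))
results = {"count": len(data), "mean": sum(data) / len(data)}
with open("results.json", "w") as f:
    json.dump(results, f)
'''


def dry_run(script_content: str, timeout: float = 30.0) -> dict:
    """Run the script in a throwaway directory and return its results.json."""
    with tempfile.TemporaryDirectory() as tmp:
        script_path = Path(tmp) / "main.py"  # hypothetical file name
        script_path.write_text(script_content, encoding="utf-8")
        subprocess.run(
            [sys.executable, str(script_path)],
            cwd=tmp,              # relative paths resolve inside the temp dir
            check=True,           # raise if the script exits non-zero
            capture_output=True,
            timeout=timeout,
        )
        output = Path(tmp) / "results.json"
        return json.loads(output.read_text(encoding="utf-8"))


local_results = dry_run(script)
print(local_results)
```

A script that fails here would most likely fail on the datasite too, so this saves the data owner a review cycle.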


---

## Part 2: Data Owner Perspective 🏛️

As a data owner, you need to review and approve code before it runs on your datasite.


In [18]:
# Initialize the Data Owner API
do_api = DataOwnerAPI()
print(f"🏛️ Data Owner: {do_api.email}")

# Get queue statistics; per-status counts live under "job_counts"
stats = do_api.get_queue_stats()
print("\n📊 Queue Statistics:")
for status, count in stats["job_counts"].items():
    if count > 0:
        print(f"   {status.title()}: {count}")

# Show total jobs
print(f"\n📈 Total Jobs: {stats['total_jobs']}")


2025-06-29 14:18:20.179 | INFO     | syft_code_queue.app:__init__:52 - Initialized Code Queue App for andrew@openmined.org
2025-06-29 14:18:20.180 | INFO     | syft_code_queue.data_owner_api:__init__:55 - Initialized Data Owner API for andrew@openmined.org


🏛️ Data Owner: andrew@openmined.org

📊 Queue Statistics:
   Pending: 1
   Completed: 1

📈 Total Jobs: 2
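The cell above reads two keys from the statistics dictionary: `job_counts` and `total_jobs`. To make that shape concrete, here is a minimal sketch that builds a dictionary of the same form from a list of job records. The records and the `queue_stats` helper are hypothetical; how the library computes its statistics is not shown in this tutorial.

```python
from collections import Counter

# Hypothetical job records; only the status field matters here.
demo_jobs = [
    {"name": "Privacy-Safe Statistics", "status": "pending"},
    {"name": "Privacy-Safe Statistics", "status": "completed"},
]


def queue_stats(jobs: list) -> dict:
    """Summarize jobs as per-status counts plus a total."""
    counts = Counter(job["status"] for job in jobs)
    return {"job_counts": dict(counts), "total_jobs": len(jobs)}


demo_stats = queue_stats(demo_jobs)
for status, count in demo_stats["job_counts"].items():
    print(f"{status.title()}: {count}")
print(f"Total Jobs: {demo_stats['total_jobs']}")
```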


### Review and Approve Jobs


In [19]:
# List jobs waiting for approval
pending_jobs = do_api.list_pending_jobs()

print(f"⏳ Pending Jobs ({len(pending_jobs)} waiting for approval):")
print("=" * 70)

for i, job in enumerate(pending_jobs, 1):
    print(f"\n{i}. 📝 {job.name}")
    print(f"   ID: {str(job.uid)[:8]}...")
    print(f"   From: {job.requester_email}")
    print(f"   Description: {job.description or 'No description'}")
    if job.tags:
        print(f"   Tags: {', '.join(job.tags)}")
    print(f"   Submitted: {job.created_at.strftime('%Y-%m-%d %H:%M:%S')}")

# Approve jobs that look safe
if pending_jobs:
    print("\n✅ Approving safe-looking jobs...\n")
    
    for job in pending_jobs:
        job_id = str(job.uid)[:8]
        
        # Demo-only approval logic: tags and descriptions are supplied by the
        # requester, so a real review should also read the submitted code
        safe_tags = ["privacy-safe", "aggregate", "statistics"]
        has_safe_tags = any(tag in safe_tags for tag in (job.tags or []))
        
        if has_safe_tags or "aggregate" in (job.description or "").lower():
            success = do_api.approve_job(
                job_id, 
                reason="Approved: Privacy-safe aggregate analysis"
            )
            
            if success:
                print(f"✅ Approved: {job.name} ({job_id}...)")
            else:
                print(f"❌ Failed to approve: {job.name}")
        else:
            # Reject potentially risky jobs
            success = do_api.reject_job(
                job_id,
                reason="Rejected: Needs more specific privacy guarantees"
            )
            
            if success:
                print(f"🚫 Rejected: {job.name} ({job_id}...)")
            else:
                print(f"❌ Failed to reject: {job.name}")
                
else:
    print("ℹ️ No pending jobs to approve/reject.")


2025-06-29 14:18:23.612 | INFO     | syft_code_queue.data_owner_api:approve_job:248 - Approved job: Privacy-Safe Statistics


⏳ Pending Jobs (1 waiting for approval):

1. 📝 Privacy-Safe Statistics
   ID: a1debf39...
   From: andrew@openmined.org
   Description: Compute aggregate statistics without accessing raw data
   Tags: statistics, privacy-safe, aggregate
   Submitted: 2025-06-29 14:18:13

✅ Approving safe-looking jobs...

✅ Approved: Privacy-Safe Statistics (a1debf39...)
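The approval loop above decides from tags and the description, both of which the requester writes. One way to look at the code itself is a static pass over its imports. The sketch below uses Python's standard `ast` module; the `RISKY_MODULES` deny-list and both helper names are hypothetical, and how the submitted source is loaded from a job is left out because this tutorial does not show it.

```python
import ast

# Hypothetical deny-list; a real policy would be chosen by the data owner.
RISKY_MODULES = {"socket", "subprocess", "requests", "urllib", "shutil"}


def imported_modules(source: str) -> set:
    """Return the top-level module names imported anywhere in the source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def risky_imports(source: str) -> set:
    """Return the imported modules that appear on the deny-list."""
    return imported_modules(source) & RISKY_MODULES


submitted = "import json\nimport random\nfrom urllib import request\n"
print(sorted(imported_modules(submitted)))
print(sorted(risky_imports(submitted)))
```

A check like this only sees literal `import` statements. Code that calls `__import__` or `importlib` would slip past it, so it supports a human review rather than replacing one.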


### Process Queue and Execute Jobs


In [20]:
# Process one cycle of the queue (this executes approved jobs)
print("🔄 Processing queue cycle...\n")

cycle_results = do_api.process_queue_cycle()

print(f"📊 Processing Results:")
if cycle_results.get("success"):
    print(f"   ✅ {cycle_results.get('message', 'Queue processed successfully')}")
else:
    print(f"   ❌ Error: {cycle_results.get('error', 'Unknown error')}")

# Wait a moment for processing
time.sleep(2)


2025-06-29 14:18:26.683 | INFO     | syft_code_queue.app:run:69 - Starting queue processing cycle...
2025-06-29 14:18:26.685 | DEBUG    | syft_code_queue.app:_log_pending_jobs:99 - No jobs pending approval
2025-06-29 14:18:26.686 | INFO     | syft_code_queue.app:_execute_approved_jobs:125 - 🚀 Executing 1 approved job(s)
2025-06-29 14:18:26.687 | INFO     | syft_code_queue.app:_execute_single_job:133 - Starting execution of job: Privacy-Safe Statistics
2025-06-29 14:18:26.690 | INFO     | syft_code_queue.runner:run_job:60 - Starting execution of job a1debf39-feb5-4dc9-a0a9-2f9ed7bfe6f8 in /Users/atrask/SyftBox/datasites/andrew@openmined.org/app_data/code-queue/jobs/a1debf39-feb5-4dc9-a0a9-2f9ed7bfe6f8/code
2025-06-29

🔄 Processing queue cycle...

📊 Processing Results:
   ✅ Queue processed successfully
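The cell above waits a fixed two seconds after processing. For longer jobs, polling with a timeout is usually more reliable. The sketch below is a generic helper; `wait_for` is a hypothetical name, and the status source is faked with an iterator so the example runs on its own.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Call check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real status lookup: reports "completed" on the third poll.
polls = iter(["approved", "running", "completed"])
finished = wait_for(lambda: next(polls) == "completed", timeout=5.0, interval=0.01)
print(finished)
```

In this notebook, the `check` callable could compare the status returned by `ds_api.get_job(...)` against `JobStatus.completed`, the same calls used in Part 3.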


---

## Part 3: Results and Monitoring 📊

Check job status and retrieve results from both perspectives.


In [21]:
# Check status of our submitted jobs
print("📋 Checking job status...\n")

my_jobs = ds_api.list_my_jobs()

for job in my_jobs:
    # Get updated job info
    updated_job = ds_api.get_job(str(job.uid))
    if updated_job:
        print(f"🔸 {updated_job.name}")
        print(f"   Status: {updated_job.status.value}")
        
        if updated_job.status == JobStatus.completed:
            print(f"   ✅ Completed in {updated_job.duration:.1f}s" if updated_job.duration else "   ✅ Completed")
            
            # Try to get results
            try:
                results = ds_api.get_job_results(str(updated_job.uid))
                if results:
                    print(f"   📊 Results available: {list(results.keys())}")
                    
                    # Show some sample results
                    if 'stdout' in results:
                        stdout = results['stdout'][:200] + "..." if len(results['stdout']) > 200 else results['stdout']
                        print(f"   📝 Output preview:")
                        print(f"      {stdout}")
                
            except Exception as e:
                print(f"   ⚠️ Could not retrieve results: {e}")
                
        elif updated_job.status == JobStatus.failed:
            print(f"   ❌ Failed: {updated_job.error_message}")
        elif updated_job.status == JobStatus.rejected:
            print(f"   🚫 Rejected: {updated_job.error_message}")
        elif updated_job.status == JobStatus.running:
            print(f"   🏃 Currently running...")
        else:
            print(f"   ⏳ Status: {updated_job.status.value}")
            
        print()


📋 Checking job status...

🔸 Privacy-Safe Statistics
   Status: completed
   ✅ Completed in 0.0s
   📊 Results available: ['job', 'output_files', 'logs', 'error', 'output_path']

🔸 Privacy-Safe Statistics
   Status: completed
   ✅ Completed in 0.0s
   📊 Results available: ['job', 'output_files', 'logs', 'error', 'output_path']

🔸 Privacy-Safe Statistics
   Status: pending
   ⏳ Status: pending
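Note that the result keys printed above do not include `stdout`, so the output preview in the cell never runs. A more forgiving helper could try several candidate keys. The sketch below assumes, without confirmation from the library, that a key such as `logs` holds text or something printable; the `preview` function and the sample dictionary are hypothetical.

```python
from typing import Optional


def preview(results: dict, keys=("stdout", "logs"), limit: int = 200) -> Optional[str]:
    """Return a truncated preview of the first non-empty field among keys."""
    for key in keys:
        value = results.get(key)
        if value:
            text = value if isinstance(value, str) else str(value)
            return text[:limit] + "..." if len(text) > limit else text
    return None


# Hypothetical result dict shaped like the key list printed above.
fake_results = {"job": None, "output_files": [], "logs": "x" * 250, "error": None}
snippet = preview(fake_results)
print(len(snippet))  # 200 characters plus the "..." suffix
```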



In [22]:
# Get comprehensive view of all jobs as data owner
print("🏛️ Data Owner Dashboard")
print("=" * 40)

# Current queue status; per-status counts live under "job_counts"
status_emoji = {
    'pending': '⏳',
    'approved': '✅',
    'running': '🏃',
    'completed': '🎉',
    'failed': '❌',
    'rejected': '🚫'
}
stats = do_api.get_queue_stats()
print("\n📊 Current Queue Status:")
for status, count in stats["job_counts"].items():
    if count > 0:
        print(f"   {status_emoji.get(status, '•')} {status.title()}: {count}")

# Recent completed jobs
completed_jobs = do_api.list_completed_jobs(limit=5)
if completed_jobs:
    print("\n🎉 Recently Completed Jobs:")
    for job in completed_jobs:
        duration_str = f" ({job.duration:.1f}s)" if job.duration else ""
        print(f"   • {job.name} - {job.requester_email}{duration_str}")

# Check if there are any jobs that need attention
pending_count = len(do_api.list_pending_jobs())
running_count = len(do_api.list_running_jobs())

print("\n🔔 Attention Needed:")
if pending_count > 0:
    print(f"   ⏳ {pending_count} jobs waiting for approval")
if running_count > 0:
    print(f"   🏃 {running_count} jobs currently running")
if pending_count == 0 and running_count == 0:
    print(f"   ✅ All jobs processed!")

print(f"\n📈 Total Jobs Managed: {stats['total_jobs']}")


🏛️ Data Owner Dashboard

📊 Current Queue Status:
   🎉 Completed: 2

🎉 Recently Completed Jobs:
   • Privacy-Safe Statistics - andrew@openmined.org (0.0s)
   • Privacy-Safe Statistics - andrew@openmined.org (0.0s)

🔔 Attention Needed:
   ✅ All jobs processed!

📈 Total Jobs Managed: 2


## Summary 🎯

This tutorial demonstrated the complete workflow:

### ✅ What We Accomplished:

**As Data Scientist:**
- ✅ Submitted a privacy-safe analysis job
- ✅ Used appropriate tags and descriptions
- ✅ Monitored job status and retrieved results

**As Data Owner:**
- ✅ Reviewed submitted jobs
- ✅ Applied approval logic based on safety criteria
- ✅ Processed the execution queue
- ✅ Monitored the overall datasite

### 🔒 Security Features in Action:
- **Manual Approval**: Every job required explicit approval
- **Code Review**: The data owner can inspect submitted code before approving it
- **Safe Execution**: Jobs run in a sandboxed environment
- **Audit Trail**: All actions are logged and trackable

### 🚀 Key Benefits:
- **Simple**: Easy-to-use APIs for both roles
- **Secure**: Manual approval prevents unauthorized access
- **Flexible**: Supports both simple scripts and complex packages
- **Transparent**: Full visibility into job lifecycle

## Next Steps

1. **Set up SyftBox Apps** for automated queue processing
2. **Create approval workflows** specific to your organization
3. **Integrate with data governance** systems
4. **Build automation** for common approval patterns
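As a starting point for the approval automation mentioned in the last step, a policy can be written as a list of small rules. The sketch below is only an illustration: `JobRequest`, the two rule factories, and `evaluate` are hypothetical names, not part of syft-code-queue.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class JobRequest:
    """Hypothetical stand-in for a submitted job, not the library's job class."""
    name: str
    requester_email: str
    tags: List[str] = field(default_factory=list)
    script: str = ""


# A rule returns a rejection reason, or None when the job passes.
Rule = Callable[[JobRequest], Optional[str]]


def trusted_requester(allowed: set) -> Rule:
    def rule(job: JobRequest) -> Optional[str]:
        if job.requester_email in allowed:
            return None
        return f"unknown requester: {job.requester_email}"
    return rule


def max_script_length(limit: int) -> Rule:
    def rule(job: JobRequest) -> Optional[str]:
        if len(job.script) <= limit:
            return None
        return f"script longer than {limit} characters"
    return rule


def evaluate(job: JobRequest, rules: List[Rule]) -> Tuple[bool, List[str]]:
    """Approve only if every rule passes; otherwise collect all reasons."""
    reasons = [reason for reason in (rule(job) for rule in rules) if reason is not None]
    return (not reasons, reasons)


policy_rules = [trusted_requester({"andrew@openmined.org"}), max_script_length(5_000)]
demo_job = JobRequest("Privacy-Safe Statistics", "andrew@openmined.org", ["aggregate"], "print(1)")
print(evaluate(demo_job, policy_rules))
```

Collecting every failed rule, rather than stopping at the first, gives the requester a complete rejection reason in one round trip.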

The system is designed to be **simple, secure, and scalable** for real-world federated learning and privacy-preserving analytics!
