# pycarta Seven Bridges Integration

This notebook demonstrates integration with the Seven Bridges Genomics platform for computational workflows.

## Prerequisites

- Valid Carta authentication (see `01_authentication.ipynb`)
- Seven Bridges platform access and API token
- Understanding of Seven Bridges Apps and Workflows
- Valid Seven Bridges project access

## Setup

In [None]:
import pycarta as pc
from pycarta import get_agent
import os
from datetime import datetime

# Ensure you're authenticated
# pc.login()  # Uncomment and authenticate as needed

print("Seven Bridges integration setup complete")
print("Note: Requires Seven Bridges platform access and credentials")

## Seven Bridges Authentication

Multiple ways to authenticate with Seven Bridges:

In [None]:
def demonstrate_sbg_authentication():
    """Demonstrate Seven Bridges authentication methods."""
    print("""
    SEVEN BRIDGES AUTHENTICATION:
    
    1. Via Carta Authentication (Recommended):
       If you have Seven Bridges credentials stored as Carta secrets
       or configured via environment variables, pc.login() handles SBG authentication automatically:
       
       import pycarta as pc
       pc.login()
       # You are now authorized to call SBG Apps and Workflows
    
    2. Via Environment Variables:
       Set these environment variables:
       
       export SB_API_ENDPOINT="https://api.sbgenomics.com/v2/"
       export SB_AUTH_TOKEN="your_sbg_token_here"
    
    3. Via Seven Bridges Credentials File:
       Create ~/.sevenbridges/credentials with:
       
       [default]
       api_endpoint = https://api.sbgenomics.com/v2/
       auth_token = your_token_here
    
    4. Via Carta Secrets (for portability):
       Store SBG credentials as Carta secrets:
       
       from pycarta.admin.secret import put_secret
       put_secret("sbg_api_endpoint", "https://api.sbgenomics.com/v2/")
       put_secret("sbg_auth_token", "your_token_here")
    
    Once authenticated through any method, you can access the SBG API:
    
    agent = pc.get_agent()
    sbg_api = agent.sbg_manager.api
    """)

demonstrate_sbg_authentication()
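
Before calling `pc.login()`, it can help to check which local credential source is present. The following is a minimal sketch using only the standard library. The helper name `find_sbg_credentials` is hypothetical and not part of pycarta; it assumes the environment variable names and the credentials file layout listed above, and it does not cover Carta secrets.

```python
import configparser
import os
import tempfile
from pathlib import Path


def find_sbg_credentials(env=None, credentials_path=None, profile="default"):
    """Return (source, api_endpoint) for the first credential source found, else None.

    The token itself is never returned, so the result is safe to print.
    Hypothetical helper for illustration; not a pycarta function.
    """
    env = os.environ if env is None else env

    # Environment variables (method 2 above).
    if env.get("SB_AUTH_TOKEN"):
        return ("environment", env.get("SB_API_ENDPOINT", ""))

    # Credentials file in INI format (method 3 above).
    if credentials_path is None:
        credentials_path = Path.home() / ".sevenbridges" / "credentials"
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)  # a missing file is silently skipped
    if parser.has_section(profile) and parser.get(profile, "auth_token", fallback=""):
        return ("credentials file", parser.get(profile, "api_endpoint", fallback=""))

    return None


# Demonstration against a throwaway credentials file and an empty environment.
with tempfile.TemporaryDirectory() as tmp:
    sample_file = Path(tmp) / "credentials"
    sample_file.write_text(
        "[default]\n"
        "api_endpoint = https://api.sbgenomics.com/v2/\n"
        "auth_token = your_token_here\n"
    )
    from_file = find_sbg_credentials(env={}, credentials_path=sample_file)
    from_nothing = find_sbg_credentials(env={}, credentials_path=Path(tmp) / "missing")

from_env = find_sbg_credentials(env={"SB_AUTH_TOKEN": "your_sbg_token_here"})
print(from_file, from_nothing, from_env)
```

Calling `find_sbg_credentials()` with no arguments inspects the real environment and `~/.sevenbridges/credentials`.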

## Working with Seven Bridges Projects

Explore and manage SBG projects:

In [None]:
def explore_sbg_projects():
    """Explore Seven Bridges projects and resources."""
    try:
        # Get authenticated agent and SBG API
        # agent = get_agent()
        # sbg_api = agent.sbg_manager.api
        
        print("""
        SBG PROJECT EXPLORATION:
        
        # List available projects
        projects = sbg_api.projects.query()
        print(f"Available projects: {len(projects)}")
        
        for project in projects[:5]:  # Show first 5
            print(f"  - {project.name} ({project.id})")
        
        # Get specific project
        project = sbg_api.projects.get("division/my-project")
        print(f"Project: {project.name}")
        print(f"Description: {project.description}")
        
        # List apps in project
        apps = sbg_api.apps.query(project=project)
        print(f"Available apps: {len(apps)}")
        
        for app in apps[:3]:  # Show first 3
            print(f"  - {app.name} (v{app.revision})")
        
        # List files in project
        files = sbg_api.files.query(project=project)
        print(f"Project files: {len(files)}")
        
        for file in files[:3]:  # Show first 3
            print(f"  - {file.name} ({file.size} bytes)")
        """)
        
    except Exception as e:
        print(f"SBG exploration error: {e}")
        print("Note: Requires valid SBG authentication and project access")

explore_sbg_projects()

## ExecutableApp - Single App Execution

Convert Seven Bridges Apps to Python functions:

In [None]:
from pycarta.sbg import ExecutableApp

def demonstrate_executable_app():
    """Demonstrate ExecutableApp usage."""
    print("""
    EXECUTABLE APP USAGE:
    
    # Step 1: Login and get SBG API
    pc.login()
    agent = get_agent()
    sbg_api = agent.sbg_manager.api
    
    # Step 2: Retrieve a Seven Bridges App
    sbg_app = sbg_api.apps.get("MyAppName", project="division/project")
    print(f"Retrieved app: {sbg_app.name}")
    
    # Step 3: Create an ExecutableApp with configuration
    app = ExecutableApp(
        sbg_app,
        cleanup=True,          # Delete uploaded/downloaded files after execution
        polling_freq=5.0,      # Check task status every 5 seconds
        overwrite_local=True,  # Overwrite local files with results
        overwrite_remote=True, # Overwrite remote files when uploading
        strict=True,           # Strict type checking (recommended)
    )
    
    # Step 4: Execute the app as a Python function
    # Example: app that takes an input file and a parameter
    result = app(
        input_file="data/myfile.csv",  # Local file - will be uploaded
        num_iterations=42,             # App parameter
        output_format="json"           # App parameter
    )
    
    print("App execution complete")
    print(f"Result files downloaded to: {result}")
    
    # Step 5: Access app documentation
    print(f"App description: {app.description}")
    print(f"App inputs: {app.inputs}")
    print(f"App outputs: {app.outputs}")
    """)

def demonstrate_app_configuration():
    """Demonstrate ExecutableApp configuration options."""
    print("""
    EXECUTABLE APP CONFIGURATION:
    
    Configuration Parameters:
    
    - cleanup (bool): 
        True: Delete uploaded files and downloaded results after execution
        False: Keep files for manual inspection
        Default: True (recommended for production)
    
    - polling_freq (float):
        How often to check if the SBG task is still running (seconds)
        Minimum: 3 seconds
        Default: 10 seconds
        Recommendation: 5-30 seconds depending on app duration
    
    - overwrite_local (bool):
        True: Overwrite local files with downloaded results
        False: Raise error if local files exist
        Default: False
    
    - overwrite_remote (bool):
        True: Overwrite remote files when uploading
        False: Use existing remote files if they exist
        Default: False
    
    - strict (bool):
        True: Strict type checking based on CWL specification
        False: Relaxed type checking (Python duck typing)
        Default: True (recommended - catches errors early)
    
    Example configurations for different use cases:
    
    # Development/debugging
    dev_app = ExecutableApp(sbg_app, 
                           cleanup=False,        # Keep files for inspection
                           polling_freq=3.0,     # Check frequently
                           overwrite_local=True, # Allow overwrites
                           strict=False)         # Relaxed for testing
    
    # Production pipeline
    prod_app = ExecutableApp(sbg_app,
                            cleanup=True,         # Clean up resources
                            polling_freq=30.0,    # Check less frequently
                            overwrite_local=False, # Prevent accidents
                            strict=True)          # Strict validation
    
    # Long-running analysis
    analysis_app = ExecutableApp(sbg_app,
                                cleanup=True,
                                polling_freq=60.0,  # Check every minute
                                overwrite_remote=False, # Reuse uploads
                                strict=True)
    """)

demonstrate_executable_app()
demonstrate_app_configuration()
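
The three example configurations above can be kept as named presets and expanded into keyword arguments. This is a sketch under assumptions: `PRESETS` and `app_options` are hypothetical names, the option values are copied from the examples above, and the 3-second minimum for `polling_freq` is the one stated in the configuration notes.

```python
MIN_POLLING_FREQ = 3.0  # seconds; minimum stated in the configuration notes

# Option sets copied from the example configurations above.
PRESETS = {
    "development": dict(cleanup=False, polling_freq=3.0, overwrite_local=True, strict=False),
    "production": dict(cleanup=True, polling_freq=30.0, overwrite_local=False, strict=True),
    "long_running": dict(cleanup=True, polling_freq=60.0, overwrite_remote=False, strict=True),
}


def app_options(preset, **overrides):
    """Return a fresh options dict for a preset, with overrides applied last."""
    options = {**PRESETS[preset], **overrides}
    if options["polling_freq"] < MIN_POLLING_FREQ:
        raise ValueError(
            f"polling_freq must be at least {MIN_POLLING_FREQ} seconds, "
            f"got {options['polling_freq']}"
        )
    return options


dev_options = app_options("development", polling_freq=5.0)
print(dev_options)

# With an authenticated session and a retrieved sbg_app, the options would be
# passed through as keyword arguments:
# app = ExecutableApp(sbg_app, **app_options("production"))
```

Keeping the presets in one place makes it harder for a notebook to drift from the production settings by accident.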

## ExecutableProject - Project-Wide Access

Convert entire Seven Bridges projects to Python classes:

In [None]:
from pycarta.sbg import ExecutableProject

def demonstrate_executable_project():
    """Demonstrate ExecutableProject usage."""
    print("""
    EXECUTABLE PROJECT USAGE:
    
    # Step 1: Login to pycarta (handles SBG auth automatically)
    pc.login()
    
    # Step 2: Create ExecutableProject
    sandbox = ExecutableProject(project="division/sandbox")
    
    # Step 3: Explore available apps
    # The project becomes a dynamic class with methods for each app
    print("Available apps in project:")
    for app_name in sandbox.apps:
        app = getattr(sandbox, app_name)
        print(f"  - {app_name}: {app.__doc__}")
    
    # Step 4: Execute apps as methods
    # Example: if project has a "hello_world" app
    result = sandbox.hello_world(
        greeting="Hello from pycarta!",
        output_format="text"
    )
    
    # Example: if project has a data processing app
    analysis_result = sandbox.data_processor(
        input_data="dataset.csv",
        analysis_type="statistical",
        parameters={"confidence": 0.95, "method": "t-test"}
    )
    
    print(f"Analysis complete: {analysis_result}")
    """)

def demonstrate_project_features():
    """Demonstrate ExecutableProject advanced features."""
    print("""
    EXECUTABLE PROJECT FEATURES:
    
    # Introspection and documentation
    project = ExecutableProject("division/my-project")
    
    # List all available apps
    apps = project.list_apps()
    print(f"Available apps: {apps}")
    
    # Get app documentation
    app_doc = project.get_app_documentation("my_analysis_app")
    print(f"App documentation:\\n{app_doc}")
    
    # Get app signature (inputs/outputs)
    signature = project.get_app_signature("my_analysis_app")
    print(f"Inputs: {signature['inputs']}")
    print(f"Outputs: {signature['outputs']}")
    
    # Execute with configuration
    project.configure(
        cleanup=True,
        polling_freq=15.0,
        overwrite_local=True
    )
    
    # Chain multiple apps in a workflow
    # Step 1: Preprocess data
    preprocessed = project.data_preprocessor(
        raw_data="input.csv",
        normalize=True,
        remove_outliers=True
    )
    
    # Step 2: Run analysis on preprocessed data
    results = project.statistical_analysis(
        processed_data=preprocessed,
        test_type="anova",
        alpha=0.05
    )
    
    # Step 3: Generate report
    report = project.report_generator(
        analysis_results=results,
        format="pdf",
        include_plots=True
    )
    
    print(f"Workflow complete: {report}")
    """)

demonstrate_executable_project()
demonstrate_project_features()
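
The idea that a project "becomes a dynamic class with methods for each app" can be illustrated without any SBG access. The sketch below is a toy stand-in, not pycarta's implementation: `AppNamespace` and the local `hello_world` function are hypothetical, and the dispatch relies on Python's standard `__getattr__` hook.

```python
class AppNamespace:
    """Toy stand-in for a project object: each registered app is callable as a method."""

    def __init__(self, apps):
        self._apps = dict(apps)

    @property
    def apps(self):
        """Sorted names of the registered apps."""
        return sorted(self._apps)

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails, so real attributes
        # and properties keep working. Reading through __dict__ avoids
        # recursion if _apps has not been set yet.
        apps = self.__dict__.get("_apps", {})
        try:
            return apps[name]
        except KeyError:
            raise AttributeError(f"no app named {name!r}") from None


def hello_world(greeting, output_format="text"):
    """Local stand-in for an app; no Seven Bridges call is made."""
    return {"message": greeting, "format": output_format}


sandbox = AppNamespace({"hello_world": hello_world})

for app_name in sandbox.apps:
    print(f"  - {app_name}: {getattr(sandbox, app_name).__doc__}")

result = sandbox.hello_world(greeting="Hello from pycarta!")
print(result)
```

Unknown names raise `AttributeError`, so `hasattr(sandbox, "some_app")` works as a cheap existence check.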

## File Management

Automatic file upload/download handling:

In [None]:
def demonstrate_file_management():
    """Demonstrate automatic file management in SBG integration."""
    print("""
    AUTOMATIC FILE MANAGEMENT:
    
    pycarta.sbg handles file upload/download automatically:
    
    1. File Upload (Input Files):
       - Local files are automatically uploaded to SBG before execution
       - Files are uploaded to the project workspace
       - Duplicate files are handled based on overwrite_remote setting
       - Upload progress can be monitored
    
    # Example: Local file will be uploaded automatically
    result = project.sequence_aligner(
        reference_genome="/local/path/hg38.fa",     # Will be uploaded
        reads="/local/path/sample_reads.fastq",     # Will be uploaded
        output_format="bam"
    )
    
    2. File Download (Output Files):
       - Result files are automatically downloaded after execution
       - Files are downloaded to local working directory
       - Download location can be customized
       - Overwrites handled based on overwrite_local setting
    
    # Result contains paths to downloaded files
    print(f"Downloaded files: {result}")
    # Example output: {
    #   "aligned_reads": "/local/output/aligned.bam",
    #   "alignment_stats": "/local/output/stats.txt"
    # }
    
    3. File Management Options:
    
    # Custom download directory
    app = ExecutableApp(sbg_app, download_dir="/custom/output/path")
    
    # Keep remote files for reuse
    app = ExecutableApp(sbg_app, cleanup=False)
    
    # Handle large files efficiently
    app = ExecutableApp(sbg_app, 
                       chunk_size=1024*1024,  # 1MB chunks
                       parallel_uploads=4)     # Parallel upload streams
    
    4. File Type Handling:
    
    # Single files
    result = app(input_file="data.csv")
    
    # Multiple files
    result = app(input_files=["file1.txt", "file2.txt", "file3.txt"])
    
    # Mixed inputs (files and parameters)
    result = app(
        input_file="data.csv",
        reference_file="reference.txt",
        threshold=0.05,
        output_format="json"
    )
    
    5. Error Handling:
    
    try:
        result = app(input_file="nonexistent.csv")
    except FileNotFoundError:
        print("Input file not found")
    except PermissionError:
        print("Permission denied for file access")
    except Exception as e:
        print(f"Execution error: {e}")
    """)

def demonstrate_advanced_file_operations():
    """Demonstrate advanced file operations."""
    print("""
    ADVANCED FILE OPERATIONS:
    
    1. Working with Large Files:
    
    # Configure for large file handling
    large_file_app = ExecutableApp(sbg_app,
                                  chunk_size=10*1024*1024,  # 10MB chunks
                                  timeout=3600,             # 1 hour timeout
                                  retry_attempts=3)         # Retry on failure
    
    result = large_file_app(
        large_dataset="/path/to/100GB_dataset.h5",
        processing_type="genomic_analysis"
    )
    
    2. Batch Processing:
    
    # Process multiple files in batch
    batch_files = [
        "sample_001.fastq",
        "sample_002.fastq", 
        "sample_003.fastq"
    ]
    
    results = []
    for file in batch_files:
        result = project.sequence_processor(
            input_file=file,
            quality_threshold=30,
            trim_adapters=True
        )
        results.append(result)
    
    3. Workflow Chaining with Files:
    
    # Step 1: Quality control
    qc_result = project.quality_control(
        raw_reads="sample.fastq",
        min_quality=20
    )
    
    # Step 2: Use QC output as input for alignment
    alignment_result = project.read_aligner(
        cleaned_reads=qc_result["cleaned_reads"],  # Output from step 1
        reference_genome="hg38.fa",
        alignment_method="bwa"
    )
    
    # Step 3: Variant calling on aligned reads
    variants = project.variant_caller(
        aligned_reads=alignment_result["alignment_file"],  # Output from step 2
        reference_genome="hg38.fa",
        min_coverage=10
    )
    
    4. Custom File Handling:
    
    # Custom preprocessing before upload
    def preprocess_file(filepath):
        # Custom file processing logic
        processed_path = filepath.replace('.txt', '_processed.txt')
        # ... processing code ...
        return processed_path
    
    preprocessed_file = preprocess_file("raw_data.txt")
    result = app(input_file=preprocessed_file)
    
    # Custom postprocessing after download
    def postprocess_results(result_files):
        # Custom result processing
        for file_path in result_files.values():
            # ... postprocessing code ...
            pass
        return "Processing complete"
    
    postprocess_results(result)
    """)

demonstrate_file_management()
demonstrate_advanced_file_operations()
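
Because local inputs are uploaded before a task starts, it is cheaper to catch a missing file locally than to wait for a `FileNotFoundError` during submission. The following is a minimal pre-flight sketch. The helper `missing_input_files` is hypothetical, and the caller is assumed to know which keyword arguments are file inputs.

```python
import tempfile
from pathlib import Path


def missing_input_files(inputs, file_keys):
    """Return {input_name: [missing paths]} for the file inputs named in file_keys.

    Accepts a single path or a list/tuple of paths per input. Non-file
    parameters are ignored because they are not listed in file_keys.
    """
    missing = {}
    for key in file_keys:
        value = inputs.get(key)
        if value is None:
            continue
        paths = value if isinstance(value, (list, tuple)) else [value]
        absent = [str(p) for p in paths if not Path(p).is_file()]
        if absent:
            missing[key] = absent
    return missing


# Demonstration with one file that exists and one that does not.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "data.csv"
    present.write_text("a,b\n1,2\n")
    inputs = {
        "input_file": str(present),
        "reference_file": str(Path(tmp) / "reference.txt"),  # never created
        "threshold": 0.05,
    }
    problems = missing_input_files(inputs, file_keys=["input_file", "reference_file"])

print(problems)
# Only submit when nothing is missing:
# if not problems:
#     result = app(**inputs)
```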

## Progress Tracking and Monitoring

Monitor execution progress and handle long-running tasks:

In [None]:
def demonstrate_progress_tracking():
    """Demonstrate progress tracking for SBG tasks."""
    print("""
    PROGRESS TRACKING AND MONITORING:
    
    1. Built-in Progress Monitoring:
    
    # ExecutableApp automatically polls task status
    app = ExecutableApp(sbg_app, 
                       polling_freq=10.0,    # Check every 10 seconds
                       verbose=True)         # Enable progress output
    
    result = app(input_file="large_dataset.csv")
    # Output:
    # Task submitted: task_id_12345
    # Status: QUEUED (0:00:15)
    # Status: RUNNING (0:02:30)
    # Status: RUNNING (0:05:45)
    # Status: COMPLETED (0:08:20)
    # Downloading results...
    # Execution complete
    
    2. Custom Progress Callbacks:
    
    def progress_callback(task_status, elapsed_time):
        print(f"[{elapsed_time}] Task status: {task_status}")
        
        if task_status == "RUNNING":
            print("  Processing data...")
        elif task_status == "QUEUED":
            print("  Waiting for resources...")
        elif task_status == "COMPLETED":
            print("  ✓ Task completed successfully")
        elif task_status == "FAILED":
            print("  ✗ Task failed")
    
    app = ExecutableApp(sbg_app, progress_callback=progress_callback)
    result = app(input_file="data.csv")
    
    3. Long-Running Task Management:
    
    # For very long tasks, save task ID for later retrieval
    app = ExecutableApp(sbg_app, 
                       save_task_info=True,   # Save task metadata
                       task_info_file="my_task.json")  # Custom save location
    
    # Start the task without waiting for it to finish (returns immediately)
    task_info = app.start_async(
        input_file="huge_dataset.csv",
        analysis_type="comprehensive"
    )
    
    print(f"Task started: {task_info['task_id']}")
    print(f"You can check status later with: app.check_status('{task_info['task_id']}')")
    
    # Later, check status and retrieve results
    status = app.check_status(task_info['task_id'])
    if status == "COMPLETED":
        results = app.download_results(task_info['task_id'])
    
    4. Error Handling and Retry Logic:
    
    app = ExecutableApp(sbg_app,
                       max_retries=3,           # Retry failed tasks
                       retry_delay=60,          # Wait 60s between retries
                       timeout=7200)            # 2 hour maximum execution time
    
    try:
        result = app(input_file="problematic_data.csv")
    except TimeoutError:
        print("Task timed out after 2 hours")
    except ExecutionError as e:
        print(f"Task failed after {app.max_retries} retries: {e}")
        # Access error details
        print(f"Error log: {e.error_log}")
        print(f"Failed task ID: {e.task_id}")
    
    5. Resource Monitoring:
    
    # Monitor resource usage during execution
    app = ExecutableApp(sbg_app, monitor_resources=True)
    
    result = app(input_file="data.csv")
    
    # Access resource usage statistics
    stats = app.get_execution_stats()
    print(f"Execution time: {stats['duration']}")
    print(f"CPU usage: {stats['cpu_hours']}")
    print(f"Memory peak: {stats['max_memory_gb']} GB")
    print(f"Storage used: {stats['storage_gb']} GB")
    print(f"Cost estimate: ${stats['estimated_cost']}")
    """)

demonstrate_progress_tracking()
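
The polling behaviour described above (check the status every `polling_freq` seconds, report progress, give up after a timeout) can be sketched generically. This is an illustration rather than pycarta's internal loop: `wait_for_task` and its parameters are hypothetical, and only the `COMPLETED` and `FAILED` statuses from the examples above are treated as terminal.

```python
import time

TERMINAL_STATES = {"COMPLETED", "FAILED"}  # statuses taken from the examples above


def wait_for_task(get_status, polling_freq=10.0, timeout=None, on_update=None,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal status, then return it.

    on_update(status, elapsed_seconds) is called after every check.
    Raises TimeoutError once `timeout` seconds have elapsed without a
    terminal status. `sleep` and `clock` are injectable for testing.
    """
    start = clock()
    while True:
        status = get_status()
        elapsed = clock() - start
        if on_update is not None:
            on_update(status, elapsed)
        if status in TERMINAL_STATES:
            return status
        if timeout is not None and elapsed >= timeout:
            raise TimeoutError(f"task still {status} after {elapsed:.0f} seconds")
        sleep(polling_freq)


# Demonstration with a scripted status sequence and a no-op sleep.
scripted = iter(["QUEUED", "RUNNING", "RUNNING", "COMPLETED"])
seen = []
final_status = wait_for_task(
    get_status=lambda: next(scripted),
    polling_freq=10.0,
    on_update=lambda status, elapsed: seen.append(status),
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(final_status, seen)
```

Passing `sleep` and `clock` as arguments keeps the loop testable without waiting in real time.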

## Real-World Example: Genomics Pipeline

Complete example of a genomics analysis pipeline:

In [None]:
def genomics_pipeline_example():
    """Complete genomics analysis pipeline example."""
    print("""
    GENOMICS ANALYSIS PIPELINE:
    
    This example demonstrates a complete genomics workflow using
    Seven Bridges Apps through pycarta integration.
    
    # Setup
    import pycarta as pc
    from pycarta.sbg import ExecutableProject
    import os
    
    # Authenticate
    pc.login()
    
    # Connect to genomics project
    genomics_project = ExecutableProject(project="my-org/genomics-pipeline")
    
    # Sample metadata
    samples = [
        {
            "sample_id": "SAMPLE_001",
            "reads_1": "data/sample_001_R1.fastq.gz",
            "reads_2": "data/sample_001_R2.fastq.gz",
            "phenotype": "case"
        },
        {
            "sample_id": "SAMPLE_002", 
            "reads_1": "data/sample_002_R1.fastq.gz",
            "reads_2": "data/sample_002_R2.fastq.gz",
            "phenotype": "control"
        },
        {
            "sample_id": "SAMPLE_003",
            "reads_1": "data/sample_003_R1.fastq.gz",
            "reads_2": "data/sample_003_R2.fastq.gz",
            "phenotype": "case"
        }
    ]
    
    # Reference files
    reference_genome = "references/hg38.fa"
    known_variants = "references/dbsnp.vcf"
    
    # Step 1: Quality Control
    print("Step 1: Running quality control...")
    qc_results = []
    
    for sample in samples:
        qc_result = genomics_project.fastqc_quality_control(
            reads_1=sample["reads_1"],
            reads_2=sample["reads_2"],
            sample_id=sample["sample_id"]
        )
        qc_results.append(qc_result)
        print(f"  QC complete for {sample['sample_id']}")
    
    # Step 2: Read Trimming and Filtering
    print("Step 2: Trimming and filtering reads...")
    trimmed_results = []
    
    for sample in samples:
        trimmed_result = genomics_project.trimmomatic_trimmer(
            reads_1=sample["reads_1"],
            reads_2=sample["reads_2"],
            quality_threshold=20,
            min_length=50,
            adapter_file="adapters/TruSeq3-PE.fa"
        )
        trimmed_results.append(trimmed_result)
        print(f"  Trimming complete for {sample['sample_id']}")
    
    # Step 3: Read Alignment
    print("Step 3: Aligning reads to reference genome...")
    alignment_results = []
    
    for i, sample in enumerate(samples):
        alignment_result = genomics_project.bwa_mem_aligner(
            reads_1=trimmed_results[i]["trimmed_reads_1"],
            reads_2=trimmed_results[i]["trimmed_reads_2"],
            reference_genome=reference_genome,
            sample_id=sample["sample_id"],
            read_group_info=f"@RG\\tID:{sample['sample_id']}\\tSM:{sample['sample_id']}"
        )
        alignment_results.append(alignment_result)
        print(f"  Alignment complete for {sample['sample_id']}")
    
    # Step 4: Post-alignment Processing
    print("Step 4: Post-alignment processing...")
    processed_bams = []
    
    for i, sample in enumerate(samples):
        # Sort and mark duplicates
        processed_bam = genomics_project.picard_process_bam(
            input_bam=alignment_results[i]["aligned_bam"],
            reference_genome=reference_genome,
            mark_duplicates=True,
            sort_order="coordinate"
        )
        
        # Base quality score recalibration
        recalibrated_bam = genomics_project.gatk_base_recalibration(
            input_bam=processed_bam["processed_bam"],
            reference_genome=reference_genome,
            known_sites=known_variants
        )
        
        processed_bams.append(recalibrated_bam)
        print(f"  Processing complete for {sample['sample_id']}")
    
    # Step 5: Variant Calling
    print("Step 5: Calling variants...")
    variant_results = []
    
    for i, sample in enumerate(samples):
        variants = genomics_project.gatk_haplotype_caller(
            input_bam=processed_bams[i]["recalibrated_bam"],
            reference_genome=reference_genome,
            sample_id=sample["sample_id"],
            emit_ref_confidence="GVCF",
            min_base_quality=20
        )
        variant_results.append(variants)
        print(f"  Variant calling complete for {sample['sample_id']}")
    
    # Step 6: Joint Genotyping
    print("Step 6: Joint genotyping...")
    gvcf_files = [result["output_gvcf"] for result in variant_results]
    
    joint_vcf = genomics_project.gatk_joint_genotyping(
        input_gvcfs=gvcf_files,
        reference_genome=reference_genome,
        output_name="cohort_variants"
    )
    
    # Step 7: Variant Filtering
    print("Step 7: Filtering variants...")
    filtered_vcf = genomics_project.gatk_variant_filtration(
        input_vcf=joint_vcf["joint_vcf"],
        reference_genome=reference_genome,
        filter_expressions=[
            "QD < 2.0",
            "FS > 60.0", 
            "MQ < 40.0",
            "ReadPosRankSum < -8.0"
        ]
    )
    
    # Step 8: Annotation
    print("Step 8: Annotating variants...")
    annotated_vcf = genomics_project.snpeff_annotator(
        input_vcf=filtered_vcf["filtered_vcf"],
        genome_version="hg38",
        include_statistics=True
    )
    
    # Step 9: Statistical Analysis
    print("Step 9: Statistical analysis...")
    
    # Prepare phenotype file
    phenotype_data = "\\n".join([
        "sample_id\\tphenotype",
        *[f"{s['sample_id']}\\t{s['phenotype']}" for s in samples]
    ])
    
    with open("phenotypes.txt", "w") as f:
        f.write(phenotype_data)
    
    # Association analysis
    association_results = genomics_project.plink_association(
        input_vcf=annotated_vcf["annotated_vcf"],
        phenotype_file="phenotypes.txt",
        test_type="logistic",
        significance_threshold=5e-8
    )
    
    # Step 10: Generate Report
    print("Step 10: Generating analysis report...")
    final_report = genomics_project.analysis_reporter(
        variant_file=annotated_vcf["annotated_vcf"],
        association_results=association_results["association_stats"],
        qc_metrics=qc_results,
        sample_info=samples,
        report_format="html",
        include_plots=True
    )
    
    print("\\nGenomics pipeline complete!")
    print(f"Final report: {final_report['report_html']}")
    print(f"Significant variants: {association_results['significant_variants']}")
    
    # Summarize the run
    summary = {
        "samples_processed": len(samples),
        "variants_called": joint_vcf["variant_count"],
        "filtered_variants": filtered_vcf["filtered_count"],
        "significant_associations": association_results["significant_count"],
        "report_file": final_report["report_html"]
    }
    """)

genomics_pipeline_example()
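
Step 9 of the pipeline builds the phenotype file by joining strings by hand. A small sketch using the standard `csv` module does the same job and takes care of quoting if a value ever contains a tab. The function name `phenotype_table` is hypothetical; the sample dictionaries follow the structure used in the pipeline above.

```python
import csv
import io


def phenotype_table(samples):
    """Return a tab-separated phenotype table (header plus one row per sample)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_id", "phenotype"])
    for sample in samples:
        writer.writerow([sample["sample_id"], sample["phenotype"]])
    return buffer.getvalue()


samples = [
    {"sample_id": "SAMPLE_001", "phenotype": "case"},
    {"sample_id": "SAMPLE_002", "phenotype": "control"},
    {"sample_id": "SAMPLE_003", "phenotype": "case"},
]

table_text = phenotype_table(samples)
print(table_text, end="")

# To write the file used by the association step:
# with open("phenotypes.txt", "w", newline="") as f:
#     f.write(table_text)
```

Unlike the manual join, this version ends the table with a trailing newline.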

## Integration with Other pycarta Modules

Combining Seven Bridges with other pycarta features:

In [None]:
def demonstrate_integration():
    """Demonstrate integration with other pycarta modules."""
    print("""
    INTEGRATION WITH OTHER PYCARTA MODULES:
    
    1. With FormsDB (Data Management):
    
    import pycarta as pc
    from pycarta.formsdb import FormsDb
    from pycarta.sbg import ExecutableProject
    
    # Store analysis metadata in FormsDB
    pc.login()
    formsdb = FormsDb(credentials=pc.get_agent(), project_id="genomics")
    genomics = ExecutableProject("division/genomics")
    
    # Create schema for analysis tracking
    analysis_schema = formsdb.schema.create("sbg-analysis", {
        "type": "object",
        "properties": {
            "analysis_id": {"type": "string"},
            "sample_ids": {"type": "array", "items": {"type": "string"}},
            "pipeline_version": {"type": "string"},
            "start_time": {"type": "string", "format": "date-time"},
            "end_time": {"type": "string", "format": "date-time"},
            "results": {"type": "object"},
            "resource_usage": {"type": "object"}
        }
    })
    
    # Run analysis and store metadata
    start_time = datetime.now()
    
    result = genomics.variant_caller(
        input_bam="sample.bam",
        reference="hg38.fa"
    )
    
    end_time = datetime.now()
    
    # Store analysis record
    analysis_record = {
        "analysis_id": "ANALYSIS_001",
        "sample_ids": ["SAMPLE_001"],
        "pipeline_version": "v2.1",
        "start_time": start_time.isoformat(),
        "end_time": end_time.isoformat(),
        "results": result,
        "resource_usage": genomics.get_last_execution_stats()
    }
    
    folder = formsdb.folder.create("genomics/analyses")
    formsdb.data.create(folder, analysis_schema, analysis_record)
    
    2. With Services (API Creation):
    
    import pycarta as pc
    from pycarta.sbg import ExecutableProject
    
    # Create service that wraps SBG functionality
    genomics = ExecutableProject("division/genomics")
    service = pc.service("genomics-api", "variant-calling")
    
    @service.post("/analyze/variants")
    def analyze_variants(sample_id: str, bam_file: str, reference: str = "hg38"):
        '''Run variant calling analysis via Seven Bridges.'''
        try:
            result = genomics.variant_caller(
                input_bam=bam_file,
                reference_genome=f"references/{reference}.fa",
                sample_id=sample_id
            )
            
            return {
                "status": "success",
                "sample_id": sample_id,
                "variant_file": result["output_vcf"],
                "stats": result["stats"]
            }
            
        except Exception as e:
            return {
                "status": "error",
                "error": str(e)
            }
    
    @service.get("/status/{analysis_id}")
    def get_analysis_status(analysis_id: str):
        '''Get status of running analysis.'''
        status = genomics.check_task_status(analysis_id)
        return {"analysis_id": analysis_id, "status": status}
    
    3. With MQTT (Real-time Updates):
    
    from pycarta.mqtt import publish
    from pycarta.sbg import ExecutableProject
    
    genomics = ExecutableProject("division/genomics")
    
    @publish("genomics/analysis/status")
    def run_analysis_with_updates(sample_id, input_files):
        '''Run analysis with real-time status updates.'''
        
        # Start notification payload (built but not sent in this sketch;
        # only the function's return value is handed to the decorator)
        start_msg = {
            "sample_id": sample_id,
            "status": "started",
            "timestamp": datetime.now().isoformat()
        }
        
        try:
            # Run analysis
            result = genomics.variant_caller(**input_files)
            
            # Publish completion
            return {
                "sample_id": sample_id,
                "status": "completed",
                "timestamp": datetime.now().isoformat(),
                "results": result
            }
            
        except Exception as e:
            # Publish error
            return {
                "sample_id": sample_id,
                "status": "failed",
                "timestamp": datetime.now().isoformat(),
                "error": str(e)
            }
    
    4. With Secrets Management:
    
    from pycarta.admin.secret import get_secret, put_secret
    
    # Store SBG credentials securely
    put_secret("sbg_token", "your_sbg_token_here")
    put_secret("sbg_project", "division/my-project")
    
    # Use stored credentials
    sbg_token = get_secret("sbg_token")
    sbg_project = get_secret("sbg_project")
    
    # Configure SBG connection
    genomics = ExecutableProject(
        project=sbg_project,
        token=sbg_token
    )
    
    5. Complete Integrated Workflow:
    
    # Comprehensive workflow combining all modules
    def integrated_genomics_workflow(sample_metadata):
        # 1. Store sample metadata in FormsDB
        sample_record = formsdb.data.create(
            folder, sample_schema, sample_metadata
        )
        
        # 2. Run analysis on Seven Bridges
        analysis_result = genomics.comprehensive_pipeline(
            **sample_metadata["files"]
        )
        
        # 3. Publish progress via MQTT
        publish_analysis_update({
            "sample_id": sample_metadata["id"],
            "status": "analysis_complete",
            "results_summary": analysis_result["summary"]
        })
        
        # 4. Store results back in FormsDB
        results_record = {
            "sample_id": sample_metadata["id"],
            "analysis_date": datetime.now().isoformat(),
            "results": analysis_result,
            "quality_metrics": analysis_result["qc"]
        }
        
        formsdb.data.create(
            results_folder, results_schema, results_record
        )
        
        # 5. Make results available via service API
        return {
            "sample_id": sample_metadata["id"],
            "status": "complete",
            "results_id": results_record["id"],
            "api_endpoint": f"/genomics/results/{sample_metadata['id']}"
        }
    """)

demonstrate_integration()
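
The analysis record stored in the FormsDB example can be assembled by a small wrapper that times any callable. This sketch makes two assumptions: `run_with_record` is a hypothetical helper, and timestamps are recorded in UTC so the `date-time` fields are unambiguous. The `resource_usage` field is left out because it depends on platform statistics.

```python
from datetime import datetime, timezone


def run_with_record(analysis_id, sample_ids, pipeline_version, run):
    """Call run() and return a record matching the analysis schema fields above.

    `run` is any zero-argument callable that returns the analysis results.
    """
    start_time = datetime.now(timezone.utc)
    results = run()
    end_time = datetime.now(timezone.utc)
    return {
        "analysis_id": analysis_id,
        "sample_ids": list(sample_ids),
        "pipeline_version": pipeline_version,
        "start_time": start_time.isoformat(),
        "end_time": end_time.isoformat(),
        "results": results,
    }


# Demonstration with a local stand-in for the Seven Bridges call.
record = run_with_record(
    analysis_id="ANALYSIS_001",
    sample_ids=["SAMPLE_001"],
    pipeline_version="v2.1",
    run=lambda: {"output_vcf": "sample.vcf"},
)
print(record)

# With a real project, the callable would wrap the app invocation, e.g.:
# run=lambda: genomics.variant_caller(input_bam="sample.bam", reference="hg38.fa")
```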

## Best Practices for Seven Bridges Integration

In [None]:
print("""
SEVEN BRIDGES INTEGRATION BEST PRACTICES:

1. Authentication and Security:
   - Store SBG credentials as Carta secrets for portability
   - Use environment variables for development
   - Never hardcode API tokens in code
   - Regularly rotate authentication tokens
   - Use least-privilege project access

2. Resource Management:
   - Set appropriate timeouts for long-running tasks
   - Use cleanup=True in production to manage storage costs
   - Monitor resource usage and costs
   - Use appropriate instance types for workloads
   - Implement retry logic with exponential backoff

3. File Management:
   - Validate file existence before starting tasks
   - Use checksums to verify file integrity
   - Implement efficient upload/download strategies
   - Clean up temporary files regularly
   - Use compression for large files when appropriate

4. Error Handling:
   - Implement comprehensive error handling
   - Log all task submissions and results
   - Set up monitoring and alerting
   - Provide meaningful error messages
   - Save task metadata for debugging

5. Performance Optimization:
   - Batch similar tasks when possible
   - Use appropriate polling frequencies
   - Cache frequently used reference files
   - Parallelize independent operations
   - Profile and optimize data transfer

6. Workflow Design:
   - Design modular, reusable workflows
   - Document all inputs and outputs
   - Version control workflow definitions
   - Test workflows with small datasets first
   - Implement checkpointing for long workflows

7. Integration:
   - Use FormsDB for metadata management
   - Create service APIs for workflow access
   - Implement MQTT for real-time updates
   - Store results in structured formats
   - Maintain data lineage and provenance

8. Testing and Validation:
   - Test with known datasets and expected results
   - Validate outputs against established benchmarks
   - Implement unit tests for workflow components
   - Use staging environments for testing
   - Document validation procedures

9. Monitoring and Maintenance:
   - Monitor task success/failure rates
   - Track resource usage and costs
   - Set up alerts for failures
   - Review workflows and performance regularly
   - Keep pycarta and its Seven Bridges client dependencies up to date

10. Documentation:
    - Document all workflow steps clearly
    - Provide example usage and expected outputs
    - Maintain changelog for workflow versions
    - Create troubleshooting guides
    - Document resource requirements and costs
""")
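
The "retry logic with exponential backoff" recommendation can be sketched in a few lines. This is a generic helper, not a pycarta feature: `retry_with_backoff` and its parameters are hypothetical, and the `sleep` argument is injectable so the example runs without real delays.

```python
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0, factor=2.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func() and retry on the given exceptions with growing delays.

    The delay starts at base_delay seconds and is multiplied by `factor`
    after every failed attempt. The last exception is re-raised once
    max_retries retries have been used up.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise
            sleep(delay)
            delay *= factor


# Demonstration: a submission that fails twice before succeeding.
attempts = {"count": 0}


def flaky_submit():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []
outcome = retry_with_backoff(
    flaky_submit,
    retry_on=(ConnectionError,),
    sleep=delays.append,  # record the delays instead of waiting
)
print(outcome, delays)
```

Restricting `retry_on` to transient errors avoids re-submitting tasks that fail for deterministic reasons, such as invalid inputs.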

## Summary

This notebook covered pycarta's Seven Bridges integration:

1. **Authentication** - Multiple methods for secure SBG access
2. **ExecutableApp** - Convert individual SBG Apps to Python functions
3. **ExecutableProject** - Access entire SBG projects as Python classes
4. **File Management** - Automatic upload/download handling
5. **Progress Tracking** - Monitor long-running computational tasks
6. **Integration** - Combine with other pycarta modules
7. **Best Practices** - Guidelines for production workflows

The Seven Bridges integration lets Python analysis pipelines call cloud-based computational workflows as ordinary functions, so bioinformatics and other compute-heavy tasks can be launched with a single function call.