# Collaborative Earth System Model Workflows with Tellus

## User Story: Multi-Institution Climate Model Intercomparison

**Scenario**: Dr. Sarah Chen leads a multi-institution climate model intercomparison project involving researchers from NCAR, GFDL, and MPI-M. The team needs to share model configurations, exchange simulation data, coordinate analysis workflows, and maintain synchronized datasets across different computing environments.

**Goals**:
- Set up shared data repositories accessible by all team members
- Establish consistent simulation management across institutions
- Implement automated data synchronization between sites
- Create collaborative analysis workflows with version control
- Monitor shared resources and coordinate large data transfers

**Key Features Demonstrated**:
- Multi-location data management
- Collaborative simulation tracking
- Automated synchronization workflows
- Shared archive management
- Progress monitoring across institutions

## 1. Setting Up Multi-Institutional Storage Locations

First, we'll configure storage locations for each participating institution and shared resources.

In [None]:
# Import required modules
from tellus.application.container import ServiceContainer
from tellus.application.dtos import (
    CreateLocationDto, CreateSimulationDto, CreateArchiveDto,
    FileTransferOperationDto, BatchFileTransferOperationDto
)
from tellus.domain.entities.location import LocationKind
import json
from pathlib import Path

# Initialize service container
container = ServiceContainer()
location_service = container.get_location_service()
simulation_service = container.get_simulation_service()
archive_service = container.get_archive_service()
transfer_service = container.get_file_transfer_service()

In [None]:
# Configure NCAR's Cheyenne supercomputer
ncar_dto = CreateLocationDto(
    name="ncar-cheyenne",
    kinds=[LocationKind.COMPUTE, LocationKind.FILESERVER],
    protocol="ssh",
    host="cheyenne.ucar.edu",
    username="sarahc",
    path="/glade/work/sarahc/cmip6-intercomparison",
    description="NCAR Cheyenne supercomputer for CMIP6 intercomparison project",
    metadata={
        "institution": "NCAR",
        "project": "cmip6-intercomparison",
        "contact": "sarah.chen@ncar.ucar.edu",
        "allocation": "P12345678",
        "storage_quota_tb": 50
    }
)
ncar_result = location_service.create_location(ncar_dto)
print(f"✓ Created NCAR location: {ncar_result.name}")

# Configure GFDL's analysis cluster
gfdl_dto = CreateLocationDto(
    name="gfdl-analysis",
    kinds=[LocationKind.COMPUTE, LocationKind.FILESERVER],
    protocol="ssh",
    host="analysis.gfdl.noaa.gov",
    username="sarah.chen",
    path="/home/sarah.chen/cmip6-data",
    description="GFDL analysis cluster for collaborative model analysis",
    metadata={
        "institution": "GFDL",
        "project": "cmip6-intercomparison",
        "contact": "sarah.chen@noaa.gov",
        "storage_quota_tb": 25
    }
)
gfdl_result = location_service.create_location(gfdl_dto)
print(f"✓ Created GFDL location: {gfdl_result.name}")

# Configure MPI-M's shared storage
mpim_dto = CreateLocationDto(
    name="mpim-shared",
    kinds=[LocationKind.FILESERVER, LocationKind.DISK],
    protocol="ssh",
    host="mistral.dkrz.de",
    username="s.chen",
    path="/work/bb1234/cmip6-collaboration",
    description="MPI-M shared storage on DKRZ Mistral",
    metadata={
        "institution": "MPI-M",
        "project": "cmip6-intercomparison",
        "contact": "sarah.chen@mpimet.mpg.de",
        "storage_quota_tb": 75
    }
)
mpim_result = location_service.create_location(mpim_dto)
print(f"✓ Created MPI-M location: {mpim_result.name}")

# Configure shared cloud storage for data exchange
cloud_dto = CreateLocationDto(
    name="cmip6-cloud-exchange",
    kinds=[LocationKind.FILESERVER],
    protocol="s3",
    host="s3.amazonaws.com",
    bucket="cmip6-intercomparison-data",
    path="/shared-datasets",
    description="AWS S3 bucket for inter-institutional data exchange",
    metadata={
        "institution": "Collaborative",
        "project": "cmip6-intercomparison",
        "access_policy": "multi-institutional",
        "cost_center": "joint-funding"
    }
)
cloud_result = location_service.create_location(cloud_dto)
print(f"✓ Created cloud exchange location: {cloud_result.name}")

## 2. Creating Collaborative Simulation Registry

Establish a shared simulation registry that tracks model runs across all institutions.

In [None]:
# Create NCAR CESM2 simulation
ncar_sim_dto = CreateSimulationDto(
    simulation_id="ncar-cesm2-historical-r1i1p1f1",
    model_id="cesm2",
    attrs={
        "experiment": "historical",
        "institution": "NCAR",
        "variant_label": "r1i1p1f1",
        "grid_label": "gn",
        "contact": "sarah.chen@ncar.ucar.edu",
        "collaboration": "cmip6-intercomparison",
        "status": "completed",
        "start_date": "1850-01-01",
        "end_date": "2014-12-31",
        "time_step": "monthly",
        "output_frequency": ["mon", "day"],
        "variables": ["tas", "pr", "psl", "ua", "va", "ta"]
    }
)
ncar_sim = simulation_service.create_simulation(ncar_sim_dto)
print(f"✓ Created NCAR simulation: {ncar_sim.simulation_id}")

# Associate NCAR simulation with its primary location
from tellus.application.dtos import SimulationLocationAssociationDto
ncar_assoc_dto = SimulationLocationAssociationDto(
    simulation_id=ncar_sim.simulation_id,
    location_names=["ncar-cheyenne"],
    context_overrides={
        "ncar-cheyenne": {
            "path_prefix": "/glade/work/sarahc/cesm2/historical/r1i1p1f1",
            "data_structure": "cmip6",
            "access_level": "institutional"
        }
    }
)
simulation_service.associate_simulation_with_locations(ncar_assoc_dto)
print("✓ Associated NCAR simulation with storage location")

In [None]:
# Create GFDL CM4 simulation
gfdl_sim_dto = CreateSimulationDto(
    simulation_id="gfdl-cm4-historical-r1i1p1f1",
    model_id="gfdl-cm4",
    attrs={
        "experiment": "historical",
        "institution": "GFDL",
        "variant_label": "r1i1p1f1",
        "grid_label": "gr1",
        "contact": "john.smith@noaa.gov",
        "collaboration": "cmip6-intercomparison",
        "status": "completed",
        "start_date": "1850-01-01",
        "end_date": "2014-12-31",
        "time_step": "monthly",
        "output_frequency": ["mon", "day", "6hr"],
        "variables": ["tas", "pr", "psl", "ua", "va", "ta", "hus"]
    }
)
gfdl_sim = simulation_service.create_simulation(gfdl_sim_dto)

# Associate with GFDL location and cloud exchange
gfdl_assoc_dto = SimulationLocationAssociationDto(
    simulation_id=gfdl_sim.simulation_id,
    location_names=["gfdl-analysis", "cmip6-cloud-exchange"],
    context_overrides={
        "gfdl-analysis": {
            "path_prefix": "/home/john.smith/cm4/historical/r1i1p1f1",
            "data_structure": "cmip6",
            "access_level": "institutional"
        },
        "cmip6-cloud-exchange": {
            "path_prefix": "/gfdl-cm4/historical",
            "data_structure": "cmip6",
            "access_level": "collaborative",
            "sync_policy": "on-demand"
        }
    }
)
simulation_service.associate_simulation_with_locations(gfdl_assoc_dto)
print(f"✓ Created and associated GFDL simulation: {gfdl_sim.simulation_id}")

In [None]:
# Create MPI-M MPI-ESM1.2 simulation
mpim_sim_dto = CreateSimulationDto(
    simulation_id="mpim-esm1-2-historical-r1i1p1f1",
    model_id="mpi-esm1-2-hr",
    attrs={
        "experiment": "historical",
        "institution": "MPI-M",
        "variant_label": "r1i1p1f1",
        "grid_label": "gn",
        "contact": "maria.mueller@mpimet.mpg.de",
        "collaboration": "cmip6-intercomparison",
        "status": "running",
        "start_date": "1850-01-01",
        "end_date": "2014-12-31",
        "current_date": "1995-06-15",
        "time_step": "monthly",
        "output_frequency": ["mon"],
        "variables": ["tas", "pr", "psl", "uas", "vas"]
    }
)
mpim_sim = simulation_service.create_simulation(mpim_sim_dto)

# Associate with MPI-M location
mpim_assoc_dto = SimulationLocationAssociationDto(
    simulation_id=mpim_sim.simulation_id,
    location_names=["mpim-shared"],
    context_overrides={
        "mpim-shared": {
            "path_prefix": "/work/bb1234/mpi-esm1-2/historical/r1i1p1f1",
            "data_structure": "cmip6",
            "access_level": "institutional"
        }
    }
)
simulation_service.associate_simulation_with_locations(mpim_assoc_dto)
print(f"✓ Created and associated MPI-M simulation: {mpim_sim.simulation_id}")

# List all collaborative simulations
print("\n📋 Collaborative Simulation Registry:")
simulations = simulation_service.list_simulations()
for sim in simulations.simulations:
    if sim.attrs.get('collaboration') == 'cmip6-intercomparison':
        status = sim.attrs.get('status', 'unknown')
        institution = sim.attrs.get('institution', 'unknown')
        print(f"  • {sim.simulation_id} ({institution}) - {status}")

## 3. Setting Up Shared Data Archives

Create archives for sharing processed data and analysis results between institutions.

In [None]:
# Create archive for NCAR processed data to share
ncar_share_dto = CreateArchiveDto(
    archive_id="ncar-cesm2-processed-v1.0",
    location_name="cmip6-cloud-exchange",
    archive_type="compressed",
    simulation_id=ncar_sim.simulation_id,
    version="1.0",
    description="NCAR CESM2 historical run - processed monthly data for intercomparison",
    tags={"ncar", "cesm2", "processed", "monthly", "collaborative", "v1.0"}
)

# For demonstration, register only the archive metadata;
# a production workflow would also create the archive itself
ncar_archive = archive_service.create_archive_metadata(ncar_share_dto)
print(f"✓ Created NCAR shared archive: {ncar_archive.archive_id}")

# Create archive for GFDL analysis-ready data
gfdl_share_dto = CreateArchiveDto(
    archive_id="gfdl-cm4-analysis-ready-v1.0",
    location_name="cmip6-cloud-exchange",
    archive_type="compressed",
    simulation_id=gfdl_sim.simulation_id,
    version="1.0",
    description="GFDL CM4 historical run - analysis-ready data with standardized grid",
    tags={"gfdl", "cm4", "analysis-ready", "regridded", "collaborative", "v1.0"}
)
gfdl_archive = archive_service.create_archive_metadata(gfdl_share_dto)
print(f"✓ Created GFDL shared archive: {gfdl_archive.archive_id}")

# Create collaborative analysis results archive
collab_results_dto = CreateArchiveDto(
    archive_id="cmip6-intercomparison-results-v1.0",
    location_name="cmip6-cloud-exchange",
    archive_type="directory",
    version="1.0",
    description="Collaborative CMIP6 intercomparison analysis results and figures",
    tags={"collaborative", "results", "figures", "analysis", "intercomparison", "v1.0"}
)
results_archive = archive_service.create_archive_metadata(collab_results_dto)
print(f"✓ Created collaborative results archive: {results_archive.archive_id}")

## 4. Implementing Automated Data Synchronization

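Define batch transfer operations that synchronize data between each institution and the shared cloud storage. The cells in this section only build the transfer definitions; they do not submit them to the transfer service.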

In [None]:
# Define synchronization workflow for NCAR data
def create_ncar_sync_workflow():
    """Create automated sync workflow for NCAR CESM2 data."""
    
    sync_operations = [
        # Monthly atmospheric data
        {
            "source_path": "/glade/work/sarahc/cesm2/historical/r1i1p1f1/atm/hist/monthly/*.nc",
            "dest_path": "/ncar-cesm2/atm/monthly/",
            "pattern": "*.nc",
            "priority": "high"
        },
        # Ocean data (selected variables)
        {
            "source_path": "/glade/work/sarahc/cesm2/historical/r1i1p1f1/ocn/hist/monthly/tos_*.nc",
            "dest_path": "/ncar-cesm2/ocn/monthly/",
            "pattern": "tos_*.nc",
            "priority": "medium"
        },
        # Analysis scripts and configurations
        {
            "source_path": "/glade/work/sarahc/analysis/cesm2-scripts/*",
            "dest_path": "/ncar-cesm2/scripts/",
            "pattern": "*",
            "priority": "low"
        }
    ]
    
    return sync_operations

# Create batch transfer for NCAR data synchronization
ncar_sync_ops = create_ncar_sync_workflow()
ncar_batch_dto = BatchFileTransferOperationDto(
    source_location="ncar-cheyenne",
    dest_location="cmip6-cloud-exchange",
    file_operations=[
        {
            "source_path": op["source_path"],
            "dest_path": op["dest_path"],
            "metadata": {
                "priority": op["priority"],
                "sync_type": "automated",
                "institution": "NCAR"
            }
        }
        for op in ncar_sync_ops[:1]  # Demo with first operation only
    ],
    verify_checksum=True,
    continue_on_error=True,
    metadata={
        "workflow": "ncar-daily-sync",
        "institution": "NCAR",
        "project": "cmip6-intercomparison"
    }
)

print(f"✓ Configured NCAR sync workflow with {len(ncar_batch_dto.file_operations)} operations")

# Configure GFDL selective synchronization
gfdl_batch_dto = BatchFileTransferOperationDto(
    source_location="gfdl-analysis",
    dest_location="cmip6-cloud-exchange",
    file_operations=[
        {
            "source_path": "/home/john.smith/cm4/analysis-ready/tas_regridded.nc",
            "dest_path": "/gfdl-cm4/analysis-ready/tas_regridded.nc",
            "metadata": {"variable": "tas", "processing": "regridded"}
        },
        {
            "source_path": "/home/john.smith/cm4/analysis-ready/pr_regridded.nc",
            "dest_path": "/gfdl-cm4/analysis-ready/pr_regridded.nc",
            "metadata": {"variable": "pr", "processing": "regridded"}
        }
    ],
    verify_checksum=True,
    metadata={
        "workflow": "gfdl-analysis-sync",
        "institution": "GFDL",
        "data_type": "analysis-ready"
    }
)

print(f"✓ Configured GFDL sync workflow with {len(gfdl_batch_dto.file_operations)} operations")
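
Each sync operation carries a `priority` label, but nothing in this section orders the operations by it. One way to do so, sketched here in plain Python, is to sort the operation dictionaries before building the batch. `PRIORITY_RANK` and `order_by_priority` are hypothetical names introduced for illustration and are not part of Tellus.

```python
# Lower rank means "transfer earlier". The three labels match the
# "priority" values used in create_ncar_sync_workflow().
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def order_by_priority(operations):
    """Return operations sorted high -> medium -> low (stable sort).

    Hypothetical helper, not a Tellus API. Unknown or missing priorities
    are placed last.
    """
    return sorted(
        operations,
        key=lambda op: PRIORITY_RANK.get(op.get("priority"), len(PRIORITY_RANK)),
    )


# Shortened stand-ins for the NCAR sync operations.
example_ops = [
    {"dest_path": "/ncar-cesm2/scripts/", "priority": "low"},
    {"dest_path": "/ncar-cesm2/atm/monthly/", "priority": "high"},
    {"dest_path": "/ncar-cesm2/ocn/monthly/", "priority": "medium"},
]
ordered_ops = order_by_priority(example_ops)
for op in ordered_ops:
    print(f"{op['priority']:>6}: {op['dest_path']}")
```

Because `sorted` is stable, operations that share a priority keep their original relative order.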

## 5. Coordinated Analysis Workflows

Demonstrate how teams can coordinate analysis workflows and share results using Tellus.

In [None]:
# Create shared analysis configuration
analysis_config = {
    "project": "cmip6-intercomparison",
    "phase": "analysis",
    "participants": {
        "ncar": {
            "lead": "sarah.chen@ncar.ucar.edu",
            "data_contribution": "cesm2-historical",
            "analysis_focus": "temperature-precipitation-coupling"
        },
        "gfdl": {
            "lead": "john.smith@noaa.gov",
            "data_contribution": "cm4-historical",
            "analysis_focus": "ocean-atmosphere-interaction"
        },
        "mpim": {
            "lead": "maria.mueller@mpimet.mpg.de",
            "data_contribution": "mpi-esm1-2-historical",
            "analysis_focus": "european-climate-extremes"
        }
    },
    "shared_resources": {
        "storage": "cmip6-cloud-exchange",
        "compute": "distributed",
        "coordination": "weekly-meetings"
    },
    "deliverables": {
        "interim_results": "2024-06-30",
        "final_analysis": "2024-09-30",
        "publication": "2024-12-31"
    }
}

# Create shared analysis simulation for coordination
analysis_sim_dto = CreateSimulationDto(
    simulation_id="cmip6-intercomparison-analysis-2024",
    model_id="multi-model-analysis",
    attrs={
        "type": "collaborative-analysis",
        "project": "cmip6-intercomparison",
        "participants": ["NCAR", "GFDL", "MPI-M"],
        "models": ["cesm2", "gfdl-cm4", "mpi-esm1-2-hr"],
        "analysis_period": "1850-2014",
        "focus_regions": ["global", "north-america", "europe", "arctic"],
        "variables": ["tas", "pr", "psl", "tos", "extreme_indices"],
        "status": "active",
        **analysis_config
    }
)

analysis_sim = simulation_service.create_simulation(analysis_sim_dto)
print(f"✓ Created collaborative analysis simulation: {analysis_sim.simulation_id}")

# Associate analysis with shared storage
analysis_assoc_dto = SimulationLocationAssociationDto(
    simulation_id=analysis_sim.simulation_id,
    location_names=["cmip6-cloud-exchange"],
    context_overrides={
        "cmip6-cloud-exchange": {
            "path_prefix": "/collaborative-analysis/2024",
            "data_structure": "analysis-project",
            "access_level": "collaborative",
            "workflow_coordination": True
        }
    }
)
simulation_service.associate_simulation_with_locations(analysis_assoc_dto)
print("✓ Associated analysis simulation with shared storage")

In [None]:
# Create analysis result sharing workflow
def create_result_sharing_workflow():
    """Create workflow for sharing analysis results between institutions."""
    
    # Define shared analysis deliverables
    shared_results = [
        {
            "category": "temperature-analysis",
            "files": [
                "global_temperature_trends_1850-2014.nc",
                "regional_temperature_extremes.nc",
                "temperature_variance_analysis.nc"
            ],
            "responsible": "NCAR",
            "deadline": "2024-07-15"
        },
        {
            "category": "precipitation-analysis",
            "files": [
                "precipitation_patterns_cmip6.nc",
                "drought_frequency_analysis.nc",
                "seasonal_precipitation_cycle.nc"
            ],
            "responsible": "GFDL",
            "deadline": "2024-07-30"
        },
        {
            "category": "extreme-events",
            "files": [
                "european_heatwaves_1850-2014.nc",
                "extreme_precipitation_indices.nc",
                "compound_extreme_events.nc"
            ],
            "responsible": "MPI-M",
            "deadline": "2024-08-15"
        }
    ]
    
    return shared_results

# Set up result collection workflow
results = create_result_sharing_workflow()
print("📊 Collaborative Analysis Result Sharing Plan:")
for result in results:
    print(f"\n  {result['category'].upper()} (Lead: {result['responsible']})")
    print(f"    Deadline: {result['deadline']}")
    print(f"    Files: {len(result['files'])} deliverables")
    for file in result['files'][:2]:  # Show first 2 files
        print(f"      • {file}")
    if len(result['files']) > 2:
        print(f"      • ... and {len(result['files']) - 2} more")
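
The deadlines in the sharing plan are plain ISO date strings, so the time left for each deliverable can be computed with the standard library. The following is a small sketch; `days_remaining` is a hypothetical helper, and the reference date is an assumption chosen for illustration rather than read from the system clock.

```python
from datetime import date


def days_remaining(deadline, today):
    """Days from `today` until an ISO-formatted deadline (negative if overdue)."""
    return (date.fromisoformat(deadline) - today).days


# Deadlines copied from the sharing plan; the reference date is an
# assumption chosen for illustration.
deadlines = {
    "temperature-analysis": "2024-07-15",
    "precipitation-analysis": "2024-07-30",
    "extreme-events": "2024-08-15",
}
reference_day = date(2024, 6, 15)
remaining = {name: days_remaining(d, reference_day) for name, d in deadlines.items()}
for name, days in remaining.items():
    label = "overdue" if days < 0 else "remaining"
    print(f"{name}: {abs(days)} days {label}")
```

In a live notebook, `date.today()` would replace the fixed reference date.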

## 6. Monitoring Collaborative Operations

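Build a status dashboard for the collaborative project. Simulation and archive counts are queried from Tellus; the timestamps, data volumes, and progress figures in this example are hard-coded placeholders.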

In [None]:
# Create monitoring dashboard data structure
def create_collaboration_dashboard():
    """Create a monitoring dashboard for the collaborative project."""
    
    # Get all simulations for the project
    all_simulations = simulation_service.list_simulations()
    project_sims = [
        sim for sim in all_simulations.simulations 
        if sim.attrs.get('collaboration') == 'cmip6-intercomparison'
    ]
    
    # Get all shared archives
    all_archives = archive_service.list_archives()
    project_archives = [
        arch for arch in all_archives.archives
        if 'collaborative' in arch.tags or 'cmip6' in arch.tags
    ]
    
    # Create status summary
    dashboard = {
        "project": "CMIP6 Multi-Institutional Intercomparison",
        "last_updated": "2024-06-15T10:30:00Z",
        "participants": {
            "NCAR": {
                "status": "active",
                "simulations": len([s for s in project_sims if s.attrs.get('institution') == 'NCAR']),
                "data_contributed_tb": 15.2,
                "last_sync": "2024-06-15T08:45:00Z"
            },
            "GFDL": {
                "status": "active",
                "simulations": len([s for s in project_sims if s.attrs.get('institution') == 'GFDL']),
                "data_contributed_tb": 12.8,
                "last_sync": "2024-06-14T16:20:00Z"
            },
            "MPI-M": {
                "status": "active",
                "simulations": len([s for s in project_sims if s.attrs.get('institution') == 'MPI-M']),
                "data_contributed_tb": 8.5,
                "last_sync": "2024-06-15T06:15:00Z"
            }
        },
        "storage": {
            "total_used_tb": 36.5,
            "available_tb": 163.5,
            "archives": len(project_archives),
            "sync_operations_today": 24,
            "failed_transfers": 2
        },
        "analysis_progress": {
            "temperature_analysis": "75% complete",
            "precipitation_analysis": "60% complete",
            "extreme_events": "40% complete",
            "next_milestone": "2024-07-30"
        }
    }
    
    return dashboard

# Generate current project status
dashboard = create_collaboration_dashboard()
print("🌍 CMIP6 Collaborative Project Dashboard")
print(f"Last Updated: {dashboard['last_updated']}")
print("\n👥 Participant Status:")
for institution, status in dashboard['participants'].items():
    print(f"  {institution}:")
    print(f"    Status: {status['status']}")
    print(f"    Simulations: {status['simulations']}")
    print(f"    Data Contributed: {status['data_contributed_tb']} TB")
    print(f"    Last Sync: {status['last_sync']}\n")

print("💾 Shared Storage Status:")
storage = dashboard['storage']
total_tb = storage['total_used_tb'] + storage['available_tb']
percent_full = storage['total_used_tb'] / total_tb * 100
print(f"  Used: {storage['total_used_tb']} TB ({percent_full:.1f}% full)")
print(f"  Available: {storage['available_tb']} TB")
print(f"  Archives: {storage['archives']}")
print(f"  Sync Operations Today: {storage['sync_operations_today']}")
if storage['failed_transfers'] > 0:
    print(f"  ⚠️  Failed Transfers: {storage['failed_transfers']}")

print("\n📊 Analysis Progress:")
for analysis, progress in dashboard['analysis_progress'].items():
    if analysis != 'next_milestone':
        print(f"  {analysis.replace('_', ' ').title()}: {progress}")
print(f"\n🎯 Next Milestone: {dashboard['analysis_progress']['next_milestone']}")
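
The storage totals in the dashboard can be derived from the per-institution contributions instead of being entered separately, which keeps the two sets of numbers consistent. A minimal sketch follows, assuming a known total capacity; `summarize_storage` is a hypothetical helper and not a Tellus API.

```python
def summarize_storage(participants, capacity_tb):
    """Derive used/available/percent-full figures from per-institution totals.

    Hypothetical helper, not a Tellus API. `participants` maps institution
    name to a dict containing 'data_contributed_tb'.
    """
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    used_tb = sum(p["data_contributed_tb"] for p in participants.values())
    return {
        "total_used_tb": round(used_tb, 1),
        "available_tb": round(capacity_tb - used_tb, 1),
        "percent_full": used_tb / capacity_tb * 100,
    }


# Per-institution figures copied from the dashboard. The 200 TB capacity is
# the sum of the dashboard's used (36.5) and available (163.5) values.
example_participants = {
    "NCAR": {"data_contributed_tb": 15.2},
    "GFDL": {"data_contributed_tb": 12.8},
    "MPI-M": {"data_contributed_tb": 8.5},
}
storage_summary = summarize_storage(example_participants, capacity_tb=200.0)
print(storage_summary)
```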

## 7. Best Practices for Collaborative Workflows

Key recommendations for successful multi-institutional Earth System Model collaborations using Tellus.

In [None]:
# Collaborative workflow best practices
best_practices = {
    "data_management": {
        "standardization": [
            "Use consistent file naming conventions across all institutions",
            "Adopt standard metadata schemas (CF conventions, CMIP6 standards)",
            "Implement common directory structures and data organization",
            "Use version control for data releases and updates"
        ],
        "access_control": [
            "Define clear access levels (institutional, collaborative, public)",
            "Use shared authentication systems where possible",
            "Implement audit trails for data access and modifications",
            "Regular access permission reviews and updates"
        ]
    },
    "synchronization": {
        "scheduling": [
            "Coordinate sync windows to avoid conflicts",
            "Use priority-based transfer queuing",
            "Implement retry logic with exponential backoff",
            "Monitor bandwidth usage across institutions"
        ],
        "validation": [
            "Always enable checksum verification for critical data",
            "Implement automated integrity checks post-transfer",
            "Use staged validation for large datasets",
            "Maintain transfer logs and error reporting"
        ]
    },
    "coordination": {
        "communication": [
            "Regular video conferences for project coordination",
            "Shared documentation and progress tracking",
            "Clear milestone definitions and deadlines",
            "Incident response procedures for data issues"
        ],
        "technical": [
            "Common analysis environments and software versions",
            "Shared code repositories for analysis scripts",
            "Coordinated compute resource allocation",
            "Joint troubleshooting and technical support"
        ]
    },
    "quality_assurance": {
        "validation": [
            "Cross-institutional data validation protocols",
            "Automated quality control checks",
            "Peer review of analysis methodologies",
            "Regular data provenance audits"
        ],
        "reproducibility": [
            "Version-controlled analysis workflows",
            "Documented computational environments",
            "Shared analysis notebooks and scripts",
            "Reproducible figure generation pipelines"
        ]
    }
}

print("📋 Collaborative Earth System Model Workflow Best Practices\n")

for category, subcategories in best_practices.items():
    print(f"🔧 {category.replace('_', ' ').upper()}")
    for subcat, practices in subcategories.items():
        print(f"\n  {subcat.title()}:")
        for i, practice in enumerate(practices, 1):
            print(f"    {i}. {practice}")
    print()
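
The recommendation to use retry logic with exponential backoff can be sketched in a few lines of standard-library Python. This is an illustrative pattern, not how Tellus handles retries internally: `retry_with_backoff` and `flaky_transfer` are hypothetical names, and the injectable `sleep` argument exists only so the demonstration can record delays without waiting.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `operation()` until it succeeds or `max_attempts` is reached.

    Waits base_delay * 2**n seconds after the n-th failure (n = 0, 1, ...).
    The last exception is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in transfer that fails twice before succeeding.
# Delays are recorded instead of slept so the example runs instantly.
attempts = {"count": 0}
recorded_delays = []


def flaky_transfer():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "transferred"


outcome = retry_with_backoff(flaky_transfer, sleep=recorded_delays.append)
print(outcome, recorded_delays)
```

In practice it is usually worth catching only transient error types and adding random jitter to the delay so that several sites do not retry in lockstep.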

## 8. Troubleshooting Common Collaboration Issues

Solutions for typical challenges in multi-institutional data management.

In [None]:
# Common issues and solutions
troubleshooting_guide = {
    "sync_failures": {
        "symptoms": ["Transfer timeouts", "Authentication errors", "Partial file transfers"],
        "causes": ["Network instability", "Credential expiration", "Storage quota exceeded"],
        "solutions": [
            "Implement automatic retry with exponential backoff",
            "Set up credential refresh mechanisms",
            "Monitor storage quotas and implement alerts",
            "Use chunked transfers for large files",
            "Configure network timeout parameters appropriately"
        ]
    },
    "data_inconsistencies": {
        "symptoms": ["Checksum mismatches", "Missing files", "Version conflicts"],
        "causes": ["Concurrent modifications", "Incomplete transfers", "Metadata corruption"],
        "solutions": [
            "Implement file locking during transfers",
            "Use atomic operations for critical files",
            "Maintain detailed transfer and modification logs",
            "Regular integrity checks and validation",
            "Version control with clear branching strategies"
        ]
    },
    "access_issues": {
        "symptoms": ["Permission denied", "Authentication failures", "Quota exceeded"],
        "causes": ["Expired credentials", "Changed permissions", "Storage limits reached"],
        "solutions": [
            "Automated credential monitoring and renewal",
            "Regular permission audits and updates",
            "Proactive quota monitoring with alerts",
            "Clear escalation procedures for access issues",
            "Backup authentication methods"
        ]
    },
    "performance_issues": {
        "symptoms": ["Slow transfers", "High latency", "Resource contention"],
        "causes": ["Network congestion", "Competing workloads", "Suboptimal configurations"],
        "solutions": [
            "Schedule transfers during off-peak hours",
            "Implement transfer prioritization and queuing",
            "Optimize chunk sizes for network conditions",
            "Use parallel transfers where appropriate",
            "Regular performance monitoring and tuning"
        ]
    }
}

print("🔧 Collaborative Workflow Troubleshooting Guide\n")

for issue_type, details in troubleshooting_guide.items():
    print(f"❌ {issue_type.replace('_', ' ').upper()}")
    
    print("\n  Symptoms:")
    for symptom in details['symptoms']:
        print(f"    • {symptom}")
    
    print("\n  Common Causes:")
    for cause in details['causes']:
        print(f"    • {cause}")
    
    print("\n  Solutions:")
    for i, solution in enumerate(details['solutions'], 1):
        print(f"    {i}. {solution}")
    
    print("\n" + "-"*60 + "\n")

print("\n💡 For additional support, contact your institutional Tellus administrator")
print("📧 Project coordination: cmip6-intercomparison@climate-collab.org")
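
Checksum mismatches and partial transfers can be detected independently of any transfer tool by hashing the source file and its copy. The sketch below reads files in chunks so large NetCDF files do not need to fit in memory. SHA-256 is an assumption made for this example; the algorithm behind `verify_checksum=True` is not specified in this notebook, and `sha256_of_file` and `checksums_match` are hypothetical helper names.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_match(source_path, dest_path):
    """True if both files have identical SHA-256 digests."""
    return sha256_of_file(source_path) == sha256_of_file(dest_path)


# Demonstration on temporary files standing in for a source file, an intact
# copy, and a deliberately truncated copy.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.nc")
    good_copy = os.path.join(workdir, "copy.nc")
    partial_copy = os.path.join(workdir, "partial.nc")
    payload = b"example payload" * 1000
    for target, data in [(source, payload), (good_copy, payload), (partial_copy, payload[:-1])]:
        with open(target, "wb") as handle:
            handle.write(data)
    intact = checksums_match(source, good_copy)
    truncated_detected = not checksums_match(source, partial_copy)

print(f"intact copy verified: {intact}, truncated copy detected: {truncated_detected}")
```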

## Summary

This notebook demonstrated a comprehensive collaborative workflow for multi-institutional Earth System Model research using Tellus:

### Key Accomplishments:

1. **Multi-Institutional Setup**: Configured storage locations across NCAR, GFDL, and MPI-M with shared cloud exchange
2. **Collaborative Registry**: Established unified simulation tracking across all participating institutions
3. **Shared Archives**: Created collaborative data archives for processed datasets and analysis results
4. **Automated Synchronization**: Defined batch transfer operations with checksum verification for syncing institutional data to the shared cloud exchange
5. **Coordinated Analysis**: Set up shared analysis projects with clear deliverables and timelines
6. **Monitoring Dashboard**: Created project monitoring tools for tracking progress and resource usage
7. **Best Practices**: Documented proven strategies for successful collaborative data management
8. **Troubleshooting**: Provided solutions for common multi-institutional workflow challenges

### Collaborative Benefits:

- **Unified Data Management**: Single interface for managing data across multiple institutions
- **Automated Coordination**: Reduced manual effort through automated synchronization workflows
- **Consistent Standards**: Enforced common data formats, naming conventions, and quality standards
- **Progress Transparency**: Real-time visibility into project status across all participants
- **Resource Optimization**: Efficient use of storage and compute resources through coordinated planning

### Next Steps:

- Scale synchronization workflows to handle larger datasets
- Implement advanced collaboration features like conflict resolution
- Develop institution-specific customizations and optimizations
- Create automated reporting and milestone tracking systems
- Establish long-term data preservation and access strategies

This collaborative approach enables Earth System Model research teams to work together more effectively while maintaining institutional autonomy and security requirements.