# Personal Climate Data Management with Tellus

This notebook demonstrates how to use Tellus for personal climate research data management. You'll learn how to:

- Set up storage locations for your local and remote data
- Create simulations to organize model experiments
- Transfer files between storage locations
- Use context-aware path templating
- Monitor progress of long-running operations

## User Story: Graduate Student Climate Research

**Meet Sarah**: A graduate student studying climate variability using CESM2 model outputs. She needs to:
- Organize simulation data from multiple experiments
- Transfer large datasets between her laptop, university cluster, and archive storage
- Keep track of what data is stored where
- Share specific datasets with her advisor and collaborators

## Setup and Configuration

In [None]:
# Import required modules
import shutil
import tempfile
from pathlib import Path

# For this tutorial, we'll work with the Tellus CLI and Python API
from tellus.application.container import get_service_container
from tellus.application.dtos import (
    CreateLocationDto,
    CreateSimulationDto,
    FileTransferOperationDto,
    SimulationLocationAssociationDto
)

print("✅ Tellus modules imported successfully")

### Create a Test Environment

For this tutorial, we'll create temporary directories to simulate Sarah's data environment:

In [None]:
# Create temporary directories for our example
base_dir = Path(tempfile.mkdtemp(prefix="sarah_climate_data_"))
print(f"📁 Working in: {base_dir}")

# Sarah's data organization
directories = {
    'laptop_data': base_dir / "laptop" / "climate_data",
    'university_cluster': base_dir / "cluster" / "scratch" / "sarah",
    'archive_storage': base_dir / "archive" / "climate_research",
    'shared_data': base_dir / "shared" / "sarah_experiments"
}

# Create directory structure
for name, path in directories.items():
    path.mkdir(parents=True, exist_ok=True)
    print(f"📂 Created {name}: {path}")

# Create some sample climate data files
sample_files = {
    'model_output_monthly': {
        'path': directories['laptop_data'] / "cesm2_monthly_2020.nc",
        'content': "# NetCDF: Monthly CESM2 output for 2020\n# Variables: tas, pr, psl\n# Resolution: 1-degree\n" + "x" * 1024 * 100  # ~100KB
    },
    'model_output_daily': {
        'path': directories['laptop_data'] / "cesm2_daily_2020_q1.nc",
        'content': "# NetCDF: Daily CESM2 output for Q1 2020\n# Variables: tas, pr\n# Resolution: 1-degree\n" + "x" * 1024 * 500  # ~500KB
    },
    'analysis_script': {
        'path': directories['laptop_data'] / "analysis_temperature_trends.py",
        'content': '''#!/usr/bin/env python3
"""
Temperature trend analysis for CESM2 historical simulation.
Author: Sarah (Graduate Student)
"""

import xarray as xr
import numpy as np
import matplotlib.pyplot as plt

def analyze_temperature_trends(input_file, output_dir):
    """Analyze temperature trends from CESM2 output."""
    # Load data
    ds = xr.open_dataset(input_file)
    
    # Calculate global mean temperature
    global_temp = ds.tas.weighted(ds.area).mean(dim=['lat', 'lon'])
    
    # Compute trend
    trend = global_temp.polyfit(dim='time', deg=1)
    
    # Save results (output_dir may be a plain string, so build the path explicitly)
    trend.to_netcdf(f"{output_dir}/temperature_trend.nc")
    
    return trend

if __name__ == "__main__":
    analyze_temperature_trends("cesm2_monthly_2020.nc", "./analysis_output")
'''
    },
    'config_file': {
        'path': directories['laptop_data'] / "cesm2_config.yaml",
        'content': '''# CESM2 Configuration for Historical Simulation
experiment:
  name: "historical_2020_analysis"
  description: "Temperature trend analysis for thesis chapter 3"
  start_date: "2020-01-01"
  end_date: "2020-12-31"
  
model:
  name: "CESM2"
  version: "2.1.3"
  resolution: "f09_g17"  # ~1-degree atmosphere, ~1-degree ocean
  
output:
  frequency: ["monthly", "daily"]
  variables: ["tas", "pr", "psl"]
  format: "netcdf4"
  
paths:
  input_data: "/glade/p/cesm/cseg/inputdata"
  case_root: "/glade/work/sarah/cesm_cases"
  archive_root: "/glade/scratch/sarah/archive"
'''
    }
}

# Write sample files
for name, file_info in sample_files.items():
    file_info['path'].write_text(file_info['content'])
    # Report sizes in KB: the script and config files are far smaller than 1 MB
    size_kb = file_info['path'].stat().st_size / 1024
    print(f"📄 Created {name}: {file_info['path'].name} ({size_kb:.1f} KB)")

print(f"\n🎯 Sarah's test environment ready with {len(sample_files)} sample files")

## Step 1: Configure Storage Locations

Sarah needs to configure her storage locations so Tellus knows where her data can be stored and accessed.
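
Before registering a local (`file` protocol) location, it can help to confirm that the directory exists and is writable. The sketch below is a plain-Python pre-flight check and not part of Tellus; the helper name `check_local_location` is hypothetical and introduced only for illustration.

```python
import os
import tempfile
from pathlib import Path


def check_local_location(path: Path) -> list:
    """Return a list of problems found for a local storage path (empty if none)."""
    problems = []
    if not path.exists():
        problems.append("path does not exist")
    elif not path.is_dir():
        problems.append("path is not a directory")
    elif not os.access(path, os.W_OK):
        problems.append("directory is not writable")
    return problems


# Demonstrate with one directory that exists and one that was never created.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "climate_data"
    good.mkdir()
    missing = Path(tmp) / "not_created"

    good_problems = check_local_location(good)
    missing_problems = check_local_location(missing)
    print(f"climate_data: {good_problems or 'ok'}")
    print(f"not_created: {missing_problems or 'ok'}")
```

A check like this only applies to locally mounted paths. Remote locations such as the SSH cluster would need a different kind of connectivity test.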

In [None]:
# Get the service container
service_container = get_service_container()
location_service = service_container.service_factory.location_service

# Configure Sarah's storage locations
locations = [
    {
        'name': 'laptop-storage',
        'description': 'Local laptop storage for active analysis',
        'dto': CreateLocationDto(
            name="laptop-storage",
            kinds=["DISK"],
            protocol="file",
            path=str(directories['laptop_data']),
            optional=False,
            additional_config={
                'description': 'Local laptop storage for active analysis',
                'capacity_gb': 500,
                'access_speed': 'high'
            }
        )
    },
    {
        'name': 'university-cluster',
        'description': 'University HPC cluster scratch space',
        'dto': CreateLocationDto(
            name="university-cluster",
            kinds=["COMPUTE", "DISK"],
            protocol="ssh",
            path="/scratch/sarah",
            storage_options={
                'host': 'hpc.university.edu',
                'username': 'sarah',
                'key_filename': '/home/sarah/.ssh/hpc_key'
            },
            optional=True,
            additional_config={
                'description': 'University HPC cluster scratch space',
                'capacity_tb': 10,
                'purge_policy': 'auto_30_days'
            }
        )
    },
    {
        'name': 'archive-storage', 
        'description': 'Long-term archive for completed experiments',
        'dto': CreateLocationDto(
            name="archive-storage",
            kinds=["TAPE", "DISK"],
            protocol="file",  # Simulated as local for tutorial
            path=str(directories['archive_storage']),
            optional=True,
            additional_config={
                'description': 'Long-term archive for completed experiments',
                'retention_years': 10,
                'compression': 'gzip'
            }
        )
    },
    {
        'name': 'shared-data',
        'description': 'Shared space for collaboration with advisor',
        'dto': CreateLocationDto(
            name="shared-data",
            kinds=["FILESERVER"],
            protocol="file",  # Simulated as local for tutorial
            path=str(directories['shared_data']),
            optional=True,
            additional_config={
                'description': 'Shared space for collaboration with advisor',
                'access_permissions': 'group_rw',
                'quota_gb': 100
            }
        )
    }
]

# Create all locations
created_locations = []
for loc_config in locations:
    try:
        location = location_service.create_location(loc_config['dto'])
        created_locations.append(location)
        print(f"✅ Created location: {loc_config['name']} - {loc_config['description']}")
    except Exception as e:
        print(f"⚠️  Failed to create {loc_config['name']}: {e}")

print(f"\n📍 Successfully configured {len(created_locations)} storage locations")

## Step 2: Create Simulation Experiments

Sarah organizes her work into simulation experiments, each representing a specific research question or model configuration.

In [None]:
# Get simulation service
simulation_service = service_container.service_factory.simulation_service

# Sarah's simulation experiments
simulations = [
    {
        'id': 'sarah-historical-2020',
        'description': 'Historical simulation analysis for 2020 (thesis chapter 3)',
        'dto': CreateSimulationDto(
            simulation_id="sarah-historical-2020",
            model_id="CESM2",
            path="/analysis/historical_2020",
            attrs={
                'experiment': 'historical',
                'year': '2020',
                'model': 'CESM2',
                'resolution': 'f09_g17',
                'purpose': 'thesis_chapter_3',
                'researcher': 'sarah',
                'status': 'active',
                'variables': 'tas,pr,psl',
                'frequency': 'monthly,daily'
            },
            namelists={
                'atm_in': {'nhtfrq': [0, -24], 'mfilt': [12, 365]},
                'ocn_in': {'tavg_freq_opt': 'nmonth', 'tavg_freq': 1}
            },
            snakemakes={
                'analysis_workflow': 'workflows/temperature_analysis.smk',
                'data_processing': 'workflows/cesm_postprocess.smk'
            }
        )
    },
    {
        'id': 'sarah-sensitivity-co2',
        'description': 'CO2 sensitivity experiments for methodology paper',
        'dto': CreateSimulationDto(
            simulation_id="sarah-sensitivity-co2",
            model_id="CESM2",
            path="/analysis/co2_sensitivity",
            attrs={
                'experiment': 'sensitivity',
                'forcing': 'co2_2x,co2_4x',
                'model': 'CESM2',
                'resolution': 'f09_g17',
                'purpose': 'methodology_paper',
                'researcher': 'sarah',
                'status': 'planning',
                'duration_years': '50',
                'ensemble_size': '3'
            }
        )
    }
]

# Create simulations
created_simulations = []
for sim_config in simulations:
    try:
        simulation = simulation_service.create_simulation(sim_config['dto'])
        created_simulations.append(simulation)
        print(f"🎯 Created simulation: {sim_config['id']} - {sim_config['description']}")
    except Exception as e:
        print(f"⚠️  Failed to create {sim_config['id']}: {e}")

print(f"\n🔬 Successfully created {len(created_simulations)} simulation experiments")

## Step 3: Associate Simulations with Storage Locations

Now Sarah connects her simulations with storage locations, setting up path templates for organized data management.
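
To make the idea of a path template concrete, here is a minimal sketch of how `{{name}}` placeholders could be filled from a simulation's attributes. It illustrates the concept only and is not Tellus's own resolution logic; `resolve_template` and `PLACEHOLDER` are hypothetical names, and the attribute values mirror the `sarah-historical-2020` simulation created above.

```python
import re

# Matches placeholders written as {{name}}, the style used in this notebook.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def resolve_template(template: str, attrs: dict) -> str:
    """Fill {{key}} placeholders from attrs; raise KeyError if a key is missing."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in attrs:
            raise KeyError(f"No value for placeholder '{key}' in '{template}'")
        return str(attrs[key])

    return PLACEHOLDER.sub(substitute, template)


# Attribute values copied from the sarah-historical-2020 simulation.
attrs = {
    "researcher": "sarah",
    "experiment": "historical",
    "year": "2020",
    "model": "CESM2",
    "purpose": "thesis_chapter_3",
}

print(resolve_template("{{researcher}}/{{experiment}}/{{year}}", attrs))
# prints: sarah/historical/2020
print(resolve_template("{{researcher}}/{{purpose}}/results", attrs))
# prints: sarah/thesis_chapter_3/results
```

Raising on an unknown placeholder, rather than leaving it in the path, surfaces typos in a template before any directories are created.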

In [None]:
# Associate historical simulation with multiple storage locations
historical_associations = [
    {
        'location': 'laptop-storage',
        'context': {
            'path_prefix': '{{researcher}}/{{experiment}}/{{year}}',
            'file_pattern': '{{model}}_{{frequency}}_{{year}}.nc',
            'usage': 'active_analysis'
        }
    },
    {
        'location': 'university-cluster',
        'context': {
            'path_prefix': '{{model}}/{{experiment}}/{{resolution}}',
            'compute_allocation': 'climate_group',
            'usage': 'model_runs'
        }
    },
    {
        'location': 'archive-storage',
        'context': {
            'path_prefix': '{{researcher}}/{{purpose}}/{{experiment}}_{{year}}',
            'compression': 'gzip',
            'usage': 'long_term_storage'
        }
    },
    {
        'location': 'shared-data',
        'context': {
            'path_prefix': '{{researcher}}/{{purpose}}/results',
            'access': 'advisor_group',
            'usage': 'collaboration'
        }
    }
]

# Create association DTO
association_dto = SimulationLocationAssociationDto(
    simulation_id="sarah-historical-2020",
    location_names=[assoc['location'] for assoc in historical_associations],
    context_overrides={assoc['location']: assoc['context'] for assoc in historical_associations}
)

try:
    updated_simulation = simulation_service.associate_simulation_with_locations(association_dto)
    print(f"🔗 Associated simulation with {len(historical_associations)} storage locations")
    
    # Show resolved paths
    print("\n📁 Resolved storage paths:")
    for assoc in historical_associations:
        location_name = assoc['location']
        # Simulate path resolution (would normally use simulation context)
        template = assoc['context'].get('path_prefix', '')
        # Simple template substitution for demo
        resolved = template.replace('{{researcher}}', 'sarah') \
                          .replace('{{experiment}}', 'historical') \
                          .replace('{{year}}', '2020') \
                          .replace('{{model}}', 'CESM2') \
                          .replace('{{purpose}}', 'thesis_chapter_3') \
                          .replace('{{resolution}}', 'f09_g17')
        print(f"  📂 {location_name}: {resolved}")
        
except Exception as e:
    print(f"⚠️  Failed to associate locations: {e}")
else:
    # Only report success when the association call did not raise
    print("\n✅ Simulation-location associations configured successfully")

## Step 4: File Transfer Operations

Sarah needs to move data between her storage locations. Let's demonstrate transferring analysis results to the shared collaboration space.
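
For large files, per-file progress is often too coarse. The sketch below shows one way to report byte-level progress by copying in chunks and calling a callback after each chunk. It uses only the standard library and does not reflect how Tellus tracks progress internally; `copy_with_progress` and `print_progress` are hypothetical helpers.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def copy_with_progress(
    source: Path,
    dest: Path,
    on_progress: Optional[Callable[[int, int], None]] = None,
    chunk_size: int = 64 * 1024,
) -> int:
    """Copy source to dest in chunks, reporting (bytes_done, bytes_total)."""
    total = source.stat().st_size
    done = 0
    dest.parent.mkdir(parents=True, exist_ok=True)
    with source.open("rb") as src, dest.open("wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            done += len(chunk)
            if on_progress is not None:
                on_progress(done, total)
    return done


def print_progress(done: int, total: int) -> None:
    """Print a one-line progress update for the current file."""
    percent = 100.0 * done / total if total else 100.0
    print(f"  {percent:5.1f}% ({done:,}/{total:,} bytes)")


# Copy a 200,000-byte placeholder file between two temporary directories.
with tempfile.TemporaryDirectory() as tmp:
    src_file = Path(tmp) / "laptop" / "demo.nc"
    src_file.parent.mkdir(parents=True)
    src_file.write_bytes(b"x" * 200_000)
    dst_file = Path(tmp) / "shared" / "demo.nc"

    copied = copy_with_progress(src_file, dst_file, print_progress)
    sizes_match = dst_file.stat().st_size == src_file.stat().st_size
```

Passing the reporter as a callback keeps the copy routine independent of how progress is displayed, so the same function could feed a progress bar or a log file.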

In [None]:
# Get file transfer service
file_transfer_service = service_container.service_factory.file_transfer_service

# Create destination directories
laptop_analysis_dir = directories['laptop_data'] / "analysis_output"
laptop_analysis_dir.mkdir(exist_ok=True)

shared_results_dir = directories['shared_data'] / "sarah" / "thesis_chapter_3" / "results"
shared_results_dir.mkdir(parents=True, exist_ok=True)

# Simulate creating analysis results
analysis_files = {
    'temperature_trends.png': '''# Matplotlib figure data (simulated)
# Global temperature trends from CESM2 historical simulation
# Shows 2020 monthly temperature anomalies
# Generated by: analysis_temperature_trends.py
PNG_DATA_PLACEHOLDER''' + "x" * 1024 * 50,  # ~50KB
    
    'temperature_trend_coefficients.nc': '''# NetCDF: Temperature trend analysis results
# Variables: trend_slope, trend_intercept, r_squared, p_value
# Dimensions: lat, lon
# Source: CESM2 historical simulation 2020
NETCDF_DATA_PLACEHOLDER''' + "x" * 1024 * 200,  # ~200KB
    
    'analysis_summary.txt': '''Climate Analysis Summary - Sarah's Thesis Chapter 3
========================================================

Experiment: Historical 2020 Temperature Trends
Model: CESM2 (f09_g17 resolution)
Analysis Period: January 2020 - December 2020
Variables: Surface Air Temperature (tas)

Key Findings:
- Global mean temperature shows warming trend of +0.15K/decade
- Arctic amplification clearly visible in spatial patterns
- Strong correlation with observed trends (r=0.89, p<0.001)

Files Generated:
- temperature_trends.png: Visualization of spatial trends
- temperature_trend_coefficients.nc: Gridded trend coefficients
- monthly_timeseries.nc: Monthly temperature timeseries

Next Steps:
- Compare with CMIP6 ensemble
- Analyze seasonal variations
- Write manuscript section

Generated: {date}
Contact: sarah@university.edu
'''.format(date="2024-01-15")
}

# Create analysis result files
analysis_file_paths = []
for filename, content in analysis_files.items():
    file_path = laptop_analysis_dir / filename
    file_path.write_text(content)
    analysis_file_paths.append(file_path)
    size_kb = len(content) / 1024
    print(f"📊 Created analysis result: {filename} ({size_kb:.1f} KB)")

print(f"\n🔬 Analysis complete! Generated {len(analysis_files)} result files")

### Transfer Results to Shared Space

Now let's transfer Sarah's analysis results to the shared collaboration space where her advisor can access them.
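
The transfer requests in the next cell set `verify_checksum=True`, but the simulated copy that follows does not compare checksums. As a rough illustration of what such a check involves, the sketch below hashes the source and destination with SHA-256 after copying. The choice of SHA-256 and the helper names `sha256_of` and `copy_and_verify` are assumptions for this example, not a description of what Tellus does.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, dest: Path) -> bool:
    """Copy source to dest and report whether the two checksums match."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, dest)
    return sha256_of(source) == sha256_of(dest)


# Copy a small placeholder file and confirm the copy is byte-identical.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "analysis_summary.txt"
    source.write_text("Global mean temperature summary\n")
    dest = Path(tmp) / "shared" / "analysis_summary.txt"

    verified = copy_and_verify(source, dest)
    print(f"Checksum match: {verified}")
```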

In [None]:
# Create file transfer operations
transfer_operations = []

for file_path in analysis_file_paths:
    dest_path = shared_results_dir / file_path.name
    
    transfer_dto = FileTransferOperationDto(
        source_location="laptop-storage",
        source_path=str(file_path),
        dest_location="shared-data", 
        dest_path=str(dest_path),
        overwrite=True,
        verify_checksum=True,
        metadata={
            'simulation_id': 'sarah-historical-2020',
            'file_type': 'analysis_result',
            'generated_by': 'analysis_temperature_trends.py',
            'purpose': 'thesis_chapter_3'
        }
    )
    transfer_operations.append(transfer_dto)

print(f"📦 Prepared {len(transfer_operations)} file transfer operations")

# Execute transfers (simulated - in real usage these would be async operations)
import shutil  # imported once here rather than on every loop iteration

successful_transfers = 0
total_bytes = 0

for i, transfer_dto in enumerate(transfer_operations, 1):
    # Resolve paths before the try block so the error message can always name the file
    source_path = Path(transfer_dto.source_path)
    dest_path = Path(transfer_dto.dest_path)
    try:
        # Ensure destination directory exists
        dest_path.parent.mkdir(parents=True, exist_ok=True)

        # Simulate the transfer with a local copy that preserves file metadata
        shutil.copy2(source_path, dest_path)
        
        # Track progress
        file_size = source_path.stat().st_size
        total_bytes += file_size
        successful_transfers += 1
        
        progress = (i / len(transfer_operations)) * 100
        print(f"  ⬆️  [{progress:5.1f}%] {source_path.name} → shared-data ({file_size:,} bytes)")
        
    except Exception as e:
        print(f"  ❌ Failed to transfer {source_path.name}: {e}")

total_mb = total_bytes / 1024 / 1024
print(f"\n✅ Transfer complete! {successful_transfers}/{len(transfer_operations)} files transferred ({total_mb:.2f} MB)")

# Verify transfers
print("\n🔍 Verifying shared space contents:")
for file_path in shared_results_dir.rglob('*'):
    if file_path.is_file():
        size_kb = file_path.stat().st_size / 1024
        print(f"  📄 {file_path.name} ({size_kb:.1f} KB)")

## Step 5: Data Discovery and Organization

With her simulations and locations registered, Sarah can list both and see what is stored where.
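
Beyond listing registered simulations and locations, a simple file inventory answers the question "which locations hold a copy of this file?". The sketch below walks a set of local directories and groups files by relative path. It is a standalone illustration using `pathlib`, with a hypothetical helper `build_inventory` and made-up demo files; it does not use the Tellus API.

```python
import tempfile
from pathlib import Path


def build_inventory(locations: dict) -> dict:
    """Map each relative file path to the sorted location names that hold it."""
    inventory: dict = {}
    for name, root in locations.items():
        for path in root.rglob("*"):
            if path.is_file():
                relative = path.relative_to(root).as_posix()
                inventory.setdefault(relative, []).append(name)
    return {rel: sorted(names) for rel, names in sorted(inventory.items())}


# Two demo locations: one file exists in both, one only on the laptop.
with tempfile.TemporaryDirectory() as tmp:
    demo_locations = {
        "laptop-storage": Path(tmp) / "laptop",
        "shared-data": Path(tmp) / "shared",
    }
    for root in demo_locations.values():
        root.mkdir()
    (demo_locations["laptop-storage"] / "trend.nc").write_text("data")
    (demo_locations["laptop-storage"] / "notes.txt").write_text("draft")
    (demo_locations["shared-data"] / "trend.nc").write_text("data")

    inventory = build_inventory(demo_locations)
    for relative, names in inventory.items():
        print(f"{relative}: {', '.join(names)}")
```

Matching on relative path assumes every location uses the same layout. In this notebook the layouts differ between locations, so matching on file name or on a checksum would be a more suitable key there.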

In [None]:
# List all simulations
print("🔬 Sarah's Simulation Experiments:")
print("=" * 50)

sim_list_result = simulation_service.list_simulations()
for sim in sim_list_result.simulations:
    status = sim.attrs.get('status', 'unknown')
    purpose = sim.attrs.get('purpose', 'general')
    model = sim.attrs.get('model', 'unknown')
    
    status_emoji = {'active': '🟢', 'planning': '🟡', 'completed': '✅', 'archived': '📦'}.get(status, '⚪')
    
    print(f"\n{status_emoji} {sim.simulation_id}")
    print(f"   Model: {model}")
    print(f"   Purpose: {purpose.replace('_', ' ').title()}")
    print(f"   Status: {status.title()}")
    
    # Show key attributes
    if sim.attrs:
        key_attrs = ['experiment', 'year', 'resolution', 'variables']
        attr_display = []
        for attr in key_attrs:
            if attr in sim.attrs:
                attr_display.append(f"{attr}={sim.attrs[attr]}")
        if attr_display:
            print(f"   Details: {', '.join(attr_display)}")

print("\n" + "=" * 50)

In [None]:
# List all storage locations
print("\n📍 Sarah's Storage Locations:")
print("=" * 50)

loc_list_result = location_service.list_locations()
for loc in loc_list_result.locations:
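    # Get location kinds
    kind_list = list(loc.kinds) if loc.kinds else []
    kinds = ', '.join(kind_list) if kind_list else 'Unknown'
    
    # Pick an icon from the most specific kind first; checking DISK first
    # would hide TAPE and COMPUTE for locations that also list DISK
    if 'TAPE' in kind_list:
        icon = '📼'
    elif 'COMPUTE' in kind_list:
        icon = '🖥️'
    elif 'FILESERVER' in kind_list:
        icon = '🗄️'
    elif 'DISK' in kind_list:
        icon = '💽'
    else:
        icon = '📁'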
    
    print(f"\n{icon} {loc.name}")
    print(f"   Protocol: {loc.protocol}")
    print(f"   Types: {kinds}")
    print(f"   Path: {loc.path or 'Not specified'}")
    
    # Show additional config if available
    if hasattr(loc, 'additional_config') and loc.additional_config:
        description = loc.additional_config.get('description')
        if description:
            print(f"   Description: {description}")

print("\n" + "=" * 50)

## Step 6: Real-World CLI Usage

In practice, Sarah would run most of these operations through the Tellus CLI. The commands below correspond to the steps in this notebook, along with workflow and archive commands she might use next:

In [None]:
# Display CLI command equivalents
cli_commands = {
    "📍 Location Management": [
        "# Create storage locations",
        "tellus location create laptop-storage --kind DISK --protocol file --path /home/sarah/climate_data",
        "tellus location create university-cluster --kind COMPUTE,DISK --protocol ssh --path /scratch/sarah",
        "tellus location create archive-storage --kind TAPE,DISK --protocol file --path /archive/sarah",
        "",
        "# List all locations", 
        "tellus location list",
        "",
        "# Show detailed location info",
        "tellus location show laptop-storage"
    ],
    
    "🔬 Simulation Management": [
        "# Create a new simulation",
        "tellus simulation create sarah-historical-2020 --model CESM2 \\",
        "  --attr experiment=historical --attr year=2020 --attr purpose=thesis_chapter_3",
        "",
        "# List all simulations",
        "tellus simulation list",
        "",
        "# Show simulation details",
        "tellus simulation show sarah-historical-2020",
        "",
        "# Associate simulation with storage location",
        "tellus simulation add-location sarah-historical-2020 laptop-storage"
    ],
    
    "📦 File Transfer Operations": [
        "# Transfer single file",
        "tellus transfer file temperature_trends.png \\",
        "  --source laptop-storage --dest shared-data \\",
        "  --dest-path sarah/thesis_chapter_3/results/",
        "",
        "# Transfer multiple files",
        "tellus transfer batch analysis_output/ \\",
        "  --source laptop-storage --dest shared-data \\",
        "  --pattern '*.png,*.nc' --verify-checksum",
        "",
        "# Monitor transfer progress",
        "tellus progress list-operations",
        "tellus progress monitor <operation-id>"
    ],
    
    "🎯 Workflow Integration": [
        "# Add Snakemake workflow to simulation",
        "tellus simulation add-snakemake sarah-historical-2020 \\",
        "  analysis_workflow workflows/temperature_analysis.smk",
        "",
        "# Archive completed simulation",
        "tellus archive create sarah-historical-2020-results \\",
        "  --simulation sarah-historical-2020 --location archive-storage",
        "",
        "# Extract archived data when needed",
        "tellus archive extract sarah-historical-2020-results \\",
        "  --dest laptop-storage --simulation sarah-historical-2020"
    ]
}

print("💻 Equivalent CLI Commands:")
print("=" * 60)

for section, commands in cli_commands.items():
    print(f"\n{section}")
    print("-" * 40)
    for cmd in commands:
        if cmd.startswith("#"):
            print(f"\n{cmd}")
        elif cmd == "":
            print()
        else:
            print(f"$ {cmd}")

print("\n" + "=" * 60)

## Summary: Sarah's Tellus Workflow

Through this tutorial, Sarah has set up a complete climate data management system with Tellus:

### ✅ What Sarah Accomplished:

1. **Configured Multiple Storage Locations**
   - Local laptop storage for active analysis
   - University HPC cluster for computation
   - Archive storage for long-term preservation
   - Shared space for collaboration

2. **Organized Research into Simulations**
   - Historical 2020 analysis (thesis chapter 3)
   - CO2 sensitivity experiments (methodology paper)
   - Rich metadata and context for each experiment

3. **Established Data Flow Patterns**
   - Path templating for organized data layout
   - Location associations with context-aware routing
   - File transfers with per-file progress reporting (simulated here with local copies)

4. **Enabled Collaboration**
   - Shared space with advisor access
   - Structured result organization
   - Metadata for data provenance

### 🚀 Next Steps for Sarah:

- **Automate Workflows**: Integrate with Snakemake for analysis pipelines
- **Scale Operations**: Use batch transfers for large datasets
- **Archive Management**: Create compressed archives for completed experiments
- **Team Collaboration**: Share location configurations with lab members
- **Monitoring**: Set up alerts for long-running transfers

### 💡 Key Benefits:

- **Centralized Management**: Single interface for all storage locations
- **Automated Organization**: Path templating reduces manual file management
- **Progress Tracking**: Monitor long-running operations
- **Collaboration Ready**: Easy sharing with team members
- **Scalable**: Works from personal projects to large research collaborations

In [None]:
# Cleanup temporary environment
import shutil

try:
    shutil.rmtree(base_dir)
    print(f"🧹 Cleaned up tutorial environment: {base_dir}")
except Exception as e:
    print(f"⚠️  Could not clean up {base_dir}: {e}")

print("\n🎉 Tutorial complete! Sarah is ready to manage her climate data with Tellus.")