# Tutorial 1: Basic Archive Operations

Welcome to the Tellus Archive System! This tutorial will teach you the fundamentals of archiving your Earth System Model simulations.

## What You'll Learn
- Why archiving is important for climate modeling
- How to create your first archive
- Basic archive management commands
- Understanding archive metadata

## Prerequisites
- A completed simulation directory (we'll use a sample CESM run)
- Basic familiarity with command line tools

## Real-World Context: Why Archive?

Imagine you've just completed a 10-year CESM simulation. Your output directory contains:
- 120 monthly output files (10 years × 12 months)
- Configuration files and namelists
- Log files and diagnostics
- Restart files for continuing the simulation

Without proper archiving, you might:
- Lose data when your compute allocation expires
- Struggle to find specific files months later
- Duplicate data across multiple storage systems
- Have difficulty sharing results with collaborators

The Tellus Archive System solves these problems by creating organized, compressed archives with rich metadata.

## Step 1: Understanding Your Simulation Directory

Before archiving, let's examine a typical CESM simulation structure:

```python
# Let's look at our sample simulation directory
import os
from pathlib import Path

# Sample CESM simulation structure
simulation_path = "/data/cesm_runs/b.e20.BHIST.f19_g17.20thC.2005_2015"

print("📁 Sample CESM Simulation Structure:")
print(f"📂 {simulation_path}/")
print("  ├── 📂 atm/hist/           # Atmospheric output files")
print("  │   ├── b.e20.cam.h0.2005-01.nc")
print("  │   ├── b.e20.cam.h0.2005-02.nc")
print("  │   └── ... (120 monthly files)")
print("  ├── 📂 ocn/hist/           # Ocean output files")
print("  │   ├── b.e20.pop.h.2005-01.nc")
print("  │   └── ...")
print("  ├── 📂 run/                # Configuration and logs")
print("  │   ├── cesm_in")
print("  │   ├── user_nl_cam")
print("  │   └── cesm.log")
print("  └── 📂 rest/               # Restart files")
print("      ├── 2010-01-01-00000/")
print("      └── 2015-01-01-00000/")
```

```
📁 Sample CESM Simulation Structure:
📂 /data/cesm_runs/b.e20.BHIST.f19_g17.20thC.2005_2015/
  ├── 📂 atm/hist/           # Atmospheric output files
  │   ├── b.e20.cam.h0.2005-01.nc
  │   ├── b.e20.cam.h0.2005-02.nc
  │   └── ... (120 monthly files)
  ├── 📂 ocn/hist/           # Ocean output files
  │   ├── b.e20.pop.h.2005-01.nc
  │   └── ...
  ├── 📂 run/                # Configuration and logs
  │   ├── cesm_in
  │   ├── user_nl_cam
  │   └── cesm.log
  └── 📂 rest/               # Restart files
      ├── 2010-01-01-00000/
      └── 2015-01-01-00000/
```
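The tree above is illustrative. To see how your own run breaks down before you archive it, a short standard-library sketch can count files and bytes per top-level subdirectory. `summarize_directory` is a hypothetical helper, not part of Tellus; files sitting directly in the root are grouped under `"."`.

```python
from pathlib import Path


def summarize_directory(root: str) -> dict[str, dict[str, int]]:
    """Count files and total bytes under each top-level subdirectory of `root`.

    Hypothetical helper for illustration; it is not a Tellus API.
    """
    root_path = Path(root)
    summary: dict[str, dict[str, int]] = {}
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root_path)
        # First path component is the top-level subdirectory (e.g. "atm");
        # files directly inside `root` are grouped under "."
        key = rel.parts[0] if len(rel.parts) > 1 else "."
        entry = summary.setdefault(key, {"files": 0, "bytes": 0})
        entry["files"] += 1
        entry["bytes"] += path.stat().st_size
    return summary
```

In the notebook you would call `summarize_directory(simulation_path)` and compare the per-directory counts with what you expect (for example, 120 monthly files under `atm`).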


## Step 2: Your First Archive

Let's create a simple archive of our entire simulation. This is the most basic operation - think of it as creating a "backup copy" of your simulation.

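```python
# Enable the new archive service for tellus commands launched from this notebook
import os
os.environ['TELLUS_USE_NEW_ARCHIVE_SERVICE'] = 'true'

# Create our first archive.
# `pixi run` executes tellus inside the project's pixi environment; if `tellus`
# is already on your PATH (as the later cells assume), you can drop the prefix.
!pixi run tellus archive create cesm_2005_2015_complete {simulation_path} \
    --simulation cesm_b20_historical \
    --description "Complete CESM historical simulation 2005-2015"
```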



### Understanding the Command

Let's break down what just happened:

- `tellus archive create` - The main command to create an archive
- `cesm_2005_2015_complete` - A unique name for your archive (like a label)
- `{simulation_path}` - The directory containing your simulation data. IPython substitutes the value of the Python variable `simulation_path` into the `!` shell command before running it.
- `--simulation cesm_b20_historical` - Links this archive to a specific simulation ID
- `--description` - Human-readable description for future reference
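If you drive Tellus from a Python script rather than a notebook cell, the same call can be assembled as an argument list. This sketch uses only the arguments and flags explained above; `build_create_command` and `run_create` are hypothetical names, and running the command assumes the `tellus` executable is on your PATH.

```python
import shlex
import subprocess


def build_create_command(
    archive_name: str,
    source_dir: str,
    simulation_id: str,
    description: str,
) -> list[str]:
    """Assemble `tellus archive create` with the arguments explained above."""
    return [
        "tellus", "archive", "create",
        archive_name,                    # unique archive name
        source_dir,                      # simulation directory to archive
        "--simulation", simulation_id,   # links the archive to a simulation ID
        "--description", description,   # human-readable description
    ]


def run_create(cmd: list[str]) -> int:
    """Run the command without a shell and return its exit code.

    Hypothetical helper; assumes `tellus` is installed and on PATH.
    """
    return subprocess.run(cmd, check=False).returncode


cmd = build_create_command(
    "cesm_2005_2015_complete",
    "/data/cesm_runs/b.e20.BHIST.f19_g17.20thC.2005_2015",
    "cesm_b20_historical",
    "Complete CESM historical simulation 2005-2015",
)
# Print a copy-pasteable shell form of the command without running it
print(shlex.join(cmd))
```

Passing a list to `subprocess.run` (no shell) means descriptions containing spaces need no manual quoting; `shlex.join` is only used to display the equivalent shell command.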

**What's happening behind the scenes:**
1. Tellus scans all files in your simulation directory
2. It automatically classifies files (output data, configs, logs, etc.)
3. Creates a compressed tarball with all your data
4. Generates metadata describing the archive contents
5. Stores everything in your configured archive location
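To make these steps concrete, here is a toy sketch using Python's `tarfile` and `json` modules. It is not Tellus's implementation: it skips classification (step 2), writes a gzip tarball to a path you choose (standing in for the configured archive location), and embeds a simple inventory as a `metadata.json` member. The function name, metadata fields, and member name are all hypothetical.

```python
import io
import json
import tarfile
import time
from pathlib import Path


def create_archive_sketch(source_dir: str, archive_path: str, name: str) -> dict:
    """Toy scan -> tarball -> metadata pipeline; NOT Tellus's actual code."""
    source = Path(source_dir)
    # Step 1: scan every regular file under the simulation directory
    files = sorted(p for p in source.rglob("*") if p.is_file())
    # Step 4 (data): inventory of relative paths and sizes
    inventory = [
        {"path": p.relative_to(source).as_posix(), "bytes": p.stat().st_size}
        for p in files
    ]
    metadata = {
        "archive_id": name,
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "file_count": len(inventory),
        "total_bytes": sum(item["bytes"] for item in inventory),
        "files": inventory,
    }
    # Steps 3 and 5: write a gzip-compressed tarball at `archive_path`;
    # arcname keeps member paths relative to the simulation root
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in files:
            tar.add(p, arcname=p.relative_to(source).as_posix())
        # Embed the metadata as an extra JSON member
        payload = json.dumps(metadata, indent=2).encode("utf-8")
        info = tarfile.TarInfo(name="metadata.json")
        info.size = len(payload)
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(payload))
    return metadata
```

Keeping the metadata inside the tarball means the archive stays self-describing even if it is copied elsewhere; a real system may also store it separately so it can be queried without opening the archive.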

## Step 3: Viewing Your Archives

Now let's see what archives we have:

```python
# List all your archives
!tellus archive list
```

This command shows you:
- Archive names (your unique identifiers)
- When they were created
- What simulation they contain
- How much data is stored
- Where they're located

## Step 4: Examining Archive Details

Let's look at the details of our archive:

```python
# Show detailed information about our archive
!tellus archive show cesm_2005_2015_complete
```

### Understanding Archive Metadata

The detailed view shows you:

**📋 Basic Information:**
- Archive ID and creation time
- Associated simulation
- Storage location and size

**📊 File Inventory:**
- Total number of files archived
- Breakdown by content type (output, input, logs, etc.)
- Size distribution

**🔍 Content Classification:**
Tellus automatically identified different types of files:
- **OUTPUT**: Your primary simulation results (.nc files)
- **CONFIG**: Configuration files (namelists, input files)
- **LOG**: Log files and diagnostic output
- **INTERMEDIATE**: Restart files and checkpoints

This classification will be crucial for selective extraction later!
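Classification of this kind is commonly done with ordered filename patterns. The sketch below illustrates the idea with a hypothetical rule table; Tellus's actual rules are not described here and may differ (the `--content-types config,input` flag used later suggests it also has an INPUT type, which this sketch omits). The first matching rule wins, so restart NetCDF files are labeled INTERMEDIATE before the generic `*.nc` rule can call them OUTPUT.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical rules for illustration only; Tellus's real classifier may differ.
# Order matters: the first matching rule wins.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("INTERMEDIATE", ("rest/*", "*.r.*", "*restart*")),
    ("LOG", ("*.log", "*.log.*")),
    ("CONFIG", ("user_nl_*", "*_in", "*.xml")),
    ("OUTPUT", ("*.nc",)),
]


def classify(relative_path: str) -> str:
    """Return a content type for a path relative to the simulation root."""
    name = PurePosixPath(relative_path).name
    for content_type, patterns in RULES:
        for pattern in patterns:
            # Try the full relative path (for directory rules like "rest/*")
            # and the bare file name (for rules like "user_nl_*")
            if fnmatchcase(relative_path, pattern) or fnmatchcase(name, pattern):
                return content_type
    return "UNKNOWN"
```

For example, `classify("run/user_nl_cam")` returns `"CONFIG"` because the bare file name matches `user_nl_*`. `fnmatchcase` is used so matching is case-sensitive on every platform.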

## Step 5: Creating Multiple Archives (Real-World Scenario)

In practice, you often want separate archives for different purposes. Let's create two more targeted archives:

```python
# Archive just the configuration files (small, critical data)
!tellus archive create cesm_2005_2015_configs {simulation_path} \
    --simulation cesm_b20_historical \
    --content-types config,input \
    --description "Configuration and input files only"

# Archive just the atmospheric output (might be what you need most often)
!tellus archive create cesm_2005_2015_atm_output {simulation_path} \
    --simulation cesm_b20_historical \
    --patterns "atm/hist/*.nc" \
    --description "Atmospheric output files only"
```
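Before running a `--patterns` archive, it can help to preview what a pattern selects. This sketch assumes Tellus interprets patterns like `pathlib` globs relative to the simulation directory, which is an assumption rather than documented behavior; `preview_pattern` is a hypothetical helper. A zero count is also a quick way to diagnose the "No files found matching pattern" error.

```python
from pathlib import Path


def preview_pattern(root: str, pattern: str, limit: int = 5) -> tuple[int, list[str]]:
    """Return how many files match `pattern` under `root`, plus the first few.

    Assumes glob semantics relative to `root` (hypothetical helper).
    """
    matches = sorted(
        p.relative_to(root).as_posix()
        for p in Path(root).glob(pattern)
        if p.is_file()   # ignore directories that happen to match
    )
    return len(matches), matches[:limit]
```

In the notebook, `preview_pattern(simulation_path, "atm/hist/*.nc")` would report the number of atmospheric history files the second archive is expected to contain.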

### Why Create Multiple Archives?

**Storage Strategy**: 
- Put critical configs on expensive but reliable storage
- Put large output data on cheaper bulk storage
- Keep frequently accessed data on fast storage

**Access Patterns**:
- Collaborators might only need output data, not logs
- You might want to restart simulations (need configs + restart files)
- Publications might only require specific variables

**Transfer Efficiency**:
- Smaller archives transfer faster
- You can download just what you need

## Step 6: Check Your Archive List Again

Let's see all our archives now:

```python
# List all archives to see our strategy in action
!tellus archive list
```

Notice how you now have different archives for different purposes, each with different sizes reflecting their contents.

## Decision Guide: When to Use Different Archive Strategies

```
📋 COMPLETE ARCHIVE
✅ Use when: 
   - Simulation is finished and you want everything preserved
   - Moving to long-term storage
   - Sharing complete dataset with collaborators
❌ Avoid when: 
   - You have limited storage space
   - You only need specific outputs

📋 SELECTIVE ARCHIVES
✅ Use when:
   - You want faster access to specific data types
   - Different storage policies for different data
   - Collaborators need only certain files
❌ Avoid when:
   - You're unsure what you'll need later
   - Archive management overhead is a concern
```

## Troubleshooting Common Issues

### Problem: "Archive already exists"
**Solution**: Use a different archive name or add `--force` to overwrite

### Problem: "No files found matching pattern"
**Solution**: Check your file patterns. Use `ls` to verify files exist:
```bash
ls /path/to/simulation/atm/hist/*.nc
```

### Problem: "Insufficient storage space"
**Solution**: 
- Use selective archiving with `--content-types` or `--patterns`
- Check available space with `tellus archive cache status`
- Configure a different storage location
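As a quick check independent of Tellus, you can compare the size of the simulation directory with the free space on the filesystem you plan to write to, using the standard library's `shutil.disk_usage`. The helper names and the default `safety_factor` are hypothetical choices for illustration.

```python
import shutil
from pathlib import Path


def directory_size(root: str) -> int:
    """Total size in bytes of all regular files under `root`."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def has_room_for(source_dir: str, destination: str, safety_factor: float = 1.1) -> bool:
    """Conservatively check free space at `destination` for an uncompressed copy.

    `safety_factor` adds headroom for metadata and temporary files
    (hypothetical default).
    """
    needed = directory_size(source_dir) * safety_factor
    free = shutil.disk_usage(destination).free
    return free >= needed
```

Sizing for the uncompressed total is the conservative choice, since NetCDF output that is already internally compressed may not shrink much further inside a compressed archive.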

### Problem: Archive seems too small/large
**Solution**: Use `tellus archive show <name>` to see exactly what was included
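If you have a local copy of an archive file, the standard `tarfile` module can give a rough independent count to compare against `tellus archive show`. This assumes the archive is an ordinary (optionally compressed) tar file, as the behind-the-scenes description in Step 2 suggests; `inspect_tarball` is a hypothetical helper.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def inspect_tarball(archive_path: str) -> dict:
    """Summarize a local tarball: file count, uncompressed bytes, top-level dirs."""
    # "r:*" lets tarfile detect gzip/bz2/xz compression automatically
    with tarfile.open(archive_path, "r:*") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
    # Group members by their first path component; top-level files go under "."
    top_level = Counter(
        parts[0] if len(parts) > 1 else "."
        for parts in (PurePosixPath(m.name).parts for m in members)
    )
    return {
        "files": len(members),
        "uncompressed_bytes": sum(m.size for m in members),
        "by_top_level": dict(top_level),
    }
```

A much larger or smaller `uncompressed_bytes` than your simulation directory usually points to a pattern or content-type filter that selected more or less than you intended.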

## Summary and Next Steps

🎉 **Congratulations!** You've learned the basics of archive creation and management.

**What you accomplished:**
- Created your first complete simulation archive
- Learned to create selective archives for different purposes
- Understood archive metadata and file classification
- Developed a strategy for organizing multiple archives

**In the next tutorial**, you'll learn:
- How Tellus classifies different types of Earth science files
- Advanced filtering by content type and importance
- Model-specific patterns for CESM, ICON, WRF, and other models
- Creating archives optimized for specific use cases

**Practice Exercise**: 
Try creating archives for your own simulation data using different strategies:
1. A complete archive for long-term storage
2. A config-only archive for reproduction
3. An output-only archive for analysis

---
*Next: [Tutorial 2: Content Classification and Selective Archiving](tutorial-2-content-classification.ipynb)*