
Backup Recovery

Temp edited this page Oct 3, 2025 · 2 revisions

Backup & Recovery Tools

Backup planning, restore validation, and schedule optimization for mission-critical PostgreSQL databases.


📊 Overview

4 enterprise backup tools for comprehensive backup and recovery planning:

| Tool | Purpose | Category |
|------|---------|----------|
| backup_logical | Plan logical backups (pg_dump) | Backup Planning |
| backup_physical | Assess physical backup readiness | Backup Planning |
| restore_validate | Validate backup restore procedures | Recovery Validation |
| backup_schedule_optimize | Optimize backup schedules | Backup Strategy |

🔧 Tool Details

backup_logical

Analyze database for logical backup planning using pg_dump. Provides size estimates, recommended options, and timing predictions.

Parameters:

  • include_schema (boolean, optional): Include schema objects in the analysis
  • include_data (boolean, optional): Include table data in the analysis
  • table_filter (string, optional): Filter by specific table pattern

Returns:

  • database_info: Database name, size, and statistics
  • size_analysis: Expected backup size and compression estimates
  • timing_estimates: Backup duration predictions
  • recommended_options: Suggested pg_dump command options
  • warnings: Potential issues to address before backup

Example:

result = backup_logical(
    include_schema=True,
    include_data=True,
    table_filter="orders%"
)
# Returns: {
#   "database_info": {
#     "database": "production_db",
#     "total_size_bytes": 50000000000,
#     "total_size_gb": 46.57,
#     "table_count": 125,
#     "total_rows": 15000000
#   },
#   "size_analysis": {
#     "estimated_uncompressed_bytes": 48000000000,
#     "estimated_uncompressed_gb": 44.7,
#     "estimated_compressed_bytes": 12000000000,
#     "estimated_compressed_gb": 11.2,
#     "compression_ratio_percent": 75.0,
#     "largest_tables": [
#       {"table": "orders", "size_mb": 12500, "row_count": 5000000},
#       {"table": "order_items", "size_mb": 8900, "row_count": 8000000}
#     ]
#   },
#   "timing_estimates": {
#     "estimated_duration_minutes": 45,
#     "estimated_start_time": "02:00",
#     "estimated_end_time": "02:45",
#     "backup_window_required_minutes": 60
#   },
#   "recommended_options": {
#     "format": "directory",
#     "compression": "9",
#     "parallel_jobs": 4,
#     "sample_command": "pg_dump -Fd -Z9 -j4 -f backup_dir production_db"
#   },
#   "warnings": [
#     {
#       "severity": "medium",
#       "issue": "Large tables detected",
#       "recommendation": "Consider parallel dumps for tables > 1GB"
#     }
#   ]
# }

Use Cases:

  • Backup size estimation
  • Backup window planning
  • pg_dump command optimization
  • Cost estimation for backup storage
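The size figures in the example above follow from simple arithmetic, which you can reproduce for your own planning. This is a minimal sketch: the 0.96 dump-to-disk factor and the 75% compression ratio are illustrative assumptions (dumps skip index data, and gzip-compressed row data often shrinks 60-80%), not values the tool guarantees — measure on your own data.

```python
def estimate_backup_size(total_size_bytes: int,
                         compression_ratio_percent: float = 75.0) -> dict:
    """Rough dump-size estimate mirroring the size_analysis fields above."""
    GB = 1024 ** 3
    # Dumps omit index data, so output is usually somewhat smaller than
    # on-disk size; 0.96 is an illustrative fudge factor, not a constant.
    uncompressed = total_size_bytes * 0.96
    compressed = uncompressed * (1 - compression_ratio_percent / 100)
    return {
        "estimated_uncompressed_gb": round(uncompressed / GB, 1),
        "estimated_compressed_gb": round(compressed / GB, 1),
        "compression_ratio_percent": compression_ratio_percent,
    }

# Reproduces the example's 44.7 GB uncompressed / 11.2 GB compressed figures
print(estimate_backup_size(50_000_000_000))
```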

backup_physical

Assess readiness for physical backups (pg_basebackup, PITR). Checks WAL archiving, replication slots, and configuration.

Parameters:

  • check_wal_archiving (boolean, optional): Check WAL archiving configuration and lag
  • check_replication_slots (boolean, optional): Check replication slot status

Returns:

  • backup_readiness: Overall readiness assessment
  • wal_status: WAL archiving configuration and status
  • replication_slots: Active and inactive slots
  • configuration: Relevant PostgreSQL configuration
  • recommendations: Steps to improve backup readiness

Example:

result = backup_physical(
    check_wal_archiving=True,
    check_replication_slots=True
)
# Returns: {
#   "backup_readiness": {
#     "overall_status": "ready",
#     "wal_archiving": "enabled",
#     "replication_slots": "healthy",
#     "continuous_archiving": True
#   },
#   "wal_status": {
#     "wal_level": "replica",
#     "archive_mode": "on",
#     "archive_command": "cp %p /mnt/wal_archive/%f",
#     "current_wal_lsn": "0/5A2F3C0",
#     "last_archived_wal": "000000010000000000000059",
#     "archiving_lag_seconds": 2.5
#   },
#   "replication_slots": {
#     "total_count": 2,
#     "active_count": 2,
#     "inactive_count": 0,
#     "slots": [
#       {
#         "slot_name": "replica1_slot",
#         "slot_type": "physical",
#         "active": True,
#         "restart_lsn": "0/5A2F000",
#         "confirmed_flush_lsn": null
#       }
#     ]
#   },
#   "configuration": {
#     "max_wal_senders": 10,
#     "max_replication_slots": 10,
#     "wal_keep_size": "1024MB",
#     "wal_sender_timeout": "60s"
#   },
#   "recommendations": [
#     {
#       "priority": "info",
#       "recommendation": "WAL archiving properly configured",
#       "action": "None required"
#     }
#   ]
# }

Use Cases:

  • PITR setup verification
  • Replication monitoring
  • Backup infrastructure health checks
  • Disaster recovery planning
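The readiness checks above reduce to a few rules: physical backups need `wal_level` at `replica` or `logical`, `archive_mode` on, and an archiver that keeps up. A hedged sketch of that triage, operating on a `wal_status` payload shaped like the example output (the 60-second lag threshold is an illustrative choice, not a PostgreSQL default):

```python
def assess_wal_archiving(wal_status: dict, max_lag_seconds: float = 60.0) -> str:
    """Classify WAL archiving health from a backup_physical-style payload."""
    if wal_status.get("archive_mode") != "on":
        return "critical: archive_mode is off - base backups are not PITR-capable"
    if wal_status.get("wal_level") not in ("replica", "logical"):
        return "critical: wal_level must be replica or logical for physical backups"
    if wal_status.get("archiving_lag_seconds", 0) > max_lag_seconds:
        return "warning: archiver falling behind - check archive_command and disk space"
    return "ok"

print(assess_wal_archiving({
    "wal_level": "replica",
    "archive_mode": "on",
    "archiving_lag_seconds": 2.5,
}))  # → ok
```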

restore_validate

Validate backup restore procedures and estimate recovery times.

Parameters:

  • backup_type (string, required): Type of backup (logical or physical)
  • estimated_backup_size_gb (number, optional): Estimated backup size in GB
  • parallel_restore_jobs (integer, optional): Number of parallel jobs for restore

Returns:

  • validation_results: Restore procedure validation
  • timing_estimates: Recovery time objectives (RTO)
  • requirements: System requirements for restore
  • recommendations: Best practices for restore procedures

Example:

# Validate logical restore
result = restore_validate(
    backup_type="logical",
    estimated_backup_size_gb=45.0,
    parallel_restore_jobs=4
)
# Returns: {
#   "validation_results": {
#     "backup_type": "logical",
#     "restore_method": "pg_restore",
#     "parallel_support": True,
#     "parallel_jobs": 4
#   },
#   "timing_estimates": {
#     "estimated_restore_minutes": 60,
#     "estimated_index_rebuild_minutes": 20,
#     "total_rto_minutes": 80,
#     "total_rto_hours": 1.33,
#     "backup_size_gb": 45.0
#   },
#   "requirements": {
#     "disk_space_required_gb": 67.5,
#     "buffer_percentage": 50,
#     "temp_space_gb": 22.5,
#     "memory_recommended_gb": 8
#   },
#   "recommendations": [
#     {
#       "priority": "high",
#       "category": "performance",
#       "recommendation": "Use parallel restore with -j4 for 4-core systems",
#       "command": "pg_restore -j4 -d database backup.dump"
#     },
#     {
#       "priority": "medium",
#       "category": "disk_space",
#       "recommendation": "Ensure 67.5 GB free disk space before restore"
#     }
#   ]
# }

# Validate physical restore
result = restore_validate(
    backup_type="physical",
    estimated_backup_size_gb=120.0
)
# Returns: {
#   "validation_results": {
#     "backup_type": "physical",
#     "restore_method": "pg_basebackup + WAL replay",
#     "pitr_capable": True
#   },
#   "timing_estimates": {
#     "estimated_restore_minutes": 30,
#     "estimated_wal_replay_minutes": 10,
#     "total_rto_minutes": 40,
#     "total_rto_hours": 0.67,
#     "backup_size_gb": 120.0
#   },
#   "requirements": {
#     "disk_space_required_gb": 180.0,
#     "buffer_percentage": 50,
#     "wal_archive_space_gb": 60.0,
#     "memory_recommended_gb": 16
#   },
#   "recommendations": [...]
# }

Use Cases:

  • Disaster recovery planning
  • RTO/RPO calculation
  • Restore testing preparation
  • Infrastructure sizing
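The requirements in the examples above follow a 50% safety buffer over the backup size (45 GB → 67.5 GB required, 22.5 GB temp). A small sketch of that calculation for sizing restore hosts — the buffer percentage is a planning assumption, not a hard requirement:

```python
def restore_requirements(backup_size_gb: float, buffer_percentage: int = 50) -> dict:
    """Disk-space planning for a restore, using the 50% buffer from the
    restore_validate examples above."""
    buffer = backup_size_gb * buffer_percentage / 100
    return {
        "disk_space_required_gb": backup_size_gb + buffer,
        "temp_space_gb": buffer,
        "buffer_percentage": buffer_percentage,
    }

print(restore_requirements(45.0))
# → {'disk_space_required_gb': 67.5, 'temp_space_gb': 22.5, 'buffer_percentage': 50}
```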

backup_schedule_optimize

Optimize backup schedules based on database activity patterns and business requirements.

Parameters:

  • full_backup_frequency_hours (integer, required): Desired full backup frequency, in hours
  • incremental_backup_frequency_hours (integer, optional): Incremental backup frequency, in hours
  • analyze_activity_patterns (boolean, optional): Analyze database activity for optimal timing

Returns:

  • recommended_schedule: Optimized backup schedule
  • activity_analysis: Database activity patterns
  • resource_impact: Expected resource usage
  • retention_recommendations: Backup retention policies

Example:

result = backup_schedule_optimize(
    full_backup_frequency_hours=24,
    incremental_backup_frequency_hours=4,
    analyze_activity_patterns=True
)
# Returns: {
#   "recommended_schedule": {
#     "full_backups": {
#       "frequency_hours": 24,
#       "recommended_start_time": "02:00",
#       "estimated_duration_minutes": 45,
#       "daily_schedule": ["02:00"]
#     },
#     "incremental_backups": {
#       "frequency_hours": 4,
#       "recommended_start_times": ["06:00", "10:00", "14:00", "18:00", "22:00"],
#       "estimated_duration_minutes": 5
#     }
#   },
#   "activity_analysis": {
#     "peak_hours": [9, 10, 11, 14, 15, 16],
#     "low_activity_hours": [1, 2, 3, 4, 5],
#     "recommended_backup_window": "02:00-04:00",
#     "transaction_rate_per_hour": {
#       "peak": 125000,
#       "average": 45000,
#       "low": 5000
#     }
#   },
#   "resource_impact": {
#     "full_backup": {
#       "cpu_impact_percent": 15,
#       "io_impact_percent": 25,
#       "network_bandwidth_mbps": 150
#     },
#     "incremental_backup": {
#       "cpu_impact_percent": 5,
#       "io_impact_percent": 8,
#       "network_bandwidth_mbps": 50
#     }
#   },
#   "retention_recommendations": {
#     "full_backups": {
#       "daily_retention_days": 7,
#       "weekly_retention_weeks": 4,
#       "monthly_retention_months": 12
#     },
#     "incremental_backups": {
#       "retention_days": 7
#     },
#     "estimated_storage_gb": {
#       "per_full_backup": 45,
#       "per_incremental": 2,
#       "total_required": 405
#     }
#   }
# }

Use Cases:

  • Backup strategy optimization
  • RPO/RTO alignment
  • Cost optimization (storage, bandwidth)
  • Compliance planning (retention policies)
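The activity analysis above boils down to finding the quietest contiguous stretch of hours and scheduling full backups there. A minimal sketch of that idea, using a hypothetical 24-entry hourly transaction profile (the sample numbers are illustrative, not real measurements):

```python
def pick_backup_window(tx_per_hour: list, window_hours: int = 2) -> tuple:
    """Pick the start/end hour of the quietest contiguous window
    (wrapping past midnight) from 24 hourly transaction counts."""
    assert len(tx_per_hour) == 24
    best_start, best_load = 0, float("inf")
    for start in range(24):
        load = sum(tx_per_hour[(start + h) % 24] for h in range(window_hours))
        if load < best_load:
            best_start, best_load = start, load
    return best_start, (best_start + window_hours) % 24

# Toy profile: quiet in the early morning, busy during office hours.
profile = [8, 2, 1, 1, 2, 5, 20, 60, 90, 125, 120, 110,
           100, 95, 120, 118, 105, 80, 60, 45, 30, 20, 15, 10]
start, end = pick_backup_window(profile, window_hours=2)
print(f"{start:02d}:00-{end:02d}:00")  # → 02:00-04:00
```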

🎯 Common Workflows

Initial Backup Planning

# 1. Assess physical backup readiness
physical = backup_physical(
    check_wal_archiving=True,
    check_replication_slots=True
)

# 2. Plan logical backups
logical = backup_logical(
    include_schema=True,
    include_data=True
)

# 3. Optimize schedule
schedule = backup_schedule_optimize(
    full_backup_frequency_hours=24,
    incremental_backup_frequency_hours=4,
    analyze_activity_patterns=True
)

# 4. Validate restore procedures
validate = restore_validate(
    backup_type="logical",
    estimated_backup_size_gb=logical["size_analysis"]["estimated_compressed_gb"],
    parallel_restore_jobs=4
)

Backup Health Check

# 1. Check physical backup infrastructure
physical_status = backup_physical(
    check_wal_archiving=True,
    check_replication_slots=True
)

# 2. Verify logical backup can complete
logical_status = backup_logical(
    include_schema=True,
    include_data=True
)

# 3. Test restore validation
if physical_status["backup_readiness"]["overall_status"] == "ready":
    restore_test = restore_validate(
        backup_type="physical",
        estimated_backup_size_gb=120.0
    )

Disaster Recovery Planning

# 1. Document backup sizes
logical_plan = backup_logical(include_schema=True, include_data=True)
backup_size = logical_plan["size_analysis"]["estimated_compressed_gb"]

# 2. Calculate RTO
restore_plan = restore_validate(
    backup_type="logical",
    estimated_backup_size_gb=backup_size,
    parallel_restore_jobs=4
)
rto_minutes = restore_plan["timing_estimates"]["total_rto_minutes"]

# 3. Determine backup frequency for RPO
schedule = backup_schedule_optimize(
    full_backup_frequency_hours=24,
    incremental_backup_frequency_hours=1,  # 1-hour RPO
    analyze_activity_patterns=True
)

📊 Backup Best Practices

1. The 3-2-1 Rule

  • 3 copies of your data
  • 2 different media types
  • 1 copy offsite

2. Regular Testing

# Test restores monthly
restore_validate(backup_type="logical", estimated_backup_size_gb=50)

# Verify backup integrity weekly
backup_logical(include_schema=True, include_data=True)

3. Monitoring & Alerting

# Use with Monitoring tools
from Monitoring import alert_threshold_set

# Monitor backup duration
backup_alert = alert_threshold_set(
    metric_type="transaction_age",
    warning_threshold=1800,   # 30 minutes
    critical_threshold=3600   # 60 minutes
)
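Beyond duration alerts, it is worth alerting on backup *freshness*: if the newest full backup is much older than the schedule predicts, something silently failed. A small sketch — the 26h/50h thresholds assume a 24-hour full-backup cadence plus slack, and should be tuned to your own schedule:

```python
from datetime import datetime, timedelta, timezone

def backup_age_alert(last_backup_at: datetime,
                     warning: timedelta = timedelta(hours=26),
                     critical: timedelta = timedelta(hours=50)) -> str:
    """Alert level from the age of the newest full backup.
    Thresholds are illustrative for a 24h cadence."""
    age = datetime.now(timezone.utc) - last_backup_at
    if age >= critical:
        return "critical"
    if age >= warning:
        return "warning"
    return "ok"

last = datetime.now(timezone.utc) - timedelta(hours=30)
print(backup_age_alert(last))  # → warning
```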

4. Retention Policies

Recommended Retention:

  • Daily: 7 days
  • Weekly: 4 weeks
  • Monthly: 12 months
  • Yearly: 7 years (compliance)
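The storage cost of this grandfather-father-son policy is just a copy count times backup size. A sketch under two stated assumptions: retained copies share no storage (no deduplication or incremental-forever chains), and incrementals run six times a day as in the 4-hour schedule above:

```python
def retention_storage_gb(full_gb: float, incr_gb: float,
                         daily_days: int = 7, weekly_weeks: int = 4,
                         monthly_months: int = 12, yearly_years: int = 7,
                         incr_per_day: int = 6, incr_days: int = 7) -> float:
    """Total storage for the GFS retention policy above, assuming no
    deduplication between retained copies."""
    fulls = daily_days + weekly_weeks + monthly_months + yearly_years
    incrementals = incr_per_day * incr_days
    return fulls * full_gb + incrementals * incr_gb

# 30 retained fulls at 45 GB plus 42 incrementals at 2 GB
print(retention_storage_gb(full_gb=45, incr_gb=2))  # → 1434.0
```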

5. Documentation

Keep records of:

  • Backup schedules and frequencies
  • Restore procedures (runbooks)
  • Contact information for emergencies
  • Storage locations and credentials

🚨 Troubleshooting

Backup Too Large

# Identify large tables
result = backup_logical(include_data=True)
large_tables = result["size_analysis"]["largest_tables"]

# Consider:
# - Archiving old data
# - Partitioning large tables
# - Separate backup strategy for large tables

WAL Archiving Not Working

result = backup_physical(check_wal_archiving=True)
wal_status = result["wal_status"]

# Check:
# - archive_command syntax
# - Filesystem permissions
# - Disk space on archive location

Restore Too Slow

result = restore_validate(
    backup_type="logical",
    estimated_backup_size_gb=100,
    parallel_restore_jobs=8  # Increase parallelism
)

# Consider:
# - More CPU cores for parallel restore
# - Faster storage (SSD)
# - Restore indexes separately after data
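The "restore indexes separately after data" idea maps onto pg_restore's real `--section` option: restore the schema (`pre-data`) first, the table data in parallel, and indexes/constraints (`post-data`) last. A hedged sketch that only builds the command strings — paths and database names are placeholders:

```python
def staged_restore_commands(dump: str, db: str, jobs: int = 8) -> list:
    """Three-stage pg_restore sequence: schema, then parallel data load,
    then indexes and constraints. --section and -j are standard
    pg_restore options for custom/directory-format archives."""
    return [
        f"pg_restore --section=pre-data -d {db} {dump}",
        f"pg_restore --section=data -j{jobs} -d {db} {dump}",
        f"pg_restore --section=post-data -j{jobs} -d {db} {dump}",
    ]

for cmd in staged_restore_commands("backup.dump", "mydb"):
    print(cmd)
```

Building indexes only after the data is loaded avoids maintaining them row by row during the bulk load, which is usually the dominant cost in a slow restore.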

See Home for more tool categories.
