# Python DevOps Session 2: OS & Process Management

## **Topic 4: OS & Process Management for DevOps**

### **Learning Objectives:**
1. Master file and directory operations using `os`, `pathlib`, and `shutil`
2. Execute external commands safely with `subprocess.run`
3. Manage environment variables programmatically
4. Monitor system resources with `psutil`
5. Implement production-ready process management
6. Build DevOps automation tools with system integration

---

### **What You'll Build:**
- Cross-platform file management utilities
- Safe command execution wrappers
- System health monitoring dashboards
- Process management automation
- Resource alerting systems

## **Section 1: Import Required Libraries**

Before we start working with OS and process management, we need to import the necessary libraries.

In [1]:
# Standard library imports for OS operations
import os
import sys
import platform
from pathlib import Path
import shutil
import subprocess
import tempfile
from datetime import datetime
import time
import json

# Third-party library for system monitoring
import psutil

print(f"Python version: {sys.version}")
print(f"Operating System: {platform.system()} {platform.release()}")
print(f"psutil version: {psutil.__version__}")
print(f"\nAll libraries imported successfully!")

Python version: 3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]
Operating System: Windows 10
psutil version: 5.9.6

All libraries imported successfully!


### **Library Overview:**

- **`os`**: Low-level operating system interface (environment variables, process-level calls; its `os.path` functions are the older, string-based alternative to `pathlib`)
- **`pathlib`**: Modern, object-oriented path handling (recommended for new code)
- **`shutil`**: High-level file operations (copy, move, archive)
- **`subprocess`**: Spawn and manage external processes safely
- **`psutil`**: Cross-platform system and process utilities (requires installation: `pip install psutil`)

**Note:** If `psutil` is not installed, run `%pip install psutil` in a code cell; the `%pip` magic installs into the environment of the running kernel.

---

## **Section 2: Modern Path Handling with pathlib**

### **Theory: Why pathlib?**

The `pathlib` module provides an object-oriented approach to filesystem paths, making code:
- **More readable**: `path / "subdir" / "file.txt"` vs `os.path.join(path, "subdir", "file.txt")`
- **Cross-platform**: Automatically handles Windows (`\`) vs Unix (`/`) separators
- **Safer**: Path objects are immutable and hashable, so they can be shared and used as dictionary keys without being modified by accident
- **More powerful**: Built-in methods for common operations

**Key Concepts:**
- `Path()` creates a path object
- `/` operator joins paths (Pythonic!)
- `.exists()`, `.is_file()`, `.is_dir()` check path types
- `.parent`, `.name`, `.stem`, `.suffix` access path components

### **Example 2.1: Basic Path Operations**

In [2]:
# Get current working directory
current_dir = Path.cwd()
print(f"Current directory: {current_dir}")

# Get home directory
home_dir = Path.home()
print(f"Home directory: {home_dir}")

# Create a path using the / operator (cross-platform)
config_path = current_dir / "config" / "app.yaml"
print(f"\nConfig path: {config_path}")

# Access path components
print(f"\nPath components:")
print(f"  Parent: {config_path.parent}")
print(f"  Name: {config_path.name}")
print(f"  Stem: {config_path.stem}")
print(f"  Suffix: {config_path.suffix}")
print(f"  Parts: {config_path.parts}")

# Check if path exists
print(f"\nPath exists: {config_path.exists()}")

# Create a path with absolute resolution
absolute_path = config_path.resolve()
print(f"Absolute path: {absolute_path}")

Current directory: c:\Users\hmeln\source\repos\Python-DevOps-Sessions
Home directory: C:\Users\hmeln

Config path: c:\Users\hmeln\source\repos\Python-DevOps-Sessions\config\app.yaml

Path components:
  Parent: c:\Users\hmeln\source\repos\Python-DevOps-Sessions\config
  Name: app.yaml
  Stem: app
  Suffix: .yaml
  Parts: ('c:\\', 'Users', 'hmeln', 'source', 'repos', 'Python-DevOps-Sessions', 'config', 'app.yaml')

Path exists: False
Absolute path: C:\Users\hmeln\source\repos\Python-DevOps-Sessions\config\app.yaml


### **Explanation:**

**Path Creation:**
- `Path.cwd()` returns the current working directory as a Path object
- `Path.home()` returns the user's home directory
- The `/` operator joins path components in a platform-independent way

**Path Components:**
- **`parent`**: The directory containing this path
- **`name`**: The final component (filename with extension)
- **`stem`**: Filename without extension
- **`suffix`**: File extension including the dot
- **`parts`**: Tuple of all path components

**Path Resolution:**
- `resolve()` makes the path absolute, resolving any symlinks and `..` components
- `exists()` checks if the path exists on the filesystem

**DevOps Use Case:** Building configuration file paths that work on any OS (Windows, Linux, macOS).
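
`Path` always uses the separator rules of the machine the code runs on. When a script generates paths for a *different* OS (for example, rendering a config for Windows hosts from a Linux CI runner), the pure path classes `PurePosixPath` and `PureWindowsPath` can help, because they only manipulate strings and never touch the filesystem. A minimal sketch; `build_config_path` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def build_config_path(base: str, env: str, filename: str = "app.yaml", *, windows: bool = False) -> str:
    """Join path parts using the separator rules of the *target* OS (hypothetical helper)."""
    # Pure paths do no filesystem access, so both flavours work on any host OS.
    path_cls = PureWindowsPath if windows else PurePosixPath
    return str(path_cls(base) / env / filename)


linux_path = build_config_path("/etc/myapp", "production")
windows_path = build_config_path("C:/myapp", "production", windows=True)
print(linux_path)    # /etc/myapp/production/app.yaml
print(windows_path)  # C:\myapp\production\app.yaml

# Components behave the same regardless of flavour
win = PureWindowsPath(windows_path)
print(win.stem, win.parent.name)  # app production
```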

### **Example 2.2: Directory Traversal and File Discovery**

In [3]:
# Create a test directory structure
test_dir = Path("devops_test")
test_dir.mkdir(exist_ok=True)

# Create some test files
(test_dir / "app.py").write_text("print('Hello')")
(test_dir / "config.yaml").write_text("version: 1.0")
(test_dir / "README.md").write_text("# DevOps Project")

# Create a subdirectory with files
logs_dir = test_dir / "logs"
logs_dir.mkdir(exist_ok=True)
(logs_dir / "app.log").write_text("Log entry 1")
(logs_dir / "error.log").write_text("Error entry 1")

print("Test directory structure created\n")

# List all files in directory
print("Files in test_dir:")
for item in test_dir.iterdir():
    if item.is_file():
        print(f"  {item.name} ({item.stat().st_size} bytes)")
    elif item.is_dir():
        print(f"  {item.name}/")

# Find all Python files recursively
print("\nAll Python files (recursive):")
for py_file in test_dir.rglob("*.py"):
    print(f"  {py_file.relative_to(test_dir)}")

# Find all log files
print("\nAll log files:")
for log_file in test_dir.rglob("*.log"):
    print(f"  {log_file.relative_to(test_dir)}")

Test directory structure created

Files in test_dir:
  app.py (14 bytes)
  config.yaml (12 bytes)
  logs/
  README.md (16 bytes)

All Python files (recursive):
  app.py

All log files:
  logs\app.log
  logs\error.log


### **Explanation:**

**Directory Creation:**
- `mkdir(exist_ok=True)` creates the directory and doesn't fail if it already exists; add `parents=True` to create missing parent directories as well
- `write_text()` creates (or overwrites) a file and writes content in one operation

**Directory Traversal:**
- **`iterdir()`**: Lists direct children (non-recursive)
- **`rglob("*.py")`**: Recursively finds all Python files (r = recursive)
- **`glob("*.log")`**: Non-recursive pattern matching (a `**` pattern such as `glob("**/*.log")` makes it recursive)
- **`is_file()` / `is_dir()`**: Check path type

**File Metadata:**
- `stat().st_size` gets file size in bytes
- `relative_to(other)` returns the path relative to `other` (here `test_dir`) and raises `ValueError` if the path is not inside it

**DevOps Use Case:** Discovering configuration files, finding log files for cleanup, scanning repositories for specific file types.
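
Combining `rglob()` with `stat().st_mtime` gives a simple way to find log files that are old enough to clean up. A minimal sketch; `find_stale_files` and the 7-day default are illustrative assumptions, and the function only lists candidates rather than deleting anything. The demo runs in a temporary directory and backdates one file with `os.utime()`.

```python
import os
import tempfile
import time
from pathlib import Path


def find_stale_files(root: Path, pattern: str = "*.log", max_age_days: float = 7) -> list[Path]:
    """Return files under root matching pattern whose last modification is older than max_age_days."""
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    return sorted(p for p in root.rglob(pattern) if p.is_file() and p.stat().st_mtime < cutoff)


# Demo in a throwaway directory so nothing real is touched
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "logs").mkdir()
    old_log, new_log = root / "logs" / "old.log", root / "logs" / "new.log"
    old_log.write_text("old entry")
    new_log.write_text("new entry")
    ten_days_ago = time.time() - 10 * 24 * 60 * 60
    os.utime(old_log, (ten_days_ago, ten_days_ago))  # backdate access and modification times
    stale_names = [p.name for p in find_stale_files(root)]

print(stale_names)  # ['old.log']
```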

---

## **Section 3: File Operations with shutil**

### **Theory: High-Level File Operations**

The `shutil` module provides high-level file operations that work on entire files and directory trees:

**Key Operations:**
- **`copy(src, dst)`**: Copy file contents and permission bits (timestamps are not preserved)
- **`copy2(src, dst)`**: Like `copy()`, but also tries to preserve metadata such as timestamps
- **`copytree(src, dst)`**: Recursively copy entire directory tree
- **`move(src, dst)`**: Move file or directory
- **`rmtree(path)`**: Recursively delete directory tree
- **`disk_usage(path)`**: Get disk usage statistics

**Safety Considerations:**
- Always check if destination exists to avoid overwriting
- Leave `rmtree()`'s `ignore_errors` at its default (`False`) so failed deletions raise instead of being silently skipped
- Implement backup strategies before destructive operations

### **Example 3.1: Copying and Moving Files**

In [4]:
# Create backup directory
backup_dir = Path("backups")
backup_dir.mkdir(exist_ok=True)

# Copy a single file
source_file = test_dir / "config.yaml"
backup_file = backup_dir / "config.yaml.bak"

shutil.copy2(source_file, backup_file)
print(f"Copied {source_file.name} to {backup_file}")

# Copy entire directory tree
backup_full = Path("backups") / "full_backup"
if backup_full.exists():
    shutil.rmtree(backup_full)

shutil.copytree(test_dir, backup_full)
print(f"Created full backup at {backup_full}")

# Move a file
archive_dir = Path("archive")
archive_dir.mkdir(exist_ok=True)

old_log = logs_dir / "error.log"
archived_log = archive_dir / f"error_{datetime.now().strftime('%Y%m%d')}.log"

shutil.move(str(old_log), str(archived_log))
print(f"Moved {old_log.name} to {archived_log}")

# List backup contents
print("\nBackup directory contents:")
for item in backup_full.rglob("*"):
    if item.is_file():
        print(f"  {item.relative_to(backup_full)}")

Copied config.yaml to backups\config.yaml.bak
Created full backup at backups\full_backup
Moved error.log to archive\error_20251129.log

Backup directory contents:
  app.py
  config.yaml
  README.md
  logs\app.log
  logs\error.log


### **Explanation:**

**File Copying:**
- **`copy2()`** preserves file metadata (modification times, permissions)
- Better than `copy()` for backup operations
- Destination can be a file path or directory

**Directory Copying:**
- **`copytree()`** recursively copies entire directory structure
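- Creates the destination directory automatically, but raises `FileExistsError` if it already exists unless you pass `dirs_exist_ok=True` (Python 3.8+); this is why the code removes an old backup first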
- Preserves directory structure and file attributes

**File Moving:**
- **`move()`** can move files or directories
- Works across different filesystems
- `Path` objects can be passed directly on Python 3.9+; the `str()` conversion in the code is only needed on older versions

**DevOps Use Case:** Automated backup systems, log archival, deployment artifact management, configuration versioning.
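
For log archival or pre-deployment snapshots, a single compressed archive is often easier to ship and retain than a copied tree. `shutil.make_archive()` builds one in one call and returns the archive's path; the sketch below wraps it with a timestamped name. `archive_directory` and the naming scheme are hypothetical choices, and the demo builds a small project in a temporary directory.

```python
import shutil
import tempfile
import zipfile
from datetime import datetime
from pathlib import Path


def archive_directory(source: Path, dest_dir: Path, fmt: str = "zip") -> Path:
    """Create a timestamped archive of `source` inside `dest_dir` and return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base_name = dest_dir / f"{source.name}_{stamp}"  # make_archive appends the extension
    return Path(shutil.make_archive(str(base_name), fmt, root_dir=source))


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "devops_test"
    (project / "logs").mkdir(parents=True)
    (project / "app.py").write_text("print('Hello')")
    (project / "logs" / "app.log").write_text("Log entry 1")

    # Write the archive outside the directory being archived
    archive = archive_directory(project, Path(tmp) / "backups")
    with zipfile.ZipFile(archive) as zf:  # close the file before the temp dir is removed
        members = zf.namelist()
    archive_name = archive.name

print(archive_name)     # e.g. devops_test_20251129_170000.zip
print(sorted(members))  # zip member names always use '/' separators
```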

### **Example 3.2: Disk Usage and Cleanup**

In [6]:
# Get disk usage for current directory
total, used, free = shutil.disk_usage(".")

print("Disk Usage Statistics:")
print(f"  Total: {total / (1024**3):.2f} GB")
print(f"  Used:  {used / (1024**3):.2f} GB")
print(f"  Free:  {free / (1024**3):.2f} GB")
print(f"  Usage: {(used/total)*100:.1f}%")

# Calculate directory size
def get_directory_size(path: Path) -> int:
    """Calculate total size of all files in directory tree."""
    total_size = 0
    for item in path.rglob("*"):
        if item.is_file():
            total_size += item.stat().st_size
    return total_size

# Check sizes of our test directories
print("\n Directory Sizes:")
for directory in [test_dir, backup_dir, archive_dir]:
    if directory.exists():
        size = get_directory_size(directory)
        print(f"  {directory.name}: {size:,} bytes ({size/1024:.2f} KB)")

# Cleanup old files (demonstration)
print("\n Cleanup Operations:")
# Remove backup directory
if backup_dir.exists():
    shutil.rmtree(backup_dir)
    print(f"  Removed {backup_dir}")

# Remove archive directory
if archive_dir.exists():
    shutil.rmtree(archive_dir)
    print(f"  Removed {archive_dir}")

Disk Usage Statistics:
  Total: 475.74 GB
  Used:  341.77 GB
  Free:  133.97 GB
  Usage: 71.8%

 Directory Sizes:
  devops_test: 53 bytes (0.05 KB)

 Cleanup Operations:


### **Explanation:**

**Disk Usage Monitoring:**
- **`disk_usage(".")`** returns a named tuple `(total, used, free)` in bytes
- Convert bytes to GB: divide by `1024**3` (strictly, this gives gibibytes, GiB)
- Useful for alerting when disk space is low

**Directory Size Calculation:**
- Use `rglob("*")` to traverse all files recursively
- Sum up `st_size` from `stat()` for each file
- Only files are counted; the space used by directory entries themselves is ignored

**Safe Deletion:**
- **`rmtree()`** recursively deletes directories
- No confirmation prompt - use carefully!
- Check `exists()` before deletion to avoid errors

**DevOps Use Case:** Disk space monitoring, cleanup of old build artifacts, log rotation, temporary file management.
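
The notebook divides by `1024**3` inline in several places. A small formatter keeps that conversion in one place and picks a sensible unit automatically. A sketch; the name `format_bytes` and the IEC unit labels (KiB, MiB, GiB) are illustrative choices.

```python
def format_bytes(num_bytes: int) -> str:
    """Format a byte count with binary (1024-based) units, e.g. 1536 -> '1.50 KiB'."""
    size = float(num_bytes)
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    for unit in units[:-1]:
        if abs(size) < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {units[-1]}"  # anything larger is expressed in PiB


print(format_bytes(53))           # 53.00 B
print(format_bytes(1536))         # 1.50 KiB
print(format_bytes(5 * 1024**3))  # 5.00 GiB
```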

---

## **Section 4: Environment Variables Management**

### **Theory: Environment Variables in DevOps**

Environment variables are key-value pairs that configure application behavior without changing code:

**Why Use Environment Variables:**
- **Security**: Store API keys, passwords outside of code
- **Flexibility**: Different configs for dev/staging/production
- **12-Factor App**: Standard practice for cloud-native applications
- **Containerization**: Docker/Kubernetes use env vars extensively

**Best Practices:**
- Never commit secrets to version control
- Use `.env` files for local development (gitignored)
- Provide default values for optional configurations
- Validate required environment variables at startup
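
A `.env` file is just `KEY=VALUE` lines. Real projects usually rely on a library such as `python-dotenv`; the sketch below is a deliberately minimal, hypothetical loader (no multi-line values, escapes or `export` prefixes) that shows the key rule: variables already set in the real environment take precedence over the file. A plain dict stands in for `os.environ` in the demo.

```python
import os
import tempfile
from collections.abc import MutableMapping
from pathlib import Path


def load_env_file(path: Path, environ: MutableMapping[str, str] = os.environ) -> dict[str, str]:
    """Load simple KEY=VALUE lines into `environ`, never overriding variables that are already set."""
    loaded: dict[str, str] = {}
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")  # drop optional surrounding quotes
        if key not in environ:  # the real environment wins over the file
            environ[key] = value
            loaded[key] = value
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text('# local development settings\nAPP_ENV=development\nDATABASE_URL="sqlite:///local.db"\n')
    fake_environ = {"APP_ENV": "production"}  # stand-in for os.environ in this demo
    loaded = load_env_file(env_file, fake_environ)

print(loaded)        # {'DATABASE_URL': 'sqlite:///local.db'}
print(fake_environ)  # {'APP_ENV': 'production', 'DATABASE_URL': 'sqlite:///local.db'}
```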

### **Example 4.1: Reading and Setting Environment Variables**

In [7]:
# Read common environment variables
print("System Environment Variables:")
print(f"  User: {os.environ.get('USER', os.environ.get('USERNAME', 'Unknown'))}")
print(f"  Home: {os.environ.get('HOME', os.environ.get('USERPROFILE', 'Unknown'))}")
print(f"  Path separator: {os.pathsep}")
print(f"  Line separator: {repr(os.linesep)}")

# Set custom environment variables (for this process only)
os.environ['APP_ENV'] = 'development'
os.environ['APP_PORT'] = '8080'
os.environ['DEBUG'] = 'true'

print("\n Custom Environment Variables:")
print(f"  APP_ENV: {os.environ['APP_ENV']}")
print(f"  APP_PORT: {os.environ['APP_PORT']}")
print(f"  DEBUG: {os.environ['DEBUG']}")

# Safe reading with defaults
def get_config(key: str, default: str | None = None, required: bool = False) -> str | None:
    """
    Safely get environment variable with optional default.
    
    Args:
        key: Environment variable name
        default: Default value if not set
        required: Raise error if not set and no default
    
    Returns:
        Environment variable value or default
    """
    value = os.environ.get(key, default)
    
    if required and value is None:
        raise ValueError(f"Required environment variable '{key}' is not set")
    
    return value

# Examples of safe config reading
print("\n Safe Configuration Reading:")
print(f"  Database URL: {get_config('DATABASE_URL', 'sqlite:///local.db')}")
print(f"  API Timeout: {get_config('API_TIMEOUT', '30')}")
print(f"  Log Level: {get_config('LOG_LEVEL', 'INFO')}")

# This would raise an error (uncomment to test):
# print(f"  API Key: {get_config('API_KEY', required=True)}")

System Environment Variables:
  User: hmeln
  Home: C:\Users\hmeln
  Path separator: ;
  Line separator: '\r\n'

 Custom Environment Variables:
  APP_ENV: development
  APP_PORT: 8080
  DEBUG: true

 Safe Configuration Reading:
  Database URL: sqlite:///local.db
  API Timeout: 30
  Log Level: INFO


### **Explanation:**

**Reading Environment Variables:**
- **`os.environ`** is a dictionary-like object containing all environment variables
- **`os.environ.get(key, default)`** safely reads with fallback value
- Variable names differ across platforms (e.g., `USER` on Unix vs `USERNAME` on Windows)

**Setting Environment Variables:**
- Setting via `os.environ['KEY'] = 'value'` affects only current process
- Child processes started afterwards (e.g., via `subprocess`) inherit these variables
- Changes don't persist after script ends

**Safe Configuration Pattern:**
- Always provide sensible defaults for optional settings
- Validate required configurations at startup
- Raise clear errors for missing required variables
- Type conversion should happen after reading (e.g., `int(os.environ.get('PORT', '8080'))`)

**DevOps Use Case:** Configuration management, database connection strings, API keys, feature flags, deployment environment detection.
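
Putting the safe-configuration rules together: the sketch below reads everything once at startup, converts types, and fails fast with a clear message. Pass `os.environ` in real code; a plain dict is used here so the demo is deterministic. `AppConfig`, `load_app_config` and the accepted truthy spellings are illustrative assumptions.

```python
from collections.abc import Mapping
from dataclasses import dataclass

TRUTHY = {"1", "true", "yes", "on"}  # accepted spellings for boolean flags (assumption)


@dataclass(frozen=True)
class AppConfig:
    env: str
    port: int
    debug: bool
    database_url: str


def load_app_config(environ: Mapping[str, str]) -> AppConfig:
    """Read, convert and validate settings once at startup (fail fast)."""
    missing = [key for key in ("DATABASE_URL",) if not environ.get(key)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")

    raw_port = environ.get("APP_PORT", "8080")
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"APP_PORT must be an integer, got {raw_port!r}") from None

    return AppConfig(
        env=environ.get("APP_ENV", "development"),
        port=port,
        debug=environ.get("DEBUG", "false").strip().lower() in TRUTHY,
        database_url=environ["DATABASE_URL"],
    )


config = load_app_config({"DATABASE_URL": "sqlite:///local.db", "APP_PORT": "9000", "DEBUG": "True"})
print(config)
# AppConfig(env='development', port=9000, debug=True, database_url='sqlite:///local.db')
```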

---

## **Section 5: Safe Process Execution with subprocess**

### **Theory: Running External Commands Safely**

The `subprocess` module allows Python to spawn new processes and execute system commands:

**Why subprocess.run() is Preferred:**
- **Modern API**: Replaces older `os.system()`, `subprocess.call()`
- **Safer**: No shell is involved unless you ask for one, and timeouts and return-code checks are built in
- **More control**: Capture output, set timeouts, handle errors
- **Cross-platform**: Works on Windows, Linux, macOS

**Security Best Practices:**
- **Never use `shell=True`** with user input (command injection risk)
- Use **list of arguments** instead of string commands
- Set **timeouts** to prevent hanging processes
- **Validate input** before passing to external commands
- Check **return codes** for errors

### **Example 5.1: Basic Command Execution**

In [8]:
# Simple command execution (cross-platform)
# Use the current Python interpreter (sys.executable) instead of OS-specific commands

# Example 1: Get Python version
result = subprocess.run(
    [sys.executable, '--version'],
    capture_output=True,
    text=True,
    timeout=5
)

print(" Command Execution Result:")
print(f"  Return code: {result.returncode}")
print(f"  Output: {result.stdout.strip() or result.stderr.strip()}")

# Example 2: List directory contents (cross-platform using Python)
result = subprocess.run(
    [sys.executable, '-c', 'import os; print("\\n".join(os.listdir(".")))'],
    capture_output=True,
    text=True,
    timeout=5
)

print("\n Directory Listing (via subprocess):")
if result.returncode == 0:
    for line in result.stdout.strip().split('\n')[:10]:  # First 10 items
        print(f"  {line}")
else:
    print(f"  Error: {result.stderr}")

# Example 3: Platform-specific command (with safety check)
if platform.system() == "Windows":
    cmd = ['cmd', '/c', 'echo', 'Hello from Windows']
else:
    cmd = ['echo', 'Hello from Unix']

result = subprocess.run(cmd, capture_output=True, text=True)
print(f"\n Platform message: {result.stdout.strip()}")

 Command Execution Result:
  Return code: 0
  Output: Python 3.11.4

 Directory Listing (via subprocess):
  .git
  .gitignore
  custom_stopwords.txt
  devops_test
  endpoints.json
  health_check_results.csv
  health_check_results.json
  README.md
  sample_config.yml
  sample_web.log

 Platform message: "Hello from Windows"


### **Explanation:**

**subprocess.run() Parameters:**
- **First argument**: List of command and arguments `['command', 'arg1', 'arg2']`
- **`capture_output=True`**: Captures stdout and stderr
- **`text=True`**: Returns output as string (not bytes)
- **`timeout=5`**: Kills the process and raises `subprocess.TimeoutExpired` if it runs longer than 5 seconds

**Return Code:**
- **`returncode == 0`**: Success
- **`returncode != 0`**: Error occurred
- Always check return code before using output

**Security:**
- **SAFE**: `['git', 'status']` - List of arguments
- **UNSAFE**: `'git status', shell=True` - String with shell=True
- String commands with `shell=True` are vulnerable to command injection

**Cross-Platform Considerations:**
- Use `sys.executable` to get current Python interpreter path
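- Check `platform.system()` for OS-specific commands
- On Windows, the argument list is joined into a single command line, with quotes added around arguments that contain spaces; `cmd /c echo` prints those quotes literally, which is why the output above shows `"Hello from Windows"`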
- Prefer Python-based solutions over shell commands when possible

**DevOps Use Case:** Running git commands, executing build scripts, calling deployment tools, health checks via curl/wget.
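
Build and deployment scripts often need to run a tool inside a specific directory with extra settings. `subprocess.run()` accepts `cwd=` and `env=` for this, which ties back to the environment-variable section. A sketch; `run_python_in` and the `BUILD_ENV` variable are hypothetical, and a Python one-liner stands in for a real build tool so the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_python_in(workdir: str, extra_env: dict[str, str], code: str, timeout: float = 10) -> str:
    """Run a Python snippet in `workdir` with extra environment variables and return its stdout."""
    # Start from a copy of the parent environment: replacing it entirely can break
    # children that rely on PATH (or SYSTEMROOT on Windows).
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=workdir,  # the child starts here; our own working directory is unchanged
        env=env,      # the child sees exactly this mapping as its environment
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,   # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as build_dir:
    output = run_python_in(
        build_dir,
        {"BUILD_ENV": "staging"},
        "import os; print(os.environ['BUILD_ENV'], os.path.basename(os.getcwd()))",
    )
    expected_dir_name = os.path.basename(build_dir)

print(output)  # e.g. "staging tmpab12cd34"
```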

### **Example 5.2: Error Handling and Timeouts**

In [9]:
def run_command_safely(cmd: list, timeout: int = 30) -> dict:
    """
    Execute command with comprehensive error handling.
    
    Args:
        cmd: List of command and arguments
        timeout: Maximum execution time in seconds
    
    Returns:
        Dictionary with execution results
    """
    result_info = {
        'success': False,
        'returncode': None,
        'stdout': '',
        'stderr': '',
        'error': None
    }
    
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False  # Don't raise exception on non-zero return code
        )
        
        result_info['returncode'] = result.returncode
        result_info['stdout'] = result.stdout
        result_info['stderr'] = result.stderr
        result_info['success'] = (result.returncode == 0)
        
    except subprocess.TimeoutExpired:
        result_info['error'] = f"Command timed out after {timeout} seconds"
    except FileNotFoundError:
        result_info['error'] = f"Command not found: {cmd[0]}"
    except Exception as e:
        result_info['error'] = f"Unexpected error: {str(e)}"
    
    return result_info

# Test 1: Successful command
print(" Test 1: Successful command")
result = run_command_safely([sys.executable, '--version'])
print(f"  Success: {result['success']}")
print(f"  Output: {result['stdout'].strip() or result['stderr'].strip()}")

# Test 2: Command with error
print("\n Test 2: Command with error")
result = run_command_safely([sys.executable, '-c', 'import sys; sys.exit(1)'])
print(f"  Success: {result['success']}")
print(f"  Return code: {result['returncode']}")

# Test 3: Timeout scenario (simulated with sleep)
print("\n Test 3: Timeout handling")
result = run_command_safely(
    [sys.executable, '-c', 'import time; time.sleep(10)'],
    timeout=2
)
print(f"  Success: {result['success']}")
print(f"  Error: {result['error']}")

# Test 4: Non-existent command
print("\n Test 4: Non-existent command")
result = run_command_safely(['nonexistent-command-xyz'])
print(f"  Success: {result['success']}")
print(f"  Error: {result['error']}")

 Test 1: Successful command
  Success: True
  Output: Python 3.11.4

 Test 2: Command with error
  Success: False
  Return code: 1

 Test 3: Timeout handling
  Success: False
  Error: Command timed out after 2 seconds

 Test 4: Non-existent command
  Success: False
  Error: Command not found: nonexistent-command-xyz


### **Explanation:**

**Comprehensive Error Handling:**
- **`TimeoutExpired`**: Raised when process exceeds timeout
- **`FileNotFoundError`**: Command/executable doesn't exist
- **`check=False`**: Don't automatically raise an exception on a non-zero return code (this is the default; it is set explicitly for readability)
- **Generic Exception**: Catches unexpected errors

**Production-Ready Pattern:**
- Return structured dictionary with all relevant information
- Separate success status from error details
- Log both stdout and stderr for debugging
- Provide clear error messages

**Timeout Importance:**
- Prevents hanging processes in production
- Essential for health checks and external API calls
- Set based on expected execution time + buffer
- A timed-out command should trigger an alert

**DevOps Use Case:** Deployment scripts, health checks, integration tests, CI/CD pipeline steps, automated rollbacks.
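
Health checks and deployment steps sometimes fail transiently. A common companion to the pattern above is retrying with exponential backoff and raising once the attempts are exhausted. A sketch; `run_with_retries`, the attempt count and the delays are arbitrary illustrative choices, and unlike `run_command_safely` it uses `check=True` so failures surface as exceptions.

```python
import subprocess
import sys
import time


def run_with_retries(cmd: list[str], attempts: int = 3, timeout: float = 30,
                     base_delay: float = 1.0) -> subprocess.CompletedProcess:
    """Run `cmd`, retrying on a non-zero exit or a timeout with exponential backoff."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # A missing executable (FileNotFoundError) is not retried; it propagates immediately.
            return subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout, check=True)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
            last_error = exc
            if attempt < attempts:
                delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... with the defaults
                print(f"Attempt {attempt}/{attempts} failed ({type(exc).__name__}); "
                      f"retrying in {delay:.2f}s")
                time.sleep(delay)
    raise RuntimeError(f"Command failed after {attempts} attempts: {cmd}") from last_error


ok = run_with_retries([sys.executable, "--version"])
print(ok.stdout.strip())

try:
    run_with_retries([sys.executable, "-c", "import sys; sys.exit(3)"], base_delay=0.1)
except RuntimeError as err:
    failure = err
    print(err, "| last exit code:", err.__cause__.returncode)
```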

---

## **Section 6: System Resource Monitoring with psutil**

### **Theory: System Monitoring Fundamentals**

`psutil` (Python System and Process Utilities) is a cross-platform library for:
- **CPU monitoring**: Usage, frequency, core count
- **Memory tracking**: RAM usage, swap, virtual memory
- **Disk operations**: I/O statistics, partition info
- **Network monitoring**: Connections, traffic statistics
- **Process management**: List, control, and monitor processes

**Why psutil in DevOps:**
- **Proactive monitoring**: Detect issues before they cause outages
- **Capacity planning**: Understand resource usage patterns
- **Auto-scaling triggers**: Base scaling decisions on real metrics
- **Performance optimization**: Identify bottlenecks
- **Cross-platform**: Same API for Windows, Linux, macOS

### **Example 6.1: CPU and Memory Monitoring**

In [10]:
# CPU Information
print(" CPU Information:")
print(f"  Physical cores: {psutil.cpu_count(logical=False)}")
print(f"  Logical cores: {psutil.cpu_count(logical=True)}")

# Get CPU frequency (if available)
try:
    cpu_freq = psutil.cpu_freq()
    if cpu_freq:
        print(f"  Current frequency: {cpu_freq.current:.2f} MHz")
        print(f"  Min frequency: {cpu_freq.min:.2f} MHz")
        print(f"  Max frequency: {cpu_freq.max:.2f} MHz")
except Exception:  # cpu_freq() may be unsupported on some platforms
    print("  CPU frequency: Not available on this system")

# CPU usage (interval=1 means measure over 1 second)
cpu_percent = psutil.cpu_percent(interval=1)
print(f"  CPU usage: {cpu_percent}%")

# Per-core CPU usage
cpu_per_core = psutil.cpu_percent(interval=1, percpu=True)
print(f"  Per-core usage: {[f'{x}%' for x in cpu_per_core]}")

# Memory Information
print("\n Memory Information:")
memory = psutil.virtual_memory()
print(f"  Total: {memory.total / (1024**3):.2f} GB")
print(f"  Available: {memory.available / (1024**3):.2f} GB")
print(f"  Used: {memory.used / (1024**3):.2f} GB")
print(f"  Percentage: {memory.percent}%")

# Swap memory
swap = psutil.swap_memory()
print(f"\n Swap Memory:")
print(f"  Total: {swap.total / (1024**3):.2f} GB")
print(f"  Used: {swap.used / (1024**3):.2f} GB")
print(f"  Percentage: {swap.percent}%")

# Boot time
boot_time = datetime.fromtimestamp(psutil.boot_time())
uptime = datetime.now() - boot_time
print(f"\n System Uptime:")
print(f"  Boot time: {boot_time.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"  Uptime: {uptime.days} days, {uptime.seconds//3600} hours")

 CPU Information:
  Physical cores: 6
  Logical cores: 12
  Current frequency: 2592.00 MHz
  Min frequency: 0.00 MHz
  Max frequency: 2592.00 MHz
  CPU usage: 8.2%
  Per-core usage: ['21.2%', '18.8%', '15.6%', '15.6%', '12.5%', '10.9%', '10.9%', '9.4%', '1.6%', '1.6%', '0.0%', '1.6%']

 Memory Information:
  Total: 31.76 GB
  Available: 9.19 GB
  Used: 22.57 GB
  Percentage: 71.1%

 Swap Memory:
  Total: 36.00 GB
  Used: 0.18 GB
  Percentage: 0.5%

 System Uptime:
  Boot time: 2025-11-29 09:14:57
  Uptime: 0 days, 8 hours


### **Explanation:**

**CPU Monitoring:**
- **Physical cores**: Actual CPU cores (hardware)
- **Logical cores**: Includes hyper-threading (e.g., 4 physical = 8 logical)
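- **`cpu_percent(interval=1)`**: Blocks for 1 second and measures usage over that window; with `interval=None` it compares against the previous call instead, and the first such call returns a meaningless `0.0`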
- **`percpu=True`**: Returns list with usage per core

**Memory Monitoring:**
- **`virtual_memory()`**: Returns RAM statistics
- **`total`**: Total installed RAM
- **`available`**: RAM available for new processes
- **`used`**: RAM currently in use
- **`percent`**: (total - available) / total * 100

**Swap Memory:**
- Disk space the OS moves memory pages to when RAM is under pressure (the page file on Windows)
- High swap usage indicates memory pressure
- Sustained high swap usage should trigger alerts in production

**System Uptime:**
- `boot_time()` returns timestamp of last system boot
- Useful for detecting unexpected reboots
- Long uptime may indicate missing security patches

**DevOps Use Case:** Resource alerts, capacity planning, performance dashboards, auto-scaling triggers, SLA monitoring.
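
The example above prints only days and hours. A slightly fuller formatter, plus a check for a recent boot, is enough for basic unexpected-reboot detection. A sketch; `describe_uptime` and the 30-minute window are illustrative assumptions, and fixed datetimes (using the boot time from the output above) replace `datetime.fromtimestamp(psutil.boot_time())` so the result is deterministic.

```python
from datetime import datetime, timedelta


def describe_uptime(boot_time: datetime, now: datetime,
                    recent: timedelta = timedelta(minutes=30)) -> tuple[str, bool]:
    """Return a human-readable uptime and whether the machine booted within `recent`."""
    uptime = now - boot_time
    hours, remainder = divmod(uptime.seconds, 3600)  # .seconds excludes whole days
    minutes = remainder // 60
    return f"{uptime.days}d {hours}h {minutes}m", uptime < recent


boot = datetime(2025, 11, 29, 9, 14, 57)
print(describe_uptime(boot, datetime(2025, 11, 29, 17, 30, 0)))  # ('0d 8h 15m', False)
print(describe_uptime(boot, datetime(2025, 11, 29, 9, 20, 0)))   # ('0d 0h 5m', True)
```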

### **Example 6.2: Disk Usage Monitoring**

In [11]:
# Disk partitions
print(" Disk Partitions:")
partitions = psutil.disk_partitions()
for partition in partitions:
    print(f"\n  Device: {partition.device}")
    print(f"  Mount point: {partition.mountpoint}")
    print(f"  File system: {partition.fstype}")
    
    try:
        usage = psutil.disk_usage(partition.mountpoint)
        print(f"  Total: {usage.total / (1024**3):.2f} GB")
        print(f"  Used: {usage.used / (1024**3):.2f} GB")
        print(f"  Free: {usage.free / (1024**3):.2f} GB")
        print(f"  Usage: {usage.percent}%")
        
        # Alert if disk usage is high
        if usage.percent > 80:
            print(f"  WARNING: Disk usage above 80%!")
    except PermissionError:
        print("  Permission denied")
    except Exception as e:
        print(f"  Error: {e}")

# Disk I/O statistics
print("\n Disk I/O Statistics:")
disk_io = psutil.disk_io_counters()
if disk_io:
    print(f"  Read count: {disk_io.read_count:,}")
    print(f"  Write count: {disk_io.write_count:,}")
    print(f"  Bytes read: {disk_io.read_bytes / (1024**3):.2f} GB")
    print(f"  Bytes written: {disk_io.write_bytes / (1024**3):.2f} GB")

 Disk Partitions:

  Device: C:\
  Mount point: C:\
  File system: NTFS
  Total: 475.74 GB
  Used: 341.77 GB
  Free: 133.97 GB
  Usage: 71.8%

 Disk I/O Statistics:
  Read count: 2,671,096
  Write count: 1,712,762
  Bytes read: 46.86 GB
  Bytes written: 25.13 GB


### **Explanation:**

**Disk Partitions:**
- **`disk_partitions()`**: Lists all mounted disk partitions
- **`device`**: Physical device identifier (e.g., `/dev/sda1`, `C:\`)
- **`mountpoint`**: Where partition is mounted in filesystem
- **`fstype`**: Filesystem type (NTFS, ext4, FAT32, etc.)

**Disk Usage:**
- **`disk_usage(path)`**: Returns usage statistics for given path
- **`total`**: Total partition size
- **`used`**: Space currently used
- **`free`**: Space available
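- **`percent`**: Usage percentage; on Unix, psutil computes it as used / (used + free) to match `df`, so it can be slightly higher than used / total because blocks reserved for root are excluded from `free`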

**Alert Thresholds:**
- **> 80%**: Warning level - start cleanup or expansion planning
- **> 90%**: Critical - immediate action required
- **> 95%**: Emergency - system may become unstable
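
These thresholds translate directly into a small classification function. The sketch keeps the decision logic pure (easy to unit-test) and reads live values with `psutil` only in a separate helper that is not called in the demo. The disk levels mirror the list above; the memory and CPU limits, and all names here, are illustrative assumptions.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Alert:
    metric: str
    value: float
    level: str


# (level, limit in percent), checked from most to least severe
THRESHOLDS = {
    "disk": [("EMERGENCY", 95.0), ("CRITICAL", 90.0), ("WARNING", 80.0)],
    "memory": [("CRITICAL", 90.0), ("WARNING", 80.0)],
    "cpu": [("CRITICAL", 95.0), ("WARNING", 85.0)],
}


def evaluate(metrics: dict[str, float], thresholds=THRESHOLDS) -> list[Alert]:
    """Return one Alert per metric above a limit, at the most severe level reached."""
    alerts = []
    for name, value in metrics.items():
        for level, limit in thresholds.get(name, []):
            if value > limit:
                alerts.append(Alert(name, value, level))
                break  # report only the most severe level
    return alerts


def collect_metrics() -> dict[str, float]:
    """Read live percentages; requires psutil to be installed."""
    import psutil  # imported lazily so evaluate() can be tested without psutil
    return {
        "disk": psutil.disk_usage(Path.cwd().anchor).percent,  # drive/root of the current directory
        "memory": psutil.virtual_memory().percent,
        "cpu": psutil.cpu_percent(interval=1),
    }


sample = {"disk": 92.3, "memory": 71.1, "cpu": 8.2}  # fixed values for a deterministic demo
for alert in evaluate(sample):
    print(f"{alert.level}: {alert.metric} at {alert.value}%")  # CRITICAL: disk at 92.3%
```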

**Disk I/O Counters:**
- Cumulative statistics since boot
- Useful for detecting I/O bottlenecks
- High write counts may indicate logging issues
- Monitor delta over time, not absolute values

**DevOps Use Case:** Disk space alerts, log rotation triggers, database cleanup automation, capacity forecasting.
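
Because the I/O counters are cumulative, the useful number is the rate between two samples. The sketch uses a stand-in `IOSnapshot` named tuple with the same `read_bytes`/`write_bytes` field names that `psutil.disk_io_counters()` exposes, so the arithmetic can be checked without psutil. `io_rates` and the reset handling are illustrative assumptions.

```python
from collections import namedtuple

# Stand-in for psutil.disk_io_counters(): same field names for the two values used here
IOSnapshot = namedtuple("IOSnapshot", ["read_bytes", "write_bytes"])


def io_rates(before, after, elapsed_seconds: float) -> dict[str, float]:
    """Turn two cumulative counter snapshots into bytes-per-second rates."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Counters can reset (e.g. after a reboot); treat a negative delta as zero
    read_delta = max(0, after.read_bytes - before.read_bytes)
    write_delta = max(0, after.write_bytes - before.write_bytes)
    return {"read_bps": read_delta / elapsed_seconds, "write_bps": write_delta / elapsed_seconds}


before = IOSnapshot(read_bytes=1_000_000, write_bytes=500_000)
after = IOSnapshot(read_bytes=6_000_000, write_bytes=2_500_000)
rates = io_rates(before, after, elapsed_seconds=5)
print(rates)  # {'read_bps': 1000000.0, 'write_bps': 400000.0}
```

With psutil, take `before = psutil.disk_io_counters()`, record `start = time.monotonic()`, sleep for the sampling interval, take a second reading, and pass `time.monotonic() - start` as the elapsed time.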

---

## **Section 7: Process Management and Control**

### **Theory: Process Monitoring**

Every running program is a process. DevOps engineers need to:
- List and inspect running processes
- Monitor resource usage per process
- Terminate misbehaving processes
- Detect zombie or hung processes
- Track process lineage (parent-child relationships)

### **Example 7.1: Listing and Inspecting Processes**

In [12]:
# Get current process info
current_process = psutil.Process()
print(" Current Process Information:")
print(f"  PID: {current_process.pid}")
print(f"  Name: {current_process.name()}")
print(f"  Status: {current_process.status()}")
print(f"  CPU usage: {current_process.cpu_percent(interval=1)}%")
print(f"  Memory: {current_process.memory_info().rss / (1024**2):.2f} MB")
print(f"  Threads: {current_process.num_threads()}")

# List all Python processes
print("\n Python Processes:")
python_processes = []
for proc in psutil.process_iter(['pid', 'name', 'cpu_percent', 'memory_info']):
    try:
        # Check if process name contains 'python' (name is None if access was denied)
        if 'python' in (proc.info['name'] or '').lower():
            python_processes.append({
                'pid': proc.info['pid'],
                'name': proc.info['name'],
                'cpu_percent': proc.info['cpu_percent'],
                'memory_mb': proc.info['memory_info'].rss / (1024**2) if proc.info['memory_info'] else 0
            })
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

# Display Python processes
for proc in python_processes[:10]:  # Limit to 10
    print(f"  PID {proc['pid']}: {proc['name']} - "
          f"CPU: {proc['cpu_percent']:.1f}%, "
          f"Memory: {proc['memory_mb']:.2f} MB")

# Top 5 processes by memory usage
print("\n Top 5 Processes by Memory:")
all_processes = []
for proc in psutil.process_iter(['pid', 'name', 'memory_info']):
    try:
        all_processes.append({
            'pid': proc.info['pid'],
            'name': proc.info['name'],
            'memory_mb': proc.info['memory_info'].rss / (1024**2) if proc.info['memory_info'] else 0
        })
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        pass

top_memory = sorted(all_processes, key=lambda x: x['memory_mb'], reverse=True)[:5]
for i, proc in enumerate(top_memory, 1):
    print(f"  {i}. {proc['name']} (PID {proc['pid']}): {proc['memory_mb']:.2f} MB")

 Current Process Information:
  PID: 7056
  Name: python.exe
  Status: running
  CPU usage: 0.0%
  Memory: 74.67 MB
  Threads: 12

 Python Processes:
  PID 7056: python.exe - CPU: 0.0%, Memory: 75.80 MB
  PID 22328: python.exe - CPU: 0.0%, Memory: 69.58 MB
  PID 24700: python.exe - CPU: 0.0%, Memory: 18.27 MB
  PID 34964: python.exe - CPU: 0.0%, Memory: 69.57 MB

 Top 5 Processes by Memory:
  1. MemCompression (PID 3792): 2824.92 MB
  2. Code.exe (PID 3424): 1008.04 MB
  3. chrome.exe (PID 11716): 638.30 MB
  4. Code.exe (PID 22768): 595.01 MB
  5. chrome.exe (PID 11252): 591.00 MB

### **Explanation:**

**Process Object:**
- **`psutil.Process()`**: Gets current process (no PID = self)
- **`psutil.Process(pid)`**: Gets specific process by PID
- **`name()`**: Process executable name
- **`status()`**: running, sleeping, zombie, etc.

**Process Iteration:**
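- **`process_iter(attrs=None)`**: Generator yielding a `Process` instance for every running process
- Pass `attrs` (e.g. `['pid', 'name']`) to prefetch those fields into `proc.info`, which is more efficient than calling methods one by one; an empty list fetches every attribute and is slow
- Fields that raise `AccessDenied` are stored in `proc.info` as `None` (hence the `if proc.info['memory_info']` checks above); still wrap direct method calls in try-except, because processes can exit during iteration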

**Exception Handling:**
- **`NoSuchProcess`**: Process terminated during iteration
- **`AccessDenied`**: Insufficient permissions (common for system processes)
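- **`ZombieProcess`**: Process has exited but has not yet been reaped by its parent (UNIX only); it subclasses `NoSuchProcess`, so catching `NoSuchProcess` covers it too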

**Memory Metrics:**
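- **`memory_info().rss`**: Resident Set Size, the physical RAM the process currently occupies
- **`memory_info().vms`**: Virtual Memory Size, the total virtual memory used by the process, including pages not currently in RAM (on Windows, psutil reports the pagefile size here)
- RSS is more meaningful for resource monitoring; VMS is usually much larger than what the process actually consumes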

**DevOps Use Case:** Detecting memory leaks, finding resource-hungry processes, monitoring application health, automated cleanup.
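To inspect a single process by PID rather than iterating over all of them, a helper along these lines can be used. It is a minimal sketch: the function name `describe_process` and the returned dictionary layout are hypothetical, while `psutil.Process`, `Process.oneshot()` and the exceptions are real psutil API. `oneshot()` caches the underlying system calls so that several attributes read in a row are cheaper and come from roughly the same instant.

```python
import os
from typing import Optional

import psutil


def describe_process(pid: int) -> Optional[dict]:
    """Return a point-in-time snapshot of one process, or None if it has exited."""
    try:
        proc = psutil.Process(pid)
        # oneshot() caches shared system calls for the reads inside the block
        with proc.oneshot():
            mem = proc.memory_info()
            return {
                'pid': proc.pid,
                'name': proc.name(),
                'status': proc.status(),
                'rss_mb': mem.rss / (1024**2),
                'vms_mb': mem.vms / (1024**2),
                'threads': proc.num_threads(),
            }
    except psutil.NoSuchProcess:
        # ZombieProcess subclasses NoSuchProcess, so it is caught here too
        return None
    except psutil.AccessDenied:
        # Common for system processes when not running as admin/root
        return {'pid': pid, 'error': 'access denied'}


# Example: inspect the current interpreter
print(describe_process(os.getpid()))
```

Returning `None` for vanished processes lets callers treat "process gone" as a normal outcome, which is typical when checking PIDs read from a PID file.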

### **Example 7.2: Network Monitoring**

In [13]:
import socket

# Network interfaces and addresses
print("Network Interfaces:")
net_if_addrs = psutil.net_if_addrs()
for interface, addresses in net_if_addrs.items():
    print(f"\n  Interface: {interface}")
    for addr in addresses:
        # Compare with socket constants: the numeric value of AF_INET6
        # differs by OS (10 on Linux, 23 on Windows, 30 on macOS)
        if addr.family == socket.AF_INET:
            print(f"    IPv4: {addr.address}")
        elif addr.family == socket.AF_INET6:
            print(f"    IPv6: {addr.address}")

# Network I/O statistics
print("\n Network I/O Statistics:")
net_io = psutil.net_io_counters()
print(f"  Bytes sent: {net_io.bytes_sent / (1024**3):.2f} GB")
print(f"  Bytes received: {net_io.bytes_recv / (1024**3):.2f} GB")
print(f"  Packets sent: {net_io.packets_sent:,}")
print(f"  Packets received: {net_io.packets_recv:,}")
print(f"  Errors in: {net_io.errin:,}")
print(f"  Errors out: {net_io.errout:,}")
print(f"  Dropped in: {net_io.dropin:,}")
print(f"  Dropped out: {net_io.dropout:,}")

# Network connections (limited to first 10)
print("\n Active Network Connections (first 10):")
try:
    connections = psutil.net_connections(kind='inet')[:10]
    for conn in connections:
        status = conn.status if conn.status != 'NONE' else 'N/A'
        local = f"{conn.laddr.ip}:{conn.laddr.port}" if conn.laddr else "N/A"
        remote = f"{conn.raddr.ip}:{conn.raddr.port}" if conn.raddr else "N/A"
        print(f"  {conn.type.name} {local} -> {remote} [{status}]")
except psutil.AccessDenied:
    print("  Access denied - run with elevated privileges to see connections")

Network Interfaces:

  Interface: Ethernet
    IPv4: 169.254.112.31
    IPv6: fe80::9c0a:a985:ec63:1063

  Interface: Local Area Connection* 1
    IPv4: 169.254.115.214
    IPv6: fe80::f365:389d:f3bf:4bcc

  Interface: Local Area Connection* 2
    IPv4: 169.254.141.123
    IPv6: fe80::cb11:27fd:2e96:147a

  Interface: Wi-Fi
    IPv4: 192.168.0.110
    IPv6: fe80::dddd:f03d:b4c8:a6d

  Interface: Loopback Pseudo-Interface 1
    IPv4: 127.0.0.1
    IPv6: ::1

 Network I/O Statistics:
  Bytes sent: 0.10 GB
  Bytes received: 0.33 GB
  Packets sent: 166,988
  Packets received: 393,649
  Errors in: 0
  Errors out: 0
  Dropped in: 0
  Dropped out: 0

 Active Network Connections (first 10):
  SOCK_STREAM 127.0.0.1:63551 -> 127.0.0.1:63550 [ESTABLISHED]
  SOCK_STREAM 127.0.0.1:65255 -> 127.0.0.1:65256 [ESTABLISHED]
  SOCK_STREAM 127.0.0.1:63563 -> 127.0.0.1:9014 [ESTABLISHED]
  SOCK_STREAM 127.0.0.1:65301 -> 127.0.0.1:53 [TIME_WAIT]
  SOCK_STREAM 127.0.0.1:9002 -> 127.0.0.1:61829 [ESTABLISHED]


### **Explanation:**

**Network Interfaces:**
- **`net_if_addrs()`**: Returns all network interfaces and their addresses
- Each interface can have multiple addresses (IPv4, IPv6, MAC)
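- Interface names are OS-specific: Linux typically shows `lo` (loopback), `eth0`/`wlan0` or predictable names such as `enp0s3`, while Windows uses names such as `Ethernet` and `Wi-Fi` (as in the output above)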

**Network I/O Counters:**
- Cumulative statistics since system boot
- **`bytes_sent/recv`**: Total data transferred
- **`packets_sent/recv`**: Number of packets
- **`errin/errout`**: Transmission errors (should be low)
- **`dropin/dropout`**: Dropped packets (can indicate congestion or full network buffers)

**Network Connections:**
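- **`net_connections(kind='inet')`**: Lists IPv4 and IPv6 sockets, both TCP and UDP
- **`kind='tcp'`**: Only TCP connections (IPv4 and IPv6)
- **`kind='udp'`**: Only UDP connections (IPv4 and IPv6)
- **Status values**: ESTABLISHED, LISTEN, TIME_WAIT, etc. (UDP sockets report `NONE`)
- On macOS this raises `AccessDenied` unless run as root; on other platforms, the owning `pid` of some sockets may be `None` without elevated privileges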

**DevOps Use Case:** Detecting unusual network activity, monitoring bandwidth usage, finding port conflicts, security auditing.
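Because the I/O counters are cumulative since boot, bandwidth monitoring needs two samples and a time delta. A minimal sketch of that calculation, assuming snapshots with the same field names as `psutil.net_io_counters()` (the helper `io_rates` and the `NetSample` stand-in are hypothetical, used so the arithmetic can be checked without real traffic):

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as psutil.net_io_counters()
NetSample = namedtuple('NetSample', 'bytes_sent bytes_recv errin errout dropin dropout')

RATE_FIELDS = ('bytes_sent', 'bytes_recv', 'errin', 'errout', 'dropin', 'dropout')


def io_rates(before, after, elapsed_s: float) -> dict:
    """Turn two cumulative counter snapshots into per-second rates."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # The delta over the interval is the activity; clamp at zero defensively
    # in case a counter was reset between the two samples
    return {
        field: max(getattr(after, field) - getattr(before, field), 0) / elapsed_s
        for field in RATE_FIELDS
    }
```

With psutil, take `before = psutil.net_io_counters()`, sleep for a few seconds, take `after` the same way, and pass the measured elapsed time; a rising `errin` or `dropin` rate is usually more telling than the absolute totals.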

---

## **Section 8: Advanced Integration - System Health Monitor**

### **Practical Project: Building a System Health Dashboard**

Let's combine everything we've learned into a production-ready system health monitoring tool.

### **Example 8.1: Complete System Health Monitor**

In [15]:
class SystemHealthMonitor:
    """
    Comprehensive system health monitoring tool for DevOps.
    Monitors CPU, memory, disk, and generates alerts.
    """
    
    def __init__(self, 
                 cpu_threshold: float = 80.0,
                 memory_threshold: float = 85.0,
                 disk_threshold: float = 90.0):
        """
        Initialize monitor with alert thresholds.
        
        Args:
            cpu_threshold: CPU usage % that triggers alert
            memory_threshold: Memory usage % that triggers alert
            disk_threshold: Disk usage % that triggers alert
        """
        self.cpu_threshold = cpu_threshold
        self.memory_threshold = memory_threshold
        self.disk_threshold = disk_threshold
        self.alerts = []
    
    def check_cpu(self) -> dict:
        """Check CPU usage and generate alerts if needed."""
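        # Take one 1-second sample of per-core usage and average it, instead
        # of blocking twice for two separate samples
        per_cpu = psutil.cpu_percent(interval=1, percpu=True)
        cpu_percent = round(sum(per_cpu) / len(per_cpu), 1)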
        
        status = "OK"
        if cpu_percent > self.cpu_threshold:
            status = "CRITICAL"
            self.alerts.append(f"CPU usage critical: {cpu_percent}%")
        elif cpu_percent > self.cpu_threshold * 0.8:
            status = "WARNING"
            self.alerts.append(f"CPU usage high: {cpu_percent}%")
        
        return {
            'metric': 'CPU',
            'value': cpu_percent,
            'per_core': per_cpu,
            'threshold': self.cpu_threshold,
            'status': status
        }
    
    def check_memory(self) -> dict:
        """Check memory usage and generate alerts if needed."""
        memory = psutil.virtual_memory()
        
        status = "OK"
        if memory.percent > self.memory_threshold:
            status = "CRITICAL"
            self.alerts.append(f"Memory usage critical: {memory.percent}%")
        elif memory.percent > self.memory_threshold * 0.8:
            status = "WARNING"
            self.alerts.append(f"Memory usage high: {memory.percent}%")
        
        return {
            'metric': 'Memory',
            'value': memory.percent,
            'total_gb': memory.total / (1024**3),
            'available_gb': memory.available / (1024**3),
            'threshold': self.memory_threshold,
            'status': status
        }
    
    def check_disk(self) -> list:
        """Check disk usage for all partitions."""
        disk_results = []
        
        for partition in psutil.disk_partitions():
            try:
                usage = psutil.disk_usage(partition.mountpoint)
                
                status = "OK"
                if usage.percent > self.disk_threshold:
                    status = "CRITICAL"
                    self.alerts.append(
                        f"Disk {partition.mountpoint} critical: {usage.percent}%"
                    )
                elif usage.percent > self.disk_threshold * 0.8:
                    status = "WARNING"
                    self.alerts.append(
                        f"Disk {partition.mountpoint} high: {usage.percent}%"
                    )
                
                disk_results.append({
                    'metric': 'Disk',
                    'mountpoint': partition.mountpoint,
                    'value': usage.percent,
                    'total_gb': usage.total / (1024**3),
                    'free_gb': usage.free / (1024**3),
                    'threshold': self.disk_threshold,
                    'status': status
                })
            except OSError:  # e.g. empty CD-ROM drive or unreadable mount (includes PermissionError)
                pass
        
        return disk_results
    
    def get_top_processes(self, n: int = 5) -> dict:
        """Get top N processes by CPU and memory usage."""
        processes = []
        for proc in psutil.process_iter(['pid', 'name', 'cpu_percent', 'memory_percent']):
            try:
                processes.append({
                    'pid': proc.info['pid'],
                    'name': proc.info['name'],
                    'cpu_percent': proc.info['cpu_percent'] or 0,
                    'memory_percent': proc.info['memory_percent'] or 0
                })
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass
        
        # Rank separately by CPU usage and by memory usage
        top_cpu = sorted(processes, key=lambda x: x['cpu_percent'], reverse=True)[:n]
        top_mem = sorted(processes, key=lambda x: x['memory_percent'], reverse=True)[:n]
        
        return {'cpu': top_cpu, 'memory': top_mem}
    
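    def run_health_check(self) -> dict:
        """Run complete health check and return results."""
        self.alerts = []  # Reset alerts
        timestamp = datetime.now().isoformat()

        cpu = self.check_cpu()
        memory = self.check_memory()
        disks = self.check_disk()

        # Derive the overall status from each metric's status field; matching
        # 'CRITICAL' against the alert text never succeeds because the
        # messages say "critical" in lowercase
        statuses = [cpu['status'], memory['status']] + [d['status'] for d in disks]

        results = {
            'timestamp': timestamp,
            'hostname': platform.node(),
            'cpu': cpu,
            'memory': memory,
            'disks': disks,
            'top_processes': self.get_top_processes(),
            'alerts': self.alerts,
            'overall_status': 'CRITICAL' if 'CRITICAL' in statuses else 'OK'
        }

        return results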
    
    def print_report(self, results: dict):
        """Print a formatted health check report."""
        # One icon per status level
        icons = {'OK': '✅', 'WARNING': '⚠️', 'CRITICAL': '🚨'}

        print("="*60)
        print("SYSTEM HEALTH REPORT")
        print(f" {results['timestamp']}")
        print(f"  {results['hostname']}")
        print("="*60)

        # CPU
        cpu = results['cpu']
        print(f"\n{icons[cpu['status']]} CPU: {cpu['value']:.1f}% (threshold: {cpu['threshold']}%)")

        # Memory
        mem = results['memory']
        print(f"{icons[mem['status']]} Memory: {mem['value']:.1f}% "
              f"({mem['available_gb']:.1f}GB free of {mem['total_gb']:.1f}GB)")

        # Disks
        print("\n💿 Disk Usage:")
        for disk in results['disks']:
            print(f"  {icons[disk['status']]} {disk['mountpoint']}: {disk['value']:.1f}% "
                  f"({disk['free_gb']:.1f}GB free)")

        # Top processes
        print("\n🔝 Top Processes by CPU:")
        for proc in results['top_processes']['cpu'][:3]:
            print(f"  {proc['name']} (PID {proc['pid']}): {proc['cpu_percent']:.1f}%")

        print("\n🔝 Top Processes by Memory:")
        for proc in results['top_processes']['memory'][:3]:
            print(f"  {proc['name']} (PID {proc['pid']}): {proc['memory_percent']:.1f}%")

        # Alerts
        if results['alerts']:
            print("\n🚨 ALERTS:")
            for alert in results['alerts']:
                print(f"  - {alert}")
        else:
            print("\n✅ No alerts - all systems nominal")

        print("="*60)

# Create monitor and run health check
monitor = SystemHealthMonitor(
    cpu_threshold=80.0,
    memory_threshold=85.0,
    disk_threshold=90.0
)

results = monitor.run_health_check()
monitor.print_report(results)

SYSTEM HEALTH REPORT
 2025-11-29T17:42:14.640905
  PF2XV4QP

✅ CPU: 14.3% (threshold: 80.0%)
⚠️ Memory: 68.6% (10.0GB free of 31.8GB)

💿 Disk Usage:
  ✅ C:\: 71.8% (134.0GB free)

🔝 Top Processes by CPU:
  System Idle Process (PID 0): 1047.1%
  Code.exe (PID 35780): 28.0%
  System (PID 4): 13.3%

🔝 Top Processes by Memory:
  MemCompression (PID 3792): 9.2%
  Code.exe (PID 3424): 2.2%
  chrome.exe (PID 11716): 1.9%

🚨 ALERTS:
  - Memory usage high: 68.6%


### **Explanation:**

**System Health Monitor Architecture:**

**1. Configurable Thresholds:**
- Different environments need different alert levels
- Production might use 80% CPU, development 95%
- Thresholds passed via constructor for flexibility

**2. Multi-Metric Monitoring:**
- **CPU**: Overall and per-core usage
- **Memory**: Total and available with percentage
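- **Disk**: Every partition, each checked against the same `disk_threshold` and given its own status
- **Processes**: Top consumers by CPU and memory. Per-process CPU % is measured relative to one core, so it can exceed 100% on multi-core machines (hence `System Idle Process` at 1047.1% above), and the first reading for a process psutil has not sampled before is a meaningless 0.0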

**3. Alert System:**
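- **WARNING**: Exceeds 80% of the threshold (early warning). For example, the 85% memory threshold warns above 68%, which is why 68.6% raised an alert above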
- **CRITICAL**: Exceeds threshold (immediate action)
- Alerts collected and returned with results
- Can be sent to logging system, Slack, PagerDuty, etc.
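The three `check_*` methods repeat the same two-level rule. A minimal sketch of that rule as a standalone helper (the name `classify` and the `warn_ratio` parameter are hypothetical, not part of the class above) makes the warning band easy to test in isolation:

```python
def classify(value: float, threshold: float, warn_ratio: float = 0.8) -> str:
    """Return OK, WARNING or CRITICAL for a usage percentage."""
    if value > threshold:
        return "CRITICAL"  # above the configured threshold
    if value > threshold * warn_ratio:
        return "WARNING"   # inside the early-warning band
    return "OK"


# The readings from the report above, run through the same rule
print(classify(14.3, 80.0))  # → OK (warns above 64%)
print(classify(68.6, 85.0))  # → WARNING (warns above 68%)
print(classify(71.8, 90.0))  # → OK (warns above 72%)
```

Note that both comparisons are strict, so a value exactly at the threshold is reported as `WARNING`, not `CRITICAL`, matching the monitor's `>` checks.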

**4. Status Reporting:**
- Each metric has individual status
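- Overall status is `CRITICAL` if any metric is critical and `OK` otherwise; warnings appear in `alerts` but do not change the overall status (which is why the exported JSON below reports `OK` alongside one alert)
- Clear visual indicators (✅ ⚠️ 🚨)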
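As written, `run_health_check` collapses warnings into `OK`. If a three-level roll-up is preferred, a "worst status wins" aggregation can be sketched with a severity ranking; the `SEVERITY` table and `worst_status` helper are hypothetical names:

```python
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def worst_status(statuses) -> str:
    """Return the most severe status in `statuses` ("OK" if it is empty).

    Unknown status strings raise KeyError rather than being silently ignored.
    """
    return max(statuses, key=SEVERITY.__getitem__, default="OK")


print(worst_status(["OK", "WARNING", "OK"]))  # → WARNING
```

Fed with `[results['cpu']['status'], results['memory']['status']] + [d['status'] for d in results['disks']]`, the run above would roll up to `WARNING` because of the memory reading.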

**5. Production Enhancements:**
- Export results as JSON for integration with monitoring tools
- Store historical data for trend analysis
- Send alerts via email/SMS/Slack
- Schedule periodic checks (e.g., every 5 minutes)
- Add network and temperature monitoring
- Implement automatic remediation for common issues
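For "schedule periodic checks", a minimal in-process loop could look like the sketch below; `run_periodically` is a hypothetical helper, and for real deployments a cron job or systemd timer is usually more robust than a long-running Python loop. It schedules from the planned start time so that a slow check (`cpu_percent(interval=1)` blocks for a second) does not make the interval drift.

```python
import time
from typing import Callable, List


def run_periodically(check: Callable[[], dict], interval_s: float, iterations: int) -> List[dict]:
    """Call `check` every `interval_s` seconds, `iterations` times, collecting results."""
    results = []
    next_run = time.monotonic()  # monotonic clock is immune to wall-clock changes
    for i in range(iterations):
        results.append(check())
        if i == iterations - 1:
            break  # no need to wait after the final check
        # Advance from the planned start, not from when check() finished
        next_run += interval_s
        time.sleep(max(0.0, next_run - time.monotonic()))
    return results
```

For example, `run_periodically(monitor.run_health_check, interval_s=300, iterations=12)` would collect one hour of five-minute checks.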

**DevOps Use Case:** 
- Infrastructure monitoring dashboards
- Auto-scaling triggers in cloud environments
- Alerting systems for on-call engineers
- Capacity planning and resource optimization
- SLA compliance monitoring

### **Example 8.2: Export Health Data to JSON**

In [16]:
# Create monitoring data directory
monitoring_dir = Path("monitoring_data")
monitoring_dir.mkdir(exist_ok=True)

# Export results to JSON
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
json_file = monitoring_dir / f"health_check_{timestamp}.json"

with open(json_file, 'w') as f:
    json.dump(results, f, indent=2)

print(f"✅ Health check data exported to: {json_file}")
print(f"📊 File size: {json_file.stat().st_size} bytes")

# Read and display sample
with open(json_file, 'r') as f:
    sample_data = json.load(f)

print("\n📄 Sample JSON structure:")
print(json.dumps({
    'timestamp': sample_data['timestamp'],
    'hostname': sample_data['hostname'],
    'overall_status': sample_data['overall_status'],
    'alert_count': len(sample_data['alerts']),
    'metrics': {
        'cpu': f"{sample_data['cpu']['value']:.1f}%",
        'memory': f"{sample_data['memory']['value']:.1f}%",
        'disk_count': len(sample_data['disks'])
    }
}, indent=2))

✅ Health check data exported to: monitoring_data\health_check_20251129_174242.json
📊 File size: 2407 bytes

📄 Sample JSON structure:
{
  "timestamp": "2025-11-29T17:42:14.640905",
  "hostname": "PF2XV4QP",
  "overall_status": "OK",
  "alert_count": 1,
  "metrics": {
    "cpu": "14.3%",
    "memory": "68.6%",
    "disk_count": 1
  }
}


### **Explanation:**

**JSON Export Benefits:**
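- **Integration**: Easy to ingest into log pipelines such as the ELK stack; Prometheus does not read JSON files but scrapes its own plain-text exposition format, so it needs a small exporter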
- **Storage**: Historical data for trend analysis and capacity planning
- **API-Ready**: Can be served via REST API for dashboards
- **Portable**: Standard format works across platforms and languages
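To make the same results scrapeable by Prometheus, they can be rendered in its text exposition format (`# HELP` / `# TYPE` lines followed by `name{labels} value`). A minimal sketch, assuming the `results` dictionary produced by `run_health_check`; the metric names and the helpers `to_prometheus` and `_escape_label` are hypothetical, and serving the text over HTTP (for example with the `prometheus_client` library) is left out:

```python
def _escape_label(value: str) -> str:
    """Escape a label value: backslash, double quote and newline must be escaped."""
    return value.replace('\\', '\\\\').replace('"', '\\"').replace('\n', '\\n')


def to_prometheus(results: dict) -> str:
    """Render a health-check result dict in Prometheus text exposition format."""
    host = results['hostname']
    # (metric name, help text, [(labels, value), ...]); names are illustrative
    metrics = [
        ('system_cpu_usage_percent', 'Overall CPU usage in percent',
         [({'host': host}, results['cpu']['value'])]),
        ('system_memory_usage_percent', 'Virtual memory usage in percent',
         [({'host': host}, results['memory']['value'])]),
        ('system_disk_usage_percent', 'Disk usage in percent per mountpoint',
         [({'host': host, 'mountpoint': d['mountpoint']}, d['value'])
          for d in results['disks']]),
    ]
    lines = []
    for name, help_text, samples in metrics:
        lines.append(f'# HELP {name} {help_text}')
        lines.append(f'# TYPE {name} gauge')  # point-in-time values that go up and down
        for labels, value in samples:
            label_str = ','.join(f'{k}="{_escape_label(str(v))}"' for k, v in labels.items())
            lines.append(f'{name}{{{label_str}}} {value}')
    # The format requires every line, including the last, to end with a newline
    return '\n'.join(lines) + '\n'
```

Calling `print(to_prometheus(results))` after a health check shows the rendered text; the Windows mountpoint `C:\` comes out as `mountpoint="C:\\"` because backslashes in label values must be escaped.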

**Data Persistence Strategy:**
- **Timestamped files**: Each check writes a uniquely named file
- **Organized directories**: Group by date or service
- **Retention policy**: Delete files older than N days
- **Compression**: Archive old data to save space
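A retention policy for the timestamped files can be as simple as deleting anything older than a cutoff. A minimal sketch, assuming files follow the `health_check_*.json` naming used above and using each file's modification time as its age (the helper `prune_old_files` is hypothetical):

```python
import time
from pathlib import Path


def prune_old_files(directory: Path, max_age_days: float,
                    pattern: str = "health_check_*.json") -> list:
    """Delete files matching `pattern` older than `max_age_days`; return what was removed."""
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in directory.glob(pattern):
        # st_mtime is the last-modification time in seconds since the epoch
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For example, `prune_old_files(monitoring_dir, max_age_days=30)` would keep roughly a month of history; archiving instead of deleting could swap `path.unlink()` for `shutil.make_archive` on a dated folder.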

**Production Integration:**
- Push to time-series database (InfluxDB, TimescaleDB)
- Send to log aggregation (ELK Stack, Splunk)
- Trigger webhooks on status changes
- Generate daily/weekly reports from historical data
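"Trigger webhooks on status changes" means comparing each result with the previous one and notifying only on a transition, so on-call engineers are not paged every five minutes for the same state. A minimal sketch, assuming the result dictionaries from `run_health_check`; `status_change_event`, `send_webhook` and any webhook URL you pass in are hypothetical, and real services (Slack, PagerDuty) each expect their own payload shape:

```python
import json
import urllib.request
from typing import Optional


def status_change_event(previous: Optional[dict], current: dict) -> Optional[dict]:
    """Build a webhook payload if overall_status changed since the last check."""
    prev_status = previous['overall_status'] if previous else None
    if prev_status == current['overall_status']:
        return None  # no transition, nothing to send
    return {
        'hostname': current['hostname'],
        'timestamp': current['timestamp'],
        'from': prev_status,  # None on the very first check
        'to': current['overall_status'],
        'alerts': current['alerts'],
    }


def send_webhook(url: str, payload: dict, timeout_s: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    # urlopen raises HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

In a scheduled loop, keep the last result around and call `send_webhook(url, event)` only when `status_change_event(last, results)` returns a payload.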

**DevOps Use Case:** Building custom monitoring solutions, integrating with existing tools, compliance reporting, post-mortem analysis.

---

## **Summary: Key Takeaways**

### **Best Practices:**
✅ Prefer `pathlib` over `os.path` in new code  
✅ Never use `shell=True` with user input  
✅ Set timeouts on all subprocess calls  
✅ Provide default values for environment variables  
✅ Handle `psutil` exceptions (AccessDenied, NoSuchProcess)  
✅ Monitor trends, not just current values  
✅ Implement multi-level alerts (warning → critical)  
✅ Export metrics in standard formats (JSON, Prometheus)

### **Next Steps:**
- Build automated deployment scripts using subprocess
- Create log rotation tools with pathlib and shutil
- Implement auto-scaling triggers based on psutil metrics
- Develop configuration management systems with environment variables
- Build custom monitoring dashboards integrating all techniques

## **🧹 Cleanup**

Run this cell to clean up the test files and directories created during this session.

In [17]:
# Cleanup test directories
cleanup_dirs = [test_dir, monitoring_dir]

print("Cleaning up test directories...")
for directory in cleanup_dirs:
    if directory.exists():
        shutil.rmtree(directory)
        print(f" Removed {directory}")

print("\n Cleanup complete!")

Cleaning up test directories...
 Removed devops_test
 Removed monitoring_data

 Cleanup complete!
