# Tutorial: Service Lifecycle Management with TaskGroups

**Category**: Concurrency
**Difficulty**: Advanced
**Time**: 25-35 minutes

## Problem Statement

Building production services often requires coordinating multiple components with complex dependencies. Consider an HTTP API service that needs:

- A database connection pool that must be ready before anything else
- A cache layer that depends on the database
- An HTTP server that depends on both database and cache
- Background workers processing tasks from a queue
- A health monitoring system that checks all components

The challenge isn't just running these concurrently - it's managing their lifecycle: **initialization order**, **readiness signaling**, **health monitoring**, and **graceful shutdown**. If the database takes 2 seconds to connect, the API shouldn't start serving requests. If a component fails health checks, the service should shut down cleanly.

Traditional approaches like spawning independent tasks lead to race conditions: the API starts before the database is ready, health checks run before components initialize, or shutdown leaves orphaned background workers.

**Why This Matters**:
- **Correctness**: Components accessing uninitialized dependencies cause crashes or data corruption
- **Observability**: Without coordinated health checks, you can't tell if the service is actually ready
- **Reliability**: Uncoordinated shutdown leaves connections open, jobs incomplete, or resources leaked

**What You'll Build**:
A production-ready service manager using lionherd-core's `create_task_group()`, `task_status.started()`, and Event coordination that manages multi-component initialization, dependency ordering, health monitoring, and graceful shutdown.

## Prerequisites

**Prior Knowledge**:
- Python async/await fundamentals (asyncio basics)
- Understanding of context managers (async with)
- Structured concurrency concepts (task groups, cancellation)

**Required Packages**:
```bash
pip install lionherd-core  # >=1.0.0a3
```

**Optional Reading**:
- [API Reference: Task Groups](../../docs/api/libs/concurrency/task.md)
- [Reference Notebook: Task Groups](../references/concurrency_task.ipynb)

In [1]:
# Standard library
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

# Third-party
import anyio
from anyio.abc import TaskStatus

# lionherd-core
from lionherd_core.libs.concurrency import (
    Event,
    create_task_group,
    get_cancelled_exc_class,
    sleep,
)

# Configure logging for examples
logging.basicConfig(
    level=logging.INFO,
    format='[%(asctime)s.%(msecs)03d] %(name)s: %(message)s',
    datefmt='%H:%M:%S'
)

logger = logging.getLogger(__name__)

## Solution Overview

We'll implement a service manager using structured concurrency that handles the complete lifecycle:

1. **Initialization Protocol**: Services signal readiness via `task_status.started()`
2. **Dependency Ordering**: Parent waits for dependencies before starting dependents
3. **Event Coordination**: Health checks use Events to signal between components
4. **Graceful Shutdown**: Cancel scope triggers coordinated cleanup

**Key lionherd-core Components**:
- `create_task_group()`: Structured concurrency context ensuring all tasks complete
- `TaskGroup.start()`: Wait for service initialization
- `TaskGroup.start_soon()`: Spawn background tasks
- `Event`: Signal coordination between tasks
- `cancel_scope`: Timeout and graceful shutdown

**Flow**:
```
Startup:
  Database → started() → Cache → started() → API → started()
                                    ↓
                         Health Monitor (Events)
                                    ↓
                               Workers (Queue)

Shutdown:
  cancel_scope.cancel() → All tasks receive cancellation
                        → Graceful cleanup in each
                        → TaskGroup waits for all
                        → Exit
```

**Expected Outcome**: Services start in correct order, health monitoring confirms readiness, shutdown is clean and coordinated.

In [2]:
# Quick Start: Service Lifecycle in 30 Seconds


async def service(name: str, shutdown: Event, *, task_status: TaskStatus = anyio.TASK_STATUS_IGNORED):
    """Minimal service with lifecycle."""
    print(f"[{name}] Starting...")
    await sleep(0.1)  # Simulate startup

    # Signal ready; the value is returned to the caller of tg.start()
    task_status.started(f"{name} ready")
    print(f"[{name}] Running")

    await shutdown.wait()  # Block until the caller sets the shutdown event
    print(f"[{name}] Stopped")

# Try it:
shutdown = Event()

async with create_task_group() as tg:
    # Start service and wait for ready
    status = await tg.start(service, "Database", shutdown)
    print(f"✓ {status}")

    await sleep(0.2)

    # Trigger shutdown
    shutdown.set()

print("✓ Lifecycle complete")

# 👇 Now read below to understand coordinated multi-service lifecycles

[Database] Starting...
[Database] Running
✓ Database ready
[Database] Stopped
✓ Lifecycle complete


### Step 1: Define Service States and Event Coordination

Before implementing services, we need clear state definitions and event signaling mechanisms. Services transition through states (Initializing → Running → Stopping → Stopped), and components coordinate via Events.

**Why Events**: Events provide race-free signaling between tasks running on the same event loop (they are not a cross-thread primitive). A health monitor can wait for a "database_ready" event before checking database health, avoiding race conditions.

In [3]:
class ServiceState(Enum):
    """Service lifecycle states."""
    INITIALIZING = "initializing"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    FAILED = "failed"

@dataclass
class ServiceStatus:
    """Service status with state and metadata."""
    name: str
    state: ServiceState
    details: dict[str, Any] = field(default_factory=dict)

    def __repr__(self) -> str:
        return f"{self.name}: {self.state.value}"

@dataclass
class ServiceEvents:
    """Coordination events for service lifecycle."""
    ready: Event = field(default_factory=Event)
    shutdown: Event = field(default_factory=Event)
    health_check: Event = field(default_factory=Event)

# Example usage
status = ServiceStatus(name="Database", state=ServiceState.INITIALIZING)
events = ServiceEvents()

print(f"Status: {status}")
print(f"Events ready: {events.ready.is_set()}")
print(f"Events shutdown: {events.shutdown.is_set()}")

Status: Database: initializing
Events ready: False
Events shutdown: False


**Notes**:
- `ServiceState` enum ensures type safety and clear transitions
- `ServiceEvents` groups related events (ready, shutdown, health_check) for easier management
- Events are created once and shared across tasks - don't create new Event instances for coordination
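
The enum names the states, but nothing in it restricts which moves between them are legal. A minimal sketch of one way to enforce that, assuming a hypothetical `ALLOWED_TRANSITIONS` table and `transition()` helper (neither is part of lionherd-core; the allowed moves are an illustrative choice):

```python
from enum import Enum


class ServiceState(Enum):
    INITIALIZING = "initializing"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"
    FAILED = "failed"


# Hypothetical transition table (an assumption for illustration only)
ALLOWED_TRANSITIONS: dict[ServiceState, set[ServiceState]] = {
    ServiceState.INITIALIZING: {ServiceState.RUNNING, ServiceState.STOPPING, ServiceState.FAILED},
    ServiceState.RUNNING: {ServiceState.STOPPING, ServiceState.FAILED},
    ServiceState.STOPPING: {ServiceState.STOPPED, ServiceState.FAILED},
    ServiceState.STOPPED: set(),  # terminal
    ServiceState.FAILED: set(),   # terminal
}


def transition(current: ServiceState, new: ServiceState) -> ServiceState:
    """Return `new` if the move is allowed, otherwise raise ValueError."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


state = transition(ServiceState.INITIALIZING, ServiceState.RUNNING)
state = transition(state, ServiceState.STOPPING)
print(state.value)
```

A service could call such a helper instead of assigning `status.state` directly, so that a move like STOPPED to RUNNING fails loudly rather than silently corrupting the status.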

### Step 2: Implement Basic Service with Lifecycle

A service needs initialization, operation, and cleanup phases. The `task_status.started()` protocol signals when initialization completes, allowing dependents to proceed.

**Why task_status.started()**: Without it, parent tasks can't distinguish "still initializing" from "ready". Using `start()` instead of `start_soon()` provides synchronization.
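
To make the handshake concrete, here is a rough standard-library analogue of the `start()` / `started()` protocol. It is a sketch, not how lionherd-core or AnyIO implement it: the hypothetical `start()` helper below uses an `asyncio.Future` as the readiness channel, and unlike a real task group it does not propagate a failure that happens before readiness.

```python
import asyncio


async def service(name: str, started: asyncio.Future, stop: asyncio.Event) -> None:
    await asyncio.sleep(0.01)            # simulate initialization
    started.set_result(f"{name} ready")  # analogue of task_status.started(value)
    await stop.wait()                    # keep running until asked to stop


async def start(name: str, stop: asyncio.Event) -> tuple[asyncio.Task, str]:
    """Spawn the service and return only once it has reported readiness."""
    started = asyncio.get_running_loop().create_future()
    task = asyncio.create_task(service(name, started, stop))
    value = await started                # analogue of `await tg.start(...)`
    return task, value


async def main() -> list[str]:
    stop = asyncio.Event()
    order: list[str] = []
    db_task, db_value = await start("Database", stop)
    order.append(db_value)
    cache_task, cache_value = await start("Cache", stop)  # runs only after Database is ready
    order.append(cache_value)
    stop.set()
    await asyncio.gather(db_task, cache_task)
    return order


order = asyncio.run(main())
print(order)  # ['Database ready', 'Cache ready']
```

The point of the sketch is the ordering guarantee: the caller resumes only after the child has explicitly declared itself ready, which a plain `start_soon()` cannot provide.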

In [4]:
async def database_service(
    *,
    task_status: TaskStatus[ServiceStatus] = anyio.TASK_STATUS_IGNORED,
) -> None:
    """Simulated database service with lifecycle management."""
    name = "Database"
    status = ServiceStatus(name=name, state=ServiceState.INITIALIZING)

    try:
        # Initialize (simulate connection pool setup)
        logger.info(f"[{name}] Initializing connection pool...")
        await sleep(0.2)  # Simulate startup time

        # Signal ready
        status.state = ServiceState.RUNNING
        status.details["connections"] = 10
        task_status.started(status)
        logger.info(f"[{name}] Ready (10 connections)")

        # Run (keep-alive, health checks)
        while True:
            await sleep(1.0)  # Simulate periodic maintenance

    except get_cancelled_exc_class():
        # Graceful shutdown
        status.state = ServiceState.STOPPING
        logger.info(f"[{name}] Shutting down...")
        await sleep(0.1)  # Simulate cleanup (close connections)
        status.state = ServiceState.STOPPED
        logger.info(f"[{name}] Stopped")
        raise
    except Exception as e:
        status.state = ServiceState.FAILED
        logger.error(f"[{name}] Failed: {e}")
        raise

# Test the service lifecycle
async with create_task_group() as tg:
    # Wait for database to initialize
    status = await tg.start(database_service)
    print(f"\n✓ Service started: {status}")
    print(f"  Details: {status.details}")

    # Let it run briefly
    await sleep(0.3)

    # Trigger shutdown
    tg.cancel_scope.cancel()

print("\n✓ Service lifecycle complete")

[23:26:07.238] __main__: [Database] Initializing connection pool...
[23:26:07.439] __main__: [Database] Ready (10 connections)



✓ Service started: Database: running
  Details: {'connections': 10}


[23:26:07.743] __main__: [Database] Shutting down...



✓ Service lifecycle complete


**Notes**:
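- `task_status.started(status)` hands the `ServiceStatus` back to the caller as the return value of `tg.start()` - useful for passing connection info
- Cancellation triggers shutdown - catch `get_cancelled_exc_class()` for cleanup and always re-raise it
- Once a scope is cancelled, every further `await` inside it is cancelled as well, so `await sleep(0.1)` in the handler never completes; this is why the output above shows "Shutting down..." but no "Stopped". Cleanup that needs to await must run inside a shielded cancel scope (in AnyIO, `anyio.CancelScope(shield=True)`)
- State transitions (INITIALIZING → RUNNING → STOPPING → STOPPED) provide observability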

### Step 3: Add Multi-Service Coordination with Dependencies

Real services have dependencies: cache needs database, API needs both. We use `await tg.start()` sequentially to enforce ordering.

**Why Sequential start()**: Each `await tg.start()` blocks until `task_status.started()` is called, ensuring dependencies are ready before dependents start.

In [5]:
async def cache_service(
    db_status: ServiceStatus,
    *,
    task_status: TaskStatus[ServiceStatus] = anyio.TASK_STATUS_IGNORED,
) -> None:
    """Cache service that depends on database."""
    name = "Cache"
    status = ServiceStatus(name=name, state=ServiceState.INITIALIZING)

    try:
        # Use database connection from db_status
        logger.info(f"[{name}] Connecting to {db_status.name}...")
        await sleep(0.15)

        status.state = ServiceState.RUNNING
        status.details["cache_size"] = "100MB"
        task_status.started(status)
        logger.info(f"[{name}] Ready (cache_size: 100MB)")

        while True:
            await sleep(1.0)

    except get_cancelled_exc_class():
        status.state = ServiceState.STOPPING
        logger.info(f"[{name}] Shutting down...")
        await sleep(0.05)
        status.state = ServiceState.STOPPED
        logger.info(f"[{name}] Stopped")
        raise

async def api_service(
    db_status: ServiceStatus,
    cache_status: ServiceStatus,
    *,
    task_status: TaskStatus[ServiceStatus] = anyio.TASK_STATUS_IGNORED,
) -> None:
    """HTTP API service that depends on database and cache."""
    name = "API"
    status = ServiceStatus(name=name, state=ServiceState.INITIALIZING)

    try:
        logger.info(f"[{name}] Starting HTTP server (port: 8000)...")
        await sleep(0.1)

        status.state = ServiceState.RUNNING
        status.details["port"] = 8000
        status.details["dependencies"] = [db_status.name, cache_status.name]
        task_status.started(status)
        logger.info(f"[{name}] Ready (port: 8000)")

        while True:
            await sleep(1.0)

    except get_cancelled_exc_class():
        status.state = ServiceState.STOPPING
        logger.info(f"[{name}] Shutting down...")
        await sleep(0.05)
        status.state = ServiceState.STOPPED
        logger.info(f"[{name}] Stopped")
        raise

# Test coordinated startup
async with create_task_group() as tg:
    # Start in dependency order
    db_status = await tg.start(database_service)
    print(f"✓ {db_status}")

    cache_status = await tg.start(cache_service, db_status)
    print(f"✓ {cache_status}")

    api_status = await tg.start(api_service, db_status, cache_status)
    print(f"✓ {api_status}")
    print(f"  Dependencies: {api_status.details['dependencies']}")

    # All services running
    print("\n✓ All services ready\n")
    await sleep(0.3)

    # Coordinated shutdown
    print("Initiating shutdown...\n")
    tg.cancel_scope.cancel()

print("\n✓ Coordinated lifecycle complete")

[23:26:07.758] __main__: [Database] Initializing connection pool...
[23:26:07.960] __main__: [Database] Ready (10 connections)
[23:26:07.962] __main__: [Cache] Connecting to Database...
[23:26:08.114] __main__: [Cache] Ready (cache_size: 100MB)
[23:26:08.116] __main__: [API] Starting HTTP server (port: 8000)...


✓ Database: running
✓ Cache: running


[23:26:08.219] __main__: [API] Ready (port: 8000)


✓ API: running
  Dependencies: ['Database', 'Cache']

✓ All services ready



[23:26:08.524] __main__: [Database] Shutting down...
[23:26:08.525] __main__: [API] Shutting down...
[23:26:08.525] __main__: [Cache] Shutting down...


Initiating shutdown...


✓ Coordinated lifecycle complete


**Notes**:
- Passing `db_status` to `cache_service` provides connection info (not just signaling)
- Services start sequentially but run concurrently after initialization
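- Shutdown is not ordered: cancelling the scope delivers cancellation to all tasks at once, so the services stop concurrently rather than in reverse dependency order (the log shows Database, API, and Cache shutting down within the same millisecond or two)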
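
When stop order matters (for example, the API should drain before the database closes), a different mechanism is needed than a single cancel. One option, sketched here with the standard library's `contextlib.AsyncExitStack` rather than lionherd-core, is to model each service as an async context manager so that teardown unwinds in last-in, first-out order. The `running()` wrapper is a hypothetical stand-in for real start and stop logic.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events: list[str] = []


@asynccontextmanager
async def running(name: str):
    """Hypothetical service wrapper: start on enter, stop on exit."""
    await asyncio.sleep(0.01)  # simulate startup
    events.append(f"{name} started")
    try:
        yield name
    finally:
        await asyncio.sleep(0.01)  # simulate cleanup
        events.append(f"{name} stopped")


async def main() -> None:
    async with AsyncExitStack() as stack:
        for name in ("Database", "Cache", "API"):
            await stack.enter_async_context(running(name))
        # Services are up here; leaving the block unwinds them in LIFO order


asyncio.run(main())
print(events)
```

The trade-off is that teardown becomes sequential, so total shutdown time is the sum of the individual cleanup times instead of the maximum.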

### Step 4: Add Health Monitoring with Event Signaling

Health monitors need to coordinate with services: wait for services to be ready, check them periodically, signal failures.

**Why Events**: Health monitor waits for `ready` event before checking. Services set events after initialization. This avoids polling or sleep-based coordination.

In [6]:
async def monitored_service(
    name: str,
    startup_time: float,
    events: ServiceEvents,
    *,
    task_status: TaskStatus[ServiceStatus] = anyio.TASK_STATUS_IGNORED,
) -> None:
    """Service with health monitoring integration."""
    status = ServiceStatus(name=name, state=ServiceState.INITIALIZING)

    try:
        logger.info(f"[{name}] Initializing...")
        await sleep(startup_time)

        status.state = ServiceState.RUNNING
        task_status.started(status)
        events.ready.set()  # Signal health monitor
        logger.info(f"[{name}] Ready (ready event set)")

        # Wait for shutdown signal
        await events.shutdown.wait()

    except get_cancelled_exc_class():
        status.state = ServiceState.STOPPING
        logger.info(f"[{name}] Shutting down...")
        await sleep(0.05)
        status.state = ServiceState.STOPPED
        logger.info(f"[{name}] Stopped")
        raise

async def health_monitor(
    service_name: str,
    events: ServiceEvents,
) -> None:
    """Health monitoring task that waits for service readiness."""
    monitor_name = f"HealthMonitor({service_name})"
    check_count = 0  # Initialized before the try so the cancellation handler can always read it

    try:
        # Wait for service to be ready
        logger.info(f"[{monitor_name}] Waiting for {service_name} ready...")
        await events.ready.wait()
        logger.info(f"[{monitor_name}] {service_name} is ready, starting checks")

        # Periodic health checks
        while True:
            await sleep(0.2)  # Check every 200ms
            check_count += 1
            logger.info(f"[{monitor_name}] Health check #{check_count}: OK")

    except get_cancelled_exc_class():
        logger.info(f"[{monitor_name}] Stopped (performed {check_count} checks)")
        raise

# Test health monitoring
service_events = ServiceEvents()

async with create_task_group() as tg:
    # Start health monitor first (it will wait)
    tg.start_soon(health_monitor, "TestService", service_events)

    # Start service (will signal ready)
    status = await tg.start(monitored_service, "TestService", 0.15, service_events)
    print(f"✓ {status}\n")

    # Let health checks run
    await sleep(0.5)

    # Shutdown
    print("\nInitiating shutdown...\n")
    service_events.shutdown.set()
    tg.cancel_scope.cancel()

print("\n✓ Health monitoring complete")

[23:26:08.541] __main__: [HealthMonitor(TestService)] Waiting for TestService ready...
[23:26:08.542] __main__: [TestService] Initializing...
[23:26:08.693] __main__: [TestService] Ready (ready event set)
[23:26:08.695] __main__: [HealthMonitor(TestService)] TestService is ready, starting checks


✓ TestService: running



[23:26:08.897] __main__: [HealthMonitor(TestService)] Health check #1: OK
[23:26:09.099] __main__: [HealthMonitor(TestService)] Health check #2: OK
[23:26:09.198] __main__: [HealthMonitor(TestService)] Stopped (performed 2 checks)



Initiating shutdown...


✓ Health monitoring complete


**Notes**:
- Health monitor uses `start_soon()` (fire-and-forget) since it doesn't need initialization protocol
- Service sets `ready` event after initialization - monitor waits for this before checking
- `shutdown` event provides clean termination signal (alternative to cancellation for some scenarios)
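
The monitor in this step only ever logs "OK". To connect it to the requirement from the problem statement (a component that fails health checks should bring the service down cleanly), the monitor can set the shutdown event itself. The following is a minimal sketch using standard-library `asyncio`; the `check` callable, `interval`, and `max_failures` parameters are assumptions introduced for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def health_monitor(
    check: Callable[[], Awaitable[bool]],
    shutdown: asyncio.Event,
    *,
    interval: float = 0.01,
    max_failures: int = 3,
) -> int:
    """Run `check` periodically; request shutdown after consecutive failures.

    Returns the number of checks performed.
    """
    failures = 0
    checks = 0
    while not shutdown.is_set():
        await asyncio.sleep(interval)
        checks += 1
        failures = 0 if await check() else failures + 1
        if failures >= max_failures:
            shutdown.set()  # wakes every task waiting on this event
    return checks


async def main() -> tuple[int, bool]:
    shutdown = asyncio.Event()
    results = iter([True, True, False, False, False])  # simulated probe results

    async def check() -> bool:
        return next(results)

    checks = await health_monitor(check, shutdown)
    return checks, shutdown.is_set()


checks, stopped = asyncio.run(main())
print(checks, stopped)  # 5 True
```

Requiring several consecutive failures before acting is a deliberate choice here: a single failed probe is often transient, and shutting down on it would make the service flap.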

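### Step 5: Add Background Workers with Queue Processing

Background workers consume tasks from a shared queue while a service produces them. Workers wait for the service's `ready` event before processing, and closing the queue lets them exit.

from lionherd_core.libs.concurrency import Queue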

async def background_worker(
    worker_id: int,
    queue: Queue,
    events: ServiceEvents,
) -> None:
    """Background worker that processes tasks from queue."""
    name = f"Worker-{worker_id}"
    
    try:
        # Wait for service ready
        await events.ready.wait()
        logger.info(f"[{name}] Started")
        
        # Process tasks
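        # Process tasks until the queue is closed
        while True:
            task = await queue.get()
            logger.info(f"[{name}] Processing task: {task}")
            await sleep(0.1)  # Simulate work
            logger.info(f"[{name}] Completed task: {task}")

    except anyio.EndOfStream:
        # Queue closed - graceful shutdown
        logger.info(f"[{name}] Queue closed, shutting down")
    except get_cancelled_exc_class():
        logger.info(f"[{name}] Shutting down")
        raise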

async def service_with_workers(
    task_queue: Queue,
    events: ServiceEvents,
    *,
    task_status: TaskStatus[ServiceStatus] = anyio.TASK_STATUS_IGNORED,
) -> None:
    """Service that produces tasks for workers."""
    name = "TaskService"
    status = ServiceStatus(name=name, state=ServiceState.INITIALIZING)
    
    try:
        logger.info(f"[{name}] Initializing...")
        await sleep(0.1)
        
        status.state = ServiceState.RUNNING
        task_status.started(status)
        events.ready.set()
        logger.info(f"[{name}] Ready")
        
        # Produce tasks
        for i in range(5):
            await sleep(0.15)
            await task_queue.put(f"task-{i}")
            logger.info(f"[{name}] Enqueued task-{i}")
        
        # Wait for shutdown
        await events.shutdown.wait()
        
    except get_cancelled_exc_class():
        logger.info(f"[{name}] Shutting down...")
        raise

# Test service with background workers
task_queue = Queue.with_maxsize(10)
worker_events = ServiceEvents()

async with create_task_group() as tg:
    # Start workers (they wait for ready event)
    for i in range(2):
        tg.start_soon(background_worker, i, task_queue, worker_events)
    
    # Start service (signals ready, produces tasks)
    status = await tg.start(service_with_workers, task_queue, worker_events)
    print(f"✓ {status}\n")
    
    # Let workers process
    await sleep(1.0)
    
    # Shutdown
    print("\nInitiating shutdown...\n")
    worker_events.shutdown.set()
    await task_queue.close()
    tg.cancel_scope.cancel()

print("\n✓ Workers shutdown complete")

**Notes**:
- Workers use `while True: task = await queue.get()` pattern to process tasks
- `await queue.close()` signals queue closure - `queue.get()` raises `anyio.EndOfStream`
- Catch `anyio.EndOfStream` for graceful worker shutdown when queue closes
- Workers wait for `ready` event before processing - ensures service is initialized
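
The drain-then-exit behaviour can also be illustrated without lionherd-core. The sketch below uses standard-library `asyncio.Queue`, which has no `close()` in most Python versions, so a hypothetical `STOP` sentinel stands in for queue closure. Because the sentinels are enqueued after the real tasks, every task is processed before the workers exit.

```python
import asyncio

STOP = object()  # sentinel standing in for "queue closed"


async def worker(queue: asyncio.Queue, done: list[str]) -> None:
    while True:
        task = await queue.get()
        if task is STOP:
            break  # analogue of catching EndOfStream
        await asyncio.sleep(0.01)  # simulate work
        done.append(task)


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)
    done: list[str] = []
    workers = [asyncio.create_task(worker(queue, done)) for _ in range(2)]

    for i in range(5):
        await queue.put(f"task-{i}")
    for _ in workers:
        await queue.put(STOP)  # one sentinel per worker

    await asyncio.gather(*workers)
    return done


done = asyncio.run(main())
print(sorted(done))
```

Compare this with cancelling the task group: cancellation interrupts a worker mid-task, whereas a close signal lets in-flight and queued work finish first.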

## Complete Working Example

Here's the full production-ready implementation combining all patterns: multi-service dependencies, health monitoring, background workers, and coordinated lifecycle management.

**Features**:
- ✅ Multi-component initialization (Database → Cache → API)
- ✅ Dependency ordering with `task_status.started()`
- ✅ Health monitoring with event coordination
- ✅ Background workers with queue processing
- ✅ Graceful shutdown with cleanup
- ✅ Production-ready error handling

In [7]:
"""
Complete production-ready service lifecycle manager.

Demonstrates multi-component service orchestration with:
- Coordinated initialization (dependency ordering)
- Health monitoring (event signaling)
- Background workers (queue processing)
- Graceful shutdown (cleanup protocols)
"""

# Standard library
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Any

# Third-party
import anyio
from anyio.abc import TaskStatus

# lionherd-core
from lionherd_core.libs.concurrency import (
    Event,
    Queue,
    create_task_group,
    get_cancelled_exc_class,
    sleep,
)

logging.basicConfig(
    level=logging.INFO,
    format='[%(asctime)s.%(msecs)03d] %(name)s: %(message)s',
    datefmt='%H:%M:%S'
)
logger = logging.getLogger(__name__)

class ServiceState(Enum):
    INITIALIZING = "initializing"
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"

@dataclass
class ServiceStatus:
    name: str
    state: ServiceState
    details: dict[str, Any] = field(default_factory=dict)

@dataclass
class ServiceEvents:
    ready: Event = field(default_factory=Event)
    shutdown: Event = field(default_factory=Event)

# Services
async def database_service(
    events: ServiceEvents,
    *,
    task_status: TaskStatus[ServiceStatus] = anyio.TASK_STATUS_IGNORED,
) -> None:
    name = "Database"
    status = ServiceStatus(name=name, state=ServiceState.INITIALIZING)

    try:
        logger.info(f"[{name}] Initializing connection pool...")
        await sleep(0.2)

        status.state = ServiceState.RUNNING
        status.details["connections"] = 10
        task_status.started(status)
        events.ready.set()
        logger.info(f"[{name}] Ready")

        await events.shutdown.wait()

    except get_cancelled_exc_class():
        logger.info(f"[{name}] Shutting down...")
        await sleep(0.1)
        logger.info(f"[{name}] Stopped")
        raise

async def cache_service(
    db_status: ServiceStatus,
    events: ServiceEvents,
    *,
    task_status: TaskStatus[ServiceStatus] = anyio.TASK_STATUS_IGNORED,
) -> None:
    name = "Cache"
    status = ServiceStatus(name=name, state=ServiceState.INITIALIZING)

    try:
        logger.info(f"[{name}] Connecting to {db_status.name}...")
        await sleep(0.15)

        status.state = ServiceState.RUNNING
        task_status.started(status)
        events.ready.set()
        logger.info(f"[{name}] Ready")

        await events.shutdown.wait()

    except get_cancelled_exc_class():
        logger.info(f"[{name}] Shutting down...")
        await sleep(0.05)
        logger.info(f"[{name}] Stopped")
        raise

async def http_server(
    db_status: ServiceStatus,
    cache_status: ServiceStatus,
    task_queue: Queue,
    events: ServiceEvents,
    *,
    task_status: TaskStatus[ServiceStatus] = anyio.TASK_STATUS_IGNORED,
) -> None:
    name = "HTTP-Server"
    status = ServiceStatus(name=name, state=ServiceState.INITIALIZING)

    try:
        logger.info(f"[{name}] Starting on port 8000...")
        await sleep(0.1)

        status.state = ServiceState.RUNNING
        status.details["port"] = 8000
        task_status.started(status)
        events.ready.set()
        logger.info(f"[{name}] Ready")

        # Simulate handling requests (produce tasks)
        for i in range(3):
            await sleep(0.2)
            await task_queue.put(f"http-request-{i}")
            logger.info(f"[{name}] Enqueued request {i}")

        await events.shutdown.wait()

    except get_cancelled_exc_class():
        logger.info(f"[{name}] Shutting down...")
        await sleep(0.05)
        logger.info(f"[{name}] Stopped")
        raise

# Background worker
async def background_worker(
    worker_id: int,
    queue: Queue,
    events: ServiceEvents,
) -> None:
    name = f"Worker-{worker_id}"

    try:
        await events.ready.wait()
        logger.info(f"[{name}] Started")

        # Process tasks from queue
        while True:
            task = await queue.get()
            logger.info(f"[{name}] Processing: {task}")
            await sleep(0.15)
            logger.info(f"[{name}] Completed: {task}")

    except anyio.EndOfStream:
        # Queue closed - graceful shutdown
        logger.info(f"[{name}] Queue closed, shutting down")
    except get_cancelled_exc_class():
        logger.info(f"[{name}] Cancelled")
        raise

# Health monitor
async def health_monitor(service_name: str, events: ServiceEvents) -> None:
    name = f"HealthMonitor({service_name})"
    check_count = 0  # Initialized before the try so the cancellation handler can always read it

    try:
        await events.ready.wait()
        logger.info(f"[{name}] Started monitoring")

        while True:
            await sleep(0.3)
            check_count += 1
            logger.info(f"[{name}] Check #{check_count}: OK")

    except get_cancelled_exc_class():
        logger.info(f"[{name}] Stopped ({check_count} checks)")
        raise

# Service Manager
class ServiceManager:
    """Production service lifecycle manager."""

    def __init__(self):
        self.db_events = ServiceEvents()
        self.cache_events = ServiceEvents()
        self.api_events = ServiceEvents()
        self.task_queue = Queue.with_maxsize(20)

    async def run(self, duration: float = 2.0) -> None:
        """Run all services for specified duration."""
        async with create_task_group() as tg:
            # Start infrastructure services in order
            db_status = await tg.start(database_service, self.db_events)
            logger.info(f"✓ {db_status.name} initialized")

            cache_status = await tg.start(
                cache_service, db_status, self.cache_events
            )
            logger.info(f"✓ {cache_status.name} initialized")

            # Start HTTP server
            api_status = await tg.start(
                http_server,
                db_status,
                cache_status,
                self.task_queue,
                self.api_events,
            )
            logger.info(f"✓ {api_status.name} initialized on port {api_status.details['port']}")

            # Start background workers
            for i in range(2):
                tg.start_soon(
                    background_worker, i, self.task_queue, self.api_events
                )

            # Start health monitors
            tg.start_soon(health_monitor, "Database", self.db_events)
            tg.start_soon(health_monitor, "Cache", self.cache_events)
            tg.start_soon(health_monitor, "API", self.api_events)

            logger.info("\n" + "="*60)
            logger.info("ALL SERVICES READY")
            logger.info("="*60 + "\n")

            # Run for specified duration
            await sleep(duration)

            # Graceful shutdown
            logger.info("\n" + "="*60)
            logger.info("INITIATING GRACEFUL SHUTDOWN")
            logger.info("="*60 + "\n")

            # Signal all services to shutdown
            self.db_events.shutdown.set()
            self.cache_events.shutdown.set()
            self.api_events.shutdown.set()

            # Close queue (workers will finish and exit)
            await self.task_queue.close()

            # Cancel remaining tasks (monitors)
            await sleep(0.1)
            tg.cancel_scope.cancel()

# Run the complete service manager
manager = ServiceManager()
await manager.run(duration=1.5)

print("\n" + "="*60)
print("✓ COMPLETE SERVICE LIFECYCLE FINISHED")
print("="*60)

[23:26:09.243] __main__: [Database] Initializing connection pool...
[23:26:09.445] __main__: [Database] Ready
[23:26:09.445] __main__: ✓ Database initialized
[23:26:09.446] __main__: [Cache] Connecting to Database...
[23:26:09.598] __main__: [Cache] Ready
[23:26:09.599] __main__: ✓ Cache initialized
[23:26:09.600] __main__: [HTTP-Server] Starting on port 8000...
[23:26:09.702] __main__: [HTTP-Server] Ready
[23:26:09.703] __main__: ✓ HTTP-Server initialized on port 8000
[23:26:09.704] __main__: 
[23:26:09.704] __main__: ALL SERVICES READY

[23:26:09.706] __main__: [Worker-0] Started
[23:26:09.706] __main__: [Worker-1] Started
[23:26:09.707] __main__: [HealthMonitor(Database)] Started monitoring
[23:26:09.707] __main__: [HealthMonitor(Cache)] Started monitoring
[23:26:09.708] __main__: [HealthMonitor(API)] Started monitoring
[23:26:09.903] __main__: [HTTP-Server] Enqueued request 0
[23:26:09.904] __main__: [Worker-0] Processing: http-request-0
[23:26:10.009] __main__: [HealthMonito


✓ COMPLETE SERVICE LIFECYCLE FINISHED


## Variation: Parallel Service Initialization

**When to Use**: Services have no dependencies and can initialize concurrently (faster startup)

**Pattern**:
```python
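async def parallel_startup():
    """Start independent services in parallel."""
    statuses: dict[str, ServiceStatus] = {}

    async def start_one(key, service, events):
        # tg.start() returns once the service calls task_status.started()
        statuses[key] = await tg.start(service, events)

    async with create_task_group() as tg:
        # Calling tg.start() without awaiting only creates a coroutine, so each
        # start is wrapped in its own task; the inner group waits for all three
        async with create_task_group() as startup_tg:
            startup_tg.start_soon(start_one, "db", database_service, events_db)
            startup_tg.start_soon(start_one, "metrics", metrics_service, events_metrics)
            startup_tg.start_soon(start_one, "logger", logger_service, events_logger)

        # Now start dependent services
        await tg.start(api_service, statuses["db"], events_api)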
```

**Trade-offs**:
- ✅ Faster startup (services initialize concurrently)
- ✅ Better resource utilization during initialization
- ❌ More complex (need to track which services are independent)
- ❌ Harder to debug (concurrent failures)
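
The "faster startup" claim can be checked with a small standard-library sketch. It uses `asyncio.gather` as a stand-in for starting several services in their own tasks, and `init_service` with its delays is hypothetical. The log shows every service beginning initialization before any of them reports ready, which is what makes total startup time the longest single delay rather than the sum.

```python
import asyncio

log: list[str] = []


async def init_service(name: str, delay: float) -> str:
    log.append(f"{name} init")
    await asyncio.sleep(delay)  # simulate initialization work
    log.append(f"{name} ready")
    return name


async def main() -> list[str]:
    # gather() schedules all three coroutines as tasks and waits for all of them;
    # results come back in argument order regardless of completion order
    return await asyncio.gather(
        init_service("Database", 0.03),
        init_service("Metrics", 0.01),
        init_service("Logger", 0.02),
    )


statuses = asyncio.run(main())
print(statuses)
print(log)
```

Note that the readiness order follows the delays, not the order in which services were listed, which is one reason concurrent startup logs are harder to read.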

For additional variations (Service Registry, Phased Shutdown), see [lionherd-core examples](https://github.com/khive-ai/lionherd-core/examples/service_lifecycle_patterns.py).

## Summary

**What You Accomplished**:
- ✅ Built multi-component service manager with coordinated initialization
- ✅ Implemented dependency ordering using `task_status.started()` protocol
- ✅ Integrated health monitoring with Event-based coordination
- ✅ Added background workers with queue-based task processing
- ✅ Implemented graceful shutdown with cleanup protocols

**Key Takeaways**:
1. **Structured Concurrency**: TaskGroups ensure all tasks complete or cancel before exit - no orphaned tasks
2. **Initialization Protocol**: `await tg.start()` + `task_status.started()` provides type-safe dependency ordering
3. **Event Coordination**: Events signal between tasks without polling or sleep-based synchronization
4. **Graceful Shutdown**: Cancellation propagates to all tasks, each handles cleanup in `except get_cancelled_exc_class()`
5. **Production Readiness**: Error handling, monitoring, and configuration tuning are essential - not optional

**When to Use This Pattern**:
- ✅ Multi-component services with dependencies (HTTP API + database + cache)
- ✅ Long-running services needing health monitoring
- ✅ Background task processing with queues
- ✅ Coordinated startup and shutdown requirements
- ❌ Simple single-task operations (use asyncio.create_task instead)
- ❌ Fire-and-forget tasks with no lifecycle management (use start_soon only)

## Related Resources

**lionherd-core API Reference**:
- [Task Groups](../../docs/api/libs/concurrency/task.md) - create_task_group, start, start_soon
- [Primitives](../../docs/api/libs/concurrency/primitives.md) - Event, Queue, Lock
- [Cancellation](../../docs/api/libs/concurrency/cancel.md) - Cancel scopes, timeouts

**Reference Notebooks**:
- [Task Groups Patterns](../references/concurrency_task.ipynb) - Overview of task group capabilities
- [Primitives](../references/concurrency_primitives.ipynb) - Event, Queue, Lock usage
- [Cancellation](../references/concurrency_cancel.ipynb) - Timeout and cancellation patterns

**External Resources**:
- [AnyIO Documentation: Task Groups](https://anyio.readthedocs.io/en/stable/tasks.html) - Underlying implementation
- [Structured Concurrency (Nathaniel Smith)](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/) - Conceptual foundation
- [Production Service Patterns (AWS)](https://aws.amazon.com/builders-library/implementing-health-checks/) - Health monitoring best practices