# Cryptocurrency Data Pipeline - Task Orchestration & Automation

This notebook implements the task orchestration layer for the cryptocurrency data pipeline, automating the flow from data ingestion through harmonization to analytics.

## Setup Environment

In [None]:
%%sql
USE ROLE CRYPTO_ROLE;
USE WAREHOUSE CRYPTO_WH;
USE SCHEMA CRYPTO_DB.HARMONIZED_CRYPTO;

## Create Tasks for Pipeline Automation

### 1. Data Ingestion Task - Runs every 4 hours to fetch new data

In [None]:
%%sql
CREATE OR REPLACE TASK CRYPTO_DB.HARMONIZED_CRYPTO.LOAD_CRYPTO_TASK
    WAREHOUSE = CRYPTO_WH
    SCHEDULE = 'USING CRON 0 */4 * * * UTC'  -- Run every 4 hours
AS
CALL CRYPTO_DB.HARMONIZED_CRYPTO.LOAD_CRYPTO_DATA_SP();
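For reference, the cron expression `0 */4 * * *` fires at minute 0 of hours 0, 4, 8, 12, 16, and 20 UTC. The following is a minimal Python sketch, separate from the pipeline itself, that computes the next firing time for this one pattern. The helper name `next_run` is hypothetical, and the function is not a general cron parser.

```python
from datetime import datetime, timedelta, timezone


def next_run(now: datetime, every_hours: int = 4) -> datetime:
    """Next firing time of cron '0 */N * * *' strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    # Start from the next full hour and step forward until the hour matches */N.
    candidate = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    while candidate.hour % every_hours != 0:
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    print(next_run(datetime.now(timezone.utc)))
```

A run requested exactly on a boundary (for example 04:00:00) returns the following boundary, because the helper looks strictly after `now`.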

### 2. Create Task for Data Harmonization - Runs after ingestion when the raw stream has new data

In [None]:
%%sql
CREATE OR REPLACE TASK CRYPTO_DB.HARMONIZED_CRYPTO.HARMONIZE_CRYPTO_TASK
    WAREHOUSE = CRYPTO_WH
    AFTER CRYPTO_DB.HARMONIZED_CRYPTO.LOAD_CRYPTO_TASK
    WHEN SYSTEM$STREAM_HAS_DATA('CRYPTO_DB.HARMONIZED_CRYPTO.RAW_CRYPTO_STREAM')
AS
CALL CRYPTO_DB.HARMONIZED_CRYPTO.HARMONIZE_CRYPTO_DATA_SP();

### 3. Create Task to Update Analytics Tables - Runs after harmonization completes

In [None]:
%%sql
-- Every task in a task graph must live in the same database and schema, so this task
-- is created in HARMONIZED_CRYPTO even though it calls a procedure in ANALYTICS_CRYPTO.
CREATE OR REPLACE TASK CRYPTO_DB.HARMONIZED_CRYPTO.UPDATE_CRYPTO_METRICS_TASK
    WAREHOUSE = CRYPTO_WH
    AFTER CRYPTO_DB.HARMONIZED_CRYPTO.HARMONIZE_CRYPTO_TASK
    WHEN SYSTEM$STREAM_HAS_DATA('CRYPTO_DB.HARMONIZED_CRYPTO.CRYPTO_HARMONIZED_STREAM')
AS
CALL CRYPTO_DB.ANALYTICS_CRYPTO.UPDATE_CRYPTO_ANALYTICS();

## Set Up Change Tracking with Streams

Streams record row-level changes to a table. The `WHEN SYSTEM$STREAM_HAS_DATA(...)` conditions on the tasks above use them so that downstream tasks run only when new data exists. Create both streams before resuming the tasks.

In [None]:
%%sql
-- Create a stream on the harmonized data for change tracking
CREATE OR REPLACE STREAM CRYPTO_DB.HARMONIZED_CRYPTO.CRYPTO_HARMONIZED_STREAM
ON TABLE CRYPTO_DB.HARMONIZED_CRYPTO.CRYPTO_HARMONIZED;

In [None]:
%%sql
-- Create a stream on the raw data for change tracking
CREATE OR REPLACE STREAM CRYPTO_DB.HARMONIZED_CRYPTO.RAW_CRYPTO_STREAM
ON TABLE CRYPTO_DB.PUBLIC.BTC_RAW;  -- Assuming BTC_RAW is your raw data table

## Activate the Automation Pipeline

Resume all tasks to start the automation workflow. Tasks are resumed in reverse dependency order (child tasks first, root task last), so that every child is already active when the root task's schedule starts.

In [None]:
%%sql
ALTER TASK CRYPTO_DB.HARMONIZED_CRYPTO.UPDATE_CRYPTO_METRICS_TASK RESUME;
ALTER TASK CRYPTO_DB.HARMONIZED_CRYPTO.HARMONIZE_CRYPTO_TASK RESUME;
ALTER TASK CRYPTO_DB.HARMONIZED_CRYPTO.LOAD_CRYPTO_TASK RESUME;
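The resume order can also be derived from the dependency map instead of being written by hand. The sketch below is one possible approach, not part of the original pipeline: the helper names `resume_order` and `resume_statements` are hypothetical, and short task names are used for brevity (qualify them with database and schema as needed).

```python
from typing import Dict, List


def resume_order(predecessors: Dict[str, List[str]]) -> List[str]:
    """Order tasks so that each task appears before all of its predecessors."""
    ordered: List[str] = []
    seen = set()

    def visit(task: str) -> None:
        if task in seen:
            return
        seen.add(task)
        for parent in predecessors.get(task, []):
            visit(parent)
        ordered.append(task)  # post-order: predecessors are appended first

    for task in predecessors:
        visit(task)
    # Reversing the post-order puts children first and the root last.
    return ordered[::-1]


def resume_statements(predecessors: Dict[str, List[str]]) -> List[str]:
    """Build ALTER TASK ... RESUME statements in a safe order."""
    return [f"ALTER TASK {name} RESUME;" for name in resume_order(predecessors)]


# Dependency map of this pipeline: task -> tasks named in its AFTER clause.
PIPELINE = {
    "LOAD_CRYPTO_TASK": [],
    "HARMONIZE_CRYPTO_TASK": ["LOAD_CRYPTO_TASK"],
    "UPDATE_CRYPTO_METRICS_TASK": ["HARMONIZE_CRYPTO_TASK"],
}

if __name__ == "__main__":
    for statement in resume_statements(PIPELINE):
        print(statement)
```

As an alternative to resuming tasks one by one, Snowflake provides `SYSTEM$TASK_DEPENDENTS_ENABLE('<root task name>')`, which resumes a root task together with all of its dependents.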

## Task Monitoring and Observability

### Check Recent Task Execution History

In [None]:
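from snowflake.snowpark.context import get_active_session

# Reuse the notebook's active Snowpark session
session = get_active_session()

# Fetch up to 100 task runs scheduled in the last 24 hours, newest first
task_history = session.sql("""
SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    SCHEDULED_TIME_RANGE_START=>DATEADD('DAY',-1,CURRENT_TIMESTAMP()),
    RESULT_LIMIT => 100))
ORDER BY SCHEDULED_TIME DESC
""")

task_history.show()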

### View Task Dependency Graph

In [None]:
%%sql
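-- CURRENT_TASK_GRAPHS lists graph runs that are currently executing or scheduled next.
-- To list the dependency structure itself, query INFORMATION_SCHEMA.TASK_DEPENDENTS instead.
SELECT *
FROM TABLE(INFORMATION_SCHEMA.CURRENT_TASK_GRAPHS())
ORDER BY SCHEDULED_TIME;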

## Create Pipeline Health Dashboard

This dashboard view reports task performance (run counts, success rate, average duration) and data freshness (record counts and hours since the last update) for each tracked symbol.

In [None]:
%%sql
CREATE OR REPLACE VIEW CRYPTO_DB.ANALYTICS_CRYPTO.PIPELINE_HEALTH_DASHBOARD AS
WITH task_stats AS (
    SELECT
        NAME as task_name,
        COUNT(*) as total_runs,
        SUM(CASE WHEN STATE = 'SUCCEEDED' THEN 1 ELSE 0 END) as successful_runs,
        SUM(CASE WHEN STATE = 'FAILED' THEN 1 ELSE 0 END) as failed_runs,
        MAX(CASE WHEN STATE = 'SUCCEEDED' THEN COMPLETED_TIME ELSE NULL END) as last_successful_run,
        MAX(CASE WHEN STATE = 'FAILED' THEN COMPLETED_TIME ELSE NULL END) as last_failed_run,
        AVG(CASE WHEN STATE = 'SUCCEEDED' THEN TIMESTAMPDIFF(MILLISECOND, QUERY_START_TIME, COMPLETED_TIME) ELSE NULL END) as avg_duration_ms
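    -- RESULT_LIMIT defaults to 100 rows, which can truncate a 7-day window; 10000 is the maximum
    FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
        SCHEDULED_TIME_RANGE_START=>DATEADD('DAY',-7,CURRENT_TIMESTAMP()),
        RESULT_LIMIT => 10000))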
    GROUP BY NAME
),
data_stats AS (
    SELECT
        'BTC' as crypto_symbol,
        COUNT(*) as record_count,
        MIN(timestamp) as earliest_record,
        MAX(timestamp) as latest_record,
        DATEDIFF('hour', MAX(timestamp), CURRENT_TIMESTAMP()) as hours_since_last_update
    FROM CRYPTO_DB.HARMONIZED_CRYPTO.CRYPTO_HARMONIZED
    WHERE crypto_symbol = 'BTC'
    UNION ALL
    SELECT
        'ETH' as crypto_symbol,
        COUNT(*) as record_count,
        MIN(timestamp) as earliest_record,
        MAX(timestamp) as latest_record,
        DATEDIFF('hour', MAX(timestamp), CURRENT_TIMESTAMP()) as hours_since_last_update
    FROM CRYPTO_DB.HARMONIZED_CRYPTO.CRYPTO_HARMONIZED
    WHERE crypto_symbol = 'ETH'
    UNION ALL
    SELECT
        'DOGE' as crypto_symbol,
        COUNT(*) as record_count,
        MIN(timestamp) as earliest_record,
        MAX(timestamp) as latest_record,
        DATEDIFF('hour', MAX(timestamp), CURRENT_TIMESTAMP()) as hours_since_last_update
    FROM CRYPTO_DB.HARMONIZED_CRYPTO.CRYPTO_HARMONIZED
    WHERE crypto_symbol = 'DOGE'
)
SELECT
    'Task Health' as metric_type,
    task_name as metric_name,
    total_runs,
    successful_runs,
    failed_runs,
    ROUND(successful_runs/NULLIF(total_runs,0)*100, 2) as success_rate,
    last_successful_run,
    last_failed_run,
    avg_duration_ms,
    NULL as record_count,
    NULL as earliest_record,
    NULL as latest_record,
    NULL as hours_since_last_update
FROM task_stats
UNION ALL
SELECT
    'Data Health' as metric_type,
    crypto_symbol as metric_name,
    NULL as total_runs,
    NULL as successful_runs,
    NULL as failed_runs,
    NULL as success_rate,
    NULL as last_successful_run,
    NULL as last_failed_run,
    NULL as avg_duration_ms,
    record_count,
    earliest_record,
    latest_record,
    hours_since_last_update
FROM data_stats
ORDER BY metric_type, metric_name;

### Check the Pipeline Health Dashboard

In [None]:
pipeline_health = session.sql("SELECT * FROM CRYPTO_DB.ANALYTICS_CRYPTO.PIPELINE_HEALTH_DASHBOARD")
pipeline_health.show()
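The dashboard rows can also be checked programmatically. The sketch below is an illustration under stated assumptions: the helper `flag_unhealthy` and both thresholds are hypothetical, and the uppercase keys assume rows converted to dictionaries (for example with Snowpark's `Row.as_dict()`) from a view whose column aliases are unquoted.

```python
def flag_unhealthy(rows, max_staleness_hours=8, min_success_rate=90.0):
    """Return human-readable problems found in dashboard rows.

    rows: iterable of dicts keyed by the view's column names (uppercase).
    Both thresholds are illustrative defaults, not values from the pipeline.
    """
    problems = []
    for row in rows:
        name = row["METRIC_NAME"]
        if row["METRIC_TYPE"] == "Task Health":
            rate = row["SUCCESS_RATE"]
            # SUCCESS_RATE is NULL (None) when a task has no recorded runs.
            if rate is not None and rate < min_success_rate:
                problems.append(f"{name}: success rate {rate}% is below {min_success_rate}%")
        elif row["METRIC_TYPE"] == "Data Health":
            age = row["HOURS_SINCE_LAST_UPDATE"]
            if age is None:
                # MAX(timestamp) is NULL when the symbol has no rows at all.
                problems.append(f"{name}: no records found")
            elif age > max_staleness_hours:
                problems.append(f"{name}: last update was {age} hours ago")
    return problems


# Made-up sample rows for illustration only; not real pipeline output.
SAMPLE_ROWS = [
    {"METRIC_TYPE": "Task Health", "METRIC_NAME": "LOAD_CRYPTO_TASK",
     "SUCCESS_RATE": 50.0, "HOURS_SINCE_LAST_UPDATE": None},
    {"METRIC_TYPE": "Data Health", "METRIC_NAME": "BTC",
     "SUCCESS_RATE": None, "HOURS_SINCE_LAST_UPDATE": 2},
    {"METRIC_TYPE": "Data Health", "METRIC_NAME": "ETH",
     "SUCCESS_RATE": None, "HOURS_SINCE_LAST_UPDATE": 12},
]

if __name__ == "__main__":
    for problem in flag_unhealthy(SAMPLE_ROWS):
        print(problem)
```

In the notebook, the same function could be fed with `[r.as_dict() for r in pipeline_health.collect()]`, assuming the key casing matches.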

## Set Up Alert Notifications

Create an email alert that notifies administrators when a task fails.

In [None]:
%%sql
CREATE OR REPLACE NOTIFICATION INTEGRATION crypto_email_integration
  TYPE = EMAIL
  ENABLED = TRUE;

CREATE OR REPLACE ALERT CRYPTO_DB.ANALYTICS_CRYPTO.TASK_FAILURE_ALERT
  WAREHOUSE = CRYPTO_WH
  SCHEDULE = 'USING CRON 0 */1 * * * UTC'  -- Check every hour
  IF (EXISTS (
    SELECT 1 
    FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
      SCHEDULED_TIME_RANGE_START=>DATEADD('HOUR',-1,CURRENT_TIMESTAMP())))
    WHERE STATE = 'FAILED'
  ))
  THEN CALL SYSTEM$SEND_EMAIL(
    'crypto_email_integration',
    'admin@example.com',  -- recipient must be a verified email address of a user in this account
    'Crypto Pipeline Task Failure Alert',
    'A task in the Crypto data pipeline has failed in the last hour. Please check the task history.'
  );

-- Resume the alert to activate it
ALTER ALERT CRYPTO_DB.ANALYTICS_CRYPTO.TASK_FAILURE_ALERT RESUME;
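The alert condition amounts to "at least one run scheduled in the last hour ended in the `FAILED` state". The Python sketch below mirrors that check on in-memory records so the logic can be reasoned about outside Snowflake. The function name `should_alert` and the record layout are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def should_alert(history, now, window=timedelta(hours=1)):
    """True if any run scheduled within `window` before `now` has state FAILED.

    history: iterable of dicts with 'state' and 'scheduled_time' keys
             (a hypothetical layout modelled on TASK_HISTORY columns).
    """
    cutoff = now - window
    return any(
        run["state"] == "FAILED" and run["scheduled_time"] >= cutoff
        for run in history
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # Made-up history for illustration only.
    history = [
        {"state": "SUCCEEDED", "scheduled_time": now - timedelta(minutes=30)},
        {"state": "FAILED", "scheduled_time": now - timedelta(hours=5)},
    ]
    print(should_alert(history, now))
```

Because the window is anchored on the scheduled time, a failure older than one hour does not trigger the alert again on later checks.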

## Pipeline Visualization

The cryptocurrency data pipeline has the following task dependencies:

```
LOAD_CRYPTO_TASK (every 4 hours)
       |
       V
HARMONIZE_CRYPTO_TASK (when RAW_CRYPTO_STREAM has data)
       |
       V
UPDATE_CRYPTO_METRICS_TASK (when CRYPTO_HARMONIZED_STREAM has data)
```

This creates a fully automated workflow that processes data in stages:
1. Ingest raw cryptocurrency data
2. Transform and harmonize the data
3. Calculate analytics and metrics

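The ingestion task runs on its 4-hour schedule regardless of data volume. The harmonization and analytics tasks run only when their streams contain new data, which avoids spending warehouse time on empty runs.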
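The gating behavior of one scheduled cycle can be modelled in a few lines. This is a simplified sketch, not Snowflake's scheduler: the function name `run_cycle` and its boolean inputs are hypothetical, and it assumes (as Snowflake's `CREATE TASK` documentation describes) that when a task skips a run, the tasks that name it as a predecessor do not run either.

```python
from typing import List


def run_cycle(raw_stream_has_data: bool, harmonized_stream_has_data: bool) -> List[str]:
    """Return the tasks that would execute in one scheduled cycle."""
    executed = ["LOAD_CRYPTO_TASK"]  # root task: runs on every schedule tick

    # HARMONIZE_CRYPTO_TASK: AFTER the load task, WHEN RAW_CRYPTO_STREAM has data.
    if not raw_stream_has_data:
        return executed  # skipped, so its child is not triggered either
    executed.append("HARMONIZE_CRYPTO_TASK")

    # UPDATE_CRYPTO_METRICS_TASK: AFTER harmonization, WHEN CRYPTO_HARMONIZED_STREAM has data.
    if harmonized_stream_has_data:
        executed.append("UPDATE_CRYPTO_METRICS_TASK")
    return executed


if __name__ == "__main__":
    print(run_cycle(raw_stream_has_data=True, harmonized_stream_has_data=True))
    print(run_cycle(raw_stream_has_data=False, harmonized_stream_has_data=True))
```

The stream flags are passed in as plain booleans here. In the real pipeline they depend on whether the stored procedures wrote new rows and whether earlier runs consumed the streams.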