# Business Critical Edition - Business Continuity with Failover Groups

This Snowflake Notebook is a practical checklist for setting up **business continuity** across Snowflake accounts using **failover groups**.

- **Goal**: replicate databases (and optionally account objects) and be able to **promote** a target account to become primary (read-write) during an outage.
- **Requires**: Business Critical edition (or higher).

If you only need **basic redundancy** (read-only replicas, no promotion), use the Enterprise/Standard guide:
- `tools/replication-workbook/database_replication_guide.ipynb`

## What this guide covers
- Create a primary **failover group** in the source account
- Create a secondary failover group in the target account (`AS REPLICA OF ...`)
- Refresh + monitor progress/history/usage
- Promotion runbook (`ALTER FAILOVER GROUP ... PRIMARY`)


---
## Prerequisites (Business Critical)

### Account prerequisites
- Source and target accounts are in the same Snowflake organization
- ORGADMIN has enabled replication for both accounts
- You have a role that can create/manage failover groups (ACCOUNTADMIN by default)

### Minimum privileges (high level)
- `CREATE FAILOVER GROUP` on the account (ACCOUNTADMIN has this by default)
- `MONITOR` on each database you include in the group
- If replicating shares: OWNERSHIP on each share

### Edition feature callouts
- Failover groups and promotion require Business Critical (or higher)
- Replication of account objects beyond databases/shares (users, roles, warehouses, network policies, account parameters, etc.) requires Business Critical (or higher)

### Safety note
Promotion is disruptive. Treat failover like an operational procedure: validate current refresh state, suspend schedules if needed, and document an RPO/RTO runbook.
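
The "validate current refresh state" part of that procedure can be reduced to a small staleness check. The following is a minimal sketch, not part of the original notebook: the function names and the fixed timestamps are hypothetical, and it assumes you already know when the last successful refresh finished (for example from the refresh history queried in Step 4).

```python
from datetime import datetime, timedelta, timezone


def replication_lag(last_refresh_end, now):
    """Time elapsed since the last successful refresh finished."""
    return now - last_refresh_end


def rpo_breached(last_refresh_end, now, rpo):
    """True when the secondary is staler than the agreed RPO."""
    return replication_lag(last_refresh_end, now) > rpo


# Fixed, illustrative timestamps so the result is reproducible.
last_ok = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
check_at = datetime(2024, 1, 1, 12, 25, tzinfo=timezone.utc)

# 25 minutes of lag against a 10 minute RPO.
print(rpo_breached(last_ok, check_at, rpo=timedelta(minutes=10)))
```

A check like this is only as good as the timestamp fed into it, so a runbook would normally record which query produced `last_refresh_end`.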


---
## Step 1 (Source account): Discover replication-enabled accounts

Run this in the source account to find the `<org_name>.<account_name>` identifiers you will use in `ALLOWED_ACCOUNTS`.
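
As a rough illustration of how those identifiers are assembled, the sketch below joins the `organization_name` and `account_name` values of each row. The helper name `build_account_identifiers` and the sample rows are hypothetical; rows are plain dictionaries so the example runs without a Snowflake session.

```python
def build_account_identifiers(rows):
    """Return sorted '<org_name>.<account_name>' strings from account rows.

    Rows with a missing organization or account name are skipped.
    """
    identifiers = set()
    for row in rows:
        org = str(row.get("organization_name") or "").strip()
        account = str(row.get("account_name") or "").strip()
        if org and account:
            identifiers.add(f"{org}.{account}")
    return sorted(identifiers)


# Illustrative rows only; real values come from SHOW REPLICATION ACCOUNTS.
sample_rows = [
    {"organization_name": "MYORG", "account_name": "TARGET_ACCOUNT_1"},
    {"organization_name": "MYORG", "account_name": "SOURCE_ACCOUNT"},
    {"organization_name": "MYORG", "account_name": None},
]
print(build_account_identifiers(sample_rows))
```

With the normalized DataFrame from the next cell, something like `build_account_identifiers(accounts_df.to_dict("records"))` should produce the same shape of output, assuming both columns are present.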


In [None]:
from snowflake.snowpark.context import get_active_session

session = get_active_session()

accounts_df = session.sql("SHOW REPLICATION ACCOUNTS").to_pandas()

# Defensive normalization: SHOW output may come back with quoted or oddly cased column names.
accounts_df.columns = [str(c).strip().strip('"').lower() for c in accounts_df.columns]

print("\n=== Replication-enabled accounts ===")

expected_cols = ["snowflake_region", "account_name", "organization_name"]
existing_cols = [c for c in expected_cols if c in accounts_df.columns]
missing_cols = [c for c in expected_cols if c not in accounts_df.columns]

if missing_cols:
    print("\nWarning: expected columns missing:", missing_cols)
    print("Available columns:", list(accounts_df.columns))

if existing_cols:
    print(accounts_df[existing_cols].to_string(index=False))
else:
    print(accounts_df.to_string(index=False))

print(f"\nTotal accounts available: {len(accounts_df)}")


---
## Step 2 (Source account): Configure and create the primary failover group

Decide:
- Which databases to replicate
- Which target accounts to replicate to
- RPO (replication schedule)
- Whether to replicate account objects (Business Critical or higher): users, roles, warehouses, etc.

Then create the primary failover group in the source account.
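
The decisions above map directly onto the clauses of `CREATE FAILOVER GROUP`. The sketch below shows one way to assemble the statement while rejecting names that are not plain identifiers and leaving out `ALLOWED_DATABASES` when `DATABASES` is not among the object types. The helper names are hypothetical, and the sketch assumes unquoted identifiers only; quoted identifiers would need different handling.

```python
import re

# Assumption: every name is a plain unquoted identifier.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def check_identifier(name):
    """Return name unchanged if it is a plain identifier, else raise ValueError."""
    if not _PLAIN_IDENTIFIER.fullmatch(name):
        raise ValueError(f"Not a plain unquoted identifier: {name!r}")
    return name


def check_account(identifier):
    """Validate an '<org_name>.<account_name>' identifier."""
    parts = identifier.split(".")
    if len(parts) != 2:
        raise ValueError(f"Expected <org_name>.<account_name>, got {identifier!r}")
    for part in parts:
        check_identifier(part)
    return identifier


def build_create_failover_group_sql(name, object_types, databases, accounts, schedule):
    """Assemble the CREATE FAILOVER GROUP statement from validated parts."""
    if not accounts:
        raise ValueError("At least one target account is required")
    if "'" in schedule:
        raise ValueError("Schedule must not contain single quotes")
    lines = [
        f"CREATE FAILOVER GROUP IF NOT EXISTS {check_identifier(name)}",
        f"  OBJECT_TYPES = {', '.join(object_types)}",
    ]
    # ALLOWED_DATABASES is only emitted when DATABASES is being replicated.
    if "DATABASES" in object_types:
        if not databases:
            raise ValueError("DATABASES selected but no databases listed")
        lines.append(f"  ALLOWED_DATABASES = {', '.join(check_identifier(d) for d in databases)}")
    lines.append(f"  ALLOWED_ACCOUNTS = {', '.join(check_account(a) for a in accounts)}")
    lines.append(f"  REPLICATION_SCHEDULE = '{schedule}'")
    return "\n".join(lines)


print(build_create_failover_group_sql(
    "MY_FAILOVER_GROUP", ["DATABASES"], ["MYDB"], ["MYORG.TARGET_ACCOUNT_1"], "10 MINUTE"
))
```

Validating before interpolation is a design choice here: the notebook builds SQL with f-strings, so a typo or stray character in a name would otherwise surface only as a SQL error.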


In [None]:
failover_group_name = "MY_FAILOVER_GROUP"

# Use the org/account-name form from SHOW REPLICATION ACCOUNTS (recommended).
source_account_identifier = "MYORG.SOURCE_ACCOUNT"
target_accounts = [
    "MYORG.TARGET_ACCOUNT_1",
    # "MYORG.TARGET_ACCOUNT_2",
]

databases_to_replicate = ["MYDB"]

# Business Critical can replicate more than databases/shares.
# Keep this minimal unless you explicitly need account-object continuity.
object_types = [
    "DATABASES",
    # "USERS",
    # "ROLES",
    # "WAREHOUSES",
    # "RESOURCE MONITORS",
    # "NETWORK POLICIES",
]

replication_schedule = "10 MINUTE"  # or "USING CRON <expr> <time_zone>"

primary_group_identifier = f"{source_account_identifier}.{failover_group_name}"

print("Configuration:")
print(f"  Failover group name (source): {failover_group_name}")
print(f"  Primary group identifier: {primary_group_identifier}")
print(f"  Target accounts: {', '.join(target_accounts)}")
print(f"  Databases: {', '.join(databases_to_replicate)}")
print(f"  Object types: {', '.join(object_types)}")
print(f"  Replication schedule: {replication_schedule}")


In [None]:
object_types_sql = ", ".join(object_types)
target_accounts_sql = ", ".join(target_accounts)
databases_sql = ", ".join(databases_to_replicate)

create_sql = f"""
CREATE FAILOVER GROUP IF NOT EXISTS {failover_group_name}
  OBJECT_TYPES = {object_types_sql}
  ALLOWED_DATABASES = {databases_sql}
  ALLOWED_ACCOUNTS = {target_accounts_sql}
  REPLICATION_SCHEDULE = '{replication_schedule}';
""".strip()

print("Executing in SOURCE account:\n")
print(create_sql)

session.sql(create_sql).collect()

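# SHOW FAILOVER GROUPS has no LIKE clause in its documented syntax, so filter by name in pandas.
show_df = session.sql("SHOW FAILOVER GROUPS").to_pandas()
show_df.columns = [c.lower() for c in show_df.columns]
if "name" in show_df.columns:
    show_df = show_df[show_df["name"].str.upper() == failover_group_name.upper()]
print("\nSHOW FAILOVER GROUPS (filtered):")
print(show_df.to_string(index=False))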


---
## Step 3 (Target account): Create the secondary failover group

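Sign in to each TARGET account and create a secondary failover group as a replica of the primary failover group. Run this step in every target account that will receive the read-only replicas.

Notebook variables do not carry across accounts. In the target account's notebook session, first re-run the session setup from Step 1 and the configuration cell from Step 2 so that `session`, `failover_group_name`, and `primary_group_identifier` are defined.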
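
The primary group is referenced by a three-part name: organization, source account, and group name. The sketch below builds the secondary statement from those parts without relying on variables defined in another session. The function name `build_secondary_sql` is hypothetical and the values are the placeholders used elsewhere in this guide.

```python
def build_secondary_sql(org_name, source_account, group_name, secondary_name=None):
    """Build the CREATE ... AS REPLICA OF statement for a target account.

    The secondary group defaults to the same name as the primary group.
    """
    secondary = secondary_name or group_name
    primary = f"{org_name}.{source_account}.{group_name}"
    return (
        f"CREATE FAILOVER GROUP IF NOT EXISTS {secondary}\n"
        f"  AS REPLICA OF {primary}"
    )


print(build_secondary_sql("MYORG", "SOURCE_ACCOUNT", "MY_FAILOVER_GROUP"))
```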


In [None]:
secondary_group_name = failover_group_name

create_secondary_sql = f"""
CREATE FAILOVER GROUP IF NOT EXISTS {secondary_group_name}
  AS REPLICA OF {primary_group_identifier};
""".strip()

print("Executing in TARGET account:\n")
print(create_secondary_sql)

session.sql(create_secondary_sql).collect()

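# SHOW FAILOVER GROUPS has no LIKE clause in its documented syntax, so filter by name in pandas.
show_df = session.sql("SHOW FAILOVER GROUPS").to_pandas()
show_df.columns = [c.lower() for c in show_df.columns]
if "name" in show_df.columns:
    show_df = show_df[show_df["name"].str.upper() == secondary_group_name.upper()]
print("\nSHOW FAILOVER GROUPS (filtered):")
print(show_df.to_string(index=False))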


---
## Step 4 (Target account): Refresh and monitor

- If the primary failover group has a replication schedule, an initial refresh runs automatically when the secondary failover group is created.
- You can also trigger a refresh manually in the target account with `ALTER FAILOVER GROUP <name> REFRESH`.

Monitor:
- Refresh progress: `INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_PROGRESS('<secondary_group_name>')`
- Refresh history: `INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_HISTORY('<secondary_group_name>')`
- Usage: `INFORMATION_SCHEMA.REPLICATION_GROUP_USAGE_HISTORY(...)` (last 14 days)
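
When a runbook needs to block until a refresh finishes, the progress function can be polled. The sketch below is one hedged way to do that: `wait_for_refresh` and its parameters are hypothetical, the set of terminal phase names is an assumption to verify against your own refresh output, and the fetcher is injected so the example runs without a Snowflake session.

```python
import time

# Assumption: these phase names mark the end of a refresh. Adjust the tuple
# if your account reports different terminal phases.
TERMINAL_PHASES = ("COMPLETED", "FAILED", "CANCELED")


def wait_for_refresh(fetch_phase_names, timeout_s=600.0, interval_s=15.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_phase_names() until a terminal phase appears.

    Returns the terminal phase name, or None if timeout_s elapses first.
    """
    deadline = clock() + timeout_s
    while True:
        seen = {str(name).upper() for name in fetch_phase_names()}
        for phase in TERMINAL_PHASES:
            if phase in seen:
                return phase
        if clock() >= deadline:
            return None
        sleep(interval_s)


# Demo with a fake fetcher that reports one more phase on each call.
_fake_results = iter([
    ["SECONDARY_DOWNLOADING_METADATA"],
    ["SECONDARY_DOWNLOADING_METADATA", "SECONDARY_DOWNLOADING_DATA"],
    ["SECONDARY_DOWNLOADING_METADATA", "SECONDARY_DOWNLOADING_DATA", "COMPLETED"],
])
print(wait_for_refresh(lambda: next(_fake_results), sleep=lambda _s: None))
```

In the notebook, the fetcher would wrap the progress query and return its `phase_name` values. The progress function describes the most recent refresh, so a terminal phase left over from an earlier refresh makes this return immediately.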


In [None]:
manual_refresh = False

if manual_refresh:
    refresh_sql = f"ALTER FAILOVER GROUP {secondary_group_name} REFRESH;"
    print("Executing manual refresh:\n")
    print(refresh_sql)
    session.sql(refresh_sql).collect()

progress_sql = f"""
SELECT
  phase_name,
  start_time,
  end_time,
  progress,
  details
FROM TABLE(INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_PROGRESS('{secondary_group_name}'))
ORDER BY start_time DESC;
""".strip()

history_sql = f"""
SELECT
  phase_name,
  start_time,
  end_time,
  total_bytes,
  object_count
FROM TABLE(INFORMATION_SCHEMA.REPLICATION_GROUP_REFRESH_HISTORY('{secondary_group_name}'))
ORDER BY start_time DESC
LIMIT 20;
""".strip()

usage_sql = f"""
SELECT
  start_time,
  end_time,
  replication_group_name,
  credits_used,
  bytes_transferred
FROM TABLE(information_schema.replication_group_usage_history(
  date_range_start => dateadd('day', -14, current_timestamp()),
  replication_group_name => '{secondary_group_name}'
))
ORDER BY start_time DESC;
""".strip()

print("=== Refresh progress ===")
print(progress_sql)
progress_df = session.sql(progress_sql).to_pandas()
progress_df.columns = [c.lower() for c in progress_df.columns]
print(progress_df.to_string(index=False) if not progress_df.empty else "No progress rows")

print("\n=== Refresh history ===")
print(history_sql)
history_df = session.sql(history_sql).to_pandas()
history_df.columns = [c.lower() for c in history_df.columns]
print(history_df.to_string(index=False) if not history_df.empty else "No history rows")

print("\n=== Usage (last 14 days) ===")
print(usage_sql)
try:
    usage_df = session.sql(usage_sql).to_pandas()
    usage_df.columns = [c.lower() for c in usage_df.columns]
    if usage_df.empty:
        print("No usage rows")
    else:
        usage_df["credits_used"] = usage_df["credits_used"].astype(float)
        usage_df["gb_transferred"] = usage_df["bytes_transferred"] / 1024 / 1024 / 1024
        print(usage_df.to_string(index=False))
        print(f"\nTotal credits (window): {usage_df['credits_used'].sum():.4f}")
        print(f"Total data (window): {usage_df['gb_transferred'].sum():.4f} GB")
except Exception as e:
    print("Unable to query usage history:")
    print(e)


---
## Step 5 (Target account): Promotion runbook (fail over)

Promotion makes the secondary failover group primary (read-write).

Recommended workflow:
- Confirm no refresh is in progress (use `REPLICATION_GROUP_REFRESH_PROGRESS`)
- Suspend scheduled replication if needed (`ALTER FAILOVER GROUP <name> SUSPEND`)
- Promote:
  - `ALTER FAILOVER GROUP <name> PRIMARY;`

Notes:
- Snowflake prevents promotion if a refresh is currently executing.
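- After promotion, validate application connectivity and data pipelines.
- Scheduled refreshes on secondary failover groups are suspended after a failover; run `ALTER FAILOVER GROUP <name> RESUME` in each account that now holds a secondary group to restart them.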

Failback strategy depends on your operational model. Document the steps before performing a real failover.
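
The recommended workflow can be sketched as a small gate that refuses to emit the promotion statement while a refresh looks active. This is an illustration under stated assumptions, not a definitive procedure: the function names are hypothetical, and a phase row without an `end_time` is assumed to mean that phase is still running.

```python
def refresh_in_progress(phases):
    """True if any phase row has no end time (assumed to mean 'still running')."""
    return any(phase.get("end_time") is None for phase in phases)


def promotion_statements(group_name, phases, suspend_first=True):
    """Return the statements to run in the TARGET account, in order.

    Raises RuntimeError instead of returning statements when a refresh
    appears to be in progress.
    """
    if refresh_in_progress(phases):
        raise RuntimeError("A refresh appears to be in progress; wait before promoting.")
    statements = []
    if suspend_first:
        statements.append(f"ALTER FAILOVER GROUP {group_name} SUSPEND")
    statements.append(f"ALTER FAILOVER GROUP {group_name} PRIMARY")
    return statements


# Illustrative phase rows for a refresh that has finished.
finished = [
    {"phase_name": "SECONDARY_DOWNLOADING_DATA", "end_time": "2024-01-01 12:05:00"},
    {"phase_name": "COMPLETED", "end_time": "2024-01-01 12:05:01"},
]
for statement in promotion_statements("MY_FAILOVER_GROUP", finished):
    print(statement)
```

If the rows come from a pandas DataFrame, missing end times are typically represented as `NaT` rather than `None`, so they would need to be normalized before being passed to this check.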


In [None]:
# TARGET account only: promote the secondary failover group to primary.
# This will make the replicated objects read-write in the target account.

promote_now = False

if promote_now:
    promote_sql = f"ALTER FAILOVER GROUP {secondary_group_name} PRIMARY;"
    print("Executing PROMOTION (fail over):\n")
    print(promote_sql)

    session.sql(promote_sql).collect()

    print("\nPromotion submitted.")

    # Quick verification
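    # SHOW FAILOVER GROUPS has no LIKE clause in its documented syntax, so filter by name in pandas.
    show_df = session.sql("SHOW FAILOVER GROUPS").to_pandas()
    show_df.columns = [c.lower() for c in show_df.columns]
    if "name" in show_df.columns:
        show_df = show_df[show_df["name"].str.upper() == secondary_group_name.upper()]
    cols = [c for c in ["account_name", "name", "type", "is_primary", "primary", "secondary_state", "next_scheduled_refresh"] if c in show_df.columns]
    print("\nSHOW FAILOVER GROUPS (filtered):")
    print(show_df[cols].to_string(index=False))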
else:
    print("Set promote_now = True to execute promotion (ALTER FAILOVER GROUP ... PRIMARY).")


---
## Cleanup (failover groups)

Notes:
- Drop the secondary failover group(s) first.
- A primary failover group cannot be dropped while linked secondary groups exist.
- Dropping a secondary failover group can remove read-only protection on the replicated databases in that target account.

Use these commands carefully.
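
The ordering constraint in the notes can be made explicit by generating the cleanup steps as an ordered list. The sketch below is hypothetical (`cleanup_plan` is not part of the notebook) and assumes the source account still holds the primary group; after a failover, the roles of the accounts would be swapped.

```python
def cleanup_plan(primary_account, secondary_accounts, group_name):
    """Return (account, statement) pairs with every secondary dropped first."""
    statement = f"DROP FAILOVER GROUP IF EXISTS {group_name}"
    steps = [(account, statement) for account in secondary_accounts]
    # The primary group goes last, once no linked secondary groups remain.
    steps.append((primary_account, statement))
    return steps


for account, statement in cleanup_plan(
    "MYORG.SOURCE_ACCOUNT", ["MYORG.TARGET_ACCOUNT_1"], "MY_FAILOVER_GROUP"
):
    print(f"[{account}] {statement}")
```

Each statement has to be run while signed in to the account shown next to it; the list only documents the order.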


In [None]:
cleanup_sql = f"""
-- TARGET account: drop the secondary failover group first
-- DROP FAILOVER GROUP IF EXISTS {secondary_group_name};

-- SOURCE account: after all secondary failover groups are dropped
-- DROP FAILOVER GROUP IF EXISTS {failover_group_name};
""".strip()

print(cleanup_sql)


---
## Quick decision guide

- **Need read-only replicas only?** Use `database_replication_guide.ipynb` (replication groups; Standard/Enterprise supported).
- **Need promotion / business continuity?** Use this notebook (failover groups; Business Critical required).

## References
- `https://docs.snowflake.com/en/user-guide/account-replication-intro`
- `https://docs.snowflake.com/en/user-guide/account-replication-config`
- `https://docs.snowflake.com/en/sql-reference/sql/create-failover-group`
- `https://docs.snowflake.com/en/sql-reference/sql/alter-failover-group`
