# Federated Lakehouse Query: Neo4j + Databricks

This notebook queries Databricks Delta lakehouse tables and Neo4j graph data together
in federated queries, combining time-series sensor analytics with graph-based
maintenance events, flight operations, and component topology.

This notebook implements the same dual-source pattern used in AgentBricks (Lab 6),
but with direct SQL federation instead of AI agent routing.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                        Spark SQL Engine                            │
│                     (Federated Query Layer)                        │
├────────────────────────────┬────────────────────────────────────────┤
│    Delta Lakehouse         │         Neo4j Knowledge Graph         │
│    (Unity Catalog)         │         (UC JDBC / Spark Connector)   │
│                           │                                        │
│  ┌──────────────────┐     │     ┌───────────────────────────┐      │
│  │ sensor_readings  │     │     │ MaintenanceEvent nodes    │      │
│  │ (345,600 rows)   │     │     │ Flight nodes              │      │
│  │ sensors (160)    │     │     │ Component topology        │      │
│  │ systems (80)     │     │     │ Airport relationships     │      │
│  │ aircraft (20)    │     │     │ Delay events              │      │
│  └──────────────────┘     │     └───────────────────────────┘      │
│                           │                                        │
│  Best for:                │     Best for:                          │
│  - Time-series analytics  │     - Relationship traversals          │
│  - Statistical aggregates │     - Maintenance correlation          │
│  - Sensor trend analysis  │     - Flight/route topology            │
└────────────────────────────┴────────────────────────────────────────┘
```

**Federation Methods Used:**
1. `remote_query()` — UC JDBC table-valued function for Neo4j aggregate queries (no cluster library needed)
2. Neo4j Spark Connector — Row-level Neo4j data loaded into temp views for rich JOINs with Delta tables

## Prerequisites

1. **Lakehouse tables** already exist in your catalog (created by lab setup):
   `aircraft`, `systems`, `sensors`, `sensor_readings`

2. **Neo4j UC JDBC connection** configured per [neo4j_uc_jdbc_guide.md](../docs/neo4j_uc_jdbc_guide.md)

3. **Cluster configuration:**
   - SafeSpark memory settings applied (see guide)
   - Neo4j Spark Connector installed as cluster library (`org.neo4j:neo4j-connector-apache-spark`)
   - `neo4j-uc-creds` secret scope configured via `setup.sh`

4. **Databricks preview features** enabled:
   - Custom JDBC on UC Compute
   - `remote_query` table-valued function

---

## Configuration

In [None]:
# =============================================================================
# CONFIGURATION
# =============================================================================

SCOPE_NAME = "neo4j-uc-creds"

# Lakehouse configuration — update to match your environment
LAKEHOUSE_CATALOG = "aws-databricks-neo4j-lab"   # Your Unity Catalog name
LAKEHOUSE_SCHEMA = "lakehouse"                    # Schema containing Delta tables

# Neo4j credentials from Databricks Secrets
NEO4J_HOST = dbutils.secrets.get(SCOPE_NAME, "host")
NEO4J_USER = dbutils.secrets.get(SCOPE_NAME, "user")
NEO4J_PASSWORD = dbutils.secrets.get(SCOPE_NAME, "password")
try:
    NEO4J_DATABASE = dbutils.secrets.get(SCOPE_NAME, "database")
except Exception:
    NEO4J_DATABASE = "neo4j"

UC_CONNECTION_NAME = dbutils.secrets.get(SCOPE_NAME, "connection_name")
NEO4J_BOLT_URI = f"neo4j+s://{NEO4J_HOST}"

# Set catalog and schema context for Delta table queries
spark.sql(f"USE CATALOG `{LAKEHOUSE_CATALOG}`")
spark.sql(f"USE SCHEMA `{LAKEHOUSE_SCHEMA}`")

print(f"Lakehouse: {LAKEHOUSE_CATALOG}.{LAKEHOUSE_SCHEMA}")
print(f"Neo4j Host: {NEO4J_HOST}")
print(f"Neo4j Bolt URI: {NEO4J_BOLT_URI}")
print(f"UC Connection: {UC_CONNECTION_NAME}")

---

## Section 1: Verify Data Sources

Confirm both the lakehouse tables and Neo4j UC connection are accessible before running federated queries.

In [None]:
# Verify Delta lakehouse tables
print("=" * 60)
print("DELTA LAKEHOUSE TABLES")
print("=" * 60)

for table in ["aircraft", "systems", "sensors", "sensor_readings"]:
    count = spark.sql(f"SELECT COUNT(*) AS cnt FROM {table}").collect()[0]["cnt"]
    print(f"  {table}: {count:,} rows")

print("\nSample aircraft data:")
spark.sql("""
    SELECT `:ID(Aircraft)` AS aircraft_id, tail_number, model, manufacturer, operator
    FROM aircraft LIMIT 5
""").show(truncate=False)

In [None]:
# Verify Neo4j UC JDBC connection with aggregate queries
print("=" * 60)
print("NEO4J KNOWLEDGE GRAPH (via UC JDBC)")
print("=" * 60)

neo4j_counts = {
    "Aircraft": "SELECT COUNT(*) AS cnt FROM Aircraft",
    "MaintenanceEvent": "SELECT COUNT(*) AS cnt FROM MaintenanceEvent",
    "Flight": "SELECT COUNT(*) AS cnt FROM Flight",
}

for label, query in neo4j_counts.items():
    result = spark.sql(f"""
        SELECT * FROM remote_query('{UC_CONNECTION_NAME}', query => '{query}')
    """).collect()
    print(f"  {label}: {result[0]['cnt']:,} nodes")

# Test graph traversal
traversal = spark.sql(f"""
    SELECT * FROM remote_query('{UC_CONNECTION_NAME}',
        query => 'SELECT COUNT(*) AS cnt FROM Flight f NATURAL JOIN DEPARTS_FROM r NATURAL JOIN Airport a')
""").collect()
print(f"  Flight→Airport relationships: {traversal[0]['cnt']:,}")

print("\nBoth data sources verified.")

---

## Section 2: Federated Query — Fleet Summary

Combines Neo4j graph metrics with Delta sensor analytics using `remote_query()` for a
fleet-wide overview. This approach uses **pure SQL** and needs no cluster libraries,
only the UC JDBC connection.

**Pattern:** `remote_query()` returns a table that can be CROSS JOINed with Delta
table aggregates in a single SQL statement.
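
Because the inner query is passed as a SQL string literal, any single quote inside it has to be doubled (as in `''CRITICAL''` in the cell below). A small helper can do that escaping in one place. This is a sketch under assumptions: `remote_query_sql` is a hypothetical name, and `my_neo4j_conn` is a placeholder connection name.

```python
def remote_query_sql(connection_name: str, inner_sql: str, columns: str = "*") -> str:
    """Build a SELECT over remote_query(), doubling single quotes in inner_sql."""
    # The inner query travels as a SQL string literal, so ' must become ''.
    escaped = inner_sql.replace("'", "''")
    return (
        f"SELECT {columns} FROM remote_query('{connection_name}', "
        f"query => '{escaped}')"
    )


sql = remote_query_sql(
    "my_neo4j_conn",  # placeholder connection name
    "SELECT COUNT(*) AS cnt FROM MaintenanceEvent WHERE severity = 'CRITICAL'",
)
print(sql)
```

The resulting string can be passed to `spark.sql(...)` on its own or embedded as a subquery in a larger statement.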

In [None]:
# Federated Fleet Summary: remote_query() for Neo4j + Delta sensor analytics
# No Spark Connector needed — pure SQL federation via UC JDBC

result = spark.sql(f"""
    SELECT
        neo4j.total_maintenance_events,
        neo4j.critical_events,
        neo4j.total_flights,
        neo4j.flight_airport_connections,
        ROUND(sensor.avg_egt, 1) AS avg_egt_celsius,
        ROUND(sensor.avg_vibration, 4) AS avg_vibration_ips,
        ROUND(sensor.avg_fuel_flow, 2) AS avg_fuel_flow_kgs,
        ROUND(sensor.avg_n1_speed, 0) AS avg_n1_speed_rpm,
        sensor.total_readings
    FROM (
        SELECT
            maint.cnt AS total_maintenance_events,
            crit.cnt AS critical_events,
            flights.cnt AS total_flights,
            deps.cnt AS flight_airport_connections
        FROM
            remote_query('{UC_CONNECTION_NAME}',
                query => 'SELECT COUNT(*) AS cnt FROM MaintenanceEvent') AS maint
        CROSS JOIN
            remote_query('{UC_CONNECTION_NAME}',
                query => 'SELECT COUNT(*) AS cnt FROM MaintenanceEvent WHERE severity = ''CRITICAL''') AS crit
        CROSS JOIN
            remote_query('{UC_CONNECTION_NAME}',
                query => 'SELECT COUNT(*) AS cnt FROM Flight') AS flights
        CROSS JOIN
            remote_query('{UC_CONNECTION_NAME}',
                query => 'SELECT COUNT(*) AS cnt FROM Flight f NATURAL JOIN DEPARTS_FROM r NATURAL JOIN Airport a') AS deps
    ) neo4j
    CROSS JOIN (
        SELECT
            AVG(CASE WHEN sen.type = 'EGT' THEN r.value END) AS avg_egt,
            AVG(CASE WHEN sen.type = 'Vibration' THEN r.value END) AS avg_vibration,
            AVG(CASE WHEN sen.type = 'FuelFlow' THEN r.value END) AS avg_fuel_flow,
            AVG(CASE WHEN sen.type = 'N1Speed' THEN r.value END) AS avg_n1_speed,
            COUNT(*) AS total_readings
        FROM sensor_readings r
        JOIN sensors sen ON r.sensor_id = sen.`:ID(Sensor)`
    ) sensor
""")

print("Fleet Summary — Neo4j Graph Metrics + Delta Sensor Analytics")
print("Neo4j: remote_query() via UC JDBC | Delta: sensor_readings + sensors")
print("=" * 80)
result.show(truncate=False)

---

## Section 3: Federated Query — Sensor Health + Maintenance Correlation

Uses the **Neo4j Spark Connector** to load maintenance events as a temp view, then
JOINs with Delta lakehouse sensor data to correlate sensor health with maintenance
activity per aircraft.

**Key insight:** Aircraft with higher sensor readings (EGT, vibration) may also have
more frequent maintenance events. This query places both measures side by side per
aircraft so that relationship can be inspected across the two data sources.

**Why Spark Connector?** Queries sent over UC JDBC cannot use GROUP BY, because Spark
wraps each query in a subquery for schema inference. The Spark Connector returns
row-level graph data that can be freely aggregated and JOINed in Spark SQL.
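
Both connector reads in this notebook repeat the same connection options. One way to keep them in one place is a helper that builds the option dictionary. This is only a sketch: `neo4j_read_options` is a hypothetical name, not part of the connector. The option keys are the ones the notebook already uses, plus the connector's `query` option (a Cypher read instead of a label scan) and `database` option.

```python
from typing import Dict, Optional


def neo4j_read_options(
    uri: str,
    user: str,
    password: str,
    *,
    labels: Optional[str] = None,
    query: Optional[str] = None,
    database: Optional[str] = None,
) -> Dict[str, str]:
    """Build Neo4j Spark Connector read options for a label scan or a Cypher query."""
    if (labels is None) == (query is None):
        raise ValueError("pass exactly one of labels or query")
    opts = {
        "url": uri,
        "authentication.type": "basic",
        "authentication.basic.username": user,
        "authentication.basic.password": password,
    }
    if labels is not None:
        opts["labels"] = labels
    else:
        opts["query"] = query
    if database:
        opts["database"] = database
    return opts
```

A read would then look like `spark.read.format("org.neo4j.spark.DataSource").options(**opts).load()`, where `opts` comes from this helper.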

In [None]:
# Load maintenance events from Neo4j via Spark Connector
neo4j_maintenance = spark.read.format("org.neo4j.spark.DataSource") \
    .option("url", NEO4J_BOLT_URI) \
    .option("authentication.type", "basic") \
    .option("authentication.basic.username", NEO4J_USER) \
    .option("authentication.basic.password", NEO4J_PASSWORD) \
    .option("database", NEO4J_DATABASE) \
    .option("labels", "MaintenanceEvent") \
    .load()

neo4j_maintenance.createOrReplaceTempView("neo4j_maintenance")

print(f"Loaded {neo4j_maintenance.count()} maintenance events from Neo4j")
print("\nSample maintenance events:")
neo4j_maintenance.select(
    "aircraft_id", "fault", "severity", "corrective_action"
).show(5, truncate=False)

In [None]:
# Federated Query: Sensor Health + Maintenance Correlation
# Delta: sensor_readings, sensors, systems, aircraft
# Neo4j: MaintenanceEvent nodes (via Spark Connector temp view)

result = spark.sql("""
    WITH aircraft_ref AS (
        SELECT `:ID(Aircraft)` AS aircraft_id, tail_number, model, manufacturer, operator
        FROM aircraft
    ),
    sensor_health AS (
        SELECT
            sys.aircraft_id,
            ROUND(AVG(CASE WHEN sen.type = 'EGT' THEN r.value END), 1) AS avg_egt,
            ROUND(MAX(CASE WHEN sen.type = 'EGT' THEN r.value END), 1) AS max_egt,
            ROUND(AVG(CASE WHEN sen.type = 'Vibration' THEN r.value END), 4) AS avg_vibration,
            ROUND(MAX(CASE WHEN sen.type = 'Vibration' THEN r.value END), 4) AS max_vibration
        FROM sensor_readings r
        JOIN sensors sen ON r.sensor_id = sen.`:ID(Sensor)`
        JOIN systems sys ON sen.system_id = sys.`:ID(System)`
        GROUP BY sys.aircraft_id
    ),
    maintenance_summary AS (
        SELECT
            aircraft_id,
            COUNT(*) AS total_events,
            SUM(CASE WHEN severity = 'CRITICAL' THEN 1 ELSE 0 END) AS critical,
            SUM(CASE WHEN severity = 'MAJOR' THEN 1 ELSE 0 END) AS major,
            SUM(CASE WHEN severity = 'MINOR' THEN 1 ELSE 0 END) AS minor
        FROM neo4j_maintenance
        GROUP BY aircraft_id
    )
    SELECT
        a.tail_number,
        a.model,
        a.operator,
        COALESCE(m.total_events, 0) AS maint_events,
        COALESCE(m.critical, 0) AS critical,
        COALESCE(m.major, 0) AS major,
        COALESCE(m.minor, 0) AS minor,
        s.avg_egt AS avg_egt_c,
        s.max_egt AS max_egt_c,
        s.avg_vibration AS avg_vib_ips,
        s.max_vibration AS max_vib_ips
    FROM aircraft_ref a
    LEFT JOIN maintenance_summary m ON a.aircraft_id = m.aircraft_id
    LEFT JOIN sensor_health s ON a.aircraft_id = s.aircraft_id
    ORDER BY m.total_events DESC NULLS LAST
""")

print("Sensor Health + Maintenance Correlation")
print("Delta: sensor_readings, sensors, systems, aircraft")
print("Neo4j: MaintenanceEvent nodes (Spark Connector)")
print("=" * 100)
result.show(20, truncate=False)
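
The possible link between sensor readings and maintenance activity can be summarized with a correlation coefficient once the result is collected. Below is a minimal plain-Python sketch of Pearson's r. It is an illustration added here, not part of the lab, and with only 20 aircraft any value should be read cautiously.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson(xs: Sequence, ys: Sequence) -> Optional[float]:
    """Pearson correlation of paired values, skipping pairs that contain None."""
    pairs = [(float(x), float(y)) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # undefined when one series is constant
    return cov / sqrt(var_x * var_y)


# In the notebook, the inputs could come from the query result above:
# rows = result.select("maint_events", "avg_egt_c").collect()
# r = pearson([row["maint_events"] for row in rows],
#             [row["avg_egt_c"] for row in rows])
```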

---

## Section 4: Federated Query — Flight Operations + Engine Performance

Loads flight data from Neo4j and correlates with engine sensor performance from the
Delta lakehouse. Shows how aircraft utilization (flight frequency, route coverage)
relates to engine health metrics (EGT, fuel flow, N1 speed).

In [None]:
# Load flight data from Neo4j via Spark Connector
neo4j_flights = spark.read.format("org.neo4j.spark.DataSource") \
    .option("url", NEO4J_BOLT_URI) \
    .option("authentication.type", "basic") \
    .option("authentication.basic.username", NEO4J_USER) \
    .option("authentication.basic.password", NEO4J_PASSWORD) \
    .option("database", NEO4J_DATABASE) \
    .option("labels", "Flight") \
    .load()

neo4j_flights.createOrReplaceTempView("neo4j_flights")

print(f"Loaded {neo4j_flights.count()} flights from Neo4j")
print("\nSample flight data:")
neo4j_flights.select(
    "aircraft_id", "flight_number", "operator", "origin", "destination"
).show(5, truncate=False)

In [None]:
# Federated Query: Flight Operations + Engine Performance
# Delta: sensor_readings (engine sensors only), sensors, systems, aircraft
# Neo4j: Flight nodes (via Spark Connector temp view)

result = spark.sql("""
    WITH aircraft_ref AS (
        SELECT `:ID(Aircraft)` AS aircraft_id, tail_number, model, operator
        FROM aircraft
    ),
    flight_activity AS (
        SELECT
            aircraft_id,
            COUNT(*) AS total_flights,
            COUNT(DISTINCT origin) AS unique_origins,
            COUNT(DISTINCT destination) AS unique_destinations
        FROM neo4j_flights
        GROUP BY aircraft_id
    ),
    engine_health AS (
        SELECT
            sys.aircraft_id,
            ROUND(AVG(CASE WHEN sen.type = 'EGT' THEN r.value END), 1) AS avg_egt,
            ROUND(AVG(CASE WHEN sen.type = 'FuelFlow' THEN r.value END), 2) AS avg_fuel_flow,
            ROUND(AVG(CASE WHEN sen.type = 'N1Speed' THEN r.value END), 0) AS avg_n1_speed
        FROM sensor_readings r
        JOIN sensors sen ON r.sensor_id = sen.`:ID(Sensor)`
        JOIN systems sys ON sen.system_id = sys.`:ID(System)`
        WHERE sys.type = 'Engine'
        GROUP BY sys.aircraft_id
    )
    SELECT
        a.tail_number,
        a.model,
        a.operator,
        f.total_flights,
        f.unique_origins AS origins,
        f.unique_destinations AS destinations,
        e.avg_egt AS avg_egt_c,
        e.avg_fuel_flow AS fuel_kgs,
        e.avg_n1_speed AS n1_rpm
    FROM aircraft_ref a
    JOIN flight_activity f ON a.aircraft_id = f.aircraft_id
    JOIN engine_health e ON a.aircraft_id = e.aircraft_id
    ORDER BY f.total_flights DESC
""")

print("Flight Operations + Engine Performance")
print("Delta: sensor_readings (Engine sensors), sensors, systems, aircraft")
print("Neo4j: Flight nodes (Spark Connector)")
print("=" * 90)
result.show(20, truncate=False)
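
Note that the query above uses inner JOINs, so an aircraft with no Flight nodes or no engine readings drops out of the result, whereas the Section 3 query keeps every aircraft through LEFT JOIN and COALESCE. The difference can be sketched in plain Python with dictionaries standing in for the joined tables. The tail numbers and counts below are made up for illustration.

```python
# Hypothetical data: three aircraft, one of which has no flights recorded.
aircraft = ["N100AA", "N200BB", "N300CC"]
flight_counts = {"N100AA": 42, "N200BB": 17}

# Inner join: only aircraft present on both sides survive.
inner_rows = [(tail, flight_counts[tail]) for tail in aircraft if tail in flight_counts]

# Left join + COALESCE(count, 0): every aircraft survives, missing counts become 0.
left_rows = [(tail, flight_counts.get(tail, 0)) for tail in aircraft]

print(inner_rows)
print(left_rows)
```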

---

## Section 5: Federated Query — Fleet Health Dashboard

The most comprehensive federated query, combining **all data sources** into a single
fleet health view:

- **Delta lakehouse**: Sensor readings aggregated per aircraft (EGT, vibration, fuel flow)
- **Neo4j (Spark Connector)**: Maintenance events and flight counts per aircraft
- **Neo4j (remote_query)**: Graph relationship traversal counting flight→airport connections

This demonstrates using **both federation methods** in a single analysis.

In [None]:
# Comprehensive Fleet Health Dashboard
# Combines all data sources: Delta tables + Neo4j Spark Connector + remote_query()
# Requires the neo4j_maintenance and neo4j_flights temp views from Sections 3 and 4

# Graph traversal metric via remote_query (UC JDBC)
departure_count = spark.sql(f"""
    SELECT * FROM remote_query('{UC_CONNECTION_NAME}',
        query => 'SELECT COUNT(*) AS cnt FROM Flight f NATURAL JOIN DEPARTS_FROM r NATURAL JOIN Airport a')
""").collect()[0]["cnt"]
print(f"Graph traversal (Flight)-[:DEPARTS_FROM]->(Airport): {departure_count:,} connections")

# Full fleet health dashboard
result = spark.sql("""
    WITH aircraft_ref AS (
        SELECT `:ID(Aircraft)` AS aircraft_id, tail_number, model, manufacturer, operator
        FROM aircraft
    ),
    sensor_stats AS (
        SELECT
            sys.aircraft_id,
            ROUND(AVG(CASE WHEN sen.type = 'EGT' THEN r.value END), 1) AS avg_egt,
            ROUND(AVG(CASE WHEN sen.type = 'Vibration' THEN r.value END), 4) AS avg_vib,
            ROUND(AVG(CASE WHEN sen.type = 'FuelFlow' THEN r.value END), 2) AS avg_fuel,
            COUNT(*) AS reading_count
        FROM sensor_readings r
        JOIN sensors sen ON r.sensor_id = sen.`:ID(Sensor)`
        JOIN systems sys ON sen.system_id = sys.`:ID(System)`
        GROUP BY sys.aircraft_id
    ),
    maint AS (
        SELECT aircraft_id, COUNT(*) AS events,
               SUM(CASE WHEN severity = 'CRITICAL' THEN 1 ELSE 0 END) AS critical
        FROM neo4j_maintenance
        GROUP BY aircraft_id
    ),
    flights AS (
        SELECT aircraft_id, COUNT(*) AS flight_count
        FROM neo4j_flights
        GROUP BY aircraft_id
    )
    SELECT
        a.tail_number,
        a.model,
        a.operator,
        COALESCE(f.flight_count, 0) AS flights,
        COALESCE(m.events, 0) AS maint_events,
        COALESCE(m.critical, 0) AS critical,
        s.avg_egt AS egt_c,
        s.avg_vib AS vib_ips,
        s.avg_fuel AS fuel_kgs,
        s.reading_count AS readings
    FROM aircraft_ref a
    LEFT JOIN flights f ON a.aircraft_id = f.aircraft_id
    LEFT JOIN maint m ON a.aircraft_id = m.aircraft_id
    LEFT JOIN sensor_stats s ON a.aircraft_id = s.aircraft_id
    ORDER BY COALESCE(m.critical, 0) DESC, COALESCE(m.events, 0) DESC
""")

print("\nFleet Health Dashboard")
print("Delta: sensor_readings, sensors, systems, aircraft")
print("Neo4j: MaintenanceEvent + Flight (Spark Connector), graph traversal (remote_query)")
print("=" * 100)
result.show(20, truncate=False)

---

## Summary

This notebook demonstrated three federated query patterns (aggregate, row-level, and
hybrid) across four queries that combine Neo4j graph data with Databricks Delta
lakehouse tables:

| Section | Pattern | Neo4j Method | What It Shows |
|---------|---------|-------------|---------------|
| Fleet Summary | Aggregate federation | `remote_query()` (UC JDBC) | Fleet-wide metrics from both sources in pure SQL |
| Sensor + Maintenance | Row-level federation | Spark Connector → temp view | Per-aircraft correlation across data sources |
| Flight Ops + Engine | Row-level federation | Spark Connector → temp view | Utilization vs engine health |
| Fleet Health Dashboard | Hybrid | Both methods combined | Comprehensive multi-source analytics |

### Integration Methods Comparison

| Method | Pros | Cons |
|--------|------|------|
| `remote_query()` | Pure SQL, no cluster library, UC governed | Aggregate-only (no GROUP BY or ORDER BY) |
| Spark Connector | Full Cypher support, row-level data | Requires cluster library, no UC governance |

### Key Takeaways

- **Delta Lakehouse** is the source of truth for time-series sensor analytics (345K+ readings)
- **Neo4j** is the source of truth for graph relationships (maintenance → components → systems → aircraft)
- **Federated queries** combine both via Spark SQL temp views and `remote_query()`
- The same dual-source pattern from AgentBricks (Lab 6) works directly in SQL without AI agents

### References

- [neo4j_uc_jdbc_guide.md](../docs/neo4j_uc_jdbc_guide.md) — Full UC JDBC integration guide
- [Neo4j JDBC SQL2Cypher](https://neo4j.com/docs/jdbc-manual/current/sql2cypher/) — SQL translation rules
- [Databricks remote_query()](https://docs.databricks.com/sql/language-manual/functions/remote_query) — Table-valued function reference
- [Databricks Lakehouse Federation](https://docs.databricks.com/query-federation/) — Federation overview
- [Neo4j Spark Connector](https://neo4j.com/docs/spark/current/) — Spark Connector docs

---

## Section 6: UC Audit Trail — What Did Unity Catalog Capture?

Unity Catalog logs every federated query that flows through a UC JDBC connection. This
section queries `system.access.audit` to show the audit trail of the `remote_query()`
calls executed in this notebook.

**What's captured:**
- Who ran the query (user identity)
- When it ran (timestamp)
- The full SQL text (including the `remote_query()` call)
- Connection lifecycle events (create, get, update, delete)

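**Note:** Audit events may take a few minutes to appear in `system.access.audit` after
query execution. If results are empty, wait 2-3 minutes and re-run the cell below. If
they stay empty, check whether verbose audit logging is enabled for the workspace,
since command text may not be recorded otherwise.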

See [UC_SECURITY.md](../UC_SECURITY.md) for the full governance story including
authorization, lineage, and credential isolation.
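
Instead of re-running the cell by hand while audit events arrive, the wait can be wrapped in a polling loop. The sketch below is an assumption-laden illustration: `wait_for_rows` is a hypothetical helper, and the default timeout and interval are arbitrary choices. The sleep and clock functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_rows(
    fetch_count: Callable[[], int],
    timeout_s: float = 300.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call fetch_count until it returns > 0 or timeout_s elapses; return the last count."""
    deadline = clock() + timeout_s
    while True:
        count = fetch_count()
        if count > 0 or clock() >= deadline:
            return count
        sleep(interval_s)


# In the notebook, fetch_count could re-run the audit query, for example:
# found = wait_for_rows(lambda: federated_queries.count())
```

Each call to `count()` on a Spark DataFrame re-executes the query, so the loop sees newly arrived audit rows.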

In [None]:
# =============================================================================
# UC AUDIT TRAIL: Federated Query Activity
# =============================================================================
# Queries system.access.audit to show what UC captured about the remote_query()
# calls executed in this notebook.

try:
    # -------------------------------------------------------------------------
    # 1. Federated queries — remote_query() calls from the last 30 minutes
    # -------------------------------------------------------------------------
    print("=" * 80)
    print("FEDERATED QUERY AUDIT TRAIL (last 30 minutes)")
    print("=" * 80)
    print("Source: system.access.audit | Filter: commandSubmit + remote_query\n")

    federated_queries = spark.sql("""
        SELECT
            event_time,
            user_identity.email AS user,
            SUBSTRING(request_params['commandText'], 1, 120) AS query_text,
            response.status_code AS status
        FROM system.access.audit
        WHERE action_name = 'commandSubmit'
          AND request_params['commandText'] LIKE '%remote_query%'
          AND event_time >= CURRENT_TIMESTAMP - INTERVAL 30 MINUTES
        ORDER BY event_time DESC
    """)

    federated_count = federated_queries.count()
    print(f"Found {federated_count} federated query events\n")
    if federated_count > 0:
        federated_queries.show(20, truncate=False)
    else:
        print("No events found yet. Audit events can take 2-3 minutes to appear.")
        print("Re-run this cell after a short wait.\n")

    # -------------------------------------------------------------------------
    # 2. Connection lifecycle events (create, get, update, delete)
    # -------------------------------------------------------------------------
    print("=" * 80)
    print("CONNECTION LIFECYCLE EVENTS (last 30 minutes)")
    print("=" * 80)
    print("Source: system.access.audit | Filter: unityCatalog connection actions\n")

    connection_events = spark.sql("""
        SELECT
            event_time,
            user_identity.email AS user,
            action_name,
            request_params['name'] AS connection_name,
            response.status_code AS status
        FROM system.access.audit
        WHERE service_name = 'unityCatalog'
          AND action_name IN (
              'createConnection', 'updateConnection',
              'deleteConnection', 'getConnection'
          )
          AND event_time >= CURRENT_TIMESTAMP - INTERVAL 30 MINUTES
        ORDER BY event_time DESC
    """)

    conn_count = connection_events.count()
    print(f"Found {conn_count} connection events\n")
    if conn_count > 0:
        connection_events.show(20, truncate=False)

    # -------------------------------------------------------------------------
    # 3. Connection permissions — who has access?
    # -------------------------------------------------------------------------
    print("=" * 80)
    print(f"CONNECTION PERMISSIONS: {UC_CONNECTION_NAME}")
    print("=" * 80)

    # Backticks keep the statement valid if the connection name contains hyphens
    grants = spark.sql(f"SHOW GRANTS ON CONNECTION `{UC_CONNECTION_NAME}`")
    grants.show(truncate=False)

except Exception as e:
    error_msg = str(e)
    if "TABLE_OR_VIEW_NOT_FOUND" in error_msg or "system.access.audit" in error_msg:
        print("[INFO] system.access.audit is not available in this workspace.")
        print("System tables must be enabled by a workspace admin.")
        print("See: https://docs.databricks.com/admin/system-tables/")
    else:
        print(f"[ERROR] {error_msg[:200]}")
    print("\nSkipping audit trail queries. The federated queries above still executed successfully.")