# Databricks EDA — Squeeze Analytics (Sydney time)

This notebook assumes your SQLite tables have been loaded into Delta tables (Unity Catalog) as in `copy_into_manual_upload_volume.ipynb`.

It focuses on:
- quick table sanity checks
- timestamp normalization (epoch ms → timestamp)
- converting to **Australia/Sydney** local time (AEST/AEDT, DST-aware)
- basic distributions across alerts and OHLC


In [None]:
%sql
-- ---- CONFIG ----
USE CATALOG `workspace`;
USE SCHEMA `squeeze`;
SHOW TABLES;


## Row counts

In [None]:
%sql
SELECT 'ohlc' AS table_name, COUNT(*) AS n FROM ohlc
UNION ALL SELECT 'alerts', COUNT(*) FROM alerts
UNION ALL SELECT 'trade_plans', COUNT(*) FROM trade_plans
UNION ALL SELECT 'backtest_trades', COUNT(*) FROM backtest_trades
UNION ALL SELECT 'backtest_results', COUNT(*) FROM backtest_results
UNION ALL SELECT 'snapshot_cache', COUNT(*) FROM snapshot_cache
UNION ALL SELECT 'market_cap_cache', COUNT(*) FROM market_cap_cache
ORDER BY n DESC;


## Timestamp normalization patterns (epoch ms → UTC timestamp → Sydney time)

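In Spark/Databricks:
- `to_timestamp(ts/1000)` first divides epoch milliseconds by 1000 to get epoch seconds, then converts those seconds to a timestamp
- `from_utc_timestamp(..., 'Australia/Sydney')` shifts a UTC timestamp to Sydney local (wall-clock) time

`from_utc_timestamp` is DST-aware: it applies the AEST (UTC+10) or AEDT (UTC+11) offset in effect at each instant. The `*_utc` and Sydney columns below read correctly when the session time zone is UTC; if values look shifted, check `SELECT current_timezone()`.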
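
To cross-check the conversion outside Spark, here is a minimal plain-Python sketch of the same epoch ms → UTC → Sydney steps using the standard-library `zoneinfo` module. The helper name `epoch_ms_to_sydney` and the two sample values are hypothetical and are not taken from the `alerts` table. This is not how Spark implements these functions, only an independent sanity check.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # needs system tz data (or the tzdata package)

SYDNEY = ZoneInfo("Australia/Sydney")
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_sydney(ts_ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware Sydney datetime."""
    # Step 1: epoch ms -> UTC instant (timedelta avoids float rounding).
    utc_moment = EPOCH_UTC + timedelta(milliseconds=ts_ms)
    # Step 2: UTC instant -> Sydney wall-clock time, DST applied by zoneinfo.
    return utc_moment.astimezone(SYDNEY)


# Hypothetical sample values chosen to straddle the DST boundary.
summer_ms = 1704067200000  # 2024-01-01 00:00:00 UTC
winter_ms = 1719792000000  # 2024-07-01 00:00:00 UTC

print(epoch_ms_to_sydney(summer_ms).isoformat())  # 2024-01-01T11:00:00+11:00
print(epoch_ms_to_sydney(winter_ms).isoformat())  # 2024-07-01T10:00:00+10:00
```

The January sample falls in daylight saving time (AEDT, UTC+11) and the July sample in standard time (AEST, UTC+10), so the same UTC midnight lands on a different Sydney hour in each case.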

In [None]:
%sql
-- Alerts: ts and created_ts in Sydney time
SELECT
  exchange, symbol, signal, source_tf,
  ts,
  to_timestamp(ts/1000) AS ts_utc,
  from_utc_timestamp(to_timestamp(ts/1000), 'Australia/Sydney') AS ts_sydney,
  created_ts,
  from_utc_timestamp(to_timestamp(created_ts/1000), 'Australia/Sydney') AS created_ts_sydney
FROM alerts
ORDER BY ts DESC
LIMIT 50;


## Alert distributions

In [None]:
%sql
SELECT signal, COUNT(*) AS n
FROM alerts
GROUP BY signal
ORDER BY n DESC
LIMIT 50;


In [None]:
%sql
SELECT source_tf, COUNT(*) AS n
FROM alerts
GROUP BY source_tf
ORDER BY n DESC
LIMIT 50;


## Day-of-week and hour-of-day in Sydney time

In [None]:
%sql
WITH x AS (
  SELECT
    from_utc_timestamp(to_timestamp(ts/1000), 'Australia/Sydney') AS ts_syd
  FROM alerts
)
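SELECT
  weekday(ts_syd) AS dow_num,  -- 0 = Monday ... 6 = Sunday; sorts days chronologically, not alphabetically
  date_format(ts_syd, 'E') AS dow_syd,
  hour(ts_syd) AS hour_syd,
  COUNT(*) AS n
FROM x
GROUP BY 1,2,3
ORDER BY 1,3;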
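
Bucketing by day and hour only makes sense after the time zone conversion, because the same instant can fall on different weekdays in UTC and in Sydney. The plain-Python sketch below illustrates this with one hypothetical timestamp; `dow_and_hour` is an illustrative helper, not part of the notebook's tables or of Spark.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def dow_and_hour(ts_ms: int, tz_name: str) -> tuple[str, int]:
    """Return (weekday abbreviation, hour) of an epoch-ms instant in a zone."""
    local = (EPOCH_UTC + timedelta(milliseconds=ts_ms)).astimezone(ZoneInfo(tz_name))
    # datetime.weekday() uses 0 = Monday ... 6 = Sunday.
    return DAY_NAMES[local.weekday()], local.hour


# Hypothetical sample: 2024-01-07 20:00:00 UTC (a Sunday evening in UTC).
ts_ms = 1704657600000

print(dow_and_hour(ts_ms, "UTC"))               # ('Sun', 20)
print(dow_and_hour(ts_ms, "Australia/Sydney"))  # ('Mon', 7)
```

An alert at this instant would count toward Sunday 20:00 if bucketed in UTC, but toward Monday 07:00 in Sydney time.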


## OHLC sanity checks

In [None]:
%sql
SELECT
  exchange, symbol, `interval`,
  open_time,
  from_utc_timestamp(to_timestamp(open_time/1000), 'Australia/Sydney') AS open_time_syd,
  open, high, low, close, volume
FROM ohlc
ORDER BY open_time DESC
LIMIT 50;
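
Beyond eyeballing the latest bars, a few invariants can be checked per row. The following is a minimal sketch, assuming the `open`, `high`, `low`, `close`, and `volume` columns are numeric and non-null; `ohlc_violations` and the sample bars are hypothetical and do not come from the `ohlc` table.

```python
def ohlc_violations(bar: dict) -> list[str]:
    """Return a list of invariant violations for one OHLC bar (empty if clean)."""
    problems = []
    # The high must be at least as large as both the open and the close.
    if bar["high"] < max(bar["open"], bar["close"]):
        problems.append("high below open/close")
    # The low must be no larger than both the open and the close.
    if bar["low"] > min(bar["open"], bar["close"]):
        problems.append("low above open/close")
    # Traded volume cannot be negative.
    if bar["volume"] < 0:
        problems.append("negative volume")
    return problems


# Hypothetical sample bars for illustration only.
good = {"open": 10.0, "high": 12.0, "low": 9.5, "close": 11.0, "volume": 100.0}
bad = {"open": 10.0, "high": 9.0, "low": 9.5, "close": 11.0, "volume": -1.0}

print(ohlc_violations(good))  # []
print(ohlc_violations(bad))   # ['high below open/close', 'negative volume']
```

In a Databricks Python cell, rows could be fed to this helper from something like `spark.table("ohlc").limit(1000).collect()` with `row.asDict()`, though for a large table the same conditions are probably better expressed as a SQL `WHERE` filter.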
