# 🌊 Nazaré Marine Conditions Dashboard: Real-Time Wave Analytics with StormGlass, Snowflake & Streamlit

## Overview

This notebook:

1. Loads configuration and secrets from a local `.env` file.
2. Fetches marine data from the **StormGlass API** for a point near Nazaré, Portugal.
3. Cleans and prepares the data in a Pandas DataFrame.
4. Writes the data into **Snowflake** tables:
   - `STORM_MARINE_RAW`
   - `STORM_MARINE_CLEAN`
5. Lets you preview the ingested data.

The same Snowflake tables will be used later by a **Streamlit dashboard** (`app.py`).


## 1. Imports & Configuration

We will:

- Load environment variables from `.env` using `python-dotenv`.
- Configure StormGlass API and Snowflake connection parameters.
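
As a small illustration of the pattern used for `LAT` and `LON`, the sketch below reads a float from the environment with a default and rejects out-of-range values. The helper name `read_float_env` and the range check are assumptions added for illustration; the notebook itself only calls `float(os.getenv(...))`.

```python
import os


def read_float_env(name: str, default: str, lo: float, hi: float) -> float:
    """Read a float from the environment, falling back to `default`.

    Hypothetical helper: raises ValueError if the value is not numeric
    or falls outside [lo, hi].
    """
    raw = os.getenv(name, default)
    value = float(raw)  # raises ValueError on non-numeric input
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")
    return value


# Same defaults as the notebook; the bounds are the usual latitude/longitude ranges.
lat = read_float_env("LAT", "39.60475", -90.0, 90.0)
lon = read_float_env("LON", "-9.085443", -180.0, 180.0)
print(lat, lon)
```

Failing at load time keeps a mistyped coordinate in `.env` from turning into a confusing API error later.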


In [8]:
import os
import sys
import requests
import pandas as pd
from datetime import datetime, timedelta, timezone

from dotenv import load_dotenv
from snowflake.snowpark import Session

# Load variables from .env
load_dotenv()

# Read location from env (with defaults as backup)
LAT = float(os.getenv("LAT", "39.60475"))
LON = float(os.getenv("LON", "-9.085443"))

# StormGlass
STORM_API_KEY = os.getenv("STORM_API_KEY")
STORM_PARAMS = "waveHeight,swellHeight,windSpeed,waterTemperature"

# Snowflake connection
SNOW_CONFIG = {
    "account": os.getenv("SNOW_ACCOUNT"),
    "user": os.getenv("SNOW_USER"),
    "password": os.getenv("SNOW_PASSWORD"),
    "role": os.getenv("SNOW_ROLE", "SYSADMIN"),
    "warehouse": os.getenv("SNOW_WAREHOUSE", "COMPUTE_WH"),
    "database": os.getenv("SNOW_DATABASE", "MARINE_DB"),
    "schema": os.getenv("SNOW_SCHEMA", "NAZARE_SCHEMA"),
}

# Simple sanity check
print("LAT, LON =", LAT, LON)
print("Snowflake account:", SNOW_CONFIG["account"])


LAT, LON = 39.60475 -9.085443
Snowflake account: RJJBFMI-QP52154


## 2. Check environment variables

This ensures:

- StormGlass API key exists  
- Snowflake credentials are available  

If anything is missing, we stop execution early to avoid confusing errors later.


In [9]:
def check_env():
    missing = []
    for var in ["STORM_API_KEY", "SNOW_ACCOUNT", "SNOW_USER", "SNOW_PASSWORD"]:
        if not os.getenv(var):
            missing.append(var)
    if missing:
        raise RuntimeError(f"Missing required env vars in .env: {missing}")

check_env()
print("✅ Environment looks good.")


✅ Environment looks good.


## 3. Fetch marine data from StormGlass

We will request a **time window** (e.g. last 48h + next 24h) using:

- Endpoint: `https://api.stormglass.io/v2/weather/point`
- Params: `waveHeight`, `swellHeight`, `windSpeed`, `waterTemperature`
- Auth: API key in the `Authorization` header
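
The time window and query string can be sketched with the standard library alone. The helper names `window_timestamps` and `build_query` are hypothetical, and `now` is pinned to the run time visible in this notebook's output so the result is reproducible; no request is sent.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "https://api.stormglass.io/v2/weather/point"


def window_timestamps(now, hours_back=48, hours_forward=24):
    """Return (start, end) as integer UNIX timestamps around `now`."""
    start = now - timedelta(hours=hours_back)
    end = now + timedelta(hours=hours_forward)
    return int(start.timestamp()), int(end.timestamp())


def build_query(lat, lon, params, start, end):
    """Encode the query string; commas are kept literal for readability."""
    return urlencode(
        {"lat": lat, "lng": lon, "params": params, "start": start, "end": end},
        safe=",",
    )


# Fixed timestamp (an assumption for reproducibility), not the live clock.
now = datetime(2025, 12, 6, 11, 40, 38, tzinfo=timezone.utc)
start, end = window_timestamps(now)
query = build_query(
    39.60475, -9.085443,
    "waveHeight,swellHeight,windSpeed,waterTemperature",
    start, end,
)
print(f"{BASE_URL}?{query}")
print("window hours:", (end - start) // 3600)
```

A 48h + 24h window spans 72 hours; since both endpoints are included, that is consistent with the 73 hourly rows reported in the cell output.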


In [10]:
def fetch_stormglass(lat: float, lon: float, hours_back: int = 48, hours_forward: int = 24):
    """
    Fetch past + future hours from StormGlass (max window 10 days on free tier).
    Uses UNIX timestamps for start and end.
    """
    now = datetime.now(timezone.utc)
    start = now - timedelta(hours=hours_back)
    end = now + timedelta(hours=hours_forward)

    url = (
        "https://api.stormglass.io/v2/weather/point"
        f"?lat={lat}&lng={lon}"
        f"&params={STORM_PARAMS}"
        f"&start={int(start.timestamp())}"
        f"&end={int(end.timestamp())}"
    )

    headers = {"Authorization": STORM_API_KEY}

    print("Requesting StormGlass URL:")
    print(url)
    r = requests.get(url, headers=headers, timeout=30)
    r.raise_for_status()
    data = r.json()
    return data["hours"]

def stormglass_hours_to_df(hours):
    """Flatten StormGlass 'hours' into a tabular DataFrame."""
    records = []
    for h in hours:
        rec = {
            "timestamp": h["time"],
            "wave_height": h.get("waveHeight", {}).get("sg"),
            "swell_height": h.get("swellHeight", {}).get("sg"),
            "wind_speed": h.get("windSpeed", {}).get("sg"),
            "water_temperature": h.get("waterTemperature", {}).get("sg"),
        }
        records.append(rec)

    df = pd.DataFrame(records)
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    df["lat"] = LAT
    df["lon"] = LON
    df["source"] = "stormglass"
    # Timestamp.utcnow() is deprecated in recent pandas; this is the equivalent call
    df["ingested_at"] = pd.Timestamp.now(tz="UTC")
    return df

hours = fetch_stormglass(LAT, LON)
df_raw = stormglass_hours_to_df(hours)

print("Rows fetched from StormGlass:", len(df_raw))
df_raw.head()


Requesting StormGlass URL:
https://api.stormglass.io/v2/weather/point?lat=39.60475&lng=-9.085443&params=waveHeight,swellHeight,windSpeed,waterTemperature&start=1764848438&end=1765107638
Rows fetched from StormGlass: 73


| | timestamp | wave_height | swell_height | wind_speed | water_temperature | lat | lon | source | ingested_at |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-12-04 11:00:00+00:00 | 3.69 | 3.46 | 5.02 | 15.77 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 |
| 1 | 2025-12-04 12:00:00+00:00 | 3.63 | 3.41 | 5.39 | 15.78 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 |
| 2 | 2025-12-04 13:00:00+00:00 | 3.61 | 3.42 | 5.18 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 |
| 3 | 2025-12-04 14:00:00+00:00 | 3.58 | 3.43 | 4.97 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 |
| 4 | 2025-12-04 15:00:00+00:00 | 3.55 | 3.44 | 4.76 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 |


## 4. Clean data and create a “clean” DataFrame

- Drop rows where `wave_height` is missing (a row without it cannot serve as a target).
- Sort by timestamp.
- (Optional) Create basic features like:
  - Hour of day
  - Day of week
  - 3-hour rolling average of `wave_height`
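
To make the optional features concrete, here is a pure-Python sketch of what a trailing 3-value mean with `min_periods=1` computes, using the first five `wave_height` values from the fetch output. The function `rolling_mean` is a hypothetical stand-in for illustration; the notebook itself uses pandas' `rolling(...).mean()`.

```python
from datetime import datetime, timezone


def rolling_mean(values, window=3, min_periods=1):
    """Trailing rolling mean; positions with fewer than `min_periods` values yield None."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk) if len(chunk) >= min_periods else None)
    return out


wave_height = [3.69, 3.63, 3.61, 3.58, 3.55]
print([round(v, 6) for v in rolling_mean(wave_height)])

# Hour of day and day of week for the first timestamp (Monday=0, as in pandas).
ts = datetime(2025, 12, 4, 11, 0, tzinfo=timezone.utc)
print(ts.hour, ts.weekday())
```

With `min_periods=1` the first two positions average over one and two values respectively, so the series has no leading gaps.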


In [11]:
df_clean = df_raw.dropna(subset=["wave_height"]).copy()
df_clean = df_clean.sort_values("timestamp").reset_index(drop=True)

# Simple time-based features (optional, useful later for ML)
df_clean["hour"] = df_clean["timestamp"].dt.hour
df_clean["dayofweek"] = df_clean["timestamp"].dt.dayofweek
df_clean["rolling_wave_3h"] = df_clean["wave_height"].rolling(3, min_periods=1).mean()

print("Rows after cleaning:", len(df_clean))
df_clean.head()


Rows after cleaning: 73


| | timestamp | wave_height | swell_height | wind_speed | water_temperature | lat | lon | source | ingested_at | hour | dayofweek | rolling_wave_3h |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-12-04 11:00:00+00:00 | 3.69 | 3.46 | 5.02 | 15.77 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 | 11 | 3 | 3.69 |
| 1 | 2025-12-04 12:00:00+00:00 | 3.63 | 3.41 | 5.39 | 15.78 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 | 12 | 3 | 3.66 |
| 2 | 2025-12-04 13:00:00+00:00 | 3.61 | 3.42 | 5.18 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 | 13 | 3 | 3.643333 |
| 3 | 2025-12-04 14:00:00+00:00 | 3.58 | 3.43 | 4.97 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 | 14 | 3 | 3.606667 |
| 4 | 2025-12-04 15:00:00+00:00 | 3.55 | 3.44 | 4.76 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 11:40:38.539180+00:00 | 15 | 3 | 3.58 |


## 5. Connect to Snowflake and ensure tables exist

We will:

- Open a Snowflake Snowpark session using the `.env` values.
- Create two tables if they don't exist:
  - `STORM_MARINE_RAW` – raw snapshots
  - `STORM_MARINE_CLEAN` – cleaned, model-ready time series


In [None]:
def get_snowflake_session():
    return Session.builder.configs(SNOW_CONFIG).create()

def ensure_tables(session: Session):
    session.sql("""
        CREATE TABLE IF NOT EXISTS STORM_MARINE_RAW (
          timestamp TIMESTAMP_NTZ,
          wave_height FLOAT,
          swell_height FLOAT,
          wind_speed FLOAT,
          water_temperature FLOAT,
          lat FLOAT,
          lon FLOAT,
          source STRING,
          ingested_at TIMESTAMP_NTZ
        )
    """).collect()

    session.sql("""
        CREATE TABLE IF NOT EXISTS STORM_MARINE_CLEAN (
          timestamp TIMESTAMP_NTZ PRIMARY KEY,
          wave_height FLOAT,
          swell_height FLOAT,
          wind_speed FLOAT,
          water_temperature FLOAT,
          lat FLOAT,
          lon FLOAT,
          source STRING,
          ingested_at TIMESTAMP_NTZ,
          hour INTEGER,
          dayofweek INTEGER,
          rolling_wave_3h FLOAT
        )
    """).collect()

session = get_snowflake_session()
ensure_tables(session)
print("✅ Snowflake tables checked/created.")


✅ Snowflake tables checked/created.


## 6. Write data into Snowflake

- Append the new snapshot to `STORM_MARINE_RAW`.
- Overwrite `STORM_MARINE_CLEAN` with the current window
  (for production you'd use a MERGE/upsert, but overwrite is fine for the exam).
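
For reference, a MERGE-based upsert could look roughly like the sketch below, which only builds the SQL text. The staging table name `STORM_MARINE_STAGE` and the helper `build_merge_sql` are assumptions for illustration; nothing in this notebook creates or uses them.

```python
def build_merge_sql(target, staging, key, columns):
    """Build a MERGE statement that upserts `staging` rows into `target` on `key`."""
    non_key = [c for c in columns if c != key]
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in non_key)
    col_list = ", ".join(columns)
    val_list = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {staging} s\n"
        f"  ON t.{key} = s.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )


# Hypothetical staging table; a subset of the clean table's columns for brevity.
sql = build_merge_sql(
    "STORM_MARINE_CLEAN",
    "STORM_MARINE_STAGE",
    "TIMESTAMP",
    ["TIMESTAMP", "WAVE_HEIGHT", "SWELL_HEIGHT", "ROLLING_WAVE_3H"],
)
print(sql)
```

In such a setup the fresh window would first be written to the staging table, and the statement run with `session.sql(sql).collect()`, so existing hours are updated and new hours inserted without dropping history.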


In [16]:
# Make copies with UPPERCASE column names to match Snowflake behavior
df_raw_sf = df_raw.copy()
df_raw_sf.columns = [c.upper() for c in df_raw_sf.columns]

df_clean_sf = df_clean.copy()
df_clean_sf.columns = [c.upper() for c in df_clean_sf.columns]

print(df_raw_sf.dtypes)  # optional sanity check

# Write raw (append)
session.write_pandas(
    df_raw_sf,
    "STORM_MARINE_RAW",
    auto_create_table=True,
    overwrite=False,
    quote_identifiers=False,   # don't quote "TIMESTAMP"
    use_logical_type=True      # <-- critical: handle datetimes as TIMESTAMP, not NUMBER
)

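# Write clean (overwrite).
# Note: per the Snowpark docs, overwrite=True combined with auto_create_table=True
# drops and recreates the table, so the column types and PRIMARY KEY declared in
# ensure_tables() are replaced by a schema inferred from the DataFrame.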
session.write_pandas(
    df_clean_sf,
    "STORM_MARINE_CLEAN",
    auto_create_table=True,
    overwrite=True,
    quote_identifiers=False,
    use_logical_type=True
)

print("✅ Data written to STORM_MARINE_RAW and STORM_MARINE_CLEAN.")


TIMESTAMP            datetime64[ns, UTC]
WAVE_HEIGHT                      float64
SWELL_HEIGHT                     float64
WIND_SPEED                       float64
WATER_TEMPERATURE                float64
LAT                              float64
LON                              float64
SOURCE                            object
INGESTED_AT          datetime64[us, UTC]
dtype: object
✅ Data written to STORM_MARINE_RAW and STORM_MARINE_CLEAN.


## 7. Preview data directly from Snowflake

Use Snowpark to read back a few rows and confirm that ingestion worked as expected.


In [17]:
df_preview = session.table("STORM_MARINE_CLEAN").sort("timestamp").limit(10).to_pandas()
df_preview


| | TIMESTAMP | WAVE_HEIGHT | SWELL_HEIGHT | WIND_SPEED | WATER_TEMPERATURE | LAT | LON | SOURCE | INGESTED_AT | HOUR | DAYOFWEEK | ROLLING_WAVE_3H |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-12-04 03:00:00-08:00 | 3.69 | 3.46 | 5.02 | 15.77 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 11 | 3 | 3.69 |
| 1 | 2025-12-04 04:00:00-08:00 | 3.63 | 3.41 | 5.39 | 15.78 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 12 | 3 | 3.66 |
| 2 | 2025-12-04 05:00:00-08:00 | 3.61 | 3.42 | 5.18 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 13 | 3 | 3.643333 |
| 3 | 2025-12-04 06:00:00-08:00 | 3.58 | 3.43 | 4.97 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 14 | 3 | 3.606667 |
| 4 | 2025-12-04 07:00:00-08:00 | 3.55 | 3.44 | 4.76 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 15 | 3 | 3.58 |
| 5 | 2025-12-04 08:00:00-08:00 | 3.54 | 3.46 | 4.01 | 15.79 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 16 | 3 | 3.556667 |
| 6 | 2025-12-04 09:00:00-08:00 | 3.52 | 3.48 | 3.27 | 15.78 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 17 | 3 | 3.536667 |
| 7 | 2025-12-04 10:00:00-08:00 | 3.5 | 3.5 | 2.52 | 15.77 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 18 | 3 | 3.52 |
| 8 | 2025-12-04 11:00:00-08:00 | 3.5 | 3.51 | 2.2 | 15.76 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 19 | 3 | 3.506667 |
| 9 | 2025-12-04 12:00:00-08:00 | 3.49 | 3.53 | 1.88 | 15.75 | 39.60475 | -9.085443 | stormglass | 2025-12-06 03:40:38.539180-08:00 | 20 | 3 | 3.496667 |
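
Note that the preview shows timestamps with a `-08:00` offset while `HOUR` still holds the UTC hour computed in pandas. Presumably the values are being displayed in the Snowflake session's time zone; this is an inference from the output, not something the notebook confirms. The check below, using only the standard library, shows that the first displayed timestamp is the same instant as the UTC value that was written.

```python
from datetime import datetime, timedelta, timezone

# First preview row, as displayed with a UTC-8 offset.
shown = datetime(2025, 12, 4, 3, 0, tzinfo=timezone(timedelta(hours=-8)))

# Convert back to UTC to compare with the DataFrame that was uploaded.
as_utc = shown.astimezone(timezone.utc)
print(as_utc.isoformat())
print("UTC hour:", as_utc.hour)
```

The UTC hour comes out as 11, matching the `HOUR` column of that row, so the data is intact and only the displayed offset differs.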


## 8. Close Snowflake session

It is good practice to close the session at the end of the notebook run.


In [18]:
session.close()
print("Snowflake session closed.")


Snowflake session closed.
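
Outside a notebook, one way to guarantee the close even when a step fails is `contextlib.closing`, which works with any object that has a `close()` method. The sketch below uses a stand-in class `FakeSession` and a hypothetical `run_pipeline` helper so it runs without Snowflake; it is an illustration of the pattern, not the notebook's code.

```python
from contextlib import closing


class FakeSession:
    """Stand-in for a Snowpark Session; only `close()` matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_pipeline(session_factory, work):
    """Run `work(session)` and close the session even if `work` raises."""
    with closing(session_factory()) as session:
        return work(session)


session = FakeSession()
result = run_pipeline(lambda: session, lambda s: "rows written")
print(result, "| closed:", session.closed)
```

In a real script, `session_factory` would be something like the notebook's `get_snowflake_session`, and `work` would hold the write and preview steps.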
