# Oceanographic Temperature Analysis: Multi-Station Time Series Study

## Overview

This notebook demonstrates comprehensive oceanographic analysis techniques using temperature, salinity, and dissolved oxygen data from three globally distributed ocean monitoring stations.

## Dataset Description

**Temporal Coverage**: 10 years of continuous monitoring (2013-2022)

**Spatial Coverage**: Three stations across major ocean basins:
- **STATION_A**: North Pacific (45°N) - Mid-latitude temperate waters
- **STATION_B**: Equatorial Pacific (0°) - Tropical waters
- **STATION_C**: South Atlantic (35°S) - Southern hemisphere temperate waters

**Vertical Coverage**: 5 depth levels (0m, 50m, 100m, 200m, 500m)

**Parameters Measured**:
- Temperature (°C): Critical for understanding ocean heat content and circulation
- Salinity (PSU): Practical Salinity Units, key for density calculations
- Dissolved Oxygen (mL/L): Indicator of biological activity and water mass age

## Oceanographic Concepts

This analysis explores fundamental oceanographic phenomena:

1. **Thermocline Structure**: The vertical temperature gradient that separates warm surface waters from cold deep waters (typically 50-200m depth)

2. **Thermal Stratification**: Density-driven layering of the water column, strongest in summer when surface heating increases

3. **Seasonal Cycle**: Annual temperature variation driven by solar heating, most pronounced at the surface in mid-latitudes

4. **Ocean Warming**: Long-term temperature trends associated with climate change, affecting heat content and ecosystem dynamics

5. **Temperature Anomalies**: Deviations from long-term climatological mean, revealing interannual variability and extreme events

6. **T-S Diagrams**: Temperature-Salinity plots used to identify and characterize distinct water masses

7. **Mixed Layer Depth**: The surface layer of relatively uniform temperature and salinity, controlled by wind mixing and convection

8. **Dissolved Oxygen Dynamics**: Oxygen concentration patterns reflecting biological productivity, respiration, and physical ventilation
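
To make item 7 concrete, here is a minimal sketch of a threshold-based mixed layer depth estimate. The profile values and the function name `mld_from_profile` are illustrative assumptions, and linear interpolation between sampled levels is one possible refinement rather than the method this notebook applies later, which returns the first sampled level that meets the criterion.

```python
import numpy as np


def mld_from_profile(depths, temps, criterion=0.5):
    """Depth where |T - T_surface| first reaches `criterion` (must be > 0).

    The crossing is linearly interpolated between the two sampled levels
    that bracket it. Inputs must be ordered from shallow to deep.
    """
    depths = np.asarray(depths, dtype=float)
    temps = np.asarray(temps, dtype=float)
    diff = np.abs(temps - temps[0])
    exceed = np.nonzero(diff >= criterion)[0]
    if exceed.size == 0:
        # Criterion never met: treat the whole sampled column as mixed.
        return float(depths[-1])
    i = exceed[0]  # i >= 1 because diff[0] == 0 < criterion
    frac = (criterion - diff[i - 1]) / (diff[i] - diff[i - 1])
    return float(depths[i - 1] + frac * (depths[i] - depths[i - 1]))


# Hypothetical profile on the five depth levels used in this dataset.
depths = [0, 50, 100, 200, 500]
temps = [18.0, 17.8, 15.0, 10.0, 6.0]
mld = mld_from_profile(depths, temps)
print(f"Estimated mixed layer depth: {mld:.1f} m")
```

With only five depth levels, a level-based estimate can return just a handful of distinct values; interpolation gives a smoother estimate but still depends on the assumption that temperature varies linearly between levels.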

## Analytical Methods

- **Vertical profiling**: Temperature-depth relationships
- **Time series analysis**: Trend detection and seasonal decomposition
- **Statistical testing**: Significance of warming trends
- **Climatology**: Long-term mean state calculation
- **Anomaly analysis**: Departure from climatological baseline
- **Water mass analysis**: T-S diagram interpretation
- **Stratification metrics**: Thermal gradient quantification

## Setup and Configuration

Import required libraries for data analysis, visualization, and statistical modeling.

In [None]:
# Core data analysis libraries
import numpy as np
import pandas as pd

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical analysis
import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose

# Configure visualization style
plt.style.use("seaborn-v0_8-darkgrid")
sns.set_palette("deep")
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["font.size"] = 10

# Display settings
pd.set_option("display.max_columns", None)
pd.set_option("display.precision", 3)

print("Libraries imported successfully")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## Load and Explore Data

Load the oceanographic dataset and perform initial exploration to understand data structure, coverage, and quality.

In [None]:
# Load oceanographic data
df = pd.read_csv("../data/ocean_temperature_data.csv")

# Parse dates
df["date"] = pd.to_datetime(df["date"])

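# Extract temporal features for analysis
# Note: season labels follow Northern Hemisphere meteorological seasons;
# at STATION_C (Southern Hemisphere) "Summer" (Jun-Aug) is the austral winter.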
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["season"] = df["month"].map(
    {
        12: "Winter",
        1: "Winter",
        2: "Winter",
        3: "Spring",
        4: "Spring",
        5: "Spring",
        6: "Summer",
        7: "Summer",
        8: "Summer",
        9: "Fall",
        10: "Fall",
        11: "Fall",
    }
)

# Sort by date and depth for time series analysis
df = df.sort_values(["station_id", "date", "depth_m"]).reset_index(drop=True)

print("Dataset loaded successfully")
print(f"\nDataset shape: {df.shape[0]:,} observations, {df.shape[1]} variables")
print(
    f"Date range: {df['date'].min().strftime('%Y-%m-%d')} to {df['date'].max().strftime('%Y-%m-%d')}"
)
print(f"Number of years: {df['year'].nunique()}")
print("\nFirst few rows:")
df.head(10)

In [None]:
# Data structure and quality
print("Dataset Information:")
df.info()  # prints its report directly and returns None, so no print() wrapper
print("\n" + "=" * 80 + "\n")

# Summary statistics
print("Summary Statistics:")
print(df[["temperature_c", "salinity_psu", "dissolved_oxygen_ml_l", "depth_m"]].describe())
print("\n" + "=" * 80 + "\n")

# Check for missing data
print("Missing Data:")
missing = df.isnull().sum()
missing_pct = 100 * missing / len(df)
missing_summary = pd.DataFrame({"Missing_Count": missing, "Percentage": missing_pct})
print(missing_summary[missing_summary["Missing_Count"] > 0])

In [None]:
# Station-specific information
print("Station Information:")
print("=" * 80)

station_info = (
    df.groupby("station_id")
    .agg(
        {
            "station_name": "first",
            "latitude": "first",
            "longitude": "first",
            "date": ["min", "max", "count"],
            "temperature_c": ["mean", "std", "min", "max"],
        }
    )
    .round(2)
)

print(station_info)
print("\n" + "=" * 80 + "\n")

# Depth level coverage
print("Depth Level Coverage:")
depth_coverage = df.groupby(["station_id", "depth_m"]).size().unstack(fill_value=0)
print(depth_coverage)
print("\n" + "=" * 80 + "\n")

# Observations per station
print("Observations per Station:")
obs_per_station = df["station_id"].value_counts().sort_index()
print(obs_per_station)

## Temperature Profiles Analysis

Vertical temperature profiles reveal the ocean's thermal structure, including the thermocline - the transition layer between warm surface waters and cold deep waters. The strength and depth of the thermocline vary with season and latitude.
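
As a minimal sketch of how a thermocline can be located on a coarse profile, the example below takes the finite-difference gradient dT/dz between adjacent levels and picks the layer where it is most negative. The temperature values are invented for illustration; only the depth levels match this dataset.

```python
import numpy as np

# Hypothetical mean profile on the dataset's five depth levels.
depths = np.array([0.0, 50.0, 100.0, 200.0, 500.0])   # m
temps = np.array([20.0, 19.0, 14.0, 10.0, 6.0])       # °C

# Finite-difference gradient between adjacent levels (°C per metre),
# assigned to the midpoint of each layer.
gradients = np.diff(temps) / np.diff(depths)
mid_depths = (depths[:-1] + depths[1:]) / 2

# The thermocline is taken as the layer with the steepest cooling.
k = int(np.argmin(gradients))
thermocline_depth = float(mid_depths[k])
thermocline_strength = float(-gradients[k])

print(f"Thermocline near {thermocline_depth:.0f} m, "
      f"strength {thermocline_strength:.3f} °C/m")
```

Because the gradient is only defined at the four layer midpoints (25, 75, 150, and 350 m), the estimated thermocline depth is quantized to those values; it indicates which layer holds the strongest gradient rather than a precise depth.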

In [None]:
# Calculate mean temperature profile for each station
profile_data = (
    df.groupby(["station_id", "station_name", "latitude", "depth_m"])["temperature_c"]
    .agg(["mean", "std"])
    .reset_index()
)

# Create temperature profile plot
fig, axes = plt.subplots(1, 3, figsize=(15, 6), sharey=True)

stations = df["station_id"].unique()
colors = ["#2E86AB", "#A23B72", "#F18F01"]

for idx, (station, color) in enumerate(zip(stations, colors)):
    ax = axes[idx]
    station_data = profile_data[profile_data["station_id"] == station]

    # Plot mean temperature profile
    ax.plot(
        station_data["mean"],
        station_data["depth_m"],
        marker="o",
        linewidth=2.5,
        markersize=8,
        color=color,
        label="Mean",
    )

    # Add standard deviation shading
    ax.fill_betweenx(
        station_data["depth_m"],
        station_data["mean"] - station_data["std"],
        station_data["mean"] + station_data["std"],
        alpha=0.3,
        color=color,
        label="±1 SD",
    )

    # Formatting
    ax.invert_yaxis()
    ax.set_xlabel("Temperature (°C)", fontsize=12, fontweight="bold")
    ax.set_ylabel("Depth (m)", fontsize=12, fontweight="bold")
    ax.set_title(
        f"{station_data['station_name'].iloc[0]}\n{station_data['latitude'].iloc[0]}°",
        fontsize=13,
        fontweight="bold",
    )
    ax.grid(True, alpha=0.3, linestyle="--")
    ax.legend(loc="best", framealpha=0.9)

axes[0].set_ylabel("Depth (m)", fontsize=12, fontweight="bold")

plt.suptitle(
    "Mean Temperature Profiles by Station\n(10-Year Climatology)",
    fontsize=16,
    fontweight="bold",
    y=1.02,
)
plt.tight_layout()
plt.show()

print("Temperature Profile Summary:")
print(profile_data.round(2))


In [None]:
# Thermocline analysis - calculate temperature gradient
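def calculate_thermocline(station_data):
    """Calculate thermocline depth and strength from a mean temperature profile.

    Expects columns "depth_m" and "mean". Rows are sorted by depth here so the
    gradient is always taken from shallow to deep.
    """
    station_data = station_data.sort_values("depth_m")
    depths = station_data["depth_m"].values
    temps = station_data["mean"].values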

    # Calculate temperature gradient (dT/dz)
    gradients = np.diff(temps) / np.diff(depths)
    gradient_depths = (depths[:-1] + depths[1:]) / 2

    # Thermocline is at maximum negative gradient
    thermocline_idx = np.argmin(gradients)
    thermocline_depth = gradient_depths[thermocline_idx]
    thermocline_strength = abs(gradients[thermocline_idx])

    return gradient_depths, gradients, thermocline_depth, thermocline_strength


# Create thermocline visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 6))

print("Thermocline Characteristics:")
print("=" * 80)

for idx, (station, color) in enumerate(zip(stations, colors)):
    ax = axes[idx]
    station_data = profile_data[profile_data["station_id"] == station].sort_values("depth_m")

    # Calculate thermocline
    gradient_depths, gradients, therm_depth, therm_strength = calculate_thermocline(station_data)

    # Plot temperature gradient
    ax.plot(gradients, gradient_depths, marker="s", linewidth=2.5, markersize=8, color=color)
    ax.axhline(
        y=therm_depth,
        color="red",
        linestyle="--",
        linewidth=2,
        label=f"Thermocline: {therm_depth:.0f}m",
    )
    ax.axvline(x=0, color="gray", linestyle="-", linewidth=1, alpha=0.5)

    # Formatting
    ax.invert_yaxis()
    ax.set_xlabel("Temperature Gradient (°C/m)", fontsize=12, fontweight="bold")
    ax.set_ylabel("Depth (m)", fontsize=12, fontweight="bold")
    ax.set_title(f"{station_data['station_name'].iloc[0]}", fontsize=13, fontweight="bold")
    ax.grid(True, alpha=0.3, linestyle="--")
    ax.legend(loc="best", framealpha=0.9)

    # Print thermocline info
    print(f"{station_data['station_name'].iloc[0]} ({station_data['latitude'].iloc[0]}°):")
    print(f"  Thermocline Depth: {therm_depth:.1f} m")
    print(f"  Thermocline Strength: {therm_strength:.3f} °C/m")
    print()

plt.suptitle(
    "Thermocline Analysis: Vertical Temperature Gradients", fontsize=16, fontweight="bold", y=1.02
)
plt.tight_layout()
plt.show()


In [None]:
# Seasonal temperature profiles
seasonal_profiles = (
    df.groupby(["station_id", "station_name", "season", "depth_m"])["temperature_c"]
    .mean()
    .reset_index()
)

fig, axes = plt.subplots(1, 3, figsize=(15, 6), sharey=True)

season_colors = {"Winter": "#3A86FF", "Spring": "#8338EC", "Summer": "#FB5607", "Fall": "#FFBE0B"}
season_order = ["Winter", "Spring", "Summer", "Fall"]

for idx, station in enumerate(stations):
    ax = axes[idx]
    station_data = seasonal_profiles[seasonal_profiles["station_id"] == station]

    for season in season_order:
        season_data = station_data[station_data["season"] == season].sort_values("depth_m")
        if not season_data.empty:
            ax.plot(
                season_data["temperature_c"],
                season_data["depth_m"],
                marker="o",
                linewidth=2,
                markersize=6,
                color=season_colors[season],
                label=season,
            )

    ax.invert_yaxis()
    ax.set_xlabel("Temperature (°C)", fontsize=12, fontweight="bold")
    ax.set_title(f"{station_data['station_name'].iloc[0]}", fontsize=13, fontweight="bold")
    ax.grid(True, alpha=0.3, linestyle="--")
    ax.legend(loc="best", framealpha=0.9, fontsize=9)

axes[0].set_ylabel("Depth (m)", fontsize=12, fontweight="bold")

plt.suptitle("Seasonal Temperature Profiles", fontsize=16, fontweight="bold", y=1.02)
plt.tight_layout()
plt.show()

## Seasonal Pattern Analysis

Surface waters exhibit strong seasonal cycles driven by solar heating and cooling. The amplitude of this cycle varies with latitude - larger at mid-latitudes, smaller near the equator where seasonal variation in solar radiation is minimal.
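
One way to quantify the amplitude of the seasonal cycle is to fit a single annual harmonic to the twelve monthly means by least squares. The sketch below does this for two synthetic climatologies; the numbers and the helper name `annual_harmonic_amplitude` are assumptions for illustration, not values from the dataset.

```python
import numpy as np

months = np.arange(1, 13)
phase = 2 * np.pi * (months - 1) / 12  # position within the year, in radians


def annual_harmonic_amplitude(monthly_means):
    """Amplitude of the best-fit annual harmonic: mean + a*cos + b*sin."""
    design = np.column_stack([np.ones(12), np.cos(phase), np.sin(phase)])
    coef, *_ = np.linalg.lstsq(
        design, np.asarray(monthly_means, dtype=float), rcond=None
    )
    return float(np.hypot(coef[1], coef[2]))


# Synthetic climatologies (assumed values): a mid-latitude station with a
# strong cycle peaking in August and an equatorial station with a weak cycle.
mid_latitude = 12.0 + 4.0 * np.cos(phase - 2 * np.pi * 7 / 12)
equatorial = 27.0 + 0.5 * np.cos(phase - 2 * np.pi * 3 / 12)

amp_mid = annual_harmonic_amplitude(mid_latitude)
amp_eq = annual_harmonic_amplitude(equatorial)
print(f"Mid-latitude amplitude: {amp_mid:.2f} °C (peak-to-trough {2 * amp_mid:.2f} °C)")
print(f"Equatorial amplitude:   {amp_eq:.2f} °C (peak-to-trough {2 * amp_eq:.2f} °C)")
```

The harmonic amplitude is half the peak-to-trough range of a pure sinusoid, so it is not directly comparable to a max-minus-min range unless doubled. It is also less sensitive to a single unusual month than a simple range.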

In [None]:
# Extract surface temperature time series (0m depth)
surface_data = df[df["depth_m"] == 0].copy()
surface_data = surface_data.sort_values(["station_id", "date"]).reset_index(drop=True)

# Plot surface temperature time series for all stations
fig, ax = plt.subplots(figsize=(16, 6))

for station, color in zip(stations, colors):
    station_data = surface_data[surface_data["station_id"] == station]
    ax.plot(
        station_data["date"],
        station_data["temperature_c"],
        linewidth=1.5,
        color=color,
        alpha=0.8,
        label=f"{station_data['station_name'].iloc[0]} ({station_data['latitude'].iloc[0]}°)",
    )

ax.set_xlabel("Date", fontsize=12, fontweight="bold")
ax.set_ylabel("Sea Surface Temperature (°C)", fontsize=12, fontweight="bold")
ax.set_title("Surface Temperature Time Series (0m depth)", fontsize=14, fontweight="bold")
ax.legend(loc="best", framealpha=0.9, fontsize=11)
ax.grid(True, alpha=0.3, linestyle="--")
plt.tight_layout()
plt.show()

print("Surface Temperature Statistics by Station:")
print("=" * 80)
surface_stats = (
    surface_data.groupby(["station_id", "station_name", "latitude"])["temperature_c"]
    .agg(
        [
            "count",
            "mean",
            "std",
            "min",
            "max",
            lambda x: x.max() - x.min(),  # range
        ]
    )
    .round(2)
)
surface_stats.columns = ["Count", "Mean", "Std_Dev", "Min", "Max", "Range"]
print(surface_stats)

In [None]:
# Monthly climatology - average seasonal cycle
monthly_clim = (
    surface_data.groupby(["station_id", "station_name", "latitude", "month"])["temperature_c"]
    .agg(["mean", "std"])
    .reset_index()
)

fig, ax = plt.subplots(figsize=(14, 6))

month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

for station, color in zip(stations, colors):
    station_clim = monthly_clim[monthly_clim["station_id"] == station].sort_values("month")

    # Plot mean seasonal cycle
    ax.plot(
        station_clim["month"],
        station_clim["mean"],
        marker="o",
        linewidth=3,
        markersize=8,
        color=color,
        label=f"{station_clim['station_name'].iloc[0]} ({station_clim['latitude'].iloc[0]}°)",
    )

    # Add standard deviation envelope
    ax.fill_between(
        station_clim["month"],
        station_clim["mean"] - station_clim["std"],
        station_clim["mean"] + station_clim["std"],
        alpha=0.2,
        color=color,
    )

ax.set_xlabel("Month", fontsize=12, fontweight="bold")
ax.set_ylabel("Sea Surface Temperature (°C)", fontsize=12, fontweight="bold")
ax.set_title(
    "Mean Annual Cycle of Surface Temperature (Climatology)", fontsize=14, fontweight="bold"
)
ax.set_xticks(range(1, 13))
ax.set_xticklabels(month_names)
ax.legend(loc="best", framealpha=0.9, fontsize=11)
ax.grid(True, alpha=0.3, linestyle="--")
plt.tight_layout()
plt.show()

print("\nSeasonal Amplitude Analysis:")
print("=" * 80)
for station in stations:
    station_clim = monthly_clim[monthly_clim["station_id"] == station]
    amplitude = station_clim["mean"].max() - station_clim["mean"].min()
    warmest_month = station_clim.loc[station_clim["mean"].idxmax(), "month"]
    coldest_month = station_clim.loc[station_clim["mean"].idxmin(), "month"]

    print(f"{station_clim['station_name'].iloc[0]} ({station_clim['latitude'].iloc[0]}°):")
    print(f"  Seasonal amplitude: {amplitude:.2f}°C")
    print(
        f"  Warmest month: {month_names[int(warmest_month) - 1]} ({station_clim['mean'].max():.2f}°C)"
    )
    print(
        f"  Coldest month: {month_names[int(coldest_month) - 1]} ({station_clim['mean'].min():.2f}°C)"
    )
    print()

In [None]:
# Seasonal decomposition for one station (STATION_A)
station_a_surface = surface_data[surface_data["station_id"] == "STATION_A"].set_index("date")[
    "temperature_c"
]
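# Regularize to month-start frequency. This assumes one observation per month
# stamped on the first day; other timestamps or gaps become NaN, which
# seasonal_decompose does not accept.
station_a_surface = station_a_surface.asfreq("MS")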

# Perform seasonal decomposition
decomposition = seasonal_decompose(station_a_surface, model="additive", period=12)

# Plot decomposition
fig, axes = plt.subplots(4, 1, figsize=(16, 10))

decomposition.observed.plot(ax=axes[0], color="#2E86AB", linewidth=1.5)
axes[0].set_ylabel("Observed (°C)", fontsize=11, fontweight="bold")
axes[0].set_title(
    "Seasonal Decomposition: STATION_A Surface Temperature", fontsize=14, fontweight="bold"
)
axes[0].grid(True, alpha=0.3)

decomposition.trend.plot(ax=axes[1], color="#E63946", linewidth=2.5)
axes[1].set_ylabel("Trend (°C)", fontsize=11, fontweight="bold")
axes[1].grid(True, alpha=0.3)

decomposition.seasonal.plot(ax=axes[2], color="#06FFA5", linewidth=1.5)
axes[2].set_ylabel("Seasonal (°C)", fontsize=11, fontweight="bold")
axes[2].grid(True, alpha=0.3)

decomposition.resid.plot(ax=axes[3], color="#F77F00", linewidth=1, alpha=0.7)
axes[3].axhline(y=0, color="gray", linestyle="--", linewidth=1)
axes[3].set_ylabel("Residual (°C)", fontsize=11, fontweight="bold")
axes[3].set_xlabel("Date", fontsize=11, fontweight="bold")
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Decomposition Components Summary:")
print("=" * 80)
print(f"Trend range: {decomposition.trend.min():.2f}°C to {decomposition.trend.max():.2f}°C")
print(f"Seasonal amplitude: {decomposition.seasonal.max() - decomposition.seasonal.min():.2f}°C")
print(f"Residual std dev: {decomposition.resid.std():.3f}°C")

## Long-term Trend Analysis

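Ocean warming is a critical indicator of climate change. We fit linear trends to the 10-year record to quantify warming rates and test their statistical significance, which helps separate a sustained signal from natural variability. With only ten annual means per series, these tests have limited power, so a non-significant slope does not rule out warming.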
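
To illustrate what the significance test measures, the sketch below computes an ordinary least-squares slope, its standard error, and the resulting t statistic with NumPy alone. It then compares the t statistic with 2.306, the two-sided 5% critical value of Student's t for 8 degrees of freedom (10 annual means minus 2 fitted parameters). Both series are synthetic, with the same assumed 0.03 °C/yr trend and a deterministic year-to-year wiggle standing in for natural variability; the helper name `slope_and_t` is hypothetical.

```python
import numpy as np

T_CRIT_5PCT_DOF8 = 2.306  # two-sided 5% critical value, Student t, 8 d.o.f.


def slope_and_t(years, temps):
    """OLS slope, its standard error, and the t statistic (slope / std_err)."""
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps, dtype=float)
    x = years - years.mean()
    y = temps - temps.mean()
    slope = (x @ y) / (x @ x)
    resid = y - slope * x
    dof = len(years) - 2
    std_err = np.sqrt((resid @ resid) / dof / (x @ x))
    return float(slope), float(std_err), float(slope / std_err)


years = np.arange(2013, 2023)
signal = 15.0 + 0.03 * (years - 2013)                    # assumed trend
wiggle = np.where(np.arange(10) % 2 == 0, 1.0, -1.0)     # alternating +/-1

slope_noisy, _, t_noisy = slope_and_t(years, signal + 0.30 * wiggle)
slope_quiet, _, t_quiet = slope_and_t(years, signal + 0.02 * wiggle)

for label, s, t in [("noisy", slope_noisy, t_noisy), ("quiet", slope_quiet, t_quiet)]:
    verdict = "significant" if abs(t) > T_CRIT_5PCT_DOF8 else "not significant"
    print(f"{label}: slope={s:+.4f} °C/yr, t={t:.2f} -> {verdict} at 5%")
```

The same underlying trend passes the test when variability is small and fails it when variability is large, which is why significance depends on the noise level and record length as much as on the warming rate itself.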

In [None]:
# Calculate annual mean temperature by station and depth
annual_means = (
    df.groupby(["station_id", "station_name", "latitude", "depth_m", "year"])["temperature_c"]
    .mean()
    .reset_index()
)


# Linear trend analysis function
def calculate_trend(years, temps):
    """Calculate linear trend and statistics"""
    # Add constant for intercept
    X = sm.add_constant(years)
    model = sm.OLS(temps, X).fit()

    slope = model.params[1]  # Trend in °C/year
    p_value = model.pvalues[1]
    r_squared = model.rsquared

    return slope, p_value, r_squared, model


# Calculate trends for surface waters at each station
print("Surface Temperature Trends (0m depth):")
print("=" * 80)

trend_results = []

for station in stations:
    station_data = annual_means[
        (annual_means["station_id"] == station) & (annual_means["depth_m"] == 0)
    ]

    years = station_data["year"].values
    temps = station_data["temperature_c"].values

    slope, p_value, r_squared, model = calculate_trend(years, temps)

    # Calculate trend over 10 years
    decadal_change = slope * 10

    significance = (
        "***" if p_value < 0.001 else "**" if p_value < 0.01 else "*" if p_value < 0.05 else "ns"
    )

    trend_results.append(
        {
            "station": station,
            "station_name": station_data["station_name"].iloc[0],
            "latitude": station_data["latitude"].iloc[0],
            "slope": slope,
            "decadal_change": decadal_change,
            "p_value": p_value,
            "r_squared": r_squared,
            "significance": significance,
        }
    )

    print(f"{station_data['station_name'].iloc[0]} ({station_data['latitude'].iloc[0]}°):")
    print(f"  Warming rate: {slope:.4f}°C/year ({significance})")
    print(f"  Decadal change: {decadal_change:+.3f}°C over 10 years")
    print(f"  R² = {r_squared:.3f}, p-value = {p_value:.4f}")
    print()

print("Significance codes: *** p<0.001, ** p<0.01, * p<0.05, ns = not significant")

In [None]:
# Visualize trends with regression lines
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

for idx, (station, color, result) in enumerate(zip(stations, colors, trend_results)):
    ax = axes[idx]
    station_data = annual_means[
        (annual_means["station_id"] == station) & (annual_means["depth_m"] == 0)
    ]

    years = station_data["year"].values
    temps = station_data["temperature_c"].values

    # Scatter plot of annual means
    ax.scatter(
        years, temps, s=100, color=color, alpha=0.7, edgecolors="black", linewidth=1.5, zorder=3
    )

    # Trend line
    X = sm.add_constant(years)
    model = sm.OLS(temps, X).fit()
    trend_line = model.predict(X)
    ax.plot(
        years,
        trend_line,
        color="red",
        linewidth=3,
        linestyle="--",
        label=f"Trend: {result['slope']:.4f}°C/yr {result['significance']}",
    )

    # Formatting
    ax.set_xlabel("Year", fontsize=12, fontweight="bold")
    ax.set_ylabel("Annual Mean Temperature (°C)", fontsize=12, fontweight="bold")
    ax.set_title(f"{result['station_name']}\n{result['latitude']}°", fontsize=13, fontweight="bold")
    ax.legend(loc="best", framealpha=0.9, fontsize=10)
    ax.grid(True, alpha=0.3, linestyle="--")

plt.suptitle(
    "Surface Temperature Trends: Annual Mean Time Series", fontsize=16, fontweight="bold", y=1.02
)
plt.tight_layout()
plt.show()

In [None]:
# Depth-dependent trends - warming throughout the water column
print("Warming Trends by Depth:")
print("=" * 80)

depth_trends = []

for station in stations:
    print(f"\n{annual_means[annual_means['station_id'] == station]['station_name'].iloc[0]}:")
    print(f"{'Depth (m)':<12} {'Trend (°C/yr)':<18} {'10-yr Change':<15} {'p-value':<12} {'Sig.'}")
    print("-" * 70)

    for depth in sorted(df["depth_m"].unique()):
        station_depth_data = annual_means[
            (annual_means["station_id"] == station) & (annual_means["depth_m"] == depth)
        ]

        if len(station_depth_data) > 3:
            years = station_depth_data["year"].values
            temps = station_depth_data["temperature_c"].values

            slope, p_value, r_squared, model = calculate_trend(years, temps)
            decadal_change = slope * 10
            significance = (
                "***"
                if p_value < 0.001
                else "**"
                if p_value < 0.01
                else "*"
                if p_value < 0.05
                else "ns"
            )

            depth_trends.append(
                {
                    "station": station,
                    "depth": depth,
                    "slope": slope,
                    "decadal_change": decadal_change,
                    "p_value": p_value,
                    "significance": significance,
                }
            )

            print(
                f"{depth:<12.0f} {slope:>8.5f}        {decadal_change:>+8.4f}        {p_value:>8.5f}    {significance}"
            )

# Visualize depth-dependent trends
depth_trends_df = pd.DataFrame(depth_trends)

fig, ax = plt.subplots(figsize=(12, 7))

for station, color in zip(stations, colors):
    station_trends = depth_trends_df[depth_trends_df["station"] == station]
    station_name = annual_means[annual_means["station_id"] == station]["station_name"].iloc[0]

    ax.plot(
        station_trends["slope"],
        station_trends["depth"],
        marker="o",
        linewidth=2.5,
        markersize=10,
        color=color,
        label=station_name,
    )

ax.axvline(x=0, color="gray", linestyle="--", linewidth=1.5, alpha=0.7)
ax.invert_yaxis()
ax.set_xlabel("Warming Rate (°C/year)", fontsize=13, fontweight="bold")
ax.set_ylabel("Depth (m)", fontsize=13, fontweight="bold")
ax.set_title("Warming Trends by Depth: Vertical Profile", fontsize=14, fontweight="bold")
ax.legend(loc="best", framealpha=0.9, fontsize=11)
ax.grid(True, alpha=0.3, linestyle="--")
plt.tight_layout()
plt.show()

## Temperature Anomaly Analysis

Temperature anomalies represent departures from the long-term climatological mean. Anomalies reveal interannual variability, extreme events, and climate oscillations that may be obscured in the raw temperature data due to the strong seasonal cycle.
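
The idea can be sketched on a small synthetic series: compute the mean for each calendar month, subtract it from every observation, and what remains is the anomaly. The three-year series and the injected +2 °C event below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Three years of synthetic monthly surface temperature with a pure seasonal cycle.
dates = pd.date_range("2013-01-01", periods=36, freq="MS")
month_numbers = dates.month.to_numpy()
sst = pd.Series(
    10.0 + 5.0 * np.sin(2 * np.pi * (month_numbers - 1) / 12), index=dates
)

# Inject a hypothetical +2 °C warm event in July 2014.
event = pd.Timestamp("2014-07-01")
sst.loc[event] += 2.0

# Monthly climatology: mean of each calendar month across all years,
# broadcast back onto the original index.
climatology = sst.groupby(sst.index.month).transform("mean")
anomaly = sst - climatology

print(f"Largest anomaly: {anomaly.max():+.2f} °C on {anomaly.idxmax():%Y-%m-%d}")
```

The event shows up as roughly +1.33 °C rather than +2 °C because it is itself part of the baseline it is measured against. The same effect applies whenever the climatology is computed from the analysis period, and it becomes smaller as the record lengthens.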

In [None]:
# Calculate monthly climatology for anomaly computation
surface_monthly_clim = (
    surface_data.groupby(["station_id", "month"])["temperature_c"].mean().reset_index()
)
surface_monthly_clim.rename(columns={"temperature_c": "climatology"}, inplace=True)

# Merge climatology with observations to calculate anomalies
surface_with_anom = surface_data.merge(surface_monthly_clim, on=["station_id", "month"])
surface_with_anom["anomaly"] = surface_with_anom["temperature_c"] - surface_with_anom["climatology"]

# Plot temperature anomaly time series
fig, axes = plt.subplots(3, 1, figsize=(16, 12))

for idx, (station, color) in enumerate(zip(stations, colors)):
    ax = axes[idx]
    station_data = surface_with_anom[surface_with_anom["station_id"] == station].sort_values("date")

    # Plot anomaly time series
    ax.fill_between(
        station_data["date"],
        0,
        station_data["anomaly"],
        where=(station_data["anomaly"] >= 0),
        color="#E63946",
        alpha=0.6,
        label="Warm anomaly",
    )
    ax.fill_between(
        station_data["date"],
        0,
        station_data["anomaly"],
        where=(station_data["anomaly"] < 0),
        color="#457B9D",
        alpha=0.6,
        label="Cold anomaly",
    )
    ax.plot(station_data["date"], station_data["anomaly"], color="black", linewidth=0.8, alpha=0.7)

    # Add zero line
    ax.axhline(y=0, color="gray", linestyle="-", linewidth=1.5)

    # Add running mean
    station_data["running_mean"] = station_data["anomaly"].rolling(window=12, center=True).mean()
    ax.plot(
        station_data["date"],
        station_data["running_mean"],
        color=color,
        linewidth=3,
        label="12-month running mean",
    )

    # Formatting
    ax.set_ylabel("Temperature Anomaly (°C)", fontsize=12, fontweight="bold")
    ax.set_title(
        f"{station_data['station_name'].iloc[0]} ({station_data['latitude'].iloc[0]}°)",
        fontsize=13,
        fontweight="bold",
    )
    ax.legend(loc="best", framealpha=0.9, fontsize=10)
    ax.grid(True, alpha=0.3, linestyle="--")

axes[2].set_xlabel("Date", fontsize=12, fontweight="bold")
plt.suptitle(
    "Surface Temperature Anomalies (Departure from Monthly Climatology)",
    fontsize=16,
    fontweight="bold",
    y=0.995,
)
plt.tight_layout()
plt.show()


In [None]:
# Anomaly statistics
print("Temperature Anomaly Statistics:")
print("=" * 80)

for station in stations:
    station_anom = surface_with_anom[surface_with_anom["station_id"] == station]

    print(f"\n{station_anom['station_name'].iloc[0]} ({station_anom['latitude'].iloc[0]}°):")
    print(f"  Mean anomaly: {station_anom['anomaly'].mean():+.3f}°C")
    print(f"  Std deviation: {station_anom['anomaly'].std():.3f}°C")
    print(
        f"  Maximum warm anomaly: {station_anom['anomaly'].max():+.3f}°C on {station_anom.loc[station_anom['anomaly'].idxmax(), 'date'].strftime('%Y-%m-%d')}"
    )
    print(
        f"  Maximum cold anomaly: {station_anom['anomaly'].min():+.3f}°C on {station_anom.loc[station_anom['anomaly'].idxmin(), 'date'].strftime('%Y-%m-%d')}"
    )

    # Calculate percentage of time with warm/cold anomalies
    pct_warm = 100 * (station_anom["anomaly"] > 0).sum() / len(station_anom)
    pct_cold = 100 * (station_anom["anomaly"] < 0).sum() / len(station_anom)
    print(f"  Warm anomaly occurrence: {pct_warm:.1f}% of time")
    print(f"  Cold anomaly occurrence: {pct_cold:.1f}% of time")

In [None]:
# Anomaly heatmap by year and month
fig, axes = plt.subplots(3, 1, figsize=(14, 12))

for idx, station in enumerate(stations):
    ax = axes[idx]
    station_data = surface_with_anom[surface_with_anom["station_id"] == station]

    # Create pivot table for heatmap
    pivot_data = station_data.pivot_table(values="anomaly", index="month", columns="year")

    # Create heatmap
    sns.heatmap(
        pivot_data,
        cmap="RdBu_r",
        center=0,
        cbar_kws={"label": "Temperature Anomaly (°C)"},
        ax=ax,
        vmin=-3,
        vmax=3,
    )

    # Formatting
    ax.set_ylabel("Month", fontsize=12, fontweight="bold")
    ax.set_xlabel("Year", fontsize=12, fontweight="bold")
    ax.set_title(
        f"{station_data['station_name'].iloc[0]} ({station_data['latitude'].iloc[0]}°)",
        fontsize=13,
        fontweight="bold",
    )
    ax.set_yticklabels(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        rotation=0,
    )

plt.suptitle(
    "Temperature Anomaly Patterns: Year-Month Heatmap", fontsize=16, fontweight="bold", y=0.995
)
plt.tight_layout()
plt.show()

## Thermal Stratification Analysis

Thermal stratification describes the vertical layering of water masses by temperature and density. Strong stratification isolates surface waters from deeper layers, affecting nutrient supply and biological productivity. Seasonal mixing breaks down stratification, particularly in winter at mid-latitudes.
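
A simple bulk measure of thermal stratification is the temperature difference between the surface and a fixed deeper level. The sketch below computes it by pivoting long-format observations so that each depth becomes a column; the six observations are invented, and the column names follow those used in this notebook.

```python
import pandas as pd

# Hypothetical long-format observations for one station: two dates, three depths.
obs = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-01"] * 3 + ["2020-07-01"] * 3),
        "depth_m": [0, 100, 200] * 2,
        "temperature_c": [9.0, 8.5, 8.0, 18.0, 11.0, 8.5],
    }
)

# One row per date, one column per depth.
wide = obs.pivot(index="date", columns="depth_m", values="temperature_c")

# Stratification index: surface minus 200 m temperature.
strat_index = wide[0] - wide[200]
print(strat_index)
```

In this toy case the winter profile is nearly uniform (index 1.0 °C) while the summer profile is strongly layered (index 9.5 °C). Dates missing either depth yield NaN in the difference and can be dropped with `dropna()`.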

In [None]:
# Calculate stratification index (temperature difference surface to deep)
# Use 0m and 200m as reference depths

stratification_data = []

for station in df["station_id"].unique():
    station_df = df[df["station_id"] == station]

    # Get dates with both surface and 200m measurements
    dates = station_df["date"].unique()

    for date in dates:
        date_data = station_df[station_df["date"] == date]

        temp_0m = date_data[date_data["depth_m"] == 0]["temperature_c"].values
        temp_200m = date_data[date_data["depth_m"] == 200]["temperature_c"].values

        if len(temp_0m) > 0 and len(temp_200m) > 0:
            stratification_index = temp_0m[0] - temp_200m[0]

            stratification_data.append(
                {
                    "station_id": station,
                    "station_name": date_data["station_name"].iloc[0],
                    "latitude": date_data["latitude"].iloc[0],
                    "date": date,
                    "year": date_data["year"].iloc[0],
                    "month": date_data["month"].iloc[0],
                    "season": date_data["season"].iloc[0],
                    "temp_0m": temp_0m[0],
                    "temp_200m": temp_200m[0],
                    "stratification_index": stratification_index,
                }
            )

strat_df = pd.DataFrame(stratification_data)
strat_df["date"] = pd.to_datetime(strat_df["date"])

print("Stratification Index Calculation Complete")
print("Stratification index = Temperature(0m) - Temperature(200m)")
print("Higher values indicate stronger stratification\n")

# Plot stratification time series
fig, ax = plt.subplots(figsize=(16, 6))

for station, color in zip(stations, colors):
    station_strat = strat_df[strat_df["station_id"] == station].sort_values("date")
    ax.plot(
        station_strat["date"],
        station_strat["stratification_index"],
        linewidth=1.5,
        color=color,
        alpha=0.8,
        label=f"{station_strat['station_name'].iloc[0]} ({station_strat['latitude'].iloc[0]}°)",
    )

ax.set_xlabel("Date", fontsize=12, fontweight="bold")
ax.set_ylabel("Stratification Index (°C)", fontsize=12, fontweight="bold")
ax.set_title(
    "Thermal Stratification Time Series (0m - 200m Temperature Difference)",
    fontsize=14,
    fontweight="bold",
)
ax.legend(loc="best", framealpha=0.9, fontsize=11)
ax.grid(True, alpha=0.3, linestyle="--")
plt.tight_layout()
plt.show()

In [None]:
# Seasonal stratification patterns
seasonal_strat = (
    strat_df.groupby(["station_id", "station_name", "latitude", "season"])["stratification_index"]
    .agg(["mean", "std"])
    .reset_index()
)

fig, ax = plt.subplots(figsize=(12, 6))

x_pos = np.arange(len(season_order))
width = 0.25

for idx, (station, color) in enumerate(zip(stations, colors)):
    station_data = seasonal_strat[seasonal_strat["station_id"] == station]
    station_data = station_data.set_index("season").reindex(season_order).reset_index()

    ax.bar(
        x_pos + idx * width,
        station_data["mean"],
        width,
        yerr=station_data["std"],
        capsize=5,
        color=color,
        alpha=0.8,
        edgecolor="black",
        linewidth=1.5,
        label=f"{station_data['station_name'].iloc[0]} ({station_data['latitude'].iloc[0]}°)",
    )

ax.set_xlabel("Season", fontsize=12, fontweight="bold")
ax.set_ylabel("Stratification Index (°C)", fontsize=12, fontweight="bold")
ax.set_title("Seasonal Stratification Patterns", fontsize=14, fontweight="bold")
ax.set_xticks(x_pos + width)
ax.set_xticklabels(season_order)
ax.legend(loc="best", framealpha=0.9, fontsize=10)
ax.grid(True, alpha=0.3, linestyle="--", axis="y")
plt.tight_layout()
plt.show()

print("\nSeasonal Stratification Statistics:")
print("=" * 80)
for station in stations:
    station_data = seasonal_strat[seasonal_strat["station_id"] == station]
    print(f"\n{station_data['station_name'].iloc[0]} ({station_data['latitude'].iloc[0]}°):")
    print(f"{'Season':<12} {'Mean (°C)':<15} {'Std Dev (°C)'}")
    print("-" * 45)
    for season in season_order:
        season_data = station_data[station_data["season"] == season]
        if not season_data.empty:
            print(
                f"{season:<12} {season_data['mean'].values[0]:>8.2f}       {season_data['std'].values[0]:>8.2f}"
            )


In [None]:
# Mixed layer depth estimation (simplified)
# Define MLD as the shallowest sampled depth where temperature differs from the surface by at least 0.5°C


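def estimate_mld(profile_data, temp_criterion=0.5):
    """Estimate mixed layer depth (m) using a temperature criterion.

    Expects a single profile sorted by increasing depth, so the first row is
    the surface. Returns the shallowest sampled depth whose temperature
    differs from the surface by at least ``temp_criterion`` (°C), or the
    deepest sampled depth if the criterion is never met.
    """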
    if len(profile_data) < 2:
        return np.nan

    surface_temp = profile_data["temperature_c"].iloc[0]

    for _idx, row in profile_data.iterrows():
        if abs(row["temperature_c"] - surface_temp) >= temp_criterion:
            return row["depth_m"]

    return profile_data["depth_m"].max()


# Calculate MLD for each station and date
mld_data = []

for station in df["station_id"].unique():
    station_df = df[df["station_id"] == station]

    for date in station_df["date"].unique():
        date_data = station_df[station_df["date"] == date].sort_values("depth_m")

        if len(date_data) >= 2:
            mld = estimate_mld(date_data)

            mld_data.append(
                {
                    "station_id": station,
                    "station_name": date_data["station_name"].iloc[0],
                    "latitude": date_data["latitude"].iloc[0],
                    "date": date,
                    "month": date_data["month"].iloc[0],
                    "season": date_data["season"].iloc[0],
                    "mld": mld,
                }
            )

mld_df = pd.DataFrame(mld_data)
mld_df["date"] = pd.to_datetime(mld_df["date"])

# Plot MLD time series
fig, ax = plt.subplots(figsize=(16, 6))

for station, color in zip(stations, colors):
    station_mld = mld_df[mld_df["station_id"] == station].sort_values("date")
    ax.plot(
        station_mld["date"],
        station_mld["mld"],
        linewidth=1.5,
        color=color,
        alpha=0.8,
        label=f"{station_mld['station_name'].iloc[0]} ({station_mld['latitude'].iloc[0]}°)",
    )

ax.invert_yaxis()
ax.set_xlabel("Date", fontsize=12, fontweight="bold")
ax.set_ylabel("Mixed Layer Depth (m)", fontsize=12, fontweight="bold")
ax.set_title(
    "Mixed Layer Depth Time Series (0.5°C Temperature Criterion)", fontsize=14, fontweight="bold"
)
ax.legend(loc="best", framealpha=0.9, fontsize=11)
ax.grid(True, alpha=0.3, linestyle="--")
plt.tight_layout()
plt.show()

print("\nMixed Layer Depth Statistics:")
print("=" * 80)
mld_stats = (
    mld_df.groupby(["station_id", "station_name", "latitude"])["mld"]
    .agg(["mean", "std", "min", "max"])
    .round(1)
)
print(mld_stats)
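
With only five sampled depths, the threshold method above can only return one of the sampled levels. One possible refinement, sketched here under the assumption that temperature varies linearly between adjacent levels and that the criterion is positive, interpolates the depth at which the profile first departs from the surface value by the criterion. The function name `estimate_mld_interp` and the sample profile are hypothetical and not part of the original analysis.

```python
import numpy as np


def estimate_mld_interp(depths, temps, temp_criterion=0.5):
    """Depth where |T - T_surface| first reaches temp_criterion (linear interpolation)."""
    depths = np.asarray(depths, dtype=float)
    temps = np.asarray(temps, dtype=float)
    if depths.size < 2:
        return np.nan

    diff = np.abs(temps - temps[0])
    crossed = np.nonzero(diff >= temp_criterion)[0]
    if crossed.size == 0:
        # Criterion never met: treat the profile as mixed down to the deepest sample.
        return depths[-1]

    i = crossed[0]  # first level at or beyond the criterion (i >= 1 because diff[0] == 0)
    # Fraction of the way from level i-1 to level i at which the criterion is met.
    frac = (temp_criterion - diff[i - 1]) / (diff[i] - diff[i - 1])
    return depths[i - 1] + frac * (depths[i] - depths[i - 1])


# Made-up profile sorted by depth: 20.0 °C at the surface, cooling downward.
mld = estimate_mld_interp([0, 50, 100, 200, 500], [20.0, 19.8, 18.8, 15.0, 8.0])
print(f"Interpolated MLD: {mld:.1f} m")
```

For this made-up profile the level-based `estimate_mld` would report 100 m, while the interpolated estimate falls between the 50 m and 100 m samples. Whether linear interpolation is justified at such coarse vertical resolution is a judgment call.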

## Water Mass Properties Analysis

Water masses are identified by their characteristic temperature-salinity (T-S) relationships. T-S diagrams reveal the origin and mixing of water masses. Dissolved oxygen patterns reflect biological activity, ventilation age, and circulation pathways.
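
T-S diagrams are usually read with density in mind, because temperature and salinity together set how heavy a water parcel is. The following is a minimal sketch using a linearized equation of state. The reference values and coefficients are typical orders of magnitude chosen for illustration, not values derived from this dataset, and a full analysis would use the TEOS-10 equation of state instead.

```python
RHO0 = 1027.0   # reference density, kg/m^3 (assumed)
T0 = 10.0       # reference temperature, °C (assumed)
S0 = 35.0       # reference salinity, PSU (assumed)
ALPHA = 2.0e-4  # thermal expansion coefficient, 1/°C (typical order of magnitude)
BETA = 7.6e-4   # haline contraction coefficient, 1/PSU (typical order of magnitude)


def linear_density(temperature_c, salinity_psu):
    """Approximate seawater density (kg/m^3) from a linearized equation of state."""
    return RHO0 * (1.0 - ALPHA * (temperature_c - T0) + BETA * (salinity_psu - S0))


warm_fresh = linear_density(25.0, 34.0)
cold_salty = linear_density(5.0, 35.0)
print(f"warm, fresher water: {warm_fresh:.2f} kg/m^3")
print(f"cold, saltier water: {cold_salty:.2f} kg/m^3")
```

Under this approximation, warming lowers density and added salt raises it, which is why points in different corners of a T-S diagram correspond to water that sits at different depths.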

In [None]:
# Temperature-Salinity (T-S) diagram
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

for idx, station in enumerate(stations):
    ax = axes[idx]
    station_data = df[df["station_id"] == station]

    # Create scatter plot colored by depth
    scatter = ax.scatter(
        station_data["salinity_psu"],
        station_data["temperature_c"],
        c=station_data["depth_m"],
        cmap="viridis_r",
        s=30,
        alpha=0.6,
        edgecolors="black",
        linewidth=0.3,
    )

    # Add colorbar
    cbar = plt.colorbar(scatter, ax=ax)
    cbar.set_label("Depth (m)", fontsize=11, fontweight="bold")

    # Formatting
    ax.set_xlabel("Salinity (PSU)", fontsize=12, fontweight="bold")
    ax.set_ylabel("Temperature (°C)", fontsize=12, fontweight="bold")
    ax.set_title(
        f"{station_data['station_name'].iloc[0]}\n{station_data['latitude'].iloc[0]}°",
        fontsize=13,
        fontweight="bold",
    )
    ax.grid(True, alpha=0.3, linestyle="--")

plt.suptitle(
    "Temperature-Salinity (T-S) Diagrams by Station", fontsize=16, fontweight="bold", y=1.00
)
plt.tight_layout()
plt.show()

print("T-S Characteristics by Station:")
print("=" * 80)
for station in stations:
    station_data = df[df["station_id"] == station]
    print(f"\n{station_data['station_name'].iloc[0]} ({station_data['latitude'].iloc[0]}°):")
    print(
        f"  Temperature range: {station_data['temperature_c'].min():.2f} to {station_data['temperature_c'].max():.2f}°C"
    )
    print(
        f"  Salinity range: {station_data['salinity_psu'].min():.2f} to {station_data['salinity_psu'].max():.2f} PSU"
    )
    print(
        f"  Mean T: {station_data['temperature_c'].mean():.2f}°C, Mean S: {station_data['salinity_psu'].mean():.2f} PSU"
    )


In [None]:
# T-S diagram with seasonal coloring
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

season_colors_map = {
    "Winter": "#3A86FF",
    "Spring": "#8338EC",
    "Summer": "#FB5607",
    "Fall": "#FFBE0B",
}

for idx, station in enumerate(stations):
    ax = axes[idx]
    station_data = df[df["station_id"] == station]

    # Plot by season
    for season in season_order:
        season_data = station_data[station_data["season"] == season]
        ax.scatter(
            season_data["salinity_psu"],
            season_data["temperature_c"],
            color=season_colors_map[season],
            s=20,
            alpha=0.5,
            label=season,
            edgecolors="none",
        )

    # Formatting
    ax.set_xlabel("Salinity (PSU)", fontsize=12, fontweight="bold")
    ax.set_ylabel("Temperature (°C)", fontsize=12, fontweight="bold")
    ax.set_title(
        f"{station_data['station_name'].iloc[0]}\n{station_data['latitude'].iloc[0]}°",
        fontsize=13,
        fontweight="bold",
    )
    ax.legend(loc="best", framealpha=0.9, fontsize=9)
    ax.grid(True, alpha=0.3, linestyle="--")

plt.suptitle(
    "T-S Diagrams: Seasonal Water Mass Variability", fontsize=16, fontweight="bold", y=1.00
)
plt.tight_layout()
plt.show()

In [None]:
# Dissolved oxygen analysis
# Create vertical O2 profiles
o2_profile_data = (
    df.groupby(["station_id", "station_name", "latitude", "depth_m"])["dissolved_oxygen_ml_l"]
    .agg(["mean", "std"])
    .reset_index()
)

fig, axes = plt.subplots(1, 3, figsize=(15, 6), sharey=True)

for idx, (station, color) in enumerate(zip(stations, colors)):
    ax = axes[idx]
    station_data = o2_profile_data[o2_profile_data["station_id"] == station]

    # Plot mean O2 profile
    ax.plot(
        station_data["mean"],
        station_data["depth_m"],
        marker="o",
        linewidth=2.5,
        markersize=8,
        color=color,
        label="Mean",
    )

    # Add standard deviation shading
    ax.fill_betweenx(
        station_data["depth_m"],
        station_data["mean"] - station_data["std"],
        station_data["mean"] + station_data["std"],
        alpha=0.3,
        color=color,
        label="±1 SD",
    )

    # Formatting
    ax.invert_yaxis()
    ax.set_xlabel("Dissolved Oxygen (mL/L)", fontsize=12, fontweight="bold")
    ax.set_title(
        f"{station_data['station_name'].iloc[0]}\n{station_data['latitude'].iloc[0]}°",
        fontsize=13,
        fontweight="bold",
    )
    ax.grid(True, alpha=0.3, linestyle="--")
    ax.legend(loc="best", framealpha=0.9)

axes[0].set_ylabel("Depth (m)", fontsize=12, fontweight="bold")

plt.suptitle("Dissolved Oxygen Vertical Profiles", fontsize=16, fontweight="bold", y=1.02)
plt.tight_layout()
plt.show()

print("Dissolved Oxygen Profile Summary:")
print(o2_profile_data.round(2))


In [None]:
# Temperature-Oxygen relationship
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

for idx, station in enumerate(stations):
    ax = axes[idx]
    station_data = df[df["station_id"] == station]

    # Create scatter plot colored by depth
    scatter = ax.scatter(
        station_data["temperature_c"],
        station_data["dissolved_oxygen_ml_l"],
        c=station_data["depth_m"],
        cmap="viridis_r",
        s=30,
        alpha=0.6,
        edgecolors="black",
        linewidth=0.3,
    )

    # Add colorbar
    cbar = plt.colorbar(scatter, ax=ax)
    cbar.set_label("Depth (m)", fontsize=11, fontweight="bold")

    # Formatting
    ax.set_xlabel("Temperature (°C)", fontsize=12, fontweight="bold")
    ax.set_ylabel("Dissolved Oxygen (mL/L)", fontsize=12, fontweight="bold")
    ax.set_title(
        f"{station_data['station_name'].iloc[0]}\n{station_data['latitude'].iloc[0]}°",
        fontsize=13,
        fontweight="bold",
    )
    ax.grid(True, alpha=0.3, linestyle="--")

plt.suptitle("Temperature-Oxygen Relationships", fontsize=16, fontweight="bold", y=1.00)
plt.tight_layout()
plt.show()

print("\nTemperature-Oxygen Correlation by Station:")
print("=" * 80)
for station in stations:
    station_data = df[df["station_id"] == station]
    correlation = station_data[["temperature_c", "dissolved_oxygen_ml_l"]].corr().iloc[0, 1]
    print(f"{station_data['station_name'].iloc[0]}: r = {correlation:.3f}")
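
A correlation coefficient on its own says nothing about its uncertainty. As a hedged sketch, the Fisher z-transform gives an approximate 95% confidence interval for Pearson's r, assuming roughly bivariate-normal, independent samples (an assumption that autocorrelated ocean time series may well violate). The helper name `pearson_r_with_ci` and the synthetic data are hypothetical.

```python
import numpy as np


def pearson_r_with_ci(x, y, z_crit=1.96):
    """Pearson r with an approximate confidence interval via the Fisher z-transform."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 paired samples")
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)            # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)    # approximate standard error of z
    return r, np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)


# Synthetic data: oxygen decreasing with temperature plus noise (illustrative only).
rng = np.random.default_rng(0)
temperature = rng.uniform(5.0, 25.0, size=200)
oxygen = 8.0 - 0.15 * temperature + rng.normal(0.0, 0.3, size=200)

r, ci_low, ci_high = pearson_r_with_ci(temperature, oxygen)
print(f"r = {r:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

With serially correlated observations the effective sample size is smaller than the row count, so an interval computed this way should be read as optimistic.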


## Summary Statistics and Conclusions

This section presents a statistical summary of oceanographic conditions at each station, covering all depths and the full observation period.

In [None]:
# Comprehensive summary by station
print("COMPREHENSIVE OCEANOGRAPHIC SUMMARY")
print("=" * 80)
print(
    f"\nDataset: {len(df):,} observations from {df['date'].min().strftime('%Y-%m-%d')} to {df['date'].max().strftime('%Y-%m-%d')}"
)
print(f"Stations: {len(stations)}")
print(f"Depth levels: {sorted(df['depth_m'].unique())}")
print(f"Duration: {df['year'].nunique()} years\n")

for station in stations:
    station_data = df[df["station_id"] == station]
    surface_data_st = station_data[station_data["depth_m"] == 0]

    print("=" * 80)
    print(f"STATION: {station_data['station_name'].iloc[0]}")
    print(
        f"Location: {station_data['latitude'].iloc[0]}° lat, {station_data['longitude'].iloc[0]}° lon"
    )
    print("=" * 80)

    print("\n1. TEMPERATURE CHARACTERISTICS")
    print("-" * 80)
    print(
        f"  Overall range: {station_data['temperature_c'].min():.2f}°C to {station_data['temperature_c'].max():.2f}°C"
    )
    print(
        f"  Mean temperature: {station_data['temperature_c'].mean():.2f}°C (±{station_data['temperature_c'].std():.2f}°C)"
    )
    print(f"  Surface mean: {surface_data_st['temperature_c'].mean():.2f}°C")
    print(
        f"  Deep (500m) mean: {station_data[station_data['depth_m'] == 500]['temperature_c'].mean():.2f}°C"
    )

    print("\n2. SALINITY CHARACTERISTICS")
    print("-" * 80)
    print(
        f"  Overall range: {station_data['salinity_psu'].min():.2f} to {station_data['salinity_psu'].max():.2f} PSU"
    )
    print(
        f"  Mean salinity: {station_data['salinity_psu'].mean():.2f} PSU (±{station_data['salinity_psu'].std():.2f} PSU)"
    )

    print("\n3. DISSOLVED OXYGEN")
    print("-" * 80)
    print(
        f"  Overall range: {station_data['dissolved_oxygen_ml_l'].min():.2f} to {station_data['dissolved_oxygen_ml_l'].max():.2f} mL/L"
    )
    print(
        f"  Mean O2: {station_data['dissolved_oxygen_ml_l'].mean():.2f} mL/L (±{station_data['dissolved_oxygen_ml_l'].std():.2f} mL/L)"
    )

    print("\n4. SEASONAL CYCLE (Surface)")
    print("-" * 80)
    seasonal_stats = surface_data_st.groupby("season")["temperature_c"].agg(["mean", "std"])
    for season in season_order:
        if season in seasonal_stats.index:
            print(
                f"  {season:<8}: {seasonal_stats.loc[season, 'mean']:>6.2f}°C (±{seasonal_stats.loc[season, 'std']:.2f}°C)"
            )

    print("\n5. TRENDS AND CHANGES")
    print("-" * 80)
    # Get trend results for this station
    station_trend = next(t for t in trend_results if t["station"] == station)
    print(f"  Warming rate: {station_trend['slope']:.4f}°C/year {station_trend['significance']}")
    print(f"  10-year change: {station_trend['decadal_change']:+.3f}°C")
    print(f"  Statistical significance: p = {station_trend['p_value']:.4f}")

    print("\n6. VERTICAL STRUCTURE")
    print("-" * 80)
    depth_temps = station_data.groupby("depth_m")["temperature_c"].mean()
    for depth in sorted(station_data["depth_m"].unique()):
        print(f"  {depth:>4.0f}m: {depth_temps[depth]:>6.2f}°C")

    print("\n")
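
The summary above reports both a warming rate in °C/year and a 10-year change. As a minimal sketch, assuming the trend comes from an ordinary least-squares line through annual means and that the 10-year change is simply the slope multiplied by ten (the construction of `trend_results` lies outside this section, so both are assumptions), the two numbers relate as follows. The synthetic series is illustrative only.

```python
import numpy as np

# Synthetic annual-mean SST: 0.02 °C/year trend plus noise (illustrative values only).
rng = np.random.default_rng(42)
years = np.arange(2013, 2023)
elapsed = years - years[0]  # years since the start of the record
sst = 15.0 + 0.02 * elapsed + rng.normal(0.0, 0.05, size=years.size)

# Least-squares straight line: slope in °C/year, intercept in °C.
slope, intercept = np.polyfit(elapsed, sst, deg=1)
decadal_change = slope * 10  # assumed definition: change over 10 years at this rate

print(f"slope = {slope:.4f} °C/year, 10-year change = {decadal_change:+.3f} °C")
```

A ten-point fit like this has a wide uncertainty on the slope, which is why the p-value printed alongside the trend matters when interpreting it.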

In [None]:
# Create final comprehensive comparison visualization
fig = plt.figure(figsize=(18, 12))
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)

# 1. Temperature profiles
ax1 = fig.add_subplot(gs[0, 0])
for station, color in zip(stations, colors):
    station_prof = profile_data[profile_data["station_id"] == station]
    ax1.plot(
        station_prof["mean"],
        station_prof["depth_m"],
        marker="o",
        linewidth=2,
        color=color,
        label=station_prof["station_name"].iloc[0],
    )
ax1.invert_yaxis()
ax1.set_xlabel("Temperature (°C)", fontweight="bold")
ax1.set_ylabel("Depth (m)", fontweight="bold")
ax1.set_title("Temperature Profiles", fontweight="bold")
ax1.legend(fontsize=8)
ax1.grid(True, alpha=0.3)

# 2. Seasonal cycle
ax2 = fig.add_subplot(gs[0, 1])
for station, color in zip(stations, colors):
    station_monthly = monthly_clim[monthly_clim["station_id"] == station].sort_values("month")
    ax2.plot(
        station_monthly["month"],
        station_monthly["mean"],
        marker="o",
        linewidth=2,
        color=color,
        label=station_monthly["station_name"].iloc[0],
    )
ax2.set_xlabel("Month", fontweight="bold")
ax2.set_ylabel("SST (°C)", fontweight="bold")
ax2.set_title("Seasonal Cycle", fontweight="bold")
ax2.set_xticks([1, 3, 5, 7, 9, 11])
ax2.legend(fontsize=8)
ax2.grid(True, alpha=0.3)

# 3. Warming trends
ax3 = fig.add_subplot(gs[0, 2])
station_names = [t["station_name"][:10] for t in trend_results]
warming_rates = [t["decadal_change"] for t in trend_results]
significance_colors = [
    "red" if t["significance"] in ["*", "**", "***"] else "gray" for t in trend_results
]
ax3.bar(
    range(len(stations)), warming_rates, color=significance_colors, edgecolor="black", linewidth=1.5
)
ax3.axhline(y=0, color="black", linestyle="-", linewidth=1)
ax3.set_xticks(range(len(stations)))
ax3.set_xticklabels(station_names, rotation=15, ha="right")
ax3.set_ylabel("10-year Change (°C)", fontweight="bold")
ax3.set_title("Warming Trends", fontweight="bold")
ax3.grid(True, alpha=0.3, axis="y")

# 4. Stratification
ax4 = fig.add_subplot(gs[1, 0])
for station, color in zip(stations, colors):
    station_strat = strat_df[strat_df["station_id"] == station].sort_values("date")
    monthly_strat = station_strat.groupby("month")["stratification_index"].mean()
    ax4.plot(
        monthly_strat.index,
        monthly_strat.values,
        marker="o",
        linewidth=2,
        color=color,
        label=station_strat["station_name"].iloc[0],
    )
ax4.set_xlabel("Month", fontweight="bold")
ax4.set_ylabel("Stratification (°C)", fontweight="bold")
ax4.set_title("Stratification Index", fontweight="bold")
ax4.legend(fontsize=8)
ax4.grid(True, alpha=0.3)

# 5. O2 profiles
ax5 = fig.add_subplot(gs[1, 1])
for station, color in zip(stations, colors):
    station_o2 = o2_profile_data[o2_profile_data["station_id"] == station]
    ax5.plot(
        station_o2["mean"],
        station_o2["depth_m"],
        marker="o",
        linewidth=2,
        color=color,
        label=station_o2["station_name"].iloc[0],
    )
ax5.invert_yaxis()
ax5.set_xlabel("O2 (mL/L)", fontweight="bold")
ax5.set_ylabel("Depth (m)", fontweight="bold")
ax5.set_title("Oxygen Profiles", fontweight="bold")
ax5.legend(fontsize=8)
ax5.grid(True, alpha=0.3)

# 6. T-S comparison
ax6 = fig.add_subplot(gs[1, 2])
for station, color in zip(stations, colors):
    station_ts = df[df["station_id"] == station]
    ax6.scatter(
        station_ts["salinity_psu"],
        station_ts["temperature_c"],
        s=5,
        alpha=0.3,
        color=color,
        label=station_ts["station_name"].iloc[0],
    )
ax6.set_xlabel("Salinity (PSU)", fontweight="bold")
ax6.set_ylabel("Temperature (°C)", fontweight="bold")
ax6.set_title("T-S Diagram", fontweight="bold")
ax6.legend(fontsize=8)
ax6.grid(True, alpha=0.3)

# 7-9. Station-specific summary statistics
for idx, station in enumerate(stations):
    ax = fig.add_subplot(gs[2, idx])
    station_data = df[df["station_id"] == station]

    # Create summary table
    summary_text = (
        f"{station_data['station_name'].iloc[0]}\n{station_data['latitude'].iloc[0]}°\n\n"
    )
    summary_text += "Temperature:\n"
    summary_text += f"  Mean: {station_data['temperature_c'].mean():.2f}°C\n"
    summary_text += f"  Range: {station_data['temperature_c'].min():.1f}-{station_data['temperature_c'].max():.1f}°C\n\n"
    summary_text += "Salinity:\n"
    summary_text += f"  Mean: {station_data['salinity_psu'].mean():.2f} PSU\n\n"
    summary_text += "Oxygen:\n"
    summary_text += f"  Mean: {station_data['dissolved_oxygen_ml_l'].mean():.2f} mL/L\n\n"

    station_trend = next(t for t in trend_results if t["station"] == station)
    summary_text += (
        f"Trend: {station_trend['decadal_change']:+.3f}°C/10yr {station_trend['significance']}"
    )

    ax.text(
        0.1,
        0.5,
        summary_text,
        transform=ax.transAxes,
        fontsize=10,
        verticalalignment="center",
        fontfamily="monospace",
        bbox={"boxstyle": "round", "facecolor": colors[idx], "alpha": 0.2},
    )
    ax.axis("off")

plt.suptitle(
    "Comprehensive Oceanographic Analysis Summary", fontsize=18, fontweight="bold", y=0.995
)
plt.show()

In [None]:
# Final key findings
print("\n" + "=" * 80)
print("KEY OCEANOGRAPHIC FINDINGS")
print("=" * 80)

print("\n1. THERMAL STRUCTURE")
print("   - All stations exhibit clear thermocline structure")
print("   - Thermocline depth varies by latitude and season")
print("   - Strongest stratification in summer months")

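print("\n2. SEASONAL VARIABILITY")
# Report the seasonal amplitude from the monthly climatology instead of fixed thresholds
for station in stations:
    station_monthly = monthly_clim[monthly_clim["station_id"] == station]
    amplitude = station_monthly["mean"].max() - station_monthly["mean"].min()
    print(
        f"   - {station_monthly['station_name'].iloc[0]}: "
        f"seasonal SST amplitude {amplitude:.1f}°C (from monthly climatology)"
    )
print("   - Seasonal cycle strongest at surface, diminishes with depth")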

print("\n3. CLIMATE TRENDS")
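all_warming = all(t["decadal_change"] > 0 for t in trend_results)
mean_change = np.mean([t["decadal_change"] for t in trend_results])
if all_warming:
    print("   - All stations show positive warming trends")
else:
    # Report the actual count so this section is never silent about trend direction
    warming_count = sum(t["decadal_change"] > 0 for t in trend_results)
    print(f"   - {warming_count}/{len(trend_results)} stations show positive warming trends")
print(f"   - Average temperature change: {mean_change:+.3f}°C per decade")
sig_count = sum(t["significance"] != "ns" for t in trend_results)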
print(f"   - {sig_count}/{len(trend_results)} stations show statistically significant trends")

print("\n4. WATER MASS CHARACTERISTICS")
print("   - Distinct T-S signatures for each station")
print("   - Temperature-oxygen relationships reflect temperature-dependent solubility and biological influence")
print("   - Vertical oxygen profiles reflect ventilation and productivity")

print("\n5. STRATIFICATION DYNAMICS")
print("   - Seasonal stratification cycle evident at mid-latitudes")
print("   - Mixed layer depth shows strong seasonal variation")
print("   - Equatorial station maintains persistent stratification")

print("\n" + "=" * 80)
print("ANALYSIS COMPLETE")
print("=" * 80)
print(f"Total observations analyzed: {len(df):,}")
print(f"Time period: {df['date'].min().strftime('%Y')} - {df['date'].max().strftime('%Y')}")
print(f"Stations: {len(stations)}")
print("Parameters: Temperature, Salinity, Dissolved Oxygen")
print("\nThis analysis demonstrates comprehensive oceanographic methods including:")
print("  - Vertical profiling and thermocline analysis")
print("  - Seasonal decomposition and climatology")
print("  - Trend detection and statistical significance testing")
print("  - Anomaly analysis and extreme event identification")
print("  - Stratification and mixing dynamics")
print("  - Water mass characterization (T-S diagrams)")
print("  - Biogeochemical cycling (dissolved oxygen)")
print("=" * 80)