# 10 Minutes to kanoa

Understanding Water Quality Trends with AI-Powered Analytics

Water quality makes a good first example: it is relatable, visual, and has real-world stakes. This example uses publicly available data from the California Harmful Algal Bloom Monitoring and Alert Program ([CalHABMAP](https://calhabmap.org/)), which samples water quality at piers along the California coast.

## Setup

First, let's import the necessary libraries and configure our environment.

In [None]:
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
from dotenv import load_dotenv

# Import kanoa
from kanoa import AnalyticsInterpreter

# Load API keys from user config
config_dir = Path.home() / ".config" / "kanoa"
if (config_dir / ".env").exists():
    load_dotenv(config_dir / ".env")

print("✓ Setup complete!")
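
Before calling a backend, it can help to confirm that the key actually loaded. The following is a minimal sketch and not part of kanoa: `missing_keys` is a hypothetical helper, and `GOOGLE_API_KEY` is the variable name this tutorial relies on for the Gemini backend.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    Hypothetical helper for illustration; kanoa does not provide it.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# GOOGLE_API_KEY is the variable this tutorial expects for the Gemini backend
missing = missing_keys(["GOOGLE_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required API keys are set.")
```

If a key is reported missing, check that `~/.config/kanoa/.env` exists and defines it, or export the variable in your shell before starting Jupyter.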

## Load Water Quality Data

We'll use data from Scripps Pier in La Jolla, CA. This pier has been monitored weekly since 2008, providing temperature, chlorophyll, nutrients, and harmful algae counts.

**Data Source**: [SCCOOS ERDDAP](https://erddap.sccoos.org) - California HAB Monitoring and Alert Program

In [None]:
"""
Analyzing coastal water quality at Scripps Pier, La Jolla, CA
Data: California HAB Monitoring and Alert Program (CalHABMAP)
Source: SCCOOS ERDDAP - https://erddap.sccoos.org
"""

# Load water quality data from Scripps Pier (weekly monitoring since 2008)
# Data includes: temperature, chlorophyll, nutrients, harmful algae counts
url = (
    "https://erddap.sccoos.org/erddap/tabledap/HABs-ScrippsPier.csv"
    "?time,Temp,Avg_Chloro,no3,nh4,po4,Pseudo_nitzschia_seriata_group"
    "&time>=2024-01-01&time<=2024-10-31"
)

# Read the data, skipping the units row
df = pd.read_csv(url, skiprows=[1])
df["time"] = pd.to_datetime(df["time"])

print(f"Loaded {len(df)} water quality observations from 2024")
print("\nFirst few rows:")
df.head()
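
Monitoring records often have gaps, so a quick look at missing values before plotting can explain breaks in the lines. The sketch below assumes missing measurements arrive as NaN after `pd.read_csv`; `summarize_missing` is a hypothetical helper, and the sample values are made up purely to show the shape of the output.

```python
import pandas as pd


def summarize_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    n_missing = frame.isna().sum()
    pct_missing = (n_missing / len(frame) * 100).round(1)
    return pd.DataFrame({"n_missing": n_missing, "pct_missing": pct_missing})


# Made-up values for illustration only; these are not real Scripps Pier data
sample = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"]
        ),
        "Temp": [15.0, None, 15.4, 15.1],
        "Avg_Chloro": [1.2, 0.9, None, None],
    }
)

report = summarize_missing(sample)
print(report)
```

On the real data, `summarize_missing(df)` would give the same per-column report for the loaded observations.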

## Visualize the Data

Create a multi-panel visualization showing:
- **Panel 1**: Water temperature (seasonal patterns)
- **Panel 2**: Chlorophyll-a (algal biomass indicator)
- **Panel 3**: Harmful algae (*Pseudo-nitzschia*, a diatom genus that can produce the neurotoxin domoic acid)

In [None]:
# Create a multi-panel visualization
fig, axes = plt.subplots(3, 1, figsize=(12, 10), sharex=True)

# Panel 1: Water temperature
axes[0].plot(df["time"], df["Temp"], "b-", linewidth=1.5)
axes[0].set_ylabel("Temperature (°C)", fontsize=11)
axes[0].set_title("Scripps Pier Water Quality - 2024", fontsize=14, fontweight="bold")
axes[0].grid(alpha=0.3)

# Panel 2: Chlorophyll (algal biomass indicator)
axes[1].fill_between(df["time"], 0, df["Avg_Chloro"], alpha=0.4, color="green")
axes[1].plot(df["time"], df["Avg_Chloro"], "g-", linewidth=1.5)
axes[1].set_ylabel("Chlorophyll-a (mg/m³)", fontsize=11)
axes[1].grid(alpha=0.3)

# Panel 3: Harmful algae (Pseudo-nitzschia - produces domoic acid neurotoxin)
axes[2].bar(
    df["time"], df["Pseudo_nitzschia_seriata_group"], width=5, color="red", alpha=0.7
)
axes[2].set_ylabel("Pseudo-nitzschia\n(cells/L)", fontsize=11)
axes[2].set_xlabel("Date", fontsize=11)
axes[2].grid(alpha=0.3)

plt.tight_layout()
plt.show()

## Interpret with kanoa

This is where kanoa shines. Instead of manually analyzing the patterns, let's get an AI-powered interpretation.

**Note**: Make sure you have set up your API key. See the [Authentication Guide](https://github.com/lhzn-io/kanoa/blob/main/docs/source/user_guide/authentication.md) for details.

In [None]:
# Initialize the interpreter with an explicit Gemini backend
# Ensure GOOGLE_API_KEY is set in your environment
interpreter = AnalyticsInterpreter(backend="gemini-3")

# Get AI-powered interpretation of the water quality trends
result = interpreter.interpret(
    fig=fig,
    context="Coastal water quality monitoring at Scripps Pier, La Jolla, CA",
    focus="Identify any concerning patterns, potential HAB events, and seasonal trends",
)

# The result is automatically displayed in Jupyter notebooks
# You can also access the text directly:
# print(result.text)

## What You Get

kanoa analyzes the visualization and returns a structured interpretation including:

- **Visual Summary**: Description of the data patterns and ranges
- **Key Observations**: Identification of important trends and events
- **Technical Interpretation**: Scientific explanation of the observed patterns
- **Recommendations**: Actionable next steps for further analysis

The interpretation is grounded in the context you provide and focuses on the aspects you specify.

## Why This Example Works

1. **Universal relevance**: Most people drink tap water, visit beaches, or care about ocean health
2. **Real public data**: Actual monitoring data freely available via SCCOOS ERDDAP
3. **Visual + quantitative**: Multiple data types in one figure (temp, chlorophyll, cell counts)
4. **Stakes are clear**: Harmful algal blooms cause real problems (the 2014 Toledo drinking-water crisis, marine mammal deaths from domoic acid poisoning)
5. **Actionable output**: The interpretation suggests concrete next steps

## Additional Data Sources

### Other California Piers

All CalHABMAP stations use the same ERDDAP pattern:

In [None]:
# Example: Load data from multiple piers
base_url = "https://erddap.sccoos.org/erddap/tabledap"
datasets = [
    "HABs-ScrippsPier",  # San Diego
    "HABs-NewportBeachPier",  # Orange County
    "HABs-SantaMonicaPier",  # LA
    "HABs-StearnsWharf",  # Santa Barbara
    "HABs-CalPolyPier",  # San Luis Obispo
    "HABs-MontereyWharf",  # Monterey
    "HABs-SantaCruzWharf",  # Santa Cruz
]

# Example query for any station:
# url = f"{base_url}/{dataset}.csv?time,Temp,Avg_Chloro&time>=2024-01-01"
# df = pd.read_csv(url, skiprows=[1])
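
The commented query above can be wrapped in a small function so the same request is built for every station. This is a minimal sketch: `build_station_url` is a hypothetical helper, not part of kanoa or ERDDAP, and it assumes each station exposes the requested variables under the same column names, which is worth verifying per dataset.

```python
BASE_URL = "https://erddap.sccoos.org/erddap/tabledap"


def build_station_url(dataset, variables, start, end=None, base_url=BASE_URL):
    """Build an ERDDAP tabledap CSV URL for one station.

    Follows the query pattern used earlier in this notebook:
    comma-separated variables, then time constraints joined with '&'.
    """
    query = ",".join(variables) + f"&time>={start}"
    if end is not None:
        query += f"&time<={end}"
    return f"{base_url}/{dataset}.csv?{query}"


# A subset of the stations listed above
stations = ["HABs-ScrippsPier", "HABs-SantaMonicaPier", "HABs-StearnsWharf"]

# Map each station to its query URL (column availability is an assumption)
urls = {
    name: build_station_url(name, ["time", "Temp", "Avg_Chloro"], "2024-01-01")
    for name in stations
}

for name, station_url in urls.items():
    print(name, "->", station_url)
```

Each URL can then be passed to `pd.read_csv(station_url, skiprows=[1])`, as in the Scripps Pier example.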

### Alternative Water Quality Datasets

The same workflow (load the data, plot it, interpret the figure) applies to other water quality monitoring programs, though each one has its own data access format:

- **Lake Erie HAB Monitoring**: [NOAA GLERL](https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.nodc:GLERL-CIGLR-HAB-LakeErie-water-qual)
- **EPA National Water Quality**: [National Aquatic Resource Surveys](https://www.epa.gov/national-aquatic-resource-surveys)
- **USGS Water Quality**: [Water Quality Portal](https://www.waterqualitydata.us/)

## Next Steps

Now that you've seen kanoa in action, explore more features:

- Try different backends (Claude, OpenAI)
- Add a knowledge base with domain-specific documents
- Interpret DataFrames directly
- Track costs and token usage

Check out the [User Guide](https://github.com/lhzn-io/kanoa/blob/main/docs/source/user_guide/index.md) for more advanced examples.