# **Wetland Monitoring and Alert System**

### **Goal**
Develop a machine learning workflow to monitor and protect the wetland ecosystem by:
1. Predicting a **Water Quality Index (WQI)** based on numerical environmental variables.
2. Triggering **Early Warning Alerts** for abnormal or dangerous water/air conditions.

The system focuses on numerical sensor data (water and air quality) to generate actionable insights for ecosystem preservation.

#### **Data Inputs**
The workflow uses the following numerical variables:

| Feature | Source / Description |
|---------|--------------------|
| Water Temperature (°C) | Thermometer / Submersible probe |
| pH Level | External pH sensor |
| Dissolved Oxygen (mg/L) | Water quality dataset/API |
| Turbidity (NTU) | Water quality sensor |
| Nitrate & Phosphate (mg/L) | Open environmental datasets |
| Water Surface Temperature Gradient | Derived from IR sensor |
| Air Temperature (°C) | Onboard thermometers |
| Humidity (%) | Air sensor |
| Air Quality Index (AQI) | AIR sensor / OpenAQ API |
| Temporal Change Rate | Derived from sequential readings |
| Pollution Correlation Index | Derived (AQI + Nitrate + Phosphate + Turbidity) |

### **Pipeline & Code Structure**

#### 1. Data Preprocessing
- Load sensor and derived numerical data
- Normalize features (MinMaxScaler / StandardScaler)
- Handle missing values and outliers
- Compute temporal change rates
- Compute derived features (Pollution Correlation Index, WQI target for regression)

### 2. Water Quality Index Model
- **Input:** Water variables + temporal change rates
- **Model:** Linear Regression / Dense Neural Network / XGBoost
- **Output:** WQI score (0–100)
- **Evaluation:** RMSE, MAE, trend correlation

#### 3. Early Warning Alerts Model
- **Input:** Predicted WQI + all water & air variables + derived indices
- **Model:** Logistic Regression / Autoencoder / XGBoost Classifier
- **Output:** Binary alert (Normal / Alert) + severity score
- **Evaluation:** Confusion Matrix, Precision, Recall, F1-score

#### 4. Visualization & Reporting
- Plot WQI trends over time
- Visualize anomalies and triggered alerts
- Export evaluation metrics

# **Data Collection and Processing**

In [4]:
import requests
import pandas as pd
from datetime import datetime, timedelta

# -------------------------------
# Define time windows
# -------------------------------
time_windows = {
    "morning": (6, 12),   # 6:00 - 11:59
    "evening": (12, 18),  # 12:00 - 17:59
    "night":   (18, 24)   # 18:00 - 23:59
}

# -------------------------------
# 1. Fetch Weather Data (Open-Meteo)
# -------------------------------
def fetch_weather_simple(lat=25.4052, lon=55.5136, days=3):
    start = datetime.utcnow().date() - timedelta(days=days-1)
    end = datetime.utcnow().date()
    
    url = (
        f"https://api.open-meteo.com/v1/forecast?"
        f"latitude={lat}&longitude={lon}&hourly=temperature_2m,relative_humidity_2m,cloudcover,windspeed_10m"
        f"&start={start}&end={end}&timezone=GMT"
    )
    response = requests.get(url).json()
    
    df = pd.DataFrame(response.get("hourly", {}))
    if df.empty or "time" not in df.columns:
        return pd.DataFrame()
    
    df["timestamp"] = pd.to_datetime(df["time"])
    df = df.drop(columns=["time"])
    
    # Assign coarse time windows
    df["hour"] = df["timestamp"].dt.hour
    df["time_window"] = df["hour"].apply(lambda h: next((tw for tw, (start, end) in time_windows.items() if start <= h < end), "night"))
    
    # Aggregate by day + time_window
    df["date"] = df["timestamp"].dt.date
    agg_df = df.groupby(["date", "time_window"]).mean().reset_index()
    return agg_df

# -------------------------------
# 2. Fetch AQI Data (OpenAQ) and average by day
# -------------------------------
def fetch_aqi_simple(city="Ajman", days=3, limit=100):
    url = f"https://api.openaq.org/v2/latest?city={city}&limit={limit}"
    response = requests.get(url).json()
    
    data = []
    for result in response.get("results", []):
        measurements = {m["parameter"]: m["value"] for m in result.get("measurements", [])}
        timestamp = result.get("measurements")[0]["lastUpdated"] if result.get("measurements") else None
        if timestamp:
            data.append({
                "location": result.get("location"),
                "AQI_PM2_5": measurements.get("pm25"),
                "AQI_PM10": measurements.get("pm10"),
                "AQI_NO2": measurements.get("no2"),
                "AQI_O3": measurements.get("o3"),
                "timestamp": pd.to_datetime(timestamp)
            })
    df = pd.DataFrame(data)
    if df.empty:
        return pd.DataFrame()
    
    df["date"] = df["timestamp"].dt.date
    df["hour"] = df["timestamp"].dt.hour
    df["time_window"] = df["hour"].apply(lambda h: next((tw for tw, (start, end) in time_windows.items() if start <= h < end), "night"))
    
    agg_df = df.groupby(["date", "time_window"]).mean().reset_index()
    return agg_df

# -------------------------------
# 3. Combine Weather + AQI
# -------------------------------
weather_df = fetch_weather_simple()
aqi_df = fetch_aqi_simple()

if not weather_df.empty and not aqi_df.empty:
    combined_df = pd.merge(weather_df, aqi_df, on=["date", "time_window"], how="outer")
else:
    combined_df = weather_df if not weather_df.empty else aqi_df

print(combined_df.head())


  start = datetime.utcnow().date() - timedelta(days=days-1)
  end = datetime.utcnow().date()


         date time_window  temperature_2m  relative_humidity_2m  cloudcover  \
0  2025-11-08     evening       28.166667             52.333333    0.000000   
1  2025-11-08     morning       28.066667             45.666667    0.000000   
2  2025-11-08       night       26.558333             55.583333    0.000000   
3  2025-11-09     evening       28.066667             54.333333   17.166667   
4  2025-11-09     morning       27.616667             56.666667    3.000000   

   windspeed_10m           timestamp  hour  
0      15.050000 2025-11-08 14:30:00  14.5  
1      10.116667 2025-11-08 08:30:00   8.5  
2       9.441667 2025-11-08 11:30:00  11.5  
3      16.050000 2025-11-09 14:30:00  14.5  
4      10.766667 2025-11-09 08:30:00   8.5  


## **Water Quality Index**

**Goal** : Predict a single index representing water quality using environmental inputs

- Inputs (features) you currently have :
- Air Temperature (°C)
- Humidity (%)
- AQI values (PM2.5, PM10, NO2, O3)
- Wind speed, cloud cover, light intensity (if available / simulated)

**Target** : Water Quality Index (WQI) — initially you may simulate WQI using known formulas from pH, DO, turbidity, nitrate, phosphate if those variables are missing. Later, replace with real sensor measurements.

**Model type** :

- Linear Regression or Random Forest Regression for WQI prediction.
- Could later upgrade to Gradient Boosting / XGBoost for better performance on small datasets.

**Expected result** :

- A numerical WQI score for each day/time-window.
- You can evaluate with RMSE or MAE to see how accurate your predictions are compared to ground truth or simulated WQI.