# **Wetland Monitoring and Alert System**

### **Goal**
Develop a machine learning workflow to monitor and protect the wetland ecosystem by:
1. Predicting a **Water Quality Index (WQI)** based on numerical environmental variables.
2. Triggering **Early Warning Alerts** for abnormal or dangerous water/air conditions.

The system focuses on numerical sensor data (water and air quality) to generate actionable insights for ecosystem preservation.

#### **Data Inputs**
The workflow uses the following numerical variables:

| Feature | Source / Description |
|---------|--------------------|
| Water Temperature (°C) | Thermometer / Submersible probe |
| pH Level | External pH sensor |
| Dissolved Oxygen (mg/L) | Water quality dataset/API |
| Turbidity (NTU) | Water quality sensor |
| Nitrate & Phosphate (mg/L) | Open environmental datasets |
| Water Surface Temperature Gradient | Derived from IR sensor |
| Air Temperature (°C) | Onboard thermometers |
| Humidity (%) | Air sensor |
| Air Quality Index (AQI) | AIR sensor / OpenAQ API |
| Temporal Change Rate | Derived from sequential readings |
| Pollution Correlation Index | Derived (AQI + Nitrate + Phosphate + Turbidity) |

### **Pipeline & Code Structure**

#### 1. Data Preprocessing
- Load sensor and derived numerical data
- Normalize features (MinMaxScaler / StandardScaler)
- Handle missing values and outliers
- Compute temporal change rates
- Compute derived features (Pollution Correlation Index, WQI target for regression)

### 2. Water Quality Index Model
- **Input:** Water variables + temporal change rates
- **Model:** Linear Regression / Dense Neural Network / XGBoost
- **Output:** WQI score (0–100)
- **Evaluation:** RMSE, MAE, trend correlation

#### 3. Early Warning Alerts Model
- **Input:** Predicted WQI + all water & air variables + derived indices
- **Model:** Logistic Regression / Autoencoder / XGBoost Classifier
- **Output:** Binary alert (Normal / Alert) + severity score
- **Evaluation:** Confusion Matrix, Precision, Recall, F1-score

#### 4. Visualization & Reporting
- Plot WQI trends over time
- Visualize anomalies and triggered alerts
- Export evaluation metrics

# **Data Collection and Processing**

In [5]:
import requests
import pandas as pd
from datetime import datetime, timedelta

# -------------------------------
# Define time windows
# -------------------------------
time_windows = {
    "morning": (6, 12),   # 6:00 - 11:59
    "evening": (12, 18),  # 12:00 - 17:59
    "night":   (18, 24)   # 18:00 - 23:59
}

# -------------------------------
# 1. Fetch Weather Data (Open-Meteo)
# -------------------------------
def fetch_weather_simple(lat=25.4052, lon=55.5136, days=3):
    start = datetime.utcnow().date() - timedelta(days=days-1)
    end = datetime.utcnow().date()
    
    url = (
        f"https://api.open-meteo.com/v1/forecast?"
        f"latitude={lat}&longitude={lon}&hourly=temperature_2m,relative_humidity_2m,cloudcover,windspeed_10m"
        f"&start={start}&end={end}&timezone=GMT"
    )
    response = requests.get(url).json()
    
    df = pd.DataFrame(response.get("hourly", {}))
    if df.empty or "time" not in df.columns:
        return pd.DataFrame()
    
    df["timestamp"] = pd.to_datetime(df["time"])
    df = df.drop(columns=["time"])
    
    # Assign coarse time windows
    df["hour"] = df["timestamp"].dt.hour
    df["time_window"] = df["hour"].apply(lambda h: next((tw for tw, (start, end) in time_windows.items() if start <= h < end), "night"))
    
    # Aggregate by day + time_window
    df["date"] = df["timestamp"].dt.date
    agg_df = df.groupby(["date", "time_window"]).mean().reset_index()
    return agg_df

# -------------------------------
# 2. Fetch AQI Data (OpenAQ) and average by day
# -------------------------------
def fetch_aqi_simple(city="Ajman", days=3, limit=100):
    url = f"https://api.openaq.org/v2/latest?city={city}&limit={limit}"
    response = requests.get(url).json()
    
    data = []
    for result in response.get("results", []):
        measurements = {m["parameter"]: m["value"] for m in result.get("measurements", [])}
        timestamp = result.get("measurements")[0]["lastUpdated"] if result.get("measurements") else None
        if timestamp:
            data.append({
                "location": result.get("location"),
                "AQI_PM2_5": measurements.get("pm25"),
                "AQI_PM10": measurements.get("pm10"),
                "AQI_NO2": measurements.get("no2"),
                "AQI_O3": measurements.get("o3"),
                "timestamp": pd.to_datetime(timestamp)
            })
    df = pd.DataFrame(data)
    if df.empty:
        return pd.DataFrame()
    
    df["date"] = df["timestamp"].dt.date
    df["hour"] = df["timestamp"].dt.hour
    df["time_window"] = df["hour"].apply(lambda h: next((tw for tw, (start, end) in time_windows.items() if start <= h < end), "night"))
    
    agg_df = df.groupby(["date", "time_window"]).mean().reset_index()
    return agg_df

# -------------------------------
# 3. Combine Weather + AQI
# -------------------------------
weather_df = fetch_weather_simple()
aqi_df = fetch_aqi_simple()

if not weather_df.empty and not aqi_df.empty:
    combined_df = pd.merge(weather_df, aqi_df, on=["date", "time_window"], how="outer")
else:
    combined_df = weather_df if not weather_df.empty else aqi_df

print(combined_df.head())


  start = datetime.utcnow().date() - timedelta(days=days-1)
  end = datetime.utcnow().date()


         date time_window  temperature_2m  relative_humidity_2m  cloudcover  \
0  2025-11-08     evening       28.600000             49.166667    0.000000   
1  2025-11-08     morning       28.066667             45.666667    0.000000   
2  2025-11-08       night       26.891667             52.500000    0.000000   
3  2025-11-09     evening       28.433333             53.833333   53.666667   
4  2025-11-09     morning       28.133333             53.666667    3.500000   

   windspeed_10m           timestamp  hour  
0      14.850000 2025-11-08 14:30:00  14.5  
1      10.116667 2025-11-08 08:30:00   8.5  
2       9.350000 2025-11-08 11:30:00  11.5  
3      17.683333 2025-11-09 14:30:00  14.5  
4      10.966667 2025-11-09 08:30:00   8.5  


## **Water Quality Index (WQI) Prediction**

In [7]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error

# -------------------------------
# 1. Simulate WQI (handles missing columns)
# -------------------------------
def simulate_wqi(df):
    # Normalize pollutants if available, else default 0
    pm25_norm = df['AQI_PM2_5'] / 100 if 'AQI_PM2_5' in df else 0
    pm10_norm = df['AQI_PM10'] / 200 if 'AQI_PM10' in df else 0
    no2_norm  = df['AQI_NO2'] / 50 if 'AQI_NO2' in df else 0
    
    # Temperature penalty (ideal 20-25°C)
    temp_penalty = np.abs(df['temperature_2m'] - 22.5)/10 if 'temperature_2m' in df else 0
    
    # Combine factors (weights can be tuned)
    pollution_score = 0.4*pm25_norm + 0.3*pm10_norm + 0.3*no2_norm
    total_penalty = pollution_score + temp_penalty
    wqi = 100 * (1 - np.clip(total_penalty, 0, 1))
    return wqi

# -------------------------------
# 2. Preprocess Dataset
# -------------------------------
if not combined_df.empty:
    
    # Ensure all columns exist
    for col in ['AQI_PM2_5', 'AQI_PM10', 'AQI_NO2', 'temperature_2m', 
                'relative_humidity_2m', 'cloudcover', 'windspeed_10m']:
        if col not in combined_df.columns:
            combined_df[col] = 0
    
    # Interpolate missing values
    combined_df.interpolate(method='linear', limit_direction='forward', inplace=True)
    combined_df.fillna(method='bfill', inplace=True)
    
    # Simulate WQI
    combined_df['WQI'] = simulate_wqi(combined_df)
    
    # Features and target
    features = ['temperature_2m', 'relative_humidity_2m', 'cloudcover', 
                'windspeed_10m', 'AQI_PM2_5', 'AQI_PM10', 'AQI_NO2']
    target = 'WQI'
    
    model_df = combined_df[features + [target]].dropna()
    X = model_df[features]
    y = model_df[target]
    
    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    
    # Feature scaling
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # -------------------------------
    # 3. Train and Evaluate Models
    # -------------------------------
    models = {
        'Linear Regression': LinearRegression(),
        'Random Forest': RandomForestRegressor(random_state=42)
    }
    
    for name, model in models.items():
        model.fit(X_train_scaled, y_train)
        y_pred = model.predict(X_test_scaled)
        rmse = np.sqrt(mean_squared_error(y_test, y_pred))
        mae = mean_absolute_error(y_test, y_pred)
        print(f'--- {name} ---')
        print(f'RMSE: {rmse:.4f}, MAE: {mae:.4f}')
    
    # -------------------------------
    # 4. Display Sample Predictions
    # -------------------------------
    predictions_df = pd.DataFrame({
        'Actual WQI': y_test,
        'Predicted WQI': models['Random Forest'].predict(X_test_scaled)
    }).reset_index(drop=True)
    print("--- Sample Predictions ---")
    print(predictions_df.head())
    
else:
    print("Combined dataframe is empty. Cannot proceed with modeling.")


--- Linear Regression ---
RMSE: 0.0000, MAE: 0.0000
--- Random Forest ---
RMSE: 1.0783, MAE: 0.8453
--- Sample Predictions ---
   Actual WQI  Predicted WQI
0   39.000000      39.805000
1   51.333333      51.804167
2   44.333333      43.575000
3   44.333333      44.438333
4   47.833333      49.920833


  combined_df.interpolate(method='linear', limit_direction='forward', inplace=True)
  combined_df.fillna(method='bfill', inplace=True)
