# **Water Monitoring and Statistical Model**

* Provides **ecosystem health score** (0–100).
* Predicts **potential risks**, e.g., low oxygen or algal bloom.
* Gives **recommendations**, e.g., "add oxygenation" or "monitor nutrients closely."
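
As a rough illustration of how a score and a few readings could be turned into the recommendations listed above, here is a minimal rule-based sketch. The function name `recommend` and every threshold in it are hypothetical placeholders chosen for the example; none of them come from this notebook or from a water-quality standard.

```python
def recommend(health_score, dissolved_oxygen_mgL, nitrate_mgL, phosphate_mgL):
    """Map a health score and a few readings to a status label and recommendations.

    All thresholds are illustrative placeholders, not values from this project.
    """
    # Assumed score bands for a coarse status label.
    if health_score >= 70:
        status = "good"
    elif health_score >= 40:
        status = "fair"
    else:
        status = "poor"

    recommendations = []
    if dissolved_oxygen_mgL < 5.0:  # assumed low-oxygen cutoff
        recommendations.append("add oxygenation")
    if nitrate_mgL > 0.5 or phosphate_mgL > 0.2:  # assumed nutrient cutoffs
        recommendations.append("monitor nutrients closely")

    return {"status": status, "recommendations": recommendations}


result = recommend(health_score=62, dissolved_oxygen_mgL=4.6,
                   nitrate_mgL=0.2, phosphate_mgL=0.1)
print(result)
```

In practice the rules would more likely be driven by the model outputs described in the next section than by fixed cutoffs.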

## **Model Construction**

1. **Ecosystem Health Scorer** – Inputs: DO, pH, Temp, Turbidity, Nutrients; Model: Random Forest Regression; Output: Health Score (0–100)
2. **Algal Bloom Predictor** – Inputs: Chlorophyll-a, Nitrate, Phosphate, Temp, Sunlight; Model: Gradient Boosting Classifier; Output: Bloom Risk (Low/Medium/High)
3. **Fish Population Estimator** – Inputs: DO, Temp, Turbidity, Conductivity, Depth, Camera/Sonar Counts; Model: LSTM Time-Series Regression; Output: Predicted Fish Count per Zone
4. **Anomaly Detector** – Inputs: All sensor data over time; Model: Autoencoder Neural Network; Output: Alerts for unusual ecosystem changes
5. **Water Quality Trend Forecaster** – Inputs: Historical DO, pH, Turbidity, Nutrients; Model: Temporal CNN; Output: Next 24–48 h projections for key water metrics
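
To make the first item concrete, the following is a minimal sketch of a Random Forest health scorer. It assumes synthetic stand-in features and a hand-made target formula, both invented here purely so the model has something to fit; the notebook itself does not define how the health score is labeled.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500

# Synthetic stand-in features (DO, pH, temperature, turbidity, one nutrient).
X = pd.DataFrame({
    "dissolved_oxygen_mgL": rng.uniform(2, 12, n),
    "pH": rng.uniform(6.0, 9.0, n),
    "temp_c": rng.uniform(10, 35, n),
    "turbidity_NTU": rng.uniform(0, 20, n),
    "nitrate_mgL": rng.uniform(0, 1, n),
})

# Hypothetical target: an arbitrary penalty formula clipped to the 0-100 range.
y = (
    100
    - 6 * (12 - X["dissolved_oxygen_mgL"])
    - 10 * (X["pH"] - 7.5).abs()
    - 1.5 * X["turbidity_NTU"]
    - 15 * X["nitrate_mgL"]
).clip(0, 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Clip predictions so the reported score always stays within 0-100.
pred = np.clip(model.predict(X_test), 0, 100)
rmse = mean_squared_error(y_test, pred) ** 0.5
r2 = r2_score(y_test, pred)
print(f"RMSE: {rmse:.2f}, R^2: {r2:.3f}")
```

Tree ensembles are insensitive to feature scale, so this sketch skips `StandardScaler`; scaling would matter more for the neural models in items 3 to 5.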

In [2]:
import requests
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

## **Data Collection & Pre-processing**

In [15]:
def fetch_environmental_data(lat=25.43, lon=55.48, days=30):
    """
    Fetches historical weather and sea-state data for the given coordinates
    from the Open-Meteo Archive API (weather) and Marine API (sea data, when available).
    Water-quality parameters that these APIs do not provide are synthesized.
    Returns a DataFrame holding the first five 18-hour mean bins of the combined data.
    """
    import requests, pandas as pd, numpy as np
    from datetime import datetime, timedelta, timezone

    # --- Date setup (UTC and timezone-aware) ---
    end_date = datetime.now(timezone.utc).strftime('%Y-%m-%d')
    start_date = (datetime.now(timezone.utc) - timedelta(days=days)).strftime('%Y-%m-%d')

    print(f"Fetching data from {start_date} → {end_date}")

    # --- WEATHER (Archive API) ---
    weather_url = (
        f"https://archive-api.open-meteo.com/v1/archive?"
        f"latitude={lat}&longitude={lon}"
        f"&hourly=temperature_2m,relative_humidity_2m,uv_index"
        f"&start_date={start_date}&end_date={end_date}"
    )

    # --- MARINE (Open-Meteo Marine API, if available) ---
    marine_url = (
        f"https://marine-api.open-meteo.com/v1/marine?"
        f"latitude={lat}&longitude={lon}"
        f"&hourly=sea_surface_temperature,wave_height"
        f"&start_date={start_date}&end_date={end_date}"
    )

    print("Fetching weather data...")
    weather = requests.get(weather_url, timeout=30).json()
    print("Fetching marine data...")
    marine = requests.get(marine_url, timeout=30).json()

    if "hourly" not in weather:
        print("⚠️ Weather data unavailable; check the Open-Meteo Archive API.")
        return pd.DataFrame()

    df_weather = pd.DataFrame(weather["hourly"])

    # --- Handle marine data gracefully ---
    if "hourly" in marine:
        df_marine = pd.DataFrame(marine["hourly"])
        df = pd.merge(df_weather, df_marine, on="time", how="outer")
        print("🌊 Marine data merged successfully.")
    else:
        df = df_weather.copy()
        print("⚠️ Marine data unavailable; using synthetic sea variables.")

    # --- Core variable transformations ---
    df["timestamp"] = pd.to_datetime(df["time"], errors="coerce")
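    df["air_temp_c"] = df.get("temperature_2m", np.nan)
    # Default to an all-NaN Series so .fillna() also works when the marine columns are missing.
    missing = pd.Series(np.nan, index=df.index)
    df["sea_surface_temp_c"] = df.get("sea_surface_temperature", missing).fillna(
        df["air_temp_c"] - np.random.uniform(0.5, 2.0)
    )
    df["wave_height_m"] = df.get("wave_height", missing).fillna(np.random.uniform(0.3, 1.0))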

    # --- Synthetic & Derived Water Parameters ---
    df["pH"] = 7.0 + 0.1 * np.sin(df["air_temp_c"] / 10)
    df["dissolved_oxygen_mgL"] = 14 - 0.3 * df["air_temp_c"]
    df["turbidity_NTU"] = np.abs(np.random.normal(5, 1.5, len(df)))
    df["conductivity_uScm"] = 500 + 10 * df["wave_height_m"]
    df["ammonia_mgL"] = np.abs(np.random.normal(0.05, 0.015, len(df)))
    df["nitrate_mgL"] = np.abs(np.random.normal(0.2, 0.04, len(df)))
    df["phosphate_mgL"] = np.abs(np.random.normal(0.1, 0.025, len(df)))
    df["chlorophyll_a_ugL"] = np.abs(np.random.normal(3, 0.8, len(df)))
    df["water_level_m"] = 1.0 + 0.1 * np.sin(df["wave_height_m"])
    df["pCO2_ppm"] = np.abs(np.random.normal(400, 12, len(df)))
    df["dissolved_CO2_mgL"] = df["pCO2_ppm"] * 0.03

    # --- Keep numeric columns, average into 18-hour bins, and keep the first 5 bins ---
    numeric_df = df.select_dtypes(include=[np.number])
    numeric_df["timestamp"] = df["timestamp"]

    df_resampled = (
        numeric_df
        .set_index("timestamp")
        .resample("18h")
        .mean(numeric_only=True)
        .head(5)
        .reset_index()
    )

    # --- Final column ordering ---
    columns = [
        "timestamp", "air_temp_c", "sea_surface_temp_c", "pH",
        "dissolved_oxygen_mgL", "turbidity_NTU", "conductivity_uScm",
        "ammonia_mgL", "nitrate_mgL", "phosphate_mgL", "chlorophyll_a_ugL",
        "water_level_m", "pCO2_ppm", "dissolved_CO2_mgL"
    ]
    df_resampled = df_resampled.reindex(columns=columns)

    print("✅ Environmental data fetched and processed successfully.")
    return df_resampled


# Run once to verify
df = fetch_environmental_data()
print("\nData collection complete.\n")
print(df.head())


Fetching data from 2025-10-14 → 2025-11-13
Fetching weather data...
Fetching marine data...
🌊 Marine data merged successfully.
✅ Environmental data fetched and processed successfully.

Data collection complete.

            timestamp  air_temp_c  sea_surface_temp_c        pH  \
0 2025-10-14 00:00:00   31.394444           31.794444  6.999928   
1 2025-10-14 18:00:00   30.116667           31.577778  7.012525   
2 2025-10-15 12:00:00   28.561111           31.544444  7.027563   
3 2025-10-16 06:00:00   31.138889           31.466667  7.002722   
4 2025-10-17 00:00:00   30.883333           31.283333  7.005108   

   dissolved_oxygen_mgL  turbidity_NTU  conductivity_uScm  ammonia_mgL  \
0              4.581667       4.800417         502.466667     0.050214   
1              4.965000       6.341233         505.600000     0.052344   
2              5.431667       5.312245         503.511111     0.047994   
3              4.658333       5.254385         503.255556     0.051561   
4       



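The timestamps in the sample output (2025-10-14 00:00 through 2025-10-17 00:00) follow from averaging hourly data into 18-hour bins and keeping only the first five. The sketch below reproduces that step on a made-up hourly series (the linear temperature ramp is an assumption for illustration) to show how much of the 30-day window those five rows actually cover.

```python
import numpy as np
import pandas as pd

# 30 days of hourly timestamps, mirroring the default days=30 window.
idx = pd.date_range("2025-10-14", periods=30 * 24, freq="h")
# Made-up temperature ramp; only the timestamps matter for this demonstration.
hourly = pd.DataFrame({"air_temp_c": np.linspace(28.0, 32.0, len(idx))}, index=idx)

bins = hourly.resample("18h").mean()   # one row per 18-hour bin
first_five = bins.head(5)              # what the function returns

# Span of raw data summarized by the five retained bins.
covered = first_five.index[-1] + pd.Timedelta(hours=18) - first_five.index[0]

print("total bins:", len(bins))
print("bin starts kept:", [str(t) for t in first_five.index])
print("time covered:", covered)
```

Under these assumptions the five rows summarize only the first 90 hours of the requested period, so raising or dropping the `head(5)` limit would be needed to use the full window.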
## **Exploratory Data Analysis (EDA)**

In [21]:
print("-------------------------------------------------")
print("             Dataset Points Overview             ")
print("-------------------------------------------------")
df.info()

print("---------------------------------------------------------")
print("              Dataset Mathematical Summary              ")
print("---------------------------------------------------------")
df.describe()

-------------------------------------------------
             Dataset Points Overview             
-------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   timestamp             5 non-null      datetime64[ns]
 1   air_temp_c            5 non-null      float64       
 2   sea_surface_temp_c    5 non-null      float64       
 3   pH                    5 non-null      float64       
 4   dissolved_oxygen_mgL  5 non-null      float64       
 5   turbidity_NTU         5 non-null      float64       
 6   conductivity_uScm     5 non-null      float64       
 7   ammonia_mgL           5 non-null      float64       
 8   nitrate_mgL           5 non-null      float64       
 9   phosphate_mgL         5 non-null      float64       
 10  chlorophyll_a_ugL     5 non-null      float64   

| | timestamp | air_temp_c | sea_surface_temp_c | pH | dissolved_oxygen_mgL | turbidity_NTU | conductivity_uScm | ammonia_mgL | nitrate_mgL | phosphate_mgL | chlorophyll_a_ugL | water_level_m | pCO2_ppm | dissolved_CO2_mgL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
| mean | 2025-10-15 12:00:00 | 30.418889 | 31.533333 | 7.009569 | 4.874333 | 5.51541 | 503.444444 | 0.051085 | 0.195452 | 0.099233 | 2.980603 | 1.033424 | 402.529368 | 12.075881 |
| min | 2025-10-14 00:00:00 | 28.561111 | 31.283333 | 6.999928 | 4.581667 | 4.800417 | 502.388889 | 0.047994 | 0.187864 | 0.09303 | 2.699603 | 1.023652 | 395.663227 | 11.869897 |
| 25% | 2025-10-14 18:00:00 | 30.116667 | 31.466667 | 7.002722 | 4.658333 | 5.254385 | 502.466667 | 0.050214 | 0.190687 | 0.099811 | 2.906383 | 1.024108 | 400.761808 | 12.022854 |
| 50% | 2025-10-15 12:00:00 | 30.883333 | 31.544444 | 7.005108 | 4.735 | 5.312245 | 503.255556 | 0.051561 | 0.194457 | 0.10024 | 3.021185 | 1.031953 | 405.207989 | 12.15624 |
| 75% | 2025-10-16 06:00:00 | 31.138889 | 31.577778 | 7.012525 | 4.965 | 5.868771 | 503.511111 | 0.052344 | 0.201759 | 0.100958 | 3.128888 | 1.03431 | 405.490446 | 12.164713 |
| max | 2025-10-17 00:00:00 | 31.394444 | 31.794444 | 7.027563 | 5.431667 | 6.341233 | 505.6 | 0.053311 | 0.202493 | 0.102125 | 3.146953 | 1.053095 | 405.52337 | 12.165701 |
| std | | 1.143295 | 0.185218 | 0.011093 | 0.342988 | 0.597384 | 1.299691 | 0.002066 | 0.006531 | 0.003577 | 0.18424 | 0.01196 | 4.335182 | 0.130055 |
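
Because `dissolved_oxygen_mgL` and `dissolved_CO2_mgL` are defined in `fetch_environmental_data` as linear functions of `air_temp_c` and `pCO2_ppm`, their means should satisfy the same formulas. The short check below uses the mean values copied from the summary statistics above; the tolerance of 1e-5 is an assumption that allows for the six-decimal rounding of the printed output.

```python
import math

# Means copied from the describe() output.
mean_air_temp_c = 30.418889
mean_do_mgL = 4.874333
mean_pco2_ppm = 402.529368
mean_dissolved_co2_mgL = 12.075881

# Linear relations used when the columns were synthesized.
expected_do = 14 - 0.3 * mean_air_temp_c
expected_co2 = 0.03 * mean_pco2_ppm

# Averaging commutes with linear maps, so both checks should pass up to rounding.
do_ok = math.isclose(expected_do, mean_do_mgL, abs_tol=1e-5)
co2_ok = math.isclose(expected_co2, mean_dissolved_co2_mgL, abs_tol=1e-5)
print("DO consistent:", do_ok)
print("CO2 consistent:", co2_ok)
```

The same check would hold only approximately for `pH`, since it is built from a sine of the temperature and averaging does not commute with nonlinear functions.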
