<a href="https://colab.research.google.com/github/lscblack/water_quality_model/blob/main/Jolly_UMULISA_water_quality_model_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Jolly's Custom Setup**

| Parameter          | Value                             |
| ------------------ | --------------------------------- |
| **Optimizer**      | RMSprop                           |
| **Learning Rate**  | 0.002                             |
| **Regularizer**    | L1 (0.005)                        |
| **Dropout Rate**   | 0.4                               |
| **Early Stopping** | Monitor = val\_loss, patience = 6 |


In [None]:
from google.colab import files
uploaded = files.upload()

Saving water_potability.csv to water_potability.csv


Load the Data

In [None]:
import pandas as pd

# Load dataset
df = pd.read_csv('water_potability.csv')  # Adjust path if needed
df.head()

|   | ph | Hardness | Solids | Chloramines | Sulfate | Conductivity | Organic_carbon | Trihalomethanes | Turbidity | Potability |
| - | -- | -------- | ------ | ----------- | ------- | ------------ | -------------- | --------------- | --------- | ---------- |
| 0 | NaN | 204.890455 | 20791.318981 | 7.300212 | 368.516441 | 564.308654 | 10.379783 | 86.99097 | 2.963135 | 0 |
| 1 | 3.71608 | 129.422921 | 18630.057858 | 6.635246 | NaN | 592.885359 | 15.180013 | 56.329076 | 4.500656 | 0 |
| 2 | 8.099124 | 224.236259 | 19909.541732 | 9.275884 | NaN | 418.606213 | 16.868637 | 66.420093 | 3.055934 | 0 |
| 3 | 8.316766 | 214.373394 | 22018.417441 | 8.059332 | 356.886136 | 363.266516 | 18.436524 | 100.341674 | 4.628771 | 0 |
| 4 | 9.092223 | 181.101509 | 17978.986339 | 6.5466 | 310.135738 | 398.410813 | 11.558279 | 31.997993 | 4.075075 | 0 |


**Data Preprocessing** (handle the missing values)

In [None]:
# Fill missing values with column mean
df.fillna(df.mean(), inplace=True)

Split into Features (X) and Target (y)

In [None]:
X = df.drop("Potability", axis=1)
y = df["Potability"]
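
The training and evaluation cells use `X_train`, `X_val`, `X_test` and the matching `y_*` variables, but the cell that creates them is not shown in this notebook. The sketch below is one way to produce them, assuming a stratified 70/15/15 split. That ratio is a guess: it is consistent with the 72 steps per epoch and 16 prediction batches logged at batch size 32, but the original split may have been different. The helper name `split_70_15_15`, the seed, and the synthetic stand-in data (3276 rows and 9 features are assumed) are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_70_15_15(X, y, seed=42):
    """Stratified split into roughly 70% train, 15% validation, 15% test."""
    # Carve off 30% of the rows first, keeping the class ratio the same.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )
    # Halve the held-out 30% into validation and test sets.
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# Synthetic stand-in for the real features and labels (assumed shape).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(3276, 9))
y_demo = (rng.random(3276) < 0.4).astype(int)

X_train, X_val, X_test, y_train, y_val, y_test = split_70_15_15(X_demo, y_demo)
print(X_train.shape, X_val.shape, X_test.shape)
```

In the notebook itself the call would be `split_70_15_15(X, y)` on the `X` and `y` defined above.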

Train the model

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, regularizers, callbacks

# Jolly's hyperparameters
dropout_rate = 0.4
learning_rate = 0.002
regularizer = regularizers.l1(0.005)
early_stop = callbacks.EarlyStopping(monitor='val_loss', patience=6, restore_best_weights=True)

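# Build model: two hidden layers with dropout; only the first Dense layer carries the L1 penalty.
# An explicit Input layer replaces the input_shape argument, which newer Keras versions warn about.
model = tf.keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(64, activation='relu', kernel_regularizer=regularizer),
    layers.Dropout(dropout_rate),
    layers.Dense(32, activation='relu'),
    layers.Dropout(dropout_rate),
    layers.Dense(1, activation='sigmoid')  # sigmoid output = predicted probability of Potability == 1
])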

# Compile model with RMSprop
optimizer = tf.keras.optimizers.RMSprop(learning_rate=learning_rate)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

# Train model
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=32,
    callbacks=[early_stop],
    verbose=1
)

Epoch 1/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.5500 - loss: 1.0458 - val_accuracy: 0.6354 - val_loss: 0.9086
Epoch 2/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6064 - loss: 0.8837 - val_accuracy: 0.6456 - val_loss: 0.7970
Epoch 3/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.6123 - loss: 0.7838 - val_accuracy: 0.6375 - val_loss: 0.7149
Epoch 4/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6151 - loss: 0.7207 - val_accuracy: 0.6456 - val_loss: 0.6759
Epoch 5/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6270 - loss: 0.6882 - val_accuracy: 0.6517 - val_loss: 0.6584
Epoch 6/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6396 - loss: 0.6788 - val_accuracy: 0.6680 - val_loss: 0.6459
Epoch 7/100
72/72 ━━━━━━━━━━━━━━

Evaluation

In [None]:
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Predict
y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype("int32")

# Metrics
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

# Print results
print("Jolly's Model Performance:")
print("Accuracy:", round(acc, 4))
print("F1 Score:", round(f1, 4))
print("Precision:", round(precision, 4))
print("Recall:", round(recall, 4))

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
Jolly's Model Performance:
Accuracy: 0.6789
F1 Score: 0.373
Precision: 0.7015
Recall: 0.2541
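
As a quick sanity check on the numbers above, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two reported values. The function name `f1_from` is hypothetical; the inputs are the rounded figures printed by the evaluation cell, so the result can only be expected to agree to about three decimals.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported by the evaluation cell.
reported_precision = 0.7015
reported_recall = 0.2541

f1_check = f1_from(reported_precision, reported_recall)
print(round(f1_check, 3))  # 0.373, matching the reported F1 score
```

Because the harmonic mean is pulled toward the smaller of the two values, the low recall keeps F1 well below the precision.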


**TRAINING SUMMARY ENTRY**

| Train Instance | Engineer Name | Regularizer | Optimizer | Early Stopping        | Dropout Rate | Accuracy    | F1 Score   | Recall         | Precision         |
| -------------- | ------------- | ----------- | --------- | --------------------- | ------------ | ----------- | ---------- | -------------- | ----------------- |
| 1              | Jolly UMULISA | L1 (0.005)  | RMSprop   | val\_loss, patience=6 | 0.4          | 0.6789 | 0.373 | 0.2541 | 0.7015 |


**OVERVIEW OF MY TRAINING ENTRY**



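*   A learning rate of 0.002 was chosen to balance training speed with convergence stability.
*   A dropout rate of 0.4 after each hidden layer helps limit overfitting on a small tabular dataset with only nine input features.
*   L1 regularization (0.005, applied to the first hidden layer's weights) encourages sparse weights and can improve generalization.
*   RMSprop adapts the step size per parameter, which helps when gradient magnitudes differ, for example when input features are on different scales.
*   EarlyStopping (monitoring val\_loss with patience = 6 and restoring the best weights) ends training once validation loss stops improving, which limits overfitting.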
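
To make the early-stopping rule concrete, here is a minimal pure-Python sketch of patience-based stopping. It mirrors the idea behind the Keras callback (stop after `patience` consecutive epochs without a new best `val_loss`) rather than reproducing its implementation. The first six losses are taken from the training log above; the remaining values are hypothetical, since the log is cut off at epoch 7, and `early_stopping_epoch` is a made-up helper name.

```python
def early_stopping_epoch(val_losses, patience=6):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs fail to
    improve on the best validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering the stop condition.
    return len(val_losses), best_epoch


logged = [0.9086, 0.7970, 0.7149, 0.6759, 0.6584, 0.6459]  # epochs 1-6 from the log
hypothetical_plateau = [0.65] * 6                           # assumed, not from the log
stop_epoch, best_epoch = early_stopping_epoch(logged + hypothetical_plateau)
print(stop_epoch, best_epoch)  # 12 6
```

With `restore_best_weights=True`, the model would end up with the weights from `best_epoch` rather than from the epoch where training stopped.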

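My model had low recall (0.2541) but relatively high precision (0.7015), which means it was conservative in labeling water as safe: it found only about a quarter of the potable samples, but about 70% of the samples it labeled safe were actually potable. Favoring precision over recall may be preferable in a public health context, where false positives (unsafe water labeled safe) are riskier than false negatives (safe water labeled unsafe).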
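
The trade-off described above is controlled by the decision threshold applied to the sigmoid output (0.5 in the evaluation cell). The sketch below illustrates how raising the threshold tends to cut false positives at the cost of recall. The probabilities and labels are entirely hypothetical toy values, not outputs of this model, and `precision_recall_at` is a made-up helper name.

```python
def precision_recall_at(threshold, probs, labels):
    """Precision and recall for the positive class (1 = potable) at a threshold."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predicted probabilities and true labels, for illustration only.
toy_probs = [0.10, 0.25, 0.35, 0.40, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90]
toy_labels = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

results = {t: precision_recall_at(t, toy_probs, toy_labels) for t in (0.3, 0.5, 0.7)}
for t, (p, r) in results.items():
    print(f"threshold={t}: precision={p:.3f}, recall={r:.3f}")
```

On this toy data, precision rises and recall falls as the threshold increases. On real predictions the same sweep over `y_pred_prob` would show where the model sits on that curve, though precision is not guaranteed to rise monotonically.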




