Water Quality Prediction Using Machine Learning

This repository contains the source code and data for the research paper "Water Quality Prediction Using Machine Learning: A Case Study on Random Forest Algorithm". The primary objective of this study is to develop a machine learning model for predicting water quality parameters based on the Random Forest algorithm.

Introduction

Water quality is a critical aspect of environmental protection, public health, and sustainable development. Machine learning techniques have the potential to improve the accuracy, efficiency, and scalability of water quality monitoring by identifying patterns and relationships in large and complex datasets, predicting water quality parameters, and providing insights to inform decision-making.

This project aims to explore the potential of machine learning in water quality assessment, focusing on the Random Forest algorithm, an ensemble learning method that combines multiple decision trees to make more accurate and robust predictions.

Data

The dataset used in this study is derived from the Kaggle Water Potability dataset. The dataset contains information on various water quality parameters, such as pH, hardness, solids, chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity. The target variable is water potability, a binary variable indicating whether the water is safe for human consumption or not.

Installation

To set up the project on your local machine, follow these steps:

Clone the repository:

git clone https://github.com/your-username/water-quality-prediction.git

Change to the project directory:
```
cd water-quality-prediction
```
Install the required Python packages:
```
pip install -r requirements.txt
```

Usage

Run the Jupyter Notebook water_quality_prediction.ipynb to train and evaluate the Random Forest model on the dataset.
Modify the notebook as needed to experiment with different machine learning algorithms, hyperparameters, or preprocessing techniques.
Analyze the results and compare the performance of different models using various evaluation metrics.

Results

Our study demonstrated the effectiveness of the Random Forest algorithm in predicting water quality parameters with high accuracy. The model's performance was evaluated using various metrics, such as accuracy, precision, recall, and F1 score.

For more details on the results and their implications, refer to the research paper.

Contributors

Ossama Essfadi 🖋

Acknowledgements

We would like to express our deepest gratitude to our supervisor, Doctor Asmae Mourhir, for her invaluable guidance, support, and encouragement throughout the course of this research.

We are also grateful to the Kaggle Water Potability dataset contributors for providing the data used in this study.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.idea		.idea
README.md		README.md
RF_model_performance_fig.py		RF_model_performance_fig.py
baseline.py		baseline.py
correlation_matrix_fig.py		correlation_matrix_fig.py
data_distribution_fig.py		data_distribution_fig.py
main.py		main.py
requirements.txt		requirements.txt
stats_dataset.py		stats_dataset.py
summary.xlsx		summary.xlsx
water_potability.csv		water_potability.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Water Quality Prediction Using Machine Learning

Table of Contents

Introduction

Data

Installation

Usage

Results

Contributors

Acknowledgements

License

About

Releases

Packages

Languages

essfadi/applied-research

Folders and files

Latest commit

History

Repository files navigation

Water Quality Prediction Using Machine Learning

Table of Contents

Introduction

Data

Installation

Usage

Results

Contributors

Acknowledgements

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages