- The dataset includes information about various substances present in water, typically measured in units of concentration per liter.
All attributes are numeric and are listed below:
- A description of the data attributes.
- aluminium - dangerous if greater than 2.8
- ammonia - dangerous if greater than 32.5
- arsenic - dangerous if greater than 0.01
- barium - dangerous if greater than 2
- cadmium - dangerous if greater than 0.005
- chloramine - dangerous if greater than 4
- chromium - dangerous if greater than 0.1
- copper - dangerous if greater than 1.3
- flouride - dangerous if greater than 1.5
- bacteria - dangerous if greater than 0
- viruses - dangerous if greater than 0
- lead - dangerous if greater than 0.015
- nitrates - dangerous if greater than 10
- nitrites - dangerous if greater than 1
- mercury - dangerous if greater than 0.002
- perchlorate - dangerous if greater than 56
- radium - dangerous if greater than 5
- selenium - dangerous if greater than 0.5
- silver - dangerous if greater than 0.1
- uranium - dangerous if greater than 0.3
- is_safe - class attribute {0 - not safe, 1 - safe}
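The thresholds listed above can be written down as a lookup table, which makes it easy to see which attributes of a sample exceed their limit. The sketch below is illustrative only: `DANGER_THRESHOLDS` and `dangerous_attributes` are hypothetical names that are not part of this project, and the check is independent of how the `is_safe` label was assigned in the dataset.

```python
# Thresholds copied from the attribute list above.
# A value strictly greater than its threshold is treated as dangerous.
DANGER_THRESHOLDS = {
    "aluminium": 2.8,
    "ammonia": 32.5,
    "arsenic": 0.01,
    "barium": 2,
    "cadmium": 0.005,
    "chloramine": 4,
    "chromium": 0.1,
    "copper": 1.3,
    "flouride": 1.5,  # spelling kept as in the attribute list
    "bacteria": 0,
    "viruses": 0,
    "lead": 0.015,
    "nitrates": 10,
    "nitrites": 1,
    "mercury": 0.002,
    "perchlorate": 56,
    "radium": 5,
    "selenium": 0.5,
    "silver": 0.1,
    "uranium": 0.3,
}


def dangerous_attributes(sample):
    """Return the names of attributes in `sample` that exceed their threshold.

    `sample` is a dict mapping attribute names to numeric readings.
    Attributes missing from the sample are skipped.
    """
    return [
        name
        for name, limit in DANGER_THRESHOLDS.items()
        if name in sample and sample[name] > limit
    ]


# Made-up readings, for illustration only.
example = {"lead": 0.02, "copper": 1.0, "nitrates": 12}
flagged = dangerous_attributes(example)
```

Here `flagged` contains `lead` and `nitrates`, because only those two readings are above their limits.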
Dataset source: https://www.kaggle.com/datasets/mssmartypants/water-quality
The objective is to classify each water sample as safe or not safe, and to predict the probability, expressed as a percentage, that the water is safe.
- Data Preprocessing:
- In this initial stage, missing values are identified. The #NUM! entries are replaced with NaN, and the rows containing NaN are then dropped, since there are very few of them.
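The preprocessing step described above could look roughly like the following with pandas. This is a minimal sketch rather than the project's actual code: the function name `clean_water_data` and the tiny example frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def clean_water_data(df: pd.DataFrame) -> pd.DataFrame:
    """Replace '#NUM!' placeholders with NaN, drop incomplete rows,
    and convert the remaining values to numeric types."""
    cleaned = df.replace("#NUM!", np.nan).dropna()
    return cleaned.apply(pd.to_numeric).reset_index(drop=True)


# Made-up rows; columns holding '#NUM!' are read as strings.
raw = pd.DataFrame(
    {
        "ammonia": ["9.08", "#NUM!", "14.02"],
        "is_safe": ["1", "#NUM!", "0"],
    }
)
clean = clean_water_data(raw)
```

Converting to numeric after dropping the rows matters because a column that contains `#NUM!` is loaded as text, so it would otherwise stay non-numeric.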
- Data Transformation:
- In this stage, standard scaling is performed on the complete dataset except for the target variable.
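One way to apply standard scaling to everything except the target is scikit-learn's `StandardScaler`. The sketch below assumes the target column is named `is_safe`, as in the attribute list; the helper name `scale_features` and the example values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def scale_features(df: pd.DataFrame, target: str = "is_safe"):
    """Standardize every feature column (zero mean, unit variance)
    and return the scaled features, the untouched target, and the scaler."""
    features = df.drop(columns=[target])
    scaler = StandardScaler()
    scaled = pd.DataFrame(
        scaler.fit_transform(features),
        columns=features.columns,
        index=df.index,
    )
    return scaled, df[target], scaler


# Made-up values, for illustration only.
demo = pd.DataFrame(
    {
        "lead": [0.0, 0.1, 0.2],
        "copper": [1.0, 2.0, 3.0],
        "is_safe": [1, 0, 1],
    }
)
X, y, scaler = scale_features(demo)
```

Returning the fitted scaler lets the same transformation be reused on new inputs at prediction time. A common alternative is to fit the scaler on the training split only, so that test-set statistics do not influence training.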
- Model Training:
- In this phase, the model was trained using a Random Forest Classifier, which achieved an accuracy of 95.19%.
- Because this is a binary classification problem, the `predict_proba` function was used to compute the class probabilities for the given `x_test` data.
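The training step and the use of `predict_proba` can be sketched as follows. The data here is synthetic and the hyperparameters are arbitrary assumptions, so this does not reproduce the reported accuracy; it only shows how the probability of the safe class can be turned into a percentage.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled features and the is_safe target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(x_train, y_train)

# predict_proba returns one column per class, ordered as in model.classes_.
proba = model.predict_proba(x_test)
safe_col = list(model.classes_).index(1)  # column for class 1 (safe)
safe_percent = proba[:, safe_col] * 100

accuracy = model.score(x_test, y_test)
```

Looking up the column through `model.classes_` avoids assuming that the safe class is always the second column.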
- Flask App Creation:
- The Flask library is used to develop a web application that serves as a user interface for predicting water quality as a percentage.
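A Flask endpoint that returns the predicted percentage might be structured like the sketch below. The route `/predict`, the JSON field names, and the `create_app` factory are all assumptions for illustration, and a fixed stand-in function takes the place of the trained model so the example runs on its own.

```python
from flask import Flask, jsonify, request


def create_app(predict_safe_percent):
    """Build a Flask app around a callable that maps a list of feature
    values to the percentage chance that the water is safe."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        features = payload.get("features")
        if not isinstance(features, list) or not features:
            return jsonify(error="expected a non-empty 'features' list"), 400
        percent = float(predict_safe_percent(features))
        return jsonify(safe_percent=round(percent, 2))

    return app


# Stand-in predictor; a real app would scale the features and call
# the trained model's predict_proba instead.
app = create_app(lambda features: 87.5)

if __name__ == "__main__":
    app.run(debug=True)
```

Passing the predictor into a factory keeps the web layer separate from the model, which also makes the endpoint easy to exercise with Flask's built-in test client.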