# Predicting Tanzania Water Pumps Functionality

**Author: Edwin Maina**



## Business Understanding

Access to clean water is a fundamental human need and is key to public health, economic development, and social well-being.

In Tanzania, communities rely heavily on water wells for their daily water supply. However, many of these water points become non-functional over time due to poor maintenance, environmental conditions, or inadequate infrastructure.

This project aims to build predictive models that assess the functionality status of water wells in Tanzania. The models will help stakeholders prioritize maintenance efforts, allocate resources more effectively, and ensure that water wells remain operational for the communities that depend on them.


### Business Problem

Tanzania's water supply system is characterized by frequent water pump breakdowns resulting from a lack of proper maintenance and from inefficient management. These breakdowns disrupt the water supply, exacerbating the acute shortage of clean water and causing socio-economic losses.

The government of Tanzania, in collaboration with NGOs and partner organizations, aims to enhance access to clean water by improving the maintenance and functionality of water wells across the country. To achieve this, the sustainability risk of each water point needs to be predicted from the profiles of existing water points. These predictions would guide stakeholders' decisions by highlighting:

  - The most dilapidated wells, which should be prioritized for maintenance, repair, or rehabilitation.
  - Sites to be earmarked for future wells.
  - Data-driven, actionable recommendations that respond to stakeholders' needs and guide improvements in management practices and water accessibility.


### Importing Required Libraries

In [1]:
# pandas and NumPy for data wrangling and numerical operations
import pandas as pd
import numpy as np

# importing matplotlib and seaborn for data visualization
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import seaborn as sns
sns.set_context('notebook')

# scikit-learn: preprocessing, model selection, models, and metrics
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, FunctionTransformer, StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_selection import RFECV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ConfusionMatrixDisplay

# imbalanced-learn (SMOTE oversampling), statsmodels, and Python's random module
from imblearn.over_sampling import SMOTE
import statsmodels.api as sm
import random

### Loading the Datasets

In [6]:
training_values = pd.read_csv('Trainig_set_values.csv')
training_labels = pd.read_csv('Training_set_labels.csv')
test_values = pd.read_csv('Test_set_values.csv')
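
Because the training features and the training labels are loaded from separate files, a natural next step is to join them into a single frame before any exploration. The following is a minimal sketch of one way to do that, not necessarily the approach used later in this notebook. It assumes both frames share a key column named `id` and that the label column is called `status_group`; these names, the helper `combine_values_and_labels`, and the toy frames are illustrative assumptions rather than details confirmed by the code above.

```python
import pandas as pd


def combine_values_and_labels(values, labels, key="id"):
    """Join feature rows to their labels on a shared key column.

    The key name "id" is an assumption; pass a different `key` if needed.
    """
    # validate="one_to_one" makes pandas raise a MergeError if the key
    # is duplicated in either frame.
    merged = values.merge(labels, on=key, how="inner", validate="one_to_one")

    # An inner join silently drops unmatched rows, so confirm none were lost.
    if len(merged) != len(values):
        raise ValueError(
            f"{len(values) - len(merged)} feature rows have no matching label"
        )
    return merged


# Toy frames for illustration only (hypothetical columns and values).
toy_values = pd.DataFrame({"id": [1, 2, 3], "amount_tsh": [0.0, 25.0, 50.0]})
toy_labels = pd.DataFrame(
    {"id": [3, 1, 2],
     "status_group": ["functional", "non functional", "functional"]}
)

toy_train = combine_values_and_labels(toy_values, toy_labels)
print(toy_train)
```

If the loaded frames do share such a key, the same helper could be called as `combine_values_and_labels(training_values, training_labels)`. The one-to-one validation and the row-count check are there so that a mismatch between the two files surfaces immediately instead of producing a silently shortened training set.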