
In this project, I analyze, plot, and clean the Tanzania water pump dataset, which DrivenData.org provides for a competition.


kochlisGit/Predictive-Maintainance-Tanzania-Water-Pumps


Tanzania - Water Pump Status Challenge

Tanzania is the largest country in East Africa, with a population of approximately 60 million people. Of those 60 million, only 47% have access to basic water; the rest have no choice but to drink dirty water from unsafe sources. As a result, 4,000 children die each year from preventable diseases caused by unsafe water. Safe water is scarce, and women and children often have to spend two to seven hours collecting clean water (WaterAid, 2016). This is quite the predicament: water is a basic need and a right for all human beings. The purpose of this work is to answer the following questions:

  1. Can machine learning be a valuable tool for the Tanzanian government in battling water scarcity?
  2. What is the best way to predict the functional state of Tanzanian water pumps?
  3. Which data preparation algorithms improve the predictive capabilities of a machine learning algorithm on this dataset?

Tanzania-Poster

This repository includes 5 notebooks:

  1. tanzania_dataset_analysis.ipynb: Extended analysis of the dataset
  2. tanzania_train_dataset_preprocessing.ipynb: Preprocessing of the training set
  3. tanzania_train_dataset_preprocessing.ipynb: Preprocessing of the test set based on the training set preprocessing
  4. tanzania_classifier_training.ipynb: Training and hyperparameter tuning of several machine learning classifiers
  5. tanzania_advanced_training.ipynb: Advanced ML training algorithms for dealing with imbalanced datasets
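
Two ideas from the list above can be illustrated with a minimal, dependency-free sketch: preprocessing the test set with statistics learned only from the training set, and weighting classes to compensate for imbalance. This is not the notebooks' actual code. The helper names, the median-imputation strategy, the `gps_height` column, the label strings, and the toy values are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import median


def fit_median_imputer(train_rows, column):
    """Learn the median of `column` from the training rows only, skipping missing values."""
    observed = [row[column] for row in train_rows if row[column] is not None]
    return median(observed)


def apply_imputer(rows, column, fill_value):
    """Return copies of `rows` with missing `column` values replaced by `fill_value`."""
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy rows with made-up values (hypothetical column name).
train = [{"gps_height": 1390}, {"gps_height": None}, {"gps_height": 686}, {"gps_height": 263}]
test = [{"gps_height": None}, {"gps_height": 1200}]

# The fill value is fitted on the training set and reused unchanged on the test set,
# so no information from the test set leaks into preprocessing.
fill = fit_median_imputer(train, "gps_height")
train_clean = apply_imputer(train, "gps_height", fill)
test_clean = apply_imputer(test, "gps_height", fill)

# Rare classes receive larger weights than frequent ones (hypothetical labels).
labels = ["functional", "functional", "non functional", "functional needs repair"]
weights = balanced_class_weights(labels)
```

Class weighting is only one way to handle imbalance; resampling the training set is a common alternative, and the notebooks may use either or both.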

Dataset Download

https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/