# Final Report Notebook

## Tanzania Well Classification
#### Objective 
>Using data from Taarifa and the Tanzanian Ministry of Water, can you predict which pumps are functional, which need some repairs, and which don't work at all? This is an intermediate-level practice competition. Predict one of these three classes based on a number of variables about what kind of pump is operating, when it was installed, and how it is managed. A smart understanding of which waterpoints will fail can improve maintenance operations and ensure that clean, potable water is available to communities across Tanzania.  
<p style="text-align:right;"><i>-DRIVENDATA Project Description</i></p>

### Contents
- Notebook Summary
- Library and Data Importing
- EDA
- Data Processing
- Baseline Model
- Multiple Model Iterations
- Draw Conclusions from Final Model
- Business-facing Recommendations

### Notebook Summary

This notebook provides a high-level summary of the process we followed to build the classification model. Our analysis was tailored to give the Tanzanian Government the ability to predict the status of waterpoint infrastructure, so that it can prepare for Tanzania's future water demands. We used the [Tanzania Water Well Data](https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/page/23/) from DrivenData for this analysis; DrivenData provided us with a `train_dataset`, `target_values`, and a `test_dataset`. After generating a [Pandas Profiling Report](references/well_class_report.html), we aimed to explore whether:
- the region of a waterpoint correlates with the status of its pump
- the management of a waterpoint plays a role in the status of its pump
- installation by specific engineers or groups plays a role in the status of the pump
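
One way to look at questions like these is a row-normalized cross-tabulation of a categorical feature against `status_group`. The sketch below is only an illustration and not necessarily the analysis carried out in this project: `status_share_by` is a hypothetical helper, and the sample frame holds just the five rows displayed by `train.head()` later in this notebook, restricted to three of its columns.

```python
import pandas as pd

# The five rows shown by train.head() in this notebook (subset of columns).
sample = pd.DataFrame({
    "basin": ["Lake Nyasa", "Lake Victoria", "Pangani",
              "Ruvuma / Southern Coast", "Lake Victoria"],
    "region_bins": [0, 5, 1, 6, 3],
    "status_group": ["functional", "functional", "functional",
                     "non functional", "functional"],
})


def status_share_by(df, column):
    """Hypothetical helper: share of each status_group within each value of `column`."""
    # normalize="index" makes every row sum to 1, so groups of different
    # sizes can be compared directly.
    return pd.crosstab(df[column], df["status_group"], normalize="index")


shares = status_share_by(sample, "basin")
print(shares)
```

On the full training data the same call could be pointed at `region_bins` or `installer`. With only five rows the shares are not meaningful; the point is the shape of the comparison.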

This notebook will walk through the process of data gathering and exploration, how the data was analyzed, our processing methods, and the results of our analysis.

### Data Importing

The cell below adds the directory two levels above the working directory to `sys.path`, which lets us import the [functions](https://github.com/sydroth/tanzaniawellclassification/blob/master/src/functions.py) created during our analysis.

In [1]:
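# Reload edited modules automatically before each cell runs
%load_ext autoreload
%autoreload 2
import os
import sys
# Resolve the directory two levels above the current working directory
module_path = os.path.abspath(os.path.join(os.pardir, os.pardir))
# Add it to the import search path only once
if module_path not in sys.path:
    sys.path.append(module_path)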
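
The path logic in the cell above can be written as two small functions, which makes it easier to see which directory ends up on `sys.path`. This is a minimal sketch: `project_root` and `ensure_on_sys_path` are hypothetical names and are not part of `functions.py`.

```python
import os
import sys


def project_root(start_dir, levels_up=2):
    """Return the absolute path `levels_up` directories above `start_dir`."""
    parts = [start_dir] + [os.pardir] * levels_up
    return os.path.abspath(os.path.join(*parts))


def ensure_on_sys_path(path):
    """Append `path` to sys.path if it is missing; return True if it was added."""
    if path in sys.path:
        return False
    sys.path.append(path)
    return True


root = project_root(os.getcwd())
ensure_on_sys_path(root)
added_again = ensure_on_sys_path(root)  # second call changes nothing
print(root)
```

Because the path is resolved relative to the working directory, the import only works when the notebook is started from its expected location inside the repository.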

In [2]:
import tanzaniawellclassification.src.functions as f

The following cell loads our [processed training data](https://github.com/sydroth/tanzaniawellclassification/blob/master/src/functions.py).

In [4]:
train = f.load_processed_train_df()
train.head()

| Unnamed: 0 | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | num_private | basin | ... | payment | quality_group | quantity | source | source_class | waterpoint_type | status_group | status | region_bins | lga_coded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 69572 | 6000.0 | 2011-03-14 | other | 1390 | other | 34.938093 | -9.856322 | 0 | Lake Nyasa | ... | pay annually | good | enough | spring | groundwater | communal standpipe | functional | 2 | 0 | other |
| 1 | 8776 | 0.0 | 2013-03-06 | other | 1399 | other | 34.698766 | -2.147466 | 0 | Lake Victoria | ... | never pay | good | insufficient | rainwater harvesting | surface | communal standpipe | functional | 2 | 5 | other |
| 2 | 34310 | 25.0 | 2013-02-25 | other | 686 | other | 37.460664 | -3.821329 | 0 | Pangani | ... | pay per bucket | good | enough | dam | surface | communal standpipe multiple | functional | 2 | 1 | other |
| 3 | 67743 | 0.0 | 2013-01-28 | other | 263 | other | 38.486161 | -11.155298 | 0 | Ruvuma / Southern Coast | ... | never pay | good | dry | machine dbh | groundwater | communal standpipe multiple | non functional | 0 | 6 | other |
| 4 | 19728 | 0.0 | 2011-07-13 | other | 0 | other | 31.130847 | -1.825359 | 0 | Lake Victoria | ... | never pay | good | seasonal | rainwater harvesting | surface | communal standpipe | functional | 2 | 3 | other |
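
The `Unnamed: 0` column in the output above is the kind of column pandas creates when a CSV is written with its index and then read back without `index_col`. Whether that is what happens inside `load_processed_train_df` is an assumption, since its implementation is not shown here. The sketch below reproduces the effect on a two-row frame built from the `id` and `status_group` values above.

```python
import io

import pandas as pd

original = pd.DataFrame({
    "id": [69572, 8776],
    "status_group": ["functional", "functional"],
})

# to_csv writes the index by default, as a first column with an empty header.
buffer = io.StringIO()
original.to_csv(buffer)

# Read back without index_col: the old index appears as "Unnamed: 0".
buffer.seek(0)
with_extra = pd.read_csv(buffer)

# Read back with index_col=0: the first column becomes the index again.
buffer.seek(0)
without_extra = pd.read_csv(buffer, index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```

If the extra column does come from this round trip, reading with `index_col=0` (or writing with `index=False`) would keep it out of the feature set.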
