### Contents

1. [Importing the libraries](#1.-Importing-the-libraries)
1. [Reading the dataset having no ambiguous columns](#2.-Reading-the-dataset-having-no-ambiguous-columns)
1. [Columns having missing values](#3.-Columns-having-missing-values)
1. [Imputing scheme management](#4.-Imputing-scheme-management)

## 1. Importing the libraries

In [1]:
from glob import glob
import pandas as pd

## 2. Reading the dataset having no ambiguous columns

In [2]:
files = glob('../data/v1/*.csv')

files

['../data/v1/train_cleaned.csv', '../data/v1/test_cleaned.csv']

In [3]:
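# Reading train and test data.
# glob() does not guarantee file order, so pick each file by name rather than by position.
df_train = pd.read_csv(next(f for f in files if 'train' in f))
df_test = pd.read_csv(next(f for f in files if 'test' in f))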

## 3. Columns having missing values

In [4]:
df_train.columns[df_train.isnull().sum() > 0].tolist()

['funder',
 'installer',
 'subvillage',
 'public_meeting',
 'scheme_management',
 'scheme_name',
 'permit']
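The column list above says which columns have gaps but not how large they are. As a minimal sketch, assuming a small toy frame in place of `df_train` (the values and the `complete_col` column are hypothetical), the count and share of missing values per column could be summarised like this:

```python
import pandas as pd

# Toy frame standing in for df_train; values are illustrative only.
toy = pd.DataFrame({
    'funder': ['A', None, 'B', None],
    'permit': [True, False, None, True],
    'complete_col': [10, 20, 30, 40],  # hypothetical column with no gaps
})

# Count missing values per column and keep only the columns that have any.
missing = toy.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)

# Put the count next to the percentage of rows affected.
summary = pd.DataFrame({
    'missing_count': missing,
    'missing_pct': (missing / len(toy) * 100).round(1),
})
print(summary)
```

Sorting by count makes it easier to decide which columns to impute first.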

In [5]:
df_train.scheme_management.unique()

array(['VWC', 'Other', nan, 'Private operator', 'WUG', 'Water Board',
       'WUA', 'Water authority', 'Company', 'Parastatal', 'Trust', 'SWC',
       'None'], dtype=object)

## 4. Imputing scheme management

#### Research into the values of `scheme_management` suggests the following groupings:
* WUG and WUA refer to the same kind of organisation
* Water authority and Water Board are equivalent
* Trust and SWC are non-profit based; VWC is grouped with them
* Parastatal, Private operator and Company belong together

#### The values will be replaced as follows:
* WUA -> WUG
* VWC, SWC -> Trust
* Water authority -> Water Board
* Private operator, Parastatal -> Company
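The replacement scheme listed above can be sketched as a single dictionary passed to `Series.replace`, which matches whole values rather than substrings. This is only an illustration on a toy Series (the `replacements` and `toy` names are assumptions, not part of the notebook):

```python
import pandas as pd

# Mapping taken from the replacement list above.
replacements = {
    'WUA': 'WUG',
    'VWC': 'Trust',
    'SWC': 'Trust',
    'Water authority': 'Water Board',
    'Private operator': 'Company',
    'Parastatal': 'Company',
}

# Toy Series standing in for df_train.scheme_management.
toy = pd.Series(
    ['VWC', 'WUA', None, 'Parastatal', 'Other', 'Water authority'],
    name='scheme_management',
)

# Values not present in the mapping (including missing ones) are left untouched.
merged = toy.replace(replacements)
print(merged.value_counts(dropna=False))
```

Keeping the mapping in one dictionary means each value is looked up once, so an already replaced value cannot be rewritten again by a later rule.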

In [6]:
temp_df = df_train.copy()

In [7]:
temp_df.scheme_management.value_counts(dropna=False)

VWC                 36793
WUG                  5206
NaN                  3877
Water authority      3153
WUA                  2883
Water Board          2748
Parastatal           1680
Private operator     1063
Company              1061
Other                 766
SWC                    97
Trust                  72
None                    1
Name: scheme_management, dtype: int64

In [8]:
# temp_df['scheme_management'] = temp_df.scheme_management.str.replace('WUA', 'WUG')
# temp_df['scheme_management'] = temp_df.scheme_management.str.replace('VWC', 'Trust')
# temp_df['scheme_management'] = temp_df.scheme_management.str.replace('SWC', 'Trust')
# temp_df['scheme_management'] = temp_df.scheme_management.str.replace('Water authority', 'Water Board')
# temp_df['scheme_management'] = temp_df.scheme_management.str.replace('Parastatal', 'Company')
# temp_df['scheme_management'] = temp_df.scheme_management.str.replace('Private operator', 'Company')

In [9]:
# Count scheme names among the rows where scheme_management is missing
y = temp_df.loc[temp_df.scheme_management.isna(), 'scheme_name'].value_counts().reset_index()

y

| | index | scheme_name |
|---:|:---|---:|
| 0 | Lake Victoria pipe scheme | 19 |
| 1 | Ng'au | 18 |
| 2 | B | 16 |
| 3 | Segese pipe scheme | 15 |
| 4 | Migoli | 14 |
| ... | ... | ... |
| 119 | ENDAMASAK | 1 |
| 120 | Mshiri pipeline | 1 |
| 121 | Nyamatoke piped scheme | 1 |
| 122 | Bukirilo gravity water supply | 1 |
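In the output above the scheme names sit in a column called `index` and the counts in a column called `scheme_name`, and those default labels differ between pandas versions. A minimal sketch, assuming a toy frame and a hypothetical `missing_value_count` label, that names both columns explicitly:

```python
import pandas as pd

# Toy frame standing in for temp_df; values are illustrative only.
toy = pd.DataFrame({
    'scheme_management': [None, None, None, 'VWC'],
    'scheme_name': ['Migoli', 'Migoli', 'B', 'Migoli'],
})

counts = (
    toy.loc[toy['scheme_management'].isna(), 'scheme_name']
       .value_counts()
       .rename_axis('scheme_name')                # label the index (the scheme names)
       .reset_index(name='missing_value_count')   # label the counts column
)
print(counts)
```

With explicit labels, later filters such as selecting the names that occur once read as `counts['missing_value_count'] == 1`.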


In [10]:
y['scheme_name'].unique()

array([19, 18, 16, 15, 14, 13,  9,  8,  6,  5,  4,  3,  2,  1])

In [11]:
for i in y.loc[y['scheme_name'] == 1, 'index']:
    print(i)

Lang'ata dapash water proj
Malinyi
ENDANANG'WENI SPRING
Lituli
Njomlole water gravity scheme
Msaginya
Utaruni pipeline
Ikela Wa
Shaba water supply
Saseni
QWICKWIN
Nzul
Huru mawela water project
Mshewa Water Supply
Ziwani water supply
Kamwanga Erikaswa water pr
Ihowanja
Sikonge water supply
Sukuro pipe scheme
MVUM
World Bank Water Supply
DANIDA
Kilimahewa water supply
Manyoni water supply
Gwata water supply
Mavimba
Nyachenda water scheme
Machumba estate pipe line
Mangola pipe scheme
Hgodin
Mnyuzi water supplly
Namaukula water supply
Kimokouwa water project
Mradi wa maji Kisumwa
Nyangoto water suplly
Maswa water supply program
Jumuhiya ya watumia maji
Nduruma pipe line
Vyama vya watumia maji
Mwaya Mn
Shongololo gravity water supply
Vumamti
Kirua kahe gravity water supply trust
Dasp
Makwale water supplied sche
Ilala water supply
Nguruma gravity water supply
Ilindi
Mwadui piped scheme
Mitema
Namanga water project
Masanwa Piped water Scheme
Kazilankanda Water Supply
MWS
Tangini
Rural water 

|   scheme_name  |   scheme_management | missing_value_count |
|---------------:|--------------------:|--------------------:|
| Migoli         |         VWC         |         14           |
| Kazilankanda Water Supply | VWC | 1 |
| Huru mawela water project | Water Board | 1 |
| World Bank Water Supply | Trust| 1 |
| Kimokouwa water project| VWC | 1 |
| Makwale water supplied sche | VWC| 1 |
| Njomlole water gravity scheme | VWC| 1 |
| Nyangoto water suplly | VWC| 1 |
| Utaruni pipeline | VWC| 1 |
| Namaukula water supply | VWC| 1 |
| Madimba water supply | Other| 1 |
| Elang'atadapash water proj | VWC | 1 |
| Mavimba | - | 1 |
| ENDANANG'WENI SPRING | - | 1|
| Nyamatoke piped scheme | WUA | 1 |
| Masanwa Piped water Scheme | VWC | 1 |
| Ruhatwe water supply | WUG | 1 |
| Kilimahewa water supply | VWC | 1 |
| Mwaya Mn | WUA | 1 |
| Mshewa Water Supply | VWC | 1 |
| Shaba water supply | VWC | 1 |
| Hgodin | Other | 1 |
| MVUM | VWC | 1 |
| Nasula gravity water supply | - | 1 |
| Kirua kahe gravity water supply trust | Water Board | 1 |
| Nguruma gravity water supply | VWC | 1 |
| Kinyinya gravity water supply | VWC | 1 |
| Mangola pipe scheme | - | 1 |
| Maswa water supply program | Water authority | 1 |
| Namanga water project | VWC/Parastatal | 1 |
| Ziwani water supply | VWC | 1 |
| Saseni | Water authority | 1 |
| Lang'ata dapash water proj | VWC | 1 |
| NCHULOWAIBALE WATER SUPPLY SCHEME | VWC | 1 |
| Ihowanja | VWC | 1 |
| Muhalala water supply | VWC/Water Authority | 1 |
| Malinyi | VWC | 1 |
| Bukirilo gravity water supply | VWC | 1 |
| Mwigumbi piped scheme | WUA | 1 |
| Mtama water supply | VWC | 1 |
| Kyamakata Pumping water supply | VWC | 1 |
| Churazo gravity water supply | VWC | 1 |
| Machumba estate pipe line | VWC | 1 |
| Rural water supply | VWC | 1 |
| Mshiri pipeline | Other | 1 |
| Mwadui piped scheme | WUA/Parastatal | 1 |
| Ilindi | VWC | 1 |
| Nduruma pipe line | VWC/Parastatal | 1 |
| Sukuro pipe scheme | Other | 1 |
| Nyabibuye | VWC | 1 |
| Mbwinji | VWC | 1 |
| Mnyuzi water supplly | VWC | 1 |
| DANIDA | VWC | 1 | 
| Mwigimbi piped scheme | WUA | 1 |
| Nyachenda water scheme | Water authority | 1 | 
| Kaisho/Isingiro w | VWC/Other | 1 | 
| Kabare gravity water supply mission | VWC | 1 |
| Tank refu Mtakuja | Water authority | 1 |
| Hesawa | - | 1 |
| Kamwanga Erikaswa water pr | VWC | 1 |
| Mradi wa maji Kisumwa | VWC | 1 |
| Shongololo gravity water supply | VWC/WUA | 1 |
| QWICKWIN | WUG/Other | 1 |
| Mradi wa maji wa maji sikonge | Other | 1 |
| Dasp | - | 1 |
| Nabaiye pipe line | VWC | 1 |
| Adra | VWC | 1 |
| MWS | VWC | 1 | 
| Mitema | | |
| Msaginya | | |
| Ikela Wa | | |
| Sikonge water supply | | |
| Lituli | | |
| Nzul | | |
| Nyangao Water Supply | | |
| Gwata water supply | | |
| Vumamti | | |
| ENDAMASAK | | |
| Vyama vya watumia maji | | |
| Ilala water supply | | |
| Msitu wa tembo pipe scheme | | |
| Dihimba water supply | | |
| Manyoni water supply | | |
| Tangini | | |
| Jumuhiya ya watumia maji | | |
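One way the lookup table above might be used is to fill the missing `scheme_management` values from `scheme_name`. The sketch below is an assumption about how that could be done, not the notebook's own method: it copies three resolved rows from the table into a hypothetical `lookup` dictionary and applies them to a toy frame.

```python
import pandas as pd

# A few resolved entries copied from the table above; rows marked "-" or left
# blank are deliberately omitted so they stay missing.
lookup = {
    'Migoli': 'VWC',
    'Huru mawela water project': 'Water Board',
    'World Bank Water Supply': 'Trust',
}

# Toy frame standing in for temp_df; values are illustrative only.
toy = pd.DataFrame({
    'scheme_name': ['Migoli', 'Migoli', 'Huru mawela water project',
                    'Mavimba', 'World Bank Water Supply'],
    'scheme_management': [None, 'VWC', None, None, 'Other'],
})

# fillna only touches rows that are missing, so existing values are preserved.
toy['scheme_management'] = toy['scheme_management'].fillna(
    toy['scheme_name'].map(lookup)
)
print(toy)
```

Scheme names that are absent from the lookup map to NaN, so unresolved rows such as `Mavimba` remain missing and can be handled separately.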