### Handle Missing Data

#### Import Libraries
* Pandas        : high-performance, easy-to-use data structures and data analysis tools
* NumPy         : used to work with arrays
* SimpleImputer : a class from the `sklearn.impute` module that provides basic strategies for imputing missing values

In [None]:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

#### Load the Dataset
The dataset is loaded into a pandas DataFrame with the `read_csv` function, which reads a CSV file into a DataFrame.

In [None]:
df = pd.read_csv("pima-indians-diabetes.csv")

#### Identify Missing Data
To identify missing data in a DataFrame, we can use the `isnull()` method, which returns a Boolean value for each cell (`True` where the value is missing). Calling `sum()` on the result then counts the missing values in each column.

In [None]:
missing_data = df.isnull().sum()
print("Missing data: \n", missing_data)
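One caveat: `isnull()` only detects values that pandas stores as `NaN`. If a file records missing readings with a placeholder such as `0`, the counts above will be zero even though data is missing. Whether that applies to this CSV is an assumption you should check against the data. The sketch below uses a small made-up DataFrame with hypothetical column names to show the idea.

```python
import numpy as np
import pandas as pd

# Toy data; the column names are hypothetical, not taken from the CSV.
toy = pd.DataFrame({
    "glucose": [148, 0, 183, 89],
    "bmi": [33.6, 26.6, 0.0, np.nan],
})

# isnull() only sees real NaN values, so the zeros are not counted here.
print(toy.isnull().sum())

# Treat 0 as "not recorded" in the chosen columns, then count again.
placeholder_cols = ["glucose", "bmi"]
toy[placeholder_cols] = toy[placeholder_cols].replace(0, np.nan)
nan_counts = toy.isnull().sum()
print(nan_counts)
```

Only apply this replacement to columns where `0` cannot be a valid measurement.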

#### Replace Missing Data with Mean Value

`SimpleImputer` is a class from the `sklearn.impute` module. It provides basic strategies for imputing (filling in) missing values, using either a constant or a statistic of each column: the mean, the median, or the most frequent value. `imputer` is an object created from the `SimpleImputer` class with `SimpleImputer(missing_values=np.nan, strategy="mean")`. When we create it, we pass the arguments `missing_values=np.nan` and `strategy="mean"` to the `__init__` method of `SimpleImputer`. `missing_values=np.nan` tells the imputer which values to treat as missing, here `np.nan` (the way missing data is represented in a pandas DataFrame), and `strategy="mean"` tells it to fill each missing value with the mean of its column.

We then use the `imputer` object to fill the missing values in our data using the mean strategy.

In [None]:
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
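To make the `strategy` argument more concrete, here is a minimal sketch that compares the fill value each strategy produces. The array `X` is made up for illustration and is not part of the dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy column with one missing entry (made up for illustration).
X = np.array([[1.0], [2.0], [2.0], [np.nan], [7.0]])

results = {}
for strategy in ("mean", "median", "most_frequent"):
    imp = SimpleImputer(missing_values=np.nan, strategy=strategy)
    # Row 3 holds the missing entry, so read back the value that replaced it.
    results[strategy] = imp.fit_transform(X)[3, 0]

# The "constant" strategy fills with the value given as fill_value.
imp = SimpleImputer(missing_values=np.nan, strategy="constant", fill_value=0.0)
results["constant"] = imp.fit_transform(X)[3, 0]

print(results)
```

The mean is sensitive to outliers such as the `7.0` here, so the median can be a more robust choice for skewed columns.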

The `SimpleImputer` instance is now stored in `imputer`. To fill the missing data, we use the `fit` and `transform` methods: `fit` computes the imputation values from the provided data, and `transform` fills in the missing values.

In [None]:
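# Learn the per-column means from the data
imputer.fit(df)
# Fill the missing values; transform returns a NumPy array, not a DataFrame
dataset_imputed = imputer.transform(df)
# Print the imputed result (the original df is left unchanged)
print(dataset_imputed)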
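Because `transform` returns a NumPy array, the column labels are lost. A minimal sketch of one way to get a DataFrame back is shown below. It uses a toy frame with hypothetical columns `a` and `b` in place of `df`, and `fit_transform`, which combines the two steps into one call.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame standing in for df; the column names are hypothetical.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10.0, 20.0, np.nan]})

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_array = imputer.fit_transform(toy)  # fit and transform in one call

# Rebuild a DataFrame so the column labels and the index are kept.
toy_imputed = pd.DataFrame(imputed_array, columns=toy.columns, index=toy.index)

print(imputer.statistics_)  # per-column means learned during fitting
print(toy_imputed)
```

The `statistics_` attribute holds the value used to fill each column, which is handy for checking that the imputation did what you expected.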