In [1]:
from sklearn.preprocessing import Imputer

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

**Read the CSV file into a pandas DataFrame**

In [19]:
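df = pd.read_csv("Data.csv")
X = df.iloc[:, :-1].values  # features: every column except the last (Country, Age, Salary)
Y = df.iloc[:, 3].values    # target: column index 3 (Purchased)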

In [6]:
df.head()

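|   | Country | Age | Salary | Purchased |
|---|---------|-----|--------|-----------|
| 0 | France | 44.0 | 72000.0 | No |
| 1 | Spain | 27.0 | 48000.0 | Yes |
| 2 | Germany | 30.0 | 54000.0 | No |
| 3 | Spain | 38.0 | 61000.0 | No |
| 4 | Germany | 40.0 | NaN | Yes |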


In [20]:
X, Y

(array([['France', 44.0, 72000.0],
        ['Spain', 27.0, 48000.0],
        ['Germany', 30.0, 54000.0],
        ['Spain', 38.0, 61000.0],
        ['Germany', 40.0, nan],
        ['France', 35.0, 58000.0],
        ['Spain', nan, 52000.0],
        ['France', 48.0, 79000.0],
        ['Germany', 50.0, 83000.0],
        ['France', 37.0, 67000.0]], dtype=object),
 array(['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'], dtype=object))

**NaN** (Not a Number) values are present in the dataset: one in `Age` and one in `Salary`. These missing values need to be filled in (or removed) before further computations and analysis can be performed on the data.
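One way to confirm which columns contain missing values is to count them with pandas. The sketch below rebuilds the ten rows shown above as an inline DataFrame (an assumption made so that it runs without `Data.csv`) and uses `isna().sum()` to count NaNs per column.

```python
import numpy as np
import pandas as pd

# Inline copy of the rows printed above, so the snippet does not need Data.csv.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Names of the columns that contain at least one NaN.
columns_with_nan = missing_per_column[missing_per_column > 0].index.tolist()
print(columns_with_nan)
```

Only the columns listed in `columns_with_nan` need to be passed to the imputer.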

We use the `Imputer` class from the `sklearn.preprocessing` module. Here is a short description of `Imputer` from its help page:

```
# help(Imputer) output

Init signature: Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)
Docstring:     
Imputation transformer for completing missing values.
```

In [10]:
imputer = Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=3)
# Since NaN values are the missing values in our dataset
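In scikit-learn 0.22 and later, `Imputer` is no longer available in `sklearn.preprocessing`. Its replacement is `SimpleImputer` in `sklearn.impute`. A rough equivalent of the cell above on a recent release might look like the sketch below. The `demo` array is made up purely to show the call pattern. `SimpleImputer` always imputes column-wise, so it takes no `axis` argument.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Assumed modern counterpart of Imputer(missing_values='NaN', strategy='mean', axis=0).
# Missing values are identified by np.nan rather than the string 'NaN'.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Small made-up array used only for illustration.
demo = np.array([[1.0, 10.0],
                 [np.nan, 30.0],
                 [3.0, np.nan]])

# fit_transform learns the column means and fills the NaNs in one call.
filled = imputer.fit_transform(demo)
print(filled)
```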

Now fit the imputer only to the columns (features) that contain missing values (`NaN`) in the dataset: `Age` and `Salary`, i.e. `X[:, 1:3]`.

Use the `imputer.fit()` method for this.
```

Signature: imputer.fit(X, y=None)
Docstring:
Fit the imputer on X.

Parameters
----------
X : {array-like, sparse matrix}, shape (n_samples, n_features)
    Input data, where ``n_samples`` is the number of samples and
    ``n_features`` is the number of features.

Returns
-------
self : object
    Returns self.
```

In [23]:
imputer = imputer.fit(X[:, 1:3])
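After fitting, the learned fill values can be inspected through the `statistics_` attribute. The sketch below assumes the newer `SimpleImputer` class (from `sklearn.impute`) and an inline float copy of the `Age` and `Salary` columns shown earlier. The array name `numeric` is introduced here only for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns as printed in X above, stored as floats.
numeric = np.array([
    [44, 72000], [27, 48000], [30, 54000], [38, 61000], [40, np.nan],
    [35, 58000], [np.nan, 52000], [48, 79000], [50, 83000], [37, 67000],
], dtype=float)

# Fitting only computes the per-column statistic; it does not change the data.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean").fit(numeric)

# One value per column: the mean of the non-missing entries.
print(imputer.statistics_)
```

These are the values that `transform()` writes into the missing cells.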

Now transform the data using the fitted imputer object. Use the `imputer.transform()` method for this.

```
Signature: imputer.transform(X)
Docstring:
Impute all missing values in X.

Parameters
----------
X : {array-like, sparse matrix}, shape = [n_samples, n_features]
    The input data to complete.
```

In [35]:
X[:, 1:3] = imputer.transform(X[:, 1:3])

In [33]:
X

array([['France', 44.0, 72000.0],
       ['Spain', 27.0, 48000.0],
       ['Germany', 30.0, 54000.0],
       ['Spain', 38.0, 61000.0],
       ['Germany', 40.0, 63777.77777777778],
       ['France', 35.0, 58000.0],
       ['Spain', 38.77777777777778, 52000.0],
       ['France', 48.0, 79000.0],
       ['Germany', 50.0, 83000.0],
       ['France', 37.0, 67000.0]], dtype=object)
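The two filled-in values are the column means computed over the non-missing entries. As a sanity check, the same arithmetic can be reproduced in plain Python. The helper `mean_ignoring_nan` is a hypothetical name used only for this illustration.

```python
import math

# Age and Salary columns as printed earlier, with NaN marking the missing cells.
ages = [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, math.nan, 48.0, 50.0, 37.0]
salaries = [72000.0, 48000.0, 54000.0, 61000.0, math.nan,
            58000.0, 52000.0, 79000.0, 83000.0, 67000.0]

def mean_ignoring_nan(values):
    """Average the entries that are not NaN."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present)

age_mean = mean_ignoring_nan(ages)           # 349 / 9
salary_mean = mean_ignoring_nan(salaries)    # 574000 / 9
print(age_mean, salary_mean)
```

Both results agree with the imputed entries in the array above (about 38.78 for `Age` and about 63777.78 for `Salary`).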

**Further Experiments**

Try other strategies for filling missing values, such as `median` and `most_frequent`.
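A minimal starting point for this experiment, assuming the newer `SimpleImputer` class from `sklearn.impute` and an inline float copy of the `Age` and `Salary` columns (the names `numeric` and `results` are illustrative), could loop over the strategies and compare what each one writes into the two missing cells:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns as printed earlier; row 6 lacks Age, row 4 lacks Salary.
numeric = np.array([
    [44, 72000], [27, 48000], [30, 54000], [38, 61000], [40, np.nan],
    [35, 58000], [np.nan, 52000], [48, 79000], [50, 83000], [37, 67000],
], dtype=float)

results = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    filled = imputer.fit_transform(numeric)
    # Record the values written into the two originally missing cells.
    results[strategy] = (filled[6, 0], filled[4, 1])
    print(strategy, results[strategy])
```

Because every observed `Age` and `Salary` value in this small dataset is unique, `most_frequent` has no real mode to pick, so `median` is likely the more informative alternative to `mean` here.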