#### Imputation Methods and Resources

One of the most common ways to handle missing values is to impute them. Imputation means that you fill in a value where one was originally missing.

It is very common to impute in the following ways:

    1. Impute the **mean** of a column.
    2. If you are working with categorical data or a variable with outliers, then use the **mode** of the column.
    3. Impute 0, a very small number, or a very large number to differentiate missing values from other values.
    4. Use k-nearest neighbors (KNN) to impute values based on the rows whose other features are most similar.
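
To make the KNN option (item 4) a bit more concrete, here is a minimal sketch. It assumes scikit-learn 0.22 or later, where `KNNImputer` lives in `sklearn.impute`. The `toy` DataFrame, its `height` and `weight` columns, and the choice of `n_neighbors=2` are made up for illustration and are not part of the example used in the rest of this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data: the last row is missing its weight.
toy = pd.DataFrame({'height': [150.0, 152.0, 180.0, 182.0, 151.0],
                    'weight': [50.0, 52.0, 80.0, 82.0, np.nan]})

# For each missing entry, find the 2 rows that are closest on the
# features that are present, and average their values for that column.
imputer = KNNImputer(n_neighbors=2)
filled = pd.DataFrame(imputer.fit_transform(toy),
                      columns=toy.columns, index=toy.index)

print(filled)
```

The row with the missing weight has a height close to the first two rows, so its weight is filled in from those rows rather than from the column as a whole. KNN imputation only works on numeric features, so categorical columns would need to be encoded first.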

In general, you should be careful with missing data and try to understand the real-world reasons why the values are missing and what that implies. At the same time, these solutions are very quick, and they enable you to get models off the ground. You can then iterate on your feature engineering to be more careful as time permits.

Let's take a look at how some of them work. Chris Albon's content is again very helpful for many of these items, and you can access it [here](https://chrisalbon.com/). He uses the [sklearn.preprocessing module](http://scikit-learn.org/stable/modules/preprocessing.html). There are also a ton of ways to fill in missing values directly using pandas, which can be found [here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html).
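
As a quick taste of the pandas route, the sketch below shows a few `fillna` patterns. The `toy` DataFrame, the `A_missing` flag column, and the fill values `-1` and `'Unknown'` are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in a numeric and a string column.
toy = pd.DataFrame({'A': [np.nan, 2.0, np.nan, 0.0],
                    'E': ['Yes', np.nan, 'Maybe', np.nan]})

# Record which rows were missing before overwriting them.
flagged = toy.assign(A_missing=toy['A'].isna())

# A dict gives each column its own fill value; columns that are
# not listed (here A_missing) are left alone.
filled = flagged.fillna({'A': -1, 'E': 'Unknown'})

# Forward fill copies the last observed value downward; a leading
# NaN stays NaN because there is nothing above it to copy.
carried = toy['A'].ffill()

print(filled)
print(carried)
```

Keeping a flag column like `A_missing` means a model can still tell imputed rows apart from observed ones, even after the gap has been filled.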

In [1]:
import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[np.nan, 2, np.nan, 0, 7, 10, 15],
                   'B':[3, 4, 5, 1, 2, 3, 5],
                   'C':[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
                   'D':[np.nan, True, np.nan, False, True, False, np.nan],
                   'E':['Yes', 'No', 'Maybe', np.nan, np.nan, 'Yes', np.nan]})

df

      A  B   C      D      E
0   NaN  3 NaN    NaN    Yes
1   2.0  4 NaN   True     No
2   NaN  5 NaN    NaN  Maybe
3   0.0  1 NaN  False    NaN
4   7.0  2 NaN   True    NaN
5  10.0  3 NaN  False    Yes
6  15.0  5 NaN    NaN    NaN


In [2]:
# Drop columns in which every value is missing (column C here)
new_df = df.dropna(axis = 1, how = 'all')

In [3]:
# Replace the missing values in a column with that column's mean
fill_mean = lambda col: col.fillna(col.mean())

In [4]:
# Apply column by column; a mean is not defined for string columns such as E
try:
    print(new_df.apply(fill_mean, axis = 0))
except Exception:
    print('That broke...')

That broke...
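
The most likely reason this breaks is that column `E` holds strings, and a mean cannot be computed for them. One way around it, sketched below, is to apply the mean only to the numeric columns. The code rebuilds the same DataFrame so it runs on its own; the names `num_cols` and `filled` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 2, np.nan, 0, 7, 10, 15],
                   'B': [3, 4, 5, 1, 2, 3, 5],
                   'C': [np.nan] * 7,
                   'D': [np.nan, True, np.nan, False, True, False, np.nan],
                   'E': ['Yes', 'No', 'Maybe', np.nan, np.nan, 'Yes', np.nan]})
new_df = df.dropna(axis=1, how='all')

# Pick out only the columns with a numeric dtype (A and B here).
num_cols = new_df.select_dtypes(include='number').columns

# Fill each numeric column with its own mean; D and E are untouched.
filled = new_df.copy()
filled[num_cols] = filled[num_cols].fillna(filled[num_cols].mean())

print(filled)
```

Column `A` is filled with the mean of its five observed values, while `D` and `E` keep their missing values and need a different strategy, such as the mode.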


In [5]:
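# Replace missing values with the column's mode; mode() can return
# several values when there is a tie, and [0] takes the first of them
fill_mode = lambda col: col.fillna(col.mode()[0])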

In [6]:
try:
    print(new_df.apply(fill_mode, axis = 0))
except Exception:
    print('That broke...')

      A  B      D      E
0   0.0  3  False    Yes
1   2.0  4   True     No
2   0.0  5  False  Maybe
3   0.0  1  False    Yes
4   7.0  2   True    Yes
5  10.0  3  False    Yes
6  15.0  5  False    Yes
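
Notice that the mode fills column `A` with 0.0 only because every observed value appears once and 0.0 is the first of the tied modes, which is probably not what you want for a numeric column. A possible compromise, sketched below, is to use the mean for numeric columns and the mode for everything else. The helper name `impute_by_dtype` is hypothetical and not something pandas provides.

```python
import numpy as np
import pandas as pd


def impute_by_dtype(frame):
    """Fill numeric columns with the mean and all others with the mode."""
    out = frame.copy()
    for name in out.columns:
        col = out[name]
        if col.isna().all():
            # Nothing to learn a fill value from, so leave it as is.
            continue
        if pd.api.types.is_numeric_dtype(col):
            out[name] = col.fillna(col.mean())
        else:
            # mode() ignores NaN; take the first value if there is a tie.
            out[name] = col.fillna(col.mode().iloc[0])
    return out


df = pd.DataFrame({'A': [np.nan, 2, np.nan, 0, 7, 10, 15],
                   'B': [3, 4, 5, 1, 2, 3, 5],
                   'D': [np.nan, True, np.nan, False, True, False, np.nan],
                   'E': ['Yes', 'No', 'Maybe', np.nan, np.nan, 'Yes', np.nan]})

result = impute_by_dtype(df)
print(result)
```

This is still a quick fix rather than a careful treatment, but it avoids the error from the mean and the odd numeric fills from the mode.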
