This notebook was created to learn basic approaches to handling missing values.
The source is: https://www.kaggle.com/residentmario/simple-techniques-for-missing-data-imputation/data 

In [36]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)  # the full option name avoids an ambiguous-pattern error in newer pandas
df = pd.read_csv("../Missing_Data_Imputation/data/recipeData.csv", encoding='latin-1').set_index("BeerID")
df.head(5)

| BeerID | Name | URL | Style | StyleID | Size(L) | OG | FG | ABV | IBU | Color | BoilSize | BoilTime | BoilGravity | Efficiency | MashThickness | SugarScale | BrewMethod | PitchRate | PrimaryTemp | PrimingMethod | PrimingAmount | UserId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Vanilla Cream Ale | /homebrew/recipe/view/1633/vanilla-cream-ale | Cream Ale | 45 | 21.77 | 1.055 | 1.013 | 5.48 | 17.65 | 4.83 | 28.39 | 75 | 1.038 | 70.0 | NaN | Specific Gravity | All Grain | NaN | 17.78 | corn sugar | 4.5 oz | 116.0 |
| 2 | Southern Tier Pumking clone | /homebrew/recipe/view/16367/southern-tier-pumk... | Holiday/Winter Special Spiced Beer | 85 | 20.82 | 1.083 | 1.021 | 8.16 | 60.65 | 15.64 | 24.61 | 60 | 1.07 | 70.0 | NaN | Specific Gravity | All Grain | NaN | NaN | NaN | NaN | 955.0 |
| 3 | Zombie Dust Clone - EXTRACT | /homebrew/recipe/view/5920/zombie-dust-clone-e... | American IPA | 7 | 18.93 | 1.063 | 1.018 | 5.91 | 59.25 | 8.98 | 22.71 | 60 | NaN | 70.0 | NaN | Specific Gravity | extract | NaN | NaN | NaN | NaN | NaN |
| 4 | Zombie Dust Clone - ALL GRAIN | /homebrew/recipe/view/5916/zombie-dust-clone-a... | American IPA | 7 | 22.71 | 1.061 | 1.017 | 5.8 | 54.48 | 8.5 | 26.5 | 60 | NaN | 70.0 | NaN | Specific Gravity | All Grain | NaN | NaN | NaN | NaN | NaN |
| 5 | Bakke Brygg Belgisk Blonde 50 l | /homebrew/recipe/view/89534/bakke-brygg-belgis... | Belgian Blond Ale | 20 | 50.0 | 1.06 | 1.01 | 6.48 | 17.84 | 4.57 | 60.0 | 90 | 1.05 | 72.0 | NaN | Specific Gravity | All Grain | NaN | 19.0 | Sukkerlake | 6-7 g sukker/l | 18325.0 |


In [15]:
df.shape

(73861, 22)

In [16]:
df.isna().sum()

Name                 1
URL                  0
Style              596
StyleID              0
Size(L)              0
OG                   0
FG                   0
ABV                  0
IBU                  0
Color                0
BoilSize             0
BoilTime             0
BoilGravity       2990
Efficiency           0
MashThickness    29864
SugarScale           0
BrewMethod           0
PitchRate        39252
PrimaryTemp      22662
PrimingMethod    67095
PrimingAmount    69087
UserId           50490
dtype: int64

There is a lot of missing data. `PitchRate` and `UserId` are more than 50% empty, and `PrimingMethod` and `PrimingAmount` are more than 90% empty.
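Percentages like these are easier to read than the raw counts above. The sketch below defines a small helper, `missing_report`, which is a hypothetical name and not part of the original notebook. It reports both the count and the fraction of missing values per column. Here it runs on a toy frame. On the beer data you would call `missing_report(df)`, and it should put `PrimingAmount` (roughly 94% missing) at the top.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and fraction of missing values per column, most-missing first."""
    report = pd.DataFrame({
        'n_missing': df.isna().sum(),      # absolute number of NaNs per column
        'frac_missing': df.isna().mean(),  # share of rows that are NaN (0.0 to 1.0)
    })
    return report.sort_values('frac_missing', ascending=False)


# Toy frame standing in for the beer data
toy = pd.DataFrame({
    'a': [1.0, np.nan, 3.0, np.nan],
    'b': [1.0, 2.0, 3.0, 4.0],
    'c': [np.nan, np.nan, np.nan, 1.0],
})
report = missing_report(toy)
print(report)
```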

### Data missing at random and not at random

Most machine learning algorithms (kNN is a notable exception) cannot deal with this problem intrinsically, as they are designed for complete data. Something needs to be done with the missing data values.

There are two broad classes of missing data: data missing at random, and data missing not at random. Keep this distinction in mind when deciding what to do with the data, because the type of missingness strongly informs how best to deal with it. More precisely: if the data is not missing completely at random, you will need domain expertise to understand what to do with it.
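Domain expertise is the real test, but a quick informal diagnostic can help. Split the rows by whether one column is missing, then compare a fully observed column across the two groups. A large gap suggests the missingness is related to that other column, so the data is probably not missing completely at random. A small gap does not prove the opposite. This is a sketch: the function name and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def compare_by_missingness(df, missing_col, other_col):
    """Mean of `other_col`, split by whether `missing_col` is missing.

    A large gap between the two groups hints that the data is not
    missing completely at random; a small gap proves nothing.
    """
    is_missing = df[missing_col].isna().rename(f'{missing_col}_missing')
    return df.groupby(is_missing)[other_col].mean()


toy = pd.DataFrame({
    'PrimaryTemp': [18.0, np.nan, 20.0, np.nan],
    'Size': [10.0, 50.0, 12.0, 48.0],
})
result = compare_by_missingness(toy, 'PrimaryTemp', 'Size')
print(result)
```

On the real data, something like `compare_by_missingness(df, 'PrimaryTemp', 'Size(L)')` would be a first look. Its output is not shown here.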

### Simple approaches

A number of simple approaches exist. For basic use cases, these are often enough.

#### Dropping rows with null values

The easiest and quickest approach to a missing data problem is dropping the offending entries. This is an acceptable solution if we are confident that the missing data in the dataset is missing at random, and if we have enough data points that dropping some of them will not cost the models we build their ability to generalize (to determine whether or not this is the case, use a learning curve).

**Dropping data missing not at random is dangerous. It will result in significant bias in your model in cases where data being absent corresponds with some real-world phenomenon**. Because this requires domain knowledge, usually the only way to determine whether this is a problem is through manual inspection. Dropping too much data is also dangerous: it can create significant bias by leaving your algorithms too few samples to cover the feature space. This is especially true of classifiers sensitive to the curse of dimensionality. For example, in this beer dataset we might not want to blindly drop every incomplete row, as this would leave very few samples:

In [17]:
# Only 1% of the data remains

len(df), len(df.dropna())

(73861, 757)

Certain types of datasets have "almost complete" columns, i.e. columns that are missing values in only a relatively small number of cases. In these cases dropping the offending records is usually fine, and the closer to complete the column is, the safer this is. It is also convenient, because it removes that column from the list of things you need to deal with before you can start learning. In the beer dataset, `Name`, `Style` and `BoilGravity` fall into this category.
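The sketch below drops rows only when their missing values fall in such "almost complete" columns. Sparser columns are left alone for other treatment. The function name and the 5% default threshold are assumptions chosen for illustration. With that default, the counts above imply it would target `Name`, `Style` and `BoilGravity` (about 4% missing) in the beer data.

```python
import numpy as np
import pandas as pd


def drop_rows_missing_in_almost_complete(df, max_missing_frac=0.05):
    """Drop rows with NaNs, but only in columns that are 'almost complete'.

    A column counts as almost complete if 0 < fraction missing <= max_missing_frac.
    Sparser columns are left untouched and must be handled another way.
    """
    frac = df.isna().mean()
    almost_complete = frac[(frac > 0) & (frac <= max_missing_frac)].index
    return df.dropna(subset=list(almost_complete))


toy = pd.DataFrame({
    'a': [np.nan] + [1.0] * 19,       # 5% missing -> almost complete
    'b': [np.nan] * 10 + [1.0] * 10,  # 50% missing -> left for later
})
cleaned = drop_rows_missing_in_almost_complete(toy, max_missing_frac=0.1)
print(len(toy), len(cleaned))
```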

#### Dropping features with high nullity

A feature with a high number of empty values is unlikely to be very useful for prediction, and it can often be safely dropped. For example, in the beer dataset I would drop `PrimingMethod` and `PrimingAmount`, and consider dropping a couple of others as well.

Dropping sparse features simplifies your model, but obviously gives you fewer features to work with. Before dropping a feature outright, consider taking the subset of rows where its value is available, training a model on that subset, and checking the feature's importance. If you discover that the variable is important in the subset where it is defined, consider making an effort to retain it.
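Here is a minimal sketch of that subset check. It uses a random forest's impurity-based `feature_importances_` as the importance measure, which is only one of several reasonable choices. The function name, the column names and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def importance_where_available(df, sparse_col, other_cols, target, random_state=0):
    """Feature importances from a model trained only on rows where `sparse_col` is present."""
    cols = [sparse_col] + list(other_cols)
    subset = df.dropna(subset=cols + [target])  # keep rows where every used column is observed
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(subset[cols], subset[target])
    return pd.Series(model.feature_importances_, index=cols).sort_values(ascending=False)


# Synthetic data: the sparse feature drives the target but is ~80% missing
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame({'sparse': rng.normal(size=n), 'noise': rng.normal(size=n)})
toy['target'] = 3 * toy['sparse'] + 0.1 * toy['noise']
toy.loc[rng.random(n) < 0.8, 'sparse'] = np.nan
importances = importance_where_available(toy, 'sparse', ['noise'], 'target')
print(importances)
```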

In [18]:
df.shape[1], df.drop(['PrimingMethod', 'PrimingAmount'], axis='columns').shape[1]

(22, 20)
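Instead of naming the columns by hand, you can drop them by a nullity threshold. The sketch below is hypothetical, and the 0.8 cutoff is an arbitrary assumption. Applied to the beer data with that cutoff, the counts above imply it would remove exactly `PrimingMethod` (about 91% missing) and `PrimingAmount` (about 94%), while keeping `UserId` (about 68%).

```python
import numpy as np
import pandas as pd


def drop_high_nullity_columns(df, max_missing_frac=0.8):
    """Keep only columns whose fraction of missing values is <= max_missing_frac."""
    keep = df.isna().mean() <= max_missing_frac  # boolean Series indexed by column name
    return df.loc[:, keep]


toy = pd.DataFrame({
    'mostly_empty': [np.nan] * 9 + [1.0],    # 90% missing -> dropped
    'half_empty': [np.nan] * 5 + [1.0] * 5,  # 50% missing -> kept
    'full': np.arange(10.0),
})
kept = drop_high_nullity_columns(toy)
print(kept.columns.tolist())
```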

#### Mean or median or other summary statistic substitution

The remainder of the techniques available are imputation methods, as opposed to data-dropping methods. The simplest imputation method is replacing missing values with the **mean** or **median** values of the dataset at large, or some similar summary statistic. This has the advantage of being the simplest possible approach, and one that doesn't introduce any undue bias into the dataset.

But: 

[However] with missing values that are not strictly random, especially in the presence of a great inequality in the number of missing values for the different variables, **the mean substitution method may lead to inconsistent bias**. Furthermore, this approach adds no new information (more variability of the data) but only increases the sample size and leads to an underestimate of the errors. Thus, mean substitution is not generally accepted.

From: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668100/

So mean and median substitution are said not to introduce undue bias into the dataset (which is quite debatable, given the quote above), but they also add no variability: every imputed value sits exactly at the center of the observed data, so the variance of the imputed column shrinks rather than being preserved.

In [21]:
# We saw that 'MashThickness' feature has around 40% of values missing

df['MashThickness'].isnull().sum() , df['MashThickness'].fillna(df['MashThickness'].mean()).isnull().sum()

(29864, 0)

In [23]:
# The mean is unchanged by mean imputation (the two values differ only by floating-point rounding)

df['MashThickness'].mean(), df['MashThickness'].fillna(df['MashThickness'].mean()).mean()

(2.127235233993227, 2.1272352339932263)
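The mean is preserved, but the spread is not. The toy series below is made up for illustration. After mean imputation its mean stays at 3.0, while the sample standard deviation shrinks, because the filled-in values add rows without adding any deviation from the mean.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0, np.nan, 5.0])
mean_filled = s.fillna(s.mean())      # every gap gets the same value, 3.0
median_filled = s.fillna(s.median())  # median of [1, 2, 4, 5] is also 3.0 here

print(s.mean(), mean_filled.mean())  # mean unchanged
print(s.std(), mean_filled.std())    # std drops from about 1.83 to about 1.41
```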

### Model imputation

Here's a fun trick. To prepare a dataset for machine learning we need to fix missing values, and we can fix missing values by applying machine learning to that dataset! If we consider a column with missing data as our target variable, and existing columns with complete data as our predictor variables, then we may construct a machine learning model using complete records as our train and test datasets and the records with incomplete data as our generalization target. This is a fully scoped-out machine learning problem.


In [32]:
# Format the data for applying ML to it.
# Find the beer styles that account for more than 1% of all recipes (missing styles are not counted)

popular_beer_styles = (pd.get_dummies(df['Style']).sum(axis='rows') > (len(df) / 100)).where(lambda v: v).dropna().index.values
len(popular_beer_styles)

23
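The same selection can be written more directly with `value_counts`, which counts each category and skips NaN. This is a sketch with a hypothetical helper name and toy data. One difference from the cell above: it returns the styles ordered by frequency rather than alphabetically.

```python
import pandas as pd


def popular_categories(s, min_frac=0.01):
    """Categories that occur in more than min_frac * len(s) rows (NaN is never counted)."""
    counts = s.value_counts()  # value_counts drops NaN by default
    return counts[counts > len(s) * min_frac].index.values


styles = pd.Series(['IPA'] * 5 + ['Stout'] * 3 + ['Saison'] + [None])
popular = popular_categories(styles, min_frac=0.2)  # threshold is 0.2 * 10 = 2 rows
print(popular)
```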

Use `.pipe` when chaining together functions that expect Series, DataFrames or GroupBy objects. Instead of writing `f(g(h(df), arg1=a), arg2=b, arg3=c)`, you can write `df.pipe(h).pipe(g, arg1=a).pipe(f, arg2=b, arg3=c)`.
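Here is a runnable toy version of that pattern. `add_total` and `scale` are hypothetical helpers made up for illustration. Keyword arguments passed to `.pipe` are forwarded to the function, so the nested call and the piped chain give the same result.

```python
import pandas as pd


def add_total(df, cols):
    """Add a 'total' column summing the given columns."""
    return df.assign(total=df[cols].sum(axis='columns'))


def scale(df, col, factor):
    """Multiply one column by a constant factor."""
    return df.assign(**{col: df[col] * factor})


toy = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
nested = scale(add_total(toy, cols=['a', 'b']), col='total', factor=10)
piped = toy.pipe(add_total, cols=['a', 'b']).pipe(scale, col='total', factor=10)
print(piped)
```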

In [34]:
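dfc = (df
       # Drop sparse or identifier-like columns that will not be used as predictors
       .drop(['PrimingMethod', 'PrimingAmount', 'UserId', 'PitchRate', 'PrimaryTemp', 'StyleID', 'Name', 'URL'], axis='columns')
       # BoilGravity is almost complete (about 4% missing), so drop the rows where it is absent
       .dropna(subset=['BoilGravity'])
       # One-hot encode the categorical columns
       .pipe(lambda df: df.join(pd.get_dummies(df['BrewMethod'], prefix='BrewMethod')))
       .pipe(lambda df: df.join(pd.get_dummies(df['SugarScale'], prefix='SugarScale')))
       # Collapse rare (and missing) styles into 'Other' before encoding them
       .pipe(lambda df: df.assign(Style=df['Style'].map(lambda s: s if s in popular_beer_styles else 'Other')))
       .pipe(lambda df: df.join(pd.get_dummies(df['Style'], prefix='Style')))
       # Remove the original categorical columns now that they are encoded
       .drop(['BrewMethod', 'SugarScale', 'Style'], axis='columns')
      )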

dfc.head(5)

| BeerID | Size(L) | OG | FG | ABV | IBU | Color | BoilSize | BoilTime | BoilGravity | Efficiency | MashThickness | BrewMethod_All Grain | BrewMethod_BIAB | BrewMethod_Partial Mash | BrewMethod_extract | SugarScale_Plato | SugarScale_Specific Gravity | Style_American Amber Ale | Style_American Brown Ale | Style_American IPA | Style_American Light Lager | Style_American Pale Ale | Style_American Porter | Style_American Stout | Style_Blonde Ale | Style_California Common Beer | Style_Cream Ale | Style_Double IPA | Style_English IPA | Style_Imperial IPA | Style_Irish Red Ale | Style_Kölsch | Style_Oatmeal Stout | Style_Other | Style_Robust Porter | Style_Russian Imperial Stout | Style_Saison | Style_Sweet Stout | Style_Weissbier | Style_Weizen/Weissbier | Style_Witbier |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 21.77 | 1.055 | 1.013 | 5.48 | 17.65 | 4.83 | 28.39 | 75 | 1.038 | 70.0 | NaN | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 20.82 | 1.083 | 1.021 | 8.16 | 60.65 | 15.64 | 24.61 | 60 | 1.07 | 70.0 | NaN | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 50.0 | 1.06 | 1.01 | 6.48 | 17.84 | 4.57 | 60.0 | 90 | 1.05 | 72.0 | NaN | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 24.61 | 1.055 | 1.013 | 5.58 | 40.12 | 8.0 | 29.34 | 70 | 1.047 | 79.0 | NaN | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 20.82 | 1.054 | 1.014 | 5.36 | 19.97 | 5.94 | 28.39 | 75 | 1.04 | 70.0 | 1.4 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [35]:
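features = [col for col in dfc.columns if col != 'MashThickness']
known = dfc['MashThickness'].notnull()

X = dfc.loc[known, features].values           # predictors for rows where MashThickness is observed
y = dfc.loc[known, 'MashThickness'].values    # observed MashThickness values (the regression target)
X_missing = dfc.loc[~known, features].values  # predictors for the rows that still need imputing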

In [43]:
# Apply a regression approach to imputing the mash thickness.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.metrics import r2_score


# KFold without shuffle=True splits the rows in order, so no random seed is needed
kf = KFold(n_splits=5)
scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    
    clf = LinearRegression()
    clf.fit(X_train, y_train)
    y_test_pred = clf.predict(X_test)
    
    scores.append(r2_score(y_test, y_test_pred))

print(scores)
print('Average value of cross-validation results:', np.mean(scores))

[0.01557952331349366, 0.011168471985599338, 0.013375476821894994, 0.0045922744079380795, -0.0004321395799600225]
('Average value of cross-validation results:', 0.00885672138979321)


The $R^2$ score measures how much better linear regression performs than a baseline, where the baseline is a flat prediction at the mean. In this case that baseline performance (an $R^2$ of 0) is exactly the performance of replacing the missing values with the mean of the observed values. Here the extremely low cross-validation scores, all indistinguishable from 0, tell us that we've picked an impossible task: `MashThickness` cannot be predicted with any accuracy from the other variables in the dataset (at least not linearly, and a strong non-linear relationship seems doubtful in this scenario). This cuts both ways, of course: if none of the variables in the dataset predict `MashThickness`, then `MashThickness` is (at least linearly) useless for predicting any of them either!
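As a worked reminder of why predicting the mean scores exactly zero, here is the standard definition (not something specific to this notebook):

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

If every prediction equals the mean, $\hat{y}_i = \bar{y}$, the numerator equals the denominator and $R^2 = 1 - 1 = 0$. A model that does worse than predicting $\bar{y}$ gets a negative $R^2$, as in the last fold above.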

Nevertheless, for more usefully correlated columns, this template of using a model of some kind to impute the column values is highly useful and makes a lot of sense from a practitioner's perspective.
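The notebook above only cross-validates the model; it never fills anything in. A minimal sketch of the full template follows: fit on the rows where the target is observed, then predict the rows where it is missing. `model_impute` is a hypothetical helper, and it assumes the feature columns are complete. With `dfc`, the target would be `'MashThickness'` and the features every other column. Given the $R^2$ near 0, the result would be close to mean imputation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def model_impute(df, target, features, model=None):
    """Fill NaNs in `target` with predictions from a model fitted on the complete rows.

    Assumes every column in `features` is fully observed.
    """
    model = LinearRegression() if model is None else model
    known = df[target].notna()
    model.fit(df.loc[known, features], df.loc[known, target])
    out = df.copy()
    if (~known).any():
        out.loc[~known, target] = model.predict(df.loc[~known, features])
    return out


toy = pd.DataFrame({'x': [0.0, 1.0, 2.0, 3.0, 4.0],
                    'y': [1.0, 3.0, np.nan, 7.0, np.nan]})  # observed rows follow y = 2x + 1
filled = model_impute(toy, 'y', ['x'])
print(filled)
```

Any scikit-learn regressor with `fit`/`predict` can be passed as `model`, which is what "it doesn't have to be regression" means in practice.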

This paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3668100/) has the following to say about this technique (which it refers to as "regression imputation"; but, strictly speaking, it doesn't have to be regression):


*This approach has a number of advantages, because the imputation retains a great deal of data over the listwise or pairwise deletion and avoids significantly altering the standard deviation or the shape of the distribution. However, as in a mean substitution, while a regression imputation substitutes a value that is predicted from other variables, no novel information is added, while the sample size has been increased and the standard error is reduced.*


**In other words, this technique will still tend to increase the bias of the dataset, just less so (when the model actually has predictive power) than naively using the mean or median value would.**

If you are looking for other models to try, the fancyimpute package contains a number of models specifically tuned for imputation tasks, mostly matrix-completion (i.e. linear-algebraic) methods.

### Semi-supervised learning

You can use a set of techniques known as "semi-supervised learning" to attack missing data imputation. 

For some kinds of data you will have many samples, but labels for only a subset of them. This situation is particularly common in research contexts, where it is easy to hand-label a small number of data points but much harder to label the full dataset when it is large.

This is known as the semi-supervised learning problem. It is semi-supervised because it lies between unsupervised learning, which does not use labels, and supervised learning, which requires them: you have neither all of the labels nor none of them, only some.

Semi-supervised learning is a restatement of the missing data imputation problem, specific to the case where the missing values are labels and the labelled sample is small. The problem likely gets its own name because it is so commonly encountered in research and dataset-generation contexts. It is a useful tool to know about more generally for imputing missing data from a limited sample, but the algorithms scale poorly to larger samples. In those cases, consider applying machine learning to the problem directly.

To learn more about semi-supervised learning, check out the notebook "Notes on semi-supervised learning". TL;DR: these techniques work well when the number of labelled points is extremely small, but they do not scale to larger data because they involve building a pairwise similarity matrix, an $O(n^2)$ operation.
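In missing-data terms, the column with gaps plays the role of the label, and each missing entry is an unlabelled point. Below is a minimal sketch with scikit-learn's `LabelPropagation` on made-up 2-D data. The points and `gamma=1.0` are illustrative assumptions, not values from the beer dataset. scikit-learn marks unlabelled samples with `-1`.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Two well-separated toy clusters; only one point in each carries a label.
cluster = np.array([[0.0, 0.0], [0.0, 0.5], [0.5, 0.0], [0.5, 0.5], [0.25, 0.25]])
X = np.vstack([cluster, cluster + 10.0])
y = np.full(len(X), -1)  # -1 means "label missing"
y[0] = 0
y[5] = 1

model = LabelPropagation(kernel='rbf', gamma=1.0)
model.fit(X, y)
print(model.transduction_)  # labels inferred for every point, including the missing ones
```

The RBF kernel here builds exactly the dense $n \times n$ similarity matrix mentioned above, which is why this approach stays practical only for modest sample sizes.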

### Maximum likelihood imputation

Simple approaches are easy to implement, but can lead to high bias. The model imputation approach is a bit more challenging, but it's still off-the-shelf, and it does still have a problem with introducing bias into the dataset. In fact, this paper (http://www.statisticalhorizons.com/wp-content/uploads/MissingDataByML.pdf) on the subject goes so far as to say that really you ought to be using either of two specialized techniques: maximum likelihood, or multiple imputation.

In statistics, the **maximum likelihood estimator** (MLE) of a distribution's parameters is the estimator whose parameter values maximize the "likelihood function" of the observed data, i.e. the parameter values under which the observed data is most probable.

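Recall that a statistical estimator takes a random sample of data and attempts to say something about the overall distribution by generalizing from that sample. For example, the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is an estimator of the mean of the distribution that generated the data $y$. If $y$ is normally distributed, the sample mean is the maximum likelihood estimator of that mean. It also happens to be unbiased, i.e. its expected value is the true mean (and it converges on the true mean given enough samples). Maximum likelihood estimators are not unbiased in general, though: the MLE of a normal variance, $\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$, systematically underestimates it. For many common problems the MLE is straightforward to derive. But sometimes an MLE does not exist or is hard to compute, and in other cases some deliberate bias in the estimator is useful (if you know something the model doesn't; see e.g. regularization).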
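The sketch below checks both claims numerically with SciPy, whose `stats.norm.fit` returns the maximum likelihood estimates of a normal distribution's location and scale. The sample itself is synthetic. The fitted location matches the sample mean, and the fitted scale matches the standard deviation computed with `ddof=0` (dividing by $n$), not the unbiased `ddof=1` version.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=1_000)

loc_mle, scale_mle = stats.norm.fit(y)  # maximum likelihood fit of a normal distribution

# The MLE of the mean is the sample mean ...
print(np.isclose(loc_mle, y.mean()))
# ... and the MLE of the standard deviation divides by n (ddof=0), not n - 1,
# so the corresponding variance estimate is slightly biased downward.
print(np.isclose(scale_mle, y.std(ddof=0)))
```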

Maximum likelihood imputation is maximum likelihood estimation applied to missing data. First, **build a maximum likelihood estimator** from the complete records in the dataset, with the fully observed columns as your predictor variables and the variable containing missing values as your target. Then, for each record containing missing data, **draw a value from the distribution you fitted**, parameterized by that record's known predictor values.
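Here is a minimal sketch of that two-step procedure. It assumes a linear model with Gaussian noise, a choice made purely for illustration. Under that model, ordinary least squares gives the MLE of the coefficients, and the MLE of the noise variance is the mean squared residual. The function name and the synthetic data are hypothetical. The sketch also ignores uncertainty in the fitted parameters themselves, which a fuller treatment would account for.

```python
import numpy as np


def ml_gaussian_impute(X, y, rng=None):
    """Impute NaNs in y by drawing from a linear-Gaussian model fitted by maximum likelihood.

    Assumes y_i ~ Normal(b0 + X_i @ b, sigma^2) and that X (2-D) has no missing values.
    """
    rng = np.random.default_rng() if rng is None else rng
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    known = ~np.isnan(y)

    # 1) Fit on complete records: least squares is the MLE of the coefficients here
    beta, *_ = np.linalg.lstsq(X1[known], y[known], rcond=None)
    resid = y[known] - X1[known] @ beta
    sigma = np.sqrt(np.mean(resid ** 2))  # MLE of the noise standard deviation (divides by n)

    # 2) For each missing record, draw from Normal(predicted mean, sigma^2)
    y_imputed = y.copy()
    mu_missing = X1[~known] @ beta
    y_imputed[~known] = rng.normal(loc=mu_missing, scale=sigma)
    return y_imputed


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=200)
y[rng.random(200) < 0.3] = np.nan  # knock out roughly 30% of the target
y_filled = ml_gaussian_impute(X, y, rng=rng)
```

Drawing from the distribution, rather than plugging in its mean, is what keeps the imputed values from collapsing onto the regression line.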

This purely statistical approach shares a drawback of statistical models more generally: it depends on the probability distribution you assume in your estimator. If you expect the data to be normally distributed, you may fit a normal distribution to it. If it's Bernoulli, you can fit a Bernoulli distribution. If it's a combination of different distributions, then you have to build a mixture (multimodal) distribution!

For this reason there is no "standard" maximum likelihood imputation technique. Instead, quoting from this excellent Cross Validated answer:

Handling missing data with Maximum Likelihood on all available data (so-called FIML) is a very useful technique. However, there are a number of complications that make it challenging to implement in a general way.

Consider a simple linear regression model, predicting some continuous outcome from say age, sex, and occupation type. In OLS, you do not worry about the distribution of age, sex, and occupation, only the outcome. Typically for categorical predictors, they are dummy coded (0/1). To use ML, distributional assumptions are required for all variables with missingness. By far the easiest approach is multivariate normal (MVN). This is what for example Mplus will do by default if you do not go out of your way to declare the type of variable (e.g., categorical).

In the simple example I gave, you would probably want to assume normal for age, Bernoulli for sex, and multinomial for job type. The latter is tricky because what you actually have are several binary variables, but you do not want to treat them as Bernoulli. This means you do not want to work with the dummy coded variables, you need to work with the actual categorical variable so the ML estimators can properly use a multinomial, but this in turn means that the dummy coding process needs to be built into the model, not the data. Again complicating life.

Further, the joint distribution of continuous and categorical variables is nontrivial to compute (when I run into problems like this in Mplus, it pretty quickly starts to break down and struggle). Finally, you really ideally specify the missing data mechanism. In SEM style, FIML, all variables are essentially conditioned on all others, but this is not necessarily correct.

For example, perhaps age is missing as a function not of gender and occupation type, but their interaction. The interaction may not be important for the focal outcome, but if it is important for missingness on age, then it must also be in the model, not necessarily the substantive model of interest but the missing data model.

I don't know of any off-the-shelf maximum likelihood imputation algorithms in Python, for precisely this reason.

The most flexible possible solution for modeling the distribution of data is kernel density estimation. scikit-learn includes raw kernel density estimation (`sklearn.neighbors.KernelDensity`), and I might suggest starting there. Otherwise, if you want to go the parametric statistical estimator route, the statsmodels package includes facilities for working with all of the most common types of statistical distributions.
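Here is a minimal sketch of the kernel density route using `KernelDensity.fit` and `KernelDensity.sample`. It deliberately simplifies the problem: it models a single column unconditionally, so the other columns are ignored. The function name, the bandwidth and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def kde_impute(values, bandwidth=1.0, random_state=0):
    """Fill NaNs in a 1-D array by sampling from a KDE fitted to the observed values.

    Simplification: the density is unconditional, so other columns are ignored.
    """
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth)
    kde.fit(values[~missing].reshape(-1, 1))  # scikit-learn expects a 2-D array
    filled = values.copy()
    if missing.any():
        draws = kde.sample(n_samples=int(missing.sum()), random_state=random_state)
        filled[missing] = draws.ravel()
    return filled


data = np.array([1.2, np.nan, 1.5, 1.4, np.nan, 3.9, 4.1, 4.0])  # two clusters of values
filled = kde_impute(data, bandwidth=0.2)
print(filled)
```

Unlike a single normal fit, the KDE keeps both clusters, so imputed values tend to land near 1.4 or near 4.0 rather than in the empty gap between them.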

### Multiple imputation


All of the techniques discussed so far are what one might call "single imputation": each value in the dataset is filled in exactly once. In general, the limitation with single imputation is that because these techniques find maximally likely values, they do not generate entries which accurately reflect the distribution of the underlying data.

Take the extreme case of replacing missing values in the data with the mean value, for example. *If we had been able to observe the data we were missing, we would naturally expect to see some variability in it: extreme values, outliers, and records which do not completely fit the "pattern" of the data. This noise is intrinsic to the dataset, yet mean value replacement makes no attempt to represent it in its result. This leads to bias in any downstream models, which are exposed to a trend (the presence of the mean value in the dataset) which does not exist in the underlying data. This in turn decreases accuracy during both the train and test phases.*

In the statistical literature, arguably the most advanced methodology for missing data imputation is **multiple imputation**. In multiple imputation we fill in the missing values several times, producing $m$ complete copies of the dataset, each with its own random draws for the missing entries. Each copy is analyzed separately, and the results are then pooled into a single answer. In other words, multiple imputation breaks the problem into three steps: imputation (performed $m$ times), analysis (fitting the model of interest to each imputed dataset), and pooling (combining the $m$ sets of results, for example with Rubin's rules, into a final estimate and its uncertainty).
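The pooling step is usually done with Rubin's rules. The pooled estimate is the average of the $m$ estimates. Its total variance is the average within-imputation variance $W$ plus $(1 + 1/m)$ times the between-imputation variance $B$ of the estimates. The sketch below implements just that arithmetic; the input numbers are made up for illustration.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()   # pooled point estimate
    w = variances.mean()       # within-imputation variance
    b = estimates.var(ddof=1)  # between-imputation variance
    t = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, t


q, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t)  # q = 2.0, t is about 1.833
```

The $B$ term is what single imputation throws away: it measures how much the answer moves when the missing values are re-drawn.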

Any technique that follows this general framework is a multiple imputation technique, so a variety of multiple imputation algorithms and implementations are available. The most popular algorithm is called MICE (Multivariate Imputation by Chained Equations), and a Python implementation of it is available as part of the fancyimpute package: https://github.com/iskandr/fancyimpute. MICE works as follows:

1) A simple imputation, such as imputing the mean, is performed for every missing value in the dataset. These mean imputations can be thought of as “place holders.”

2) The “place holder” mean imputations for one variable (“var”) are set back to missing.

3) The observed values from the variable “var” in Step 2 are regressed on the other variables in the imputation model, which may or may not consist of all of the variables in the dataset. In other words, “var” is the dependent variable in a regression model and all the other variables are independent variables in the regression model.

4) The missing values for “var” are then replaced with predictions (imputations) from the regression model. When “var” is subsequently used as an independent variable in the regression models for other variables, both the observed and these imputed values will be used.

5) Steps 2–4 are then repeated for each variable that has missing data. The cycling through each of the variables constitutes one iteration or “cycle.” At the end of one cycle all of the missing values have been replaced with predictions from regressions that reflect the relationships observed in the data.

6) Steps 2 through 4 are repeated for a number of cycles, with the imputations being updated at each cycle. At the end of these cycles the final imputations are retained, resulting in one imputed dataset. Generally, ten cycles are performed; however, research is needed to identify the optimal number of cycles when imputing data under different conditions. The idea is that by the end of the cycles the distribution of the parameters governing the imputations (e.g., the coefficients in the regression models) should have converged in the sense of becoming stable.
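As an alternative to fancyimpute, scikit-learn ships `IterativeImputer`. Its documentation describes it as inspired by the R MICE package. It returns a single imputation, but it can produce multiple imputations when run repeatedly with `sample_posterior=True` and different random seeds. Below is a minimal sketch on synthetic data. The function name, `m=5` and the toy columns are assumptions. `max_iter=10` mirrors the "ten cycles" mentioned in step 6. The last lines stand in for the analysis and pooling steps.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (exposes IterativeImputer)
from sklearn.impute import IterativeImputer


def multiple_impute(X, m=5, max_iter=10):
    """Return m completed copies of X using a MICE-style chained-equations imputer.

    sample_posterior=True draws each imputed value from the predictive distribution
    instead of using the point prediction, so the m copies differ.
    """
    return [
        IterativeImputer(max_iter=max_iter, sample_posterior=True, random_state=seed).fit_transform(X)
        for seed in range(m)
    ]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=100)
X[rng.random(100) < 0.2, 2] = np.nan  # roughly 20% missing in the last column

completed = multiple_impute(X, m=5)
# Analysis + pooling, crudely: the mean of column 2 in each copy, then their average and spread
col_means = [c[:, 2].mean() for c in completed]
print(np.mean(col_means), np.std(col_means, ddof=1))
```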