# Strategies for Handling Missing Values

Several courses in the Johns Hopkins Data Science Specialization on Coursera require students to work with messy data, including data with missing values. The specialization does not cover this topic in any depth, so students often ask about different ways to manage missing values.

To provide more background on this topic, I've compiled the following list of resources on missing values. Articles posted on the internet sometimes disappear, so I've referenced local copies of the publicly available articles.

| Resource | Description |
| --- | --- |
| **A Comparison of Six Methods for Missing Data Imputation**<br>Author(s): Peter Schmitt, Jonas Mandel, and Mickael Guedj | Compares six imputation methods on four real data sets of varying sizes. The methods are evaluated on four criteria: root mean squared error, unsupervised classification error, supervised classification error, and execution time. |
| **A Review of Missing Data Handling Methods in Education Research**<br>Author(s): Jehanzeb R. Cheema | Discusses the problem of missing data in educational research and reviews previously published studies. |
| **A Framework for Missing Value Imputation**<br>Author(s): R. Malarvizhi and Antony Selvadoss Thanamani | Compares two popular imputation techniques, mean substitution and k-means clustering, with a proposed k-nearest-neighbor approach. |
| **Chapter 25: Missing Data Imputation**<br>Author(s): Andrew Gelman and Jennifer Hill | Professor Gelman posted chapter 25 of his and Hill's book *Data Analysis Using Regression and Multilevel/Hierarchical Models* on his website at Columbia University. The book is considered an important reference for social scientists who use linear and hierarchical models. The chapter describes a variety of ways to handle missing data and includes examples coded in R. |
| **Review of Methods for Missing Data**<br>Author(s): Therese Pigott | Compares model-based methods for handling missing data with ad hoc methods such as listwise and pairwise deletion. The approaches are compared using an analysis of students' ability to control asthma symptoms. |
| **Missing Data in Educational Research: A Review of Reporting Practices and Suggestions for Improvement**<br>Author(s): James L. Peugh and Craig K. Enders | Gives an overview of missing-data theory, maximum likelihood estimation, and multiple imputation. It also reviews missing-data reporting practices across 23 applied research journals and demonstrates imputation methods on data from the Longitudinal Study of American Youth. |
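To make a few of the simpler techniques in these resources concrete, here is a minimal standard-library Python sketch. It covers listwise deletion, mean substitution, and k-nearest-neighbor imputation. Each method is scored with root mean squared error, which is one of the criteria in the Schmitt et al. comparison. The code is not taken from any of the articles above. The function names, the toy data, and the distance measure are illustrative assumptions. The distance is the root mean squared difference over the columns both rows have observed.

```python
import math
from statistics import mean

# None marks a missing value in the (hypothetical) toy data below.
MISSING = None


def listwise_delete(rows):
    """Complete-case analysis: drop every row with at least one missing value."""
    return [row for row in rows if all(v is not MISSING for v in row)]


def mean_impute(rows):
    """Mean substitution: fill each gap with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [mean(row[j] for row in rows if row[j] is not MISSING) for j in range(n_cols)]
    return [[col_means[j] if v is MISSING else v for j, v in enumerate(row)] for row in rows]


def distance(a, b):
    """Root mean squared difference over columns observed in both rows (None if no overlap)."""
    shared = [c for c in range(len(a)) if a[c] is not MISSING and b[c] is not MISSING]
    if not shared:
        return None
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in shared) / len(shared))


def knn_impute(rows, k=2):
    """k-nearest-neighbor imputation: fill each gap with the mean of that column over
    the k closest donor rows. Distances are computed on the original, unfilled data."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                if other_idx == i or other[j] is MISSING:
                    continue  # a donor row must have column j observed
                d = distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])  # stable sort: ties keep row order
            nearest = candidates[:k]
            if nearest:  # leave the gap in place if no donor row exists
                filled[i][j] = mean(v for _, v in nearest)
    return filled


def rmse(true_values, estimates):
    """Root mean squared error between the hidden true values and their estimates."""
    return math.sqrt(mean((t - e) ** 2 for t, e in zip(true_values, estimates)))


# Hide two known values, impute them, and score each method against the truth.
with_missing = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, MISSING],   # true value 6.0
    [3.0, 6.0, 9.0],
    [MISSING, 8.0, 12.0],  # true value 4.0
]
holes = [(1, 2), (3, 0)]
truth = [6.0, 4.0]

print("listwise deletion keeps", len(listwise_delete(with_missing)), "of", len(with_missing), "rows")
for name, filled in [("mean", mean_impute(with_missing)), ("kNN", knn_impute(with_missing, k=2))]:
    estimates = [filled[i][j] for i, j in holes]
    print(name, estimates, "RMSE =", round(rmse(truth, estimates), 3))
```

On this toy data, listwise deletion discards half the rows. Mean substitution fills the two gaps with 8.0 and 2.0, giving an RMSE of 2.0. kNN with `k=2` fills them with 6.0 and 2.5, giving an RMSE of about 1.061. kNN does better here because mean substitution ignores the relationships between columns. A four-row example says nothing about performance on real data, though; the comparisons in the articles above are the place to look for that.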