nishamuktewar/missing_data_imputation

Imputation of missing data

Traditionally, the most common ways to deal with missing data have been to:

  • replace missing values with the mean or median; for a categorical variable, use the most common class or a separate "missing" category
  • include a missing-indicator variable
  • exclude observations in which one or more variable values are missing
  • interpolate, in the case of time series
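The traditional approaches listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not code from this repository; the toy records, the column names (`age`, `city`), the `"MISSING"` sentinel, and all helper function names are assumptions made up for the example.

```python
from statistics import mean, median, mode

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 34.0, "city": "Oslo"},
    {"age": None, "city": "Bergen"},
    {"age": 52.0, "city": None},
    {"age": 40.0, "city": "Oslo"},
]


def observed(rows, col):
    """Return the non-missing values of one column."""
    return [r[col] for r in rows if r[col] is not None]


def impute_constant(rows, col, fill):
    """Return copies of the rows with missing values in `col` replaced by `fill`."""
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]


def add_missing_indicator(rows, col):
    """Return copies of the rows with a boolean `<col>_missing` flag added."""
    return [{**r, f"{col}_missing": r[col] is None} for r in rows]


def drop_incomplete(rows):
    """Keep only rows in which every value is present (listwise deletion)."""
    return [r for r in rows if all(v is not None for v in r.values())]


def interpolate_linear(series):
    """Linearly fill interior gaps of an evenly spaced series.

    Leading and trailing gaps are left as None, since there is no
    neighbour on one side to interpolate from.
    """
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


# Add the indicator first, so the flag records what was originally missing.
flagged = add_missing_indicator(rows, "age")
mean_filled = impute_constant(flagged, "age", mean(observed(rows, "age")))
median_filled = impute_constant(rows, "age", median(observed(rows, "age")))

# Categorical column: most common class, or a separate sentinel category.
mode_filled = impute_constant(rows, "city", mode(observed(rows, "city")))
sentinel_filled = impute_constant(rows, "city", "MISSING")

complete = drop_incomplete(rows)
series = interpolate_linear([1.0, None, None, 4.0, None])
```

On real tabular data, a data-frame library would typically replace these helpers (for instance, pandas provides `fillna`, `dropna`, and `interpolate`). The sketch is only meant to make the four strategies concrete.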

Thoughts

Missing-data imputation has a very broad application area: structured or time-series data, images, and audio. In terms of verticals, I see quite a few recent papers from the life sciences/healthcare industry, or papers that use a medical dataset to demonstrate the algorithm. Beyond the traditional approaches, recent research has included EM, matrix completion, some form of RNNs, GANs, and auto-encoders. Another interesting perspective on missing data arises when the data is multi-modal, meaning that for some samples we have, say, brain signal data and for others we have brain scans.

Recent papers

arxiv archive

ICML archive

ICLR archive

NIPS

Criteria

  1. Is it exciting to the team? I would not classify it as exciting, but it is definitely something that needs our attention.
  2. Can it be framed as a strong capability (rather than an algorithm)? Probably not.
  3. Is it a subject that is more possible now than it was two years ago, and likely to be more possible/transformative still in a couple of years? That usually means some or all of the following are true:
       i. There is excitement (ideally including concrete breakthroughs) in the research community - maybe
       ii. Economic constraints (e.g. hardware) have lessened/disappeared
       iii. There has been a commoditization of tooling - no
       iv. Data is available (especially to FFL!) - using some kind of patient healthcare data might make an interesting use case
  4. Is it useful to our existing clients? Yes.
  5. Is it appealing to potential clients? Yes.
  6. Is it possible to build a prototype?

Older or other papers:

Miscellaneous, possibly unrelated at times:
