Exploration and visualisation of missing data #15
Comments
Snap! I have medical data with missing entries too. I'm interested in being able to visualise it and explore clusters of missingness as well as other types of data inconsistencies (e.g. end time before start time).
Also have a look at e.g. http://www.r-bloggers.com/imputing-missing-data-with-r-mice-package/
Thanks for that, Jonno. VIM certainly does have some useful plots. What do you think about incorporating them into ggmissing?
Keep the package simple. Primary purpose is to make ggplot2 graphics that include the missings in the plot.
I'm not very familiar with ggmissing, but I'd like to know more about it! BTW, here is a nice example of a scatterplot with margins for missing values http://kbroman.org/d3panels/assets/test/scatterplot/
7 votes from the AuUnconf... :) Might be worth continuing discussions around this.
Nick created a channel on the AuUnconf slack account. Anyone interested can join discussions there also.
In my PhD research I work with medical data and there are often large amounts of it missing. In my attempts to explore missing data problems and make my life easier I have done some work on two packages: ggmissing, with Di Cook, and mex, with Damjan Vukcevic. But, as my PhD research continues, I have been finding it hard to dedicate serious time to continuing work on these packages. I'd like to propose a project on one, or perhaps both, of these packages.
A bit more about them:
ggmissing extends ggplot to allow for missing data to be visualised. This would basically involve creating a couple of ggplot geom_missing_* functions that could be added as a layer to a plot. For example, geom_missing_point() would add in and colour the missing points. You can see more about it on the github repo, and at these slides.
mex is a missingness exploration package. It builds on some research that I have done into using decision trees to explore missing data. The original idea of the package was to create a framework, or even a recommended path, for handling missing data. One idea was to break it into exploring, modelling, and confirming.
Exploring would include:
visdat
Modelling would include:
Confirming might be something like:
I'm very much open to suggestions about how to implement these ideas.
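To make the geom_missing_point() idea a bit more concrete, here is a rough sketch of the kind of plot it is aiming for, written with plain ggplot2 only. This is not ggmissing's actual implementation: shadow_shift() is a hypothetical helper made up for this example, and the choice to place missing values 10% of the range below the observed minimum is an assumption. It uses the built-in airquality data, where Ozone and Solar.R contain NAs.

```r
library(ggplot2)

# Hypothetical helper (not part of ggmissing): replace each NA with a value
# 10% of the observed range below the minimum, so the point can still be drawn.
shadow_shift <- function(x) {
  rng <- range(x, na.rm = TRUE)
  ifelse(is.na(x), rng[1] - 0.1 * diff(rng), x)
}

df <- airquality

# Flag rows where either plotted variable is missing.
df$any_missing <- is.na(df$Ozone) | is.na(df$Solar.R)

# Shifted copies used only for plotting; the original columns are untouched.
df$Ozone_plot <- shadow_shift(df$Ozone)
df$Solar_plot <- shadow_shift(df$Solar.R)

# Missing values appear in the margins below and left of the observed data,
# coloured differently from the complete cases.
p <- ggplot(df, aes(x = Solar_plot, y = Ozone_plot, colour = any_missing)) +
  geom_point() +
  labs(x = "Solar.R", y = "Ozone", colour = "Missing")
p
```

A geom_missing_point() layer would presumably wrap this shifting and colouring step so that it could be added to a plot like any other geom, without creating the extra columns by hand.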