Error in NiPN data quality toolkit #1

Closed
ernestguevarra opened this issue Jun 25, 2019 · 0 comments
Labels: bug (Something isn't working)

Comments

@ernestguevarra (Member)

I found a bug in the outliersUV() function in the NiPN data quality toolkit that can prevent calls such as:

svy[outliersUV(svy$muac), ]

from returning the correct set of records when there are NA values in the variable being tested.
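For a call like svy[outliersUV(svy$muac), ] to select the right rows, the function has to return positions in the full-length vector, counting the positions that hold NA. The R sketch below is a hypothetical NA-safe variant, outliersUV_na_safe(). It is not the toolkit's code or its fix. It assumes Tukey fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) computed with na.rm = TRUE, which may differ from how outliersUV() actually computes its fences.

```r
## Hypothetical NA-safe univariate outlier check (not the NiPN toolkit code).
## Assumption: Tukey fences, Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
outliersUV_na_safe <- function(x) {
  ## Quartiles are computed from the non-missing values only
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lowerFence <- q[1] - 1.5 * iqr
  upperFence <- q[2] + 1.5 * iqr
  cat("Univariate outliers : Lower fence = ", lowerFence,
      ", Upper fence = ", upperFence, "\n\n", sep = "")
  ## Compare against the full-length x so positions line up with the rows
  ## of the data frame; which() skips NA results of the comparison
  which(x < lowerFence | x > upperFence)
}

## Usage: the NA in position 1 does not shift the returned positions
x <- c(NA, 120, 130, 125, 999, 128, 11)
outliersUV_na_safe(x)  # returns 5 7
```

With the example data, the equivalent subsetting call would be svy[outliersUV_na_safe(svy$muac), ]. The key design choice is to drop NA only when computing the fences, never when computing the positions.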

I found this while using the toolkit with a dataset from MSF in which all anthropometric measurements were recorded as NA whenever oedema was present. This may be a common situation.

Here is an example of the problem using the example dataset in the NiPN toolkit:

svy <- read.table("rl.ex01.csv", header = TRUE, sep = ",")
head(svy)
## Test function
svy[outliersUV(svy$muac), ]

This gives:

Univariate outliers : Lower fence = 98, Upper fence = 178

    age sex weight height  muac oedema
33   24   1    9.8   74.5 180.0      2
93   12   2    6.7   67.0  96.0      1
126  16   2    9.0   74.6 999.0      2
135  18   2    8.5   74.5 999.0      2
194  24   M    7.0   75.0  95.0      2
227   8   M    6.2   66.0  11.1      2
253  35   2    7.6   75.6  97.0      2
381  24   1   10.8   82.8  12.4      2
501  36   2   15.5   93.4 185.0      2
594  21   2    9.8   76.5  13.2      2
714  59   2   18.9   98.5 180.0      2
752  48   2   15.6  102.2 999.0      2
756  59   1   19.4  101.1 180.0      2
873  59   1   20.6  109.4 179.0      2

Note that all values in the muac column lie outside the fences. This is correct.

Adding a missing value:

## Add a missing value
svy$muac[1] <- NA
head(svy)
## Test function
svy[outliersUV(svy$muac), ]

gives us:

Univariate outliers : Lower fence = 98, Upper fence = 178

    age sex weight height muac oedema
32   24   2    8.2   68.7  139      2
92   14   F    7.2   70.6  109      2
125  12   1    8.8   70.6  124      2
134  18   M    7.9   73.1  114      2
193  24   1    8.3   77.1  116      2
226  12   2    9.6   79.2  133      2
252  15   1   11.5   79.2  154      2
380  32   1   11.8   82.6  142      2
500  36   2   11.8   90.0  136      2
593  59   2   13.3   97.5  146      2
713  59   2   13.1   99.3  126      2
751  59   1   16.5  101.1  154      2
755  59   2   14.9  104.4  143      2
872  59   1   15.9  109.5  139      2

None of the listed records has a muac value outside the fences, so none of them are outliers. This is wrong.
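Every row name in this output is one less than the corresponding row name in the correct output (33 became 32, 93 became 92, and so on). One plausible explanation, which is an assumption and not something confirmed from the toolkit source, is that the positions are computed on a copy of the vector with the NA removed. The short R example below shows how that produces exactly this kind of shift; x and fence are made-up illustration values.

```r
## Illustration only: made-up MUAC values with a missing first element
x <- c(NA, 120, 180, 130, 999)
fence <- 178  # hypothetical upper fence

## Positions taken from the NA-stripped copy are relative to that copy,
## so every position after the NA is one too small
stripped <- x[!is.na(x)]
which(stripped > fence)   # 2 4

## Positions taken from the full-length vector match the original rows;
## which() ignores the NA produced by comparing NA > fence
which(x > fence)          # 3 5
```

Because the NA sits in the first position, subsetting with the shifted positions selects the row just before each outlier, which is the pattern seen above.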

ernestguevarra added the bug (Something isn't working) label on Jun 25, 2019
ernestguevarra self-assigned this on Jun 25, 2019
ernestguevarra added two commits that referenced this issue on Jun 25, 2019