The maskr package provides a masked array type: a vector, matrix,
array, or data frame paired with a mask. The mask is a list of index
vectors, one for each axis of the array, which defines the parts of the
array to show. All other parts are hidden.

You can install the development version of maskr from GitHub with:
# install.packages("remotes")
remotes::install_github("multimeric/maskr")

You can create a MaskedArray using the constructor function, which
takes the array to mask as its first argument and a list of mask
vectors, one for each dimension of the array, as its second. For a
masked vector, the mask is a list containing one vector.
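
To make the idea concrete, here is a minimal base R sketch of what a list of per-axis masks means. It does not use maskr and makes no claim about how the package is implemented; `show_visible` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper, not part of maskr: index `value` with one mask per axis.
show_visible <- function(value, masks) {
  # Equivalent to value[masks[[1]], masks[[2]], ...]
  do.call("[", c(list(value), masks))
}

# A vector has one axis, so its mask is a list of one index vector.
vec_visible <- show_visible(letters[1:10], list(2:9))
print(vec_visible)

# A matrix has two axes, so its mask is a list of two index vectors.
m <- matrix(1:16, nrow = 4)
mat_visible <- show_visible(m, list(1:2, 2:3))
print(mat_visible)
```

The sketch only reads from the underlying data, which is the property that lets a mask be changed or removed later without losing anything.
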
Here we define an inner array with 10 elements, but our mask, which is a
vector of indices to show, hides "a" and "j":
x = maskr::MaskedArray(letters[1:10], masks=list(2:9))
x
#> An object of class "MaskedArray"
#> Slot "value":
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
#>
#> Slot "masks":
#> [[1]]
#> [1] 2 3 4 5 6 7 8 9

For most purposes, the array acts the same as
c("b", "c", "d", "e", "f", "g", "h", "i"):
length(x)
#> [1] 8
x[[1]]
#> [1] "b"

However, we can unmask the array to return to the original form:
maskr::unmask(x)
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

One use case for maskr is hiding some kind of bad data, like outliers.
For example, let’s start with some data that includes two outliers:
set.seed(1)
x = c(
  rnorm(10),          # ten draws from a standard normal
  rnorm(2, mean = 2)  # two draws with a shifted mean, our outliers
)
x
#> [1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078 -0.8204684
#> [7] 0.4874291 0.7383247 0.5757814 -0.3053884 3.5117812 2.3898432

We could identify these outliers using some kind of test. In this case
we test whether each value is plausibly drawn from a standard normal
distribution, flagging those with an upper-tail probability below 0.05:
outliers = pnorm(x, lower.tail = FALSE) < 0.05
outliers
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE

Then we can use the negated outlier vector as a mask. Note that logical vectors are also legal masks!
masked = maskr::MaskedArray(x, masks = list(!outliers))
masked
#> An object of class "MaskedArray"
#> Slot "value":
#> [1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078 -0.8204684
#> [7] 0.4874291 0.7383247 0.5757814 -0.3053884 3.5117812 2.3898432
#>
#> Slot "masks":
#> [[1]]
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE

We can then find the mean of the non-outliers directly on the masked array:
mean(masked)
#> [1] 0.1322028

Another useful feature of maskr is that, when you subset an array, the
mask shrinks to hide the excluded elements. Let’s define x as in the
first section:
x = maskr::MaskedArray(letters[1:10], masks=list(2:9))
x[5:6]
#> An object of class "MaskedArray"
#> Slot "value":
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
#>
#> Slot "masks":
#> [[1]]
#> [1] 6 7

Notice how the resulting object is still a length-10 vector, but now all but 2 items are hidden. This means that you can take a sequence of slices from the array:
y = x[-1][-1][-1]

We can compute the final subset:
maskr::apply_mask(y)
#> [1] "e" "f" "g" "h" "i"

But also revert the mask:
maskr::unmask(y)
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

We can mask a matrix. If we do, the mask needs to be a list of two vectors:
x = maskr::MaskedArray(matrix(rnorm(n=16), nrow=4), list(1:2, 2:3))
x
#> An object of class "MaskedArray"
#> Slot "value":
#>             [,1]        [,2]        [,3]        [,4]
#> [1,] -0.62124058 -0.01619026  0.91897737  0.61982575
#> [2,] -2.21469989  0.94383621  0.78213630 -0.05612874
#> [3,]  1.12493092  0.82122120  0.07456498 -0.15579551
#> [4,] -0.04493361  0.59390132 -1.98935170 -1.47075238
#>
#> Slot "masks":
#> [[1]]
#> [1] 1 2
#>
#> [[2]]
#> [1] 2 3
maskr::apply_mask(x)
#>             [,1]      [,2]
#> [1,] -0.01619026 0.9189774
#> [2,]  0.94383621 0.7821363

We can also mask a data.frame in the same way. Note that we can use a
character vector to select certain columns, as you might expect:
x = maskr::MaskedArray(iris, list(10:15, c("Petal.Length", "Sepal.Length")))
maskr::apply_mask(x)
#>    Petal.Length Sepal.Length
#> 10          1.5          4.9
#> 11          1.5          5.4
#> 12          1.6          4.8
#> 13          1.4          4.8
#> 14          1.1          4.3
#> 15          1.2          5.8
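
The mask-shrinking behaviour described earlier, where subsetting narrows the mask instead of copying data, can be sketched in base R as plain index composition. This is an illustration under the assumption that masks are held as positive integer positions; `compose_mask` is a hypothetical helper, not a maskr function, and the package may handle this differently.

```r
# Hypothetical helper, not part of maskr: subset the mask, not the data.
compose_mask <- function(mask, i) {
  # A logical mask is first converted to the positions it shows.
  if (is.logical(mask)) mask <- which(mask)
  mask[i]
}

value <- letters[1:10]
mask <- 2:9  # hides "a" and "j"

# Keeping the 5th and 6th visible elements leaves original positions 6 and 7.
narrowed <- compose_mask(mask, 5:6)
print(narrowed)

# Dropping the first visible element three times in a row.
dropped <- compose_mask(compose_mask(compose_mask(mask, -1), -1), -1)
print(value[dropped])  # the part that remains visible
print(value)           # the underlying data is untouched
```

Because each step only rewrites the index vector, the original data stays available for a later unmask, which matches the behaviour shown in the examples above.
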