# Masking With Boolean Arrays

Masking with boolean arrays provides an efficient way to access, modify, and compute on NumPy arrays according to some criterion.

Let's look at a simple use case: generate an array of uniformly distributed random integers and find out how many of them are even.

In [1]:
import numpy as np

np.random.seed(50)
x = np.random.randint(0, 100, size=100)
print(x[:10])

[48 96 11 33 94  4 70 70 22  5]


In [2]:
mask = (x % 2 == 0)
print(mask[:10])

[ True  True False False  True  True  True  True  True False]


In [3]:
print(mask.sum())

55


So, 55 of the 100 random integers are even. This makes intuitive sense: when integers are drawn uniformly from 0 to 99, about half of them are expected to be even.
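
Because `True` counts as 1 and `False` as 0, the same mask can give a proportion as well as a count. The following is a minimal sketch that uses a small hand-written array (the first ten values printed above) rather than the full random sample; the names `values`, `is_even`, `even_count` and `even_fraction` are chosen here for illustration.

```python
import numpy as np

# Small hand-written array so the result is easy to verify by eye.
values = np.array([48, 96, 11, 33, 94, 4, 70, 70, 22, 5])
is_even = (values % 2 == 0)

even_count = np.count_nonzero(is_even)   # same result as is_even.sum()
even_fraction = is_even.mean()           # True counts as 1, False as 0

print(even_count)      # 7
print(even_fraction)   # 0.7
```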

Next, save the even and odd integers into separate variables.

In [4]:
even, odd = x[mask], x[~mask]

In [5]:
print(even[:10])

[48 96 94  4 70 70 22  2 68 78]


In [6]:
print(odd[:10])

[11 33  5 95 71 35 91 43 31 49]


Comparison operators like *==*, *!=*, *<*, *>*, *<=* and *>=* also generate boolean masks, which can be used to perform fast selections and manipulations of NumPy arrays.
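
As a rough illustration of both uses, the sketch below selects with one mask and assigns through another. The array `data` and the names `positives` and `clipped` are made up for this example.

```python
import numpy as np

data = np.array([3, -1, 7, -5, 0, 9])

# Select: keep only the strictly positive entries.
positives = data[data > 0]
print(positives)   # [3 7 9]

# Manipulate: overwrite the negative entries in a copy, leaving data untouched.
clipped = data.copy()
clipped[clipped < 0] = 0
print(clipped)     # [3 0 7 0 0 9]
```

Indexing with a mask returns a new array, whereas assigning through a mask modifies the indexed array in place, which is why the sketch works on a copy.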

Needless to say, these operators work on multi-dimensional arrays, too.

In [7]:
y = np.random.randint(99, size=(4, 5))

In [8]:
print(y)

[[27 50 58 28 31]
 [15  5 52 55 56]
 [47 68  4 18 42]
 [17 90 72 46 11]]


In [9]:
y < 10

array([[False, False, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [False, False, False, False, False]], dtype=bool)

In [10]:
# count the number of entries less than 50
print(np.sum(y < 50))

12


In [11]:
# how many values are less than 50 in each column?
print(np.sum(y < 50, axis=0))


[4 1 1 3 3]


In [12]:
# how many values are less than 50 in each row?
print(np.sum(y < 50, axis=1))

[3 2 4 3]


In [13]:
# are all values less than 100?
print(np.all(y < 100))

True


In [14]:
# is there any value less than 20?
print(np.any(y < 20))

True


In [15]:
# are all values greater than 20 in each row?
print(np.all(y > 20, axis=1))

[ True False False False]


In [16]:
# what about in each column?
print(np.all(y > 20, axis=0))

[False False False False False]
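
The reductions above return boolean arrays, so they can be used as masks themselves. Here is a small sketch, with the 4x5 array printed earlier written out literally so that it does not depend on the random state; `row_mask` is a name introduced for this example.

```python
import numpy as np

# The 4x5 array printed above, written out literally for reproducibility.
y = np.array([[27, 50, 58, 28, 31],
              [15,  5, 52, 55, 56],
              [47, 68,  4, 18, 42],
              [17, 90, 72, 46, 11]])

# Boolean mask over rows: True where every entry in the row exceeds 20.
row_mask = np.all(y > 20, axis=1)

print(y[row_mask])   # [[27 50 58 28 31]]
print(y[y < 10])     # [5 4]
```

A one-dimensional mask along the first axis keeps whole rows, while a mask with the same shape as the array returns the selected elements as a flat, one-dimensional array.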


## Bitwise Logic Operators

When compound logical operations between NumPy arrays are involved, the bitwise logic operators *&*, *|*, *^* and *~* should be used rather than the keywords *and*, *or* and *not*. Python has no keyword for exclusive or; the operator *^* or the function *np.logical_xor* covers that case.

The bitwise logic operators act element by element, on the individual bits of integers or on the individual elements of arrays, whereas the logic keywords evaluate each operand as a whole.
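
To make the distinction concrete, the sketch below applies the operators to plain integers and to two small boolean arrays. The arrays `a` and `b` are made up for this example.

```python
import numpy as np

# On plain integers, & works bit by bit: 0b110 & 0b011 == 0b010.
print(6 & 3)     # 2

# On boolean arrays, the operators work element by element.
a = np.array([True, True, False, False])
b = np.array([True, False, True, False])
print(a & b)     # [ True False False False]
print(a | b)     # [ True  True  True False]
print(a ^ b)     # [False  True  True False]
print(~a)        # [False False  True  True]

# The keyword `and` evaluates each operand as a whole and returns one of them.
print(6 and 3)   # 3
```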

In [17]:
# count of elements in y that are even and greater than 50
print(np.sum((y > 50) & (y % 2 == 0)))

6


What happens above if the keyword *and* is used instead of the bitwise operator *&*?

In [18]:
print(np.sum((y > 50) and (y % 2 == 0)))

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The keyword *and* needs a single truth value for each operand as a whole. Each operand here is an array with more than one element, so its truth value is ambiguous, and NumPy raises an error rather than guess.
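
Two ways around the error are sketched below, again with the 4x5 array written out literally. `np.logical_and` is the function form of the element-wise operation, and the last line follows the hint in the error message by reducing each array to a single value before using *and*. The names `with_operator` and `with_function` are introduced for this example.

```python
import numpy as np

y = np.array([[27, 50, 58, 28, 31],
              [15,  5, 52, 55, 56],
              [47, 68,  4, 18, 42],
              [17, 90, 72, 46, 11]])

# Element-wise AND, written two equivalent ways.
with_operator = (y > 50) & (y % 2 == 0)
with_function = np.logical_and(y > 50, y % 2 == 0)
print(np.array_equal(with_operator, with_function))   # True
print(with_operator.sum())                            # 6

# If a single truth value is what is wanted, reduce explicitly first.
print((y > 50).any() and (y % 2 == 0).any())          # True
```

The parentheses around each comparison matter in the operator form, because *&* binds more tightly than the comparison operators.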