# Ch02/02_04 Learn Boolean Indexing

In [4]:
import numpy as np

Sometimes you'd like to select parts of an array not by indices, but by some logic, say all the values that are bigger than some threshold.

For this, we're going to use Boolean indexing. Let's see how it works. So we **import NumPy and create an array with three elements**. Let's run it.

In [14]:
arr = np.arange(3)
arr

array([0, 1, 2])

You can index an array with another array of the same shape **containing the Boolean values True or False**. So here's the array, and I'm indexing it with True, False, True. When I run it, I get only the elements where the index was True. For now this does not seem that helpful, but let's see one more thing.

In [16]:
arr[[True, False, True]]

array([0, 2])
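As a small sketch of the rule above, here is the same selection with the mask written as a NumPy array, plus a check of what happens when the mask has the wrong length. The names `mask`, `selected`, and `mismatch_rejected` are only illustrative, they are not part of the notebook.

```python
import numpy as np

arr = np.arange(3)

# A Boolean mask must have the same length as the axis it indexes.
mask = np.array([True, False, True])
selected = arr[mask]          # keeps positions 0 and 2
print(selected)

# A mask of the wrong length is rejected rather than silently truncated.
try:
    arr[np.array([True, False])]
    mismatch_rejected = False
except IndexError:
    mismatch_rejected = True
print(mismatch_rejected)
```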

This becomes super useful once I write array bigger or equal to one: I get a Boolean array with False, True, True. Now I can combine these two and say array at the locations where the array is bigger or equal to one.

In [17]:
arr >= 1

array([False,  True,  True])

And I'm getting one and two, but not the zero. 

In [18]:
arr[arr >= 1]

array([1, 2])
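Since the comparison produces an ordinary Boolean array, you can also inspect the mask itself before using it as an index. The sketch below counts the matches and lists their positions; `count` and `positions` are names made up for this example.

```python
import numpy as np

arr = np.arange(3)
mask = arr >= 1                    # element-wise comparison gives a Boolean array

print(mask.dtype)                  # the mask's dtype is bool
count = int(mask.sum())            # True counts as 1, so the sum is the number of matches
positions = np.flatnonzero(mask)   # integer indices where the mask is True
print(count, positions)
```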

You can combine these conditions, or masks, using Boolean operators. However, these are not the normal Boolean operators (`and`, `or`, `not`) that we have in Python. You're going to use the **ampersand (&)** for **AND**, the **vertical bar (|)** for **OR**, and the **tilde (~)** for **NOT**.

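So I have a bigger array now, with 10 elements, and I'm asking for all the elements that are bigger than two and smaller than seven. Note that the parentheses here are mandatory: `&` binds more tightly than the comparison operators, so without them the expression means something else.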

In [19]:
arr = np.arange(10)

arr[(arr>2) & (arr<7)]

array([3, 4, 5, 6])
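To see why the regular Python operators do not work here, this sketch tries `and` on two masks and catches the error, then repeats the selection with `&` and with `np.logical_and`, which is the spelled-out function form. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(10)

# Python's `and` asks each whole array for a single truth value, which NumPy refuses.
try:
    arr[(arr > 2) and (arr < 7)]
    and_failed = False
except ValueError:
    and_failed = True

# `&` works element by element; np.logical_and is the equivalent function call.
with_operator = arr[(arr > 2) & (arr < 7)]
with_function = arr[np.logical_and(arr > 2, arr < 7)]
print(and_failed, with_operator, with_function)
```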

In [20]:
arr[(arr>7) | (arr<2)]

array([0, 1, 8, 9])

The cell above uses OR to pick everything that is bigger than 7 or smaller than 2. And here is an example of negation: everything that is not bigger than 7, which you would usually write as everything that is smaller or equal to 7.

In [21]:
arr[~(arr>7)]

array([0, 1, 2, 3, 4, 5, 6, 7])
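A quick way to convince yourself that the negated mask matches the direct comparison is to compare the two results. The sketch below also checks De Morgan's rule on the OR example from earlier; all the names in it are made up for illustration.

```python
import numpy as np

arr = np.arange(10)

# Negating "bigger than 7" selects the same elements as "smaller or equal to 7".
negated = arr[~(arr > 7)]
direct = arr[arr <= 7]
same = np.array_equal(negated, direct)

# De Morgan: NOT (a OR b) is the same mask as (NOT a) AND (NOT b).
left = ~((arr > 7) | (arr < 2))
right = ~(arr > 7) & ~(arr < 2)
print(same, arr[left], arr[right])
```

This equivalence holds for the integer array used here. With floating-point data that contains NaN, `~(x > 7)` keeps the NaN entries while `x <= 7` drops them.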

# Standard score
https://en.wikipedia.org/wiki/Standard_score

Let's do a more realistic example. We're going to find outliers using the standard score.

First, we're going to create our data. I'm going to create an array of 1,000 elements drawn from a normal distribution with mean 0 and standard deviation 10, and then I'm going to add two outliers, at location 33 and location 832.

In [5]:
values = np.random.normal(0, 10, 1_000)
values[33] = 1038
values[832] = -3423

Now I'm going to calculate my mask. The mask is where the absolute value of the value minus the mean, meaning the distance from the mean, is bigger than two times the standard deviation of the array. Then I'm going to select the values at the mask, and I see exactly my outliers.

In [6]:
mask = np.abs(values - values.mean()) > (2 * values.std())
values[mask]

array([ 1038., -3423.])
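The mask above is the standard score test written without the division. As a sketch, here is the same check with the score computed explicitly. It assumes a seeded `np.random.default_rng(0)` generator instead of `np.random.normal`, only so the run is repeatable; `z_scores` and `outlier_positions` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)       # assumption: seeded only to make the sketch repeatable
values = rng.normal(0, 10, 1_000)    # mean 0, standard deviation 10
values[33] = 1038
values[832] = -3423

# Standard score: distance from the mean, measured in standard deviations.
z_scores = (values - values.mean()) / values.std()
mask = np.abs(z_scores) > 2          # same test as |x - mean| > 2 * std

outlier_positions = np.flatnonzero(mask)
print(outlier_positions, values[mask])
```

The two injected values are so large that they inflate the standard deviation themselves, which is why none of the ordinary samples crosses the threshold.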

Further, I can even use this mask to change the values, let's say to the mean of the current array. This is an example of the power of Boolean indexing: you can do a lot of things in very few lines. Apart from the cool effect, this is also very fast. In numerical Python you try hard not to write any for loops. This method of computation is called vectorization, and once you use it, everything runs at the C or Fortran level of NumPy.

In [7]:
values[mask] = values.mean()

In [8]:
values

array([-1.24151744e+01, -5.42715440e+00, -2.10605999e+00,  9.30479524e-01,
       -9.31545035e+00, -1.16376728e+01,  6.34152402e+00,  4.79516972e+00,
        1.52506419e+01, -2.20290294e+00,  1.52115650e+00,  1.41811488e+01,
       -4.53863647e+00,  1.10360531e+01,  3.19759658e+00,  1.64336588e+01,
       -1.20022750e+00, -9.31617544e-01,  6.49295023e+00,  2.50491436e+00,
       -1.33440302e+01, -5.04832123e+00,  5.37535185e+00, -2.90431152e+00,
       -1.42853300e+01, -7.95079632e+00, -1.06373360e+01,  2.20321926e+00,
        2.05997711e+01,  1.01998094e+01,  1.06987001e+01,  1.25616240e+01,
        1.06430223e+00, -2.37295772e+00,  1.22791791e+00, -8.08582967e+00,
       -1.59959886e+01, -1.35194268e+01,  1.49027277e+01,  3.65121249e+00,
       -1.03972089e+01, -3.97403849e+00, -5.28042897e+00,  8.10545529e+00,
        2.13794271e+00,  2.68200285e+00,  6.29726393e+00, -1.27596273e+01,
       -4.94719453e+00, -3.81283790e+00,  9.00123118e-01,  5.06563625e-01,
        7.07070854e+00,  
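To make the point about avoiding for loops concrete, this sketch does the same replacement three ways: with an explicit loop, with the masked assignment, and with `np.where`, which builds a new array instead of changing the original. The seeded generator and the names `looped`, `vectorized`, and `replaced` are assumptions for the example, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)       # assumption: seeded only to make the sketch repeatable
values = rng.normal(0, 10, 1_000)
values[33] = 1038
values[832] = -3423

mask = np.abs(values - values.mean()) > 2 * values.std()
mean = values.mean()                 # computed once, before anything is replaced

# Explicit Python loop: one interpreter step per element.
looped = values.copy()
for i in range(len(looped)):
    if mask[i]:
        looped[i] = mean

# Vectorized masked assignment: one statement, executed inside NumPy.
vectorized = values.copy()
vectorized[mask] = mean

# np.where returns a new array and leaves `values` untouched.
replaced = np.where(mask, mean, values)

print(np.array_equal(looped, vectorized), np.array_equal(vectorized, replaced))
```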
