# Numpy Boolean Masks and argsort()

In [None]:
import numpy as np

### Why are we covering these two topics?

#### Historically, students have had difficulty understanding these two concepts when asked to use them in homework and exam exercises. So we introduce them here, to help with that understanding. We will not be doing complex exercises here, but simply introducing the concepts.

# Numpy boolean masks

### What is a boolean mask?

**In pandas, as we saw earlier, a mask is used to filter and return only the rows that meet a certain condition.**

**In numpy, however, a mask is a "truth array" of the same shape as the source array being compared, produced by a comparison such as `a < 5`.**

**Each element in the "truth array" will have a value of either `True` or `False`, depending on the result of the comparison on that element in the source array.**

**You can then use the truth array to filter/select only the source array elements that you need in your exercise.**
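Because the cells below use random numbers, their output changes on every run. As a fixed-data sketch (the array values here are made up for illustration), a comparison produces a boolean array with exactly the same shape as the source:

```python
import numpy as np

# Made-up, fixed data so the output is predictable
a = np.array([[7, 2, 9],
              [4, 5, 1],
              [0, 8, 3]])

# The comparison is applied element by element, giving a "truth array"
truth = a < 5

print(truth.shape)  # (3, 3), the same shape as `a`
print(truth.dtype)  # bool
```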

In [None]:
# create a 3x3 array of random integers from 0 to 9
a = np.random.randint(0,10,(3,3))
display(a)

In [None]:
a < 5

In [None]:
# using parentheses for readability
truth_less_than_five = (a < 5)
display(truth_less_than_five)
display(truth_less_than_five.dtype)

In [None]:
# using parentheses for readability
truth_greater_equal_five = (a >= 5)
display(truth_greater_equal_five)
display(truth_greater_equal_five.dtype)

Now to select these values from the array, we can simply index on this Boolean array; this is known as a masking operation:

In [None]:
a[a < 5]

In [None]:
a[truth_less_than_five]

What is returned is a one-dimensional array filled with all the values that meet this condition; in other words, all the values in positions at which the mask array is `True`.
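A small sketch of that flattening, using a made-up 3x3 array: even though the source is two-dimensional, the masked result is one-dimensional, with the selected values taken row by row.

```python
import numpy as np

# Made-up 2-D data for illustration
a = np.array([[7, 2, 9],
              [4, 5, 1],
              [0, 8, 3]])

# Masking returns only the values where the mask is True,
# read row by row (row-major order)
selected = a[a < 5]

print(selected)       # [2 4 1 0 3]
print(selected.ndim)  # 1, regardless of the source array's dimensions
```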

**We can then use these values as required in the exercise. This is the key takeaway here. When we want to filter/return only the values that meet some criteria, we want to use a Boolean Mask to do so.**
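As an illustrative sketch (again with a made-up array), here are a few common things to do once the values are selected. Note that `np.count_nonzero()` counts the `True` entries of the mask itself, while `.sum()` and `.mean()` operate on the selected values:

```python
import numpy as np

# Made-up data for illustration
a = np.array([[7, 2, 9],
              [4, 5, 1],
              [0, 8, 3]])
mask = a < 5

count = np.count_nonzero(mask)  # number of True entries: 5
total = a[mask].sum()           # 2 + 4 + 1 + 0 + 3 = 10
average = a[mask].mean()        # 10 / 5 = 2.0

print(count, total, average)    # 5 10 2.0
```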

**You will see applications of this in the homework notebooks and sample/practice midterm and final exam notebooks.**

The numpy documentation has an excellent reference on the logic and functions you can use when applying Boolean Masks:  https://numpy.org/doc/stable/reference/routines.logic.html
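One pattern from that reference that comes up often is combining conditions. The sketch below (made-up data) uses `&` (and), `|` (or), and `~` (not), plus `np.logical_and()`. Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `<` and `>=`:

```python
import numpy as np

# Made-up data for illustration
a = np.array([[7, 2, 9],
              [4, 5, 1],
              [0, 8, 3]])

# Values that are >= 2 AND < 8 (parentheses around each comparison are required)
both = a[(a >= 2) & (a < 8)]
same = a[np.logical_and(a >= 2, a < 8)]  # function form of the same mask

# NOT (>= 2 and < 8), which here equals (< 2) OR (>= 8)
outside = a[~((a >= 2) & (a < 8))]
either = a[(a < 2) | (a >= 8)]

print(both)     # [7 2 4 5 3]
print(outside)  # [9 1 0 8]
```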

As noted previously, Vanderplas has a good introduction to Boolean Masks in his book:  https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html

### What are your questions on Boolean Masks?

# argsort() function

Documentation link:  https://numpy.org/doc/stable/reference/generated/numpy.argsort.html

### The np.argsort() function is used to return the indices that would sort an array.

#### So the function returns an array of indexes, of the same shape as the input array `a`, that index data along the given axis in sorted order (from the documentation).

OK, so what does this mean, in practice?

The function returns an integer array with the same shape as the source array. Each value in the returned array is an index into the source array, listed in the order that would put the source values in ascending order. The returned array does not contain the sorted values themselves; it tells us the order in which to read the source array to get them sorted.
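One way to check this description: indexing the source array with the output of `np.argsort()` gives the same result as `np.sort()`, while the source array itself is left unchanged. A minimal sketch using the values `[5, 3, 2, 0, 1, 4]`:

```python
import numpy as np

a = np.array([5, 3, 2, 0, 1, 4])

order = np.argsort(a)  # indexes into `a`, not values
print(order)           # [3 4 2 1 5 0]

# Reading `a` at those indexes, in that order, yields the sorted values
print(a[order])                              # [0 1 2 3 4 5]
print(np.array_equal(a[order], np.sort(a)))  # True

# argsort does not modify the source array
print(a)               # [5 3 2 0 1 4]
```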

Let's look at a simple example for understanding.

In [None]:
a = np.array([5, 3, 2, 0, 1, 4])
np.argsort(a)

OK, so what is this array telling us?

1. The element at index = 3 is the first element in the sorted order (0 is the lowest value).
2. The element at index = 4 is the second element in the sorted order (1 is the next lowest value).
3. The element at index = 2 is the third element in the sorted order (2 is the next lowest value).
4. The element at index = 1 is the fourth element in the sorted order (3 is the next lowest value).
5. The element at index = 5 is the fifth element in the sorted order (4 is the next lowest value).
6. The element at index = 0 is the last element in the sorted order (5 is the highest value).
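The documentation quoted above mentions sorting "along the given axis", which only matters for arrays with more than one dimension. A sketch with a made-up 2x3 array: by default (`axis=-1`, the last axis) each row is argsorted independently, while `axis=0` argsorts down each column. Either way, the result has the same shape as the input.

```python
import numpy as np

# Made-up 2x3 data for illustration
m = np.array([[3, 7, 2],
              [9, 1, 8]])

# Default axis=-1: indexes that would sort each row
by_row = np.argsort(m)
print(by_row.tolist())  # [[2, 0, 1], [1, 2, 0]]

# axis=0: indexes that would sort each column
by_col = np.argsort(m, axis=0)
print(by_col.tolist())  # [[0, 1, 0], [1, 0, 1]]
```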

Does it sort float values in the same manner?

In [None]:
b = np.array([5.0, 3.0, 2.0, 0.0, 1.0, 4.0])
np.argsort(b)

What about strings?

In [None]:
c = np.array(['p','m','x','h','a','t'])
np.argsort(c)
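Strings are compared character by character using their Unicode code points, so for single lowercase letters this is simply alphabetical order, as indexing with the result confirms. One consequence worth knowing: uppercase letters have smaller code points than lowercase ones, so `'A'` sorts before `'a'`.

```python
import numpy as np

c = np.array(['p', 'm', 'x', 'h', 'a', 't'])
order = np.argsort(c)
print(order)     # [4 3 1 0 5 2]
print(c[order])  # ['a' 'h' 'm' 'p' 't' 'x']

# Mixed case: 'A' (code point 65) < 'a' (97) < 'b' (98)
mixed = np.array(['b', 'A', 'a'])
print(np.argsort(mixed))  # [1 2 0]
```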

Now let's look at a simple example. While it may seem fairly straightforward, conceptually this is the type of exercise that you will see in the homework notebooks and on the exams.

**Requirement:**

What are the three largest values in an array?

Return a numpy array with these three values.

In [None]:
# initialize an array
a = np.array([68,43,2,100,54,5,12,76,23,37])
a

Using visual inspection, what are the three largest values?

1. Value = 100, at index 3.
2. Value = 76, at index 7.
3. Value = 68, at index 0.

In [None]:
# using argsort, get the indices that would sort the values in ascending order
np.argsort(a)

#### Recall array slicing, which we use in the cells below.

We use square brackets to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:

**x[start:stop:step]**

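If any of these are unspecified, they default to the values `start=0`, `stop=size of dimension`, `step=1`. When `step` is negative, the defaults for `start` and `stop` are effectively swapped, so `x[::-1]` walks the array from the last element back to the first.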
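A quick sketch of these defaults on a made-up array `x = np.arange(10)`, including the two slices used in the cells below (`[-3:]` and `[::-1]`):

```python
import numpy as np

x = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(x[:3])         # [0 1 2]          start defaults to 0
print(x[7:])         # [7 8 9]          stop defaults to the end
print(x[::2])        # [0 2 4 6 8]      every second element
print(x[-3:])        # [7 8 9]          negative start counts from the end
print(x[::-1])       # [9 8 7 6 5 4 3 2 1 0]  step=-1 reverses
print(x[-3:][::-1])  # [9 8 7]          last three, reversed
```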


Good reference (see the "Array Slicing: Accessing Subarrays" section, about a third of the way down the page):  https://jakevdp.github.io/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html

In [None]:
# return the indexes of the three highest values in the array
# In the slice notation, we are telling it to return the last three elements of the argsort result,
# which are the indexes of the three largest values in the original array.
np.argsort(a)[-3::]

Note that in the above, we have included both colons in the code `[::]`.

Because we are only defining the first part of the slice notation (give us the last three elements), we actually don't **NEED** the second colon.

So the two lines below are equivalent.

We are using the code with **BOTH COLONS** `[::]` for ease of understanding.

In [None]:
# the two lines of code below are equivalent
display(np.argsort(a)[-3::])
display(np.argsort(a)[-3:])

In [None]:
# Now let's reverse the order of the top three indexes, so that the index
# of the largest value comes first (descending order of value).
# We take the array from the previous cell and use slice notation
# with step = -1 to reverse it.
# Note that we are still returning indexes into the original array, not values.
np.argsort(a)[-3::][::-1]

In [None]:
# Finally, let's return the 3 highest values from the original array.
# Remember from the last step that we are returning, in descending order
# of value, the indexes of the top three values.
# So all we are doing now is returning the values at those indexes.
a[np.argsort(a)[-3::][::-1]]

In [None]:
# This is the same as:
a[[3,7,0]]
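The steps above can be wrapped into a small reusable function. `top_k_values` is a hypothetical helper written for this illustration, not a numpy function; it assumes a 1-D array and `1 <= k <= arr.size` (with `k = 0`, the slice `[-0:]` would return the whole array).

```python
import numpy as np

def top_k_values(arr, k):
    """Return the k largest values of a 1-D array, largest first.

    Hypothetical helper for illustration; assumes 1 <= k <= arr.size.
    """
    # indexes of the k largest values, reversed so the largest comes first
    indices = np.argsort(arr)[-k:][::-1]
    # look up the values at those indexes
    return arr[indices]

a = np.array([68, 43, 2, 100, 54, 5, 12, 76, 23, 37])
print(top_k_values(a, 3).tolist())  # [100, 76, 68]
```

If only the values are needed, `np.sort(a)[-3:][::-1]` gives the same result. `argsort()` is the better fit when the indexes themselves matter, for example to look up the matching entries in a second array that is aligned with `a`.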

### What are your questions on argsort()?