# NumPy Part 2

> Python Data Science Handbook, *Jake VanderPlas 2020*

---

# Comparisons, Masks, and Boolean Logic

## Comparison Operators as ufuncs

In the previous lesson we introduced ufuncs, focusing in particular on the arithmetic
operators. We saw that using `+`, `-`, `*`, `/`, and others on arrays leads to
element-wise operations. NumPy also implements comparison operators such as `<`
(less than) and `>` (greater than) as element-wise ufuncs.
The result of these comparison operators is always an array with a Boolean data type.
All six of the standard comparison operations are available:

```python
In[4]: x = np.array([1, 2, 3, 4, 5])
In[5]: x < 3 # less than
Out[5]: array([ True, True, False, False, False], dtype=bool)
In[6]: x > 3 # greater than
Out[6]: array([False, False, False, True, True], dtype=bool)
In[7]: x <= 3 # less than or equal
Out[7]: array([ True, True, True, False, False], dtype=bool)
In[8]: x >= 3 # greater than or equal
Out[8]: array([False, False, True, True, True], dtype=bool)
In[9]: x != 3 # not equal
Out[9]: array([ True, True, False, True, True], dtype=bool)
In[10]: x == 3 # equal
Out[10]: array([False, False, True, False, False], dtype=bool)
```

It is also possible to do an element-by-element comparison of two arrays, and to
include compound expressions:

```python
In[11]: (2 * x) == (x ** 2)
Out[11]: array([False, True, False, False, False], dtype=bool)
```
As in the case of arithmetic operators, the comparison operators are implemented as
ufuncs in NumPy; for example, when you write x < 3, internally NumPy uses
np.less(x, 3).
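
The correspondence between operators and ufuncs can be checked directly. The sketch below pairs each comparison operator (through Python's standard `operator` module) with the NumPy ufunc of the matching name and confirms that both spellings give the same Boolean array. The array `x` and the threshold `3` are only illustrative.

```python
import operator
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Each comparison operator paired with its equivalent NumPy ufunc
pairs = [
    (operator.eq, np.equal),
    (operator.ne, np.not_equal),
    (operator.lt, np.less),
    (operator.le, np.less_equal),
    (operator.gt, np.greater),
    (operator.ge, np.greater_equal),
]

for op, ufunc in pairs:
    # op(x, 3) is what `x == 3`, `x < 3`, ... evaluate to
    assert np.array_equal(op(x, 3), ufunc(x, 3))

less_than_three = np.less(x, 3)
print(less_than_three)        # same Boolean array as x < 3
print(less_than_three.dtype)  # bool
```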

Just as in the case of arithmetic ufuncs, these will work on arrays of any size and
shape. Here is a two-dimensional example:
```python
In[12]: rng = np.random.RandomState(0)
        x = rng.randint(10, size=(3, 4))
        x
Out[12]: array([[5, 0, 3, 3],
                [7, 9, 3, 5],
                [2, 4, 7, 6]])
In[13]:     x < 6
Out[13]:    array([[ True, True, True, True],
                   [False, False, True, True],
                   [ True, True, False, False]], dtype=bool)
```
In each case, the result is a Boolean array, and NumPy provides a number of
straightforward patterns for working with these Boolean results.

### **Try it yourself**: Given a 2D matrix `x`, find the values greater than 4.

```python
import numpy as np

np.random.seed(0)  # seed for reproducibility

size = [3, 3]  # edit the size after the lecture
x = np.random.randint(0, 10, size=size)

# your code here
```


---

## Working with Boolean Arrays

Given a Boolean array, there are a host of useful operations you can do. We’ll work
with x, the two-dimensional array we created earlier:
```python
In[14]: print(x)
        [[5 0 3 3]
         [7 9 3 5]
         [2 4 7 6]]
```
### Counting entries
To count the number of True entries in a Boolean array, np.count_nonzero is useful:

```python
In[15]: # how many values less than 6?
 np.count_nonzero(x < 6)
Out[15]: 8
```

We see that there are eight array entries that are less than 6. Another way to get at this
information is to use `np.sum`; in this case, `False` is interpreted as 0, and `True` is
interpreted as 1:

```python
In[16]: np.sum(x < 6)
Out[16]: 8
```
The benefit of `sum()` is that, as with other NumPy aggregation functions, this
summation can be done along rows or columns as well:

```python
In[17]: # how many values less than 6 in each row?
 np.sum(x < 6, axis=1)
Out[17]: array([4, 2, 2])

```
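
As a small sketch of the same idea, the snippet below writes out the array `x` shown above by hand (rather than drawing it from the random generator) and counts matches per column as well as per row. `np.count_nonzero` also accepts an `axis` argument in current NumPy releases, so either function can be used for this.

```python
import numpy as np

# The same values as the array x printed above, written out by hand
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask = x < 6

per_row = np.sum(mask, axis=1)                 # collapse columns: one count per row
per_col = np.sum(mask, axis=0)                 # collapse rows: one count per column
per_row_cnz = np.count_nonzero(mask, axis=1)   # same per-row counts

print(per_row)   # [4 2 2]
print(per_col)   # [2 2 2 2]
```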

If we’re interested in quickly checking whether any or all the values are true, we can
use (you guessed it) np.any() or np.all():
```python
In[18]: # are there any values greater than 8?
 np.any(x > 8)
Out[18]: True
In[19]: # are there any values less than zero?
 np.any(x < 0)
Out[19]: False
In[20]: # are all values less than 10?
 np.all(x < 10)
Out[20]: True
In[21]: # are all values equal to 6?
 np.all(x == 6)
Out[21]: False
```
`np.all()` and `np.any()` can be used along particular axes as well. For example:
```python

In[22]: # are all values in each row less than 8?
 np.all(x < 8, axis=1)
Out[22]: array([ True, False, True], dtype=bool)
```
Here all the elements in the first and third rows are less than 8, while this is not the
case for the second row.
Finally, a quick warning: as mentioned in **“Aggregations: Min, Max, and Everything
in Between”**, Python has built-in `sum()`, `any()`, and `all()` functions.
These have a different syntax than the NumPy versions, and in particular will fail or
produce unintended results when used on multidimensional arrays. Be sure that you
are using `np.sum()`, `np.any()`, and `np.all()` for these examples!
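
To make the warning concrete, here is a minimal sketch that compares the built-in functions with their NumPy counterparts on the Boolean mask of a two-dimensional array. The array is the same `x` as above, written out by hand.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])
mask = x < 6

# NumPy aggregates over every element by default
total = np.sum(mask)

# The built-in sum iterates over the first axis only and adds the rows
# together, so it returns per-column counts instead of a single number
builtin_total = sum(mask)

# The built-in any() tries to convert each row to a single bool, which
# NumPy refuses to do for arrays with more than one element
try:
    any(mask)
    builtin_any_failed = False
except ValueError:
    builtin_any_failed = True

print(total)               # 8
print(builtin_total)       # [2 2 2 2]
print(builtin_any_failed)  # True
```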



### **Try it yourself**: Given the previous array `x` 
1. Count the number of values greater than `3` and print the answer 
2. Print the sum on `axis=0` for numbers greater than `4`
3. Check whether any values are equal to `2` along `axis=1`

```python
# your code here
```

## Boolean Arrays as Masks

In the preceding section, we looked at aggregates computed directly on Boolean
arrays. A more powerful pattern is to use Boolean arrays as masks, to select particular
subsets of the data themselves. Returning to our x array from before, suppose we
want an array of all values in the array that are less than, say, 5:
```python
In[26]: x
Out[26]: array([[5, 0, 3, 3],
                [7, 9, 3, 5],
                [2, 4, 7, 6]])
 ```
We can obtain a Boolean array for this condition easily, as we’ve already seen:
```python
In[27]: x < 5
Out[27]: array( [[False, True, True, True],
                [False, False, True, False],
                [ True, True, False, False]], dtype=bool)
 ```
Now to select these values from the array, we can simply index on this Boolean array;
this is known as a masking operation:
```python
In[28]: x[x < 5]
Out[28]: array([0, 3, 3, 3, 2, 4])
```
What is returned is a one-dimensional array filled with all the values that meet this
condition; in other words, all the values in positions at which the mask array is True.
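We are then free to operate on these values as we wish. For example, we can compute
some relevant statistics on Seattle rainfall data, where `inches` is an array of daily
precipitation in 2014 (it is loaded in the exercise cell at the end of this section):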
```python
In[29]:
# construct a mask of all rainy days
rainy = (inches > 0)
# construct a mask of all summer days (June 21st is the 172nd day)
summer = (np.arange(365) - 172 < 90) & (np.arange(365) - 172 > 0)
print("Median precip on rainy days in 2014 (inches): ",
 np.median(inches[rainy]))
print("Median precip on summer days in 2014 (inches): ",
 np.median(inches[summer]))
print("Maximum precip on summer days in 2014 (inches): ",
 np.max(inches[summer]))
print("Median precip on non-summer rainy days (inches):",
 np.median(inches[rainy & ~summer]))
    Median precip on rainy days in 2014 (inches): 0.194881889764
    Median precip on summer days in 2014 (inches): 0.0
    Maximum precip on summer days in 2014 (inches): 0.850393700787
    Median precip on non-summer rainy days (inches): 0.200787401575
```
By combining Boolean operations, masking operations, and aggregates, we can very
quickly answer these sorts of questions for our dataset.
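
The same pattern can be tried without downloading any data. The sketch below uses a short, made-up `inches` array (hypothetical values, not the Seattle measurements) to show how `&` (and), `|` (or), and `~` (not) combine masks element by element before indexing.

```python
import numpy as np

# Hypothetical daily precipitation in inches for ten days (made-up values)
inches = np.array([0.0, 0.1, 0.0, 0.6, 1.2, 0.0, 0.3, 0.0, 0.9, 0.05])
day = np.arange(inches.size)

rainy = inches > 0      # any precipitation at all
heavy = inches > 0.5    # more than half an inch

# Masks combine element by element with &, | and ~
light_rain = rainy & ~heavy

# Parentheses are required here: & binds more tightly than < and >
heavy_first_half = (inches > 0.5) & (day < 5)

print(inches[light_rain])          # rainy days with at most 0.5 inches
print(np.sum(heavy_first_half))    # number of heavy days among days 0..4
print(np.median(inches[rainy]))    # median over rainy days only
```

Naming each mask before combining them keeps the final indexing expression readable and lets the same mask be reused in several statistics.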


### **Try it yourself**: Given the array `inches`, mask the values greater than 0.378 inches/day (<a href="https://www.usgs.gov/special-topics/water-science-school/science/precipitation-and-water-cycle">USGS Precipitation and Water Cycle</a>), then:
1. Find the average, median, and max

```python
import pandas as pd

# use Pandas to extract rainfall (in tenths of a millimeter) as a NumPy array
rainfall = pd.read_csv(r"https://github.com/jakevdp/PythonDataScienceHandbook/raw/master/notebooks_v1/data/Seattle2014.csv")['PRCP'].values
inches = rainfall / 254  # 1/10 mm -> inches

# your code here
```



---
# Fancy Indexing
In the previous sections, we saw how to access and modify portions of arrays using
simple indices (e.g., `arr[0]`), slices (e.g., `arr[:5]`), and Boolean masks
(e.g., `arr[arr > 0]`). In this section, we’ll look at another style of array indexing, known as fancy
indexing. Fancy indexing is like the simple indexing we’ve already seen, but we pass
arrays of indices in place of single scalars. This allows us to very quickly access and
modify complicated subsets of an array’s values.

## Exploring Fancy Indexing
Fancy indexing is conceptually simple: it means passing an array of indices to access
multiple array elements at once. For example, consider the following array:
```python
In[1]: import numpy as np
 rand = np.random.RandomState(42)
 x = rand.randint(100, size=10)
 print(x)
[51 92 14 71 60 20 82 86 74 74]
```
Suppose we want to access three different elements. We could do it like this:
```python
In[2]: [x[3], x[7], x[2]]
Out[2]: [71, 86, 14]
```
Alternatively, we can pass a single list or array of indices to obtain the same result:
```python
In[3]: ind = [3, 7, 2]
 x[ind]
Out[3]: array([71, 86, 14])
```
With fancy indexing, the shape of the result reflects the shape of the index arrays
rather than the shape of the array being indexed:
```python
In[4]: ind = np.array([[3, 7],
 [4, 5]])
 x[ind]
Out[4]: array([[71, 86],
 [60, 20]])
 ```
Fancy indexing also works in multiple dimensions. Consider the following array:
```python
In[5]: X = np.arange(12).reshape((3, 4))
 X
Out[5]: array([[ 0, 1, 2, 3],
 [ 4, 5, 6, 7],
 [ 8, 9, 10, 11]])
 ```
Like with standard indexing, the first index refers to the row, and the second to the
column:
```python
In[6]: row = np.array([0, 1, 2])
 col = np.array([2, 1, 3])
 X[row, col]
Out[6]: array([ 2, 5, 11])
```
Notice that the first value in the result is `X[0, 2]`, the second is `X[1, 1]`, and the
third is `X[2, 3]`. The pairing of indices in fancy indexing follows all the broadcasting
rules that were mentioned in “Computation on Arrays: Broadcasting”. So,
for example, if we combine a column vector and a row vector within the indices, we
get a two-dimensional result:
```python
In[7]: X[row[:, np.newaxis], col]
Out[7]: array([[ 2, 1, 3],
 [ 6, 5, 7],
 [10, 9, 11]])
 ```
Here, each row index is paired with every column index, exactly as we saw in
broadcasting of arithmetic operations. For example:
```python
In[8]: row[:, np.newaxis] * col
Out[8]: array([[0, 0, 0],
 [2, 1, 3],
 [4, 2, 6]])
 ```
It is always important to remember with fancy indexing that the return value reflects
the broadcasted shape of the indices, rather than the shape of the array being indexed.
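
This rule can be verified with a short sketch. `np.broadcast` reports the shape that a set of arrays broadcasts to, and the result of fancy indexing has exactly that shape. `np.ix_` is a NumPy helper not used elsewhere in this lesson; it is included only as an alternative way to build the same grid of indices from one-dimensional sequences.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Paired indices: shapes (3,) and (3,) broadcast to (3,)
paired = X[row, col]

# Column vector with row vector: shapes (3, 1) and (3,) broadcast to (3, 3)
grid = X[row[:, np.newaxis], col]

# The result shape matches the broadcast shape of the index arrays,
# not the (3, 4) shape of X
assert paired.shape == np.broadcast(row, col).shape == (3,)
assert grid.shape == np.broadcast(row[:, np.newaxis], col).shape == (3, 3)

# np.ix_ builds the same open mesh of indices from 1D sequences
same_grid = X[np.ix_(row, col)]
print(np.array_equal(grid, same_grid))   # True
```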

### **Try it yourself**: Given a 2D matrix, select the cells on every 3rd row and column _using_

```python
x = np.random.randint(0, 10, size=(10, 10))

# your code here
```


Sample output (the array `x`, followed by the selected cells):

```python
(array([[5, 0, 3, 3, 7, 9, 3, 5, 2, 4],
        [7, 6, 8, 8, 1, 6, 7, 7, 8, 1],
        [5, 9, 8, 9, 4, 3, 0, 3, 5, 0],
        [2, 3, 8, 1, 3, 3, 3, 7, 0, 1],
        [9, 9, 0, 4, 7, 3, 2, 7, 2, 0],
        [0, 4, 5, 5, 6, 8, 4, 1, 4, 9],
        [8, 1, 1, 7, 9, 9, 3, 6, 7, 2],
        [0, 3, 5, 9, 4, 4, 6, 4, 4, 3],
        [4, 4, 8, 4, 3, 7, 5, 5, 0, 1],
        [5, 9, 3, 0, 5, 0, 1, 2, 4, 2]]),
 array([5, 1, 3, 2]))
```

## Combined Indexing

For even more powerful operations, fancy indexing can be combined with the other
indexing schemes we’ve seen:

```python
In[9]: print(X)
[[ 0 1 2 3]
 [ 4 5 6 7]
 [ 8 9 10 11]]
```
We can combine fancy and simple indices:
```python
In[10]: X[2, [2, 0, 1]]
Out[10]: array([10, 8, 9])
```
We can also combine fancy indexing with slicing:
```python
In[11]: X[1:, [2, 0, 1]]
Out[11]: array([[ 6, 4, 5],
                [10, 8, 9]])
 ```
And we can combine fancy indexing with masking:
```python
In[12]: mask = np.array([1, 0, 1, 0], dtype=bool)
        X[row[:, np.newaxis], mask]
Out[12]: array([[ 0, 2],
                [ 4, 6],
                [ 8, 10]])
```
All of these indexing options combined lead to a very flexible set of operations for
accessing and modifying array values.
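
The examples so far only read values. As a minimal sketch of the modifying side, the snippet below assigns through a fancy index and shows one behavior that is easy to miss: with repeated indices, an in-place operation such as `+=` is applied once per position, not once per occurrence. `np.add.at` is the ufunc method that accumulates over repeats. The arrays here are illustrative only.

```python
import numpy as np

x = np.arange(10)
i = np.array([2, 1, 8, 4])

x[i] = 99          # assign one value to several positions at once
print(x)           # [ 0 99 99  3 99  5  6  7 99  9]

x[i] -= 10         # in-place operators work through a fancy index too
print(x)           # [ 0 89 89  3 89  5  6  7 89  9]

# Repeated indices: index 1 appears twice but is incremented only once
counts = np.zeros(5, dtype=int)
counts[[1, 1, 3]] += 1
print(counts)      # [0 1 0 1 0]

# np.add.at applies the addition once per occurrence of each index
counts_at = np.zeros(5, dtype=int)
np.add.at(counts_at, [1, 1, 3], 1)
print(counts_at)   # [0 2 0 1 0]
```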

### **Try it yourself**: Extract every 3rd column with **ALL** the rows, and then extract the values at row index `[1,5,2,6,3]`

```python
x = np.random.randint(0, 10, size=(10, 10))

# your code here
```

---
# Sorting Arrays

## Fast Sorting in NumPy: `np.sort` and `np.argsort`

Although Python has the built-in `list.sort` method and `sorted` function for working
with lists, we won’t discuss them here because NumPy’s `np.sort` function turns out to
be much more efficient and useful for our purposes. By default `np.sort` uses an
$\mathcal{O}[N\log N]$ quicksort algorithm, though mergesort and heapsort are also
available. For most applications, the default quicksort is more than sufficient.
To return a sorted version of the array without modifying the input, you can use
np.sort:
```python
In[5]: x = np.array([2, 1, 4, 3, 5])
 np.sort(x)
Out[5]: array([1, 2, 3, 4, 5])
```
If you prefer to sort the array in-place, you can instead use the sort method of arrays:
```python
In[6]: x.sort()
 print(x)
[1 2 3 4 5]
```
A related function is argsort, which instead returns the indices of the sorted
elements:
```python
In[7]: x = np.array([2, 1, 4, 3, 5])
 i = np.argsort(x)
 print(i)
[1 0 3 2 4]
```
The first element of this result gives the index of the smallest element, the second
value gives the index of the second smallest, and so on. These indices can then be
used (via fancy indexing) to construct the sorted array if desired:
```python
In[8]: x[i]
Out[8]: array([1, 2, 3, 4, 5])
```
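
A common use of `argsort` is to reorder one array by the values of another. The sketch below uses two small, hypothetical arrays (`names` and `scores` are made up for illustration) and also shows that reversing the index array gives a descending order.

```python
import numpy as np

# Hypothetical example data: names and their matching scores
names = np.array(["Ana", "Ben", "Cho", "Dee"])
scores = np.array([72, 95, 64, 88])

order = np.argsort(scores)     # indices that sort scores in ascending order
print(names[order])            # names from lowest to highest score

descending = order[::-1]       # reverse the indices for descending order
print(names[descending])
print(scores[descending])
```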

### Sorting along rows or columns

A useful feature of NumPy’s sorting algorithms is the ability to sort along specific
rows or columns of a multidimensional array using the axis argument. For example:
```python
In[9]: rand = np.random.RandomState(42)
 X = rand.randint(0, 10, (4, 6))
 print(X)
[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]
 
In[10]: # sort each column of X
 np.sort(X, axis=0)
Out[10]: array([[2, 1, 4, 0, 1, 5],
 [5, 2, 5, 4, 3, 7],
 [6, 3, 7, 4, 6, 7],
 [7, 6, 7, 4, 9, 9]])
In[11]: # sort each row of X
 np.sort(X, axis=1)
Out[11]: array([[3, 4, 6, 6, 7, 9],
 [2, 3, 4, 6, 7, 7],
 [1, 2, 4, 5, 7, 7],
 [0, 1, 4, 5, 5, 9]])
 ```
Keep in mind that this treats each row or column as an independent array, and any
relationships between the row or column values will be lost!
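
When the rows need to stay intact, one option is to sort the row order by a single column with `np.argsort`. This is a sketch under the assumption that each row is a record and column 0 is the sort key. The array `X` is the same as above, written out by hand.

```python
import numpy as np

X = np.array([[6, 3, 7, 4, 6, 9],
              [2, 6, 7, 4, 3, 7],
              [7, 2, 5, 4, 1, 7],
              [5, 1, 4, 0, 9, 5]])

# Sorting each column independently mixes values from different rows
independent = np.sort(X, axis=0)

# Sorting whole rows by the values in column 0 keeps each row together
order = np.argsort(X[:, 0])
rows_by_first_col = X[order]

print(independent[0])      # [2 1 4 0 1 5], not a row of the original X
print(rows_by_first_col)
```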


### **Try it yourself**: 
1. Create a 2D array, then sort its values along the vertical axis only, not along the horizontal axis
2. Show the sorted indices using `argsort`

```python
# your code here
```


## Partial Sorts: Partitioning

Sometimes we’re not interested in sorting the entire array, but simply want to find the
K smallest values in the array. NumPy provides this in the `np.partition` function.
`np.partition` takes an array and a number K; the result is a new array with the smallest
K values to the left of the partition, and the remaining values to the right, in arbitrary
order:
```python
In[12]: x = np.array([7, 2, 3, 1, 6, 5, 4])
            np.partition(x, 3)
Out[12]: array([2, 1, 3, 4, 6, 5, 7])
```
Note that the first three values in the resulting array are the three smallest in the
array, and the remaining array positions contain the remaining values. Within the
two partitions, the elements have arbitrary order.
Similarly to sorting, we can partition along an arbitrary axis of a multidimensional
array:
```python
In[13]: np.partition(X, 2, axis=1)
Out[13]: array([[3, 4, 6, 7, 6, 9],
                [2, 3, 4, 7, 6, 7],
                [1, 2, 4, 5, 7, 7],
                [0, 1, 4, 5, 9, 5]])
 ```
The result is an array where the first two slots in each row contain the smallest values
from that row, with the remaining values filling the remaining slots.
Finally, just as there is a np.argsort that computes indices of the sort, there is a
np.argpartition that computes indices of the partition. We’ll see this in action in the
following section.
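
As a brief sketch in the meantime, `np.argpartition` can be used to find where the K smallest values sit in the original array. The order inside the partition is not guaranteed, so the example sorts the selected values before printing them.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
K = 3

# Indices of the K smallest values, in arbitrary order
idx = np.argpartition(x, K)[:K]
smallest = x[idx]
print(np.sort(smallest))          # [1 2 3]

# Sorting only the K selected values puts the indices in ascending value order
idx_sorted = idx[np.argsort(x[idx])]
print(x[idx_sorted])              # [1 2 3]
```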

### **Try it yourself**: Create a 2D array and use `np.partition` to get a partial sort in which the first 2 rows hold the smallest values of each column and the remaining rows are in arbitrary order

```python
# your code here
```