# Conditional Selection on Arrays

In [1]:
import numpy as np
import pandas as pd

When working with arrays, we often want to select elements based on a condition, for instance only the elements larger than some number. We can do this with comparison operators, which produce Boolean (`True`/`False`) values. Let's start with an example:

Suppose that we have daily temperature data for Vancouver and want to look at the **maximum** temperature for each day. We first load the data using `pandas` (another library that we'll cover in the next section).

In [2]:
# 2019 daily max temperatures
url = "https://raw.githubusercontent.com/kailu3/105-book/master/data/vancouver-2019-max-temp.csv"
daily_max = pd.read_csv(url)['degC'].values

Our first step is to look at some basic summary statistics using what we learned from the last section.

In [3]:
print("Minimum temp:", np.min(daily_max))
print("Maximum temp:", np.max(daily_max))
print("Mean temp:", np.mean(daily_max))
print("Median temp:", np.median(daily_max))
print("Variance:", np.var(daily_max))
print("Standard deviation:", np.std(daily_max))

Minimum temp: -0.5355916000000001
Maximum temp: 28.981457
Mean temp: 16.194906061111112
Median temp: ...
Variance: 39.22653236797662
Standard deviation: 6.263108842098836


```{note}
The summary statistics we compute are for the **maximum** daily temperatures in Vancouver in 2019.

```

Now suppose we had a question, say: **How many times in 2019 was the maximum temperature below 20 degrees Celsius?**

We can first apply a comparison operator (`< 20`) to the entire array. For each element, it evaluates whether that element is less than 20 and returns `True` or `False`. Depending on the question, we can use `==`, `!=`, `<`, `>`, `<=`, or `>=` (equal to, not equal to, less than, greater than, less than or equal to, greater than or equal to).

In [4]:
daily_max < 20

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False, False,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True, False,  True,  True,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True,  True,
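
The full output is long, so the behavior of each comparison operator may be easier to see on a small array. The following is a minimal sketch, assuming a made-up five-element array `temps` (hypothetical values, not taken from the Vancouver data):

```python
import numpy as np

# Hypothetical temperatures for illustration only (not from the dataset)
temps = np.array([5.0, 12.5, 20.0, 23.1, 20.0])

# Each comparison is applied element by element and returns a boolean array
less_than = temps < 20
at_most = temps <= 20
equal_to = temps == 20
not_equal = temps != 20

print(less_than)   # [ True  True False False False]
print(at_most)     # [ True  True  True False  True]
print(equal_to)    # [False False  True False  True]
print(not_equal)   # [ True  True False  True False]
```

Note that `<` excludes the elements that are exactly 20, while `<=` includes them.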

Since `True` is treated as 1 and `False` as 0, we can sum `daily_max < 20` to get the total number of days on which the maximum temperature was below 20 degrees Celsius.

In [5]:
np.sum(daily_max < 20)

246
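
Counting `True` values can be written in a few equivalent ways. The sketch below uses a made-up array (`temps` is hypothetical, not the Vancouver data) and also shows that the mean of a boolean array gives the fraction of elements that meet the condition:

```python
import numpy as np

# Hypothetical temperatures for illustration only (not from the dataset)
temps = np.array([5.0, 12.5, 20.0, 23.1, 18.9])
below_20 = temps < 20  # True, True, False, False, True

count_sum = np.sum(below_20)                # True counts as 1, False as 0
count_nonzero = np.count_nonzero(below_20)  # counts the True entries directly
fraction = np.mean(below_20)                # share of elements below 20

print(count_sum, count_nonzero, fraction)   # 3 3 0.6
```

Applied to `daily_max`, `np.mean(daily_max < 20)` would give the share of days in the data with a maximum below 20 degrees Celsius.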

Alternatively, say we were interested in the maximum temperatures themselves on the days when they were below 20 degrees Celsius. We can use the syntax `array[boolean array]` to select those elements from our `daily_max` array.

In [6]:
daily_max[daily_max < 20]

array([ 7.5730515,  9.90463  , 10.972761 ,  8.451971 ,  7.8538175,
        8.744944 ,  7.8355064, 10.054168 ,  7.6951237,  6.9657426,
        6.617837 ,  8.6717005,  7.4296165,  6.431677 ,  6.868085 ,
        6.0471497,  6.4469357,  6.663614 ,  6.3126564,  6.2272058,
        6.1417556,  5.9220257,  5.723658 ,  5.244525 , -0.5355916,
        2.3300524,  4.835583 ,  4.8783083,  5.16823  ,  5.3727007,
        9.001296 ,  8.595406 ,  8.931104 ,  9.09285  ,  9.135575 ,
        9.7947645, 10.191499 , 10.185395 ,  9.300372 , 10.685891 ,
       11.287097 , 11.27489  , 13.008314 , 12.96864  , 11.772333 ,
       11.079574 ,  9.236284 ,  9.816127 ,  9.074539 , 10.8812065,
       10.917829 , 10.856792 , 10.444798 , 10.823222 , 11.506826 ,
       12.419316 , 13.539328 , 14.686807 , 14.970624 , 14.128326 ,
       13.624779 , 14.430455 , 14.076446 , 13.4081   , 13.591208 ,
       13.496603 , 13.4782915, 13.496603 , 12.614632 , 10.98802  ,
       11.406117 , 13.029676 , 12.507818 , 13.032728 , 13.7437
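
To make the selection step explicit, the boolean array can be stored in a variable (often called a mask) before indexing. This is a minimal sketch on a made-up array; `temps`, `mask`, and `selected` are illustrative names, not part of the original notebook:

```python
import numpy as np

# Hypothetical temperatures for illustration only (not from the dataset)
temps = np.array([5.0, 12.5, 20.0, 23.1, 18.9])

mask = temps < 20        # boolean array with the same length as temps
selected = temps[mask]   # keeps only the positions where mask is True

print(selected.tolist())              # [5.0, 12.5, 18.9]
print(len(selected) == np.sum(mask))  # True
```

The number of selected elements always matches the count of `True` values in the mask, which ties the selection back to the `np.sum` count.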

We can also combine Boolean conditions to filter on several criteria at once! Say we wanted the temperatures between 20 *and* 30 degrees Celsius (inclusive).

In [7]:
daily_max[(daily_max >= 20) & (daily_max <= 30)]

array([20.070189, 20.66224 , 20.613409, 20.094603, 22.6276  , 24.174864,
       23.790337, 20.338747, 21.150526, 20.765999, 22.874796, 24.275574,
       22.54215 , 21.083387, 21.840235, 23.262375, 22.972454, 21.709007,
       22.990765, 22.719154, 23.414965, 23.45464 , 22.819864, 22.50858 ,
       21.08644 , 21.00404 , 23.024334, 25.459675, 24.501408, 24.501408,
       24.065   , 24.37018 , 21.62966 , 23.027386, 23.86358 , 24.394594,
       23.765923, 23.784233, 21.840235, 21.840235, 22.935833, 22.938885,
       24.09857 , 22.941936, 21.547262, 23.74456 , 23.689629, 23.775078,
       25.026318, 24.007015, 24.333559, 24.6601  , 24.211487, 24.919504,
       25.413897, 25.77401 , 25.97543 , 27.327377, 28.981457, 28.371096,
       25.46883 , 23.976498, 23.74456 , 24.092466, 23.125044, 20.067137,
       23.158613, 22.587927, 20.558477, 23.128096, 26.811623, 26.634619,
       25.123976, 24.721138, 27.547108, 27.602041, 27.653921, 25.951015,
       25.783165, 26.851297, 27.513538, 27.013042, 

```{note}
`|` represents "or"

`&` represents "and"
```
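
Each condition is wrapped in parentheses because `&` and `|` bind more tightly than comparison operators such as `<` and `>=`. The sketch below illustrates this on a made-up array (`temps` is hypothetical) and, as an extra not covered in the text above, uses `~` to negate a condition:

```python
import numpy as np

# Hypothetical temperatures for illustration only (not from the dataset)
temps = np.array([5.0, 12.5, 20.0, 23.1, 31.0])

# Parentheses are required around each comparison
between = temps[(temps >= 20) & (temps <= 30)]   # both conditions hold
outside = temps[(temps < 10) | (temps > 30)]     # at least one condition holds

# ~ flips True and False, so this selects everything NOT in [20, 30]
not_between = temps[~((temps >= 20) & (temps <= 30))]

print(between.tolist())      # [20.0, 23.1]
print(outside.tolist())      # [5.0, 31.0]
print(not_between.tolist())  # [5.0, 12.5, 31.0]
```

Python's `and` and `or` keywords do not work element by element on arrays, which is why `&` and `|` are used here.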

Alternatively, say we wanted the temperatures greater than 20 degrees Celsius *or* less than 10 degrees Celsius.

In [8]:
daily_max[(daily_max > 20) | (daily_max < 10)]

array([ 7.5730515,  9.90463  ,  8.451971 ,  7.8538175,  8.744944 ,
        7.8355064,  7.6951237,  6.9657426,  6.617837 ,  8.6717005,
        7.4296165,  6.431677 ,  6.868085 ,  6.0471497,  6.4469357,
        6.663614 ,  6.3126564,  6.2272058,  6.1417556,  5.9220257,
        5.723658 ,  5.244525 , -0.5355916,  2.3300524,  4.835583 ,
        4.8783083,  5.16823  ,  5.3727007,  9.001296 ,  8.595406 ,
        8.931104 ,  9.09285  ,  9.135575 ,  9.7947645,  9.300372 ,
        9.236284 ,  9.816127 ,  9.074539 , 20.070189 , 20.66224  ,
       20.613409 , 20.094603 , 22.6276   , 24.174864 , 23.790337 ,
       20.338747 , 21.150526 , 20.765999 , 22.874796 , 24.275574 ,
       22.54215  , 21.083387 , 21.840235 , 23.262375 , 22.972454 ,
       21.709007 , 22.990765 , 22.719154 , 23.414965 , 23.45464  ,
       22.819864 , 22.50858  , 21.08644  , 21.00404  , 23.024334 ,
       25.459675 , 24.501408 , 24.501408 , 24.065    , 24.37018  ,
       21.62966  , 23.027386 , 23.86358  , 24.394594 , 23.7659