In [1]:
import numpy as np

# Filtering Data With Logical Indexing

Sometimes you want to remove certain values from your dataset. In NumPy, this can be done with **Logical Indexing** (also called Boolean indexing); in plain Python, it is done with an **If Statement**.

### Step 1: Create a Logical Numpy Array

A single comparison converts every value in an array at once: it is applied element by element and returns an array of True/False values with the same shape. This is broadcasting, the same mechanism used by the math operations we saw earlier:

```python
>>> import numpy as np
>>> data = np.array([1, 2, 3, 4, 5])
>>> data < 3
array([ True,  True, False, False, False])
```
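Here is a minimal sketch of the same idea with a few other comparison operators. The array `temps` and its values are hypothetical, chosen so they don't give away the exercise answers. It also confirms that the result is a `bool` array with the same shape as the input.

```python
import numpy as np

# Hypothetical example data (not the exercise dataset)
temps = np.array([12, 18, 25, 31, 7])

is_warm = temps > 20                 # element-by-element comparison
print(is_warm)                       # [False False  True  True False]
print(is_warm.dtype)                 # bool
print(is_warm.shape == temps.shape)  # True

# Every comparison operator broadcasts the same way
print(temps == 18)                   # [False  True False False False]
print(temps <= 12)                   # [ True False False False  True]
```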

**Exercises**: For each question below, make an array of True/False values that answers it for every element of the dataset.

In [3]:
import numpy as np

list_of_values = [3, 7, 10, 2, 1, 7, np.nan, 20, -5]
data = np.array(list_of_values)
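The dataset above contains `np.nan`, so it helps to know how comparisons treat it. This short sketch uses a hypothetical array `values`. Ordering and equality comparisons involving NaN evaluate to False, "not equal" evaluates to True, and `np.isnan` is the reliable way to detect missing values.

```python
import numpy as np

# Hypothetical example data containing one missing value
values = np.array([4.0, np.nan, -1.0])

# Ordering and equality comparisons involving NaN evaluate to False...
print(values > 0)        # [ True False False]
print(values == values)  # [ True False  True]  (NaN is not equal even to itself)

# ...except "not equal", which is True for NaN
print(values != 4)       # [False  True  True]

# np.isnan tests for missing values directly
print(np.isnan(values))  # [False  True False]
```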

1. Which values are greater than zero?

2. Which values are equal to 7?

3. Which values are greater than or equal to 7?

4. Which values are not equal to 7?

### Step 2: Filter with Logical Indexing

If an array of True/False values is used to *index* another array of the same shape, the result contains only the values at the positions where the indexing array is True:

```python
>>> import numpy as np
>>> data = np.array([1, 2, 3, 4, 5])
>>> is_big = data > 3
>>> is_big
array([False, False, False,  True,  True])

>>> data[is_big]
array([4, 5])
```
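The introduction notes that plain Python filters with an if statement. A side-by-side sketch compares the loop, a list comprehension, and the NumPy Boolean mask. The list `readings` and the threshold 6 are hypothetical.

```python
import numpy as np

# Hypothetical example data
readings = [5, 12, 3, 20, 8]

# Plain Python: loop over the list and keep items with an if statement
big_loop = []
for value in readings:
    if value > 6:
        big_loop.append(value)

# The same filter written as a list comprehension
big_comprehension = [value for value in readings if value > 6]

# NumPy: build a Boolean mask, then use it as an index
arr = np.array(readings)
mask = arr > 6
big_numpy = arr[mask]

print(big_loop)           # [12, 20, 8]
print(big_comprehension)  # [12, 20, 8]
print(big_numpy)          # [12 20  8]
```

All three keep the same values in their original order. The NumPy version avoids writing the loop yourself, which is why it is preferred for arrays.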


**Exercises**: Using the data below, extract only the values that correspond to each question.

In [None]:
data = np.array([3, 1, -6, 8, 20, 2, np.nan, 7, 1, np.nan, 9, 7, 7, -7])
data

array([ 3.,  1., -6.,  8., 20.,  2., nan,  7.,  1., nan,  9.,  7.,  7.,
       -7.])

1. The values that are less than 0

2. The values that are greater than 3

3. The values equal to 7

4. The values not equal to 7

5. The values equal to 20

6. The values that are not missing

### Step 2.5: Combine Step 1 and Step 2 into a single line

Both steps can be done in a single expression.  Sometimes this can make things clearer!


```python
>>> import numpy as np
>>> data = np.array([1, 2, 3, 4, 5])
>>> data[data > 3]
array([4, 5])
```
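As a quick check that the one-line form matches the two-step form, here is a small sketch with a hypothetical `scores` array. It also shows that any expression producing a Boolean array of the same shape can go inside the brackets.

```python
import numpy as np

# Hypothetical example data
scores = np.array([55, 72, 90, 41, 68, 83])

# Two steps: name the mask, then index with it
passing_mask = scores >= 60
two_step = scores[passing_mask]

# One step: the comparison is written directly inside the brackets
one_step = scores[scores >= 60]

print(np.array_equal(two_step, one_step))  # True
print(one_step)                            # [72 90 68 83]

# Any expression that produces a same-shape Boolean array works
print(scores[scores % 2 == 0])             # [72 90 68]
```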



**Exercises**: Do the same as in the previous section, this time in a single line.

In [None]:
data = np.array([3, 1, -6, np.nan, 8, 20, 2, 7, np.nan, 1, 9, 7, 7, -7])
data

array([ 3.,  1., -6., nan,  8., 20.,  2.,  7., nan,  1.,  9.,  7.,  7.,
       -7.])

1. The values that are less than 0

2. The values that are greater than 3

3. The values equal to 7 (will be an array of sevens)

4. The values not equal to 7

5. The values equal to 20

6. The values that are not missing

### Statistics on Filtered Data

Using the following dataset, have Python calculate the answers to the questions below:

In [None]:
data = np.array([3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7])
data

array([ 3,  1, -6,  8, 20,  2,  7,  1,  9,  7,  7, -7])

How many values are greater than 4 in this dataset?  
Useful function: `len([2, 3, 4])`
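One way to count matching values is to filter first and take the `len` of the result. Summing the mask gives the same count, because True counts as 1 and False as 0. A sketch with a hypothetical `ages` array:

```python
import numpy as np

# Hypothetical example data
ages = np.array([23, 35, 17, 42, 35, 29])

# Count by filtering first, then taking the length of the result
print(len(ages[ages > 30]))  # 3

# Equivalent: sum the Boolean mask directly (True counts as 1, False as 0)
print(np.sum(ages > 30))     # 3
```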

How many values are equal to 7 in this dataset?

What is the mean value of the positive numbers in this dataset?

What is the mean value of the negative numbers in this dataset?

What is the median of the values in this dataset that are greater than 5?

How many missing values are in this dataset?

What proportion of the values in this dataset are positive?

What proportion of the values in this dataset are less than or equal to 8?
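The statistics questions all follow one pattern: filter with a mask, then pass the result to a NumPy function. This sketch uses a hypothetical `rainfall_mm` array so it doesn't answer the questions directly. It also shows that a proportion can be computed either with `len` or as the mean of the Boolean mask.

```python
import numpy as np

# Hypothetical example data: daily rainfall in millimetres
rainfall_mm = np.array([0, 2, 0, 3, 0, 10])

# Statistics on a subset: filter first, then call the NumPy function
wet_days = rainfall_mm[rainfall_mm > 0]
print(np.mean(wet_days))                 # 5.0
print(np.median(wet_days))               # 3.0

# Proportion: matching count divided by total count...
print(len(wet_days) / len(rainfall_mm))  # 0.5

# ...or, equivalently, the mean of the Boolean mask
print(np.mean(rainfall_mm > 0))          # 0.5
```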
