## Filtering data
It's also possible to select elements (filter) based on a condition. 

In [1]:
import numpy as np
data = np.loadtxt(fname='../data/inflammation-01.csv', delimiter=',')

Sometimes we may want to select array elements based on their values. A boolean mask is very useful for this. The mask is an array of the same length as the indexed array, containing only `False` or `True` values:

In [2]:
a = np.arange(4)
print(a)

[0 1 2 3]


In [3]:
mask = np.array([False, True, True, False])

In [4]:
a[mask]

array([1, 2])

In most cases the mask is constructed from the values of the array itself. For example, to select only odd numbers we could use the following mask:

In [5]:
odd = (a % 2) == 1
odd

array([False,  True, False,  True], dtype=bool)

In [6]:
a[odd]

array([1, 3])

This could also be done in a single step:

In [7]:
a[(a % 2) == 1]

array([1, 3])
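Several conditions can be combined into one mask as well. The following is a minimal sketch, assuming a fresh ten-element array (the variable names are made up for illustration). It uses the element-wise operators `&` (and), `|` (or) and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `==` or `>`.

```python
import numpy as np

a = np.arange(10)

# Each comparison produces a boolean mask; & and | combine masks element-wise
odd_and_large = ((a % 2) == 1) & (a > 4)
small_or_large = (a < 2) | (a > 7)

selected_and = a[odd_and_large]    # odd values greater than 4
selected_or = a[small_or_large]    # values below 2 or above 7
not_odd = a[~((a % 2) == 1)]       # ~ inverts a mask

# True counts as 1, so summing a mask counts the matching elements
n_odd = int(((a % 2) == 1).sum())

print(selected_and, selected_or, not_odd, n_odd)
```

Python's `and` and `or` keywords do not work here, because they try to reduce a whole array to a single truth value.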

## $\color{green}{\text{Exercise}}$ Filtering data
What do you have to do to select all measurements above 10 for the first patient (index 0) in `data`?

In [8]:
patient0_data = data[0, :]
patient0_data[patient0_data>10]

array([ 12.,  18.,  13.,  11.,  11.])

We can also replace the selected measurements with a new value. Here every value above 10 is set to 10:

In [9]:
patient1_data = data[1, :]
patient1_data[patient1_data>10] = 10
print(patient1_data)

[  0.   1.   2.   1.   2.   1.   3.   2.   2.   6.  10.  10.   5.   9.   4.
   4.   7.  10.   8.   6.  10.   4.  10.   5.  10.   7.  10.   5.  10.   3.
   3.   5.   4.   4.   5.   5.   1.   1.   0.   1.]
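The assignment above changes `patient1_data` in place. As an alternative, here is a small sketch that builds a new array with the capped values instead, using `np.where`. The `values` array is made up for illustration and is not taken from the inflammation dataset.

```python
import numpy as np

# Made-up measurements, not taken from the inflammation dataset
values = np.array([3.0, 12.0, 7.0, 18.0, 10.0])

# np.where(condition, x, y) takes x where the condition is True and y elsewhere,
# and returns a new array; `values` itself is not modified
capped = np.where(values > 10, 10, values)

print(values)
print(capped)
```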


## Copy and view

When the contents are physically stored in another location, the new array is called a __copy__. If, on the other hand, a different view of the same memory content is provided, we call it a __view__.

__view__ : it is another way of looking at the data of the array; it shares allocated memory with the original array
 
__copy__ : it does not share allocated memory with the original array
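One way to check which of the two you have is `np.shares_memory`. The sketch below uses a small made-up array (all names are hypothetical) and compares a plain slice with an explicit `.copy()`.

```python
import numpy as np

original = np.arange(6)

window = original[1:4]            # plain slice of the original
duplicate = original[1:4].copy()  # explicit copy of the same elements

# np.shares_memory reports whether two arrays overlap in memory
shares_view = np.shares_memory(original, window)
shares_copy = np.shares_memory(original, duplicate)

# Writing through the slice changes `original`; writing to the copy does not
window[0] = 100
duplicate[1] = -1

print(shares_view, shares_copy)
print(original)
print(duplicate)
```

Calling `.copy()` is the usual way to make sure later modifications leave the original array untouched.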

## $\color{green}{\text{Exercise}}$ view or a copy
Is `patient2_data` a view or a copy of `data`?

In [11]:
print(data[2, :])
patient2_data = data[2, :]
patient2_data[patient2_data>10] = 10
print(patient2_data)
print(data[2,:])

[  0.   1.   1.   3.   3.   2.   6.   2.   5.   9.   5.   7.   4.   5.   4.
  10.   5.  10.   9.  10.  10.  10.  10.  10.   7.  10.  10.   7.   4.   2.
  10.   5.   4.   2.   2.   3.   2.   2.   1.   1.]
[  0.   1.   1.   3.   3.   2.   6.   2.   5.   9.   5.   7.   4.   5.   4.
  10.   5.  10.   9.  10.  10.  10.  10.  10.   7.  10.  10.   7.   4.   2.
  10.   5.   4.   2.   2.   3.   2.   2.   1.   1.]
[  0.   1.   1.   3.   3.   2.   6.   2.   5.   9.   5.   7.   4.   5.   4.
  10.   5.  10.   9.  10.  10.  10.  10.  10.   7.  10.  10.   7.   4.   2.
  10.   5.   4.   2.   2.   3.   2.   2.   1.   1.]


## $\color{green}{\text{Quiz}}$  view or copy
What are the final values of a and b at the end of the following program? Explain why.

`a = np.arange(5)
b = a[a < 3]
b[::2] = 0`


a)  a = [0, 1, 2, 3, 4], b = [0, 1, 2] <br>
b)  a = [0, 1, 0, 3, 4], b = [0, 1, 0] <br>
c)  a = [0, 0, 2, 3, 4], b = [0, 0, 2] <br>
d)  a = [0, 1, 2, 3, 4], b = [0, 1, 0] <br>
e)  a = [0, 1, 2, 3, 4], b = [0, 1, 0, 3, 0] <br>

[Previous: Working with a dataset](dataset_intro.ipynb)<br>[Next: Slices](slices.ipynb)