
# Sorting and Searching in NumPy

## Prep

Let's start by loading the data. We will use the dataset for this course, which contains the results of an actual behavioural experiment conducted by Universidad de la Matanza (UNLaM).

In [18]:
%%writefile get_data.sh
# Download the dataset only if it is not already present
if [ ! -f dataset.csv ]; then
  wget -O dataset.csv "https://www.dropbox.com/s/9t5lc04vxwvjvo6/dataset.csv?dl=0"
fi

Overwriting get_data.sh


In [19]:
!bash get_data.sh


In [20]:
import numpy as np

In [21]:
numpy_arr = np.genfromtxt('dataset.csv', delimiter=',')

## Basic Sorting in NumPy

There is one basic method for sorting a NumPy array, and a couple of auxiliary methods. Let's review them.

The basic function is `np.sort(arr)`.

In [22]:
a = numpy_arr[3, :11]
a

array([3.  , 0.  , 2.  , 0.  , 2.  , 1.33, 5.  , 4.  , 5.  , 4.67, 3.33])

In [23]:
np.sort(a)

array([0.  , 0.  , 1.33, 2.  , 2.  , 3.  , 3.33, 4.  , 4.67, 5.  , 5.  ])

Notice the result is sorted in ascending numerical order. Let's see what happens if we pass a higher-dimensional array:

In [24]:
b = numpy_arr[1:3, :11]
b

array([[ 1.  ,  0.  ,  4.  ,  0.  ,  4.  ,  2.67,  3.  ,  2.  ,  2.  ,
         2.33, -0.33],
       [ 2.  ,  0.  ,  3.  ,  0.  ,  2.  ,  1.67,  4.  ,  4.  ,  4.  ,
         4.  ,  2.33]])

In [25]:
np.sort(b)

array([[-0.33,  0.  ,  0.  ,  1.  ,  2.  ,  2.  ,  2.33,  2.67,  3.  ,
         4.  ,  4.  ],
       [ 0.  ,  0.  ,  1.67,  2.  ,  2.  ,  2.33,  3.  ,  4.  ,  4.  ,
         4.  ,  4.  ]])

So, on a 2D array `np.sort` sorts each row independently: it sorts along the last axis (`axis=-1`), that is, across the columns.

In [26]:
np.sort(b, axis=0)

array([[ 1.  ,  0.  ,  3.  ,  0.  ,  2.  ,  1.67,  3.  ,  2.  ,  2.  ,
         2.33, -0.33],
       [ 2.  ,  0.  ,  4.  ,  0.  ,  4.  ,  2.67,  4.  ,  4.  ,  4.  ,
         4.  ,  2.33]])

As expected, if we instead pass `axis=0`, each column is sorted independently across the rows, and we get a different array.

**Lesson: np.sort applies over the last dimension by default**
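A small check of the axis behaviour, using the values of `b` shown above typed in by hand (the name `b2d` is just for this sketch). It also shows `axis=None`, which flattens the array before sorting and returns a single 1D result.

```python
import numpy as np

# Values copied from the `b` output above
b2d = np.array([[1.0, 0.0, 4.0, 0.0, 4.0, 2.67, 3.0, 2.0, 2.0, 2.33, -0.33],
                [2.0, 0.0, 3.0, 0.0, 2.0, 1.67, 4.0, 4.0, 4.0, 4.0, 2.33]])

# The default axis is the last one (axis=-1, which is axis=1 for a 2D array)
by_row = np.sort(b2d)
assert np.array_equal(by_row, np.sort(b2d, axis=-1))
assert np.array_equal(by_row, np.sort(b2d, axis=1))

# axis=0 sorts each column independently across the rows
by_col = np.sort(b2d, axis=0)

# axis=None flattens first and returns one sorted 1D array
flat = np.sort(b2d, axis=None)
print(by_row.shape, by_col.shape, flat.shape)  # (2, 11) (2, 11) (22,)
```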

Next, let's check whether modifying the sorted result changes the original array.

In [27]:
a = numpy_arr[2, :11]
a

array([2.  , 0.  , 3.  , 0.  , 2.  , 1.67, 4.  , 4.  , 4.  , 4.  , 2.33])

In [28]:
b = np.sort(a)
b

array([0.  , 0.  , 1.67, 2.  , 2.  , 2.33, 3.  , 4.  , 4.  , 4.  , 4.  ])

In [29]:
b[2] = -100
b

array([   0.  ,    0.  , -100.  ,    2.  ,    2.  ,    2.33,    3.  ,
          4.  ,    4.  ,    4.  ,    4.  ])

In [30]:
a

array([2.  , 0.  , 3.  , 0.  , 2.  , 1.67, 4.  , 4.  , 4.  , 4.  , 2.33])

Notice that `a` did not change!

**Lesson: `np.sort()` returns a copy**
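One way to confirm that `np.sort` hands back an independent array is `np.shares_memory`, which reports whether two arrays overlap in memory. A minimal sketch with a small hand-made array (the values are illustrative, not from the dataset):

```python
import numpy as np

base = np.array([[2.0, 0.0, 3.0],
                 [4.0, 1.0, 5.0]])
row = base[0]               # basic slicing returns a view of `base`
sorted_row = np.sort(row)   # np.sort allocates a new array

print(np.shares_memory(row, base))         # True: the slice is a view
print(np.shares_memory(sorted_row, base))  # False: the sorted result is a copy

sorted_row[0] = -100        # editing the copy...
print(base[0])              # ...leaves the original untouched: [2. 0. 3.]
```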

In [31]:
a.sort()
a

array([0.  , 0.  , 1.67, 2.  , 2.  , 2.33, 3.  , 4.  , 4.  , 4.  , 4.  ])

In [32]:
numpy_arr[2,:]

array([0.  , 0.  , 1.67, 2.  , 2.  , 2.33, 3.  , 4.  , 4.  , 4.  , 4.  ])

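However, the `arr.sort()` method sorts the array in place. Because `a` was created by slicing `numpy_arr`, it is a view, so the row inside `numpy_arr` changed as well. Beware of this.

**Lesson: the `arr.sort()` method sorts in place, so the change is visible through every array that shares its memory (the base array and any other views).**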
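If you want to sort in place without touching the original data, take an explicit copy first with `.copy()`. A short sketch with illustrative values:

```python
import numpy as np

data = np.array([[3.0, 1.0, 2.0],
                 [6.0, 5.0, 4.0]])

view = data[0]          # a view: shares memory with `data`
view.sort()             # the in-place sort also reorders data[0]
print(data[0])          # [1. 2. 3.]

safe = data[1].copy()   # an independent copy
safe.sort()             # sorting the copy leaves data[1] alone
print(data[1])          # [6. 5. 4.]
print(safe)             # [4. 5. 6.]
```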

Finally, you may want the indices that would sort the array instead of the sorted values themselves. In that case, use `np.argsort`:

In [34]:
a = numpy_arr[1, :]
a

array([ 1.  ,  0.  ,  4.  ,  0.  ,  4.  ,  2.67,  3.  ,  2.  ,  2.  ,
        2.33, -0.33])

In [35]:
np.argsort(a)

array([10,  1,  3,  0,  7,  8,  9,  5,  6,  2,  4])

Indexing the array with those indices, in that order, returns the sorted array:

In [36]:
a[np.argsort(a)]

array([-0.33,  0.  ,  0.  ,  1.  ,  2.  ,  2.  ,  2.33,  2.67,  3.  ,
        4.  ,  4.  ])
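The indices from `np.argsort` are useful when the order of one array should drive the order of another, for example reordering the rows of a 2D array by one of its columns. When there are ties (like the repeated 0s and 2s above), passing `kind='stable'` guarantees that tied elements keep their original relative order, which the default algorithm does not promise. A minimal sketch; the `scores` matrix is made up for illustration:

```python
import numpy as np

a = np.array([1.0, 0.0, 4.0, 0.0, 4.0, 2.67, 3.0, 2.0, 2.0, 2.33, -0.33])

# kind='stable' keeps tied values in their original order
order = np.argsort(a, kind='stable')
print(order)  # [10  1  3  0  7  8  9  5  6  2  4]
assert np.array_equal(a[order], np.sort(a))

# Hypothetical data: reorder whole rows by the values in column 1
scores = np.array([[0, 3.5],
                   [1, 1.0],
                   [2, 2.25]])
by_col1 = scores[np.argsort(scores[:, 1])]
print(by_col1[:, 0])  # [1. 2. 0.]
```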

## Searching NumPy Arrays

We have actually seen this already, but let's take it as a refresher. To search for elements in an array we normally use indexers, and one way to create them is with the `np.where` function.

In [37]:
a

array([ 1.  ,  0.  ,  4.  ,  0.  ,  4.  ,  2.67,  3.  ,  2.  ,  2.  ,
        2.33, -0.33])

In [38]:
index = np.where(a == 4)
index

(array([2, 4]),)

This means the value 4 appears at indices 2 and 4 (the third and fifth positions), and indeed it does. To retrieve the values, index the array with the result:

In [39]:
a[index]

array([4., 4.])
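With a single argument, `np.where(cond)` returns the same tuple as `np.nonzero(cond)`. In practice you can often skip it and index with the boolean mask directly; both select the same values. A sketch using the 1D `a` shown above:

```python
import numpy as np

a = np.array([1.0, 0.0, 4.0, 0.0, 4.0, 2.67, 3.0, 2.0, 2.0, 2.33, -0.33])

mask = a == 4            # boolean array, True where a equals 4
index = np.where(mask)   # tuple of index arrays, one per axis

# np.where with one argument behaves like np.nonzero
assert np.array_equal(index[0], np.nonzero(mask)[0])

# Indexing with the tuple or with the mask selects the same elements
print(a[index])  # [4. 4.]
print(a[mask])   # [4. 4.]
```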

In more dimensions:

In [41]:
a = numpy_arr[2:5, :]
a

array([[0.  , 0.  , 1.67, 2.  , 2.  , 2.33, 3.  , 4.  , 4.  , 4.  , 4.  ],
       [3.  , 0.  , 2.  , 0.  , 2.  , 1.33, 5.  , 4.  , 5.  , 4.67, 3.33],
       [4.  , 0.  , 3.  , 1.  , 3.  , 2.33, 4.  , 4.  , 5.  , 4.33, 2.  ]])

In [42]:
np.where(a < 4)

(array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]),
 array([ 0,  1,  2,  3,  4,  5,  6,  0,  1,  2,  3,  4,  5, 10,  1,  2,  3,
         4,  5, 10]))

This may look odd at first, so let's break down the structure of the result. The first array in the tuple holds the indices along the first axis (the rows). Having seven 0s means that seven elements of the first row satisfy the condition. The question is: which ones?

For that we look at the second array in the tuple, which holds the indices along the second axis (the columns). Lining the two arrays up as columns and reading across each position gives the `(row, column)` coordinates of one match. For example, among the pairs we find:

```
(1, 10)
```

Indeed, if we check:

```
a[1, 10]  # 3.33, which is < 4
```
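Pairing the two index arrays by hand is what `zip` does, and `np.argwhere` returns the same coordinates already arranged as one `(row, column)` pair per line. A sketch using the 3×11 `a` printed above, typed in by hand:

```python
import numpy as np

a = np.array([[0.0, 0.0, 1.67, 2.0, 2.0, 2.33, 3.0, 4.0, 4.0, 4.0, 4.0],
              [3.0, 0.0, 2.0, 0.0, 2.0, 1.33, 5.0, 4.0, 5.0, 4.67, 3.33],
              [4.0, 0.0, 3.0, 1.0, 3.0, 2.33, 4.0, 4.0, 5.0, 4.33, 2.0]])

rows, cols = np.where(a < 4)
pairs = list(zip(rows.tolist(), cols.tolist()))  # [(0, 0), (0, 1), ...]
print((1, 10) in pairs)  # True
print(a[1, 10])          # 3.33

# np.argwhere gives the same coordinates as an (n_matches, 2) array
coords = np.argwhere(a < 4)
print(coords.shape)      # (20, 2)
```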

In [43]:
a[np.where(a < 4)]

array([0.  , 0.  , 1.67, 2.  , 2.  , 2.33, 3.  , 3.  , 0.  , 2.  , 0.  ,
       2.  , 1.33, 3.33, 0.  , 3.  , 1.  , 3.  , 2.33, 2.  ])

Indexing with the full tuple returns a 1D array of all the matching elements. But what if we want to keep the original shape? Then we pass `np.where` two more arguments: the values to keep where the condition holds and a fill value for everywhere else.

In [46]:
np.where(a < 4, a, -100)

array([[   0.  ,    0.  ,    1.67,    2.  ,    2.  ,    2.33,    3.  ,
        -100.  , -100.  , -100.  , -100.  ],
       [   3.  ,    0.  ,    2.  ,    0.  ,    2.  ,    1.33, -100.  ,
        -100.  , -100.  , -100.  ,    3.33],
       [-100.  ,    0.  ,    3.  ,    1.  ,    3.  ,    2.33, -100.  ,
        -100.  , -100.  , -100.  ,    2.  ]])

In [47]:
a

array([[0.  , 0.  , 1.67, 2.  , 2.  , 2.33, 3.  , 4.  , 4.  , 4.  , 4.  ],
       [3.  , 0.  , 2.  , 0.  , 2.  , 1.33, 5.  , 4.  , 5.  , 4.67, 3.33],
       [4.  , 0.  , 3.  , 1.  , 3.  , 2.33, 4.  , 4.  , 5.  , 4.33, 2.  ]])

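Notice that the three-argument form returned *a new array*: `a` itself is unchanged, and in the result every element that fails the condition has been replaced by the sentinel value -100 (the scalar is broadcast to the shape of `a`).

Note that you can pass `np.where` any expression that evaluates to a boolean array (`True` or `False` per element), including the result of calling your own function: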
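The three-argument form picks, element by element, from the second argument where the condition is true and from the third where it is false. A sketch showing an equivalent manual version built from a copy and a boolean mask, plus a variant where the third argument is an array instead of a scalar (the small array is illustrative):

```python
import numpy as np

a = np.array([[0.0, 3.0, 4.0],
              [5.0, 1.0, 4.33]])

filled = np.where(a < 4, a, -100)   # new array, same shape as a

# Equivalent manual version: copy, then overwrite where the condition fails
manual = a.copy()
manual[~(a < 4)] = -100
assert np.array_equal(filled, manual)

# The second and third arguments can also be arrays of the same shape
clipped = np.where(a < 4, a, np.full_like(a, 4.0))
print(clipped[1])  # [4. 1. 4.]
print(a[1, 0])     # 5.0: the original is unchanged
```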

In [49]:
a = numpy_arr[4, :]
a

array([4.  , 0.  , 3.  , 1.  , 3.  , 2.33, 4.  , 4.  , 5.  , 4.33, 2.  ])

In [50]:
np.where(a < 4)

(array([ 1,  2,  3,  4,  5, 10]),)

In [51]:
def less4(x):
  print(f'I got called to evaluate {x}')
  return x < 4

In [52]:
np.where(less4(a))

I got called to evaluate [4.   0.   3.   1.   3.   2.33 4.   4.   5.   4.33 2.  ]


(array([ 1,  2,  3,  4,  5, 10]),)
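Notice that `less4` was called only once, with the whole array: the condition is evaluated before `np.where` runs, so `np.where` only receives the resulting boolean array. That also means conditions can be combined with the element-wise operators `&` (and), `|` (or) and `~` (not). Python's `and`/`or` do not work element-wise on arrays, and each comparison needs parentheses because `&` and `|` bind more tightly than `<` or `>`. A sketch using the row shown above:

```python
import numpy as np

a = np.array([4.0, 0.0, 3.0, 1.0, 3.0, 2.33, 4.0, 4.0, 5.0, 4.33, 2.0])

# Values strictly between 1 and 4; parentheses around each comparison are required
between = np.where((a > 1) & (a < 4))
print(between[0])   # [ 2  4  5 10]

# Values equal to 0 or greater than 4.5
extremes = np.where((a == 0) | (a > 4.5))
print(extremes[0])  # [1 8]
```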

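To wrap up searching: to find the maximum of an array we don't need to sort it, which costs O(n log n) time and, with `np.sort`, an extra copy of the data. `np.argmax` finds the index of the maximum in a single linear pass: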

In [54]:
a

array([4.  , 0.  , 3.  , 1.  , 3.  , 2.33, 4.  , 4.  , 5.  , 4.33, 2.  ])

In [55]:
np.argmax(a)

8

In [56]:
a[np.argmax(a)]

5.0

Indeed, 5 is the maximum.
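A few details worth knowing about `np.argmax` and its counterpart `np.argmin`: with ties they return the first occurrence, and on a 2D array without an `axis` they return an index into the flattened array, which `np.unravel_index` converts back to `(row, column)`. A sketch with a small illustrative matrix:

```python
import numpy as np

a = np.array([4.0, 0.0, 3.0, 1.0, 3.0, 2.33, 4.0, 4.0, 5.0, 4.33, 2.0])
print(np.argmin(a))  # 1: index of the minimum

m = np.array([[1.0, 7.0, 3.0],
              [9.0, 2.0, 9.0]])
flat_idx = np.argmax(m)  # index into the flattened array (m.ravel())
print(flat_idx)          # 3: the first of the two 9s

row, col = np.unravel_index(flat_idx, m.shape)
print(int(row), int(col))    # 1 0

print(np.argmax(m, axis=1))  # [1 0]: position of the maximum in each row
```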