
# Indexing in NumPy

## Prep

Let's recall how to access data in lists. For that we will use the dataset for this course, which contains the results of an actual behavioural experiment conducted by Universidad de la Matanza (UNLaM).

In [1]:
%%writefile get_data.sh
if [ ! -f dataset.csv ]; then
  wget -O dataset.csv "https://www.dropbox.com/s/9t5lc04vxwvjvo6/dataset.csv?dl=0"
fi

Writing get_data.sh


In [2]:
!bash get_data.sh


--2025-05-06 20:47:30--  https://www.dropbox.com/s/9t5lc04vxwvjvo6/dataset.csv?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.5.18, 2620:100:601d:18::a27d:512
Connecting to www.dropbox.com (www.dropbox.com)|162.125.5.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.dropbox.com/scl/fi/vdpumrhhu5yhwhgmzwday/dataset.csv?rlkey=ezm9dpl2wbkd1tzxqrn2rzrrn&dl=0 [following]
--2025-05-06 20:47:31--  https://www.dropbox.com/scl/fi/vdpumrhhu5yhwhgmzwday/dataset.csv?rlkey=ezm9dpl2wbkd1tzxqrn2rzrrn&dl=0
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uccd7e3184fe9441d600dfe2bf87.dl.dropboxusercontent.com/cd/0/inline/CpJhOE2SIJfyHDczeBJnF1Rv9xy7avpe5fNGxtk86Fn5fbw08bwHJVdydqUnnw0TjxDo1ZDTS7xA0GrfrS6Jv7MAv5BHZuazrroYKDLx55VdCzkjvLOAjiq12fUie5Gx8c26baVrc3MPe7sS83u2jtkR/file# [following]
--2025-05-06 20:47:31--  https://uccd7e3184fe9441d600dfe2bf87.dl.dropboxusercontent.com/c

In [3]:
import numpy as np

In [4]:
numpy_arr = np.genfromtxt('dataset.csv', delimiter=',')

## Indexing with tuples

Although it is not widely known, a NumPy array can be indexed with tuples and with other arrays. Let's see how.

In [5]:
numpy_arr

array([[  0.  ,   0.  ,   2.  , ...,   3.  ,   2.67,   1.67],
       [  1.  ,   0.  ,   4.  , ...,   2.  ,   2.33,  -0.33],
       [  2.  ,   0.  ,   3.  , ...,   4.  ,   4.  ,   2.33],
       ...,
       [196.  ,   2.  ,    nan, ...,    nan,    nan,    nan],
       [197.  ,   2.  ,    nan, ...,    nan,    nan,    nan],
       [198.  ,   2.  ,    nan, ...,    nan,    nan,    nan]])

In [6]:
indexer_tuple = (1,2)
print(type(indexer_tuple))

<class 'tuple'>


In [7]:
numpy_arr[indexer_tuple]

np.float64(4.0)

In [8]:
numpy_arr[1,2]

np.float64(4.0)
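
The same equivalence can be checked on a small array that does not depend on the course dataset. This is a minimal sketch: `demo` and `idx` are hypothetical names introduced only for illustration. It also previews the difference between a tuple and a list used as an index.

```python
import numpy as np

# Hypothetical toy array, used only for illustration
demo = np.arange(12).reshape(3, 4)

idx = (1, 2)

# A tuple index is unpacked into one index per axis,
# so demo[idx] and demo[1, 2] select the same element.
same_element = demo[idx] == demo[1, 2]
print(demo[idx], same_element)  # 6 True

# A list with the same numbers is NOT unpacked: it selects rows 1 and 2.
rows = demo[[1, 2]]
print(rows.shape)  # (2, 4)
```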

## Indexing with arrays

But we can also index with NumPy arrays!

In [9]:
indexer_array = np.array([3,4])

In [10]:
type(indexer_array)

numpy.ndarray

In [11]:
numpy_arr[indexer_array]

array([[3.  , 0.  , 2.  , 0.  , 2.  , 1.33, 5.  , 4.  , 5.  , 4.67, 3.33],
       [4.  , 0.  , 3.  , 1.  , 3.  , 2.33, 4.  , 4.  , 5.  , 4.33, 2.  ]])

In [12]:
numpy_arr[3,4]

np.float64(2.0)

### What happened here? Think about it!

Using array-like objects such as lists or arrays as indexers triggers **advanced indexing**, where `arr[[1,2,3]] != arr[1,2,3]`. Instead, `arr[[1,2,3]]` is equivalent to `arr[[1,2,3],]`, which resolves to:



```
# Both sides are NumPy arrays, not lists
arr[[1,2,3]] == array([arr[1], arr[2], arr[3]])
```

Let's test it out!


In [13]:
numpy_arr[np.array([2,3])] == np.array([numpy_arr[2], numpy_arr[3]])

array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True]])

Indeed, it is what we expected. Watch out for this! It may look weird, but it allows powerful queries on our NumPy arrays.
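
A minimal sketch of these rules on a hypothetical three-dimensional array (`demo` and `with_nan` are illustrative names, not part of the dataset). It contrasts a single list index with three integer indices, shows that two index arrays are paired element-wise, and notes one caveat of the `==` check used above: it reports `False` wherever a row contains NaN, so `np.array_equal` with `equal_nan=True` is a safer comparison in that case.

```python
import numpy as np

demo = np.arange(24).reshape(2, 3, 4)  # hypothetical toy array

# One list as the index: picks entries along the first axis only.
picked = demo[[0, 1, 1]]
print(picked.shape)  # (3, 3, 4)

# Three integers: one index per axis, returns a single element.
single = demo[1, 2, 3]
print(single)  # 23

# Two index arrays are paired element-wise: (0, 2) and (1, 0).
paired = demo[[0, 1], [2, 0]]
print(paired.shape)  # (2, 4)

# Caveat: NaN != NaN, so a plain comparison fails for rows with NaN.
with_nan = np.array([[1.0, np.nan], [2.0, 3.0]])
print(np.array_equal(with_nan[[0, 1]], with_nan))                  # False
print(np.array_equal(with_nan[[0, 1]], with_nan, equal_nan=True))  # True
```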

## Boolean indexing

In [14]:
small_arr = numpy_arr[:3, :3]
small_arr


array([[0., 0., 2.],
       [1., 0., 4.],
       [2., 0., 3.]])

In [21]:
index = np.where(small_arr > 2)
print(index)

(array([1, 2]), array([2, 2]))


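`np.where` returns one index array per axis: the first holds the row indices and the second the column indices of every element that meets the condition. Read pairwise, the matches are at `(1, 2)` and `(2, 2)`, that is, rows 2 and 3 (indices 1 and 2) of column 3 (index 2). The result is: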

In [16]:
small_arr[index]

array([4., 3.])

As expected (I hope!), we get the elements at positions `(1, 2)` and `(2, 2)`: the last two rows of the third column. We can also pass these indices to `np.delete`:

In [22]:
np.delete(small_arr, index)

array([0., 1., 0., 4., 2., 0., 3.])

**Can you guess what operation happened here?**
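
One way to reason about the call above, sketched on a copy of the same values (`small` and `flat_positions` are illustrative names): when no `axis` is given, `np.delete` flattens the array and reads the index values as flat positions, so it removes positions 1 and 2 rather than the elements greater than 2. The sketch also shows two possible ways to drop the matching elements instead.

```python
import numpy as np

small = np.array([[0., 0., 2.],
                  [1., 0., 4.],
                  [2., 0., 3.]])   # same values as small_arr above

index = np.where(small > 2)        # (array([1, 2]), array([2, 2]))

# Without `axis`, np.delete works on the flattened array and reads the
# index values (1, 2, 2, 2) as flat positions, so positions 1 and 2 go.
print(np.delete(small, index))     # [0. 1. 0. 4. 2. 0. 3.]

# To drop the elements that matched the condition, convert the
# (row, column) pairs to flat positions first ...
flat_positions = np.ravel_multi_index(index, small.shape)  # [5 8]
print(np.delete(small, flat_positions))  # [0. 0. 2. 1. 0. 2. 0.]

# ... or simply keep everything that does not match.
print(small[~(small > 2)])         # [0. 0. 2. 1. 0. 2. 0.]
```

In both alternatives the result is one-dimensional, because removing individual elements from a 2-D array cannot preserve its rectangular shape.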

In [23]:
# We can also use Boolean arrays instead of using indices

index = small_arr > 2
print(index)
new_arr = small_arr[index]

[[False False False]
 [False False  True]
 [False False  True]]


In [24]:
new_arr

array([4., 3.])
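
A short sketch of how the Boolean mask relates to the `np.where` result, again on a copy of the same values (`small`, `mask` and `capped` are illustrative names). It checks that both forms select the same elements, counts the matches, and uses the mask on the left-hand side of an assignment.

```python
import numpy as np

small = np.array([[0., 0., 2.],
                  [1., 0., 4.],
                  [2., 0., 3.]])   # same values as small_arr above

mask = small > 2

# The mask and the np.where index tuple select the same elements.
print(np.array_equal(small[mask], small[np.where(mask)]))  # True

# True counts as 1, so summing the mask counts the matches.
print(mask.sum())                  # 2

# Masks also work on the left-hand side of an assignment.
capped = small.copy()              # copy, so `small` is left untouched
capped[mask] = 2.0
print(capped.max())                # 2.0
```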

We can also use Boolean indexing to handle NaN (not a number) values.



In [25]:
# Notice we have some NaN
np.isnan(numpy_arr)

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False,  True, ...,  True,  True,  True],
       [False, False,  True, ...,  True,  True,  True],
       [False, False,  True, ...,  True,  True,  True]])

Convert all NaN to 0:

In [26]:
numpy_arr[np.isnan(numpy_arr)] = 0

In [27]:
numpy_arr

array([[  0.  ,   0.  ,   2.  , ...,   3.  ,   2.67,   1.67],
       [  1.  ,   0.  ,   4.  , ...,   2.  ,   2.33,  -0.33],
       [  2.  ,   0.  ,   3.  , ...,   4.  ,   4.  ,   2.33],
       ...,
       [196.  ,   2.  ,   0.  , ...,   0.  ,   0.  ,   0.  ],
       [197.  ,   2.  ,   0.  , ...,   0.  ,   0.  ,   0.  ],
       [198.  ,   2.  ,   0.  , ...,   0.  ,   0.  ,   0.  ]])

In [28]:
np.isnan(numpy_arr).any()  # -> Returning False means there are no NaNs

np.False_
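
Replacing NaN with 0 in place changes later statistics, since the zeros are counted as real observations. The sketch below, on hypothetical data (`scores` is not part of the dataset), shows a few alternatives that may fit depending on the analysis: filling a copy with `np.nan_to_num`, skipping NaN with `np.nanmean`, or dropping incomplete rows with a Boolean mask.

```python
import numpy as np

scores = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, 6.0]])   # hypothetical data

# Option 1: fill a copy, so the original NaNs stay available.
filled = np.nan_to_num(scores, nan=0.0)
print(filled.sum())                    # 19.0

# Option 2: skip NaNs in the statistic instead of replacing them.
print(np.nanmean(scores))              # mean of the 5 valid values
print(filled.mean())                   # lower, the filled zero is counted

# Option 3: keep only the rows without any NaN.
complete_rows = scores[~np.isnan(scores).any(axis=1)]
print(complete_rows)                   # [[4. 5. 6.]]
```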

Of course, multiple Boolean masks can be combined with the element-wise operators `&` (and), `|` (or) and `~` (not), and the usual Boolean logic applies.
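
A minimal sketch of combining masks, using a hypothetical array `values`. Each comparison is wrapped in parentheses because `&` and `|` bind more tightly than the comparison operators.

```python
import numpy as np

values = np.arange(10)   # hypothetical toy array: 0..9

even = values % 2 == 0
large = values > 5

print(values[even & large])    # [6 8]
print(values[even | large])    # [0 2 4 6 7 8 9]
print(values[~even])           # [1 3 5 7 9]

# Parentheses are required around each comparison.
print(values[(values > 2) & (values < 6)])  # [3 4 5]
```

The Python keywords `and`, `or` and `not` do not work element-wise on arrays, so the operators above are used instead.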

## Combining it all

Finally, we can combine Boolean indexing, slicing and strides to create powerful indexers. Let's look at a simple example:

In [29]:
arr = np.arange(50).reshape(5,10)
print(f'The shape of arr is {arr.shape}')
arr

The shape of arr is (5, 10)


array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

In [30]:
bool_mask = ( 15 < arr ) & ( arr < 35 )
bool_mask

array([[False, False, False, False, False, False, False, False, False,
        False],
       [False, False, False, False, False, False,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True],
       [ True,  True,  True,  True,  True, False, False, False, False,
        False],
       [False, False, False, False, False, False, False, False, False,
        False]])

In [31]:
arr[bool_mask[:, 5],2:8:2]

array([[22, 24, 26]])

Here we:



1.   Created a Boolean mask of all the elements greater than 15 and less than 35.
2.   Then sliced that Boolean mask, taking all the rows but only index 5 -> the 6th column. The output of this slice is the `ndarray` `[False, False, True, False, False]`.
3.   So, on the rows we only get the third row -> `[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]`.
4.   Over that result we slice the columns with a stride, `2:8:2`: we start at index 2 (the third element) and take every second element, stopping before index 8 (the 9th element).
5.   The output is then `[[22, 24, 26]]` (we don't include `28` since the end of the slice is not inclusive). It is still two-dimensional because a Boolean row mask keeps the row axis.
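
The steps above can be reproduced one at a time. This is a sketch in which `row_mask`, `selected_rows` and `result` are illustrative intermediate names; the final value matches the single-expression version.

```python
import numpy as np

arr = np.arange(50).reshape(5, 10)
bool_mask = (15 < arr) & (arr < 35)

# Step 2: column 5 of the mask (values 5, 15, 25, 35, 45 in arr)
row_mask = bool_mask[:, 5]
print(row_mask)                     # [False False  True False False]

# Step 3: Boolean row selection keeps two dimensions
selected_rows = arr[row_mask]
print(selected_rows.shape)          # (1, 10)

# Steps 4 and 5: columns 2, 4 and 6
result = selected_rows[:, 2:8:2]
print(result)                       # [[22 24 26]]
```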

