# NumPy Indexing and Selection

The ndarray object uses zero-based indexing, and its items can be accessed or modified using field access, basic slicing, or advanced indexing. The general syntax is:
 x[obj]

In this notebook we will discuss how to select elements or groups of elements from an array.

In [2]:
import numpy as np

In [3]:
#Creating sample array
arr = np.arange(0,11)

In [3]:
#Show
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

## Bracket Indexing and Selection (Basic Slicing)
The simplest way to pick one or more elements of an array looks very similar to indexing a Python list:

In [4]:
#Get a value at an index
arr[8]

8

In [5]:
#Get values in a range
arr[1:5]

array([1, 2, 3, 4])

In [6]:
#Get values from the start up to (but not including) index 5
arr[0:5]

array([0, 1, 2, 3, 4])

In [7]:
#Step indices
arr[1:6:2]

array([1, 3, 5])

In [4]:
#Negative indices with a negative step: walk backwards from the last element
arr[-1:-5:-1]

array([10,  9,  8,  7])
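The last cell combines negative indices with a negative step. As a minimal sketch of how `start:stop:step` behaves when counting from the end (the variable names here are illustrative, not from the notebook):

```python
import numpy as np

arr = np.arange(0, 11)

# Negative indices count from the end: -1 is the last element.
last = arr[-1]

# A negative step walks backwards; the stop index (-5) is excluded.
tail_reversed = arr[-1:-5:-1]

# Omitting start and stop with a step of -1 reverses the whole array.
reversed_arr = arr[::-1]

print(last)
print(tail_reversed)
print(reversed_arr)
```

As with forward slices, the stop position is never included, which is why only four elements come back.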

## Broadcasting

NumPy arrays differ from normal Python lists in that a scalar assigned to a slice is broadcast to every element of that slice:

In [8]:
#Setting a value with index range (Broadcasting)
arr[0:5]=100

#Show
arr

array([100, 100, 100, 100, 100,   5,   6,   7,   8,   9,  10])
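To make the contrast with lists concrete, the sketch below attempts the same slice assignment on a plain Python list. The names `py_list` and `list_error` are illustrative only.

```python
import numpy as np

py_list = list(range(11))
arr = np.arange(0, 11)

# NumPy broadcasts the scalar across every position in the slice.
arr[0:5] = 100

# A list slice needs an iterable on the right-hand side;
# assigning a bare scalar raises TypeError.
try:
    py_list[0:5] = 100
    list_error = None
except TypeError as exc:
    list_error = exc

# With an iterable, the list slice is replaced outright,
# so the list can change length.
py_list[0:5] = [100]

print(arr)
print(list_error)
print(py_list)
```

The array keeps its length of 11, while the list shrinks because five elements were replaced by one.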

In [9]:
# Reset array, we'll see why I had to reset in a moment
arr = np.arange(0,11)

#Show
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [10]:
#Important notes on Slices
slice_of_arr = arr[0:6]

#Show slice
slice_of_arr

array([0, 1, 2, 3, 4, 5])

In [11]:
#Change Slice
slice_of_arr[:]=99

#Show Slice again
slice_of_arr

array([99, 99, 99, 99, 99, 99])

Now note the changes also occur in our original array!

In [12]:
arr

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

The data is not copied; the slice is a view of the original array. This avoids unnecessary copies, which saves memory when working with large arrays.

In [13]:
#To get a copy, you need to be explicit
arr_copy = arr.copy()

arr_copy

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])
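If it is unclear whether an object is a view or a copy, NumPy can tell you. The following sketch uses `np.shares_memory` and the `.base` attribute; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(0, 11)
view = arr[0:6]            # basic slicing returns a view
copied = arr[0:6].copy()   # an explicit copy owns its own data

# shares_memory reports whether two arrays overlap in memory.
shares_view = np.shares_memory(arr, view)
shares_copy = np.shares_memory(arr, copied)

# A view's .base points back at the array that owns the data.
view_base_is_arr = view.base is arr

copied[:] = -1   # leaves arr untouched
view[:] = 99     # writes through to arr

print(shares_view, shares_copy, view_base_is_arr)
print(arr)
```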

## Indexing a 2D array (matrices)

The general format is **arr_2d[row][col]** or **arr_2d[row,col]**.

In [14]:
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))

#Show
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [18]:
#Indexing row
arr_2d[1]

array([20, 25, 30])

In [16]:
# Format is arr_2d[row][col] or arr_2d[row,col]

# Getting individual element value
arr_2d[1][0]

20

In [17]:
# Getting individual element value
arr_2d[1,0]

20
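Both forms return the same scalar, but they are not interchangeable once slices are involved: `arr_2d[a][b]` indexes twice along the first axis, while `arr_2d[a, b]` indexes rows and columns in one step. A small sketch of the difference (the names `chained` and `combined` are illustrative):

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# For a single element, both forms agree.
same_scalar = arr_2d[1][0] == arr_2d[1, 0]

# Chained: take the first two rows, then rows 1 onward of that result.
chained = arr_2d[:2][1:]

# Combined: take the first two rows and columns 1 onward.
combined = arr_2d[:2, 1:]

print(same_scalar)
print(chained)
print(combined)
```

The comma form is generally the safer habit, since it expresses the row and column selection in a single indexing operation.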

In [30]:
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [29]:
# 2D array slicing

#Shape (2,2) from top right corner
arr_2d[:2,1:]

array([[10, 15],
       [25, 30]])

In [34]:
#Bottom row
arr_2d[2]

array([35, 40, 45])

In [20]:
#Bottom row, with an explicit slice over all columns
arr_2d[2,:]

array([35, 40, 45])
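The same slicing syntax selects columns when the slice is placed in the second position. A short sketch, with illustrative variable names:

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# An integer in the column position drops that axis: the result is 1-D.
middle_col = arr_2d[:, 1]

# A slice in the column position keeps the axis: the result stays 2-D.
middle_col_2d = arr_2d[:, 1:2]

print(middle_col, middle_col.shape)
print(middle_col_2d, middle_col_2d.shape)
```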

### Fancy Indexing

Fancy indexing allows you to select entire rows or columns in any order. To show this, let's quickly build a NumPy array:

In [36]:
#Set up matrix
arr2d = np.zeros((10,10))

In [37]:
arr2d

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [38]:
#Number of rows (the loop below assigns one row at a time)
arr_length = arr2d.shape[0]

In [23]:
#Set up array

for i in range(arr_length):
    arr2d[i] = i
    
arr2d

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.]])

Fancy indexing allows the following:

In [24]:
arr2d[[2,4,6,8]]

array([[ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.]])

In [25]:
#Allows in any order
arr2d[[6,4,2,7]]

array([[ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.]])
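Unlike basic slicing, fancy indexing returns a copy rather than a view, and it can also be applied to columns or to specific (row, column) pairs. The sketch below uses a different sample array, `np.arange(100).reshape(10, 10)`, chosen here only so that every element is distinguishable.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)

# Selecting rows with a list of indices returns a copy.
rows = arr[[6, 4, 2]]
rows[:] = -1                  # arr is not affected
is_view = np.shares_memory(arr, rows)

# Selecting columns: slice all rows, list the columns.
cols = arr[:, [0, 9]]

# Two index lists are paired element-wise: (1, 2) and (3, 4).
pairs = arr[[1, 3], [2, 4]]

# np.ix_ builds the full cross product: rows 1 and 3, columns 2 and 4.
block = arr[np.ix_([1, 3], [2, 4])]

print(is_view)
print(cols.shape)
print(pairs)
print(block)
```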

In [55]:
# Boolean mask indexing
x = np.array([[1,2,np.nan],
              [np.nan,5,6],
              [7,np.nan,9]])
      
print(x)
print(' ')

print('removing NaN: ')
print(x[~np.isnan(x)])
print(' ')

print('Replacing NaN with 10: ' )
x[np.isnan(x)] = 10
print(x)
print(' ')

[[ 1.  2. nan]
 [nan  5.  6.]
 [ 7. nan  9.]]
 
removing NaN: 
[1. 2. 5. 6. 7. 9.]
 
Replacing NaN with 10: 
[[ 1.  2. 10.]
 [10.  5.  6.]
 [ 7. 10.  9.]]
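The cell above relies on a boolean mask with the same shape as `x`. As a sketch of what that mask looks like and how to fill values without modifying the original array (the names `mask`, `kept`, and `filled` are illustrative):

```python
import numpy as np

x = np.array([[1, 2, np.nan],
              [np.nan, 5, 6],
              [7, np.nan, 9]])

# np.isnan returns a boolean array with the same shape as x.
mask = np.isnan(x)
n_missing = int(mask.sum())   # True counts as 1

# Selecting with a boolean mask always returns a flattened 1-D copy.
kept = x[~mask]

# Assign through the mask on a copy to leave x itself unchanged.
filled = x.copy()
filled[mask] = 10

print(mask)
print(n_missing, kept.shape)
print(filled)
```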
 


## Selection

Let's briefly go over how to use brackets for selection based on comparison operators.

In [28]:
arr = np.arange(1,11)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [30]:
arr > 4

array([False, False, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [58]:
arr[arr > 4]

array([ 5,  6,  7,  8,  9, 10])

In [59]:
arr[arr>2]

array([ 3,  4,  5,  6,  7,  8,  9, 10])

In [60]:
x = 2
arr[arr>x]

array([ 3,  4,  5,  6,  7,  8,  9, 10])
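Several conditions can be combined in one selection. A minimal sketch: use the element-wise operators `&`, `|`, and `~` with parentheses around each comparison, because Python's `and`/`or` raise a `ValueError` on arrays with more than one element.

```python
import numpy as np

arr = np.arange(1, 11)

# AND: values greater than 2 and less than 8.
between = arr[(arr > 2) & (arr < 8)]

# OR: values less than 3 or greater than 8.
outside = arr[(arr < 3) | (arr > 8)]

# NOT: invert a condition.
not_above_four = arr[~(arr > 4)]

print(between)
print(outside)
print(not_above_four)
```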

## np.where()
`np.where()` can be used as a vectorized if-else statement. Below is the syntax:

numpy.where(condition[, x, y])

Return elements chosen from x or y depending on condition.

In [86]:
a = np.arange(10)
print("Original array: ",a)
np.where(a < 5, a, 10*a)

Original array:  [0 1 2 3 4 5 6 7 8 9]


array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])
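The square brackets in the signature mean `x` and `y` are optional. When only the condition is passed, `np.where` returns a tuple of index arrays, one per dimension, locating the `True` entries. A short sketch with illustrative variable names:

```python
import numpy as np

a = np.arange(10)

# With only a condition, the result is a tuple of index arrays.
idx_tuple = np.where(a >= 5)
idx = idx_tuple[0]

# The indices can be used to select the matching values.
selected = a[idx]

# For a 2-D array there is one index array per axis.
m = np.array([[1, 5], [7, 2]])
row_idx, col_idx = np.where(m > 2)

print(idx)
print(selected)
print(row_idx, col_idx)
```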

### Using np.where() with a pandas dataframe

In [71]:
import pandas as pd
data = pd.DataFrame({'col1':[1,2,3,4,5], "col2":[2,3,3,4,5]})

### Create a flag when values in col1 and col2 are equal 

#### Method 1: Using if-else statement

In [81]:
flag = []
for i in range(len(data)):
    if data['col1'][i] == data['col2'][i]:
        flag.append(1)
    else:
        flag.append(0)
# Note: %time here measures only the column assignment, not the loop above
%time data['Flag'] = flag

Wall time: 4.36 ms


In [82]:
data

   col1  col2  Flag
0     1     2     0
1     2     3     0
2     3     3     1
3     4     4     1
4     5     5     1


#### Method 2: Using np.where()

In [79]:
data = pd.DataFrame({'col1':[1,2,3,4,5], "col2":[2,3,3,4,5]})
data

   col1  col2
0     1     2
1     2     3
2     3     3
3     4     4
4     5     5


In [83]:
%time data['Flag'] = np.where(data['col1'] == data['col2'], 1, 0)

Wall time: 1.76 ms


In [84]:
data

   col1  col2  Flag
0     1     2     0
1     2     3     0
2     3     3     1
3     4     4     1
4     5     5     1
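Timings on a five-row frame are dominated by overhead, so a like-for-like comparison is easier to read on more data. The sketch below is one way to set that up outside IPython using the standard `timeit` module. The row count, the random data, and the helper names `flag_with_loop` and `flag_with_where` are assumptions made for illustration, and the loop iterates with `zip` rather than the index lookups used above. Actual timings depend on the machine, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame with random integers in both columns.
rng = np.random.default_rng(0)
n_rows = 10_000
data = pd.DataFrame({
    "col1": rng.integers(0, 5, size=n_rows),
    "col2": rng.integers(0, 5, size=n_rows),
})


def flag_with_loop(df):
    """Build the flag one row at a time with a Python if-else."""
    flags = []
    for a, b in zip(df["col1"], df["col2"]):
        flags.append(1 if a == b else 0)
    return flags


def flag_with_where(df):
    """Build the flag in one vectorized call."""
    return np.where(df["col1"] == df["col2"], 1, 0)


# Time the full computation for each method, not just the assignment.
loop_seconds = timeit.timeit(lambda: flag_with_loop(data), number=5)
where_seconds = timeit.timeit(lambda: flag_with_where(data), number=5)

print(f"loop:     {loop_seconds:.4f} s for 5 runs")
print(f"np.where: {where_seconds:.4f} s for 5 runs")
```

Both functions produce the same flags, so either result can be assigned to `data['Flag']`.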
