# Boolean Indexing, Set Operations, and Sorting

How do you select elements from an ndarray when you don't know their indices in advance?

In [2]:
import numpy as np

Boolean indexing lets us select elements with a logical condition instead of explicit indices. Suppose we have a 10000 x 10000 array of random integers ranging from 1 to 15000 and we only want the integers that are less than 20: we cannot know where they are, but we can describe them with a condition. The examples below use a small 5 x 5 array so the results are easy to check.

In [3]:
X = np.arange(25).reshape(5,5)
print(X)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


In [4]:
# Select the elements greater than 10
z = X[X>10]
print(z, z.shape)

[11 12 13 14 15 16 17 18 19 20 21 22 23 24] (14,)
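
The same pattern applies to the larger scenario of picking values below 20 out of a big array of random integers. The following is a minimal sketch, not part of the original notebook: it assumes a 1000 x 1000 array as a lighter stand-in for the 10000 x 10000 case, and the names `rng`, `big`, `mask`, and `small` are illustrative.

```python
import numpy as np

# Assumption: a 1000 x 1000 array stands in for the 10000 x 10000 case
# to keep memory use small. The seed is arbitrary and only makes runs repeatable.
rng = np.random.default_rng(0)
big = rng.integers(1, 15001, size=(1000, 1000))  # integers from 1 to 15000 inclusive

mask = big < 20      # boolean array with the same shape as big
small = big[mask]    # 1-D array holding only the selected values

print(mask.shape, mask.dtype)
print(small.shape, small.max())
```

Note that the result of a boolean selection is always one-dimensional, whatever the shape of the array it came from.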


In [5]:
# Select the elements less than or equal to 7
z = X[X <= 7]
print(z) 

[0 1 2 3 4 5 6 7]


In [None]:
# Select the elements greater than 10 and less than 17
z = X[(X > 10) & (X < 17)]  # the parentheses are required: & binds more tightly than > and <
print(z)

[11 12 13 14 15 16]
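
Conditions can be combined in other ways too. The sketch below is an illustration rather than part of the original notebook; it rebuilds the same 5 x 5 array and shows element-wise AND (`&`), OR (`|`), and NOT (`~`), plus what happens if Python's `and` is used by mistake. The names `between`, `outside`, and `not_between` are illustrative.

```python
import numpy as np

X = np.arange(25).reshape(5, 5)

# & is element-wise AND, | is element-wise OR, ~ is element-wise NOT.
# Each comparison needs its own parentheses because & and | bind more
# tightly than > and <.
between = X[(X > 10) & (X < 17)]
outside = X[(X <= 10) | (X >= 17)]
not_between = X[~((X > 10) & (X < 17))]

print(between)
print(outside)
print(np.array_equal(outside, not_between))

# Python's `and` is not element-wise; on arrays it raises ValueError.
try:
    X[(X > 10) and (X < 17)]
except ValueError as err:
    print("ValueError:", err)
```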


We can also use boolean indexing to assign new values to the selected elements.

In [9]:
# Reassign every element satisfying 10 < el < 17 to -1
X[(X > 10) & (X < 17)] = -1
print(X)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 -1 -1 -1 -1]
 [-1 -1 17 18 19]
 [20 21 22 23 24]]
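
Assignment through a boolean mask changes the array in place. When the original should be kept, `np.where` is one alternative; the sketch below is an illustration under that assumption, with `Y` and `z` as illustrative names. It also shows that a boolean selection returns a copy, so editing the selected values does not touch the source array.

```python
import numpy as np

X = np.arange(25).reshape(5, 5)

# np.where(condition, a, b) takes values from a where condition is True
# and from b elsewhere. It returns a new array, so X is left untouched.
Y = np.where((X > 10) & (X < 17), -1, X)

print(Y)
print(X[2])  # row 2 of X is unchanged

# Boolean selection returns a copy, so changing z does not change X.
z = X[X > 20]
z[:] = 0
print(X[4])  # row 4 of X is unchanged
```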


## Set Operations

Set operations are useful when comparing two NumPy arrays, for example, to find their common elements.

In [14]:
x = np.array([1,2,3,4,5])
y = np.array([6,7,2,8,4])

### Intersection, Difference, Union

In [15]:
# Intersection
print(np.intersect1d(x, y))

[2 4]


In [16]:
# Difference: elements of x that are not in y
print(np.setdiff1d(x,y))

[1 3 5]


In [17]:
# Union
print(np.union1d(x,y))

[1 2 3 4 5 6 7 8]
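
A few related points are worth a quick sketch, offered here as an illustration rather than as part of the original notebook: the difference depends on argument order, `np.setxor1d` gives the values found in exactly one array, and `np.isin` produces a boolean mask that ties set operations back to boolean indexing. The name `mask` is illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 2, 8, 4])

# setdiff1d is not symmetric: the argument order matters.
print(np.setdiff1d(x, y))  # in x but not in y
print(np.setdiff1d(y, x))  # in y but not in x

# setxor1d returns the values that appear in exactly one of the two arrays.
print(np.setxor1d(x, y))

# isin gives a boolean mask over x that can be used for boolean indexing.
mask = np.isin(x, y)
print(mask)
print(x[mask])
```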


## Sorting NumPy Arrays

When sort is used as a function (`np.sort(x)`), the original array is not changed: a sorted copy is returned (sorting out of place). When sort is used as a method (`x.sort()`), the original array itself is sorted (sorting in place).

In [4]:
x = np.random.randint(1,11, size=(10,))
print(x)

[6 3 6 2 1 3 4 6 2 8]


In [8]:
# sort as a function
print(np.sort(x))
print(x) # the original array is unchanged; np.sort returned a sorted copy

[1 2 2 3 3 4 6 6 6 8]
[6 3 6 2 1 3 4 6 2 8]


In [9]:
# sorted unique values (np.unique already returns them sorted, so np.sort is optional here)
print(np.sort(np.unique(x)))

[1 2 3 4 6 8]
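
As a small sketch of how `np.unique` behaves on its own (an illustration, not part of the original notebook), the example below hard-codes the random values printed above so the run is repeatable. It checks that the output is already sorted and uses the `return_counts` argument to count how often each value occurs. The names `values` and `counts` are illustrative.

```python
import numpy as np

# Assumption: the random values printed above, hard-coded for repeatability.
x = np.array([6, 3, 6, 2, 1, 3, 4, 6, 2, 8])

values = np.unique(x)
print(values)
print(np.array_equal(values, np.sort(values)))  # unique output is already sorted

# return_counts=True also returns how many times each unique value occurs.
values, counts = np.unique(x, return_counts=True)
print(counts)
```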


In [11]:
# Sorting as a Method
print(x)

[6 3 6 2 1 3 4 6 2 8]


In [None]:
x.sort()
print(x) # the original x is now sorted and printed.

[1 2 2 3 3 4 6 6 6 8]


The function `np.sort` returns a new sorted array. The method `sort` sorts the original array in place and returns None.

In [13]:
print(x.sort())

None
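
Because the method returns None, writing `x = x.sort()` would replace the array with None. The sketch below illustrates that pitfall on a copy, along with two common follow-ups: sorting in descending order and getting the sorting indices with `np.argsort`. It is an illustration with hard-coded values, and the names `result` and `order` are illustrative.

```python
import numpy as np

x = np.array([6, 3, 6, 2, 1, 3, 4, 6, 2, 8])

# Pitfall: the sort method returns None, so assigning its result loses the data.
result = x.copy().sort()
print(result)

# Descending order: sort ascending, then reverse with a slice.
print(np.sort(x)[::-1])

# argsort returns the indices that would sort the array.
order = np.argsort(x)
print(x[order])
```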


When sorting rank 2 arrays, we need to tell NumPy which axis to sort along, using the `axis` argument. With `axis=0` the values are sorted down each column; with `axis=1` they are sorted across each row.

In [14]:
X = np.random.randint(1,11, size=(5,5))
print(X)

[[ 8  3 10  2  9]
 [ 7  8  9  7  9]
 [ 6  3  8  6  4]
 [ 8 10  7  6  2]
 [ 1  9  9  3  6]]


In [16]:
# Sort along axis 0: each column is sorted independently, top to bottom
print(np.sort(X, axis=0))

[[ 1  3  7  2  2]
 [ 6  3  8  3  4]
 [ 7  8  9  6  6]
 [ 8  9  9  6  9]
 [ 8 10 10  7  9]]


In [15]:
# Sort along axis 1: each row is sorted independently, left to right
print(np.sort(X, axis=1))

[[ 2  3  8  9 10]
 [ 7  7  8  9  9]
 [ 3  4  6  6  8]
 [ 2  6  7  8 10]
 [ 1  3  6  9  9]]
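
The effect of `axis` is easier to verify on a small fixed array. The sketch below is an illustration, not part of the original notebook; it uses a hard-coded 3 x 3 array and the illustrative names `by_axis0` and `by_axis1`. It also shows that neither call keeps rows together, and that `np.argsort` on one column is one way to reorder whole rows.

```python
import numpy as np

X = np.array([[8, 3, 10],
              [7, 8, 9],
              [6, 3, 8]])

by_axis0 = np.sort(X, axis=0)  # each column sorted top to bottom
by_axis1 = np.sort(X, axis=1)  # each row sorted left to right

print(by_axis0)
print(by_axis1)

# With no axis argument, np.sort uses the last axis (axis=-1).
print(np.array_equal(np.sort(X), by_axis1))

# Neither call keeps rows together. To reorder whole rows by the values
# in column 0, index X with the argsort of that column.
print(X[np.argsort(X[:, 0])])
```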
