# NumPy

https://www.w3schools.com/python/numpy/default.asp

## NumPy Tutorial

### Intro

What is NumPy?
- NumPy is a Python library used for working with arrays.
- It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
- NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.
- NumPy stands for Numerical Python.

Why Use NumPy?
- In Python we have lists that serve the purpose of arrays, but they are slow to process.
- NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.
- The array object in NumPy is called `ndarray`. It comes with many supporting functions that make working with it easy.
- Arrays are very frequently used in data science, where speed and resources are very important.

> Data Science is a branch of computer science that studies how to store, use, and analyze data to derive information from it.

Why is NumPy Faster Than Lists?
- NumPy arrays are stored in one contiguous block of memory, unlike lists, so processes can access and manipulate them very efficiently. This property is called __locality of reference__ in computer science, and it is the main reason NumPy is faster than lists.
- NumPy is also optimized to work with the latest CPU architectures.
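The contiguous layout described above can be inspected on any array. A minimal sketch using the `flags`, `itemsize`, `strides`, and `nbytes` attributes; the `int64` dtype is set explicitly here (an assumption for the example) so the byte counts are the same on every platform:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5], dtype=np.int64)

# Elements sit back to back in a single block of memory.
print(arr.flags['C_CONTIGUOUS'])  # True
print(arr.itemsize)               # bytes per element: 8
print(arr.strides)                # bytes to step to the next element: (8,)
print(arr.nbytes)                 # total bytes: 40
```

A Python list, by contrast, stores references to separately allocated objects, so neighbouring elements are not guaranteed to be neighbours in memory.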

Which Language is NumPy written in?
NumPy is a Python library and is written partially in Python, but most of the parts that require fast computation are written in C or C++.

### Getting Started

Import NumPy

In [11]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

[1 2 3 4 5]


In [12]:
# Checking NumPy version
np.__version__

'1.23.2'

### Creating Arrays

NumPy is used to work with arrays. The array object in NumPy is called `ndarray`.

To create an `ndarray`, we can pass a list, tuple, or any array-like object to the `array()` function, which converts it into an `ndarray`.
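As a short illustration of the conversion described above, the sketch below builds the same array from a list, a tuple, and a `range` (the variable names are only for this example):

```python
import numpy as np

from_list = np.array([1, 2, 3])
from_tuple = np.array((1, 2, 3))
from_range = np.array(range(1, 4))

print(type(from_list))                         # <class 'numpy.ndarray'>
print(from_tuple)                              # [1 2 3]
print(np.array_equal(from_list, from_range))   # True
```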

Dimensions in Arrays
- A dimension in arrays is one level of array depth (nested arrays).
- __Nested arrays__ are arrays that have arrays as their elements.

0-D Arrays
- 0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.

In [13]:
arr = np.array(42)

print(arr)

42


1-D Arrays
- An array that has 0-D arrays as its elements is called a one-dimensional or 1-D array.
- These are the most common and basic arrays.

2-D Arrays
- An array that has 1-D arrays as its elements is called a 2-D array.
- These are often used to represent matrices or 2nd-order tensors.
- NumPy provides matrix and linear algebra operations in the `numpy.linalg` submodule. (`numpy.mat` is not a submodule; it was a function alias for `numpy.asmatrix`.)

3-D arrays
- An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
- These are often used to represent a 3rd order tensor.

Higher Dimensional Arrays
- An array can have any number of dimensions.
- When the array is created, you can set the minimum number of dimensions with the `ndmin` argument.

In [14]:
arr_1d = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(arr.ndim)
print(arr_1d.ndim)
print(arr_2d.ndim)
print(arr_3d.ndim)

arr_5d = np.array([1, 2, 3, 4], ndmin=5)
print(f'number of dimensions: {arr_5d.ndim}')

0
1
2
3
number of dimensions: 5


### Indexing

Access an array element by referring to its index number.

Access 2-D Arrays
- Use comma-separated integers, one index per dimension, e.g. `arr_2d[0, 1]`.

In [15]:
print(f'2nd element on 1st row of arr_2d: {arr_2d[0, 1]}')
print(f'3rd element of 2nd array of 1st array of arr_3d: {arr_3d[0, 1, 2]}')

2nd element on 1st row of arr_2d: 2
3rd element of 2nd array of 1st array of arr_3d: 6


Negative Indexing
- access an array from the end.

In [16]:
print('Last element from 2nd dim of arr_2d: ', arr_2d[1, -1])

Last element from 2nd dim of arr_2d:  6


### Slicing

Slicing arrays
- Slicing takes elements from one given index to another given index, with an optional step.
- `[start:end] or [start:end:step]`
- includes the start index, but excludes the end index
- default values: `start=0, end=len(arr), step=1`

Negative Slicing
- Use the minus operator to refer to an index from the end

STEP
- Use the `step` value to determine the step of the slicing

In [17]:
print(f'Slice elements from index 4 to the end of the array: {arr_1d[4:]}')
print(f'Slice elements from the beginning to index 4 (not included): {arr_1d[:4]}')
print(f'Slice from the index 3 from the end to index 1 from the end: {arr_1d[-3:-1]}')
print(f'Return every other element from index 1 to index 5: {arr_1d[1:5:2]}')

# Slicing 2-D Arrays
print(f'From the second element, slice elements from index 1 to index 3 (not included): {arr_2d[1, 1:3]}')

Slice elements from index 4 to the end of the array: [5]
Slice elements from the beginning to index 4 (not included): [1 2 3 4]
Slice from the index 3 from the end to index 1 from the end: [3 4]
Return every other element from index 1 to index 5: [2 4]
From the second element, slice elements from index 1 to index 3 (not included): [5 6]


### NumPy Data Types

Data Types in Python:
- `strings`
- `integer`
- `float`
- `boolean`
- `complex` - used to represent complex numbers. e.g. 1.0 + 2.0j, 1.5 + 2.5j

Data Types in NumPy
- NumPy has some extra data types, and refers to data types with one character, like `i` for integers and `u` for unsigned integers.

Data types in NumPy and the characters used to represent them.
- `i` - integer
- `b` - boolean
- `u` - unsigned integer
- `f` - float
- `c` - complex float
- `m` - timedelta
- `M` - datetime
- `O` - object
- `S` - string
- `U` - unicode string
- `V` - fixed chunk of memory for other type ( void )
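The characters above can be explored with `np.dtype`. A small sketch, assuming the list is read as the values of the `dtype.kind` attribute; when a character is used as a dtype string it is normally followed by a size in bytes:

```python
import numpy as np

# A type character plus a size in bytes names a concrete dtype.
print(np.dtype('i4').name)    # int32
print(np.dtype('u1').name)    # uint8
print(np.dtype('f8').name)    # float64
print(np.dtype('c16').name)   # complex128

# The kind attribute gives back the one-character code.
print(np.dtype('f8').kind)             # f
print(np.dtype(bool).kind)             # b
print(np.dtype('datetime64[D]').kind)  # M

# For strings the size is a character count: 'S3' holds 3 bytes,
# 'U5' holds 5 Unicode characters at 4 bytes each.
print(np.dtype('S3').itemsize, np.dtype('U5').itemsize)  # 3 20

# Caveat: a bare 'b' used as a dtype string means a signed byte, not boolean.
print(np.dtype('b').name)     # int8
```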

Check Data Type
- NumPy array object has a property called `dtype` that returns the data type of the array

In [18]:
print(arr.dtype, arr_1d.dtype, arr_2d.dtype)

int64 int64 int64


Creating Arrays With a Defined Data Type
- The `array()` function takes an optional `dtype` argument that defines the expected data type of the array elements.
- For `i`, `u`, `f`, `S` and `U` we can define size as well.

In [19]:
# Create an array with data type string:
arr_S = np.array([1, 2, 3, 4], dtype='S')

print(arr_S)
print(arr_S.dtype)

# Create an array with data type 4 bytes integer:
arr_i4 = np.array([1, 2, 3, 4], dtype='i4')

print(arr_i4)
print(arr_i4.dtype)

[b'1' b'2' b'3' b'4']
|S1
[1 2 3 4]
int32


If a type is given to which the elements cannot be cast, NumPy raises a __ValueError__.

```python
np.array(['a', '2', '3'], dtype='i')
```

<p style="background:black">
<code style="background:black;color:white">
---------------------------------------------------------------------------  
ValueError                                Traceback (most recent call last)
<br>
Input In [10], in <cell line: 1>()
<br>
----> 1 np.array(['a', '2', '3'], dtype='i')
<br>
ValueError: invalid literal for int() with base 10: 'a'
</code>
</p>

Converting Data Type on Existing Arrays
- The best way to change the data type of an existing array is to make a copy of it with the `astype()` method.
- `astype()` creates a copy of the array and takes the target data type as a parameter.
- The data type can be specified using a string, like 'f' for float, 'i' for integer etc. or you can use the data type directly like `float` for float and `int` for integer.

String to represent data type
- i = `integer`
- b = `boolean`
- u = `unsigned integer`
- f = `float`
- c = `complex float`
- m = `timedelta`
- M = `datetime`
- O = `object`
- S = `string`

In [20]:
arr_f = np.array([1.1, 2.1, 3.1])

newarr = arr_f.astype(int)
newarr = arr_f.astype('i')

print(newarr)
print(newarr.dtype)

[1 2 3]
int32


### Copy vs View

Difference
- `copy()`, `view()`
- The main difference between a copy and a view of an array is that the copy is a new array, and the view is just a view of the original array.
- The copy owns the data and any changes made to the copy will not affect original array, and any changes made to the original array will not affect the copy.
- The view does not own the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.

In [21]:
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42

print(arr)
print(x)

[42  2  3  4  5]
[1 2 3 4 5]


In [22]:
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42

print(arr)
print(x)

[42  2  3  4  5]
[42  2  3  4  5]


Make Changes in the VIEW
- The original array SHOULD be <code style="background:yellow;color:black">affected</code> by the changes made to the view.

In [23]:
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
x[0] = 42  # modify the view, not the original

print(arr)
print(x)

[42  2  3  4  5]
[42  2  3  4  5]


Check if Array Owns its Data
- Every NumPy array has the attribute `base` that returns `None` if the array owns the data.
- Otherwise, the `base` attribute refers to the original object.

In [24]:
arr = np.array([1, 2, 3, 4, 5])

x = arr.copy()
y = arr.view()

print(x.base)
print(y.base)

None
[1 2 3 4 5]


### Shape

Shape of an Array: the number of elements in each dimension.

Get the Shape of an Array
- NumPy arrays have an attribute called `shape` that returns a tuple whose i-th entry is the number of elements along the i-th dimension.

In [25]:
# Create an array with 5 dimensions and verify that last dimension has value 4:
arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('shape of array :', arr.shape)

[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)


### Reshape

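Returns Copy or View?
- In the example below, `newarr.base` prints the original array rather than `None`, so `reshape()` returned a view.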

In [26]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)

print(arr)
print(newarr)
print(newarr.base)

[ 1  2  3  4  5  6  7  8  9 10 11 12]
[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]
[ 1  2  3  4  5  6  7  8  9 10 11 12]


Unknown Dimension
- You are allowed to have one "unknown" dimension.
- Pass -1 as the value, and NumPy will calculate this number for you.

<div class="alert alert-block alert-info"><b>Notes:</b> We can not pass -1 to more than one dimension.</div>

In [27]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

newarr = arr.reshape(2, 2, -1)

print(newarr)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
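The rule about a single unknown dimension can be checked with a short sketch. It assumes the same eight-element array as above; the exact error messages are left to NumPy and are not reproduced here:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# One -1 is fine: 8 elements / 2 rows = 4 columns.
print(arr.reshape(2, -1).shape)  # (2, 4)

# Two -1 values are ambiguous, so NumPy refuses.
try:
    arr.reshape(-1, -1)
except ValueError as exc:
    print('ValueError:', exc)

# The known dimensions must also divide the element count evenly.
try:
    arr.reshape(3, -1)
except ValueError as exc:
    print('ValueError:', exc)
```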


Flattening the arrays
- Use `reshape(-1)` to flatten an array to 1-D.

In [28]:
arr_2d.reshape(-1)

array([1, 2, 3, 4, 5, 6])
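Flattening ties back to the copy-versus-view distinction. The sketch below compares `reshape(-1)` with the `flatten()` method, which is not covered above and is brought in here only for contrast:

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

flat_view = arr_2d.reshape(-1)   # a view when the memory layout allows it
flat_copy = arr_2d.flatten()     # always a copy

flat_view[0] = 99                # writes through to arr_2d
flat_copy[1] = -1                # does not touch arr_2d

print(arr_2d[0, 0], arr_2d[0, 1])            # 99 2
print(np.shares_memory(arr_2d, flat_view))   # True
print(np.shares_memory(arr_2d, flat_copy))   # False
```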

### Iterating

Iterating over an n-D array goes through its (n-1)-D sub-arrays one by one.

In [29]:
arr = np.array([1, 2, 3])

for x in arr:
  print(x)

# Iterating 2-D Arrays
arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
  print(x)

1
2
3
[1 2 3]
[4 5 6]


Iterating Arrays Using `nditer()`
- `nditer()` iterates over each scalar element.
- To change the data type of elements while iterating, pass the expected data type in the `op_dtypes` argument.

NumPy does not change the data type of an element in place (where the element sits in the array), so it needs extra space to perform the conversion. That extra space is called a buffer, and it is enabled in `nditer()` by passing `flags=['buffered']`.

In [30]:
for x in np.nditer(arr_2d, flags=['buffered'], op_dtypes=['S']):
  print(x)

b'1'
b'2'
b'3'
b'4'
b'5'
b'6'


Iterating With Different Step Size

In [31]:
for x in np.nditer(arr_2d[:, ::2]):
  print(x)

1
3
4
6


Enumerated Iteration Using `ndenumerate()`
- Enumeration means listing items one by one together with their sequence number.
- Useful when we need the index of each element while iterating.

In [32]:
for idx, x in np.ndenumerate(arr_2d):
  print(idx, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(1, 0) 4
(1, 1) 5
(1, 2) 6


### Join

`concatenate()`
- join a sequence of arrays along an <span style="color:red">existing</span> axis.  

`stack()`
- join a sequence of arrays along a <span style="color:red">new</span> axis.

`concatenate()`
- `axis`, the axis along which the arrays will be joined. If axis is `None`, arrays are flattened before use. Default is 0.

`stack()`
- `axis`, the axis in the result array along which the input arrays are stacked.

In [60]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

print(np.concatenate((a, b), axis=0), '\n') 
print(np.concatenate((a, b.T), axis=1), '\n')
print(np.concatenate((a, b), axis=None))

[[1 2]
 [3 4]
 [5 6]] 

[[1 2 5]
 [3 4 6]] 

[1 2 3 4 5 6]


In [74]:
# Join two 2-D arrays along columns (axis=1):
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)
print(arr)

[[1 2 5 6]
 [3 4 7 8]]


In [75]:
# stack()
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

np.stack((a, b))

array([[1, 2, 3],
       [4, 5, 6]])

In [73]:
np.stack((a, b), axis=-1)

array([[1, 4],
       [2, 5],
       [3, 6]])

Stacking along
- rows (horizontally, side by side): `hstack()`
- columns (vertically, one on top of another): `vstack()`
- depth (a third axis): `dstack()`

In [62]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(np.hstack((arr1, arr2)), '\n')
print(np.vstack((arr1, arr2)), '\n')
print(np.dstack((arr1, arr2)))

[1 2 3 4 5 6] 

[[1 2 3]
 [4 5 6]] 

[[[1 4]
  [2 5]
  [3 6]]]


In [76]:
# 2D arrays
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

print(np.hstack((arr1, arr2)), '\n', np.concatenate((arr1, arr2), axis=1), '\n')
print(np.vstack((arr1, arr2)), '\n', np.concatenate((arr1, arr2), axis=0), '\n')
print(np.dstack((arr1, arr2)))

[[1 2 5 6]
 [3 4 7 8]] 
 [[1 2 5 6]
 [3 4 7 8]] 

[[1 2]
 [3 4]
 [5 6]
 [7 8]] 
 [[1 2]
 [3 4]
 [5 6]
 [7 8]] 

[[[1 5]
  [2 6]]

 [[3 7]
  [4 8]]]


### Split

`split(ary, indices_or_sections, axis=0)`
- Split an array into multiple sub-arrays as views into `ary`.
- `indices_or_sections`: `int` or 1-D array
    - If indices_or_sections is an integer, N, the array will be divided into N equal arrays along axis. If such a split is not possible, an error is raised.
    - If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. If an index exceeds the dimension of the array along axis, an empty sub-array is returned correspondingly.

- `axis`: `int`, _optional_, the axis along which to split, default is 0.
- Returns: sub-arrays, list of ndarrays, a list of sub-arrays as views into ary.
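Two points from the description above can be checked in a few lines: the returned pieces are views into the input, and an integer that does not divide the axis evenly raises an error. A minimal sketch:

```python
import numpy as np

x = np.arange(8.0)
left, right = np.split(x, 2)

# The pieces share memory with x, so writing to one changes x.
left[0] = -1.0
print(x[0])                        # -1.0
print(np.shares_memory(x, right))  # True

# 8 elements cannot be divided into 3 equal parts.
try:
    np.split(x, 3)
except ValueError as exc:
    print('ValueError:', exc)
```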

In [78]:
x = np.arange(8.0)
np.split(x, 2)

[array([0., 1., 2., 3.]), array([4., 5., 6., 7.])]

In [79]:
np.split(x, [3, 5, 6, 10])

[array([0., 1., 2.]),
 array([3., 4.]),
 array([5.]),
 array([6., 7.]),
 array([], dtype=float64)]

`array_split`
- only difference: `array_split` allows `indices_or_sections` to be an integer that does <code style="background:yellow;color:black">not</code> equally divide the axis.

In [80]:
x = np.arange(8.0)
np.array_split(x, 3)

[array([0., 1., 2.]), array([3., 4., 5.]), array([6., 7.])]

In [81]:
x = np.arange(9)
np.array_split(x, 4)

[array([0, 1, 2]), array([3, 4]), array([5, 6]), array([7, 8])]

- `vsplit`, split an array into multiple sub-arrays vertically (row-wise).
     - equivalent to `split` with `axis=0` (default), the array is always split along the first axis regardless of the array dimension.
- `hsplit`, split an array into multiple sub-arrays horizontally (column-wise).
    - equivalent to `split` with `axis=1`, the array is always split along the second axis except for 1-D arrays, where it is split at `axis=0`.
- `dsplit`, split array into multiple sub-arrays along the 3rd axis (depth).
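The stated equivalences with `split` can be verified on a small 3-D array. This is only a check under the assumption of an array with at least three dimensions; the dictionary names are made up for the example:

```python
import numpy as np

x = np.arange(8.0).reshape(2, 2, 2)

helpers = {
    'vsplit': (np.vsplit, 0),
    'hsplit': (np.hsplit, 1),
    'dsplit': (np.dsplit, 2),
}

matches = {}
for name, (helper, axis) in helpers.items():
    via_helper = helper(x, 2)
    via_split = np.split(x, 2, axis=axis)
    # Compare the two lists of sub-arrays piece by piece.
    matches[name] = all(np.array_equal(a, b)
                        for a, b in zip(via_helper, via_split))

print(matches)  # {'vsplit': True, 'hsplit': True, 'dsplit': True}
```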


In [84]:
x = np.arange(8.0).reshape(2, 2, 2)
np.vsplit(x, 2)

[array([[[0., 1.],
         [2., 3.]]]),
 array([[[4., 5.],
         [6., 7.]]])]

In [85]:
np.hsplit(x, 2)

[array([[[0., 1.]],
 
        [[4., 5.]]]),
 array([[[2., 3.]],
 
        [[6., 7.]]])]

In [86]:
np.dsplit(x, 2)

[array([[[0.],
         [2.]],
 
        [[4.],
         [6.]]]),
 array([[[1.],
         [3.]],
 
        [[5.],
         [7.]]])]

### Search

`argmax(a[, axis, out, keepdims])`
- Returns the indices of the maximum values along an axis.
- `axis`, `int`, optional. By default is `None`, the index is into the flattened array, otherwise along the specified `axis`.
- `out`, array, optional. If provided, the result will be inserted into this array. It should be of the appropriate `shape` and `dtype`.
- `keepdims`, `bool`, optional. If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
- Returns: `index_array`, `ndarray` of `ints`. Array of indices into the array. It has the same shape as `a.shape` with the dimension along `axis` removed. If `keepdims` is set to `True`, the `axis` dimension is kept with size 1, so the result has the same number of dimensions as `a`.

In [87]:
a = np.arange(6).reshape(2,3) + 10
print(a)
print(np.argmax(a))
print(np.argmax(a, axis=0), np.argmax(a, axis=1))

[[10 11 12]
 [13 14 15]]
5
[1 1 1] [2 2]


In [89]:
# unravel_index(): Converts a flat index or array of flat indices into a tuple of coordinate arrays.
ind = np.unravel_index(np.argmax(a, axis=None), a.shape)
print(ind, a[ind])

(1, 2) 15


In [90]:
b = np.arange(6)
b[1] = 5

print(b)
np.argmax(b)  # Only the first occurrence is returned.

[0 5 2 3 4 5]


1

In [116]:
x = np.array([[4,2,3], [1,0,3]])
index_array = np.argmax(x, axis=-1)

# Same as np.amax(x, axis=-1, keepdims=True)
np.take_along_axis(x, np.expand_dims(index_array, axis=-1), axis=-1)

# Same as np.amax(x, axis=-1)
np.take_along_axis(x, np.expand_dims(index_array, axis=-1), axis=-1).squeeze(axis=-1)

array([4, 3])

In [95]:
x = np.arange(24).reshape((2, 3, 4))
res  = np.argmax(x, axis=1, keepdims=False)
res2 = np.argmax(x, axis=1, keepdims=True)
print(res.shape, res2.shape)

(2, 4) (2, 1, 4)


`numpy.nanargmax(a, axis=None, out=None, *, keepdims=<no value>)`
- Return the indices of the maximum values in the specified axis <span style="color:blue">ignoring NaNs</span>. For all-NaN slices `ValueError` is raised. Warning: the results cannot be trusted if a slice contains only NaNs and -Infs.

In [129]:
a = np.array([[5, np.nan, 4], [1, 2, 3]])

print(a)

# np.argmax() returns the index of np.nan
print(f'np.argmax() returns the index of np.nan: {np.argmax(a)}')

print(np.nanargmax(a), np.nanargmax(a, axis=0), np.nanargmax(a, axis=1))

[[ 5. nan  4.]
 [ 1.  2.  3.]]
np.argmax() returns the index of np.nan: 1
0 [0 1 0] [0 2]


`numpy.argmin(a, axis=None, out=None, *, keepdims=<no value>)`
- Returns the indices of the minimum values along an axis.

`numpy.nanargmin(a, axis=None, out=None, *, keepdims=<no value>)`
- Return the indices of the minimum values in the specified axis ignoring NaNs. For all-NaN slices `ValueError` is raised. Warning: the results cannot be trusted if a slice contains only NaNs and Infs.
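The minimum-value functions mirror the `argmax` / `nanargmax` pair. A short sketch on the same array used for `nanargmax` above:

```python
import numpy as np

a = np.array([[5.0, np.nan, 4.0],
              [1.0, 2.0, 3.0]])

print(np.argmin(a))             # 1: argmin lands on the NaN
print(np.nanargmin(a))          # 3: flat index of the value 1.0
print(np.nanargmin(a, axis=0))  # [1 1 1]
print(np.nanargmin(a, axis=1))  # [2 0]
```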

`numpy.argwhere(a)`
- Find the indices of array elements that are non-zero, grouped by element.
- Returns: `index_array`: `(N, a.ndim) ndarray`
    - Indices of elements that are non-zero. Indices are grouped by element. This array will have shape `(N, a.ndim)` where `N` is the number of non-zero items.

In [130]:
x = np.arange(6).reshape(2,3)
np.argwhere(x>1)

array([[0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

`numpy.nonzero(a)`
- Return the indices of the elements that are non-zero.
- Returns a tuple of arrays, one for each dimension of a, containing the indices of the non-zero elements in that dimension. The values in a are always tested and returned in row-major, C-style order.
- To group the indices by element, rather than dimension, use `argwhere`, which returns a row for each non-zero element.
- Returns: `tuple_of_arrays`: `tuple`. Indices of elements that are non-zero.

<span style="color:blue">Notes</span>  
While the nonzero values can be obtained with `a[nonzero(a)]`, it is recommended to use `a[a.astype(bool)]` or `a[a != 0]` instead, which will correctly handle 0-d arrays.


In [169]:
x = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])
print(
    x, '\n',
    np.nonzero(x), '\n',
    f'x[np.nonzero(x)] = {x[np.nonzero(x)]} \n',
    f'np.transpose(np.nonzero(x)) = {np.transpose(np.nonzero(x))}')

[[3 0 0]
 [0 4 0]
 [5 6 0]] 
 (array([0, 1, 2, 2]), array([0, 1, 0, 1])) 
 x[np.nonzero(x)] = [3 4 5 6] 
 np.transpose(np.nonzero(x)) = [[0 0]
 [1 1]
 [2 0]
 [2 1]]


A common use for `nonzero` is to find the indices of an array where a condition is True. Given an array `a`, the condition `a > 3` is a boolean array, and since False is interpreted as 0, `np.nonzero(a > 3)` yields the indices of `a` where the condition is true.

In [143]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a > 3)
print(np.nonzero(a > 3))

[[False False False]
 [ True  True  True]
 [ True  True  True]]
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))


In [145]:
# Using this result to index a is equivalent to using the mask directly:
a[a > 3]  # prefer this spelling

array([4, 5, 6, 7, 8, 9])

In [146]:
# nonzero can also be called as a method of the array.
(a > 3).nonzero()

(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))

`numpy.flatnonzero(a)`
- Return indices that are non-zero in the flattened version of a.
- equivalent to `np.nonzero(np.ravel(a))[0]`.

In [156]:
x = np.arange(-2, 3)

np.flatnonzero(x)

array([0, 1, 3, 4])

In [157]:
# Use the indices of the non-zero elements as an index array to extract these elements:
x.ravel()[np.flatnonzero(x)]

array([-2, -1,  1,  2])

`where(condition, [x, y, ]/)`
- Return elements chosen from x or y depending on _condition_.
- `condition` array_like, `bool`. Where True, yield x, otherwise yield y.
    - x, y array_like: Values from which to choose. x, y and condition need to be broadcastable to some shape.

Returns: `out`: `ndarray`. An array with elements from x where condition is True, and elements from y elsewhere.

<span style="color:blue">Notes</span>  
When only _condition_ is provided, this function is a shorthand for `np.asarray(condition).nonzero()`. Using `nonzero` directly should be __preferred__, as it behaves correctly for subclasses. The rest of this documentation covers only the case where all three arguments are provided.

<span style="color:blue">Notes</span>  
If all the arrays are 1-D, `where` is equivalent to:  
```python
[xv if c else yv for c, xv, yv in zip(condition, x, y)]
 ```
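Both notes above can be checked on small 1-D inputs. A minimal sketch: it compares `np.where` against the list comprehension, then shows the condition-only form next to `nonzero`:

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

by_where = np.where(condition, x, y)
by_loop = [xv if c else yv for c, xv, yv in zip(condition, x, y)]

print(by_where)                           # [ 1 20  3 40]
print(np.array_equal(by_where, by_loop))  # True

# With only a condition, where() returns the same index tuple as nonzero().
print(np.where(condition)[0])             # [0 2]
print(np.nonzero(condition)[0])           # [0 2]
```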

In [159]:
a = np.arange(10)

np.where(a < 5, a, 10*a)

array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])

In [158]:
# multidimensional arrays

np.where([[True, False], [True, True]],
         [[1, 2], [3, 4]],
         [[9, 8], [7, 6]])

array([[1, 8],
       [3, 4]])

In [160]:
# The shapes of x, y, and the condition are broadcast together:

x, y = np.ogrid[:3, :4]
np.where(x < y, x, 10 + y)  # both x and 10+y are broadcast

array([[10,  0,  0,  0],
       [10, 11,  1,  1],
       [10, 11, 12,  2]])

In [161]:
a = np.array([[0, 1, 2],
              [0, 2, 4],
              [0, 3, 6]])
np.where(a < 4, a, -1)  # -1 is broadcast

array([[ 0,  1,  2],
       [ 0,  2, -1],
       [ 0,  3, -1]])

In [162]:
np.ogrid[:3, :4]

[array([[0],
        [1],
        [2]]),
 array([[0, 1, 2, 3]])]

`numpy.searchsorted(a, v, side='left', sorter=None)`
- Find indices where elements should be inserted to maintain order.
- Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.
- Assuming `a` is sorted, the returned index `i` satisfies:
- `left`: `a[i-1] < v <= a[i]`
- `right`: `a[i-1] <= v < a[i]`

Parameters
- `a`: 1-D array_like. Input array. If `sorter` is `None`, then it must be sorted in ascending order, otherwise `sorter` must be an array of indices that sort it.
- `v`: array_like. Values to insert into a.
- `side`: {'left', 'right'}, optional. If 'left', the index of the first suitable location found is given. If 'right', return the last such index. If there is no suitable index, return either 0 or N (where N is the length of a).
- `sorter`: 1-D array_like, optional. Optional array of integer indices that sort array a into ascending order. They are typically the result of `argsort`.

Returns: `indices`, `int` or `array of ints`
- Array of insertion points with the same shape as `v`, or an integer if `v` is a scalar.




In [166]:
x = [1,2,3,4,5]

print(
    np.searchsorted(x, 3), '\n',
    np.searchsorted(x, 3, side='right'), '\n',
    np.searchsorted(x, [-10, 10, 2, 3]) )

2 
 3 
 [0 5 1 2]
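The `side` and `sorter` parameters are easier to see with duplicates and with an unsorted input. A sketch with made-up values:

```python
import numpy as np

# side: with repeated values, 'left' and 'right' bracket the run of equals.
dup = np.array([1, 2, 2, 2, 3])
print(np.searchsorted(dup, 2, side='left'))   # 1
print(np.searchsorted(dup, 2, side='right'))  # 4

# sorter: a is not sorted, so pass the indices that would sort it.
a = np.array([40, 10, 30, 20])
order = np.argsort(a)
print(order)                                  # [1 3 2 0]

# Positions are relative to the sorted sequence a[order] = [10 20 30 40].
pos = np.searchsorted(a, [25, 10, 45], sorter=order)
print(pos)                                    # [2 0 4]
```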


`numpy.extract(condition, arr)`
- Return the elements of an array that satisfy some condition.
- This is equivalent to `np.compress(ravel(condition), ravel(arr))`. 
- If condition is boolean, `np.extract` is equivalent to `arr[condition]`.

Note that `place` does the exact opposite of `extract`.

Parameters:
- `condition`: array_like. An array whose nonzero or `True` entries indicate the elements of `arr` to extract.
- `arr`: array_like. Input array of the same size as condition.

Returns:
- extract: ndarray. Rank 1 array of values from arr where condition is `True`.

In [172]:
arr = np.arange(12).reshape((3, 4))

condition = np.mod(arr, 3)==0
print(
    condition, '\n',
    np.extract(condition, arr))

[[ True False False  True]
 [False False  True False]
 [False  True False False]] 
 [0 3 6 9]


In [171]:
# If condition is boolean:
arr[condition]

array([0, 3, 6, 9])
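The remark that `place` does the opposite of `extract` can be illustrated on the same array: `extract` reads the selected values out, while `np.place` writes values into the selected positions. A sketch, with replacement values chosen arbitrarily:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))
condition = np.mod(arr, 3) == 0

# extract reads the selected values out of the array.
print(np.extract(condition, arr))   # [0 3 6 9]

# place writes into the selected positions, modifying arr in place.
# Fewer values than selected positions are given, so they are repeated.
np.place(arr, condition, [-1, -2])
print(arr.tolist())
# [[-1, 1, 2, -2], [4, 5, -1, 7], [8, -2, 10, 11]]
```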

### Sort

NumPy has a function called `np.sort()`
- It sorts arrays of numbers, strings, and booleans.

<div class="alert alert-block alert-info"><b>Notes: </b> <code>np.sort()</code> returns a sorted copy of the array, leaving the original array unchanged. The <code>ndarray.sort()</code> method, by contrast, sorts the array in place and returns <code>None</code>.</div>

In [173]:
arr = np.array([3, 2, 0, 1])

np.sort(arr)

array([0, 1, 2, 3])
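The difference between the `np.sort()` function and the `ndarray.sort()` method can be seen side by side in a short sketch:

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

sorted_copy = np.sort(arr)   # returns a new, sorted array
print(arr)                   # [3 2 0 1]  (unchanged)
print(sorted_copy)           # [0 1 2 3]

result = arr.sort()          # sorts arr itself
print(result)                # None
print(arr)                   # [0 1 2 3]
```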

Sorting a 2-D Array
- Each row is sorted independently (by default, `np.sort()` sorts along the last axis).

In [174]:
arr = np.array([[3, 2, 4], [5, 0, 1]])

np.sort(arr)

array([[2, 3, 4],
       [0, 1, 5]])
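The axis that gets sorted can be changed with the `axis` argument. The sketch below also uses `np.argsort`, which is not covered above and is included only to show how the sorting indices can be obtained:

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_column = np.sort(arr, axis=0)      # sort down each column
flat = np.sort(arr, axis=None)        # flatten, then sort
row_order = np.argsort(arr, axis=1)   # indices that would sort each row

print(by_column.tolist())  # [[3, 0, 1], [5, 2, 4]]
print(flat.tolist())       # [0, 1, 2, 3, 4, 5]
print(row_order.tolist())  # [[1, 0, 2], [1, 2, 0]]
```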

More details: https://numpy.org/doc/stable/reference/routines.sort.html

Python `sorted` vs `sort`

| `sorted` | `sort` |
|------|------|
| `sorted(iterable, key=key, reverse=reverse)` | `list.sort(reverse=True\|False, key=myFunc)` |
| Works on any iterable (lists, tuples, strings, sets, ...) | Only works on lists |
| Returns a sorted list | Sorts the object in-place, so it returns None |
| Creates a sorted copy of the Python object | Sorts the original sequence, i.e., Inplace sorting | 
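The rows of the table can be illustrated with plain Python, no NumPy needed. A small sketch with made-up data:

```python
words = ('pear', 'fig', 'banana')    # a tuple has no .sort() method

# sorted() accepts any iterable and returns a new list.
by_length = sorted(words, key=len)
print(by_length)                     # ['fig', 'pear', 'banana']
print(words)                         # ('pear', 'fig', 'banana')  (unchanged)

# list.sort() reorders the list itself and returns None.
items = list(words)
result = items.sort(reverse=True)
print(result)                        # None
print(items)                         # ['pear', 'fig', 'banana']
```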

### Filter

Filtering arrays

In NumPy, you filter an array using a boolean index list.
- A boolean index list is a list of booleans corresponding to indexes in the array.
- If the value at an index is `True`, that element is contained in the filtered array; if the value at that index is False that element is excluded from the filtered array.

Create a filter

In [176]:
arr = np.array([41, 42, 43, 44])

filter_arr = arr % 2 == 0
# filter_arr = arr > 42

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False  True False  True]
[42 44]
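Boolean filters can also be combined. A sketch, assuming element-wise `&` (and), `|` (or), and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `==` and `>`:

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

even_and_large = (arr % 2 == 0) & (arr > 42)
odd_or_large = (arr % 2 == 1) | (arr > 43)

print(arr[even_and_large])    # [44]
print(arr[odd_or_large])      # [41 43 44]
print(arr[~even_and_large])   # [41 42 43]
```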
