
# NumPy

NumPy arrays are stored in one contiguous block of memory, unlike built-in lists, which gives better performance.

In [224]:
import numpy as np

## Create an array

Arrays are created from sequences such as lists or tuples (nested for several dimensions), or with helper functions like `arange`.

In [228]:
arr1 = np.array([[1, 4, 3, 5], [4, 5, 6, 8]])
arr2 = np.arange(1, 20, 2) # with a range
arr3 = np.zeros((3, 4)) # filled with zeros

print(arr1)
print('------------------')
print(arr2)
print('------------------')
print(arr3)

[[1 4 3 5]
 [4 5 6 8]]
------------------
[ 1  3  5  7  9 11 13 15 17 19]
------------------
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


## Array data types

In addition to the default Python types (string, integer, float, boolean and complex), NumPy has its own data types, which it uses for arrays:
- i: integer
- b: boolean
- u: unsigned integer
- f: float
- c: complex float
- m: timedelta
- M: datetime
- O: object
- S: string
- U: unicode string
- V: fixed chunk of memory for any other type (void)

So integer elements are stored as NumPy integers, not as Python `int` objects.
All elements of an array must have the same data type.
The i, u, f, S and U types also accept a size, like i8 or S4: the number of bytes per element (for U, the number of characters).
A ValueError is raised if an element cannot be converted to the chosen data type.
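
As a small sketch of the points above (the variable names are only illustrative), the `dtype` argument of `np.array` sets the element type, and a value that cannot be converted raises an error:

```python
import numpy as np

ints = np.array([1, 2, 3], dtype='i4')   # 4-byte integers
floats = np.array([1, 2, 3], dtype='f')  # integers converted to floats
texts = np.array([1, 2, 3], dtype='S')   # integers converted to byte strings

print(ints.dtype, ints.itemsize)
print(floats)
print(texts)

# A value that cannot be converted to the requested type raises ValueError
try:
    np.array(['1', '2', 'a'], dtype='i')
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print('ValueError:', err)
```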

## Array attributes

In [212]:
arr = np.array([[1, 4, 3, 5], [4, 5, 6, 8]])

print("number of dimensions: ", arr.ndim)
print("size of each dimension: ", arr.shape)
print("number of elements: ", arr.size)
print("data type of the elements: ", arr.dtype) # a single float element converts all the others to float as well
print("check if the array owns its data or uses other array's: ", arr.base) # for an array view it returns the original array (None otherwise)

number of dimensions:  2
size of each dimension:  (2, 4)
number of elements:  8
data type of the elements:  int64
check if the array owns its data or uses other array's:  None
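
The comment about floats can be checked with a short sketch (the arrays `mixed` and `cube` are made up for illustration): mixing integers with one float gives a float array, and the attributes work the same way with more dimensions.

```python
import numpy as np

mixed = np.array([1, 2, 3.5])  # one float makes every element a float
print(mixed, mixed.dtype)

cube = np.zeros((2, 3, 4))     # a 3-D array
print(cube.ndim, cube.shape, cube.size)
```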


## Access an array

### Array indexing

In [None]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

print(arr[0])
print(arr[0][0])
print(arr[0, 0]) # same as arr[0][0]

[1 2 3 4]
1
1


Negative indexes access elements from the end.

In [None]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

print(arr[-1])
print(arr[-1, -1])

[11 22 33 44]
44


### Array slicing

In [None]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

print("First three elements: ", arr[0:3])
print("From the 6th element: ", arr[5:])
print("Until the 4th element: ", arr[:4])
print("Last 3 elements: ", arr[-3:])
print("3rd and 2nd last elements: ", arr[-3:-1])
print("From the 3rd element, every second element: ", arr[2::2])

First three elements:  [1 2 3]
From the 6th element:  [6 7 8 9]
Until the 4th element:  [1 2 3 4]
Last 3 elements:  [7 8 9]
3rd and 2nd last elements:  [7 8]
From the 3rd element, every second element:  [3 5 7 9]


In [None]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

print("First two elements of second subarray: ", arr[1, 0:2])
print("Second element of each subarray: ", arr[:, 1])

First two elements of second subarray:  [11 22]
Second element of each subarray:  [ 2 22]


### Array iteration

1-D array

In [26]:
arr = np.array([1, 2, 3])

for x in arr:
  print(x, end = ' / ')

1 / 2 / 3 / 

Iterating n-D arrays with `nditer` avoids nesting one for loop per dimension (the alternative `ndenumerate` yields a tuple with the position of each element and the element itself).

In [43]:
arr = np.array([[1, 2, 3], [11, 22, 33]])

for x in np.nditer(arr):
  print(x, end = ' / ')

print('\n')

# iterate over the elements while casting them (does not affect the array)
for x in np.nditer(arr, flags=['buffered'], op_dtypes=['S']):
  print(x, end = ' / ')

1 / 2 / 3 / 11 / 22 / 33 / 

b'1' / b'2' / b'3' / b'11' / b'22' / b'33' / 
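
For the `ndenumerate` alternative mentioned above, a minimal sketch on the same array could look like this:

```python
import numpy as np

arr = np.array([[1, 2, 3], [11, 22, 33]])

# each step yields the index tuple and the element at that position
for idx, x in np.ndenumerate(arr):
    print(idx, x)
```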

## Copy an array

- **Array copy**: A completely new, independent array is created.
- **Array view**: A new array object is created that shares the data of the original array. Changing data values through the view also changes the original array (and vice versa), while changing the view's shape or dtype leaves the original array's shape and dtype untouched.

In [None]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

arr_c = arr.copy() # this creates a new array
arr_r = arr # this makes arr_r a reference to the same array object as arr
arr_v  = arr.view() # this creates a view of arr

arr[0, 0] = 100

print(arr_c[0, 0])
print(arr_r[0, 0])
print(arr_v[0, 0])

1
100
100
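
One way to check whether two arrays share data, sketched here with the same names as the cell above, is the `base` attribute together with `np.shares_memory`:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])
arr_c = arr.copy()
arr_v = arr.view()

arr_v[0, 0] = 100  # changing data through the view also changes arr

print(arr[0, 0])    # changed through the view
print(arr_c[0, 0])  # the copy keeps the old value
print(arr_v.base is arr, arr_c.base is None)
print(np.shares_memory(arr, arr_v), np.shares_memory(arr, arr_c))
```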


## Modify an array

### Change data type of an array

In [None]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])
arr2 = arr.astype('f')
print(arr2)

[[ 1.  2.  3.  4.]
 [11. 22. 33. 44.]]


### Filter an array

Filtering uses a boolean index list that says, for each element, whether it is kept (True) or filtered out (False).

In [175]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr_to_filt = (arr%2 == 1) # checks whether each element is odd
print(arr_to_filt)
print('------------------')
arr = arr[arr_to_filt] # filters out even elements
print(arr)

[ True False  True False  True False  True False  True]
------------------
[1 3 5 7 9]


Another way

In [176]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr_to_filt =[]

for x in arr:
    if x%2 == 0:
        arr_to_filt.append(False)
    else:
        arr_to_filt.append(True)
print(arr_to_filt)
print('------------------')
arr = arr[arr_to_filt] # filters out even elements
print(arr)

[True, False, True, False, True, False, True, False, True]
------------------
[1 3 5 7 9]
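
Boolean masks can also be combined. The sketch below (variable names are illustrative) uses `&`, `|` and `~`, with each condition in parentheses:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

odd_and_big = arr[(arr % 2 == 1) & (arr > 4)]  # both conditions must hold
small_or_big = arr[(arr < 3) | (arr > 7)]      # either condition holds
not_odd = arr[~(arr % 2 == 1)]                 # negated condition

print(odd_and_big)
print(small_or_big)
print(not_odd)
```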


### Sort an array

Sort by elements

In [141]:
arr = np.array([6, 3, 9, 8, 1, 4, 2, 7])
arr.sort()

print(arr)

[1 2 3 4 6 7 8 9]


In [142]:
arr = np.array([[6, 3, 9, 8], [ 1, 4, 2, 7]])
arr.sort() # sorts along the last axis, so each row is sorted on its own

print(arr)

[[3 6 8 9]
 [1 2 4 7]]
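
As a sketch of the other options, `np.sort` returns a sorted copy instead of sorting in place, and its `axis` argument chooses the direction:

```python
import numpy as np

arr = np.array([[6, 3, 9, 8], [1, 4, 2, 7]])

by_row = np.sort(arr)           # default axis=-1: each row sorted
by_col = np.sort(arr, axis=0)   # each column sorted
flat = np.sort(arr, axis=None)  # flattened first, then sorted
descending = flat[::-1]         # reversed to get descending order

print(by_row)
print(by_col)
print(flat)
print(descending)
```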


Sort by an aggregate of the elements

In [155]:
arr = np.array([[4, 8], [7, 3], [9, 2], [7, 3], [5, 2], [1, 3], [4, 5]])

row_sums = arr.sum(axis=1) # array with the sum of each pair
print(row_sums)
print('------------------')
sorted_indices = np.argsort(row_sums) # indexes that sort the sums from smallest to biggest
print(sorted_indices)
print('------------------')
arr = arr[sorted_indices] # reorders the rows using the sorted indexes
print(arr)

[12 10 11 10  7  4  9]
------------------
[5 4 6 1 3 2 0]
------------------
[[1 3]
 [5 2]
 [4 5]
 [7 3]
 [7 3]
 [9 2]
 [4 8]]


### Join arrays

1-D arrays

In [53]:
arr1  = np.array([1, 2, 3, 4])
arr2  = np.array([5, 6, 7, 8])

arr = np.concatenate((arr1, arr2))

print(arr)

[1 2 3 4 5 6 7 8]


n-D arrays. Joins are based on axis.
- `concatenate` joins arrays along an existing axis; they must have the same size in every dimension except that axis, which can vary.
- `stack` joins arrays of exactly the same shape along a new axis.
- `hstack`, `vstack` and `dstack` are shortcuts for `concatenate`, each along a fixed axis (horizontal, vertical and depth).

In [77]:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

arr_c0 = np.concatenate((arr1, arr2), axis=0)
arr_c1 = np.concatenate((arr1, arr2), axis=1)
arr_s0 = np.stack((arr1, arr2), axis=0)
arr_s1 = np.stack((arr1, arr2), axis=1)

print(arr_c0, '\n', arr_c0.shape)
print('------------------')
print(arr_c1, '\n', arr_c1.shape)
print('------------------')
print(arr_s0, '\n', arr_s0.shape)
print('------------------')
print(arr_s1, '\n', arr_s1.shape)

[[1 2]
 [3 4]
 [5 6]
 [7 8]] 
 (4, 2)
------------------
[[1 2 5 6]
 [3 4 7 8]] 
 (2, 4)
------------------
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]] 
 (2, 2, 2)
------------------
[[[1 2]
  [5 6]]

 [[3 4]
  [7 8]]] 
 (2, 2, 2)
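
For 2-D inputs like the ones above, the shortcut functions can be compared with their explicit equivalents. A minimal sketch:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

arr_h = np.hstack((arr1, arr2))  # side by side, like concatenate with axis=1
arr_v = np.vstack((arr1, arr2))  # one on top of the other, like axis=0
arr_d = np.dstack((arr1, arr2))  # along a third (depth) axis

print(arr_h.shape, arr_v.shape, arr_d.shape)
```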


### Split arrays

1-D arrays

In [83]:
arr = np.array([1, 2, 3, 4, 5, 6])

arr2 = np.array_split(arr, 4) # returns a list of arrays. If the division is not exact, the first parts get one extra element

print(arr2)
print('------------------')
print(arr2[0])

[array([1, 2]), array([3, 4]), array([5]), array([6])]
------------------
[1 2]


n-D arrays.

In [94]:
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])

arr2 = np.array_split(arr, 3) # axis to split can be defined

print(arr2)
print('------------------')
print(arr2[0])

[array([[1, 2],
       [3, 4]]), array([[5, 6],
       [7, 8]]), array([[ 9, 10],
       [11, 12]])]
------------------
[[1 2]
 [3 4]]
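
To split along another axis, `array_split` accepts an `axis` argument. A short sketch with a made-up 2x4 array, split by columns:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

parts = np.array_split(arr, 2, axis=1)  # split the columns into two halves

print(parts[0])
print('------------------')
print(parts[1])
```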


### Reshape an array

`reshape` rearranges the elements into new dimensions:
- The new shape must have the same number of elements
- It returns a view of the original array whenever possible (otherwise a copy)

In [11]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

arr2 = arr.reshape((4, 2))

print(arr) # the original array keeps its shape; arr2 holds the (4, 2) version

[[ 1  2  3  4]
 [11 22 33 44]]


In [17]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

arr2 = arr.reshape((-1, 2)) # one dimension size can be unknown (-1) and calculated by numpy

print(arr2)

[[ 1  2]
 [ 3  4]
 [11 22]
 [33 44]]


In [18]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

arr2 = arr.reshape(-1)

print(arr2)

[ 1  2  3  4 11 22 33 44]
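
Both properties of `reshape` can be checked with a small sketch: here the reshaped array is a view, so writing to it changes the original, and a shape with a different number of elements is rejected.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

arr2 = arr.reshape((4, 2))
print(arr2.base is arr)  # in this case arr2 is a view of arr

arr2[0, 0] = 100         # so the change is visible in arr too
print(arr[0, 0])

# 8 elements cannot fill a 3x3 shape
try:
    arr.reshape((3, 3))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('ValueError:', err)
```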


### Add an element

`append` returns a new array with the element added at the end.

In [220]:
arr = np.array([1, 3, 2])

arr = np.append(arr, 7)

print(arr)

[1 3 2 7]
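
With n-D arrays, `append` flattens the result unless an `axis` is given. A sketch with a made-up 2x2 array:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

flat = np.append(arr, [[5, 6]])            # no axis: the result is flattened
rows = np.append(arr, [[5, 6]], axis=0)    # adds a new row
cols = np.append(arr, [[5], [6]], axis=1)  # adds a new column

print(flat)
print(rows)
print(cols)
```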


### Delete an element

Deletes the elements at the given indexes.

In [230]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

arr = np.delete(arr, (0, 1, -1))

print(arr)

[3 4 5 6 7 8]


### Modify elements

Add a value at a certain position.

In [131]:
arr = np.array([1, 3, 5, 6, 8, 10, 12])

arr = np.insert(arr, 2, 0) # several values and positions can be passed
print(arr)

[ 1  3  0  5  6  8 10 12]


Modify a row

In [None]:
arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

print(arr[0])
arr[0] = arr[1] + 1
print(arr[0])

[1 2 3 4]
[12 23 34 45]


Operate on all values (element-wise)

In [50]:
arr1  = np.array([1, 2, 3, 4])
arr2  = np.array([5, 6, 7, 8])

print(arr1 * arr2)

[ 5 12 21 32]
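
Element-wise operations also work with a single number or with a smaller array, and a boolean mask can be used to assign values. A sketch with made-up values:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [11, 22, 33, 44]])

doubled = arr * 2                            # the number is applied to every element
shifted = arr + np.array([10, 20, 30, 40])   # the 1-D array is applied to each row

arr[arr > 10] = 0                            # assigns 0 where the condition holds

print(doubled)
print(shifted)
print(arr)
```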


## Search an array

`where` returns a tuple of index arrays (one per dimension) with the positions of the elements that fulfill a condition.

In [114]:
arr = np.array([[1, -2, 3, 4], [11, 22, -33, 44]])

x = np.where(arr < 0)

print(x)
print('------------------')
print(x[0])
print('------------------')
print(arr[x])

(array([0, 1]), array([1, 2]))
------------------
[0 1]
------------------
[ -2 -33]
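
`where` also accepts two extra arguments: the value to use where the condition is true and the value to use where it is false. A minimal sketch that replaces negative values with 0:

```python
import numpy as np

arr = np.array([[1, -2, 3, 4], [11, 22, -33, 44]])

no_negatives = np.where(arr < 0, 0, arr)  # 0 where negative, original value otherwise

print(no_negatives)
```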


`searchsorted` is intended for sorted arrays: it returns the index where a new value should be inserted to keep the order. By default this is the index of the first element greater than or equal to the value:
- `side='right'` returns the index after any elements equal to the value instead.
- More than one value can be searched, returning an array with the indexes.

In [120]:
arr = np.array([1, 3, 5, 6, 8, 10, 12])

x = np.searchsorted(arr, 4)
print(x)
print(np.insert(arr, x, 4)) # inserting at that index keeps the order

2
[ 1  3  4  5  6  8 10 12]
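
The two options listed above can be sketched as follows, searching for a value that already exists in the array and for several values at once:

```python
import numpy as np

arr = np.array([1, 3, 5, 6, 8, 10, 12])

left = np.searchsorted(arr, 5)                 # before the existing 5
right = np.searchsorted(arr, 5, side='right')  # after the existing 5
many = np.searchsorted(arr, [0, 7, 13])        # one index per searched value

print(left, right)
print(many)
```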


## Create a random array

The `random` module of NumPy offers many ways to create arrays of random values: from a range, from given values with their probabilities, following certain distributions, etc.


In [181]:
np.random.rand(2, 2)

array([[0.50635428, 0.18837499],
       [0.56083436, 0.4761958 ]])

In [190]:
np.random.choice([1, 2, 3, 4], p=[0.4, 0.3, 0.2, 0.1], size=(4, 4))

array([[2, 2, 3, 1],
       [2, 1, 1, 1],
       [1, 3, 1, 1],
       [2, 3, 4, 2]])

In [197]:
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr) # modifies the array, `permutation` returns a new array
print(arr)

[4 2 1 3 5]


In [205]:
x = np.random.normal(loc=5, scale=2, size=(10)) # create random values using a normal distribution with mean = 5 and std = 2

print(x)

[2.21576404 7.36210163 5.56827321 8.44037152 6.15530467 6.25293821
 7.58661506 3.08643354 4.53034687 6.95395152]
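
Random results change on every run. As a sketch, setting a seed with `np.random.seed` makes them repeatable, shown here with `randint` (integers from a range) and `permutation`. The seed value 0 is arbitrary.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(1, 7, size=(2, 3))   # integers from 1 to 6

np.random.seed(0)
second = np.random.randint(1, 7, size=(2, 3))  # same seed, same values

arr = np.array([1, 2, 3, 4, 5])
perm = np.random.permutation(arr)              # shuffled copy, arr is not modified

print(first)
print(second)
print(perm, arr)
```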
