📝 **Author:** Amirhossein Heydari - 📧 **Email:** <amirhosseinheydari78@gmail.com> - 📍 **Origin:** [mr-pylin/numpy-workshop](https://github.com/mr-pylin/numpy-workshop)

---


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [NumPy - Efficient Computing](#toc2_)    
  - [Do Not Be A Purist](#toc2_1_)    
  - [Avoid Loops](#toc2_2_)    
  - [Specify Data Types](#toc2_3_)    
  - [Avoid Unnecessary Copies](#toc2_4_)    
  - [Reuse](#toc2_5_)    
  - [Structured Arrays](#toc2_6_)    
  - [Broadcasting](#toc2_7_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)


In [1]:
import sys

import numpy as np

# <a id='toc2_'></a>[NumPy - Efficient Computing](#toc0_)

📝 Tutorial:

- Efficient array computing: [enccs.github.io/hpda-python/stack](https://enccs.github.io/hpda-python/stack/)


## <a id='toc2_1_'></a>[Do Not Be A Purist](#toc0_)

- NumPy is designed to perform array operations efficiently
- Its core routines are optimized algorithms implemented in C
- Prefer the built-in functions over hand-written loops; there is no need to reinvent the wheel


In [2]:
arr_2d_1 = np.array([[1, 4, 2], [3, 1, 0], [2, 4, 2]])

In [3]:
# not preferred
sum_1 = 0
for row in range(arr_2d_1.shape[0]):
    for col in range(arr_2d_1.shape[1]):
        sum_1 += arr_2d_1[row, col]

# log
print(f"sum_1: {sum_1}")

sum_1: 19


In [4]:
# preferred
sum_2 = arr_2d_1.sum()

# log
print(f"sum_2: {sum_2}")

sum_2: 19
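
The same idea extends beyond the total sum. As a small sketch (the array `arr` mirrors `arr_2d_1` above; the other names are illustrative), built-in reductions accept an `axis` argument, which can replace a whole family of hand-written nested loops:

```python
import numpy as np

arr = np.array([[1, 4, 2], [3, 1, 0], [2, 4, 2]])

col_sums = arr.sum(axis=0)  # collapse the rows: one sum per column
row_sums = arr.sum(axis=1)  # collapse the columns: one sum per row
largest = arr.max()         # largest element of the whole array

print(f"col_sums: {col_sums}")
print(f"row_sums: {row_sums}")
print(f"largest : {largest}")
```

Each call runs in compiled code, so no Python-level loop over the elements is involved.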


## <a id='toc2_2_'></a>[Avoid Loops](#toc0_)

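- Create a vectorized version of a Python function using `np.vectorize`
- Note that `np.vectorize` is provided primarily for convenience, not for performance; its implementation is essentially a `for` loop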


In [5]:
arr_1d_1 = np.array([0, 1, 2, 2, 4, 1])

In [6]:
def even_odd(i):
    if i % 2:
        return "O"
    else:
        return "E"

In [7]:
arr_1d_2 = np.zeros_like(arr_1d_1, dtype=str)

# not preferred
for i in range(len(arr_1d_1)):
    arr_1d_2[i] = even_odd(arr_1d_1[i])

# log
print(f"arr_1d_2: {arr_1d_2}")

arr_1d_2: ['E' 'O' 'E' 'E' 'E' 'O']


In [8]:
# preferred
even_odd_2 = np.vectorize(even_odd)
arr_1d_3 = even_odd_2(arr_1d_1)

# log
print(f"arr_1d_3: {arr_1d_3}")

arr_1d_3: ['E' 'O' 'E' 'E' 'E' 'O']
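
When the logic can be expressed with array operations, the Python-level function call can be dropped entirely. The following is a minimal sketch, assuming the same input values as `arr_1d_1` and using `np.where` as a substitute for the `even_odd` function (the names `arr` and `labels` are illustrative):

```python
import numpy as np

arr = np.array([0, 1, 2, 2, 4, 1])

# the modulo, the comparison, and the selection all run element-wise in compiled code
labels = np.where(arr % 2 == 1, "O", "E")

print(f"labels: {labels}")
```

This should produce the same labels as the two approaches above, without calling a Python function once per element.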


## <a id='toc2_3_'></a>[Specify Data Types](#toc0_)

- Explicitly specify the data type (`dtype`) of your arrays whenever possible
- Choosing the smallest dtype that can hold your values can significantly reduce memory usage and improve performance


In [9]:
arr_1d_4 = np.array([1, 2, 3, 4, 5], dtype=np.int8)
arr_1d_5 = np.array([1, 2, 3, 4, 5], dtype=np.int16)
arr_1d_6 = np.array([1, 2, 3, 4, 5], dtype=np.int64)

# log (sys.getsizeof counts the array object's fixed overhead plus its data buffer)
print(f"arr_1d_4 size: {sys.getsizeof(arr_1d_4)} bytes")
print(f"arr_1d_5 size: {sys.getsizeof(arr_1d_5)} bytes")
print(f"arr_1d_6 size: {sys.getsizeof(arr_1d_6)} bytes")

arr_1d_4 size: 117 bytes
arr_1d_5 size: 122 bytes
arr_1d_6 size: 152 bytes
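
To see the size of the data buffer alone, the `itemsize` and `nbytes` attributes can be used instead of `sys.getsizeof`. The sketch below (variable names are illustrative) also shows one caveat of small integer types: array arithmetic wraps around silently when a result does not fit.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

sizes = {}
for dtype in (np.int8, np.int16, np.int64):
    arr = np.array(values, dtype=dtype)
    # nbytes = number of elements * itemsize (data buffer only, no object overhead)
    sizes[arr.dtype.name] = arr.nbytes
    print(f"{arr.dtype.name:<5}: itemsize={arr.itemsize}, nbytes={arr.nbytes}")

# caveat: int8 holds -128..127, and array arithmetic wraps around on overflow
small = np.array([127], dtype=np.int8)
wrapped = small + np.array([1], dtype=np.int8)
print(f"127 + 1 as int8: {wrapped}")
```

A smaller dtype is therefore only a safe choice when the value range of the data is known in advance.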


## <a id='toc2_4_'></a>[Avoid Unnecessary Copies](#toc0_)

- Copying an array consumes extra memory and time
- Instead of creating copies, use views or slices of existing arrays where possible


In [10]:
arr_2d_2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# copy: flatten() always returns a new array, so the original is not modified
flatten_1 = arr_2d_2.flatten()
flatten_1[0] = 0

# log
print(f"flatten_1      : {flatten_1}")
print(f"arr_2d_2[0, 0] : {arr_2d_2[0, 0]}")

flatten_1      : [0 2 3 4 5 6 7 8 9]
arr_2d_2[0, 0] : 1


In [11]:
arr_2d_3 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# view: ravel() returns a view whenever possible, so writes reach the original array
flatten_2 = arr_2d_3.ravel()
flatten_2[0] = 0

# log
print(f"flatten_2      : {flatten_2}")
print(f"arr_2d_3[0, 0] : {arr_2d_3[0, 0]}")

flatten_2      : [0 2 3 4 5 6 7 8 9]
arr_2d_3[0, 0] : 0
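
Whether an operation returned a view or a copy can be checked with `np.shares_memory`. The following sketch (the dictionary and its keys are only for illustration) compares a few common operations on a C-contiguous array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

candidates = {
    "arr.ravel()": arr.ravel(),      # contiguous input: view
    "arr.flatten()": arr.flatten(),  # always a copy
    "arr[:, 1]": arr[:, 1],          # basic slicing: view
    "arr[[0, 2]]": arr[[0, 2]],      # fancy (integer-list) indexing: copy
    "arr.T.ravel()": arr.T.ravel(),  # transposed input is not C-contiguous: copy
}

# True means the result reuses arr's memory, False means new memory was allocated
shares = {name: bool(np.shares_memory(arr, result)) for name, result in candidates.items()}

for name, shared in shares.items():
    print(f"{name:<14}: shares memory with arr = {shared}")
```

The last entry shows that `ravel()` is not guaranteed to return a view; it falls back to a copy when the memory layout requires it.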


## <a id='toc2_5_'></a>[Reuse](#toc0_)

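- Reuse arrays that are no longer needed instead of allocating new ones
- Skip the initialization step when possible: `np.empty` allocates memory without filling it, which is safe only when every element is overwritten before it is read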


In [12]:
arr_1d_7 = np.zeros(shape=6)  # not preferred: the zeros are written, then overwritten

# do some stuff
for i in range(6):
    arr_1d_7[i] = i

# log
print(f"arr_1d_7: {arr_1d_7}")

arr_1d_7: [0. 1. 2. 3. 4. 5.]


In [13]:
arr_1d_8 = np.empty(shape=6)  # preferred: no fill step, every element is assigned below

# do some stuff
for i in range(6):
    arr_1d_8[i] = i

# log
print(f"arr_1d_8: {arr_1d_8}")

arr_1d_8: [0. 1. 2. 3. 4. 5.]
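
Reuse also applies to intermediate results. Many NumPy functions accept an `out` argument, and the augmented operators (`+=`, `*=`, ...) work in place. The sketch below assumes a scratch buffer named `buf`, which is an illustrative name and not part of the cells above:

```python
import numpy as np

a = np.arange(6, dtype=np.float64)  # [0. 1. 2. 3. 4. 5.], filled without a Python loop
buf = np.empty_like(a)              # scratch buffer, allocated once

np.multiply(a, 2.0, out=buf)  # buf = a * 2, written directly into buf
np.add(buf, 1.0, out=buf)     # buf = buf + 1, no temporary array
a += buf                      # in-place operator: reuses the memory of a

print(f"buf: {buf}")
print(f"a  : {a}")
```

For small arrays the saving is negligible; the pattern tends to matter when large arrays are updated repeatedly, for example inside an iteration loop.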


## <a id='toc2_6_'></a>[Structured Arrays](#toc0_)

- Structured arrays store records with named, typed fields, which provides efficient storage and manipulation for heterogeneous data


In [14]:
# not preferred: mixed types are coerced to a single string dtype
data_1 = np.array([["John", 30, 5000], ["Alice", 35, 6000]])

# log
print(f"data_1:\n{data_1}")

data_1:
[['John' '30' '5000']
 ['Alice' '35' '6000']]


In [15]:
# not preferred: the dicts are stored as generic Python objects (dtype=object)
data_2 = np.array(
    [
        {"name": "John", "age": 30, "salary": 5000},
        {"name": "Alice", "age": 35, "salary": 6000},
    ]
)

# log
print(f"data_2:\n{data_2}")

data_2:
[{'name': 'John', 'age': 30, 'salary': 5000}
 {'name': 'Alice', 'age': 35, 'salary': 6000}]


In [16]:
# not preferred: the instances are stored as generic Python objects (dtype=object)
class Person:
    def __init__(self, name: str, age: int, salary: int) -> None:
        self.name = name
        self.age = age
        self.salary = salary


data_3 = np.array([Person("John", 30, 5000), Person("Alice", 35, 6000)])

# log
print(f"data_3:\n{data_3}")

data_3:
[<__main__.Person object at 0x00000165FF355760>
 <__main__.Person object at 0x00000165F51AFA70>]


In [17]:
# preferred
data_4 = np.array(
    [("John", 30, 5000), ("Alice", 35, 6000)],
    dtype=[("name", "<U10"), ("age", int), ("salary", int)],
)

# log
print(f"data_4         : {data_4}")
print(f"data_4['name'] : {data_4['name']}")
print(f"data_4['age']  : {data_4['age']}")
print(f"data_4[0]      : {data_4[0]}")

data_4         : [('John', 30, 5000) ('Alice', 35, 6000)]
data_4['name'] : ['John' 'Alice']
data_4['age']  : [30 35]
data_4[0]      : ('John', 30, 5000)
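
Because each field is a regular typed array, the usual vectorized operations apply per field. A minimal sketch, assuming the same records and dtype as `data_4` (the names `people`, `mean_age`, `high_earners`, and `by_name` are illustrative):

```python
import numpy as np

people = np.array(
    [("John", 30, 5000), ("Alice", 35, 6000)],
    dtype=[("name", "<U10"), ("age", int), ("salary", int)],
)

mean_age = people["age"].mean()                         # numeric reduction on one field
high_earners = people[people["salary"] > 5500]["name"]  # boolean mask built from a field
by_name = np.sort(people, order="name")                 # sort whole records by a field

# a field access is a view, so this in-place update modifies `people` itself
people["salary"] += 500

print(f"mean_age     : {mean_age}")
print(f"high_earners : {high_earners}")
print(f"by_name      : {by_name['name']}")
print(f"new salaries : {people['salary']}")
```

None of these operations are available in this form for the string, dict, or object arrays shown in the "not preferred" cells.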


## <a id='toc2_7_'></a>[Broadcasting](#toc0_)

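- Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations
- The smaller array is virtually stretched to match the larger one, so there is no need to make explicit copies or loop over elements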


In [18]:
arr_1d_9 = np.array([1, 2, 3, 4, 5])

In [19]:
# not preferred
arr_1d_10 = np.empty_like(arr_1d_9)

# add 1 to each element
for i in range(arr_1d_9.shape[0]):
    arr_1d_10[i] = arr_1d_9[i] + 1

# log
print(f"arr_1d_10: {arr_1d_10}")

arr_1d_10: [2 3 4 5 6]


In [20]:
# preferred
arr_1d_11 = arr_1d_9 + 1

# log
print(f"arr_1d_11: {arr_1d_11}")

arr_1d_11: [2 3 4 5 6]
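
Broadcasting is not limited to scalars. The sketch below (all names are illustrative) combines a 2-D array with a row vector and a column vector; shapes are compared from the trailing dimension backwards, and a dimension of size 1 or a missing dimension is stretched to match:

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)  # shape (2, 3): [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])         # shape (3,)
col = np.array([[100], [200]])       # shape (2, 1)

plus_row = matrix + row  # row is applied to each of the 2 rows
plus_col = matrix + col  # col is applied to each of the 3 columns
outer = row + col        # (3,) and (2, 1) broadcast to (2, 3)

print(f"plus_row:\n{plus_row}")
print(f"plus_col:\n{plus_col}")
print(f"outer:\n{outer}")
```

No stretched copy of `row` or `col` is materialized in memory; only the result arrays are allocated.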
