# NumPy

**NumPy** is a popular Python library for numerical and scientific computing, and it provides a powerful data structure called a NumPy array (also known as `ndarray`) for efficiently working with large datasets of homogeneous data types, such as numbers (integers or floats). 

In [None]:
import numpy as np

## 1. Introduction to NumPy arrays

### 1.1. What is an array?

### 1.2. Why NumPy arrays?

NumPy arrays and Python lists are both used to store collections of data, but they have significant differences in terms of functionality and performance.

**NumPy Array:**

1. **Homogeneous Data Type:** NumPy arrays are designed to work with homogeneous data types, which means all elements in the array are of the same data type (e.g., integers, floats). This leads to more efficient memory usage and better performance.

2. **Vectorized Operations:** NumPy provides a wide range of mathematical and logical functions that can be applied element-wise to entire arrays, allowing for efficient and concise code for numerical computations. This is known as vectorization.

3. **Performance:** NumPy operations are implemented in C and are highly optimized, making them significantly faster than equivalent operations on Python lists, especially for large datasets.

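4. **Memory Efficiency:** NumPy arrays store raw values in one contiguous block of memory, with a single `dtype` for the whole array. A Python list instead stores a pointer to a separate Python object for every element, and each of those objects carries its own type and reference-count information.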

5. **Multidimensional Arrays:** NumPy supports multidimensional arrays (e.g., matrices), making it well-suited for mathematical and scientific applications.

6. **Broadcasting:** NumPy allows you to perform operations on arrays of different shapes through broadcasting, which simplifies many array operations.
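A minimal sketch of items 2 and 6: arithmetic on a whole array is applied element-wise, and a smaller array is broadcast (virtually stretched, without copying) to match a larger one when their trailing dimensions are compatible. The variable names are only illustrative.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])

# Vectorized: the multiplication is applied to every element, no Python loop
with_tax = prices * 1.1

# Broadcasting: shape (2, 3) + shape (3,) -> the 1D array is added to each row
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row_offsets = np.array([10, 20, 30])
shifted = matrix + row_offsets
print(shifted)
# [[11 22 33]
#  [14 25 36]]

# Shapes (3, 1) and (1, 4) are both stretched, giving a (3, 4) result
col = np.arange(3).reshape(3, 1)
row = np.arange(4).reshape(1, 4)
grid = col * 10 + row
print(grid.shape)  # (3, 4)
```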

**Python List:**

1. **Heterogeneous Data Type:** Python lists can contain elements of different data types, which provides flexibility but can lead to performance issues for numerical computations.

2. **No Element-wise Mathematical Operations:** Arithmetic operators on lists don't work element by element (`+` concatenates two lists and `*` repeats a list). To transform each element you need a loop or a comprehension, which is slower and less concise.

3. **Performance:** Python lists are generally slower than NumPy arrays, especially for large datasets and numerical operations.

4. **Memory Overhead:** Python lists have higher memory overhead because every element is a full Python object with its own type information, and the list stores a pointer to each of those objects.

5. **Single-Dimensional:** Lists in Python are primarily single-dimensional. To work with multidimensional data, you'd need to use nested lists, which can become complex to manage.
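A quick illustration of the difference in behavior: the same `* 2` expression repeats a list but doubles every element of an array, and a nested list only becomes a true 2D structure (with a `shape`) once it is converted to an array.

```python
import numpy as np

my_list = [1, 2, 3]
my_array = np.array([1, 2, 3])

# On a list, * 2 repeats the list instead of doubling each element
print(my_list * 2)               # [1, 2, 3, 1, 2, 3]
# Element-wise doubling of a list needs an explicit loop or comprehension
print([x * 2 for x in my_list])  # [2, 4, 6]
# On an array, * 2 is applied to every element
print(my_array * 2)              # [2 4 6]

# Nested lists emulate 2D data, but only the array knows its shape
nested = [[1, 2], [3, 4]]
print(np.array(nested).shape)    # (2, 2)
```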

**Why Use NumPy Arrays:**

NumPy arrays are preferred for several reasons:

1. **Performance:** NumPy operations are highly optimized and execute much faster than equivalent Python list operations, making it the go-to choice for numerical computations and data manipulation.

2. **Memory Efficiency:** NumPy arrays are more memory-efficient, especially for large datasets, due to their homogeneous data type and reduced type information overhead.

3. **Ease of Use:** NumPy provides a rich set of functions for array manipulation, statistical analysis, and linear algebra, making it easier to work with data.

4. **Compatibility:** NumPy arrays can be seamlessly integrated with other libraries and tools commonly used in scientific computing and data analysis, such as SciPy, pandas, and scikit-learn.

In [None]:
# my_arr = np.arange(1_000_000)
# my_list = list(range(1_000_000))

# print("Numpy Array:")
# %timeit my_arr2 = my_arr * 2

# print("List:")
# %timeit my_list2 = [x * 2 for x in my_list]
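The `%timeit` lines above are IPython magic commands, so they only run inside a notebook. A minimal sketch of the same comparison in plain Python, using the standard-library `timeit` module; `number=10` is an arbitrary choice, and the timings depend on your machine, so no numbers are shown here.

```python
import timeit

import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Run each approach 10 times and report the total elapsed seconds
arr_time = timeit.timeit(lambda: my_arr * 2, number=10)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)

print(f"NumPy array: {arr_time:.4f} s for 10 runs")
print(f"Python list: {list_time:.4f} s for 10 runs")
```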

### 1.3. Array creation

- Basic creation

In [None]:
my_array = np.array((1,2,3,4))

print(my_array)
print(len(my_array))
print(my_array[0], my_array[-1])
print(my_array.dtype)

- Create an array with `dtype` specified

In [None]:
my_array = np.array((1,2,3,4), dtype=np.int8)

print(my_array)
print(my_array.dtype)

In [None]:
my_array = np.array((1,2,3,4)).astype(np.int8)

print(my_array)
print(my_array.dtype)

In [None]:
# What if the elements have different types? NumPy converts them all to one common type (here, strings)
my_array = np.array((1,2,3,"4"))

print(my_array)
print(my_array.dtype)
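A short sketch of how NumPy picks one `dtype` that can hold every element, upcasting when needed. `dtype.kind` is used for the integer case because the default integer size can differ between platforms.

```python
import numpy as np

print(np.array([1, 2, 3]).dtype.kind)       # i  (integer)
print(np.array([1, 2, 3.5]).dtype)          # float64: the ints are upcast to float
print(np.array([1, 2, 3, "4"]).dtype.kind)  # U  (Unicode string): the numbers become strings
print(np.array([1, 2, 3, "4"])[0] == "1")   # True: the first element is now the string "1"
```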

- Create arrays and fill with values

In [None]:
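# Fill with zeros (the default dtype is float64)
my_array = np.zeros((2,3))
print("my_array")
print(my_array)
print(my_array.dtype)
print(len(my_array)) # len() returns the length of the first axis (2 rows), not the total number of elements
print("")

print("my_array2")
my_array2 = np.zeros_like(my_array, dtype=np.int8) # same shape as my_array, but with dtype int8
print(my_array2)
print(my_array2.dtype)
print(len(my_array2))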

In [None]:
# Fill with ones
my_array = np.ones((2,3), dtype=np.int8)
print("my_array")
print(my_array)
print(my_array.dtype)
print(len(my_array))
print("")

print("my_array2")
my_array2 = np.ones_like(my_array)
print(my_array2)
print(my_array2.dtype)
print(len(my_array2))

In [None]:
# Fill with custom values
my_array = np.full(shape=(3,5), fill_value=3.14, dtype=np.float16)
print("my_array")
print(my_array)
print(my_array.dtype)
print("")

my_array2 = np.full_like(my_array, fill_value=1.43, dtype=np.float32)

print(my_array2)
print(my_array2.dtype)

- Create arrays from a sequence

In [None]:
# Create an array filled with a linear sequence
# starting at 6 and stepping by 2, stopping before 49
# (the stop value is excluded, as in the built-in range() function)
np.arange(start=6, stop=49, step=2)

In [None]:
# Create an array of five values evenly spaced from 0 to 1 (both endpoints included)
np.linspace(start=0, stop=1, num=5)
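A small sketch contrasting the two functions: `np.arange` excludes its `stop` value, while `np.linspace` includes it unless `endpoint=False` is passed.

```python
import numpy as np

evens = np.arange(6, 49, 2)
print(evens[0], evens[-1], len(evens))  # 6 48 22  (49 itself is excluded)

# linspace includes both endpoints by default: 0.0, 0.25, 0.5, 0.75, 1.0
with_end = np.linspace(0, 1, 5)

# endpoint=False drops the stop value: 0.0, 0.2, 0.4, 0.6, 0.8
without_end = np.linspace(0, 1, 5, endpoint=False)
print(with_end)
print(without_end)
```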

- Create arrays and fill with random values

In [None]:
# Create a 5x5 array of uniformly distributed random values in the interval [0, 1)
np.random.random(size=(5,5))

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1

np.random.normal(loc=0.0, scale=1.0, size=(3,3))
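`np.random.random` and `np.random.normal` draw from NumPy's legacy global random state. For new code, the NumPy documentation recommends the `Generator` API created with `np.random.default_rng()`; passing a seed makes the results reproducible. A minimal sketch, where the seed value 42 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator for reproducible results

uniform = rng.random(size=(5, 5))                     # floats in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=(3, 3))  # mean 0, standard deviation 1
integers = rng.integers(low=0, high=10, size=(2, 4))  # ints in [0, 10)

print(uniform.shape, normal.shape, integers.shape)  # (5, 5) (3, 3) (2, 4)

# A new generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(rng_again.random(size=(5, 5)), uniform))  # True
```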

## 2. Attributes of arrays

### 2.1. Size, shape, memory consumption, and data types of arrays

- The `shape` attribute is a tuple that describes the dimensions of the array. For a 1D array, it's a tuple with a single element representing the number of elements. For a 2D array, it's a tuple with two elements representing the number of rows and columns, and so on.
- The `dtype` attribute specifies the data type of the elements in the array, such as int64, float32, etc.
- The `ndim` attribute indicates the number of dimensions (axes) in the array.
- The `size` attribute represents the total number of elements in the array.
- The `itemsize` attribute specifies the size (in bytes) of each element in the array.
- The `nbytes` attribute returns the total number of bytes used by the array's data.

In [None]:
my_array = np.array(
    [
        [[1,2], [2,3], [3,4]],
        [[4,5], [5,6], [6,7]]
    ],
    dtype=np.int8
)
print(f"Shape of array: {my_array.shape}")
print(f"Type of elements: {my_array.dtype}")
print(f"Number of dimensions: {my_array.ndim}")
print(f"Number of elements: {my_array.size}")
print(f"Size of each element (bytes): {my_array.itemsize}")
print(f"Total number of bytes used by my_array: {my_array.nbytes}")
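These attributes are related to each other: `size` is the product of the entries in `shape`, `ndim` is the length of `shape`, and `nbytes` is `size * itemsize`. A sketch using an array of zeros with the same shape (2, 3, 2) and dtype `int8` as above:

```python
import numpy as np

my_array = np.zeros((2, 3, 2), dtype=np.int8)

# size is the product of the shape; ndim is the number of entries in the shape
assert my_array.size == 2 * 3 * 2 == 12
assert my_array.ndim == len(my_array.shape) == 3

# nbytes = size * itemsize; int8 uses 1 byte per element
print(my_array.itemsize, my_array.nbytes)  # 1 12

# The same shape as float64 (8 bytes per element) needs 8 times the memory
as_float = my_array.astype(np.float64)
print(as_float.itemsize, as_float.nbytes)  # 8 96
```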

### 2.2. Indexing & Slicing

- 1D array: similar to Python list indexing and slicing

In [None]:
my_array = np.array([5,6,7,8])
print(my_array[2])
print(my_array[-1])
print(my_array[:3])
print(my_array[1:])
print(my_array[::2])

- 2D array: using a comma-separated tuple of indices

In [None]:
my_array = np.array(
    [
# column 0  1  2  3
        [3, 5, 2, 4], # row 0
        [7, 6, 8, 8], # row 1
        [1, 6, 7, 99] # row 2
    ]
)

print(my_array.shape)
print(my_array[0,0])
print(my_array[-1,-1])

In [None]:
my_array = np.array(
    [
    # cl 0  1  2  3
        [3, 5, 2, 4], # row 0
        [7, 6, 8, 8], # row 1
        [1, 6, 7, 99] # row 2
    ]
)

# Values in the last column of the first 2 rows
print(my_array[:2, -1])

# Every other row starting at row 0, and every other column starting at column 1
print(my_array[::2, 1::2])
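Selecting a whole row or a whole column follows the same tuple notation. A short sketch on the same 3x4 array, which also shows that an integer index drops a dimension while a slice keeps it:

```python
import numpy as np

my_array = np.array([
    [3, 5, 2, 4],
    [7, 6, 8, 8],
    [1, 6, 7, 99],
])

first_row = my_array[0]      # same as my_array[0, :] -> [3 5 2 4]
second_col = my_array[:, 1]  # every row, column 1    -> [5 6 6]
print(first_row, second_col)

# Chained indexing also works, but the tuple form does it in one step
assert my_array[2][3] == my_array[2, 3] == 99

# An integer index drops that axis; a slice of length 1 keeps it
print(my_array[:, 1].shape)    # (3,)
print(my_array[:, 1:2].shape)  # (3, 1)
```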

- Modify values using index notation

In [None]:
my_array = np.array(
    [
        [3, 5, 2, 4],
        [7, 6, 8, 8],
        [1, 6, 7, 99]
    ]
)
my_array[0,0] = 100

print(my_array)
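Index notation also works with slices on the left-hand side of an assignment. A sketch: a scalar is written to every selected element, a sequence of matching length is written element by element, and assigned values are cast to the array's `dtype`.

```python
import numpy as np

my_array = np.array([
    [3, 5, 2, 4],
    [7, 6, 8, 8],
    [1, 6, 7, 99],
])

my_array[:, 0] = 0                   # scalar -> every element of the first column
my_array[-1, :] = [10, 20, 30, 40]   # matching sequence -> the whole last row
print(my_array)
# [[ 0  5  2  4]
#  [ 0  6  8  8]
#  [10 20 30 40]]

# The value is cast to the array's integer dtype, so 3.9 is truncated to 3
my_array[0, 1] = 3.9
print(my_array[0, 1])  # 3
```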

**NOTE**

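NumPy array slicing returns **views** rather than **copies** of the array data: the slice shares memory with the original array, so changing one changes the other. Slicing a Python list, by contrast, creates a new (shallow) copy.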

In [None]:
my_list = [1,2,3]
copied_list = my_list[:]
copied_list[0] = 8 # only affect copied_list
print(f"my_list: {my_list}")
print(f"copied_list: {copied_list}")

In [None]:
my_array = np.array([1,2,3])
viewed_array = my_array[:] # slicing
my_array[0] = 8 # also updates viewed_array

print(f"my_array: {my_array}")
print(f"viewed_array: {viewed_array}")

In [None]:
my_array = np.array([1,2,3])
viewed_array = my_array[:] # slicing
viewed_array[0] = 99 # also updates my_array

print(f"my_array: {my_array}")
print(f"viewed_array: {viewed_array}")

To create an independent copy of a NumPy array, use the `copy()` method.

In [None]:
my_array = np.array([1,2,3,4,5,6])
copied_array = my_array.copy() # copy
copied_array[0] = 8 # only affect copied_array

print(f"my_array: {my_array}")
print(f"copied_array: {copied_array}")

Views are useful when working with large datasets: you can read and modify a small piece of the data without copying the whole array, and call `copy()` only on the pieces you need to keep independent.
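To check whether two arrays share data, you can inspect the `base` attribute or call `np.shares_memory()`. A short sketch; note that indexing with a list of positions (advanced indexing) returns a copy, unlike basic slicing.

```python
import numpy as np

my_array = np.arange(6)

view = my_array[1:4]           # basic slicing -> view
copied = my_array[1:4].copy()  # explicit copy
fancy = my_array[[1, 2, 3]]    # indexing with a list (advanced indexing) -> copy

print(view.base is my_array)              # True: the view borrows my_array's memory
print(np.shares_memory(view, my_array))   # True
print(np.shares_memory(copied, my_array)) # False
print(np.shares_memory(fancy, my_array))  # False

fancy[0] = -1     # does not change my_array
print(my_array)   # [0 1 2 3 4 5]
```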

**EXERCISE**

1. Create the following matrices as NumPy arrays. Pay attention to the data type.

a. 
```python
[
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
]
```

b.
```python
[
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1]
]
```

c.
```python
[
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1]
]
```

2. Extract all the 9s from each of the following arrays.

a.
```python
[
    [1., 1., 1.],
    [1., 9., 1.],
    [1., 1., 1.]
]
```

b.
```python
[
    [1., 1., 1., 1.],
    [1., 1., 1., 1.],
    [9., 9., 9., 1.],
    [9., 9., 9., 1.]
]
```

c.
```python
[
    [1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1.],
    [1., 1., 9., 9., 1., 1.],
    [1., 1., 9., 9., 1., 1.],
    [1., 1., 1., 1., 1., 1.],
    [1., 1., 1., 1., 1., 1.]
]
```

## 3. Array manipulations

### 3.1. Reshape an array

In [None]:
my_array = np.arange(1,17)
reshaped = my_array.reshape((4,4))

print(my_array)
print(reshaped)
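When reshaping, one dimension can be given as `-1` and NumPy infers it from the total number of elements. The new shape must hold exactly the same number of elements, otherwise `reshape` raises a `ValueError`. A minimal sketch:

```python
import numpy as np

my_array = np.arange(1, 17)  # 16 elements

# -1 lets NumPy compute that dimension from the total size
print(my_array.reshape(2, -1).shape)  # (2, 8)
print(my_array.reshape(-1, 4).shape)  # (4, 4)

# 3 * 5 = 15 elements cannot hold 16 values
try:
    my_array.reshape(3, 5)
except ValueError as err:
    print("ValueError:", err)
```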

### 3.2. 1D array to 2D array

![New axis](img/numpy_reshape.png)

In [None]:
# Using reshape()
my_1d = np.array([2,0,1,8])
print(my_1d.shape, my_1d.ndim)

my_2d_row = my_1d.reshape(1, 4) # 1 row, 4 columns
print(my_2d_row.shape, my_2d_row.ndim)

my_2d_col = my_1d.reshape(4, 1) # 4 rows, 1 column
print(my_2d_col.shape, my_2d_col.ndim)

In [None]:
# Using np.newaxis
my_1d = np.array([2,0,1,8])
print(my_1d.shape, my_1d.ndim)

my_2d_row = my_1d[np.newaxis, :]
print(my_2d_row.shape, my_2d_row.ndim)

my_2d_col = my_1d[:, np.newaxis]
print(my_2d_col.shape, my_2d_col.ndim)
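A third option is `np.expand_dims()`, which inserts a new axis of length 1 at the position given by `axis`. A short sketch showing it matches the `np.newaxis` results above:

```python
import numpy as np

my_1d = np.array([2, 0, 1, 8])

row = np.expand_dims(my_1d, axis=0)  # same as my_1d[np.newaxis, :]
col = np.expand_dims(my_1d, axis=1)  # same as my_1d[:, np.newaxis]
print(row.shape, col.shape)          # (1, 4) (4, 1)

# np.newaxis is simply an alias for None
print(np.newaxis is None)            # True
```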

### 3.3. Adding, removing, and sorting elements

### 3.4. Reverse an array

### 3.5. Transposing and reshaping a matrix

### 3.6. Reshaping and flattening multidimensional arrays

### 3.7. Numpy axes

![Numpy axes](img/numpy-arrays-have-axes_updated_v2.png)

It’s very important to understand what the `axis` parameter actually controls for each function.

- The `axis` parameter of `np.sum()` controls which axis is collapsed (summed over).

In [164]:
my_array = np.arange(0,6).reshape(2,3)
print(my_array)
print(np.sum(my_array))

[[0 1 2]
 [3 4 5]]
15


![numpy-axes-np-sum-axis-0](img/numpy-axes-np-sum-axis-0.png)
![numpy-axes-np-sum-axis-1](img/numpy-axes-np-sum-axis-1.png)

In [166]:
print("axis=0", np.sum(my_array, axis=0)) # sums down each column, collapsing the rows -> one value per column
print("axis=1", np.sum(my_array, axis=1)) # sums across each row, collapsing the columns -> one value per row

axis=0 [3 5 7]
axis=1 [ 3 12]
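The collapsed axis disappears from the result's shape. Passing `keepdims=True` keeps it with length 1, which leaves the result broadcastable against the original array. Other reductions such as `max()` and `mean()` use `axis` the same way. A sketch on the same 2x3 array:

```python
import numpy as np

my_array = np.arange(0, 6).reshape(2, 3)

col_sums = np.sum(my_array, axis=0)                 # [3 5 7], shape (3,)
row_sums = np.sum(my_array, axis=1, keepdims=True)  # [[3] [12]], shape (2, 1)
print(col_sums.shape, row_sums.shape)               # (3,) (2, 1)

print(my_array.max(axis=0))   # [3 4 5]
print(my_array.mean(axis=1))  # [1. 4.]

# keepdims=True lets us divide each element by its own row total
shares = my_array / row_sums
print(shares)
```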


- The `axis` parameter of `np.concatenate()` controls the axis along which the arrays are joined.

In [179]:
arr1 = np.ones(shape=(2,3), dtype=np.int8)
print("arr1")
print(arr1)

arr2 = np.full(shape=(2,3), fill_value=9, dtype=np.int8)
print("arr2")
print(arr2)

print("axis=0")
print(np.concatenate([arr1, arr2], axis=0))

print("axis=1")
print(np.concatenate([arr1, arr2], axis=1))

arr1
[[1 1 1]
 [1 1 1]]
arr2
[[9 9 9]
 [9 9 9]]
axis=0
[[1 1 1]
 [1 1 1]
 [9 9 9]
 [9 9 9]]
axis=1
[[1 1 1 9 9 9]
 [1 1 1 9 9 9]]


![explanation_numpy-concatenate-axis-0](img/explanation_numpy-concatenate-axis-0.png)
![explanation_numpy-concatenate-axis-1](img/explanation_numpy-concatenate-axis-1.png)

- A 1D array has only one axis (axis 0)
![1d-array-has-one-axis](img/1d-array-has-one-axis.webp)

In [187]:
arr1 = np.ones(shape=(3,), dtype=np.int8)
print(arr1)

arr2 = np.full_like(arr1, fill_value=9, dtype=np.int8)
print(arr2)

print(np.concatenate([arr1, arr2], axis=0))
#print(np.concatenate([arr1, arr2], axis=1)) # AxisError: a 1D array has no axis 1

[1 1 1]
[9 9 9]
[1 1 1 9 9 9]


In [188]:
# To stack two 1D arrays vertically, join them and reshape the result
print(np.concatenate([arr1, arr2]).reshape(2,3))

[[1 1 1]
 [9 9 9]]
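NumPy also has functions that do this in one call. `np.stack()` adds a new axis and joins the arrays along it, and `np.vstack()` treats 1D inputs as rows. A sketch using the same two arrays:

```python
import numpy as np

arr1 = np.ones(shape=(3,), dtype=np.int8)
arr2 = np.full_like(arr1, fill_value=9)  # inherits dtype int8 from arr1

# np.stack creates a NEW axis and joins along it
print(np.stack([arr1, arr2], axis=0).shape)  # (2, 3): each input becomes a row
print(np.stack([arr1, arr2], axis=1).shape)  # (3, 2): each input becomes a column

# np.vstack gives the same result as stack(axis=0) for 1D inputs
print(np.vstack([arr1, arr2]))
# [[1 1 1]
#  [9 9 9]]
```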


## 4. Basic array operations

### 4.1. 

### 4.2. Working with mathematical formulas

## 5. Broadcasting

## 6. Save and load Numpy objects

In [None]:
a = np.array([1,2,3, "4"], dtype=np.int32)
a, a.dtype

In [None]:
type(a[0])

### **EXERCISE**

# REFERENCES

1. [Numpy Tutorials for Beginners](https://www.kaggle.com/code/orhansertkaya/numpy-tutorial-for-beginners)
2. [Numpy Docs](https://numpy.org/doc/stable/user/absolute_beginners.html#)
3. [Numpy Axes Explained](https://www.sharpsightlabs.com/blog/numpy-axes-explained/)