# Module 03 - Scientific Computing with NumPy

---

#### <a href="linkedin.com/in/tasmim-rahman-adib-403074221">Tasmim Rahman Adib</a>
![numpylogo](../img/numpy.jpeg)

# Lecture 3.1 - The Basics of NumPy Arrays
## Agenda
- Introduction
- Getting Started
- Creating Arrays
- Data Types

# 3.1.1. Introduction

## What is NumPy?
- NumPy is a Python library used for working with arrays.

- It also provides functions for linear algebra, Fourier transforms, and matrix operations.

- NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.

- NumPy stands for Numerical Python.

## Why is NumPy Faster Than Lists?
- NumPy arrays are stored in one contiguous block of memory, unlike lists, so programs can access and manipulate them very efficiently.

- This property is known as locality of reference in computer science.

- This is a main reason why NumPy is faster than lists. NumPy is also optimized to work with modern CPU architectures.
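
The contiguous layout described above can be inspected on any array. The following is a minimal sketch, assuming NumPy is already installed (installation is covered in the next section); the variable name `a` is only illustrative.

```python
import numpy as np

# A small 1D array with an explicit 8-byte integer type
a = np.arange(5, dtype=np.int64)

print(a.flags["C_CONTIGUOUS"])  # True: the elements sit in one block of memory
print(a.itemsize)               # 8: every element occupies the same number of bytes
print(a.strides)                # (8,): step in bytes from one element to the next
```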

# 3.1.2. Getting Started

In [3]:
# install numpy
!pip install numpy

Collecting numpy
  Using cached numpy-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.3 MB)
Installing collected packages: numpy
Successfully installed numpy-2.1.2


In [1]:
# use following import conventions
import numpy as np

In [5]:
# checking numpy version
np.__version__

'2.1.2'

## Calculation

### The `+` operator concatenates Python lists, while NumPy arrays add element-wise

In [6]:
# add 2 lists 
L1 = [1, 2, 3]
L2 = [4, 5, 6]
print(L1+L2)

[1, 2, 3, 4, 5, 6]


In [8]:
# element wise sum using numpy array

A1 = np.array([1, 2, 3]) # Array initialization
A2 = np.array([4, 5, 6])
print(A1 + A2)  # element-wise sum

[5 7 9]


## Less Memory Consumption

In [10]:
import sys


# Python list with 1,000,000 elements
py_list = list(range(1000000))

python_memory = sys.getsizeof(py_list) + sum(sys.getsizeof(item) for item in py_list)

# NumPy array with 1,000,000 elements
np_array = np.arange(1000000)

numpy_memory = np_array.nbytes

# Print memory usage
print(f"Memory used by Python list: {python_memory} bytes")
print(f"Memory used by NumPy array: {numpy_memory} bytes")

Memory used by Python list: 36000052 bytes
Memory used by NumPy array: 8000000 bytes


### Explanation:
- A Python list stores references (pointers) to objects, and each integer object in Python contains metadata, leading to more memory overhead.
- NumPy arrays, on the other hand, store raw data in a contiguous block, using a fixed size for each element, which leads to significantly less memory usage.
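
Because each element has a fixed size, the memory used by an array follows directly from its data type. The sketch below illustrates this with two integer types; the names `a64` and `a32` are illustrative and not part of the original example.

```python
import numpy as np

n = 1_000_000
a64 = np.arange(n, dtype=np.int64)  # 8 bytes per element
a32 = np.arange(n, dtype=np.int32)  # 4 bytes per element

print(a64.nbytes)  # 8000000
print(a32.nbytes)  # 4000000
```

Choosing a smaller data type halves the memory here, at the cost of a smaller range of representable values.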

## Faster Execution 

In [11]:
import time


# Raw Python list
list1 = list(range(10000000))
list2 = list(range(10000000))

# Measure time for Python list
start_time = time.time()
result_list = [x + y for x, y in zip(list1, list2)]
end_time = time.time()
print(f"Python list took: {end_time - start_time:.5f} seconds")

# NumPy array
array1 = np.arange(10000000)
array2 = np.arange(10000000)

# Measure time for NumPy array
start_time = time.time()
result_array = array1 + array2
end_time = time.time()
print(f"NumPy array took: {end_time - start_time:.5f} seconds")


Python list took: 0.85032 seconds
NumPy array took: 0.07978 seconds


### Explanation:
- **Python List:** In Python, lists are general-purpose containers that do not have fixed-size elements. As a result, performing operations like element-wise addition requires looping through each element and performing the addition one by one in Python, which adds significant overhead.

- **NumPy Array:** NumPy uses vectorized operations, meaning that the addition of two arrays is done in highly optimized C code under the hood. This eliminates the need for explicit loops, resulting in much faster execution.
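
A single `time.time()` measurement can be noisy. As a hedged alternative, the same comparison can be sketched with the standard library's `timeit` module, which repeats each statement several times. The array size and repeat count below are arbitrary choices made so the sketch runs quickly; actual timings depend on the machine.

```python
import timeit
import numpy as np

n = 100_000  # smaller than the example above so the sketch runs quickly
list1, list2 = list(range(n)), list(range(n))
array1, array2 = np.arange(n), np.arange(n)

# Run each statement 20 times and report the total elapsed time
t_list = timeit.timeit(lambda: [x + y for x, y in zip(list1, list2)], number=20)
t_array = timeit.timeit(lambda: array1 + array2, number=20)

print(f"Python list: {t_list:.5f} s, NumPy array: {t_array:.5f} s")
```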

# 3.1.3. Creating Arrays
 
- **Array:** An ordered, fixed-length collection of elements that share one basic data type.
- **Syntax**
```python 
np.array(object)
```

![ndarray](img/arrays.png)

In [12]:
# Creating 1D array
A = np.array([1, 2, 3])
A 

array([1, 2, 3])

In [13]:
type(A)

numpy.ndarray

## Array with Categorical Entities 
- NumPy arrays can hold categorical (non-numeric) values such as strings.
- All elements are coerced to a single data type, so mixing numbers and strings produces an array of strings.

In [16]:
# create an array with categorical entities. 
X = np.array([12, 13, "Jubayer"])
print(X)

['12' '13' 'Jubayer']


In [17]:
# type 
print(type(X))

<class 'numpy.ndarray'>
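
The coercion rule can be checked through the `dtype` attribute. The sketch below is illustrative; the variable names are assumptions, and `dtype=object` is shown only as one way to keep the original Python objects.

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.5])   # integers are promoted to float
print(ints_and_floats.dtype)              # float64

numbers_and_text = np.array([12, 13, "Jubayer"])
print(numbers_and_text.dtype.kind)        # U: every element is now a Unicode string

# dtype=object keeps the original Python objects instead of coercing them
mixed = np.array([12, 13, "Jubayer"], dtype=object)
print(type(mixed[0]), type(mixed[2]))     # <class 'int'> <class 'str'>
```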


In [18]:
# Creating 2D array
A2 = np.array([[3, 4, 5], [7, 8, 9]])
print(A2)

[[3 4 5]
 [7 8 9]]


In [19]:
# Creating 3D array
A3 = np.array([[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]])
print(A3) 

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


## Inspecting array properties

### Size
- Returns number of elements in array
- **Syntax:** `array.size`

In [21]:
A1 = np.array([1, 2, 3, 4, 5])
# size 
A1.size

5

### Shape
- Returns the length of the array along each dimension as a tuple; for a 2D array this is (rows, columns)
- **Syntax:** `array.shape`

In [22]:
A2 = np.array([[4, 5, 6], [7, 8, 9]])
# shape 
A2.shape 

(2, 3)

In [24]:
# get the number of rows
A2.shape[0]

2

In [25]:
# get the number of columns
A2.shape[1]

3
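
`size` and `shape` are related: the size is the product of the entries in the shape. The sketch below also uses the `ndim` attribute, which is not covered above, to show the number of dimensions; the 3D array is an illustrative addition.

```python
import numpy as np

A2 = np.array([[4, 5, 6], [7, 8, 9]])
A3 = np.zeros((2, 2, 3))

print(A2.ndim, A2.shape, A2.size)  # 2 (2, 3) 6
print(A3.ndim, A3.shape, A3.size)  # 3 (2, 2, 3) 12
```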

## Type Conversion 
 - Convert array elements to type dtype
 - **Syntax:** `array.astype(dtype)`
     - dtype - data type 

In [26]:
A3 = np.array([11, 12, 33, 44])
# get data types 
A3.dtype

dtype('int64')

In [29]:
# change it to float
A4 = A3.astype(np.float16)

In [30]:
A4.dtype

dtype('float16')

In [31]:
A4

array([11., 12., 33., 44.], dtype=float16)
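
Converting in the other direction, from float to integer, discards the fractional part. A minimal sketch with illustrative values, which also shows that `astype()` returns a new array and leaves the original unchanged:

```python
import numpy as np

F = np.array([1.7, -1.7, 2.2])
I = F.astype(np.int64)  # fractional part is dropped (truncation toward zero)

print(I)  # [ 1 -1  2]
print(F)  # [ 1.7 -1.7  2.2]  (the original array is unchanged)
```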

## Generate arrays using `zeros()`
- Returns an array of given shape and type filled with zeros 
- **Syntax:** `np.zeros(shape, dtype)`
    - shape - integer or sequence of integers
    - dtype - data type(default: float)

In [32]:
# 1D array of length 10 with all values 0 
Z1 = np.zeros(10)
print(Z1)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [33]:
# 2D array of 3x4 with all values 0 
Z2 = np.zeros((3,4))
print(Z2)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


## Generate arrays using `ones()`
- Returns an array of given shape and type filled with ones 
- **Syntax:** `np.ones(shape, dtype)`
    - shape - integer or sequence of integers 
    - dtype - data type(default: float) 

In [34]:
# 1D array of length 10 with all values 1
A1 = np.ones(10)  
print(A1) 

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [35]:
# 2D array of 3x4 with all values 1
A2 = np.ones((3, 4))
print(A2) 

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
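
Both `zeros()` and `ones()` accept the `dtype` argument listed in their syntax. A short sketch, with illustrative variable names, that overrides the default float type:

```python
import numpy as np

zeros_int = np.zeros((2, 3), dtype=np.int32)  # integer zeros instead of the default float
ones_bool = np.ones(4, dtype=bool)            # boolean ones, i.e. all True

print(zeros_int.dtype, ones_bool.dtype)  # int32 bool
```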


## Generate arrays using `arange()`
- Returns equally spaced numbers within the given range based on the step size. 
- **Syntax:** `np.arange(start, stop, step)`
    - start - start of the interval (included; default: 0)
    - stop - end of the interval (excluded)
    - step - spacing between values (default: 1)

In [36]:
# start and step not specified (defaults: start=0, step=1)
A1 = np.arange(10)
print(A1)

[0 1 2 3 4 5 6 7 8 9]


In [40]:
A = np.arange(1, 10, 2)
print(A)

[1 3 5 7 9]


In [38]:
# specifying start, stop, and step as keyword arguments
A2 = np.arange(start=1, stop=10, step=2)
print(A2)

[1 3 5 7 9]


In [39]:
# another way 
A3 = np.arange(10, 25, 2)
print(A3)

[10 12 14 16 18 20 22 24]
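
A few further cases may help clarify how `arange()` treats its arguments. This is a small sketch with illustrative values; the step sizes are chosen so the results are exact.

```python
import numpy as np

excl = np.arange(0, 10, 5)     # stop value 10 is not included
frac = np.arange(0, 1, 0.25)   # a float step is allowed
down = np.arange(5, 0, -1)     # a negative step counts down

print(excl)  # [0 5]
print(frac)  # [0.   0.25 0.5  0.75]
print(down)  # [5 4 3 2 1]
```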


## Generate arrays using `linspace()`
- Returns equally spaced numbers within the given range based on the number of samples. 
- **Syntax:**  `np.linspace(start, stop, num, dtype, retstep)`
    - start - start of the interval 
    - stop - end of the interval (included by default)
    - num - number of samples to generate 
    - dtype - data type of the output array 
    - retstep - if True, return a tuple of (samples, step size) 

In [41]:
# array of evenly spaced values from 0 to 2, here sample size = 50
L1 = np.linspace(0,2,50)
print(L1)

[0.         0.04081633 0.08163265 0.12244898 0.16326531 0.20408163
 0.24489796 0.28571429 0.32653061 0.36734694 0.40816327 0.44897959
 0.48979592 0.53061224 0.57142857 0.6122449  0.65306122 0.69387755
 0.73469388 0.7755102  0.81632653 0.85714286 0.89795918 0.93877551
 0.97959184 1.02040816 1.06122449 1.10204082 1.14285714 1.18367347
 1.2244898  1.26530612 1.30612245 1.34693878 1.3877551  1.42857143
 1.46938776 1.51020408 1.55102041 1.59183673 1.63265306 1.67346939
 1.71428571 1.75510204 1.79591837 1.83673469 1.87755102 1.91836735
 1.95918367 2.        ]


In [44]:
# Array of 6 evenly divided values from 0 to 100
L2 = np.linspace(0, 100, 6)
print(L2) 

[  0.  20.  40.  60.  80. 100.]
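
The `retstep` option returns the spacing together with the samples. The sketch below also uses the `endpoint` keyword, which is not listed in the syntax above, to exclude the stop value; the variable names are illustrative.

```python
import numpy as np

samples, step = np.linspace(0, 1, 5, retstep=True)
print(samples)  # [0.   0.25 0.5  0.75 1.  ]
print(step)     # 0.25

# endpoint=False leaves out the stop value, so the spacing becomes 1/5
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)
```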


## Generate constant arrays using `full()` 
- Return a new array of given shape and type, filled with `fill_value`. 
- **Syntax:** `np.full(shape, fill_value, dtype)`
    - shape - Shape of the new array, e.g., ``(2, 3)`` or ``2``.
    - fill_value - Fill value (scalar).
    - dtype - The desired data-type for the array

In [45]:
# generate 2x2 constant array, constant = 7
C = np.full((2, 2), 7)
print(C)

[[7 7]
 [7 7]]
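
When `dtype` is omitted, the array type follows the fill value. A brief sketch, with illustrative names, comparing the default with an explicit `dtype`:

```python
import numpy as np

C_int = np.full((2, 2), 7)                      # integer fill value gives an integer array
C_float = np.full((2, 2), 7, dtype=np.float64)  # same value stored as float

print(C_int.dtype.kind)  # i
print(C_float.dtype)     # float64
```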


## Creating identity matrix using `eye()`
- An array where all elements are equal to zero, except for the `k`-th
  diagonal, whose values are equal to one
- **Syntax:** `np.eye(N, M, k, dtype)`
    - N : Number of rows(int) in the output
    - M : Number of columns in the output. If None, defaults to `N`.
    - k : Index of the diagonal: 0 (the default) refers to the main diagonal,
      a positive value refers to an upper diagonal, and a negative value
      to a lower diagonal
    - dtype: Data-type of the returned array.

In [46]:
# generate 2x2 identity matrix 
I = np.eye(2)
print(I) 

[[1. 0.]
 [0. 1.]]


In [48]:
# generate 4x4 array with ones on the first upper diagonal (k=1)
I = np.eye(4,4, 1)
print(I) 

[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 0. 0.]]


In [49]:
# generate 4x4 array with ones on the first lower diagonal (k=-1)
I = np.eye(4,4, -1)
print(I) 

[[0. 0. 0. 0.]
 [1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]]
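
The `M` argument allows non-square results. The sketch below shows this and, as an aside, `np.identity()`, a related function not covered above that always returns a square identity matrix.

```python
import numpy as np

rect = np.eye(2, 3)  # 2 rows, 3 columns, ones on the main diagonal
print(rect)
# [[1. 0. 0.]
#  [0. 1. 0.]]

I3 = np.identity(3, dtype=int)  # square identity matrix with integer entries
print(I3.shape)  # (3, 3)
```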


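## Generate arrays using `random.rand()` 
- Returns an array of the given shape filled with random floats drawn uniformly from the interval [0, 1). 
- **Syntax:** `np.random.rand(d0, d1, ..., dn)`
    - d0, d1, ..., dn - length of each dimension, passed as separate integers (not as a tuple) 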

In [50]:
# create an array with randomly generated 5 values 
R = np.random.rand(5)
print(R)

[0.57208239 0.79439259 0.98575167 0.87940967 0.88358243]


In [51]:
# generate 4x5 array of random floats between 0-1
R1 = np.random.rand(4,5)
print(R1)

[[0.84778299 0.41796582 0.74845092 0.41478204 0.28825173]
 [0.79371449 0.3463693  0.03059566 0.74976527 0.64581043]
 [0.73652911 0.64979584 0.0791578  0.61918057 0.11869651]
 [0.92421041 0.37433972 0.66484773 0.41187003 0.33612362]]


In [54]:
# generate 4x5 array of random floats between 0-100
R3 = np.random.rand(4,5)*100
print(R3)

[[82.9442899   7.04712881 85.23209739 24.0210895  14.03878098]
 [86.27694994 34.07413562 17.77590825 33.21216759 53.22717884]
 [ 1.20142348 68.85604131 24.21511252 79.42262897 21.37571009]
 [33.55620457 87.53229529 73.51065966 74.37603789 56.53178069]]


In [55]:
# generate 2x3 array of random ints between 0-4
R4 = np.random.randint(5, size=(2,3))
print(R4)

[[2 4 4]
 [2 2 1]]
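
Random output changes on every run. For reproducible results, one option is NumPy's `Generator` interface created with `np.random.default_rng()`. This is a hedged sketch of an alternative to the functions used above, not a replacement prescribed by this lecture; the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value 42 is arbitrary

floats = rng.random((4, 5))             # uniform floats in [0, 1)
ints = rng.integers(0, 5, size=(2, 3))  # integers from 0 to 4 (5 is excluded)

print(floats.shape, ints.shape)  # (4, 5) (2, 3)
```

Creating another generator with the same seed reproduces the same sequence of values.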


## Generate empty arrays using `empty()`
- Return a new array of given shape and type, without initializing entries.
- **Syntax:** `np.empty(shape, dtype)`
    - shape - integer or tuple of integer
    - dtype - data-type


In [57]:
# generate an uninitialized array of length 2 (the printed values are arbitrary)
E1 = np.empty(2) 
print(E1)

[9.9e-324 1.5e-323]
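
Since `empty()` does not initialize its entries, the values should be assigned before the array is read. A minimal sketch with an illustrative fill value:

```python
import numpy as np

E = np.empty(3)  # contents are arbitrary at this point
E[:] = 1.5       # assign every element before reading the array
print(E)         # [1.5 1.5 1.5]
```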


# 3.1.4 Data Types
## Data Types in Python 
- `strings` - used to represent text data; the text is enclosed in quotation marks, e.g. "ABCD"
- `integer` - used to represent integer numbers, e.g. -1, -2, -3
- `float` - used to represent real numbers, e.g. 1.2, 42.42
- `boolean` - used to represent True or False.
- `complex` - used to represent a number in the complex plane, e.g. 1.0 + 2.0j, 1.5 + 2.5j

## Data Types in NumPy 
- `i` - integer
- `b` - boolean
- `u` - unsigned integer
- `f` - float
- `c` - complex float
- `m` - timedelta
- `M` - datetime
- `O` - object
- `S` - byte string
- `U` - unicode string
- `V` - fixed chunk of memory for other type ( void )
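
These one-letter codes can be combined with a size to form a dtype string: a byte count for numeric types, or a character count for `U`. The sketch below uses illustrative values to show how such strings map to NumPy types.

```python
import numpy as np

print(np.dtype("i4") == np.int32)    # True: 'i' = integer, 4 = bytes per element
print(np.dtype("f8") == np.float64)  # True: 8-byte float
print(np.dtype("U3").itemsize)       # 12: three characters at 4 bytes each

a = np.array([1, 2, 3], dtype="f4")
print(a.dtype, a.dtype.kind)         # float32 f
```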

## Brief Overview of NumPy Data Types 

|Data Type| Description|
|---------|------------|
|bool_|Boolean (True or False) stored as a byte|
|int_|Default integer type (normally either int64 or int32, depending on the platform)|
|intc|Identical to C int (normally int32 or int64)|
|intp|Integer used for indexing (same as C ssize_t; normally either int32 or int64)|
|int8|Byte (-128 to 127)|
|int16|Integer (-32768 to 32767)|
|int32|Integer (-2147483648 to 2147483647)|
|int64|Integer (-9223372036854775808 to 9223372036854775807)|
|uint8|Unsigned integer (0 to 255)|
|uint16|Unsigned integer (0 to 65535)|
|uint32|Unsigned integer (0 to 4294967295)|
|uint64|Unsigned integer (0 to 18446744073709551615)|
|float_|Shorthand for float64 (alias removed in NumPy 2.0; use float64)|
|float16|Half precision float: sign bit, 5 bits exponent, 10 bits mantissa|
|float32|Single precision float: sign bit, 8 bits exponent, 23 bits mantissa|
|float64|Double precision float: sign bit, 11 bits exponent, 52 bits mantissa|
|complex_|Shorthand for complex128 (alias removed in NumPy 2.0; use complex128)|
|complex64|Complex number, represented by two 32-bit floats|
|complex128|Complex number, represented by two 64-bit floats|
## Creating NumPy Array 

In [2]:
# Create a numpy array 
A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(A)

[1 2 3 4 5 6 7 8 9]


In [3]:
# Create a numpy array with strings 
S = np.array(["Jim", "Tim", "Mim"])
print(S)

['Jim' 'Tim' 'Mim']


## Checking the Data Type of an Array
The NumPy array object has a property called `dtype` that returns the data type of the array:

In [4]:
# Check the data type of `A`
A.dtype

dtype('int64')

In [6]:
# Check the data type of `S`
S.dtype

dtype('<U3')

## Creating Arrays With a Defined Data Type
We use the `array()` function to create arrays. This function takes an optional argument, `dtype`, that lets us define the expected data type of the array elements.

In [7]:
# Create an array of integers 
B = np.array([1, 2, 3, 4, 5], dtype=np.int64)
print(B)

[1 2 3 4 5]


In [14]:
# Check data type of `B`
B.dtype

dtype('int64')

In [9]:
# Create an array of floats 
C = np.array([1, 2, 3, 4, 5], dtype='float32')
print(C)

[1. 2. 3. 4. 5.]


In [10]:
# Check the data type of `C`
C.dtype

dtype('float32')

## Find Byte Size of an Array 

In [11]:
# Check byte size of `A`
A.nbytes

72

In [12]:
# Check byte size of `S`
S.nbytes

36

In [13]:
# Check byte size of `B`
B.nbytes

40
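
The values above follow from the relation `nbytes = size * itemsize`. The sketch below recreates two of the arrays from this section to check it.

```python
import numpy as np

S = np.array(["Jim", "Tim", "Mim"])  # dtype '<U3': 3 characters, 4 bytes each
print(S.size, S.itemsize, S.nbytes)  # 3 12 36

B = np.array([1, 2, 3, 4, 5], dtype=np.int64)
print(B.size * B.itemsize == B.nbytes)  # True
```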

## Converting Data Type on Existing Arrays using  `astype()`
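
The following is a minimal sketch of `astype()` applied to existing arrays, assuming arrays like those created earlier in this section. The names `A_small` and `N` are illustrative.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=np.int64)
A_small = A.astype(np.int8)      # the values fit in int8, so nothing is lost
print(A.nbytes, A_small.nbytes)  # 72 9

S = np.array(["1", "2", "3"])
N = S.astype(np.int64)           # numeric strings are parsed into integers
print(N)                         # [1 2 3]
```

`astype()` returns a new array each time, so `A` and `S` keep their original data types.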


*Copyright &copy; 2024  [Md. Jubayer Hossain](https://hossainlab.github.io/) &  [Center for Bioinformatics Learning Advancement and Systematic Training (cBLAST)](https://www.cblast.du.ac.bd/). All rights reserved*