# Introduction to NumPy (Numerical Python)

🤖 `Notebook by` [Ihsanul Haque](https://www.linkedin.com/in/ihsanul09/)

✅ `Machine Learning Source Codes` [GitHub](https://github.com/ihsanulcode/ML-Batch-2)

📌 `Machine Learning from Scratch` [Course Outline](https://docs.google.com/document/d/15mGNTUSlWQsy4TzcLZUdYedpCMO5KiVq1USaDprHaIc/edit?usp=sharing)

## What is NumPy?
NumPy is a fundamental Python library for numerical computing. It provides multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on them.

## Why Use NumPy?


* **Performance:** Python lists can serve the purpose of arrays, but they are slow to process. NumPy aims to provide an array object that is up to `50x faster` than traditional Python lists. NumPy's array operations are implemented in highly optimized compiled code, making computations significantly faster than native Python operations.
* **Data Manipulation:** Its powerful array operations and tools facilitate data manipulation, transformation, and analysis, making it a cornerstone for data scientists, engineers, and researchers.
* **Scientific Computing:** NumPy is extensively used in scientific computing, machine learning, image processing, and various domains where numerical data processing is essential.
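
A rough way to check the performance claim on your own machine is to time the same element-wise computation on a list and on an array. This is only an illustrative sketch: the array size, the repetition count, and the variable names are arbitrary choices that do not come from the notebook, and the measured speed-up depends on the hardware and the NumPy version.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
# int64 is requested explicitly so the squares cannot overflow on any platform
np_arr = np.arange(n, dtype=np.int64)

# Square every element: a list comprehension versus one vectorized operation
list_result = [x * x for x in py_list]
arr_result = np_arr * np_arr

# Time 20 repetitions of each approach (results vary from machine to machine)
list_time = timeit.timeit(lambda: [x * x for x in py_list], number=20)
arr_time = timeit.timeit(lambda: np_arr * np_arr, number=20)

print(f"list:  {list_time:.4f} s")
print(f"numpy: {arr_time:.4f} s")
```

Both approaches produce the same values; only the time taken differs.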




## Key Features of NumPy


1. **Multi-dimensional Arrays:** NumPy's primary object is the `ndarray` (n-dimensional array), which allows you to store and manipulate large datasets efficiently.
2. **Efficient Operations:** NumPy provides a wide range of mathematical functions that operate on entire arrays, making numerical computations faster and more concise compared to traditional Python lists.
3. **Linear Algebra and Fourier Transforms:** It includes functions for linear algebra, random number generation, Fourier analysis, and more, which are crucial for scientific computing and data analysis tasks.
4. **Integration with Other Libraries:** NumPy integrates seamlessly with other data science and machine learning libraries like pandas, SciPy, Matplotlib, and scikit-learn, forming the backbone of many data manipulation workflows.


## Why is NumPy Faster Than Lists?
Unlike lists, NumPy arrays are stored in one contiguous block of memory, so processes can access and manipulate them very efficiently. This behavior is called locality of reference in computer science, and it is the main reason why NumPy is faster than lists. NumPy is also optimized to work with the latest CPU architectures. Although NumPy is a Python library and is written partially in Python, most of the parts that require fast computation are written in C or C++.
`Source: https://www.w3schools.com/`
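
The contiguous layout described above can be inspected directly through a few `ndarray` attributes. The example array below is made up for illustration, and `dtype=np.int64` is set explicitly so that the byte counts are the same on every platform.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(arr.flags['C_CONTIGUOUS'])  # True: all elements sit in one block, row by row
print(arr.itemsize)               # 8 bytes per int64 element
print(arr.strides)                # (24, 8): bytes to step to the next row / next column
print(arr.nbytes)                 # 48 = 6 elements * 8 bytes
```

A Python list, by contrast, stores references to separate objects, so its elements are not guaranteed to sit next to each other in memory.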

## Installation of NumPy
Make sure that Python is already installed.

Install it using command line: `pip install numpy`

Install in notebook: `!pip install numpy`

## Check NumPy
In a Jupyter Notebook cell, type and execute: `!conda list | grep numpy` (for the Anaconda distribution)

## Import NumPy
Once NumPy is installed, import it in your applications by adding the `import` keyword: `import numpy`

In [1]:
import numpy

# Creating a numpy array
arr = numpy.array([1,2,3,4,5])
print(arr)

[1 2 3 4 5]


In [2]:
# Numpy is usually imported under the np alias
import numpy as np

# Access Numpy library using np
arr = np.array([1,2,3,4,5])
print(arr)

[1 2 3 4 5]


## Checking NumPy Version

In [3]:
print(np.__version__)

1.21.5


## NumPy Creating Arrays

In [4]:
arr = np.array([1,2,3,4,5])
print(arr)
print(type(arr))

[1 2 3 4 5]
<class 'numpy.ndarray'>


## Dimensions in Arrays

In [5]:
# 0-D Arrays
arr = np.array(42)
print(arr)

# Check the number of dimensions
print(arr.ndim)

42
0


In [7]:
# 1-D Arrays
arr1 = np.array([1,2,3,4,5])
print(arr1)
print(arr1.ndim)

[1 2 3 4 5]
1


In [8]:
# 2-D Arrays
arr2 = np.array([
    [1,2,3],
    [4,5,6]
])
print(arr2)
print(arr2.ndim)

[[1 2 3]
 [4 5 6]]
2


In [9]:
# 3-D Arrays
arr3 = np.array([
    [
        [1,2,3],
        [4,5,6]
    ],
    [
        [1,2,3],
        [4,5,6]
    ]
])
print(arr3)
print(arr3.ndim)

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
3


In [10]:
print(type(arr))
print(type(arr1))
print(type(arr2))
print(type(arr3))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


## NumPy Array Shape

In [11]:
print(arr.shape)
print(arr1.shape)
print(arr2.shape)
print(arr3.shape)

()
(5,)
(2, 3)
(2, 2, 3)


## NumPy Reshape

In [12]:
import numpy as np

# Creating a Numpy array
arr = np.array([1,2,3,4,5,6])
print(arr)

# Reshaping the array to a 2x3 matrix
reshaped_Arr = arr.reshape(2,3)
print(reshaped_Arr)

[1 2 3 4 5 6]
[[1 2 3]
 [4 5 6]]
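
Two further details of `reshape` are worth a quick sketch: passing `-1` lets NumPy infer one dimension, and a target shape whose element count does not match raises a `ValueError`. The 12-element array and the variable names here are illustrative assumptions, not part of the notebook.

```python
import numpy as np

arr = np.arange(1, 13)  # 12 elements: 1 through 12

# -1 asks NumPy to infer that dimension: 12 elements / 3 rows = 4 columns
inferred = arr.reshape(3, -1)
print(inferred.shape)

# reshape(-1) flattens back to one dimension
flat = inferred.reshape(-1)
print(flat.shape)

# 5 * 3 = 15 does not equal 12, so this reshape is rejected
error_message = None
try:
    arr.reshape(5, 3)
except ValueError as err:
    error_message = str(err)
    print("Cannot reshape:", error_message)
```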


## NumPy Array Indexing

In [13]:
# Mixed types are coerced to a single common dtype (here, strings)
arr = np.array([1,1.5,'True', True])
arr

array(['1', '1.5', 'True', 'True'], dtype='<U32')

In [14]:
arr = np.array([1,2,3,4,5])
print(arr[0])
print(arr[1]+arr[3])

1
6


In [16]:
# Access 2D arrays
arr2 = np.array([
    [1,2,3],
    [4,5,6]
])
print(arr2[0,1])

2


In [18]:
# 3-D Arrays
arr3 = np.array([
    [
        [1,2,3],
        [4,5,6]
    ],
    [
        [1,2,3],
        [4,5,6]
    ]
])
print(arr3[0,1,2])
print(arr3[1,0,1])

6
2


In [21]:
# Negative Indexing
arr = np.array([
    [1,2,3,4,5],
    [6,7,8,9,10]
])
print(arr[0,-2])
print(arr[1,-3])

4
8


## NumPy Array Iterating

In [22]:
# For loop
import numpy as np

arr = np.array([
    [1,2,3,4,5],
    [6,7,8,9,10]
])

# Iterating through a 2D array using a nested loop
for row in arr:
    for item in row:
        print(item)

1
2
3
4
5
6
7
8
9
10


In [24]:
# 1D array
arr = np.array([1,2,3])
for i in arr:
    print(i)

1
2
3


In [26]:
# 3D iter
arr3 = np.array([
    [
        [1,2,3],
        [4,5,6]
    ],
    [
        [1,2,3],
        [4,5,6]
    ]
])

for x in arr3:
    for y in x:
        for z in y:
            print(z)

1
2
3
4
5
6
1
2
3
4
5
6


## NumPy Array Slicing
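`[:end]`: from index 0 up to, but not including, `end`

`[start:end]`: from `start` up to, but not including, `end`

`[start:]`: from `start` through the last element

`[::step]`: the whole array, taking every `step`-th element

`[start:end:step]`: from `start` up to, but not including, `end`, taking every `step`-th element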


In [28]:
# Creating a NumPy array
arr = np.array([1,2,3,4,5,6,7,8,9])

# Slicing from the beginning up to (but not including) index 6
s1 = arr[:6]
print(s1)

# Slicing from index 2 up to (but not including) index 5
s2 = arr[2:5]
print(s2)

# Slicing from index 3 to end
s3 = arr[3:]
print(s3)

# Slicing with a step of 2
s4 = arr[::2]
print(s4)

# Slicing with start and step (start:end:step with end omitted)
s5 = arr[2::2]
print(s5)

[1 2 3 4 5 6]
[3 4 5]
[4 5 6 7 8 9]
[1 3 5 7 9]
[3 5 7 9]


In [31]:
# Creating a 2D Numpy Array and Apply Slicing
arr_2d = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])

# Selecting a single row (index 1)
row = arr_2d[1]
print(row)
# Selecting a single column (index 1)
col = arr_2d[:,1]
print(col)
# Slicing a sub-matrix:
# the first 2 rows, and the 2nd and 3rd columns
matrix = arr_2d[:2, 1:]
print(matrix)

[4 5 6]
[2 5 8]
[[2 3]
 [5 6]]
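
One point the slicing examples do not show is that a basic slice is a view onto the original data rather than a copy. The sketch below reuses the same 3x3 array; the names `view` and `independent` are chosen here for illustration.

```python
import numpy as np

arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# A basic slice is a view: writing through it changes the original array
view = arr_2d[:2, 1:]
view[0, 0] = 99
print(arr_2d[0, 1])  # 99

# .copy() creates independent data, so the original is left untouched
independent = arr_2d[:2, 1:].copy()
independent[0, 0] = -1
print(arr_2d[0, 1])  # still 99
```

Call `.copy()` whenever a slice is going to be modified and the original array must stay intact.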


## NumPy Data Types

**Numeric Data Types:**
* int8, int16, int32, int64: Signed integers of 8, 16, 32, or 64 bits respectively.
* uint8, uint16, uint32, uint64: Unsigned integers of 8, 16, 32, or 64 bits respectively.
* float16, float32, float64: Floating point numbers of 16, 32, or 64 bits respectively.
* complex64, complex128: Complex numbers using 32 or 64 bits for each part.
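
The bit widths listed above determine both the range of values a type can hold and how much memory an array uses. As a small sketch (the sample arrays are invented for illustration), `np.iinfo` and `np.finfo` report these limits:

```python
import numpy as np

# Value ranges of 8-bit signed and unsigned integers
int8_info = np.iinfo(np.int8)
uint8_info = np.iinfo(np.uint8)
print(int8_info.min, int8_info.max)    # -128 127
print(uint8_info.min, uint8_info.max)  # 0 255

# Machine epsilon: the gap between 1.0 and the next representable float32
print(np.finfo(np.float32).eps)

# The same three values take 1 byte each as int8 and 8 bytes each as int64
small = np.array([1, 2, 3], dtype=np.int8)
large = small.astype(np.int64)
print(small.nbytes, large.nbytes)      # 3 24
```

Picking the smallest type that safely holds the data can reduce memory use considerably for large arrays.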

In [32]:
a = np.array([1,2,3])
print(a.dtype)
b = np.array([1.5,13.3])
print(b.dtype)

int32
float64


In [35]:
# Specifying data types in Numpy arrays
arr_int32 = np.array([1,2,3], dtype=np.int32)
arr_float32= np.array([1.5,2.7,3.2],dtype=np.float32)
print(arr_int32,arr_int32.dtype)
print(arr_float32, arr_float32.dtype)

[1 2 3] int32
[1.5 2.7 3.2] float32


In [37]:
# Conversion between data types

arr = np.array([1,2,3])
# Convert to float64
arr_float64 = arr.astype(np.float64)
print(arr_float64.dtype)

float64


## Joining NumPy Arrays

In [38]:
# Concatenation using np.concatenate

arr1 = np.array([1,2,3])
arr2 = np.array([4,5])
result = np.concatenate((arr1,arr2))
result

array([1, 2, 3, 4, 5])
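
For 2-D arrays, `np.concatenate` takes an `axis` argument that decides whether the arrays are stacked as extra rows or as extra columns. The arrays `a` and `b` below are made-up examples.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 appends b below a (more rows); axis=1 appends b beside a (more columns)
rows_joined = np.concatenate((a, b), axis=0)
cols_joined = np.concatenate((a, b), axis=1)

print(rows_joined.shape)  # (4, 2)
print(cols_joined.shape)  # (2, 4)

# np.vstack and np.hstack give the same results for 2-D inputs
same_rows = np.array_equal(np.vstack((a, b)), rows_joined)
same_cols = np.array_equal(np.hstack((a, b)), cols_joined)
print(same_rows, same_cols)  # True True
```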

## Splitting NumPy Arrays

In [39]:
# Creating data using np.arange
arr = np.arange(1,10)
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [40]:
# Splitting the array into 3 equal parts
split_arr = np.split(arr, 3)
split_arr

[array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]

In [41]:
# Unequal splitting with np.array_split
unequal_split = np.array_split(arr,4)
unequal_split

[array([1, 2, 3]), array([4, 5]), array([6, 7]), array([8, 9])]
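
The difference between the two functions shows up when the array cannot be divided evenly: `np.split` raises a `ValueError`, which is why `np.array_split` is used above. `np.split` also accepts a list of indices to cut at. The sketch below uses the same 9-element array; the cut points are an arbitrary illustration.

```python
import numpy as np

arr = np.arange(1, 10)  # 9 elements

# 9 elements cannot be divided into 4 equal parts, so np.split refuses
split_failed = False
try:
    np.split(arr, 4)
except ValueError as err:
    split_failed = True
    print("np.split failed:", err)

# Passing indices instead cuts before index 2 and before index 5
parts = np.split(arr, [2, 5])
for part in parts:
    print(part)
```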

## NumPy Searching Arrays

In [42]:
# np.where
arr = np.arange(1,10)
# Finding indices where the value is greater than 3
index = np.where(arr>3)
index

(array([3, 4, 5, 6, 7, 8], dtype=int64),)

In [44]:
arr = np.array([5,7,13,1,2,3])
# Finding indices where the value is greater than 12
index = np.where(arr>12)
index

(array([2], dtype=int64),)

## Array Masking

In [45]:
# Create a boolean mask
mask = arr>12
filtered_values = arr[mask]
filtered_values

array([13])
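
Masks can also be combined. The sketch below, using the same sample values, relies on the element-wise operators `&` (and), `|` (or), and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

arr = np.array([5, 7, 13, 1, 2, 3])

# Values greater than 2 AND less than 10
between = arr[(arr > 2) & (arr < 10)]
print(between)    # [5 7 3]

# Values less than 2 OR greater than 10
outside = arr[(arr < 2) | (arr > 10)]
print(outside)    # [13  1]

# NOT less than 5
not_small = arr[~(arr < 5)]
print(not_small)  # [ 5  7 13]
```

Python's `and`, `or`, and `not` keywords do not work element-wise on arrays, so the symbol operators are required here.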

## NumPy Sorting Arrays

In [46]:
arr = np.array([3,4,1,2,5])
sorted_arr = np.sort(arr)
sorted_arr

array([1, 2, 3, 4, 5])

In [47]:
# Sort in descending order by reversing the ascending result
ulta_sort = np.sort(arr)[::-1]
ulta_sort

array([5, 4, 3, 2, 1])

In [48]:
arr_2d = np.array([[3,1,6], [2,5,4]])

# Sort along rows (axis=1)
sorted_rows = np.sort(arr_2d, axis=1)
sorted_rows

array([[1, 3, 6],
       [2, 4, 5]])

In [49]:
# Sort along columns (axis=0)
sorted_cols = np.sort(arr_2d, axis=0)
sorted_cols

array([[2, 1, 4],
       [3, 5, 6]])
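
When the order itself is needed, for example to reorder a second array in the same way, `np.argsort` returns the indices that would sort the array instead of the sorted values. The `names` array below is a hypothetical companion array added for illustration.

```python
import numpy as np

arr = np.array([3, 4, 1, 2, 5])
names = np.array(['c', 'd', 'a', 'b', 'e'])  # hypothetical labels, one per value

# Indices that would sort arr in ascending order
order = np.argsort(arr)
print(order)         # [2 3 0 1 4]

# Applying the same order to both arrays keeps values and labels aligned
print(arr[order])    # [1 2 3 4 5]
print(names[order])  # ['a' 'b' 'c' 'd' 'e']
```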

## Array Operations

In [51]:
arr1 = np.array([1,2,3])
arr2 = np.array([1,2,3])

add = arr1+arr2
add

array([2, 4, 6])

In [52]:
arr1 = np.array([1,2,3])
arr2 = np.array([1,2,3])

mul = arr1*arr2
mul

array([1, 4, 9])

In [53]:
arr1 = np.array([1,2,3])
arr2 = np.array([1,2,3])

sqrt = np.sqrt(arr1)
sqrt

array([1.        , 1.41421356, 1.73205081])

In [55]:
arr = np.array([[1,2,3], [4,5,6]])
scalar = 10
result = arr+scalar
result

array([[11, 12, 13],
       [14, 15, 16]])

In [56]:
row_to_add = np.array([10,20,30])
res = arr+row_to_add
res

array([[11, 22, 33],
       [14, 25, 36]])
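
The last two cells rely on broadcasting: NumPy stretches the smaller operand so that its shape matches the larger one. As a sketch of the rule, shapes are compared from the last dimension backwards, and each pair of sizes must either be equal or contain a 1. The column vector and the mismatched array below are invented for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

# A (2, 1) column is stretched across the 3 columns of arr
col_to_add = np.array([[100], [200]])    # shape (2, 1)
res = arr + col_to_add
print(res)

# Shapes (2, 3) and (2,) are incompatible: the last dimensions are 3 and 2
broadcast_failed = False
try:
    arr + np.array([10, 20])
except ValueError as err:
    broadcast_failed = True
    print("Broadcast failed:", err)
```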

## Descriptive Statistics:
Mean, Median, Variance, and Standard Deviation

In [57]:
data = np.array([10,20,30,40,50])
# Mean, median, variance, and standard deviation
mean = np.mean(data)
median = np.median(data)
variance = np.var(data)
std_dev = np.std(data)

print(mean)
print(median)
print(variance)
print(std_dev)

30.0
30.0
200.0
14.142135623730951
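
By default, `np.var` and `np.std` compute the population statistics, dividing by `n`. Passing `ddof=1` divides by `n - 1` instead, which gives the sample variance and standard deviation. The sketch below reuses the same data; the 2-D `table` is an invented example showing the `axis` argument.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Population variance divides by n = 5; sample variance divides by n - 1 = 4
population_variance = np.var(data)
sample_variance = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)
print(population_variance)  # 200.0
print(sample_variance)      # 250.0
print(sample_std)

# With a 2-D array, axis picks the direction the statistic is computed along
table = np.array([[1, 2, 3], [4, 5, 6]])
col_means = table.mean(axis=0)  # one mean per column
row_means = table.mean(axis=1)  # one mean per row
print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
```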


## Handling Missing Values

In [59]:
data = np.array([1,2,np.nan,4,5,np.nan,7])

# Checking for NAN values
nan_count = np.sum(np.isnan(data))
nan_count

2

In [60]:
# Removing NaN values
data_cleaned = data[~np.isnan(data)]
data_cleaned

array([1., 2., 4., 5., 7.])

In [61]:
# Handling missing values

# Calculate the mean, ignoring NaN values
mean = np.nanmean(data)
# Replace NaN values with the mean
data_filled = np.where(np.isnan(data), mean, data)
data_filled

array([1. , 2. , 3.8, 4. , 5. , 3.8, 7. ])

# Suggested Readings
`NumPy official documentation` https://numpy.org/doc/stable/user/absolute_beginners.html

# Thank you
© [Dataque Academy](https://www.facebook.com/dataque.academy)