# NumPy

NumPy, short for Numerical Python, is a fundamental library for scientific computing in Python. It provides a powerful N-dimensional array object, which serves as the core data structure for efficient numerical computations. NumPy is widely used in data science, machine learning, and scientific research due to its ease of use and performance capabilities.

Compared to native Python lists, NumPy arrays are faster, more memory-efficient, and often easier to use for numerical work. NumPy's core is implemented in C, and it stores elements of a single type in compact, contiguous memory buffers.
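You can see the compact storage by inspecting an array's memory attributes. The sketch below uses an arbitrary array of one million `int64` values. `itemsize`, `nbytes`, and `flags` are standard ndarray attributes.

```python
import numpy as np

# One million 64-bit integers stored in a single contiguous C buffer
a = np.arange(1_000_000, dtype=np.int64)

print(a.itemsize)               # bytes per element: 8
print(a.nbytes)                 # total buffer size: 8000000 bytes
print(a.flags["C_CONTIGUOUS"])  # True: elements sit next to each other in memory

# Summing runs in compiled code; the result matches the pure-Python sum
print(a.sum() == sum(range(1_000_000)))  # True
```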

## Key Features

1. N Dimensional Array
2. Vectorization
3. Broadcasting

### N Dimensional Array

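The ndarray (n-dimensional array) is the fundamental data structure of NumPy.

It is a multi-dimensional, homogeneous array: all of its elements share a single data type (`dtype`), which can be a numeric, boolean, string, or object type. Arrays are most commonly created with the `np.array()` function.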
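Because an ndarray has one `dtype` for all elements, NumPy converts mixed inputs to a common type. Here is a minimal sketch with arbitrary example values:

```python
import numpy as np

ints = np.array([1, 2, 3])
print(ints.dtype.kind)  # 'i' (signed integer; the exact width is platform-dependent)

# Mixing ints and floats: every element is upcast to a common float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Mixing numbers and strings: every element becomes a Unicode string
labels = np.array([1, "a"])
print(labels.dtype.kind)  # 'U'
```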

#### What is an Array?

An array is a data structure that stores a fixed-size sequence of elements of the same type.

It provides a way to organize and access multiple values under a single variable name. Each element in an array is identified by its `position` or `index`.
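Here is a short sketch of position-based access on NumPy arrays, using arbitrary values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

print(arr[0])    # 10 -> first element (indices start at 0)
print(arr[-1])   # 50 -> negative indices count from the end
print(arr[1:4])  # [20 30 40] -> slice covering positions 1, 2, 3

# In a 2d array, each element is addressed by (row, column)
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m[1, 2])   # 6
```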

#### Types of Arrays

- 0d Array, aka Scalar
- 1d Array, aka Vector
- 2d Array, aka Matrix
- 3d / Multi Dimensional Array, aka Tensor

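```python
# 0d array: a single value wrapped in an ndarray
import numpy as np

zero_d_arr = np.array(np.random.randint(100))  # random integer in [0, 100)
# Uncomment to inspect the array's attributes:
# print(zero_d_arr.ndim)   # number of axes
# print(zero_d_arr.shape)  # length along each axis
# print(zero_d_arr.size)   # total number of elements
# print(zero_d_arr.dtype)  # element data type

zero_d_arr
```

```
array(90)
```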

```python
# 1d array
one_d_arr = np.array([1, 2, 3, 4, 5])

one_d_arr
```

```
array([1, 2, 3, 4, 5])
```

```python
# 2d array: 2 rows x 3 columns of random integers in [0, 100)
two_d_arr = np.random.randint(100, size=(2, 3))

two_d_arr
```

```
array([[93,  5, 53],
       [88, 92, 75]])
```

```python
# 3d array: 2 blocks, each with 2 rows x 3 columns
three_d_arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

three_d_arr
```

```
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
```
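The attributes `ndim`, `shape`, and `size` tell these array types apart. The sketch below uses fixed values instead of random ones so the output is reproducible. The `42` is arbitrary.

```python
import numpy as np

arrays = {
    "0d": np.array(42),
    "1d": np.array([1, 2, 3, 4, 5]),
    "2d": np.array([[1, 2, 3], [4, 5, 6]]),
    "3d": np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]),
}

for name, a in arrays.items():
    # ndim: number of axes, shape: length along each axis, size: total elements
    print(name, a.ndim, a.shape, a.size)

# 0d 0 () 1
# 1d 1 (5,) 5
# 2d 2 (2, 3) 6
# 3d 3 (2, 2, 3) 12
```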

### Vectorization

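NumPy lets us apply mathematical and logical operations to entire arrays at once, without writing explicit Python loops. These are called vectorized operations. The per-element loop still happens, but inside NumPy's compiled code.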

[![Vectorization](images/vectorization.png)](https://youtu.be/auZhd2pPtv0?t=160)

```python
# Add 5 to every element in one vectorized expression
one_d_arr + 5
```

```
array([ 6,  7,  8,  9, 10])
```
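To make the idea concrete, this sketch compares an explicit Python loop with the vectorized expression. It also shows a vectorized comparison used as a boolean mask.

```python
import numpy as np

one_d_arr = np.array([1, 2, 3, 4, 5])

# Explicit Python loop: one interpreted iteration per element
looped = np.array([value + 5 for value in one_d_arr])

# Vectorized: a single expression; the loop runs inside NumPy's compiled code
vectorized = one_d_arr + 5

print(np.array_equal(looped, vectorized))  # True

# Logical operations vectorize the same way and return a boolean array
mask = one_d_arr > 2
print(mask)             # [False False  True  True  True]
print(one_d_arr[mask])  # [3 4 5] -> keep only elements where the mask is True
```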

### Broadcasting

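NumPy supports broadcasting, a mechanism that lets element-wise operations work on arrays of different shapes. NumPy compares shapes starting from the trailing (rightmost) dimension. Two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension is treated as 1. Dimensions of size 1 are then stretched to match the other array without copying data.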
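Here is a minimal sketch of these rules using small hand-picked arrays. The values are arbitrary.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])   # shape (2, 1)

# row is stretched across both rows of matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# col is stretched across all three columns
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]

# (2, 1) and (3,) are both stretched to (2, 3)
print((col + row).shape)  # (2, 3)

# Incompatible trailing dimensions (3 vs 2) raise an error
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("broadcast error:", err)
```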

## Data Normalization

In numerical work, data normalization usually means rescaling values to a common range, for example [0, 1] or [1, 20]. This makes features measured on different scales comparable.

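In NumPy, one way to do this is the `np.interp()` function. It performs one-dimensional linear interpolation, so mapping the range `(x.min(), x.max())` onto a new range `(new_min, new_max)` rescales the data linearly (min-max scaling).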

```python
import numpy as np

# Sample values (e.g. generated with np.random.uniform(low=10, high=1000, size=10))
x = np.array([
    763.59505103, 17.83001676, 267.5175031, 330.46019525, 377.30311704, 987.55159079, 412.13789018, 253.56206665, 776.82909686, 311.93425322
])
print(f"x min: {x.min()}\nx max: {x.max()}\n\n")
# print(f"x value: \n{x}\n\n")

# Map the range [x.min(), x.max()] linearly onto [1, 20]
x_interp = np.interp(x, (x.min(), x.max()), (1, 20))
print(f"x interpolated values: \n{x_interp}")
```

```
x min: 17.83001676
x max: 987.55159079


x interpolated values: 
[15.61196289  1.          5.89219005  7.12544214  8.04324735 20.
  8.72577387  5.61875766 15.87126089  6.76245867]
```


With exactly two end points on each side, `np.interp()` reduces to plain linear interpolation between them and computes:

$$
x_{inp} = new\_min + \frac{(x - old\_min) \times (new\_max - new\_min)}{old\_max - old\_min}
$$

In the example above:

- `new_min = 1`
- `new_max = 20`
- `old_min = 17.83001676` (`x.min()`)
- `old_max = 987.55159079` (`x.max()`)

For `x = 763.59505103` (the first element):
$$
x_{inp} = 1 + \frac{(763.59505103 - 17.83001676) \times (20 - 1)}{987.55159079 - 17.83001676} \approx 15.61196289
$$
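You can check the formula by implementing it directly and comparing the result with `np.interp()`. The sketch below uses a hypothetical helper, `min_max_scale`, that is not part of NumPy. It also shows one behavioral difference: `np.interp()` clamps inputs outside `(old_min, old_max)` to the end values, whereas the plain formula extrapolates.

```python
import numpy as np

def min_max_scale(x, new_min, new_max):
    """Hypothetical helper: rescale x so x.min() -> new_min and x.max() -> new_max.

    Implements the formula above directly; assumes x.max() > x.min().
    """
    old_min, old_max = x.min(), x.max()
    return new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min)

x = np.array([
    763.59505103, 17.83001676, 267.5175031, 330.46019525, 377.30311704,
    987.55159079, 412.13789018, 253.56206665, 776.82909686, 311.93425322,
])

manual = min_max_scale(x, 1, 20)
with_interp = np.interp(x, (x.min(), x.max()), (1, 20))

# Both approaches agree up to floating-point rounding
print(np.allclose(manual, with_interp))  # True

# Out-of-range inputs: np.interp clamps them to the end values 1 and 20
new_points = np.array([0.0, 1000.0])
clamped = np.interp(new_points, (x.min(), x.max()), (1, 20))
print(clamped)
```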