<h3><a href="https://themlpath.com">&lt; The ML Path</a></h3>

<h1>NumPy</h1>

## Introduction to NumPy

NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

### Why NumPy is Needed

- **Performance**: NumPy arrays store elements of a single data type in one contiguous block of memory, so they typically need less memory than Python lists and allow faster computation.
- **Vectorization**: NumPy supports vectorized operations, meaning an operation is applied element-wise to a whole array without an explicit Python loop. This makes code cleaner and faster.
- **Convenience**: It provides many built-in functions for linear algebra, random number generation, and basic array operations, which makes it easier to work with data at scale.
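
As a rough illustration of the vectorization point above, the sketch below doubles a small array twice: once with an explicit Python loop and once with a single array expression. The `prices` data is made up for the example.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])  # hypothetical sample data

# Loop version: one Python-level multiplication per element
doubled_loop = [p * 2 for p in prices.tolist()]

# Vectorized version: one expression applied element-wise
doubled_vec = prices * 2

print("Loop result:", doubled_loop)
print("Vectorized result:", doubled_vec)
```

Both versions produce the same values; the vectorized one simply moves the loop out of Python code.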

### Difference Between NumPy Arrays and Python Lists

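1. **Size**: NumPy arrays store raw values of one data type in a contiguous buffer, whereas a Python list stores references to separate Python objects, so arrays are usually more memory efficient.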
2. **Speed**: Operations on NumPy arrays are faster due to optimized, compiled C code running behind the scenes.
3. **Functionality**: NumPy arrays come with a wide variety of mathematical and statistical functions, while Python lists are limited in built-in functionality.

Let’s explore these differences by example.


In [21]:
# Install numpy

!pip install numpy


## Comparing NumPy Arrays and Python Lists

In [22]:
import numpy as np

# Memory Efficiency
import sys

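# Memory usage of Python list
# Note: sys.getsizeof counts only the list object and its references,
# not the integer objects the list points to
python_list = list(range(1000))
print("Python list memory size:", sys.getsizeof(python_list), "bytes")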
print()

# Memory usage of NumPy array
numpy_array = np.arange(1000)
print("NumPy array memory size:", numpy_array.nbytes, "bytes")
print()

Python list memory size: 8056 bytes

NumPy array memory size: 8000 bytes
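
The two numbers above look almost identical because `sys.getsizeof` on a list does not include the integer objects it refers to. The sketch below adds a rough estimate of those objects. It is only an approximation, since CPython shares some small integers between containers, and the exact sizes depend on the Python build.

```python
import sys
import numpy as np

python_list = list(range(1000))
numpy_array = np.arange(1000)

# Size of the list object itself (header plus one reference per element)
container_bytes = sys.getsizeof(python_list)
# Approximate size of the int objects the references point to
element_bytes = sum(sys.getsizeof(x) for x in python_list)
list_total = container_bytes + element_bytes

print("List container only:", container_bytes, "bytes")
print("List container + int objects (approx.):", list_total, "bytes")
print("NumPy data buffer:", numpy_array.nbytes, "bytes")
```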



In [23]:
# Speed Efficiency
import time

# Creating a large list and array
size = 1000000
python_list = list(range(size))
numpy_array = np.arange(size)

# Timing sum operation for Python list
start = time.time()
sum_list = sum(python_list)
print("Python list sum time:", time.time() - start, "seconds")
print()

# Timing sum operation for NumPy array
start = time.time()
sum_array = np.sum(numpy_array)
print("NumPy array sum time:", time.time() - start, "seconds")
print()

Python list sum time: 0.008683443069458008 seconds

NumPy array sum time: 0.001028299331665039 seconds
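
A single `time.time()` measurement can be noisy. As an alternative sketch, the standard-library `timeit` module repeats each call several times; the repeat counts below are arbitrary choices, and the timings will vary by machine.

```python
import timeit
import numpy as np

size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size, dtype=np.int64)  # explicit dtype to avoid overflow on any platform

calls = 10
# Take the best of 3 runs, each run making `calls` calls
list_time = min(timeit.repeat(lambda: sum(python_list), number=calls, repeat=3))
array_time = min(timeit.repeat(lambda: np.sum(numpy_array), number=calls, repeat=3))

print(f"Python list sum: {list_time / calls:.6f} s per call")
print(f"NumPy array sum: {array_time / calls:.6f} s per call")
```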



In [24]:
# Creating Arrays
# Creating a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr_1d)
print()

# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr_2d)
print()

1D Array: [1 2 3 4 5]

2D Array:
 [[1 2 3]
 [4 5 6]]



## More Ways of Creating NumPy Arrays

In [25]:
# 1. Creating an array with a range of values using arange()
arr_range = np.arange(0, 10, 2)  # Start at 0, stop before 10, step by 2
print("Array using arange():", arr_range)
print()

# 2. Creating an array with evenly spaced values using linspace()
arr_linspace = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both endpoints included
print("Array using linspace():", arr_linspace)
print()

Array using arange(): [0 2 4 6 8]

Array using linspace(): [0.   0.25 0.5  0.75 1.  ]
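
Unlike `arange()`, `linspace()` includes the stop value by default. The short sketch below shows the `endpoint` and `retstep` parameters, which control that behaviour and report the spacing that was used.

```python
import numpy as np

with_end = np.linspace(0, 1, 5)                     # includes 1.0
without_end = np.linspace(0, 1, 5, endpoint=False)  # stops before 1.0
values, step = np.linspace(0, 1, 5, retstep=True)   # also returns the spacing

print("With endpoint:", with_end)
print("Without endpoint:", without_end)
print("Step size:", step)
```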



In [26]:
# 3. Creating an array filled with zeros
arr_zeros = np.zeros((3, 3))  # 3x3 array filled with zeros
print("Array of zeros:\n", arr_zeros)
print()

# 4. Creating an array filled with ones
arr_ones = np.ones((2, 4))  # 2x4 array filled with ones
print("Array of ones:\n", arr_ones)
print()

# 5. Creating an identity matrix using eye()
arr_eye = np.eye(4)  # 4x4 identity matrix
print("Identity matrix:\n", arr_eye)
print()

Array of zeros:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Array of ones:
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]

Identity matrix:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]



In [27]:
# 6. Creating an array of random values in [0, 1)
arr_random = np.random.rand(3, 2)  # 3x2 array of random floats, 0 included and 1 excluded
print("Random array:\n", arr_random)
print()

# 7. Creating an array of random integers
arr_randint = np.random.randint(0, 100, (3, 3))  # 3x3 array of random integers from 0 (inclusive) to 100 (exclusive)
print("Random integer array:\n", arr_randint)
print()

# 8. Creating an array of a constant value
arr_full = np.full((2, 2), 7)  # 2x2 array filled with the value 7
print("Array of constant value:\n", arr_full)
print()

Random array:
 [[0.3108685  0.50410609]
 [0.47024816 0.94875341]
 [0.008756   0.52971357]]

Random integer array:
 [[32 35 97]
 [ 2 84  5]
 [22 40  1]]

Array of constant value:
 [[7 7]
 [7 7]]
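
The random arrays above change on every run. For reproducible results, one option is NumPy's `Generator` API via `np.random.default_rng`. This is a minimal sketch, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for reproducibility

floats = rng.random((3, 2))                       # floats in [0, 1)
ints = rng.integers(0, 100, size=(3, 3))          # integers in [0, 100)
normals = rng.normal(loc=0.0, scale=1.0, size=5)  # samples from a standard normal

print("Random floats:\n", floats)
print("Random integers:\n", ints)
print("Normal samples:", normals)
```

Creating another generator with the same seed reproduces the same sequence of values.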



In [28]:
# 9. Creating an array of values evenly spaced on a logarithmic scale using logspace()
arr_logspace = np.logspace(0, 3, 4)  # 4 values from 10^0 to 10^3, evenly spaced in the exponent
print("Array using logspace():", arr_logspace)
print()

# 10. Creating an uninitialized array (for performance)
arr_empty = np.empty((3, 3))  # 3x3 array with uninitialized values (whatever happens to be in memory)
print("Uninitialized array (empty):\n", arr_empty)
print()


Array using logspace(): [   1.   10.  100. 1000.]

Uninitialized array (empty):
 [[4.79e-322 4.89e-322 3.21e-322]
 [2.27e-322 1.63e-322 3.71e-322]
 [2.96e-322 3.01e-322 4.00e-322]]



## Array Indexing and Slicing in NumPy

In [29]:
# Creating a sample array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Original Array:\n", arr)
print()

# 1. Basic Indexing
print("Basic Indexing Examples:")
print("Element at row 0, column 1:", arr[0, 1])  # Accessing an element
print("First row:", arr[0])  # Accessing the first row
print("First column:", arr[:, 0])  # Accessing the first column
print()

# 2. Slicing
print("Slicing Examples:")
print("First two rows and columns:\n", arr[:2, :2])  # Accessing first 2 rows and 2 columns
print("Last row:", arr[-1, :])  # Accessing the last row
print("Elements in the middle (2nd row, 2nd column):", arr[1:2, 1:2])  # Slicing middle element
print()



Original Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Basic Indexing Examples:
Element at row 0, column 1: 2
First row: [1 2 3]
First column: [1 4 7]

Slicing Examples:
First two rows and columns:
 [[1 2]
 [4 5]]
Last row: [7 8 9]
Elements in the middle (2nd row, 2nd column): [[5]]
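
The last slicing result above is `[[5]]` rather than `5` because slices keep the dimension they are applied to, while integer indices drop it. A small sketch comparing the resulting shapes:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

element = arr[1, 1]    # integer index on both axes -> a single element
sub = arr[1:2, 1:2]    # slices on both axes -> 2D array of shape (1, 1)
row_1d = arr[1]        # integer index -> 1D array of shape (3,)
row_2d = arr[1:2]      # slice -> 2D array of shape (1, 3)

print(element, sub.shape, row_1d.shape, row_2d.shape)
```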



In [31]:
# 3. Boolean Indexing
print("Boolean Indexing Example:")
print("Elements greater than 5:\n", arr[arr > 5])  # Get all elements greater than 5
print()

# 4. Fancy Indexing (using arrays for indexing)
print("Fancy Indexing Example:")
row_indices = [0, 2]  # Get rows 0 and 2
col_indices = [1, 2]  # Get columns 1 and 2
print("Selected elements (0,1) and (2,2):", arr[row_indices, col_indices])  # Get specific elements
print("Select rows using fancy indexing:\n", arr[[0, 2]])  # Select multiple rows
print()


Boolean Indexing Example:
Elements greater than 5:
 [6 7 8 9]

Fancy Indexing Example:
Selected elements (0,1) and (2,2): [2 9]
Select rows using fancy indexing:
 [[1 2 3]
 [7 8 9]]
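
Passing two index lists pairs them up element by element, which is why the example selects only positions (0,1) and (2,2). To select every combination of the given rows and columns instead, one option is `np.ix_`, sketched here on a fresh copy of the sample array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Paired indices: picks elements (0, 1) and (2, 2)
paired = arr[[0, 2], [1, 2]]

# Open mesh: rows 0 and 2 crossed with columns 1 and 2
block = arr[np.ix_([0, 2], [1, 2])]

print("Paired selection:", paired)
print("Row/column block:\n", block)
```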



In [32]:
# 5. Modifying Elements with Indexing
print("Modifying Elements:")
arr[0, 0] = 100  # Modify a single element
print("Modified Array:\n", arr)
arr[:, 1] = [10, 20, 30]  # Modify an entire column
print("Modified Array (2nd column):\n", arr)
print()

# 6. Indexing with steps (strides)
print("Slicing with steps:")
print("Every other element from the first row:", arr[0, ::2])  # Access every second element in the first row
print("Every other row:\n", arr[::2, :])  # Access every second row
print()

Modifying Elements:
Modified Array:
 [[100   2   3]
 [  4   5   6]
 [  7   8   9]]
Modified Array (2nd column):
 [[100  10   3]
 [  4  20   6]
 [  7  30   9]]

Slicing with steps:
Every other element from the first row: [100   3]
Every other row:
 [[100  10   3]
 [  7  30   9]]



## NumPy Operations

In [44]:
# Creating sample arrays
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([5, 6, 7, 8])
arr3 = np.array([[1, 2], [3, 4]])

print("Array 1:", arr1)
print("Array 2:", arr2)
print("Array 3:\n", arr3)
print()

# 1. Arithmetic Operations
print("Arithmetic Operations:")
print("Addition:", arr1 + arr2)
print("Subtraction:", arr1 - arr2)
print("Multiplication:", arr1 * arr2)
print("Division:", arr1 / arr2)
print()


Array 1: [1 2 3 4]
Array 2: [5 6 7 8]
Array 3:
 [[1 2]
 [3 4]]

Arithmetic Operations:
Addition: [ 6  8 10 12]
Subtraction: [-4 -4 -4 -4]
Multiplication: [ 5 12 21 32]
Division: [0.2        0.33333333 0.42857143 0.5       ]



In [43]:
# 2. Scalar Operations
print("Scalar Operations:")
print("Array 1 multiplied by 2:", arr1 * 2)  # Broadcasting a scalar value
print("Array 2 divided by 2:", arr2 / 2)
print()

# 3. Aggregation Functions
print("Aggregation Functions:")
print("Sum of Array 1:", np.sum(arr1))
print("Mean of Array 1:", np.mean(arr1))
print("Max value in Array 2:", np.max(arr2))
print("Min value in Array 2:", np.min(arr2))
print("Standard Deviation of Array 1:", np.std(arr1))
print()

Scalar Operations:
Array 1 multiplied by 2: [2 4 6 8]
Array 2 divided by 2: [2.5 3.  3.5 4. ]

Aggregation Functions:
Sum of Array 1: 10
Mean of Array 1: 2.5
Max value in Array 2: 8
Min value in Array 2: 5
Standard Deviation of Array 1: 1.118033988749895
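
By default `np.std` computes the population standard deviation (dividing by N). For the sample standard deviation (dividing by N - 1), pass `ddof=1`, as in this short sketch:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])

population_std = np.std(arr1)      # divides by N (ddof=0 is the default)
sample_std = np.std(arr1, ddof=1)  # divides by N - 1

print("Population std:", population_std)
print("Sample std:", sample_std)
```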



In [34]:
# 4. Dot Product and Matrix Multiplication
print("Dot Product:")
print("Dot product of Array 1 and Array 2:", np.dot(arr1, arr2))  # 1D arrays dot product
print()

print("Matrix Multiplication:")
arr4 = np.array([[5, 6], [7, 8]])
print("Matrix multiplication of Array 3 and another 2x2 array:\n", np.dot(arr3, arr4))  # Matrix multiplication
print()

# 5. Transpose of a Matrix
print("Transpose of Array 3:")
print(arr3.T)
print()

# 6. Element-wise Functions
print("Element-wise Functions:")
print("Square root of Array 1:", np.sqrt(arr1))
print("Exponential of Array 1:", np.exp(arr1))
print("Logarithm of Array 2:", np.log(arr2))
print()

Dot Product:
Dot product of Array 1 and Array 2: 70

Matrix Multiplication:
Matrix multiplication of Array 3 and another 2x2 array:
 [[19 22]
 [43 50]]

Transpose of Array 3:
[[1 3]
 [2 4]]

Element-wise Functions:
Square root of Array 1: [1.         1.41421356 1.73205081 2.        ]
Exponential of Array 1: [ 2.71828183  7.3890561  20.08553692 54.59815003]
Logarithm of Array 2: [1.60943791 1.79175947 1.94591015 2.07944154]
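
For 2D arrays, the `@` operator (equivalent to `np.matmul`) gives the same result as `np.dot` and is often easier to read. The sketch below also contrasts it with `*`, which multiplies element-wise.

```python
import numpy as np

arr3 = np.array([[1, 2], [3, 4]])
arr4 = np.array([[5, 6], [7, 8]])

matrix_product = arr3 @ arr4  # matrix multiplication
elementwise = arr3 * arr4     # element-wise multiplication

print("Matrix product:\n", matrix_product)
print("Element-wise product:\n", elementwise)
```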



In [35]:
# 7. Broadcasting
print("Broadcasting Example:")
arr5 = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 2
print("Array 5:\n", arr5)
print("Array 5 multiplied by scalar 2 (broadcasting):\n", arr5 * scalar)
print()

# 8. Comparison Operations
print("Comparison Operations:")
print("Array 1 > 2:", arr1 > 2)  # Element-wise comparison
print("Array 2 == 6:", arr2 == 6)
print()

# 9. Summing along Axes
print("Summing along Axes:")
print("Sum of Array 3 along axis 0 (columns):", np.sum(arr3, axis=0))  # Sum of each column
print("Sum of Array 3 along axis 1 (rows):", np.sum(arr3, axis=1))  # Sum of each row
print()

Broadcasting Example:
Array 5:
 [[1 2 3]
 [4 5 6]]
Array 5 multiplied by scalar 2 (broadcasting):
 [[ 2  4  6]
 [ 8 10 12]]

Comparison Operations:
Array 1 > 2: [False False  True  True]
Array 2 == 6: [False  True False False]

Summing along Axes:
Sum of Array 3 along axis 0 (columns): [4 6]
Sum of Array 3 along axis 1 (rows): [3 7]
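
Summing along an axis removes that axis from the result. When the sums need to be combined with the original array, `keepdims=True` keeps the axis with length 1 so broadcasting lines up. A minimal sketch that normalizes each row to sum to 1:

```python
import numpy as np

arr3 = np.array([[1, 2], [3, 4]])

row_sums = arr3.sum(axis=1, keepdims=True)  # shape (2, 1) instead of (2,)
normalized = arr3 / row_sums                # each row is divided by its own sum

print("Row sums:\n", row_sums)
print("Row-normalized array:\n", normalized)
```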



## Stacking and Concatenating Arrays

In [None]:
arr_a = np.array([1, 2, 3])
arr_b = np.array([4, 5, 6])

# Vertical stacking (each 1D array becomes a row of a 2D array)
arr_vstack = np.vstack((arr_a, arr_b))
print("Vertical Stacking:\n", arr_vstack)

# Horizontal stacking (1D arrays are joined end to end)
arr_hstack = np.hstack((arr_a, arr_b))
print("Horizontal Stacking:\n", arr_hstack)

# Concatenating along a specific axis
arr_concat = np.concatenate((arr_a, arr_b), axis=0)  # For 1D arrays, axis=0 is the only option
print("Concatenated Array:\n", arr_concat)
print()

Vertical Stacking:
 [[1 2 3]
 [4 5 6]]
Horizontal Stacking:
 [1 2 3 4 5 6]
Concatenated Array:
 [1 2 3 4 5 6]
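
With 2D inputs, the `axis` argument of `np.concatenate` becomes a real choice, and `np.stack` joins arrays along a new axis. The matrices `m1` and `m2` below are illustrative values, not from the cells above.

```python
import numpy as np

arr_a = np.array([1, 2, 3])
arr_b = np.array([4, 5, 6])
m1 = np.array([[1, 2], [3, 4]])  # illustrative 2x2 matrices
m2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((m1, m2), axis=0)   # append rows    -> shape (4, 2)
cols = np.concatenate((m1, m2), axis=1)   # append columns -> shape (2, 4)
pairs = np.stack((arr_a, arr_b), axis=1)  # new axis       -> shape (3, 2)

print("Concatenated along axis 0:\n", rows)
print("Concatenated along axis 1:\n", cols)
print("Stacked along a new axis:\n", pairs)
```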



## Reshaping Arrays

In [None]:

arr = np.arange(1, 13)  # Array with values from 1 to 12
print("Original Array:", arr)

# Reshape to 3x4 matrix
arr_reshaped = arr.reshape(3, 4)
print("Reshaped to 3x4 matrix:\n", arr_reshaped)

# Flatten back to a 1D array (flatten() returns a copy)
arr_flatten = arr_reshaped.flatten()
print("Flattened array:\n", arr_flatten)
print()

Original Array: [ 1  2  3  4  5  6  7  8  9 10 11 12]
Reshaped to 3x4 matrix:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Flattened array:
 [ 1  2  3  4  5  6  7  8  9 10 11 12]
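
Two related conveniences, sketched below: passing `-1` lets `reshape` infer one dimension, and `ravel()` flattens like `flatten()` but returns a view when the memory layout allows it, instead of always copying.

```python
import numpy as np

arr = np.arange(1, 13)

matrix = arr.reshape(3, -1)   # -1 is inferred as 4
flat_copy = matrix.flatten()  # always a copy
flat_view = matrix.ravel()    # a view here, because the data is contiguous

print("Inferred shape:", matrix.shape)
print("flatten shares memory:", np.shares_memory(matrix, flat_copy))
print("ravel shares memory:", np.shares_memory(matrix, flat_view))
```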



## Copying Arrays

In [13]:


# Shallow copy (view)
arr_copy = arr_reshaped.view()
arr_copy[0, 0] = 100  # Modifying the view will affect the original array
print("Modified Copy (View):\n", arr_copy)
print("Original Array after modifying the copy:\n", arr_reshaped)
print()

# Deep copy (independent copy)
arr_deep_copy = arr_reshaped.copy()
arr_deep_copy[0, 0] = 200  # Modifying the deep copy will NOT affect the original array
print("Modified Deep Copy:\n", arr_deep_copy)
print("Original Array remains unchanged:\n", arr_reshaped)
print()

Modified Copy (View):
 [[100   2   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]
Original Array after modifying the copy:
 [[100   2   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]

Modified Deep Copy:
 [[200   2   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]
Original Array remains unchanged:
 [[100   2   3   4]
 [  5   6   7   8]
 [  9  10  11  12]]
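
Views also appear implicitly: basic slicing returns a view, while fancy (list-based) indexing returns a copy. The sketch below checks this with `np.shares_memory` on a fresh array.

```python
import numpy as np

original = np.arange(1, 13).reshape(3, 4)

sliced = original[:2, :2]  # basic slice -> view of the same data
picked = original[[0, 2]]  # fancy indexing -> independent copy

sliced[0, 0] = 100         # changes original as well
picked[0, 1] = -1          # does not change original

print("Slice shares memory:", np.shares_memory(original, sliced))
print("Fancy index shares memory:", np.shares_memory(original, picked))
print("Original after both writes:\n", original)
```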



## Advanced Array Functions

In [17]:

# Sorting arrays
arr_unsorted = np.array([3, 1, 2, 5, 4])
arr_sorted = np.sort(arr_unsorted)
print("Unsorted Array:", arr_unsorted)
print("Sorted Array:", arr_sorted)
print()

# Unique values in an array
arr_with_duplicates = np.array([1, 2, 2, 3, 4, 4, 5])
arr_unique = np.unique(arr_with_duplicates)
print("Array with Duplicates:", arr_with_duplicates)
print("Unique Values:", arr_unique)
print()

Unsorted Array: [3 1 2 5 4]
Sorted Array: [1 2 3 4 5]

Array with Duplicates: [1 2 2 3 4 4 5]
Unique Values: [1 2 3 4 5]
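
Two common companions to `np.sort` and `np.unique`, shown as a short sketch: `np.argsort` returns the indices that would sort the array, and `return_counts=True` reports how often each unique value occurs.

```python
import numpy as np

arr_unsorted = np.array([3, 1, 2, 5, 4])
arr_with_duplicates = np.array([1, 2, 2, 3, 4, 4, 5])

order = np.argsort(arr_unsorted)          # indices that sort the array
descending = np.sort(arr_unsorted)[::-1]  # sort ascending, then reverse
values, counts = np.unique(arr_with_duplicates, return_counts=True)

print("Sorting indices:", order)
print("Descending order:", descending)
print("Unique values:", values)
print("Counts:", counts)
```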



## Linear Algebra Operations

In [16]:

# Matrix inversion
arr_square = np.array([[1, 2], [3, 4]])
arr_inv = np.linalg.inv(arr_square)
print("Inverse of the matrix:\n", arr_inv)
print()

# Determinant of a matrix
det = np.linalg.det(arr_square)
print("Determinant of the matrix:", det)
print()

# Eigenvalues and Eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(arr_square)
print("Eigenvalues:\n", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
print()

Inverse of the matrix:
 [[-2.   1. ]
 [ 1.5 -0.5]]

Determinant of the matrix: -2.0000000000000004

Eigenvalues:
 [-0.37228132  5.37228132]
Eigenvectors:
 [[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]
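
When the goal is to solve a linear system rather than to obtain the inverse itself, `np.linalg.solve` is generally preferred over computing `inv(A) @ b`. The sketch below uses an illustrative right-hand side `b` and also checks the eigen-decomposition numerically.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])  # illustrative right-hand side

x = np.linalg.solve(A, b)  # solves A @ x = b without forming the inverse
print("Solution x:", x)
print("A @ x reproduces b:", np.allclose(A @ x, b))

# Each eigenvector column v satisfies A @ v = eigenvalue * v
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenpairs verified:", np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))
```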



## Handling Missing Data

In [19]:

arr_nan = np.array([1, 2, np.nan, 4, 5])

# Check for NaN values
print("Array with NaN:", arr_nan)
print("Is NaN:", np.isnan(arr_nan))
print()

# Replace NaN values with a specific value
arr_nan_replaced = np.nan_to_num(arr_nan, nan=-1)
print("NaN replaced with -1:", arr_nan_replaced)
print()

Array with NaN: [ 1.  2. nan  4.  5.]
Is NaN: [False False  True False False]

NaN replaced with -1: [ 1.  2. -1.  4.  5.]
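
Replacing NaN with a placeholder changes later statistics. As an alternative sketch, the NaN-aware aggregations skip missing values, and a boolean mask can drop them entirely.

```python
import numpy as np

arr_nan = np.array([1, 2, np.nan, 4, 5])

plain_mean = np.mean(arr_nan)           # NaN propagates through the result
nan_mean = np.nanmean(arr_nan)          # mean of the non-NaN values
nan_sum = np.nansum(arr_nan)            # NaN treated as zero
valid = arr_nan[~np.isnan(arr_nan)]     # keep only the non-NaN entries

print("Plain mean:", plain_mean)
print("NaN-aware mean:", nan_mean)
print("NaN-aware sum:", nan_sum)
print("Valid values:", valid)
```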



## Broadcasting in Detail

In [36]:

# Broadcasting a smaller array across a larger array
arr_large = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_small = np.array([1, 0, 1])

# Broadcasting the smaller array across rows
print("Result of broadcasting smaller array across larger array rows:\n", arr_large + arr_small)
print()

Result of broadcasting smaller array across larger array rows:
 [[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]]
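
The example above works because the trailing dimensions match (3 and 3). To broadcast down the columns instead, the smaller array needs shape (3, 1). The sketch below shows that case and an incompatible shape that raises an error; the values in `col` are illustrative.

```python
import numpy as np

arr_large = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Shape (3, 1): one value per row, repeated across the columns
col = np.array([10, 20, 30])[:, np.newaxis]
result = arr_large + col
print("Broadcast across columns:\n", result)

# Shapes (3, 3) and (2,) cannot be aligned, so NumPy raises ValueError
error_message = None
try:
    arr_large + np.array([1, 0])
except ValueError as exc:
    error_message = str(exc)
    print("Incompatible shapes:", error_message)
```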



## Masked Arrays

In [37]:

import numpy.ma as ma

# Creating a masked array
arr = np.array([1, 2, -999, 4, 5])
arr_masked = ma.masked_array(arr, mask=[0, 0, 1, 0, 0])  # Masking the value -999
print("Masked Array:", arr_masked)

# Masking conditionally
arr_masked_conditional = ma.masked_where(arr < 0, arr)
print("Conditionally Masked Array:", arr_masked_conditional)
print()

Masked Array: [1 2 -- 4 5]
Conditionally Masked Array: [1 2 -- 4 5]
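
The benefit of masking shows up in computations: masked entries are ignored by aggregations. A minimal sketch using `ma.masked_equal` to mask the sentinel value directly:

```python
import numpy as np
import numpy.ma as ma

arr = np.array([1, 2, -999, 4, 5])
arr_masked = ma.masked_equal(arr, -999)  # mask every entry equal to the sentinel

plain_mean = arr.mean()         # the sentinel distorts the result
mean_valid = arr_masked.mean()  # computed over the unmasked entries only
n_valid = arr_masked.count()    # number of unmasked entries
filled = arr_masked.filled(0)   # back to a regular array, masked entries set to 0

print("Plain mean:", plain_mean)
print("Masked mean:", mean_valid)
print("Valid count:", n_valid)
print("Filled array:", filled)
```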



## Memory Mapping Large Files

In [38]:

# Example: Creating a memory-mapped file
arr = np.memmap('large_array.dat', dtype='float32', mode='w+', shape=(1000, 1000))

# Writing data to the memory-mapped array
arr[:] = np.random.rand(1000, 1000)
arr.flush()  # Write any pending changes to disk
print("Memory-mapped array created and written.")
print()

Memory-mapped array created and written.
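
A memory-mapped file is mostly useful when it is reopened later and read in pieces. The sketch below writes a file to a temporary directory (an assumption made so the example cleans up easily) and reads back a small slice. The raw file stores no metadata, so `dtype` and `shape` must be supplied again.

```python
import os
import tempfile
import numpy as np

# Temporary location used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "large_array.dat")

writer = np.memmap(path, dtype="float32", mode="w+", shape=(1000, 1000))
writer[:] = 1.5
writer.flush()  # push pending changes to disk
del writer      # drop the reference to the writable map

# Reopen read-only; dtype and shape are not stored in the file
reader = np.memmap(path, dtype="float32", mode="r", shape=(1000, 1000))
block = np.array(reader[:2, :3])  # copy a small slice into regular memory

print("File size in bytes:", os.path.getsize(path))
print("Slice read from disk:\n", block)
```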



## Structured Arrays and Record Arrays

In [39]:

# Creating a structured array
data = np.array([(1, 'Alice', 25.5), (2, 'Bob', 30.3)],
                dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'f4')])

print("Structured Array:\n", data)
print("Accessing by field name (age):", data['age'])
print()

Structured Array:
 [(1, 'Alice', 25.5) (2, 'Bob', 30.3)]
Accessing by field name (age): [25.5 30.3]
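
Fields can also drive filtering and sorting. This short sketch reuses the same structured array, selects records by a condition on `age`, and sorts by that field.

```python
import numpy as np

data = np.array([(1, 'Alice', 25.5), (2, 'Bob', 30.3)],
                dtype=[('id', 'i4'), ('name', 'U10'), ('age', 'f4')])

older = data[data['age'] > 26]                # boolean mask built from one field
by_age_desc = np.sort(data, order='age')[::-1]  # sort by age, then reverse

print("Records with age > 26:", older['name'])
print("Names by descending age:", by_age_desc['name'])
```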



## Random Sampling

In [40]:

# Random sampling from normal distribution
samples = np.random.normal(loc=0.0, scale=1.0, size=5)
print("Random samples from normal distribution:", samples)

# Shuffling an array in place
arr = np.arange(10)
print("Original Array:", arr)
np.random.shuffle(arr)  # Modifies arr in place and returns None
print("Shuffled Array:", arr)
print()

Random samples from normal distribution: [ 0.55365366 -0.04078649 -0.85612433 -0.91445941 -2.13225697]
Original Array: [0 1 2 3 4 5 6 7 8 9]
Shuffled Array: [5 0 9 7 8 6 1 4 3 2]



## Universal Functions (ufuncs)

In [41]:

# Applying ufuncs to arrays
arr = np.array([1, 2, 3, 4])
print("Original Array:", arr)
print("Sine of each element:", np.sin(arr))
print("Exponent of each element:", np.exp(arr))
print()

Original Array: [1 2 3 4]
Sine of each element: [ 0.84147098  0.90929743  0.14112001 -0.7568025 ]
Exponent of each element: [ 2.71828183  7.3890561  20.08553692 54.59815003]
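
Besides being callable, ufuncs are objects with their own methods and keyword arguments. The sketch below shows `reduce` and `accumulate` on `np.add`, and the `out` argument for writing results into a preallocated array.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

total = np.add.reduce(arr)        # same result as np.sum(arr)
running = np.add.accumulate(arr)  # same result as np.cumsum(arr)

out = np.empty(arr.shape)         # preallocated float64 buffer
np.sqrt(arr, out=out)             # result is written into out

print("Reduce:", total)
print("Accumulate:", running)
print("Square roots written to out:", out)
```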



## Custom Universal Functions

In [42]:

# Custom function
def my_func(x, y):
    return x + y

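# Vectorizing the function
# Note: np.vectorize is a convenience wrapper that still loops in Python;
# it adds broadcasting but not compiled-ufunc speed
vec_func = np.vectorize(my_func)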
print("Vectorized addition:", vec_func([1, 2, 3], [4, 5, 6]))
print()

Vectorized addition: [5 7 9]
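
`np.vectorize` returns a wrapper object rather than a true ufunc. If an actual `np.ufunc` instance is wanted, `np.frompyfunc` builds one from a Python function, as sketched below. It returns object arrays and still calls the Python function once per element, so a native expression such as `x + y` remains the faster choice when one exists.

```python
import numpy as np

def my_func(x, y):
    return x + y

ufunc_add = np.frompyfunc(my_func, 2, 1)  # 2 inputs, 1 output

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

result = ufunc_add(x, y)               # array with dtype=object
result_int = result.astype(np.int64)   # convert back to a numeric dtype
native = x + y                         # built-in vectorized addition

print("Is a ufunc:", isinstance(ufunc_add, np.ufunc))
print("frompyfunc result:", result_int)
print("Native result:", native)
```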

