## **Understanding NumPy**

* NumPy (Numerical Python) is the foundational library for scientific computing in Python. It provides a powerful N-dimensional array object and tools for working with these arrays.
* Think of NumPy as the engine that powers most data science libraries: pandas uses NumPy arrays internally, scikit-learn expects NumPy arrays as input for machine learning, and matplotlib uses NumPy for plotting.
* NumPy operations run in compiled C code, which often makes them 10-100x faster than equivalent pure-Python loops
* NumPy arrays store data more compactly than Python lists
* Vectorization: Perform operations on entire arrays without writing loops
* Broadcasting: Combine arrays of different shapes in a single operation
* Foundation for pandas, scikit-learn, matplotlib, and more
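A rough way to check the speed claim on your own machine is to time a pure-Python loop against its vectorized equivalent. This is a minimal sketch using `time.perf_counter`. The sum-of-squares task and the array size are arbitrary choices, and the measured speed-up will vary with hardware, array size, and operation, so no specific ratio is promised.

```python
import time

import numpy as np

n = 1_000_000
values_list = list(range(n))
values_arr = np.arange(n, dtype=np.int64)

# Pure-Python loop: the interpreter executes one iteration per element
start = time.perf_counter()
loop_total = 0
for v in values_list:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# Vectorized: squaring and summing both run inside NumPy's compiled code
start = time.perf_counter()
vec_total = int(np.sum(values_arr * values_arr))
vec_seconds = time.perf_counter() - start

print("Results match:", loop_total == vec_total)
print(f"Loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.4f} s")
print(f"Speed-up on this run: ~{loop_seconds / vec_seconds:.0f}x")
```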

In [1]:
# import all necessary libraries
import numpy as np 
import matplotlib.pyplot as plt 
import time

# check NumPy version
print(f"NumPy version: {np.__version__}")

# Display settings for cleaner output
np.set_printoptions(precision=3, suppress=True)

NumPy version: 2.3.2


We import NumPy with the standard alias `np`. The print options make arrays easier to read: `precision=3` limits output to three decimal places, and `suppress=True` avoids scientific notation for small numbers like 0.001.
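To see what those two options change, the sketch below prints the same array with NumPy's defaults and then inside `np.printoptions`, a context manager that applies print settings only within a `with` block. The sample values are made up for illustration.

```python
import numpy as np

small = np.array([0.001, 0.00000001, 123.456789])

# Default settings: the tiny value switches the whole array to scientific notation
print("Default:  ", small)

# Apply the notebook's settings only inside this block
with np.printoptions(precision=3, suppress=True):
    print("Formatted:", small)
    formatted = str(small)

# Outside the block, the previous settings are restored
print("Restored: ", small)
```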

**NumPy Data Structure**

NumPy arrays are fundamentally different from Python lists:
* Homogeneous: All elements must be the same data type
* Fixed size: Size is determined at creation; operations like `np.append` return a new array rather than growing the existing one
* More efficient: Elements are stored in contiguous memory blocks
* Vectorized operations: Mathematical operations work on entire arrays
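The sketch below illustrates each of these differences with small, made-up arrays: mixed numbers are upcast to a single dtype, `np.append` returns a new array, the `flags` attribute reports contiguous storage, and arithmetic applies element-wise, whereas the same operator on a list means something else.

```python
import numpy as np

# Homogeneous: mixing ints and floats upcasts every element to float64
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# Fixed size: np.append builds a new, larger array; the original is unchanged
original = np.array([1, 2, 3])
extended = np.append(original, 4)
print(original.size, extended.size)

# Contiguous storage: a freshly created array occupies one C-ordered block
print("C-contiguous:", original.flags["C_CONTIGUOUS"])

# Vectorized: arithmetic applies to every element, no explicit loop
prices = np.array([10.0, 20.0, 30.0])
with_tax = prices * 1.1
print(with_tax)

# Contrast: on a Python list, * means repetition, not multiplication
print([10.0, 20.0, 30.0] * 2)
```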

**Creating NumPy Arrays**

In [5]:
# Creating arrays from Python lists
# 1D array: A simple sequence of numbers
arr1d = np.array([1, 2, 3, 4, 5])

# 2D array: Think of this as a matrix or table with rows and columns
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

# 3D array: Like a stack of 2D arrays - useful for images, time series, etc.
arr3d = np.array([[[1, 2], [3, 4]],
                  [[5, 6], [7, 8]]])

print("1D array:", arr1d)
print("2D array:\n", arr2d)
print("3D array:\n", arr3d)

1D array: [1 2 3 4 5]
2D array:
 [[1 2 3]
 [4 5 6]]
3D array:
 [[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


When you pass a nested list to `np.array()`, NumPy infers the number of dimensions from the nesting depth and the shape from the length at each level. The 1D array is like a single row, the 2D array is like a spreadsheet, and the 3D array is like several spreadsheets stacked together.

**Creating Special Arrays in NumPy**

In [4]:
# Creating arrays filled with zeros - useful for initializing arrays
# Shape (3, 4) means 3 rows and 4 columns
zeros = np.zeros((3, 4))

# Creating arrays filled with ones - often used as starting points
ones = np.ones((2, 3, 4))       # 3D array: 2 layers, 3 rows, 4 columns

# Empty array - faster than zeros/ones, but its contents are uninitialized
# (whatever was already in memory); use it only when you'll immediately fill the array
empty = np.empty((2, 2))

print("Zeros array (3x4): \n", zeros)
print("Ones array shape: ", ones.shape)
print("Empty array (contains random values):\n", empty)

Zeros array (3x4): 
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
Ones array shape:  (2, 3, 4)
Empty array (contains random values):
 [[0. 0.]
 [0. 0.]]


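`zeros()` and `ones()` are convenient ways to preallocate arrays of a given shape with known starting values. `empty()` skips initialization, so it can be slightly faster, but it holds arbitrary leftover values. In the output above, the empty array happened to contain zeros, but that is not guaranteed, so only use `empty()` when you'll immediately overwrite every element.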
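A typical use of `empty()` is preallocation: create the array once with the right shape and dtype, then write every element before reading any of them. The loop below exists only to illustrate that pattern (for squares specifically, `np.arange(n) ** 2` would be the vectorized choice). `np.zeros_like` and the `dtype` argument are shown as related options.

```python
import numpy as np

n = 5
# Preallocate, then write every slot, so uninitialized memory is never read
squares = np.empty(n, dtype=np.int64)
for i in range(n):
    squares[i] = i * i
print(squares)

# zeros_like / ones_like copy the shape and dtype of an existing array
template = np.array([[1.5, 2.5], [3.5, 4.5]])
blank = np.zeros_like(template)
print(blank)

# The dtype argument overrides the float64 default of zeros/ones
int_ones = np.ones((2, 3), dtype=np.int32)
print(int_ones)
```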

In [None]:
# Range arrays - like Python's range() but more powerful
range_arr = np.arange(0, 10, 2)     # start, stop, step: [0, 2, 4, 6, 8]
print("Range array:", range_arr)

# Linearly spaced arrays - divide a range into equal parts
# From 0 to 1 with exactly 5 points (including endpoints)
linspace_arr = np.linspace(0, 1, 5)
print("Linspace array:", linspace_arr)

# Logarithmically spaced arrays - useful for scientific data
# From 10^0 to 10^2 (1 to 100) with 5 points
logspace_arr = np.logspace(0, 2, 5)
print("Logspace array:", logspace_arr)

Range array: [0 2 4 6 8]
Linspace array: [0.   0.25 0.5  0.75 1.  ]
Logspace array: [  1.      3.162  10.     31.623 100.   ]


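* `arange()` works like Python's `range()` but returns a NumPy array and also accepts floats; with a non-integer step, `linspace()` is usually safer because floating-point rounding can make the number of elements unpredictable
* `linspace()` divides a range into a fixed number of evenly spaced points, which is useful for plotting smooth curves
* `logspace()` creates points that are evenly spaced on a logarithmic scale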
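The sketch below contrasts the two approaches. With a float step, `arange` computes its length as `ceil((stop - start) / step)` in floating point, so an element near `stop` can appear or disappear; `linspace` instead fixes the number of points. The last line checks that `logspace(a, b, n)` is simply 10 raised to `linspace(a, b, n)`.

```python
import numpy as np

# Integer steps are exact, so the stop value is reliably excluded
print(np.arange(0, 10, 2))

# Float step: the length depends on floating-point division near stop
float_steps = np.arange(0, 1, 0.1)
print(len(float_steps), float_steps)

# linspace fixes the number of points and includes stop by default
points = np.linspace(0, 1, 11)
print(len(points), points[-1])

# endpoint=False excludes stop: arange-like spacing with a guaranteed count
half_open = np.linspace(0, 1, 10, endpoint=False)
print(half_open)

# logspace(a, b, n) equals 10 ** linspace(a, b, n)
print(np.allclose(np.logspace(0, 2, 5), 10 ** np.linspace(0, 2, 5)))
```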

In [7]:
# Identity matrix - ones on the diagonal, zeros elsewhere
# Essential for linear algebra operations
identity = np.eye(4)            # 4x4 identity matrix

# Diagonal matrix - put values on the diagonal
diagonal = np.diag([1, 2, 3, 4])

# Array filled with a specific value
full_arr = np.full((3, 3), 7)   # 3x3 array filled with 7

print("Identity matrix:\n", identity)
print("Diagonal matrix:\n", diagonal)
print("Full array (filled with 7):\n", full_arr)

Identity matrix:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
Diagonal matrix:
 [[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]
Full array (filled with 7):
 [[7 7 7]
 [7 7 7]
 [7 7 7]]


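Identity matrices are crucial in linear algebra: multiplying any matrix by an identity matrix of compatible size returns the original matrix. Diagonal matrices are useful for scaling: multiplying by a diagonal matrix scales each row (when it is on the left) or each column (when it is on the right) by the corresponding diagonal value.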
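A small check of both claims, using an arbitrary 2x3 matrix: the identity must match the dimension it multiplies (2x2 on the left, 3x3 on the right), and a diagonal matrix scales rows from the left and columns from the right. The last line shows that broadcasting gives the same column scaling without building the full diagonal matrix.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)

# Identity size must match the dimension being multiplied
print(np.array_equal(np.eye(2) @ A, A))   # left: 2x2 identity
print(np.array_equal(A @ np.eye(3), A))   # right: 3x3 identity

# Left-multiplying by a diagonal matrix scales the rows of A
row_scaled = np.diag([10.0, 100.0]) @ A
print(row_scaled)

# Right-multiplying by a diagonal matrix scales the columns of A
col_scaled = A @ np.diag([1.0, 0.5, 2.0])
print(col_scaled)

# Broadcasting a 1D array across the columns gives the same result
print(np.allclose(col_scaled, A * np.array([1.0, 0.5, 2.0])))
```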

**NumPy Data Types (dtypes)**
* Understanding data types is crucial for memory efficiency and numerical precision.

In [None]:
# Explicit data types - control memory usage and precision
int_arr = np.array([1, 2, 3], dtype=np.int32)       # 32-bit integers
float_arr = np.array([1, 2, 3], dtype=np.float64)   # 64-bit floats (double precision)
bool_arr = np.array([True, False, True], dtype=np.bool_)        # Boolean values

# Type conversion - change dtype of existing array
converted = int_arr.astype(np.float32)      # convert to 32-bit float

print("Integer array dtype:", int_arr.dtype)
print("Float array dtype:", float_arr.dtype)
print("Boolean array dtype:", bool_arr.dtype)
print("Converted array dtype:", converted.dtype)

# Memory usage comparison
print(f"int32 uses {int_arr.itemsize} bytes per element")
print(f"float64 uses {float_arr.itemsize} bytes per element")

Integer array dtype: int32
Float array dtype: float64
Boolean array dtype: bool
Converted array dtype: float32
int32 uses 4 bytes per element
float64 uses 8 bytes per element


* int8: 1 byte, range -128 to 127
* int32: 4 bytes, range about ±2.1 billion (-2,147,483,648 to 2,147,483,647)
* float32: 4 bytes, ~7 decimal digits of precision
* float64: 8 bytes, ~15-16 decimal digits of precision
* Choose smaller types to save memory and larger types for range and precision
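Rather than memorizing these ranges, you can query them with `np.iinfo` (integers) and `np.finfo` (floats). The sketch also shows two practical consequences: integer arrays wrap around silently when a result exceeds the dtype's range, and memory scales directly with bytes per element. The one-million-element size is an arbitrary example.

```python
import numpy as np

# Exact integer ranges per dtype
for t in (np.int8, np.int32):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Machine epsilon: the gap between 1.0 and the next representable float
for t in (np.float32, np.float64):
    print(t.__name__, np.finfo(t).eps)

# Integer arrays wrap around on overflow without raising an error
small = np.array([127], dtype=np.int8)
wrapped = small + 1
print(wrapped)

# Memory needed for one million elements of each dtype
for t in (np.int8, np.float32, np.float64):
    print(t.__name__, np.zeros(1_000_000, dtype=t).nbytes, "bytes")
```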

**Array Properties & Attributes**
* Understanding array properties helps you work effectively with your data and debug issues

In [15]:
# Create a sample 3D array for demonstration
# Think of this as 3 layers, each with 4 rows and 5 columns
arr = np.random.randn(3, 4, 5)

# Shape: The dimensions of the array (layers, rows, columns)
print("Shape:", arr.shape)          # Output:  (3, 4, 5)

# Size: Total number of elements (3 x 4 x 5 = 60)
print("Size:", arr.size)

# Ndim: Number of dimensions (3D in this case)
print("Ndim:", arr.ndim)

# Dtype: Data type of elements
print("Dtype:", arr.dtype)          # randn returns float64

# Itemsize: Memory size of each element in bytes
print("Itemsize:", arr.itemsize)        # 8 bytes for float64

# Total memory usage in bytes
print("Memory usage:", arr.nbytes, "bytes")     # size x itemsize
print("Memory usage:", arr.nbytes / 1024, "KB")     # Convert to KB

Shape: (3, 4, 5)
Size: 60
Ndim: 3
Dtype: float64
Itemsize: 8
Memory usage: 480 bytes
Memory usage: 0.46875 KB


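These properties are essential for understanding your data's structure and memory requirements. Because `nbytes` equals `size * itemsize`, memory grows with both the number of elements and the size of the dtype, so large datasets call for choosing dtypes deliberately.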
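The relationships between these attributes can be checked directly. This sketch recreates a random 3x4x5 array like the one above; the `float32` conversion at the end is one illustrative way to cut memory, at the cost of precision.

```python
import numpy as np

arr = np.random.randn(3, 4, 5)

# size is the product of the shape; nbytes is size * itemsize
print(arr.size == np.prod(arr.shape))
print(arr.nbytes == arr.size * arr.itemsize)

# Reshaping changes the shape but keeps size and nbytes
flat = arr.reshape(-1)
print(flat.shape, flat.nbytes)

# Converting to float32 halves memory (4 bytes instead of 8 per element)
arr32 = arr.astype(np.float32)
print(arr.nbytes, "->", arr32.nbytes, "bytes")
```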

**Array Indexing & Slicing**

**Basic Indexing - Accessing Individual Elements**
* NumPy indexing is similar to Python lists but more powerful for multi-dimensional arrays

In [16]:
# 1D array indexing - similar to Python lists
arr1d = np.array([10, 20, 30, 40, 50])

print("First element:", arr1d[0])       # Index 0: 10
print("Last element:", arr1d[-1])       # Negative indexing: 50
print("Slice [1:4]:", arr1d[1:4])       # Indices 1, 2, 3 (stop index 4 excluded): [20, 30, 40]
print("Every 2nd element:", arr1d[::2]) # step of 2: [10, 30, 50]

First element: 10
Last element: 50
Slice [1:4]: [20 30 40]
Every 2nd element: [10 30 50]
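The examples above are all one-dimensional. For a 2D array, you give one index or slice per axis, separated by commas. One important difference from Python lists: a basic slice of a NumPy array is a view of the same data, so writing to it changes the original, while `.copy()` makes an independent array. The 3x3 array here is made up for illustration.

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# One index per axis: [row, column]
print(arr2d[1, 2])        # row 1, column 2
print(arr2d[:, 0])        # every row, column 0
print(arr2d[0:2, 1:])     # rows 0-1, columns 1-2

# A basic slice is a view: writing through it changes arr2d
view = arr2d[0, :]
view[0] = 100
print(arr2d[0])

# .copy() gives independent data: arr2d is left unchanged
safe = arr2d[1, :].copy()
safe[0] = -1
print(arr2d[1])
```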
