# Week 5, Day 20: Introduction to NumPy
### The Bedrock of Data Science in Python

**Session Date:** July 28th, 2025

**Focus:** Today marks our transition from core Python programming to the specialized libraries that power data science. We begin with **NumPy (Numerical Python)**, the most fundamental package for scientific computing in Python.

## 1. Why NumPy? The Python Data Science Ecosystem

Python's power comes from its vast ecosystem of libraries. Think of Python as a university and its data science libraries as specialized colleges. NumPy is the foundational college of engineering and mathematics that most other scientific colleges are built upon.

```mermaid
graph TD
    subgraph "Python University"
        A[Python Core Language]
    end

    subgraph "Specialized Colleges"
        B(NumPy - Numerical Computing)
        C(Pandas - Data Manipulation)
        D(Matplotlib - Visualization)
        E(Seaborn - Advanced Visualization)
    end
    A --> B
    B --> C & D
    D --> E
```

### Python Lists vs. NumPy Arrays

While Python lists are flexible, NumPy arrays are designed and optimized for numerical tasks, offering significant advantages in performance and functionality.

| Feature | Python Lists | NumPy Arrays |
| :--- | :--- | :--- |
| **Data Type** | Heterogeneous (mixed types) | **Homogeneous** (single type) |
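| **Performance** | Slower for numerical work (interpreted, element by element) | **Faster** (operations run as compiled C loops) |
| **Memory** | Higher overhead (each element is a separate Python object) | Compact (raw values stored in one contiguous block) |
| **Primary Use** | General-purpose storage | **Numerical Calculations** |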
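
Two rows of the table can be checked directly. In the minimal sketch below, the variable names are illustrative. Mixing an int and a float in one array upcasts every element to a single dtype. The `*` operator repeats a list but multiplies an array element-wise.

```python
import numpy as np

# Homogeneous: one float in the input upcasts every element to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)      # float64

# The same operator means different things for lists and arrays
py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])
print(py_list * 2)      # [1, 2, 3, 1, 2, 3]  (list repetition)
print(np_arr * 2)       # [2 4 6]  (element-wise multiplication)

# Element-wise math on a list needs an explicit Python-level loop
doubled_list = [x * 2 for x in py_list]
print(doubled_list)     # [2, 4, 6]
```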

## 2. Setting Up and Creating NumPy Arrays

First, we import the NumPy library, following the standard convention of using the alias `np`.

```python
import numpy as np
```

We can create arrays from various Python collection types.

```python
# Create an array from a Python list
arr_from_list = np.array([1, 2, 3, 4, 5])
print(f"Array from list: {arr_from_list}")

# Create an array from a Python tuple
arr_from_tuple = np.array((10, 20, 30, 40, 50))
print(f"Array from tuple: {arr_from_tuple}")

# Create an array using the range() function
arr_from_range = np.array(range(5))
print(f"Array from range: {arr_from_range}")
```

```text
Array from list: [1 2 3 4 5]
Array from tuple: [10 20 30 40 50]
Array from range: [0 1 2 3 4]
```
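
`np.array()` infers the element type from the data by default. Its `dtype=` argument forces a specific type instead. This short sketch shows both behaviors, with illustrative variable names. It also shows that `np.arange(5)` produces the same values as `np.array(range(5))` without iterating over a Python `range` object.

```python
import numpy as np

# dtype is inferred from the data: all Python ints -> an integer dtype
ints = np.array([1, 2, 3])

# Passing dtype= forces a specific element type instead
floats = np.array([1, 2, 3], dtype=np.float64)
print(floats)           # [1. 2. 3.]

# np.arange builds the values directly in NumPy, with the same result
print(np.array_equal(np.array(range(5)), np.arange(5)))   # True
```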


## 3. The `ndarray` and its Attributes

The core of NumPy is the **n-dimensional array** (`ndarray`). We can inspect its properties using several key attributes. This is the first step in understanding any new dataset you encounter.

```python
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(f"The array is:\n{arr}\n")

# Get the number of dimensions (or axes)
print(f"Dimensions (ndim): {arr.ndim}")

# Get the shape (number of elements along each axis)
print(f"Shape (shape): {arr.shape}")

# Get the total number of elements
print(f"Size (size): {arr.size}")

# Get the data type of the elements
# (the default integer dtype is platform-dependent: int64 on most systems,
#  int32 on Windows with NumPy versions before 2.0)
print(f"Data Type (dtype): {arr.dtype}")
```

```text
The array is:
[[1 2 3]
 [4 5 6]]

Dimensions (ndim): 2
Shape (shape): (2, 3)
Size (size): 6
Data Type (dtype): int64
```
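
Two more attributes connect `dtype` and `size` to the memory claim in the comparison table: `itemsize` (bytes per element) and `nbytes` (total bytes of element data). The sketch below pins `dtype=np.int64` so the byte counts do not vary by platform; that explicit dtype is an assumption added for reproducibility. It also shows that `len()` reports only the first axis.

```python
import numpy as np

# Fix the dtype explicitly so the byte counts below are the same on every platform
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(arr.itemsize)   # 8  -> bytes per int64 element
print(arr.nbytes)     # 48 -> size * itemsize = 6 * 8
print(len(arr))       # 2  -> len() counts only the first axis, i.e. shape[0]
```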


## 4. Understanding Dimensions and Shape

This is the most critical concept of the day. An array's shape is a tuple giving the number of elements along each axis, so `len(arr.shape)` always equals `arr.ndim`.

### 1D Array (Vector)
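- A flat sequence of elements (strictly speaking, neither a row nor a column).
- **Shape:** `(elements,)`, where the trailing comma marks a one-element tuple.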

```python
arr_1d = np.array([10, 20, 30, 40])
print(arr_1d)
print(f"Shape: {arr_1d.shape}")
print(f"Dimensions: {arr_1d.ndim}")
```

```text
[10 20 30 40]
Shape: (4,)
Dimensions: 1
```
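
A 1D array and a one-row matrix can look alike, but they have different shapes. This minimal sketch uses illustrative names. It builds the same four values as a 1D vector, as a 2D row matrix, and as a 2D column matrix, purely from nested lists.

```python
import numpy as np

vec = np.array([10, 20, 30, 40])            # 1D: shape (4,)
row = np.array([[10, 20, 30, 40]])          # 2D, one row: shape (1, 4)
col = np.array([[10], [20], [30], [40]])    # 2D, one column: shape (4, 1)

for name, a in [("vec", vec), ("row", row), ("col", col)]:
    print(name, a.shape, a.ndim)
# vec (4,) 1
# row (1, 4) 2
# col (4, 1) 2
```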


### 2D Array (Matrix)
- A table of rows and columns.
- **Shape:** `(rows, columns)`

```python
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
print(f"Shape: {arr_2d.shape}")
print(f"Dimensions: {arr_2d.ndim}")
```
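
The `(rows, columns)` order of a 2D shape also fixes what each axis number means. In this sketch, `sum` is used only as a convenient example operation. Reducing along `axis=0` collapses the rows and leaves one value per column. Reducing along `axis=1` collapses the columns and leaves one value per row.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

rows, cols = arr_2d.shape
print(rows, cols)              # 2 3

# axis=0 collapses the rows: one result per column
print(arr_2d.sum(axis=0))      # [5 7 9]

# axis=1 collapses the columns: one result per row
print(arr_2d.sum(axis=1))      # [ 6 15]
```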

### 3D Array (Tensor)
- A collection of 2D arrays (layers).
- **Shape:** `(layers, rows, columns)`

```python
arr_3d = np.array([[[1, 1], [2, 2]], [[3, 3], [4, 4]]])
print(arr_3d)
print(f"Shape: {arr_3d.shape}")
print(f"Dimensions: {arr_3d.ndim}")
```
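
The `(layers, rows, columns)` reading of a 3D shape can be checked by indexing. This sketch uses basic integer indexing only to illustrate the layout. One index selects a whole 2D layer, two indices select a row inside a layer, and three indices select a single element.

```python
import numpy as np

arr_3d = np.array([[[1, 1], [2, 2]], [[3, 3], [4, 4]]])   # shape (2, 2, 2)

# One index selects a whole layer, which is itself a 2D array
first_layer = arr_3d[0]
print(first_layer.shape)   # (2, 2)

# Two indices select a row within a layer; three select a single element
print(arr_3d[1, 0])        # [3 3]
print(arr_3d[1, 1, 0])     # 4
```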

## 5. Dynamic Creation with `.reshape()`

Manually typing multi-dimensional arrays is tedious and error-prone. A more efficient "dynamic" method is to create a 1D sequence of numbers and then **reshape** it into the desired dimensions.

**The Golden Rule of Reshaping:** The `size` of the old array must equal the product of the new shape's dimensions.
For example, an array of size `24` can be reshaped to `(2, 3, 4)` because `2 * 3 * 4 = 24`.
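
The golden rule can be checked in code. In this sketch, `new_shape` is an illustrative name. `np.prod` computes the product of the target dimensions. Passing `-1` for one dimension lets NumPy infer that dimension from the size. A shape whose product does not match the size makes `reshape` raise a `ValueError`.

```python
import numpy as np

arr = np.arange(24)   # size 24

# Check the golden rule: the product of the new shape must equal the size
new_shape = (2, 3, 4)
print(np.prod(new_shape) == arr.size)   # True

# At most one dimension may be -1; NumPy infers it (24 / (2 * 4) = 3)
print(arr.reshape(2, -1, 4).shape)      # (2, 3, 4)

# Breaking the rule raises a ValueError (5 * 5 = 25, not 24)
try:
    arr.reshape(5, 5)
except ValueError as err:
    print("ValueError:", err)
```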

```python
# Create a 1D array with numbers from 0 to 23
arr_1d_large = np.arange(24)  # np.arange works like range() but returns an ndarray directly
print(f"Original 1D array (size={arr_1d_large.size}):\n{arr_1d_large}\n")

# Reshape it into a 3D array of shape (2, 3, 4)
arr_3d_reshaped = arr_1d_large.reshape(2, 3, 4)
print(f"Reshaped 3D array (shape={arr_3d_reshaped.shape}):\n{arr_3d_reshaped}")
```
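
`reshape` does not always copy data. NumPy returns a new view when it can and a copy otherwise. The sketch below uses illustrative names and a contiguous `np.arange` array, where a view is expected. It shows that writing through the reshaped array also changes the original, and that `.copy()` gives an independent array.

```python
import numpy as np

base = np.arange(6)
grid = base.reshape(2, 3)   # contiguous input, so this is expected to be a view

# Both names refer to the same underlying memory
print(np.shares_memory(base, grid))   # True

grid[0, 0] = 100            # write through the reshaped array
print(base[0])              # 100, the original changed too

# .copy() produces an independent array that no longer shares memory
independent = base.reshape(2, 3).copy()
print(np.shares_memory(base, independent))   # False
```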

```python
# Another example: Reshaping an array of size 12
arr_12 = np.arange(1, 13) # Numbers from 1 to 12

# Reshape to 3 rows, 4 columns
reshaped_3x4 = arr_12.reshape(3, 4)
print(f"Shape (3, 4):\n{reshaped_3x4}\n")

# Reshape to 4 rows, 3 columns
reshaped_4x3 = arr_12.reshape(4, 3)
print(f"Shape (4, 3):\n{reshaped_4x3}")
```
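
The examples above fill the new shape row by row. That is `reshape`'s default `order="C"`, in which the last axis changes fastest. This sketch compares it with `order="F"`, which fills column by column. It prints only the first row of each result to keep the difference easy to see.

```python
import numpy as np

arr_12 = np.arange(1, 13)

# Default order="C": fill row by row (last axis changes fastest)
print(arr_12.reshape(3, 4)[0])              # [1 2 3 4]

# order="F": fill column by column (first axis changes fastest)
print(arr_12.reshape(3, 4, order="F")[0])   # [ 1  4  7 10]
```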

## 💡 Key Insights & Summary

1.  **Foundation First:** NumPy is the fundamental package for numerical data. Mastering it is non-negotiable for data science.
2.  **Shape is Everything:** The first thing to check for any new data array is its `.shape`. It tells you the structure and dimensionality of your data.
3.  **Efficiency Matters:** A one-liner like `np.arange(24).reshape(2, 3, 4)` is a concise, less error-prone way to generate structured arrays than typing nested lists by hand, as long as the new shape's product matches the array's size.