# Lesson 2.1: NumPy Arrays vs Python Lists

## Why Does NumPy Exist?

You already know Python lists - they're flexible and hold mixed types. But for ML, we need **speed**.

When doing ML, you crunch **millions of numbers**. For that kind of work, Python lists are slow because:
1. A list stores pointers to objects scattered in memory, not the numbers themselves
2. Lists allow mixed types, so Python must check each element's type on every operation
3. There is no built-in vectorized math - you have to write loops
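
The first point can be made concrete by comparing memory footprints. This is only a rough sketch: the names `n`, `list_bytes` and `array_bytes` are made up for the example, and the list figure is an estimate based on what `sys.getsizeof` reports on CPython.

```python
import sys
import numpy as np

n = 1000  # hypothetical sample size

py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # fixed 8-byte integers

# A list holds n pointers, and every int is a separate Python object
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw numbers back to back: n * 8 bytes of data
array_bytes = np_array.nbytes

print(f"List (approx.): {list_bytes} bytes")
print(f"Array data:     {array_bytes} bytes")
```

The exact list size depends on your Python version, but it should come out several times larger than the array's data buffer.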

### PHP/Laravel Parallel
Think of NumPy arrays as PHP arrays where **all elements MUST be the same type**, and math operations happen on ALL elements at once. Imagine if you could do `$prices * 1.1` in PHP and every price got multiplied - that's NumPy!
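
Here is a small sketch of that `$prices * 1.1` idea in NumPy, next to what the same `*` does on a plain list. The price values are made up for illustration.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])  # hypothetical prices

# NumPy: the multiplication is applied to every element
raised = prices * 1.1
print(raised)

# Plain list: you have to write the loop yourself
raised_list = [p * 1.1 for p in [10.0, 20.0, 30.0]]

# Gotcha: * on a list repeats the list instead of multiplying its values
repeated = [10.0, 20.0] * 2
print(repeated)  # [10.0, 20.0, 10.0, 20.0]
```

The last two lines are worth remembering: forgetting to convert a list to an array does not raise an error for `* 2`, it silently gives you a longer list.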

In [None]:
import numpy as np

# First, let's see WHY NumPy exists - SPEED
import time

size = 1_000_000  # 1 million numbers

# Python list approach
py_list = list(range(size))
start = time.perf_counter()  # perf_counter is the clock meant for timing code
py_result = [x * 2 for x in py_list]  # Need a loop!
py_time = time.perf_counter() - start

# NumPy approach
np_array = np.arange(size)
start = time.perf_counter()
np_result = np_array * 2  # No loop needed!
np_time = time.perf_counter() - start

print(f"Python list: {py_time:.4f} seconds")
print(f"NumPy array: {np_time:.4f} seconds")
print(f"NumPy is {py_time/np_time:.0f}x faster! 🚀")
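
A single timing like the one above can jump around from run to run. One way to get a steadier number, sketched here with the standard library's `timeit` module, is to repeat each measurement and keep the best run. The variable names `py_best` and `np_best` are just for this example, and the speedup you see will depend on your machine.

```python
import timeit
import numpy as np

size = 1_000_000
py_list = list(range(size))
np_array = np.arange(size)

# repeat=5 gives five separate timings; number=1 runs the statement once per timing
py_times = timeit.repeat(lambda: [x * 2 for x in py_list], repeat=5, number=1)
np_times = timeit.repeat(lambda: np_array * 2, repeat=5, number=1)

# The fastest run is the one least disturbed by other activity on the machine
py_best = min(py_times)
np_best = min(np_times)

print(f"Python list (best of 5): {py_best:.4f} seconds")
print(f"NumPy array (best of 5): {np_best:.4f} seconds")
```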

## Creating Arrays

The most basic way: convert a Python list to a NumPy array.

In [None]:
# From a Python list
# PHP: $prices = [10.5, 20.3, 15.7];
prices = np.array([10.5, 20.3, 15.7])
print("Prices:", prices)
print("Type:", type(prices))  # numpy.ndarray

In [None]:
# ALL elements must be the same type - NumPy converts automatically
mixed = np.array([1, 2, 3.5])  # int + float → all become float
print(mixed)        # [1.  2.  3.5]
print(mixed.dtype)  # float64
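
The same automatic conversion applies to other mixes, and you can also choose the type yourself. A short sketch with made-up values:

```python
import numpy as np

# Numbers mixed with a string: everything becomes a string
as_text = np.array([1, 2.5, "3"])
print(as_text.dtype.kind)  # 'U' means Unicode string

# Ask for a specific type with the dtype argument
small = np.array([1, 2, 3], dtype=np.float32)
print(small.dtype)  # float32

# astype returns a converted copy; float -> int drops the fraction
truncated = np.array([1.7, 2.2, 3.9]).astype(int)
print(truncated)  # [1 2 3]
```

Note that `astype(int)` truncates rather than rounds, so call `np.round` first if rounding is what you want.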

## Array Attributes - Know Your Data

Every array has properties that tell you about its structure.

In [None]:
# 1D array (like a single column of data)
temps = np.array([22.5, 23.1, 21.8, 24.0, 22.9])

print(f"Shape: {temps.shape}")    # (5,) - 5 elements, 1 dimension
print(f"Dimensions: {temps.ndim}") # 1
print(f"Size: {temps.size}")       # 5 total elements
print(f"Data type: {temps.dtype}") # float64

In [None]:
# 2D array (like a table / spreadsheet)
# Think: rows of sensor readings
# Each row = one reading, columns = [temperature, humidity, pressure]
sensor_data = np.array([
    [22.5, 65.0, 1013.2],
    [23.1, 62.3, 1012.8],
    [21.8, 70.1, 1014.0]
])

print(f"Shape: {sensor_data.shape}")    # (3, 3) - 3 rows, 3 columns
print(f"Dimensions: {sensor_data.ndim}") # 2
print(f"Size: {sensor_data.size}")       # 9 total elements
print()
print("The data:")
print(sensor_data)
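
Because the table above is 3x3, it does not show which number in `shape` is rows and which is columns. The sketch below uses a hypothetical 2x3 array (`readings`) to make the order visible, and adds a 3D array to show that the same attributes work for any number of dimensions.

```python
import numpy as np

# 2 rows (readings), 3 columns (temperature, humidity, pressure)
readings = np.array([
    [22.5, 65.0, 1013.2],
    [23.1, 62.3, 1012.8],
])

rows, cols = readings.shape  # shape is (rows, columns)
print(rows, cols)            # 2 3
print(readings.size)         # 6, which is rows * cols

# len() only counts along the first dimension (rows)
print(len(readings))         # 2

# 3D example: 2 blocks, each with 3 rows and 4 columns
batch = np.zeros((2, 3, 4))
print(batch.ndim)  # 3
print(batch.size)  # 24
```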

## Handy Array Creators

You don't always create arrays from lists. NumPy has shortcuts.

In [None]:
# np.zeros - array of all zeros (useful for initializing)
zeros = np.zeros(5)
print("Zeros:", zeros)  # [0. 0. 0. 0. 0.]

# np.ones - array of all ones
ones = np.ones((2, 3))  # 2 rows, 3 columns
print("Ones (2x3):")
print(ones)
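
A few related creators follow the same pattern. This sketch shows the `dtype` argument, `np.full` for an arbitrary fill value, and `np.zeros_like` for copying the shape of an existing array. The array names are made up for the example.

```python
import numpy as np

# zeros and ones default to floats; pass dtype to get integers
counts = np.zeros((2, 3), dtype=int)
print(counts)

# np.full - like zeros/ones, but with any fill value
filled = np.full((2, 2), 7.5)
print(filled)

# np.zeros_like - zeros with the same shape and dtype as another array
template = np.array([[1.5, 2.5], [3.5, 4.5]])
same_shape = np.zeros_like(template)
print(same_shape.shape)  # (2, 2)
```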

In [None]:
# np.arange - like Python's range() but returns an array
# Unlike PHP's range(0, 10), which includes 10, the stop value here is excluded
sequence = np.arange(0, 10, 2)  # start, stop, step
print("Arange:", sequence)  # [0 2 4 6 8]

# np.linspace - evenly spaced numbers between start and end
# Great for plotting! "Give me 5 points from 0 to 1, both ends included"
even_spread = np.linspace(0, 1, 5)
print("Linspace:", even_spread)  # [0.   0.25 0.5  0.75 1.  ]
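
The main difference between the two: `np.arange` takes a step and leaves out the stop value, while `np.linspace` takes a count and includes the end value by default. A small side-by-side sketch:

```python
import numpy as np

# arange: you choose the step; the stop value (1.0) is excluded
by_step = np.arange(0, 1, 0.25)
print(by_step)  # [0.   0.25 0.5  0.75]

# linspace: you choose how many points; the end value is included
by_count = np.linspace(0, 1, 5)
print(by_count)  # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False makes linspace leave out the end value too
open_end = np.linspace(0, 1, 5, endpoint=False)
print(open_end)  # [0.  0.2 0.4 0.6 0.8]
```

With fractional steps, `np.linspace` is usually the safer choice because you always know how many values you get.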

In [None]:
# np.random - random numbers (very useful in ML!)
np.random.seed(42)  # For reproducible results

# Random floats between 0 and 1
random_floats = np.random.rand(5)
print("Random floats:", random_floats)

# Random integers
random_ints = np.random.randint(1, 100, size=5)  # 5 random ints from 1-99
print("Random ints:", random_ints)

# Normal distribution (bell curve) - super common in ML
# mean=0, std=1, 5 values
normal = np.random.randn(5)
print("Normal dist:", normal)
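
NumPy also offers a newer random API built around a generator object, which its documentation recommends for new code. The sketch below repeats the three examples above in that style. The name `rng` and the `loc`/`scale` values are just illustrative choices.

```python
import numpy as np

# A generator with its own seed, instead of the global np.random.seed
rng = np.random.default_rng(42)

floats = rng.random(5)                # floats in [0, 1)
ints = rng.integers(1, 100, size=5)   # ints from 1 to 99 (100 is excluded)
normal = rng.standard_normal(5)       # mean=0, std=1

# Normal distribution with a chosen mean (loc) and std (scale)
heights = rng.normal(loc=170, scale=10, size=5)

# The same seed gives the same sequence again
again = np.random.default_rng(42).random(5)
print(np.array_equal(floats, again))  # True
```

The values differ from what `np.random.seed(42)` produces, because the two APIs use different underlying generators.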

## Reshaping Arrays

ML models often need data in specific shapes. Reshaping is your friend.

In [None]:
# Start with a 1D array
flat = np.arange(12)
print("Flat:", flat)
print("Shape:", flat.shape)  # (12,)

# Reshape to 3 rows x 4 columns
table = flat.reshape(3, 4)
print("\nReshaped (3x4):")
print(table)
print("Shape:", table.shape)  # (3, 4)

# Use -1 to let NumPy figure out one dimension
auto = flat.reshape(4, -1)  # 4 rows, NumPy calculates columns
print("\nAuto-shaped (4x?):")
print(auto)
print("Shape:", auto.shape)  # (4, 3)
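
Two things about `reshape` tend to surprise people, so here is a short sketch of both: the reshaped array usually shares its data with the original, and the new shape must hold exactly the same number of elements. The `reshape_failed` flag is only there to make the outcome visible.

```python
import numpy as np

flat = np.arange(12)
table = flat.reshape(3, 4)

# Here reshape returns a view: both names look at the same data
table[0, 0] = 99
print(flat[0])                        # 99
print(np.shares_memory(flat, table))  # True

# Use .copy() when you want an independent array
independent = flat.reshape(3, 4).copy()
independent[0, 0] = -1
print(flat[0])                        # still 99

# 12 elements cannot be split into 5 equal rows
reshape_failed = False
try:
    flat.reshape(5, -1)
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)

# ravel() goes back to 1D
back = table.ravel()
print(back.shape)  # (12,)
```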

## Quick Comparison: List vs Array

| Feature | Python List | NumPy Array |
|---------|-------------|-------------|
| Mixed types | Yes | No (all same type) |
| Math on all elements | Need loop | Just `arr * 2` |
| Speed (1M elements) | Slow | Often 10-100x faster (varies by machine and operation) |
| Memory | More | Less |
| Use when | General purpose | Numerical/ML work |

## Exercise: Try It Yourself!

1. Create a NumPy array of your last 7 days' screen time hours
2. Check its shape, dtype, and size
3. Create a 3x3 array of zeros, then a 3x3 array of ones
4. Generate 10 random TDS readings between 150 and 900 (hint: `np.random.randint`; its upper bound is excluded, so pass 901 if 900 should be possible)
5. Create an array with `np.linspace` of 10 evenly spaced values from 0 to 100

In [None]:
# YOUR CODE HERE

# 1. Screen time hours
# screen_time = np.array([...])

# 2. Check attributes
# print(f"Shape: {screen_time.shape}")

# 3. Zeros and ones (3x3)
# zeros_3x3 = 
# ones_3x3 = 

# 4. Random TDS readings
# tds_readings = 

# 5. Linspace 0 to 100
# even_values = 