# Lesson 2.3: Indexing & Slicing

## Selecting Data from Arrays

In Laravel, you select data from collections with `->where()`, `->first()`, and `->slice()`. In NumPy, you use **indexing and slicing**, which is more concise and just as powerful.

This is CRITICAL for ML because you'll constantly need to:
- Select specific rows/columns from datasets
- Filter data based on conditions
- Split data into training and test sets
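
As a quick taste of the last point, here is a minimal sketch of a train/test split done with nothing but slicing. The `samples` array and the 80/20 ratio are made up for illustration; real projects usually shuffle the rows first.

```python
import numpy as np

# Hypothetical dataset: 10 samples with 2 features each (values are made up)
samples = np.arange(20).reshape(10, 2)

# 80/20 split by position: first 8 rows for training, last 2 for testing
split_at = int(len(samples) * 0.8)
train = samples[:split_at]
test = samples[split_at:]

print("Train shape:", train.shape)  # (8, 2)
print("Test shape:", test.shape)    # (2, 2)
```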

In [None]:
import numpy as np

## 1D Indexing (same as Python lists)

In [None]:
tds = np.array([35, 42, 55, 78, 92, 110, 85, 65])

print("First reading:",  tds[0])     # 35
print("Last reading:",   tds[-1])    # 65
print("Second to last:", tds[-2])    # 85

# Slicing: array[start:stop:step] (stop is excluded)
print("First 3:",      tds[:3])     # [35, 42, 55]
print("Last 3:",       tds[-3:])    # [110, 85, 65]
print("Middle:",       tds[2:6])    # [55, 78, 92, 110]
print("Every other:",  tds[::2])    # [35, 55, 92, 85]
print("Reversed:",     tds[::-1])   # [65, 85, 110, 92, 78, 55, 42, 35]

## 2D Indexing: `array[row, col]`

This is where NumPy differs from nested lists. With nested lists you chain two lookups (`data[row][col]`); NumPy takes both positions in a single `[row, col]` index.
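
To make the contrast concrete, here is a small side-by-side sketch. The `nested` list and its values are just an illustration, not part of the lesson's dataset.

```python
import numpy as np

nested = [[45, 2.1, 55], [52, 1.9, 52]]
arr = np.array(nested)

# Nested list: two separate lookups, chained
print(nested[1][0])    # 52

# NumPy: one lookup with a (row, col) pair
print(arr[1, 0])       # 52.0 (the array is float because of 2.1 and 1.9)

# A column needs a loop with lists, but only a slice with NumPy
first_col_list = [row[0] for row in nested]
first_col_arr = arr[:, 0]
print(first_col_list)  # [45, 52]
print(first_col_arr)   # [45. 52.]
```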

In [None]:
# Sensor data: 4 readings x 3 features
# Columns: [TDS_output, Flow_rate, Pressure]
data = np.array([
    [45,  2.1, 55],   # Reading 0
    [52,  1.9, 52],   # Reading 1
    [78,  1.5, 48],   # Reading 2
    [110, 0.8, 42],   # Reading 3
])

print("Full table:")
print(data)
print()

# Single element: data[row, col]
# Note: mixing ints and floats makes the whole array float64, so 45 prints as 45.0
print("Reading 0, TDS:", data[0, 0])     # 45.0
print("Reading 2, Flow:", data[2, 1])     # 1.5

# Entire row (one reading)
print("Reading 3 (all):", data[3])        # [110.0, 0.8, 42.0]
# Same as: data[3, :]

# Entire column (one feature across all readings)
print("All TDS values:",  data[:, 0])     # [45.0, 52.0, 78.0, 110.0]
print("All Flow rates:",  data[:, 1])     # [2.1, 1.9, 1.5, 0.8]

In [None]:
# Slicing in 2D
# First 2 readings, all columns
print("First 2 readings:")
print(data[:2, :])

# All readings, first 2 columns (TDS and Flow only)
print("\nTDS and Flow only:")
print(data[:, :2])

# Rows 1-2, columns 0-1 (a sub-table)
print("\nSub-table:")
print(data[1:3, 0:2])

## Boolean Indexing (Filtering)

This is the most **powerful** indexing feature. It works like `array_filter()` in PHP, but the syntax is much cleaner.

```php
// PHP: array_filter($readings, fn($r) => $r > 100);
```
```python
# Python: readings[readings > 100]
```

In [None]:
tds = np.array([35, 42, 55, 78, 92, 110, 85, 65, 130, 48])

# Step 1: Create a boolean mask
mask = tds > 80
print("Mask:", mask)  # [False, False, False, False, True, True, True, False, True, False]

# Step 2: Use mask to filter
high_tds = tds[mask]
print("High TDS readings:", high_tds)  # [92, 110, 85, 130]

# Usually done in one line:
print("Alert readings:", tds[tds > 100])  # [110, 130]

In [None]:
# Multiple conditions: use & (and), | (or), ~ (not)
# IMPORTANT: wrap each condition in parentheses!

# Readings between 50 and 100
moderate = tds[(tds >= 50) & (tds <= 100)]
print("Moderate (50-100):", moderate)

# Readings that are either very low OR very high
extreme = tds[(tds < 40) | (tds > 100)]
print("Extreme:", extreme)

# Everything NOT above 80
safe = tds[~(tds > 80)]
print("Safe readings:", safe)
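
Why the parentheses matter: the sketch below shows what happens when you leave them out, or reach for Python's `and` instead of `&`. It uses a shortened `tds` array so it can be run on its own.

```python
import numpy as np

tds = np.array([35, 42, 55, 78, 92, 110])

# Without parentheses, & binds tighter than >= and <=, so Python reads this as
# tds >= (50 & tds) <= 100, a chained comparison that needs a single True/False.
no_parens_error = None
try:
    tds[tds >= 50 & tds <= 100]
except ValueError as err:
    no_parens_error = type(err).__name__
print("No parentheses:", no_parens_error)  # ValueError

# Python's `and` fails for the same reason: it wants one bool, not an array.
and_error = None
try:
    tds[(tds >= 50) and (tds <= 100)]
except ValueError as err:
    and_error = type(err).__name__
print("Using `and`:", and_error)           # ValueError

# The working form: parenthesized conditions combined with &
moderate = tds[(tds >= 50) & (tds <= 100)]
print("Moderate:", moderate)               # [55 78 92]
```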

In [None]:
# Boolean indexing on 2D arrays - filter ROWS based on a column
# "Show me all readings where TDS > 70"
print("Original data:")
print(data)
print()

# data[:, 0] is the TDS column
high_tds_readings = data[data[:, 0] > 70]
print("Readings where TDS > 70:")
print(high_tds_readings)
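
You can also combine a row mask with a column index or slice in the same `[row, col]` expression. Here is a small sketch that repeats the sensor table so it runs on its own; the names `row_mask` and `flow_of_high` are just illustrative.

```python
import numpy as np

# Same layout as the sensor table: [TDS_output, Flow_rate, Pressure]
data = np.array([
    [45,  2.1, 55],
    [52,  1.9, 52],
    [78,  1.5, 48],
    [110, 0.8, 42],
])

row_mask = data[:, 0] > 70         # one True/False per row
print(row_mask.tolist())           # [False, False, True, True]

# Mask picks the rows, index picks the column: flow rates of high-TDS readings
flow_of_high = data[row_mask, 1]
print(flow_of_high)                # [1.5 0.8]

# Mask picks the rows, slice picks the columns: TDS and Flow only
tds_and_flow = data[row_mask, :2]
print(tds_and_flow.shape)          # (2, 2)
```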

## Fancy Indexing (Select Specific Indices)

Pick specific elements by passing an array of indices.

In [None]:
readings = np.array([35, 42, 55, 78, 92, 110, 85, 65])

# Pick specific indices
selected = readings[[0, 3, 5]]  # Get readings at positions 0, 3, 5
print("Selected:", selected)    # [35, 78, 110]

# Useful for selecting specific rows from a dataset
print("\nSpecific readings from sensor data:")
print(data[[0, 2]])  # Get rows 0 and 2
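
A couple of details worth knowing: the result follows the order of the index list, and repeats are allowed. That makes fancy indexing a handy way to shuffle a dataset. The sketch below assumes NumPy's `default_rng` generator for the shuffle; the seed value is arbitrary.

```python
import numpy as np

readings = np.array([35, 42, 55, 78, 92, 110, 85, 65])

# Result order follows the index list; the same index can appear twice
reordered = readings[[5, 0, 0]]
print(reordered.tolist())            # [110, 35, 35]

# Negative indices work here too
from_end = readings[[-1, -2]]
print(from_end.tolist())             # [65, 85]

# Shuffle by indexing with a random permutation of 0..len-1
rng = np.random.default_rng(42)      # arbitrary seed, for repeatable runs
order = rng.permutation(len(readings))
shuffled = readings[order]
print(shuffled)                      # same values, different order
```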

## Important: Views vs Copies

**Gotcha alert!** Slicing creates a VIEW (not a copy). Changes to the view affect the original!

In [None]:
original = np.array([10, 20, 30, 40, 50])
print("Original:", original)

# Slicing creates a VIEW (like a reference in PHP)
view = original[:3]
view[0] = 999
print("After modifying view:", original)  # [999, 20, 30, 40, 50] - CHANGED!

# To get an independent COPY:
original = np.array([10, 20, 30, 40, 50])
copy = original[:3].copy()  # .copy() makes it independent
copy[0] = 999
print("After modifying copy:", original)  # [10, 20, 30, 40, 50] - UNCHANGED!
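
Not every kind of indexing gives you a view. As a rough check, `np.shares_memory` tells you whether two arrays overlap in memory. The sketch below compares a basic slice with boolean and fancy indexing.

```python
import numpy as np

original = np.array([10, 20, 30, 40, 50])

sliced = original[1:4]             # basic slice
masked = original[original > 25]   # boolean indexing
fancy = original[[0, 2, 4]]        # fancy indexing

print(np.shares_memory(original, sliced))  # True  -> view
print(np.shares_memory(original, masked))  # False -> copy
print(np.shares_memory(original, fancy))   # False -> copy

# Writing to the boolean-indexed result leaves the original untouched
masked[0] = -1
print(original.tolist())           # [10, 20, 30, 40, 50]
```

Note that this is about the *result* of indexing. Assigning through an index, as in `original[original > 25] = 0`, still modifies `original` in place.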

## Exercise: Query Water Filter Data

Given sensor data from multiple water filters, extract specific information.

In [None]:
# Sensor data: 6 filters x 4 features
# Columns: [TDS_output, Flow_rate, Pressure, Filter_age_days]
sensors = np.array([
    [35,  2.0, 55, 30],
    [52,  1.8, 50, 90],
    [95,  1.2, 45, 200],
    [120, 0.7, 40, 320],
    [45,  1.9, 52, 60],
    [180, 0.5, 38, 350],
])

# 1. Get ALL TDS values (first column)
# tds_values = sensors[:, ???]

# 2. Get the reading for filter index 3 (all features)
# filter_3 = sensors[???]

# 3. Find filters needing maintenance (TDS > 100)
# needs_maintenance = sensors[sensors[:, 0] > ???]

# 4. Find filters with low flow AND high age (flow < 1.0 AND age > 300)
# critical = sensors[(sensors[:, 1] < ???) & (sensors[:, 3] > ???)]

# 5. Get the average TDS of healthy filters (TDS < 80)
# healthy_tds = sensors[sensors[:, 0] < 80, 0]
# avg_healthy = healthy_tds.???()