# NumPy Basics and Indexing

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. Its core is the `ndarray`, a fixed-type, n-dimensional array that supports fast element-wise operations.

## What You'll Learn
- How to create NumPy arrays with different data types
- Understanding array dimensions, shape, and data types
- Accessing specific elements, rows, and columns
- Slicing arrays with start, end, and step parameters
- Modifying array values
- Working with multi-dimensional arrays (2D and 3D)

---

## The Problem: Python Lists Are Not Enough

Python lists are great for general-purpose use, but they have limitations for numerical computing:

In [None]:
# Problem 1: Lists don't support element-wise operations
prices = [10, 20, 30, 40]

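# This doesn't work: multiplying a list by a float raises TypeError,
# and multiplying by an int repeats the list instead of scaling each element.
# discounted_prices = prices * 0.9  # TypeError
# doubled = prices * 2              # [10, 20, 30, 40, 10, 20, 30, 40]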

# Have to use loops or comprehensions
discounted_prices = [price * 0.9 for price in prices]
print(f"Discounted prices (with list): {discounted_prices}")

# Problem 2: Lists are slower for large datasets
# Problem 3: Lists can contain mixed types (inconsistent data)
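
For comparison, here is a minimal sketch of the same discount applied with a NumPy array. It reuses the `prices` values from the cell above; the names `list_error`, `repeated`, `price_array`, and `discounted_array` are illustrative only.

```python
import numpy as np

prices = [10, 20, 30, 40]

# A plain list cannot be multiplied by a float at all
try:
    prices * 0.9
    list_error = None
except TypeError as exc:
    list_error = type(exc).__name__

# Multiplying a list by an int repeats it rather than scaling it
repeated = prices * 2

# A NumPy array applies the multiplication to every element
price_array = np.array(prices)
discounted_array = price_array * 0.9

print(f"List * 0.9 raises: {list_error}")
print(f"List * 2: {repeated}")
print(f"Array * 0.9: {discounted_array}")
```

The array version needs no loop or comprehension, which is the main reason the rest of this notebook works with arrays rather than lists.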

---

## Creating NumPy Arrays

### Installation

First, ensure NumPy is installed:
```bash
pip install numpy
```

In [None]:
import numpy as np

# Convention: import numpy as 'np'
print(f"NumPy version: {np.__version__}")

### Example 1: Creating Arrays from Lists

In [None]:
# Create a 1D array from a list
temperatures = np.array([22.5, 24.0, 19.8, 21.3, 23.7])
print("Temperatures array:")
print(temperatures)
print(f"Type: {type(temperatures)}")
print()

# Create a 2D array (matrix) from nested lists
exam_scores = np.array([
    [85, 90, 78],  # Student 1 scores
    [92, 88, 95],  # Student 2 scores
    [78, 85, 82]   # Student 3 scores
])
print("Exam scores (3 students, 3 exams):")
print(exam_scores)

### Example 2: Specifying Data Types

NumPy supports various data types for memory efficiency and precision.

In [None]:
# Integer array (default: int64 or int32 depending on system)
ages = np.array([25, 30, 35, 40], dtype='int32')
print(f"Ages: {ages}")
print(f"Data type: {ages.dtype}")
print()

# Float array
prices = np.array([19.99, 24.50, 15.75], dtype='float64')
print(f"Prices: {prices}")
print(f"Data type: {prices.dtype}")
print()

# Boolean array
availability = np.array([True, False, True, True], dtype='bool')
print(f"Availability: {availability}")
print(f"Data type: {availability.dtype}")
print()

# String array (fixed length)
products = np.array(['Laptop', 'Mouse', 'Keyboard'], dtype='U20')  # U20 = Unicode, 20 chars max
print(f"Products: {products}")
print(f"Data type: {products.dtype}")

**Common Data Types:**
- `int32`, `int64`: Integers (32-bit, 64-bit)
- `float32`, `float64`: Floating point numbers
- `bool`: Boolean (True/False)
- `U<n>`: Unicode string (n = max characters)
- `complex64`, `complex128`: Complex numbers
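
To make the memory claim concrete, the sketch below compares the per-element and total byte counts of two integer types, then shows what NumPy does when no `dtype` is given. The variable names are illustrative; `itemsize`, `nbytes`, and `astype` are standard `ndarray` attributes and methods.

```python
import numpy as np

readings_32 = np.array([1, 2, 3, 4], dtype='int32')
readings_64 = np.array([1, 2, 3, 4], dtype='int64')

# itemsize is bytes per element; nbytes is the total for the whole array
print(readings_32.itemsize, readings_32.nbytes)
print(readings_64.itemsize, readings_64.nbytes)

# Without an explicit dtype, NumPy picks one type that can hold every value
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)

# astype returns a converted copy; float -> int drops the fractional part
as_ints = mixed_numbers.astype('int32')
print(as_ints)
```

Halving the element size halves the array's memory footprint, which matters mostly for large arrays.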

---

## Understanding Array Properties

### Array Dimensions (.ndim)

In [None]:
# 1D array (vector)
vector = np.array([1, 2, 3, 4, 5])
print(f"1D array: {vector}")
print(f"Number of dimensions: {vector.ndim}")
print()

# 2D array (matrix)
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print(f"2D array:\n{matrix}")
print(f"Number of dimensions: {matrix.ndim}")
print()

# 3D array (tensor)
tensor = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])
print(f"3D array:\n{tensor}")
print(f"Number of dimensions: {tensor.ndim}")

**What happens:**
- `.ndim` returns the number of dimensions (axes)
- 1D = single axis (list)
- 2D = two axes (rows and columns)
- 3D = three axes (depth, rows, columns)

### Array Shape (.shape)

In [None]:
# Shape tells you the size along each dimension

# 1D array
arr_1d = np.array([10, 20, 30, 40, 50])
print(f"1D array: {arr_1d}")
print(f"Shape: {arr_1d.shape}")  # (5,) means 5 elements
print()

# 2D array
arr_2d = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])
print(f"2D array:\n{arr_2d}")
print(f"Shape: {arr_2d.shape}")  # (3, 4) means 3 rows, 4 columns
print(f"Total elements: {arr_2d.size}")
print()

# 3D array
arr_3d = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]]
])
print(f"3D array:\n{arr_3d}")
print(f"Shape: {arr_3d.shape}")  # (3, 2, 2) means 3 layers, 2 rows, 2 columns

**Understanding Shape:**
- Shape is a tuple showing size along each dimension
- `(5,)` = 1D array with 5 elements
- `(3, 4)` = 2D array with 3 rows and 4 columns
- `(3, 2, 2)` = 3D array with 3 layers, each layer has 2×2 elements
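
The three properties are related, and the relationships can be checked directly. This is a small sketch that assumes a zero-filled array built with `np.zeros` (not used elsewhere in this notebook) purely as a stand-in with shape `(3, 2, 2)`.

```python
import math
import numpy as np

layers = np.zeros((3, 2, 2))  # placeholder data; only the shape matters here

# ndim is the length of the shape tuple
ndim_matches = layers.ndim == len(layers.shape)

# size is the product of the sizes along every axis
size_matches = layers.size == math.prod(layers.shape)

# len() reports the size of the first axis only
first_axis_length = len(layers)

print(ndim_matches, size_matches, first_axis_length)
```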

---

## Array Indexing

### Example 3: Accessing Elements in 1D Arrays

In [None]:
daily_sales = np.array([120, 150, 95, 200, 175, 210, 185])
print(f"Daily sales: {daily_sales}")
print()

# Access by positive index (0-based)
print(f"First day sales: {daily_sales[0]}")
print(f"Third day sales: {daily_sales[2]}")
print()

# Access by negative index (from the end)
print(f"Last day sales: {daily_sales[-1]}")
print(f"Second to last: {daily_sales[-2]}")

### Example 4: Accessing Elements in 2D Arrays

In [None]:
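# Product inventory: rows = products, columns = [price, stock, sales]
# Because the prices are floats, NumPy stores the whole array as float64,
# so stock and sales values print as 50.0, 120.0, and so on.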
inventory = np.array([
    [29.99, 50, 120],  # Product A
    [49.99, 30, 85],   # Product B
    [19.99, 100, 200]  # Product C
])
print("Inventory (price, stock, sales):")
print(inventory)
print()

# Access specific element: [row, column]
print(f"Product A price: ${inventory[0, 0]}")
print(f"Product B stock: {inventory[1, 1]} units")
print(f"Product C sales: {inventory[2, 2]} units")
print()

# Access entire row
product_b_data = inventory[1, :]  # : means "all columns"
print(f"Product B all data: {product_b_data}")
print()

# Access entire column
all_prices = inventory[:, 0]  # : means "all rows"
print(f"All product prices: {all_prices}")
print()

# Access specific rows and columns
prices_and_stock = inventory[:, 0:2]  # All rows, columns 0 and 1
print("Prices and stock:")
print(prices_and_stock)

**Indexing Syntax:**
- `arr[i]` - Get element at index i (1D)
- `arr[row, col]` - Get element at specific position (2D)
- `arr[row, :]` - Get entire row
- `arr[:, col]` - Get entire column
- `:` means "all" along that dimension
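
One detail the syntax list does not show is how the result's shape depends on whether you use an integer or a slice. The sketch below, reusing the `inventory` values from Example 4, suggests that an integer index drops its axis while a one-element slice keeps it; the result names are illustrative.

```python
import numpy as np

inventory = np.array([
    [29.99, 50, 120],
    [49.99, 30, 85],
    [19.99, 100, 200]
])

row_1d = inventory[1, :]     # integer index: the row axis is dropped
row_2d = inventory[1:2, :]   # slice of length 1: the row axis is kept
col_1d = inventory[:, 0]     # integer index: the column axis is dropped
col_2d = inventory[:, 0:1]   # slice of length 1: the column axis is kept

print(row_1d.shape, row_2d.shape)
print(col_1d.shape, col_2d.shape)

# Trailing ':' can be omitted: inventory[1] is the same row as inventory[1, :]
same_row = np.array_equal(inventory[1], row_1d)
print(same_row)
```

Keeping the axis is useful when later code expects a 2D input, for example a single row that must still look like a table.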

---

## Array Slicing

Slicing syntax: `[start:end:step]`
- `start`: Starting index (inclusive)
- `end`: Ending index (exclusive)
- `step`: Step size (default is 1)

### Example 5: Slicing 1D Arrays

In [None]:
monthly_revenue = np.array([5000, 5500, 6200, 5800, 7100, 6900, 7500, 8000, 7200, 6800, 7400, 8200])
print(f"Monthly revenue (12 months): {monthly_revenue}")
print()

# Get first quarter (months 0-2, but end is exclusive so use 3)
q1 = monthly_revenue[0:3]
print(f"Q1 revenue: {q1}")
print()

# Get second half of year (months 6-11)
second_half = monthly_revenue[6:]
print(f"Second half: {second_half}")
print()

# Get every other month
every_other = monthly_revenue[::2]
print(f"Every other month: {every_other}")
print()

# Get last 3 months
last_3 = monthly_revenue[-3:]
print(f"Last 3 months: {last_3}")
print()

# Reverse the array
reversed_revenue = monthly_revenue[::-1]
print(f"Reversed: {reversed_revenue}")

**What happens:**
1. `[0:3]` gets indices 0, 1, 2 (end is exclusive)
2. `[6:]` gets from index 6 to the end
3. `[::2]` gets every 2nd element (step=2)
4. `[-3:]` gets last 3 elements
5. `[::-1]` reverses the array (negative step)
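
Two edge cases are worth trying alongside the rules above: slices that run past the end of the array, and negative steps with explicit bounds. This sketch uses `np.arange(12)` as a stand-in for twelve month indices rather than the revenue figures.

```python
import numpy as np

months = np.arange(12)  # values 0..11, so each value equals its index

# Slice bounds past the end are clamped instead of raising an error
clamped = months[10:100]
empty = months[20:30]

# With a negative step, start must be greater than end; end is still exclusive
backwards = months[8:2:-2]

# A single out-of-range index, unlike a slice, does raise IndexError
try:
    months[20]
    index_error = None
except IndexError as exc:
    index_error = type(exc).__name__

print(clamped, empty, backwards, index_error)
```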

### Example 6: Slicing 2D Arrays

In [None]:
# Sales data: rows = weeks, columns = [Mon, Tue, Wed, Thu, Fri]
weekly_sales = np.array([
    [120, 135, 142, 130, 155],  # Week 1
    [125, 140, 138, 145, 160],  # Week 2
    [130, 128, 145, 150, 165],  # Week 3
    [115, 132, 140, 135, 158]   # Week 4
])
print("Weekly sales (4 weeks × 5 days):")
print(weekly_sales)
print()

# Get first 2 weeks, all days
first_two_weeks = weekly_sales[0:2, :]
print("First 2 weeks:")
print(first_two_weeks)
print()

# Get all weeks, only Mon-Wed (columns 0-2)
mon_to_wed = weekly_sales[:, 0:3]
print("Monday to Wednesday:")
print(mon_to_wed)
print()

# Get weeks 2-3, Thu-Fri (rows 1-2, columns 3-4)
subset = weekly_sales[1:3, 3:5]
print("Weeks 2-3, Thu-Fri:")
print(subset)
print()

# Get every other week, every other day
sparse = weekly_sales[::2, ::2]
print("Every other week and day:")
print(sparse)

---

## Modifying Array Values

### Example 7: Updating Elements

In [None]:
stock_levels = np.array([50, 30, 75, 20, 60])
print(f"Original stock: {stock_levels}")

# Update single element
stock_levels[0] = 45  # Adjust first product stock
print(f"After updating index 0: {stock_levels}")

# Update multiple elements using slicing
stock_levels[2:4] = [80, 25]  # Update indices 2 and 3
print(f"After updating slice: {stock_levels}")

# Update with broadcast (same value to all)
stock_levels[:] = 100  # Set all to 100
print(f"After setting all to 100: {stock_levels}")
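
When assigning to a slice, the right-hand side must be either a single value or a sequence whose length matches the slice. The sketch below, based on the `stock_levels` values above, shows both valid forms and the error raised on a mismatch; `mismatch_error` is an illustrative name.

```python
import numpy as np

stock_levels = np.array([50, 30, 75, 20, 60])

# A single value is broadcast to every position in the slice
stock_levels[1:4] = 0

# A sequence works when its length equals the slice length
stock_levels[0:2] = [45, 35]

# A sequence of the wrong length raises ValueError and changes nothing
try:
    stock_levels[2:4] = [80, 25, 10]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = type(exc).__name__

print(stock_levels)
print(mismatch_error)
```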

### Example 8: Modifying 2D Arrays

In [None]:
grades = np.array([
    [85, 90, 78],
    [92, 88, 95],
    [78, 85, 82]
])
print("Original grades:")
print(grades)
print()

# Update single element
grades[0, 2] = 80  # Change student 1's third exam score
print("After updating [0, 2]:")
print(grades)
print()

# Update entire row
grades[1, :] = [94, 90, 97]  # Update all of student 2's scores
print("After updating row 1:")
print(grades)
print()

# Update entire column (add bonus to exam 1)
grades[:, 0] = grades[:, 0] + 5  # Add 5 points to all students' first exam
print("After adding 5 to column 0:")
print(grades)
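
A common surprise when modifying arrays is that the array's `dtype` does not change on assignment. The following sketch assumes an integer `grades` array like the one above and shows that a float written into it loses its fractional part, while a float copy made with `astype` keeps it.

```python
import numpy as np

grades = np.array([
    [85, 90, 78],
    [92, 88, 95]
])

# The array holds integers, so the assigned float is truncated to fit
grades[0, 0] = 89.7
truncated_value = grades[0, 0]

# Converting to a float dtype first preserves the fractional part
grades_float = grades.astype('float64')
grades_float[0, 0] = 89.7

print(grades.dtype, truncated_value)
print(grades_float.dtype, grades_float[0, 0])
```

If fractional scores are expected, it is usually simpler to create the array with a float `dtype` from the start.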

---

## Working with 3D Arrays

3D arrays are useful for data like images (height × width × color channels) or time-series matrices.

### Example 9: Understanding 3D Structure

In [None]:
# Sales data: 3 stores, 4 products, 5 days
# Shape: (stores, products, days)
sales_3d = np.array([
    # Store 1
    [
        [10, 12, 11, 13, 15],  # Product 1
        [20, 22, 19, 21, 23],  # Product 2
        [15, 16, 14, 17, 18],  # Product 3
        [8, 9, 7, 10, 11]      # Product 4
    ],
    # Store 2
    [
        [12, 14, 13, 15, 17],  # Product 1
        [18, 20, 17, 19, 21],  # Product 2
        [13, 14, 12, 15, 16],  # Product 3
        [9, 10, 8, 11, 12]     # Product 4
    ],
    # Store 3
    [
        [11, 13, 12, 14, 16],  # Product 1
        [19, 21, 18, 20, 22],  # Product 2
        [14, 15, 13, 16, 17],  # Product 3
        [7, 8, 6, 9, 10]       # Product 4
    ]
])

print(f"Shape: {sales_3d.shape}")  # (3, 4, 5)
print(f"Dimensions: {sales_3d.ndim}")
print(f"Total elements: {sales_3d.size}")
print()

# Access all data for Store 1
store1 = sales_3d[0, :, :]
print("Store 1 (all products, all days):")
print(store1)
print()

# Access Product 2 across all stores and days
product2 = sales_3d[:, 1, :]
print("Product 2 (all stores, all days):")
print(product2)
print()

# Access day 3 sales for all stores and products
day3 = sales_3d[:, :, 2]
print("Day 3 (all stores, all products):")
print(day3)
print()

# Access specific element: Store 2, Product 3, Day 4
specific_sale = sales_3d[1, 2, 3]
print(f"Store 2, Product 3, Day 4: {specific_sale} units")

**Understanding 3D Indexing:**
- `[i, :, :]` - Select one "layer" (first dimension)
- `[:, j, :]` - Select across layers, one "row" (second dimension)
- `[:, :, k]` - Select across layers and rows, one "column" (third dimension)
- Think of it as: `[which_layer, which_row, which_column]`
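
A quick way to confirm which axes survive each pattern is to print the result shapes. This sketch builds a placeholder array of the same `(3, 4, 5)` shape with `np.arange(60).reshape(3, 4, 5)` instead of the real sales figures (`reshape` is not covered in this notebook and is used here only for brevity).

```python
import numpy as np

# Placeholder data with shape (stores, products, days) = (3, 4, 5)
sales_3d = np.arange(60).reshape(3, 4, 5)

store1 = sales_3d[0, :, :]        # one store    -> (products, days)
product2 = sales_3d[:, 1, :]      # one product  -> (stores, days)
day3 = sales_3d[:, :, 2]          # one day      -> (stores, products)
two_stores = sales_3d[0:2, :, :]  # a slice keeps all three axes
single = sales_3d[1, 2, 3]        # three integers -> a single value

print(store1.shape, product2.shape, day3.shape, two_stores.shape)
print(single)
```

Each integer index removes one axis from the result, so the remaining axes keep their original order.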

---

## Best Practices

### ✅ Do:
- Use meaningful variable names (`temperatures` not `arr`)
- Specify `dtype` when creating arrays for clarity and memory efficiency
- Check `.shape` and `.ndim` before slicing to avoid errors
- Use slicing instead of loops for better performance
- Comment complex indexing operations to explain dimensions

### ❌ Don't:
- Mix data types in one array (NumPy converts every element to a single common type, which defeats the purpose of NumPy)
- Use Python lists when NumPy arrays would be more efficient
- Forget that slicing end index is exclusive
- Create unnecessary copies when views will work (covered in next notebook)
- Use loops when vectorized operations are available
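
As a brief preview of the view-versus-copy point in the list above, the sketch below suggests that a basic slice shares memory with its source array, while `.copy()` produces independent data. The array values are a shortened, illustrative version of the weekly sales data.

```python
import numpy as np

weekly_sales = np.array([
    [120, 135, 142],
    [125, 140, 138]
])

first_week_view = weekly_sales[0, :]         # basic slicing returns a view
first_week_copy = weekly_sales[0, :].copy()  # explicit, independent copy

first_week_view[0] = 999   # also changes weekly_sales
first_week_copy[1] = -1    # leaves weekly_sales untouched

print(weekly_sales[0])
view_shares = np.shares_memory(weekly_sales, first_week_view)
copy_shares = np.shares_memory(weekly_sales, first_week_copy)
print(view_shares, copy_shares)
```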

### Comparison: Lists vs NumPy Arrays

In [None]:
import time

# Create large dataset
size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size)  # builds the array directly, without a Python-level loop

# Time list operation (perf_counter is a high-resolution clock intended for timing)
start = time.perf_counter()
list_result = [x * 2 for x in python_list]
list_time = time.perf_counter() - start

# Time NumPy operation
start = time.perf_counter()
array_result = numpy_array * 2
array_time = time.perf_counter() - start

print(f"List operation: {list_time:.4f} seconds")
print(f"NumPy operation: {array_time:.4f} seconds")
print(f"NumPy is {list_time/array_time:.1f}x faster!")

---

## Summary

### Key Concepts:
- **NumPy arrays** are faster and more efficient than Python lists for numerical data
- **Data types** (`dtype`) ensure consistency and memory efficiency
- **Dimensions** (`.ndim`) tell you how many axes the array has
- **Shape** (`.shape`) shows the size along each dimension
- **Indexing** accesses specific elements, rows, or columns
- **Slicing** extracts subarrays using `[start:end:step]` syntax

### Syntax Reference:

**Creating Arrays:**
```python
np.array([1, 2, 3])                    # 1D array
np.array([[1, 2], [3, 4]])             # 2D array
np.array([1, 2, 3], dtype='float32')   # Specify data type
```

**Array Properties:**
```python
arr.ndim      # Number of dimensions
arr.shape     # Tuple of dimension sizes
arr.dtype     # Data type of elements
arr.size      # Total number of elements
```

**Indexing:**
```python
arr[i]           # Access element (1D)
arr[i, j]        # Access element (2D)
arr[i, j, k]     # Access element (3D)
arr[i, :]        # Access row
arr[:, j]        # Access column
```

**Slicing:**
```python
arr[start:end]        # Elements from start to end-1
arr[start:end:step]   # With step size
arr[:end]             # From beginning to end-1
arr[start:]           # From start through the last element
arr[::-1]             # Reverse array
```

### Next Steps:
Next, learn about [Array Creation and Manipulation](02-creation-and-manipulation.ipynb) where you'll discover efficient ways to create arrays and reshape them for different purposes.