# NumPy Array Creation and Manipulation

Learn efficient methods to create, copy, reshape, and combine NumPy arrays for various data processing tasks.

## What You'll Learn
- Creating arrays of zeros, ones, and custom values
- Generating random arrays and identity matrices
- Repeating array elements
- Understanding copies, views, and references
- Reshaping arrays to different dimensions
- Combining arrays vertically and horizontally

---

In [None]:
import numpy as np

print(f"NumPy version: {np.__version__}")

---

## Array Initialization Methods

### The Problem: Manual Array Creation Is Tedious

Creating large arrays manually is impractical:

In [None]:
# Bad: Creating 100 zeros manually
# zeros = np.array([0, 0, 0, 0, ...])  # Too tedious!

# Good: Use initialization methods
zeros = np.zeros(100)
print(f"Created 100 zeros: {zeros[:10]}...")  # Show first 10

---

## Creating Arrays of Zeros

### Example 1: Basic Zeros Arrays

In [None]:
# 1D array of zeros
zero_vector = np.zeros(5)
print(f"1D zeros: {zero_vector}")
print(f"Data type: {zero_vector.dtype}")
print()

# 2D array of zeros
zero_matrix = np.zeros((3, 4))  # 3 rows, 4 columns
print("2D zeros (3×4):")
print(zero_matrix)
print()

# Specify data type
zero_ints = np.zeros(5, dtype='int32')
print(f"Integer zeros: {zero_ints}")
print(f"Data type: {zero_ints.dtype}")

**Use Cases:**
- Initializing counters or accumulators
- Creating placeholder arrays for later filling
- Resetting values in algorithms
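Here is a minimal sketch of the first two use cases. It preallocates a placeholder array and fills it in a loop, then uses a zeros array as a per-category counter. The category labels are made-up example data.

```python
import numpy as np

# Preallocate a placeholder array, then fill it in a loop
n = 5
squares = np.zeros(n, dtype=int)
for i in range(n):
    squares[i] = i ** 2
print(squares)  # [ 0  1  4  9 16]

# Use a zeros array as an accumulator of per-category counts
categories = np.array([0, 2, 1, 2, 2, 0])  # illustrative labels 0, 1, 2
counts = np.zeros(3, dtype=int)
for c in categories:
    counts[c] += 1
print(counts)  # [2 1 3]
```

The loops keep the idea visible. In practice, vectorized forms such as `np.arange(n) ** 2` or `np.bincount(categories, minlength=3)` would typically replace them.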

---

## Creating Arrays of Ones

### Example 2: Basic Ones Arrays

In [None]:
# 1D array of ones
one_vector = np.ones(6)
print(f"1D ones: {one_vector}")
print()

# 2D array of ones
one_matrix = np.ones((2, 5))  # 2 rows, 5 columns
print("2D ones (2×5):")
print(one_matrix)
print()

# 3D array of ones
one_tensor = np.ones((2, 3, 2))  # 2 layers, 3 rows, 2 columns
print("3D ones (2×3×2):")
print(one_tensor)

**Use Cases:**
- Multiplying by ones (identity operation)
- Creating masks (pass `dtype=bool` to get all `True`; the default dtype is `float64`)
- Initializing weights or probabilities
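The sketch below illustrates two of these use cases. It builds an all-enabled boolean mask and disables some entries, then checks that element-wise multiplication by ones is a no-op. `np.ones_like` is a companion function not otherwise covered here, and the sample values are arbitrary.

```python
import numpy as np

values = np.array([3.0, -1.0, 4.0, -1.5, 5.0])

# np.ones defaults to float64, so request bool explicitly for a mask
mask = np.ones(values.shape, dtype=bool)
print(mask.dtype)  # bool

# Disable the entries to exclude (here: negative values)
mask[values < 0] = False
print(values[mask])  # [3. 4. 5.]

# Element-wise multiplication by ones leaves the values unchanged
print(np.array_equal(values * np.ones_like(values), values))  # True
```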

---

## Creating Arrays with Custom Values

### Example 3: Using np.full()

In [None]:
# Fill with specific value
fives = np.full(7, 5)
print(f"Array of fives: {fives}")
print()

# 2D array with custom value
default_prices = np.full((3, 4), 9.99)
print("Default prices (3×4):")
print(default_prices)
print()

# Boolean array (all True)
all_available = np.full(5, True, dtype='bool')
print(f"All available: {all_available}")
print()

# String array with default value
placeholders = np.full(4, "N/A", dtype='U10')
print(f"Placeholders: {placeholders}")

**Use Cases:**
- Setting default values
- Creating test data with specific values
- Initializing with non-zero starting values
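When no `dtype` is given, `np.full` infers the type from the fill value. If you do pass an explicit `dtype`, the fill value is cast without complaint, so a float fill in an integer array is truncated. The short check below demonstrates both behaviours. The values are arbitrary, and `dtype.kind` is used so the check does not depend on the platform's default integer width.

```python
import numpy as np

# Without dtype, np.full infers it from the fill value
ints = np.full(3, 5)
floats = np.full(3, 9.99)
print(ints.dtype.kind, floats.dtype.kind)  # i f

# With an explicit integer dtype, a float fill value is truncated toward zero
truncated = np.full(3, 2.7, dtype=int)
print(truncated)  # [2 2 2]
```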

---

## Creating Random Arrays

### Example 4: Random Float Arrays

In [None]:
# Random floats between 0 and 1
random_floats = np.random.rand(5)
print(f"Random floats [0, 1): {random_floats}")
print()

# 2D random array
random_matrix = np.random.rand(3, 4)
print("Random matrix (3×4):")
print(random_matrix)
print()

# Random floats in custom range [10, 20)
random_temps = np.random.rand(5) * 10 + 10  # Scale and shift
print(f"Random temperatures [10, 20): {random_temps}")
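The scale-and-shift trick above generalizes to `low + (high - low) * np.random.rand(...)`. `np.random.uniform(low, high, size)` draws from the same range directly. The sketch below compares the two on 1000 samples with illustrative bounds. NumPy's documentation notes that rounding can occasionally produce a value equal to `high`, so the bounds are printed rather than asserted as strict.

```python
import numpy as np

low, high = 10.0, 20.0

# General scale-and-shift formula: low + (high - low) * U[0, 1)
scaled = low + (high - low) * np.random.rand(1000)

# np.random.uniform draws from [low, high) directly
direct = np.random.uniform(low, high, size=1000)

print(f"scaled: min={scaled.min():.3f}, max={scaled.max():.3f}")
print(f"direct: min={direct.min():.3f}, max={direct.max():.3f}")
```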

### Example 5: Random Integer Arrays

In [None]:
# Random integers in range [low, high)
dice_rolls = np.random.randint(1, 7, size=10)  # 1-6 (7 is exclusive)
print(f"Dice rolls (1-6): {dice_rolls}")
print()

# Random 2D integer array
random_ages = np.random.randint(18, 65, size=(4, 3))  # Ages 18-64
print("Random ages (18-64):")
print(random_ages)
print()

# Random binary array (0 or 1)
coin_flips = np.random.randint(0, 2, size=8)
print(f"Coin flips (0=tails, 1=heads): {coin_flips}")

**Use Cases for Random Arrays:**
- Simulations and Monte Carlo methods
- Generating test data
- Initializing neural network weights
- Sampling and statistical analysis
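Simulations and test data usually need to be reproducible, so the random draws must be seeded. The first half of the sketch uses the legacy global seed (`np.random.seed`). The second half uses `np.random.default_rng`, the Generator interface available since NumPy 1.17, which NumPy's documentation recommends for new code. The seed value 42 is arbitrary.

```python
import numpy as np

# Legacy global seeding: the same seed reproduces the same sequence
np.random.seed(42)
first = np.random.randint(1, 7, size=5)
np.random.seed(42)
second = np.random.randint(1, 7, size=5)
print(np.array_equal(first, second))  # True

# Generator API: seed a local generator instead of global state
rng = np.random.default_rng(42)
rolls = rng.integers(1, 7, size=5)   # high is exclusive, like randint
floats = rng.random(3)               # floats in [0, 1), like rand
rng_again = np.random.default_rng(42)
print(np.array_equal(rolls, rng_again.integers(1, 7, size=5)))  # True
```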

---

## Creating Identity Matrices

### Example 6: Identity Matrix

In [None]:
# 3×3 identity matrix (1s on diagonal, 0s elsewhere)
identity_3x3 = np.identity(3)
print("3×3 identity matrix:")
print(identity_3x3)
print()

# 5×5 identity matrix
identity_5x5 = np.identity(5)
print("5×5 identity matrix:")
print(identity_5x5)
print()

# Integer identity matrix
identity_int = np.identity(4, dtype='int32')
print("4×4 identity (integer):")
print(identity_int)

**What happens:**
- An identity matrix has 1s on the main diagonal (top-left to bottom-right)
- All other elements are 0
- Matrix-multiplying by an identity matrix (`A @ I`) leaves `A` unchanged, just as multiplying a number by 1 does
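The "unchanged" property holds for matrix multiplication (`@`), not for element-wise `*`. The sketch below contrasts the two on a small example matrix; `A` and `I` are illustrative names.

```python
import numpy as np

A = np.array([[2, 3],
              [4, 5]])
I = np.identity(2, dtype=int)

# Matrix multiplication by the identity returns A unchanged (from either side)
print(np.array_equal(A @ I, A), np.array_equal(I @ A, A))  # True True

# Element-wise * is different: it keeps only the diagonal
print(A * I)
# [[2 0]
#  [0 5]]
```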

**Use Cases:**
- Linear algebra operations
- Matrix initialization
- Transformation matrices

---

## Repeating Arrays

### Example 7: Using np.repeat()

In [None]:
# Original array
original = np.array([1, 2, 3])
print(f"Original: {original}")
print()

# Repeat each element
repeated = np.repeat(original, 3)  # Each element repeated 3 times
print(f"Repeated (each 3x): {repeated}")
print()

# Different repetitions for each element
custom_repeat = np.repeat(original, [1, 2, 3])  # 1 repeated 1x, 2 repeated 2x, etc.
print(f"Custom repetition: {custom_repeat}")
print()

# 2D array repetition
matrix = np.array([[1, 2], [3, 4]])
print("Original matrix:")
print(matrix)
print()

# Repeat along axis 0 (rows)
repeated_rows = np.repeat(matrix, 2, axis=0)
print("Repeated along rows (axis=0):")
print(repeated_rows)
print()

# Repeat along axis 1 (columns)
repeated_cols = np.repeat(matrix, 3, axis=1)
print("Repeated along columns (axis=1):")
print(repeated_cols)

**Use Cases:**
- Upsampling data
- Creating training datasets with duplication
- Expanding arrays to match dimensions
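The sketch below covers two details that Example 7 does not show. Without an `axis` argument, `np.repeat` flattens a 2D input first. Repeating along both axes gives a simple nearest-neighbour upsampling, treating the small matrix as a toy "image".

```python
import numpy as np

matrix = np.array([[1, 2],
                   [3, 4]])

# Without axis, np.repeat flattens the input first and returns a 1D array
print(np.repeat(matrix, 2))  # [1 1 2 2 3 3 4 4]

# Nearest-neighbour upsampling: repeat along both axes to enlarge 2x
upsampled = np.repeat(np.repeat(matrix, 2, axis=0), 2, axis=1)
print(upsampled)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```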

---

## Array Copying: Deep Copy vs. Reference

### The Problem: Accidental Data Modification

In [None]:
# Original array
original = np.array([10, 20, 30, 40, 50])
print(f"Original: {original}")
print()

# ❌ Assignment creates reference (not a copy!)
reference = original
reference[0] = 999  # Changes the one array both names refer to

print("After modifying 'reference':")
print(f"Original: {original}")  # Changed!
print(f"Reference: {reference}")
print("⚠️ Both names show the change - they refer to the same array!")
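Passing an array to a function behaves the same way as assignment: the parameter is just another name for the caller's array. In the sketch below, the helpers `reset_first` and `reset_first_safely` are hypothetical names used only for illustration.

```python
import numpy as np

def reset_first(arr):
    # arr is another name for the caller's array, not a copy
    arr[0] = 0

scores = np.array([10, 20, 30])
reset_first(scores)
print(scores)  # [ 0 20 30]

def reset_first_safely(arr):
    # Work on a private copy so the caller's array is untouched
    result = arr.copy()
    result[0] = 0
    return result

scores = np.array([10, 20, 30])
new_scores = reset_first_safely(scores)
print(scores, new_scores)  # [10 20 30] [ 0 20 30]
```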

### Example 8: Using .copy() for Independent Arrays

In [None]:
# Reset original
original = np.array([10, 20, 30, 40, 50])
print(f"Original: {original}")
print()

# ✅ Create deep copy
independent_copy = original.copy()
independent_copy[0] = 999  # Only modifies the copy

print("After modifying copy:")
print(f"Original: {original}")  # Unchanged!
print(f"Copy: {independent_copy}")
print("✅ Original is safe - copy is independent!")

### Example 9: Slicing Creates Views (Not Copies)

In [None]:
data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(f"Original data: {data}")
print()

# Slicing creates a view (not a copy)
subset = data[2:5]  # This is a view
print(f"Subset (view): {subset}")
print()

# Modifying view affects original
subset[0] = 999
print("After modifying subset:")
print(f"Original data: {data}")  # Changed!
print(f"Subset: {subset}")
print()

# To get independent subset, use .copy()
data = np.array([1, 2, 3, 4, 5, 6, 7, 8])
independent_subset = data[2:5].copy()
independent_subset[0] = 777

print("With .copy():")
print(f"Original data: {data}")  # Unchanged!
print(f"Independent subset: {independent_subset}")

**Key Points:**
- ❌ `arr2 = arr1` creates a reference (both names point to the same array)
- ❌ `arr2 = arr1[:]` creates a view (a new array object that still shares memory)
- ✅ `arr2 = arr1.copy()` creates an independent copy
- Views are cheap to create (no data is duplicated), but writing through a view changes the original
- Use `.copy()` when you need independent data
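Only basic slices produce views. Integer-array ("fancy") indexing and boolean-mask indexing always return copies. The sketch below uses `np.shares_memory` to make the difference visible, reusing the data from Example 9.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])

sliced = data[2:5]         # basic slice -> view
fancy = data[[2, 3, 4]]    # integer-array index -> copy
masked = data[data > 5]    # boolean mask -> copy

print(np.shares_memory(sliced, data))  # True
print(np.shares_memory(fancy, data))   # False
print(np.shares_memory(masked, data))  # False

# Writing to the fancy-indexed copy leaves the original alone
fancy[0] = 777
print(data)  # [1 2 3 4 5 6 7 8]
```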

---

## Reshaping Arrays

### Example 10: Basic Reshaping

In [None]:
# 1D array with 12 elements
monthly_data = np.array([100, 120, 110, 130, 125, 140, 135, 150, 145, 160, 155, 170])
print(f"Original 1D array: {monthly_data}")
print(f"Shape: {monthly_data.shape}")
print()

# Reshape to 3 rows × 4 columns
quarterly = monthly_data.reshape((3, 4))
print("Reshaped to 3×4 (3 quarters, 4 months each):")
print(quarterly)
print(f"Shape: {quarterly.shape}")
print()

# Reshape to 4 rows × 3 columns
alternative = monthly_data.reshape((4, 3))
print("Reshaped to 4×3:")
print(alternative)
print(f"Shape: {alternative.shape}")

### Example 11: Reshaping to 3D

In [None]:
# Create 1D array with 24 elements
data = np.arange(1, 25)
print(f"Original 1D: {data}")
print()

# Reshape to 3D: 2 layers, 3 rows, 4 columns
tensor = data.reshape((2, 3, 4))
print("Reshaped to 3D (2×3×4):")
print(tensor)
print(f"Shape: {tensor.shape}")
print()

# Access layers
print("Layer 0:")
print(tensor[0, :, :])
print()
print("Layer 1:")
print(tensor[1, :, :])

**Reshaping Rules:**
- Total elements must remain the same (12 elements → 3×4 or 2×6, not 3×5)
- Use `-1` to auto-calculate one dimension
- `reshape()` returns a view (no data copied) when the memory layout allows it, and a copy otherwise
- Can reshape between any compatible dimensions
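The sketch below exercises these rules on the monthly data from Example 10. It shows `-1` inferring a dimension, `reshape(-1)` flattening back to 1D, the `ValueError` raised for an incompatible shape, and the shared memory of a view. Note that `reshape` accepts the new shape either as a tuple or as separate integers.

```python
import numpy as np

monthly_data = np.array([100, 120, 110, 130, 125, 140, 135, 150, 145, 160, 155, 170])

# -1 lets NumPy infer one dimension: 12 elements / 4 columns = 3 rows
print(monthly_data.reshape(-1, 4).shape)  # (3, 4)
print(monthly_data.reshape(2, -1).shape)  # (2, 6)

# reshape(-1) flattens back to 1D
quarterly = monthly_data.reshape(3, 4)
print(quarterly.reshape(-1).shape)  # (12,)

# Incompatible shapes raise ValueError (3 x 5 = 15, not 12)
try:
    monthly_data.reshape(3, 5)
except ValueError as err:
    print("ValueError:", err)

# For a contiguous array, reshape returns a view that shares memory
print(np.shares_memory(quarterly, monthly_data))  # True
```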

---

## Stacking Arrays

### Example 12: Vertical Stacking (vstack)

In [None]:
# Two separate weeks of data
week1 = np.array([[100, 110, 105],
                  [120, 115, 125]])

week2 = np.array([[130, 135, 128],
                  [140, 145, 142]])

print("Week 1 (2×3):")
print(week1)
print()
print("Week 2 (2×3):")
print(week2)
print()

# Stack vertically (add rows)
combined = np.vstack((week1, week2))
print("Vertically stacked (4×3):")
print(combined)
print(f"Shape: {combined.shape}")

**What happens:**
- `vstack()` stacks arrays vertically (along axis 0)
- Adds rows (increases number of rows)
- Arrays must have same number of columns
- Think: "stacking shelves on top of each other"
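`vstack()` treats a 1D array of length *n* as a single row of shape (1, *n*). This means you can append a summary row directly, as long as its length matches the column count. The sketch below appends column totals to the Week 1 data and then shows the error raised for a mismatched row. The totals row is an illustrative addition, not part of Example 12.

```python
import numpy as np

week1 = np.array([[100, 110, 105],
                  [120, 115, 125]])

# A 1D array of length 3 is stacked as a new row
col_totals = week1.sum(axis=0)  # shape (3,)
with_totals = np.vstack((week1, col_totals))
print(with_totals)
# [[100 110 105]
#  [120 115 125]
#  [220 225 230]]

# Column counts must match: a length-2 row cannot go under 3 columns
try:
    np.vstack((week1, np.array([1, 2])))
except ValueError as err:
    print("ValueError:", err)
```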

### Example 13: Horizontal Stacking (hstack)

In [None]:
# Product data from two stores
store1_data = np.array([[100, 50],
                        [200, 75],
                        [150, 60]])

store2_data = np.array([[120, 55],
                        [210, 80],
                        [160, 65]])

print("Store 1 data (3×2):")
print(store1_data)
print()
print("Store 2 data (3×2):")
print(store2_data)
print()

# Stack horizontally (add columns)
all_stores = np.hstack((store1_data, store2_data))
print("Horizontally stacked (3×4):")
print(all_stores)
print(f"Shape: {all_stores.shape}")

**What happens:**
- `hstack()` stacks arrays horizontally (along axis 1)
- Adds columns (increases number of columns)
- Arrays must have same number of rows
- Think: "placing books side by side on a shelf"
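Unlike `vstack()`, `hstack()` does not promote a 1D array to a column when joining it to 2D arrays. A new column must already be 2D with shape (rows, 1). The sketch below appends row totals to the Store 1 data using `keepdims=True` to keep the sum 2D, then shows the error raised when a plain 1D array is used. The totals column is an illustrative addition.

```python
import numpy as np

store1_data = np.array([[100, 50],
                        [200, 75],
                        [150, 60]])

# Row totals as a (3, 1) column; keepdims=True keeps the result 2D
totals = store1_data.sum(axis=1, keepdims=True)
with_totals = np.hstack((store1_data, totals))
print(with_totals)
# [[100  50 150]
#  [200  75 275]
#  [150  60 210]]

# A 1D (3,) array cannot be hstacked onto a 2D array
try:
    np.hstack((store1_data, store1_data.sum(axis=1)))
except ValueError as err:
    print("ValueError:", err)
```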

### Example 14: Stacking Multiple Arrays

In [None]:
# Three separate measurements
measurement1 = np.array([10, 20, 30])
measurement2 = np.array([15, 25, 35])
measurement3 = np.array([12, 22, 32])

print(f"Measurement 1: {measurement1}")
print(f"Measurement 2: {measurement2}")
print(f"Measurement 3: {measurement3}")
print()

# Vertical stack (create rows)
measurements_vstack = np.vstack((measurement1, measurement2, measurement3))
print("Vertical stack (3×3):")
print(measurements_vstack)
print()

# Horizontal stack (1D inputs are joined end to end into one longer 1D array)
measurements_hstack = np.hstack((measurement1, measurement2, measurement3))
print(f"Horizontal stack (9,): {measurements_hstack}")
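If each 1D measurement should become a *column* of a 2D array, `np.column_stack` does that directly. This function is not covered elsewhere in this notebook. The sketch below uses the three measurements from Example 14 and checks the result against the transposed vertical stack.

```python
import numpy as np

measurement1 = np.array([10, 20, 30])
measurement2 = np.array([15, 25, 35])
measurement3 = np.array([12, 22, 32])

# column_stack turns each 1D array into a column of a 2D result
as_columns = np.column_stack((measurement1, measurement2, measurement3))
print(as_columns)
# [[10 15 12]
#  [20 25 22]
#  [30 35 32]]

# Here this equals the transpose of the vertical stack
stacked_rows = np.vstack((measurement1, measurement2, measurement3))
print(np.array_equal(as_columns, stacked_rows.T))  # True
```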

---

## Best Practices

### ✅ Do:
- Use initialization methods (`zeros`, `ones`, `full`) instead of manual creation
- Set random seed (`np.random.seed()`) for reproducible results
- Use `.copy()` when you need independent arrays
- Use `-1` in reshape for automatic dimension calculation
- Verify shapes match before stacking arrays
- Choose appropriate data types (`dtype`) for memory efficiency

### ❌ Don't:
- Assume assignment (`arr2 = arr1`) creates a copy - it doesn't!
- Reshape to incompatible dimensions (total elements must match)
- Stack arrays with mismatched dimensions
- Create unnecessarily large arrays (memory waste)
- Rely on unseeded random values where results must be reproducible (tests, experiments, tutorials)

### Comparison: Copy vs. View vs. Reference

In [None]:
original = np.array([1, 2, 3, 4, 5])

# Reference (same object)
reference = original
print(f"Same object? {reference is original}")  # True

# View (shares data, different object)
view = original[:]
print(f"View is same object? {view is original}")  # False
print(f"View shares memory? {np.shares_memory(view, original)}")  # True

# Copy (independent)
copy = original.copy()
print(f"Copy is same object? {copy is original}")  # False
print(f"Copy shares memory? {np.shares_memory(copy, original)}")  # False

---

## Summary

### Key Concepts:
- **Initialization**: Create arrays efficiently with `zeros()`, `ones()`, `full()`, `random.rand()`, `random.randint()`
- **Identity matrices**: Create with `identity()` for linear algebra
- **Repetition**: Use `repeat()` to duplicate elements, or whole rows or columns via `axis`
- **Copying**: Use `.copy()` for independent arrays; assignment and slicing create references/views
- **Reshaping**: Change dimensions with `reshape()`, use `-1` for auto-calculation
- **Stacking**: Combine arrays with `vstack()` (vertical) and `hstack()` (horizontal)

### Syntax Reference:

**Array Creation:**
```python
np.zeros(shape)                    # Array of zeros
np.ones(shape)                     # Array of ones
np.full(shape, value)              # Array with custom value
np.random.rand(d0, d1, ...)        # Random floats [0, 1); dimensions as separate args
np.random.randint(low, high, size) # Random integers
np.identity(n)                     # n×n identity matrix
```

**Array Manipulation:**
```python
arr.copy()                         # Deep copy
arr.reshape(new_shape)             # Change dimensions
arr.reshape(-1, cols)              # Auto-calculate rows
np.repeat(arr, repeats)            # Repeat elements
```

**Array Combining:**
```python
np.vstack((arr1, arr2))            # Stack vertically (add rows)
np.hstack((arr1, arr2))            # Stack horizontally (add columns)
```

### Next Steps:
Next, learn about [Operations and Statistics](03-operations-and-statistics.ipynb) to perform mathematical computations and statistical analysis on your NumPy arrays.