# NumPy Basics and Arrays

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

## Why NumPy?
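- **Performance**: Array operations run in compiled code, so numerical work on large arrays is typically much faster than equivalent loops over Python lists
- **Memory efficient**: Elements are stored in one contiguous buffer of a fixed type instead of as separate Python objects, so arrays use far less memory than lists
- **Vectorized operations**: Perform operations on entire arrays without writing explicit loops
- **Foundation**: Base for other libraries such as pandas and scikit-learn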
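To make the vectorized-operations point concrete, here is a minimal sketch (the list `values` is just illustrative data) comparing an explicit Python loop with the equivalent single array expression. Both give the same numbers; the array version has no Python-level loop.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Loop-based: square each element one at a time in Python
squared_loop = [v ** 2 for v in values]

# Vectorized: one expression applied to the whole array at once
arr = np.array(values)
squared_vec = arr ** 2

print(squared_loop)           # [1, 4, 9, 16, 25]
print(squared_vec.tolist())   # [1, 4, 9, 16, 25]
```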

## Topics Covered
- Installing and importing NumPy
- Creating NumPy arrays
- Array attributes and properties
- Array data types
- Array indexing and slicing
- Array reshaping
- Array vs list performance
- Practical examples

## Installing and Importing NumPy

In [None]:
# Install NumPy (run this if NumPy is not installed)
# !pip install numpy

# Import NumPy
import numpy as np

# Check NumPy version
print(f"NumPy version: {np.__version__}")

## Creating NumPy Arrays

In [None]:
# From Python lists
list1 = [1, 2, 3, 4, 5]
arr1 = np.array(list1)

print("Python list:", list1)
print("NumPy array:", arr1)
print("Type:", type(arr1))
print()

In [None]:
# 2D array from nested lists
list2d = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr2d = np.array(list2d)

print("2D array:")
print(arr2d)
print()
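Unlike a Python list, a NumPy array holds a single dtype for all its elements, and `np.array()` infers it from the input. A small sketch: one float in the input is enough to turn every element into a float.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float forces a float dtype for every element

print(ints.dtype)       # an integer dtype (int64 on most platforms)
print(mixed.dtype)      # float64
print(mixed.tolist())   # [1.0, 2.5, 3.0]
```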

In [None]:
# Using NumPy functions to create arrays

# Array of zeros
zeros = np.zeros(5)
print("Zeros:", zeros)

# Array of ones
ones = np.ones((3, 4))  # 3x4 matrix of ones
print("\nOnes (3x4):")
print(ones)

# Array filled with a specific value
full_array = np.full((2, 3), 7)
print("\nFilled with 7:")
print(full_array)
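The creation functions above differ in their default dtype: `np.zeros()` and `np.ones()` default to `float64`, while `np.full()` takes its dtype from the fill value. The `dtype` parameter overrides either default, as this short sketch shows.

```python
import numpy as np

print(np.zeros(3).dtype)                  # float64 (the default)
print(np.zeros(3, dtype=np.int32).dtype)  # int32
print(np.full((2, 2), 7).dtype)           # an integer dtype, inferred from the fill value 7
print(np.full((2, 2), 7.0).dtype)         # float64, inferred from the fill value 7.0
```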

In [None]:
# Range-based arrays

# Using arange (similar to Python's range)
range_arr = np.arange(0, 10, 2)  # start, stop (exclusive), step
print("Arange (0 up to, but not including, 10, step 2):", range_arr)

# Using linspace (linearly spaced numbers)
linear_arr = np.linspace(0, 1, 5)  # start, stop (included by default), number of points
print("Linspace (0 to 1, 5 points):", linear_arr)

# Identity matrix
identity = np.eye(3)  # 3x3 identity matrix
print("\nIdentity matrix (3x3):")
print(identity)
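The main practical difference between `arange` and `linspace` is the stop value: `arange` excludes it, while `linspace` includes it unless `endpoint=False` is passed. With non-integer steps, `linspace` is generally the safer choice, because floating-point rounding can make the number of `arange` elements hard to predict.

```python
import numpy as np

# arange excludes the stop value, like Python's range
a = np.arange(0, 1, 0.25)
print(a.tolist())  # [0.0, 0.25, 0.5, 0.75]

# linspace includes the stop value by default
b = np.linspace(0, 1, 5)
print(b.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]

# endpoint=False makes linspace exclude the stop value
c = np.linspace(0, 1, 4, endpoint=False)
print(c.tolist())  # [0.0, 0.25, 0.5, 0.75]
```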

In [None]:
# Random arrays
np.random.seed(42)  # For reproducible results

# Random numbers between 0 and 1
random_arr = np.random.random((2, 3))
print("Random array (0-1):")
print(random_arr)

# Random integers
random_ints = np.random.randint(1, 10, size=(3, 3))
print("\nRandom integers (1-9):")
print(random_ints)

# Random numbers from normal distribution
normal_arr = np.random.normal(0, 1, size=5)  # mean=0, std=1
print("\nNormal distribution:", normal_arr)
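The cell above uses the legacy global seed `np.random.seed()`. Newer NumPy code usually creates a `Generator` with `np.random.default_rng()`, which keeps its state in an object instead of a global. A sketch of the equivalent calls follows. Note that the values will not match the legacy functions, even with the same seed.

```python
import numpy as np

rng = np.random.default_rng(42)  # independent generator seeded with 42

random_arr = rng.random((2, 3))                 # floats in [0, 1)
random_ints = rng.integers(1, 10, size=(3, 3))  # integers 1-9 (high is exclusive by default)
normal_arr = rng.normal(0, 1, size=5)           # mean=0, std=1

print(random_arr)
print(random_ints)
print(normal_arr)
```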

## Array Attributes and Properties

In [None]:
# Create a sample array
sample_arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print("Sample array:")
print(sample_arr)
print()

# Important attributes
print(f"Shape: {sample_arr.shape}")          # Dimensions
print(f"Size: {sample_arr.size}")            # Total number of elements
print(f"Ndim: {sample_arr.ndim}")            # Number of dimensions
print(f"Dtype: {sample_arr.dtype}")          # Data type
print(f"Itemsize: {sample_arr.itemsize}")    # Size of each element in bytes
print(f"Nbytes: {sample_arr.nbytes}")        # Total bytes consumed
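These attributes are related to one another. `size` is the product of the `shape` entries, `ndim` is the length of `shape`, and `nbytes` is `size * itemsize`. The assertions below check each relationship on the same sample array.

```python
import numpy as np

sample_arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# size is the product of the shape's dimensions, and ndim is the length of shape
assert sample_arr.size == 3 * 4
assert sample_arr.ndim == len(sample_arr.shape)

# nbytes is the element count times the bytes per element
assert sample_arr.nbytes == sample_arr.size * sample_arr.itemsize
print(sample_arr.size, "elements x", sample_arr.itemsize, "bytes =", sample_arr.nbytes, "bytes")
```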

## Array Data Types

In [None]:
# Different data types
int_arr = np.array([1, 2, 3], dtype=np.int32)
float_arr = np.array([1, 2, 3], dtype=np.float64)
bool_arr = np.array([True, False, True], dtype=bool)
string_arr = np.array(['a', 'b', 'c'], dtype='U1')  # Unicode string of length 1

print("Integer array:", int_arr, "dtype:", int_arr.dtype)
print("Float array:", float_arr, "dtype:", float_arr.dtype)
print("Boolean array:", bool_arr, "dtype:", bool_arr.dtype)
print("String array:", string_arr, "dtype:", string_arr.dtype)
print()
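The choice of dtype directly determines memory use. In this sketch, three arrays hold the same number of elements but different dtypes. The trade-off is range and precision: `int8`, for example, can only hold values from -128 to 127.

```python
import numpy as np

n = 1_000
as_float64 = np.zeros(n, dtype=np.float64)
as_float32 = np.zeros(n, dtype=np.float32)
as_int8 = np.zeros(n, dtype=np.int8)

print(as_float64.nbytes)  # 8000: 8 bytes per element
print(as_float32.nbytes)  # 4000: 4 bytes per element
print(as_int8.nbytes)     # 1000: 1 byte per element
```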

In [None]:
# Type conversion
original = np.array([1.1, 2.7, 3.9, 4.2])
print("Original:", original, "dtype:", original.dtype)

# Convert to integer
as_int = original.astype(int)  # truncates toward zero; it does not round
print("As integer:", as_int, "dtype:", as_int.dtype)

# Convert to string
as_string = original.astype(str)
print("As string:", as_string, "dtype:", as_string.dtype)
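Because `astype(int)` drops the fractional part instead of rounding, round explicitly with `np.round()` before converting when you want the nearest integer. Truncation goes toward zero, so negative values move up.

```python
import numpy as np

original = np.array([1.1, 2.7, 3.9, 4.2])

truncated = original.astype(int)          # drops the fractional part
rounded = np.round(original).astype(int)  # round to nearest first, then convert

print(truncated.tolist())  # [1, 2, 3, 4]
print(rounded.tolist())    # [1, 3, 4, 4]

# Truncation is toward zero, so -1.7 becomes -1 (not -2)
print(np.array([-1.7]).astype(int).tolist())  # [-1]
```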

## Array Indexing and Slicing

In [None]:
# 1D array indexing
arr_1d = np.array([10, 20, 30, 40, 50])

print("1D array:", arr_1d)
print(f"First element: {arr_1d[0]}")
print(f"Last element: {arr_1d[-1]}")
print(f"Elements at indices 1-3: {arr_1d[1:4]}")  # the stop index 4 is excluded
print(f"Every other element: {arr_1d[::2]}")
print()
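One important difference from Python lists is that a NumPy slice is a *view* of the original data, not a copy. Writing to the slice therefore changes the original array. Use `.copy()` when you need independent data.

```python
import numpy as np

arr_1d = np.array([10, 20, 30, 40, 50])
view = arr_1d[1:4]         # a slice is a view onto the same data
view[0] = 99               # ...so writing to it changes the original
print(arr_1d.tolist())     # [10, 99, 30, 40, 50]

arr_copy_src = np.array([10, 20, 30, 40, 50])
independent = arr_copy_src[1:4].copy()  # .copy() allocates separate data
independent[0] = 99
print(arr_copy_src.tolist())  # [10, 20, 30, 40, 50]
```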

In [None]:
# 2D array indexing
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print("2D array:")
print(arr_2d)
print()

# Indexing
print(f"Element at row 1, col 2 (0-based): {arr_2d[1, 2]}")
print(f"First row: {arr_2d[0, :]}")
print(f"Second column: {arr_2d[:, 1]}")
print(f"Top-left 2x2: \n{arr_2d[:2, :2]}")
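An integer index removes that dimension from the result, while a slice keeps it. This explains why `arr_2d[0, :]` is 1D but `arr_2d[0:1, :]` is still 2D. Chained indexing like `arr_2d[1][2]` returns the same element as `arr_2d[1, 2]`, but it builds an intermediate row array first.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr_2d[0, :].shape)    # (3,)   integer index drops the row dimension
print(arr_2d[0:1, :].shape)  # (1, 3) slice keeps it
print(arr_2d[1][2] == arr_2d[1, 2])  # True
```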

In [None]:
# Boolean indexing
arr = np.array([1, 5, 3, 8, 2, 9, 4])

print("Original array:", arr)

# Create boolean mask
mask = arr > 4
print("Mask (> 4):", mask)
print("Elements > 4:", arr[mask])

# Conditions can also be written inline, without a separate mask variable
print("Elements <= 4:", arr[arr <= 4])
print("Even numbers:", arr[arr % 2 == 0])
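To combine conditions, use the element-wise operators `&` (and), `|` (or) and `~` (not), with each comparison in parentheses because `&` binds more tightly than `>`. Python's `and`/`or` raise a `ValueError` on multi-element arrays. `np.where()` with a single argument returns the positions where a condition holds.

```python
import numpy as np

arr = np.array([1, 5, 3, 8, 2, 9, 4])

between = arr[(arr > 2) & (arr < 8)]   # both conditions must hold
print(between.tolist())  # [5, 3, 4]

outside = arr[(arr < 2) | (arr > 8)]   # either condition may hold
print(outside.tolist())  # [1, 9]

# Indices (not values) where the condition is True
print(np.where(arr > 4)[0].tolist())  # [1, 3, 5]
```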

In [None]:
# Fancy indexing (using arrays as indices)
arr = np.array([10, 20, 30, 40, 50])
indices = np.array([0, 2, 4])

print("Original array:", arr)
print("Indices:", indices)
print("Selected elements:", arr[indices])

# 2D fancy indexing
arr_2d = np.array([[1, 2], [3, 4], [5, 6]])
row_indices = np.array([0, 2])
col_indices = np.array([1, 0])

print("\n2D array:")
print(arr_2d)
print("Selected elements (at (0, 1) and (2, 0)):", arr_2d[row_indices, col_indices])
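Paired index arrays select individual elements, one `(row, col)` pair at a time. To select a whole submatrix, meaning every combination of the chosen rows and columns, `np.ix_()` builds the index grid. Note also that fancy indexing returns a copy, unlike basic slicing.

```python
import numpy as np

arr_2d = np.array([[1, 2], [3, 4], [5, 6]])
rows = [0, 2]
cols = [0, 1]

print(arr_2d[rows, cols].tolist())          # [1, 6]: elements (0, 0) and (2, 1)
print(arr_2d[np.ix_(rows, cols)].tolist())  # [[1, 2], [5, 6]]: rows 0 and 2, columns 0 and 1

# Fancy indexing returns a copy, so changing the result leaves arr_2d untouched
picked = arr_2d[[0, 2]]
picked[0, 0] = 100
print(arr_2d[0, 0])  # 1
```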

## Array Reshaping

In [None]:
# Reshaping arrays
arr_1d = np.arange(12)
print("Original 1D array:", arr_1d)
print("Shape:", arr_1d.shape)
print()

# Reshape to 2D
arr_2d = arr_1d.reshape(3, 4)
print("Reshaped to 3x4:")
print(arr_2d)
print("Shape:", arr_2d.shape)
print()

# Reshape to 3D
arr_3d = arr_1d.reshape(2, 2, 3)
print("Reshaped to 2x2x3:")
print(arr_3d)
print("Shape:", arr_3d.shape)
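`reshape()` only succeeds when the new shape has exactly as many elements as the original, so 12 elements cannot become a 5x3 array. For a contiguous array like this one, the reshaped result is a view that shares memory with the original. The original keeps its own shape.

```python
import numpy as np

arr_1d = np.arange(12)

try:
    arr_1d.reshape(5, 3)  # 15 elements needed, only 12 available
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)

arr_2d = arr_1d.reshape(3, 4)
print(arr_1d.shape)                     # (12,): the original shape is unchanged
print(np.shares_memory(arr_1d, arr_2d)) # True: the result is a view of the same data
```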

In [None]:
# Automatic dimension calculation
arr = np.arange(24)
print("Original array shape:", arr.shape)

# Let NumPy calculate one dimension
reshaped = arr.reshape(4, -1)  # -1 means "calculate this dimension"
print("Reshaped (4, -1):", reshaped.shape)
print(reshaped)
print()

# Another example
reshaped2 = arr.reshape(-1, 6)
print("Reshaped (-1, 6):", reshaped2.shape)
print(reshaped2)
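A common use of `-1` is turning a 1D array into an explicit column or row vector, which is what the multiplication-table example later relies on. Indexing with `np.newaxis` gives the same column shape. Only one dimension may be `-1`, because NumPy cannot infer two unknowns.

```python
import numpy as np

v = np.arange(4)               # shape (4,)
col = v.reshape(-1, 1)         # shape (4, 1): a column vector
row = v.reshape(1, -1)         # shape (1, 4): a row vector
col_alt = v[:, np.newaxis]     # same shape as reshape(-1, 1)

print(col.shape, row.shape, col_alt.shape)  # (4, 1) (1, 4) (4, 1)

try:
    v.reshape(-1, -1)          # two unknown dimensions are not allowed
    two_unknowns_failed = False
except ValueError:
    two_unknowns_failed = True
print(two_unknowns_failed)  # True
```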

In [None]:
# Flattening arrays
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D array:")
print(arr_2d)

# Flatten to 1D (always returns a copy)
flattened = arr_2d.flatten()
print("Flattened:", flattened)

# Alternative: ravel (returns a view when possible)
raveled = arr_2d.ravel()
print("Raveled:", raveled)

# Reshape to 1D (like ravel, returns a view when possible)
reshaped_1d = arr_2d.reshape(-1)
print("Reshaped to 1D:", reshaped_1d)
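The difference between `flatten()` and `ravel()` can be checked with `np.shares_memory()`. `flatten()` always allocates a new array. `ravel()` returns a view here because the input is contiguous, so writing through it changes the original.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

flattened = arr_2d.flatten()  # always a copy
raveled = arr_2d.ravel()      # a view here, because arr_2d is contiguous

print(np.shares_memory(arr_2d, flattened))  # False
print(np.shares_memory(arr_2d, raveled))    # True

raveled[0] = 100
print(arr_2d[0, 0])  # 100: writing through the view changed the original
```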

## Array vs List Performance Comparison

In [None]:
import time

# Create large data structures
size = 1000000
python_list = list(range(size))
numpy_array = np.arange(size)

print(f"Comparing performance with {size:,} elements:")
print()

# Test sum operation
# Note: a single timing run is noisy, so treat the ratio as a rough indication only
# Python list
start_time = time.perf_counter()  # high-resolution clock intended for timing
list_sum = sum(python_list)
list_time = time.perf_counter() - start_time

# NumPy array
start_time = time.perf_counter()
array_sum = np.sum(numpy_array)
array_time = time.perf_counter() - start_time

print(f"List sum time: {list_time:.4f} seconds")
print(f"Array sum time: {array_time:.4f} seconds")
print(f"NumPy is {list_time/array_time:.1f}x faster")

# Memory comparison
import sys
# A list stores pointers to separate int objects, so count the list itself plus every element.
# (Approximate: CPython caches the ints -5 to 256, which are shared rather than duplicated.)
list_memory = sys.getsizeof(python_list) + sum(sys.getsizeof(i) for i in python_list)
array_memory = numpy_array.nbytes  # size * itemsize of the data buffer

print("\nMemory usage (approximate):")
print(f"List: {list_memory:,} bytes")
print(f"Array: {array_memory:,} bytes")
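For more reliable numbers than a single measurement, the standard-library `timeit` module can repeat each operation and report the total time. This is a sketch. The repeat count of 10 is an arbitrary choice, and the array is created as `int64` so the sum cannot overflow on platforms whose default integer is 32-bit.

```python
import timeit
import numpy as np

size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size, dtype=np.int64)  # int64 so the sum fits on every platform

repeats = 10
# timeit.timeit runs the callable `repeats` times and returns the total elapsed seconds
list_total = timeit.timeit(lambda: sum(python_list), number=repeats)
array_total = timeit.timeit(lambda: np.sum(numpy_array), number=repeats)

print(f"List sum:  {list_total / repeats:.5f} s per call")
print(f"Array sum: {array_total / repeats:.5f} s per call")
print(f"Ratio: {list_total / array_total:.1f}x")

# Both approaches must agree on the result
assert sum(python_list) == int(np.sum(numpy_array))
```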

## Practical Examples

In [None]:
# Example 1: Creating a multiplication table
def create_multiplication_table(n):
    """Create an n x n multiplication table"""
    # Create arrays for rows and columns
    rows = np.arange(1, n+1).reshape(-1, 1)  # Column vector
    cols = np.arange(1, n+1).reshape(1, -1)  # Row vector
    
    # Broadcasting will create the multiplication table
    table = rows * cols
    return table

# Create a 5x5 multiplication table
mult_table = create_multiplication_table(5)
print("5x5 Multiplication table:")
print(mult_table)
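Broadcasting works here because each size-1 axis is stretched to match the other array: `(5, 1) * (1, 5)` gives `(5, 5)`. For this particular case, the built-in `np.outer()` computes the same outer product directly from two 1D arrays, which gives a convenient check.

```python
import numpy as np

n = 5
rows = np.arange(1, n + 1).reshape(-1, 1)  # shape (5, 1)
cols = np.arange(1, n + 1).reshape(1, -1)  # shape (1, 5)

table = rows * cols                         # broadcast to shape (5, 5)
print(table.shape)  # (5, 5)

same = np.outer(np.arange(1, n + 1), np.arange(1, n + 1))
print(np.array_equal(table, same))  # True
print(table[2, 3])  # 12, i.e. 3 * 4
```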

In [None]:
# Example 2: Data analysis with grades
# Simulate student grades
np.random.seed(42)
num_students = 30
num_subjects = 5

# Generate random integer grades from 60 to 100 (the upper bound 101 is exclusive)
grades = np.random.randint(60, 101, size=(num_students, num_subjects))

print(f"Grades matrix shape: {grades.shape}")
print("Sample grades (first 5 students):")
print(grades[:5])
print()

# Calculate statistics
student_averages = grades.mean(axis=1)  # Average across subjects for each student
subject_averages = grades.mean(axis=0)  # Average across students for each subject

print(f"Student averages (first 10): {student_averages[:10].round(1)}")
print(f"Subject averages: {subject_averages.round(1)}")
print()

# Find top performers
top_student_idx = np.argmax(student_averages)
print(f"Top student (index {top_student_idx}): {student_averages[top_student_idx]:.1f} average")
print(f"Top student's grades: {grades[top_student_idx]}")

# Count students with average > 85
high_performers = np.sum(student_averages > 85)
print(f"Students with average > 85: {high_performers}")
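The same `axis` idea applies to `np.argmax()`, which can find the top student in each subject. `np.flatnonzero()` turns a boolean mask into index positions, so you get *which* students pass a threshold rather than only a count. To keep the sketch self-contained, it regenerates the same simulated grades.

```python
import numpy as np

np.random.seed(42)
grades = np.random.randint(60, 101, size=(30, 5))  # same simulated data as above
student_averages = grades.mean(axis=1)

# Row index of the best student in each subject (one entry per column)
best_per_subject = np.argmax(grades, axis=0)
print("Best student per subject:", best_per_subject)

# Indices (not just the count) of students averaging above 85
high_idx = np.flatnonzero(student_averages > 85)
print("High performers:", high_idx)
print("Their averages:", student_averages[high_idx].round(1))
```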

## Key Takeaways

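1. **NumPy arrays are faster and more memory-efficient** than Python lists for numerical work on large, homogeneous data
2. **Array creation**: Use `np.array()`, `np.zeros()`, `np.ones()`, `np.full()`, `np.arange()`, `np.linspace()`, `np.eye()`, and the `np.random` functions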
3. **Important attributes**: `shape`, `size`, `ndim`, `dtype`
4. **Indexing**: Similar to Python lists but more powerful with boolean and fancy indexing
5. **Reshaping**: Use `reshape()`, `flatten()`, `ravel()` to change array dimensions
6. **Data types matter**: Choose appropriate dtypes for memory efficiency
7. **Broadcasting**: NumPy can combine arrays of different but compatible shapes, as in the multiplication table where `(5, 1) * (1, 5)` produces a `(5, 5)` result

## Practice Exercises

1. Create a 10x10 array of random integers and find the maximum value in each row
2. Generate a checkerboard pattern using NumPy arrays
3. Create a function that standardizes an array (subtract the mean, then divide by the standard deviation)
4. Find all prime numbers up to 100 using NumPy operations
5. Create a Mandelbrot set visualization using NumPy arrays

## Next Steps

In the next notebook, we'll explore:
- Mathematical operations on arrays
- Broadcasting rules
- Linear algebra with NumPy
- Statistical functions