# Practice with NumPy Arrays

In this lab, you'll practice creating and working with NumPy arrays and learn the basics of array creation, manipulation, and simple operations used in data science.

## Learning Objectives
By the end of this lab, you will be able to:
- Create NumPy arrays using various methods
- Access and modify array elements using indexing and slicing
- Perform mathematical operations on arrays using vectorization
- Reshape and manipulate array structures
- Filter data using boolean indexing and conditional operations

## Section 1: Getting Started with NumPy Arrays

NumPy arrays form the foundation of data science in Python because they provide efficient storage and computation for numerical data. Unlike Python lists, NumPy arrays store elements of the same data type in contiguous memory locations, making mathematical operations much faster.

Let's begin by importing NumPy and creating your first arrays:

In [None]:
import numpy as np

# Create arrays from lists
simple_array = np.array([1, 2, 3, 4, 5])
print("Simple array:", simple_array)
print("Array type:", type(simple_array))
print("Element data type:", simple_array.dtype)

# Create a 2D array (matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\n2D Array:")
print(matrix)
print("Shape:", matrix.shape)
print("Number of dimensions:", matrix.ndim)

**Exercise 1.1:** Run the code above and examine the output. Notice how NumPy automatically determines the data type and provides useful information about the array's structure.

**Think About It:** Why do you think NumPy arrays require all elements to be the same data type, while Python lists allow mixed types?
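
As a hint for the question above, the following sketch shows what NumPy does when it is handed mixed inputs: it converts everything to one common data type. The variable names are illustrative and do not appear elsewhere in this lab.

```python
import numpy as np

# Mixed inputs are converted to one common dtype that can hold every value.
ints_and_floats = np.array([1, 2.5, 3])
print(ints_and_floats, ints_and_floats.dtype)    # the integers were upcast to float64

numbers_and_text = np.array([1, "two", 3.0])
print(numbers_and_text, numbers_and_text.dtype)  # every element became a string

# A fixed dtype means every element occupies the same number of bytes.
small = np.array([1, 2, 3], dtype=np.int64)
print("Bytes per element:", small.itemsize)      # 8 for int64
print("Total bytes:", small.nbytes)              # 3 elements * 8 bytes = 24
```

Because every element has the same size, NumPy can locate any element with simple arithmetic instead of following a pointer per item, which is part of why array operations are fast.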

## Section 2: Array Creation Methods

NumPy provides several convenient methods for creating arrays without manually typing every element. These methods are particularly useful when working with large datasets or when you need arrays with specific patterns.

In [None]:
# Create arrays with specific values
zeros_array = np.zeros(5)  # Array of zeros
ones_array = np.ones((3, 4))  # 3x4 array of ones
full_array = np.full((2, 3), 7)  # 2x3 array filled with 7

print("Zeros array:", zeros_array)
print("\nOnes array:")
print(ones_array)
print("\nFull array:")
print(full_array)

In [None]:
# Create arrays with ranges
range_array = np.arange(0, 10, 2)  # Start at 0, stop before 10, step by 2
linspace_array = np.linspace(0, 1, 5)  # 5 evenly spaced numbers from 0 to 1 (both endpoints included)

print("Range array:", range_array)
print("Linspace array:", linspace_array)

In [None]:
# Create random arrays (useful for testing and simulations)
np.random.seed(42)  # Set seed for reproducible results
random_array = np.random.rand(3, 3)  # 3x3 array of uniform random floats in [0, 1)
random_integers = np.random.randint(1, 11, size=10)  # 10 random integers from 1 to 10 (the upper bound 11 is exclusive)

print("Random array:")
print(random_array)
print("Random integers:", random_integers)
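
The cell above uses the legacy `np.random.seed` interface, which still works. NumPy also offers a newer `Generator` interface, created with `np.random.default_rng`, that keeps the random state in an object instead of a global setting. The sketch below is one possible equivalent; the names `rng`, `random_floats`, and `random_ints` are illustrative, and the generated values will differ from those produced by the legacy functions even with the same seed.

```python
import numpy as np

# Create a generator with its own seed (no global state is changed).
rng = np.random.default_rng(42)

# 3x3 array of uniform random floats in [0, 1)
random_floats = rng.random((3, 3))

# 10 random integers from 1 to 10 (the upper bound 11 is exclusive)
random_ints = rng.integers(1, 11, size=10)

print("Random floats:")
print(random_floats)
print("Random integers:", random_ints)
```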

**Exercise 2.1:** Try creating the following arrays:
1. A 4x4 array filled with the number 3.14
2. An array of 15 evenly spaced numbers from -5 to 5
3. A 1D array of 20 random integers from 50 to 100 (inclusive)

Use the cell below for your practice:

In [None]:
# Your practice code here


## Section 3: Array Indexing and Slicing

Accessing and modifying array elements is a fundamental skill in data science. NumPy provides powerful indexing capabilities that go far beyond basic list indexing.

In [None]:
# Create a sample array to work with
data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Basic indexing (similar to Python lists)
print("First element:", data[0])
print("Last element:", data[-1])
print("Third element:", data[2])

# Slicing - extracting portions of arrays
print("\nFirst three elements:", data[:3])
print("Last three elements:", data[-3:])
print("Every second element:", data[::2])
print("Elements from index 2 to 7:", data[2:8])

In [None]:
# Modifying array elements
data_copy = data.copy()  # Create a copy to preserve original
data_copy[0] = 999
data_copy[5:8] = [600, 700, 800]
print("Original array:", data)
print("Modified array:", data_copy)
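
The `.copy()` call above matters because a basic slice of a NumPy array is a view onto the same memory, not a new array. The sketch below illustrates the difference using a small array; the names `view` and `independent` are illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# A slice is a view: writing through it changes the original array.
view = data[1:4]
view[0] = -1
print("After editing the view:", data)       # [10 -1 30 40 50]

# A copy owns its own memory: the original is left untouched.
independent = data[1:4].copy()
independent[0] = 999
print("After editing the copy:", data)       # [10 -1 30 40 50]

print("View shares memory:", np.shares_memory(data, view))         # True
print("Copy shares memory:", np.shares_memory(data, independent))  # False
```

This differs from Python lists, where slicing always produces a new list.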

In [None]:
# Working with 2D arrays
matrix_2d = np.array([[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12]])

print("2D Array:")
print(matrix_2d)
print("Element at row 1, column 2:", matrix_2d[1, 2])
print("First row:", matrix_2d[0, :])
print("Second column:", matrix_2d[:, 1])
print("Submatrix (rows 0-1, columns 1-3):")
print(matrix_2d[0:2, 1:4])
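
One way NumPy indexing goes beyond list indexing is that you can index with a list of positions to pick rows, columns, or individual elements in any order. The sketch below rebuilds the same values as `matrix_2d`; the result names are illustrative.

```python
import numpy as np

matrix_2d = np.array([[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12]])

# Select rows 0 and 2 by listing their indices.
rows_0_and_2 = matrix_2d[[0, 2]]
print(rows_0_and_2)

# Select columns 3 and 0, in that order, for every row.
cols_3_and_0 = matrix_2d[:, [3, 0]]
print(cols_3_and_0)

# Pair row and column indices to pick elements (0,0), (1,1), and (2,2).
picked = matrix_2d[[0, 1, 2], [0, 1, 2]]
print(picked)  # [ 1  6 11]
```

Unlike basic slices, indexing with a list of positions returns a copy rather than a view.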

**Exercise 3.1:** Using the `matrix_2d` array above, try to extract:
1. The last row
2. The last column
3. A 2x2 submatrix from the two middle columns (a 3x4 array has no single center block, so choose rows 0-1 or rows 1-2)
4. Every other element in the first row

Use the cell below:

In [None]:
# Your indexing practice here


## Section 4: Mathematical Operations and Broadcasting

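One of NumPy's greatest strengths is its ability to perform mathematical operations efficiently across entire arrays. This approach, called vectorization, replaces explicit Python loops with operations that run in compiled code, which is usually much faster. Broadcasting extends the same idea to operands of different shapes, such as an array and a scalar.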

In [None]:
# Create arrays for mathematical operations
array_a = np.array([1, 2, 3, 4, 5])
array_b = np.array([10, 20, 30, 40, 50])

# Element-wise arithmetic operations
print("Array A:", array_a)
print("Array B:", array_b)
print("A + B:", array_a + array_b)
print("A * B:", array_a * array_b)
print("B / A:", array_b / array_a)
print("A squared:", array_a ** 2)

In [None]:
# Operations with scalars (broadcasting)
print("Broadcasting with scalars:")
print("A + 10:", array_a + 10)
print("A * 3:", array_a * 3)
print("A / 2:", array_a / 2)
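
Broadcasting is not limited to scalars. When two arrays have different shapes, NumPy compares their dimensions from the right and stretches any dimension of size 1 to match the other array. The sketch below shows three cases; the names `grid`, `row`, and `column` are illustrative.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
column = np.array([[100],
                   [200]])          # shape (2, 1)

# The row is added to every row of the grid.
grid_plus_row = grid + row
print(grid_plus_row)

# The column is added to every column of the grid.
grid_plus_column = grid + column
print(grid_plus_column)

# A (3,) row and a (2, 1) column broadcast to a (2, 3) result.
row_plus_column = row + column
print(row_plus_column.shape)  # (2, 3)
```

If the shapes cannot be matched this way (for example, shapes `(2, 3)` and `(2,)`), NumPy raises a `ValueError` instead of guessing.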

In [None]:
# Mathematical functions
print("Mathematical functions:")
print("Square root of A:", np.sqrt(array_a))
print("Exponential of A:", np.exp(array_a))
print("Natural log of B:", np.log(array_b))
print("Sine of A:", np.sin(array_a))

In [None]:
# Statistical operations
print("Statistical operations on Array B:")
print("Sum:", np.sum(array_b))
print("Mean:", np.mean(array_b))
print("Standard deviation:", np.std(array_b))  # Population std (ddof=0) by default; pass ddof=1 for the sample std
print("Minimum:", np.min(array_b))
print("Maximum:", np.max(array_b))
print("Index of maximum:", np.argmax(array_b))
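
The same statistical functions accept an `axis` argument for multi-dimensional arrays, which lets you summarize each column or each row separately. The sketch below uses a small illustrative array named `table`.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

total = np.sum(table)                      # all elements
column_sums = np.sum(table, axis=0)        # collapse the rows: one value per column
row_means = np.mean(table, axis=1)         # collapse the columns: one value per row
row_argmax = np.argmax(table, axis=1)      # index of the largest value in each row

print("Total:", total)                     # 21
print("Column sums:", column_sums)         # [5 7 9]
print("Row means:", row_means)             # [2. 5.]
print("Index of max in each row:", row_argmax)  # [2 2]
```

A useful way to remember it: `axis` names the dimension that disappears from the result.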

**Exercise 4.1:** Create two arrays of your choice and perform the following operations:
1. Element-wise subtraction
2. Element-wise division
3. Find the mean and standard deviation of each array
4. Apply the cosine function to one of the arrays

**Think About It:** Why is vectorization faster than using loops in Python?
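
To explore the question above, you can time a Python loop against the equivalent vectorized expression. The sketch below is one way to do that with the standard-library `timeit` module; the function names are illustrative, and the timings you see will depend on your machine and array size.

```python
import timeit

import numpy as np

values = np.arange(100_000, dtype=np.int64)

def square_with_loop(arr):
    """Square each element one at a time in Python."""
    return [x * x for x in arr]

def square_vectorized(arr):
    """Square all elements with a single NumPy operation."""
    return arr * arr

# Run each version 5 times and report the total elapsed time.
loop_time = timeit.timeit(lambda: square_with_loop(values), number=5)
vector_time = timeit.timeit(lambda: square_vectorized(values), number=5)

print(f"Loop:       {loop_time:.4f} s")
print(f"Vectorized: {vector_time:.4f} s")
```

Both versions compute the same values; the difference is where the loop runs. In the first it runs in the Python interpreter, and in the second it runs inside NumPy's compiled code.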

In [None]:
# Your mathematical operations practice here


## Section 5: Array Reshaping and Manipulation

Data often comes in one format but needs to be restructured for analysis or visualization. NumPy provides powerful tools for changing array shapes while preserving the underlying data.

In [None]:
# Create a 1D array
original = np.arange(1, 13)  # Numbers 1 through 12
print("Original array:", original)
print("Shape:", original.shape)

# Reshape into different dimensions
reshaped_2d = original.reshape(3, 4)  # 3 rows, 4 columns
reshaped_3d = original.reshape(2, 2, 3)  # 2x2x3 array

print("\nReshaped to 3x4:")
print(reshaped_2d)
print("Shape:", reshaped_2d.shape)

print("\nReshaped to 2x2x3:")
print(reshaped_3d)
print("Shape:", reshaped_3d.shape)
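
When reshaping, you can pass `-1` for one dimension and let NumPy work out its size from the total number of elements. The sketch below reuses the values of `original`; the other names are illustrative.

```python
import numpy as np

original = np.arange(1, 13)  # 12 elements

# Ask for 4 rows; NumPy infers 3 columns (12 / 4).
four_rows = original.reshape(4, -1)
print(four_rows.shape)   # (4, 3)

# Ask for 2 columns; NumPy infers 6 rows (12 / 2).
two_columns = original.reshape(-1, 2)
print(two_columns.shape)  # (6, 2)

# A common pattern: turn a 1D array into a single column.
as_column = original.reshape(-1, 1)
print(as_column.shape)    # (12, 1)
```

Only one dimension can be `-1`, and the remaining dimensions must divide the total element count evenly.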

In [None]:
# Flatten arrays back to 1D (flatten() returns a new copy of the data)
flattened = reshaped_2d.flatten()
print("Flattened back to 1D:", flattened)

# Transpose (swap rows and columns)
transposed = reshaped_2d.T
print("\nOriginal 3x4:")
print(reshaped_2d)
print("\nTransposed (4x3):")
print(transposed)
print("Shape:", transposed.shape)

In [None]:
# Stack arrays together
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
array3 = np.array([7, 8, 9])

# Vertical stacking (stack as rows)
vstacked = np.vstack([array1, array2, array3])
print("Vertically stacked:")
print(vstacked)

# Horizontal stacking (for 1D arrays, joins them end to end into one longer 1D array)
hstacked = np.hstack([array1, array2, array3])
print("\nHorizontally stacked:", hstacked)
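
Two related functions are worth knowing alongside `vstack` and `hstack`: `np.column_stack`, which turns 1D arrays into the columns of a 2D array, and `np.concatenate`, which joins arrays along an axis you choose. The sketch below uses small illustrative arrays.

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Each 1D array becomes one column of the result.
as_columns = np.column_stack([array1, array2])
print(as_columns)          # shape (3, 2)

top = np.array([[1, 2],
                [3, 4]])
bottom = np.array([[5, 6]])
side = np.array([[9],
                 [9]])

# axis=0 adds rows; the column counts must match.
more_rows = np.concatenate([top, bottom], axis=0)
print(more_rows)           # shape (3, 2)

# axis=1 adds columns; the row counts must match.
more_columns = np.concatenate([top, side], axis=1)
print(more_columns)        # shape (2, 3)
```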

**Exercise 5.1:** Practice reshaping with these tasks:
1. Create a 1D array with 24 elements
2. Reshape it into a 4x6 matrix
3. Transpose the matrix
4. Flatten it back to 1D
5. Reshape into a 3D array of your choice (make sure the total elements = 24)

**Challenge:** Can you reshape a 12-element array into a 3x5 matrix? Why or why not?

In [None]:
# Your reshaping practice here


## Section 6: Boolean Indexing and Conditional Operations

Real-world data analysis often requires filtering data based on conditions. NumPy's boolean indexing provides an elegant way to select elements that meet specific criteria.

In [None]:
# Create sample data
temperatures = np.array([68, 72, 75, 80, 85, 90, 95, 88, 82, 78])
cities = np.array(['NYC', 'LA', 'Chicago', 'Houston', 'Phoenix', 
                   'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose'])

print("Temperatures:", temperatures)
print("Cities:", cities)

In [None]:
# Create boolean masks
hot_days = temperatures > 85
very_hot_days = temperatures >= 90
moderate_days = (temperatures >= 75) & (temperatures < 85)

print("Hot days (>85°F):", hot_days)
print("Very hot days (>=90°F):", very_hot_days)
print("Moderate days (75-84°F):", moderate_days)
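
Masks can also be combined with `|` (or) and inverted with `~` (not). The parentheses around each comparison are required, and Python's `and`/`or` keywords do not work element-wise on arrays. The sketch below reuses the temperature values; the mask names and the `and_failed` flag are illustrative.

```python
import numpy as np

temperatures = np.array([68, 72, 75, 80, 85, 90, 95, 88, 82, 78])

# OR: below 70 or above 90
extreme = (temperatures < 70) | (temperatures > 90)
print("Extreme temperatures:", temperatures[extreme])   # [68 95]

# NOT: everything that is not above 85
not_hot = ~(temperatures > 85)
print("Not hot:", temperatures[not_hot])

# Python's 'and' asks for a single True/False, which is ambiguous for an array.
and_failed = False
try:
    (temperatures >= 75) and (temperatures < 85)
except ValueError as error:
    and_failed = True
    print("Using 'and' raised:", error)
```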

In [None]:
# Use boolean indexing to filter data
hot_cities = cities[hot_days]
hot_temps = temperatures[hot_days]

print("Cities with hot weather:")
for city, temp in zip(hot_cities, hot_temps):
    print(f"{city}: {temp}°F")

In [None]:
# Conditional operations
# Cap temperatures above 90 at 90; all other values are left unchanged
adjusted_temps = np.where(temperatures > 90, 90, temperatures)
print("Original temperatures:", temperatures)
print("Adjusted temperatures (capped at 90°F):", adjusted_temps)

# Count elements meeting conditions
num_hot_days = np.sum(hot_days)  # True counts as 1, False as 0
num_moderate_days = np.sum(moderate_days)
print(f"\nNumber of hot days: {num_hot_days}")
print(f"Number of moderate days: {num_moderate_days}")

# Find indices of elements meeting conditions
hot_indices = np.where(temperatures > 85)[0]
print("Indices of hot days:", hot_indices)
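
NumPy has dedicated functions for some of the patterns shown above, which can read more directly than `np.where` or `np.sum`. The sketch below shows a few alternatives on the same temperature values; the result names are illustrative, and the lower bound of 70 is chosen only for demonstration.

```python
import numpy as np

temperatures = np.array([68, 72, 75, 80, 85, 90, 95, 88, 82, 78])

# Cap values at 90 (same result as np.where(temperatures > 90, 90, temperatures)).
capped = np.minimum(temperatures, 90)
print("Capped at 90:", capped)

# Limit values to the range 70-90 in one call.
clipped = np.clip(temperatures, 70, 90)
print("Clipped to 70-90:", clipped)

# Count the elements that satisfy a condition.
num_hot = np.count_nonzero(temperatures > 85)
print("Number of hot days:", num_hot)          # 3

# Get the indices where a condition holds, as a flat array.
hot_indices = np.flatnonzero(temperatures > 85)
print("Indices of hot days:", hot_indices)     # [5 6 7]
```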

**Exercise 6.1:** Practice boolean indexing with this scenario:

You have test scores for a class of students. Create arrays for student names and their scores, then:
1. Find all students who scored above 90
2. Find students who scored from 80 to 89 (inclusive)
3. Count how many students failed (scored below 60)
4. Replace all failing scores with 60 (minimum passing)
5. Calculate the average score for students who originally passed

**Think About It:** Why do you think boolean indexing is so powerful for data analysis?

In [None]:
# Create your student data and practice boolean indexing here
# Suggestion: Use names like ['Alice', 'Bob', 'Charlie', ...] and realistic test scores


## Lab Summary and Reflection

Congratulations! In this lab, you've practiced the essential NumPy skills that form the foundation of data science in Python:

**Key Skills Mastered:**
- Creating arrays with different methods (`array`, `zeros`, `ones`, `full`, `arange`, `linspace`, and the `np.random` functions)
- Indexing and slicing arrays to access or modify data
- Performing mathematical operations using vectorization and broadcasting
- Reshaping and stacking arrays for different data representations
- Filtering data with boolean indexing and conditional operations

**Why These Skills Matter:**
NumPy's vectorized operations are often one to two orders of magnitude (roughly 10-100 times) faster than equivalent Python loops, although the exact speedup depends on the operation, the array size, and your hardware. This makes them essential for working with large datasets. The indexing and reshaping capabilities you've learned are crucial for preparing data for machine learning algorithms and statistical analysis.

**Looking Ahead:**
These NumPy fundamentals prepare you for the next step: learning Pandas. Pandas is built on top of NumPy, so understanding arrays, vectorization, and boolean indexing will make Pandas DataFrames much easier to understand and manipulate.

**Reflection Questions:**
1. Which NumPy concept was most challenging to understand, and why?
2. How do you think vectorization will change the way you approach data processing tasks?
3. Can you think of a real-world scenario where boolean indexing would be particularly useful?
4. What questions do you still have about NumPy arrays?

## Additional Practice (Optional)

If you'd like more practice, try these challenge exercises:

In [None]:
# Challenge 1: Create a function that normalizes an array to have mean 0 and standard deviation 1
# def normalize_array(arr):
#     """Normalize array to have mean=0, std=1"""
#     # Your code here
#     pass

# Test your function
# test_data = np.array([10, 20, 30, 40, 50])
# normalized = normalize_array(test_data)
# print(f"Original mean: {np.mean(test_data):.2f}, std: {np.std(test_data):.2f}")
# print(f"Normalized mean: {np.mean(normalized):.2f}, std: {np.std(normalized):.2f}")



In [None]:
# Challenge 2: Matrix operations - Create two 3x3 matrices and perform matrix multiplication
# Hint: Use np.dot() or the @ operator for matrix multiplication (different from element-wise *)



In [None]:
# Challenge 3: Advanced boolean indexing - Given a 2D array, find all rows where the sum is greater than 15
