# NumPy for Data Science Interviews: 10 Essential Exercises

## 🔢 Master Numerical Computing for ML/DS Roles

This NumPy practice notebook is designed for data science and machine learning interviews. The exercises cover NumPy concepts that come up frequently in technical interviews, from basic array operations to the numerical computations used in real ML workflows.

### 📚 What You'll Master
- Array creation, indexing, and broadcasting
- Statistical operations and data analysis
- Linear algebra for machine learning
- Data preprocessing and feature engineering
- Performance optimization techniques
- Real-world ML data manipulation scenarios

### 🎯 Interview Focus Areas
- **Entry Level**: Basic operations, indexing, simple statistics
- **Mid Level**: Broadcasting, linear algebra, optimization, feature engineering
- **Advanced**: Custom functions, performance considerations, ML implementations

---

### 1. Array Creation and Basic Operations (Entry Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Basic NumPy array creation and manipulation - fundamental skill for any DS role.

**Key concepts**: np.array(), np.zeros(), np.ones(), np.arange(), np.linspace(), array properties

**Common Questions**: "How do you create arrays of different types?" "What's the difference between np.arange() and np.linspace()?"

</details>
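
As a warm-up for the `np.arange()` versus `np.linspace()` question in the hint, here is a minimal sketch. The variable names (`stepped`, `spaced`, `compact`) are illustrative only and are not used by the exercise cells.

```python
import numpy as np

# np.arange takes a step size; the stop value is excluded
stepped = np.arange(0, 1, 0.25)      # 0.0, 0.25, 0.5, 0.75

# np.linspace takes a number of points; the stop value is included by default
spaced = np.linspace(0, 1, 5)        # 0.0, 0.25, 0.5, 0.75, 1.0

# dtype is fixed at creation time; float32 uses half the memory of float64
compact = np.ones((2, 3), dtype=np.float32)
default = np.ones((2, 3))            # float64 unless told otherwise

print(stepped, spaced)
print(compact.nbytes, default.nbytes)
```

A rule of thumb that is often useful: reach for `np.arange()` when the step matters and for `np.linspace()` when the number of points matters, since a floating-point step can make the length of an `np.arange()` result hard to predict.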

In [None]:
import numpy as np

# Create different types of arrays commonly used in data science

# 1. Create a 1D array of integers from 0 to 9
arr_1d = # Your code here

# 2. Create a 3x4 matrix of zeros (common for initializing weights)
zeros_matrix = # Your code here

# 3. Create a 2x3 matrix of ones with float32 dtype (memory efficient)
ones_matrix = # Your code here

# 4. Create an array of 50 evenly spaced values between 0 and 1 (for plotting/analysis)
linspace_arr = # Your code here

# 5. Create a 5x5 identity matrix (used in linear algebra)
identity_matrix = # Your code here

# 6. Create a random array with shape (100,) from normal distribution (simulating data)
np.random.seed(42)
random_data = # Your code here

print(f"1D array: {arr_1d}")
print(f"Zeros matrix shape: {zeros_matrix.shape}, dtype: {zeros_matrix.dtype}")
print(f"Ones matrix: \n{ones_matrix}")
print(f"Linspace first 5 values: {linspace_arr[:5]}")
print(f"Identity matrix: \n{identity_matrix}")
print(f"Random data stats - mean: {random_data.mean():.3f}, std: {random_data.std():.3f}")

In [None]:
# Test: Verify array creation
print(f"1D array correct: {len(arr_1d) == 10 and arr_1d[0] == 0 and arr_1d[-1] == 9}")
print(f"Zeros matrix correct: {zeros_matrix.shape == (3, 4) and np.all(zeros_matrix == 0)}")
print(f"Ones matrix correct: {ones_matrix.shape == (2, 3) and ones_matrix.dtype == np.float32}")
print(f"Identity matrix correct: {np.allclose(identity_matrix, np.eye(5))}")

### 2. Array Indexing and Slicing for Data Selection (Entry Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Data selection and filtering - critical for data preprocessing and feature selection.

**Key concepts**: Boolean indexing, fancy indexing, slicing, conditional selection

**Common Questions**: "How do you select data based on conditions?" "Extract specific rows/columns from a dataset?"

</details>
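
The selection styles named in the hint can be sketched on a tiny array. This is an illustration under assumed names (`table`, `mask`, and so on), not the solution to the cell below.

```python
import numpy as np

table = np.arange(12).reshape(4, 3)   # rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11]

# Boolean indexing: keep rows whose first column is greater than 3
mask = table[:, 0] > 3
filtered = table[mask]

# Row-wise "any" condition: rows containing at least one value above 9
any_large = table[(table > 9).any(axis=1)]

# Fancy indexing with a list of row indices (always returns a copy)
picked = table[[0, 2]]

# Slicing with a step (returns a view on the same memory)
every_other = table[::2]

print(filtered)
print(np.shares_memory(every_other, table), np.shares_memory(picked, table))
```

The view-versus-copy distinction matters in interviews: writing into `every_other` would modify `table`, while writing into `picked` would not.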

In [None]:
# Create a sample dataset (similar to what you'd get from a CSV)
np.random.seed(123)
data = np.random.randn(100, 5)  # 100 samples, 5 features
labels = np.random.choice([0, 1], 100)  # Binary classification labels

# Indexing and slicing exercises

# 1. Select first 10 rows and first 3 columns
subset = # Your code here

# 2. Select all rows where the first feature > 0 (positive values)
positive_first_feature = # Your code here

# 3. Select samples where label == 1 (positive class)
positive_samples = # Your code here

# 4. Select rows where any feature value > 2 (outlier detection)
outlier_rows = # Your code here

# 5. Select specific rows by index (e.g., indices [5, 10, 15, 20])
specific_rows = # Your code here

# 6. Select last column (often the target variable)
last_column = # Your code here

# 7. Select every other row (data sampling)
sampled_data = # Your code here

print(f"Original data shape: {data.shape}")
print(f"Subset shape: {subset.shape}")
print(f"Positive first feature samples: {len(positive_first_feature)}")
print(f"Positive class samples: {len(positive_samples)}")
print(f"Outlier rows: {len(outlier_rows)}")
print(f"Specific rows shape: {specific_rows.shape}")
print(f"Sampled data shape: {sampled_data.shape}")

In [None]:
# Test: Verify indexing operations
print(f"Subset correct: {subset.shape == (10, 3)}")
print(f"Positive filtering works: {np.all(positive_first_feature[:, 0] > 0)}")
print(f"Label filtering works: {len(positive_samples) == np.sum(labels == 1)}")
print(f"Specific rows correct: {specific_rows.shape == (4, 5)}")
print(f"Sampling correct: {sampled_data.shape == (50, 5)}")

### 3. Statistical Operations for Data Analysis (Entry-Mid Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Descriptive statistics and data summarization - essential for EDA and feature analysis.

**Key concepts**: np.mean(), np.std(), np.percentile(), axis parameter, aggregation functions

**Common Questions**: "Calculate statistics along different axes?" "How to handle missing data in calculations?"

</details>
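
The `axis` parameter and NaN handling mentioned in the hint can be illustrated on a small array. The array `scores` below is made up for this sketch; the NaN-aware functions (`np.nanmean`, `np.nanpercentile`) are standard NumPy.

```python
import numpy as np

scores = np.array([[1.0, 2.0, np.nan],
                   [3.0, np.nan, 6.0],
                   [5.0, 4.0, 9.0]])

# axis=0 collapses the rows, giving one value per column (feature)
col_means = np.nanmean(scores, axis=0)

# axis=1 collapses the columns, giving one value per row (sample)
row_means = np.nanmean(scores, axis=1)

# The plain mean propagates NaN into every column that contains one
plain_means = np.mean(scores, axis=0)

# Several percentiles at once: the result has shape (n_percentiles, n_columns)
quartiles = np.nanpercentile(scores, [25, 50, 75], axis=0)

print(col_means, row_means, plain_means)
print(quartiles.shape)
```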

In [None]:
# Create a dataset with some missing values (NaN)
np.random.seed(456)
features = np.random.randn(200, 8)  # 200 samples, 8 features

# Introduce some missing values
missing_mask = np.random.random((200, 8)) < 0.1  # 10% missing values
features[missing_mask] = np.nan

# Statistical operations

# 1. Calculate mean for each feature (column-wise)
feature_means = # Your code here (handle NaN)

# 2. Calculate standard deviation for each sample (row-wise)
sample_stds = # Your code here (handle NaN)

# 3. Find 25th, 50th, and 75th percentiles for each feature
percentiles = # Your code here (handle NaN)

# 4. Calculate correlation matrix between features
# First, create data without NaN for correlation
clean_features = features[~np.isnan(features).any(axis=1)]  # Remove rows with any NaN
correlation_matrix = # Your code here

# 5. Find features with highest variance (important for feature selection)
feature_variances = # Your code here (handle NaN)
high_variance_features = # Your code here (top 3 indices)

# 6. Detect outliers using z-score (|z| > 3)
z_scores = # Your code here (handle NaN; missing entries should not count as outliers)
outlier_count = # Your code here

# 7. Calculate skewness for each feature (measure of asymmetry)
def skewness(x):
    # Your code here - implement skewness formula
    pass

feature_skewness = # Your code here

print(f"Dataset shape: {features.shape}")
print(f"Missing values: {np.isnan(features).sum()} ({np.isnan(features).mean()*100:.1f}%)")
print(f"Feature means: {feature_means}")
print(f"Feature variances: {feature_variances}")
print(f"High variance features (indices): {high_variance_features}")
print(f"Outliers detected: {outlier_count}")
print(f"Correlation matrix shape: {correlation_matrix.shape}")

In [None]:
# Test: Verify statistical calculations
print(f"Feature means calculated: {len(feature_means) == 8}")
print(f"No NaN in means: {not np.isnan(feature_means).any()}")
print(f"Correlation matrix is symmetric: {np.allclose(correlation_matrix, correlation_matrix.T)}")
print(f"Diagonal of correlation matrix is 1: {np.allclose(np.diag(correlation_matrix), 1)}")
print(f"High variance features found: {len(high_variance_features) == 3}")

### 4. Broadcasting and Vectorization (Mid Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Efficient computation without loops - critical for performance in ML pipelines.

**Key concepts**: Broadcasting rules, vectorized operations, performance optimization

**Common Questions**: "How to normalize data efficiently?" "Compute distances between all pairs of points?"

</details>
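
Both questions in the hint come down to lining up array shapes so that NumPy can broadcast them. A minimal sketch on six random points follows; the names `points`, `diff`, and `dist` are illustrative and chosen so they do not collide with the exercise variables.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))      # 6 samples, 3 features

# (6, 3) minus (3,) broadcasts the per-feature statistics across all rows
standardized = (points - points.mean(axis=0)) / points.std(axis=0)

# Pairwise differences: (6, 1, 3) minus (1, 6, 3) broadcasts to (6, 6, 3)
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))   # (6, 6) Euclidean distance matrix

print(standardized.mean(axis=0).round(6), standardized.std(axis=0).round(6))
print(dist.shape)
```

The intermediate `diff` array has `n * n * d` elements, so this approach trades memory for simplicity; that is presumably why the exercise restricts the distance computation to the first 100 samples.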

In [None]:
# Create sample data for ML preprocessing
np.random.seed(789)
X = np.random.randn(1000, 10)  # 1000 samples, 10 features
y = np.random.choice([0, 1, 2], 1000)  # 3-class classification

# Broadcasting and vectorization exercises

# 1. Standardize features (zero mean, unit variance) using broadcasting
X_standardized = # Your code here

# 2. Min-Max normalization to [0, 1] range
X_normalized = # Your code here

# 3. Calculate pairwise Euclidean distances (first 100 samples for efficiency)
X_subset = X[:100]
distances = # Your code here - use broadcasting to avoid loops

# 4. Apply different transformations to different features
# X contains negative values, so transform magnitudes to avoid NaN:
# log1p(|x|) on features 0-2, sqrt(|x|) on features 3-5, leave others unchanged
X_transformed = X.copy()
# Your code here

# 5. One-hot encode the target variable
n_classes = len(np.unique(y))
y_onehot = # Your code here

# 6. Calculate class-wise feature means using broadcasting
class_means = # Your code here

# 7. Compute softmax probabilities (common in neural networks)
logits = np.random.randn(100, 3)  # 100 samples, 3 classes
def softmax(x):
    # Your code here - implement numerically stable softmax
    pass

probabilities = softmax(logits)

print(f"Original data - mean: {X.mean():.3f}, std: {X.std():.3f}")
print(f"Standardized data - mean: {X_standardized.mean():.3f}, std: {X_standardized.std():.3f}")
print(f"Normalized data - min: {X_normalized.min():.3f}, max: {X_normalized.max():.3f}")
print(f"Distance matrix shape: {distances.shape}")
print(f"One-hot encoded shape: {y_onehot.shape}")
print(f"Class means shape: {class_means.shape}")
print(f"Softmax probabilities sum to 1: {np.allclose(probabilities.sum(axis=1), 1)}")

In [None]:
# Test: Verify broadcasting operations
print(f"Standardization correct: {np.allclose(X_standardized.mean(axis=0), 0, atol=1e-10)}")
print(f"Normalization correct: {X_normalized.min() >= 0 and X_normalized.max() <= 1}")
print(f"Distance matrix symmetric: {np.allclose(distances, distances.T)}")
print(f"Distance matrix diagonal is zero: {np.allclose(np.diag(distances), 0)}")
print(f"One-hot encoding correct: {y_onehot.shape == (1000, 3)}")
print(f"Softmax probabilities valid: {np.all(probabilities >= 0) and np.all(probabilities <= 1)}")

### 5. Linear Algebra for Machine Learning (Mid Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Linear algebra operations fundamental to ML algorithms.

**Key concepts**: Matrix multiplication, eigenvalues, SVD, solving linear systems

**Common Questions**: "Implement PCA from scratch?" "Solve linear regression using normal equation?"

</details>
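
The two common questions can be sketched on a small noise-free problem. This is one possible approach under assumed names (`A`, `b`, `w_hat`), not the reference solution: it solves the normal equation with `np.linalg.solve` instead of forming an explicit inverse, and it derives the principal components from the SVD of the centered data.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
b = A @ w_true                         # noise-free targets for a clean check

# Normal equation (A^T A) w = A^T b, solved as a linear system
w_hat = np.linalg.solve(A.T @ A, A.T @ b)

# PCA via SVD: center, decompose, project onto the leading right singular vectors
A_centered = A - A.mean(axis=0)
U, s, Vt = np.linalg.svd(A_centered, full_matrices=False)
components = Vt[:2]                    # top 2 principal directions, shape (2, 3)
projected = A_centered @ components.T  # reduced data, shape (200, 2)

# Squared singular values relate to the variance captured by each component
explained_variance = s ** 2 / (len(A) - 1)

print(w_hat)
print(projected.shape, explained_variance)
```

Solving the system is generally preferred over `np.linalg.inv` because it is cheaper and numerically better behaved when `A.T @ A` is poorly conditioned.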

In [None]:
# Linear algebra operations for ML
np.random.seed(101)

# Create sample data for linear regression
n_samples, n_features = 1000, 5
X = np.random.randn(n_samples, n_features)
true_weights = np.array([1.5, -2.0, 0.5, 3.0, -1.0])
noise = np.random.randn(n_samples) * 0.1
y = X @ true_weights + noise

# Add bias term
X_with_bias = np.column_stack([np.ones(n_samples), X])

# Linear algebra exercises

# 1. Solve linear regression using normal equation: w = (X^T X)^(-1) X^T y
weights_normal = # Your code here

# 2. Compute eigenvalues and eigenvectors of covariance matrix
cov_matrix = # Your code here
eigenvalues, eigenvectors = # Your code here

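# 3. Perform SVD on the centered data matrix (X minus its column means)
# Use full_matrices=False so that U @ np.diag(s) @ Vt reconstructs the input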
U, s, Vt = # Your code here

# 4. Implement PCA transformation (reduce to 3 components)
n_components = 3
# Center the data first
X_centered = # Your code here
# Get principal components
principal_components = # Your code here
# Transform data
X_pca = # Your code here

# 5. Calculate matrix rank and condition number
matrix_rank = # Your code here
condition_number = # Your code here

# 6. Compute QR decomposition of X_with_bias
Q, R = # Your code here

# 7. Calculate determinant and trace of the covariance matrix
det_cov = # Your code here
trace_cov = # Your code here

# 8. Compute matrix norms
frobenius_norm = # Your code here
spectral_norm = # Your code here

print(f"True weights: {true_weights}")
print(f"Estimated weights: {weights_normal[1:]}")
print(f"Weight estimation error: {np.linalg.norm(weights_normal[1:] - true_weights):.6f}")
print(f"Eigenvalues: {eigenvalues}")
print(f"SVD singular values: {s[:5]}")
print(f"PCA transformed shape: {X_pca.shape}")
print(f"Matrix rank: {matrix_rank}")
print(f"Condition number: {condition_number:.2f}")
print(f"Determinant: {det_cov:.6f}")
print(f"Trace: {trace_cov:.6f}")

In [None]:
# Test: Verify linear algebra operations
print(f"Normal equation solution close to true: {np.allclose(weights_normal[1:], true_weights, atol=0.1)}")
print(f"Eigenvalues are real: {np.all(np.isreal(eigenvalues))}")
print(f"SVD reconstruction works: {np.allclose(X_centered, U @ np.diag(s) @ Vt)}")
print(f"PCA reduces dimensionality: {X_pca.shape[1] == n_components}")
print(f"QR decomposition correct: {np.allclose(X_with_bias, Q @ R)}")
print(f"Trace equals sum of eigenvalues: {np.allclose(trace_cov, eigenvalues.sum())}")

### 6. Optimization and Gradient Computation (Mid Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Implementing optimization algorithms from scratch - common in ML engineering roles.

**Key concepts**: Gradient descent, cost functions, numerical optimization

**Common Questions**: "Implement gradient descent?" "How to compute gradients efficiently?"

</details>
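
To show the basic update rule before tackling logistic regression, here is a minimal sketch of batch gradient descent on a least-squares problem. The swap to mean squared error is deliberate so that the exercise itself stays open; all names (`A`, `b`, `w`, `losses`) and the hyperparameters are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(200, 2))
w_true = np.array([1.0, -3.0])
b = A @ w_true                          # noise-free, so the minimum is exactly w_true

w = np.zeros(2)
learning_rate = 0.1
losses = []

for _ in range(500):
    residual = A @ w - b
    losses.append(0.5 * np.mean(residual ** 2))   # cost: (1 / 2n) * ||A w - b||^2
    gradient = A.T @ residual / len(b)            # vectorized gradient of that cost
    w -= learning_rate * gradient                 # step against the gradient

print(w, losses[0], losses[-1])
```

The same loop structure carries over to logistic regression: only the cost and gradient expressions change, while the update line stays the same.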

In [None]:
# Optimization algorithms for machine learning
np.random.seed(202)

# Generate dataset for logistic regression
n_samples, n_features = 1000, 3
X = np.random.randn(n_samples, n_features)
true_weights = np.array([0.5, -1.2, 0.8])
logits = X @ true_weights
probabilities = 1 / (1 + np.exp(-logits))  # Sigmoid
y = np.random.binomial(1, probabilities)

# Add bias term
X_with_bias = np.column_stack([np.ones(n_samples), X])

# Optimization functions

def sigmoid(z):
    # Your code here - numerically stable sigmoid
    pass

def logistic_cost(weights, X, y):
    # Your code here - logistic regression cost function
    pass

def logistic_gradient(weights, X, y):
    # Your code here - gradient of logistic regression
    pass

def gradient_descent(X, y, learning_rate=0.01, max_iterations=1000, tolerance=1e-6):
    # Your code here - implement gradient descent
    pass

def stochastic_gradient_descent(X, y, learning_rate=0.01, max_iterations=1000, batch_size=32):
    # Your code here - implement SGD with mini-batches
    pass

# Run optimizations
weights_gd, costs_gd = gradient_descent(X_with_bias, y)
weights_sgd, costs_sgd = stochastic_gradient_descent(X_with_bias, y)

# Momentum-based gradient descent
def gradient_descent_momentum(X, y, learning_rate=0.01, momentum=0.9, max_iterations=1000):
    # Your code here - implement momentum
    pass

weights_momentum, costs_momentum = gradient_descent_momentum(X_with_bias, y)

# Adam optimizer
def adam_optimizer(X, y, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8, max_iterations=1000):
    # Your code here - implement Adam
    pass

weights_adam, costs_adam = adam_optimizer(X_with_bias, y)

# Evaluate final models
def accuracy(weights, X, y):
    predictions = (sigmoid(X @ weights) > 0.5).astype(int)
    return np.mean(predictions == y)

print(f"True weights: {true_weights}")
print(f"GD weights: {weights_gd[1:]}")
print(f"SGD weights: {weights_sgd[1:]}")
print(f"Momentum weights: {weights_momentum[1:]}")
print(f"Adam weights: {weights_adam[1:]}")
print(f"\nAccuracies:")
print(f"GD: {accuracy(weights_gd, X_with_bias, y):.4f}")
print(f"SGD: {accuracy(weights_sgd, X_with_bias, y):.4f}")
print(f"Momentum: {accuracy(weights_momentum, X_with_bias, y):.4f}")
print(f"Adam: {accuracy(weights_adam, X_with_bias, y):.4f}")

In [None]:
# Test: Verify optimization algorithms
print(f"GD converged: {costs_gd[-1] < costs_gd[0]}")
print(f"SGD converged: {costs_sgd[-1] < costs_sgd[0]}")
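# Labels are sampled from sigmoid probabilities, so even the true weights
# reach only roughly 0.7-0.75 accuracy on this data; 0.8 is not attainable
print(f"All accuracies > 0.65: {all(accuracy(w, X_with_bias, y) > 0.65 for w in [weights_gd, weights_sgd, weights_momentum, weights_adam])}")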
print(f"Weight estimates reasonable: {np.allclose(weights_gd[1:], true_weights, atol=0.5)}")

### 7. Feature Engineering and Data Preprocessing (Mid Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Data preprocessing pipeline - essential for real-world ML projects.

**Key concepts**: Feature scaling, polynomial features, binning, outlier handling

**Common Questions**: "How to handle different data types?" "Create interaction features?"

</details>
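
Two of the ideas in the hint, encoding a categorical column and creating interaction features, can be sketched in a few lines. The arrays `colors` and `feats` are invented for this example.

```python
import numpy as np

colors = np.array(['red', 'green', 'red', 'blue'])

# np.unique returns the sorted categories plus an integer code for each entry
categories, codes = np.unique(colors, return_inverse=True)

# Indexing an identity matrix with the codes yields one-hot rows
one_hot = np.eye(len(categories))[codes]

feats = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# Upper-triangle indices enumerate each unordered column pair exactly once
i, j = np.triu_indices(feats.shape[1], k=1)    # pairs (0,1), (0,2), (1,2)
interactions = feats[:, i] * feats[:, j]

print(categories, codes)
print(one_hot)
print(interactions)
```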

In [None]:
# Feature engineering for machine learning
np.random.seed(303)

# Create mixed-type dataset
n_samples = 1000
continuous_features = np.random.randn(n_samples, 3)
categorical_features = np.random.choice(['A', 'B', 'C'], (n_samples, 2))
ordinal_features = np.random.choice([1, 2, 3, 4, 5], (n_samples, 1))

# Feature engineering functions

# 1. Robust scaling (using median and IQR)
def robust_scale(X):
    # Your code here
    pass

continuous_scaled = robust_scale(continuous_features)

# 2. Create polynomial features (degree 2)
def polynomial_features(X, degree=2):
    # Your code here - include interaction terms
    pass

poly_features = polynomial_features(continuous_features[:, :2])  # Use first 2 features

# 3. Binning continuous variables
def create_bins(X, n_bins=5, strategy='quantile'):
    # Your code here - quantile or uniform binning
    pass

binned_features = create_bins(continuous_features[:, 0])

# 4. One-hot encoding for categorical variables
def one_hot_encode(X):
    # Your code here
    pass

categorical_encoded = one_hot_encode(categorical_features[:, 0])

# 5. Target encoding (mean encoding)
# Create a target variable first
target = np.random.binomial(1, 0.3, n_samples)

def target_encode(categorical, target):
    # Your code here - encode categories by target mean
    pass

target_encoded = target_encode(categorical_features[:, 0], target)

# 6. Outlier detection and handling
def detect_outliers_iqr(X, factor=1.5):
    # Your code here - IQR method
    pass

def winsorize(X, limits=(0.05, 0.05)):
    # Your code here - cap outliers at percentiles
    pass

outliers = detect_outliers_iqr(continuous_features[:, 0])
winsorized = winsorize(continuous_features[:, 0])

# 7. Feature selection using variance threshold
def variance_threshold_selection(X, threshold=0.1):
    # Your code here
    pass

# 8. Create interaction features
def create_interactions(X):
    # Your code here - pairwise products
    pass

interaction_features = create_interactions(continuous_features)

# 9. Log transformation for skewed data
skewed_data = np.random.exponential(2, (n_samples, 1))
log_transformed = # Your code here

print(f"Original continuous features shape: {continuous_features.shape}")
print(f"Polynomial features shape: {poly_features.shape}")
print(f"Categorical encoded shape: {categorical_encoded.shape}")
print(f"Interaction features shape: {interaction_features.shape}")
print(f"Outliers detected: {np.sum(outliers)}")
print(f"Original skewed data skewness: {((skewed_data - skewed_data.mean()) ** 3).mean() / skewed_data.std() ** 3:.2f}")
print(f"Log-transformed skewness: {((log_transformed - log_transformed.mean()) ** 3).mean() / log_transformed.std() ** 3:.2f}")

In [None]:
# Test: Verify feature engineering
print(f"Robust scaling worked: {np.abs(np.median(continuous_scaled, axis=0)).max() < 0.1}")
print(f"Polynomial features include interactions: {poly_features.shape[1] > continuous_features.shape[1]}")
print(f"One-hot encoding sums to 1: {np.allclose(categorical_encoded.sum(axis=1), 1)}")
print(f"Outliers detected: {np.sum(outliers) > 0}")
print(f"Log transformation reduces skewness: {abs(((log_transformed - log_transformed.mean()) ** 3).mean() / log_transformed.std() ** 3) < abs(((skewed_data - skewed_data.mean()) ** 3).mean() / skewed_data.std() ** 3)}")

### 8. Time Series Analysis with NumPy (Mid Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Time series preprocessing and feature extraction for ML models.

**Key concepts**: Rolling windows, lag features, seasonal decomposition, trend analysis

**Common Questions**: "Create features for time series prediction?" "Handle temporal dependencies?"

</details>
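
Rolling windows and lag features can be sketched on a short series. This illustration assumes NumPy 1.20 or newer for `sliding_window_view`; the names `series`, `windows`, and `lagged` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(1.0, 9.0)           # 1, 2, ..., 8
window = 3

# Moving average as a convolution with a uniform kernel ('valid' drops the edges)
rolling_mean = np.convolve(series, np.ones(window) / window, mode='valid')

# Every length-3 window as a row of a read-only view, shape (6, 3)
windows = sliding_window_view(series, window)
rolling_std = windows.std(axis=1)

# Lag features: column k holds the series shifted by lags[k]; NaN where no history exists
lags = [1, 2]
lagged = np.full((len(series), len(lags)), np.nan)
for col, lag in enumerate(lags):
    lagged[lag:, col] = series[:-lag]

print(rolling_mean)
print(lagged[:4])
```

Leaving NaN in the first rows keeps the lag matrix aligned with the original series; whether to drop or impute those rows is a modelling decision.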

In [None]:
# Time series analysis and feature engineering
np.random.seed(404)

# Generate synthetic time series data
n_points = 1000
time = np.arange(n_points)

# Create time series with trend, seasonality, and noise
trend = 0.01 * time
seasonal = 2 * np.sin(2 * np.pi * time / 50) + np.sin(2 * np.pi * time / 12)
noise = np.random.randn(n_points) * 0.5
ts_data = trend + seasonal + noise + 10  # Add baseline

# Time series functions

# 1. Moving averages (simple and exponential)
def moving_average(data, window):
    # Your code here
    pass

def exponential_moving_average(data, alpha=0.3):
    # Your code here
    pass

ma_5 = moving_average(ts_data, 5)
ma_20 = moving_average(ts_data, 20)
ema = exponential_moving_average(ts_data)

# 2. Create lag features
def create_lag_features(data, lags):
    # Your code here - create matrix with lag features
    pass

lag_features = create_lag_features(ts_data, [1, 2, 3, 5, 10])

# 3. Rolling statistics
def rolling_statistics(data, window):
    # Your code here - mean, std, min, max
    pass

rolling_stats = rolling_statistics(ts_data, 10)

# 4. Difference features (for stationarity)
def create_differences(data, orders=(1, 2)):
    # Your code here
    pass

diff_features = create_differences(ts_data)

# 5. Seasonal decomposition (simplified)
def seasonal_decompose_simple(data, period=50):
    # Your code here - extract trend and seasonal components
    pass

trend_component, seasonal_component, residual = seasonal_decompose_simple(ts_data)

# 6. Fourier features for seasonality
def fourier_features(time, period, n_terms=3):
    # Your code here - sin and cos features
    pass

fourier_feats = fourier_features(time, 50)

# 7. Volatility features
def calculate_volatility(data, window=20):
    # Your code here - rolling standard deviation
    pass

volatility = calculate_volatility(ts_data)

# 8. Change point detection (simple)
def detect_change_points(data, window=20, threshold=2):
    # Your code here - detect significant changes in mean
    pass

change_points = detect_change_points(ts_data)

# 9. Autocorrelation features
def autocorrelation(data, max_lag=20):
    # Your code here
    pass

autocorr = autocorrelation(ts_data)

print(f"Time series length: {len(ts_data)}")
print(f"Moving averages shape: MA5={len(ma_5)}, MA20={len(ma_20)}")
print(f"Lag features shape: {lag_features.shape}")
print(f"Rolling statistics shape: {rolling_stats.shape}")
print(f"Difference features shape: {diff_features.shape}")
print(f"Fourier features shape: {fourier_feats.shape}")
print(f"Change points detected: {np.sum(change_points)}")
print(f"Autocorrelation shape: {autocorr.shape}")
print(f"Autocorrelation at lag 1: {autocorr[1]:.3f}")

In [None]:
# Test: Verify time series operations
print(f"Moving averages calculated: {len(ma_5) > 0 and len(ma_20) > 0}")
print(f"Lag features have correct lags: {lag_features.shape[1] == 5}")
print(f"Rolling stats include multiple metrics: {rolling_stats.shape[1] >= 4}")
print(f"Seasonal decomposition sums correctly: {np.allclose(trend_component + seasonal_component + residual, ts_data, rtol=0.1)}")
print(f"Fourier features are periodic: {fourier_feats.shape[1] >= 6}")
print(f"Autocorrelation at lag 0 is 1: {np.isclose(autocorr[0], 1)}")

### 9. Image Processing for Computer Vision (Mid Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Basic image processing operations for CV/ML preprocessing.

**Key concepts**: Convolution, filtering, feature extraction, image transformations

**Common Questions**: "Implement convolution from scratch?" "Extract features from images?"

</details>
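
One way to answer the "convolution from scratch" question without Python loops is sketched below. It is an illustration, not the expected solution: the helper name `conv2d_valid` is hypothetical, it handles only 'valid' output, and it relies on `sliding_window_view` (NumPy 1.20 or newer).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(img, kernel):
    """2D convolution without padding; the kernel is flipped, as in the textbook definition."""
    flipped = kernel[::-1, ::-1]
    # patches has shape (H - kh + 1, W - kw + 1, kh, kw)
    patches = sliding_window_view(img, kernel.shape)
    # Multiply every patch by the flipped kernel and sum over the kernel axes
    return np.einsum('ijkl,kl->ij', patches, flipped)

# Toy image with a vertical edge: left columns 0, right columns 1
img = np.zeros((5, 5))
img[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

gx = conv2d_valid(img, sobel_x)
gy = conv2d_valid(img, sobel_y)

magnitude = np.hypot(gx, gy)           # sqrt(gx**2 + gy**2), elementwise
direction = np.arctan2(gy, gx)         # angle in [-pi, pi]

print(gx)
print(magnitude)
```

Many deep learning libraries skip the kernel flip and compute cross-correlation instead; for symmetric kernels the two agree, and for Sobel kernels they differ only in sign.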

In [None]:
# Image processing with NumPy
np.random.seed(505)

# Create synthetic image data
height, width = 64, 64
image = np.random.rand(height, width)

# Add some structure (circles and rectangles)
y, x = np.ogrid[:height, :width]
center_y, center_x = height // 2, width // 2
circle_mask = (x - center_x) ** 2 + (y - center_y) ** 2 < 400
image[circle_mask] = 0.8

# Add rectangle
image[10:20, 10:30] = 0.2

# Image processing functions

# 1. Convolution operation
def convolve2d(image, kernel, padding='valid'):
    # Your code here - implement 2D convolution
    pass

# Define common kernels
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
gaussian_blur = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16
edge_detection = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])

# Apply filters
edges_x = convolve2d(image, sobel_x)
edges_y = convolve2d(image, sobel_y)
blurred = convolve2d(image, gaussian_blur)
edges = convolve2d(image, edge_detection)

# 2. Edge magnitude and direction
edge_magnitude = # Your code here
edge_direction = # Your code here

# 3. Image gradients
def image_gradients(image):
    # Your code here - compute gradients using np.gradient
    pass

grad_y, grad_x = image_gradients(image)

# 4. Local Binary Pattern (simplified)
def local_binary_pattern(image, radius=1):
    # Your code here - simplified LBP
    pass

lbp = local_binary_pattern(image)

# 5. Histogram of Oriented Gradients (HOG) features
def hog_features(image, cell_size=8, n_bins=9):
    # Your code here - simplified HOG
    pass

hog = hog_features(image)

# 6. Image moments
def image_moments(image):
    # Your code here - calculate spatial moments
    pass

moments = image_moments(image)

# 7. Connected components (simplified)
def connected_components(binary_image):
    # Your code here - find connected regions
    pass

binary_image = image > 0.5
components = connected_components(binary_image)

# 8. Image statistics
def image_statistics(image):
    # Your code here - various statistical features
    pass

stats = image_statistics(image)

print(f"Original image shape: {image.shape}")
print(f"Edge magnitude shape: {edge_magnitude.shape}")
print(f"Gradient shapes: {grad_x.shape}, {grad_y.shape}")
print(f"LBP shape: {lbp.shape}")
print(f"HOG features length: {len(hog) if hog is not None else 'Not implemented'}")
print(f"Image moments: {moments}")
print(f"Connected components: {np.max(components) if components is not None else 'Not implemented'}")
print(f"Image statistics: {stats}")

In [None]:
# Test: Verify image processing operations
print(f"Convolution reduces image size: {edges_x.shape[0] < image.shape[0]}")
print(f"Edge magnitude is non-negative: {np.all(edge_magnitude >= 0)}")
print(f"Edge direction in valid range: {np.all((-np.pi <= edge_direction) & (edge_direction <= np.pi))}")
print(f"Gradients computed: {grad_x.shape == image.shape and grad_y.shape == image.shape}")
print(f"Image statistics calculated: {isinstance(stats, dict) and len(stats) > 0}")

### 10. Advanced NumPy Techniques and Performance (Mid-Advanced Level)

<details>
<summary>💡 Click for hint</summary>

**Interview Focus**: Advanced NumPy usage, memory optimization, and performance considerations.

**Key concepts**: Memory views, advanced indexing, vectorization, numba integration

**Common Questions**: "Optimize NumPy code for large datasets?" "Memory-efficient operations?"

</details>
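
The memory-related concepts in the hint can be made concrete with a short sketch. The names (`base`, `buf`, `big`) and the chunk size are assumptions for illustration only.

```python
import numpy as np

base = np.arange(10, dtype=np.float64)

view = base[2:5]            # basic slicing returns a view on the same memory
copied = base[[2, 3, 4]]    # fancy indexing returns an independent copy

view[0] = -1.0              # writes through to base[2]
copied[1] = -2.0            # leaves base untouched

# In-place arithmetic avoids allocating a temporary array for each step
buf = np.ones(1_000)
np.multiply(buf, 2.0, out=buf)
buf += 1.0

# Chunked reduction: only one chunk-sized temporary exists at a time
big = np.arange(1_000_000, dtype=np.float64)
total = 0.0
for start in range(0, big.size, 100_000):
    chunk = big[start:start + 100_000]       # a view, no copy
    total += np.square(chunk).sum()

print(base[:5], np.shares_memory(view, base), np.shares_memory(copied, base))
print(total)
```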

In [None]:
# Advanced NumPy techniques
import time
np.random.seed(606)

# Large dataset for performance testing
large_data = np.random.randn(10000, 100)

# Performance optimization techniques

# 1. Memory views vs copies
def demonstrate_views_vs_copies():
    # Your code here - show difference between views and copies
    pass

# 2. Efficient array operations
def efficient_operations(data):
    # Your code here - use in-place operations, avoid temporary arrays
    pass

# 3. Advanced indexing techniques
def advanced_indexing_examples(data):
    # Your code here - fancy indexing, boolean masks, etc.
    pass

# 4. Vectorized string operations
string_data = np.array(['apple', 'banana', 'cherry', 'date'] * 1000)
def vectorized_string_ops(strings):
    # Your code here - use np.char functions
    pass

# 5. Custom ufuncs
def create_custom_ufunc():
    # Your code here - create and use custom universal function
    pass

# 6. Memory-efficient operations for large arrays
def memory_efficient_computation(data):
    # Your code here - chunked processing, generators
    pass

# 7. Structured arrays for heterogeneous data
def create_structured_array():
    # Your code here - create and manipulate structured arrays
    pass

structured_data = create_structured_array()

# 8. Performance comparison: loops vs vectorization
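def performance_comparison():
    # perf_counter is a monotonic, high-resolution clock suited to benchmarking.
    # The local import also avoids a clash with the `time` array defined in the
    # time series exercise if cells are run out of order.
    from time import perf_counter

    data = np.random.randn(100000)

    # Loop version
    start_time = perf_counter()
    result_loop = np.zeros_like(data)
    for i in range(len(data)):
        result_loop[i] = data[i] ** 2 + 2 * data[i] + 1
    loop_time = perf_counter() - start_time

    # Vectorized version
    start_time = perf_counter()
    result_vectorized = # Your code here
    vectorized_time = perf_counter() - start_time

    return loop_time, vectorized_time, np.allclose(result_loop, result_vectorized)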

loop_time, vec_time, results_match = performance_comparison()

# 9. Advanced broadcasting examples
def advanced_broadcasting():
    # Your code here - complex broadcasting scenarios
    pass

# 10. Numerical stability considerations
def numerical_stability_examples():
    # Your code here - demonstrate numerical issues and solutions
    pass

stability_examples = numerical_stability_examples()

print(f"Large data shape: {large_data.shape}")
print(f"Performance comparison:")
print(f"  Loop time: {loop_time:.6f} seconds")
print(f"  Vectorized time: {vec_time:.6f} seconds")
print(f"  Speedup: {loop_time/vec_time:.1f}x")
print(f"  Results match: {results_match}")
print(f"Structured array created: {structured_data is not None}")
print(f"Numerical stability examples: {stability_examples is not None}")

In [None]:
# Test: Verify advanced techniques
print(f"Vectorization is faster: {vec_time < loop_time}")
print(f"Significant speedup achieved: {loop_time/vec_time > 5}")
print(f"Large data processed: {large_data.size == 1000000}")
print(f"String operations work: {len(string_data) == 4000}")