# Module 4 - Exercise 3: Backpropagation

<a href="https://colab.research.google.com/github/jumpingsphinx/jumpingsphinx.github.io/blob/main/notebooks/module4-neural-networks/exercise3-backpropagation.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Learning Objectives

By the end of this exercise, you will be able to:

- Implement backpropagation from scratch
- Understand gradient computation through the chain rule
- Calculate gradients for different activation functions
- Debug backpropagation using gradient checking
- Train neural networks with gradient descent
- Understand common training issues and solutions

## Prerequisites

- Completion of Exercise 2 (Feedforward Networks)
- Understanding of calculus and the chain rule
- Familiarity with gradient descent

## Setup

Run this cell first to import required libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Set random seed for reproducibility
np.random.seed(42)

print("NumPy version:", np.__version__)
print("Setup complete!")

---

## Part 1: The Chain Rule and Computational Graphs

### Background

Backpropagation is simply the **chain rule** applied to neural networks.

**Example:** For $f(x) = (3x + 2)^2$

Let $u = 3x + 2$ and $f = u^2$

Chain rule: $$\frac{df}{dx} = \frac{df}{du} \cdot \frac{du}{dx} = 2u \cdot 3 = 6(3x + 2)$$
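
The same pattern extends to longer chains. As a minimal sketch (the function `h` below is a hypothetical example chosen for illustration, not part of the exercise), take $h(x) = \log(1 + (2x - 1)^2)$, split it into three stages, multiply the local derivatives, and compare the result against a central-difference estimate:

```python
import math

def h_forward_backward(x):
    """Return h(x) and dh/dx for h(x) = log(1 + (2x - 1)^2) via the chain rule."""
    # Forward pass: three simple stages
    u = 2 * x - 1          # u = 2x - 1
    v = u ** 2             # v = u^2
    h = math.log(1 + v)    # h = log(1 + v)

    # Backward pass: one local derivative per stage, multiplied together
    dh_dv = 1 / (1 + v)
    dv_du = 2 * u
    du_dx = 2
    return h, dh_dv * dv_du * du_dx

def h_only(x):
    """Forward computation only, used for the numerical check."""
    return math.log(1 + (2 * x - 1) ** 2)

x0 = 1.5
h_val, dh_dx = h_forward_backward(x0)

eps = 1e-6
numeric = (h_only(x0 + eps) - h_only(x0 - eps)) / (2 * eps)
print(f"chain rule: {dh_dx:.6f}, numerical: {numeric:.6f}")
```

Each stage only needs to know its own local derivative; backpropagation applies this bookkeeping to every layer of a network.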

### Exercise 1.1: Simple Chain Rule

**Task:** Compute derivatives using the chain rule.

In [None]:
def compute_simple_derivative():
    """
    Compute df/dx for f(x) = (3x + 2)^2 at x = 1
    """
    x = 1
    
    # Forward pass
    u = 3 * x + 2
    f = u ** 2
    
    # Backward pass
    # Your code here
    df_du = 2 * u  # derivative of u^2
    du_dx =        # derivative of 3x + 2
    
    df_dx = df_du * du_dx  # chain rule
    
    return f, df_dx

f_val, df_dx = compute_simple_derivative()
print(f"f(1) = {f_val}")
print(f"df/dx at x=1 = {df_dx}")

# Verify with a numerical gradient (forward difference)
epsilon = 1e-7
x = 1
numerical_grad = ((3*(x+epsilon) + 2)**2 - (3*x + 2)**2) / epsilon
print(f"Numerical gradient: {numerical_grad:.6f}")
print(f"Match: {np.isclose(df_dx, numerical_grad)}")

### Exercise 1.2: Computational Graph for Neural Network

**Task:** Draw and understand the computational graph for a simple neuron.

For a single neuron: $y = \sigma(wx + b)$
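
The backward pass through this neuron needs the derivative of the sigmoid. For reference, the identity used in `sigmoid_derivative` follows directly from the definition $\sigma(z) = 1 / (1 + e^{-z})$:

$$\sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \sigma(z)\,(1 - \sigma(z))$$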

In [None]:
def sigmoid(z):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

def sigmoid_derivative(z):
    """Derivative of sigmoid: σ'(z) = σ(z)(1 - σ(z))"""
    s = sigmoid(z)
    return s * (1 - s)

# Forward pass for a single neuron
x = 2.0
w = 0.5
b = 1.0

# Compute forward
z = w * x + b
y = sigmoid(z)

print("Forward Pass:")
print(f"x = {x}")
print(f"z = wx + b = {w}*{x} + {b} = {z}")
print(f"y = σ(z) = {y:.4f}")

# Backward pass: compute dy/dw, dy/db, dy/dx
# Your code here
dy_dz = sigmoid_derivative(z)  # derivative of σ(z)
dz_dw =                        # derivative of wx + b w.r.t. w
dz_db =                        # derivative of wx + b w.r.t. b
dz_dx =                        # derivative of wx + b w.r.t. x

dy_dw = dy_dz * dz_dw
dy_db = dy_dz * dz_db
dy_dx = dy_dz * dz_dx

print("\nBackward Pass (Gradients):")
print(f"dy/dw = {dy_dw:.4f}")
print(f"dy/db = {dy_db:.4f}")
print(f"dy/dx = {dy_dx:.4f}")

# Visualize computational graph
print("\nComputational Graph:")
print("x ----> [×w] ----> [+b] ----> [σ] ----> y")
print("         |          |")
print("         w          b")
print("\nBackward flow:")
print("dy/dx <-- dy/dz·w <-- dy/dz <-- dy/dy=1")

---

## Part 2: Manual Gradient Computation for 2-Layer Network

### Background

For a 2-layer network:

**Forward:**
$$\mathbf{Z}^{[1]} = \mathbf{W}^{[1]} \mathbf{X} + \mathbf{b}^{[1]}$$
$$\mathbf{A}^{[1]} = \text{ReLU}(\mathbf{Z}^{[1]})$$
$$\mathbf{Z}^{[2]} = \mathbf{W}^{[2]} \mathbf{A}^{[1]} + \mathbf{b}^{[2]}$$
$$\mathbf{A}^{[2]} = \sigma(\mathbf{Z}^{[2]})$$

**Loss (Binary Cross-Entropy):**
$$L = -\frac{1}{m} \sum_{i=1}^{m} [y^{(i)} \log(a^{[2](i)}) + (1-y^{(i)}) \log(1-a^{[2](i)})]$$
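
For a single example with prediction $a = \sigma(z)$ and label $y$, the per-example loss is $\ell = -[y \log a + (1-y) \log(1-a)]$. Combining its derivative with the sigmoid derivative $\frac{da}{dz} = a(1-a)$ shows why the output-layer gradient collapses to a simple difference:

$$\frac{\partial \ell}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a} = \frac{a - y}{a(1-a)}$$

$$\frac{\partial \ell}{\partial z} = \frac{\partial \ell}{\partial a} \cdot \frac{da}{dz} = \frac{a - y}{a(1-a)} \cdot a(1-a) = a - y$$

Stacked over all $m$ examples this gives $\mathbf{A}^{[2]} - \mathbf{Y}$. In the equations of this exercise, the $\frac{1}{m}$ factor from $L$ is applied later, in the weight and bias gradients.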

### Exercise 2.1: Derive Backward Pass Equations

**Task:** Complete the backward pass derivations.

In [None]:
def relu(z):
    """ReLU activation."""
    return np.maximum(0, z)

def relu_derivative(z):
    """Derivative of ReLU: 1 if z > 0, else 0"""
    return (z > 0).astype(float)

def binary_cross_entropy(y_true, y_pred):
    """
    Binary cross-entropy loss.
    
    Parameters:
    -----------
    y_true : ndarray, shape (1, m)
        True labels
    y_pred : ndarray, shape (1, m)
        Predicted probabilities
    """
    m = y_true.shape[1]
    epsilon = 1e-15  # for numerical stability
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    
    loss = -1/m * np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss

print("Backward Pass Equations:")
print("=" * 70)
print("")
print("Output layer:")
print("  dL/dA² = -(Y/A² - (1-Y)/(1-A²))")
print("  dL/dZ² = dL/dA² ⊙ σ'(Z²) = A² - Y")
print("  dL/dW² = (1/m) · dL/dZ² · (A¹)ᵀ")
print("  dL/db² = (1/m) · sum(dL/dZ², axis=1, keepdims=True)")
print("")
print("Hidden layer:")
print("  dL/dA¹ = (W²)ᵀ · dL/dZ²")
print("  dL/dZ¹ = dL/dA¹ ⊙ ReLU'(Z¹)")
print("  dL/dW¹ = (1/m) · dL/dZ¹ · Xᵀ")
print("  dL/db¹ = (1/m) · sum(dL/dZ¹, axis=1, keepdims=True)")
print("")
print("Note: ⊙ denotes element-wise multiplication")
print("Note: dL/dA and dL/dZ are written per example; the 1/m from the loss is applied in dW and db")

### Exercise 2.2: Implement Backward Pass

**Task:** Complete the backward propagation function.

In [None]:
def backward_propagation(X, Y, cache, parameters):
    """
    Implement backward propagation.
    
    Parameters:
    -----------
    X : ndarray, shape (n_input, m)
        Input data
    Y : ndarray, shape (1, m)
        True labels
    cache : dict
        Contains Z1, A1, Z2, A2 from forward pass
    parameters : dict
        Contains W1, b1, W2, b2
    
    Returns:
    --------
    gradients : dict
        Contains dW1, db1, dW2, db2
    """
    m = X.shape[1]
    
    # Retrieve from cache
    Z1 = cache['Z1']
    A1 = cache['A1']
    Z2 = cache['Z2']
    A2 = cache['A2']
    
    # Retrieve parameters
    W2 = parameters['W2']
    
    # Backward propagation
    # Output layer
    # Your code here
    dZ2 = A2 - Y  # derivative of binary cross-entropy with sigmoid
    dW2 = 
    db2 = 
    
    # Hidden layer
    # Your code here
    dA1 = 
    dZ1 = dA1 * relu_derivative(Z1)
    dW1 = 
    db1 = 
    
    gradients = {
        'dW1': dW1,
        'db1': db1,
        'dW2': dW2,
        'db2': db2
    }
    
    return gradients

print("Backward propagation function implemented!")

---

## Part 3: Gradient Checking

### Background

Gradient checking verifies that backpropagation is correct by comparing analytical gradients with numerical gradients.

Numerical gradient: $$\frac{\partial L}{\partial \theta} \approx \frac{L(\theta + \epsilon) - L(\theta - \epsilon)}{2\epsilon}$$
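
The two-sided (central) difference above is preferred over the one-sided form $\frac{L(\theta + \epsilon) - L(\theta)}{\epsilon}$ because its truncation error shrinks in proportion to $\epsilon^2$ rather than $\epsilon$. A minimal sketch on a scalar function with a known derivative (the test function `f` is an assumption chosen for illustration):

```python
def f(x):
    """Hypothetical test function with known derivative 3x^2."""
    return x ** 3

def forward_difference(func, x, eps):
    """One-sided estimate: error proportional to eps."""
    return (func(x + eps) - func(x)) / eps

def central_difference(func, x, eps):
    """Two-sided estimate: error proportional to eps^2."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x0, eps = 2.0, 1e-4
exact = 3 * x0 ** 2

err_forward = abs(forward_difference(f, x0, eps) - exact)
err_central = abs(central_difference(f, x0, eps) - exact)
print(f"forward error: {err_forward:.2e}")
print(f"central error: {err_central:.2e}")
```

The central estimate costs two loss evaluations per parameter, which is why gradient checking is only practical on small networks and small batches.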

### Exercise 3.1: Implement Gradient Checking

**Task:** Implement numerical gradient computation and comparison.

In [None]:
def forward_propagation(X, parameters):
    """
    Forward propagation.
    
    Returns:
    --------
    A2 : predictions
    cache : intermediate values
    """
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    
    cache = {'Z1': Z1, 'A1': A1, 'Z2': Z2, 'A2': A2}
    
    return A2, cache

def gradient_check(X, Y, parameters, gradients, epsilon=1e-7):
    """
    Check if backpropagation gradients are correct.
    
    Parameters:
    -----------
    X : input data
    Y : labels
    parameters : network parameters
    gradients : gradients from backprop
    epsilon : small value for numerical gradient
    
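    Returns:
    --------
    difference : relative difference between gradients
    num_gradients : numerical gradients (flattened vector)
    grad_theta : analytical gradients from backprop (flattened vector)
    """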
    # Convert parameters and gradients to vectors
    param_keys = ['W1', 'b1', 'W2', 'b2']
    
    # Flatten parameters
    theta = np.concatenate([parameters[key].ravel() for key in param_keys])
    grad_theta = np.concatenate([gradients['d' + key].ravel() for key in param_keys])
    
    # Compute numerical gradients
    num_gradients = np.zeros_like(theta)
    
    for i in range(len(theta)):
        # Perturb theta[i] by +epsilon
        theta_plus = theta.copy()
        theta_plus[i] += epsilon
        
        # Reconstruct parameters
        params_plus = dict_from_vector(theta_plus, parameters)
        A2_plus, _ = forward_propagation(X, params_plus)
        loss_plus = binary_cross_entropy(Y, A2_plus)
        
        # Perturb theta[i] by -epsilon
        theta_minus = theta.copy()
        theta_minus[i] -= epsilon
        
        # Reconstruct parameters
        params_minus = dict_from_vector(theta_minus, parameters)
        A2_minus, _ = forward_propagation(X, params_minus)
        loss_minus = binary_cross_entropy(Y, A2_minus)
        
        # Compute numerical gradient
        # Your code here
        num_gradients[i] = 
    
    # Compute relative difference
    numerator = np.linalg.norm(grad_theta - num_gradients)
    denominator = np.linalg.norm(grad_theta) + np.linalg.norm(num_gradients)
    difference = numerator / denominator
    
    return difference, num_gradients, grad_theta

def dict_from_vector(theta, parameters):
    """Reconstruct parameter dictionary from vector."""
    params = {}
    idx = 0
    
    for key in ['W1', 'b1', 'W2', 'b2']:
        shape = parameters[key].shape
        size = np.prod(shape)
        params[key] = theta[idx:idx+size].reshape(shape)
        idx += size
    
    return params

print("Gradient checking functions implemented!")
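
To get a feel for the relative-difference metric computed at the end of `gradient_check`, the sketch below applies the same formula to made-up gradient vectors. The vectors and the helper name `relative_difference` are assumptions for illustration only; they do not come from the network above.

```python
import numpy as np

def relative_difference(analytical, numerical):
    """||a - n|| / (||a|| + ||n||), the metric used in gradient_check."""
    numerator = np.linalg.norm(analytical - numerical)
    denominator = np.linalg.norm(analytical) + np.linalg.norm(numerical)
    return numerator / denominator

# Hypothetical gradient vectors, made up for illustration
numerical = np.array([0.5, -1.2, 0.03, 2.0])
correct = numerical + 1e-9       # agrees up to tiny numerical noise
scaled_bug = numerical * 4.0     # e.g. a forgotten 1/m factor with m = 4
sign_bug = -numerical            # e.g. a flipped sign

print(f"correct:    {relative_difference(correct, numerical):.2e}")
print(f"scaled bug: {relative_difference(scaled_bug, numerical):.2e}")
print(f"sign bug:   {relative_difference(sign_bug, numerical):.2e}")
```

By the triangle inequality the metric always lies between 0 and 1, so it does not depend on the overall scale of the gradients. This is what makes fixed thresholds such as `1e-7` meaningful.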

### Exercise 3.2: Test Gradient Checking

**Task:** Run gradient checking on a small dataset.

In [None]:
# Create small dataset for testing
X_test = np.random.randn(2, 3)
Y_test = np.array([[1, 0, 1]])

# Initialize small network
parameters_test = {
    'W1': np.random.randn(3, 2) * 0.01,
    'b1': np.zeros((3, 1)),
    'W2': np.random.randn(1, 3) * 0.01,
    'b2': np.zeros((1, 1))
}

# Forward and backward pass
A2_test, cache_test = forward_propagation(X_test, parameters_test)
gradients_test = backward_propagation(X_test, Y_test, cache_test, parameters_test)

# Run gradient checking
print("Running Gradient Checking...")
print("=" * 70)
difference, num_grad, ana_grad = gradient_check(X_test, Y_test, 
                                                 parameters_test, gradients_test)

print(f"\nRelative difference: {difference:.10f}")
print("")
if difference < 1e-7:
    print("✓ EXCELLENT! Gradient check passed with high precision.")
elif difference < 1e-5:
    print("✓ GOOD! Gradient check passed.")
elif difference < 1e-3:
    print("⚠ WARNING: Gradient check borderline. Check implementation.")
else:
    print("✗ FAIL: Gradient check failed. Bug in backpropagation!")

# Show some gradient comparisons
print("\nSample Gradient Comparisons:")
print(f"{'Index':<10} {'Analytical':<20} {'Numerical':<20} {'Difference'}")
print("-" * 70)
for i in range(min(5, len(ana_grad))):
    diff = abs(ana_grad[i] - num_grad[i])
    print(f"{i:<10} {ana_grad[i]:<20.10f} {num_grad[i]:<20.10f} {diff:.10f}")

---

## Part 4: Complete Training Loop

### Exercise 4.1: Implement Training Function

**Task:** Build a complete training loop with gradient descent.

In [None]:
def initialize_parameters(n_input, n_hidden, n_output):
    """
    Initialize network parameters.
    
    Returns:
    --------
    parameters : dict
    """
    np.random.seed(42)
    
    parameters = {
        'W1': np.random.randn(n_hidden, n_input) * 0.01,
        'b1': np.zeros((n_hidden, 1)),
        'W2': np.random.randn(n_output, n_hidden) * 0.01,
        'b2': np.zeros((n_output, 1))
    }
    
    return parameters

def update_parameters(parameters, gradients, learning_rate):
    """
    Update parameters using gradient descent.
    
    Parameters:
    -----------
    parameters : dict
        Current parameters
    gradients : dict
        Gradients from backprop
    learning_rate : float
        Learning rate
    
    Returns:
    --------
    parameters : dict
        Updated parameters
    """
    # Your code here
    parameters['W1'] = parameters['W1'] - learning_rate * gradients['dW1']
    parameters['b1'] = 
    parameters['W2'] = 
    parameters['b2'] = 
    
    return parameters

def train_network(X, Y, n_hidden=4, learning_rate=0.01, num_iterations=10000, 
                  print_cost=True):
    """
    Train a 2-layer neural network.
    
    Parameters:
    -----------
    X : ndarray, shape (n_input, m)
        Training data
    Y : ndarray, shape (1, m)
        Labels
    n_hidden : int
        Number of hidden units
    learning_rate : float
        Learning rate
    num_iterations : int
        Number of training iterations
    print_cost : bool
        Whether to print cost
    
    Returns:
    --------
    parameters : dict
        Trained parameters
    costs : list
        Loss history
    """
    n_input = X.shape[0]
    n_output = 1
    
    # Initialize
    parameters = initialize_parameters(n_input, n_hidden, n_output)
    costs = []
    
    # Training loop
    for i in range(num_iterations):
        # Forward propagation
        # Your code here
        A2, cache = forward_propagation(X, parameters)
        
        # Compute cost
        cost = binary_cross_entropy(Y, A2)
        
        # Backward propagation
        # Your code here
        gradients = 
        
        # Update parameters
        # Your code here
        parameters = 
        
        # Record cost
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                accuracy = np.mean((A2 > 0.5) == Y)
                print(f"Iteration {i:5d}: Loss = {cost:.6f}, Accuracy = {accuracy:.4f}")
    
    return parameters, costs

print("Training function implemented!")
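
The update rule in `update_parameters` is plain gradient descent. As a minimal sketch of how the step size influences it, the example below applies the same rule to a one-dimensional quadratic; the function $f(\theta) = (\theta - 3)^2$ and both learning rates are assumptions chosen for illustration, not values recommended for the network.

```python
def gradient_descent_1d(theta, learning_rate, num_iterations):
    """Minimize f(theta) = (theta - 3)^2 with plain gradient descent."""
    for _ in range(num_iterations):
        grad = 2 * (theta - 3)                # df/dtheta
        theta = theta - learning_rate * grad  # parameter = parameter - lr * gradient
    return theta

theta_small = gradient_descent_1d(theta=0.0, learning_rate=0.1, num_iterations=50)
theta_large = gradient_descent_1d(theta=0.0, learning_rate=1.1, num_iterations=50)

print(f"learning_rate=0.1 -> theta = {theta_small:.6f}")
print(f"learning_rate=1.1 -> theta = {theta_large:.3e}")
```

With the smaller step the iterate approaches the minimum at $\theta = 3$; with the larger step every update overshoots by more than the previous error, so the iterate moves further away on each iteration.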

### Exercise 4.2: Train on XOR Problem

**Task:** Train the network to solve XOR.

In [None]:
# XOR dataset
X_xor = np.array([[0, 0, 1, 1],
                  [0, 1, 0, 1]])
Y_xor = np.array([[0, 1, 1, 0]])

print("Training on XOR Problem")
print("=" * 70)

# Train the network
parameters_xor, costs_xor = train_network(X_xor, Y_xor, n_hidden=4, 
                                          learning_rate=1.0, 
                                          num_iterations=5000)

# Test final predictions
A2_final, _ = forward_propagation(X_xor, parameters_xor)
predictions = (A2_final > 0.5).astype(int)

print("\nFinal Results:")
print("=" * 70)
print(f"{'x1':<5} {'x2':<5} {'True':<8} {'Predicted':<12} {'Probability':<15} {'Correct'}")
print("-" * 70)
for i in range(4):
    x1, x2 = X_xor[:, i]
    true_y = Y_xor[0, i]
    pred_y = predictions[0, i]
    prob_y = A2_final[0, i]
    correct = '✓' if pred_y == true_y else '✗'
    print(f"{x1:<5.0f} {x2:<5.0f} {true_y:<8.0f} {pred_y:<12.0f} {prob_y:<15.4f} {correct}")

accuracy = np.mean(predictions == Y_xor)
print(f"\nFinal Accuracy: {accuracy:.2%}")

if accuracy == 1.0:
    print("✓ SUCCESS! XOR problem solved with backpropagation!")
else:
    print("⚠ Network did not fully converge. Try adjusting hyperparameters.")

---

## Part 5: Learning Curves and Visualization

### Exercise 5.1: Plot Learning Curve

**Task:** Visualize how the loss decreases during training.

In [None]:
def plot_learning_curve(costs, title="Learning Curve"):
    """
    Plot the learning curve (loss over iterations).
    """
    plt.figure(figsize=(10, 6))
    plt.plot(costs, linewidth=2)
    plt.xlabel('Iterations (x100)', fontsize=12)
    plt.ylabel('Loss', fontsize=12)
    plt.title(title, fontsize=14, fontweight='bold')
    plt.grid(True, alpha=0.3)
    plt.show()

plot_learning_curve(costs_xor, "Learning Curve: XOR Problem")

print("\nObservations:")
print(f"  • Initial loss: {costs_xor[0]:.6f}")
print(f"  • Final loss: {costs_xor[-1]:.6f}")
print(f"  • Loss reduction: {(costs_xor[0] - costs_xor[-1]):.6f}")
print("\nA good learning curve shows:")
print("  1. Rapid initial decrease")
print("  2. Gradual convergence to a low value")
print("  3. Smooth curve (not too noisy)")

### Exercise 5.2: Decision Boundary Visualization

**Task:** Visualize the decision boundary learned by the network.

In [None]:
def plot_decision_boundary(parameters, X, Y, title="Decision Boundary"):
    """
    Plot decision boundary for 2D data.
    """
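    # Create mesh with a 0.5 margin around the data, so the plot
    # fits both XOR and larger datasets such as Moons
    x_min, x_max = X[0].min() - 0.5, X[0].max() + 0.5
    y_min, y_max = X[1].min() - 0.5, X[1].max() + 0.5
    h = 0.01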
    
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    
    # Predict on mesh
    mesh_input = np.c_[xx.ravel(), yy.ravel()].T
    Z, _ = forward_propagation(mesh_input, parameters)
    Z = Z.reshape(xx.shape)
    
    # Plot
    plt.figure(figsize=(10, 7))
    plt.contourf(xx, yy, Z, levels=20, cmap='RdYlBu', alpha=0.8)
    plt.colorbar(label='Predicted Probability')
    
    # Plot data points
    plt.scatter(X[0, Y[0]==0], X[1, Y[0]==0], c='blue', s=200,
               edgecolors='k', marker='o', label='Class 0', linewidths=2)
    plt.scatter(X[0, Y[0]==1], X[1, Y[0]==1], c='red', s=200,
               edgecolors='k', marker='s', label='Class 1', linewidths=2)
    
    # Decision boundary
    plt.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=3)
    
    plt.xlabel('$x_1$', fontsize=14)
    plt.ylabel('$x_2$', fontsize=14)
    plt.title(title, fontsize=14, fontweight='bold')
    plt.legend(fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.show()

plot_decision_boundary(parameters_xor, X_xor, Y_xor, 
                      "Decision Boundary: XOR Problem")

---

## Part 6: Synthetic Dataset - Moons

### Exercise 6.1: Train on Moons Dataset

**Task:** Apply backpropagation to a more complex dataset.

In [None]:
# Generate moons dataset
from sklearn.datasets import make_moons

X_moons, y_moons = make_moons(n_samples=300, noise=0.2, random_state=42)
X_moons = X_moons.T
y_moons = y_moons.reshape(1, -1)

# Visualize dataset
plt.figure(figsize=(10, 6))
plt.scatter(X_moons[0, y_moons[0]==0], X_moons[1, y_moons[0]==0],
           c='blue', alpha=0.6, edgecolors='k', label='Class 0')
plt.scatter(X_moons[0, y_moons[0]==1], X_moons[1, y_moons[0]==1],
           c='red', alpha=0.6, edgecolors='k', label='Class 1')
plt.xlabel('Feature 1', fontsize=12)
plt.ylabel('Feature 2', fontsize=12)
plt.title('Moons Dataset', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print("Training on Moons Dataset")
print("=" * 70)

# Your code here: Train on moons dataset
# Try different hidden layer sizes (4, 8, 16)
parameters_moons, costs_moons = train_network(X_moons, y_moons, 
                                              n_hidden=8,
                                              learning_rate=0.5,
                                              num_iterations=5000)

# Evaluate
A2_moons, _ = forward_propagation(X_moons, parameters_moons)
accuracy_moons = np.mean((A2_moons > 0.5) == y_moons)
print(f"\nFinal Accuracy: {accuracy_moons:.2%}")

### Exercise 6.2: Visualize Moons Results

**Task:** Plot learning curve and decision boundary.

In [None]:
# Plot learning curve
plot_learning_curve(costs_moons, "Learning Curve: Moons Dataset")

# Plot decision boundary
plot_decision_boundary(parameters_moons, X_moons, y_moons,
                      "Decision Boundary: Moons Dataset")

---

## Challenge Problems (Optional)

### Challenge 1: Implement Momentum

Extend gradient descent with momentum for faster convergence.

In [None]:
def update_parameters_with_momentum(parameters, gradients, velocity, 
                                   learning_rate, beta=0.9):
    """
    Update parameters using gradient descent with momentum.
    
    Momentum update:
    v = beta * v + (1 - beta) * gradient
    parameter = parameter - learning_rate * v
    """
    # Your code here
    pass

print("Challenge: Implement momentum optimizer!")

### Challenge 2: Implement Adam Optimizer

Implement the Adam optimization algorithm.

In [None]:
def update_parameters_with_adam(parameters, gradients, v, s, t,
                               learning_rate=0.001, beta1=0.9, beta2=0.999,
                               epsilon=1e-8):
    """
    Update parameters using Adam optimizer.
    
    Adam combines momentum and RMSprop:
    v = beta1 * v + (1 - beta1) * gradient
    s = beta2 * s + (1 - beta2) * gradient^2
    v_corrected = v / (1 - beta1^t)
    s_corrected = s / (1 - beta2^t)
    parameter = parameter - learning_rate * v_corrected / (sqrt(s_corrected) + epsilon)
    """
    # Your code here
    pass

print("Challenge: Implement Adam optimizer!")

### Challenge 3: Visualize Gradient Flow

Create a visualization showing how gradients flow backward through the network.

In [None]:
# Your code here: Visualize gradient magnitudes at each layer
# Compare gradient norms for different layers

print("Challenge: Visualize gradient flow!")

---

## Reflection Questions

1. **Why is backpropagation efficient compared to numerical gradients?**
   - Think about computational complexity

2. **What causes vanishing/exploding gradients?**
   - How does activation function choice affect this?

3. **Why is gradient checking important?**
   - When should you use it?

4. **How does learning rate affect convergence?**
   - What happens if it's too large or too small?

5. **What is the role of initialization?**
   - Why not initialize all weights to zero?

---

## Summary

In this exercise, you learned:

- The chain rule and computational graphs
- How to manually derive and implement backpropagation
- Gradient checking to verify correctness
- Building a complete training loop from scratch
- Visualizing learning curves and decision boundaries
- Training on toy datasets (XOR, Moons)

**Key Takeaways:**

- Backpropagation = chain rule applied efficiently
- Gradients flow backward from output to input
- Always verify with gradient checking during development
- Learning curves reveal training dynamics
- Proper initialization and learning rate are crucial

**Next Steps:**

- Complete Exercise 4 on NumPy Implementation
- Review [Lesson 3: Backpropagation](https://jumpingsphinx.github.io/module4-neural-networks/03-backpropagation/)
- Experiment with different optimization algorithms

---

**Need help?** Check the solution notebook or open an issue on [GitHub](https://github.com/jumpingsphinx/jumpingsphinx.github.io/issues).