Here is an explanation of each function in the code:

### 1. **`initialize_parameters(input_size, hidden_size, output_size)`**
- **Purpose**: This function initializes the weights and biases for the neural network.
- **Inputs**:
  - `input_size`: Number of input features (2 for this dataset: `X1` and `X2`).
  - `hidden_size`: Number of neurons in the hidden layer.
  - `output_size`: Number of output neurons (1 for binary classification).
- **Outputs**:
  - A dictionary containing randomly initialized weights (`W1` and `W2`) and biases (`b1` and `b2`).
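
A minimal sketch of what such an initializer might look like. The `seed` argument, the `0.01` scaling, the zero biases, and the `(hidden_size, input_size)` weight layout are assumptions made for illustration; the original code may differ.

```python
import numpy as np

def initialize_parameters(input_size, hidden_size, output_size, seed=0):
    # Assumed layout: each weight matrix is (units_out, units_in),
    # so samples are stored as columns of X.
    rng = np.random.default_rng(seed)  # `seed` is a hypothetical extra argument
    return {
        # Small random weights break the symmetry between hidden neurons.
        "W1": rng.standard_normal((hidden_size, input_size)) * 0.01,
        "b1": np.zeros((hidden_size, 1)),  # zero biases are an assumption here
        "W2": rng.standard_normal((output_size, hidden_size)) * 0.01,
        "b2": np.zeros((output_size, 1)),
    }

params = initialize_parameters(input_size=2, hidden_size=4, output_size=1)
for name, value in params.items():
    print(name, value.shape)
```

Keeping the initial weights small keeps tanh away from its flat regions at the start of training, where gradients would be close to zero.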

---

### 2. **`forward_propagation(X, weights)`**
- **Purpose**: Computes the output of the network by propagating inputs through the layers.
- **Steps**:
  1. Compute the weighted sum of inputs for the hidden layer (`Z1`).
  2. Apply the activation function (tanh) to get the hidden layer output (`A1`).
  3. Compute the weighted sum of hidden layer outputs for the output layer (`Z2`).
  4. Apply the sigmoid activation function to get the final output (`A2`).
- **Inputs**:
  - `X`: Input data matrix.
  - `weights`: The current weights and biases of the network.
- **Outputs**:
  - `A2`: Final predictions (output layer).
  - `cache`: A dictionary storing intermediate values (`Z1`, `A1`, `Z2`, `A2`), which are reused in backward propagation.
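
The four steps above can be sketched as follows. This assumes `X` has shape `(n_features, n_samples)` and that `weights` is a dictionary with keys `W1`, `b1`, `W2`, `b2`; the `sigmoid` helper is introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    # Hypothetical helper: logistic function mapping any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, weights):
    # X is assumed to have shape (n_features, n_samples).
    Z1 = weights["W1"] @ X + weights["b1"]   # step 1: hidden-layer weighted sum
    A1 = np.tanh(Z1)                         # step 2: tanh activation
    Z2 = weights["W2"] @ A1 + weights["b2"]  # step 3: output-layer weighted sum
    A2 = sigmoid(Z2)                         # step 4: sigmoid activation
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

# Toy check: with all-zero parameters every prediction is sigmoid(0) = 0.5.
weights = {
    "W1": np.zeros((3, 2)), "b1": np.zeros((3, 1)),
    "W2": np.zeros((1, 3)), "b2": np.zeros((1, 1)),
}
X = np.array([[0.5, -1.0],
              [1.5,  2.0]])  # two samples, stored as columns
A2, cache = forward_propagation(X, weights)
print(A2)
```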

---

### 3. **`compute_loss(y_true, y_pred)`**
- **Purpose**: Calculates the binary cross-entropy loss, which measures the error in predictions.
- **Formula**:
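  \[
  \text{Loss} = -\frac{1}{m} \sum_{i=1}^{m} \left( y_i \cdot \log(\hat{y}_i) + (1 - y_i) \cdot \log(1 - \hat{y}_i) \right)
  \]
  where \(m\) is the number of samples, \(y_i\) is the true label of sample \(i\), and \(\hat{y}_i\) is its predicted probability.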
- **Inputs**:
  - `y_true`: Actual labels of the dataset.
  - `y_pred`: Predicted probabilities from the network.
- **Outputs**:
  - A scalar value representing the average loss.
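
A small sketch of the formula in NumPy. The `eps` clipping is an assumption added here to avoid `log(0)` when a prediction is exactly 0 or 1; the original function may not include it.

```python
import numpy as np

def compute_loss(y_true, y_pred, eps=1e-12):
    # Clip predictions so that log() never receives exactly 0 (assumed safeguard).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    m = y_true.size  # number of samples
    total = np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(-total / m)

y_true = np.array([[1, 0, 1]])
y_pred = np.array([[0.5, 0.5, 0.5]])
loss = compute_loss(y_true, y_pred)
print(loss)  # equals ln(2), the loss of an uninformative 0.5 prediction
```

A loss near `ln(2) ≈ 0.693` at the start of training is therefore expected: the network has not yet learned anything and predicts roughly 0.5 for every sample.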

---

### 4. **`backward_propagation(X, y, weights, cache)`**
- **Purpose**: Computes gradients for the weights and biases by applying the chain rule of derivatives.
- **Steps**:
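  1. Compute the gradient of the loss with respect to the output layer's weighted sum (`dZ2`). With a sigmoid output and binary cross-entropy loss, this simplifies to `A2 - y`.
  2. Calculate gradients for the output layer weights (`dW2`) and biases (`db2`).
  3. Propagate the gradient back to the hidden layer (`dZ1`) through `W2` and the derivative of tanh, which is `1 - A1^2`.
  4. Calculate gradients for the hidden layer weights (`dW1`) and biases (`db1`).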
- **Inputs**:
  - `X`: Input data matrix.
  - `y`: Actual labels.
  - `weights`: Current weights and biases.
  - `cache`: Intermediate values from forward propagation.
- **Outputs**:
  - A dictionary containing gradients for all weights and biases.
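
The steps above might be written like this. The sketch assumes samples are stored as columns of `X`, a tanh hidden layer, a sigmoid output with binary cross-entropy loss, and gradient keys named `dW1`, `db1`, `dW2`, `db2`; these names and shapes are illustrative and may not match the original code.

```python
import numpy as np

def backward_propagation(X, y, weights, cache):
    m = X.shape[1]                      # number of samples (columns of X)
    A1, A2 = cache["A1"], cache["A2"]

    dZ2 = A2 - y                        # sigmoid + cross-entropy simplification
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    dZ1 = (weights["W2"].T @ dZ2) * (1 - A1 ** 2)  # tanh'(Z1) = 1 - A1^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

# Toy data and a hand-built cache, only to exercise the function.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))
y = np.array([[0, 1, 1, 0, 1]])
weights = {
    "W1": rng.standard_normal((3, 2)), "b1": np.zeros((3, 1)),
    "W2": rng.standard_normal((1, 3)), "b2": np.zeros((1, 1)),
}
A1 = np.tanh(weights["W1"] @ X + weights["b1"])
A2 = 1.0 / (1.0 + np.exp(-(weights["W2"] @ A1 + weights["b2"])))
cache = {"A1": A1, "A2": A2}

grads = backward_propagation(X, y, weights, cache)
for name, value in grads.items():
    print(name, value.shape)
```

Each gradient has the same shape as the parameter it belongs to, which is a quick sanity check worth running whenever the layer sizes change.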

---

### 5. **`update_parameters(weights, gradients, learning_rate)`**
- **Purpose**: Updates the weights and biases using gradient descent.
- **Formula**:
  \[
  W \leftarrow W - \alpha \cdot \frac{\partial \text{Loss}}{\partial W}
  \]
  where \(\alpha\) is `learning_rate`. The same rule is applied to each bias.
- **Inputs**:
  - `weights`: Current weights and biases.
  - `gradients`: Gradients computed from backward propagation.
  - `learning_rate`: Step size for updating weights.
- **Outputs**:
  - Updated weights and biases.
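
A minimal sketch of the update rule, assuming each gradient is stored under the parameter's name prefixed with `d` (for example `dW1` for `W1`). That naming convention is an assumption made for this example.

```python
import numpy as np

def update_parameters(weights, gradients, learning_rate):
    # Assumed convention: the gradient for parameter "W1" is stored under "dW1".
    return {
        name: value - learning_rate * gradients["d" + name]
        for name, value in weights.items()
    }

weights = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
gradients = {"dW1": np.array([[0.1, -0.2]]), "db1": np.array([[1.0]])}

updated = update_parameters(weights, gradients, learning_rate=0.1)
print(updated["W1"])  # each weight moves against the sign of its gradient
print(updated["b1"])
```

Returning a new dictionary leaves the input untouched; updating in place would also work and avoids the extra allocation.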

---

### 6. **`train_network(X, y, hidden_size, learning_rate, epochs)`**
- **Purpose**: Orchestrates the training process by iterating over multiple epochs and updating weights.
- **Steps**:
  1. Initialize weights and biases once, before the first epoch.
  2. In each epoch, perform forward propagation to compute predictions.
  3. Calculate the loss.
  4. Perform backward propagation to compute gradients.
  5. Update weights and biases using gradient descent.
  6. Optionally print the loss every 100 epochs.
- **Inputs**:
  - `X`: Input data matrix.
  - `y`: Actual labels.
  - `hidden_size`: Number of neurons in the hidden layer.
  - `learning_rate`: Step size for gradient descent.
  - `epochs`: Number of training iterations.
- **Outputs**:
  - Trained weights and biases.
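
To show how the pieces fit together, here is a compact, self-contained sketch of the loop. The helpers `_forward` and `_loss`, the `seed` and `verbose` arguments, and the toy dataset are all assumptions for illustration; the original code calls its own functions instead of inlining the math.

```python
import numpy as np

def _forward(X, w):
    # Hypothetical helper: tanh hidden layer followed by a sigmoid output.
    A1 = np.tanh(w["W1"] @ X + w["b1"])
    A2 = 1.0 / (1.0 + np.exp(-(w["W2"] @ A1 + w["b2"])))
    return A1, A2

def _loss(y, A2, eps=1e-12):
    # Hypothetical helper: binary cross-entropy with clipping to avoid log(0).
    A2 = np.clip(A2, eps, 1 - eps)
    return float(-np.mean(y * np.log(A2) + (1 - y) * np.log(1 - A2)))

def train_network(X, y, hidden_size, learning_rate, epochs, seed=0, verbose=False):
    rng = np.random.default_rng(seed)
    n_features, m = X.shape          # samples are assumed to be columns of X
    w = {                            # step 1: initialize once
        "W1": rng.standard_normal((hidden_size, n_features)) * 0.01,
        "b1": np.zeros((hidden_size, 1)),
        "W2": rng.standard_normal((1, hidden_size)) * 0.01,
        "b2": np.zeros((1, 1)),
    }
    for epoch in range(epochs):
        A1, A2 = _forward(X, w)      # step 2: forward propagation
        loss = _loss(y, A2)          # step 3: loss
        dZ2 = A2 - y                 # step 4: backward propagation
        dZ1 = (w["W2"].T @ dZ2) * (1 - A1 ** 2)
        grads = {
            "W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m,
            "W1": dZ1 @ X.T / m,  "b1": dZ1.sum(axis=1, keepdims=True) / m,
        }
        for name in w:               # step 5: gradient descent update
            w[name] -= learning_rate * grads[name]
        if verbose and epoch % 100 == 0:  # step 6: optional progress report
            print(f"epoch {epoch}: loss {loss:.4f}")
    return w

# Toy dataset (assumed): label is 1 when x1 + x2 > 0.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 200))
y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)

trained = train_network(X, y, hidden_size=4, learning_rate=0.5, epochs=2000)
final_loss = _loss(y, _forward(X, trained)[1])
print(f"final loss: {final_loss:.4f}")
```

Note that all gradients are computed before any parameter is changed, so the backward pass always uses the same weights as the forward pass of that epoch.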

---

### 7. **`plot_decision_boundary(X, y, weights)`**
- **Purpose**: Visualizes how the trained network classifies the dataset by plotting the decision boundary.
- **Steps**:
  1. Create a grid of points covering the feature space.
  2. Use the trained network to predict the class for each grid point.
  3. Plot the grid with the predicted classes as a decision boundary.
  4. Overlay the original data points with their true labels.
- **Inputs**:
  - `X`: Input data matrix.
  - `y`: Actual labels.
  - `weights`: Trained weights and biases.
- **Outputs**:
  - A 2D plot showing the decision boundary and data points.
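
One possible way to implement these steps with NumPy and Matplotlib. The `predict` and `make_grid` helpers, the 0.5 classification threshold, the grid `step` and `margin`, and the column-wise data layout are assumptions for this sketch.

```python
import numpy as np

def predict(X, weights):
    # Hypothetical helper: forward pass, then threshold the probability at 0.5.
    A1 = np.tanh(weights["W1"] @ X + weights["b1"])
    A2 = 1.0 / (1.0 + np.exp(-(weights["W2"] @ A1 + weights["b2"])))
    return (A2 > 0.5).astype(int)

def make_grid(X, step=0.05, margin=0.5):
    # Step 1: a regular grid covering the range of both features.
    x1 = np.arange(X[0].min() - margin, X[0].max() + margin, step)
    x2 = np.arange(X[1].min() - margin, X[1].max() + margin, step)
    return np.meshgrid(x1, x2)

def plot_decision_boundary(X, y, weights):
    # Imported here so the helpers above stay usable without Matplotlib.
    import matplotlib.pyplot as plt

    xx, yy = make_grid(X)
    grid = np.vstack([xx.ravel(), yy.ravel()])         # shape (2, n_points)
    Z = predict(grid, weights).reshape(xx.shape)       # step 2: class per point
    fig, ax = plt.subplots()
    ax.contourf(xx, yy, Z, alpha=0.3, cmap="coolwarm") # step 3: shaded regions
    ax.scatter(X[0], X[1], c=np.ravel(y), cmap="coolwarm", edgecolors="k")  # step 4
    ax.set_xlabel("X1")
    ax.set_ylabel("X2")
    return ax

# Hand-set weights (illustrative) that separate the plane along x1 + x2 = 0.
weights = {
    "W1": np.array([[1.0, 1.0]]), "b1": np.zeros((1, 1)),
    "W2": np.array([[5.0]]),      "b2": np.zeros((1, 1)),
}
points = np.array([[1.0, -1.0],
                   [1.0, -1.0]])  # columns are the points (1, 1) and (-1, -1)
labels = predict(points, weights)
print(labels)

# To draw the figure:
# ax = plot_decision_boundary(points, labels, weights)
# ax.figure.savefig("decision_boundary.png")
```

A smaller `step` gives a smoother boundary at the cost of more grid points to evaluate.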

---

Let me know if you need further explanation about any function or concept!