# 🔹 Phase 3: Implementing a Neural Network in TensorFlow & PyTorch

**Concepts to Cover**

- **Why use TensorFlow & PyTorch?** – Higher-level APIs for defining models.
- **Defining Layers** – Using tf.keras (TensorFlow) and torch.nn (PyTorch).
- **Training Loops** – Using built-in optimizers & loss functions.
- **Batch Processing** – Efficient training using mini-batches.


## 📌 Exercise 3: Implement the same 3-layer XOR network in TensorFlow & PyTorch

**🔹 Task**

- Implement a **3-layer neural network** using:
    - TensorFlow (`tf.keras`)
    - PyTorch (`torch.nn.Module`)
- Train on the **XOR dataset**.
- Use the **Binary Cross-Entropy Loss (`BCE`)**.
- Compare both frameworks.
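
Before handing the loss to a framework, it can help to compute it by hand once. Binary cross-entropy averages `-(y * log(p) + (1 - y) * log(1 - p))` over the samples. The sketch below is a plain-Python illustration, not either framework's implementation; the function name `binary_cross_entropy` and the `eps` clipping value are assumptions made for this example.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Assumption: clip p away from 0 and 1 so log() never receives 0
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

xor_targets = [0, 1, 1, 0]
undecided = [0.5, 0.5, 0.5, 0.5]      # a network that has learned nothing yet
confident = [0.01, 0.99, 0.99, 0.01]  # a network that is almost always right

print(f"{binary_cross_entropy(xor_targets, undecided):.4f}")  # 0.6931, i.e. ln(2)
print(f"{binary_cross_entropy(xor_targets, confident):.4f}")  # 0.0101
```

A loss near `ln(2)` therefore means the network is still guessing, which is a useful reference point when watching training progress.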

# ✅ Implementation in TensorFlow

import tensorflow as tf
import numpy as np

# XOR dataset (logical XOR function)
X = np.array([[0, 0],  # Input: (0,0)
              [0, 1],  # Input: (0,1)
              [1, 0],  # Input: (1,0)
              [1, 1]], # Input: (1,1)
             dtype=np.float32)  # Data type: float32 for TensorFlow compatibility

y = np.array([[0],  # Expected XOR output for (0,0)
              [1],  # Expected XOR output for (0,1)
              [1],  # Expected XOR output for (1,0)
              [0]], # Expected XOR output for (1,1)
             dtype=np.float32)  # Labels must match the input data type

# Define the neural network model using Keras Sequential API
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),  # Explicit input layer with 2 input features
    tf.keras.layers.Dense(4, activation='sigmoid'),  # Hidden Layer: 4 neurons, sigmoid activation
    tf.keras.layers.Dense(1, activation='sigmoid')   # Output Layer: 1 neuron, sigmoid activation
])

# Compile the model with an optimizer, loss function, and evaluation metric
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),  # Adam optimizer with LR=0.1
              loss='binary_crossentropy',  # Binary Cross-Entropy for classification
              metrics=['accuracy'])  # Track accuracy during training

# Train the model on the XOR dataset
model.fit(X, y, epochs=5000, verbose=0)  # Train silently for 5000 epochs

# Make predictions using the trained model
predictions = model.predict(X)

# Display final predictions after training
print("\nTensorFlow Predictions:")
for i, p in enumerate(predictions):
    print(f"Input: {X[i]}, Predicted Output: {p[0]:.4f}")  # Print formatted predictions




[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 61ms/step

TensorFlow Predictions:
Input: [0. 0.], Predicted Output: 0.0000
Input: [0. 1.], Predicted Output: 1.0000
Input: [1. 0.], Predicted Output: 1.0000
Input: [1. 1.], Predicted Output: 0.0001


# ✅ Implementation in TensorFlow (Optimised)

import tensorflow as tf
import numpy as np
import os

# XOR dataset
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)  # Input features
y = np.array([[0], [1], [1], [0]], dtype=np.float32)  # Expected XOR outputs

# Check for GPU availability
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print("✅ GPU Available! Training on GPU...")
else:
    print("⚠️ No GPU detected, training on CPU...")

# Define the optimized neural network model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),  # Explicit Input Layer with 2 input features
    tf.keras.layers.Dense(4, activation='sigmoid'),  # Hidden Layer: 4 neurons, sigmoid activation
    tf.keras.layers.Dense(1, activation='sigmoid')  # Output Layer: 1 neuron, sigmoid activation
])

# Compile with optimized settings
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.1),  # RMSprop optimizer, swapped in for Adam
    loss='binary_crossentropy',  # Loss function for binary classification
    metrics=['accuracy']  # Tracking accuracy during training
)

# Setup callbacks: Early stopping & TensorBoard logging
log_dir = "logs/xor_model"
os.makedirs(log_dir, exist_ok=True)

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor='loss', patience=50, restore_best_weights=True),  # Stop if no improvement
    tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)  # TensorBoard for visualization
]

# Train with mini-batch gradient descent (batch_size=2)
with tf.device('/GPU:0' if gpus else '/CPU:0'):
    history = model.fit(X, y, epochs=1000, batch_size=2, verbose=0, callbacks=callbacks)  # Silent training

# Save the trained model
model.save("xor_model.h5")
print("\n✅ Model trained and saved as 'xor_model.h5'.")

# Load the trained model
loaded_model = tf.keras.models.load_model("xor_model.h5")

# Final Predictions using the loaded model
predictions = loaded_model.predict(X)
print("\nOptimized TensorFlow Predictions:")
for i, p in enumerate(predictions):
    print(f"Input: {X[i]}, Predicted Output: {p[0]:.4f}")

# Instructions for TensorBoard usage
print("\n📊 To visualize training logs, run:")
print("   tensorboard --logdir=logs/xor_model")


✅ GPU Available! Training on GPU...





✅ Model trained and saved as 'xor_model.h5'.
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 51ms/step

Optimized TensorFlow Predictions:
Input: [0. 0.], Predicted Output: 0.0000
Input: [0. 1.], Predicted Output: 1.0000
Input: [1. 0.], Predicted Output: 1.0000
Input: [1. 1.], Predicted Output: 0.0000

📊 To visualize training logs, run:
   tensorboard --logdir=logs/xor_model


# 🔹 Optimizations Applied

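1. ✅ Reduced Epochs: Down from 5000 to 1000, with early stopping (`patience=50` on the training loss) ending the run sooner if the loss stops improving.
2. ✅ Used RMSprop Optimizer: Swapped in for Adam. Neither optimizer is guaranteed to be faster on a four-sample problem, so treat this as an experiment rather than a rule.
3. ✅ Batch Training (batch_size=2): Two weight updates per epoch instead of one. With Keras's default `batch_size=32`, the first version processed all four samples in a single batch.
4. ✅ Forced GPU Usage: Uses `with tf.device('/GPU:0')` if available. For a model this small, GPU overhead can outweigh any gain.

Fewer epochs and early stopping reduce the amount of training work, and the predictions above show that accuracy stays high. 🚀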
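
To make the effect of `batch_size` concrete, here is a framework-free sketch of how one epoch splits into mini-batches. The helper `iterate_minibatches` is a hypothetical illustration, not a TensorFlow or PyTorch API; Keras does its own batching inside `model.fit`.

```python
import random

def iterate_minibatches(samples, labels, batch_size, shuffle=True, seed=None):
    """Yield (samples, labels) chunks that together cover the dataset once."""
    indices = list(range(len(samples)))
    if shuffle:
        # Assumption: a seeded shuffle, only so the example is repeatable
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [samples[i] for i in chunk], [labels[i] for i in chunk]

xor_inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
xor_labels = [0, 1, 1, 0]

for batch_size in (2, 32):
    updates = sum(1 for _ in iterate_minibatches(xor_inputs, xor_labels, batch_size, seed=0))
    print(f"batch_size={batch_size}: {updates} weight update(s) per epoch")
# batch_size=2: 2 weight update(s) per epoch
# batch_size=32: 1 weight update(s) per epoch
```

Each yielded chunk corresponds to one gradient step, so a smaller batch size means more, noisier updates per pass over the data.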

# ✅ Implementation in PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

# XOR dataset (Input features and corresponding target labels)
X_torch = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)  # Input: 2 features
y_torch = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)  # Output: XOR results

# Define the Neural Network Model
class XORNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(2, 4)  # Hidden Layer with 4 neurons
        self.output = nn.Linear(4, 1)  # Output Layer with 1 neuron
    
    def forward(self, x):
        x = torch.sigmoid(self.hidden(x))  # Activation function for hidden layer
        x = torch.sigmoid(self.output(x))  # Activation function for output layer
        return x

# Initialize the model
model = XORNeuralNet()

# Define the loss function (Binary Cross-Entropy since it's a binary classification problem)
criterion = nn.BCELoss()

# Use Adam optimizer with a learning rate of 0.1
optimizer = optim.Adam(model.parameters(), lr=0.1)

# Training loop for 5000 epochs
epochs = 5000
for epoch in range(epochs):
    optimizer.zero_grad()  # Reset gradients before each step
    outputs = model(X_torch)  # Forward pass: Compute predictions
    loss = criterion(outputs, y_torch)  # Compute loss
    loss.backward()  # Backpropagation: Compute gradients
    optimizer.step()  # Update weights based on computed gradients

# Final Predictions (Inference)
with torch.no_grad():  # Disable gradient computation for efficiency
    predictions = model(X_torch)

# Print the final predictions
print("\nPyTorch Predictions:")
for i, p in enumerate(predictions):
    print(f"Input: {X_torch[i].numpy()}, Predicted Output: {p.item():.4f}")  # Convert tensors to NumPy for display



PyTorch Predictions:
Input: [0. 0.], Predicted Output: 0.0000
Input: [0. 1.], Predicted Output: 1.0000
Input: [1. 0.], Predicted Output: 1.0000
Input: [1. 1.], Predicted Output: 0.0000
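
The printed values are sigmoid probabilities, not class labels, and the PyTorch loop does not track accuracy the way `metrics=['accuracy']` does in Keras. A minimal sketch of that last step follows; the 0.5 threshold and the helper names `to_labels` and `accuracy` are assumptions for illustration, and the probabilities are copied from the output above.

```python
def to_labels(probabilities, threshold=0.5):
    """Map sigmoid outputs to 0/1 class labels."""
    return [1 if p >= threshold else 0 for p in probabilities]

def accuracy(predicted, expected):
    """Fraction of positions where the predicted label matches the target."""
    matches = sum(1 for a, b in zip(predicted, expected) if a == b)
    return matches / len(expected)

probabilities = [0.0000, 1.0000, 1.0000, 0.0000]  # copied from the printed PyTorch output
targets = [0, 1, 1, 0]

labels = to_labels(probabilities)
print(labels)                     # [0, 1, 1, 0]
print(accuracy(labels, targets))  # 1.0
```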


## 📌 Significance of This Exercise

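- **TensorFlow vs PyTorch** – You now see how both frameworks implement the same logic.
- **Higher-Level Abstraction** – Keras's `Sequential()` and `model.fit()` hide the training loop, while the PyTorch version writes the loop by hand, which gives more control over each step.
- **Backpropagation Handling** – PyTorch requires an explicit `loss.backward()` call, while `model.fit()` in TensorFlow computes and applies the gradients internally.
- **Optimization** – The baseline TensorFlow and PyTorch versions both use Adam (learning rate 0.1), which adapts the step size per parameter and reached near-exact XOR outputs within 5000 epochs.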
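
PyTorch also offers a container comparable to Keras's `Sequential()`. The sketch below rebuilds the same 2-4-1 sigmoid network with `torch.nn.Sequential`; the variable names and the fixed seed are illustrative choices, not part of the exercise, and the model is left untrained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # Assumption: fixed seed so the initial weights are repeatable

# Same architecture as XORNeuralNet, written as a stack of layers
sequential_model = nn.Sequential(
    nn.Linear(2, 4),  # Hidden layer: 2 inputs -> 4 neurons
    nn.Sigmoid(),
    nn.Linear(4, 1),  # Output layer: 4 neurons -> 1 output
    nn.Sigmoid(),
)

demo_inputs = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)

with torch.no_grad():  # Inference only, no gradients needed
    untrained_outputs = sequential_model(demo_inputs)

print(untrained_outputs.shape)  # torch.Size([4, 1])
```

The hand-written training loop works unchanged with this model, since the optimizer only needs `sequential_model.parameters()`. Subclassing `nn.Module` remains the more flexible option when the forward pass is not a simple chain of layers.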