# Neural Networks: Learn a Quadratic Function from Dataset

This notebook teaches a neural network to learn a quadratic function using a pre-generated CSV dataset.

We'll:
- Load training and test datasets from CSV files
- Build a non-linear neural network model
- Train using the dataset
- Evaluate performance

**Target Function**: `f(a, b) = 2a² + 3b + 1`
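
As a quick reference, the target function can be written as a plain Python helper. The name `target_function` is an illustrative choice, not something defined elsewhere in this notebook, and the sample points are arbitrary.

```python
def target_function(a: float, b: float) -> float:
    """Quadratic target: f(a, b) = 2a² + 3b + 1."""
    return 2 * a**2 + 3 * b + 1


# A few hand-checkable values (sample points chosen for illustration)
for a, b in [(1.0, 1.0), (2.0, -1.0), (0.5, 0.5), (-1.0, 2.0)]:
    print(f"f({a}, {b}) = {target_function(a, b)}")
```

Note that the function is symmetric in `a` (it depends only on a²) but not in `b`.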


In [None]:
import torch
from torch import nn
import matplotlib.pyplot as plt
import pandas as pd
print(torch.__version__)


## Load the dataset from GitHub
We'll load the pre-generated quadratic datasets from the GitHub repository.


In [None]:
# Load datasets from GitHub repository
import requests
import io
import os

# GitHub repository URLs for the CSV files
github_base_url = "https://raw.githubusercontent.com/gopinaath/ai-class/main/"
train_url = github_base_url + "quadratic_train.csv"
test_url = github_base_url + "quadratic_test.csv"

print("Loading datasets from GitHub repository...")
print(f"Training data URL: {train_url}")
print(f"Test data URL: {test_url}")

try:
    # Download and load training data (timeout so a stalled request cannot hang the cell)
    train_response = requests.get(train_url, timeout=30)
    train_response.raise_for_status()  # Raise an exception for bad status codes
    train_df = pd.read_csv(io.StringIO(train_response.text))

    # Download and load test data
    test_response = requests.get(test_url, timeout=30)
    test_response.raise_for_status()
    test_df = pd.read_csv(io.StringIO(test_response.text))

    print("✅ Datasets loaded successfully from GitHub!")
    
except Exception as e:
    print(f"❌ Failed to load from GitHub: {e}")
    print("🔄 Falling back to local dataset generation...")
    
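    # Fallback: generate datasets locally if either CSV file is missing
    if not (os.path.exists('quadratic_train.csv') and os.path.exists('quadratic_test.csv')):
        with open('generate_quadratic_dataset.py') as script:  # close the file after reading
            exec(script.read())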
    
    train_df = pd.read_csv('quadratic_train.csv')
    test_df = pd.read_csv('quadratic_test.csv')
    print("✅ Local datasets loaded successfully!")


## Exploratory Data Analysis
Let's explore our dataset to understand what we're working with.


In [None]:
# Convert to PyTorch tensors
train_inputs = torch.tensor(train_df[['a', 'b']].values, dtype=torch.float32)
train_targets = torch.tensor(train_df['target'].values, dtype=torch.float32).unsqueeze(1)
test_inputs = torch.tensor(test_df[['a', 'b']].values, dtype=torch.float32)
test_targets = torch.tensor(test_df['target'].values, dtype=torch.float32).unsqueeze(1)

print(f"Training data shape: {train_inputs.shape}")
print(f"Test data shape: {test_inputs.shape}")
print("First few training examples:")
print(train_df.head())


In [None]:
# Basic statistics
print("=== Dataset Overview ===")
print(f"Training samples: {len(train_df)}")
print(f"Test samples: {len(test_df)}")
print(f"Features: {list(train_df.columns[:-1])}")  # All columns except 'target'
print(f"Target: {train_df.columns[-1]}")

print("\n=== Training Data Statistics ===")
print(train_df.describe())

print("\n=== Test Data Statistics ===")
print(test_df.describe())


In [None]:
# Visualize the data
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# 1. Distribution of inputs
axes[0, 0].hist(train_df['a'], bins=30, alpha=0.7, label='a', color='blue')
axes[0, 0].hist(train_df['b'], bins=30, alpha=0.7, label='b', color='red')
axes[0, 0].set_title('Distribution of Input Features')
axes[0, 0].set_xlabel('Value')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Distribution of target
axes[0, 1].hist(train_df['target'], bins=30, alpha=0.7, color='green')
axes[0, 1].set_title('Distribution of Target Values')
axes[0, 1].set_xlabel('Target Value')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].grid(True, alpha=0.3)

# 3. Scatter plot: a vs target
axes[1, 0].scatter(train_df['a'], train_df['target'], alpha=0.6, s=10)
axes[1, 0].set_title('Feature a vs Target')
axes[1, 0].set_xlabel('Feature a')
axes[1, 0].set_ylabel('Target')
axes[1, 0].grid(True, alpha=0.3)

# 4. Scatter plot: b vs target
axes[1, 1].scatter(train_df['b'], train_df['target'], alpha=0.6, s=10, color='red')
axes[1, 1].set_title('Feature b vs Target')
axes[1, 1].set_xlabel('Feature b')
axes[1, 1].set_ylabel('Target')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()


## Key Insights from EDA

**What we learned:**
- **Input range**: Both `a` and `b` are uniformly distributed between -2 and 2
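- **Target range**: For inputs in [-2, 2], target values range from -5 to 15 (the minimum is at a = 0, b = -2: 2×0² + 3×(-2) + 1 = -5; the maximum is at a = ±2, b = 2: 2×2² + 3×2 + 1 = 15)
- **Non-linear relationship**: The `a` vs target scatter plot shows a U-shaped (quadratic) pattern, while the `b` vs target plot shows a linear trend
- **Correlations**: 
  - `a` has close to zero linear correlation with the target: the a² term is symmetric around 0, so its effect shows up as a U shape rather than a slope
  - `b` has a strong positive linear correlation with the target (linear term 3b)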

**Why this needs a non-linear model:**
- The relationship involves a² (quadratic term), which requires non-linear activation functions
- A simple linear model cannot capture the quadratic relationship
- Multiple hidden layers with ReLU can approximate this function
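
The range and correlation statements can be checked without the CSV files. The sketch below is a stand-in under stated assumptions: it evaluates the target on an evenly spaced grid over [-2, 2] instead of the real samples, and `pearson` is a small hypothetical helper written here for illustration.

```python
import math


def target_function(a, b):
    return 2 * a**2 + 3 * b + 1


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Evenly spaced grid on [-2, 2] (stand-in for the uniformly sampled dataset)
grid = [-2 + 0.5 * i for i in range(9)]
points = [(a, b) for a in grid for b in grid]
a_vals = [a for a, _ in points]
b_vals = [b for _, b in points]
targets = [target_function(a, b) for a, b in points]

print("min target:", min(targets))  # -5.0, reached at a = 0, b = -2
print("max target:", max(targets))  # 15.0, reached at a = ±2, b = 2
print("corr(a, target):", round(pearson(a_vals, targets), 3))
print("corr(b, target):", round(pearson(b_vals, targets), 3))
```

On the real data, `train_df.corr()` gives the same kind of summary; exact values will differ because the samples are random rather than on a grid.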


## Build the model
We need a non-linear model with multiple layers to learn the quadratic function.


In [None]:
model = nn.Sequential(
    nn.Linear(2, 20),    # Input layer: 2 inputs -> 20 hidden neurons
    nn.ReLU(),           # Non-linear activation
    nn.Linear(20, 20),   # Hidden layer: 20 -> 20
    nn.ReLU(),           # Non-linear activation
    nn.Linear(20, 10),   # Hidden layer: 20 -> 10
    nn.ReLU(),           # Non-linear activation
    nn.Linear(10, 1)     # Output layer: 10 -> 1 output
)
model


## Train the model
We'll train on the full training set for 50 epochs, in mini-batches of 32, using the Adam optimizer and mean squared error loss.
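
The loop in the next cell slices the tensors in file order, so every epoch sees identical batches. A common alternative is to reshuffle each epoch with `torch.utils.data.DataLoader`. The sketch below is not the notebook's training loop: it uses synthetic data (100 random points, an assumption made so the block runs on its own), a smaller hypothetical network, and a single epoch.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for the CSV data: 100 points with a, b in [-2, 2]
inputs = torch.rand(100, 2) * 4 - 2
targets = (2 * inputs[:, 0] ** 2 + 3 * inputs[:, 1] + 1).unsqueeze(1)

# shuffle=True draws a new random batch order at the start of every epoch
loader = DataLoader(TensorDataset(inputs, targets), batch_size=32, shuffle=True)

# Small illustrative network (not the architecture used in this notebook)
sketch_model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(sketch_model.parameters(), lr=0.001)

batch_sizes = []
for batch_inputs, batch_targets in loader:  # one epoch
    loss = criterion(sketch_model(batch_inputs), batch_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    batch_sizes.append(len(batch_inputs))

print(batch_sizes)  # [32, 32, 32, 4]
```

The last batch is smaller because 100 is not a multiple of 32; the manual slicing in the notebook behaves the same way.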

In [None]:
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_history = []

batch_size = 32
num_epochs = 50

for epoch in range(num_epochs):
    epoch_loss = 0
    num_batches = 0
    
    # Process data in mini-batches
    for i in range(0, len(train_inputs), batch_size):
        batch_inputs = train_inputs[i:i+batch_size]
        batch_targets = train_targets[i:i+batch_size]
        
        # Forward pass
        outputs = model(batch_inputs)
        loss = criterion(outputs, batch_targets)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        num_batches += 1
        loss_history.append(loss.item())
    
    avg_loss = epoch_loss / num_batches
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}/{num_epochs}, Average Loss: {avg_loss:.6f}")

print("Training complete!")


In [None]:
plt.figure(figsize=(10, 4))

plt.subplot(1, 2, 1)
plt.plot(loss_history)
plt.title('Training Loss')
plt.xlabel('Batch')
plt.ylabel('MSE Loss')
plt.grid(True)

plt.subplot(1, 2, 2)
# Plot every 10th batch loss to thin out the curve (subsampling, not smoothing)
plt.plot(loss_history[::10])
plt.title('Training Loss (Every 10th Batch)')
plt.xlabel('Batch (x10)')
plt.ylabel('MSE Loss')
plt.grid(True)

plt.tight_layout()
plt.show()


## Evaluate the model
Test the model on the test dataset and some specific examples.
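
MSE is expressed in squared target units, which can be hard to read. As an optional extra, the sketch below computes RMSE, MAE, and R² from prediction and target tensors. `regression_metrics` is a hypothetical helper, and the tensors are made-up values for illustration, not outputs of the model trained above.

```python
import torch


def regression_metrics(predictions: torch.Tensor, targets: torch.Tensor) -> dict:
    """Return RMSE, MAE, and R² for tensors of identical shape."""
    errors = predictions - targets
    mse = errors.pow(2).mean()
    ss_res = errors.pow(2).sum()                      # residual sum of squares
    ss_tot = (targets - targets.mean()).pow(2).sum()  # total sum of squares
    return {
        "rmse": mse.sqrt().item(),
        "mae": errors.abs().mean().item(),
        "r2": (1 - ss_res / ss_tot).item(),
    }


# Made-up values, shaped (N, 1) like test_outputs and test_targets
example_targets = torch.tensor([[6.0], [6.0], [3.0], [9.0]])
example_predictions = torch.tensor([[6.5], [5.5], [3.5], [8.5]])

print(regression_metrics(example_predictions, example_targets))
```

With the notebook's own variables, the call would be `regression_metrics(test_outputs, test_targets)` once `test_outputs` has been computed.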


In [None]:
# Test on the test dataset
with torch.no_grad():
    test_outputs = model(test_inputs)
    test_loss = criterion(test_outputs, test_targets)
    
print(f"Test Loss: {test_loss.item():.6f}")

# Test on specific examples
test_cases = [(1.0, 1.0), (2.0, -1.0), (0.5, 0.5), (-1.0, 2.0)]

print("\nTesting on specific examples:")
print("Input (a, b) | Prediction | Expected | Error")
print("-" * 50)

for a, b in test_cases:
    with torch.no_grad():  # inference only, so skip gradient tracking
        pred = model(torch.tensor([a, b], dtype=torch.float32))
    expected = 2*a**2 + 3*b + 1  # The actual quadratic function
    error = abs(pred.item() - expected)
    print(f"({a:4.1f}, {b:4.1f})    | {pred.item():8.3f} | {expected:7.3f} | {error:.3f}")

print("\nExpected function: f(a, b) = 2a² + 3b + 1")


In [None]:
# Additional evaluation: Test on edge cases
print("\n=== Additional Test Cases ===")
edge_cases = [(0, 0), (-2, -2), (2, 2), (0, -2), (2, 0)]

print("Edge cases:")
print("Input (a, b) | Prediction | Expected | Error")
print("-" * 50)

for a, b in edge_cases:
    with torch.no_grad():  # inference only, so skip gradient tracking
        pred = model(torch.tensor([a, b], dtype=torch.float32))
    expected = 2*a**2 + 3*b + 1
    error = abs(pred.item() - expected)
    print(f"({a:4.1f}, {b:4.1f})    | {pred.item():8.3f} | {expected:7.3f} | {error:.3f}")


## Model Architecture Summary

**Network Structure:**
- **Input Layer**: 2 neurons (features a, b)
- **Hidden Layer 1**: 20 neurons with ReLU activation
- **Hidden Layer 2**: 20 neurons with ReLU activation  
- **Hidden Layer 3**: 10 neurons with ReLU activation
- **Output Layer**: 1 neuron (predicted target)
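
The layer sizes listed above fix the number of trainable parameters: each `nn.Linear(n_in, n_out)` holds `n_in * n_out` weights plus `n_out` biases. The arithmetic can be sketched in plain Python (`layer_sizes` and `count_parameters` are names chosen for this example):

```python
# Layer widths from input to output, as listed in the summary
layer_sizes = [2, 20, 20, 10, 1]


def count_parameters(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each linear layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


print(count_parameters(layer_sizes))  # 701 = 60 + 420 + 210 + 11
```

With the model in memory, `sum(p.numel() for p in model.parameters())` should report the same total.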

**Why this architecture works:**
- **Multiple hidden layers** let the network compose simple non-linear pieces into more complex patterns
- **ReLU activations** make the network a piecewise-linear function, which with enough units can closely approximate the curved a² term on the training range
- **Sufficient neurons** provide enough capacity to learn the relationship
- **Adam optimizer** adapts the step size per parameter, which usually helps convergence on this non-linear problem
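
To make the ReLU point concrete: a sum of shifted ReLUs is a piecewise-linear function, and piecewise-linear functions can trace a parabola closely. The sketch below builds such a sum by hand for x² on [-2, 2]. The knot spacing and the hand-picked coefficients are assumptions for illustration; a trained network finds its own weights and will generally not match them.

```python
def relu(x):
    return max(0.0, x)


step = 0.5
knots = [-2 + step * k for k in range(8)]  # left edge of each segment on [-2, 2]

# Chord slope of x² on the first segment, then the slope change (2 * step) at each later knot
first_slope = knots[0] + (knots[0] + step)  # (x1² - x0²) / (x1 - x0) = x0 + x1
coefficients = [first_slope] + [2 * step] * (len(knots) - 1)


def relu_parabola(x):
    """Piecewise-linear interpolant of x² on [-2, 2], written as a sum of ReLUs."""
    return knots[0] ** 2 + sum(c * relu(x - k) for c, k in zip(coefficients, knots))


samples = [-2 + 0.01 * i for i in range(401)]
max_error = max(abs(relu_parabola(x) - x**2) for x in samples)
print(f"max |error| on [-2, 2]: {max_error:.4f}")  # about 0.0625 = step² / 4
```

Each `relu(x - k)` term corresponds to one hidden unit with input weight 1 and bias `-k`; the coefficients play the role of output weights. Halving `step` doubles the number of units and cuts the worst-case error by a factor of four.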

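**Key differences from a linear model:**
- Requires non-linear activations (ReLU); stacked linear layers without them collapse into a single linear map
- Needs at least one hidden layer to capture the quadratic relationship (this notebook uses three)
- Training is iterative gradient descent on a non-convex loss, rather than a fit with a closed-form solution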
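
The contrast with a linear model can be checked directly. The sketch below fits an ordinary least-squares model on `[a, b, 1]` to synthetic samples of the target function (generated here as an assumption, since the CSV files are not needed to make the point) and compares it with the same fit after adding an `a²` column. Variable names are illustrative.

```python
import torch

torch.manual_seed(0)

# Synthetic samples: a, b uniform on [-2, 2], target = 2a² + 3b + 1
a = torch.rand(500, 1) * 4 - 2
b = torch.rand(500, 1) * 4 - 2
target = 2 * a**2 + 3 * b + 1
ones = torch.ones_like(a)


def least_squares_mse(features: torch.Tensor, y: torch.Tensor) -> float:
    """Fit y ≈ features @ w by least squares and return the mean squared error."""
    weights = torch.linalg.lstsq(features, y).solution
    return (features @ weights - y).pow(2).mean().item()


linear_mse = least_squares_mse(torch.cat([a, b, ones], dim=1), target)
quadratic_mse = least_squares_mse(torch.cat([a**2, b, ones], dim=1), target)

print(f"linear features    MSE: {linear_mse:.4f}")
print(f"with an a² column  MSE: {quadratic_mse:.6f}")
```

The linear fit is left with an error roughly equal to the variance of the 2a² term, while the fit that is handed a² as a feature is essentially exact. The neural network has to discover an approximation of that feature on its own.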
