# Deep Learning with PyTorch

**DOST-ITDI AI Training Workshop**  
**Day 1 - Session 4: Deep Learning Fundamentals with PyTorch**

---

## Learning Objectives
1. Understand neural networks and deep learning
2. Build neural networks with PyTorch
3. Train models with custom training loops
4. Apply deep learning to molecular property prediction
5. Understand overfitting and regularization

## What is Deep Learning?

Deep Learning uses neural networks with multiple layers to learn complex patterns from data.

**Advantages**:
- Can learn complex, non-linear relationships
- Automatic feature learning
- State-of-the-art performance on many tasks

**Chemistry Applications**:
- Molecular property prediction
- Drug-target interaction
- Reaction prediction
- Molecular generation

## 1. Setup and Installation

In [None]:
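# Install PyTorch and dependencies
# Note: the --index-url below selects CPU-only wheels; for GPU training,
# install a CUDA build instead (see the instructions on pytorch.org)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu -q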
!pip install rdkit scikit-learn -q

print("✓ Installation complete!")

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from rdkit import Chem
from rdkit.Chem import Descriptors
import warnings
warnings.filterwarnings('ignore')

# PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset

# Scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Check if CUDA is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
print(f"PyTorch version: {torch.__version__}")

# Plotting
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)

## 2. PyTorch Basics

### 2.1 Tensors - The Building Blocks

In [None]:
# Creating tensors
# Similar to NumPy arrays but can run on GPU

# From Python lists
x = torch.tensor([1, 2, 3, 4, 5])
print("1D Tensor:", x)
print("Shape:", x.shape)
print("Data type:", x.dtype)

# 2D tensor (matrix)
y = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print("\n2D Tensor:\n", y)
print("Shape:", y.shape)

In [None]:
# Creating tensors with specific values
zeros = torch.zeros(3, 4)
ones = torch.ones(2, 3)
rand_normal = torch.randn(2, 3)  # Samples from a standard normal distribution

print("Zeros:\n", zeros)
print("\nOnes:\n", ones)
print("\nRandom:\n", rand_normal)

In [None]:
# Tensor operations
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print("Addition:", a + b)
print("Multiplication:", a * b)
print("Dot product:", torch.dot(a, b))
print("Mean:", a.mean())
print("Sum:", a.sum())

In [None]:
# Converting between NumPy and PyTorch
numpy_array = np.array([1, 2, 3, 4, 5])
tensor_from_numpy = torch.from_numpy(numpy_array)
print("From NumPy:", tensor_from_numpy)

tensor = torch.tensor([6, 7, 8, 9, 10])
numpy_from_tensor = tensor.numpy()
print("To NumPy:", numpy_from_tensor)
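`torch.from_numpy()` and `Tensor.numpy()` do not copy data: the tensor and the array share the same memory, so an in-place change to one is visible in the other. `torch.tensor()` always makes a copy. A small sketch illustrating the difference (the variable names are just for illustration):

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)   # shares memory with arr (dtype float64)
copied = torch.tensor(arr)       # independent copy of the data

arr[0] = 100.0                   # in-place change to the NumPy array
print("shared:", shared)         # first element is now 100
print("copied:", copied)         # unchanged

# Neural network layers default to float32, so convert explicitly
as_float32 = shared.float()      # .float() returns a new float32 tensor
print(as_float32.dtype)          # torch.float32
```

This is also why the data-preparation cells later convert the scaled NumPy arrays with an explicit float32 dtype.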

### 2.2 Autograd - Automatic Differentiation

In [None]:
# PyTorch can automatically compute gradients
# This is essential for training neural networks

x = torch.tensor([2.0], requires_grad=True)  # Track gradients for this tensor
y = x ** 2 + 3 * x + 1  # y = x² + 3x + 1

print(f"x = {x.item()}")
print(f"y = {y.item()}")

# Compute gradient dy/dx
y.backward()  # Calculate gradients
print(f"dy/dx = {x.grad.item()}")  # Should be 2x + 3 = 7 when x=2
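Calling `backward()` again does not overwrite `x.grad`; PyTorch adds the new gradient to the stored one. This is why the training loop later clears gradients before every backward pass. A minimal sketch using the same function:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
grads = []

for step in range(2):
    y = x ** 2 + 3 * x + 1      # rebuild the graph on every pass
    y.backward()                # adds dy/dx = 2x + 3 = 7 to x.grad
    grads.append(x.grad.item())

print(grads)                    # [7.0, 14.0]: gradients accumulate across calls

# Clear the accumulated gradient (optimizer.zero_grad() serves this purpose
# for model parameters in a training loop)
x.grad.zero_()
print(x.grad.item())            # 0.0
```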

## 3. Building Neural Networks

### 3.1 Simple Neural Network

In [None]:
# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)  # First layer
        self.relu = nn.ReLU()  # Activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Output layer

    def forward(self, x):
        # Define forward pass
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create model
model = SimpleNN(input_size=10, hidden_size=20, output_size=1)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params}")
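The parameter count can be worked out by hand. Each `nn.Linear(in, out)` layer stores an `out × in` weight matrix plus an `out`-element bias, and `ReLU` has no parameters, so `SimpleNN(10, 20, 1)` has (10·20 + 20) + (20·1 + 1) = 220 + 21 = 241 parameters. A quick check using an `nn.Sequential` stack with the same layers (the helper `linear_param_count` is a hypothetical name for illustration):

```python
import torch.nn as nn

def linear_param_count(n_in, n_out):
    return n_in * n_out + n_out   # weight matrix + bias vector

expected = linear_param_count(10, 20) + linear_param_count(20, 1)  # 220 + 21

# Same layer stack as SimpleNN(input_size=10, hidden_size=20, output_size=1)
net = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
actual = sum(p.numel() for p in net.parameters())
print(expected, actual)  # 241 241
```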

In [None]:
# Test forward pass
sample_input = torch.randn(1, 10)  # Batch size=1, features=10
output = model(sample_input)
print(f"Input shape: {sample_input.shape}")
print(f"Output shape: {output.shape}")
print(f"Output value: {output.item():.4f}")

## 4. Loading and Preparing Chemistry Data

In [None]:
# Load ESOL solubility dataset
url = "https://raw.githubusercontent.com/deepchem/deepchem/master/datasets/delaney-processed.csv"
df = pd.read_csv(url)

print(f"Dataset shape: {df.shape}")
df.head()

In [None]:
# Feature engineering
df['mol'] = df['smiles'].apply(Chem.MolFromSmiles)
df = df[df['mol'].notna()].copy()

df['LogP'] = df['mol'].apply(Descriptors.MolLogP)
df['NumHAcceptors'] = df['mol'].apply(Descriptors.NumHAcceptors)
df['NumAromaticRings'] = df['mol'].apply(Descriptors.NumAromaticRings)

# Select features
feature_columns = [
    'Molecular Weight',
    'Number of H-Bond Donors',
    'Number of Rings',
    'Number of Rotatable Bonds',
    'Polar Surface Area',
    'LogP',
    'NumHAcceptors',
    'NumAromaticRings'
]

X = df[feature_columns].values
y = df['measured log solubility in mols per litre'].values.reshape(-1, 1)

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")

In [None]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)
y_train_scaled = scaler_y.fit_transform(y_train)
y_test_scaled = scaler_y.transform(y_test)

print(f"Training set: {X_train_scaled.shape[0]} samples")
print(f"Test set: {X_test_scaled.shape[0]} samples")
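`StandardScaler` computes `(x - mean) / std` per feature, with the mean and (population, `ddof=0`) standard deviation taken from the data passed to `fit`. Fitting on the training split only, as above, keeps test-set statistics from leaking into training. A sketch with a tiny made-up matrix (`toy_train` and `toy_test` are illustrative names):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_test = np.array([[4.0, 40.0]])

scaler = StandardScaler().fit(toy_train)   # statistics from the training data only
# np.std defaults to ddof=0, which matches StandardScaler
manual = (toy_test - toy_train.mean(axis=0)) / toy_train.std(axis=0)

print(scaler.transform(toy_test))
print(manual)
print(scaler.inverse_transform(scaler.transform(toy_test)))  # recovers toy_test
```

The same `inverse_transform` is what later maps the network's scaled predictions back to log solubility units.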

In [None]:
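# Convert to PyTorch tensors (float32 matches the default dtype of model weights)
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train_scaled, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test_scaled, dtype=torch.float32)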

print("Tensor shapes:")
print(f"X_train: {X_train_tensor.shape}")
print(f"y_train: {y_train_tensor.shape}")

In [None]:
# Create DataLoader for batch training
batch_size = 32

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print(f"Number of batches in train_loader: {len(train_loader)}")
print(f"Batch size: {batch_size}")
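`len(train_loader)` is the number of batches, `ceil(n_samples / batch_size)`; unless `drop_last=True` is passed, the final batch holds whatever is left over. This matters for the network below: `nn.BatchNorm1d` raises an error in training mode when a batch contains a single sample, and `drop_last=True` rules that case out. A sketch with 100 synthetic samples (`X_demo`, `y_demo` are made-up data):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

X_demo = torch.randn(100, 8)
y_demo = torch.randn(100, 1)
demo_ds = TensorDataset(X_demo, y_demo)

loader_keep = DataLoader(demo_ds, batch_size=32, shuffle=True)
sizes_keep = [xb.shape[0] for xb, _ in loader_keep]
print(len(loader_keep), sizes_keep)   # 4 [32, 32, 32, 4]

loader_drop = DataLoader(demo_ds, batch_size=32, shuffle=True, drop_last=True)
sizes_drop = [xb.shape[0] for xb, _ in loader_drop]
print(len(loader_drop), sizes_drop)   # 3 [32, 32, 32]
```

With `shuffle=True`, a different subset of samples is dropped each epoch, so no sample is permanently excluded.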

## 5. Building a Molecular Property Predictor

### 5.1 Define the Neural Network

In [None]:
class MolecularPredictor(nn.Module):
    def __init__(self, input_size, hidden_sizes, output_size, dropout=0.2):
        super(MolecularPredictor, self).__init__()

        # Input layer
        self.fc1 = nn.Linear(input_size, hidden_sizes[0])
        self.bn1 = nn.BatchNorm1d(hidden_sizes[0])  # Batch normalization
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(dropout)

        # Hidden layer
        self.fc2 = nn.Linear(hidden_sizes[0], hidden_sizes[1])
        self.bn2 = nn.BatchNorm1d(hidden_sizes[1])
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout)

        # Output layer
        self.fc3 = nn.Linear(hidden_sizes[1], output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.dropout1(x)

        x = self.fc2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.dropout2(x)

        x = self.fc3(x)
        return x

# Create model
input_size = X_train_scaled.shape[1]
hidden_sizes = [64, 32]
output_size = 1

model = MolecularPredictor(input_size, hidden_sizes, output_size, dropout=0.2)
model = model.to(device)

print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters())}")
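`MolecularPredictor` contains `Dropout` and `BatchNorm1d`, whose behaviour depends on the module mode; this is why `train_epoch` calls `model.train()` and `validate` calls `model.eval()`. The sketch below uses a small stand-in network with the same layer types (the sizes are arbitrary assumptions) to show that eval-mode outputs are deterministic, and that BatchNorm's running statistics are buffers rather than trainable parameters, so they are not included in the parameter total printed above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in with the same layer types as MolecularPredictor (sizes are arbitrary)
net = nn.Sequential(
    nn.Linear(8, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(16, 1),
)
x = torch.randn(4, 8)

net.train()                      # dropout active, BatchNorm uses batch statistics
out_a, out_b = net(x), net(x)
print("train mode, identical outputs?", torch.allclose(out_a, out_b))

net.eval()                       # dropout off, BatchNorm uses running statistics
with torch.no_grad():
    out_c, out_d = net(x), net(x)
print("eval mode, identical outputs?", torch.allclose(out_c, out_d))

# BatchNorm1d: 2 learnable parameters per feature (weight and bias)
bn = net[1]
n_bn_params = sum(p.numel() for p in bn.parameters())
buffer_names = [name for name, _ in bn.named_buffers()]
print(n_bn_params, buffer_names)
```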

### 5.2 Define Loss Function and Optimizer

In [None]:
# Loss function (Mean Squared Error for regression)
criterion = nn.MSELoss()

# Optimizer (Adam)
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Learning rate scheduler
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', patience=10, factor=0.5
)

print(f"Optimizer: {optimizer}")
print(f"Loss function: {criterion}")
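With `patience=10` and `factor=0.5`, `ReduceLROnPlateau` halves the learning rate only after the monitored loss has failed to improve for more than 10 consecutive calls to `scheduler.step()`. A sketch with a single dummy parameter and a validation loss that never improves (`dummy`, `demo_opt` and `demo_sched` are illustrative names):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# A single dummy parameter is enough to build an optimizer
dummy = nn.Parameter(torch.zeros(1))
demo_opt = optim.Adam([dummy], lr=0.001)
demo_sched = optim.lr_scheduler.ReduceLROnPlateau(demo_opt, mode='min', patience=10, factor=0.5)

lrs = []
for epoch in range(12):
    demo_sched.step(1.0)                        # validation loss stuck at 1.0
    lrs.append(demo_opt.param_groups[0]['lr'])  # current learning rate

print(lrs)
```

The first call counts as an improvement (the initial best value is infinity), calls 2 to 11 are 10 non-improving epochs, and the halving happens on the 12th call.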

### 5.3 Training Loop

In [None]:
# Training function
def train_epoch(model, train_loader, criterion, optimizer, device):
    model.train()  # Set to training mode
    running_loss = 0.0

    for batch_X, batch_y in train_loader:
        batch_X = batch_X.to(device)
        batch_y = batch_y.to(device)

        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)

        # Backward pass and optimization
        optimizer.zero_grad()  # Clear gradients
        loss.backward()  # Compute gradients
        optimizer.step()  # Update weights

        # Weight by batch size so a smaller final batch is not over-counted
        running_loss += loss.item() * batch_X.size(0)

    avg_loss = running_loss / len(train_loader.dataset)
    return avg_loss

# Validation function
def validate(model, test_loader, criterion, device):
    model.eval()  # Set to evaluation mode
    running_loss = 0.0

    with torch.no_grad():  # Disable gradient computation
        for batch_X, batch_y in test_loader:
            batch_X = batch_X.to(device)
            batch_y = batch_y.to(device)

            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            running_loss += loss.item() * batch_X.size(0)  # weight by batch size

    avg_loss = running_loss / len(test_loader.dataset)
    return avg_loss
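`nn.MSELoss` returns the mean loss over a batch, so dividing the summed batch means by the number of batches gives every batch equal weight, even a short final batch. Multiplying each batch mean by its batch size and dividing by the number of samples recovers the true per-sample average. A sketch with 34 synthetic predictions split into batches of 32 and 2 (all data made up):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
mse = nn.MSELoss()                       # mean over the elements of a batch
preds, targets = torch.randn(34, 1), torch.randn(34, 1)
batches = [(preds[:32], targets[:32]), (preds[32:], targets[32:])]  # sizes 32 and 2

batch_losses = [mse(p, t).item() for p, t in batches]
mean_of_batch_means = sum(batch_losses) / len(batches)
per_sample_mean = sum(l * len(p) for l, (p, _) in zip(batch_losses, batches)) / len(preds)
full_mse = mse(preds, targets).item()

print(f"mean of batch means: {mean_of_batch_means:.4f}")
print(f"per-sample mean:     {per_sample_mean:.4f}")
print(f"MSE over all 34:     {full_mse:.4f}")
```

Only the size-weighted average matches the MSE over the full set.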

In [None]:
# Train the model
num_epochs = 100
train_losses = []
val_losses = []
best_val_loss = float('inf')

print("Training started...")
print("=" * 60)

for epoch in range(num_epochs):
    # Train
    train_loss = train_epoch(model, train_loader, criterion, optimizer, device)
    train_losses.append(train_loss)

    # Validate (note: test_loader doubles as the validation set in this notebook)
    val_loss = validate(model, test_loader, criterion, device)
    val_losses.append(val_loss)

    # Learning rate scheduler step
    scheduler.step(val_loss)

    # Save best model
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pth')

    # Print progress
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}] | "
              f"Train Loss: {train_loss:.4f} | "
              f"Val Loss: {val_loss:.4f}")

print("\n✓ Training complete!")
print(f"Best validation loss: {best_val_loss:.4f}")
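The loop above uses `test_loader` both to pick the best checkpoint and to drive the learning-rate scheduler, so the test metrics reported later are not a fully independent estimate. A common remedy is a three-way split. A minimal sketch on synthetic arrays (`X_all`, `X_tr`, `X_val`, `X_te` and the 60/20/20 proportions are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the descriptor matrix and target
rng = np.random.default_rng(42)
X_all = rng.normal(size=(1000, 8))
y_all = rng.normal(size=(1000, 1))

# 1) Hold out the test set first; use it only once, for the final evaluation
X_trval, X_te, y_trval, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=42)

# 2) Split the rest into training and validation sets (0.25 x 80% = 20% of all data)
X_tr, X_val, y_tr, y_val = train_test_split(X_trval, y_trval, test_size=0.25, random_state=42)

print(X_tr.shape, X_val.shape, X_te.shape)  # (600, 8) (200, 8) (200, 8)
```

The scalers would then be fit on the training part only, a DataLoader built from the validation part would take the place of `test_loader` in `validate()` and `scheduler.step()`, and the test arrays would be scored once after training.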

### 5.4 Visualize Training Progress

In [None]:
# Plot training curves
plt.figure(figsize=(10, 6))
plt.plot(train_losses, label='Training Loss', linewidth=2)
plt.plot(val_losses, label='Validation Loss', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training and Validation Loss', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.show()

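# Check for overfitting
# Note: the training loss is averaged while dropout is active, so it is not
# directly comparable to the validation loss; treat this as a rough heuristic.
final_train_loss = train_losses[-1]
final_val_loss = val_losses[-1]
gap = final_val_loss - final_train_loss  # positive when validation loss is higher

print(f"\nFinal Training Loss: {final_train_loss:.4f}")
print(f"Final Validation Loss: {final_val_loss:.4f}")
print(f"Gap (val - train): {gap:.4f}")

# 0.1 (in scaled MSE units) is an arbitrary threshold
if gap > 0.1:
    print("⚠️ Model may be overfitting (validation loss well above training loss)")
else:
    print("✓ No large gap between training and validation loss")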

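The training loop keeps the best checkpoint but always runs all 100 epochs. A common addition is early stopping: end training once the validation loss has not improved for a set number of epochs. A minimal sketch; `EarlyStopping` is a hypothetical helper written here, not a PyTorch class, and the loss sequence is made up:

```python
class EarlyStopping:
    """Hypothetical helper: signals a stop after `patience` epochs without improvement."""

    def __init__(self, patience=15, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float('inf')
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.69]   # made-up validation losses
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best loss {stopper.best}")
```

In the notebook's loop, `if stopper.step(val_loss): break` would go right after the validation step.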
### 5.5 Data Augmentation for Molecular Data

Data augmentation artificially enlarges the training set by creating modified versions of existing samples.

**For Images:** Rotation, flipping, cropping  
**For Molecular/Tabular Data:** Different approaches are needed, because geometric transforms such as rotation have no meaning for a vector of descriptors

**Common Techniques:**
1. **Adding Noise:** Add Gaussian noise to feature values
2. **Mixup:** Create virtual samples by interpolating between pairs of samples
3. **Feature Dropout:** Randomly zero out features during training

**Why use augmentation?**
- Acts as regularization, which helps prevent overfitting
- Can improve generalization to unseen molecules
- Especially useful with small datasets

#### Technique 1: Adding Gaussian Noise

Add small random noise to feature values during training. Because the features are standardized, a noise standard deviation of 0.05 equals 5% of each feature's training-set standard deviation.

In [None]:
# Custom Dataset with Gaussian noise augmentation
class AugmentedDataset(Dataset):
    def __init__(self, X, y, noise_std=0.1, augment=True):
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32)
        self.noise_std = noise_std
        self.augment = augment

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        x = self.X[idx]
        y = self.y[idx]

        # Draw fresh Gaussian noise each time a sample is requested (training only)
        if self.augment:
            x = x + torch.randn_like(x) * self.noise_std

        return x, y

# Create augmented dataset (use augment=False for validation/test data)
noise_std = 0.05
train_dataset_aug = AugmentedDataset(X_train_scaled, y_train_scaled,
                                     noise_std=noise_std, augment=True)
train_loader_aug = DataLoader(train_dataset_aug, batch_size=32, shuffle=True)

# Visualize the effect of noise on one sample
sample_idx = 0
feature_idx = 0  # first feature
original = torch.tensor(X_train_scaled[sample_idx], dtype=torch.float32)
n_draws = 200
augmented_samples = (original + torch.randn(n_draws, original.shape[0]) * noise_std).numpy()
all_aug_values = augmented_samples[:, feature_idx]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: the original value and the first five noisy copies
axes[0].plot([0], [original[feature_idx].item()], 'ro', markersize=15,
             label='Original', zorder=5)
axes[0].plot(range(1, 6), all_aug_values[:5], 'bo', alpha=0.6, markersize=10,
             label='Augmented')
axes[0].set_xlabel('Sample Number', fontsize=12)
axes[0].set_ylabel(f'Feature Value ({feature_columns[feature_idx]})', fontsize=12)
axes[0].set_title('Effect of Gaussian Noise on One Feature', fontsize=13, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Right: distribution of many noisy copies around the original value
axes[1].hist(all_aug_values, bins=20, alpha=0.7,
             label='Augmented', color='blue', edgecolor='black')
axes[1].axvline(original[feature_idx].item(), color='red', linewidth=2, label='Original')
axes[1].set_xlabel('Feature Value (scaled)', fontsize=12)
axes[1].set_ylabel('Count', fontsize=12)
axes[1].set_title('Distribution: Original vs Augmented', fontsize=13, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print(f"Noise standard deviation: {noise_std}")
print(f"Original value: {original[feature_idx].item():.4f}")
print(f"First 5 augmented values: {[f'{v:.4f}' for v in all_aug_values[:5]]}")

#### Technique 2: Mixup

Create virtual training samples by linearly interpolating between pairs of samples and their targets.

**Formula:** x_new = λ × x1 + (1-λ) × x2 and y_new = λ × y1 + (1-λ) × y2, where λ ~ Beta(α, α)

In [None]:
# Mixup augmentation function
def mixup_data(x, y, alpha=0.2):
    """
    Return mixed inputs, both sets of targets, and the mixing coefficient lambda
    """
    if alpha > 0:
        lam = float(np.random.beta(alpha, alpha))
    else:
        lam = 1.0

    batch_size = x.size(0)
    # Random partner for each sample, created on the same device as x
    index = torch.randperm(batch_size, device=x.device)

    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

# Example: create mixup samples from two training molecules
sample1_idx = 0
sample2_idx = 10
feature_idx = 0  # feature to visualize

x1 = torch.tensor(X_train_scaled[sample1_idx], dtype=torch.float32)
x2 = torch.tensor(X_train_scaled[sample2_idx], dtype=torch.float32)
y1 = y_train_scaled[sample1_idx]  # NumPy array of shape (1,)
y2 = y_train_scaled[sample2_idx]

# Create mixup samples with different lambdas
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
mixup_samples = []

for lam in lambdas:
    mixed_x = lam * x1 + (1 - lam) * x2
    mixed_y = lam * y1 + (1 - lam) * y2
    mixup_samples.append((mixed_x.numpy(), mixed_y))

feature_values = [s[0][feature_idx] for s in mixup_samples]
target_values = [s[1][0] for s in mixup_samples]

# Visualize mixup
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Feature values across mixup ratios
axes[0].plot(lambdas, feature_values, 'o-', linewidth=2, markersize=10)
axes[0].axhline(y=x1[feature_idx].item(), color='r', linestyle='--',
                label=f'Sample 1: {x1[feature_idx].item():.3f}')
axes[0].axhline(y=x2[feature_idx].item(), color='b', linestyle='--',
                label=f'Sample 2: {x2[feature_idx].item():.3f}')
axes[0].set_xlabel('Lambda (λ)', fontsize=12)
axes[0].set_ylabel('Mixed Feature Value', fontsize=12)
axes[0].set_title(f'Mixup Interpolation ({feature_columns[feature_idx]})',
                  fontsize=13, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Target values across mixup ratios
axes[1].plot(lambdas, target_values, 's-', linewidth=2,
             markersize=10, color='green')
axes[1].axhline(y=y1[0], color='r', linestyle='--',
                label=f'Target 1: {y1[0]:.3f}')
axes[1].axhline(y=y2[0], color='b', linestyle='--',
                label=f'Target 2: {y2[0]:.3f}')
axes[1].set_xlabel('Lambda (λ)', fontsize=12)
axes[1].set_ylabel('Mixed Target Value (scaled)', fontsize=12)
axes[1].set_title('Target Interpolation', fontsize=13, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Mixup creates virtual samples between existing ones")
print("λ=0: 100% sample 2, λ=1: 100% sample 1")
print("λ=0.5: equal mix of both samples")

#### Comparing Augmentation Techniques

| Technique | How It Works | Pros | Cons | When to Use |
|-----------|--------------|------|------|-------------|
| **Gaussian Noise** | Add random noise to features | Simple, fast | Can produce unrealistic values | Small datasets; when a robust model is needed |
| **Mixup** | Interpolate between pairs of samples and their targets | Smoother decision boundaries | Requires mixed targets or a modified loss | Medium datasets, classification |
| **Feature Dropout** | Randomly zero out features | Acts as regularization | Loses information | High-dimensional data |

**Best Practices:**
1. Start without augmentation to establish a baseline
2. Try Gaussian noise first (the simplest option)
3. Use a small noise std (0.01-0.1 for standardized data)
4. For Mixup, use α = 0.1-0.4
5. Compare validation loss with and without augmentation to check whether it helps
6. Augmentation tends to be most useful for small datasets (<1,000 samples)

**For Molecular Data Specifically:**
- Noise augmentation roughly simulates uncertainty in the feature values
- Mixup creates virtual feature vectors between two molecules; these need not correspond to any real molecule
- Be careful not to create chemically impossible values (e.g., fractional ring or H-bond donor counts)
- Consider domain constraints (e.g., MW > 0)

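The augmentation code above prepares `train_loader_aug` and `mixup_data` but does not yet use them for training. The sketch below shows one way to plug all three techniques into a training epoch. It runs on synthetic data so it is self-contained; `augment_batch`, `train_epoch_augmented` and all hyperparameters are hypothetical choices, and the feature-dropout branch is one simple variant that, unlike `nn.Dropout`, does not rescale the kept features.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def augment_batch(x, y, method, noise_std=0.05, alpha=0.2, drop_p=0.1):
    """Hypothetical helper returning (x_aug, y_a, y_b, lam) for one batch.
    For 'noise', 'feature_dropout' and 'none', y_b is y and lam is 1.0,
    so the mixup-style loss below reduces to the ordinary loss."""
    if method == 'noise':
        return x + torch.randn_like(x) * noise_std, y, y, 1.0
    if method == 'feature_dropout':
        # Zero each feature with probability drop_p; on standardized data,
        # 0 equals the training-set mean of that feature
        keep = (torch.rand_like(x) >= drop_p).to(x.dtype)
        return x * keep, y, y, 1.0
    if method == 'mixup':
        lam = float(np.random.beta(alpha, alpha))
        index = torch.randperm(x.size(0), device=x.device)
        return lam * x + (1 - lam) * x[index], y, y[index], lam
    return x, y, y, 1.0  # no augmentation

def train_epoch_augmented(model, loader, criterion, optimizer, method):
    model.train()
    total, n = 0.0, 0
    for xb, yb in loader:
        x_aug, y_a, y_b, lam = augment_batch(xb, yb, method)
        pred = model(x_aug)
        # Mixup loss: lambda-weighted sum of the losses against both targets
        loss = lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item() * xb.size(0)
        n += xb.size(0)
    return total / n

# Toy usage: 8 standardized features, noisy linear target (synthetic data)
torch.manual_seed(0)
np.random.seed(0)
X_toy = torch.randn(256, 8)
y_toy = X_toy @ torch.randn(8, 1) + 0.1 * torch.randn(256, 1)
toy_loader = DataLoader(TensorDataset(X_toy, y_toy), batch_size=32, shuffle=True)

results = {}
for method in ['none', 'noise', 'feature_dropout', 'mixup']:
    toy_model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    toy_opt = torch.optim.Adam(toy_model.parameters(), lr=1e-2)
    for epoch in range(20):
        loss = train_epoch_augmented(toy_model, toy_loader, nn.MSELoss(), toy_opt, method)
    results[method] = loss
    print(f"{method:>15}: final training loss {loss:.4f}")
```

These training losses are computed on augmented batches, so they are not directly comparable; a fair comparison would score each model on an un-augmented validation set. For `MSELoss`, the λ-weighted loss differs from the loss against the mixed target λ·y_a + (1-λ)·y_b only by the constant λ(1-λ)·mean((y_a - y_b)²), so both choices give the same gradients.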
## 6. Model Evaluation

In [None]:
# Load best model
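# map_location lets a checkpoint saved on a GPU be loaded on a CPU-only machine
model.load_state_dict(torch.load('best_model.pth', map_location=device))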
model.eval()

# Make predictions
with torch.no_grad():
    y_train_pred_scaled = model(X_train_tensor.to(device)).cpu().numpy()
    y_test_pred_scaled = model(X_test_tensor.to(device)).cpu().numpy()

# Inverse transform to original scale
y_train_pred = scaler_y.inverse_transform(y_train_pred_scaled)
y_test_pred = scaler_y.inverse_transform(y_test_pred_scaled)

# Calculate metrics
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)
test_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
test_mae = mean_absolute_error(y_test, y_test_pred)

print("Model Performance:")
print("=" * 40)
print(f"Training R²: {train_r2:.4f}")
print(f"Test R²: {test_r2:.4f}")
print(f"Test RMSE: {test_rmse:.4f}")
print(f"Test MAE: {test_mae:.4f}")

In [None]:
# Actual vs Predicted plot
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_test_pred, alpha=0.5, s=50, edgecolors='black', linewidth=0.5)

# Perfect prediction line
min_val = min(y_test.min(), y_test_pred.min())
max_val = max(y_test.max(), y_test_pred.max())
plt.plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect Prediction')

plt.xlabel('Actual Log Solubility', fontsize=12)
plt.ylabel('Predicted Log Solubility', fontsize=12)
plt.title('PyTorch Neural Network: Actual vs Predicted', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend()

# Add R² annotation
plt.text(0.05, 0.95, f'R² = {test_r2:.4f}\nRMSE = {test_rmse:.4f}',
         transform=plt.gca().transAxes, fontsize=11, verticalalignment='top',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.7))

plt.show()

## 7. Comparing with Scikit-learn

In [None]:
from sklearn.ensemble import RandomForestRegressor

# Train Random Forest for comparison
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train_scaled, y_train.ravel())

y_test_pred_rf = rf.predict(X_test_scaled).reshape(-1, 1)

# Metrics
rf_r2 = r2_score(y_test, y_test_pred_rf)
rf_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred_rf))
rf_mae = mean_absolute_error(y_test, y_test_pred_rf)

# Compare
comparison = pd.DataFrame({
    'Model': ['PyTorch NN', 'Random Forest'],
    'Test R²': [test_r2, rf_r2],
    'Test RMSE': [test_rmse, rf_rmse],
    'Test MAE': [test_mae, rf_mae]
})

print("\nModel Comparison:")
print(comparison.to_string(index=False))

## 8. Understanding Neural Network Components

### Key Concepts:

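1. **Layers**
   - `nn.Linear`: Fully connected (dense) layer
   - `nn.BatchNorm1d`: Normalizes each feature over the mini-batch
   - `nn.Dropout`: Randomly zeroes activations during training to reduce overfitting

2. **Activation Functions**
   - `ReLU`: Most common default; cheap to compute
   - `Sigmoid`: Squashes outputs to (0, 1); used for binary classification outputs
   - `Tanh`: Zero-centered, outputs in (-1, 1)

3. **Loss Functions**
   - `MSELoss`: Regression
   - `CrossEntropyLoss`: Multi-class classification (expects raw logits, not softmax outputs)

4. **Optimizers**
   - `Adam`: Adaptive per-parameter learning rates (a widely used default)
   - `SGD`: Stochastic Gradient Descent, optionally with momentum
   - `RMSprop`: Adaptive learning rates; historically popular for RNNs

5. **Regularization**
   - Dropout: Randomly drops neurons during training
   - Batch Normalization: Mainly stabilizes training, with a mild regularizing side effect
   - Weight Decay: L2 penalty on the weights (the optimizer's `weight_decay` argument)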

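The weight decay entry can be checked directly. For plain SGD, passing `weight_decay=wd` adds `wd · w` to each parameter's gradient, which is exactly the gradient of an explicit penalty `(wd / 2) · ‖w‖²` added to the loss. The sketch below (synthetic data; `lr=0.1` and `wd=0.01` are arbitrary assumptions) takes one step each way and compares the resulting weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(16, 4)
y = torch.randn(16, 1)
wd = 0.01

# Two identical copies of a linear model
model_a = nn.Linear(4, 1)
model_b = nn.Linear(4, 1)
model_b.load_state_dict(model_a.state_dict())

# (a) Weight decay handled by the optimizer
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1, weight_decay=wd)
loss_a = F.mse_loss(model_a(x), y)
opt_a.zero_grad()
loss_a.backward()
opt_a.step()

# (b) Explicit L2 penalty (wd / 2) * ||w||^2 added to the loss, plain SGD
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)
l2 = sum((p ** 2).sum() for p in model_b.parameters())
loss_b = F.mse_loss(model_b(x), y) + 0.5 * wd * l2
opt_b.zero_grad()
loss_b.backward()
opt_b.step()

same = all(torch.allclose(pa, pb, atol=1e-6)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print("identical updates:", same)
```

For `Adam`, the `weight_decay` argument adds the same L2 term to the gradient before the adaptive scaling; `torch.optim.AdamW` instead applies decoupled weight decay directly to the weights.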
## 9. Summary and Best Practices

### Key Takeaways:

1. **Data Preparation**
   - Scale features for neural networks, fitting the scaler on training data only
   - Convert data to tensors
   - Use DataLoader for efficient batch processing

2. **Model Architecture**
   - Start simple, add complexity if needed
   - Use batch normalization for stability
   - Add dropout to prevent overfitting

3. **Training**
   - Monitor both training and validation loss
   - Use learning rate scheduling
   - Save the best model
   - Watch for overfitting

4. **Evaluation**
   - Use appropriate metrics (R², RMSE, MAE)
   - Visualize predictions
   - Compare with baseline models

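### When to use Deep Learning?

The sample-size thresholds below are rough rules of thumb rather than hard limits.

✓ **Use Neural Networks when:**
- Large dataset (roughly >10,000 samples)
- Complex, non-linear relationships
- High-dimensional data
- Need feature learning from raw representations

✗ **Stick with traditional ML when:**
- Small dataset (roughly <1,000 samples)
- Simple relationships
- Need interpretability
- Limited computational resources

The ESOL set used in this notebook (about 1,100 molecules) sits near the small end of this range, which is why the Random Forest baseline in Section 7 is a useful reference point.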

## 10. Exercise

In [None]:
# TODO: Experiment with the neural network
# 1. Try different architectures (more/fewer layers)
# 2. Experiment with dropout rates
# 3. Try different learning rates
# 4. Vary the hidden layer widths
# 5. Compare performance

# Your code here:


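One possible starting point for the architecture experiments is a helper that builds a network of any depth from a list of hidden sizes. `make_mlp` is a hypothetical name; with `input_size=8` and `hidden_sizes=[64, 32]` it produces the same layer sequence as `MolecularPredictor` (Linear, BatchNorm1d, ReLU, Dropout per hidden layer), and therefore the same parameter count. The architectures in the loop are arbitrary examples.

```python
import torch
import torch.nn as nn

def make_mlp(input_size, hidden_sizes, output_size=1, dropout=0.2):
    """Hypothetical helper: Linear -> BatchNorm1d -> ReLU -> Dropout per hidden layer."""
    layers = []
    in_features = input_size
    for width in hidden_sizes:
        layers += [nn.Linear(in_features, width), nn.BatchNorm1d(width),
                   nn.ReLU(), nn.Dropout(dropout)]
        in_features = width
    layers.append(nn.Linear(in_features, output_size))  # regression output
    return nn.Sequential(*layers)

# Architectures to compare (depths and widths chosen for illustration)
for hidden in [[32], [64, 32], [128, 64, 32]]:
    net = make_mlp(8, hidden)
    n_params = sum(p.numel() for p in net.parameters())
    out = net(torch.randn(4, 8))          # batch of 4 (BatchNorm needs > 1 in train mode)
    print(hidden, n_params, tuple(out.shape))
```

Each candidate could then be trained with the existing `train_epoch` and `validate` functions and compared on validation loss.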
---

## Resources

- [PyTorch Documentation](https://pytorch.org/docs/stable/index.html)
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [Deep Learning Book](https://www.deeplearningbook.org/)
- [Neural Networks Playground](https://playground.tensorflow.org/)

**Next Notebook: HuggingFace Transformers for Chemistry**