# Introduction to Deep Learning

Deep learning has revolutionized the field of machine learning and artificial intelligence over the past decade.
This chapter introduces the fundamental concepts of deep learning using modern Python tools.

We'll cover:

- Neural network fundamentals
- Training deep networks
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Transfer learning and modern architectures

## What is Deep Learning?

Deep learning is a subfield of machine learning based on artificial neural networks with multiple layers.
These networks are loosely inspired by the structure of the human brain and consist of layers of interconnected nodes (neurons) that transform the information passed to them.

Key characteristics:
- **Multiple layers**: Networks with many hidden layers (hence "deep")
- **Hierarchical feature learning**: Each layer learns increasingly complex features
- **Automatic feature extraction**: Unlike traditional ML, which typically relies on hand-engineered features, deep networks learn features directly from the data
- **Scalability**: Performance often improves with more data and computation

In [None]:
# Import essential deep learning libraries
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

## Building Your First Neural Network

Let's start with a simple neural network for classification using the classic MNIST dataset.

In [None]:
# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Flatten the images for the simple neural network
x_train_flat = x_train.reshape(-1, 784)  # -1 infers the number of samples
x_test_flat = x_test.reshape(-1, 784)

print(f"Training data shape: {x_train_flat.shape}")
print(f"Test data shape: {x_test_flat.shape}")
print(f"Training labels shape: {y_train.shape}")

In [None]:
# Build a simple neural network
model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,)),
    layers.Dropout(0.2),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()
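
The parameter counts reported by `model.summary()` can be reproduced by hand. The sketch below assumes the layer sizes of the model above (784 → 128 → 64 → 10); the helper `dense_param_count` is a hypothetical name introduced only for this illustration. Dropout layers have no trainable parameters, so they are left out.

```python
# Layer sizes taken from the model above: 784 -> 128 -> 64 -> 10
layer_sizes = [784, 128, 64, 10]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)      # [100480, 8256, 650]
print(total_params)   # 109386
```

Most of the parameters sit in the first layer, because every one of the 784 input pixels connects to each of the 128 units.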

In [None]:
# Train the model
history = model.fit(x_train_flat, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2,
                    verbose=1)

In [None]:
# Evaluate the model
test_loss, test_acc = model.evaluate(x_test_flat, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")

# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Plot accuracy
ax1.plot(history.history['accuracy'], label='Training Accuracy')
ax1.plot(history.history['val_accuracy'], label='Validation Accuracy')
ax1.set_title('Model Accuracy')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Accuracy')
ax1.legend()

# Plot loss
ax2.plot(history.history['loss'], label='Training Loss')
ax2.plot(history.history['val_loss'], label='Validation Loss')
ax2.set_title('Model Loss')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Loss')
ax2.legend()

plt.tight_layout()
plt.show()

## Convolutional Neural Networks (CNNs)

CNNs are specifically designed for processing grid-like data such as images. Their convolutional layers slide small learned filters across the input, which lets the network learn spatial hierarchies of features automatically.
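
To make the sliding-filter idea concrete, here is a minimal NumPy sketch of a single-channel convolution with stride 1 and no padding. The function `conv2d_valid`, the synthetic image, and the hand-written edge kernel are all assumptions made for illustration; a trained `Conv2D` layer learns its kernel values and handles many channels at once.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding ("valid" mode).

    As in most deep learning libraries, the kernel is not flipped, so this is
    technically cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

# Synthetic 28x28 input: left half dark (0), right half bright (1)
image = np.zeros((28, 28))
image[:, 14:] = 1.0

# Hand-written vertical-edge kernel (illustrative values only)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)       # (26, 26)
print(feature_map[0, 10:16])   # non-zero only where the kernel straddles the edge
```

A 3×3 kernel on a 28×28 input yields a 26×26 feature map, which matches the output shape of the first `Conv2D` layer in the model summary.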

In [None]:
# Build a CNN for image classification
cnn_model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Reshape data for CNN (add channel dimension)
x_train_cnn = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test_cnn = x_test.reshape(x_test.shape[0], 28, 28, 1)

# Compile the CNN
cnn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

cnn_model.summary()

In [None]:
# Train the CNN
cnn_history = cnn_model.fit(x_train_cnn, y_train,
                           epochs=5,
                           batch_size=64,
                           validation_split=0.2,
                           verbose=1)

In [None]:
# Compare performance
cnn_test_loss, cnn_test_acc = cnn_model.evaluate(x_test_cnn, y_test, verbose=0)
print(f"Simple NN Test Accuracy: {test_acc:.4f}")
print(f"CNN Test Accuracy: {cnn_test_acc:.4f}")
print(f"Improvement: {(cnn_test_acc - test_acc) * 100:.2f} percentage points")

## Visualizing Neural Network Activations

Understanding what neural networks learn is crucial for debugging and improving them.

In [None]:
# Create a model to extract intermediate activations.
# The first five layers are the convolutional and pooling layers (all 4D outputs);
# the Flatten layer that follows is excluded because it has no spatial structure.
conv_layers = cnn_model.layers[:5]
layer_outputs = [layer.output for layer in conv_layers]
activation_model = models.Model(inputs=cnn_model.input, outputs=layer_outputs)

# Get activations for a sample image
sample_image = x_test_cnn[0:1]
activations = activation_model.predict(sample_image, verbose=0)

# Visualize the first feature map of each layer
layer_names = [layer.name for layer in conv_layers]

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.ravel()
for ax, layer_name, activation in zip(axes, layer_names, activations):
    ax.imshow(activation[0, :, :, 0], cmap='viridis')
    ax.set_title(f'{layer_name} - feature map 0')
    ax.axis('off')

# Hide the subplots that were not used
for ax in axes[len(layer_names):]:
    ax.axis('off')

plt.tight_layout()
plt.show()

## Transfer Learning

Transfer learning allows us to reuse models pre-trained on large datasets for new tasks. This often reduces training time and improves performance, especially when labeled data for the new task is limited.

In [None]:
# Example: Using a pre-trained model (conceptual)
# Note: Creating the base model downloads ImageNet weights, and training the result
# requires a dataset of RGB images (for example, one loaded with tensorflow_datasets)

def create_transfer_model(input_shape, num_classes):
    """
    Create a transfer learning model using a pre-trained base.
    """
    # Load pre-trained model (e.g., MobileNetV2)
    base_model = tf.keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,
        weights='imagenet'
    )
    
    # Freeze the base model
    base_model.trainable = False
    
    # Add custom classification head
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(num_classes, activation='softmax')
    ])
    
    return model

print("Transfer learning model function defined.")
print("This would be used with larger image datasets.")

## Key Deep Learning Concepts

### 1. **Backpropagation**
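The algorithm that computes the gradient of the loss function with respect to each weight by applying the chain rule backward through the network, layer by layer. The optimizer then uses these gradients to update the weights.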
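
As a rough illustration of the chain rule at work, the sketch below hand-derives the gradients for a tiny two-weight network and checks them against finite differences. The network, the function names, and the input values are assumptions chosen for this example; they are not part of the MNIST models above.

```python
import math

def forward(w1, w2, x):
    """Tiny two-layer network: h = tanh(w1 * x), y_hat = w2 * h."""
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    return h, y_hat

def loss_fn(w1, w2, x, y):
    """Squared-error loss for a single example."""
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Apply the chain rule from the loss back to each weight."""
    h, y_hat = forward(w1, w2, x)
    d_y_hat = y_hat - y                  # dL/dy_hat
    d_w2 = d_y_hat * h                   # dL/dw2
    d_h = d_y_hat * w2                   # dL/dh
    d_w1 = d_h * (1.0 - h ** 2) * x      # dL/dw1, using d tanh(u)/du = 1 - tanh(u)^2
    return d_w1, d_w2

# Illustrative values only
w1, w2, x, y = 0.5, -1.5, 2.0, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, y)

# Central finite differences as an independent check
eps = 1e-6
num_w1 = (loss_fn(w1 + eps, w2, x, y) - loss_fn(w1 - eps, w2, x, y)) / (2 * eps)
num_w2 = (loss_fn(w1, w2 + eps, x, y) - loss_fn(w1, w2 - eps, x, y)) / (2 * eps)

print(f"analytic: {grad_w1:.6f}, {grad_w2:.6f}")
print(f"numeric:  {num_w1:.6f}, {num_w2:.6f}")
```

Frameworks such as TensorFlow perform the same bookkeeping automatically for every weight in the network.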

### 2. **Gradient Descent**
Optimization algorithm that iteratively adjusts the weights in the direction of the negative gradient to reduce the loss function.
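
A minimal sketch of the update rule on a one-dimensional toy loss follows. The loss, starting point, and learning rate are assumed values picked for illustration; real networks apply the same update to millions of weights, usually on mini-batches.

```python
def loss(w):
    """Toy convex loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0               # initial weight
learning_rate = 0.1   # step size (illustrative choice)

for step in range(100):
    w -= learning_rate * grad(w)   # move against the gradient

print(f"w = {w:.4f}, loss = {loss(w):.2e}")
```

With this learning rate each step shrinks the distance to the minimum by a constant factor of 0.8; a rate above 1.0 would make the iterates diverge on this particular loss.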

### 3. **Regularization**
Techniques such as dropout and L2 regularization (weight decay) that reduce overfitting.
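
Both techniques can be sketched in a few lines of NumPy. The functions `dropout` and `l2_penalty` below are simplified stand-ins written for this example, not the Keras implementations; the dropout variant shown is the common "inverted" form, which rescales the surviving units during training so that nothing needs to change at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum(w^2)."""
    return lam * np.sum(weights ** 2)

# Dropout with rate 0.2, as in the dense model above
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.2)
print(f"fraction zeroed: {np.mean(dropped == 0):.3f}")
print(f"mean activation: {dropped.mean():.3f}")

# L2 penalty for a small, made-up weight vector
weights = np.array([0.5, -1.0, 2.0])
print(l2_penalty(weights, lam=0.01))
```

Roughly 20% of the units are zeroed while the mean activation stays close to 1, which is the point of the rescaling.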

### 4. **Activation Functions**
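Non-linear functions (ReLU, sigmoid, tanh, etc.) applied to each layer's output. Without them, a stack of layers would collapse into a single linear transformation, no matter how deep the network is.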
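
The sketch below defines two common activations in NumPy and then checks the claim that stacked linear layers reduce to a single one. The matrices `W1` and `W2` are random placeholders, not weights from any model in this chapter.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x) element-wise."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic function, squashing inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))
print(sigmoid(x))
print(np.tanh(x))

# Two linear layers without an activation are equivalent to one linear layer
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
v = rng.normal(size=3)
stacked = W2 @ (W1 @ v)
collapsed = (W2 @ W1) @ v
print(np.allclose(stacked, collapsed))
```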

### 5. **Loss Functions**
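Functions that measure how far the model's predictions are from the targets; training minimizes this value. Common choices are cross-entropy for classification and mean squared error (MSE) for regression.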
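
As a small worked example, both losses can be computed directly in NumPy. The predicted probabilities and targets below are made-up numbers, and the function names are chosen here to mirror the `sparse_categorical_crossentropy` loss used earlier, which takes integer class labels.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical softmax outputs for two examples and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])   # integer class labels

ce = sparse_categorical_crossentropy(probs, labels)
mse = mean_squared_error(np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.0, 2.0]))
print(f"cross-entropy: {ce:.4f}")
print(f"MSE: {mse:.4f}")
```

The cross-entropy shrinks toward zero as the probability assigned to the correct class approaches 1.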

## Best Practices for Deep Learning

1. **Start Simple**: Begin with simple architectures and gradually increase complexity
2. **Use Transfer Learning**: Leverage pre-trained models when possible
3. **Monitor Overfitting**: Use validation sets and early stopping
4. **Data Augmentation**: Increase the effective size and diversity of the training set through label-preserving transformations (e.g., small rotations, shifts, or crops)
5. **Hyperparameter Tuning**: Systematically search for optimal parameters
6. **GPU Acceleration**: Use GPUs for faster training
7. **Experiment Tracking**: Keep track of experiments and results
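
The early stopping mentioned in point 3 comes down to a simple patience rule, sketched below in plain Python. The function `early_stopping_epoch` and the validation losses are hypothetical and exist only to show the logic.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1   # never triggered: all epochs ran

# Made-up validation losses: improvement stalls after epoch 3
val_losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.42, 0.44]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
best_epoch = val_losses.index(min(val_losses))
print(stop_epoch, best_epoch)   # 6 3
```

In Keras, the `keras.callbacks.EarlyStopping` callback provides this behavior through its `monitor`, `patience`, and `restore_best_weights` arguments and can be passed to `model.fit` via `callbacks=[...]`.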

## Modern Deep Learning Frameworks

### TensorFlow/Keras
- Industry-standard framework
- Excellent production deployment options
- Strong community support

### PyTorch
- Research-friendly and flexible
- Dynamic computation graphs
- Growing rapidly in popularity

### JAX
- High-performance numerical computing
- Functional programming approach
- Excellent for research and large-scale training

## Further Resources

- [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python) by François Chollet
- [Hands-On Machine Learning](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032618/) by Aurélien Géron
- [fast.ai](https://www.fast.ai/) - Practical deep learning courses
- [Papers with Code](https://paperswithcode.com/) - Latest research and implementations

This introduction provides the foundation for exploring more advanced deep learning topics in subsequent notebooks.