## Deep Learning with TensorFlow/Keras and PyTorch
### IMPORTANT Note: The TensorFlow/Keras and PyTorch paths are presented in parallel. Choose one path and follow it through first, then, if you like, go through the other path separately.

**Tensors**

*   **Concept:** Tensors are multi-dimensional arrays, similar to NumPy's `ndarray`.  They are the fundamental data structure in deep learning.
*   **TensorFlow/Keras:**

    ```python
    import tensorflow as tf
    import numpy as np

    # Create tensors
    scalar = tf.constant(5)  # 0-dimensional tensor (scalar)
    vector = tf.constant([1, 2, 3])  # 1-dimensional tensor (vector)
    matrix = tf.constant([[1, 2], [3, 4]])  # 2-dimensional tensor (matrix)
    tensor3d = tf.constant([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # 3-dimensional tensor

    print("Scalar:", scalar)
    print("Vector:", vector)
    print("Matrix:", matrix)
    print("3D Tensor:", tensor3d)

    # Basic operations
    print("Addition:", vector + 5)
    print("Multiplication:", matrix * 2)
    print("Matrix Multiplication:", tf.matmul(matrix, matrix))

    # Convert between NumPy arrays and TensorFlow tensors
    numpy_array = np.array([1, 2, 3])
    tf_tensor = tf.convert_to_tensor(numpy_array)
    numpy_back = tf_tensor.numpy()

    # Check the shape and data type
    print("Shape:", matrix.shape)
    print("Data type:", matrix.dtype)

    # Using GPUs (if available)
    if tf.config.list_physical_devices('GPU'):
        with tf.device('/GPU:0'):
            matrix_gpu = tf.constant([[1, 2], [3, 4]])
            print("Matrix on GPU:", matrix_gpu)
    ```

*   **PyTorch:**

    ```python
    import torch
    import numpy as np

    # Create tensors
    scalar = torch.tensor(5)
    vector = torch.tensor([1, 2, 3])
    matrix = torch.tensor([[1, 2], [3, 4]])
    tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

    print("Scalar:", scalar)
    print("Vector:", vector)
    print("Matrix:", matrix)
    print("3D Tensor:", tensor3d)

    # Basic operations
    print("Addition:", vector + 5)
    print("Multiplication:", matrix * 2)
    print("Matrix Multiplication:", torch.matmul(matrix, matrix))  # or matrix @ matrix

    # Convert between NumPy arrays and PyTorch tensors
    numpy_array = np.array([1, 2, 3])
    torch_tensor = torch.from_numpy(numpy_array)  # CPU tensor that shares memory with numpy_array
    numpy_back = torch_tensor.numpy()  # Also shares memory (works only for CPU tensors)

    # Check the shape and data type
    print("Shape:", matrix.shape)  # or matrix.size()
    print("Data type:", matrix.dtype)

    # Using GPUs (if available)
    if torch.cuda.is_available():
        device = torch.device('cuda')
        matrix_gpu = matrix.to(device)  # Move tensor to GPU
        # or matrix_gpu = torch.tensor([[1, 2], [3, 4]], device=device) # Create directly on GPU
        print("Matrix on GPU:", matrix_gpu)

        # Operations on GPU
        result_gpu = matrix_gpu * 2
        print("Result on GPU", result_gpu)

        # Move back to CPU
        result_cpu = result_gpu.cpu()
        print("Result back to CPU", result_cpu)

    ```
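
To make "shape" and matrix multiplication concrete without either framework, here is a plain-Python sketch using nested lists. The helper names `shape_of` and `matmul` are hypothetical, written only for illustration; TensorFlow and PyTorch implement these operations natively and far more efficiently.

```python
def shape_of(x):
    """Return the shape of a (rectangular) nested list as a tuple.

    A plain number is a 0-dimensional tensor with shape ().
    """
    if not isinstance(x, list):
        return ()
    # Shape = length of this level + shape of the first element
    return (len(x),) + shape_of(x[0])


def matmul(a, b):
    """Multiply two 2-D nested lists: result[i][j] = sum_k a[i][k] * b[k][j]."""
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    m = len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


scalar = 5
vector = [1, 2, 3]
matrix = [[1, 2], [3, 4]]
tensor3d = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

print(shape_of(scalar))    # ()
print(shape_of(vector))    # (3,)
print(shape_of(matrix))    # (2, 2)
print(shape_of(tensor3d))  # (2, 2, 2)
print(matmul(matrix, matrix))  # [[7, 10], [15, 22]]
```

The same `matrix` passed to `tf.matmul` or `torch.matmul` above should give `[[7, 10], [15, 22]]`, whereas `matrix * 2` is element-wise.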

**Neural Network Basics**

*   **Neurons:**  The basic building block of a neural network.  A neuron takes inputs, multiplies them by weights, adds a bias, and applies an activation function.

*   **Activation Functions:** Introduce non-linearity into the network, allowing it to learn complex patterns.
    *   **ReLU (Rectified Linear Unit):**  `f(x) = max(0, x)` (most common)
    *   **Sigmoid:** `f(x) = 1 / (1 + exp(-x))` (outputs values between 0 and 1, often used in output layer for binary classification)
    *   **Tanh (Hyperbolic Tangent):** `f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))` (outputs values between -1 and 1)

*   **Layers:**
    *   **Dense (Fully Connected):** Each neuron in the layer is connected to every neuron in the previous layer.
    *   **Convolutional (Conv2D):**  Used for image processing, applies filters to extract features.
    *   **Recurrent (LSTM, GRU):**  Used for sequence data, has internal memory to process sequences of varying lengths.

*   **Loss Functions:** Measure the difference between the model's predictions and the true values.
    *   **Mean Squared Error (MSE):**  Common for regression.
    *   **Binary Cross-Entropy:** Common for binary classification.
    *   **Categorical Cross-Entropy:** Common for multi-class classification.

*   **Optimizers:**  Algorithms that adjust the model's weights to minimize the loss function.
    *   **SGD (Stochastic Gradient Descent):**  Basic optimizer.
    *   **Adam (Adaptive Moment Estimation):**  Popular and often performs well (adaptive learning rates).
    *   **RMSprop:** Another optimizer with adaptive learning rates.
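
The pieces above fit together in a few lines. The following plain-Python sketch (no framework; the inputs, weights, and learning rate are made-up illustrative values) computes one neuron's output, measures a sigmoid output with binary cross-entropy, and applies one SGD update. It relies on the standard result that, for sigmoid followed by binary cross-entropy, the gradient of the loss with respect to the pre-activation `z` is `p - y`.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def neuron(inputs, weights, bias, activation):
    # Weighted sum of inputs plus bias, then the activation function
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

def binary_cross_entropy(p, y):
    # p: predicted probability of class 1, y: true label (0 or 1)
    eps = 1e-12  # clip to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sgd_step(inputs, weights, bias, y, lr):
    """One SGD update for a sigmoid neuron trained with BCE.

    For sigmoid + BCE, dLoss/dz simplifies to (p - y), so
    dLoss/dw_i = (p - y) * x_i and dLoss/db = (p - y).
    """
    p = neuron(inputs, weights, bias, sigmoid)
    grad_z = p - y
    new_weights = [w - lr * grad_z * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * grad_z
    return new_weights, new_bias

# Illustrative (hypothetical) values
x, w, b, y = [1.0, 2.0], [0.5, -0.25], 0.0, 1
print(neuron(x, w, b, relu))  # z = 0.5 - 0.5 + 0 = 0.0, so ReLU gives 0.0
p_before = neuron(x, w, b, sigmoid)  # sigmoid(0) = 0.5
w2, b2 = sgd_step(x, w, b, y, lr=0.1)
p_after = neuron(x, w2, b2, sigmoid)
print(binary_cross_entropy(p_before, y), binary_cross_entropy(p_after, y))
```

After one step the loss on this example goes down; frameworks do the same thing, except that they compute the gradients automatically and for millions of weights at once.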

**Building and Training Models**

*   **TensorFlow/Keras (Sequential API):**  Simple way to build models layer by layer.

    ```python
    import numpy as np
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras.layers import Dense, Flatten

    # Define the model
    model = keras.Sequential([
        Flatten(input_shape=(28, 28)),  # Flatten 28x28 images to a 784-dimensional vector
        Dense(128, activation='relu'),  # Dense layer with 128 neurons and ReLU activation
        Dense(10, activation='softmax')  # Output layer with 10 neurons (for 10 classes) and softmax activation
    ])

    # Compile the model
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',  # For integer labels
                  metrics=['accuracy'])

    # Load and preprocess the MNIST dataset
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalize pixel values

    # Train the model
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

    # Evaluate the model
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
    print("Test Loss:", loss)
    print("Test Accuracy:", accuracy)

    # Make predictions
    predictions = model.predict(x_test[:5])
    print("Predictions:", np.argmax(predictions, axis=1)) # Get the class with the highest probability
    print("True Labels:", y_test[:5])
    ```

*   **TensorFlow/Keras (Functional API):** More flexible; supports complex model architectures (e.g., models with multiple inputs or outputs).

    ```python
    from tensorflow.keras.layers import Input, Dense, Flatten
    from tensorflow.keras.models import Model

    # Define the input
    input_tensor = Input(shape=(28,28))

    # Define the layers
    x = Flatten()(input_tensor)
    x = Dense(128, activation='relu')(x)
    output_tensor = Dense(10, activation='softmax')(x)

    # Create the model
    model = Model(inputs=input_tensor, outputs=output_tensor)
    # Compile and train (same as Sequential API)
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

    ```

*   **PyTorch:**

    ```python
    import torch
    import torch.nn as nn
    import torch.optim as optim
    import torch.nn.functional as F
    from torchvision import datasets, transforms
    from torch.utils.data import DataLoader

    # Define the model (as a class inheriting from nn.Module)
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.flatten = nn.Flatten()
            self.fc1 = nn.Linear(28 * 28, 128)  # Fully connected layer
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.flatten(x)
            x = F.relu(self.fc1(x))  # Apply ReLU activation
            x = self.fc2(x)
            return F.log_softmax(x, dim=1)  # Apply log_softmax for numerical stability

    # Instantiate the model
    model = Net()

    # Define the optimizer
    optimizer = optim.Adam(model.parameters())

    # Define the loss function
    criterion = nn.NLLLoss()  # The model already outputs log-probabilities (log_softmax), so use NLLLoss

    # Load and preprocess the MNIST dataset
    transform = transforms.Compose([
        transforms.ToTensor(),  # Convert to tensor
        transforms.Normalize((0.1307,), (0.3081,))  # Normalize
    ])
    train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

    # Training loop
    def train(model, device, train_loader, optimizer, criterion, epoch):
        model.train()  # Set the model to training mode
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)  # Move data to device
            optimizer.zero_grad()  # Zero the gradients
            output = model(data)
            loss = criterion(output, target)
            loss.backward()  # Backpropagation
            optimizer.step()  # Update weights
            if batch_idx % 100 == 0:
                print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} ({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')


    # Test Loop
    def test(model, device, test_loader, criterion):
        model.eval()  # Set the model to evaluation mode
        test_loss = 0
        correct = 0
        with torch.no_grad():  # Disable gradient calculation during testing
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                test_loss += criterion(output, target).item() * data.size(0)  # criterion returns the batch mean; multiply to get the batch sum
                pred = output.argmax(dim=1, keepdim=True)  # Get the index of the max log-probability
                correct += pred.eq(target.view_as(pred)).sum().item()

        test_loss /= len(test_loader.dataset)
        print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} ({100. * correct / len(test_loader.dataset):.0f}%)\n')

    # Move model to device (GPU if available)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    # Run training and testing
    epochs = 5
    for epoch in range(1, epochs + 1):
        train(model, device, train_loader, optimizer, criterion, epoch)
        test(model, device, test_loader, criterion)
    ```
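
The loss setups above compute the same quantity in different ways. Keras applies `softmax` in the last layer, and `sparse_categorical_crossentropy` takes the negative log of the probability at the integer label. In PyTorch, `nn.CrossEntropyLoss` expects raw logits and is equivalent to `log_softmax` followed by `nn.NLLLoss`. The plain-Python sketch below (function names are illustrative, not framework APIs) checks that equivalence for one example. It also shows that applying `log_softmax` twice gives (numerically almost) the same values, so feeding `log_softmax` outputs into `CrossEntropyLoss` yields the same loss, even though `NLLLoss` is the intended pairing.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]

def softmax(logits):
    return [math.exp(v) for v in log_softmax(logits)]

def nll_loss(log_probs, label):
    # Negative log-likelihood of the true class (integer label)
    return -log_probs[label]

def sparse_categorical_crossentropy(probs, label):
    # Keras-style: probabilities in, integer label in
    return -math.log(probs[label])

def cross_entropy(logits, label):
    # CrossEntropyLoss-style for a single example: raw logits in
    return nll_loss(log_softmax(logits), label)

logits = [2.0, 1.0, 0.1]
label = 0
print(cross_entropy(logits, label))
print(sparse_categorical_crossentropy(softmax(logits), label))
print(nll_loss(log_softmax(log_softmax(logits)), label))  # approximately the same value
```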

**Data Loading and Preprocessing**

*   **TensorFlow/Keras:**
    *   `tf.data.Dataset`:  Efficient way to create data pipelines.  Handles loading, preprocessing, batching, and shuffling.

        ```python
        # Create a dataset from a NumPy array
        dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
        dataset = dataset.shuffle(buffer_size=10000).batch(32).prefetch(tf.data.AUTOTUNE)

        # Load data from files (e.g., images)
        # list_ds = tf.data.Dataset.list_files('path/to/images/*.jpg')
        # def process_path(file_path):
        #     # Load and preprocess image
        #     img = tf.io.read_file(file_path)
        #     img = tf.image.decode_jpeg(img, channels=3)
        #     img = tf.image.resize(img, [224, 224])
        #     img = tf.cast(img, tf.float32) / 255.0  # Normalize
        #     label = ...  # Extract label from file path or other source
        #     return img, label
        # image_ds = list_ds.map(process_path).batch(32)

        # ... then use dataset in model.fit()
        # model.fit(dataset, epochs=...)
        ```

*   **PyTorch:**
    *   `torch.utils.data.Dataset`:  Abstract class representing a dataset.  You create a custom dataset class by inheriting from `Dataset` and implementing `__len__` and `__getitem__`.
    *   `torch.utils.data.DataLoader`:  Provides an iterable over a dataset, handling batching, shuffling, and parallel data loading.

        ```python
        # (See the PyTorch example under "Building and Training Models" for a
        # complete example using torchvision.datasets.MNIST and DataLoader)

        # Example of a custom dataset
        from torch.utils.data import Dataset, DataLoader
        class CustomDataset(Dataset):
            def __init__(self, data, targets, transform=None):
                self.data = data
                self.targets = targets
                self.transform = transform

            def __len__(self):
                return len(self.data)

            def __getitem__(self, idx):
                sample = self.data[idx]
                target = self.targets[idx]
                if self.transform:
                    sample = self.transform(sample)
                return sample, target

        # Usage
        # custom_dataset = CustomDataset(data, targets, transform=...)
        # data_loader = DataLoader(custom_dataset, batch_size=32, shuffle=True)

        # ... then use data_loader in the training loop
        ```
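
Conceptually, a `DataLoader` draws indices (shuffled or in order), calls the dataset's `__getitem__` for each one, and groups the results into batches. The stdlib-only sketch below imitates that behavior with hypothetical names (`SquaresDataset`, `simple_loader`). It leaves out everything the real `DataLoader` adds, such as collating samples into tensors, worker processes, and pinned memory.

```python
import random

class SquaresDataset:
    """Hypothetical map-style dataset: sample i is (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return idx, idx * idx

def simple_loader(dataset, batch_size, shuffle=False, seed=None):
    """Yield batches as (list_of_samples, list_of_targets)."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = [dataset[i] for i in indices[start:start + batch_size]]
        samples = [s for s, _ in batch]
        targets = [t for _, t in batch]
        yield samples, targets

for samples, targets in simple_loader(SquaresDataset(5), batch_size=2):
    print(samples, targets)
# [0, 1] [0, 1]
# [2, 3] [4, 9]
# [4] [16]
```

As with `DataLoader`'s default `drop_last=False`, the last batch can be smaller than `batch_size`.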

**Convolutional Neural Networks (CNNs)**

*   **Concept:**  CNNs are designed for processing data with a grid-like topology, such as images.  They use convolutional layers to extract local features and pooling layers to reduce dimensionality.

*   **TensorFlow/Keras:**

    ```python
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    model = keras.Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # 32 filters, 3x3 kernel
        MaxPooling2D((2, 2)),  # 2x2 pooling
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(10, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Reshape MNIST data to include a channel dimension (required for Conv2D)
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255
    x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255

    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
    ```

*   **PyTorch:**

    ```python
    import torch.nn as nn
    import torch.nn.functional as F

    class CNN(nn.Module):
        def __init__(self):
            super(CNN, self).__init__()
            self.conv1 = nn.Conv2d(1, 32, kernel_size=3)  # 1 input channel, 32 output channels, 3x3 kernel
            self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
            self.fc1 = nn.Linear(1600, 10) # 64 channels * 5 * 5 after two pooling layers

        def forward(self, x):
            x = F.relu(F.max_pool2d(self.conv1(x), 2))  # Convolution -> ReLU -> Max Pooling
            x = F.relu(F.max_pool2d(self.conv2(x), 2))
            x = x.view(-1, 1600)  # Flatten
            x = self.fc1(x)
            return F.log_softmax(x, dim=1)

    # (Training loop is similar to the previous PyTorch example)
    model = CNN()
    # ... optimizer, loss, data loading ...
    ```
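
The `1600` in `fc1` comes from tracking the spatial size through the network. With no padding and stride 1, a `k x k` convolution shrinks each side by `k - 1`, and a 2x2 max pool with stride 2 halves it (rounding down). In general, output size = floor((n + 2p - k) / s) + 1. The short helper below (a hypothetical name, not a framework function) reproduces the arithmetic for a 28x28 MNIST input.

```python
def conv_out(n, kernel, stride=1, padding=0):
    # Standard output-size formula for convolution and pooling layers
    return (n + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=3)            # conv1: 28 -> 26
size = conv_out(size, kernel=2, stride=2)  # pool:  26 -> 13
size = conv_out(size, kernel=3)            # conv2: 13 -> 11
size = conv_out(size, kernel=2, stride=2)  # pool:  11 -> 5
flat_features = 64 * size * size           # 64 channels after conv2
print(size, flat_features)  # 5 1600
```

The Keras model above uses the same default (`padding='valid'`), so its `Flatten` layer also produces 1600 features.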

**Recurrent Neural Networks (RNNs)**

*   **Concept:** RNNs are designed for processing sequential data. They have recurrent connections that allow information to persist across time steps.  LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units) are variants of RNNs that address the vanishing gradient problem.

*   **TensorFlow/Keras:**

    ```python
    from tensorflow.keras.layers import LSTM, SimpleRNN, GRU, Embedding, Dense

    model = keras.Sequential([
        Embedding(input_dim=10000, output_dim=32),  # Embedding layer for text data
        LSTM(64),  # LSTM layer with 64 units
        # Or: SimpleRNN(64)
        # Or: GRU(64)
        Dense(1, activation='sigmoid')  # Output layer for binary classification
    ])

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    # Example with dummy sequence data (replace with your actual data)
    x_train = np.random.randint(0, 10000, size=(1000, 50))  # 1000 sequences of length 50
    y_train = np.random.randint(0, 2, size=(1000, 1)) # Binary Labels

    model.fit(x_train, y_train, epochs=5)
    ```

*   **PyTorch:**

    ```python
    import torch
    import torch.nn as nn

    class RNN(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, num_classes):
            super(RNN, self).__init__()
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            # Or: self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
            # Or: self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
            # (nn.RNN and nn.GRU have no cell state: call them with h0 only, not (h0, c0))
            self.fc = nn.Linear(hidden_size, num_classes)

        def forward(self, x):
            # Initialize hidden and cell states
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

            # Forward propagate LSTM
            out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)

            # Decode the hidden state of the last time step
            out = self.fc(out[:, -1, :])
            return out

    # Example usage
    model = RNN(input_size=28, hidden_size=128, num_layers=2, num_classes=10)

    # (Training loop is similar to the previous PyTorch examples)
    ```
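
The "recurrent connection" is a state that is fed back at every step. For a simple (Elman) RNN, the update is `h_t = tanh(W_x * x_t + W_h * h_{t-1} + b)`. The plain-Python sketch below uses scalar inputs and a scalar hidden state with arbitrary, made-up weights to show how an early input keeps influencing later states; LSTMs and GRUs add gates around this same loop.

```python
import math

def simple_rnn(xs, w_x, w_h, b, h0=0.0):
    """Run a scalar Elman RNN over a sequence and return all hidden states."""
    h = h0
    states = []
    for x in xs:
        # The new state depends on the current input AND the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Hypothetical weights; two sequences that differ only in their first element
a = simple_rnn([1.0, 0.0, 0.0], w_x=1.0, w_h=0.8, b=0.0)
b_states = simple_rnn([0.0, 0.0, 0.0], w_x=1.0, w_h=0.8, b=0.0)
print(a)         # the first input still affects the last state
print(b_states)  # all zeros
```

Taking the last element of `states` corresponds to what `out[:, -1, :]` selects in the PyTorch model: the hidden state after the final time step.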

**Transfer Learning**

*   **Concept:** Reuse a pre-trained model (trained on a large dataset) as a starting point for a new task.  This can significantly improve performance and reduce training time, especially when you have limited data.

*   **TensorFlow/Keras:**

    ```python
    from tensorflow.keras.applications import VGG16  # Example: VGG16 pre-trained on ImageNet
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.models import Model

    # Load the pre-trained VGG16 model (excluding the top classification layer)
    base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

    # Freeze the base model's weights (don't train them)
    base_model.trainable = False

    # Add custom layers on top
    x = base_model.output
    x = GlobalAveragePooling2D()(x)  # Pool the feature maps
    x = Dense(1024, activation='relu')(x)
    predictions = Dense(10, activation='softmax')(x)  # Output layer for 10 classes

    # Create the final model
    model = Model(inputs=base_model.input, outputs=predictions)

    # Compile the model
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    # (Load and preprocess your data, then train the model)
    # model.fit(...)

    # Fine-tuning (optional): unfreeze some of the top layers of the base model
    # and train with a very low learning rate.
    # base_model.trainable = True  # Unfreeze the base first, then re-freeze the lower layers
    # for layer in base_model.layers[:15]:  # Example: keep the first 15 layers frozen
    #     layer.trainable = False

    # from tensorflow.keras.optimizers import Adam
    # model.compile(optimizer=Adam(learning_rate=1e-5),  # Low learning rate
    #               loss='sparse_categorical_crossentropy', metrics=['accuracy'])  # Recompile after changing trainable

    ```

*   **PyTorch:**

    ```python
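    import torch.nn as nn
    import torch.optim as optim
    import torchvision.models as models

    # Load a ResNet18 model pre-trained on ImageNet
    # (torchvision >= 0.13 API; older versions use models.resnet18(pretrained=True))
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)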

    # Freeze the base model's weights
    for param in model.parameters():
        param.requires_grad = False

    # Replace the final fully connected layer
    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 10)  # New FC layer for 10 classes

    # (Move model to device, define optimizer and loss, load data, and train)
    # ...
    # Fine-tuning (optional): the new model.fc is already trainable; additionally
    # unfreeze the last residual block and train with a smaller learning rate.
    # for param in model.layer4.parameters():
    #     param.requires_grad = True
    # optimizer = optim.Adam(model.parameters(), lr=0.0001)

    ```
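
Freezing the base means only the new head is trained, which is one reason transfer learning can work with limited data. A dense (fully connected) layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The quick calculation below (the helper name is hypothetical) applies this to the two heads above: VGG16's last convolutional block has 512 channels, so `GlobalAveragePooling2D` yields 512 features, and ResNet18's `fc.in_features` is 512.

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) + biases (n_out)
    return n_in * n_out + n_out

# Keras head on VGG16: GAP(512) -> Dense(1024) -> Dense(10)
keras_head = dense_params(512, 1024) + dense_params(1024, 10)

# PyTorch head on ResNet18: Linear(512, 10)
torch_head = dense_params(512, 10)

print(keras_head)  # 535562
print(torch_head)  # 5130
```

In Keras, `model.summary()` reports trainable and non-trainable parameter counts, which is a convenient way to confirm that the freeze took effect.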

**Regularization**

*   **Concept:** Techniques to prevent overfitting.

*   **Dropout:**  Randomly "drop out" (set to zero) neurons during training.  This forces the network to learn more robust features.

    ```python
    # TensorFlow/Keras
    from tensorflow.keras.layers import Dropout
    model = keras.Sequential([
        # ... other layers ...
        Dense(128, activation='relu'),
        Dropout(0.5),  # Each unit is dropped with probability 0.5 (during training only)
        Dense(10, activation='softmax')
    ])

    # PyTorch
    class MyModel(nn.Module):
        def __init__(self):
            super().__init__()
            # ... other layers ...
            self.fc1 = nn.Linear(128, 64)
            self.dropout = nn.Dropout(0.5)  # Active after model.train(), disabled by model.eval()
            self.fc2 = nn.Linear(64, 10)

        def forward(self, x):
            # ...
            x = F.relu(self.fc1(x))
            x = self.dropout(x)
            x = self.fc2(x)
            return x
    ```

*   **L1/L2 Regularization:** Add a penalty to the loss function based on the magnitude of the weights.  L1 regularization encourages sparsity (some weights become zero), while L2 regularization encourages smaller weights.

    ```python
    # TensorFlow/Keras
    from tensorflow.keras.regularizers import l1, l2

    model = keras.Sequential([
        # ... other layers ...
        Dense(128, activation='relu', kernel_regularizer=l2(0.01)),  # L2 regularization
        # Or: Dense(128, activation='relu', kernel_regularizer=l1(0.01)),  # L1 regularization
        Dense(10, activation='softmax')
    ])
    # PyTorch (add the penalty to the loss inside the training loop)
    l2_lambda = 0.01
    l1_lambda = 0.01
    l2_reg = sum(param.pow(2).sum() for param in model.parameters())  # Sum of squared weights (L2)
    l1_reg = sum(param.abs().sum() for param in model.parameters())   # Sum of absolute weights (L1)

    optimizer.zero_grad()
    loss = criterion(output, target) + l2_lambda * l2_reg + l1_lambda * l1_reg  # Add to loss
    loss.backward()
    optimizer.step()
    # Alternative for L2: pass weight_decay to the optimizer, e.g. optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
    ```
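
Both techniques are simple to state in plain Python. The sketch below is illustrative only (function names are hypothetical). `inverted_dropout` zeroes each activation with probability `rate` and scales the survivors by `1 / (1 - rate)` during training, the convention Keras `Dropout` and PyTorch `nn.Dropout` follow, so nothing needs rescaling at inference. The penalty helpers show why L1 and L2 behave differently: the L1 gradient has the same magnitude for every nonzero weight, so it can push small weights all the way to zero, while the L2 gradient shrinks in proportion to the weight.

```python
import random

def inverted_dropout(activations, rate, training=True, rng=random):
    """Zero each value with probability `rate`; scale survivors by 1/(1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # identity at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def l1_penalty(weights, lam):
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # Same form as Keras l2(lam): lam * sum(w ** 2)
    return lam * sum(w * w for w in weights)

def l1_grad(weights, lam):
    # Subgradient: lam * sign(w), taken as 0 at w == 0
    return [lam * ((w > 0) - (w < 0)) for w in weights]

def l2_grad(weights, lam):
    return [2 * lam * w for w in weights]

w = [0.5, -1.0, 2.0]
print(l1_penalty(w, 0.01), l2_penalty(w, 0.01))  # approximately 0.035 and 0.0525
print(l1_grad(w, 0.01))  # [0.01, -0.01, 0.01]
print(l2_grad(w, 0.01))  # [0.01, -0.02, 0.04]
print(inverted_dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=random.Random(0)))
```

Because survivors are scaled up, the expected value of each activation is unchanged by dropout, which is why the layer can simply pass values through at evaluation time.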

**Model Saving and Loading**

*   **TensorFlow/Keras:**

    ```python
    # Save the entire model (architecture, weights, optimizer state)
    model.save('my_model.keras')  # Native Keras format; older tf.keras versions also accept 'my_model.h5'

    # Load the model
    loaded_model = keras.models.load_model('my_model.keras')

    # Save only the weights (Keras 3 requires the '.weights.h5' suffix)
    model.save_weights('my_model.weights.h5')
    # Load only the weights (requires the model architecture to be defined first)
    # model.load_weights('my_model.weights.h5')

    ```

*   **PyTorch:**

    ```python
    # Save the model's state dictionary (recommended)
    torch.save(model.state_dict(), 'my_model.pth')

    # Load the model's state dictionary
    model = Net()  # Create an instance of the model architecture
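    model.load_state_dict(torch.load('my_model.pth', weights_only=True))  # Load tensors only, no arbitrary pickled objects
    model.eval()  # Important: set to evaluation mode after loading

    # Save the entire model (less flexible: tied to the class definition and directory layout)
    # torch.save(model, 'my_model_full.pth')
    # loaded_model = torch.load('my_model_full.pth', weights_only=False)  # PyTorch >= 2.6 defaults to weights_only=True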
    ```

This course provides a comprehensive introduction to deep learning with TensorFlow/Keras and PyTorch. The parallel presentation of code for both frameworks allows for easy comparison and helps you decide which one best fits your needs and preferred coding style. Remember to practice by building and training your own models, experimenting with different architectures and hyperparameters, and exploring the extensive documentation and resources available for both frameworks.
