# Exercise 2: Solar Compass from Polarization Images (Kaggle Version)
![validation_samples.png](attachment:validation_samples.png)
---
## Table of Contents
0. [Kaggle Environment Setup](#kaggle-environment-setup)
1. [Introduction](#introduction)
2. [Tool: Hydra](#tool-hydra)
    - [Hierarchical Configs](#hierarchical-configs)
    - [Default Lists](#default-lists)
    - [Variable Interpolation](#variable-interpolation)
    - [Optional Defaults List and `_self_`](#optional-defaults-list-and-_self_)
    - [Instantiate API](#instantiate-api)
3. [Tip: autoreload in Jupyter Notebooks](#tip-autoreload-in-jupyter-notebooks)
4. [Polarized Images](#polarized-images)
5. [Feature Selection](#feature-selection)
6. [Backbone](#backbone)
7. [Data Augmentation](#data-augmentation)
8. [CUDA (GPU Acceleration)](#cuda-gpu-acceleration)
9. [Training Loop](#training-loop)
    - [Tool: Weights & Biases (W&B)](#tool-weights--biases-wb)


## Kaggle Environment Setup

Welcome to this notebook! The goal in this section is to set up your Kaggle environment for working on your project. We will walk you through a few essential steps to ensure everything is properly configured for your work. These steps are automated using the functions provided, allowing you to focus on modifying and improving your code. Below is a quick overview of what each part does:

- **Downloading Files from GitHub:** We use a function to download the required files from your repository into Kaggle.
- **Installing Dependencies:** Dependencies specified in the `env.yml` file will be automatically installed, ensuring the right packages are available in the Kaggle environment.
- **Editing Files within Kaggle:** You can modify the `dataset.py` and `model.py` files.

Let's proceed and set up the environment for your project!

### Downloading Files from GitHub

In this part, we will download the necessary files from your personal GitHub repository into the Kaggle environment. To do this, simply replace the placeholder values in the following lines (at the bottom of the cell) with your own GitHub credentials and run the cell:

```python
download_files_in_folder("<your-username>", "<your-personal-repo-name>", "ex_2", token="<your-personal-access-token>")
download_files_in_folder("<your-username>", "<your-personal-repo-name>", "env.yml", token="<your-personal-access-token>")
```

Ensure you have a GitHub personal access token. If you do not have one, you can create it by following the [Creating a personal access token (classic)](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic) section of the GitHub guide.

In [None]:
import os
import requests


def download_files_in_folder(
    owner: str, 
    repo: str, 
    folder_path: str, 
    local_dir: str = '', 
    branch: str = 'main', 
    token: str = None
) -> None:
    """
    Downloads files from your personal GitHub repository into Kaggle.
    
    Parameters:
        owner (str): GitHub username that owns the repository.
        repo (str): The repository name.
        folder_path (str): Path to the folder or file in the repository to download.
        local_dir (str, optional): The local directory to store the downloaded files. Defaults to ''.
        branch (str, optional): The branch to download from. Defaults to 'main'.
        token (str, optional): GitHub personal access token for authentication. Defaults to None.

    Raises:
        requests.HTTPError: If the request to the GitHub API fails.
    """
    # Set up the headers for the request, adding authentication if a token is provided
    headers = {'Authorization': f'token {token}'} if token else {}

    # GitHub API URL to access the contents of the folder or file
    api_url = f'https://api.github.com/repos/{owner}/{repo}/contents/{folder_path}?ref={branch}'
    response = requests.get(api_url, headers=headers)
    response.raise_for_status()
    contents = response.json()

    # Check if the provided path is a file
    if isinstance(contents, dict) and contents['type'] == 'file':
        item_name = contents['name']
        download_url = contents['download_url']
        local_path = os.path.join(local_dir or '.', item_name)

        # Create directories if needed
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        print(f'Downloading {item_name}...')

        # Download the file
        file_response = requests.get(download_url, headers=headers)
        file_response.raise_for_status()

        # Save the file locally
        with open(local_path, 'wb') as f:
            f.write(file_response.content)
        return

    # If it's a directory, recursively download the contents
    elif isinstance(contents, list):
        if not local_dir:
            local_dir = folder_path

        # Create the local directory if it doesn't exist
        os.makedirs(local_dir, exist_ok=True)

        # Iterate through the directory contents and download files/subdirectories
        for item in contents:
            item_name = item['name']
            item_path = item['path']
            download_url = item['download_url']
            item_type = item['type']

            local_path = os.path.join(local_dir, item_name)

            if item_type == 'file':
                print(f'Downloading {item_path}...')
                file_response = requests.get(download_url, headers=headers)
                file_response.raise_for_status()

                # Save each file
                with open(local_path, 'wb') as f:
                    f.write(file_response.content)

            elif item_type == 'dir':
                # Recursively download subdirectories
                download_files_in_folder(owner, repo, item_path, local_path, branch, token)
    else:
        # Handle cases where the folder path is not found or empty
        print('Folder not found or is empty.')

download_files_in_folder("<your-username>", "<your-personal-repo-name>", "ex_2", token="<your-personal-access-token>")
download_files_in_folder("<your-username>", "<your-personal-repo-name>", "env.yml", token="<your-personal-access-token>")

### Installing Dependencies

To ensure that your Kaggle environment has all the necessary libraries and packages for your project, we will install dependencies listed in the `env.yml` file. Run the following cell to install the dependencies.

In [None]:
import yaml
import subprocess
import sys

def install_pip_dependencies_from_env(file_path: str) -> None:
    """
    Installs pip dependencies listed in the `env.yml` file.

    Parameters:
        file_path (str): Path to the `env.yml` file containing the dependencies.

    Note:
        Errors are reported with a printed message instead of being raised:
        a missing or unparsable `env.yml` and a failed pip installation
        are all caught inside this function.
    """
    # Load the env.yml file
    try:
        with open(file_path, 'r') as file:
            env_data = yaml.safe_load(file)
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return
    except yaml.YAMLError as e:
        print(f"Error parsing YAML file: {e}")
        return

    # Check if there are pip dependencies
    pip_dependencies = env_data.get('dependencies', [])
    pip_packages = []

    # Iterate through the dependencies to find the 'pip' section
    for dep in pip_dependencies:
        if isinstance(dep, dict) and 'pip' in dep:
            pip_packages = dep['pip']
            break

    # Install each pip package
    if pip_packages:
        print(f"Installing pip dependencies: {pip_packages}")
        try:
            subprocess.check_call([sys.executable, '-m', 'pip', 'install'] + pip_packages)
        except subprocess.CalledProcessError as e:
            print(f"Error installing packages: {e}")
    else:
        print("No pip dependencies found in the env.yml file.")

install_pip_dependencies_from_env('env.yml')
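
The function above searches for a nested `pip` entry because a conda-style environment file mixes plain package strings with one dictionary that holds the pip requirements. The sketch below shows that structure as the Python object `yaml.safe_load` would return. The environment name and package names are placeholders chosen for illustration, not the contents of the course's `env.yml`.

```python
# Hypothetical result of yaml.safe_load() on a conda-style env.yml;
# the names below are placeholders, not the contents of the course file.
env_data = {
    "name": "example-env",
    "dependencies": [
        "python=3.10",                      # conda package (plain string)
        "pip",                              # conda package (plain string)
        {"pip": ["hydra-core", "wandb"]},   # nested pip section (dict)
    ],
}

pip_packages = []
for dep in env_data.get("dependencies", []):
    # Only the dict entry with a "pip" key holds the pip requirements.
    if isinstance(dep, dict) and "pip" in dep:
        pip_packages = dep["pip"]
        break

print(pip_packages)  # ['hydra-core', 'wandb']
```

Note that the plain string entries (the conda packages) are skipped, so only the pip section is installed in Kaggle.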

### Editing Files within Kaggle

You can edit your `dataset.py` and `model.py` files directly in Kaggle. Follow these steps:

1. **Run the Setup Cell:** This initializes the `%insert_code_from_py` and `%%save_py` commands for loading and saving files in Kaggle.

2. **Load Files with `%insert_code_from_py filepath`:** Use `%insert_code_from_py filepath` to load files from your GitHub repository into this notebook.

   > 💡 When you first open the notebook, the `%insert_code_from_py ex_2/dataset.py` and `%insert_code_from_py ex_2/model.py` commands will be pre-configured. Running these commands will automatically load the `dataset.py` and `model.py` files for you.

3. **Save Changes with `%%save_py <filepath>`:** Use `%%save_py <filepath>` to save any changes made in Kaggle. When you execute a cell starting with `%%save_py <filepath>`, the code in that cell is saved to `<filepath>`.

   > 💡 After executing `%insert_code_from_py ex_2/dataset.py` and `%insert_code_from_py ex_2/model.py`, the `%%save_py ex_2/dataset.py` and `%%save_py ex_2/model.py` commands will be automatically added to the beginning of the cell.

   > ⚠️ Avoid modifying the `%%save_py` commands; otherwise, your changes will not be saved in Kaggle.

Changes made in Kaggle notebooks will be automatically saved but the changes will **_not_** sync with your GitHub repository by default. To keep your GitHub repository updated, you’ll need to manually push any changes made in Kaggle to GitHub. For detailed instructions, see the [How to Manage and Save Code Changes to GitHub](#how-to-manage-and-save-code-changes-to-github) section below.

#### How to Manage and Save Code Changes to GitHub

The `dataset.py` and `model.py` files in Kaggle have been downloaded from your GitHub repository. **_Changes made in Kaggle will not automatically sync back to your repository._** To keep your code up-to-date with GitHub or to save your changes to GitHub, follow these steps:

1. **Fetching Updates from GitHub:**
   > ⚠️ **Important:** The current cell contents will be overwritten! Be sure to commit and push your changes to GitHub before proceeding. You only need to do this when you want to load the latest revision of the files from your GitHub repository.
   
   - Re-download the Repository: Run the cell with the `download_files_in_folder` function ([Downloading Files from GitHub](#downloading-files-from-github)) to fetch the most recent files from your repository.

   - Reload the Code: Either replace the current cell or add a new code cell by clicking `+ Code`. Use `%insert_code_from_py filepath` to load the updated file. Replace `filepath` with the appropriate path (e.g., `ex_2/dataset.py`).

   - Run the cell for the changes to take effect.

2. **Saving Changes to GitHub:**
   - Copy the updated code from the Kaggle notebook cell.

   - Paste it into the corresponding file in your DevContainer environment in Visual Studio Code.
   
   - Commit and push the changes to your GitHub repository from there.

Following these steps ensures that your Kaggle environment stays synchronized with your GitHub repository and your changes are properly saved.

In [None]:
from IPython.core.magic import register_line_magic, register_cell_magic
from IPython import get_ipython
import re

@register_line_magic
def insert_code_from_py(file: str) -> None:
    """
    Inserts code from a Python file into the current Jupyter notebook cell.

    Parameters:
        file (str): Path to the Python file (in your GitHub repository) to be inserted.
    """
    comment = f"""# Note:
# The code in this cell has been loaded from the file {file} in the Git repository.
# Changes made here will not be automatically synced back to the repository.
#
# To update the code with the latest changes from GitHub:
#    Important: The content of the current cell will be overwritten! Ensure you have committed and pushed your changes to GitHub first.
# 1. Re-download the Repository: Execute the cell with the `download_files_in_folder` function.
# 2. Reload the Code: Replace the content of this cell with `%insert_code_from_py {file}` and execute it to load the updated file.
#
# To apply changes made in this cell:
# 1. Run this cell for changes to take effect.
#
# To save your changes to GitHub:
# 1. Copy the updated code from this cell.
# 2. Paste it into the corresponding file ({file}) in your DevContainer workspace in Visual Studio Code.
# 3. Commit and push the changes to your GitHub repository from there.
#
# Follow these steps to keep your Kaggle environment and GitHub repository synchronized.
"""

    save_magic = f"%%save_py {file}\n"
    file_path = file.strip()
    
    try:
        with open(file_path, 'r') as f:
            code = f.read()
            # Remove the __main__ block if present
            code_without_main = re.sub(
                r"if __name__ == ['\"]__main__['\"]:\n([ \t]*.*\n)+", "", code, flags=re.MULTILINE
            ).rstrip()
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return
    
    # Dynamically insert the code as a new cell
    shell = get_ipython()
    shell.set_next_input(save_magic + comment + code_without_main, replace=True)

@register_cell_magic
def save_py(line: str, cell: str) -> None:
    """
    Writes the cell's code to the specified Python file and executes it.

    The file is overwritten on every run, so it always mirrors the current cell.

    Parameters:
        line (str): The first line of the cell, used to determine the filename.
        cell (str): The code to be saved to the file.
    """
    file_name = line.strip() if line.strip() else "saved_script.py"
    
    try:
        # Write mode: append mode would duplicate the code each time the cell runs
        with open(file_name, 'w') as f:
            f.write(cell + '\n')
    except IOError as e:
        print(f"Error writing to file: {e}")
        return
    
    # Execute the cell code
    exec(cell, globals())


In [None]:
%insert_code_from_py ex_2/dataset.py

In [None]:
%insert_code_from_py ex_2/model.py

## Introduction
In this exercise, you will build a neural network that estimates the sun direction from images captured by a polarization camera. You will put your knowledge of data augmentation and convolutional networks into practice.

The goal is to train a model that predicts the sun’s direction based on polarized images. Accurate heading estimation is crucial for outdoor autonomous navigation, where robots typically rely on magnetometers. However, these sensors are vulnerable to electromagnetic interference. Inspired by insects’ ability to navigate using skylight polarization, this method presents a promising alternative.

To facilitate this, we have collected a dataset of polarized skylight images from the rooftop of our AE building. By using the exact timestamps and GPS coordinates of the images, we have deduced the sun’s direction as ground truth. Your task is to train a neural network using this ground truth data and develop a model robust enough to be deployed in real-world conditions.

![image.png](attachment:image.png)

## Tool: Hydra
---
As the complexity of your machine learning program grows, you will find that the number of hyperparameters and configuration options increases significantly. Managing these parameters in a single YAML file can quickly become overwhelming, leading to inflexible setups and hard-to-reproduce experiments. [Hydra](https://hydra.cc/docs/intro/) addresses this challenge by providing a powerful configuration management system that allows you to easily organize, override, and experiment with different settings. It streamlines the process of managing hyperparameters, datasets, models, and other components, making it easier to scale your projects, experiment with various setups, and maintain clear, flexible, and reproducible workflows.

We will illustrate its basic features with `ex_2/config/train.yaml`.

### Hierarchical Configs
In Hydra, configurations can be organized into hierarchical parameter groups. For instance, parameters like `epochs` can be set at the top level. However, certain components, such as the model, optimizer, dataset, and dataloader, often require multiple hyperparameters for a complete definition.

Take the optimizer as an example: it typically needs at least the optimizer type (e.g., Adam) and the learning rate. In such cases, related parameters can be grouped together by nesting them under a line with the format `<group_name>:` (e.g., `optimizer:`), using indentation to indicate their association. After the config is read into Python as `cfg`, you can access this group of parameters by `cfg.<group_name>`.

Note that parameter groups can be nested within other parameter groups. For example, the `demo` parameter group contains two subgroups, `param_group_1` and `param_group_2`. Calling `cfg.demo` would return all parameters within the group, including those from its subgroups.

### Default Lists
In machine learning experiments, ablation studies are often necessary. For instance, one may need to test various model architectures or experiment with different loss functions to select the optimal approach. In such cases, it is highly useful to define the parameters for different options in separate YAML files and later switch between these options easily using Hydra’s default lists mechanism.

In our case, we have defined such modules in `config/model/backbone`, `config/model/representation`, and `config/model/readout`. In `train.yaml`, for each default list option, you can specify the filename (without the `.yaml` extension) in the corresponding folder to select the desired config for that parameter group.

For example, in our `train.yaml`, if we choose `vanilla` as the setting for `model/backbone`, Hydra will read the parameters in the `vanilla.yaml` file under `config/model/backbone` and place them under the parameter group `model.backbone`.

We can also override the destination for parameters in default lists. By default, parameters are placed in a group based on the file’s location. For example, files in `model/backbone` would be placed under the `model.backbone` parameter group. However, by using the `@` symbol, we can override this destination. For instance, we overrode the destination of parameters in the `config/lr` folder to be placed under the `optimizer` parameter group, ensuring the learning rate settings in those files are properly registered as part of the optimizer configuration.

### Variable Interpolation
When selecting configurations for default lists, the right choice often depends on the options set for other default lists. For example, the best learning rate for an experiment depends on the model architecture being trained, so we need a tuned learning rate for each architecture. Hydra's variable interpolation scheme handles this. For the `lr` option, `${model/backbone}_${model/representation}_${model/readout}` resolves to `vanilla_IQU_angle` or `resnet_raw_vector`, depending on the settings.
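
To make the resolution step concrete, the sketch below mimics it in plain Python. It is not Hydra's resolver: the `choices` dictionary and the `resolve` helper are hypothetical stand-ins for the options selected in the defaults list.

```python
import re

# Hypothetical selections for the three default list options.
choices = {
    "model/backbone": "vanilla",
    "model/representation": "IQU",
    "model/readout": "angle",
}


def resolve(template: str, choices: dict) -> str:
    """Replace each ${key} in the template with the selected option name."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: choices[m.group(1)], template)


lr_template = "${model/backbone}_${model/representation}_${model/readout}"
lr_config_name = resolve(lr_template, choices)
print(lr_config_name)  # vanilla_IQU_angle
```

With these selections, Hydra would look for a file named `vanilla_IQU_angle.yaml` in the `lr` config folder.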

### Optional Defaults List and `_self_`
By default, Hydra fails with an error if a config specified in the defaults does not exist. The `optional` argument preceding the defaults item (e.g. our `lr` setting) instructs Hydra to ignore it if the specified config cannot be found.

When constructing the configuration, Hydra processes the defaults list from top to bottom. The `_self_` entry represents the parameters defined in the current YAML file, outside the defaults list. By placing `_self_` at the beginning of the defaults list, the parameters in the defaults will override the locally defined ones. Conversely, placing `_self_` at the end (or omitting it, as Hydra automatically appends it at the end) ensures that locally defined parameters take precedence over those in the defaults list.
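
The effect of the `_self_` position can be illustrated with a small merge sketch in plain Python. This is a simplified model of the top-to-bottom composition, not Hydra's implementation, and the config values and the `merge` and `compose` helpers are hypothetical.

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge two nested dicts; values in `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def compose(defaults_order: list, sources: dict) -> dict:
    """Merge config sources from the top to the bottom of the defaults list."""
    cfg = {}
    for name in defaults_order:
        cfg = merge(cfg, sources[name])
    return cfg


# Hypothetical sources: one file from the defaults list and the local values.
sources = {
    "lr/vanilla_IQU_angle": {"optimizer": {"lr": 1e-3}},
    "_self_": {"optimizer": {"lr": 1e-4, "weight_decay": 0.0}},
}

self_first = compose(["_self_", "lr/vanilla_IQU_angle"], sources)
self_last = compose(["lr/vanilla_IQU_angle", "_self_"], sources)
print(self_first["optimizer"]["lr"])  # 0.001, the defaults entry wins
print(self_last["optimizer"]["lr"])   # 0.0001, the local value wins
```

In both orders, keys that appear in only one source (here `weight_decay`) are kept unchanged.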

### Instantiate API
In PyTorch, it’s common to instantiate models, loss functions, optimizers, and other components using a long list of hyperparameters to configure experiments. We could manually retrieve these parameters from Hydra and instantiate objects like this:

```python
from dataset import PolImgDataset

dataset = PolImgDataset(cfg.dataset.dataset_path, cfg.dataset.prefix)
```

However, using Hydra's `instantiate` API, we can achieve the same result in a single line, even without needing the `import` statement:

```python
dataset = instantiate(cfg.dataset)
```

This approach simplifies the code, enhances readability, and minimizes the risk of errors from manually specifying all the arguments.
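
Hydra's `instantiate` finds the class to build through a `_target_` key in the config, which holds the dotted import path of the class. The sketch below reproduces the core idea with the standard library only. It is a simplified illustration, not Hydra's implementation: `simple_instantiate` is a hypothetical helper, and `fractions.Fraction` merely stands in for a dataset or model class.

```python
import importlib


def simple_instantiate(config: dict):
    """Import the class named by `_target_` and call it with the other keys."""
    kwargs = dict(config)
    module_name, _, class_name = kwargs.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)


# Standard-library class used as a stand-in for a dataset or model class.
config = {"_target_": "fractions.Fraction", "numerator": 1, "denominator": 3}
obj = simple_instantiate(config)
print(obj)  # 1/3
```

Because the import path lives in the config, swapping the class requires no change to the Python code, which is why the explicit `import` statement becomes unnecessary.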


## Tip: autoreload in Jupyter Notebooks
In a Jupyter notebook, the `%load_ext autoreload` and `%autoreload 2` commands are used to automatically reload any imported Python modules you modify, without needing to restart the kernel or manually re-import the module. You only need to run these commands once at the beginning of your notebook. After doing so, the autoreload extension will remain active, ensuring that any changes you make to your code (e.g. `ex_2/model.py`) are immediately reflected when you re-run cells. This makes it easier and more efficient to iterate on your code as you develop, since you don’t have to worry about re-importing the modules or restarting your kernel to see the updates you made.

### Modifying Hydra Configuration in Kaggle

Due to limitations in the Kaggle environment, you cannot directly open or modify YAML configuration files. Instead, you can use the `overrides` argument in the `hydra.compose` API to change configurations. 

For example, to modify the default list options, such as setting the model backbone to `resnet` and the readout option to `vector`, uncomment the relevant lines in the code below.

#### Dataset Location

In Kaggle, your dataset should be located in one of the following directories:

- `/kaggle/input/polarization`
- `/kaggle/input`

Use the directory that works for you. To set the dataset path, pass a `dataset.dataset_path` override in the `overrides` list of `hydra.compose` as follows:

- For `/kaggle/input/polarization`:

    ```python
    overrides = ["dataset.dataset_path=/kaggle/input/polarization"]
    ```

- For `/kaggle/input`:

    ```python
    overrides = ["dataset.dataset_path=/kaggle/input"]
    ```

💡 **Troubleshooting Tip:** If you’ve added the polarization dataset but can't locate it in the specified directories, use the command `%ls /kaggle/input` in the console to explore the `/kaggle/input` directory. Navigate through the directories (using `%cd ...`) to find the folder containing `dataset.h5`, and update `dataset.dataset_path` with the correct path based on your findings.

In [None]:
# Load the autoreload extension
%load_ext autoreload

# Set autoreload mode
%autoreload 2

import os
import hydra
from hydra.utils import instantiate
from omegaconf import OmegaConf


with hydra.initialize(version_base=None, config_path="ex_2/config"):
    cfg = hydra.compose(config_name="train", overrides=["dataset.dataset_path=/kaggle/input/polarization",
                                                        # "model/backbone=resnet", 
                                                        # "model/readout=vector",
                                                        ])
    print(OmegaConf.to_yaml(cfg))

We now instantiate all the components needed to train our model. Since the data is captured continuously over time, to prevent any overlap between the training and validation sets, we use the first 85% of the samples for training and the final 15% for validation.

⚠️ We have provided the code for the dataset, dataloader, and loss function to save you time during the exercises. However, if you’re unfamiliar with how to implement these modules, we encourage you to review the code. You will need to implement them on your own for the competition, and they may also be tested in the final exam.

In [None]:
from torch.utils.data import Subset

model = instantiate(cfg.model)
train_dataset = instantiate(cfg.dataset)
split_index = int(0.85 * len(train_dataset))
train_set = Subset(train_dataset, range(0, split_index))
val_dataset = instantiate(cfg.dataset)
val_dataset.augment = False
val_set = Subset(val_dataset, range(split_index, len(val_dataset)))
train_loader = instantiate(cfg.dataloader)(train_set)
val_loader = instantiate(cfg.dataloader)(val_set)
optimizer = instantiate(cfg.optimizer)(model.parameters())
criterion = instantiate(cfg.loss)

## Polarized Images

Polarization cameras, like the Sony XCG-CP510 used in this dataset, capture not just the intensity and color of light but also detailed information about its polarization state. For each pixel in an image, these cameras typically produce a 4-dimensional vector, corresponding to the light intensity through polarizers oriented at four different angles: 0°, 45°, 90°, and 135°. These angles correspond to the following components:

- **0° Polarization ($I_0$)**: Intensity of light polarized horizontally.
- **45° Polarization ($I_{45}$)**: Intensity of light polarized at a 45° angle.
- **90° Polarization ($I_{90}$)**: Intensity of light polarized vertically.
- **135° Polarization ($I_{135}$)**: Intensity of light polarized at 135°.

<center>
<img src="assets/pol_angles.png" alt="image" width="250">
</center>

These four intensities allow for the calculation of the Stokes parameters: I, Q, and U, which describe the polarization state of the light.

- $I=\frac{1}{2} (I_0 + I_{45} + I_{90} + I_{135})$: The total intensity, i.e. the overall brightness of the light. Each orthogonal pair ($I_0 + I_{90}$ and $I_{45} + I_{135}$) already sums to the total intensity, so halving the sum of all four channels averages the two estimates.
- $Q=I_0-I_{90}$: The difference in intensity between horizontally (0°) and vertically (90°) polarized light, indicating the degree of linear polarization along these axes.
- $U=I_{45}-I_{135}$: The difference in intensity between diagonally polarized light at 45° and 135°, complementing Q by describing linear polarization along diagonal axes.

In addition to IQU, two other important quantities can be derived: the Degree of Polarization (DOP) and the Angle of Polarization (AOP).

- $DOP = \frac{\sqrt{Q^2 + U^2}}{I}$: The Degree of Polarization measures the fraction of light that is polarized compared to the total light intensity.

- $AOP = \frac{1}{2} \tan^{-1}\left(\frac{U}{Q}\right)$: The Angle of Polarization describes the angle at which the light is polarized. In code, it is computed with the two-argument arctangent (`arctan2(U, Q)`) so that the correct quadrant is kept.
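
As a sanity check of these formulas, the sketch below builds the four channel intensities for a single synthetic pixel and recovers the polarization state from them. It assumes ideal linear polarizers and partially linearly polarized light, for which the intensity behind a polarizer at angle $\alpha$ is $\frac{I}{2}\left(1 + DOP \cos(2(AOP - \alpha))\right)$. This model and the `polarizer_intensity` helper are assumptions for illustration and are not taken from the dataset code.

```python
import math


def polarizer_intensity(total: float, dop: float, aop_deg: float, polarizer_deg: float) -> float:
    """Intensity behind an ideal linear polarizer for partially linearly polarized light."""
    delta = math.radians(aop_deg - polarizer_deg)
    return 0.5 * total * (1.0 + dop * math.cos(2.0 * delta))


# Synthetic pixel: total intensity 2.0, 60% polarized at an angle of 30 degrees.
i0, i45, i90, i135 = (polarizer_intensity(2.0, 0.6, 30.0, a) for a in (0, 45, 90, 135))

I = 0.5 * (i0 + i45 + i90 + i135)
Q = i0 - i90
U = i45 - i135
DOP = math.hypot(Q, U) / I
AOP = 0.5 * math.atan2(U, Q)  # two-argument arctangent keeps the correct quadrant

print(round(I, 6), round(DOP, 6), round(math.degrees(AOP), 6))  # 2.0 0.6 30.0
```

The recovered values match the ones used to generate the pixel, which confirms that I, DOP, and AOP are consistent with each other under this model.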

In [None]:
import random
import matplotlib.pyplot as plt
import torch

for dataset, dataset_name in zip([train_dataset], ["Real"]):
    n_samples = 10
    fig, axes = plt.subplots(n_samples, 9, figsize=(12, 15))

    sample_indexes = random.sample(range(len(dataset)), n_samples)
    start_pos = dataset[0][0][0].shape[1] // 2, dataset[0][0][0].shape[0] // 2
    radius = dataset[0][0][0].shape[0] // 2 * 0.9

    for i, idx in enumerate(sample_indexes):
        sample = dataset[idx]
        for j in range(9):
            axes[i, j].axis("off")
            axes[i, j].arrow(
                start_pos[0],
                start_pos[1],
                radius * sample[1][0],
                radius * -sample[1][1],
                color="red",
                head_width=1,
                label="GT",
            )

        axes[i, 0].imshow(sample[0][0], cmap="gray")
        axes[0, 0].set_title("CH 1")

        axes[i, 1].imshow(sample[0][1], cmap="gray")
        axes[0, 1].set_title("CH 2")

        axes[i, 2].imshow(sample[0][2], cmap="gray")
        axes[0, 2].set_title("CH 3")

        axes[i, 3].imshow(sample[0][3], cmap="gray")
        axes[0, 3].set_title("CH 4")

        I = 0.5 * sample[0].sum(axis=0)
        Q = sample[0][0] - sample[0][2]
        U = sample[0][1] - sample[0][3]
        DOP = (Q**2 + U**2) ** 0.5 / I
        AOP = 0.5 * torch.arctan2(U, Q)

        axes[i, 4].imshow(I, cmap="gray")
        axes[0, 4].set_title("I")

        axes[i, 5].imshow(Q, cmap="gray")
        axes[0, 5].set_title("Q")

        axes[i, 6].imshow(U, cmap="gray")
        axes[0, 6].set_title("U")

        axes[i, 7].imshow(DOP, cmap="gray")
        axes[0, 7].set_title("DOP")

        axes[i, 8].imshow(AOP, cmap="hsv")
        axes[0, 8].set_title("AOP")

        axes[i, 0].text(
            -0.2,
            0.5,
            f"Index: {idx}",
            fontsize=12,
            ha="right",
            va="center",
            transform=axes[i, 0].transAxes,
            rotation=45,
        )
    plt.legend()
    plt.suptitle(f"{dataset_name} dataset random {n_samples} samples", fontsize=16)
    plt.tight_layout()

## Feature Selection

Even though deep neural networks can be trained end-to-end, it’s still important to consider feature engineering, which focuses on transforming input data into the most effective representation. This becomes especially crucial when the amount of training data is limited.

In this exercise, we have several representations we could use:
- raw data (4 channels: CH1-CH4)
- IQU (3 channels: I,Q,U)
- DOP+AOP (2 channels: DOP and AOP)
- IQU+DOP+AOP (5 channels: I, Q, U, DOP and AOP)

<strong style="color:red;">TODO: Complete the implementation of the representation layers in `ex_2/model.py: get_representation_layer`</strong>

For the output layer, we also have a choice of representations:
- scalar solar azimuth angle in radians ([0, 2*pi] or [-pi, pi])
- 2-dim unit vector pointing to the sun

In the experiments, we are going to try out different representations to select the best one.
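
One practical difference between the two output representations is the wrap-around of the scalar angle. The sketch below compares both encodings for two directions that are only 2 degrees apart. The `(cos, sin)` axis convention and the helper names are assumptions for illustration and may differ from the convention used in `ex_2`.

```python
import math


def angle_to_vector(angle_rad: float) -> tuple:
    """Encode an angle as a 2-dim unit vector (cos, sin); the axis convention is an assumption."""
    return (math.cos(angle_rad), math.sin(angle_rad))


def vector_to_angle(vec: tuple) -> float:
    """Decode a 2-dim vector back to an angle in [0, 2*pi)."""
    return math.atan2(vec[1], vec[0]) % (2.0 * math.pi)


a = math.radians(359.0)
b = math.radians(1.0)

# As scalars the two angles look far apart although the directions differ by 2 degrees.
scalar_gap_deg = math.degrees(abs(a - b))
# As unit vectors they are close, so a regression loss sees a small difference.
vector_gap = math.dist(angle_to_vector(a), angle_to_vector(b))

print(round(scalar_gap_deg, 3))  # 358.0
print(round(vector_gap, 4))      # 0.0349
```

This is one aspect to keep in mind when comparing the readout options in the experiments: a loss on the scalar angle has to handle the discontinuity, while the vector encoding is continuous around the full circle.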

## Backbone
For the backbone of the network, we ask you to implement a vanilla CNN architecture and then, in the experiments, compare its performance with that of a [ResNet18](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html).


<strong style="color:red;">TODO: Complete the implementation of the vanilla CNN in `ex_2/model.py: class VanillaCNN`</strong>

After you have finished running the experiments, which model wins in our case? What might be the reasons that a smaller vanilla CNN model underperforms or outperforms the bigger ResNet18 model?

## Data Augmentation

Let’s visualize the distribution of sun direction in azimuth angles. As we can observe, the training and validation sets have different distributions, which poses a problem. The network could potentially overfit to the distribution of the training set and perform poorly on the validation set. For example, this could result in a model that outputs only angles greater than 180 degrees and fails to predict angles below that threshold.

To address this issue, we can augment the training data using horizontal and vertical flipping. When we apply these transformations, we also adjust the labels accordingly, ensuring the azimuth angles are flipped both horizontally and vertically. This process allows us to cover the full 360-degree range of angles, improving the network’s generalizability. As a result, the network is forced to learn a solution that maps to all angles, encompassing the range found in the validation set.
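
The coverage argument can be checked on the labels alone. The sketch below assumes compass azimuths in degrees, measured clockwise from north, and mirrors them across the two axes. Which image flip corresponds to which mirror depends on how the camera is mounted, so the function names are hypothetical, and the required remapping of the polarization channels is deliberately not shown.

```python
def mirror_east_west(azimuth_deg: float) -> float:
    """Mirror a compass azimuth across the north-south axis."""
    return (360.0 - azimuth_deg) % 360.0


def mirror_north_south(azimuth_deg: float) -> float:
    """Mirror a compass azimuth across the east-west axis."""
    return (180.0 - azimuth_deg) % 360.0


# Hypothetical training labels, all between 180 and 270 degrees.
train_azimuths = [190.0, 225.0, 260.0]

augmented = set()
for az in train_azimuths:
    augmented.update({
        az,
        mirror_east_west(az),
        mirror_north_south(az),
        mirror_north_south(mirror_east_west(az)),
    })

# Quadrant index 0..3 for each augmented azimuth.
quadrants = {int(az // 90) for az in augmented}
print(sorted(quadrants))  # [0, 1, 2, 3]
```

Although the original labels lie in a single quadrant, the mirrored labels reach all four, which is the effect the augmentation relies on.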


<strong style="color:red;">TODO: Complete the data augmentation code in `ex_2/dataset.py: PolImgDataset.__getitem__`</strong>

Tip: You may first skip this part to work on other parts of the pipeline. Once your training pipeline works, you may come back here to implement augmentation to further improve performance.

Note: It is **very important** that you implement the augmentation in accordance with the correct physical laws. Performing augmentations incorrectly will lead to deterioration of performance and even unstable training. Hint: what happens to the four polarization channels when the sun is flipped in nature?

In [None]:
import math

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, subplot_kw={"projection": "polar"})
for ax, data, data_type in zip(
    [ax1, ax2],
    [train_dataset.angles[:split_index], val_dataset.angles[split_index:]],
    ["Train", "Validation"],
):
    # Convert degrees to radians for the polar axis
    ax.hist(data / 180 * math.pi, bins=100)
    ax.set_theta_zero_location("N")
    ax.set_theta_direction(-1)
    ax.set_title(data_type)
fig.suptitle("Distribution of azimuth angles in dataset")
plt.subplots_adjust(wspace=0.4)
plt.show()

## CUDA (GPU Acceleration)
To use the GPU for training and inference, move the tensors and `nn.Module`s involved in the computation to the GPU (CUDA) device by calling `.to(device)`:
```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel().to(device)
input_data = input_data.to(device)
```

In [None]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

## Training Loop
Our main training loop is defined below. 

To thoroughly conduct the experiments, <strong style="color:red;">you should try out all the combinations (model/backbone, model/representation, model/readout). For each combination, you should tune the batch size and learning rate.</strong> Note the optimal learning rate will vary from case to case.
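
One way to keep track of all the combinations is to enumerate them up front. The sketch below builds one list of Hydra-style overrides per run. The option names (`backbone_a`, `representation_a`, ...) and the keys `lr` and `batch_size` are placeholders, not the names used in this repository; replace them with the config files and keys you actually have.

```python
from itertools import product

# Hypothetical option names; replace them with the configs in your repository
backbones = ["backbone_a", "backbone_b"]
representations = ["representation_a", "representation_b"]
readouts = ["readout_a", "readout_b"]

# Hypothetical search ranges for the two hyperparameters to tune
learning_rates = [1e-4, 1e-3]
batch_sizes = [32, 64]


def build_overrides():
    """Return one list of Hydra override strings per experiment."""
    runs = []
    for bb, rep, ro, lr, bs in product(
        backbones, representations, readouts, learning_rates, batch_sizes
    ):
        runs.append(
            [
                f"model/backbone={bb}",
                f"model/representation={rep}",
                f"model/readout={ro}",
                f"lr={lr}",          # assumed config key
                f"batch_size={bs}",  # assumed config key
            ]
        )
    return runs


runs = build_overrides()
print(f"{len(runs)} experiments to run")
```

Each entry could then be passed as the `overrides` argument when composing the config, so that every run is reproducible from its override list.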

The optimal combination should yield a validation error of less than 15 degrees (mean) and 5 degrees (median) after training to convergence.

Tip: First try to get all the TODOs implemented. Verify the implementation by making sure the training pipeline runs without errors and that the training loss and validation error go down after every epoch. Start with a small number of epochs (e.g. 3) on your laptop or Codespaces. Once you are confident the code works, move to a GPU environment for full-scale experiments and hyperparameter tuning.
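
To make the error metric concrete, here is a small, self-contained sketch of a wrap-around angular error in plain Python. The sample predictions and labels are made up for illustration. It shows why a prediction of 350 degrees against a label of 10 degrees counts as a 20 degree error rather than 340, and why the mean and the median can differ a lot when a few predictions are far off.

```python
import statistics


def angular_error_deg(pred_deg, gt_deg):
    """Smallest absolute difference between two bearings, in degrees."""
    diff = abs(pred_deg - gt_deg) % 360.0
    return min(diff, 360.0 - diff)


# Hypothetical predictions and ground-truth angles in degrees
preds = [350.0, 10.0, 95.0, 180.0, 2.0]
gts = [10.0, 350.0, 90.0, 0.0, 358.0]

errors = [angular_error_deg(p, g) for p, g in zip(preds, gts)]
mean_error = statistics.mean(errors)
median_error = statistics.median(errors)

print(errors)        # per-sample errors
print(mean_error)    # pulled up by the single 180 degree outlier
print(median_error)  # robust to that outlier
```

A single prediction pointing in the opposite direction dominates the mean, while the median barely moves, which is one reason to report both.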

In [None]:
from tqdm.notebook import tqdm


def evaluate(dataloader, model, criterion):
    """
    Evaluate the performance of a model on a given dataloader.

    Args:
        dataloader (torch.utils.data.DataLoader): The dataloader containing the evaluation data.
        model (torch.nn.Module): The model to be evaluated.
        criterion (torch.nn.Module): The loss function used for evaluation.

    Returns:
        tuple: A tuple containing the average loss, evaluation metric (mean, median, and all values), and predictions.

    """
    model.eval()

    loss = []
    pred = []
    metric = []

    with torch.no_grad():
        for batch in tqdm(dataloader, desc="Validation", leave=False):
            inputs, vector_gt, angle_gt = batch
            inputs = inputs.to(device)
            angle_gt = angle_gt.to(device)
            vector_gt = vector_gt.to(device)
            outputs = model(inputs)
            loss.append(criterion(outputs, angle_gt, vector_gt).item())
            angle_pred = model.get_angle(outputs)
            pred.extend(angle_pred.tolist())
            angle_error = torch.abs(angle_pred - angle_gt)
            angle_error = torch.min(angle_error, 360 - angle_error)
            metric.extend(angle_error.tolist())

    metric = torch.tensor(metric)
    pred = torch.tensor(pred)

    return sum(loss) / len(loss), (metric.mean(), metric.median(), metric), pred

In [None]:
from tqdm.notebook import tqdm


def train_epoch(train_loader, val_loader, model, optimizer, criterion):
    """
    Trains the model for one epoch using the provided data loaders, model, optimizer, and criterion.
    Args:
        train_loader (torch.utils.data.DataLoader): Data loader for the training set.
        val_loader (torch.utils.data.DataLoader): Data loader for the validation set.
        model (torch.nn.Module): The model to be trained.
        optimizer (torch.optim.Optimizer): The optimizer used for training.
        criterion (torch.nn.Module): The loss function used for training.
    Returns:
        tuple: A tuple containing the training loss, training performance, validation performance, and validation predictions.
    """
    model.train()

    for x, y_gt_vector, y_gt_angles in tqdm(train_loader, desc="Training", leave=False):

        x = x.to(device)
        y_gt_angles = y_gt_angles.to(device)
        y_gt_vector = y_gt_vector.to(device)

        optimizer.zero_grad()

        y_pred = model(x)

        loss = criterion(y_pred, y_gt_angles, y_gt_vector)
        loss.backward()

        optimizer.step()

    train_loss, train_performance, _ = evaluate(train_loader, model, criterion)
    _, val_performance, val_pred = evaluate(val_loader, model, criterion)

    return train_loss, train_performance, val_performance, val_pred



In [None]:
import plotly.graph_objects as go
import numpy as np

def plot_angle_histogram(pred, gt):
    """
    Plots a histogram of angles in polar coordinates using Plotly.

    Parameters:
    - pred (array-like): Array of predicted angles in degrees.
    - gt (array-like): Array of ground truth angles in degrees.

    Returns:
    fig (plotly.graph_objs.Figure): The generated Plotly figure object.
    """
    # Convert angles from degrees to radians
    pred_radians = np.radians(pred)
    gt_radians = np.radians(gt)

    # Create histogram traces
    pred_hist = np.histogram(pred_radians, bins=100, range=[0, 2*np.pi])
    gt_hist = np.histogram(gt_radians, bins=100, range=[0, 2*np.pi])

    # Create polar figure
    fig = go.Figure()

    # Add predicted angles
    fig.add_trace(go.Barpolar(
        r=pred_hist[0],
        theta=np.degrees(pred_hist[1][:-1]),
        name='Prediction',
        opacity=0.7
    ))

    # Add ground truth angles
    fig.add_trace(go.Barpolar(
        r=gt_hist[0],
        theta=np.degrees(gt_hist[1][:-1]),
        name='Ground Truth',
        opacity=0.7
    ))

    # Update layout for polar plot
    fig.update_layout(
        polar=dict(
            angularaxis=dict(
                direction="clockwise",
                rotation=90,
                tickmode="array",
                tickvals=[0, 90, 180, 270],
                ticktext=["N", "E", "S", "W"]
            )
        ),
        showlegend=True,
        template='plotly'
    )

    return fig

### Tool: Weights & Biases (W&B)

Since TensorBoard is no longer supported on Kaggle, another popular logging tool you can leverage is Weights and Biases (W&B). W&B is a versatile and powerful platform for tracking machine learning experiments, providing similar functionalities to TensorBoard but with added flexibility and collaboration features.

While TensorBoard focuses on local visualization of metrics, W&B logs and sends all experiment data to their cloud server, allowing for centralized access to metrics across all experiments. This cloud-based approach enables seamless collaboration, where teams can easily share, compare, and reproduce results. Additionally, W&B supports real-time tracking of various training parameters, hyperparameter tuning, and model versioning, all within an intuitive interface. Its easy integration with frameworks like PyTorch makes W&B a go-to tool for anyone looking to scale their machine learning workflow, especially in cloud environments like Kaggle.

#### Setup Guide

1. **Create a W&B Account:**  
   If you don't already have a W&B account, sign up at [Weights & Biases](https://wandb.ai/site).

2. **Obtain Your API Key:**
   - Log in to your W&B account.
   - Click on your profile in the top right corner and select `User settings`.
   - Scroll down to the `Danger zone` section.
   - Under `API keys`, click `Reveal` to view your API key and copy it.

3. **Add API Key to Kaggle Secrets:**
   - Go to your Kaggle notebook.
   - Click on the `Add-ons` tab and select `Secrets`.
   - Click `Add secret` and enter `wandb_api` into the label field.
   - Paste your W&B API key into the value field.
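
If you also run this notebook outside Kaggle, the `kaggle_secrets` module will not be available. A possible fallback, sketched below, is to read the key from an environment variable instead. The helper name `get_wandb_api_key` is hypothetical; `wandb_api` is the secret label from the steps above, and `WANDB_API_KEY` is the environment variable that W&B itself looks for.

```python
import os


def get_wandb_api_key(secret_label="wandb_api", env_var="WANDB_API_KEY"):
    """Return the W&B API key from Kaggle Secrets, or from the environment."""
    try:
        # Only importable inside a Kaggle notebook
        from kaggle_secrets import UserSecretsClient

        return UserSecretsClient().get_secret(secret_label)
    except Exception:
        # Not on Kaggle, or the secret could not be read: fall back to the
        # environment variable (returns None if it is not set)
        return os.environ.get(env_var)
```

The returned value can be passed to `wandb.login(key=...)`; if it is `None`, check that the secret or the environment variable has been set.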

By running the cell below, you authenticate with the W&B API key you entered into Kaggle Secrets and log into W&B.

In [None]:
import wandb
from kaggle_secrets import UserSecretsClient

# Initialize the UserSecretsClient to access Kaggle secrets
user_secrets = UserSecretsClient()

# Retrieve the Weights & Biases API key from Kaggle secrets
wandb_api = user_secrets.get_secret("wandb_api")

# Log in to Weights & Biases using the retrieved API key
wandb.login(key=wandb_api)

By running the cell below, you train the network and W&B keeps track of your progress. It saves all the details about your training, like performance metrics and charts, to the W&B website. This way, you can easily see how your experiment is going.

In [None]:
# Initialize Weights & Biases run
run = wandb.init(
    project="AE4353",
    group="ex_2",
    config=OmegaConf.to_container(cfg, resolve=True),
    notes="replace_with_your_comments"
)

try:
    # Iterate over epochs
    for epoch in tqdm(range(cfg.epochs), desc="Epochs"):
        (
            train_loss,
            (train_error_mean, train_error_median, train_error_tensor),
            (val_err_mean, val_err_median, val_error_tensor),
            val_pred,
        ) = train_epoch(train_loader, val_loader, model, optimizer, criterion)
        
        # Log metrics to Weights & Biases
        wandb.log(
            {
                "train_loss": train_loss,
                "train_error_mean": train_error_mean,
                "train_error_median": train_error_median,
                "val_error_mean": val_err_mean,
                "val_error_median": val_err_median,
                # Histogram of per-sample validation errors (not the raw predictions)
                "val_error_hist": wandb.Histogram(val_error_tensor.numpy()),
                "val_hist": plot_angle_histogram(val_pred, dataset.angles[split_index:]),
            }
        )
    
    run.finish()

except Exception:
    # Mark the run as failed, then re-raise so the error is not silently swallowed
    run.finish(exit_code=1)
    raise