# P2: Dramatic Data!


## Table of Contents

1. Introduction
2. Preliminaries
3. Software Setup
4. Grading Rubric
5. Submission Guidelines

## 1. Introduction

In this project, you will design and implement a deep neural network capable of segmenting PeAR racing windows.
A sample output is shown below:

![Segmentation Image](./assets/segsample.jpg)

This project has three parts:
- Part 1 - Synthetic dataset generation
- Part 2 - Training and Validating the CNN
- Part 3 - Instance Segmentation (identify different instances of the window)

### Part 1 - Dataset generation
No dataset is provided for this project. You are required to generate a synthetic dataset for training your network. The network must generalize to the real-world video feed given in `test_video.mp4`. For more details, please refer to Section 2.1.

### Part 2 - Training and validating the CNN
You will design a custom CNN for image segmentation and then run inference on frames from the provided video file (`test_video.mp4`). Your task is to display the inferred segmentation output frames side by side with the original input frames and render the combined frames as a new video.

Please note that test_video.mp4 contains multiple windows. Your network is expected to segment all the windows present in each input frame.
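The side-by-side composition described above can be sketched with NumPy as follows. This is a minimal illustration, not the provided pipeline: the function name `side_by_side` is hypothetical, and it assumes the frame is a `uint8` array of shape `(H, W, 3)` and the mask is a binary array of shape `(H, W)` already scaled to the frame size.

```python
import numpy as np


def side_by_side(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Place a color frame (H, W, 3) next to its binary mask (H, W)."""
    if frame.shape[:2] != mask.shape:
        raise ValueError("mask must have the same height and width as the frame")
    # Turn the mask into a 3-channel uint8 image: window pixels white, rest black.
    mask_u8 = (mask > 0).astype(np.uint8) * 255
    mask_rgb = np.repeat(mask_u8[:, :, None], 3, axis=2)
    # Concatenate along the width axis, giving an array of shape (H, 2 * W, 3).
    return np.hstack([frame.astype(np.uint8), mask_rgb])


# Example with a blank 640x360 frame and a square "window" region in the mask.
frame = np.zeros((360, 640, 3), dtype=np.uint8)
mask = np.zeros((360, 640), dtype=np.uint8)
mask[100:200, 100:200] = 1
combined = side_by_side(frame, mask)
```

Each combined frame can then be handed to whichever video writer you prefer.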

### Part 3 - Instance segmentation

This section is more open-ended. You will need to perform instance segmentation, which can be achieved either by applying classical techniques on the segmentation output obtained in Part 2, or by using deep learning methods. No starter code or specific instructions are provided for this section, allowing you the freedom to choose your approach.
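As one possible classical approach, the binary output of Part 2 can be split into instances by connected-component labeling. The sketch below is only an illustration under stated assumptions: the function name `label_instances` and the `min_area` threshold are hypothetical, and the method merges windows whose masks touch or overlap, so it is not a complete solution.

```python
from collections import deque

import numpy as np


def label_instances(mask: np.ndarray, min_area: int = 1) -> tuple[np.ndarray, int]:
    """Label 4-connected foreground regions of a binary mask with ids 1..N."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    visited = np.zeros((h, w), dtype=bool)
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or visited[r, c]:
                continue
            # Flood-fill the component that contains this unvisited pixel.
            visited[r, c] = True
            queue = deque([(r, c)])
            pixels = [(r, c)]
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
                        pixels.append((ny, nx))
            # Keep only components that are large enough; small blobs are treated as noise.
            if len(pixels) >= min_area:
                count += 1
                for y, x in pixels:
                    labels[y, x] = count
    return labels, count


# Two blobs plus one isolated pixel that is filtered out by min_area.
demo = np.zeros((8, 12), dtype=np.uint8)
demo[1:4, 1:4] = 1
demo[5:7, 6:10] = 1
demo[0, 11] = 1
demo_labels, demo_count = label_instances(demo, min_area=2)
```

The pure-Python loops are slow on full frames; a library routine such as `cv2.connectedComponents` or `scipy.ndimage.label` does the same job much faster.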


**Please review the submission guidelines and grading rubric before starting your work.**

## 2. Preliminaries

### 2.1 Dataset generation and sim2real:

We highly recommend using Blender for generating your dataset. The window image is located at ./assets/window.png.

You will need to generate the following:

1. **Realistic RGB Images:** These images should contain the provided window in various settings.
2. **Segmentation Masks:** Create binary images that indicate the presence of the window in the RGB images. The Cycles rendering engine in Blender can output ID masks for you ([reference](https://www.youtube.com/watch?v=o2JKviMX9rE)).

![Sample Data](./assets/sampledata.png)

To ensure your network generalizes well from simulation to the real world, make sure to incorporate the following variations in your dataset:
1. **Camera location and orientation:** Vary the position and angle of the camera relative to the window.
1. **Background:** Place the window in front of varied background images. A few images are provided in ./part1/environment for your reference.
1. **Lighting:** Simulate various lighting conditions.
1. **Occlusion:** Add objects that block the window.
1. **Multiple Windows:** Show scenes with more than one window.
1. **Noise:** Introduce Gaussian noise to the image.
1. **Blur:** Add blur to the image to simulate out-of-focus conditions.
1. **Color Jitter:** Adjust color properties to simulate diverse scenarios.

You will receive full credit for `Part 1` if your dataset images are properly augmented with the factors mentioned above.
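The camera variation in the list above can be randomized with a few lines of Python. The sketch below only samples a position; it assumes the window sits at the origin facing the negative Y axis, and the function name and all ranges (distance, yaw, pitch) are hypothetical values to tune for your scene. Inside Blender you would assign the result to the camera object's location and point the camera at the window, for example with a Track To constraint.

```python
import math
import random


def sample_camera_position(rng, min_dist=1.0, max_dist=5.0,
                           max_yaw_deg=60.0, max_pitch_deg=30.0):
    """Sample a camera position on a spherical shell in front of the window.

    Returns ((x, y, z), dist). The window is assumed to be at the origin,
    facing the -Y direction, so the camera always ends up with y < 0.
    """
    dist = rng.uniform(min_dist, max_dist)
    yaw = math.radians(rng.uniform(-max_yaw_deg, max_yaw_deg))
    pitch = math.radians(rng.uniform(-max_pitch_deg, max_pitch_deg))
    # Spherical-to-Cartesian conversion around the window.
    x = dist * math.cos(pitch) * math.sin(yaw)
    y = -dist * math.cos(pitch) * math.cos(yaw)
    z = dist * math.sin(pitch)
    return (x, y, z), dist


# A seeded generator keeps the dataset reproducible.
rng = random.Random(0)
positions = [sample_camera_position(rng) for _ in range(100)]
```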

You are free to use any GUI or command-line workflow. The following tools may help automate data generation:
1. [Blender script](https://docs.blender.org/api/current/info_quickstart.html) - Dataset generation (randomizing the placement of camera/windows, automated rendering, etc).
2. [BlenderSynth (beta)](https://github.com/OllieBoyne/BlenderSynth) - Dataset generation.
3. [TorchVision](https://pytorch.org/vision/stable/generated/torchvision.transforms.GaussianBlur.html) - Data augmentation (noise, blur,...) on-the-fly during training.

**Recommended Rendering Settings:**

In case you don’t know where these settings are, don’t worry! You are free to use any settings you like. 

1. Rendering Engine - Cycles (with GPU compute enabled)
2. Max Samples - 4
3. Noise Threshold - 0.1
4. Denoise - OptiX
5. Final Render / Persistent Data - Enabled (this gives a 10x speed-up, but Blender may crash). Try rendering a few hundred images at a time.
6. Render resolution - 640x360

**Sample Dataset Images**

![Sample Dataset](./assets/sampleDataset.png)

You’ll need a total of approximately 50,000 augmented images for effective training. We generated around 5,000 images using Blender, and then applied augmentations such as noise, blur, and color jitter during the training process using torchvision.

### 2.2 Segmentation network in PyTorch:

Architectures similar to U-Net are recommended as a good starting point.
To keep the design simple and to speed up inference, you are free to rescale the image to a smaller square format (such as 256x256). Please make sure you scale the segmented output back to the original dimensions.

Here is a helpful article on [image segmentation with PyTorch](https://towardsdatascience.com/efficient-image-segmentation-using-pytorch-part-1-89e8297a0923).
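To make the rescaling step concrete, here is a minimal sketch of a two-level U-Net-style network built only from basic PyTorch layers, plus a helper that resizes a frame to a square input and scales the prediction back. It is not the provided starter code: `TinyUNet`, `segment_frame`, the channel widths, and the 0.5 threshold are all assumptions, and a network this small is meant as a starting point rather than a working solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions, each followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )


class TinyUNet(nn.Module):
    """Encoder-decoder with skip connections; input sides must be divisible by 4."""

    def __init__(self, base: int = 16):
        super().__init__()
        self.enc1 = conv_block(3, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Upsample and concatenate with the matching encoder features.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # raw logits, shape (N, 1, H, W)


@torch.no_grad()
def segment_frame(model: nn.Module, frame: torch.Tensor, size: int = 256) -> torch.Tensor:
    """frame: float tensor (3, H, W) in [0, 1]; returns a bool mask of shape (H, W)."""
    model.eval()
    h, w = frame.shape[-2:]
    small = F.interpolate(frame.unsqueeze(0), size=(size, size),
                          mode="bilinear", align_corners=False)
    logits = model(small)
    # Scale the prediction back to the original frame dimensions.
    logits = F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
    return torch.sigmoid(logits)[0, 0] > 0.5


model = TinyUNet(base=4)
pred = segment_frame(model, torch.rand(3, 90, 160), size=64)
```

Resizing the logits before thresholding, rather than the thresholded mask, tends to give smoother mask edges at the original resolution.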

## 3. Software Setup

### Part 1
No sample code for data generation is provided. However, a few sample background images can be found in the part1/environment folder, and the window image is available in the assets/ folder. Please add any code you write to the part1 folder.

### Part 2
A sample network training pipeline (and wandb logging code) is provided for reference. You are free to edit it as you like.
You are expected to fix any issues you face while running that pipeline. Please use [wandb](https://docs.wandb.ai/tutorials/experiments) or tensorboard to visualize the training process. 

<img src="./assets/wandb.png" alt="Sample Wandb Image" width="50%">



## 4. Grading Rubric

- Part 1: 40
- Part 2: 40
- Part 3: 20

- For RBE474X: Part 1 + Part 2 make up 100% of the grade (80/80).
- For RBE595-A01-SP: You are expected to implement Parts 1 through 3 to receive full credit (100/100).

## 5. Submission Guidelines

### Report

Please include the following in your report:

1. If you are using Blender, screenshots of your Blender GUI Window.
2. Sample images and labels (segmentation mask) from the dataset.
3. Network architecture diagram.
4. Explain the loss function you used for training.
5. Explain any failure cases you encountered.
6. Tabulate the hyperparameters you used for training (learning rate, optimizer, etc.).
7. Include the inference results for a few sample images from your validation set and the provided video.
8. Plot the training and validation loss curves per epoch. Explain your observations.

### Video

Part 2: Save the video as part2.mp4. Please use H.264 encoding for the video.

Part 3: Save the video as part3.mp4. Please use H.264 encoding for the video.
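If your video writer does not produce H.264 directly, one option is to re-encode the result with `ffmpeg`. The sketch below only builds the command; it assumes an `ffmpeg` build with the `libx264` encoder is installed, and the function names and the intermediate file name `part2_raw.mp4` are hypothetical.

```python
import subprocess


def h264_command(src: str, dst: str, crf: int = 23) -> list[str]:
    """Build an ffmpeg command that re-encodes src to H.264 in an MP4 container."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it already exists
        "-i", src,               # input video
        "-c:v", "libx264",       # H.264 video encoder
        "-crf", str(crf),        # quality setting: lower means higher quality
        "-pix_fmt", "yuv420p",   # pixel format that most players accept
        dst,
    ]


def encode(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(h264_command(src, dst), check=True)


cmd = h264_command("part2_raw.mp4", "part2.mp4")
```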

### Rules

1. You can choose any network architecture, but you must design it yourself. Prebuilt networks or pre-existing implementations (e.g., those found in libraries like PyTorch's torchvision) are not allowed.
2. You are allowed to use basic PyTorch layers, such as convolutional layers, transposed convolutions, pooling layers, activation functions, dropout, and batch normalization.
3. Do not upload any logs, model checkpoints (`.pth`), data, or the `wandb` folder. Doing so will result in zero credit.
4. Do not upload your dataset to Canvas! Doing so will result in zero credit.
5. The report must be written in LaTeX.


### Folder Structure
Your submission on ELMS/Canvas must be a ``zip`` file, following the naming convention ``GroupGROUPNUM_p2.zip``. If your group number is ``4``, then the submission file should be named ``Group4_p2.zip``. The `GROUPNUM` can be found on Canvas. The file **must have the following directory structure**. Do not change the files to run the code. You may place helper functions in sub-folders; be sure to reference them using relative paths. If your code takes command-line arguments, make sure they have default values. Please provide detailed instructions on how to run your code in the ``README.md`` file.

<p style="background-color:#ddd; padding:5px">
<b>NOTE:</b> 
The size of your submission file must <b>NOT</b> exceed <b>100MB</b>.
</p>
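The requirement that command-line arguments have default values and use relative paths can be met with `argparse`, as in the sketch below. The argument names and the checkpoint path are hypothetical; only `test_video.mp4` and `part2.mp4` come from this document, and the paths are assumed to be relative to the directory the script is run from.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Every argument has a default, so the script runs with no flags at all."""
    parser = argparse.ArgumentParser(description="Run window segmentation on a video.")
    parser.add_argument("--video", type=Path, default=Path("test_video.mp4"),
                        help="input video (relative path)")
    parser.add_argument("--output", type=Path, default=Path("part2.mp4"),
                        help="output video (relative path)")
    # Hypothetical checkpoint location; checkpoints themselves must not be submitted.
    parser.add_argument("--checkpoint", type=Path, default=Path("part2/checkpoint.pth"),
                        help="trained weights (relative path)")
    return parser


# Parsing an empty argument list exercises the defaults.
args = build_parser().parse_args([])
```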

The file tree of your submission <b>SHOULD</b> resemble this:

```
GroupGROUPNUM_p2.zip
├── assets
├── part1
│   └── code files (do not submit environment folder)
├── part2
│   ├── network.py
│   ├── train.py
│   ├── turing.sh
│   ├── loadParam.py
│   ├── dataloader.py
│   ├── utils.py
│   └── Any other code files
├── part3
│   └── code files
├── Report.pdf
├── main_notebook.ipynb
├── part2.mp4
├── part3.mp4
└── README.md
```
