## It's coding time again!

<center>
<div>
<img src="Images/Lecture-3/programming_skills.png" width="1800" alt='programming_skills'/>
</div>
</center>

<br/>
<br/>
<br/>

## Let's recall your interests

61.8% of you are interested in programming mistakes!

<center>
<div>
<img src="Images/Lecture-1/google_form_true.png" width="2200" alt='google_form_true'/>
</div>
</center>


Let's focus on this topic!

## Why do we need this lecture?

Whether you like it or not, your experimental setting will probably require some coding.

Coding translates to: 

1. Transparency (*don't you dare do some cheap tricks!*)
2. Correctness (*your code should reflect your paper statements*) 
3. **Readability** (*please, don't make this a nightmare*)
4. Efficiency (*time is money*)
5. **Maintainability** (*I'm sure you'll re-use this code*)

In the previous lecture, we showed some tools to make our code more efficient in TensorFlow and PyTorch [4]

We now provide some tips & tricks concerning readability and maintainability [3, 5]

## What are we going to cover?

- Debugging
- General coding best practices
- Tensorflow best practices
- Torch best practices
- Misc: code documentation, README, controlled environments, updates tracking

*You and your friends, five minutes in the future, will appreciate it!*

# Debugging

### What are we going to see

TODO

# Coding Best Practices

### What are we going to see

- Type hints, type checking, annotations
- Naming choice
- Comments
- Nesting
- Inheritance
- Abstraction
- Organization
- Profiling
- Code optimization
- Testing

# Tensorflow Best Practices

# Torch Best Practices

- Layers definition
- Custom losses
- Detaching
- Freeing GPU memory
- Eval() mode
- Numpy code
- Calling layers

### File Organization

Split your model into individual layers and losses to

- Enhance re-usability (*easier to spot errors*)
- Enhance readability (*top-down view of a model*)

The same applies to nested models, layers and losses

```python
import torch as th

# losses.py
class CustomLoss(th.nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()  # must run before registering parameters or sub-modules
        ...

    def forward(self, inputs):
        ...

# layers.py
class CustomLayer(th.nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        ...

    def forward(self, inputs):
        ...

# models.py
class CustomModel(th.nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        # The model only composes building blocks defined elsewhere
        self.layer = CustomLayer(...)
        self.loss_op = CustomLoss(...)

    def forward(self, inputs):
        pass
```

### Sequential Layers

In many cases, we may have to define a sequential network

#### Which one do you use?

<table><tr>
<td> <img src="Images/Lecture-4/th-examples-1-1.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-1-2.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-1-3.png" width="1100"/> </td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"> <strong>(A)</strong> </td>
<td style="text-align: center; vertical-align: middle;"> <strong>(B)</strong> </td>
<td style="text-align: center; vertical-align: middle;"> <strong>(C)</strong> </td>
</tr>
</table>

#### (A) is the best choice

- Uses ```th.nn.Sequential(...)``` to define a sequential network $\rightarrow$ concise, readable, and all layers are registered automatically
    
#### (B) may give some problems with list wrapping

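- Layers stored in a plain Python list are not registered as sub-modules, so their weights are missing from ```model.parameters()``` and are not moved by ```model.to(device)```
- Consider using ```th.nn.ModuleList(...)``` rather than a list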
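A minimal sketch of the difference, not taken from the slide images: `ListModel` and `ModuleListModel` are hypothetical names, and the layer sizes are arbitrary. It only counts the parameter tensors each model exposes.

```python
import torch as th


class ListModel(th.nn.Module):
    """Hypothetical model that keeps its layers in a plain Python list."""

    def __init__(self):
        super().__init__()
        # Plain list: the layers are NOT registered as sub-modules
        self.layers = [th.nn.Linear(4, 4) for _ in range(3)]

    def forward(self, inputs):
        for layer in self.layers:
            inputs = layer(inputs)
        return inputs


class ModuleListModel(th.nn.Module):
    """Same hypothetical model, with the layers wrapped in a ModuleList."""

    def __init__(self):
        super().__init__()
        # ModuleList: every layer is registered and visible to optimizers
        self.layers = th.nn.ModuleList([th.nn.Linear(4, 4) for _ in range(3)])

    def forward(self, inputs):
        for layer in self.layers:
            inputs = layer(inputs)
        return inputs


# Number of parameter tensors each model exposes to an optimizer
n_list = len(list(ListModel().parameters()))
n_module_list = len(list(ModuleListModel().parameters()))
print(n_list, n_module_list)
```

The plain-list model exposes no parameters at all, so an optimizer built from it would have nothing to update.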
    
#### (C) is terrible!

- Generates new, randomly initialized layers at each forward pass $\rightarrow$ the model loses track of the weights to update
- You need to define layers in the ```__init__(...)``` method so that model weights are kept throughout the life of the model
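To make the problem concrete, here is a small sketch under assumed names (`LayerInForward` and `LayerInInit` are hypothetical and not taken from the slide images). It compares two consecutive calls on the same input.

```python
import torch as th


class LayerInForward(th.nn.Module):
    """Hypothetical anti-pattern: the layer is created inside forward()."""

    def forward(self, inputs):
        layer = th.nn.Linear(4, 2)  # new random weights at every call
        return layer(inputs)


class LayerInInit(th.nn.Module):
    """The layer is created once and registered on the model."""

    def __init__(self):
        super().__init__()
        self.layer = th.nn.Linear(4, 2)

    def forward(self, inputs):
        return self.layer(inputs)


inputs = th.ones(1, 4)
bad, good = LayerInForward(), LayerInInit()

# The anti-pattern exposes no parameters to an optimizer
bad_params = list(bad.parameters())

# Two calls on the same input: only the second model is consistent
same_bad = th.equal(bad(inputs), bad(inputs))
same_good = th.equal(good(inputs), good(inputs))
print(len(bad_params), same_bad, same_good)
```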

### Mixing operations

Consider the following code snippet

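```python
import numpy as np
import torch as th

# Numpy version: inputs are numpy arrays
y_pred, y_true = np.array([1.0, 2.0]), np.array([0.0, 2.0])
loss = np.square(y_pred - y_true).sum()

# Torch version: inputs are torch tensors
y_pred, y_true = th.tensor([1.0, 2.0]), th.tensor([0.0, 2.0])
loss = (y_pred - y_true).pow(2).sum()
```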

The numpy code always runs on the CPU and is invisible to automatic differentiation, while the torch code can also run on the GPU

- Avoid mixing numpy and torch operations in the ```forward(...)``` method: each numpy call forces a copy of the tensor to the CPU and cuts it off from the gradient computation!
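A short sketch of what goes wrong, with made-up values for `y_pred` and `y_true`: the torch loss produces a gradient, while the numpy version only works on a detached CPU copy and therefore cannot contribute to training.

```python
import numpy as np
import torch as th

y_true = th.tensor([0.0, 2.0])
y_pred = th.tensor([1.0, 4.0], requires_grad=True)

# Torch version: stays on the tensor's device and is tracked by autograd
loss = (y_pred - y_true).pow(2).sum()
loss.backward()
grad = y_pred.grad  # equals 2 * (y_pred - y_true)

# Numpy version: a tensor that requires grad cannot be converted directly
try:
    np.square((y_pred - y_true).numpy()).sum()
    numpy_failed = False
except RuntimeError:
    numpy_failed = True

# It works only after detaching and copying to the CPU, and yields no gradient
np_loss = np.square((y_pred - y_true).detach().cpu().numpy()).sum()
print(grad, numpy_failed, np_loss)
```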

### Getting results

How to properly collect model outputs?

#### Which one do you use?

<table><tr>
<td> <img src="Images/Lecture-4/th-examples-2-1.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-2-2.png" width="1100"/> </td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"> <strong>(A)</strong> </td>
<td style="text-align: center; vertical-align: middle;"> <strong>(B)</strong> </td>
</tr>
</table>

#### Detaching is needed!

- Removes the tensor from torch tracking for automatic differentiation
- Otherwise, every stored tensor keeps its whole computation graph alive, which wastes memory and slows down your program execution!
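As an illustration (a sketch with an arbitrary toy model, not the code shown in the images), detaching before storing results could look like this:

```python
import torch as th

model = th.nn.Linear(3, 1)
inputs = th.randn(5, 3)

outputs = model(inputs)      # still attached to the computation graph
detached = outputs.detach()  # same values, no gradient history

# Typical use: store predictions or loss values for logging
predictions = detached.cpu().numpy()
loss_value = outputs.pow(2).mean().item()  # .item() returns a plain Python float
print(outputs.requires_grad, detached.requires_grad, predictions.shape)
```

For scalar values such as a loss, `.item()` is usually enough, since it returns a Python number with no link to the graph.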

### Evaluation mode

Torch models have two modes: ```model.train()``` and ```model.eval()```

#### Which one do you use?

<table><tr>
<td> <img src="Images/Lecture-4/th-examples-3-1.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-3-2.png" width="1100"/> </td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"> <strong>(A)</strong> </td>
<td style="text-align: center; vertical-align: middle;"> <strong>(B)</strong> </td>
</tr>
</table>

#### Both are correct!

#### model.eval()

- Only changes layer behaviour: Dropout is disabled and BatchNorm uses its running statistics instead of batch statistics
- Does not disable gradient tracking

#### torch.no_grad()

- Disables gradient tracking, saving memory and time
- Does not change layer behaviour


In the common case where you don't compute any gradient during evaluation, you can use both to gain some speed-up and use less memory.
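A minimal evaluation sketch combining the two, using an arbitrary toy model with a Dropout layer (an assumption for illustration only):

```python
import torch as th

model = th.nn.Sequential(th.nn.Linear(4, 4), th.nn.Dropout(p=0.5))
inputs = th.ones(2, 4)

model.eval()         # Dropout becomes the identity
with th.no_grad():   # no computation graph is built
    first = model(inputs)
    second = model(inputs)

model.train()        # remember to switch back before training again
print(th.equal(first, second), first.requires_grad)
```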

### Numerical Stability

Mathematical correctness of your code doesn't necessarily translate to correct results

#### Some examples

<table><tr>
<td> <img src="Images/Lecture-4/th-examples-4-1.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-4-2.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-4-3.png" width="1100"/> </td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"> <strong>(A)</strong> </td>
<td style="text-align: center; vertical-align: middle;"> <strong>(B)</strong> </td>
<td style="text-align: center; vertical-align: middle;"> <strong>(C)</strong> </td>
</tr>
</table>

#### Stable versions

<table><tr>
<td> <img src="Images/Lecture-4/th-examples-5-1.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-5-2.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-5-3.png" width="1100"/> </td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;"> <strong>(A)</strong> </td>
<td style="text-align: center; vertical-align: middle;"> <strong>(B)</strong> </td>
<td style="text-align: center; vertical-align: middle;"> <strong>(C)</strong> </td>
</tr>
</table>
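A textual illustration of the same idea, which is not necessarily one of the cases shown in the images: with large logits (the values below are chosen only to trigger the problem), a hand-written softmax overflows, while the built-in functions stay finite.

```python
import torch as th

logits = th.tensor([1000.0, 0.0])

# Mathematically correct, numerically unstable
naive_softmax = th.exp(logits) / th.exp(logits).sum()  # exp(1000) overflows to inf
naive_log_softmax = th.log(th.softmax(logits, dim=0))  # log(0) gives -inf

# Stable equivalents provided by torch
stable_softmax = th.softmax(logits, dim=0)
stable_log_softmax = th.log_softmax(logits, dim=0)

print(naive_softmax, naive_log_softmax)
print(stable_softmax, stable_log_softmax)
```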

### Mixed-precision

In many cases, you may want to speed up your training by relying on mixed-precision operations

<table><tr>
<td> <img src="Images/Lecture-4/th-examples-6-1.png" width="1100"/> </td>
<td> <img src="Images/Lecture-4/th-examples-6-2.png" width="1100"/> </td>
</tr>
</table>

#### torch.cuda.amp.autocast()

- Automatically casts down heavy operations (e.g., convolution, matrix multiplication) to 16-bit
- Allows mixed-precision computations


#### torch.cuda.amp.GradScaler()

- Allows working with 16-bit gradient values while avoiding underflows and overflows
- Scales the loss up before the backward pass to avoid gradient underflows
- Scales gradient values back down before the optimizer step to ensure a correct update of the model weights
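A hedged sketch of how the two pieces usually fit together in a training loop. The toy model, data and learning rate are assumptions for illustration, and mixed precision is simply switched off when no GPU is available, so the same code also runs on the CPU.

```python
import torch as th

use_cuda = th.cuda.is_available()
device = th.device("cuda" if use_cuda else "cpu")

model = th.nn.Linear(8, 1).to(device)
optimizer = th.optim.SGD(model.parameters(), lr=0.1)
# With enabled=False the scaler becomes a no-op wrapper
scaler = th.cuda.amp.GradScaler(enabled=use_cuda)

inputs = th.randn(16, 8, device=device)
targets = th.randn(16, 1, device=device)
weight_before = model.weight.detach().clone()

for _ in range(3):
    optimizer.zero_grad()
    # Only the forward pass and the loss run under autocast
    with th.autocast(device_type=device.type, enabled=use_cuda):
        loss = th.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()  # scale the loss up before backpropagation
    scaler.step(optimizer)         # unscale gradients, skip the step on inf/nan
    scaler.update()                # adapt the scale factor for the next step

print(loss.item())
```

Recent PyTorch releases also expose the same tools under `torch.amp`; the `torch.cuda.amp` names used here match the ones on the slide.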

# Concluding Remarks

- TODO

# Next time! (次回, Jikai)

Actually, there's nothing left to show you...

Since I wanted to hold a 10-hour course (thus, 2 CFUs), I thought it would be a good opportunity to show you something I've been working on.

- Deasy-learning (*a tiny tiny custom library for research*)
- Course feedback (*don't forget to leave a like and hit subscribe!* ~semicit)
- **Motivational outro** (*please, don't miss this!*)

# Any questions?

<center>
<div>
<img src="Images/Lecture-1/jojo-arrivederci.gif" width="1200" alt='JOJO_arrivederci'/>
</div>
</center>