# Tutorial Resources
- [Hung-yi Lee (李宏毅)](https://speech.ee.ntu.edu.tw/~hylee/index.php)
- Yeh James
  - [Web](https://medium.com/jameslearningnote/tagged/machine-learning)
  - [Code](https://gist.github.com/yehjames)
- [Clay Technology World](https://clay-atlas.com/blog/category/ml/)
- Sebastian Raschka
  - [Lecture](https://github.com/rasbt/stat479-deep-learning-ss19)
  - [Book](https://github.com/rasbt/python-machine-learning-book-3rd-edition)
- [Python](https://steam.oxxostudio.tw/category/python/index.html)
  - [w3schools](https://www.w3schools.com/python/)
  - [NumPy](https://numpy.org/doc/stable/reference/index.html)
  - [pandas](https://pandas.pydata.org/docs/reference/index.html)
  - [matplotlib](https://matplotlib.org/stable/api/index.html)
  - [PyTorch](https://pytorch.org/docs/stable/index.html)
    - [Examples](https://github.com/pytorch/examples)
  - [sklearn](https://scikit-learn.org/stable/modules/classes.html)
  - [tqdm](https://github.com/tqdm/tqdm)
- Pages
  - [Activation Functions: Comparison of Trends in Practice and Research for Deep Learning](https://arxiv.org/pdf/1811.03378.pdf)
  - [labml.ai Annotated PyTorch Paper Implementations](https://nn.labml.ai/)

## Video
### Optimization
#### Momentum
- [What to Do When Neural Network Training Fails (2): Batch and Momentum](https://www.youtube.com/watch?v=zzbr1h9sF54)
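
As a rough companion to the video, here is a minimal sketch of gradient descent with momentum on a one-dimensional quadratic. The function name `sgd_momentum`, the test function, and the hyperparameters are illustrative assumptions, not taken from the lecture.

```python
def sgd_momentum(grad_fn, w0, lr=0.1, beta=0.9, steps=300):
    """Minimize a 1-D function with gradient descent plus momentum (illustrative sketch)."""
    w, v = w0, 0.0
    for _ in range(steps):
        # New movement = decayed previous movement minus the scaled gradient
        v = beta * v - lr * grad_fn(w)
        w = w + v
    return w


# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

With `beta=0.0` the loop reduces to plain gradient descent, which makes it easy to compare the two behaviours.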
#### Adaptive Learning Rate
 > Adagrad, RMSProp, Adam, SWATS, AMSGrad, AdaBound, Cyclical LR, SGDR, One-cycle LR
- [What to Do When Neural Network Training Fails (3): Automatically Adjusting the Learning Rate](https://www.youtube.com/watch?v=HYUXEeh3kwY&list=PLJV_el3uVTsMhtt7_Y6sgTHGHp1Vb2P2J&index=6)
- [Optimization for Deep Learning - 1](https://www.youtube.com/watch?v=4pUmZ8hXlHM)
- [Optimization for Deep Learning - 2](https://www.youtube.com/watch?v=e03YKGHXnL8&feature=youtu.be)
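
To make the idea of an adaptive learning rate concrete, the sketch below implements Adagrad, the first method in the list above, on a one-dimensional quadratic. The function name `adagrad`, the objective, and the hyperparameters are assumptions chosen for illustration.

```python
import math


def adagrad(grad_fn, w0, lr=1.0, eps=1e-8, steps=100):
    """Adagrad on a single parameter (illustrative sketch)."""
    w, s = w0, 0.0
    for _ in range(steps):
        g = grad_fn(w)
        s += g * g                          # running sum of squared gradients
        w -= lr * g / (math.sqrt(s) + eps)  # step shrinks as the sum grows
    return w


# Hypothetical objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_adagrad = adagrad(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_adagrad)
```

RMSProp replaces the running sum with an exponential moving average, so the effective step size can grow again when gradients become small.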

# Real Applications
- BERT
- Transformer
- Tacotron
- YOLO
- Mask R-CNN
- ResNet
- BigGAN
- MAML

# Running PyTorch on the M1 GPU
> [Sebastian Raschka](https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html)  
> [MPS backend](https://pytorch.org/docs/stable/notes/mps.html)  

Install a nightly build of PyTorch:

```sh
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
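
After installation, a script can pick its device at run time instead of hard-coding `"mps"`. The helper below is a minimal sketch: the name `pick_device_name` is hypothetical, and it assumes that falling back to `"cpu"` is acceptable when PyTorch or a GPU backend is missing.

```python
def pick_device_name() -> str:
    """Return "mps", "cuda", or "cpu", whichever is usable (illustrative helper)."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch installed we simply report "cpu"
        return "cpu"

    # Older PyTorch builds may not have torch.backends.mps at all
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device_name = pick_device_name()
print(device_name)
```

The returned string can be passed straight to `torch.device(...)`.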

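## Check Python Architecture
```python
import platform

# Index 4 of uname() is the machine type. A native Apple Silicon build
# prints "arm64"; "x86_64" on an M1 Mac means Python is running under Rosetta.
print(platform.uname()[4])
```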
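
The check above can be wrapped in a small function that spells out what each value means. This is a sketch only: `describe_machine` is a hypothetical name, and the interpretation of `x86_64` assumes the script is run on macOS.

```python
import platform


def describe_machine(machine: str) -> str:
    """Explain a machine string such as the one returned by platform.machine()."""
    if machine == "arm64":
        return "native arm64 Python (Apple Silicon)"
    if machine == "x86_64":
        # Assumption: on macOS this means an Intel Mac or a Rosetta-translated Python
        return "x86_64 Python (Intel Mac, or Rosetta on Apple Silicon)"
    return f"other architecture: {machine}"


# platform.machine() returns the same value as platform.uname()[4]
print(describe_machine(platform.machine()))
```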

## Validate MPS

### Official Validation
```sh
python3.11 ./official/mps_valid/main.py
```

### Validate and Run

```python
import math

import torch

# Check that the MPS backend is available
if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")
else:
    dtype = torch.float
    device = torch.device("mps")

    # Create input and output data: 2000 evenly spaced points and their sine
    x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
    y = torch.sin(x)

    # Randomly initialize weights
    a = torch.randn((), device=device, dtype=dtype)
    b = torch.randn((), device=device, dtype=dtype)
    c = torch.randn((), device=device, dtype=dtype)
    d = torch.randn((), device=device, dtype=dtype)

    learning_rate = 1e-6
    for t in range(2000):
        # Forward pass: compute predicted y
        y_pred = a + b * x + c * x ** 2 + d * x ** 3

        # Compute loss and print it every 100 steps
        loss = (y_pred - y).pow(2).sum().item()
        if t % 100 == 99:
            print(t, loss)

        # Backprop to compute gradients of the loss with respect to a, b, c, d
        grad_y_pred = 2.0 * (y_pred - y)
        grad_a = grad_y_pred.sum()
        grad_b = (grad_y_pred * x).sum()
        grad_c = (grad_y_pred * x ** 2).sum()
        grad_d = (grad_y_pred * x ** 3).sum()

        # Update weights using gradient descent
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
        c -= learning_rate * grad_c
        d -= learning_rate * grad_d

    print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
```
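
For reference, the hand-written gradients in the loop follow from the sum-of-squares loss. The symbols $\hat{y}_i$ (for `y_pred`), $\eta$ (for `learning_rate`), and $\theta$ (for any of the four weights) are introduced here only for the derivation:

$$
\begin{aligned}
\hat{y}_i &= a + b x_i + c x_i^2 + d x_i^3, \qquad L = \sum_i (\hat{y}_i - y_i)^2 \\
\frac{\partial L}{\partial a} &= \sum_i 2(\hat{y}_i - y_i), \qquad
\frac{\partial L}{\partial b} = \sum_i 2(\hat{y}_i - y_i)\, x_i \\
\frac{\partial L}{\partial c} &= \sum_i 2(\hat{y}_i - y_i)\, x_i^2, \qquad
\frac{\partial L}{\partial d} = \sum_i 2(\hat{y}_i - y_i)\, x_i^3 \\
\theta &\leftarrow \theta - \eta\, \frac{\partial L}{\partial \theta}, \qquad \theta \in \{a, b, c, d\}
\end{aligned}
$$

The common factor $2(\hat{y}_i - y_i)$ is what the code stores in `grad_y_pred`.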

### Compare MPS And CPU

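```python
import timeit

import torch

# Operands for a small vector-matrix product on each device
a_cpu = torch.rand(250, device='cpu')
b_cpu = torch.rand((250, 250), device='cpu')
a_mps = torch.rand(250, device='mps')
b_mps = torch.rand((250, 250), device='mps')

# Note: MPS kernels are launched asynchronously, so the 'mps' figure
# may not include the full GPU execution time.
print('cpu', timeit.timeit(lambda: a_cpu @ b_cpu, number=100_000))
print('mps', timeit.timeit(lambda: a_mps @ b_mps, number=100_000))
```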
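
Because GPU work is queued asynchronously, a fairer comparison waits for the device to finish before stopping the clock. The helper below is a device-agnostic sketch: `time_op` and its `sync` parameter are hypothetical names, and the demo times a plain Python function so that it runs without PyTorch.

```python
import timeit


def time_op(fn, sync=None, number=1000):
    """Time `number` calls of fn, calling sync() before starting and before stopping the clock."""
    if sync is not None:
        sync()  # drain any work queued before the measurement
    start = timeit.default_timer()
    for _ in range(number):
        fn()
    if sync is not None:
        sync()  # wait for queued work to finish before reading the clock
    return timeit.default_timer() - start


# Stand-in workload; replace with e.g. `lambda: a_mps @ b_mps`
elapsed = time_op(lambda: sum(range(100)), number=1000)
print(f"{elapsed:.6f} s")
```

For the MPS case, recent PyTorch releases provide `torch.mps.synchronize`, which could be passed as `sync`. Check that it exists in your installed version before relying on it.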