In [1]:
from pathlib import Path
import gin
import numpy as np
import torch
from typing import List
from torch.nn.utils.rnn import pad_sequence
from mltrainer import rnn_models, Trainer
from torch import optim

from mads_datasets import datatools

# 1 Iterators
We will be using an interesting dataset. [link](https://tev.fbk.eu/resources/smartwatch)

From the site:
> The SmartWatch Gestures Dataset has been collected to evaluate several gesture recognition algorithms for interacting with mobile applications using arm gestures. Eight different users performed twenty repetitions of twenty different gestures, for a total of 3200 sequences. Each sequence contains acceleration data from the 3-axis accelerometer of a first generation Sony SmartWatch™, as well as timestamps from the different clock sources available on an Android device. The smartwatch was worn on the user's right wrist. 


In [2]:
from mads_datasets import DatasetFactoryProvider, DatasetType
from mltrainer.preprocessors import PaddedPreprocessor
preprocessor = PaddedPreprocessor()

gesturesdatasetfactory = DatasetFactoryProvider.create_factory(DatasetType.GESTURES)
streamers = gesturesdatasetfactory.create_datastreamer(batchsize=32, preprocessor=preprocessor)
train = streamers["train"]
valid = streamers["valid"]

2025-02-17 18:00:12.330 | INFO     | mads_datasets.base:download_data:121 - Folder already exists at /home/sarmad/.cache/mads_datasets/gestures
100%|██████████| 2600/2600 [00:00<00:00, 5759.56it/s]
100%|██████████| 651/651 [00:00<00:00, 5869.55it/s]


In [3]:
len(train), len(valid)

(81, 20)

In [4]:
trainstreamer = train.stream()
validstreamer = valid.stream()
x, y = next(iter(trainstreamer))
x.shape, y

(torch.Size([32, 30, 3]),
 tensor([ 6, 16,  9,  1,  2, 13,  6,  2, 11,  8, 14,  3,  6,  0, 10, 18, 10,  8,
         17, 18,  4,  6,  0, 17, 12, 10,  2, 18, 14,  4,  0, 16]))

In [5]:
x, y = next(iter(trainstreamer))
x.shape, y

(torch.Size([32, 34, 3]),
 tensor([13, 10,  3,  0, 17,  0,  3, 13, 15,  8,  1,  7,  0,  8,  2, 18, 10, 15,
          6, 12, 18, 14, 13,  6,  3,  5,  8,  8, 17,  5,  1, 15]))

Can you make sense of the shape?
- What does it mean that the shapes are sometimes (32, 29, 3), but a second time might look like (32, 31, 3)? In other words, the second (or first, if you insist on starting at 0) dimension changes. Why is that? 

<font color='green'>

**A shape such as (32, 29, 3) or (32, 31, 3) means that the batch holds 32 samples, each a sequence of 29 (or 31) timesteps with 3 features per timestep (the three axes of the accelerometer). The middle dimension changes because the gestures were recorded for different numbers of timesteps, and each batch is padded to its own longest sequence rather than to one fixed length.**

</font>

- How does the model handle this? 

<font color='green'>

**RNNs and LSTMs can handle variable-length sequences, but when processing in batches, they usually need sequences to be the same length within a batch for efficient computation. This is why padding is often used to match the longest sequence in a batch. So, while RNNs/LSTMs don't inherently require a fixed number of timesteps, batch training typically requires padding or packing to handle variable-length inputs efficiently.**
</font>


- Do you think this is already padded, or still has to be padded?

<font color='green'>

**It is already padded with zeros to match the longest sequence within the batch. `PaddedPreprocessor()` takes care of the padding, so all sequences in a batch have the same length. This is visible in the following cell output, where `x[0]` ends in rows of zeros.**
</font>
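
The padding described in the answers above can be reproduced on a toy batch. This is a minimal sketch using `torch.nn.utils.rnn.pad_sequence`; the sequence lengths are made up, and it is not necessarily how `PaddedPreprocessor` is implemented. Recovering the lengths assumes that an all-zero timestep is always padding, which would miscount a real sample that happens to be exactly zero on all three axes.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three made-up recordings with 5, 3 and 4 timesteps and 3 accelerometer axes each.
# torch.ones is used so that no real timestep is accidentally all zeros.
sequences = [torch.ones(5, 3), torch.ones(3, 3), torch.ones(4, 3)]

# Pad every sequence with zeros up to the longest one in the batch.
batch = pad_sequence(sequences, batch_first=True, padding_value=0.0)
print(batch.shape)  # torch.Size([3, 5, 3])

# Recover the original lengths by counting timesteps that are not all zero.
lengths = (batch.abs().sum(dim=-1) > 0).sum(dim=1)
print(lengths)  # tensor([5, 3, 4])
```

The second dimension of `batch` equals the longest sequence, which is why it differs from one batch to the next.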





In [8]:
x[0]

tensor([[-2.6049, -1.6855,  9.6534],
        [-2.6049, -1.3791,  9.3470],
        [-2.7581, -1.5323,  9.9599],
        [-3.0646, -1.9920, 10.8793],
        [-3.6775, -0.7661,  8.2744],
        [-3.0646, -0.7661,  0.4597],
        [ 0.1532, -1.9920, -4.2904],
        [ 9.5002, -5.5162,  2.6049],
        [22.0650, -2.1452, 11.3389],
        [19.3068, -2.4517, 14.8632],
        [11.9519, -2.4517, 12.5648],
        [15.1697, -4.4436,  9.6534],
        [19.3068, -3.8307,  9.0405],
        [15.7826, -2.6049,  3.5243],
        [ 5.6695, -0.1532, -0.4597],
        [-0.3065,  2.2984,  2.7581],
        [-1.3791,  4.7501,  5.3630],
        [-1.6855,  6.2824,  7.3550],
        [-1.9920,  4.1372,  9.0405],
        [-1.8387,  3.8307,  8.7340],
        [-1.0726,  3.3710,  9.8067],
        [-0.7661,  2.6049,  9.8067],
        [-1.6855,  2.7581,  9.1937],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000],
 

# 2 Exercises
Let's test a base model and try to improve on it.

Fill the gestures.gin file with relevant settings for `input_size`, `hidden_size`, `num_layers` and `horizon` (which, in our case, will be the number of classes...)
<font color='green'>

**The `gestures.gin` file is modified to include the required parameter values. The `horizon` parameter is set to the number of classes in our case (i.e. 20). The other variables are initialized as `input_size=3` (the number of features), `hidden_size=128`, and `num_layers=3`. These values can be modified to improve the results.**
</font>

As a rule of thumb: start lower than you expect to need!

In [6]:
from mltrainer import TrainerSettings, ReportTypes
from mltrainer.metrics import Accuracy

accuracy = Accuracy()

settings = TrainerSettings(
    epochs=50,
    metrics=[accuracy],
    logdir=Path("gestures"),
    train_steps=len(train),
    valid_steps=len(valid),
    reporttypes=[ReportTypes.GIN, ReportTypes.TENSORBOARD, ReportTypes.MLFLOW],
    scheduler_kwargs={"factor": 0.5, "patience": 5},
    earlystop_kwargs=None
)
settings

2025-02-13 23:52:57.848 | INFO     | mltrainer.settings:check_path:61 - Created logdir d:\Workspace\upperkaam\notebooks_review\Deliverable_Part_1\notebooks\3_recurrent_networks\gestures


epochs: 50
metrics: [Accuracy]
logdir: gestures
train_steps: 81
valid_steps: 20
reporttypes: [<ReportTypes.GIN: 1>, <ReportTypes.TENSORBOARD: 2>, <ReportTypes.MLFLOW: 3>]
optimizer_kwargs: {'lr': 0.001, 'weight_decay': 1e-05}
scheduler_kwargs: {'factor': 0.5, 'patience': 5}
earlystop_kwargs: None

In [7]:
gin.parse_config_file("gestures.gin")
model = rnn_models.BaseRNN()

  decorated_class = decorating_meta(cls.__name__, (cls,), overrides)


<font color='green'>

**Note:** The above warning is due to a library version mismatch. It is a deprecation warning and will not affect the results of our experiments.
</font>

In [8]:
gin.get_bindings("BaseRNN")

{'input_size': 3, 'hidden_size': 128, 'num_layers': 3, 'horizon': 20}

<font color='green'>

**The above cell output shows the parameters for our model that we defined in the `gestures.gin` file.**
</font>

Test the model. What is the output shape you need? Remember, we are doing classification!

<font color='green'>

**Answer:** The model outputs one raw score (logit) per class, so for a batch of 32 samples and 20 classes we need an output of shape (32, 20). These scores are not yet probabilities: the softmax that would convert them is built into the loss function. To get a single predicted class per sample, we still have to take the `argmax` over the 20 values.
</font>

In [9]:
yhat = model(x)
yhat.shape

torch.Size([32, 20])

Test the accuracy

In [10]:
accuracy(y, yhat)

tensor(0.0312)

What do you think of the accuracy? What would you expect from blind guessing?

<font color='green'>

**Answer:** Before training, the model's weights are random, so its predictions amount to blind guessing. With 20 equally likely classes, blind guessing gives an expected accuracy of 1/20 = 5%. The observed accuracy of 0.0312 (1 correct out of 32 samples, about 3.1%) is consistent with that chance level.
</font>
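
The chance level for blind guessing can be checked with a short calculation. This sketch uses only the Python standard library and assumes 20 equally likely classes and a batch of 32 samples; the simulation size and seed are arbitrary.

```python
import random

random.seed(0)
num_classes, batch_size = 20, 32

# Expected accuracy when every class is guessed with equal probability.
chance = 1 / num_classes
print(chance)  # 0.05

# With 32 samples, accuracy can only take values k/32; one correct guess gives 0.03125.
print(1 / batch_size)  # 0.03125

# Simulate many uniformly random guesses to check the 5% figure empirically.
trials = 100_000
hits = sum(
    random.randrange(num_classes) == random.randrange(num_classes)
    for _ in range(trials)
)
simulated = hits / trials
print(round(simulated, 3))
```

On a single batch of 32 the measured accuracy jumps in steps of about 3%, so values such as 0.0312 or 0.0625 are both compatible with an untrained model.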

Check shape of `y` and `yhat`

In [12]:
yhat.shape, y.shape

(torch.Size([32, 20]), torch.Size([32]))

And look at the output of yhat

In [18]:
yhat[0], y[0]

(tensor([-0.0747, -0.0717, -0.0035, -0.0010, -0.1243, -0.0194, -0.0434, -0.0391,
         -0.0643, -0.0120, -0.0018, -0.0682,  0.0706,  0.0907, -0.0356, -0.1305,
         -0.0114, -0.1306, -0.0176,  0.0217], grad_fn=<SelectBackward0>),
 tensor(1))

Does this make sense to you? If you are unclear, go back to the classification problem with the MNIST, where we had 10 classes.

<font color='green'>

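**Answer:** `yhat[0]` holds 20 raw scores (logits), one per class; the negative values show that they are not probabilities. The target `y[0]` is a single class index. To get the final class prediction, we apply `argmax` to select the class with the highest score. Applying a softmax first would turn the scores into probabilities without changing which class is the largest.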
</font>
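
The relation between raw scores, probabilities and the predicted class can be shown on a small example. The logits below are made up (2 samples, 4 classes) and only illustrate the mechanics; they are not output of the notebook's model.

```python
import torch

# Made-up logits for 2 samples and 4 classes (the notebook has 32 samples and 20 classes).
logits = torch.tensor([[0.1, -0.3, 2.0, 0.5],
                       [1.2, 0.0, -1.0, 0.3]])

# Softmax turns each row into probabilities that sum to 1.
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1))

# Argmax gives the same class whether it is applied to logits or to probabilities.
pred_from_logits = logits.argmax(dim=1)
pred_from_probs = probs.argmax(dim=1)
print(pred_from_logits)  # tensor([2, 0])
```

Since softmax preserves the ordering of the scores, the accuracy can be computed directly on the model output.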


We have a classification problem, so we need Cross Entropy Loss.
Remember, [this has a softmax built in](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) 

In [14]:
loss_fn = torch.nn.CrossEntropyLoss()
loss = loss_fn(yhat, y)
loss

tensor(2.9904, grad_fn=<NllLossBackward0>)
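
A loss close to 3 is what one would expect from an untrained model with 20 classes. The sketch below assumes a perfectly uniform prediction (all logits equal) and arbitrary labels, and shows that the cross-entropy then equals ln(20); it also checks that `CrossEntropyLoss` matches a log-softmax followed by the negative log-likelihood.

```python
import math
import torch

num_classes = 20

# All-zero logits correspond to a uniform prediction over the 20 classes.
logits = torch.zeros(4, num_classes)
targets = torch.tensor([0, 5, 12, 19])  # arbitrary labels

loss = torch.nn.CrossEntropyLoss()(logits, targets)
print(loss.item(), math.log(num_classes))  # both are about 2.9957

# CrossEntropyLoss is log-softmax followed by negative log-likelihood.
manual = torch.nn.functional.nll_loss(torch.log_softmax(logits, dim=1), targets)
print(manual.item())
```

The value 2.9904 measured above is close to this baseline, which fits a model that has not learned anything yet.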

In [15]:
gin.get_bindings("BaseRNN")

{'input_size': 3, 'hidden_size': 128, 'num_layers': 3, 'horizon': 20}

In [16]:
import torch
if torch.backends.mps.is_available() and torch.backends.mps.is_built():
    device = torch.device("mps")
    print("Using MPS")
elif torch.cuda.is_available():
    device = "cuda:0"
    print("using cuda")
else:
    device = "cpu"
    print("using cpu")

# on my mac, at least for the BaseRNN model, mps does not speed up training
# probably because the overhead of copying the data to the GPU is too high
# however, it might speed up training for larger models, with more parameters
device = "cpu"

using cpu


In [19]:
import mlflow
from datetime import datetime

mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("gestures")
modeldir = Path("../../models/gestures/").resolve()
if not modeldir.exists():
    modeldir.mkdir(parents=True)

gin.parse_config_file("gestures.gin")

with mlflow.start_run():
    mlflow.set_tag("model", "GRUmodel")
    mlflow.set_tag("dev", "raoul")
    mlflow.log_params(gin.get_bindings("BaseRNN"))

    model = rnn_models.BaseRNN()

    trainer = Trainer(
        model=model,
        settings=settings,
        loss_fn=loss_fn,
        optimizer=optim.Adam,
        traindataloader=trainstreamer,
        validdataloader=validstreamer,
        scheduler=optim.lr_scheduler.ReduceLROnPlateau,
        device=device,
    )
    trainer.loop()

    tag = datetime.now().strftime("%Y%m%d-%H%M")
    modelpath = modeldir / (tag + "model.pt")
    torch.save(model, modelpath)

2025/02/13 23:55:09 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/02/13 23:55:09 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 451aebb31d03, add metric step
INFO  [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
INFO  [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
INFO  [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
INFO  [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
INFO  [alembic.runtime.migration] Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table
INFO  [89d4b8295536_create_latest_metrics_table_py] Migration complete!
INFO  

In [20]:
yhat = model(x)
accuracy(y, yhat)

tensor(0.2500)

In [None]:
mlflow.end_run()

Try to update the code above with the following two commands.
    
```python
gin.parse_config_file('gestures_gru.gin')
model = rnn_models.GRUmodel()
```
<font color='green'>

The above code is added in the following code section to replace the `BaseRNN` with the `GRUmodel`. The config file `gestures_gru.gin` contains the parameter definitions of the GRU model.

</font>


To discern between the changes, also modify the tag `mlflow.set_tag("model", "new-tag-here")`, where you add
a new tag of your choice. This way you can keep the models apart.

<font color='green'>

A new tag (`GRUmodel_improve`) is added to log the experiments for the GRU model separately, so that the runs can be told apart during later inspection. This is also added in the following code section, with comments marking the changes.

</font>


Exercises:

- improve the RNN model
<font color='green'>

**Improvements**
- A plain RNN tends to struggle with long-term relations in time-series data, whereas LSTM or GRU layers can learn these temporal dependencies.
- Hence, we replace the `BaseRNN` with the `GRUmodel`. The config file `gestures_gru.gin` contains the definition of the GRU model.

</font>

- test different things. What works? What does not?
<font color='green'>

**Answer**
- Switching from `BaseRNN` to `GRUmodel` clearly improves the model accuracy.
- Building on earlier experience with tuning models, I tested different values for the hidden size and the number of layers. Keeping the model small improved accuracy and kept training time manageable. The combination `hidden_size=32` and `num_layers=3` (set in the `gestures_gru.gin` file) achieved the best accuracy.

</font>

- experiment with either GRU or LSTM layers, create your own models + ginfiles. 
<font color='green'>

- **Answer:** The `gestures_gru.gin` file is created to work with the GRU model. We can modify this file to change the model configuration.

</font>

You should be able to get above 90% accuracy with the dataset.
<font color='green'>

**Answer:** We are able to achieve an accuracy greater than 90% with `hidden_size=32` and `num_layers=3`.
</font>
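
For readers who want to see what a GRU-based classifier for this data could look like, here is a minimal sketch. `TinyGRUClassifier` is hypothetical and is not the `GRUmodel` from `mltrainer`; it assumes the last timestep of the GRU output is used for classification, and the default sizes follow the settings mentioned in the answers above.

```python
import torch
from torch import nn

class TinyGRUClassifier(nn.Module):
    """Hypothetical GRU classifier; not the mltrainer GRUmodel implementation."""

    def __init__(self, input_size=3, hidden_size=32, num_layers=3, num_classes=20):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.gru(x)           # (batch, time, hidden)
        last_step = out[:, -1, :]      # output at the final timestep
        return self.linear(last_step)  # (batch, num_classes) logits

toy_model = TinyGRUClassifier()
dummy = torch.randn(32, 30, 3)  # same layout as a batch from the streamer
print(toy_model(dummy).shape)  # torch.Size([32, 20])
```

Note that with zero-padded batches the final timestep of a short sequence is padding, so a more careful variant might select the last real timestep per sample instead.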

In [24]:
# use the new GRU model by using the 'gestures_gru.gin' file
# the parameters are defined in that file for the GRU model
# CODE CHANGES ARE HERE!!
gin.parse_config_file('gestures_gru.gin')

settings = TrainerSettings(
    epochs=50,
    metrics=[accuracy],
    logdir=Path("gestures"),
    train_steps=len(train),
    valid_steps=len(valid),
    reporttypes=[ReportTypes.GIN, ReportTypes.TENSORBOARD, ReportTypes.MLFLOW],
    scheduler_kwargs={"factor": 0.5, "patience": 5},
    optimizer_kwargs={'lr': 0.001, 'weight_decay': 1e-05},
    earlystop_kwargs=None
)

with mlflow.start_run():
    # The tag is added to differentiate the GRU model improvement experiments.
    # CODE CHANGES ARE HERE!!
    mlflow.set_tag("model", "GRUmodel_improve")
    
    mlflow.set_tag("dev", "raoul")
    mlflow.log_params(gin.get_bindings("GRUmodel"))

    # The following code will load the GRU model
    # the parameters for the GRU model are defined in the 'gestures_gru.gin' file
    # CODE CHANGES ARE HERE!!
    model = rnn_models.GRUmodel()

    trainer = Trainer(
        model=model,
        settings=settings,
        loss_fn=loss_fn,
        optimizer=optim.Adam,
        traindataloader=trainstreamer,
        validdataloader=validstreamer,
        scheduler=optim.lr_scheduler.ReduceLROnPlateau,
        device=device,
    )
    trainer.loop()

    tag = datetime.now().strftime("%Y%m%d-%H%M")
    modelpath = modeldir / (tag + "model.pt")
    torch.save(model, modelpath)

2025-02-14 00:12:36.615 | INFO     | mltrainer.trainer:dir_add_timestamp:29 - Logging to gestures\20250214-001236
  0%|          | 0/50 [00:00<?, ?it/s]

100%|██████████| 81/81 [00:04<00:00, 16.34it/s]
2025-02-14 00:12:42.006 | INFO     | mltrainer.trainer:report:191 - Epoch 0 train 2.8885 test 2.5970 metric ['0.1031']
100%|██████████| 81/81 [00:05<00:00, 15.04it/s]
2025-02-14 00:12:47.785 | INFO     | mltrainer.trainer:report:191 - Epoch 1 train 2.4697 test 2.3909 metric ['0.1266']
100%|██████████| 81/81 [00:05<00:00, 14.82it/s]
2025-02-14 00:12:53.810 | INFO     | mltrainer.trainer:report:191 - Epoch 2 train 2.3351 test 2.2435 metric ['0.1531']
100%|██████████| 81/81 [00:05<00:00, 14.68it/s]
2025-02-14 00:12:59.725 | INFO     | mltrainer.trainer:report:191 - Epoch 3 train 2.2035 test 2.1282 metric ['0.2437']
100%|██████████| 81/81 [00:05

In [25]:
yhat = model(x)
accuracy(y, yhat)

tensor(0.9688)

<font color='green'>

The accuracy confirms that the `GRUmodel` improves on the `BaseRNN`: GRU layers are better at capturing dependencies across the time series and tend to achieve better results than vanilla RNNs. On the batch `x`, accuracy rose from 0.2500 to 0.9688 (about 97%) with `hidden_size=32` and `num_layers=3`. This is the best configuration among the experiments I performed.
</font>
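
The accuracy above is measured on a single batch. A more stable estimate averages over several batches, for example from the validation stream. The helper below is a sketch: `mean_accuracy` and `MeanPoolClassifier` are hypothetical names, and the random data only demonstrates the call.

```python
import torch
from torch import nn

def mean_accuracy(model, batches):
    """Fraction of correct argmax predictions over an iterable of (x, y) batches."""
    correct, total = 0, 0
    model.eval()
    with torch.no_grad():
        for xb, yb in batches:
            pred = model(xb).argmax(dim=1)
            correct += (pred == yb).sum().item()
            total += yb.numel()
    return correct / total

class MeanPoolClassifier(nn.Module):
    """Stand-in model: averages over time, then applies a linear layer."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 20)

    def forward(self, x):
        return self.linear(x.mean(dim=1))

# Five random batches shaped like the gesture data: (batch, time, features) and labels.
batches = [(torch.randn(32, 30, 3), torch.randint(20, (32,))) for _ in range(5)]
acc = mean_accuracy(MeanPoolClassifier(), batches)
print(acc)
```

In the notebook, one could pass the trained model together with something like `itertools.islice(validstreamer, len(valid))`, assuming the stream yields `(x, y)` batches as in the cells above.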
