# IPIN 2025 Flowcean Hands-on Session

This is the tutorial file for the IPIN 2025 workshop. In this Jupyter notebook, we will go through the steps of training machine learning models using the Flowcean framework.

In this example, we use the package [turtlesim](http://wiki.ros.org/turtlesim), a tool designed for teaching the Robot Operating System (ROS). The example trains models to predict the next pose of the turtle, $\textbf{x}_{k+1}$, based on the current pose $\textbf{x}_{k}$ and velocity commands $\textbf{u}_{k}$.  

ROS uses _topics_ to communicate between _nodes_. Nodes are processes that perform computations. They can **subscribe** to topics (receive messages from other nodes) or **publish** on topics (send messages to other nodes). In this example, the velocity commands are published to the `/turtle1/cmd_vel` topic by the turtle's teleop node (a node that lets you control the turtle interactively). The turtlesim node receives these velocity commands by subscribing to `/turtle1/cmd_vel` and publishes the turtle's pose to `/turtle1/pose`.
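To make the publish/subscribe flow concrete, here is a toy, in-process message bus. It is **not** the ROS 2 (`rclpy`) API. The class `ToyBus`, the callback names, and the simplified turtle motion (the command is applied directly and heading is ignored) are all hypothetical. A recorder subscribed to both topics plays the role of a ROS bag.

```python
from collections import defaultdict
from typing import Any, Callable

Callback = Callable[[Any], None]


class ToyBus:
    """Hypothetical in-process stand-in for ROS topics (not the rclpy API)."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message to every subscriber of the topic, in subscription order.
        for callback in self._subscribers[topic]:
            callback(message)


bus = ToyBus()
recorded = []  # plays the role of a ROS bag: stores (topic, message) pairs
pose = {"x": 0.0, "y": 0.0, "theta": 0.0}


def turtlesim_on_cmd_vel(cmd: dict) -> None:
    # Toy "turtlesim" node: apply the command directly (ignoring heading), then publish the pose.
    pose["x"] += cmd["linear.x"]
    pose["theta"] += cmd["angular.z"]
    bus.publish("/turtle1/pose", dict(pose))


bus.subscribe("/turtle1/cmd_vel", lambda msg: recorded.append(("/turtle1/cmd_vel", msg)))
bus.subscribe("/turtle1/pose", lambda msg: recorded.append(("/turtle1/pose", msg)))
bus.subscribe("/turtle1/cmd_vel", turtlesim_on_cmd_vel)

# Toy "teleop" node: publish one velocity command.
bus.publish("/turtle1/cmd_vel", {"linear.x": 1.0, "angular.z": 0.0})
print(recorded)
```

The publisher never calls the subscriber directly; both only know the topic name, which is what lets a recorder listen in without changing either node.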

![Motion model for turtlesim](https://github.com/flowcean/ipin2025-workshop/blob/main/images/turtlesim_model.svg?raw=1)

You can record ROS bag files using the `ros2 bag` command-line tool (`rosbag` in ROS 1). A ROS bag is a file format for storing ROS message data. It is commonly used for logging data during robot operation, which can later be played back for analysis or testing.
The tutorial uses ROS bag data recorded from turtlesim, processes it into supervised samples, trains multiple models, evaluates them using several metrics, and plots predictions versus ground truth.

![Turtlesim simulation](https://github.com/flowcean/ipin2025-workshop/blob/main/images/turtlesim.png?raw=1)

**Topics used:**

- `/turtle1/cmd_vel` with fields `linear.x`, `angular.z`  
- `/turtle1/pose` with fields `x`, `y`, `theta`  

`/turtle1/cmd_vel` specifies the desired linear and angular velocities for the turtle. `/turtle1/pose` provides the turtle's actual position and orientation in the simulation environment.
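The models we train approximate an unknown map $\textbf{x}_{k+1} = f(\textbf{x}_k, \textbf{u}_k)$ from the pose fields and command fields above. For intuition, a hand-written guess at $f$ is a unicycle kinematic model. The sketch below is our assumption, not turtlesim's source code; the function name `unicycle_step` and the time step `dt` are illustrative.

```python
import math


def unicycle_step(
    pose: tuple[float, float, float],
    cmd: tuple[float, float],
    dt: float = 1.0,
) -> tuple[float, float, float]:
    """Predict the next pose (x, y, theta) from the current pose and a velocity
    command (linear.x, angular.z) held constant for `dt` seconds.

    Assumption: forward-Euler unicycle kinematics; turtlesim's own update
    rule and time step may differ.
    """
    x, y, theta = pose
    v, omega = cmd
    # Move along the current heading, then turn.
    x_next = x + v * math.cos(theta) * dt
    y_next = y + v * math.sin(theta) * dt
    # Wrap the heading into [-pi, pi).
    theta_next = (theta + omega * dt + math.pi) % (2 * math.pi) - math.pi
    return (x_next, y_next, theta_next)


print(unicycle_step((0.0, 0.0, 0.0), (1.0, 0.0)))  # (1.0, 0.0, 0.0)
```

The learners in this tutorial are not given such a formula; they have to infer the relationship from the recorded samples.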


## Section 1: Load and Prepare the Training Data
We will use the turtlesim example dataset for this tutorial. It consists of ROS 2 bag recordings of the pose and velocity commands of a turtle in the turtlesim simulator. The goal is to predict the next pose of the turtle given its current pose and velocity command.

A config file specifies paths and parameters for the training run. This lets you modify the training setup without changing the code and improves reproducibility. Its contents are then available in the code as a nested object, e.g. `config.rosbag.training_path`.  
See `config.yaml` for details of the configuration used in this workshop.  

The example expects two ROS bag directories in the config:

- Training: `rosbag.training_path`  
- Evaluation: `rosbag.evaluation_path`  
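The code later reads these entries as attributes, for example `config.rosbag.training_path`. We have not checked how `flowcean.cli.initialize()` builds its config object; the sketch below only shows the general idea of turning parsed YAML (nested dicts) into attribute access with the standard library's `types.SimpleNamespace`. The helper `to_namespace` and the paths are hypothetical placeholders.

```python
from types import SimpleNamespace
from typing import Any


def to_namespace(value: Any) -> Any:
    """Recursively convert nested dicts (and lists of dicts) into objects
    that support attribute access, e.g. cfg.rosbag.training_path."""
    if isinstance(value, dict):
        return SimpleNamespace(**{key: to_namespace(item) for key, item in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value


# Hypothetical parsed contents of config.yaml; the paths are placeholders.
raw_config = {
    "rosbag": {
        "training_path": "data/training_bag",
        "evaluation_path": "data/evaluation_bag",
    },
}

demo_config = to_namespace(raw_config)
print(demo_config.rosbag.training_path)  # data/training_bag
```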

In [None]:
# @title Install and Setup
print("Cloning workshop repository...")
! [ -d "ipin2025-workshop" ] || git clone --quiet https://github.com/flowcean/ipin2025-workshop.git
print("Installing flowcean...")
! pip install --quiet flowcean==0.7.0b2
import os
os.chdir("ipin2025-workshop")

In [None]:
import flowcean.cli

# initialize() looks for a config.yaml in the current directory,
# which specifies the settings for our training run.
config = flowcean.cli.initialize()


### Task 1.1 Load Rosbags and Choose Inputs

First, we need to load the ROS 2 bag files and extract the relevant topics and fields. We will use the `DataFrame.from_rosbag` method from the `flowcean.polars` module to do this.

The topics and fields we load are:

```yaml
/turtle1/cmd_vel:
  - linear.x
  - angular.z
/turtle1/pose:
  - x
  - y
  - theta
```

**Instructions**

Specify the topics as a dictionary that maps each topic name to the list of fields to load.

In [None]:
from flowcean.polars import DataFrame

topics = {
    "/turtle1/cmd_vel": [
        "linear.x",
        "angular.z",
    ],
    "/turtle1/pose": [
        "x",
        "y",
        "theta",
    ],
}

# show current data structure without transforms
rosbag_train = DataFrame.from_rosbag(config.rosbag.training_path, topics=topics)
rosbag_eval = DataFrame.from_rosbag(config.rosbag.evaluation_path, topics=topics)
print(rosbag_train.observe().collect())


<details>
  <summary>💡 Click to see the solution</summary>

```python
topics = {
    "/turtle1/cmd_vel": [
        "linear.x",
        "angular.z",
    ],
    "/turtle1/pose": [
        "x",
        "y",
        "theta",
    ],
}
```

</details>

Note that the loaded data is still nested, consists of a single row, and is not yet suitable for training. We will address this in the next tasks.
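Once the transforms in Task 1.2 flatten this structure, each topic/field pair becomes its own column named `<topic>/<field>`, which is the naming used by the `inputs` list in Section 2. The hypothetical helper below only reproduces that naming convention; it does not load or transform any data.

```python
# The same topic/field selection as in the cell above.
example_topics = {
    "/turtle1/cmd_vel": ["linear.x", "angular.z"],
    "/turtle1/pose": ["x", "y", "theta"],
}


def column_names(topics: dict[str, list[str]]) -> list[str]:
    """Join every topic with each of its fields: "/turtle1/pose" + "x" -> "/turtle1/pose/x"."""
    return [f"{topic}/{field}" for topic, fields in topics.items() for field in fields]


print(column_names(example_topics))
```

The result contains the same five names as the `inputs` list in Section 2, in a different order.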


### Task 1.2 Create Training Data Frame

Now that we have loaded the ROS2 bag file, we need to create tabular training data that contains the input features and the target variable. The input features are the current pose and velocity of the turtle, and the target variable is the next pose of the turtle.

The following table shows how the values of the current pose vector $\textbf{x}_{k}$ are shifted to form the next pose vector $\textbf{x}_{k+1}$. The next pose vector is the supervised target we want to predict.

| current pose $\textbf{x}_{k}$ | velocity command $\textbf{u}_{k}$ | next pose $\textbf{x}_{k+1}$ |
|---|---|---|
| [0, 0, 0] | [1, 0] | <span style="color:red">[5, 0, 0]</span> |
| <span style="color:red">[5, 0, 0]</span> | [0, 1] | <span style="color:green">[5, 0, 2]</span> |
| <span style="color:green">[5, 0, 2]</span> | [5, 0] | [7, 4, 2] |
| ... | ... | ... |
| [4, 2, 2] | [0, 0] | null |

Because the last row has no successor, its next pose is null, and this row is filtered out.
We will use the `ZeroOrderHold` and `ExplodeTimeSeries` transforms from the `flowcean.polars` module, together with the `ShiftInTime` transform from the workshop's `_helper_functions` module, to create the training data frame.
Transforms can be chained together, and applied to a data frame, with the `|` operator.
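The shift shown in the table can be sketched in plain Python on its pose values. This illustrates the idea only; it is not the `ShiftInTime` implementation from `_helper_functions`. The function name `shift_in_time` and the row-of-dicts representation are our own, and the velocity columns are omitted for brevity.

```python
Row = dict[str, float]


def shift_in_time(rows: list[Row], features: list[str], steps: int = 1, suffix: str = "_next") -> list[Row]:
    """Add a `<feature><suffix>` column holding each feature's value `steps` rows later.

    The last `steps` rows have no future value (null), so they are dropped.
    """
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    samples = []
    for i in range(len(rows) - steps):
        sample = dict(rows[i])  # copy the current row
        for feature in features:
            sample[feature + suffix] = rows[i + steps][feature]
        samples.append(sample)
    return samples


# The first three poses from the table above.
poses = [
    {"x": 0.0, "y": 0.0, "theta": 0.0},
    {"x": 5.0, "y": 0.0, "theta": 0.0},
    {"x": 5.0, "y": 0.0, "theta": 2.0},
]
for sample in shift_in_time(poses, ["x", "y", "theta"]):
    print(sample)
```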

**Instructions**

Define `transforms` so that it builds the training data frame in the following steps:
- Create a `ZeroOrderHold` transform:
  - use the two topics, `/turtle1/cmd_vel` and `/turtle1/pose`, as its features
  - name the new column "measurements"

- Use the `|` operator to chain an `ExplodeTimeSeries` transform that explodes the "measurements" column

- Also chain a `ShiftInTime` transform:
  - shift the columns `/turtle1/pose/x`, `/turtle1/pose/y`, and `/turtle1/pose/theta` by 1 step and give the new columns the suffix "_next"
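The `|` chaining relies on Python operator overloading. We have not looked at how Flowcean implements it; the hypothetical `Transform` class below shows one common pattern, where `__or__` composes two transforms and `__ror__` applies a transform to data placed on the left.

```python
from __future__ import annotations

from typing import Any, Callable


class Transform:
    """Hypothetical transform: wraps a function from data to data."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __call__(self, data: Any) -> Any:
        return self.fn(data)

    def __or__(self, other: Transform) -> Transform:
        # `a | b` builds a new transform that applies a first, then b.
        return Transform(lambda data: other(self(data)))

    def __ror__(self, data: Any) -> Any:
        # `data | transform` applies the transform when the left operand
        # does not define `|` for transforms (e.g. a plain list).
        return self(data)


drop_last = Transform(lambda rows: rows[:-1])
double = Transform(lambda rows: [2 * row for row in rows])

pipeline = drop_last | double  # nothing runs yet; this only composes
print([1, 2, 3] | pipeline)  # [2, 4]
```

In the notebook the left operand is a Flowcean `DataFrame`, so how `rosbag_train | transforms` dispatches depends on Flowcean's own classes.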




In [None]:
from _helper_functions import ShiftInTime
from flowcean.polars import ExplodeTimeSeries, ZeroOrderHold


transforms = None  # TODO: Task 1.2


# Show the transformed data structure
training_environment = rosbag_train | transforms
evaluation_environment = rosbag_eval | transforms
print(training_environment.observe().collect())

<details>
  <summary>💡 Click to see the solution</summary>

```python
transforms = (
    ZeroOrderHold(
        features=[
            "/turtle1/cmd_vel",
            "/turtle1/pose",
        ],
        name="measurements",
    )
    | ExplodeTimeSeries("measurements")
    | ShiftInTime(
        features=["/turtle1/pose/x", "/turtle1/pose/y", "/turtle1/pose/theta"],
        steps=1,
        suffix="_next",
    )
)
```

</details>



Now we have tabular data with aligned timestamps and separate columns for each feature.
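The timestamp alignment is the job we assume `ZeroOrderHold` performs: the two topics publish at different times, so each one is held at its last received value until a new message arrives. The plain-Python sketch below illustrates that idea on made-up timestamps. The function name and the choice to emit `None` before a topic's first message are our assumptions; Flowcean may handle such leading gaps differently.

```python
from typing import Optional

Series = list[tuple[float, float]]  # (timestamp, value), sorted by timestamp


def zero_order_hold(series: dict[str, Series]) -> list[dict[str, Optional[float]]]:
    """Align several series on the union of their timestamps.

    At each timestamp, every series contributes its most recent value at or
    before that time, or None if it has not produced a value yet.
    """
    timestamps = sorted({t for samples in series.values() for t, _ in samples})
    next_index = {name: 0 for name in series}
    latest: dict[str, Optional[float]] = {name: None for name in series}
    rows = []
    for t in timestamps:
        row: dict[str, Optional[float]] = {"time": t}
        for name, samples in series.items():
            i = next_index[name]
            # Consume every sample of this series up to and including time t.
            while i < len(samples) and samples[i][0] <= t:
                latest[name] = samples[i][1]
                i += 1
            next_index[name] = i
            row[name] = latest[name]
        rows.append(row)
    return rows


# Hypothetical timestamps: commands and poses arrive at different times.
aligned = zero_order_hold({
    "linear.x": [(0.0, 1.0), (2.0, 0.0)],
    "x": [(0.5, 0.0), (1.0, 0.5), (2.5, 1.0)],
})
for row in aligned:
    print(row)
```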

## Section 2: Select Learners across Libraries

Now that we have our training and evaluation environments, we can select learners from different libraries. We will use a `RegressionTree` and a `RandomForestRegressorLearner` from scikit-learn, a `MultilayerPerceptron` from PyTorch trained through the `LightningLearner`, and an `XGBoostRegressorLearner` from XGBoost.

In [None]:
inputs = [
    "/turtle1/pose/x",
    "/turtle1/pose/y",
    "/turtle1/pose/theta",
    "/turtle1/cmd_vel/linear.x",
    "/turtle1/cmd_vel/angular.z",
]
outputs = [
    "/turtle1/pose/x_next",
    "/turtle1/pose/y_next",
    "/turtle1/pose/theta_next",
]


### Task 2.1 Learner Configuration
We will use the configurations defined in the `config.yaml` file to initialize our learners. They are available under the `config.training` attribute, e.g. `config.training.tree`, `config.training.forest`, and `config.training.mlp`.

**Instructions**

Initialize a regression tree, a random forest, a multilayer perceptron, and an XGBoost learner:
   - pass the tree configuration to the regression tree
   - pass the forest configuration to the random forest
   - pass a `MultilayerPerceptron` module to the `LightningLearner` and configure both with the MLP settings; the MLP's output size is the number of outputs
   - create the XGBoost learner; the solution keeps its default hyperparameters


<details>
  <summary>💡 Click to see a hint</summary>
    HINT: We defined our configurations in the config.yaml file.                     
</details>



In [None]:
from flowcean.sklearn import RandomForestRegressorLearner, RegressionTree
from flowcean.torch import LightningLearner, MultilayerPerceptron
from flowcean.xgboost import XGBoostRegressorLearner

regression_tree = RegressionTree(
    max_leaf_nodes=None,  # TODO: Task 2.1
)

random_forest = RandomForestRegressorLearner(
    n_estimators=None,  # TODO: Task 2.1
    max_depth=None,  # TODO: Task 2.1
)
mlp = LightningLearner(
    module=MultilayerPerceptron(
        learning_rate=None,  # TODO: Task 2.1
        output_size=None,  # TODO: Task 2.1
    ),
    batch_size=None,  # TODO: Task 2.1
    max_epochs=None,  # TODO: Task 2.1
)

xgb = XGBoostRegressorLearner(
    n_estimators=None,  # TODO: Task 2.1
    max_depth=None,  # TODO: Task 2.1
)

<details>
  <summary>💡 Click to see the solution</summary>

```python
regression_tree = RegressionTree(max_leaf_nodes=config.training.tree.max_leaf_nodes)

random_forest = RandomForestRegressorLearner(
    n_estimators=config.training.forest.n_estimators,
    max_depth=config.training.forest.max_depth,
)

mlp = LightningLearner(
    module=MultilayerPerceptron(
        learning_rate=config.training.mlp.learning_rate,
        output_size=len(outputs),
    ),
    batch_size=config.training.mlp.batch_size,
    max_epochs=config.training.mlp.max_epochs,
)

xgb = XGBoostRegressorLearner()
```

</details>

### Task 2.2 Prepare Sequential Learning
We want to train all of our models in a loop. To do this, we first collect the learners in a list.

**Instructions**

   - Create a list containing all four learners

In [None]:
learners = [
    # TODO: Task 2.2
]

<details>
  <summary>💡 Click to see the solution</summary>

```python
learners = [
    regression_tree,
    random_forest,
    mlp,
    xgb,
]
```

</details>

## Section 3: Training of the Models

We will now train our models by looping over the `learners` list created in the previous section and applying the `learn_offline` strategy to each learner.



### Task 3.1 Create a Sequential Learning Loop

We will now write a training loop that trains each learner on the training environment and stores the trained models in a list.

**Instructions**

Implement the training loop:
  - call the `learn_offline` function with the training environment, the learner, and the input and output column names
  - append the trained model to the `models` list
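The loop works because every learner exposes the same interface, whatever library it wraps. The toy classes below illustrate that pattern with two trivial baselines trained on the first two rows of the Task 1.2 table. The `Learner`/`Model` protocols and the `learn`/`predict` method names are our own; they are not Flowcean's interface.

```python
from statistics import fmean
from typing import Protocol


class Model(Protocol):
    def predict(self, x: list[float]) -> list[float]: ...


class Learner(Protocol):
    name: str

    def learn(self, inputs: list[list[float]], outputs: list[list[float]]) -> Model: ...


class MeanModel:
    def __init__(self, means: list[float]) -> None:
        self.means = means

    def predict(self, x: list[float]) -> list[float]:
        return list(self.means)  # ignores the input entirely


class MeanLearner:
    name = "mean"

    def learn(self, inputs: list[list[float]], outputs: list[list[float]]) -> Model:
        # Column-wise mean of the training targets.
        return MeanModel([fmean(column) for column in zip(*outputs)])


class PersistenceModel:
    def predict(self, x: list[float]) -> list[float]:
        return list(x[:3])  # next pose = current pose (first three inputs)


class PersistenceLearner:
    name = "persistence"

    def learn(self, inputs: list[list[float]], outputs: list[list[float]]) -> Model:
        return PersistenceModel()  # nothing to fit


# Two samples from the Task 1.2 table: [x, y, theta, linear.x, angular.z] -> next pose.
train_inputs = [[0.0, 0.0, 0.0, 1.0, 0.0], [5.0, 0.0, 0.0, 0.0, 1.0]]
train_outputs = [[5.0, 0.0, 0.0], [5.0, 0.0, 2.0]]

toy_learners: list[Learner] = [MeanLearner(), PersistenceLearner()]
toy_models = []
for learner in toy_learners:
    print(f"Training model: {learner.name}")
    toy_models.append(learner.learn(train_inputs, train_outputs))
```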

In [None]:
from flowcean.core import learn_offline

models = []
for learner in learners:
    print(f"Training model: {learner.name}")

    model = learn_offline  # TODO: Task 3.1

    models.append(model)



<details>
  <summary>💡 Click to see the solution</summary>

```python
models = []
for learner in learners:
    print(f"Training model: {learner.name}")
    model = learn_offline(
        training_environment,
        learner,
        inputs=inputs,
        outputs=outputs,
    )
    models.append(model)
```

</details>

## Section 4: Evaluation and Model Comparison

We will now evaluate our trained models on the evaluation environment using the `evaluate_offline` function from the `flowcean.core` module, and compare their performance in the resulting report.

### Task 4.1 Choose Metrics for Evaluation

We want to evaluate our models with several metrics, which we collect in a list:
- Mean Absolute Error
- Mean Squared Error
- Coefficient of Determination ($R^2$ score, `R2Score`)
- Mean Euclidean Distance

**Instructions**   
Create a list containing an instance of each metric above.

Note: `MeanEuclideanDistance` needs the names of the columns it is computed over, here the predicted position columns `/turtle1/pose/x_next` and `/turtle1/pose/y_next`.
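For reference, the metrics can be written out in plain Python. We assume the `flowcean.sklearn` metrics follow scikit-learn's defaults, where errors and $R^2$ are averaged uniformly over several output columns. `MeanEuclideanDistance` lives in the workshop's own `euclidean_distance` module, so the per-row distance averaged over rows below is our guess at its definition. The function names are our own, and columns are selected by index rather than by name for brevity.

```python
import math
from statistics import fmean

Matrix = list[list[float]]  # rows are samples, columns are output features


def mean_absolute_error(y_true: Matrix, y_pred: Matrix) -> float:
    # Mean of |y - y_hat| over all entries (equal to averaging per column first).
    return fmean(abs(t - p) for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p))


def mean_squared_error(y_true: Matrix, y_pred: Matrix) -> float:
    # Mean of (y - y_hat)^2 over all entries.
    return fmean((t - p) ** 2 for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p))


def r2_score(y_true: Matrix, y_pred: Matrix) -> float:
    # Per column: R^2 = 1 - SS_res / SS_tot; then the plain average over columns.
    # Raises ZeroDivisionError for a constant target column.
    scores = []
    for col_t, col_p in zip(zip(*y_true), zip(*y_pred)):
        mean_t = fmean(col_t)
        ss_res = sum((t - p) ** 2 for t, p in zip(col_t, col_p))
        ss_tot = sum((t - mean_t) ** 2 for t in col_t)
        scores.append(1.0 - ss_res / ss_tot)
    return fmean(scores)


def mean_euclidean_distance(y_true: Matrix, y_pred: Matrix, columns: list[int]) -> float:
    # Per row: straight-line distance using only the chosen columns (e.g. x and y); then the mean.
    return fmean(
        math.dist([row_t[c] for c in columns], [row_p[c] for c in columns])
        for row_t, row_p in zip(y_true, y_pred)
    )
```

Lower is better for the three error metrics; $R^2$ is at most 1, and higher values are better.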


In [None]:
from euclidean_distance import MeanEuclideanDistance
from flowcean.sklearn import MeanAbsoluteError, MeanSquaredError, R2Score

metrics = [
    # TODO: 4.1
]

<details>
  <summary>💡 Click to see the solution</summary>

```python
metrics = [
    MeanAbsoluteError(),
    MeanSquaredError(),
    R2Score(),
    MeanEuclideanDistance(
        features=["/turtle1/pose/x_next", "/turtle1/pose/y_next"],
    ),
]
```

</details>

### Task 4.2 Evaluate the Models

We will use the `evaluate_offline` strategy to evaluate each trained model on the evaluation environment. The resulting report object contains the results of all metrics for each model, which can be displayed as a table with the `great_table()` method.

**Instructions**

Implement the evaluation:
   - call the `evaluate_offline` function with the trained models, the evaluation environment, the metrics, and the input and output column names
   - store the returned report in `report`


In [None]:
from flowcean.core import evaluate_offline

_ = evaluate_offline
report = None  # TODO: 4.2
report.great_table()

<details>
  <summary>💡 Click to see the solution</summary>

```python
report = evaluate_offline(
    models,
    environment=evaluation_environment,
    metrics=metrics,
    inputs=inputs,
    outputs=outputs,
)
report.great_table()
```
</details>


### Task 4.3 Select a Model and Visualize the Predictions

We want to select the best model based on the evaluation report created in the previous task. We will also plot the predictions of all trained models against the ground truth using the `plot_predictions_vs_ground_truth` helper from the workshop's `_helper_functions` module.

**Instructions**

   - Assign the best model from the `models` list to `best_model`, based on the evaluation report



In [None]:
from _helper_functions import plot_predictions_vs_ground_truth

best_model = None  # TODO: 4.3
print(f"Best model: {best_model.name}")

# Plots are saved under plots/
plot_predictions_vs_ground_truth(
    environment=evaluation_environment,
    input_names=inputs,
    output_names=outputs,
    models=models,
)

# save model to disk
best_model.save("model.fml")

<details>
  <summary>💡 Click to see the solution</summary>

The best model is the XGBoost model, so `best_model = models[3]` (XGBoost is the fourth entry in `learners`).


</details>
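If you prefer to choose the model programmatically rather than by reading the table, a small helper can rank models by a single metric. It assumes you have copied the report values into a plain dictionary; extracting them from Flowcean's report object is not covered here, and the name `select_best` is hypothetical.

```python
def select_best(scores: dict[str, dict[str, float]], metric: str, higher_is_better: bool = False) -> str:
    """Return the name of the model with the best value of `metric`.

    `scores` maps model name -> {metric name -> value}. Error metrics
    (MAE, MSE, Euclidean distance) are lower-is-better; R^2 is higher-is-better.
    """
    pick = max if higher_is_better else min
    return pick(scores, key=lambda name: scores[name][metric])
```

Different metrics can favor different models, so it is worth checking the choice against more than one column of the report.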