# IPIN 2025 Flowcean Hands-on Session

This is the tutorial file for the IPIN 2025 workshop. In this Jupyter notebook, we will go through the steps of training machine learning models with the Flowcean framework.

This example uses [turtlesim](http://wiki.ros.org/turtlesim), a tool made for teaching the Robot Operating System (ROS). We train models to predict the next pose of the turtle $\textbf{x}_{k+1}$ from the current pose $\textbf{x}_{k}$ and the velocity command $\textbf{u}_{k}$.

![Motion model for turtlesim](./images/turtlesim_model.svg)
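
To make the learning target concrete, the sketch below implements a simple unicycle kinematic step, a common idealization of how a pose $(x, y, \theta)$ evolves under a command $(v, \omega)$. It is an assumption for illustration and not necessarily turtlesim's exact update rule; the function name `unicycle_step` and the time step `dt` are hypothetical.

```python
import math


def unicycle_step(pose, command, dt=0.1):
    """Advance a pose (x, y, theta) by one step under a command (v, omega).

    Assumed idealized model: the turtle moves along its heading with linear
    velocity v and rotates with angular velocity omega for dt seconds.
    """
    x, y, theta = pose
    v, omega = command
    return (
        x + v * math.cos(theta) * dt,
        y + v * math.sin(theta) * dt,
        theta + omega * dt,
    )


# Driving straight ahead from the origin only changes x.
next_pose = unicycle_step((0.0, 0.0, 0.0), (1.0, 0.0))
# Turning on the spot only changes theta.
turned_pose = unicycle_step((0.0, 0.0, 0.0), (0.0, 1.0), dt=0.5)
```

The models trained in this notebook have to learn a mapping of this kind from data alone, without being given the equations.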

The workflow uses ROS bag data recorded from turtlesim, processes it into supervised samples, trains multiple models, evaluates them with several metrics, and plots predictions against ground truth.

![Turtlesim simulation](./images/turtlesim.png)

A config file is used to specify paths and parameters for the training run. See `config.yaml` for details.
The example expects two ROS bag directories in the config:

- Training: `rosbag.training_path`
- Evaluation: `rosbag.evaluation_path`
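
Before starting a run, it can help to check that the configuration contains the expected entries. The sketch below checks a plain nested dictionary for the two dotted keys listed above. The helper `missing_keys` and the example paths are hypothetical, and the object returned by Flowcean's config loader may behave differently from a plain `dict`.

```python
REQUIRED_KEYS = ["rosbag.training_path", "rosbag.evaluation_path"]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the dotted keys from `required` that are absent in `config`."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            # Stop as soon as one level of the dotted path is not present.
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


# Placeholder paths, for illustration only.
example_config = {
    "rosbag": {
        "training_path": "path/to/training_bag",
        "evaluation_path": "path/to/evaluation_bag",
    },
}
problems = missing_keys(example_config)
```

An empty result means both bag directories are configured.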

Topics used:

- `/turtle1/cmd_vel` with fields `linear.x`, `angular.z`
- `/turtle1/pose` with fields `x`, `y`, `theta`

`/turtle1/cmd_vel` is the desired velocity command for the turtle. It specifies the linear and angular velocities that the turtle should follow. `/turtle1/pose` provides the actual position and orientation of the turtle in the simulation environment.

In [None]:
# some imports we will need
import logging

logger = logging.getLogger(__name__)

## Section 1 : Load and Prepare the Training Data
We will use the turtlesim example dataset for this tutorial. The dataset is a ROS2 bag file that contains the pose and velocity of a turtle in the turtlesim simulator. The goal is to predict the next pose of the turtle given its current pose and velocity.

In [None]:
# Import flowcean and cli
import flowcean
import flowcean.cli

# Import some helper functions for loading ROS data
from os import PathLike

from collections.abc import Iterable
from flowcean.core.transform import Lambda

# import transforms
from flowcean.polars import DataFrame, ExplodeTimeSeries, ZeroOrderHold
from flowcean.ros import load_rosbag

from _helper_functions import shift_in_time

# The function below looks for a config.yaml in the current directory
# In the config.yaml, we specify settings for our training run 
config = flowcean.cli.initialize()


### Task 1.1 Load Rosbags and Choose Inputs

First, we need to load the ROS2 bag file and extract the relevant topics and fields. We will use the `load_rosbag` function from the `flowcean.ros` module to do this.

The topics and fields we load are:
```yaml  
  - /turtle1/cmd_vel
      - linear.x
      - angular.z
  - /turtle1/pose
      - x
      - y
      - theta
```

**Instructions**

Call the `load_rosbag` function and pass:
  - the bag path (`path`)
  - the required topics and their fields (`topics`)
  - the message paths (`message_paths`)

In [None]:

# Configure the load_rosbag() function below
def load_and_process_rosbag(
    path: str | PathLike,
    message_paths: Iterable[str | PathLike] | None = None,
) -> DataFrame:
    logger.info("Loading rosbag from: %s", path)

    rosbag = load_rosbag(
        # TODO: TASK 1.1
    )
    return (
        DataFrame(rosbag) 
    )

# using our loaded config we want to create training and evaluation samples
samples_train = load_and_process_rosbag(
    config.rosbag.training_path,
    config.rosbag.message_paths,
)
samples_eval = load_and_process_rosbag(
    config.rosbag.evaluation_path,
    config.rosbag.message_paths,
)

<details>
  <summary>💡 Click to see the solution</summary>

```python
    rosbag = load_rosbag(
        path=path,
        topics={
            "/turtle1/cmd_vel": [
                "linear.x",
                "angular.z",
            ],
            "/turtle1/pose": [
                "x",
                "y",
                "theta",
            ],
        },
        message_paths=message_paths,
    )   
```

</details>


### Task 1.2 Create Training Data Frame

Now that we have loaded the ROS2 bag file, we need to create a training data frame that contains the input features and the target variable. The input features are the current pose and velocity of the turtle, and the target variable is the next pose of the turtle.

The following table shows how the values of the current pose vector $\textbf{x}_{k}$ are shifted to form the next pose vector $\textbf{x}_{k+1}$. The next pose vector is the supervised target we want to predict.

| $ \textbf{x}_{k} $ | $ \textbf{u}_{k} $     | $ \textbf{x}_{k+1} $   |
|----------------------|----------------------|----------------------  |
| [0, 0, 0]            | [1, 0]               | <span style="color:red">[5, 0, 0]</span>            |
| <span style="color:red">[5, 0, 0]</span>          | [0, 1]               | <span style="color:green">[5, 0, 2]</span>          |
| <span style="color:green">[5, 0, 2]</span>       | [5, 0]               | [7, 4, 2]            |
| ...                    | ...                | ...                    |
| [4, 2, 2]     | [0, 0]                | null                    |

Note that this results in a null value for the next pose vector in the last row, which is filtered out.
We will use the `ZeroOrderHold` and `ExplodeTimeSeries` transforms from the `flowcean.polars` module and the `Lambda` transform from `flowcean.core.transform` to create the training data frame. The `Lambda` transform can apply arbitrary functions to the data, and we will use it to apply the `shift_in_time` function, which shifts the current pose vector to form the next pose vector.
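
The shifting step in the table can be sketched in plain Python. This is only an illustration of the idea, using the first rows of the table; it is not the implementation of `shift_in_time`, and the helper name `make_samples` is hypothetical.

```python
def make_samples(poses, commands):
    """Pair each pose and command with the pose of the following row."""
    # Shift the poses up by one row; the last row has no successor.
    next_poses = poses[1:] + [None]
    rows = zip(poses, commands, next_poses)
    # Drop rows without a target, mirroring the filtered null row.
    return [(x, u, x_next) for x, u, x_next in rows if x_next is not None]


poses = [(0, 0, 0), (5, 0, 0), (5, 0, 2)]
commands = [(1, 0), (0, 1), (5, 0)]
samples = make_samples(poses, commands)
```

Three recorded rows yield two supervised samples, because the final row has no next pose to serve as a target.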

**Instructions**

Modify the `load_and_process_rosbag` function to create a training data frame with the following steps:
- Call the `ZeroOrderHold` transform:
  - our features are our topics
  - name the new column `"measurements"`

- Chain the `ExplodeTimeSeries` transform: apply it to the `measurements` column

- Chain the `Lambda` transform: pass the function `shift_in_time`

<details>
  <summary>💡 Click to see a hint</summary>

You can concatenate/chain transforms to a dataframe with the `|` operator.
</details>
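
The two topics are published at different times, so their samples must be brought onto a common time axis before they can be combined into rows. Conceptually, a zero-order hold does this by keeping the last known value of each signal until a new one arrives. The sketch below shows the idea in plain Python with made-up timestamps; how Flowcean's `ZeroOrderHold` implements it may differ, and the function name `zero_order_hold` is hypothetical.

```python
from bisect import bisect_right


def zero_order_hold(series, timestamps):
    """Sample (time, value) pairs at `timestamps`, holding the last value."""
    times = [t for t, _ in series]
    held = []
    for ts in timestamps:
        # Index of the latest sample whose time is <= ts, or -1 if none.
        idx = bisect_right(times, ts) - 1
        held.append(series[idx][1] if idx >= 0 else None)
    return held


# Made-up (time, value) samples of two signals with different timing.
cmd_vel = [(0.0, 1.0), (1.0, 0.0)]
pose_x = [(0.0, 0.0), (0.5, 0.5), (1.5, 1.0)]

# Joint time axis containing every timestamp of both signals.
timeline = sorted({t for t, _ in cmd_vel} | {t for t, _ in pose_x})
held_cmd = zero_order_hold(cmd_vel, timeline)
held_pose = zero_order_hold(pose_x, timeline)
```

After this step every timestamp on the joint axis has one value per signal, which is what allows the time series to be exploded into one row per time step.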

In [None]:
# Modify the return statement below to include the necessary transforms
def load_and_process_rosbag(
    path: str | PathLike,
    message_paths: Iterable[str | PathLike] | None = None,
) -> DataFrame:
    logger.info("Loading rosbag from: %s", path)

    rosbag = load_rosbag(
        path=path,
        message_paths=message_paths,
        topics={
            "/turtle1/cmd_vel": [
                "linear.x",
                "angular.z",
            ],
            "/turtle1/pose": [
                "x",
                "y",
                "theta",
            ],
        },
    )
    return (
        DataFrame(rosbag)
        | ZeroOrderHold(
            features=[
                "/turtle1/cmd_vel",
                "/turtle1/pose",
            ],
            name="measurements",
        )
        | ExplodeTimeSeries("measurements")
        | Lambda(shift_in_time)
    )


# using our loaded config we want to create training and evaluation samples
samples_train = load_and_process_rosbag(
    config.rosbag.training_path,
    config.rosbag.message_paths,
)
samples_eval = load_and_process_rosbag(
    config.rosbag.evaluation_path,
    config.rosbag.message_paths,
)

<details>
  <summary>💡 Click to see the solution</summary>

```python
return (
    DataFrame(rosbag)
    | ZeroOrderHold(
        features=[
            "/turtle1/cmd_vel",
            "/turtle1/pose",
        ],
        name="measurements",
    )
    | ExplodeTimeSeries("measurements")
    | Lambda(shift_in_time)
)
```

</details>
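
Chaining with `|` works because Python lets a class define the operator through the `__or__` method. The following toy classes illustrate the mechanism only; `Data` and `Step` are hypothetical stand-ins and do not reflect how Flowcean implements its environments or transforms.

```python
class Step:
    """A tiny stand-in for a transform that wraps a single function."""

    def __init__(self, func):
        self.func = func

    def apply(self, values):
        return self.func(values)


class Data:
    """A tiny stand-in for a data container that supports `|` chaining."""

    def __init__(self, values):
        self.values = values

    def __or__(self, step):
        # `data | step` applies the step and returns a new Data object,
        # so further steps can be chained onto the result.
        return Data(step.apply(self.values))


double = Step(lambda values: [v * 2 for v in values])
drop_first = Step(lambda values: values[1:])
result = Data([1, 2, 3]) | double | drop_first
```

The steps run from left to right, so `result.values` is the doubled list with its first element removed.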



## Section 2 : Select Learners across Libraries 

Now that we have our training and evaluation samples, we can select learners from different libraries. We will use a `RandomForestRegressor` and a `RegressionTree` from sklearn, a `MultilayerPerceptron` from PyTorch, and an `XGBoostRegressor` from XGBoost.

In [None]:
# we load all the learners for our training loop
from flowcean.sklearn import RandomForestRegressorLearner, RegressionTree
from flowcean.torch import LightningLearner, MultilayerPerceptron
from flowcean.xgboost import XGBoostRegressorLearner

inputs = [
    "/turtle1/pose/x",
    "/turtle1/pose/y",
    "/turtle1/pose/theta",
    "/turtle1/cmd_vel/linear.x",
    "/turtle1/cmd_vel/angular.z",
]
outputs = [
    "/turtle1/pose/x_next",
    "/turtle1/pose/y_next",
    "/turtle1/pose/theta_next",
]


### Task 2.1 Learner configuration
We will use the configurations defined in the `config.yaml` file to initialize our learners. The configurations are stored in the `config.training` attribute.

**Instructions**

Initialize a regression tree, a random forest, a multilayer perceptron, and an XGBoost learner:
   - pass the tree configuration to the regression tree
   - pass the forest configuration to the random forest
   - pass a multilayer perceptron instance to the Lightning learner, and pass both their respective configurations
   - create the XGBoost learner with its default settings (no arguments)


<details>
  <summary>💡 Click to see a hint</summary>
    HINT: We defined our configurations in the config.yaml file.                     
</details>



In [None]:
# create and configure the learners below
regression_tree = None  # TODO: Task 2.1

random_forest = None    # TODO: Task 2.1

mlp = None              # TODO: Task 2.1

xgb = None              # TODO: Task 2.1

<details>
  <summary>💡 Click to see the solution</summary>

```python
regression_tree = RegressionTree(
    **config.training.tree
)

random_forest = RandomForestRegressorLearner(
    **config.training.forest,
)

mlp = LightningLearner(
    module=MultilayerPerceptron(
        learning_rate=config.training.mlp.learning_rate,
    ),
    batch_size=config.training.mlp.batch_size,
    max_epochs=config.training.mlp.max_epochs,
)

xgb = XGBoostRegressorLearner()
```

</details>
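
The solution uses `**` to unpack a configuration mapping into keyword arguments. The sketch below shows what that unpacking does with a plain dictionary and a stand-in factory function. The function `make_tree` and the parameter names are hypothetical and are not taken from `config.yaml` or from the Flowcean API.

```python
def make_tree(max_depth=None, min_samples_leaf=1):
    """Stand-in for a learner constructor that accepts keyword arguments."""
    return {"max_depth": max_depth, "min_samples_leaf": min_samples_leaf}


# Hypothetical settings, as they might be read from a config section.
tree_config = {"max_depth": 5}

# `**tree_config` expands to `max_depth=5`; unspecified parameters keep
# their defaults.
tree = make_tree(**tree_config)
```

Keeping hyperparameters in the config file this way means they can be changed without editing the notebook code.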

### Task 2.2 Prepare Sequential Learning
We want to train all of our models in a loop. To do this, we need to create a list of our learners.

**Instructions**

   - Create a list of the learners

In [None]:
# create the list of learners below
learners = [
    # TODO: Task 2.2
]

<details>
  <summary>💡 Click to see the solution</summary>

```python
learners = [
    regression_tree,
    random_forest,
    mlp,
    xgb,
]
```

</details>

## Section 3: Training of the Models

We will now train our models by looping over the list of learners we created in the previous section and applying the `learn_offline` strategy.

In [None]:
# we load our learning strategy
from flowcean.core import learn_offline



### Task 3.1 Create a Sequential Learning Loop

We will now create a training loop that will train each learner on the training samples. We will store the trained models in a list.

**Instructions**

Implement the training loop:
  - call the `learn_offline` function and pass the required parameters
  - append the trained model to the models list

In [None]:
models = []
for learner in learners:
    logger.info("Training model: %s", learner.name)
    model = None  # TODO: Task 3.1



<details>
  <summary>💡 Click to see the solution</summary>

```python
models = []
for learner in learners:
    logger.info("Training model: %s", learner.name)
    model = learn_offline(
        samples_train,
        learner,
        inputs=inputs,
        outputs=outputs,
    )
    models.append(model)
```

</details>

## Section 4 : Evaluation and Model Comparison

We will now evaluate our trained models on the evaluation samples. We will use the `evaluate_offline` function from the `flowcean.core` module to do this, and compare the performance of the models using the report it returns.

In [None]:
# we load our metrics for comparison
from flowcean.sklearn import MeanAbsoluteError, MeanSquaredError, R2Score
from custom_metrics.euclidean_distance import MeanEuclideanDistance

# import function for model comparison
from flowcean.core import evaluate_offline

### Task 4.1 Choose Metrics for Evaluation

We want to evaluate our models using different metrics. We will define a list of metrics that we want to use for evaluation. The metrics we want to use are:
- Mean Absolute Error
- Mean Squared Error
- Coefficient of determination ($R^2$, `R2Score`)
- Mean Euclidean Distance
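
As a reminder of what these metrics measure, the sketch below computes each of them in plain Python for a single output. These are the textbook definitions written for illustration; they are not the Flowcean or scikit-learn implementations, which may treat multiple outputs differently.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference; penalizes large errors more strongly."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_euclidean_distance(xy_true, xy_pred):
    """Average straight-line distance between true and predicted points."""
    return sum(math.dist(t, p) for t, p in zip(xy_true, xy_pred)) / len(xy_true)


# Small made-up values to exercise the functions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
med = mean_euclidean_distance([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
```

For the error metrics and the distance, lower is better; for $R^2$, a value closer to 1 is better.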

**Instructions**   
Define a list containing the metrics listed above.

<details>
  <summary>💡 Click to see a hint</summary>
    HINT: The Euclidean distance metric requires the columns it should be calculated on.

</details>



In [None]:
# specify metrics for evaluation
metrics = [
    # TODO: 4.1
]

<details>
  <summary>💡 Click to see the solution</summary>

```python
metrics = [
    MeanAbsoluteError(),
    MeanSquaredError(),
    R2Score(),
    MeanEuclideanDistance(
        columns=[
            "/turtle1/pose/x_next",
            "/turtle1/pose/y_next",
        ],
    ),
]
```

</details>

### Task 4.2 Create an Evaluation Loop

We will use the `evaluate_offline` strategy to evaluate each trained model on the evaluation samples. The resulting report object contains the results of all metrics for each model, which can be displayed in a table using the `great_table()` method.

**Instructions**

Implement the evaluation loop:
   - call the `evaluate_offline` function and pass the required parameters
   - store the returned report for later comparison


In [None]:
report = None  # TODO: 4.2
report.great_table()

<details>
  <summary>💡 Click to see the solution</summary>

```python
report = evaluate_offline(
    models,
    environment=samples_eval,
    metrics=metrics,
    inputs=inputs,
    outputs=outputs,
)
report.great_table()
```
</details>


### Task 4.3 Select a Model and Visualize

We want to select the best model based on the evaluation report we created in the previous task. We will also visualize the model predictions against the ground truth using the `plot_predictions_vs_ground_truth` function from the `_helper_functions` module.

**Instructions**

   - Choose the best model based on the evaluation report
   - call the `plot_predictions_vs_ground_truth` function and pass the required parameters

<details>
  <summary>💡 Click to see a hint</summary>
 HINT: we can observe and collect samples   
</details>

In [None]:
from _helper_functions import plot_predictions_vs_ground_truth

# TODO: 4.3 - replace the index with that of the best model from the report
best_model = models[0]
logger.info("Best model: %s", best_model.name)

# Plots are saved under plots/
plot_predictions_vs_ground_truth(
    samples_eval=samples_eval.observe().collect(),
    input_names=inputs,
    output_names=outputs,
    models=models,
)

<details>
  <summary>💡 Click to see the solution</summary>

The best model is the XGBoost model.

</details>
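
Choosing a model by eye from the table works for four models, but the same choice can be made programmatically. The sketch below picks the entry with the lowest error from a plain dictionary. The model names and scores are made-up placeholders, not results from this notebook, and reading values out of the Flowcean report object is not shown because its accessors are not covered here.

```python
# Hypothetical error scores per model (lower is better), for illustration only.
scores = {
    "model_a": 0.42,
    "model_b": 0.31,
    "model_c": 0.55,
}

# Pick the key whose associated score is smallest.
best_name = min(scores, key=scores.get)
best_score = scores[best_name]
```

For a metric where higher is better, such as $R^2$, `max` would be used instead of `min`.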

## Section 5 : Final Task

After you have completed the tasks, run the following cell.

In [None]:
from _helper_functions import surprise

surprise()