# IPIN 2025 Flowcean Workshop
This is the tutorial file for the IPIN 2025 workshop. In this Jupyter notebook, we will go through the steps of training machine learning models with the
Flowcean framework.


In [10]:
# some imports we will need
import logging

logger = logging.getLogger(__name__)

## Section 1 : Load and Prepare the Training Data
We will use the turtlesim example dataset for this tutorial. The dataset is a ROS2 bag file that contains the pose and velocity of a turtle in the turtlesim simulator. The goal is to predict the next pose of the turtle given its current pose and velocity.
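
To make the prediction target concrete, the sketch below advances a pose by one time step with a simple unicycle model. This is only an illustration of the input/output relationship the learners have to approximate. The function name `unicycle_step`, the time step `dt`, and the use of explicit Euler integration are assumptions, not part of the dataset or of Flowcean.

```python
import math


def unicycle_step(x, y, theta, linear_x, angular_z, dt):
    """Advance a planar pose by one explicit Euler step of a unicycle model."""
    x_next = x + linear_x * math.cos(theta) * dt
    y_next = y + linear_x * math.sin(theta) * dt
    theta_next = theta + angular_z * dt
    return x_next, y_next, theta_next


# Illustrative values only: a turtle at (5, 5) facing along the x-axis.
pose_next = unicycle_step(
    x=5.0, y=5.0, theta=0.0, linear_x=2.0, angular_z=0.5, dt=0.1
)
print(pose_next)
```

The learned models take the same five inputs (pose and velocity) and return the three next-pose values, but they have to infer the relationship from recorded data instead of from a formula.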

In [None]:
# Import flowcean and cli
import flowcean
import flowcean.cli

# The function below looks for a config.yaml in the current directory
# In the config.yaml, we specify settings for our training run 
config = flowcean.cli.initialize()

# Import some helper functions for loading ROS data
from os import PathLike
from collections.abc import Iterable
from flowcean.ros import load_rosbag
from _helper_functions import shift_in_time

# import transforms
from flowcean.polars import DataFrame, ExplodeTimeSeries, ZeroOrderHold
from flowcean.core.transform import Lambda


### Task 1.1 Load Rosbags and Choose Inputs

First, we need to load the ROS2 bag file and extract the relevant topics and fields. We will use the `load_rosbag` function from the `flowcean.ros` module to do this.

The topics and fields we load are:
```yaml  
  - /turtle1/cmd_vel
      - linear.x
      - angular.z
  - /turtle1/pose
      - x
      - y
      - theta
```

**Instructions**

Call the `load_rosbag` function and pass:
  - the bag path (`path`)
  - the required topics and their fields (`topics`)
  - the message paths (`message_paths`)

In [None]:
# Configure the load_rosbag() function below
def load_and_process_rosbag(
    path: str | PathLike,
    message_paths: Iterable[str | PathLike] | None = None,
) -> DataFrame:
    logger.info("Loading rosbag from: %s", path)

    rosbag = load_rosbag(
        # TODO: TASK 1.1
    )

    return (
        DataFrame(rosbag) 
    )

# using our loaded config we want to create training and evaluation samples
samples_train = load_and_process_rosbag(
    config.rosbag.training_path,
    config.rosbag.message_paths,
)
samples_eval = load_and_process_rosbag(
    config.rosbag.evaluation_path,
    config.rosbag.message_paths,
)

2025-09-08 11:08:50,177 [__main__][INFO] Loading rosbag from: recordings/turtle_training
2025-09-08 11:08:50,178 [flowcean.ros.rosbag][INFO] Loading data from cache...
2025-09-08 11:08:50,178 [__main__][INFO] Loading rosbag from: recordings/turtle_evaluation
2025-09-08 11:08:50,179 [flowcean.ros.rosbag][INFO] Loading data from cache...


<details>
  <summary>💡 Click to see the solution</summary>

```python
    rosbag = load_rosbag(
        path=path,
        topics={
            "/turtle1/cmd_vel": [
                "linear.x",
                "angular.z",
            ],
            "/turtle1/pose": [
                "x",
                "y",
                "theta",
            ],
        },
        message_paths=message_paths,
    )   
```

</details>


### Task 1.2 Create Training Data Frame

Now that we have loaded the ROS2 bag file, we need to create a training data frame that contains the input features and the target variable. The input features are the current pose and velocity of the turtle, and the target variable is the next pose of the turtle.
We will use the `ZeroOrderHold` and `ExplodeTimeSeries` transforms from the `flowcean.polars` module and the `Lambda` transform from `flowcean.core.transform` to create the training data frame.
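
The two ideas behind this step can be sketched in plain Python. A zero-order hold aligns signals sampled at different times by carrying the most recent value forward, and a time shift pairs each row with a value from the following row. The helpers `zero_order_hold` and `add_next_column` and all sample values are hypothetical. They do not reproduce Flowcean's transforms or the `shift_in_time` helper, whose implementation is not shown in this notebook.

```python
from bisect import bisect_right


def zero_order_hold(times, values, query_times):
    """Return, for each query time, the latest value recorded at or before it."""
    held = []
    for t in query_times:
        idx = bisect_right(times, t) - 1
        held.append(values[idx] if idx >= 0 else None)
    return held


def add_next_column(rows, key):
    """Pair each row with the value of `key` in the following row.

    The last row has no successor and is dropped.
    """
    return [
        {**current, f"{key}_next": following[key]}
        for current, following in zip(rows, rows[1:])
    ]


# Made-up samples: poses every 0.1 s, velocity commands at irregular times.
pose_t = [0.0, 0.1, 0.2, 0.3]
pose_x = [5.0, 5.2, 5.4, 5.6]
cmd_t = [0.05, 0.25]
cmd_v = [2.0, 1.0]

# Resample the command signal onto the pose timestamps.
cmd_on_pose_grid = zero_order_hold(cmd_t, cmd_v, pose_t)

# Build one row per timestamp and attach the next x position as the target.
rows = [{"x": x, "v": v} for x, v in zip(pose_x, cmd_on_pose_grid)]
shifted = add_next_column(rows, "x")
print(cmd_on_pose_grid)
print(shifted[0])
```

The first command value is `None` because no command was recorded before the first pose sample.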

**Instructions**

Modify the `load_and_process_rosbag` function to create a training data frame with the following steps:
- Call the `ZeroOrderHold` transform:
  - the features are our two topics
  - name the new column `"measurements"`

- Chain the `ExplodeTimeSeries` transform: apply it to the `measurements` column

- Chain the `Lambda` Transform: pass the function `shift_in_time`, which is imported at the start of the cell

<details>
  <summary>💡 Click to see a hint</summary>

You can concatenate/chain transforms to a dataframe with the `|` operator.
</details>
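
As a rough illustration of how `|` chaining can work in Python, the toy classes below overload `__or__` so that each step receives the result of the previous one. `Data` and `Step` are hypothetical stand-ins and are not Flowcean's classes.

```python
class Step:
    """A named wrapper around a function that transforms a list of values."""

    def __init__(self, func):
        self.func = func

    def __call__(self, values):
        return self.func(values)


class Data:
    """Holds values and applies a step whenever `data | step` is evaluated."""

    def __init__(self, values):
        self.values = values

    def __or__(self, step):
        # Return a new Data object so further steps can be chained.
        return Data(step(self.values))


double = Step(lambda values: [v * 2 for v in values])
add_one = Step(lambda values: [v + 1 for v in values])

# Steps are applied left to right: first double, then add one.
result = Data([1, 2, 3]) | double | add_one
print(result.values)
```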

In [None]:
# Modify the return statement below to include the necessary transforms
def load_and_process_rosbag(
        path: str | PathLike,
        message_paths: Iterable[str | PathLike] | None = None,
    ) -> DataFrame:
    logger.info("Loading rosbag from: %s", path)

    rosbag = load_rosbag(
        path=path,
        message_paths=message_paths,
        topics={
            "/turtle1/cmd_vel": [
                "linear.x",
                "angular.z",
            ],
            "/turtle1/pose": [
                "x",
                "y",
                "theta",
            ],
        },
    )
    return (
        DataFrame(rosbag) 
        # TODO: TASK 1.2
    )   


# using our loaded config we want to create training and evaluation samples
samples_train = load_and_process_rosbag(
    config.rosbag.training_path,
    config.rosbag.message_paths,
)
samples_eval = load_and_process_rosbag(
    config.rosbag.evaluation_path,
    config.rosbag.message_paths,
)

2025-09-08 11:10:20,429 [__main__][INFO] Loading rosbag from: recordings/turtle_training
2025-09-08 11:10:20,430 [flowcean.ros.rosbag][INFO] Loading data from cache...
2025-09-08 11:10:20,431 [__main__][INFO] Loading rosbag from: recordings/turtle_evaluation
2025-09-08 11:10:20,431 [flowcean.ros.rosbag][INFO] Loading data from cache...


<details>
  <summary>💡 Click to see the solution</summary>

```python
def load_and_process_rosbag(
    path: str | PathLike,
    message_paths: Iterable[str | PathLike] | None = None,
) -> DataFrame:
    logger.info("Loading rosbag from: %s", path)

    rosbag = load_rosbag(
        path=path,
        message_paths=message_paths,
        topics={
            "/turtle1/cmd_vel": [
                "linear.x",
                "angular.z",
            ],
            "/turtle1/pose": [
                "x",
                "y",
                "theta",
            ],
        },
    )

    return (
        DataFrame(rosbag)
        | ZeroOrderHold(
            features=[
                "/turtle1/cmd_vel",
                "/turtle1/pose",
            ],
            name="measurements",
        )
        | ExplodeTimeSeries("measurements")
        | Lambda(shift_in_time)
    )
```

</details>



## Section 2 : Select Learners across Libraries 

Now that we have our training and evaluation samples, we can select learners from different libraries. We will use a `RandomForestRegressor` and a `RegressionTree` from sklearn and a `MultilayerPerceptron` from PyTorch.

In [None]:
# we load all the learners for our training loop
from flowcean.sklearn import RandomForestRegressorLearner, RegressionTree
from flowcean.torch import LightningLearner, MultilayerPerceptron


### Task 2.1 Learner configuration
We will use the configurations defined in the `config.yaml` file to initialize our learners. The configurations are stored under the `config.training` attribute, with one entry per learner (`tree`, `forest`, and `mlp`).
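
A mapping of settings can be handed to a constructor with Python's `**` unpacking, which turns each key into a keyword argument. The sketch below uses a stand-in function and hypothetical setting names. The actual keys depend on what your `config.yaml` contains.

```python
def build_tree(max_depth=None, min_samples_leaf=1):
    """Stand-in for a learner constructor; it only records its keyword arguments."""
    return {"max_depth": max_depth, "min_samples_leaf": min_samples_leaf}


# Hypothetical settings, playing the role of one learner's entry in config.yaml.
tree_settings = {"max_depth": 5, "min_samples_leaf": 2}

# `**tree_settings` expands to build_tree(max_depth=5, min_samples_leaf=2).
tree = build_tree(**tree_settings)
print(tree)
```

If a key in the mapping does not match a parameter name, Python raises a `TypeError`, which makes typos in the configuration visible early.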

**Instructions**

Initialize a regression tree, a random forest, and a Lightning Learner:
   - pass the tree configuration to the regression tree
   - pass the forest configuration to the random forest
   - pass a `MultilayerPerceptron` instance to the `LightningLearner`, and give each of the two its own configuration values


<details>
  <summary>💡 Click to see a hint</summary>
    HINT: We defined our configurations in the config.yaml file.                     
</details>



In [None]:
# create and configure the learners below
regression_tree = None  # TODO: Task 2.1

random_forest = None    # TODO: Task 2.1

mlp = None              # TODO: Task 2.1

<details>
  <summary>💡 Click to see the solution</summary>

```python
regression_tree = RegressionTree(
    **config.training.tree
)

random_forest = RandomForestRegressorLearner(
    **config.training.forest,
)

mlp = LightningLearner(
    module=MultilayerPerceptron(
        learning_rate=config.training.mlp.learning_rate,
    ),
    batch_size=config.training.mlp.batch_size,
    max_epochs=config.training.mlp.max_epochs,
)
```

</details>

### Task 2.2 Prepare Sequential Learning
We want to train all of our models in a loop. To do this, we need to create a dictionary that maps the name of each learner to the learner instance. We also need to create two lists that contain the input and output columns, that is, the topic fields we feed into the model and the ones we want to predict.

**Instructions**

   - Create a dictionary that maps learner names to learner instances
   - Create a list that contains all topic fields that are part of the input
   - Create a list that contains all topic fields that are part of the output
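
The column names used in this notebook appear to follow a `topic/field` pattern, with a `_next` suffix for the shifted target columns. Assuming that convention holds, the names can be derived from the topic dictionary instead of being typed by hand. The variable names below are illustrative.

```python
# The same topics and fields that were passed to load_rosbag.
topics = {
    "/turtle1/cmd_vel": ["linear.x", "angular.z"],
    "/turtle1/pose": ["x", "y", "theta"],
}

# Assumed convention: one column per field, named "<topic>/<field>".
input_columns = [
    f"{topic}/{field}" for topic, fields in topics.items() for field in fields
]

# Assumed convention: target columns carry a "_next" suffix on the pose fields.
output_columns = [f"/turtle1/pose/{field}_next" for field in topics["/turtle1/pose"]]

print(input_columns)
print(output_columns)
```

If in doubt, inspect the column names of the processed data frame and compare them with the generated lists.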

In [None]:
# create the dictionaries and lists below
learners = {
    # TODO: Task 2.2
}

inputs = [
    # TODO: Task 2.2
]

outputs = [
    # TODO: Task 2.2
]

<details>
  <summary>💡 Click to see the solution</summary>

```python
learners = {
    "regression_tree": regression_tree,
    "random_forest": random_forest,
    "multilayer_perceptron": mlp,
}

inputs = [
    "/turtle1/pose/x",
    "/turtle1/pose/y",
    "/turtle1/pose/theta",
    "/turtle1/cmd_vel/linear.x",
    "/turtle1/cmd_vel/angular.z",
]
outputs = [
    "/turtle1/pose/x_next",
    "/turtle1/pose/y_next",
    "/turtle1/pose/theta_next",
]
```

</details>

## Section 3: Training of the Models

We will now train our models by looping over the learners dictionary we created in the previous section and applying the `learn_offline` strategy.

In [None]:
# we load our learning strategy
from flowcean.core import learn_offline



### Task 3.1 Create a Sequential Learning Loop

We will now create a training loop that will train each learner on the training samples. We will store the trained models in a dictionary.

**Instructions**

Implement the training loop:
  - call the `learn_offline` function and pass the required parameters
  - store the trained models in a dict

In [None]:
models = {}
for learner_name, learner in learners.items():
    logger.info("Training model: %s", learner_name)

    model = None                # TODO: Task 3.1
    models[learner_name] = None # TODO: Task 3.1


<details>
  <summary>💡 Click to see the solution</summary>

  ```python
  models = {}
  for learner_name, learner in learners.items():
      logger.info("Training model: %s", learner_name)
      
      model = learn_offline(
          samples_train,
          learner,
          inputs=inputs,
          outputs=outputs,
      )
      models[learner_name] = model
  ```
  
</details>

## Section 4 : Evaluation and Model Comparison

We will now evaluate our trained models on the evaluation samples using the `evaluate_offline` function from `flowcean.core`. We will then compare the performance of the models with the `print_report_table` and `select_best_model` functions from `flowcean.core.strategies.offline`.

In [None]:
# we load our metrics for comparison
from flowcean.sklearn import MaxError, MeanAbsoluteError, MeanSquaredError, R2Score
from custom_metrics.euclidean_distance import MeanEuclideanDistance

# import functions for comparison and visualization
from flowcean.core import evaluate_offline
from flowcean.core.strategies.offline import print_report_table, select_best_model
from _helper_functions import plot_predictions_vs_ground_truth

### Task 4.1 Choose Metrics for Evaluation

We want to evaluate our models using several metrics, which we collect in a list. The metrics we want to use are:
- Maximum Error
- Mean Absolute Error
- Mean Squared Error
- R² Score
- Mean Euclidean Distance
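
For orientation, the standard regression metrics imported above (maximum error, mean absolute error, mean squared error, and R² score) can be written out by hand for a single output column. This is a plain-Python sketch using the textbook definitions and made-up numbers, not the implementation Flowcean or scikit-learn uses.

```python
def max_error(y_true, y_pred):
    """Largest absolute deviation between truth and prediction."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared deviation; penalizes large errors more strongly."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus the ratio of residual variance to total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.5, 2.5, 4.0]

print(max_error(y_true, y_pred))
print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(r2_score(y_true, y_pred))
```

For the first three metrics lower is better, while an R² score closer to 1 indicates a better fit.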

**Instructions**   
Define a list named `metrics` and add an instance of each required metric to it.

<details>
  <summary>💡 Click to see a hint</summary>
    HINT: The Euclidean distance metric needs to be told which columns it should be calculated on.

</details>
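
The mean Euclidean distance measures how far the predicted positions are from the true positions on average. The sketch below shows the idea for (x, y) points with made-up values. It is an assumption about what the custom `MeanEuclideanDistance` metric computes, since its source is not shown in this notebook.

```python
import math


def mean_euclidean_distance(true_points, predicted_points):
    """Average straight-line distance between matching true and predicted points."""
    distances = [
        math.dist(true_point, predicted_point)
        for true_point, predicted_point in zip(true_points, predicted_points)
    ]
    return sum(distances) / len(distances)


# Made-up positions: the first prediction is off by 5 units, the second is exact.
true_xy = [(0.0, 0.0), (1.0, 1.0)]
predicted_xy = [(3.0, 4.0), (1.0, 1.0)]

print(mean_euclidean_distance(true_xy, predicted_xy))
```

Unlike the per-column metrics, this value combines the x and y errors into a single distance, which is why the metric has to know the columns that form a position.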



In [None]:
# specify metrics for evaluation
metrics = [
    # TODO: 4.1
]

<details>
  <summary>💡 Click to see the solution</summary>

```python
metrics = [
    MaxError(),
    MeanAbsoluteError(),
    MeanSquaredError(),
    R2Score(),
    MeanEuclideanDistance(
        columns=[
            "/turtle1/pose/x_next",
            "/turtle1/pose/y_next",
        ],
    ),
]
```

</details>

### Task 4.2 Create an Evaluation Loop

We will now create an evaluation loop that will evaluate each trained model on the evaluation samples using the `evaluate_offline` strategy. We will store the evaluation reports in a dictionary.

**Instructions**

Implement the evaluation loop:
   - call the evaluate_offline function and pass the required parameters
   - store the reports in a dict for later comparison


In [None]:
reports = {}
for model_name, model in models.items():
    logger.info("Evaluating model: %s", model_name)

    report = None # TODO 4.2
    reports[model_name] = None # TODO : 4.2

    print(report)
    print_report_table(report)


<details>
  <summary>💡 Click to see the solution</summary>

```python
reports = {}
for model_name, model in models.items():
    logger.info("Evaluating model: %s", model_name)
    
    report = evaluate_offline(
        model=model,
        environment=samples_eval,
        metrics=metrics,
        inputs=inputs,
        outputs=outputs,
    )
    reports[model_name] = report

    print(report)
    print_report_table(report)

```
</details>


### Task 4.3 Select a Model and Visualize the Predictions

We want to select the best model based on the evaluation reports we created in the previous task. We will use the `select_best_model` function from `flowcean.core.strategies.offline` to do this. We will also visualize the predictions of the trained models against the ground truth using the `plot_predictions_vs_ground_truth` helper from `_helper_functions`.

 **Instructions**

   - call the `select_best_model` function and pass the required parameters
   - we want to compare the models by their mean Euclidean distance
   - call the `plot_predictions_vs_ground_truth` function and pass the required parameters
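
Conceptually, selecting the best model means picking the entry with the best value of one chosen metric. The sketch below does this for a plain dictionary of scores. The helper `pick_best` and the numbers are hypothetical, and it assumes that a lower value is better, which holds for distance-based metrics.

```python
def pick_best(scores, lower_is_better=True):
    """Return the name whose score is lowest (or highest, if requested)."""
    chooser = min if lower_is_better else max
    return chooser(scores, key=scores.get)


# Made-up scores for illustration only; real values come from the reports.
mean_distance_by_model = {
    "regression_tree": 0.12,
    "random_forest": 0.08,
    "multilayer_perceptron": 0.15,
}

best = pick_best(mean_distance_by_model)
print(best)
```

For a metric such as the R² score, where higher is better, the same helper would be called with `lower_is_better=False`.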

<details>
  <summary>💡 Click to see a hint</summary>
 HINT: Call `observe()` and then `collect()` on the evaluation samples to obtain the data for plotting.
</details>

In [None]:
best_model_name = None # TODO: 4.3

logger.info("Best model: %s", best_model_name)

plot_predictions_vs_ground_truth(
    # TODO: 4.3
)

<details>
  <summary>💡 Click to see the solution</summary>

```python
best_model_name = select_best_model(
    reports,
    output_name="multi_output",
    metric_name="MeanEuclideanDistance",
)

logger.info("Best model: %s", best_model_name)

plot_predictions_vs_ground_truth(
    samples_eval=samples_eval.observe().collect(),
    input_names=inputs,
    output_names=outputs,
    models=models,
)
```

</details>

## Section 5 : Final Task

After you have completed the tasks, run the following cell.

In [24]:

from _helper_functions import surprise

surprise()

🎉 Congratulations! You finished the tutorial! 🎉

🎉💫💫💫✨🎊🎊✨🌟🌟✨🎉✨✨🌟💫✨🌟🎉🎉🎊🌟💫🎊🎊💫🎊🎊🎉💫🎉🌟💫💫🌟✨✨✨💫🎊🎊✨🎉🌟🎉🎊💫💫🌟💫
💫🎊🎊🎊🌟✨🎊✨✨🌟✨✨🌟🌟🌟🎉🌟🌟💫🌟💫💫✨🌟✨💫🎉✨🌟🎉🌟✨💫🎉🌟🎊✨🌟💫💫🎊🌟🎊🎉✨💫🎉🎊💫💫
🎊✨🎉🎊🎊🎉💫🎉💫🎉🎊🎊🎉🎊✨🎉💫✨🌟🌟🎊🎉✨🌟🎊🎊🎉🎉✨💫💫✨🎊🌟🎊🎉✨🎉🎊🌟🎉💫💫🎉🌟🎉🎉🎉✨🎊
💫🎊🎉🌟💫💫🌟🎉🌟🌟✨💫🎊🌟🎉✨💫✨✨✨🌟💫🎉🎉🎊✨✨🌟💫🌟🎉✨🌟🎊🎊🎊💫✨🌟🎉🌟✨🌟🌟🎊🎊🌟🎉💫🌟
🌟🎉🎉🌟🎊💫✨🎊💫🌟🎊💫✨🎉🌟🎊🎉💫💫💫🌟✨✨🎉🎊💫🎉🎊🎊🎊🎉🌟✨✨🎊✨💫🌟🌟🎊🌟🌟🎊💫🎉✨🌟✨💫🎉
🎉🎉✨💫🎊🎉🎊💫🎉🌟🎉💫🎊🎊💫✨🎉🌟✨🎉✨✨✨🎊🌟🌟🌟✨✨🌟🌟💫🎊✨🌟🌟💫💫🌟✨🎊✨💫🌟💫🎊💫🌟🎉🌟
🌟🎊💫🌟🎉🎉🎊✨🎊💫🎉💫💫✨🎉✨🎉✨🌟✨🌟✨💫✨✨🌟🌟🎉🎉💫✨🌟🎉🌟🎊🎊🎉🎊🎊💫🌟✨💫🌟🎉🌟🌟✨🎊✨
💫🎊🎉✨✨🎊🌟✨🌟💫🎉🌟🌟🌟🎊💫💫🎊💫🎉🎊✨🎊💫✨💫🌟💫✨✨🎊🎉🎊🎊🎉✨🎊🎉🎉🎊🎊🎉🎉✨🎊🌟🎊🌟🎊💫
🌟🌟✨🌟💫🌟🌟🎉✨💫✨🌟🌟🎊💫✨🎉✨🎊💫🎊🎊🌟🎉🎉🎊🎉🎉🎉🎊💫💫🎉💫🌟💫💫🌟🌟🎉🎉🎉💫🎉✨🌟💫✨🌟🎊
🌟🎉💫🌟✨🎉🌟🎊💫🎊✨💫💫🎉🌟✨🎊✨💫✨✨✨✨🎉✨💫💫✨✨💫✨🎊✨🎉🎊✨💫🌟🌟🎉🌟✨🎉✨🎉✨✨🎊🎉🎉
🎊✨🎊🌟🎉🌟✨💫🌟🎊🌟✨🎊🎉💫💫🌟💫🎊🌟🌟🎉✨🌟🌟🎊✨✨🎊🎉💫✨💫✨🎉✨💫🎉🎊✨🎊🎉💫🎊🎉✨🎊🎉✨✨
🎉🎊💫💫🌟🌟✨✨💫💫🎉🎊🌟✨🎊✨🌟✨🎉🎊🌟🎉✨🎊💫🎊✨🎊🎉🎊🎊🎉🎊✨💫🎊✨✨✨🎉✨💫🎊✨✨🎊✨🌟💫🎉
✨✨🌟🌟✨✨🎉🎉🌟🎊🎊🎊💫✨🎉🎉✨🌟🎊💫🎊🌟🎊✨🎉💫🎉🎊🌟🎉💫🎉🎉🎉🌟✨🎊✨🎉🎊🎉✨🎊🎉💫🎉🎊💫🎉💫
✨🎉✨🌟🎊✨🌟✨✨🌟🎊🎊✨🎊🎉✨🎉🎉🎊🌟🎉✨🌟🎉🌟💫✨🎊🎊💫✨🎊✨🌟🎊💫🌟🎉✨🎊🎊💫💫💫🎉🎉🎊🌟🎊🎊
🎉✨🌟✨✨🌟🎊🎉🎊🎊🎊💫🎊✨💫🎉💫🌟✨💫💫🌟🌟🌟💫🎊🌟🎊✨🎉🌟🎉🎊💫🎉🎊💫🌟💫🎊🎉🎉💫✨🌟✨💫🎊✨✨

🎯 Great job! Keep learning & experimenting! 🚀
